  1. Feb 2024
    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This manuscript introduces an exciting way to measure SARS-CoV-2 aerosolized shedding using a disposable exhaled breath condensate collection device (EBCD). The paper draws the conclusion that the contagious shedding of the virus via aerosol route persists at a high level 8 days after symptoms.

      Strengths:

      The methodology is potentially of high importance and the paper is clearly written. The study design is clever. If aerosolized viral load kinetics truly differed from those of nasal swabs, then this would be a very important finding.

      Thank you for your encouraging remarks. We agree that a comparison between aerosolized viral load and nasal swabs would strengthen our findings, and we have collected new specimens which will enable this comparison: In each session we collected both nasal swabs and exhaled breath samples, and we are in the process of analyzing these data. These data will be included in our revised manuscript.

      Weaknesses:

      The study conclusions are not entirely supported by the data for several reasons:

      (1) Most data points in the study are relatively late during infection when viral loads from other compartments (nasal and oral swabs) are typically much lower than peak viral loads which often occur in the pre-symptomatic or early symptomatic phase of infection. Moreover, the generation time for SARS-CoV-2 has been estimated to be 3-4 days on average meaning that most infections occur before or very early during symptoms. Therefore, the available epidemiologic data does not support 12 days of infection (day 8 symptoms) as important for most transmissions. Therefore, many of the measurement timepoints in this study may not be relevant for transmission.

      Thank you for your comment. Notably, our new data set includes a small number of specimens that were collected prior to the start of symptoms, and so we may be able to partially address this concern with those data. That said, we agree that a limitation of our study is that we were unable to collect specimens prior to symptom onset, and that this pre-symptomatic period represents a fruitful area for future work. Significant questions do remain open regarding the transmission dynamics of SARS-CoV-2, including the extent of transmission after symptom onset; despite this limitation of our data, we therefore feel that our method may contribute to further understanding of those dynamics. We will nonetheless include a more prominent discussion of this limitation in the revised manuscript.

      (2) Fig 1A would be more powerful as a correlation plot between viral load from nasal samples (x-axis) and aerosol (y-axis). One would expect at least a rough correlation (as has been seen between viral loads in oral and nasal samples) and deviations from this correlation would provide crucial information about how and when aerosol shedding is discordant from nasal samples (i.e., early vs late time points, low versus high viral loads, etc.). It is too strong to state correspondence is 100% when viral load is only measured in one compartment and nasal swabs are reduced to the oversimplified "positive or negative".

      Thank you for this suggestion. We agree that the figure would be more powerful as a correlation plot between viral load from nasal samples and aerosol. Unfortunately, at the time these samples were collected, the ER at Northwestern Hospital was diagnosing SARS-CoV-2 patients using the Abbott ID NOW rapid diagnostic platform, which, despite being a PCR-based system, does not provide quantitative information about viral loads, and instead provides a binary positive/negative result. Since we were looking for a direct comparison between the clinical diagnostic test and our test, we considered the binary aspect of our data (detected/undetected), and found 100% correspondence, meaning that when the clinical test detected SARS-CoV-2, our test did too. We have collected additional data which include quantitative PCR values from nasal swabs collected at the same time as breath samples, and we will include these data in the format you suggest, once analyzed, in our revised manuscript.

      (3) Results are reported in RNA copies which is fine but plaque-forming units (pfu, or quantitative culture) are likely a more accurate surrogate of infectivity. It is quite possible that all of these samples would have been negative for pfu given that the ratio of RNA:pfu is often >1000 (though also dynamic over time during infection). This could be another indicator that most samples in the study were collected too late during infection to represent contagious time points.

      We agree that culturing exhaled breath samples would be an important addition to our understanding of the transmission dynamics of SARS-CoV-2 and we consider this to be an important next step for our method. Because we did not perform culturing of our breath samples in this study, we avoided making claims about infectivity of our samples in this manuscript, and instead speculate about the future utility of our method in understanding transmission dynamics, once an appropriate surrogate of infectivity is measured. We will make sure this is clearer in the revised manuscript. That said, other groups have successfully cultured breath samples with CT values well within the range we found in our study, and sufficient for transmission (for example, Alsved et al, 2023, CT range ~33-38). These studies support the idea that a significant portion of the viral RNA measured in our samples may come from viable virus. Therefore, quantifying the ratio of viable to nonviable virus in our samples is an important next step. We appreciate this comment, and we will add a clearer discussion of this point to the revised manuscript.

      (4) Individual kinetic curves should be shown for participants with more than three time points to demonstrate whether there are clear kinetic trends within individuals that would help further validate this approach. The inclusion of single samples from individuals is less informative.

      We will add individual kinetic curves to the revised manuscript.

      (5) The S-shaped model in 2A is somewhat misleading as it is fit to means but there is tremendous variability within the data. Therefore the 8-day threshold should be listed clearly as a mean but not a rule for all individuals. The statement that viral RNA copies do not decrease until 8 days from symptom onset is unlikely to be true for all infected people and can't be made based on the available data in this study given that many people contributed only one datapoint.

      We will clarify the language in the manuscript and make limitations of the 8-day interpretation clearer.

      (6) The incubation period for SARS-CoV-2 is highly variable. Therefore duration of symptoms is a rather poor correlate of the duration of infection. This further diminishes the interpretive value of positive samples from individuals who were only sampled once.

      We will add a discussion of this point to the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Lane and colleagues measured the abundance of SARS-CoV-2 on breath in 60 outpatients after the development of COVID-19 symptoms using a novel breath collection apparatus. They found that, overall, viral abundance remains high for approximately eight days following the development of symptoms, after which viral abundance on breath drops to a low level that may persist for approximately 20 days or more. They did not identify significant differences in viral shedding on breath by vaccination status or viral variant. They also noted substantial variation in the degree and duration of shedding across individuals.

      Strengths:

      The primary strengths of this study are (1) the focus on breath, rather than the more traditional nasal/oropharyngeal swabs, and (2) the fact that the data were collected at multiple time points for each infection. This allows the authors to characterize not only mean viral abundance across individuals but also how that abundance changes over time, allowing for a better understanding of the potential duration of infectiousness of SARS-CoV-2.

      Weaknesses:

      The sample size is moderate (60) and focuses only on outpatients. While these are minor weaknesses (as the authors note, the majority of SARS-CoV-2 transmission likely occurs among those with symptoms below the threshold of hospitalization), it would nevertheless be useful to have a fuller understanding of variation in viral shedding across clinical groups.

      We agree this would be very interesting and feel our method, which is straightforward to perform in clinical settings, lends itself to future studies across clinical groups. We have added discussion of this to the discussion section of the manuscript.

      Furthermore, the study lacks information on viral shedding prior to the development of symptoms, which may be a critical period for transmission. Since the samples were collected at home by study participants using a novel apparatus, it is difficult to assess the degree to which actual variation in viral abundance, user variability, and/or measurement variation is inherent to the apparatus.

      This is a great point, which we will discuss in our revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Reviewer #1’s main concerns revolved around the evidential strength of the study’s conclusion that age-specific effects of birth weight on brain structure are more localized and less consistent across cohorts than age-uniform, stable effects. Specifically, the reviewer points out the evidence (or lack of such) for age-specific effects. We have rearticulated as a “bullet-point summarization” the reviewer’s concerns for a better response (please, see the original reviewer’s response in the annexed document). We thank the reviewer for his/her comment.

      Concern #1: No direct statistical comparisons are conducted between samples (beyond the spin-tests).

      In the initial version of the manuscript, the spin-tests represented a key test since they compared the spatial distribution of birth weight effects across samples. In the revised manuscript, we additionally perform a replicability analysis across samples both for birth weight effects on brain characteristics and on brain change in a similar fashion as described for the within-sample analysis. The results of these analyses provide complementary evidence of robust associations of birth weight effects on cortical characteristics (for area and volume, less so for thickness) and of unreliable associations of birth weight on cortical change. These analyses are briefly mentioned in the main document and fully described as supplementary information. Briefly, the effects of birth weight on cortical area and cortical volume showed high (exploratory and confirmatory) replicability while replicability was almost nonexistent for the effects of birth weight on cortical change. See below, under Reviewer #1, concern #2, for a description of the changes in the revised manuscript.

      Concern #2: The differential composition of samples in terms of age distribution leads to the possibility that lack of results is explained by methodological differences.

      The revised version of the manuscript now provides a within-sample replicability analysis of the birth weight effects on cortical change. This analysis addresses the reviewer’s concern, as the lack of replicability in this analysis cannot be attributed to sample or methodological differences. We thank the reviewer for suggesting this analysis, which provides further quantification of the (lack of) robustness of the birth weight effects on cortical change. See below for changes in the revised version of the manuscript concerning additional replicability analyses which were carried out as a response to reviewer #1 concerns #1 and #2.

      pp. 12-13. “Additionally, we performed replicability analyses both across and within samples to further investigate the robustness of the effects of birth weight on cortical characteristics and cortical change. Split-half analyses within datasets were performed to investigate the replicability of significant effects [36,37] of BW on cortical characteristics within samples (refer to Figure 1). These analyses further confirmed that the significant effects were largely replicable for volume and area, but not for thickness (see Supplementary Figure 11). Split-half analyses of BW on cortical change (refer to Figure 2) showed, in general, a very low degree of replicability on the three different cortical measures. See Supplementary Table 3. Replicability across datasets showed a similar pattern, that is, replicability was high for the effect of birth weight on cortical characteristics but very low for the effects on cortical change. See Supplementary Table 4 for stats. See Supplementary statistical methods for a full description of the analyses. These analyses provide complementary evidence of robust associations of BW with cortical area and volume - but not cortical change - across and within samples.”

      p. 41. “For each dataset and cortical measure, we assessed the effects of birth weight on cortical structure and cortical change (…)”

      p. 42. “Across samples replicability was performed as described in the within-sample replicability analysis (i.e., we assessed the exploratory and confirmatory replicability) except that split-half was not performed - the three datasets were compared with each other - and the analyses were performed in the original fsaverage space.”

      pp. 54-55. “The exploratory replicability of birth weight on cortical change was negligible across datasets and measures [.00 (.00), .00 (.00), .00 (.00) for area, .02 (.09), .00 (.02), .01 (.03) for volume, and .01 (.05), .01 (.14), .00 (.01) for thickness] while confirmatory replicability was generally poor, except for the ABCD dataset [.02 (.05), .68 (.35), .00 (.00) for area, .08 (.14), .56 (.25), .00 (.02) for volume, and .37 (.26), .60 (.27), .01 (.03) for thickness] (see Supplementary Table 3).

      These results are not fully comparable to other studies assessing the replicability of brain phenotype associations due to analytical differences (e.g. sample size, multiple-comparison correction method) [20,36], yet clearly show that the rate of replicability of BW associations with cortical area and volume is comparable to benchmark brain-phenotype associations such as body-mass index and age [68]. Lower levels of replicability in the LCBC subsample are likely attributable to higher sample variability (e.g. increased age span). Kinship may lead to inflated patterns of replicability within the ABCD cohort. Confirmatory replicability is also, to some degree, affected by sample size, and thus the estimates of confirmatory replicability may be somewhat inflated in the ABCD dataset.

      Finally, the degree of across-sample replicability was high for the effects of birth weight on cortical area and volume (average confirmatory replicability = .96 and .93), low for thickness (.27), and negligible for the effects of birth weight on cortical change (.03, .06, and .06). See further information in Supplementary Table 4.”
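
      The split-half logic behind these replicability figures can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the response does not spell out how exploratory and confirmatory replicability are computed, so the sketch uses one common definition (the share of effects significant in one half that are also significant, with the same sign, in the other half). The helper names `split_half` and `replication_rate`, the region labels, and all numbers are hypothetical.

```python
import random


def split_half(subject_ids, seed=0):
    """Randomly partition subjects into two non-overlapping halves."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def replication_rate(discovery, replication, alpha=0.05):
    """Share of discovery-significant features that replicate.

    Each argument maps feature -> (effect, p_value). A feature counts as
    replicated if it is also significant in the replication half and the
    effect has the same sign. Returns None if nothing is significant in
    the discovery half.
    """
    hits = [f for f, (_, p) in discovery.items() if p < alpha]
    if not hits:
        return None
    replicated = 0
    for f in hits:
        d_eff, _ = discovery[f]
        r_eff, r_p = replication[f]
        if r_p < alpha and d_eff * r_eff > 0:
            replicated += 1
    return replicated / len(hits)


# Hypothetical per-region statistics, invented for illustration only.
half_a = {"roi1": (0.30, 0.001), "roi2": (0.25, 0.010), "roi3": (0.02, 0.600)}
half_b = {"roi1": (0.28, 0.004), "roi2": (-0.05, 0.400), "roi3": (0.01, 0.700)}

a_ids, b_ids = split_half(range(10))
print(len(a_ids), len(b_ids))
print(replication_rate(half_a, half_b))
```

      In practice the split would be repeated many times and the rates summarized (for example as a mean and standard deviation), since a single random split can be unrepresentative.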

      Concern #3: Some datasets have a narrow age range precluding the detection of age-related effects.

      We do not believe concern #3 is a major problem since time*birth weight refers to a within-subject contrast, e.g., longitudinal-only-based contrast. Birth weight, even when self-reported, is a highly reliable measure and the sample sizes are relatively large (n = 635, 1759, and 3324 unique individuals). Note that the smaller dataset does have longer follow-up times and more observations per participant, increasing the reliability of estimations in individual change. Structural MRI measures have very high reliability. Clearly, longitudinal brain change is less reliable, yet the present sample size and the high reliability of birth weight should provide enough statistical power to capture even small time-varying effects of birth weight on brain structure. Note as well that in each model age is treated as a covariate. Rather, the consistency of time*birth weight (that is, the effects of birth weight on cortical change) is assessed with split-half replications within and across samples. In this methodological pipeline, a narrow age range for a given dataset, if anything, may constitute an advantage. We have clarified the statistical model (see changes in the revised manuscript, referred to in response to reviewer #1, concern #5).
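
      For orientation, the kind of model described in this response can be written schematically as below. This is a sketch based only on the terms named in the response (birth weight, age as a covariate, within-subject follow-up time, and interactions with time). The notation is illustrative, the participant-level random intercept and the use of baseline age are assumptions, and any further covariates in the actual models are not shown.

```latex
\begin{aligned}
y_{ij} ={}& \beta_0 + \beta_1\,\mathrm{BW}_i + \beta_2\,\mathrm{age}_i + \beta_3\,t_{ij} \\
          & + \beta_4\,(\mathrm{BW}_i \times t_{ij}) + \beta_5\,(\mathrm{age}_i \times t_{ij})
            + b_i + \varepsilon_{ij}
\end{aligned}
```

      Here $y_{ij}$ is the cortical measure for participant $i$ at observation $j$, $t_{ij}$ is follow-up time since that participant's first scan, $b_i$ is an assumed random intercept, and $\varepsilon_{ij}$ is residual error. In this notation $\beta_1$ corresponds to the effect of birth weight on cortical characteristics and $\beta_4$ to the time*birth weight contrast, that is, the effect of birth weight on cortical change. Because $t_{ij}$ varies only within participants, $\beta_4$ is informed by longitudinal observations rather than by age differences between participants.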

      Concern #4: The modeling strategy does not allow for non-linear interaction between age and BW suggesting the use of spline models instead in a mega-analytical fashion.

      Indeed, we agree that some - if not most - brain structures follow non-linear trajectories throughout life. In the present study, age regressors are used only for accounting for variance in the data rather than capturing any effect of interest. Rather, it is the time*birth weight regressor that captures age-varying changes in brain structure. Time reflects within-subject follow-up time. We believe non-linear modeling of age will only account for additional variance (compared to linear models) in the LCBC dataset given the dataset’s wider age range, while it will not have any consequential effect in the ABCD and UKB datasets (as predicted in the provisional response). In any case, we recognize it as a valid concern. Consequently, we have rerun the main models in an ROI-based fashion using or not using spline models to fit age. Specifically, we have fitted the models in each of Desikan-Killiany’s ROIs using generalized additive mixed models (GAMM with age as a smooth term) or linear mixed models (LME with age as a linear regressor). The results are shown in Supplementary Figures 13 and 14. The Beta regressors are nearly identical. As expected, the differences are noticeable in the LCBC dataset while the effect of using - or not using- splines to fit age is almost null in the other two datasets. See also FDR-corrected maps below for both birth weight effects on brain structure and brain change (we opted to show Beta-maps as supplementary material as the multiple-comparisons correction in the ROI-based analysis is not fully comparable with the one used in the vertex-wise approach).

      p. 9: “Both birth weight effects on cortical characteristics and cortical change were rerun (ROIwise) using spline models that accounted for possible non-linear effects of age on cortical structure. The results were comparable to those reported above in Figures 1 and 2. See Supplementary Figures 13 and 14 for birth weight effects on cortical characteristics and cortical change, respectively.”

      Caption to Supplementary Figure 13. “Comparison between spline (GAMM) and linear (LME) models on the effect of birth weight on cortical characteristics. Age was fitted either as a smoothing spline using generalized additive mixed models (GAMM, mgcv r-package) or as a linear regressor within a linear mixed model (LME, lmer r-package) framework. The analyses were performed ROI-wise using the Desikan-Killiany atlas. Significance was considered at an FDR-corrected threshold of p < 0.04. All the remaining parameters were comparable to the main analyses shown in Figure 1. The viridis-yellow scale represents the lower-higher Beta regressors. Red contour displays regions showing significant effects of birth weight. Note the high correspondence between both fitting models. Differences are only noticeable in the LCBC sample due to the dataset’s wider age range (i.e., lifespan dataset).”

      Caption to Supplementary Figure 14. “Comparison between spline (GAMM) and linear (LME) models on the effect of birth weight on cortical change. Age was fitted either as a smoothing spline using generalized additive mixed models (GAMM, mgcv r-package) or as a linear regressor within a linear mixed model (LME, lmer r-package) framework. The analyses were performed ROI-wise using the Desikan-Killiany atlas. Significance was considered at an FDR-corrected threshold of p < 0.04. All the remaining parameters were comparable to the main analyses shown in Figure 1. The viridis-yellow scale represents the lower-higher Beta regressors. Red contour displays regions showing significant effects of birth weight. Note the high correspondence between both fitting models. Differences are only noticeable in the LCBC sample due to the dataset’s wider age range (i.e., lifespan dataset).”

      The figures below show the birth weight effects on brain characteristics (above) and change (below) using a GAMM or an LME approach; that is, using age as a smooth term or as a linear regressor. FDR-corrected p < 0.05 values are shown in a signed logarithmic scale. Red-yellow values represent positive associations between birth weight and brain while blue-lightblue values represent negative associations. The results are qualitatively comparable and quantitative differences exist only in the LCBC dataset. Please see Supplementary Figures 13 and 14 in the revised manuscript.

      Author response image 1.
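
      The ROI-wise FDR correction referred to in this response can be illustrated with a short sketch. The response does not state which FDR procedure was used, so the Benjamini-Hochberg step-up procedure shown here is an assumption, and the function name `benjamini_hochberg` and the p-values are hypothetical placeholders.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the BH procedure rejects the null."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Invented per-ROI p-values for illustration; NOT values from the study.
roi_p = [0.001, 0.008, 0.039, 0.041, 0.20]
print(benjamini_hochberg(roi_p))
```

      The number of tests matters here: correcting over a few dozen atlas regions is a different problem from correcting over many thousands of vertices, which is one reason ROI-wise and vertex-wise significance maps are not directly comparable.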

      Concern #5: Greater clarity regarding the statistical models and the provision of effect-size maps.

      The revised manuscript provides additional information regarding the statistical model, especially in the results section, to avoid misunderstanding (see below examples of clarifications in the revised manuscript). We now provide Beta-maps, F-maps, unthresholded p-value maps, and degrees of freedom for the main univariate analyses. That is, we provide this information for both the whole sample and the twin analyses which correspond to Figures 1, 2, 4, and 5. We opted not to compute effect-size estimates (e.g. partial eta-squared, Cohen’s d) due to the ambiguous interpretation of these maps in the context of linear mixed models.

      p. 8. “To test the effect of birth weight on cortical change we reran the analyses with BW x time and age x time interactions. Note BW x time (i.e., within-subject follow-up time) represents the contrast of interest while age - and age interactions - are used to account for differences in age across individuals.”

      p.11. “In contrast, the spatial correlation of the maps capturing BW-associated cortical change (i.e., BW x time contrast) …”

      p. 12. “Additionally, we performed replicability analysis both across and within samples to further investigate the robustness of the effects of birth weight on cortical characteristics and cortical change.”

      p. 14: “BW discordance analyses on twins specifically were run as described for the main analyses above, with the exception that twin scans were reconstructed using FS v6.0.1. for ABCD and the addition of the twin’s mean birth weight as a covariate.”

      p. 31. “Group-level unthresholded p-maps, F-maps, Beta-maps, and degrees of freedom for the univariate analyses accompany this manuscript as additional material.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For the Authors):

      (1) While not absolutely necessary - it would be nice to see at least at the in-situ level what happens to the handful of other HC-important transcription factors in the Rbm24 KO (IKZF2, Barhl1, RFX) as the authors did look at Insm1.

      Reply: Thanks for your suggested experiments. We agree that knowing whether the genes that are known to be involved in cell survival regulation are changed will provide insights into the mechanisms underlying cell death of Rbm24-/- HCs. Our data showed that Ikzf2 seemed to be upregulated in the Rbm24-/- HCs, relative to Rbm24+/+ HCs at P5. We also tested Barhl1 and RFX, but we did not obtain confident data to present. Nonetheless, following the reviewer’s logic, we further tested Gata3, another gene involved in HC survival, and found that Gata3 was down-regulated in Rbm24-/- HCs, compared to Rbm24+/+ HCs. Please refer to the text on lines 12-22 on page 12 and lines 1-10 on page 13, and Figure 3-figure supplement 1.

      (2) Major comments: The nomenclature for mouse gene vs. mouse protein needs to be addressed throughout the manuscript. The nomenclature when referring to a mouse gene: gene symbols are italicized, with only the first letter in upper-case (e.g. Rbm24).

      The nomenclature when referring to a mouse protein: Protein symbols are not italicized, and all letters are in upper-case (e.g. RBM24).

      Reply: Thanks for pointing it out. We have followed the reviewer’s guidance on gene and protein nomenclature throughout the manuscript.

      (3) Supplemental Figure 2D: Individual data points should be displayed on the bar graph via dots. SEM is not appropriate for this graph as SEM precision with only 3 samples is low. Furthermore, readers are more interested in knowing the variability within samples and not proximity of mean to the population mean, therefore standard deviation (SD) should be used instead.

      Reply: We have edited Figure 1-figure supplement 2D, as suggested. The Figure 1-figure supplement 2 legend was updated, too. Please refer to lines 21-22 on page 32.
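
      The distinction the reviewer draws between SD and SEM can be made concrete with a small sketch. The three values below are invented placeholders, not measurements from the manuscript, and `sd_and_sem` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def sd_and_sem(samples):
    """Return (sample SD, standard error of the mean) for a list of values."""
    sd = stdev(samples)             # sample SD, n - 1 in the denominator
    sem = sd / sqrt(len(samples))   # SEM shrinks as n grows; SD does not
    return sd, sem


# Placeholder measurements for n = 3 replicates; NOT data from the study.
values = [10.0, 12.0, 14.0]
sd, sem = sd_and_sem(values)
print(mean(values), sd, sem)
```

      With n = 3 the SEM is the SD divided by the square root of 3, so SEM error bars are noticeably shorter than SD bars even though the spread of the underlying points is unchanged. Plotting the individual points alongside SD shows that spread directly.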

      (4) Red/Green should be avoided, especially when both are on the same image (merged immunofluorescence images that are found throughout the manuscript). I highly recommend changing to a color-blind friendly color scheme (such as cyan/green/magenta, cyan/magenta/yellow, etc.) for inclusivity.

      Reply: Thanks for pointing it out. We have changed the red to magenta in all our Figures and figure supplements.

      (5) Minor comments: As CRISPR-stop is a major method used throughout the paper, a brief explanation is needed for readers to understand what this methodology entails and why it was used. Something along the lines of, "The CRISPR-stop technique allows for the introduction of early stop codons without the induction of DNA damage via Cas9 which can cause deleterious effects".

      Reply: We have further elaborated how CRISPR-stop works and its advantages. Please refer to lines 8-13 on page 5.

      (6) Page 5; line 5 - "Phenotypes occur earlier..." Grammar

      Reply: The grammar error was corrected. Please refer to line 4, page 5.

      (7) Page 5; line 5 - "Given Pou4f3 is the upstream regulator..." Not proven, rephrase

      Reply: We have rephrased this sentence. Please refer to lines 5-6 on page 5.

      (8) Supplemental 1A: Fine, Proof of knockout, I wouldn't mention INSM1 being "irregular"

      Reply: We have rephrased this sentence. Please refer to lines 2-3 on page 6.

      (9) Page 5; line 21 - "Alignment of Insm1+ OHCs was not as regular..." Not a good description

      Reply: We have rephrased this sentence. Please refer to lines 2-3 on page 6.

      (10) Page 6; line 11 - "Rbm24 was completely absent.." Redundancy with line 9

      Reply: Thanks for pointing it out, and we have removed the redundant sentence.

      (11) Page 7 - HA tag should be indicated originally as: Hemagglutinin (HA)

      Reply: We have switched “HA” to “Hemagglutinin (HA)”. Please refer to line 15, page 7.

      (12) Page 9, line 11- "Determine if autonomous/noncell autonomous." Disagree, cells still clustered in supplemental fig 4.

      Reply: We have removed this sentence.

      Reviewer #2 (Recommendations For The Authors):

      The writing of the manuscript is adequate, but it would certainly be improved by professional editing.

      Reply: Thanks for the reviewer’s encouraging comments. The revised version of our manuscript has been edited by a native English speaker.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript from Richter et al. is a very thorough anatomical description of the external sensory organs in Drosophila larvae. It represents an important tool for investigating the relationship between the structure and function of sensory organs. Using improved electron microscopy analysis and digital modeling, the authors provide compelling evidence offering the basis for molecular and functional studies to decipher the sensory strategies of larvae to navigate through their environment.

      Public Reviews:

      Summary

      This is a very meticulous and precise anatomical description of the external sensory organs (sensilla) in Drosophila larvae. Extending their previous study (Rist and Thum 2017) that analyzed the anatomy of the terminal organ, a major external taste organ of fruit fly larva, the authors examined the anatomy of the remaining head sensory organs - the dorsal organ, the ventral organ, and the labial organ - and also described the sensory organs of the thoracic and abdominal segments. Improved serial electron microscopy and digital modeling are used to the fullest to provide a definitive and clear picture of the sensory organs, the sensilla, and adjacent ganglia, providing an integral and accurate map, which is dearly needed in the field. The authors revise all the data for the abdominal and thoracic segments and describe in detail, for the first time, the head and tail segments and construct a complete structural and neuronal map of the external larval sensilla.

      Strengths

      It is a very thorough anatomical description of the external sensory organs of the genetically amenable fruitfly. This study represents a very useful tool for the research community that will definitely use it as a reference paper. In addition to the classification and nomenclature of the different types of sensilla throughout the larval body, the wealth of data presented here will be valuable to the scientific community. It will allow for investigating sensory processing in depth. Serial electron microscopy and digital modeling are used to the fullest to provide a comprehensive, definitive, and clear picture of the sensory organs. The discussion places the anatomical data into a functional and developmental frame. The study offers fundamental anatomical insights, which will be helpful for future functional studies and to understand the sensory strategies of Drosophila larvae in response to the external environment. By analyzing different larval stages (L1 and L3), this work offers some insights into the developmental aspects of the larval sense organs and their corresponding sensory cells.

      Weaknesses

      There are no apparent weaknesses, although it is not a completely novel anatomical study. It revisits many data that already existed, adding new information. However, the repetition of some data and prior studies could be avoided to improve readability.

      We would like to thank the reviewers for their respective reviews. The detailed comments and efforts have helped us to improve our manuscript. In the following, we have listed the comments one by one and provide the respective information on how we addressed the concerns.

      Recommendations for the authors:

      We have tried to address every single comment as far as possible. In order to structure our response a little better, we have listed the relevant page number and the original comments once again. Directly following this you will find our response and a description of what we have changed in the manuscript.

      REVIEWER #1 (Recommendations For The Authors):

      I have a few comments that will help the reader navigate this long and detailed paper.

      REVIEWER 1.1. page 4

      The final section of "the Structural organization of Drosophila larvae" needs some reorganization.

      Specifically:

      "The DO and the TO are prominently located on the tip of the head lobes" Can the authors rewrite the sentence in a way that it is clear that there is one DO and one VO on each side of the head? Check at the beginning of each section, please. There is a mention about hemi-segments but it is still confusing.

      Done – replaced with “The largest sense organs of Drosophila larvae are arranged in pairs on the right and left side of the head.”

      REVIEWER 1.2. page 5

      "The sequence of sensilla is always similar for and different between T1, T2-T3, and A1-A7" This sentence is not clear, please break it into two sentences.

      Done – replaced with: “We noticed varying arrangements for T1, T2-T3, and A1-A7, with a consistent sequence of sensilla in each configuration.”

      REVIEWER 1.3. figures page 4

      Double hair can't be found in Figure 1B or C (is it h3, h4?) - please clarify.

      Done – changed to double hair organ on page 11 and included a double hair sketch in the legend of Figure 1B. We changed the name of the structure to double hair organ to clarify that this is a compound sensillum consisting of two individual sensilla.

      REVIEWER 1.4. page 5

      The authors go back and forth in their descriptions of the different sensory organs. Knob sensilla and then papilla sensilla are discussed and then a few lines later a further description is done. Please unify the description of each separately.

      Done – we restructured the whole section.

      REVIEWER 1.5. figures page 6

      "We found three hair sensilla on T1-T3, and "two" on A1-A7" - in the figure there seem to be "four" on A1-A7.

      Done – we included the two hair sensilla of the double hair organ

      REVIEWER 1.6. figures page 6

      DORSAL ORGAN:

      Can the authors explain the colour map meaning in Figure 2A? It is explained in 2C but the image already has colours. Add your sentence "Color code in A applies to all micrographs in this Figure".

      Done – we added a sentence to explain that the color code in A applies to the whole figure.

      REVIEWER 1.7. page 6

      Page 10: which comprises seven olfactory sensilla "composing" three dendrites each: replace this with "with". At the end, we want to think 7 X 3 = 21 ORNs.

      Done – replaced.

      REVIEWER 1.8. page 9

      CHORDOTONAL ORGANS:

      "We find these these DO associated ChO (doChO).. .". Please remove one "these"

      Done – removed.

      REVIEWER 1.9. page 8

      Is the DO associated ChO part of the dorsal ganglion???? It does not look like it. Could you clarify?

      Done – we added a sentence that clarifies that the ChO neuron is not inside the DOG.

      REVIEWER 1.10. page 9

      VENTRAL ORGAN:

      A figures page 12

      Please add to the Figure 8 legend the description of 8c' and 8c'?

      Done – added description in figure legend.

      B page 9

      8H, what are the *, arrows? Please clarify - it is hard to interpret the figure.

      Done – we added parentheses in the figure legend that state which structures the asterisks and arrows indicate.

      C page 9

      "Three of them are innervated by a single neuron () and one by two neurons () (Figure 8F-I). Please add which are innervated by 1 (VO1, VO2-VO4) and which by 2 (VO3).

      Done – we added parentheses that clarify which sensilla are innervated by 1 or 2 neurons.

      REVIEWER 1.11. page 9

      Can you add something (or speculate) about the difference in sensory processing of the different types of sensilla?

      Done – new sentence in discussion:

      ‘Their different size and microtubule organization likely correlate with processing of different stimulus intensities applied to the mechanotransduction apparatus (Bechstedt et al. 2010).’

      REVIEWER 1.12. figures page 16

      PAPILLA AND HAIR SENSILLA:

      FIGURE 10a, please add the name of each sensillum from p1, p2, px py, etc... (if not we have to go back to figure 1 when you describe specific ps.)

      Thanks for the comment, it really makes it a lot easier for the reader.

      REVIEWER 1.13. figures page 18

      Figure 11, can you add the name of each hair, please?

      Done – updated figure.

      REVIEWER 1.14. figures pages 16, 18, 20

      In Figures 10, 11, and 12 you clearly draw an area on the internal side that I assume is what you call the "electron-dense sheath". It is wider in papilla sensilla than in hair sensilla, most likely due to the difference in stimuli sensed that you explain in detail in the discussion. Can you say in the figure what this "internal" thing is? Can you add this difference to your list "Apart from the difference in outer appearance and structure of the tubular body"?

      This is the basal septum, but it is not certain that it is wider in the papilla sensilla; at least, we could not observe this in our data sets. The impression could have been created by different scales in the 3D reconstructions and a perspective view. Because we are not sure, we do not want to list this as a difference here.

      However, we have now specified the socket septum in the figure legends and in Figures 10A, 11A and 12A.

      REVIEWER 1.15. page 11

      KNOB SENSILLA:

      Page 25;" Knob sensilla have been described under "vaious" names such as": add various.

      Done

      REVIEWER 1.16. page 12

      "reveals that the three hair and the two papilla sensilla are associated with a single dendrite." Can you write that "reveals THAT EACH OF the three hair and the two papilla sensilla" if not it seems that there is only one dendrite.

      Done

      REVIEWER 1.17. figures page 25

      TERMINAL SENSORY CONES:

      Please name the t1-t7 cones in Figure 15A.

      Done – we updated the figure.

      REVIEWER 1.18. page 13

      The spiracle sense organ deserves a new paragraph. As does the papilla sensillum of the anal plate.

      Done – we added subtitles before the paragraphs.

      Discussion:

      REVIEWER 1.19. page 15

      Page 38: "v'entral" correct typo

      Done – we have updated the nomenclature: ventral 1 (v), ventral 2 (v’) and ventral 3 (v’’).

      REVIEWER #2 (Recommendations For The Authors):

      I have only a few comments:

      REVIEWER 2.1. page 5

      p.5, right column, middle: the use of trichoid, campaniform, and basiconical (sensilla) in previous works were based on even older papers and reviews that attempted to link EM architecture to function (e.g., KEIL, T. A. & STEINBRECHT, R. A. (1986). Mechanosensitive and olfactory sensilla of insects. In Insect Ultrastructure, vol. 2. (ed. R. C. King & H. Akai), pp. 477-516. New York/London: Plenum Press). Trichoid sensilla can be mechano-sensitive, olfactory, or gustatory; trichoid simply refers to the shape (hair). The same applies to basiconical sensilla. The use of "campaniform", which Ghysen et al called "papilla sensilla", was the only really problematic case, because these (Drosophila larval) sensilla did not really resemble closely the classical campaniform sensilla (e.g., adult haltere). The only reason we called them campaniform is because they were not more similar to any other type of (previously named) sensillum.

      Thank you for the explanation. The nomenclature of structures is generally always a complex topic with often different approaches and principles. We are aware of this and have therefore tried to be as careful as possible. We were not sure from this comment whether you were suggesting to change the text or whether you wanted to explain how these names were assigned to the sensilla in the past. However, we hope that the current version is in line with your understanding, but could of course make changes if necessary (see also comments of reviewer 1).

      REVIEWER 2.2. page 9

      p.21, Labial Organ: the ventral lip is the labium; the dorsal one is the labrum.

      Done – replaced labrum with labium.

      REVIEWER 2.3. page 9

      p.20/21, Ventral organ and labial organ: here, the projection of the axons could be mentioned as an ordering principle. In the previous literature, for larva and embryo, a labial organ (lbo) was described that most likely corresponds to the labial organ presented here. This (previously mentioned) lbo characteristically projects along the labial nerve to the labial segment (hence the name). It fasciculates with axons of another sensory complex, also generated by the labial segment, namely the ventral pharyngeal sensory organ (VPS). Does the labial organ described here share this axonal path?

      Yes, it has the same axonal pathway and is the same organ as the lbo. We have tried to standardise the nomenclature for all important external head organs (DO, TO, VO, LO) and have therefore used abbreviations with two letters. However, to avoid confusion, we have now added that the LO was also called lbo in the past.

      For the ventral organ, the segmental origin (to my knowledge) was never clarified. The axons of the ventral organ project along the maxillary nerve (which carries axons of the terminal=maxillary organ). This nerve, closely before entering the VNC, splits into a main branch to the maxillary segment (TO axons) and a thinner branch that appears to target the mandibular segment. This branch could contain the axons of the ventral organ (as described previously and in this paper). Could the authors confirm this axonal projection of the VO?

      In this work, we did not focus on the axonal projections into the SEZ. Tracing them is not a simple or fast process, because the large head nerves unfortunately exhibit a highly variable quality of representation across the entire larval dataset. The reconstruction of nerves and of individual neurons within them is therefore often challenging and very time-consuming. The research question is, of course, very intriguing, and one could also attempt to match each sensory neuron of the periphery with the existing map of the brain connectome. However, this is a project in itself that exceeds the scope of this work and is more feasible as a subsequent project.

      REVIEWER #3 (Recommendations For The Authors):

      Minor suggestions that the authors might consider:

      REVIEWER 3.1. figures all

      Recheck the scale bar in figures and figure legends. Missing in a few places.

      Done – we replaced or added some (missing) scale bars in figures and figure legends (see annotated figure document).

      REVIEWER 3.2. figures page 4

      The color schematic in Figure 1 can be improved for readability.

      Done – we changed the color schematic, especially for the head region to improve readability.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript titled "Coevolution due to physical interactions is not a major driving force behind evolutionary rate covariation" by Little et al. explores the potential contribution of physical interaction to correlated evolutionary rates among gene pairs. The authors find that physical interaction is not the main driving force of evolutionary rate covariation (ERC). This finding is similar to a previous report by Clark et al. (2012), Genome Research, wherein the authors stated that "direct physical interaction is not required to produce ERC." The previous study used 18 Saccharomycotina yeast species, whereas the present study used 332 Saccharomycotina yeast species and 11 outgroup taxa. As a result, the present study is better positioned to evaluate the interplay between physical interaction and ERC more robustly.

      Strengths & Weaknesses:

      Various analyses nicely support the authors' claims. Accordingly, I have only one significant comment and several minor comments that focus on wordsmithing - e.g., clarifying the interpretation of statistical results and requesting additional citations to support claims in the introduction.

      We are pleased the reviewer found the analyses to support the claims. We have addressed comments related to clarifying interpretations as suggested in the Recommendations to the Authors. For example, we have added discussion and clarification on the other parameters that could affect the strength of ERC correlations.

      Reviewer #2 (Public Review):

      Summary:

      The authors address an important outstanding question: what forces are the primary drivers of evolutionary rate covariation? Exploration of this topic is important because it is currently difficult to interpret the functional/mechanistic implications of evolutionary covariation. These analyses also speak to the predictive power (and limits) of evolutionary rate covariation. This study reinforces the existing paradigm that covariation is driven by a varied/mixed set of interaction types that all fall under the umbrella explanation of 'co-functional interactions'.

      Strengths:

      Very smart experimental design that leverages individual protein domains for increased resolution.

      Weaknesses:

      Nuanced and sometimes inconclusive results that are difficult to capture in a short title/abstract statement.

      We appreciate the reviewer’s acknowledgement of the experimental design. We have addressed the nuance of the results by changing the title and clarifying other statements throughout the manuscript as suggested in the reviewer’s recommendations. We have also addressed reviewer comments asking for further explanation on using Fisher transformations when normalizing the Pearson correlations for branch counts.

      Reviewer #3 (Public Review):

      Summary:

      The paper makes a convincing argument that physical interactions of proteins do not cause substantial evolutionary co-variation.

      Strengths:

      The presented analyses are reasonable and look correct and the conclusions make sense.

      Weaknesses:

      The overall problem of the analysis is that nobody who has followed the literature on evolutionary rate variation over the last 20 years would think that physical interactions are a major cause of evolutionary rate variation. First, there have been probably hundreds of studies showing that gene expression level is the primary driver of evolutionary rate variation (see, for example, [1]). The present study doesn't mention this once. People can argue the causes or the strength of the effect, but entirely ignoring this body of literature is a serious lack of scholarship. Second, interacting proteins will likely be co-expressed, so the obvious null hypothesis would be to ask whether their observed rates are higher or lower than expected given their respective gene expression levels. Third, protein-protein interfaces exert a relatively weak selection pressure so I wouldn't expect them to play much role in the overall evolutionary rate of a protein.

      We thank the reviewer for their comments and suggestions. A point to immediately clarify is that the methods studied in this manuscript deal with rate variation of individual proteins over time, and whether that variation correlates with that of another protein. The numerous studies the reviewer refers to deal with explaining the differences in average rate between proteins. These are different sources of variation. It has not, to our knowledge, been shown that variation in the expression level of a single protein over time is responsible for its variation in evolutionary rate over time, let alone to a degree that allows its variation to correlate with that of a functionally related protein. That question interests us, but it is not the focus of this study.

      In our study, we sought to test for a contribution of physical interaction to the correlation of evolutionary rate changes as they vary over time, i.e. between branches. We made many changes to clarify this distinction in our revisions.

      We agree that the manuscript would be clearer if it defined the forces proposed to lead to differences in rate in general, which include expression levels. We had generally considered expression level as one of the many potential non-physical forces, but failed to make that explicit and instead focused on selection pressure. In our revision we describe expression level as another potential driver of evolutionary rate variation over time. References to previous literature have been added to the introduction. We also added a more explicit explanation of the rate covariation over time that we are measuring, in contrast with the association between expression level and rate differences between proteins that was studied in previous literature.

      On point 3, the authors seem confused though, as they claim a co-evolving interface would evolve faster than the rest of the protein (Figure 1, caption). Instead, the observation is they evolve slower (see, for example, [2]). This makes sense: A binding interface adds additional constraint that reduces the rate at which mutations accumulate. However, the effect is rather weak.

      The values in Fig 1B are a measure of correlation, specifically a Fisher transformed correlation coefficient. They are not evolutionary rates, so they are not reflecting faster or slower evolution, rather more or less covariation of evolutionary rates over time. We are not predicting that physically interacting interfaces evolve faster than the rest of the protein, but rather that if physical interaction drives covariation in evolutionary rates over time, their correlation would be stronger between pairs of physically interacting domains. In response, we have used clearer language in the figure caption and reorganized labels in Figure 1B to clearly show that the values are correlations. Revised Figure 1 Legend:

      “Overview of experimental schema and hypotheses. Proteins that share functional/physical relationships have similar relative rates of evolution across the phylogeny, as shown in (A) with SMC5 and SMC6. The color scale along the bottom indicates the relative evolutionary rate (RER) of the specific protein for that species compared to the genome-wide average. A higher (red) RER indicates that the protein is evolving at a faster rate than the genome average for that branch. Conversely, a lower (blue) RER indicates that the protein is evolving at a slower rate than the genome average. The ERC (right) is a Pearson correlation of the RERs for each shared branch of the gene pair. (B) Suppose the correlation in relative evolutionary rates between two proteins is due to compensatory coevolution and physical interactions. In that case, the correlation of their rates (i.e., ERC value) would be higher for just the amino acids in the physically interacting domain. (C) Outline of experimental design. Created with Biorender.com.”

      All in all, I'm fine with the analysis the authors perform, and I think the conclusions make sense, but the authors have to put some serious effort into reading the relevant literature and then reassess whether they are actually asking a meaningful question and, if so, whether they're doing the best analysis they could do or whether alternative hypotheses or analyses would make more sense.

      [1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4523088/

      [2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4854464/

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      (1) Numerous parameters influence ERC calculation. The authors note that their use of a large dataset of budding yeast provides sufficient statistical power to calculate ERC. I agree with that. However, a discussion of other parameters needs to be improved, especially when comparing the present study to others like Kann et al., Hakes et al., and Jothi et al. For example, what is the evolutionary breadth and depth used in the Kann, Hakes, Jothi and other studies? How does that compare to the present study? Budding yeast evolve rapidly with gene presence/absence polymorphisms observed in genes otherwise considered universally conserved. Is there any reason to expect different results in a younger, slower-evolving clade such as mammals? There is potential to acknowledge and discuss other parameters that may influence ERC, such as codon optimization and gene/complex "essentiality," among others.

      More discussion of these parameters is a good idea. We have added the number and phylogeny of species used in the previous studies in the discussion paragraph starting with “Previous studies attributed varying degrees of evolutionary rate covariation signal to physical interactions between proteins.” We also like the idea of studying the effect of younger and more slowly evolving clades as opposed to the contrary, but currently we lack the required number of datasets to do this.

      We have also added more discussion and clarification of potential non-physical forces leading to ERC correlations in the introduction.

      Minor comments

      (1) It would be good to add a citation to the second sentence of the first paragraph, which reads, "It has been observed that some genes have rates that covary with those of other genes and that they tend to be functionally related."

      Added citation to Clark et al. 2012

      (2) In the last sentence of the first paragraph of the introduction, ERC is discussed in the context of only amino acid divergence, however, there is no reason that DNA sequences can't be used, especially if ERC is being calculated among species that are less ancient than, for example, Saccharomycotina yeasts. Thus, it may be more accurate to suggest that ERC measures how correlated branch-specific rates of sequence divergence are with those of another gene.

      Nice suggestion to generalize. We have made this change.

      (3) ERC was not calculated in reference #2. For the sentence "Protein pairs that have high ERC values (i.e., high rate covariation) are often found to participate in shared cellular functions, such as in a metabolic pathway2 or meiosis3 or being in a protein complex together," I think more appropriate citations (including inspiring work by the corresponding author) would be

      a) Coevolution of Interacting Fertilization Proteins (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000570)

      b) Evolutionary rate covariation analysis of E-cadherin identifies Raskol as a regulator of cell adhesion and actin dynamics in Drosophila (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007720)

      c) An orthologous gene coevolution network provides insight into eukaryotic cellular and genomic structure and function (https://www.science.org/doi/10.1126/sciadv.abn0105)

      d) PhyKIT: a broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data (https://academic.oup.com/bioinformatics/article/37/16/2325/6131675)

      Thank you for pointing out these works. We agree that there are more appropriate citations and we have referenced your suggested b-d.

      (4) The dataset of 343 yeast species also includes outgroup taxa. Therefore, indicating that 332 species are Saccharomycotina yeast and 11 are closely related outgroup taxa may be more accurate.

      Thank you for the suggestion. The following sentence has been added, citing the Shen et al. 2018 paper from which the dataset was derived:

      “To investigate the discrepancy between contributions to ERC signal from co-function and physical interaction, we used a dataset of 343 evolutionarily distant yeast species. 332 of the species are Saccharomycotina with 11 closely related outgroup species providing as much evolutionary divergence as humans to roundworms3”

      (5) Are there statistics/figures to support the claim that "Almost all complexes and pathways had mean ERC values significantly greater than a null distribution consisting of random protein pairs"?

      This is shown in supplementary figure 1. A reference to this figure was added as well as quantification within the text.
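
      A comparison of a group's mean ERC against random protein pairs can be sketched as below. This is an illustration only, not the procedure used in the study: the function name `null_mean_pvalue`, the number of random draws, and the one-sided empirical p-value with a +1 correction are all assumptions made for the example.

```python
import random


def null_mean_pvalue(group_scores, background_scores, n_draws=10000, seed=0):
    """Compare a group's mean ERC with means of random, equally sized pair sets.

    group_scores: ERC scores of the pairs inside one complex or pathway.
    background_scores: ERC scores of pairs drawn from the whole genome.
    Returns (observed_mean, empirical one-sided p-value).
    """
    k = len(group_scores)
    if k == 0 or k > len(background_scores):
        raise ValueError("group must be non-empty and no larger than the background")
    rng = random.Random(seed)
    observed = sum(group_scores) / k
    hits = 0
    for _ in range(n_draws):
        # Draw k background pairs without replacement and record their mean.
        sample = rng.sample(background_scores, k)
        if sum(sample) / k >= observed:
            hits += 1
    # The +1 correction keeps the estimate away from an impossible p-value of 0.
    return observed, (hits + 1) / (n_draws + 1)
```

      A group whose mean lies above every random draw receives the smallest p-value the chosen number of draws can resolve, so the number of draws bounds how strong a claim of significance can be.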

      (6) Similar to the previous comment, can quantitative values be added to the statement "While protein complexes appear to have higher mean ERC scores than the pathways..."?

      The median of the mean ERC scores for protein complexes is 5.366 while the median for the mean ERC score in pathways is 4.597. This quantification has been included in the text: “While protein complexes have higher mean ERC scores (median 5.366) than the pathways (median 4.597), the members of a given complex are also co-functional, making interpretation of the relative contribution of physical interactions to the average ERC score difficult”

      These quantifications were also added to the figure caption for Figure 2A.

      (7) A semantic point: In the sentence "The lack of significance in the global permutation test shows that the...", I recommend saying that the analysis suggests, not shows, because there is potential for a type II error.

      Good suggestion, we have made this change.

      (8) The authors suggest that shared evolutionary pressures, "and hence shared levels of constraint," drive signatures of coevolution. The manuscript does not delve into selection measures (e.g., dN/dS). Perhaps it would be more representative to remove any implication of selection.

      We have added better language to clarify that discussion of selection is purely a hypothesis and that selection is not probed in our analyses.

      “Previous work finds evidence that relaxation of selective constraint can lead to drastic rate variation and hence covariation6. Rather, the greater and consistent contribution comes from non-physical interaction drivers that could include variation in essentiality, expression level, codon adaptation, and network connectivity. These non-physical forces would be under shared selective pressures and hence shared levels of constraint, the result of which was elevated ERC between non-interacting proteins, as visible in our study of genetic pathways that do not physically interact (Figure 2).”

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      -Title: In my opinion, the title of the manuscript is a somewhat misleading summary of the results of this paper. In the majority of the analyses in this paper, physical interactions do account for a significantly outsized portion of the ERC signature. The current title downplays the consistent (although sometimes small effect-sized) result that physically interacting domains do show higher ERC than non-physically interacting domains by every statistical measure employed in this paper to compare physical vs non-physical interactions. The authors' interpretation of their results within the manuscript body is that the effect of physical interactions is an inconsistent, weak, and non-generalizable driver of ERC. I generally agree with the authors' interpretations, but the nuance of these interpretations is lost in the title of the paper. I would suggest rewording the title to try to capture the nuance or at least be subjectively accurate. For example, stating that "...physical interactions are not the sole driving force.." is inarguably accurate based on these results.

      As an alternative title, I would suggest focusing on an important takeaway from the paper: ERC is a reliable predictor of co-functional interactions but not necessarily physical interactions. I agree with the statement that "there is not a strong enough signal to confidently call an interaction physical or not and would be of little value to an experimentalist wanting to infer interacting domains" and I think that a title that emphasizes this idea would be more accurate and impactful.

      Great suggestion. We agree that the current title downplays the nuances of the method and the signal we capture with it, and we have used your suggested title.

      An outsized number of complexes had ROC-AUCs greater than 0.5, which is why we performed the permutation tests to determine how significant each of the individual ROC-AUCs was given the differing number of protein/domain pairs in each complex. Across the statistical methods used, only 3 of the 17 complexes ranked physically interacting domains significantly higher than non-physically interacting domains in every analysis. Even among the 3 that were statistically significant, some of the physically interacting domains still fell among the bottom portion of the ERC scores for that complex (Figure 5: MCM and CUL8 complexes). This is why we concluded that physical interactions are not the sole driving force of the signal captured by ERC.
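
      The logic of testing a ROC-AUC by permutation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `roc_auc` and `permutation_pvalue` are hypothetical, the AUC is computed as the fraction of (interacting, non-interacting) domain pairs in which the interacting pair has the higher ERC, and the null distribution is built by shuffling the physical-interaction labels within one complex.

```python
import random


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_pvalue(scores, labels, n_perm=2000, seed=0):
    """One-sided empirical p-value for the observed AUC under shuffled labels."""
    rng = random.Random(seed)
    observed = roc_auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling keeps the number of positives fixed, as in the real complex.
        rng.shuffle(shuffled)
        if roc_auc(scores, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

      Because the shuffle preserves the number of interacting pairs, a complex with very few domain pairs can show a high AUC and still fail to reach significance, which is the situation the permutation test is meant to expose.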

      -Abstract: related to my preceding comment, the word "negligible" in the abstract is misleading. If physical interactions were truly entirely negligible, the comparisons of physically interacting vs non-physically interacting domains would yield 0.5. Instead, these comparisons always yielded results greater than 0.5. Consider rewording.

      Thank you for the suggestion. This phrasing has been changed to “Therefore, we conclude that coevolution due to physical interaction is weak, but present in the signal captured by ERC.”

      We agree that “negligible” may be too strong of a word, however, the comparisons do not always yield results greater than 0.5.

      Five of the 17 complexes do not reach the 0.5 threshold in the initial ROC analysis, and even among those that do, only 4 had significantly high ROC-AUCs. You are correct that the signal is not completely negligible, which is why we went on to determine whether physical interaction was driving high ERC only within proteins (Figure 5).

      -Figure 3: I think there may be an error in the domain labeling in Figure 3. The comparison between OKP1_2 and AME1_3 is the highest ERC value in the matrix. From the complex structure, it appears that OKP1_2 and AME1_3 are two helix domains that appear to physically interact. However, in the ERC matrix, they are not shaded to indicate they are a physical interaction pair. Please double-check that the interacting domains are properly annotated, since mis-annotation would have a large impact on the interpretation of this figure with respect to the overall question the paper addresses.

      Thank you for catching this - fixed.

      Minor comments:

      -Methods: "The full ERC pipeline can be found at (Github)." Provide github URL here?

      Thanks for the catch, fixed.

      -Discussion: "Evidence for physical coevolution however was tempered by a global permutation test, which did not reach significance, indicating that this inference is sensitive to approach and further underlines the relatively weak contribution of physical coevolution." The word "relatively" may not be a good choice of words. In comparison to what? As is, the phrasing could be interpreted as implying "in comparison to non-physical interactions". This would not be accurate, because the results show that in general, physical interactions are a stronger contributor to ERC (consistent trend but varied significance, depending on methodology) than non-physical interactions.

      Thank you for your help with clarification. The word relatively was removed.

      However, we do not agree that, in general, physical interactions are a stronger contributor to ERC than non-physical interactions (such as gene expression, codon adaptation, etc.). In all of our statistical tests, a maximum of 5 of the 17 complexes ranked physical interactions significantly higher than non-physical interactions. While the ROC-AUC is greater than 0.5 for 12 of the 17 complexes, only 4 of those were significant.

      -I have not seen Fisher-transformed correlation coefficients used in the context of ERC. I understand that it's helpful in normalizing the results so that they are comparable between ERC comparisons with differing numbers of overlapping branches (i.e. points on a linear correlation plot). A reference of where the authors got this idea or a little more verbiage to describe the rationale would be helpful. On a related note, I would expect that using linear correlation p-value instead of R-squared would account for differences in overlapping branches, eliminating the need to apply fisher-transformation. It would be helpful for the authors to outline their rationale for using a correlation coefficient rather than a P-value.

      We agree that this method could be made clearer. We made a methodological choice to use Fisher transformation over linear correlation p-value. Both methods should achieve the same end result by taking the number of branches into consideration. We have added additional explanation to the results section “Both protein pathways and complexes have elevated ERC”:

      “ERC was calculated for all pairs of the 12,552 genes. For each pair the correlation is Fisher transformed to normalize for the number of shared branches that contribute to the correlation. This normalization is necessary to reduce false positives that have high correlation solely due to a small number of data points. This normalization also allows for direct comparison of ERC between gene pairs that have differing numbers of branches contributing to the score.”

      We also added further explanation to the methods section, including the formula used to calculate the Fisher transformation.
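
      For readers unfamiliar with the transformation, a minimal sketch follows. It assumes the standard Fisher z-transformation, arctanh(r), scaled by the square root of (n - 3), where n is the number of shared branches. The authoritative formula is the one given in the methods section; the function name here is hypothetical.

```python
import math


def fisher_transformed_erc(r, n_branches):
    """Fisher z-transform of a correlation, scaled by sqrt(n - 3).

    Assumption: this is the textbook form; see the methods for the exact formula.
    """
    if n_branches <= 3:
        raise ValueError("need more than 3 shared branches")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r) * math.sqrt(n_branches - 3)


# The same raw correlation is weighted up when more branches support it.
few = fisher_transformed_erc(0.6, 10)
many = fisher_transformed_erc(0.6, 100)
```

      Under this form, a correlation of 0.6 supported by 100 branches scores higher than the same correlation supported by 10 branches, which is the behavior described in the quoted passage.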

      -Did the authors use Pearson or Spearman correlation coefficient?

      Pearson. We clarified this in the methods section, “Calculating evolutionary rate covariation” : “Evolutionary rate covariation is calculated by correlating relative evolutionary rates (RERs) between two gene trees using a Pearson correlation.”
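
      As a rough illustration of this step, the sketch below computes a Pearson correlation between two RER vectors using only the branches present in both gene trees. Encoding a missing branch as None is an assumption made for this example, and the function name is hypothetical.

```python
import math


def pearson_on_shared_branches(rer_a, rer_b):
    """Pearson r over branches present in both gene trees.

    Missing branches are encoded as None (an assumption for this sketch).
    Returns (r, number_of_shared_branches).
    """
    pairs = [(a, b) for a, b in zip(rer_a, rer_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 shared branches")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("RERs are constant on the shared branches")
    return cov / math.sqrt(var_a * var_b), n


# Toy RER vectors indexed by branch (illustrative values only).
rer_gene_a = [0.5, None, -0.2, 1.0, 0.1]
rer_gene_b = [1.0, 0.3, -0.4, 2.0, None]
r, n_shared = pearson_on_shared_branches(rer_gene_a, rer_gene_b)
```

      Returning the number of shared branches alongside r keeps the information needed for any subsequent normalization by branch count.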

      -Did the authors explore ERC between domains within a single protein? Do domains within a protein exhibit ERC? I would expect that they do. If they do, this could likely be attributed to linkage/genetic hitchhiking, representing a new angle/factor beyond physical interaction that could lead to ERC. This is just an idea for a future analysis, not necessarily a request within the scope of the present paper.

      We did calculate the ERC between domains of a single protein but did not include them in the analysis since they didn’t address the specific question we posed. As expected they are highly correlated, and past unpublished studies in the lab do find a very weak, but detectable genome-wide, signature of rate covariation between neighboring colinear genes on a chromosome. That signal was however so weak as to be eclipsed by true functional relationships, when present.

      Reviewer #3 (Recommendations For The Authors):

      Please read the literature and revise accordingly.

      We understand the confusion surrounding previous literature on the relationship between expression levels and evolutionary rates when comparing between different proteins. Those studies clearly showed how expression level is highly predictive of a given protein’s average evolutionary rate. However, we are studying the change in evolutionary rate over branches for single proteins. This is inherently different because we’re following rate fluctuations in the same protein over time. To our knowledge, it has not yet been shown that expression level commonly varies enough over time to produce large rate variations in the same protein, or that it is responsible for the correlations of rate we observe between co-functional proteins. It is however reasonable to expect that what governs between-protein differences in rate could also contribute to between-branch differences (over time for a single protein). In fact, our earlier study approached this (Clark et al. Genome Research 2012). We expect expression level could influence rate over time and lump its effect together with general non-physical forces, such as selection pressures. We recognize we could do better in defining more of the non-physical forces and the past literature. We added the following section to the introduction and many other clarifying statements throughout the manuscript:

      “For the purposes of this study, the forces that contribute to correlated evolutionary rates are grouped into two bins, physical and non-physical. The physical force is coevolution occurring at physical interaction interfaces. Non-physical forces include gene co-expression, codon adaptation, selective pressures, and gene essentiality. There is a well accepted negative relationship between gene expression and rate of protein evolution where genes that are highly expressed generally have slower rates of evolution [14,15]. However, Cope et al. [16] found that there is a weak relationship between both gene expression and the number of interactions a protein has with the coevolution of expression level. Conversely, they found a strong relationship between proteins that physically interact and the coevolution of gene expression. These findings illuminate the difference between the strong relationship of gene expression level on the average evolutionary rate of a protein and the weak contribution of gene expression level to correlated evolutionary rates of proteins across branches. The finding that physically interacting proteins have strong expression level coevolution brings to question how much coevolution of physically interacting proteins contributes to overall covariation in protein evolutionary rates.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides novel and important findings regarding the impact of noradrenergic signaling from the locus coeruleus on hippocampal gene expression. The locus coeruleus is the sole source of noradrenaline to the hippocampus and many rapid molecular changes induced by stress are regulated by noradrenaline. This manuscript provides a rigorous investigation into hippocampal genes uniquely regulated by noradrenaline in the presence or absence of stress. Data were collected and analyses were performed using solid methodology, and the results mostly convincingly support the conclusion made with few weaknesses. The study would benefit from a more comprehensive analyses of sex differences.

      Response: We thank the reviewers and the editors for the positive evaluation of our work and for the constructive feedback. To address some of the key criticisms, we have performed several new experiments and analyses. Importantly, we now provide a much more rigorous comparison of males and females, which strongly suggests that there are no major sex differences in the transcriptomic response to stress and noradrenaline in the hippocampus. We think that these - and other additions discussed below - significantly strengthen the manuscript. We provide detailed responses to all the reviewers’ comments. We have added numbers to the reviewers’ comments for easier referencing.

      Reviewer #1 (Public Review):

      Comment 1: Privitera et al., provide a comprehensive and rigorous assessment of how noradrenaline (NA) inputs from the locus coeruleus (LC) to the hippocampus regulate stress-induced acute changes in gene expression. They utilize RNA-sequencing with selective activation/inhibition of LC-NA activity using pharmacological, chemogenetic and optogenetic manipulations to identify a great number of reproducible sets of genes impacted by LC activation. It is noteworthy that this study compares transcriptomic changes in the hippocampus induced by stress alone, as compared with selective circuit activation/inhibition. This reveals a small set of genes that were found to be highly reproducible. Further, the publicly available data will be highly useful to the scientific community.

      Response: We are very grateful for this positive evaluation.

      Comment 2: A major strength of the study is the inclusion of both males and females. However, with this aspect of the study also lies the biggest weakness. While the experiments tested males and females, they were not powered for identifying sex differences. There are vast amounts of literature documenting the inherent sex differences, both under resting and stress-evoked conditions, in the LC-NA system and this is a major missed opportunity to better understand if there is an impact of these sex-specific differences at the genetic level in a major LC projection region. There are many instances whereby sex effects are apparent, but do not pass multiple testing correction due to low n's. The authors highlight one of them (Ctla2b) in supplemental figure 6. This gene is only upregulated by stress in females. It is appreciated that the manuscript provides an incredible amount of novel data, making the investigation of sex differences ambitious. Data are publicly available for others to conduct follow up work, and therefore it may be useful if a list of those genes that were different based on targeted interrogation of the dataset be provided with a clear statement that multiple testing corrections failed. This will aid further investigations that are powered to evaluate sex effects.

      Response: The assessment of the reviewers and the editorial feedback encouraged us to look more thoroughly into potential sex differences, because we believe it would indeed be a major additional strength if our manuscript could make a firm statement on this important issue. To this end, we have expanded the manuscript in two major ways:

      (1) To expand the analysis of sex effects also to the dorsal hippocampus, and to increase robustness of the data, we have performed RNA-seq in 32 additional samples of male and female mice exposed to stress (or control) and propranolol (or saline) injection. Figure 1f-h and Supplementary Figure 1d-f have been updated to reflect this new addition, and the results are presented in a new section on Pages 3-4 (pasted below for ease of reviewing). In summary, these data strongly support our initial observation that the effects of stress on gene expression, as well as the effects of propranolol on blocking stress-induced effects, are highly similar in both sexes.

      (2) To further increase the power for detection of sex-effects, we have performed a small meta-analysis. For this, we combined several RNAseq datasets from the current manuscript and published datasets from our previous work (Floriou-Servou et al., 2018; von Ziegler et al., 2022), which also investigated transcriptomic sex-differences in the hippocampus 45 min after cold swim stress exposure in the same setup as used for the current manuscript. This approach increased our sample size to 51 males and 20 females. In summary, this well-powered approach shows no evidence for sex differences in the transcriptional response to stress, even when more lenient analyses were applied. These results are described in a new section on page 4, and summarized in Supplementary Figures 1f+g. This section is pasted below for ease of reviewing.

      "While blocking β-adrenergic receptors was able to block stress-induced gene expression, we did not test whether propranolol might decrease gene expression already at baseline, independent of stress. Additionally, all tests had thus far been conducted in male mice, raising the question about potential sex differences in NA-mediated transcriptomic responses. To address these two issues, we repeated the experiment in both sexes and included a group that received a propranolol injection but was not exposed to stress (Fig. 1f). Combining the data from both experiments, we repeated the analysis for each region, to identify genes whose response to stress was inhibited by propranolol (Figure 1g). As in the previous experiment, we found that many of the stress-induced gene expression changes were blocked by propranolol injection in both dHC (Figure 1g, left panel) and vHC (Figure 1g, right panel). Importantly, propranolol did not change the expression level of these genes in the absence of stress. We then directly compared the genes sensitive to stress and propranolol treatment in both dHC and vHC. To this end, we plotted the union of genes showing a significant stress:propranolol interaction in either region in one heatmap across both dHC and vHC (Supplementary Figure 1d). This showed again that the stress-induced changes were very similar in dHC and vHC, and that propranolol similarly blocked many of them. Finally, we asked whether the response differs between males and females. Despite clear sex differences in gene expression at baseline (data not shown), we found no significant sex differences in response to stress or propranolol between male and female mice (FDR<0.05; Fig. 1g). To more directly visualize this, we compared females and males by plotting the log2-fold changes of the stress:propranolol interaction across all stress-induced genes that were blocked by propranolol. We find very similar regulation patterns in both sexes (Figure 1h). 
Although none of these sex differences are significant, some genes seem to show quantitative differences, so we plotted the expression patterns of the 5 genes showing the largest difference in interaction term as box-plots, which suggest that these spurious differences are likely due to noisy coefficient estimates (Supplementary Fig. 1e). To address concerns that our analysis of sex differences might not have been sufficiently powered, we performed a meta-analysis of the experiments shown here along with previously published datasets from our lab (Floriou-Servou et al. 2018; von Ziegler et al. 2022). In all these experiments, the vHC of male and female mice was profiled 45 min after exposure to an acute swim stress challenge. This resulted in a sample size of 51 males and 20 females. Despite this high number of independent samples, we could not identify any statistically significant interaction between sex and the stress response. To identify candidates that might not reach significance while discounting differences due to noise in fold-change estimates, we reproduced the same analysis using DESeq2 with Approximate Posterior Estimation for generalized linear model (apeglm) logFC shrinkage (A. Zhu, Ibrahim, and Love 2018). This analysis also did not reveal any sex differences in the stress response (Supplementary Fig. 1f). We then tailored the meta-analysis specifically to the set of stress-responsive genes that were blocked by propranolol, and also for these genes the response to stress was strikingly similar in both sexes (Supplementary Fig. 1g). Altogether, we conclude that there are no major sex differences in the rapid transcriptomic stress response in the hippocampus, and that blocking beta-receptors prevents a large set of stress-induced genes in both females and males."
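
      To make the notion of a sex-by-stress interaction concrete, the sketch below computes it for one gene as a difference of log2 fold changes between the sexes. This is only an illustration on group means: it is not the DESeq2 model used in the manuscript, it ignores the count-based error model and apeglm shrinkage, and the function name and toy values are hypothetical.

```python
from statistics import mean


def sex_by_stress_interaction(log2_expr):
    """Difference between the female and male log2 fold change (stress vs control).

    log2_expr maps (sex, condition) -> list of log2 expression values.
    A value near 0 means both sexes respond to stress to the same extent.
    """
    def log2_fold_change(sex):
        return mean(log2_expr[(sex, "stress")]) - mean(log2_expr[(sex, "control")])

    return log2_fold_change("female") - log2_fold_change("male")


# Toy data for one gene: a baseline sex difference but an identical stress response.
toy_gene = {
    ("male", "control"): [5.0, 5.2, 4.8],
    ("male", "stress"): [7.0, 7.1, 6.9],
    ("female", "control"): [5.5, 5.4, 5.6],
    ("female", "stress"): [7.5, 7.6, 7.4],
}
interaction = sex_by_stress_interaction(toy_gene)
```

      In this toy case the baseline expression differs between the sexes while the interaction term is approximately zero, which mirrors the distinction drawn in the quoted section between baseline sex differences and sex differences in the stress response.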

      To put these findings in context with existing literature, we agree with the reviewer that there are many studies that have reported sex differences in the LC-circuitry as summarized by Bangasser and colleagues (Bangasser et al., 2016, 2019). However, these studies primarily focus on the LC itself, suggesting that female rats have more LC neurons, denser LC-dendrites in the peri-LC region, and that LC neurons are more readily activated by stress in females because of heightened sensitivity to CRF-signaling. A recent study in mice reports, in contrast, that females have fewer TH-positive neurons in the LC, but they also find enhanced excitability of LC neurons in females (Mariscal et al., 2023). Similarly, one study has suggested molecular differences in the makeup of the LC (Mulvey et al., 2018). Our experiments, however, focus on the impact of NA release in a projection region (hippocampus). Further, we use a strong stress induction protocol (swim stress) and various potent modes of direct LC activation, so differences in "LC-excitability" are likely less relevant in this context. We added evidence showing that we trigger powerful NA release in both sexes (Supplementary Figure 2c-h; see response to Reviewer #2, Comment #3 for more details). In addition, we show that the intensity or pattern of LC stimulation does not appear to alter the molecular response (Figure 3a-b), and that various stressors (mild or intense) all trigger the same NA-dependent molecular changes (Figure 4a-b). Therefore, our results suggest that once NA is released (in the hippocampus), the molecular downstream effects on gene expression are very similar - independent of stimulation intensity, sex, or hippocampal subregion (dorsal/ventral). 
This does not mean that there are no sex differences for activation of LC, but rather that the transcriptional response to NA release in the hippocampus is robust across sexes, and that propranolol seems to block NA-dependent effects similarly in both sexes. This does not rule out quantitative differences between sexes that only emerge with targeted analyses of individual genes, or once fluctuations in ovarian hormones are taken into account. We have updated the section in the discussion to summarize these considerations in light of the new results (see pages 20-21, section: "A uniform molecular response to stress and noradrenaline release in both sexes").

      Comment 3: A major finding of the present study is the involvement of noradrenergic transcriptomic changes occurring in astrocytic genes in the hippocampus. Given the stated importance of this finding within the discussion, it seems that some additional dialogue integrating this with current literature about the role of astrocytes in the hippocampus during stress or fear memory would be important.

      Response: We thank the reviewer for giving us an opportunity to add a more detailed discussion about the role of astrocytes and thyroid hormones in the hippocampus during learning and memory formation. We have added these statements to the discussion:

      “Within the hippocampus, astrocytic pathways are emerging as important players for learning and memory processes (Gibbs, Hutchinson, and Hertz 2008; Bohmbach et al. 2022). In fact, it is well-known that NA enhances memory consolidation (Schwabe et al. 2022; McGaugh and Roozendaal 2002), and recent work suggests that these effects are mediated by astrocytic β-adrenergic receptors (Gao et al. 2016; Iqbal et al. 2023). Our transcriptomic screens revealed Dio2 as the most prominent target influenced by LC activity. Dio2 is selectively expressed in astrocytes and encodes for the intracellular type II iodothyronine deiodinase, which converts thyroxine (T4) to the bioactive thyroid hormone 3,3',5-triiodothyronine (T3) and therefore regulates the local availability of T3 in the brain (Bianco et al. 2019). Enzymatic activity of DIO2 has further been shown to be increased by prolonged noradrenergic transmission through desipramine treatment in LC projection areas (Campos-Barros et al. 1994). This suggests that the LC-NA system and its widespread projections could act as a major regulator of brain-derived T3. Notably, T3-signaling plays a role in hippocampal memory formation (Rivas and Naranjo 2007; Sui et al. 2006), raising the possibility that NA-induced Dio2 activity in astrocytes might mediate some of these effects.”

      Comment 4: The comparison of the candidate genes activated by the LC in the present study (swim) with datasets published by Floriou-Servou et al., 2018 (Novelty, swim, restraint, and footshock) is an interesting and important comparison. Were there other stressors identified in this paper or other publications that do not regulate these candidate genes? Further, can references be added to clarify to the reader, that prior studies have identified that novelty, restraint and footshock all activate LC-NA neurons.

      Response: Thank you for the positive feedback. We have only tested the stressors reported in Figure 4a-b (novelty, swim, restraint, and footshock). It is known that all these stressors trigger noradrenaline release; in fact, we are not aware of stressors that do not trigger NA release. This reproducible finding supports the notion that the identified set of genes is indeed highly NA-responsive. As suggested, we have now included references that show increased NA release in response to all these stressors:

      “Therefore, we assessed their expression in a dataset comparing the effect of various stressors on the hippocampal transcriptome (Floriou-Servou et al., 2018). The stressors included restraint, novelty and footshock stress, which have all previously been shown to increase hippocampal NA release (Hajós-Korcsok et al., 2003; Lima et al., 2019; Masatoshi Tanaka et al., 1982).”

      Comment 5: Comparisons are made between chemogenetic studies and yohimbine, stating that fewer genes were activated by chemogenetic activation of LC neurons. There is clear justification for why this may occur, but a caveat may need to be mentioned, that evidence of neuronal activation in the LC by each of these methods were conducted at 90 (yohimbine) versus 45 (hM3Dq) minutes, and therefore it cannot be ruled out that differences in LC-NA activity levels might also contribute.

      Response: The reviewer raises an important point about some inconsistencies between the time points chosen in our study, an aspect that was also pointed out by Reviewer #2. We have chosen the 45 and 90 min time points for two different reasons. On the one hand, cFos changes on the protein level are known to peak 90 min after neuronal activation, and we wanted to capture the strongest possible cFos signal in the LC. On the other hand, we wanted to measure gene expression changes triggered by NA release, which already occur 45 min after noradrenergic activation (Roszkowski et al., 2016). Thus, when the experimental design allowed separate experiments (e.g. systemic yohimbine injection), we chose to measure gene expression after 45 min, but to validate cFos activation in the LC separately after 90min. In response to DREADD activation, however, we wanted to confirm within the same animal that LC activation was successful, and thus we collected LC and hippocampus simultaneously (Figure 2c,d). While the cFos increase is already very pronounced at the 45min time point (Figure 2g), the quality of IHC is slightly lower because the tissue cannot be perfused in this experimental design. Therefore, we do not think that the time point for cFos sampling matters in this context. However, we agree with the reviewer that it remains unclear whether yohimbine and DREADDs activate the LC with similar potency. To directly compare NA release would require a set of photometry-based experiments to measure NA release using genetically-encoded NA-sensors. While we have added such experiments for LC activation with DREADDs and optogenetics to show rapid NA release indeed occurs in the hippocampus (see Reviewer #2, Comment 3; Supplementary Figure 2c-h), yohimbine interferes with the NA-sensors as explained in detail in response to Reviewer 2, Comment 3. 
Thus, it was too challenging for us to directly compare the release dynamics in response to DREADDs and yohimbine, which was also not the main focus of our work. To explicitly address this caveat, we have extended the corresponding section in the discussion:

      "Finally, our observation that systemic administration of the α2-adrenergic receptor antagonist yohimbine very closely recapitulates the transcriptional response to stress stands in contrast to the much more selective transcriptional changes observed after chemogenetic or optogenetic LC-NA activation. This difference could be due to various factors. First, it remains unclear how strong the LC gets activated by yohimbine versus hM3Dq-DREADDs. However, given the potent LC activation observed after DREADD activation, it seems unlikely that yohimbine would lead to a more pronounced LC activation, thus explaining the stronger transcriptional effects. Second, contrary to LC-specific DREADD-activation, systemic yohimbine injection will also antagonize postsynaptic α2-adrenergic receptors throughout the brain (and periphery). More research is needed to determine whether this could have a more widespread impact on the hippocampus (and other brain regions) than isolated LC-NA activation, further enhancing excitability by preventing α2-mediated inhibition of cAMP production. Finally, systemic yohimbine administration and noradrenergic activity have been shown to induce corticosterone release into the blood (Johnston, Baldwin, and File 1988; Leibowitz et al. 1988; Fink 2016). Thus, yohimbine injection could have broader transcriptional consequences, including corticosteroid-mediated effects on gene expression."

      Comment 6: Please add information about how virus or cannula placement was confirmed in these studies. Were missed placements also analyzed separately?

      Response: Pupillometry recordings were performed with all animals involving optogenetic or chemogenetic manipulations of the LC, before subjecting them to stress experiments. These assessments account for both correct optic fiber placement and virus expression (Privitera et al., 2020). If an animal did not show a clear pupil response, it was not included any further in the study. To demonstrate correct cannula placement for drug infusion of isoproterenol in the dorsal hippocampus, we added a representative image of cannula placement in Supplementary Figure 1h.

      Comment 7: Time of day for tissue collection used in genetic analysis should be reported for all studies conducted or reanalyzed.

      Response: Thank you for pointing out this omission. Tissue collection for RNA-seq analysis was always performed between 11am and 5pm during the dark phase of the reversed light-dark cycle. We have added this information to the corresponding method section (“Tissue collection”).

      Reviewer #1 (Recommendations For The Authors):

      Comment 8: This is a well written, comprehensive and rigorous manuscript that will be of great interest to those in the scientific community.

      Response: Thank you for the positive evaluation of our work and for the constructive feedback.

      Reviewer #2 (Public Review):

      Comment 1: The present manuscript investigates the implication of locus coeruleus-noradrenaline system in the stress-induced transcriptional changes of dorsal and ventral hippocampus, combining pharmacological, chemogenetic, and optogenetic techniques. Authors have revealed that stress-induced release of noradrenaline from locus coeruleus plays a modulatory role in the expression of a large scale of genes in both ventral and dorsal hippocampus through activation of β-adrenoreceptors. Similar transcriptional responses were observed after optogenetic and chemogenetic stimulation of locus coeruleus. Among all the genes analysed, authors identified the most affected ones in response to locus coeruleus-noradrenaline stimulation as being Dio2, Ppp1r3c, Ppp1r3g, Sik1, and Nr4a1. By comparing their transcriptomic data with publicly available datasets, authors revealed that these genes were upregulated upon exposure to different stressors. Additionally, authors found that upregulation of Ppp1r3c, Ppp1r3g, and Dio2 genes following swim stress was sustained from 90 min up to 2-4 hours after stress and that it was predominantly restricted to hippocampal astrocytes, while Sik1 and Nr4a1 genes showed a broader cellular expression and a sharp rise and fall in expression, within 90 min of stress onset.

      Overall, the paper is well written and provides a useful inventory of dorsal and ventral hippocampal gene expression upregulated by activation of LC-NA system, which can be used as starting point for more functional studies related to the effects of stress-induced physiological and pathological changes.

      Response: We thank the reviewer for the careful assessment of our work.

      Comment 2: However, I believe that the study would have benefited of a more comprehensive analyses of sex differences. Experiments in females were conducted only in one experiment and analyses restricted to the ventral hippocampus.

      Response: In response to the comments by the reviewer, as well as Reviewer #1 and the editors, we have sequenced an additional 32 brain samples to expand the comparison of sex effects in females and males across dorsal and ventral hippocampus, and we included a new meta-analysis of 3 experimental datasets (51 male and 20 female samples) to thoroughly assess sex differences in the transcriptomic response to stress. We refer the reviewer to our detailed response provided above to Reviewer #1, comment #2, and the updated results section on pages 3-4.

      Comment 3: Although, the experiments were overall sound and the results broadly support the conclusion made, I think some methodological choices should be better explained and rationalized. For instance, the study focuses on identifying transcriptional changes in the hippocampus induced by stress-mediated activation of the LC-NA system, however NA release following stress exposure and pharmacological or optogenetic manipulation was mostly measured in the cortex.

      Response: Because the hippocampus was used for RNA-sequencing, we could not assess NA release in the hippocampus (as this would require fiber implants that would interfere with molecular measures, or different tissue processing for HPLC). Nonetheless, we wanted to assess the transcriptional changes in the hippocampus, while simultaneously measuring successful stimulation of the LC-NA system in the same animals. To achieve this, we pursued 3 routes: 1) we used pupillometry to confirm functional LC activation; 2) we measured cFOS in the LC to directly demonstrate LC activation; 3) we assessed NA release using uHPLC (which requires larger tissue samples) and we chose the cortex because both cortex and hippocampus receive NA predominantly from the LC (Samuels & Szabadi, 2008). Importantly, we had previously shown that chemogenetic LC activation leads to a similar NA turnover in both the cortex and hippocampus, as measured by uHPLC (Zerbi et al., 2019). The relevant figure from that paper is inserted below to quickly show the striking similarity between hippocampus and cortex.

      Author response image 1.

      Levels of noradrenaline (NE) turnover (MHPG/NE ratio) in the cortex (CTX) and hippocampus (HC), measured in whole tissue with uHPLC 90min after hM3Dq-DREADD activation of the LC (copied and cropped from Zerbi et al, 2019, Neuron).

      In response to the reviewer's comment, we performed additional experiments to directly demonstrate that LC-activation with DREADDs as well as optogenetics causes an increase in hippocampal NA-release. We recorded NA release in the hippocampus (using fiber photometry combined with genetically encoded NA sensors). For DREADD activation, we observed a strong increase in hippocampal noradrenaline that started a few minutes after clozapine administration, and this increase was sustained throughout the duration of the 21 minute recording (see Supplementary Figure 2c-e). For optogenetic LC activation, we find a rapid and immediate sharp increase in NA levels in the hippocampus (Supplementary Figure 2f-h). These experiments were performed in females and males and triggered similar responses. An adapted and cropped version of Supplementary Figure 2 is pasted below for ease of reading.

      Please note that we could not perform a similar experiment using yohimbine, because the GRABNE sensors are based on the alpha-2 adrenergic receptor, thus yohimbine administration interferes with the photometry recording. However, we believe that it is clear from this response that strong activation of the LC leads to uniform release of NA in the hippocampus and cortex.

      Author response image 2.

      c, Schematic of fiber photometry recording of hippocampal NA during chemogenetic activation of the LC. After 5 min baseline recording in the homecage animals were injected with clozapine (0.03mg/kg, i.p.) and placed in the OFT for 21min. d, Average ΔF/F traces of GRABNE2m photometry recordings in response to chemogenetic activation of the LC (mean±SEM for hM3DGq+ and hM3DGq- split into females and males, n=3/group/sex). e, Peak ΔF/F response of fiber photometry trace. f, Schematic of fiber photometry recording of hippocampal NA during optogenetic activation of the LC. Animals were lightly anesthetized (1.5% isoflurane) and recorded in a stereotaxic frame. After 1 min baseline recording, animals were stimulated three times with 5Hz for 10s (10ms pulse width, ~8mW laser power) and recorded for 2 min post-stimulation. g, Average ΔF/F traces of the NA sensors GRABNE1m and nLightG in response to optogenetic activation of the LC (mean±SEM for females and males, n(females)=10, n(males)=5). h, Peak ΔF/F response of fiber photometry trace.
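
      For readers less familiar with photometry, the ΔF/F values in this legend can be sketched as follows. The sketch assumes that F0 is the mean fluorescence during the pre-injection (or pre-stimulation) baseline; the exact baseline definition and any detrending used for the figure are not specified here, and the function name and toy trace are hypothetical.

```python
from statistics import mean


def delta_f_over_f(trace, n_baseline):
    """Return (F - F0) / F0 for every sample, with F0 the mean of the baseline samples.

    Assumption: the first n_baseline samples form the baseline period.
    """
    if not 0 < n_baseline <= len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    f0 = mean(trace[:n_baseline])
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [(f - f0) / f0 for f in trace]


# Toy trace: four baseline samples followed by a response (arbitrary units).
trace = [100.0, 100.0, 100.0, 100.0, 110.0, 125.0, 120.0]
dff = delta_f_over_f(trace, n_baseline=4)
peak = max(dff[4:])  # peak response after the baseline period
```

      The peak taken over the post-baseline samples corresponds to the kind of summary value shown in panels e and h.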

      Comment 4: Furthermore, behavioral changes following systemic pharmacologic or chemogenetic manipulation were observed in the open field task immediately after peripheral injections of yohimbine or CNO, respectively. Is this timing sufficient for both drugs to cross the blood brain barrier and to exert behavioral effects?

      Response: We have previously shown that chemogenetic activation of the LC through clozapine elicits pupil responses within 1-2 minutes after injection (Privitera et al., 2020; Zerbi et al., 2019). This indicates that clozapine rapidly crosses the blood brain barrier and affects LC activity within a few minutes after injection. Our additional experiments using genetically encoded sensors in the hippocampus show this even more directly (Supplementary Figure 2d), see also the response to Comment 3 above.

      Similarly, yohimbine also rapidly crosses the blood brain barrier within the same time frame (Hubbard et al., 1988). These observations are consistent with the rapid behavioral effects that can be detected within a few minutes after injection of clozapine for LC-DREADD activation (Zerbi et al., 2019), and for yohimbine as well (von Ziegler et al., 2023). In response to another comment of this reviewer, we have also re-analyzed the behavior presented in the current manuscript in time-bins of 3 minutes, which also shows the rapid onset of effects in response to yohimbine (within the first 3 min) and DREADDs (within 6 min), see Supplementary Fig. 3.

      Comment 5: Finally, the study shows that activation of noradrenergic hippocampus-projecting LC neurons is sufficient to regulate the expression of several hippocampal genes. Although the necessity of these projections to induce the observed transcriptional effects has been tested to some extent through systemic blockade of beta-adrenoceptors, I believe the study would have benefited from more selective (optogenetic or chemogenetic) necessity experiments.

      Response: We understand the reviewer's point that blocking the LC during stress exposure would be an interesting experiment. However, it is very hard to completely silence the LC during intense stressors. In fact, despite intense efforts, we have not been able to silence the LC during swim stress exposure using DREADDs or other chemogenetic approaches (PSAM/PSEM). We were in fact able to silence the LC with the optogenetic inhibitor JAWS (and others have reported successful LC silencing with GtACR2), but there is a major issue involving the "rebound effect", where more NA is released once the inhibition is stopped. We would thus have had to optogenetically silence the LC for 45-90 min, which would create heat artifacts, and require challenging control experiments to draw firm conclusions. Given all these issues, we reasoned that blocking adrenergic receptors is a simple and elegant solution, which provides clear evidence for the necessity of beta-adrenergic signaling.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      Comment 6: The study focuses on the identification of transcriptional changes in the hippocampus induced by stress-mediated activation of the LC-NA system; however, noradrenaline release following stress exposure or yohimbine injection was measured in the cortex. Authors should consider measuring NA concentrations in the hippocampus after exposure to swim stress or administration of yohimbine, or at least explain their choice to analyse the cortex in the manuscript.

      Response: We have addressed this issue in detail in Response to "Reviewer 2, Comment #3", where we provided an overview of the additional data that support our approach. As mentioned before, measuring NA release after yohimbine is not compatible with our GRABNE-photometry approach, as the GRAB sensor is based on the alpha-2 adrenoceptor. Here, we would like to add that measuring NA release using photometry during swim stress is also challenging. The challenge is the vigorous movement (swimming, typically in one direction), which creates pressure on the cables/implants. We felt that overcoming these experimental challenges (setup, troubleshooting and controls) would be beyond the scope of the paper, given that it is already known that this stressor leads to strong NA release in the hippocampus. We have now included references that demonstrate that all the stressors used in our work trigger NA increase in the hippocampus (see response to Reviewer 1, Comment 3): “Therefore, we assessed their expression in a dataset comparing the effect of various stressors on the hippocampal transcriptome (Floriou-Servou et al., 2018). The stressors included restraint, novelty and footshock stress, which have all previously been shown to increase hippocampal NA release (Hajós-Korcsok et al., 2003; Lima et al., 2019; Masatoshi Tanaka et al., 1982).”

      Comment 7: Concerning the experiment aimed at investigating sex differences in gene expression, it is not clear why the authors decided to restrict their analyses in females to the ventral hippocampus only. The explanation that in males they did not detect major differences between the dorsal and ventral hippocampus is not sufficient, because there could have been different effects in females. Therefore, the conclusion made by the authors that their "results suggest that the transcriptomic response is independent of sex" is not entirely correct, since sex differences were only evaluated in the ventral hippocampus.

      Response: We appreciate the reviewer's critique. As described above, we have now also sequenced the dorsal hippocampal tissue from the propranolol experiment (males and females, 32 samples) and additionally added an extensive meta-analysis of three large datasets (n=71) to compare transcriptional sex differences in response to stress. A detailed description of these experiments and how they have extended/supported our conclusions has been provided in response to Reviewer #1, Comment #2.

      Comment 8: Besides the effects on females, the same experiment examined whether propranolol by itself (in the absence of stress) would have been able to alter gene expression: such effects were not examined in the dorsal hippocampus. In contrast, in a different experiment, the effects of isoproterenol on genes expression were restricted to the dorsal hippocampus only. Furthermore, related to this latter experiment, intra-dorsal hippocampal injection of isoproterenol should presumably mimic the rise in NA observed after stress exposure, why was gene expression measured 90 min after isoproterenol central injections while in the other experiments gene expression was determined 45 min after stress, that is when authors observe the peak NA concentration?

      Response: We have addressed the reviewer's critique of dorsal vs ventral hippocampus by reanalyzing 32 additional samples from dorsal hippocampus of male and female mice after propranolol (or saline) injection. Please see response to Reviewer #1, comment #2.

      Regarding the time points: We have chosen the 45 and 90 min time points mainly for two reasons. First, cFos protein changes are known to be strongest 90 min after neuronal activation. Second, because we wanted to capture gene expression changes triggered by NA release, we reasoned that these effects must be fast and should thus be measured at an early transcriptional time-point (45 min). However, after performing the time-course experiment after swim stress exposure (Figure 4d,c), we observed that the LC-NA-sensitive genes (e.g. Dio2 and several PP1-subunits) show the strongest changes 90 min after stress exposure. Therefore, in some of our experiments we opted to analyze gene expression changes at 90 min, converging with the time-point we typically use for cFos staining. Contrary to the reviewer's statement, peak NA concentrations are not observed 45 min after the various interventions, but rather the peak in the main metabolite (MHPG) is observed then, due to the temporal dynamics of NA release and breakdown. NA release occurs immediately upon stress exposure (or direct LC activation), which we also show in the new photometry data described above. Thus, rapid NA release triggers intracellular cascades that lead to downstream transcriptional changes, which presumably peak between 45 and 90 min later.

      Comment 9: Behavioral changes following systemic pharmacologic or chemogenetic manipulation were observed in the open field task immediately after peripheral injections of yohimbine or CNO, respectively. Is this timing sufficient for both drugs to cross the blood brain barrier and to exert behavioral effects? It is also not immediately clear the reason why the open field tasks have different durations depending on the experiments, which can also impact the results. Authors might also consider to split the open field data analyses in 2 or 3 min time-bins, to allow for a better comparison across the different results.

      Response: We thank the reviewer for the suggestion to plot the behavior data as time-bins. We have implemented this change for the yohimbine and DREADD experiments, and updated the corresponding figure accordingly (Supplementary Figure 3, pasted below for ease of reading). The new visualization clearly shows that yohimbine injection triggers rapid behavioral effects already in the first three minutes, whereas the LC-DREADD activation triggers behavioral changes within 3-6 minutes after injection. Thus, clear drug effects are visible in the first 10 minutes, which is comparable to the standard OFT test (10min testing) shown in response to swim stress exposure (Suppl. Figure 3a). The choice to expose mice to the OFT for 21 minutes in total was due to the fact that we based our experimental approach on the optogenetic LC-stimulation protocol first published by McCall and colleagues (McCall et al, Neuron, 2015), in which the LC is stimulated for 3 min followed by 3 min pauses (see Suppl. Figure 3d). Because of this on-off design, we decided to keep the optogenetic analysis simple and show the overall effect (Supplementary Figure 3d), particularly as we know that NA dynamics do not recover rapidly enough after 3 min continuous stimulation to justify a bin-analysis (unpublished data).
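      The time-bin re-analysis described above amounts to summing each behavioral readout within consecutive 3 min windows of the 21 min session. A minimal sketch of that binning step follows, assuming tracking output is available as (time in seconds, distance in cm) pairs; the function name, data layout and example values are hypothetical and are not taken from the analysis pipeline used for the manuscript.

```python
# Illustrative sketch only: sum a tracked readout (e.g. distance moved)
# into consecutive time bins of an open field session.
# The (time_s, value) layout and all names are assumptions for this example.

def bin_totals(samples, bin_s=180, session_s=1260):
    """Return per-bin sums for samples given as (time_s, value) pairs.

    bin_s=180 and session_s=1260 correspond to 3 min bins over 21 min.
    Samples outside [0, session_s) are ignored.
    """
    n_bins = session_s // bin_s
    totals = [0.0] * n_bins
    for time_s, value in samples:
        if 0 <= time_s < session_s:
            totals[int(time_s // bin_s)] += value
    return totals


# Hypothetical tracking samples: (time in s, distance in cm)
example = [(0, 1.0), (179.9, 2.0), (180, 3.0), (1259, 4.0), (1260, 5.0)]
binned = bin_totals(example)
print(binned)
```

      With 3 min bins, a 21 min session yields seven bins, so group differences can be compared bin by bin rather than only over the whole session.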

      Author response image 3.

      Effects of acute stress and noradrenergic stimulation on anxiety-like behaviour in the open field test. a, Stress-induced changes in the open field test 45 min after stress onset. Stressed animals show overall reductions in distance traveled (unpaired t-test; t=3.55, df=22, p=0.0018), time in center (Welch unpaired t-test; t=3.50, df=13.61, p=0.0036), supported rears (unpaired t-test; t=3.39, df=22, p=0.0026) and unsupported rears (unpaired t-test; t=5.53, df=22, p = 1.47e-05) compared to controls (Control n = 12; Stress n = 12). These data have been previously published (von Ziegler et al., 2022). b, Yohimbine (3 mg/kg, i.p.) injected animals show reduced distance traveled (unpaired t-test; t=2.39, df=10, p=0.03772), reduced supported rears (unpaired t-test; t=6.56, df=10, p=0.00006) and reduced unsupported rears (Welch unpaired t-test; t=3.69, df=4.4, p = 0.01785) compared to vehicle injected animals (Vehicle n = 6; Yohimbine n = 7). c, Chemogenetic LC activation induced changes in the open field test immediately after clozapine (0.03 mg/kg, i.p.) injection. hM3Dq+ animals show reduced distance traveled (unpaired t-test; t=6.28, df=13, p=0.00003), reduced supported rears (unpaired t-test; t=4.28, df=13, p=0.0009), as well as reduced unsupported rears (Welch unpaired t-test; t=4.28, df=13, p = 0.00437) compared to hM3Dq- animals (hM3Dq- n = 7; hM3Dq+ n = 8). d, Optogenetic 5 Hz LC activation induced changes during the open field test. ChR2+ animals show reduced supported rears (unpaired t-test; t=2.42, df=64, p=0.0185) and reduced unsupported rears (unpaired t-test; t=2.91, df=64, p = 0.00499) compared to ChR2- animals (ChR2- n = 32; ChR2+ n = 36). Data expressed as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.

      Comment 9: The study shows that activation of noradrenergic hippocampus-projecting LC neurons is sufficient to regulate the expression of several hippocampal genes. I believe the study would have benefited of more selective necessity experiments. Authors might consider adding optogenetic (or chemogenetic) experiments aimed at inhibiting LC-NA hippocampal projections during stress exposure (or, alternatively, perform intrahippocampal pharmacological blockade of β-adrenoreceptors during stress exposure), and determine the effects on gene expression.

      Response: We kindly refer the reviewer to our previous response to Comment #2 above.

      Minor concerns:

      There is a typo in the abstract. Please correct "LN-NA" with "LC-NA"

      Response: Thank you, we have corrected it.

      References

      Bangasser, D. A., Eck, S. R., & Ordoñes Sanchez, E. (2019). Sex differences in stress reactivity in arousal and attention systems. Neuropsychopharmacology: Official Publication of the American College of Neuropsychopharmacology, 44(1), 129–139.

      Bangasser, D. A., Wiersielis, K. R., & Khantsis, S. (2016). Sex differences in the locus coeruleus-norepinephrine system and its regulation by stress. Brain Research, 1641, 177–188.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      All comments made in the public section.

      We would like to thank the reviewer for their assessment of our study and for suggestions for additional experiments to follow up our studies.

      Reviewer #2 (Recommendations For The Authors):

      ‐ Preparation of spike proteins and VLPs. Although Triton‐X114 extraction was done to remove endotoxin from the recombinant spike protein preparations, its removal efficiency depends on the levels of endotoxin in the samples. Therefore, the residual endotoxin levels in each of the test samples and batches should be measured. Even very low but varying levels of residual endotoxin would substantially impact the reported results, as they create inconsistent data that are not interpretable.

      Certainly, endotoxin contamination in instilled materials is always an issue. Established protocols for inducing acute inflammatory responses using endotoxin outline specific ranges of endotoxin levels in the instillation materials. To induce acute lung inflammation in mice at least 2 µg of endotoxin must be instilled. We have endeavored to reduce the possibility of endotoxin contamination in our recombinant proteins by using a mammalian expression system; careful aseptic culture and protein purification techniques; and a final Triton-X114 partitioning protocol. We assessed the possibility of endotoxin contamination using the Pierce™ Chromogenic Endotoxin Quant Kit, which is based on the amebocyte lysate assay. Our analysis revealed that the endotoxin level in the purified recombinant protein preparation is below 1.0 EU/ml, which closely aligns with the levels specified for recombinant proteins. An endotoxin concentration of 1.0 EU/ml is equivalent to approximately 0.1 ng/ml. Throughout all mouse nasal instillation experiments, the total volume of recombinant protein administered did not exceed 6 µl. The amount of contaminant endotoxin instilled did not exceed 1 pg (50 µl of 0.02 ng/ml of endotoxin). Consequently, we can confirm that the extent of endotoxin contamination is at trace levels. Moreover, our study reveals multiple results indicating that the level of endotoxin contamination in the recombinant protein was inadequate to independently induce neutrophil recruitment in the cremaster muscle, lymph nodes, and liver. For further insights, refer to Figure 5.
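      The endotoxin estimate above is simple unit arithmetic and can be checked with a short sketch. The conversion factor (1 EU ≈ 0.1 ng) and the concentrations and volumes are the figures quoted in this response; the function names are hypothetical and chosen only for illustration.

```python
# Illustrative check of the endotoxin arithmetic quoted in the response.
# Assumption (taken from the response text): 1 EU of endotoxin ~ 0.1 ng.
NG_PER_EU = 0.1


def eu_per_ml_to_ng_per_ml(eu_per_ml):
    """Convert an endotoxin concentration from EU/ml to ng/ml."""
    return eu_per_ml * NG_PER_EU


def endotoxin_mass_pg(conc_ng_per_ml, volume_ul):
    """Mass of endotoxin in pg; 1 ng/ml is numerically equal to 1 pg/ul."""
    return conc_ng_per_ml * volume_ul


# Upper bound on concentration reported by the assay: 1.0 EU/ml
upper_conc = eu_per_ml_to_ng_per_ml(1.0)

# Maximum instilled volume quoted in the response: 6 ul
mass_6ul = endotoxin_mass_pg(upper_conc, 6)

# Worked example quoted in the response: 50 ul at 0.02 ng/ml
mass_example = endotoxin_mass_pg(0.02, 50)

print(f"{upper_conc:.2f} ng/ml, {mass_6ul:.2f} pg, {mass_example:.2f} pg")
```

      Under these figures, both estimates are at or below about 1 pg, more than a million-fold lower than the 2 µg cited above as the amount needed to induce acute lung inflammation in mice.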

      ‐ Doses of spike and VLPs: The amount of spike protein incorporated into HIV Gag‐based VLPs should be determined and compared to that found in the native SARS‐CoV‐2 virus particles. This should provide more physiologic doses (or dose ranges/titration) of spike than the arbitrary doses (3 ug or 5 ug) used in the mouse experiments.

      To visualize the acquisition of spike protein and track cells that have acquired the spike protein, we conducted a series of tests and optimizations using different concentrations of Alexa 488 labeled spike protein, ranging from 0.5 to 5 µg. During the processing of lung tissue for microscopic imaging, it was of utmost importance to preserve the integrity of the labeled spike protein in the tissue samples. We determined that instillation of 3 µg of Alexa 488 labeled spike protein yielded the optimal signal strength across the lung sections. Notably, in many mouse models employing intra-nasal instillation protocols for SARS-CoV-2 spike protein or RBD domain-only recombinant proteins, a dosage of approximately 3 µg or higher was commonly used. Regarding the titer of spike-incorporated VLPs, it is important to highlight that we did not directly compare the quantity of spike protein present in NL4.3 VLPs to that of the native SARS-CoV-2 virus. HIV-1 and SARS-CoV-2 viruses typically carry around 70 gp120 spikes and 30 spikes, respectively. We estimated that SARS-CoV-2 spike-incorporated NL4.3 VLPs may display twice the number of spikes compared to native SARS-CoV-2. Notably, our measurements of SARS-CoV-2 spike on NL4.3 VLPs demonstrated similar behavior to SARS-CoV-2 in terms of specific binding to ACE2-expressing 293T cells, indicating their functional similarity in this context.

      Author response image 1.

      Spike protein-incorporated NL4.3 VLPs test with human ACE2-transfected HEK293 cells. The wild-type spike protein-incorporated VLPs and delta envelope NL4.3 VLPs were analyzed using human ACE2-transfected HEK293 cells. The first plot shows ACE2 expression levels in HEK293 cells. The second plot displays the binding pattern of Delta Env NL4.3 VLPs on ACE2-expressing HEK293 cells. The third plot illustrates the binding pattern of wild-type spike protein-incorporated NL4.3 VLPs on ACE2-expressing HEK293 cells. The histogram provides a comparison of VLP binding strength to ACE2-expressing HEK293 cells.

      ‐ The PNGase F‐treated protein was not studied in Fig 1. In Fig 2, glycan‐removal by PNGaseF has little effects on cell uptake and cell recruitment in the lung. If binding to one of the Siglec lectins is a critical initial step, experiments should be designed to evaluate this aspect of the spike‐cell interaction in a greater depth.

      As the reviewer states, results with the PNGase F-treated protein were not shown in Fig. 1, although we showed results in Figs. 2 & 3. See the discussion below about our preparation of the PNGase F-treated protein. Perhaps because we elected to use a purified fraction that retained ACE2 binding, the protein we used likely retained some complex glycans. As the reviewer notes, the PNGase F-treated protein had similar overall cellular recruitment and uptake profiles compared to the untreated spike protein. The PNGase F-treated fraction we used no longer bound Siglec-F in the flow-based assay, shown in Fig. 7. This argues that the initial uptake and cellular recruitment following intranasal instillation of the Spike protein did not depend upon the engagement of Siglec-F. While Siglec-F on the murine alveolar macrophage can likely efficiently capture the spike proteins, other cellular receptors contribute, and the overall impact of the spike protein on alveolar macrophages likely reflects its engagement of multiple receptors.

      • Enzymatic removal of sialic acids from spike may be one parameter to explore. The efficiency of enzymatic removal should also be verified prior to experiments. Finally, the authors need to assess whether the proteins remained functional, folded properly, and did not aggregate.

      To obtain the de-glycosylated form of the SARS-CoV-2 spike protein, we employed PNGase F enzymatic digestion to remove glycans. Subsequently, the spike protein was purified using a size exclusion column. During this purification process, the PNGase F-treated spike protein segregated into two distinct sets of fractions, specifically fractions 6 to 8 and fractions 9 to 11 (see revised Figure 1-figure supplement 1).

      Author response image 2.

      Size exclusion chromatography. The peak lines represent the absorbance at 280 nm. PNGase F-treated spike proteins were loaded onto a Superdex 26/60 column, resolved at a flow rate of 1.0 ml/min, and collected in 1 ml fractions.

      The Coomassie blue staining of an SDS-PAGE gel revealed that fractions 6 to 8 likely underwent a more pronounced de-glycosylation by PNGase F compared to fractions 9 to 11. Additionally, during the size column purification, we noticed that fraction 6 to 8 exhibited a faster mobility than the untreated spike protein, implying a potentially substantial modification of the protein's conformation. To probe the functional characteristics of the de-glycosylated spike protein in fraction 6 to 8, we conducted binding tests with human ACE2. Strikingly, the spike protein in fraction 6 to 8 completely lost its binding affinity to ACE2, indicating a loss of its ACE2-binding capability. Conversely, the protein in fraction 9 to 11 showed partial de-glycosylation but still retained its original functionality to bind to ACE2 and its antibody.

      Author response image 3.

      FACS analysis of various spike protein-bound beads. Protein bound beads were detected with labeled spike antibody, recombinant human ACE2, and recombinant mouse Siglec-F.

      Based on these results, we concluded that fractions 9 to 11 would be the most suitable choice for further studies as the de-glycosylated spike protein, because this material retained the functional properties relevant for binding ACE2 and antibody yet had lost Siglec-F binding. In the revised manuscript we have described in more detail the purification of the PNGase F-treated trimer and its functional assessment.

      ‐ Increases in macrophages and alveolar macrophages by Kifunensine Tx spike in Fig 2A suggest effects that are not related to Siglec lectins. These effects are not seen with the wild type or D614 spike trimers, so the relevance of high‐ mannose spike is unclear. On the other hand, there were clear differences between Wuhan and D614 trimers seen in Fig 2A and 2B, but there was no verification to ascertain whether these differences were indeed due to strain differences and not due to batch‐to‐batch variability of the recombinant protein production. The overall glycan contents of the Wuhan and D614 spike protein samples should be measured. If Siglec interaction is the main interest in this study, the terminal sialic acid contents should be determined and compared to those in the corresponding strains in the context of native SARS‐CoV‐2 virions.

      Our initial observation that Siglec-F positive alveolar macrophages (AMs) avidly acquired spike proteins, followed by rapid leukocyte recruitment, provided the rationale for us to examine the impact of modifying the glycosylation pattern on the spike protein (de-glycosylated and spike variants) on their binding tropism and their cellular recruitment profiles in the lung. In this context, we examined the influence of several glycan modifications on spike proteins, hypothesizing that these modifications would alter the acquisition of the spike protein by mouse AMs compared to the wild-type trimer. While we did not conduct an in-depth analysis of the glycan composition and terminal sialic acid contents of the SARS-CoV-2 spike proteins we used, we did verify that the different proteins behaved as expected. Most of the biochemical studies were performed in Jim Arthos’ laboratory, which has a long interest in the glycosylation of the HIV envelope protein. On SDS-PAGE the SARS-CoV-2 spike protein purified from the Kifunensine-treated CHO cells exhibited a 12 kDa reduction. It bound much better to L-Sign, DC-Sign, and maltose binding lectin, and poorly to Siglec-F. In the cellular studies it bound less well to most of the cellular subsets examined, including murine alveolar macrophages. In studies with human blood leukocytes, it relied on cations for binding. However, it retained its toxicity directed at mouse and human neutrophils and it elicited a similar cytokine profile when added to human macrophages. The D614G mutation increased the spike protein binding to P-Selectin, CD163, and snowdrop lectin (mannose binding), suggesting that the mutation had altered the glycan content of the protein. We used the D614G spike protein in a limited number of experiments as it behaved like the wild-type protein except for a slightly altered cellular retention pattern 18 hrs after intranasal instillation. In the revised manuscript we have included its binding to peripheral blood leukocytes. The D614G mutation conferred stronger binding to human monocytes than the original Spike protein. As discussed above, we recovered two fractions following the PNGase F treatment, one with a 40 kDa reduction on SDS-PAGE and the other with a 60 kDa decrease, and we chose to evaluate the fraction with a 40 kDa reduction in subsequent experiments. Consistent with a loss of N-linked glycans, the PNGase F treatment reduced the binding to the lectin PHA, which recognizes complex carbohydrates, and it resulted in a sharp reduction in Siglec-F binding. The lower molecular weight fraction recovered after PNGase F treatment no longer bound ACE2. While our studies showed that alveolar macrophages likely employ Siglec-F as a capturing receptor, they possess other receptors that also can capture the spike protein. The downstream consequences of engaging Siglec-F and other Siglecs by the SARS-CoV-2 spike protein will require additional studies.

      While acknowledging the possibility of some batch-to-batch variation in recombinant protein preparation, we don’t think this was a major issue. We have noted some batch-to-batch variation in yield efficiency; however, the purified proteins consistently gave similar results in the various experiments.

      ‐ Fig 3: The same concern described above applies to the hCoV‐HKU1 spike protein. In Panel D, the PNGase and Kifunensine treatment did not appear to abrogate the neutrophil recruitment. Panel A did not include PNGase and Kif Tx spike proteins. Quantification of images in panel D is missing and should be done on many randomly selected areas.

      We analyzed the neutrophil count of images in panel D and the results are presented in Figure 3-figure supplement 1C. The Kifunensine treatment reduced the neutrophil recruitment at 3 hours, while the PNGase F-treated Spike protein recruited as many or slightly more neutrophils. The hCoV-HKU1 S1 domain did not differ much from the saline control.

      ‐ Fig 4: Kifunensine Tx spike caused more increase in neutrophil damage after intrascrotal injections. PNGase Tx spike was not tested. Connection between Siglec‐spike binding and neutrophil recruitment/damage is lacking.

      Exteriorized cremaster muscle imaging functions as a model system for monitoring the behavior of neutrophils recruited by spike proteins within the local tissue, distinct from Siglec F-positive alveolar macrophages residing in lung tissue. Hence, our primary focus was not on investigating the Siglec/Spike protein interaction. Consequently, we did not utilize PNGase F-treated spike protein in these experiments. To clarify this issue, we added a sentence in the main text: ‘Although this model lacks Siglec F-positive macrophages, it is worth monitoring the effect of the SARS-CoV-2 Spike protein on neutrophils recruited in the inflammatory local tissue.’

      ‐ Fig 5. Neutrophil injury was also seen after inhalation (intranasal) of spike protein in mice and in vitro with human neutrophils. Panel B shows no titrating effects of spike (from 0.1 to 2) on Netosis of murine neutrophils. Panel C: Netosis was seen with human neutrophils at 1 but not 0.1. Is this species difference important?

      Given the observation of neutrophil NETosis in the mouse imaging experiment, our objective was to characterize the direct impact of the spike protein on human and murine neutrophils. The origins of the neutrophils differ: the murine neutrophils were purified from mouse bone marrow, while the human neutrophils were purified from human blood. Both purification protocols yielded greater than 98% neutrophils. However, the murine neutrophils contain many more immature cells (50-60%) because the bone marrow served as their source. Furthermore, the murine neutrophils are from 6–8-week-old mice while the human neutrophils are from 30-50 year-old humans. More work would be needed to sort out whether there is any difference between human and mouse neutrophils in their propensity to undergo NETosis in response to Spike protein.

      ‐ Kifunensine Tx again did not cause any reduction, indicating the lack of involvement of sialic acid. How was this related to Siglec participation directly or indirectly? There was no quantification for Panel D.

      We do not think that Siglecs play a role in the induction of neutrophil NETosis, as the Spike proteins lacking Siglec interactions induced similar levels of NETosis. Other neutrophil receptors are likely important. As noted in the text,

      "human neutrophils express several C-type lectin receptors including CLEC5A, which has been implicated in SARS-CoV-2 triggered neutrophil NETosis." Our goal with the data in Panel D was to visualize human neutrophil NETosis on trimer-bearing A549 cells; we relied on the flow cytometry assays for quantification.

      ‐ The rationale for testing cation dependence is unclear and should be described. What is the significance of "cations enhanced leukocyte binding particularly so with the high mannose protein"? Are there cation-dependent receptors for spike independent of glycans and huACE‐2? If so, how is this relevant to the main topic of this paper?

      It is well known that glycan binding by many C-type lectins is calcium-dependent, involving specific amino acid residues that coordinate with calcium ions and bind to the hydroxyl groups of sugars. As discussed in our previous draft, the C-type lectin receptor L-SIGN has been suggested as a calcium-dependent receptor for SARS-CoV-2, specifically interacting with high-mannose-type N-glycans on the SARS-CoV-2 spike protein. Therefore, it was worthwhile to investigate the calcium dependence of spike protein binding to various types of immune cells. We added some data to this figure. It now includes the binding profile of the D614G protein. In addition, we corrected the binding data by subtracting the fluorescent signal from the unstained control cells.

      ‐ Fig 7: human Siglec 5 and 8 were studied in comparison with mouse Siglec F. Recombinant protein data are not congruent with transfected 293 cell data. Panel A, the best binding to hSiglec 5 and 8 are the PNGase F Tx spike protein; how to interpret these data? Panel B: only the WT and D614G spike proteins binding to Siglec 5 and 8 on transfected cells. It made sense that kif Tx (high‐mannose) and PNGaseF Tx (no glycan) spike would not bind to the Siglecs, but they did not bind to ACE2 either, indicative of nonfunctional spike proteins.

      We discussed this as follows: ‘The closest human paralog of mouse Siglec-F is hSiglec-8 (reference 40). While expressed on human eosinophils and mast cells, human AMs apparently lack it. In contrast, human AMs do express Siglec-5 (reference 37). Along with its paired receptor, hSiglec-14, Siglec-5 can modulate innate immune responses (reference 41). When tested in a bead binding assay, in contrast to Siglec-F, neither hSiglec-5 nor -8 bound the recombinant spike protein, yet their expression in a cellular context allowed binding.’ The in vitro bead binding assay we established demonstrated the specific binding of the bait molecule to target molecules. However, it does have limitations in replicating the complexities of the actual cellular environment. As discussed previously, the PNGase Tx fraction we used in these experiments retained ACE2 binding, but lost binding to Siglec-F in the bead assay. In a Biacore assay, not shown, the PNGase Tx fraction bound L-Sign and DC-Sign better than the untreated trimer, and it retained human ACE2 binding although it bound less well than the wild-type trimer. Why the PNGase Tx fractions bound poorly to the human ACE2 transfected HEK293 cells is unclear. A higher density of recombinant ACE2 on the beads compared to that expressed on the surface of HEK293 cells may explain the difference. Alternatively, in the bead assay we used a recombinant human ACE2-Fc fragment fusion protein purified from HEK293 cells, while in the transfection assay we expressed human full-length ACE2. The Biacore, the bead binding, and the functional assays we performed all suggest that we had used intact recombinant proteins.

      ‐ Fig 8: This last set of experiment was to measure cytokine release by different types of macrophage cultures treated with spike from different cells with vs without Kifunensine Tx. The connection of these experiments to the rest is tenuous and is not explained. This is one of the examples where bits of data are presented without tying them together.

      Dysregulated cytokine production significantly contributes to the pathogenesis of severe COVID-19 infection. Since we had observed strong binding of the spike protein to human monocytes and murine alveolar macrophages, we tested whether the spike protein altered cytokine production by human monocyte-derived macrophages. Depending on the culture conditions, human monocytes can be differentiated into M0, M1, or M2 phenotypes. Each type of macrophage responds differently to stimulants, often leading to distinct patterns of cytokine secretion. These patterns offer valuable insights into the immune response. The cytokine profiling conducted in this study enhances our understanding of how distinct macrophage types react to the spike protein.

      ‐ Discussion section did not describe how the various experiments and data are tied together. The authors explained the interactions of spike with different cell types in each paragraph separately, leaving this reviewer really confused as to what the authors want to convey as the main message of the paper.

      We have modified the discussion to address this issue.

      Reviewer #3 (Recommendations For The Authors):

      ‐ The authors may want to refer to "intranasal instillation" to distinguish it from inhalation of an aerosolised liquid. How was the dose of the spike protein selected? There is some dose information in different settings, but usually between 0.1‐1 µg/ml or 0.1 µg‐5 µg range for in vivo injection, but the rationale for these ranges should be discussed. Is this mimicking a real situation during infections or a condition that might be used for vaccines?

      While inhalation of aerosolized liquid closely mimics the natural route of human exposure to respiratory infectious materials, intranasal instillation with a liquid inoculum remains a widely accepted standard approach for virus or vaccine inoculation across various laboratory species. To clearly define our mouse model, we are changing the term 'inhalation' to 'instillation'. We previously answered Reviewer #2 as follows: To visualize the acquisition of spike protein and track cells that have acquired the spike protein, we conducted a series of tests and optimizations using different amounts of Alexa Fluor 488 labeled spike protein, ranging from 0.5 to 5 µg. During the processing of lung tissue for microscopic imaging, it was of utmost importance to preserve the integrity of the labeled spike protein on the tissue samples. Through our investigations, we determined that an instillation of 3 µg of Alexa Fluor 488 labeled spike protein yielded the best signal strength across the lung sections. Notably, in many mouse models employing intranasal instillation protocols for SARS-CoV-2 spike protein or RBD domain-only recombinant proteins, a dosage of approximately 3 µg or higher was commonly used. Hence, based on these references and our preliminary studies, we selected 3 µg as the optimal amount of instilled spike protein per mouse.

      ‐ Controls are not evenly applied. In some cases, the control for the large and complex SARS‐CoV2 spike trimer is PBS. This seems insufficient to control against effects of injecting such complex proteins that can undergo significant conformational changes after uptake by a cell. In some cases, human coronavirus spike proteins from different viruses are used, but not much is said about these proteins and the different glycoforms are not explored. Are these prepared in the same way and do they have similar glycoforms? For example, if the Siglecs bind sialic acid on N‐linked glycans, then why do the purified Siglecs or Siglecs expressed in cells not bind the HKU‐1 spike, which would have such sialic acids if expressed in the same way as the CoV2 spike?

      We carefully considered the selection of an appropriate control material for these experiments. Initially, we opted to employ saline or PBS for intranasal instillation as a vehicle control, a choice aligned with the approach taken in numerous previous studies involving lung inflammation mouse models. However, as the reviewer pointed out, we share the concern for achieving more meaningful and comparable control materials, particularly considering the size and complexity of the recombinant protein. In accordance with this perspective, we introduced glycan-modified spike proteins and the HCoV-HKU1 S1 subunit. Figure 3 illustrates our comprehensive evaluation of various spike proteins in terms of their impact on neutrophil recruitment. The diversity of sialic acid structures observed on recombinant proteins expressed within the same cell emerges from the intricate interplay of multiple factors within the cellular glycosylation machinery. This complex enzymatic process empowers cells to finely modulate glycan structures and sialic acid patterns, tailoring them to suit the diverse biological functions of distinct proteins. Despite structural similarities between the HCoV-HKU1 and SARS-CoV-2 spike proteins, their glycan modifications vary, thereby leading to distinct binding properties with various Siglec subtypes. All recombinant proteins used in this study except for the S1 subunits were generated within our laboratory. These include the wild-type spike protein, the D614G spike protein, the Kifunensine-treated high mannose spike proteins, and the PNGase F-treated deglycosylated spike proteins. All the proteins were produced with the same protocol in CHO cells or, on occasion, HEK293F cells. We have indicated in the manuscript where we used HEK293F cells for protein production; otherwise, the proteins were produced in CHO cells.

      ‐ Figure 1 F‐I, there should be a control for VLP without SARS‐CoV2 spike as the VLP will contain other components that may be active in the system.

      We tested the delta Env VLP for alveolar macrophage acquisition and neutrophil recruitment. We found a similar alveolar macrophage acquisition of the VLPs, but significantly less neutrophil recruitment compared to the free Spike protein. Since the uptake pattern with the VLPs matched that of the spike protein, we did not consider adding a non-spike bearing VLP as a control. The rapid clearance of the VLPs into the lymphatics shortly after instillation may account for the reduced neutrophil recruitment following their instillation (Figure 1 figure supplement 2B, C).

      ‐ In Figure 1H, what do they mean by autofluorescence? Is this the cyan signal?

      Is the green signal also autofluorescence as this is identified as the VLP?

      We appreciate the reviewer pointing out the typo regarding autofluorescence in the figure image. To provide clarity regarding the background in all lung section images, we have included additional supplemental data. During the fixation process of lung tissue, various endogenous elements in the tissue sample contribute to autofluorescence when exposed to lasers in the confocal microscope. Specifically, collagen and elastin present in the lung vasculature, including airways and blood vessels, are dominant structures that generate autofluorescence. To address this issue, we have implemented optimizations to distinguish between real signals and the noise caused by autofluorescence. We inadvertently failed to indicate the source of the strong cyan signal. The signal is due to Evans Blue dye delineating lung airway structures, which contain collagen and elastin, both known binding materials for Evans Blue dye. This explains the strong fluorescence signals observed in the airways. We conjugated the recombinant spike protein with Alexa Fluor 488, and virus-like particles (VLPs) were visualized with gag-GFP (Figure 1 figure supplement 2A, D).

      ‐ The control for SARS‐CoV2 spike trimer is PBS, but how can the authors distinguish patterns specific to the spike trimer from any other protein delivered by intranasal instillation. Could they use another channel with a control glycoprotein to determine if there is anything unique about the pattern for spike trimer?

      Alveolar macrophages employ numerous receptors to capture glycoproteins that have mannose, N-acetylglucosamine, or glucose exposed. Galactose-terminal glycoproteins are typically not bound. We do not think that the Spike protein is unique in its propensity to target alveolar macrophages.

      ‐ What is the parameter measured in Figure S2B?

      The percentage of the different cell types that have retained the instilled Spike protein at the three-hour time point.

      ‐ The Spike trimer with high mannose oligosaccharides may gain binding to the mannose receptor. It may be helpful to state the distribution of this receptor and comment on whether it could be responsible for this having the largest effect size for some cell types.

      We agree that the spike trimer with high mannose should target cells bearing the mannose receptor. We have modified the discussion to address this point and have mentioned some of the cell types likely to bind the high mannose bearing spike protein.

      ‐ A key experiment is the Evans Blue measure of lung injury in Figure 3A. A control with the HKU‐1 spike is also performed, but more details on the matching of this protein's production to the SARS‐CoV2 spike trimer and the quantification of these comparative results should be provided. To show that the SARS‐CoV2 spike trimer can cause tissue injury on its own seems like a very important result, but the impact is currently reduced by the inconsistent application of controls and quantification of key results. Furthermore, if these results can be repeated in the B6 and B6 K18‐hACE2 mouse model it might further increase the impact by demonstrating whether or not hACE2 contributes to this effect.

      We repeated the lung permeability assay using the S1 subunit from the original SARS-CoV-2 and the S1 subunit from HCoV-HKU1. Both proteins were made by the same company using a similar expression system and purification protocol. Consistent with our original data, the instillation of the SARS-CoV-2 S1 subunit led to an increase in lung vasculature permeability, whereas the HCoV-HKU1 S1 subunit had a minimal impact (Figure 3 figure supplement 1A). This experiment suggests that it is the S1 subunit that leads to the increase in vascular permeability. To address the contribution of hACE2 to this phenomenon, we conducted a lung permeability assay using K18-hACE2 transgenic mice. The K18-hACE2 transgenic mice exhibited a slight increase in lung vasculature permeability upon SARS-CoV-2 trimer instillation compared to the non-transgenic mice. This suggests that the hACE2-Spike protein interaction may contribute to an increase in lung vascular permeability during SARS-CoV-2 lung infection (Figure 3 figure supplement 1B).

      ‐ For Figure 4A, could they provide quantification. The neutrophil extravasation with Trimer appears quite robust, but the authors seem to down‐play this and it's not clear without quantification.

      To address this issue, we analyzed and graphed the neutrophil numbers in each image. Injection of the trimer along with IL-1β significantly increased neutrophil infiltration (Figure 4 figure supplement 1).

      ‐ In Figure 4B, there are no neutrophils at all in the BSA condition. Is this correct? Intravascular neutrophils were detected with PBS injection in Figure 4A.

      We demonstrated that the neutrophil behaviors occur within the infiltrated tissue rather than within the blood vessels. Even when examining the blood vessels in all other images, it is challenging to identify neutrophils adhering to the endothelium of the blood vessels. Neutrophils observed in the PBS 3-hour control group are likely acute responders to the local injection, as a smaller number of neutrophils were observed in the 6-hour image.

      ‐ In Figure 5A the observation of neutrophil response in lung slices seems to be presented as an anecdotal account. The neutrophil appears to polarize, but is this a consistent observation? How many such observations were made?

      We have consistent observations across three different experiments. In addition, highly polarized and fragmented neutrophils were consistently observed in the fixed lung section images.

      ‐ The statement: "human Siglec‐5 and Siglec‐8 bound poorly despite being the structural and functional equivalents of Siglec F, respectively (37)". How can one Siglec be the structural and the other the functional equivalent of Siglec‐F? It might help to provide a little more detail as to how these should be seen.

      Mouse Siglec-F has two distinct counterparts in the human Siglec system, both in terms of structure and function. In the context of domain structure, human Siglec-5 serves as the counterpart to mouse Siglec-F. However, it's important to note that while human Siglec-8 is not a genetic ortholog of mouse Siglec-F, it is expressed on similar cellular populations and functions as a functional paralog.

      ‐ The assays using purified proteins and proteins expressed in cells don't fully agree. For example, it's very surprising that recombinant Siglec 5 and 8 bind better to the non‐glycosylated form than to the glycosylated trimer. It appears from Figure S1 that the PNGaseF treated Spike contains at least partly glycosylated monomers and it also appears that the Kifunensine effect may be partial. PNGaseF may have a hard time removing some glycans from a native protein.

      We were also surprised by the results using the PNGase F treated Spike protein, in that it lost binding to Siglec-F but retained binding to human Siglec-5 and 8 in the bead assay, shown in Figure 7A. As explained above, we used a purified fraction of the PNGase F treated protein that retained some functional activity as assessed in the ACE2 binding assay and in Biacore assays (not shown). The persistent binding of Siglec-5 and Siglec-8 suggests that removal of some of the complex glycans had revealed sites capable of binding Siglec-5 and 8. We agree with the reviewer that the PNGase treatment we used only removed some of the glycans from the native protein. In data not shown, the high mannose spike protein behaved as predicted in Biacore assays, binding better to DC-SIGN and maltose binding lectin, but less well to PHA and less well to ACE2. The high mannose trimer also bound less to the HEK293 cells expressing ACE2, Siglec-5, or Siglec-8, as well as to peripheral blood leukocytes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings that examine both how Down syndrome (DS)-related physiological, behavioral, and phenotypic traits track across time, as well as how chronic treatment with green tea extracts enriched in epigallocatechin-3-gallate (GTE-EGCG), administered in drinking water spanning prenatal through 5 months of age, impacts these measures in wild-type and Ts65Dn mice. However, the strength of the evidence is incomplete, due to high variability across measures, perhaps attributable to a failure to include sex as a factor for measures known to be sexually dimorphic. This study is of interest to scientists interested in Down syndrome and its treatment, as well as scientists who study disorders that impact multiple organ systems.

      Public Reviews:

      Using Ts65Dn - the most commonly used mouse model of Down syndrome (DS) - the goal of this study is two-pronged: 1) to conduct a thorough assessment of DS-related genotypic, physiological, behavioral, and phenotypic measures in a longitudinal manner; and 2) to measure the effects of chronic GTE-EGCG on these measures in the Ts65Dn mouse model. Corroborating results from several previous studies on Ts65Dn mice, the findings of this study confirm that the Ts65Dn mouse model exhibits the suite of traits associated with DS. The findings also suggest that the mouse model might have experienced drift, given the milder phenotypes than those reported by earlier studies. Results of the GTE-EGCG treatment do not support its therapeutic use and instead show that the treatment exacerbated certain DS-related phenotypes.

      Strengths:

      The authors performed a rigorous assessment of treatment and examined treatment and genotypic alterations at multiple time points during growth and aging. Detailed analysis shows differences in genotype during aging as well as genotype with treatment. This study is solid in the overarching methodological approach (with the exception of RNAseq, described below). The biggest strength of the study is its approach and dataset, which corroborate results from a multitude of past studies on Ts65Dn mice, albeit on adult specimens. It would be beneficial for the dataset to be made available to other researchers using a public data repository.

      We deeply appreciate the reviewers' positive feedback. Their acknowledgment of the solid methodological approach and the rigorous assessment of genotypic and treatment effects over various developmental stages resonates with our motivation. Their suggestion to make the dataset available in a public data repository for other researchers is well-taken. We are committed to data sharing and we are creating a dedicated platform to facilitate the accessibility of our research data to the scientific community. Given its size and complexity, we currently hold the dataset available upon reasonable request to the corresponding authors.

      Weaknesses:

      There are several primary weaknesses, described below:

      Sex was not considered in the analyses.

      The number of experimental animals of each sex is not clearly represented in the paper, but is buried in supplemental tables, and the Ns for the RNAseq are unclear. No analyses were done to examine sex differences in male/female DS or WT animals with or without treatment. Body measurements will greatly vary by sex, but this was not taken into consideration during assessments. As such, there is a high amount of variability within each cohort measured for body assessments (tibia, body weight, skeletal development etc.). Supplemental table 14 had the list of each animal, but not collated by sex, genotype or treatment, making it difficult to assess the strength of each measurement.

      Our study primarily concentrated on providing a holistic understanding of the impact of trisomy and GTE-EGCG treatment on Down syndrome, and was not explicitly designed to investigate sexual dimorphism. However, instead of reporting on only one sex and thereby obviating sex as a source of variation, as in previously published studies, we decided to include both male and female mice within the study design to represent a more realistic portrayal of the nature of Down syndrome in a heterogeneous population. By encompassing both sexes, we aim to better capture the variability in Down syndrome.

      As we do acknowledge the significance of sex bias in scientific research, we considered performing post-hoc analyses to test the effect of sexual dimorphism, but found that our dataset was underpowered to obtain reliable results, since our experiments were not designed a priori to investigate this question and the sample sizes for each sex separately were not large enough. Nevertheless, considering the reviewer’s comment, we have taken specific steps to improve the representation of sex-related information and to enhance the clarity of our manuscript.

      First, we have redesigned all figures using empty and full symbols to distinguish male from female mice within each analysis, providing readers with an immediate sense of the sex distribution in each experimental group. Moreover, we have modified Supplementary Table 1 to offer a comprehensive breakdown of the number of male and female mice for each test, along with their respective genotypes and treatment groups. This table aims to make the sample size and sex distribution within our study as transparent as possible for our readers. While we acknowledge that our study lacked the statistical power to perform a detailed sex-based analysis, the visual representation of sex in our data shows which systems are mainly affected by sexual dimorphism. This evidence can guide future investigations directly designed to investigate sex effects in certain systems or structures.

      Key results are not clearly depicted in the main figures

      Rigorous assessment of each figure and the clarity of the figure to convey the results of the analysis needs to be performed. Many of the figures do not clearly represent the findings, with authors heavily relying on supplemental figures to present details to explain results. Figure legends do not adequately describe figures; rather, they are limited to describing how the analysis is performed. For example, LDA plots in Figure 4 do not clearly convey the results of metabolite analysis.

      Overall, the amount of data presented here is overwhelming, making it difficult to interpret the findings. Some assessments that do not add to the overall paper need to be removed. Clarifying the text, figures and trimming the supplement to represent the data in a manner that is easily understood will improve the readability of the paper. For example, perhaps measures which are not strongly impacted by genotype could be moved to the supplement, because they are not directly relevant to the question of whether GTE-EGCG reverses the impact of trisomy on the measures.

      As rightly pointed out by the reviewers, the vast amount of data generated by our holistic and longitudinal approach is one of the primary strengths, but also an important challenge in our study. Our dataset encompasses a comprehensive assessment of the effects of treatment and genotypic alterations at multiple time points during growth and aging. This multi-dimensional evaluation is pivotal to our research, and relegating data to supplementary material would restrict access to this holistic understanding. Our aim is to provide readers with a complete view of the complex interactions we have explored, and retaining this data in the main text is essential to uphold the integrity of our work.

      Indeed, we specifically chose to submit our manuscript to eLife because this journal allows readers to access supplemental information directly from the text and figures in the main manuscript, which best aligned with our approach to data presentation. The eLife format permits us to offer readers a quick and informative overview of all the data within the main figures by employing multivariate techniques such as Linear Discriminant Analysis or Principal Component Analysis. Subsequently, we supply more detailed analyses in the supplementary figures for readers who wish to delve deeper into specific aspects. Furthermore, while certain figures may be categorized as supplementary, we would like to emphasize that every result is comprehensively described in the main text.

      Acknowledging the concerns raised about the density of our paper and the potential challenges in interpreting the findings, we have conducted a thorough review of the text and figure legends. We have made revisions with the goal to enhance clarity and readability. We have made dedicated efforts to ensure that readers can readily grasp the significance of our results and appreciate the intricacies of our findings. We firmly believe that with these revisions, our chosen approach is the most effective means of presenting the richness of our data and maintaining the integrity of our findings.

      Lack of clarity in the behavioral analyses

      Behavioral assessments are not clearly written in the methods. For example, for the novel object recognition task, it isn't clear how preference was calculated. Is this simply the percent of time spent with the novel object, or is this a relative measure (novel:familiar ratio)? This matters because if it is simply the percent of time, the relevant measure is to compare each group to 50% (the absence of a preference). The key measures for each test need to be readily distinguished from the control measures.

      There are also many dependent behavioral measures. For example, speed and distance are directly related to each other, but these are typically reported as control measures to help interpret the key measure, which is the anxiety-like behavior. Similarly, some behavioral tests were used to represent multiple behavioral dimensions, such as anxiety and arousal. In general, the measurements of arousal seem atypical (speed and distance are typically reported as control measures, not measures of arousal). Similarly, measures of latency during training would not typically be used as a measure of long-term memory but instead reported as a control measure to show learning occurred. LDA analysis requires independence of the measures, as well as normality. It does not appear that all of the measures fed into this analysis would have met these assumptions, but the methods also do not clearly describe which measures were actually used in the LDA.

      We agree with the reviewers’ concerns about the clarity of our behavioral analyses and we have thus added information to the methods section to clarify the procedures. Specifically, for SPSN, social approach was recorded as time spent close to STR1, and a preference ratio was calculated as Pref = 100 × (Time close to STR1) / (Time close to STR1 + Time close to Empty). Social recognition memory was scored as preference towards STR2 and calculated as Pref = 100 × (Time close to STR2) / (Time close to STR1 + Time close to STR2). For NOR, preference for the novel object was calculated as Pref = 100 × (Time novel object) / (Time familiar object + Time novel object).
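
      As an illustration only, the three preference ratios share one form and can be sketched as a single function. The function name and the example times below are hypothetical and are not taken from our analysis scripts; a value of 50 corresponds to no preference.

```python
def preference_index(target_time: float, other_time: float) -> float:
    """Return the percentage of exploration time spent close to the target.

    target_time: seconds close to the target (STR1, STR2, or the novel object).
    other_time:  seconds close to the alternative (empty cage, STR1, or the
                 familiar object).
    """
    total = target_time + other_time
    if total <= 0:
        # No exploration at all: the ratio is undefined rather than 0 or 50.
        raise ValueError("total exploration time must be positive")
    return 100.0 * target_time / total


# Hypothetical example times in seconds (not real data).
sociability = preference_index(target_time=60.0, other_time=40.0)  # STR1 vs Empty
no_preference = preference_index(target_time=30.0, other_time=30.0)
print(sociability, no_preference)
```

      Because the index is a percentage of total exploration time, each group can be compared against the 50% level that marks the absence of a preference.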

      With regard to the different variables reported for the behavioral protocols, we agree that some measures, such as path length and speed, can be used as control measures. For example, in an open field test, path length is an important control measure to assess whether an animal is engaged in the task. However, if an animal is actively moving, the distance covered can, but does not have to, correlate with the amount of time that a mouse spends in the center of the open field. Using distance covered as a measure of general arousal and time spent in the center as a measure of anxiety-related behavior allows a more nuanced evaluation of animal behavior. For instance, two animals spending similar amounts of time in the center may exhibit differences in the distance they cover. In this scenario, we would argue that anxiety-related behavior (defined as exploring the center of an open field) would not capture the behavioral difference between the two animals, while arousal clearly is a differentiating factor.

      Regarding the PA task and the use of latency during training, we agree that typically latency during training can be used as control measure to show that learning occurred. However, our study involved testing animals at two distinct time points. Contextual fear conditioning creates very robust memory traces that can persist for weeks or even months, and therefore the starting premise is very different when repeating the test. Initially, the animals were experimentally naïve and had not yet experienced a foot shock, leading to a rapid entry into the dark box. However, after experiencing the first CS-US presentation, a robust and persistent contextual fear memory trace is formed. Therefore, the latency observed in the second training phase of the PA reflects in essence long-term contextual fear memory, that is robustly displayed in WT animals but less in treated WT and TS animals. We have included this clarification in the methods and results sections.

      Finally, we want to thank the reviewer for noticing the error in the LDAs, as the analysis was indeed performed including dependent variables for some systems. We have re-evaluated the LDAs for the behavioral tests and tibia microarchitecture tests, excluding dependent variables. As a result, the text and significance levels have been adjusted accordingly. To enhance transparency and clarity, we have included Supplementary Table S21, which precisely outlines the variables included in each LDA.
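
      To illustrate what excluding dependent variables can look like in practice, the following is a minimal sketch that drops any variable that is almost perfectly correlated with one already retained. The correlation rule, the 0.9 threshold, the variable names, and the numbers are all assumptions made for this example; the variables actually retained in each LDA are those listed in Supplementary Table S21.

```python
import numpy as np


def drop_dependent(X, names, threshold=0.9):
    """Keep a column only if |r| with every previously kept column is < threshold.

    X:     2-D array, rows are animals and columns are measures.
    names: list of column names, in the same order as the columns of X.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


# Hypothetical open field data for six animals (not real measurements).
distance_cm = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
speed_cm_s = distance_cm / 600.0  # speed is distance over a fixed session length
center_time_s = np.array([5.0, 1.0, 4.0, 2.0, 6.0, 3.0])

X = np.column_stack([distance_cm, speed_cm_s, center_time_s])
names = ["distance_cm", "speed_cm_s", "center_time_s"]

X_kept, kept_names = drop_dependent(X, names)
print(kept_names)
```

      In this toy example speed is an exact rescaling of distance, so only one of the two is passed on, whereas time in the center is retained because it is only weakly correlated with distance.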

      Unclear value of RNAseq

      RNAseq was performed in cerebellum, a relatively spared region in DS pathology at an early time point in disease. Further, the expression of 125 genes triplicated in DS was shown in a PCA plot to highly overlap with WT, indicating that there are minimal differences in gene expression in these genes. If these genes are not critical for cerebellar function, perhaps this could account for the lack of differences between WT and Ts65Dn mice. If the authors are interested in performing RNAseq, it would have made more sense to perform this in hippocampus (to compare with metabolites) and to perform more stringent bioinformatic analysis than assessment by PCA of a limited subset of genes. Supplementary Table S14, which shows the differentially expressed genes, appears to be missing from the manuscript and cannot be evaluated. Additionally, the methods of the RNAseq are not sufficiently described and lack critical details. For example, what was the normalization performed, and which groups were compared to identify differentially expressed genes? It would also be worthwhile to describe how animals were identified for RNAseq-were those animals representative of their groups across other measures?

      We acknowledge the reviewers' comments on the RNAseq analysis and would like to provide additional insights into our rationale and choices for this analysis. The primary aim of our RNAseq analysis was to offer supplementary evidence in support of the broader context of our paper. Rather than focusing on specific genes, our aim was to assess potential alterations in transcription within genes triplicated in the mouse model and explore differentially expressed genes across the entire genome. Therefore, we conducted a global analysis of the triplicated genes using a PCA and analyzed the differentially expressed genes across the entire genome, as shown in Supplementary Table S14. The table was originally included as a separate Excel file, but apparently it was not received by the reviewers. We have contacted the eLife editorial office to ensure its inclusion in the current version. Furthermore, we have modified the text to clarify that both the triplicated genes and the entire genome were analyzed.
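
      For readers less familiar with the approach, a PCA on a gene subset can be sketched as follows. This is a generic illustration using a singular value decomposition on simulated values, not our RNAseq pipeline; the matrix dimensions (8 samples by 125 genes), the random data, and the function name are assumptions chosen only to make the example self-contained.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Project samples onto the leading principal components.

    expr: 2-D array, rows are samples and columns are genes
          (for example, normalized log-expression values).
    Returns the sample scores and the fraction of variance explained
    by each retained component.
    """
    centered = expr - expr.mean(axis=0)  # center each gene across samples
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Simulated expression values for 8 samples and 125 genes (not real data).
rng = np.random.default_rng(0)
expr = rng.normal(size=(8, 125))

scores, explained = pca_scores(expr)
print(scores.shape, explained)
```

      Plotting the two score columns against each other, with points labeled by genotype and treatment, gives the kind of overview in which overlapping groups indicate small differences in the selected genes.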

      Regarding the use of cerebellum instead of hippocampus, we agree with the reviewers that the hippocampus is a major tissue of interest in the study of Down syndrome since it mostly relates to cognition. Trisomic patients, however, also display other typical features, such as a delay in the acquisition of motor skills. Here we decided to focus on the cerebellum as it is primarily associated with the locomotor system but also plays a role in other cognitive functions such as language processing and memory. Furthermore, at the time of the RNAseq analysis, the mice were 8 months old, equivalent to the adult human stage, and previous studies have shown transcriptomic alterations in this tissue and mouse model (Olmos-Serrano et al., 2016; Saran et al., 2003).

      The lack of observable differences between WT and Ts65Dn mice in our PCA analysis may be attributed to several factors as discussed in our article. First, the high variability within each group, inherent to the complexity of DS, may obscure inter-group differences. Additionally, the subtlety of gene expression differences between WT and trisomic mice in the set of triplicated genes, as suggested by other transcriptomics studies on DS (Aït Yahya-Graison et al., 2007; Lyle et al., 2004; Olmos-Serrano et al., 2016; Saran et al., 2003), may contribute to the limited distinctions observed. Furthermore, regarding treatment effects, the timing of the RNAseq analysis should be considered since it was conducted at the endpoint, three months after treatment cessation. This temporal aspect could imply that the effects of the drug are not persistent, and a molecular memory might not be formed and maintained.

      Nevertheless, we appreciate the reviewers' constructive comments and acknowledge the potential for more stringent bioinformatic analyses. While our intention was to provide an initial, global perspective, we are eager to support further investigations that delve deeper into the complexities of DS-related molecular mechanisms. Consequently, the dataset is available for other researchers to explore more specific questions upon request.

      Finally, we have updated the methods section of the article to offer more detailed information on RNAseq processing and analysis. We have also clarified that all the surviving mice were included in the analysis.

      Recommendations for the authors:

      (1) Please add power calculations for each of the assessments.

      We would like to clarify that we had already conducted power calculations as part of the initial planning and design phase of our study. After data acquisition and analysis, we have utilized appropriate statistical methods to interpret the results based on the data we have collected. Given that we had conducted a priori power calculations prior to data collection and that our analysis is based on the acquired data, we do not see the added value in including post hoc power calculations. Our primary focus has been on performing the correct statistical analyses to accurately interpret the results and draw meaningful conclusions.
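
      For readers unfamiliar with the distinction, an a priori power calculation fixes the expected effect size, significance level, and target power before any data are collected, and solves for the sample size. The sketch below is only an illustration of that idea using the standard normal approximation for a two-sided comparison of two group means; it is not the calculation performed for this study. The function name `n_per_group` and all numeric inputs (effect size, alpha, power) are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate animals needed per group (illustrative assumption only).

    Normal approximation for a two-sided, two-sample comparison of means:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, with d the standardized
    effect size (Cohen's d). Exact t-based methods give slightly larger n.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes; smaller expected effects require more animals.
    for d in (0.5, 0.8, 1.2):
        print(f"d = {d}: {n_per_group(d)} per group")
```

      Because every input is chosen before data collection, rerunning the same formula with the observed effect size (a post hoc calculation) adds no information beyond the p-value already reported.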

      (2) Introduction has some excessive references for each statement, which are not necessary. For instance: lines 67-73 are only references for 1 statement and lines 74-76 are references for a 2nd statement in the same sentence.

      We have removed redundant references.

      (3) Introduction: Lines 136-146 Gene names need to be spelled out, not just the IDs. Were these studies done in human or mouse models of DS?

      We have spelled out the names of the genes.

      (4) Why was brain volume and brain structure size normalized to body weight, not clearly explained?

      The choice to normalize brain volume and brain structure size to body weight was a deliberate decision made to address potential confounding factors in our study. Trisomic (TS) mice are generally smaller than their wild-type (WT) counterparts. The same may hold true for sex-related size differences. Without normalization, assessing brain volume and structure size could be misleading, as it might reflect the differences in overall body size rather than providing insights into the specific aspects of brain structure that we aimed to investigate. We have clarified this in the methods section.

      (5) In cognitive tests, some of the WT data represented in Figure 3 does not match supplemental findings. Again power calculations may indicate a higher number of WT mice are needed to clarify this discrepancy.

      We appreciate the reviewers' observation regarding the disparities between the data presented in Figure 3 and the supplemental figures. We would like to clarify that these variations are a result of the distinct analytical approaches employed in the two sets of data.

      In Figure 3 and all main figures, the data were analyzed using multivariate tests, which consider multiple variables simultaneously and are particularly suited for investigating the collective impact of multiple factors. Conversely, the results shown in the supplementary figures were derived from univariate tests, which focus on individual variables and are well-suited for addressing specific questions related to each variable in isolation. The discrepancies between the data in the main figures and the supplementary figures can be attributed to the differences in the analytical methods chosen.

      As for the suggestion of conducting power calculations to address the observed differences, we believe that the differences in data are inherent to the distinct analytical strategies and the specific research questions each analysis intended to answer. Power calculations may not be the most suitable approach in this context, as they pertain to sample size planning for hypothesis testing and may not reconcile the inherent dissimilarity between multivariate and univariate analyses.

      Aït Yahya-Graison, E., Aubert, J., Dauphinot, L., Rivals, I., Prieur, M., Golfier, G., . . . Potier, M. C. (2007). Classification of human chromosome 21 gene-expression variations in Down syndrome: impact on disease phenotypes. Am J Hum Genet, 81(3), 475-491. https://doi.org/10.1086/520000

      Lyle, R., Gehrig, C., Neergaard-Henrichsen, C., Deutsch, S., & Antonarakis, S. E. (2004). Gene expression from the aneuploid chromosome in a trisomy mouse model of down syndrome. Genome Res, 14(7), 1268-1274. https://doi.org/10.1101/gr.2090904

      Olmos-Serrano, J. L., Kang, H. J., Tyler, W. A., Silbereis, J. C., Cheng, F., Zhu, Y., . . . Sestan, N. (2016). Down Syndrome Developmental Brain Transcriptome Reveals Defective Oligodendrocyte Differentiation and Myelination. Neuron, 89(6), 1208-1222. https://doi.org/10.1016/j.neuron.2016.01.042

      Saran, N. G., Pletcher, M. T., Natale, J. E., Cheng, Y., & Reeves, R. H. (2003). Global disruption of the cerebellar transcriptome in a Down syndrome mouse model. Hum Mol Genet, 12(16), 2013-2019. https://doi.org/10.1093/hmg/ddg217

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      In this study, Yan et al. investigate the molecular bases underlying mating type recognition in Tetrahymena thermophila. This model protist possesses a total of 7 mating types/sexes and mating occurs only between individuals expressing different mating types. The authors aimed to characterize the function of mating type proteins (MTA and MTB) in the process of self- and non-self recognition, using a combination of elegant phenotypic assays, protein studies, and imaging. They showed that the presence of MTA and MTB in the same cell is required for the expression of concanavalin-A receptors and for tip transformation - two processes that are characteristic of the costimulation phase that precedes cell fusion. Using protein studies, the authors identify a set of additional proteins of varied functions that interact with MTA and MTB and are likely responsible for the downstream signaling processes required for mating. This is a description of a fascinating self- and non-self-recognition system and, as the authors point out, it is a rare example of a system with numerous mating types/sexes. This work opens the door for the further understanding of the molecular bases and evolution of these complex recognition systems within and outside protists.

      The results shown in this study point to the unequivocal requirement of MTA and MTB proteins for mating. Nevertheless, some of the conclusions regarding the mode of functioning of these proteins are not fully supported and require additional investigation.

      Strengths:

      (1) The authors have established a set of very useful knock-out and reporter lines for MT proteins and extensively used them in sophisticated and well-designed phenotypic assays that allowed them to test the role of these proteins in vivo.

      (2) Despite their apparent low abundance, the authors took advantage of a varied set of protein isolation and characterization techniques to pinpoint the localization of MT proteins to the cell membrane, and their interaction with multiple other proteins that could be downstream effectors. This opens the door for the future characterization of these proteins and further elucidation of the mating type recognition cascade.

      Weaknesses:

      The manuscript is structured and written in a very clear and easy-to-follow manner. However, several conclusions and discussion points fall short of highlighting possible models and mechanisms through which MT proteins control mating type recognition:

      (1) The authors dismiss the possibility of a "simple receptor-ligand system", even though the data does not exclude this possibility. The model presented in Figure 2 S1, and on which the authors based their hypothesis, assumes the independence of MTA and MTB proteins in the generation of the intracellular cascade. However, the results presented in Figure 2 show that both proteins are required to be active in the same cell. Coupled with the fact that MTA and MTB proteins interact, this is compatible with a model where MTA would be a ligand and MTB a receptor (or vice-versa), and could thus form a receptor-ligand complex that could potentially be activated by a non-cognate MTA-MTB receptor-ligand complex, leading to an intracellular cascade mediated by the identified MRC proteins. As it stands, it is not clear what is the proposed working model, and it would be very beneficial for the reader for this to be clarified by having the point of view of the authors on this or other types of models.

      We are very grateful that Reviewer #1 proposed the possibility that MTA and MTB form a receptor-ligand complex in which one acts as the ligand and the other as the receptor. We also considered this hypothesis when asking how the MTRC functions. However, our current results do not support this idea. For instance, if MTA were a ligand and MTB a receptor, we would expect a mating signal upon treatment with MTAxc protein, but not with MTBxc. Contrary to this expectation, our experiments revealed that both MTAxc and MTBxc exhibit very similar effects (Figure 5, green and blue), and their combined treatment produces a stronger effect (Figure 5, teal). This suggests a mixed function for both proteins. (We incorporated this discussion into the revised version [line 120-121, 240-244].) It is a pity that our current knowledge does not provide a detailed molecular mechanism for this intricate system. We are actively investigating the protein structures of MTA, MTB, and the entire MTRC, hoping to gain deeper insights into the molecular functions of MTA and MTB.

      Additionally, we realized that the expression we used in the previous version, “simple receptor-ligand model”, was not clearly defined. As Reviewer #1 pointed out, in this section we examined whether the individual MTA and MTB proteins act as a receptor-ligand pair. We think this is the simplest possibility and therefore a suitable null hypothesis for Tetrahymena mating-type recognition. We have clarified it in the revised version (line 90-91, 104-106). Based on the results in this section, we proposed that MTA and MTB may form a complex that serves as a recognizer (functioning as both ligand and receptor) (line 117-118).

      (2) The presence of MTA/MTB proteins is required for costimulation (Figure 2), and supplementation with non-cognate extracellular fragments of these proteins (MTAxc, or MTBxc) is a positive stimulator of pairing. However, alone, these fragments do not have the ability to induce costimulation (Figure 5). Based on the results in Figures 5 and 6 the authors suggest that MT proteins mediate both self and non-self recognition. Why do MTAxc and MTBxc not induce costimulation alone? Are any other components required? How to reconcile this with the results of Figure 2? A more in-depth interpretation of these results would be very helpful, since these questions remain unanswered, making it difficult for the reader to extract a clear hypothesis on how MT proteins mediate self- and non-self-recognition.

      Several factors could contribute to the inability of MTA/Bxc to induce costimulation. It is highly likely that additional components are necessary, given that MTA/B form a protein complex with other proteins. Moreover, the expression of MTA/Bxc in insect cells, compared with Tetrahymena, might result in differences in post-translational modifications. Additionally, there are variations in protein conditions; on the Tetrahymena membrane, these proteins are arranged regularly and concentrated in a small area, while MTA/Bxc is randomly dispersed in the medium. The former condition could be more efficient. If there is a threshold required to stimulate a costimulation marker, MTA/Bxc may fail to meet this requirement. Many more studies are needed to fully answer this question. We acknowledged this limitation in the revised version (line 244-248).

      Reviewer #2:

      This manuscript reports the discovery and analysis of a large protein complex that controls mating type and sexual reproduction of the model ciliate Tetrahymena thermophila. In contrast to many organisms that have two mating types or two sexes, Tetrahymena is multi-sexual with 7 distinct mating types. Previous studies identified the mating type locus, which encodes two transmembrane proteins called MTA and MTB that determine the specificity of mating type interactions. In this study, mutants are generated in the MTA and MTB genes and mutant isolates are studied for mating properties. Cells missing either MTA or MTB failed to co-stimulate wild-type cells of different mating types. Moreover, a mixture of mutants lacking MTA or MTB also failed to stimulate. These observations support the conclusion that MTA and MTB may form a complex that directs mating-type identity. To address this, the proteins were epitope-tagged and subjected to IP-MS analysis. This revealed that MTA and MTB are in a physical complex, and also revealed a series of 6 other proteins (MRC1-6) that together with MTA/B form the mating type recognition complex (MTRC). All 8 proteins feature predicted transmembrane domains, three feature GFR domains, and two are predicted to function as calcium transporters. The authors went on to demonstrate that components of the MTRC are localized on the cell surface but not in the cilia. They also presented findings that support the conclusion that the mating type-specific region of the MTA and MTB genes can influence both self- and non-self-recognition in mating.

      Taken together, the findings presented are interesting and extend our understanding of how organisms with more than two mating types/sexes may be specified. The identification of the six-protein MRC complex is quite intriguing. It would seem important that the function of at least one of these subunits be analyzed by gene deletion and phenotyping, similar to the findings presented here for the MTA and MTB mutants. A straightforward prediction might be that a deletion of any subunit of the MRC complex would result in a sterile phenotype. The manuscript was very well written and a pleasure to read.

      Thanks for the valuable comments and suggestions. We are currently in the process of constructing deletion strains for these genes. As of now, we have successfully obtained ΔMRC1-3 and MRC4-6 knockdown strains. Our preliminary observations indicate that ΔMRC1-3 strains are unable to undergo mating. However, we prefer not to include these results in the current manuscript, as we believe that more comprehensive studies are still needed.

      Reviewer #3:

      The authors describe the role, location, and function of the MTA and MTB mating type genes in the multi-mating-type species T. thermophila. The ciliate is an important group of organisms to study the evolution of mating types, as it is one of the few groups in which more than two mating types evolved independently. In the study, the authors use deletion strains of the species to show that both mating-type genes located in each allele are required in both mating individuals for successful matings to occur. They show that the proteins are localized in the cell membrane, not the cilia, and that they interact in a complex (MTRC) with a set of 6 associated (non-mating type-allelic) genes. This complex is furthermore likely to interact with a cyclin-dependent kinase complex. It is intriguing that T. thermophila has two genes that are allelic and that are both required for successful mating. This coevolved double recognition has to my knowledge not been described for any other mating-type recognition system. I am not familiar with experimental research on ciliates, but as far as I can judge, the experiments appear well performed and mostly support the interpretation of the authors with appropriate controls and statistical analyses.

      The results show clearly that the mating type genes regulate non-self-recognition, however, I am not convinced that self-recognition occurs leading to the suppression of mating. An alternative explanation could be that the MTA and MTB proteins form a complex and that the two extracellular regions together interact with the MTA+MTB proteins from different mating types. This alternative hypothesis fits with the coevolution of MTA and MTB genes observed in the phylogenetic subgroups as described by Yan et al. (2021 iScience). Adding MTAxc and/or MTBxc to the cells can lead to the occupation of the external parts of the full proteins thereby inhibiting the formation of the complex, which in turn reduces non-self interactions. Self-recognition as explained in Figure 2S1 suggests an active response, which should be measurable in expression data for example. This is in my opinion not essential, but a claim of self-recognition through the MTA and MTB should not be made.

      We express our gratitude to Reviewer #3 for proposing the occupation model and have incorporated this possibility into the manuscript. We believe it is possible that occupation may serve as the molecular mechanism through which self-recognition negatively regulates mating. If there is a physical interaction between mating-type proteins of the same type, but this interaction blocks the recognition machinery rather than initiating mating, it can be considered a form of self-recognition. This aligns with the observation that strains expressing MTA/B6 and MTB2 mate normally with WT cells of all mating types except for VI and II (line 203-204). A concise discussion on this topic is included in the manuscript (line 288-293, 659-661). We are actively investigating the downstream aspects of mating-type recognition, and we hope to provide further insights into this question soon.

      The authors discuss that T. thermophila has special mating-type proteins that are large, while those of other groups are generally small (lines 157-160 and discussion). The complex formed is very large and in the discussion, they argue that this might be due to the "highly complex process, given that there are seven mating types in all". There is no argument given why large is more complex, if this is complex, and whether more mating types require more complexity. In basidiomycete fungi, many more mating types than 7 exist, and the homeodomain genes involved in mating types are relatively small but highly diverse (Luo et al. 1994 PMID: 7914671). The mating types associated with GPCR receptors in fungi are arguably larger, but again their function is not that complex, and mating-type specific variations appear to evolve easily (Fowler et al 2004 PMID: 14643262; Seike et al. 2015 PMID: 25831518). The large protein complex formed is reminiscent of the fusion patches that develop in budding or fission yeasts. In these species, the mating type receptors are activated by ligand pheromones from the opposite mating type that induce polarity patch formation (see Sieber et al. 2023 PMID: 35148940 for a recent review). At these patches, growth (shmooing) and fusion occur, which is reminiscent (in a different order) of the tip transformation in T. thermophila. The fusion of two cells is in all taxa a dangerous and complex event that requires the evolution of very strict regulation and the existence of a system like the MTRC and cyclin-dependent complex to regulate this process is therefore not unexpected. The existence of multiple mating types should not greatly complicate the process, as most of the machinery (except for the MTA and MTB) is identical among all mating types.

      We are very grateful that Reviewer #3 provided this insightful view and the relevant papers. In response to the feedback, we removed the sentences regarding “multiple mating types greatly complicate the process” in the revised version. Instead, we have introduced a discussion section comparing the mating systems of yeasts and Tetrahymena (line 279-286).

      The Tetrahymena/ciliate genetics and lifecycle could be better explained. For a general audience, the system is not easy to follow. For example, the ploidy of the somatic nucleus with regards to the mating type is not clear to me. The MAC is generally considered "polyploid", but how does this work for the mating type? I assume only a single copy of the mating type locus is available in the MAC to avoid self-recognition in the cells. Is it known how the diploid origin reduces to a single mating type? This does not become apparent from Cervantes et al. 2013.

      In T. thermophila, the MIC (diploid) contains several mating-type gene pairs (mtGP, i.e., MTA and MTB) organized in a tandem array at the mat locus on each chromosome. In sexual reproduction, the new MAC of the progeny develops from the fertilized MIC through a series of genome editing events, and its ploidy increases to ~90 by endoreduplication. During this process, mtGP loss occurs, resulting in only one mtGP remaining on the MAC chromosome. The mating-type specificity of mtGPs on each chromosome within one nucleus becomes relatively pure through intranuclear coordination. After multiple assortments (possibly caused by MAC amitosis during cell fission), only mtGPs of one mating-type specificity exist in each cell, determining the cell’s mating type.
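
      As an illustration of the final assortment step only, the sketch below shows how random partitioning of MAC copies at each amitotic division can, by chance alone, leave a cell lineage with a single mating-type specificity. It is a toy model under stated assumptions, not a description of the actual mechanism: it tracks only two specificities, assumes each daughter receives exactly the same number of copies, ignores selection and intranuclear coordination, and uses the ~90 copy number quoted above. The function name `generations_to_purity` and its parameters are hypothetical.

```python
import random


def generations_to_purity(copies=90, start_fraction=0.5, seed=0,
                          max_generations=100_000):
    """Follow one cell lineage until its MAC carries a single specificity.

    Toy assumptions: two specificities ("A" and "B"), every copy is
    replicated once per cell cycle, and one daughter receives a uniformly
    random half of the replicated copies (amitosis without selection).
    Returns (generations elapsed, number of "A" copies at the end), or
    (None, current "A" count) if purity is not reached within the cap.
    """
    rng = random.Random(seed)
    n_a = round(copies * start_fraction)  # copies carrying specificity "A"
    for generation in range(max_generations + 1):
        if n_a in (0, copies):            # lineage is pure for "B" or "A"
            return generation, n_a
        # Replicate every copy, then draw the tracked daughter's share at random.
        replicated = [1] * (2 * n_a) + [0] * (2 * (copies - n_a))
        n_a = sum(rng.sample(replicated, copies))
    return None, n_a


if __name__ == "__main__":
    for seed in range(3):
        generations, n_a = generations_to_purity(seed=seed)
        winner = "A" if n_a else "B"
        print(f"seed {seed}: pure for {winner} after {generations} divisions")
```

      In this toy model purity is reached purely by drift; which specificity wins, and how many divisions it takes, varies from lineage to lineage.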

      It is a pity that the exact mechanisms involved in this complicated process remain a black box. The loss of mating-type gene pairs is hypothesized to involve a series of homologous recombination events, but this has not been completely proven. Furthermore, there is no clear understanding of how intranuclear coordination and assortment are achieved. While we have made observations confirming these events, a breakthrough in understanding the molecular mechanism is yet to be achieved.

      We included more information in the revised version (line 672-683). Given the complexity of these unusual processes, we recommend an excellent review by Prof. Eduardo Orias (PMID: 28715961), which offers detailed explanations of the process and related concepts (line 685-686).

      Also, the explanation of co-stimulation is not completely clear (lines 49-60). Initially, direct cell-cell contact is mentioned, but later it is mentioned that "all cells become fully stimulated", even when unequal ratios are used. Is physical contact necessary? Or is this due to the "secrete mating-essential factors" (line 601)? These details are essential, for interpretation of the results and need to be explained better.

      We apologize; we did not realize that the term “contact” was not precise enough. In Tetrahymena, physical contact is indeed necessary, but it can refer to temporary interactions. Unlike yeast, Tetrahymena cells exhibit rapid movement, swimming randomly in the medium. Occasionally, two cells may come into contact, but they quickly separate instead of sticking together. Even newly formed loose pairs often become separated. As a result, one cell can come into contact with numerous others and stimulate them. We have clarified this aspect in the revised version (line 50-51, 57).

      Abstract and introduction: Sexes are not mating types. In general, mating types refer to systems in which there is no obvious asymmetry between the gametes, beyond the compatibility system. When there is a physiological difference such as size or motility, sexes are used. This distinction is of importance because in many species mating types and sexes can occur together, with each sex being able to have either (when two) or multiple mating types. Examples are SI in angiosperms, which the authors themselves use as an example, and mating types in filamentous fungi. See Billiard et al. 2011 [PMID: 21489122] for a good explanation and argumentation for the importance of making this distinction.

      We have clarified the expression in the revised version (line 20, 38, 40, 45).

      Recommendations for the authors:

      Reviewer #1:

      I really enjoyed reading this manuscript and I think a few tweaks in the writing/data presentation could greatly improve the experience for the reader:

      (1) The information about your previous work in identifying downstream proteins CDK19, CYC9, and CIP1 (lines 170-173) could be directly presented in the introduction.

      We have moved this information to the introduction in the revised version (line 74-77).

      (2) For a reader who is not familiar with Tetrahymena, a few more details on how reporter and knock-out lines are generated would be beneficial.

      We introduced the knock-out method in Figure 2 – figure supplement 1B, the HA-tag method in Figure 3A, and the MTB2-eGFP construction method in Figure 4E. In addition, we described how co-stimulation markers were observed in Materials and Methods (line 404-410).

      (3) Figures 5 and 6: clarify the types of pairing and treatments that were done directly in the figure (eg. adding additional labels). As of now, it is necessary to go through the text and legend to try and understand in detail what was done.

      Cell types and treatments were directly introduced in the revised figure (Figure 5 and 6).

      (4) The logical transition in lines 136-142 is hard to follow.

      We rewrote this paragraph in the revised version (lines 143-156). Additionally, we added a figure to illustrate the theoretical mating-type recognition model between WT cells and ΔCDK19, ΔCYC9 cells, MTAxc, MTBxc proteins, and ΔMTA, ΔMTB cells (Figure 2 – figure supplement 1D-G).

      (5) Lines 191-196: the fact that cells expressing multiple mating types can self goes against an active self-rejection system - if this is the case there should be self-rejection among all expressed mating types. Unless non-self recognition is an active process and self-recognition is simply the absence of non-self recognition. The authors briefly mention this in lines 263-265, but it would be interesting to expand and clarify this.

      We appreciate that Reviewer #1 noticed the interesting selfing phenotype of the MTB2-eGFP (MTVI background) strain. We further discussed it in the revised manuscript (line 298-306).

      (6) The authors briefly mention the possibility of different mating types using different recognition mechanisms (lines 255-260), based on the big differences in the size of the mating-specific region of MT proteins. Following this and the weakness nr. 2, I think it would be pertinent to gather and present more information on the properties and structures of the mating-type specific regions of MT proteins. Simple in silico analysis of motifs, structure, etc. could help clarify the role of these regions. It seems more parsimonious that MT proteins would have variable mating type specific regions that account for the recognition of the different mating types, and conserved cytoplasmic functions that could trigger a single downstream signaling cascade. It would be interesting to know the authors' opinion on this.

      We are very grateful for this suggestion. We are currently working on determining the 3D structure of the MTRC. The AlphaFold2 prediction indicates that the MT-specific region is comprised of seven global β-sheets, resembling the structure of immunoglobulins (Ig). Our most recent cryo-EM results have revealed a ~15Å structure, aligning well with the prediction. However, the main challenge lies in the low expression levels, both in Tetrahymena and in insect/mammalian cells. We anticipate obtaining more detailed results soon. Therefore, we prefer to present the MT recognition model with robust experimental evidence in the future, and did not discuss this aspect in detail in the current manuscript.

      (7) Adding a figure including a proposed model, as well as expanding the discussion on the points presented as "weaknesses" would help clarify the ideas/hypothesis on how the mating recognition works. I think this would really elevate the paper and help highlight the results.

      We added a figure to introduce the model and the weaknesses in the revised version (Figure 7, line 656-665).

      (8) Line 202-203: It is far-fetched to infer subcellular localization based on the data presented here; counterstaining with other dyes and antibodies specific to certain cell components, as well as negative control images, are required.

      Thanks for the suggestion. We attempted to stain cell components using various dyes and antibodies. Unfortunately, we found that the cell surface and cilia (especially oral cilia) readily give false-positive signals. We think this issue seriously affects the credibility of the results. It may seem like splitting hairs, but we are trying to be precise.

      Meanwhile, we still believe that the mating-type proteins localize to the cell surface because MTA-HA was identified in the isolated cell surface proteins.

      Regarding the negative control, as shown in Fig. 4G, where an MTB2-eGFP cell is pairing with a WT cell, no GFP signal is observed in the WT cell.

      (9) Lines 131: clarify the sentence - expression of Con-A receptors requires both MTA and MTB (MTA to receive the signal).

      We modified the sentence in the revised version (line 139-140).

      Reviewer #2:

      Minor points.

      (1) Line 194-196. Why are these cells able to self?

      These cells may be able to self because their MTRC contains heterotypic mating-type proteins (MTA6 and MTB2), which activate mating when they interact with another heterotypic MTRC (line 207-208).

      (2) Line 232. What do the authors mean by the term synergistic effect here? Definition and statistics?

      Sorry about the confusion. The synergistic effect refers to the stronger effect of MTAxc and MTBxc when they are used together. We clarified it in the revised version (line 232).

      (3) For Figure 4 panel D, are there antibodies that are available as a control for cilia? If so, then blotting this membrane would show that cilia-associated proteins are in the cilia preparation, which is a standard control for sub-cellular fractionation.

      Thanks for the suggestion. Unfortunately, we have not yet found a suitable cilia-specific antibody. Instead, we employed MS analysis to confirm the presence of cilia proteins in this sample (line 195-196, Figure 4–Source data 1). We also observed the sample under the microscope, which directly revealed the presence of cilia (Figure 4C).

      (4) At least one reference cited in the text was not present in the reference list. The authors should go through the references cited to ensure that all have made it into the reference list.

      We have checked all the references.

      Some minor edits:

      (1) MTA and MTB are presented in both roman and italics (e.g. line 209) in the manuscript. Maybe all should be in italics? Or is this a distinction between the gene and the protein?

      The italicized form (MTA) refers to the gene, and the non-italicized form (MTA) refers to the protein.

      (2) Line 251. Change "achieving" to "achieve".

      We have corrected this word (line 266).

      Reviewer #3:

      Line 101. It would help to explain this expectation earlier in this paragraph.

      We explained the expectation in the revised version (line 92-97, 104-106).

      Line 109. How is a co-receptor different from the MTRC complex?

      We have rewritten the relevant sentences to enhance clarity (line 116-119). The molecular function of the MTRC complex could involve acting as a co-receptor or recognizer (functioning as both ligand and receptor). Based on the results presented in this section, we propose that MTA and MTB may function as a complex, but the confirmation of this hypothesis (MTRC) is provided in a later section. Therefore, we did not use the term “MTRC” here. These sentences briefly discuss the molecular function of this complex and explain why MTRC does not appear to function as a co-receptor.

      Line 251: which "dual approach" is referred to?

      The "dual approach" refers to both self and non-self recognition. We explained this in the revised version (lines 265-266).

      Line 258: what "different mechanisms" do the authors have in mind? Why would a different mechanism be expected? The different sizes could have evolved for (coevolutionary?) selection on the same mechanism.

      Sorry about the confusion. We clarified it in the revised version (line 269-278).

      What we intended to express is that we are uncertain whether the mating-type recognition model we discovered in T. thermophila is applicable to all Tetrahymena species due to significant differences in the length of the mating-type-specific region. We believe it is important to highlight this distinction to avoid potential misinterpretations in future studies involving other Tetrahymena species. At the same time, we look forward to future research that may provide insights into this question.

      Fig 2 C&D. Is it correct that these figures show the strains only after 'preincubation'? This is not apparent from the caption of the text. Additionally, the order of the images is very confusing. Write in the figures (so not just in the caption) what the sub-script means.

      These panels are re-organized in the revised version (Fig. 2C&D). There are three kinds of pictures: “not incubated”, “WT pre-incubated by mutant” and “mutant pre-incubated by WT”.

      The methods used to generate Figure 5 are not clearly described. I understand that the obtained xc proteins were added to the cells, and then washed, after which a test was performed mixing WT-VI and WT-VII cells. Were both cells treated? Or only one of the strains? The explanation for the reused washing medium is not clear and the method is not indicated.

      Both cells are treated. More details are provided in the revised manuscript (line 230-231, 633-634, 637-639, Fig. 5). To prepare the starvation medium containing mating-essential factors, cells were starved in fresh starvation medium for ~16 hours. Subsequently, cells were removed by three rounds of centrifugation (1000 g, 3 min) (line 330-332).

      In general, the figures are difficult to understand without repeated inquiries in the captions. Give more information in the figures themselves.

      More information is introduced in the figure (Fig. 2C, Fig. 3B, Fig. 4A, B, D, Fig. 5 and Fig. 6).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This paper proposes applying intrinsically-motivated exploration to the discovery of robust goal states in gene regulatory networks.

      Strengths:

      The paper is well written. The biological motivation and the need for such methods are formulated extraordinarily well. The battery of experimental models is impressive.

      We thank the reviewer for sharing interest in the research problem and for recognizing the strengths of our work.

      Weaknesses:

      (1) The proposed method is compared to the random search. That says little about the performance with regard to the true steady-state goal sets. The latter could be calculated at least for a few simple ODE (e.g., BIOMD0000000454, `Metabolic Control Analysis: Rereading Reder'). The experiment with 'oscillator circuits' may not be directly interpolated to the other models.

      The lack of comparison to the ground truth goal set (attractors of ODE) from arbitrary initial conditions makes it hard to evaluate the true performance/contribution of the method. A part of the used models can be analyzed numerically using JAX, while there are models that can be analyzed analytically.

      "...The true versatility of the GRN is unknown and can only be inferred through empirical exploration and proxy metrics....": one could perform a sensitivity analysis of the ODEs, identifying stable equilibria. That could provide a proxy for the ground truth 'versatility'.

      We agree with the reviewer that one primary concern is to properly evaluate the effectiveness of the proposed method. However, as we move toward complex pathways, the “true” steady-state goal sets are often unknown, which is where machine learning methods such as the one we propose are particularly interesting (but challenging to evaluate).

      For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models. While we agree that it is still interesting to evaluate exploration methods on these simple models for checking their behavior, it is not clear how to scale this analysis to the targeted more complex systems.

      For systems whose true steady state distribution cannot be derived analytically or numerically, we believe that random search is a pertinent baseline as it is commonly used in the literature to discover the attractors/trajectories of a biological network. For instance, Venkatachalapathy et al. [1] initialize stochastic simulations at multiple randomly sampled starting conditions (which is called a kinetic Monte Carlo-based method) to capture the steady states of a biological system. Similarly, Donzé et al. [29] use a Monte Carlo approach to compute the reachable set of a biological network «when the number of parameters is large and their uncertain range is not negligible».
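      To make the random-search baseline concrete, a minimal sketch follows. It is an illustration only, not the code used in the paper: the toy one-gene ODE (Hill-type self-activation with linear decay), its parameter values, the explicit Euler integrator, and all function names are assumptions chosen for this example. Initial conditions are sampled uniformly, each one is simulated to the end of a fixed rollout, and the distinct end states are collected.

```python
import random


def hill_rate(x, a=2.0, k=1.0, d=0.8):
    """Toy self-activating gene (assumed model): Hill production minus linear decay."""
    return a * x * x / (k * k + x * x) - d * x


def simulate(x0, dt=0.05, steps=2000):
    """Integrate the toy ODE with explicit Euler and return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * hill_rate(x)
    return x


def random_search(n_samples=200, low=0.0, high=3.0, seed=0):
    """Sample random initial conditions and collect the distinct end states.

    End states are rounded to one decimal so that rollouts ending near the
    same attractor are counted once.
    """
    rng = random.Random(seed)
    reached = set()
    for _ in range(n_samples):
        x0 = rng.uniform(low, high)
        reached.add(round(simulate(x0), 1))
    return sorted(reached)


if __name__ == "__main__":
    print(random_search())
```

      With these assumed parameters the toy model has stable steady states at x = 0 and x = 2, separated by an unstable one at x = 0.5, so the baseline recovers both attractors as long as both basins are sampled. In higher-dimensional networks the same procedure may need far more samples to reach small basins.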

      (2) The proposed method is based on 'Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning', which assumes state action trajectories [s_{t_0:t}, a_{t_0:t}] ('2.1 Notations and Assumptions' in the IMGEP paper). However, the models used in the current work do not include external control actions, but rather only the initial conditions can be set. It is not clear from the methods whether IMGEP was adapted to this setting, and how the exploration policy was designed w/o actual time-dependent actions. What does "...generates candidate intervention parameters to achieve the current goal...." mean, considering that interventions 'Sets the initial state...' as explained in Table 2?

      We thank the reviewer for asking for clarification, as indeed the IMGEP methodology originates from developmental robotics scenarios which generally focus on the problem of robotic sequential decision-making, therefore assuming state action trajectories as presented in Forestier et al. [65]. However, in both cases, note that the IMGEP is responsible for sampling parameters which then govern the exploration of the dynamical system. In Forestier et al. [65], the IMGEP also sets only one vector at the start (denoted θ ∈ Θ), which specifies the parameters of a movement (analogous to the initial state of the GRN). The movement is then produced with dynamic motion primitives, which are dynamical system equations similar to GRN ODEs, so the two systems are mathematically equivalent. More generally, while in our case the “intervention” of the IMGEP (denoted i ∈ I) only controls the initial state of the GRN, future work could consider more advanced sequential interventions simply by setting the parameters of an action policy π_i at the start, which could be called during the GRN’s trajectory to sample control actions π_i(a_{t+1} | s_{t_0:t+1}, a_t), where s_t would be the state of the GRN. In practice this would also require setting only one vector at the start, so it would remain the same exploration algorithm and only the space of parameters would change, which illustrates the generality of the approach.
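      The loop described in this response can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the two-gene mutual-repression model in `run_system`, uniform goal sampling, nearest-neighbour selection, Gaussian mutation, and every name and parameter value are hypothetical choices. The point it shows is that the explorer only ever sets one vector (the initial state) per rollout.

```python
import random


def run_system(initial_state, dt=0.05, steps=400):
    """Toy stand-in for a GRN rollout (assumed model): two mutually repressing genes.

    Returns the state reached at the end of a fixed-length Euler rollout.
    """
    x, y = initial_state
    for _ in range(steps):
        dx = 3.0 / (1.0 + y * y) - x
        dy = 3.0 / (1.0 + x * x) - y
        x, y = x + dt * dx, y + dt * dy
    return (x, y)


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def imgep(n_bootstrap=10, n_iterations=40, sigma=0.3, seed=0):
    """Goal-directed exploration where the intervention is the initial state."""
    rng = random.Random(seed)
    history = []  # (intervention, outcome) pairs

    # Bootstrap with random interventions.
    for _ in range(n_bootstrap):
        intervention = (rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0))
        history.append((intervention, run_system(intervention)))

    for _ in range(n_iterations):
        # 1. Sample a goal in outcome space.
        goal = (rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0))
        # 2. Pick the past intervention whose outcome is closest to the goal.
        source, _ = min(history, key=lambda pair: squared_distance(pair[1], goal))
        # 3. Mutate it (concentrations are clipped at zero) and run the system.
        candidate = tuple(max(0.0, v + rng.gauss(0.0, sigma)) for v in source)
        history.append((candidate, run_system(candidate)))
    return history


if __name__ == "__main__":
    for intervention, outcome in imgep()[-3:]:
        print(intervention, "->", outcome)
```

      Replacing the initial-state vector with the parameters of an action policy would change only what `candidate` encodes and how `run_system` uses it, which is the generality argued for above.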

      (3) Fig 2 shows the phase space for (ERK, RKIPP_RP) without mentioning the typical full scale of ERK, RKIPP_RP. It is unclear whether the path from (0, 0) to (~0.575, ~3.75) at t=1000 is significant on the typical scale of this phase space.

      The purpose of Figure 2 is to illustrate an example GRN trajectory in transcriptional space, and to illustrate what “interventions” and “perturbations” can be in that context. To that end we have used the fixed initial conditions provided in BIOMD0000000647, replicating Figure 5 of Cho et al. [56]. While we are not sure what the reviewer means by the “typical” scale of this phase space, we would like to point the reviewer toward Figure 8, which shows examples of paths that reach more distant points in the same phase space (up to ~10μM in RKIPP_RP levels and ~300μM in ERK levels). However, while the paths displayed in Figure 8 are possible (and were discovered with the IMGEP), note that they may occur more rarely in nature, in the sense that a large portion of the initial conditions tested with random search tend to converge toward smaller (ERK, RKIPP_RP) steady-state values similar to the ones displayed in Figure 2.

      (4) Table 2:

      a) Where is 'effective intervention' used in the method?

      b) In my opinion 'controllability', 'trainability', and 'versatility' are different terms. If their correspondence is important, I would suggest extending/enhancing the column "Proposed Isomorphism". Otherwise, it may be confusing.

      a) We thank the reviewer for pointing out that “effective intervention” is not explicitly used in the method. The idea here is that as we are exploring a complex dynamical system (here the GRN), some of the sampled interventions will be particularly effective at revealing novel unseen outcomes whereas others will fail to produce a qualitative change to the distribution of discovered outcomes. What we show in this paper, for instance in Figure 3a and Figure 4, is that the IMGEP method is particularly sample-efficient in finding those “effective interventions”, at least more so than random exploration. However, we agree that the term “effective intervention” is ambiguous (it does not say effective at what) and propose to replace it with “salient intervention” in the revised version.

      b) We thank the reviewer for highlighting some confusing terms in our chosen vocabulary, and we will try to clarify those terms in the revised version. We agree that controllability/trainability and versatility are not exactly equivalent concepts, as controllability/trainability typically refers to the extent to which a system is externally controllable/trainable whereas versatility typically refers to the inherent adaptability or diversity of behaviors that a system can exhibit in response to inputs or conditions. However, they both measure the extent of states that can be reached by the system under a distribution of stimuli/conditions, whether natural or engineered, which is why we believe that their correspondence is relevant.

      I don't see how this table generalizes "concepts from dynamical complex systems and behavioral sciences under a common navigation task perspective".

      We propose to replace “generalize” with “investigate” in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Etcheverry et al. present two computational frameworks for exploring the functional capabilities of gene regulatory networks (GRNs). The first is a framework based on intrinsically-motivated exploration, here used to reveal the set of steady states achievable by a given gene regulatory network as a function of initial conditions. The second is a behaviorist framework, here used to assess the robustness of steady states to dynamical perturbations experienced along typical trajectories to those steady states. In Figs. 1-5, the authors convincingly show how these frameworks can explore and quantify the diversity of behaviors that can be displayed by GRNs. In Figs. 6-9, the authors present applications of their framework to the analysis and control of GRNs, but the support presented for their case studies is often incomplete.

      Strengths:

      Overall, the paper presents an important development for exploring and understanding GRNs/dynamical systems broadly, with solid evidence supporting the first half of their paper in a narratively clear way.

      The behaviorist point of view for robustness is potentially of interest to a broad community, and to my knowledge introduces novel considerations for defining robustness in the GRN context.

      We thank the reviewer for recognizing the strengths and novelty of the proposed experimental framework for exploring and understanding GRNs, and complex dynamical systems more generally. We agree that the results presented in the section “Possible Reuses of the Behavioral Catalog and Framework” (Fig 6-9) can be seen as incomplete along certain aspects, which we tried to make as explicit as possible throughout the paper, and why we explicitly state that these are “preliminary experiments”. Despite the discussed limitations, we believe that these experiments are still very useful to illustrate the variety of potential use-cases in which the community could benefit from such computational methods and experimental framework, and build on for future work.

      Some specific weaknesses, mostly concerning incomplete analyses in the second half of the paper:

      (1) The analysis presented in Fig. 6 is exciting but preliminary. Are there other appropriate methods for constructing energy landscapes from dynamical trajectories in gene regulatory networks? How do the results in this particular case study compare to other GRNs studied in the paper?

      We are not aware of methods other than the one proposed by Venkatachalapathy et al. [1] for constructing an energy landscape from an input set of recorded dynamical trajectories, although such methods may well exist. We want to emphasize that any such method would in any case depend on the input set of trajectories, and should therefore benefit from a set that is more representative of the diversity of behaviors that can be achieved by the GRN, which is why we believe the results presented in Figure 6 are interesting. As the IMGEP was able to find a higher diversity of reachable goal states (and corresponding trajectories) for many of the studied GRNs, we believe that similar effects should be observable when constructing the energy landscapes for these GRN models, with the discovery of additional or wider “valleys” of reachable steady states. We could indeed add other case studies to the supplementary material of the revised version to support this argument.

      Additionally, it is unclear whether the analysis presented in Fig. 6C is appropriate. In particular, if the pseudopotential landscapes are constructed from statistics of visited states along trajectories to the steady state, then the trajectories derived from dynamical perturbations do not only reflect the underlying pseudo-landscape of the GRN. Instead, they also include contributions from the perturbations themselves.

      We agree that the landscape displayed in Fig. 6C integrates contributions from the perturbations on the GRN’s behavior, and that these can shape the landscape in various ways, for instance affecting the paths that are accessible, the shape/depth of certain valleys, etc. But we believe that qualitatively or quantitatively analyzing the effect of these perturbations on the landscape is precisely what is interesting here: it might help 1) understand how a system responds to a range of perturbations and visualize which behaviors are robust to those perturbations, and 2) design better strategies for manipulating those systems to produce certain behaviors.
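      The dependence of such a landscape on the input trajectories can be seen in a minimal sketch. It assumes one common construction, U(s) = -ln P(s), where P(s) is the fraction of recorded states that fall into a grid cell around s. This may differ in detail from the method of Venkatachalapathy et al. [1], and the function name, bin width, and example trajectories are hypothetical.

```python
import math
from collections import Counter


def pseudopotential(trajectories, bin_width=0.5):
    """Estimate U = -ln P over a regular grid from visited states.

    `trajectories` is a list of trajectories, each a list of state tuples.
    Returns a dict mapping grid-cell indices to pseudopotential values;
    cells that were never visited are absent (infinite pseudopotential).
    """
    counts = Counter()
    for trajectory in trajectories:
        for state in trajectory:
            cell = tuple(math.floor(v / bin_width) for v in state)
            counts[cell] += 1
    total = sum(counts.values())
    return {cell: -math.log(n / total) for cell, n in counts.items()}


if __name__ == "__main__":
    unperturbed = [[(0.1, 0.1), (0.2, 0.2), (0.3, 0.1)]]
    perturbed = unperturbed + [[(1.2, 1.1)]]
    print(pseudopotential(unperturbed))
    print(pseudopotential(perturbed))
```

      Because every visited state contributes to the counts, states reached only under perturbation add new cells and make the existing valleys shallower, which is the contribution of the perturbations discussed above.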

      (2) In Fig. 7, I'm not sure how much is possible to take away from the results as given here, as they depend sensitively on the cohort of 432 (GRN, Z) pairs used. The comparison against random networks is well-motivated. However, as the authors note, comparison between organismal categories is more difficult due to low sample size; for instance, the "plant" and "slime mold" categories each only have 1 associated GRN. Additionally, the "n/a" category is difficult to interpret.

      We acknowledge that this part is speculative as stated in the paper: “the surveyed database is relatively small with respect to the wealth of available models and biological pathways, so we can hardly claim that these results represent the true distribution of competencies across these organism categories”. However, when further data is available, the same methodology can be reused and we believe that the resulting statistical analyses could be very informative to compare organismal (or other) categories.

      (3) In Fig. 8, it is unclear whether the behavioral catalog generated is important to the intervention design problem of moving a system from one attractor basin to another. The authors note that evolutionary searches or SGD could also be used to solve the problem. Is the analysis somehow enabled by the behavioral catalog in a way that is complementary to those methods? If not, comparison against those methods (or others e.g. optimal control) would strengthen the paper.

      We thank the reviewer for asking us to clarify this point, which might not be clearly explained in the paper. Here the behavioral catalog is indeed used in a complementary way to the optimization method, by identifying a representative set of reachable attractors which are then used to define the optimization problem. For instance here, thanks to the catalog, we 1) were able to identify a “disease” region and several possible reachable states in that region and 2) used several of these states as starting points of our optimization problem, where we want to find a single intervention that can successfully and robustly reset all those points, as illustrated in Figure 8. Please note that given this problem formulation, a simple random search was used as the optimization strategy. When we mention more advanced techniques such as EA or SGD, it is to say that they might be more efficient optimizers than random search. However, we agree that in many cases direct optimization will not work when starting from a random or poor initial guess, even with EA or SGD. In that case the discovered behavioral catalog can be used to better initialize this local search and make it more efficient/useful, akin to what is done in Figure 9.

      (4) The analysis presented in Fig. 9 also is preliminary. The authors note that there exist many algorithms for choosing/identifying the parameter values of a dynamical system that give rise to a desired time-series. It would be a stronger result to compare their approach to more sophisticated methods, as opposed to random search and SGD. Other options from the recent literature include Bayesian techniques, sparse nonlinear regression techniques (e.g. SINDy), and evolutionary searches. The authors note that some methods require fine-tuning in order to be successful, but even so, it would be good to know the degree of fine-tuning which is necessary compared to their method.

      We agree that the analysis presented in Figure 9 is preliminary, and thank the reviewer for the suggestion. We would first like to refer to other papers from the ML literature that have analyzed this issue more thoroughly, such as Colas et al. [74] and Pugh et al. [34], and have shown the value of diversity-driven strategies as promising alternatives. Additionally, as suggested by the reviewer, we added a comparison to the CMA-ES algorithm in order to complete our analysis. CMA-ES is an evolutionary algorithm that self-adapts its optimization steps and is known to be better suited than SGD to escaping local minima when the number of parameters is not too high (here we only have 15 parameters). However, our results showed that while CMA-ES explores the solution space more than SGD does at the beginning of optimization, it also ultimately converges to a local minimum, as SGD does. The best solution converges toward a constant signal (of the target b) but fails to maintain the target oscillations, similar to the solutions discovered by gradient descent. We tried this for a few hyperparameters (init mean and std) but always found similar results. We report the new results at https://developmentalsystems.org/curious-exploration-of-grn-competencies/tuto2.html (bottom cell, Figure 4). We suggest including the updated figure and caption in the revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      This is significant work, and you should certainly make the best case you can on the weaknesses discussed.

      We thank the reviewer for this positive comment on the significance of our work. The referee indicates as weaknesses (i) that the force involving the bent or straight αI-helix is not readily apparent, (ii) that the residue types were not varied in the helix mutations, and (iii) that the chemical shift perturbations are indirect observations.

      We think we have tried to address a large part of these questions by being very careful in our analysis and by the discussion in the manuscript. The following remarks may help to clarify this further:

      (i) The force emanating from the helix is e.g. visualized in the PC2 loadings in Figure 6E of the PCA carried out on all observed SH3-SH2-KD resonances for all apo forms of the helix mutants. The SH2 residues identified by these loadings are in direct vicinity to the αI-helix. The respective PC2 scores correlate to 98% with the vmax of the catalytic reaction and to 94% with the PC1 scores found for imatinib-induced opening. Importantly, the structure of the KD with the straight αI-helix indicates that mostly residues F516, Q517, S520, and I521 would clash with the SH2 domain in a closed core (Figure 6F). Thus, the expected clashes are in direct vicinity of the SH2 residues identified by the PC2 loadings as correlated to vmax and imatinib-induced opening. These data are completely orthogonal and show that most of the force is coming from residues F516, Q517, S520, and I521 in the αI-αI’ turn.

      (ii) We agree that we mainly used truncations of the αI-helix to study its involvement in activation. Point (i) makes it clear that a larger part of the αI-helix effects is caused by steric clashes of the residues in the αI-αI’ turn. In the latter region, we don’t expect strong amino acid type-specific effects besides excluded volume. Due to expression problems, we could not vary the helix length between residues 519 and 534. However, in this region we introduced the amino acid type mutation E528K. The latter showed a clear specific effect. Further amino acid type-specific effects may be possible in this region. However, we expect that the identified electrostatic E528-R479 interaction is one of the most important interactions in this region.

      (iii) We agree that chemical shift changes of individual resonances are often hard to interpret. However, we want to stress that our conclusions are all drawn from principal component analyses, which in all cases had as input well over 100 if not over 200 1H-15N resonances. The first two principal components of these analyses are robust averages over many residues, which reveal general correlated structural trends.
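      To illustrate why a principal component is a robust average over many resonances, a minimal sketch follows. It is not the analysis pipeline used in the manuscript: the power-iteration approach, the function name, and the layout of the input (one row per construct or ligand state, one column per resonance) are assumptions made for this example.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) of the first principal component.

    `rows` holds one observation per row (e.g. one construct or ligand state)
    and one variable per column (e.g. one chemical shift). Columns are
    mean-centered, then power iteration is applied to X^T X.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]

    v = [1.0 / math.sqrt(m)] * m  # initial guess for the loading vector
    for _ in range(n_iter):
        scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along the current direction
            break
        v = [x / norm for x in w]

    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores


if __name__ == "__main__":
    # Hypothetical data: three "resonances" shifting together across five states.
    data = [[1.0 + t / 3, 5.0 + 2 * t / 3, -2.0 + 2 * t / 3] for t in (-2, -1, 0, 1, 2)]
    loadings, scores = first_principal_component(data)
    print(loadings, scores)
```

      Each score is a weighted sum over all columns, so uncorrelated noise in individual resonances tends to average out while shifts that move together across the states dominate the component.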

      We assume that chemical shift deposition etc will be pursued.

      We are currently depositing a larger collection of our Abl data to the “Biological Magnetic Resonance Data Bank (BMRB)”, which includes the NMR chemical shift data of the present work. A ‘collection’ will be a new feature of the BMRB, and we are in discussion with their staff. We will provide the accession codes as soon as possible (probably within the next month) to be included into the final version of the manuscript. We have amended the Data Availability Section accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1) The overall discussion of the implications of the described allostery on kinase activation is provided through lenses of imatinib binding, which is used as an experimental trigger to disassemble the autoinhibited core. Can the authors elaborate in the Discussion on what event would play this role in the kinase catalytic cycle, communicating to helix I? Would dissociation of the myristate from the active site be hypothesized to be the first step in kinase activation? While I understand that certainty may be challenging to attain, it would be good to introduce some ideas into the Discussion.

      We appreciate the reviewer’s suggestions for the discussion and added the following text to the Conclusion section:

      "We have used here imatinib binding to the ATP-pocket as an experimental tool to disassemble the Abl regulatory core. Our previous analysis (Sonti et al., 2018) of the high-resolution Abl transition-state structure (Levinson et al., 2006) indicated that due to the extremely tight packing of the catalytic pocket, binding and release of the ATP and tyrosine peptide substrates is only possible if the P-loop and thereby the N-lobe move towards the SH3 domain by about 1–2 Å. This motion is of similar size and direction as the motion of the N-lobe observed in complexes with imatinib and other type II inhibitors (Sonti et al., 2018). From this we concluded that substrate binding opens the Abl core in a similar way as imatinib. The present NMR and activity data now clearly establish the essential role of the αI-helix both in the imatinib- and substrate-induced opening of the core, thereby further corroborating the similarity of both disassembly processes.

      Notably, the used regulatory core construct Abl83-534 lacks the myristoylated N-cap. Although we have previously demonstrated that the latter construct is predominantly assembled (Skora et al., 2013), the addition of the myristoyl moiety is expected to further stabilize the assembled conformation in a similar way as asciminib.

      Considering this mechanism, dissociation of myristoyl from the native Abl 1b core may be a first step during activation. However, it should be kept in mind that the Abl 1a isoform lacks the N-terminal myristoylation, and it is presently unclear whether other moieties bind to the myristoyl pocket of Abl 1a during cellular processes."

      2) Can the authors comment more on the differentiation between assembled conformations induced by type I inhibitor binding vs apo forms (or AMP-PNP and allosteric inhibitor) reported in Figure 3B? The differences are clearly identified by PCA but not sufficiently discussed.

      As indicated in the text, we think two structural effects are intermingled within PC2. Due to this admixture, it is hard to draw strong conclusions and we don’t want to expand on this too much. We have slightly modified the respective paragraph (p. 7) as follows:

      "As the affected residues react differently to perturbations by type I inhibitors and truncation of the αI’-helix (Figure 3A, right), we attribute this behavior to two effects intermixed into the PC2 detection: (i) a minor rearrangement of the SH3/KD N-lobe interface caused by filling of the ATP pocket with type I inhibitors, which in contrast to the stronger N-lobe motion induced by type II inhibitors does not yet lead to core disassembly and (ii) a small rearrangement of the SH2/KD C-lobe interface caused by shortening and mutations of the αI-helix."

      3) The allosteric connection between active site inhibitor binding and the myristate/allosteric inhibitor binding has been observed in the past and noted before, in papers such as Zhang et al, Nature 2010. While the authors reference this paper, they do not acknowledge its specific findings or engage in a broader discussion of how their conclusions relate to this work.

      We have modified the beginning of the Conclusion section:

      "The allosteric connection between Abl ATP site and myristate site inhibitor binding has been noted before, albeit specific settings such as construct boundaries and the control of phosphorylation vary in published experiments. Positive and negative binding cooperativity of certain ATP-pocket and allosteric inhibitors has been observed in cellular assays and in vitro (Kim et al., 2023; Zhang et al., 2010). Furthermore, hydrogen exchange mass spectrometry has indicated changes around the unliganded ATP pocket upon binding of the allosteric inhibitor GNF-5 (Zhang et al., 2010). Here, we present a detailed high-resolution explanation of these allosteric effects via a mechanical connection between the kinase domain N- and C-lobes that is mediated by the regulatory SH2 and SH3 domains and involves the αI helix as a crucial element.

      Specifically, we have established a firm correlation between the kinase activity of the Abl regulatory core, the imatinib (type II inhibitor)-induced disassembly of the core, which is caused by a force FKD–N,SH3 between the KD N-lobe and the SH3 domain, and a force FαI,SH2 exerted by the αI-helix towards the SH2 domain. The FαI,SH2 force is mainly caused by a clash of the αI-αI’ loop with the SH2 domain. Both the FKD–N,SH3 and FαI,SH2 forces act on the KD/SH2SH3 interface and may lead to the disassembly of the core, which is in a delicate equilibrium between assembled and disassembled forms. As disassembly is required for kinase activity, the modulation of both forces constitutes a very sensitive regulation mechanism. Allosteric inhibitors such as asciminib and also myristoyl, the natural allosteric pocket binder, pull the αI-αI’ loop away from the SH2 interface, and thereby reduce the FαI,SH2 force and activity. Notably, all observations described here were obtained under nonphosphorylated conditions, as phosphorylation will lead to additional strong activating effects."

      4) Figure 6 could do a better job of providing an illustration of steric clashes.

      We have revised Figure 6, panel F, in order to better illustrate the steric clashes, and modified the legend accordingly.

      5) There is a typo in line 5 from the top on page 11 (dash missing from "83534" superscript).

      Thank you. This was fixed.

    1. Author Response

      Many thanks for handling our manuscript (eLife-RP-RA-2023-93968) entitled "Allosteric modulation of the CXCR4:CXCL12 axis by targeting receptor nanoclustering via the TMV-TMVI domain", by García-Cuesta et al. We are delighted to hear of your willingness to consider our manuscript following appropriate revision. We have carefully read the referees' comments and have organized new experiments to address their specific queries.

      Reviewer #1 (Public Review)

      The computational methodology will be carefully reviewed, in particular to justify the software and techniques used in this manuscript. We will also describe the method for identifying the pocket on the CXCR4 structure as well as the workflow used to explain the transition from docking evaluation to MD analyses. Additionally, we will conduct experiments to strengthen the results and address the specific feedback provided, ultimately improving the overall reliability.

      Reviewer #2 (Public Review)

      Although the paper was initiated by titrating the compounds in migration experiments, we are going to add new kinetics and titration of concentrations in these experiments. In addition, we are going to change the way in which we present the data from the single-molecule tracking experiments. We will add a representative video of each experimental condition, and include some of the mean square displacement curves to support our data on the analysis of the diffusion coefficient (D1-4) to give a more conclusive view of receptor clustering. Regarding the tumorigenesis experiments, we will include the individual data points and we will try to perform kinetics with distinct concentrations of the drug.
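      For readers unfamiliar with how a mean square displacement (MSD) curve relates to a short-lag diffusion coefficient, a minimal sketch follows. It rests on assumptions not stated in this response: that D1-4 denotes the coefficient obtained from a linear fit over the first four time lags, and that motion is two-dimensional so that MSD(τ) = 4Dτ + offset. The function names and example numbers are hypothetical, and this is not the analysis code used by the authors.

```python
def mean_square_displacement(track, max_lag):
    """MSD for lags 1..max_lag (in frames) of a 2D track [(x, y), ...].

    Requires max_lag < len(track).
    """
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (x2 - x1) ** 2 + (y2 - y1) ** 2
            for (x1, y1), (x2, y2) in zip(track, track[lag:])
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def short_lag_diffusion_coefficient(msd, frame_interval, n_points=4):
    """Least-squares slope of MSD versus time lag over the first n_points,
    divided by 4 (assumed 2D relation MSD = 4*D*tau + offset)."""
    lags = [frame_interval * (k + 1) for k in range(n_points)]
    values = msd[:n_points]
    mean_lag = sum(lags) / n_points
    mean_val = sum(values) / n_points
    slope = sum((t - mean_lag) * (m - mean_val) for t, m in zip(lags, values)) / sum(
        (t - mean_lag) ** 2 for t in lags
    )
    return slope / 4.0


if __name__ == "__main__":
    track = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.2), (0.3, 0.2), (0.3, 0.1), (0.4, 0.3)]
    curve = mean_square_displacement(track, 4)
    print(curve, short_lag_diffusion_coefficient(curve, frame_interval=0.1))
```

      Showing the MSD curves alongside the fitted coefficient lets a reader judge whether the short-lag regime is approximately linear, which the fit takes for granted.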

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript from Kavanjoo et al examines the role of macrophages within the fetal liver beyond erythrocyte maturation. Using single-cell sequencing, high-resolution imaging, and inducible genetic deletion of yolk-sac (YS) derived macrophages, the authors demonstrate that heterogeneous fetal liver macrophages regulate erythrocyte enucleation, interact physically with fetal HSCs, and may regulate neutrophil accumulation in the fetal liver. The data as presented do not strongly support the authors’ conclusion that fetal macrophages in the liver regulate the HSC niche or granulopoiesis from HSCs.

      Fetal-derived resident tissue macrophages are increasingly implicated in regulation of adult tissue function and homeostasis, but considerably less is known regarding the function of fetal macrophages during development. Macrophages in the fetal liver have been shown to form erythroblastic islands, where they regulate erythrocyte maturation. Here, the authors performed single-cell sequencing on fetal liver macrophages (Cd11b-lo) to gain insight into heterogeneity and utilized previously published pre-Mac signatures from the YS to focus on YS-derived macrophages. These clusters were then further cross-referenced with surface protein expression as determined by multidimensional flow cytometry to hone in on a very specific subset of three groups of F4/80hi macrophages defined by multiple surface markers. Fate-mapping with three models (Tnfrsf11a-Cre - YS pMAC derived; Ms4a3Cre - FL monocyte derived; CXCR4-Cre-ERT2 - definitive HSC derived) revealed that three major subsets are all derived from YS pMACs.

      We thank the reviewer for the comments and have addressed all points below. If certain points were mentioned twice, we responded at the position where the point was raised the first time.

      However, the relative frequencies of these specific populations are not shown, and because the single sequencing analysis goes through so many iterations of re-clustering that initiates by focusing specifically on pMAC signatures, this result is not surprising.

      Probing gene expression within each of the three clusters revealed ligand expression suggesting cell-cell interactions, and cross-referencing with a fetal LT-HSC gene expression dataset revealed potential receptor-ligand interactions. Microscopic investigation of physical interactions between specific macrophage subsets and HSCs was not particularly convincing. In Figure 3C, for example, Cluster C is very difficult to visualize. It would again be helpful to know what the ratios are within the FL for each cluster. Data in Figure 3F are not well represented by Data in Figure 3E.

      We showed frequencies after CODEX in the original manuscript (Fig. S3A, now Figure 4 - figure supplement 1A) since isolation of cells often induces an artifact, and relative frequencies after scRNA-seq experiments never represent the actual cell numbers present in situ. However, the CODEX analysis also has its weaknesses, especially in dense tissues, as the automated gating method may not catch every macrophage due to its star-shaped structure. Thus, we have now included the absolute numbers of macrophage subpopulations in Figure 7C. We have tried to improve the visualization of the clusters in Figure 3C (now Figure 4C) by zooming into a specific region. The Voronoi diagram is a powerful method that allows for an overall spatial visualization of cell distribution in large tissue pieces. In the high-resolution PDF that we provide, zooming into the PDF file should allow the reader to see each cluster in great detail.

      To improve the data on macrophage-HSC interaction, we have performed 3D reconstructions and quantified the distance between CD150+ and Iba1+ cells in 3D (new Figure 3C-E), as the thin cryosectioning used for CODEX is not suitable for reconstructing these interactions properly (see also lines 328-331). Thus, Figure 3E could not, and was not meant to, represent the data shown in Figure 3F (now Figure 4E and 4F). Figure 3E is just meant to show examples of all clusters sitting in proximity to CD150+ HSCs.

      Furthermore, deletion of YS pMAC-derived macrophages in the Tnfrsf11a-Cre X Spi1fl/fl model resulted in broad macrophage depletion - although the authors did not demonstrate this using the carefully refined phenotypes they had defined earlier in the manuscript. Nonetheless, the authors demonstrate that macrophage depletion did affect erythroid enucleation, as expected, and the authors also showed some effect of macrophage deletion on LT-HSC gene expression by bulk transcription analysis. These effects were relatively small, however, and this was clear in the absence of effects on hematopoiesis in vivo or HSC proliferation ex vivo. To further investigate the effects of macrophage deletion on downstream hematopoiesis, the authors re-assessed the myeloid compartment following macrophage deletion, and identified and specifically focused on an observed increase in neutrophils in response to macrophage depletion. Based on this increase, they tested HSC differentiation using a colony-forming assay, which shows a slight increase in GM colonies that is also reflective of a slight but insignificant increase in total colony forming capability. The authors concluded that loss of fetal macrophages causes a reprogramming of HSCs to the granulocytic lineage. However, the colony-forming assay and subtle differences in gene expression are not sufficient to conclude that fetal HSCs have been reprogrammed towards granulocytic lineage by macrophage deletion.

      We thank the reviewer for this comment and have improved the manuscript accordingly: We have performed the colony-forming assay again with n=5 embryos per genotype that were harvested on the same day, which resulted in a similar phenotype as before, with the differences in GM colonies now being significant. Further, we quantified the depletion of all macrophage subpopulations in the Tnfrsf11a-Cre X Spi1fl/fl model (Fig. 7C). To strengthen the point that the transient lack of macrophages when HSCs arrive in the fetal liver leads to their reprogramming, we included flow cytometry data from E16.5 and E18.5 where we still see an increase of neutrophils in the fetal liver, despite the fact that macrophages are repopulating the empty niche (Fig. 7E, F). To show that this is a cell-intrinsic effect, we have performed adoptive transfer experiments supporting our claim that loss of macrophages reprograms HSCs toward the granulocytic lineage (Fig. 7H, I).

      Overall, there are some interesting pieces of data in this manuscript, including the classification of new subsets of macrophages in the liver, their fate-mapping to the YS, and gene expression analysis. However, the data as presented do not strongly support a role for these particular macrophage subsets in regulating HSCs or fetal hematopoiesis within the fetal liver niche. Although there may be specific subsets of fetal liver macrophages that more closely physically interact with HSCs, deletion of what appeared to be a vast majority of macrophages in the FL did not appear to affect cellularity of hematopoietic stem and progenitor cells in vivo, and was not shown to convincingly affect HSC function. The mechanism by which macrophage deletion affected granulopoiesis could be independent from HSCs, and would be interesting to further explore.

      We hope that with the new set of experiments we were able to convince the reviewer of the importance of macrophages in the HSC niche.

      Reviewer #2 (Public Review):

      Using a single-cell omics approach combined with spatial proteomics and genetic fate mapping, Kayvanjoo et al found that fetal liver (FL) macrophages cluster into distinct yolk sac-derived subpopulations and that some of the HSCs in FL preferentially associate with one of the identified macrophage subpopulations. FLs lacking macrophages show a delay in erythropoiesis. The authors also try to identify a role of macrophages for HSCs function in FL, and claim that macrophages affect myeloid differentiation of HSCs. Experimental support for the function of macrophages on HSCs remains weak. Taken together, their data provide a precise map of FL macrophage subpopulations, which is novel and will serve the field well.

      We thank the reviewer for the positive assessment. We have now strengthened the data regarding the impact on granulopoiesis by performing additional CFU assays and adoptive transfers.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the researchers aimed to investigate the cellular landscape and cell-cell interactions in cavernous tissues under diabetic conditions, specifically focusing on erectile dysfunction (ED). They employed single-cell RNA sequencing to analyze gene expression patterns in various cell types within the cavernous tissues of diabetic individuals. The researchers identified decreased expression of genes associated with collagen or extracellular matrix organization and angiogenesis in several cell types, including fibroblasts, chondrocytes, myofibroblasts, valve-related lymphatic endothelial cells, and pericytes. They also discovered a newly identified marker, LBH, that distinguishes pericytes from smooth muscle cells in mouse and human cavernous tissues. Furthermore, the study revealed that pericytes play a role in angiogenesis, adhesion, and migration by communicating with other cell types within the corpus cavernosum. However, these interactions were found to be significantly reduced under diabetic conditions. The study also investigated the role of LBH and its interactions with other proteins (CRYAB and VIM) in maintaining pericyte function and highlighted their potential involvement in regulating neurovascular regeneration. Overall, the manuscript is well-written and the study provides novel insights into the pathogenesis of ED in patients with diabetes and identifies potential therapeutic targets for further investigation.

      Reviewer #2 (Public Review):

      Summary: In this manuscript, the authors performed single cell RNA-sequencing of cells from the penises of healthy and diabetes mellitus model (STZ injection-based) mice, identified Lbh as a marker of penis pericytes, and report that penis-specific overexpression of Lbh is sufficient to rescue erectile function in diabetic animals. In public human single cell RNA-seq datasets, the authors report that LBH is similarly specific to pericytes and downregulated in diabetic patients. Additionally, the authors report discovery of CRYAB and VIM1 as protein interacting partners with LBH.

      The authors' contributions are of interest to the erectile dysfunction community and their Lbh overexpression experiments are especially interesting and well-conducted. However, claims in the manuscript regarding the specificity of Lbh as a pericyte marker, the mechanism by which Lbh overexpression rescues erectile function, cell-cell interactions impaired by diabetes, and protein-interaction partners require qualification or further evidence to justify.

      Major claims and evidence:

      1) Marker gene specificity and quantification: One of the authors' major contributions is the identification of Lbh as a marker of pericytes in their data. The authors present qualitative evidence for this marker gene relationship, but it is unclear from the data presented if Lbh is truly a specific marker gene for the pericyte lineage (either based on gene expression or IF presented in Fig. 2D, E). Prior results (see Tabula Muris Consortium, 2018) suggest that Lbh is widely expressed in non-pericyte cell types, so the claims presented in the manuscript may be overly broad. Even if Lbh is not a globally specific marker, the authors' subsequent intervention experiments argue that it is still an important gene worth studying.

      Answer: We appreciate this comment. In our scRNAseq data for the mouse cavernosum tissues, previously known markers such as Rgs5, Pdgfrb, Cspg4, Kcnj8, Higd1b, and Cox4i2 were found to be expressed not exclusively in pericytes, while Lbh exhibited specific expression patterns in pericytes (Fig. 2 and Supplementary Fig. 5). LBH expression was easily distinguishable from α-SMA, not only in mouse cavernosum but also in dorsal artery and dorsal vein tissues within penile tissues. This distinctive expression pattern of LBH was also observed in the human cavernous pericytes (Fig. 5). We then examined Lbh expression patterns in various mouse tissues using the mouse single-cell atlas (Tabula Muris), although endothelial and pericyte clusters were not subclustered in most tissues from Tabula Muris. To identify pericytes, we relied on the expression pattern of known marker genes (Pecam1 for endothelial cells; Rgs5, Pdgfrb, and Cspg4 for pericytes). Lbh was expressed in pericytes of the bladder, heart and aorta, kidney, and trachea, but not as specifically as in penile pericytes (Supplementary Fig. 6A-D). However, it is worth noting that other known pericyte markers also did not exhibit exclusive expression in pericytes across all the tissues we analyzed. Therefore, in certain tissues, particularly in mouse penile tissues, Lbh may be a valuable marker in conjunction with other established pericyte marker genes for distinguishing pericytes.

      2) Cell-cell communication and regulon activity changes in the diabetic penis: The authors present cell-cell communication analysis and TF regulon analysis in Fig 3 and report differential activities in healthy and DM mice. These results are certainly interesting, however, no statistical analyses are performed to justify claimed changes in the disease state and no validations are performed. It is therefore challenging to interpret these results, and the relevant claims do not seem well supported.

      Answer: In response to these helpful suggestions, we calculated statistical significance and performed experimental validation. CellphoneDB permutes the cluster labels of all cells 1000 times and, at each permutation, calculates mean(mean(molecule 1 in cluster X), mean(molecule 2 in cluster Y)) for each interaction pair and for each pairwise comparison between two cell types. We only considered interactions in which the difference in the means calculated by these permutations was greater than 0.25-fold between diabetes and normal. We also considered interactions with P-value < 0.05 to be significant.
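
      The label-permutation scheme described above can be illustrated with a minimal sketch. This is not the CellphoneDB implementation: the function names, the toy expression values, and the one-sided empirical P-value are assumptions made only to show the logic of shuffling cluster labels and recomputing the mean of the two cluster means.

```python
import random
from statistics import mean


def interaction_score(expr_1, expr_2, labels, cluster_x, cluster_y):
    """mean(mean(molecule 1 in cluster X), mean(molecule 2 in cluster Y))."""
    in_x = [e for e, lab in zip(expr_1, labels) if lab == cluster_x]
    in_y = [e for e, lab in zip(expr_2, labels) if lab == cluster_y]
    return mean([mean(in_x), mean(in_y)])


def permutation_p_value(expr_1, expr_2, labels, cluster_x, cluster_y, n_perm=1000, seed=0):
    """Empirical P-value: fraction of label shuffles scoring at least the observed value."""
    rng = random.Random(seed)
    observed = interaction_score(expr_1, expr_2, labels, cluster_x, cluster_y)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # cluster sizes are preserved, only cell assignment changes
        if interaction_score(expr_1, expr_2, shuffled, cluster_x, cluster_y) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical toy data: ligand enriched in pericytes (PC), receptor enriched in endothelial cells (EC).
labels = ["PC"] * 10 + ["EC"] * 10 + ["FB"] * 10
ligand = [5.0] * 10 + [0.0] * 20
receptor = [0.0] * 10 + [5.0] * 10 + [0.0] * 10
observed, p_value = permutation_p_value(ligand, receptor, labels, "PC", "EC")
```

      Running the same procedure separately on diabetic and normal cells would give the two interaction means that are then compared against the fold-difference threshold.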

      To assess differential regulon activities of transcription factors (SCENIC) between diabetic and normal pericytes, we utilized a generalized linear model with scaled activity scores for each cell as input. These scaled regulon activity values for angiogenesis-related TFs exhibited differences between diabetic and normal pericytes. The results of the generalized linear model revealed that Klf5, Egr1, and Junb were TFs with significantly altered regulon activities in diabetic pericytes. Experimental data indicated that expression was higher for Hoxd10 and lower for Lmo2, Junb, and Elk1 in diabetic pericytes compared to normal pericytes (Supplementary Fig. 9). We have added the scaled regulon activity values and statistical significance in Fig. 3E.
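
      A minimal sketch of this kind of test is given below, assuming a Gaussian generalized linear model with an identity link and a single binary covariate (diabetic versus normal). Under that assumption the fitted coefficient equals the difference in group means of the scaled activity, and a Wald z-test gives an approximate P-value. The model family, the function names, and the toy activity values are illustrative assumptions rather than the analysis actually run.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def scale(values):
    """Z-score the activity values across all cells."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def gaussian_glm_group_effect(activity, is_diabetic):
    """Fit scaled_activity ~ intercept + beta * is_diabetic and return (beta, P-value)."""
    y = scale(activity)
    diabetic = [v for v, d in zip(y, is_diabetic) if d]
    normal = [v for v, d in zip(y, is_diabetic) if not d]
    mean_d, mean_n = mean(diabetic), mean(normal)
    beta = mean_d - mean_n  # closed-form estimate for one binary covariate
    rss = sum((v - mean_d) ** 2 for v in diabetic) + sum((v - mean_n) ** 2 for v in normal)
    sigma2 = rss / (len(y) - 2)  # residual variance with two fitted parameters
    se = sqrt(sigma2 * (1 / len(diabetic) + 1 / len(normal)))
    z = beta / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided Wald test
    return beta, p_value


# Hypothetical regulon activity scores for one TF in ten pericytes.
activity = [0.10, 0.20, 0.15, 0.12, 0.18, 0.50, 0.60, 0.55, 0.52, 0.58]
is_diabetic = [False] * 5 + [True] * 5
beta, p_value = gaussian_glm_group_effect(activity, is_diabetic)
```

      With many TFs tested, the resulting P-values would normally be corrected for multiple comparisons before calling a regulon significantly altered.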

      3) Rescue of ED by Lbh overexpression: This is a striking and very interesting result that warrants attention. By simple overexpression of the pericyte marker gene Lbh, the authors report rescue of erectile function in diabetic animals. While mechanistic details are lacking, the phenomenon appears to have a large effect size and the experiments appear sophisticated and well conducted. If anything, the authors appear to underplay the magnitude of this result.

      Answer: We appreciate this comment. Therefore, we have added relevant clarification in the revised manuscript discussion section to emphasize the importance of LBH overexpression on rescuing ED as follows: “To test our hypothesis, we utilized the diabetes-induced ED mouse model, commonly employed in various studies focusing on microvascular complications associated with type 1 diabetes. We observed that the overexpression of LBH in diabetic mice led to the restoration of reduced erectile function by enhancing neurovascular regeneration. However, this study primarily demonstrated the observed phenomenon without delving into the detailed mechanisms. Nonetheless, these results of LBH on erections provide us with new strategies for treating ED and should be of considerable concern.” (Please see revised ‘Discussion’)

      4) Mechanistic claims for rescue of ED by Lbh overexpression: The authors claim that cell type-specific effects on MPCs are responsible for the rescue of erectile function induced by Lbh overexpression. This causal claim is unsupported by the data, which only show that Lbh overexpression influences MPC performance. In vivo, it's likely that Lbh is being overexpressed by diverse cell types, any of which could be the causal driver of ED rescue. In fact, the authors report rescue of cell type abundance in endothelial cells and neuronal cells. Therefore, it cannot be concluded that MPC effects alone or in principle are responsible for ED rescue.

      Answer: We agree with these claims. Therefore, we have added relevant clarifications in the discussion section of the revised manuscript. Our findings suggest that LBH can affect the function of cavernous pericytes, although we cannot definitively specify which particular cavernous cell types are affected by the overexpressed LBH, whether it be cavernous endothelial cells, smooth muscle cells, or others. Subsequent research will be required to conduct more comprehensive mechanistic investigations, such as in vitro studies using cavernous endothelial cells, smooth muscle cells, and fibroblasts to address these knowledge gaps. (Please see revised ‘Discussion’)

      5) Protein interaction data: The authors claim that CRYAB and VIM1 are novel interacting partners of LBH. However, the evidence presented (2 blots in Fig. 6A,B) lacks the relevant controls. It is possible that CRYAB and VIM1 are cross-reactive with the anti-LBH antibody or were not washed out completely. The abundance of bands on the Coomassie stain in Fig. 6A suggests that either event is plausible. Therefore, the evidence presented is insufficient to support the claim that CRYAB and VIM1 are protein interacting partners of LBH.

      Answer: We agree with these claims. Therefore, we have added the relevant controls (Input) and performed Co-IP (IP: CRYAB or VIM, WB: LBH) to demonstrate that CRYAB and VIM1 are not simply cross-reactive antigens to our LBH antibody. Our results show that we can detect the expression of CRYAB and VIM after LBH IP, and we also detect the expression of LBH after CRYAB and VIM IP. In addition, our results show that the binding of LBH to VIM is higher than its binding to CRYAB. Regardless, these results indicate that the binding of CRYAB or VIM to LBH is not a random phenomenon. (Please see revised ‘Result’ and ‘Figure 6B’)

      Impact: These data will trigger interest in Lbh as a target gene within the erectile dysfunction community.

      Reviewer #3 (Public Review):

      Bae et al. described the key roles of pericytes in cavernous tissues in diabetic erectile dysfunction using both mouse and human single-cell transcriptomic analysis. Erectile dysfunction (ED) is caused by dysfunction of the cavernous tissue and affects a significant proportion of men aged 40-70. The most common treatment for ED is phosphodiesterase 5 inhibitors; however, these are less effective in patients with diabetic ED. Therefore, there is an unmet need for a better understanding of the cavernous microenvironment, cell-cell communications in patients with diabetic ED, and the development of new therapeutic treatments to improve the quality of life.

      Pericytes are mesenchymal-derived mural cells that directly interact with capillary endothelial cells (ECs). They play a vital role in the pathogenesis of erectile function as their interactions with ECs are essential for penile erection. Loss of pericytes has been associated with diabetic retinopathy, cancer, and Alzheimer's disease and has been investigated in relation to the permeability of cavernous blood vessels and neurovascular regeneration in the authors' previous studies. This manuscript explores the mechanisms underlying the effect of diabetes on pericyte dysfunction in ED. Additionally, the cellular landscape of cavernous tissues and cell type-specific transcriptional changes were carefully examined using both mouse and human single-cell RNA sequencing in diabetic ED. The novelty of this work lies in the identification of a newly identified pericyte (PC)-specific marker, LBH, in mouse and human cavernous tissues, which distinguishes pericytes from smooth muscle cells. LBH not only serves as a cavernous pericyte marker, but its expression level is also reduced in diabetic conditions. The LBH-interacting proteins (Cryab and Vim) were further identified in mouse cavernous pericytes, indicating that these signaling interactions are critical for maintaining normal pericyte function. Overall, this study demonstrates the novel marker of pericytes and highlights the critical role of pericytes in diabetic ED.

      Reviewer #1 (Recommendations For The Authors):

      1) The methods are poorly written. It lacks specific information on the sample size, experimental design, and data analysis methods employed. The absence of these crucial details makes it difficult to evaluate the robustness and reliability of the findings.

      Answer: We agree with the reviewer’s suggestion. We have now revised the methods of our manuscript and added detailed information or references. For sample size, we have added detailed information in the Figure legend (Please see revised ‘Method’, Figure Legend, and Supplementary information.)

      2) The cell number in the scRNA-seq analysis is small (~12000) and some minor cell types are probably underrepresented. It is not clear whether the authors pooled the cells from different mice as one sample, or replicates in different groups have been included. It will be helpful to label different samples in the UMAP. The authors should repeat the experiments with more replicates to increase the cell number and validate the findings.

      Answer: We understand the reviewer's concern, but due to the small size of mouse penile tissue, we had to pool 5 corpus cavernosum tissues for each group (using pooled samples) for scRNA-seq analysis. Moreover, the unique, highly resistant nature of mouse penile tissue posed challenges for the dissolution and isolation of single cells using conventional single-cell separation methods. Consequently, we had to increase the concentration of the enzyme to finally obtain 12,894 cells. Rather than conducting a repetitive scRNAseq analysis on the same mouse model, we validated our findings in human cavernous single-cell transcriptome data. This analysis allowed us to confirm the presence of pericytes in human corpus cavernosum, specific expression of LBH in human cavernous pericytes, and the identification of relevant GO terms associated with pericyte functions (Figure 5). We have added this information in ‘Method’ (Please see revised ‘Method’).

      3) Functional studies are lacking to justify how manipulating LBH expression or its interacting proteins might lead to effective therapeutic approaches for diabetic ED.

      Answer: We performed a functional study to evaluate whether LBH expression might lead to effective therapeutic approaches for diabetic ED, as shown in Figure 4G. Assessment of intracavernous pressure (ICP) is the most representative test for evaluating erectile function. Therefore, we modulated LBH expression in the penis of diabetic mice and assessed the erectile function of the mice by intracavernous pressure. However, we have not performed ICP studies or related in vitro studies (migration, survival experiments) to assess whether LBH-interacting proteins have the same effect.

      4) Although the abstract identifies novel targets for potential interventions, such as LBH and its interacting proteins, the clinical relevance of these findings remains uncertain. The authors should include a discussion regarding the translation of these discoveries into therapeutic strategies or their potential impact on patients with diabetes and ED.

      Answer: We appreciate the reviewer's suggestion and have added a discussion as per the reviewer’s recommendation (Please see revised ‘Discussion’).

      5) While the study highlights the importance of pericytes in penile erection, it fails to mention the broader context of other cell types involved in the pathogenesis of ED. Neglecting to discuss potential contributions from endothelial cells, smooth muscle cells, or neural elements limits the comprehensive understanding of the cellular interactions underlying diabetic ED.

      Answer: We agree with the reviewer's suggestion and have added a discussion regarding the significance of other cell populations in penile tissues, such as endothelial cells, smooth muscle cells, fibroblasts, and neural elements, along with the rationale for our focus on pericytes. (Please see revised ‘Discussion’).

      Reviewer #2 (Recommendations For The Authors):

      We congratulate the authors on an interesting study. We were especially excited to see their Lbh overexpression results. However, we felt other claims in the paper could benefit from additional investigation, analysis, and statistical rigor. We have provided a set of suggestions for improvement below.

      Major points:

      1) Pericyte marker gene proposal: See public review for commentary on the following suggested experiments. The authors should perform binary classification analysis using Lbh and report the performance of this gene as a marker (e.g. using the area under the receiver operating characteristic, accuracy, precision and recall). Further, they should consider performing this analysis for all other genes in their data to determine whether Lbh is the best marker gene.

      Answer: We appreciate this comment. Using the FindMarkers function, we measured AUC scores for Rgs5, Pln, Ednra, Npy1r, Atp1b2, and Gpc3, reflecting the ability of a binary classifier to distinguish pericytes from the other cell types in mouse penile tissues. Rgs5 had the highest AUC, but Rgs5 was also expressed in SMCs in our data. Pln, Ednra, Gpc3, and Npy1r also seemed to be candidate markers, but a literature search excluded these genes because they are also expressed in the SMCs of other tissues or in different cell types. The AUC score of Lbh was over 0.7, its expression in SMCs has not been reported in previous studies, and we ultimately confirmed experimentally that Lbh is specific to penile pericytes. We have added this to the manuscript.
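
      To make the marker AUC concrete, the sketch below scores a single gene as a binary classifier of pericyte versus non-pericyte cells. It uses the rank-based definition of AUC: the probability that a randomly chosen pericyte expresses the gene more highly than a randomly chosen other cell, with ties counted as one half. It is a plain illustration with hypothetical values and a hypothetical function name, not the FindMarkers implementation.

```python
def marker_auc(expression, is_pericyte):
    """AUC of one gene's expression for separating pericytes from all other cells."""
    positives = [e for e, p in zip(expression, is_pericyte) if p]
    negatives = [e for e, p in zip(expression, is_pericyte) if not p]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical expression of a candidate marker in four cells (two pericytes, two other cells).
expression = [2.0, 0.0, 1.0, 0.0]
is_pericyte = [True, True, False, False]
auc = marker_auc(expression, is_pericyte)
```

      An AUC of 0.5 means the gene carries no information about pericyte identity, and 1.0 means perfect separation. Because scRNA-seq dropout produces many tied zeros, even a genuinely specific marker can score well below 1.0.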

      Author response table 1.

      Robust differential expression analysis should also be performed for this gene (if not all) and the statistics should be reported, given known issues with the statistical approach used by the authors for differential expression (see: Squair 2021, 10.1038/s41467-021-25960-2). The authors' should also report the number of cells involved in these comparisons, as the number of pericytes in the data (Fig 1B) appears quite small.

      Answer: We appreciate this comment. We used “MAST” to identify differentially expressed genes. This test is often used to find DEGs in single-cell RNA data. However, because the pseudobulk method has advantages over the single cell DEG method (Squair 2021, 10.1038/s41467-021-25960-2), we additionally performed DEG analysis with DESeq2 to confirm whether Lbh can distinguish pericytes from other cell types in penile tissue. Even when tested with DESeq2, Lbh expression was significantly higher in pericytes than in other cell types in penile tissue (adjusted P-value = 2.694475e-07 in Pericyte vs SMC, adjusted P-value = 3.700118e-58 in Pericyte vs the other cell types). Mouse penile tissue is small in size, and pericytes in mouse penile tissue are less numerous than fibroblasts and chondrocytes. In our mouse penile scRNAseq data, the number of pericytes is as follows: normal: 58, diabetes: 116. Despite the limited number of cells, we were able to establish statistical significance in our analyses.

      Immunostaining results in Fig. 2D, E should likewise be quantified. At present, it's unclear that LBH and aSMA are mutually exclusive as claimed. The authors should also investigate Lbh expression in public single cell genomics data, rather than performing candidate gene literature searches. For example, the Tabula Muris suggests Lbh is expressed widely outside pericytes.

      Answer: For Figure 2D and E, the aim of these analyses was to assess the distribution of LBH and other cellular markers to see whether they overlap and whether they can be distinguished. We think that some of the overlapping staining in the tissue may be caused by multilayered cellular structures, so staining within cells would be more convincing. Therefore, we quantified the percentage of LBH- or α-SMA-expressing pericytes and the relative expression in smooth muscle cells in cell staining (Supplementary Fig. 5E). We found that only 3% of smooth muscle cells expressed LBH, 67% of mouse cavernous pericytes (MCPs) expressed α-SMA, and more than 97% of MCPs expressed LBH. Therefore, these results may illustrate the specific expression of LBH in MCPs. This information was added as ‘Supplementary Fig. 5E’ (Please see revised ‘Supplementary information’). We also examined Lbh expression patterns in various mouse tissues using the public mouse single-cell atlas (Tabula Muris), and provided a detailed response in reviewer 2’s public review 1.

      Even if Lbh is not the best marker, the authors' intervention experiment still motivates study of the gene, but these analyses would help contextualize the result for readers.

      2) Statistical analyses for cell-cell communication and TF regulon analysis: See public review for context on these comments. The authors should perform statistical tests to evaluate the significance of differences detected for each of these analyses. For example, generalized linear models can be used to assess the significance of TF regulon activity scores from SCENIC, and permutation tests can be used to measure the significance of cell-cell interaction score changes. Without these statistical tests, it's challenging for a reader to interpret whether the results reported are meaningful or within the realm of experimental noise.

      Answer: We appreciate this comment. We calculated statistical significance for the TF regulon analyses as suggested by the reviewer and described a detailed statistical calculation method for cell-cell communication. We provided a detailed response in reviewer 2’s public review 2.

      3) Mechanism of ED rescue by Lbh overexpression: To support this claim, the authors would need to perform an experiment where Lbh is over expressed specifically in MPCs (using e.g. a specific promoter on their LTV construct, or a transgenic line with a cell type-specific Cre-Lox system). Absent these data, the claim should be removed.

      Answer: We agree with the reviewer's suggestion. We have reworked the claim that ‘LBH overexpression is affected by pericytes during ED recovery’ and have added relevant clarification in the Discussion section to clearly state that LBH overexpression may affect many cavernosum cells, such as cavernous endothelial cells, smooth muscle cells, fibroblasts, and pericytes (Please see revised ‘Result’ and ‘Discussion’).

      4) Protein interaction claims: This experiment would require that the authors perform a similar pull-down with LBH KO cells and or a reciprocal Co-IP (e.g. IP: CRYAB or VIM1, WB: LBH) to demonstrate CRYAB and VIM1 are not simply cross-reactive antigens to their LBH antibody. Further, these experiments appear to only have a single replicate for each condition. The authors should either remove associated claims, or perform a Co-IP experiment with the relevant controls with sufficient replication.

      Answer: We agree with the claims. Therefore, we have included the necessary controls (Input) and performed Co-IP (IP: CRYAB or VIM1, WB: LBH) to demonstrate that CRYAB and VIM1 are not simply cross-reactive antigens to our LBH antibody. Our results show that we can detect the expression of CRYAB and VIM after LBH IP, and we also detect the expression of LBH after CRYAB and VIM IP. In addition, our results show that the binding of LBH to VIM is higher than its binding to CRYAB. Regardless, these results indicate that the binding of CRYAB or VIM to LBH is not a random phenomenon. Additionally, all IP experiments were replicated at least three times. (Please see revised ‘Result’ and ‘Figure 6B’)

      Minor Points:

      • The reference "especially in men" on line 56 seems odd given that only males can experience penile erectile dysfunction.

      Answer: We agree with the reviewer's suggestion and have removed the description 'especially male' (Please see revised ‘Introduction’)

      • Line 109, it's unclear what genes showed altered expression in Schwann cells.

      Answer: We apologize for the confusion. There were no significant differentially expressed genes between normal and diabetic conditions in Schwann cells. We revised this part in the manuscript. (Schwann cells showed an increased expression compared to normal cells in diabetes, though not significant. In Schwann cells, there were no significant DEGs between diabetic and normal cells.)

      • It would be helpful for readers to see an analysis of the cell types that are transduced in the Lbh overexpression experiment in vivo. At present, some pericyte specificity is implied, but not demonstrated.

      Answer: We appreciate this comment. Our findings suggest that LBH can affect the function of cavernous pericytes, although we cannot definitively conclude which specific cavernous cell types are affected by the overexpressed LBH, whether cavernous endothelial cells, smooth muscle cells, or others. Subsequent research will be required to conduct more comprehensive mechanistic investigations, such as in vitro studies using cavernous endothelial cells, smooth muscle cells, and fibroblasts, to address these knowledge gaps. These points are also mentioned in the manuscript.

      • To improve clarity and enhance readability, define abbreviations before their initial usage in the text. For instance, in the second paragraph of the Introduction, the abbreviation 'ECs' is used without prior definition. It can be inferred that it is referring to endothelial cells, mentioned in parentheses in the subsequent sentence.

      Answer: We agree with the reviewer's suggestion and have ensured that all abbreviations are defined before their first use in the revised manuscript (Please see revised Manuscript).

      • It is important to include relevant references that align with the content being discussed. For example, in the Introduction, pericytes are described as being involved in various processes such as angiogenesis, vasoconstriction, and permeability. The text refers to a single reference, a review by Gerhardt and Betsholtz, which primarily focuses on the role of pericytes in regulating angiogenesis. Adding additional sources, such as the review by Bergers and Song (Neuro Oncol., 2005), is recommended.

      Answer: We agree with the reviewer's suggestion and have added the reference as the reviewer recommended (Please see revised Manuscript and reference).

      • Figure 3E: it is stated that a panel of 53 angiogenesis factors was tested and that only MMP3 showed increased expression. However, various unlabeled spots appear to show changed expression patterns. It would be helpful to show a summary graph with the relative intensities of the full array of factors tested.

      Answer: We agree with the reviewer’s suggestion. We now show the density of all spots in the angiogenesis array in Supplementary Table 1. Spots were selected if their expression density was above 1500 and their change ratio was greater than 1.2 (Please see revised ‘Supplementary information’).
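
      The selection rule described above can be sketched as a simple filter. This is a minimal illustration only: the `Spot` record, the field names, the example values, and the choice to apply the density threshold to the larger of the two densities are all assumptions, not details taken from the array analysis itself.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """One array spot (hypothetical record layout)."""
    name: str
    control_density: float
    treated_density: float


def select_spots(spots, min_density=1500.0, min_ratio=1.2):
    """Return the names of spots that pass both thresholds.

    Assumptions: the density threshold is applied to the larger of the
    two densities, and the change ratio is treated / control.
    """
    selected = []
    for spot in spots:
        if spot.control_density <= 0:
            continue  # ratio undefined; skip rather than guess
        ratio = spot.treated_density / spot.control_density
        peak = max(spot.control_density, spot.treated_density)
        if peak > min_density and ratio > min_ratio:
            selected.append(spot.name)
    return selected


# Hypothetical values, for illustration only.
example_spots = [
    Spot("spot_a", 2000.0, 3000.0),  # density above threshold, ratio 1.5
    Spot("spot_b", 900.0, 1400.0),   # ratio high, density too low
    Spot("spot_c", 4000.0, 4200.0),  # density high, ratio 1.05
]
print(select_spots(example_spots))
```

      With these made-up values only the first spot passes both criteria; a spot with a large fold change but low density is excluded, as is a bright spot that barely changes.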

      Reviewer #3 (Recommendations For The Authors):

      Detailed statistical power calculation

      Data availability statement (were both mouse and human scRNA data deposited in GEO with a token, and when will they be released to the public?)

      Answer: Human scRNA data have been deposited in GEO under accession number GSE206528. Our mouse scRNA dataset has been uploaded to KoNA and is available for download (https://www.kobic.re.kr/kona/review?encrypt_url=amlod2FucGFya3xLQUQyMzAxMDEz)

      Major concerns about this work

      1) In the single-cell RNAseq data collected for mouse diabetic ED (Fig 1B), FB are the most abundant cell population compared to PC, EC, SMC and other clusters. The rationale for studying FB clusters (in Figure 1, D-F) instead of the PC cluster is unclear. Which cluster's DEGs did the authors annotate for Fig 1G-H?

      Answer: We understand the reviewer's concern. Although other major cell populations in penile tissue, such as smooth muscle cells, endothelial cells, and fibroblasts, have been extensively studied, pericytes have mainly been investigated in the context of the central nervous system (CNS). For example, in the CNS, pericytes are involved in maintaining the integrity of the blood-brain barrier (BBB) [PMID: 27916653], regulating blood flow at capillary junctions [PMID: 33051294], and promoting neuroinflammatory processes [PMID: 31316352], and their dysfunction is considered an important factor in the progression of vascular diseases such as Alzheimer's disease [PMID: 24946075]. However, little is known about the role of pericytes in penile tissue [PMID: 35865945; PMID: 36009395; PMID: 26044953]. To explore the role of pericytes in repairing the corpus cavernosum vascular and neural tissues damaged by DM, we focused on pericytes, which are multipotent perivascular cells that contribute to the generation and repair of various tissues in response to injury. Although recent studies have shown that pericytes are involved in the physiological mechanisms of erection, little is known about the details. We have also added this rationale to the Discussion.

      No single-cell-level study has previously been conducted in mouse penile tissues. Therefore, before delving into pericytes, we aimed to identify overall transcriptome differences between normal and diabetic conditions in mouse penile tissues. We presented the analyses of FB, which make up the largest proportion of the cell types in the mouse penis, in Fig. 1D-F. The analysis of other cell types is provided in Supplementary Fig. 1-4. Fig. 1G-H show GO terms for the fibroblast clusters. We added this information to the figure.

      2) Fig 2 is the critical data to show that Lbh is a cavernous PC-specific marker. More PC violin plots to identify the PC cluster, such as Cspg4, Kcnj8, Higd1b, Cox4i2, and more SMC violin plots to identify the SMC cluster, such as Acta2, Myh11, Tagln, Actg2, should be used for inclusion and exclusion of PC (the same concern applies to human scRNAseq in Fig 5B).

      Answer: We appreciate this comment. We examined the expression of other marker genes of pericytes and SMCs. Although some marker genes were rarely expressed in the mouse penis data (Kcnj8, Higd1b), the expression of marker genes tended to be relatively high in each cluster. The expression of Cspg4 and Cox4i2 was higher in pericytes than in SMCs, while the expression of Acta2, Myh11, and Tagln was higher in SMCs than in pericytes. Actg2 was specifically expressed in SMCs. Through the gene set enrichment test as well as the expression of known cell type marker genes, we confirmed that the annotation of pericytes and SMCs was appropriate (Fig. 2B and Fig. 5C). We added the violin plots of these marker genes in Supplementary Fig. 5.

      Author response image 1.

      (Mouse)

      In human penis data, ACTA2 and MYH11 were expressed in SMCs, pericytes, and myofibroblasts, as in the previous paper [PMID: 35879305]. Among pericyte markers, the number of cells expressing KCNJ8 and HIGD1B was small. The cluster we annotated as pericyte was double positive for pericyte markers CSPG4 and COX4I2. ACTG2, a marker for SMC, was expressed more highly in SMC than in pericytes and myofibroblasts. As in the mouse penis data, we identified that the annotation of each cell type was appropriate through the gene set enrichment test in the human penis data. We added the violin plots of CSPG4, COX4I2, and ACTG2 in Supplementary Fig. 11.

      Author response image 2.

      (Human)

      When exploring Lbh expression levels in "Database of gene expression in adult mouse brain and lung vascular and perivascular cells" from https://betsholtzlab.org/VascularSingleCells/database.html, Lbh is not uniquely expressed in PC, suggesting its tissue-specific expression level. This difference should be discussed in the Discussion section.

      Answer: We appreciate this valuable comment. For the answer to this comment, we extensively analyzed Lbh expression patterns in various mouse tissues using the public mouse single-cell atlas (Tabula Muris) as also suggested by Reviewer 2. Please see our detailed response in reviewer 2’s public review 1.

      3) In prior studies on PC morphology and location (PMID: 21839917), they reside in capillaries (diameter less than 10um) or distal vessels (diameter less than 25um) and have oval cell body and long processes. Due to the non-specificity of Pdgfrb, SMC are positive for Pdgfrb staining (this has been shown in many publications that SMC are Pdgfrb+; unfortunately, NG2 antibody also stains for both PC and SMC). Therefore, the LBH immunostaining (in Fig 2D and 2E of large-sized vessels) are very likely for SMC identity, not PC. PC should be in close contact with CD31+ ECs in healthy conditions. The LBH immunostaining of PC in both mouse and human tissues (Fig 4) must be replaced and better characterized.

      Answer: We agree with the reviewer's suggestion. As is widely known, pericytes are primarily located in capillaries, where they surround the endothelial cells of blood vessels. However, recent discoveries have identified cells with pericyte-like characteristics in the walls of large blood vessels, challenging the traditional concept [PMID: 27268036]. In our study, we observed minimal overlap in staining between LBH and α-SMA, suggesting that the cells expressing LBH were not smooth muscle cells but possibly pericyte-like cells in large vessels. In small vessels within the bladder, kidney, and even the aorta, we found LBH-expressing cells surrounding CD31-expressing vessels, consistent with the known characteristics of pericytes. Further research is needed to understand the differences in LBH expression and its characteristics in large and small blood vessels. We have added discussion and references on this issue (Please see revised ‘Discussion’ and ‘Reference’).

      4) How were the mouse cavernous pericytes isolated? What was their purity?

      Answer: We isolated mouse cavernous pericytes following our own and other previously published methods. We used pigment epithelium-derived factor (PEDF), which removes non-pericytic cells [PMID: 30929324, 23493068]. Although we have no purity data such as FACS results, other staining results support the notion that this method yields pericytes of high purity (Please see ‘Method’ section).

      5) Can the mouse scRNAseq cell-cell communication results in Fig 3 be reproduced in the human scRNAseq data? The results in human ED are more clinically significant than the mouse data.

      Answer: In the human scRNAseq data, the difference in angiogenesis-related interactions between normal and diabetic conditions was not as significant as in the mouse data. Because the cell type composition of the human and mouse penis is not completely identical, there are limitations in comparing cell-cell interactions. However, in the human penis data, some angiogenesis-related interactions between pericytes and other cell types were decreased in diabetes compared to normal (boxed parts).

      Author response image 3.

      6) Fibroblasts also express Vim. The direct interaction of murine PC VIM/CRYAB (which should be written as Vim/Cryab for mouse proteins) with Lbh is unclear from the Lbh IP, as the red boxes in Fig 6A cover a wide range of sizes. Where is the band for Lbh? Does human PC LBH interact with VIM/CRYAB?

      Answer: We agree with the reviewer's comment. VIM is a type III intermediate filament protein expressed in many cell types. We have added the relevant controls (Input) and performed Co-IP (IP: CRYAB or VIM, WB: LBH) to demonstrate that CRYAB and VIM are not simply cross-reactive antigens to the LBH antibody. In the western blot, the LBH band was detected between 35 kDa and 48 kDa. In Figure 6A, we detected CRYAB in band 1 and VIM in bands 2 and 3, which may be due to the formation of dimers or multimers by VIM. We did not use human PCs for the IP studies because IP requires large amounts of protein, which makes IP studies using human pericytes challenging. Nevertheless, the interaction between LBH and CRYAB in humans has been reported using a fluorescence resonance energy transfer assay and an affinity chromatography assay [PMID:34000384, PMID:20587334].

      7) In Fig 6H and I, why does CRYAB expression significantly reduce in vitro and in vivo under diabetic conditions, whereas VIM expression significantly increases?

      Answer: As the reviewer pointed out, and as we have discussed in the manuscript, CRYAB is known to promote angiogenesis. Diabetes reduces CRYAB expression, so angiogenesis may be impaired. Furthermore, because VIM is a multifunctional protein, it interacts with several other proteins with multiple functions under various pathophysiological conditions. Several studies show that VIM expression is increased under diabetic conditions [PMID: 28348116 and PMID: 32557212], and VIM deficiency protects against obesity and insulin resistance in patients with type 2 diabetes. Therefore, we hypothesize that exogenous LBH may bind to the increased VIM in diabetic conditions and inactivate its effects, thereby achieving a protective effect. This needs to be proved in further studies.

      8) The therapeutic strategies targeting Lbh-Cryab-Vim in the mouse diabetic ED model are not investigated and need to be further validated and discussed.

      Answer: As the reviewer pointed out, we did not evaluate a therapeutic strategy targeting LBH-CRYAB-VIM in a mouse diabetic ED model in this study. We only identified the binding potential of these three proteins. Evaluating this treatment strategy requires further study. For example, one could use shRNA lentivirus, either alone or in combination, to downregulate CRYAB expression [PMID: 31612679] in normal mice, use the lentiviral vector CMV-GFP-puro-vimentin to overexpress vimentin [PMID: 36912679], and then treat with LBH to evaluate whether the LBH effect persists (in vivo erectile function study and in vitro angiogenesis assay). We have included this information in the Discussion section as a limitation of this study (Please see revised ‘Discussion’).

      9) The Discussion of current knowledge of pericytes in diabetic ED and other diseases and the significance of this study as well as clinical implications, should be expanded.

      Answer: As the reviewers pointed out, we have expanded the current knowledge of pericytes in diabetic ED and other diseases (CNS disease) and clinical implications as follows: “Although other major cell populations in penile tissue such as smooth muscle cells, endothelial cell, and fibroblasts have been extensively studied, pericytes have mainly been investigated in the context of the central nervous system (CNS). For example, in the CNS, pericytes are involved in maintaining the integrity of the brain's blood-brain barrier (BBB), regulating blood flow at capillary junctions, and promoting neuroinflammatory processes, whose dysfunction is considered an important factor in the progression of vascular diseases such as Alzheimer's disease. But little is known about the role of pericytes in penile tissue.” (Please see revised ‘Discussion’).

      10) How many clinical samples were used? How many times did each experiment repeat?

      Answer: As the reviewer pointed out, information on the clinical samples has been added to the ‘method’ section. A total of four human samples were used in this study (‘human corpus cavernosum tissues were obtained from two patients with congenital penile curvature (59-year-old and 47-year-old) who had normal erectile function during reconstructive penile surgery and two patients with diabetic ED (69-year-old and 56-year-old) during penile prosthesis implantation.’). For in vivo study, we quantified four different fields from human samples.

      Minor concerns

      1) Fig 1A: why is the normal mouse's body size the same as that of the DM mouse?

      Answer: As the reviewer pointed out, the normal and DM mice may not appear significantly different in size in Figure 1A; however, there are notable differences in body weight and size. The body weight of the normal mice we used was about 30 grams, while that of the DM mice was generally less than 24 grams. We found that we had omitted information on physiological and metabolic parameters from the in vivo studies (ICP function study). Therefore, we have added it in Supplementary Table 2 (Please see revised ‘Supplementary information’).

      2) The label and negative, and positive controls for Fig 6B are missing.

      Answer: We thank the reviewer for pointing this out. We have added the relevant controls (Input) and performed Co-IP (IP: CRYAB or VIM1, WB: LBH) to demonstrate that CRYAB and VIM1 are not simply cross-reactive antigens to the LBH antibody, and all IP experiments were replicated at least three times (Please see revised ‘Result’ and ‘Figure 6B’).

      3) The limitation of this study and future work should be discussed.

      Answer: As the reviewer pointed out, we have added the limitation of this study and future direction in the discussion section (Please see revised ‘Discussion’).

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER 1

      The claim that olivooid-type feeding was most likely a prerequisite transitional form to jet-propelled swimming needs much more support or needs to be tailored to olivooids. This suggests that such behavior is absent (or must be convergent) before olivooids, which is at odds with the increasing quantities of pelagic life (whose modes of swimming are admittedly unconstrained) documented from Cambrian and Neoproterozoic deposits. Even among just medusozoans, ancestral state reconstruction suggests that they would have been swimming during the Neoproterozoic (Kayal et al., 2018; BMC Evolutionary Biology) with no knowledge of the mechanics due to absent preservation.

      Thanks for your suggestions. Yes, we agree with you that ancestral swimming medusae may have appeared before the early Cambrian, even in the Neoproterozoic. However, discussion of the affinities of Ediacaran cnidarians is severely limited by the lack of information concerning their soft anatomy, so it is hard to infer the swimming mechanics given the absence of preservation. Olivooids from the basal Cambrian Kuanchuanpu Formation can be reasonably considered cnidarians based on their radial symmetry, external features, and especially their internal anatomy (Bengtson and Yue 1997; Dong et al. 2013; 2016; Han et al. 2013; 2016; Liu et al. 2014; Wang et al. 2017; 2020; 2022). The simulation experiment here is based on the soft tissue preserved in olivooids.

      While the lack of ambient flow made these simulations computationally easier, these organisms likely did not live in stagnant waters even within the benthic boundary layer. The absence of ambient unidirectional laminar current or oscillating current (such as would be found naturally) biases the results.

      Many thanks for your suggestion concerning the lack of ambient flow in the simulations. We revised the section “Perspectives for future work and improvements” (lines 381-392 in our revised version of manuscript). Conducting the simulations without ambient flow reduces the computational cost and, of course, makes the simulations easier, whereas adding ambient flow can lead to poorer convergence and more technical issues. We also strongly agree that these (benthic) organisms did not live in stagnant waters, as discussed in Liu et al. 2022. However, reducing computational complexity is not the main reason that ambient flow was not incorporated in the simulations. As we discussed in the section “Perspectives for future work and improvements”, our work focuses on the theoretical effect of the polyp's dynamics (based on fossil observation and hypothesis) on the ambient environment (i.e., how fast the organism inhales water from the ambient environment) rather than the effect of ambient flow on the organism (e.g., drag forces), which is what previous palaeontological CFD simulations mainly focused on, based on fossil morphology and hydrodynamics. To this end, we are mainly concerned with the flow velocity above or near the peridermal aperture (and the vorticity computed in this paper) generated only by the polyp’s own dynamics, without the interference of ambient flow (as in many CFD simulations of modern jellyfish, e.g., McHenry & Jed 2003; Gemmell et al. 2013; Sahin et al. 2009, all of which were conducted under hydrostatic conditions). Adding ambient flow to our simulations would “bias” the flow velocity profiles we expect to obtain in this case.

      Nevertheless, we do agree that ambient unidirectional laminar or oscillating currents play an important role in the feeding and respiration behavior of Quadrapyrgites. Further investigation would require designing a new set of simulations and is beyond the scope of this work. In previous work, we conducted CFD simulations incorporating a randomly generated surface that imitated an uneven seabed, where unidirectional laminar current and oscillating current (or vortex) were formed and exerted on Quadrapyrgites located at different places on the surface (Zhang et al. 2022). We expect that combining the method used in Zhang et al. 2022 with the velocity profiles collected in this work may be a promising way to further investigate the effect of ambient current on organisms’ active feeding behavior.

      There is no explanation for how this work could be a breakthrough in simulating gregarious feeding, as is stated in the manuscript.

      Thanks for your suggestion. We revised the section “Perspectives for future work and improvements” (lines 396-404 in our revised version of manuscript).

      Simulating gregarious active feeding behavior generally requires modelling multiple (or clustered) organisms, which is beyond present computational capability. However, the simulation results can be exploited to build a simplified model that makes this feasible: an inlet or outlet boundary condition can be applied to the peridermal aperture of Quadrapyrgites, with the corresponding exhalant or inhalant flow velocity profiles collected in this work. In this way we can obtain a simplified active-feeding Quadrapyrgites model without using the computationally expensive moving mesh feature. Such a model can be used alone or in clusters to investigate gregarious feeding behavior in combination with ambient current. This is our explanation of how this work could be a “breakthrough” in simulating gregarious feeding. Nevertheless, we modified the corresponding description in the section “Perspectives for future work and improvements” to make it more appropriate.

      Throughout the manuscript there are portions that are difficult to digest due to grammar, which I suspect is due to being written in a second language. This is particularly problematic when the reader is attempting to understand if the authors are stating an idea is well documented versus throwing out hypotheses/interpretations.

      Thanks. Our manuscript was checked and corrected by a native speaker of English again.

      Line-by-line:

      L023: "Although fossil evidence suggests..."

      L026: "demonstrated" instead of "proven"

      We corrected them accordingly.

      L030: "The hydrostatic simulations show that the..." Maybe I'm confused by the wording, but shouldn't this be the case since it's a set part of the model?

      As stated in our manuscript, all the simulations were conducted under a “hydrostatic” environment. We originally intended the description “hydrostatic” here to emphasize the simulation condition set in our work. However, it can be misread as implying that some of our simulations are “hydrostatic” while others are not. To this end, deleting the word “hydrostatic” here (line 30) may be appropriate to eliminate confusion.

      L058: "lacking soft tissue" Haootia preservation suggests it is soft tissue (Liu et al., 2014), unless the preceding sentence is not including Haootia, in which case this section is confusingly worded

      Thank you. We deleted the sentence “However, their affinities are not without controversy as the lacking soft tissue.”

      L085: change "proxy"

      Yes, we changed it to “Considering their polypoid shape and cubomedusa-type anatomy, the hatched olivooids appear to be a type of periderm-bearing polyp-shaped medusa (Wang et al. 2020)” (lines 86-88).

      L092: "assist in feeding" has this been stated before? Citation needed, else this interpretation should primarily be in the discussion

      Yes, you are right. We cited the reference at the end of the mentioned sentence (lines 91-94).

      L095: Remove "It is suggested that"

      Thanks for your suggestions. We corrected it.

      L100: "Probably the..." here to the end belongs in the discussion and not introduction.

      Thanks for your suggestions. We corrected the sentences.

      L108: "an abapical"

      Thanks for your suggestions. We revised it in line 107.

      L112: "for some distance" be specific or remove

      Yes, we deleted “for some distance” in line 111.

      L133: I can't find a corresponding article to Zhang et al., 2022. Is this the correct reference?

      The article Zhang et al. 2022 (entitled “Effect of boundary layer on simulation of microbenthic fossils in coastal and shallow seas”) was in press when we first submitted this manuscript. We have completed the corresponding entry in the References with the doi (10.13745/j.esf.sf.2023.5.32), which may help readers locate this article more easily.

      L138: You can't be positive that your simulations "provide a good reproduction of the movement." You have attempted to reconstruct said movement, but the language here is overly firm - as is "pave a new way"

      Thanks for your suggestions. We corrected the corresponding description (lines 138-140) to make it more rigorous.

      L149: "No significant change" implies statistics were computed that are not presented here.

      The statistics were computed using built-in Excel functions and are presented in Table supplement 2 (deposited in figshare, https://doi.org/10.6084/m9.figshare.23282627.v2) rather than in the manuscript. Specifically, a relative error, error_i, was computed at each time step i (from 0.01 to 4.0), where u_z denotes the velocity profile collected at each cut point z with the current mesh parameters and u_z^* denotes the velocity profile collected at each cut point z with the next finer mesh parameters. The total average error was then computed by averaging error_i over all time steps. The results are marked in red in Table supplement 2. We revised the corresponding description in lines 140-146.
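
      For orientation, a relative error consistent with the symbols defined above can be written in the standard form below. The normalisation by the finer-mesh value, the use of absolute values, and the symbol N for the number of time steps are assumptions made for this sketch, not details confirmed by the text.

```latex
\[
\mathrm{error}_i = \frac{\left| u_{z,i} - u_{z,i}^{*} \right|}{\left| u_{z,i}^{*} \right|},
\qquad
\overline{\mathrm{error}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{error}_i
\]
```

      Here u_{z,i} and u_{z,i}^{*} are the velocities at cut point z and time step i for the current and the next finer mesh, respectively.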

      L152: "line graphs" >> "profiles"

      Thanks for your suggestions. We corrected it in line 144.

      L159: remove "significant" unless statistics are being reported, in which case those need to be explained in detail.

      Thanks for your suggestions. We removed "significant" and corrected the corresponding sentences in lines 150-153 to make them more rigorous.

      L159: I would recommend including a supplemental somewhere that shows how tall the modeled Quadrapyrgites is and where the cut lines exist above it.

      Many thanks for your suggestions. Corresponding additions were made in the last paragraph of the section “Computational fluid dynamics” (line 455 and line 535). We agree that it is appropriate to specify the height of the modeled Quadrapyrgites and the position of each cut point. Hence, we added a supplementary figure (entitled Figure supplement 1) to illustrate these.

      L183: "The maximum vorticity magnitude was set..." I do not follow what this threshold is based on the current phrasing.

      The vorticity magnitude mentioned here is the visualisation range of the color scalebar, which can be set manually in the software. Positive values represent vortices rotating counterclockwise on the cut plane, while negative values represent vortices rotating clockwise. In this case, the visualisation range is [-0.001,0.001] (i.e., the absolute value 0.001 is the threshold), as shown by the color scalebar in Figure 7. Decreasing the threshold, for example by setting the visualisation range to [-0.0001,0.0001], captures smaller vorticity on the cut plane, as in the figure below on the left. Conversely, setting the range to [-0.01,0.01] focuses on larger vorticity, as in the figure below on the right. Based on our trials, we found [-0.001,0.001] to be an appropriate range for visualizing the vortex near the periderm. To be more rigorous and to avoid confusion, we modified the description in the corresponding place of the manuscript (lines 172-174).

      Author response image 1.
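
      The effect of the visualisation range can be illustrated with a short sketch. It assumes that values outside the range simply saturate at the ends of the color scale, which is typical plotting behaviour but is an assumption here rather than a documented property of the software; the function names and sample values are hypothetical.

```python
def clamp_to_range(values, threshold=0.001):
    """Clamp signed vorticity values to [-threshold, threshold].

    Assumption: out-of-range values saturate at the ends of the
    color scale instead of being discarded.
    """
    return [max(-threshold, min(threshold, v)) for v in values]


def rotation_sense(vorticity):
    """Map the sign of vorticity on the cut plane to a rotation sense."""
    if vorticity > 0:
        return "counterclockwise"
    if vorticity < 0:
        return "clockwise"
    return "none"


# Hypothetical sample values, for illustration only.
samples = [0.005, -0.0002, -0.02]
print(clamp_to_range(samples))               # range [-0.001, 0.001]
print(clamp_to_range(samples, 0.01))         # wider range keeps 0.005
print([rotation_sense(v) for v in samples])
```

      A narrow range saturates strong vortices and leaves contrast for weak ones, while a wide range does the opposite, which is the trade-off described above.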

      L201: "3.9-4 s"

      Thanks, we corrected it in line 191.

      L269: "Sahin et al.,..." add to the next paragraph

      Yes, we rearranged the corresponding two paragraphs (lines 258-289).

      L344: "Higher expansion-contraction..." this needs references and/or more justification.

      Thanks. We deleted the sentence.

      L446: two layers of hexahedral elements is a very low number for meshing boundary layer flow

      Many thanks for your question. We agree that an appropriate hexahedral element mesh for the boundary layer is essential to recover boundary flow, especially in cases where a turbulence model incorporating a wall function is adopted, such as the standard k-epsilon model. In this case, the boundary flow is not the main point, since the velocity profile was collected above the periderm aperture rather than near the no-slip wall region. Moreover, we do not need drag computations (related to shear stress and pressure difference) in this case, which would require a more accurate flow velocity reconstruction near no-slip walls, as in previous palaeontological CFD simulations. Thus, we think two layers of hexahedral elements are enough. In addition, hexahedral elements added to the periderm aperture domain, as illustrated in the figure below, let the velocity near the wall vary smoothly and thus benefit the convergence of the simulations.

      Author response image 2.

      L449: similar to comments regarding lines 146-148, key information is missing here. Figure 3C appears to be COMSOL's default meshing routine. While it is true that the domain is discretized in a non-uniform manner, no information is provided as to what mesh parameters were "tuned" to determine "optimal settings" or what those settings are (or how they are optimal).

      Many thanks for your question. Specific mesh parameters are listed in Table supplement 3, and corresponding descriptions and modifications were made in lines 475-479 and lines 542-549. In most CFD cases, the mesh parameters need to be tuned to ensure a balance between computational cost and accuracy. If the difference between the result obtained from the present mesh and that obtained from the next finer mesh ranges from 5% to 10%, the present mesh is expected to be “optimal”. To achieve this, we prescribed several sets of mesh parameters (mainly the maximum and minimum element size) for each subdomain (the inner cavity, the peridermal aperture, and the domain outside the fossil model) of the whole computational domain in the test model. Subsequently, we refined the mesh step by step as much as possible and adjusted the element size of the subdomains to find suitable mesh parameters; that is how the mesh parameters were "tuned". We agree that we should state explicitly which mesh parameters were tuned and what those settings are.
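
      The mesh-refinement check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the relative error is normalised by the finer-mesh value, time steps where the finer-mesh velocity is zero are skipped, and the 10% tolerance is taken as the upper end of the quoted range. The velocity values are made up.

```python
def relative_errors(current, finer):
    """Per-time-step relative error between two velocity series.

    `current` and `finer` hold velocities at one cut point for the
    present mesh and the next finer mesh (assumed same time steps).
    """
    if len(current) != len(finer):
        raise ValueError("series must cover the same time steps")
    errors = []
    for u, u_star in zip(current, finer):
        if u_star == 0:
            continue  # relative error undefined; skipped by assumption
        errors.append(abs(u - u_star) / abs(u_star))
    return errors


def total_average_error(current, finer):
    """Average the per-time-step relative errors."""
    errors = relative_errors(current, finer)
    if not errors:
        raise ValueError("no time step with a non-zero reference value")
    return sum(errors) / len(errors)


def mesh_is_acceptable(current, finer, tolerance=0.10):
    """True if the present mesh differs from the finer one by <= tolerance."""
    return total_average_error(current, finer) <= tolerance


# Hypothetical velocities at one cut point, for illustration only.
present_mesh = [0.0, 1.0, 2.0, 4.0]
finer_mesh = [0.0, 1.0, 2.5, 4.0]
print(total_average_error(present_mesh, finer_mesh))
print(mesh_is_acceptable(present_mesh, finer_mesh))
```

      In practice the check would be repeated for each cut point and each refinement step, stopping at the coarsest mesh whose error stays within the chosen tolerance.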

      Figure 7 should have the timesteps included and the scaling of the arrows should be explicit in the caption

      Many thanks for your suggestions. We originally intended the white arrows in Figure 7 to represent velocity orientation rather than true velocity magnitude (by contrast, the white arrows in Animation supplement 1 represent a normalized velocity profile). To avoid confusion, we revised Figure 7 to include timesteps and arrows that represent a normalized velocity profile, making it consistent with Animation supplement 1. The caption of Figure 7 has been modified accordingly.

      The COMSOL simulation files (raw data) are missing from the supplemental data. These should be posted to Dryad or here.

      We uploaded the files to Dryad (https://datadryad.org/stash/share/QGDSqLh8HOll7ofl6JWVrqM57Rp62ZPjvZU0AQQHwTY), and added the corresponding link to section “Data Availability Statement”.

      REVIEWER 2

      Lines 319-334: The omission in this paragraph of Paraconularia ediacara Leme, Van Iten and Simoes (2022) from the terminal Ediacaran of Brazil is a serious matter, as (1) the medusozoan affinities of this fossil are every bit as well established as those of anabaritids, Sphenothallus, Cambrorhytium and Byronia, and (2) P. ediacara was a large (centimetric) polyp, the presence of which in Precambrian times is thus a problem for the simple evolutionary scenario (very small polyps followed later in evolutionary history by large polyps) outlined in the paragraph. Thus, Paraconularia ediacara must be mentioned in this paper, both in connection with the early evolution of size in cnidarian polyps and in other places where the early evolution of cnidarians is discussed.

      Thanks for your important suggestions. We added the following sentence in lines 323-326: “Significantly, the large-bodied, skeletonized conulariids-like Paraconularia found from the terminal Ediacaran Tamengo Formation of Brazil confirmed their ancient predators like the extant medusozoans and suggested the origin of cnidarians even farther into the deep evolutionary scenario (Leme et al. 2022).”

      Line 23. Delete the word, been.

      Line 25. Replace conjecture with conjectural.

      Line 26. Delete the word, the before calyx-like.

      Line 32. Replace consisting with consistent.

      Thanks for your suggestions. We have corrected all of them.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank the reviewers for their insightful comments on our manuscript. We have addressed the reviewers’ comments below and in the revised manuscript.

      Reviewer #1:

      Comment #1: The authors found differences in the initial spike doublet of action potentials between cortical neurons in experimental and control conditions (Figure 2e). The action potential firing frequency of the first two APs (instant firing frequency) of recorded neurons shall be quantified to investigate whether there are statistical differences between the action potential firing frequency in cortical neurons in different experimental groups versus control conditions.

      Response: As suggested by the reviewer, we have quantified the first interspike interval (ISI; time between the 1st and 2nd action potential). The data is included in Fig. 2h as well as in Fig. S3e and Table 1. The Results and Methods have also been updated accordingly.
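
      For readers unfamiliar with the measure, a minimal sketch of how a first interspike interval relates to the instantaneous firing frequency requested by the reviewer is given below. The function names and spike times are hypothetical and are not taken from the study.

```python
def first_isi_ms(spike_times_ms):
    """Time between the 1st and 2nd action potential, in ms (None if fewer than two spikes)."""
    if len(spike_times_ms) < 2:
        return None
    return spike_times_ms[1] - spike_times_ms[0]


def instantaneous_frequency_hz(isi_ms):
    """Instantaneous firing frequency is the reciprocal of the interspike interval."""
    return 1000.0 / isi_ms


# Hypothetical spike times (ms) detected during one current step.
spikes = [12.0, 17.0, 40.0]
isi = first_isi_ms(spikes)
frequency = instantaneous_frequency_hz(isi)
print(isi, frequency)
```

      A shorter first ISI therefore corresponds to a higher instantaneous frequency of the initial spike doublet.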

      Comment #2: The mTORS2215Y induced the largest changes in Ih current amplitudes in cortical neurons compared with other experimental conditions. Is HCN4 channel expression regulated by mTOR pathway activation, or could there be interactions between the HCN channel and the mTORS2215Y mutant protein?

      Response: Our previous findings using the RhebS16H mutation support the idea that increased expression of HCN4 channels is regulated by mTOR pathway activation. This is evidenced by its sensitivity to rapamycin (an mTOR inhibitor) and to expression of constitutively active 4E-BP1 (a translational repressor downstream of mTORC1). Since mTORS2215Y directly hyperactivates mTORC1 and there are no known interactions between HCN channels and mTORS2215Y, our data strongly suggest that abnormal HCN4 channel expression occurs via mTORC1 hyperactivation in this condition. We have revised our Discussion to make this point clearer.

      Comment #3: A comparison of the electrophysiological characteristics of cortical neurons in different experimental conditions in the present study and pathological neurons in human FCD reported in previous literature could be interesting. Inducing pathological gene mutations or knocking out key genes in mTOR pathway in the rodent cortex - which approach could better model human FCD?

      Response: We agree with the reviewer and have added a new paragraph in the Discussion to compare our electrophysiology results to those of previous studies done on human FCDII and TSC cytomegalic neurons. With regards to the reviewer’s question about which of the two approaches in the rodent cortex – inducing pathological gene mutations or knocking out key genes in the mTOR pathway – would better model human FCD, our study emphasizes the importance of considering gene-specific mechanisms in FCDII. Thus, modeling the genetically distinct FCDIIs will require using gene-specific manipulations. We have revised our Discussion to include this point. With that said, for some phenotypes that are generalized across FCDII independent of the mTOR pathway genes, using pathogenic mutations of mTOR activators or knockout of negative mTOR regulators would likely both be appropriate models. Of note, as discussed in the manuscript, there are also technical factors to be considered when choosing to use a pathogenic gene mutation versus knocking out a gene (the latter which would depend on the half-life of the proteins).

      Reviewer #2:

      Comment #1: The authors postulate that all the findings are dependent on mTORC1-related effects but don't assess whether some of the differences could be due to effects on mTORC2 signaling. mTORC2 is an important and poorly understood alternative isoform of mTOR (due to rictor binding) that has effects on distinct cell signaling pathways and in particular actin polymerization. This doesn't diminish the effects of the current analysis of mTORC1 but could explain genotypic differences in each variable. A few prior studies have assessed the role of mTORC2 in epileptogenesis and cortical malformations (Chen et al., 2019).

      Response: We agree with the reviewer and have revised our Discussion to include the possibility of mTORC2 contribution to the gene-specific phenotypic differences.

      Comment #2: The slice recordings were performed in the usual recording aCSF buffer conditions but there is no assessment of the role of amino acids or nutrients in the bath. While it is clear that valuable and viable acute slice recordings can be made in aCSF, the role of the mTOR pathway is to modulate cell growth in response to nutrient conditions. Thus, one variable that could be manipulated and assessed currently in this study is the levels of amino acids i.e., leucine and arginine added to the bath since DEPDC5 and TSC1 are responsive to ambient amino acid levels.

      Response: We thank the reviewer for this great suggestion, and we intend to pursue this as part of another study.

      Comment #3: The analysis concedes that the role of somatic mutations in cortical malformations may depend not only on genotypic effects but also on allelic load and cellular subtype affected by the mutation. Thus, it would be interesting to see if electroporation either at E14 or E16, thereby affecting a distinct pool of progenitors, would mitigate or accentuate differences between mTOR pathway genes.

      Response: We agree with the reviewer. This is a crucial experiment that we hope to perform in the future. We have also added a paragraph in our Discussion to address this important point.

      Comment #4: Treatment with rapamycin and zatebradine in each condition would have added to the strength of the findings to determine the mTOR-dependence and reversibility of HCN4 effects.

      Response: We previously demonstrated the mTORC1 dependence of HCN4 expression in the RhebS16H condition using rapamycin and expression of constitutively active 4E-BP1. 4E-BP1 is a translational repressor downstream of mTORC1. In the 4E-BP1 study, we used a conditional system to express 4E-BP1F113A (a mutation that resists inactivation by mTORC1) in adolescent mice while RhebS16H (and thus mTORC1 activation) was expressed embryonically. 4E-BP1F113A expression suppressed Ih current and HCN4 expression, suggesting that aberrant HCN4 expression can be reversed by decreasing mTORC1-regulated translation. Based on these data and the findings that rapamycin suppressed abnormal HCN4 expression, we postulate that increased HCN4 expression in the different gene conditions examined in the present study occurs via the mTORC1 pathway. However, we agree with the reviewer that treating each of the conditions with rapamycin would provide direct evidence of their mTORC1 dependence. Additionally, treating each condition with the HCN channel blocker zatebradine would also add strength to the findings. We have added a comment in the Discussion to acknowledge this point.

      Reviewer #1 (Recommendations For The Authors):

      Comment #1: The authors found increased frequency or amplitudes of spontaneous postsynaptic currents in different experimental cohorts. These data may not be sufficient to conclude increased synaptic excitability, because there are no pharmacological experiments to verify whether the recorded inward currents are excitatory or inhibitory postsynaptic currents. An alternative approach could be to analyze the decay time of the spontaneous postsynaptic currents, since excitatory postsynaptic currents have a relatively faster decay time than inhibitory postsynaptic currents.

      Response: Thank you for the comment. We apologize for the lack of clarity and have added the following text in the Results to clarify: “To separate sEPSCs from spontaneous inhibitory postsynaptic currents (sIPSCs), we used an intracellular solution rich in K-gluconate to impose a low intracellular Cl- concentration and recorded at a holding potential of -70 mV, which is near the Cl- reversal potential. The 90%-10% decay time of the measured synaptic currents ranged between 4-8 ms in all conditions (mean ± SD: control: 4.9 ± 1.6; RhebY35L: 5.2 ± 1.4; mTORS2215Y: 7.4 ± 1.4; control: 6.8 ± 0.7; Depdc5KO: 7.4 ± 1.0; PtenKO: 8.1 ± 0.9; Tsc1KO: 7.4 ± 0.9), consistent with the expected decay time for sEPSCs and shorter than the decay time for sIPSCs (Kroon et al, 2019). The recorded synaptic currents were therefore considered to be sEPSCs.”

      Comment #2: There are typos of Depdc5 in the text and figure legends.

      Response: Thank you for noticing this error. We have corrected the typos in the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Zhu and colleagues aimed to clarify the importance of isoform diversity of PCDHg in establishing cortical synapse specificity. The authors optimized 5' single-cell sequencing to detect cPCDHg isoforms and showed that the pyramidal cells express distinct combinations of PCDHg isoforms. Then, the authors conducted patch-clamp recordings from cortical neurons whose PCDHg diversity was disrupted. In the elegant experiment in Figure 3, the authors demonstrated that the neurons expressing the same sets of cPCDHg isoforms are less likely to form synapses with each other, suggesting that identical cPCDHg isoforms may have a repulsive effect on synapse formation. Importantly, this phenomenon was dependent on the similarity of the isoforms present in neurons but not on the amount of proteins expressed.

      One of the major concerns in an earlier version was whether PCDHg isoforms, which are expressed at a much lower level than C-type isoforms, have true physiological significance. The authors conducted additional experiments to address this point by using PCDHg cKO and provided convincing data supporting their conclusion. The results from PCDHg C4 overexpression, showing no impact on synaptic connectivity, further clarified the importance of isoforms. I have no further concerns, however, I would like to point out that the evidence for the necessity of the PCDHg isoform is still lacking because most experiments were done by overexpression. It would be helpful for the readers if the authors could add this point to the discussion.

      Thank you for the positive feedback on our work. We have now incorporated a discussion of the limitations associated with overexpression.

      Reviewer #2 (Public Review):

      This short manuscript by Zhu et al. describes an investigation into the role of gamma protocadherins in synaptic connectivity in the mouse cerebral cortex. First, the authors conduct a single-cell RNA-seq survey of postnatal day 11 mouse cortical neurons, using an adapted 10X Genomics method to capture the 5' sequences that are necessary to identify individual gamma protocadherin isoforms (all 22 transcripts share the same three 3' "constant" exons, so standard 3'-biased methods can't distinguish them). This method adaptation is an advance for examining individual gamma transcripts, and it is helpful to publish the method, the characterization of which is improved in this revised manuscript. The results largely confirm what was known from other approaches, which is that a few of the 19 A and B subtype gamma protocadherins are expressed in an apparently stochastic and combinatorial fashion in each cortical neuron, while the 3 C subtype genes are expressed ubiquitously. Second, using elegant paired electrophysiological recordings, the authors show that in gamma protocadherin knockout cortical slices, the likelihood of two neurons in layer 2/3 being synaptically connected is increased. That suggests that gamma protocadherins generally inhibit synaptic connectivity in the cortex; again, this has been reported previously using morphological assays, but it is important to see it confirmed here with physiology. Finally, the authors use an impressive sequential in utero electroporation method to provide evidence that the degree of isoform matching between two neurons negatively regulates their reciprocal synaptic connectivity. These are difficult experiments to do, and while some caveats remain, the main result is consistent. Strengths include the impressive methodology and improved demonstration of the previously reported finding that gamma protocadherins work via homophilic matching to put a brake on synapse formation in the cortex.

      Weaknesses include the writing, which even in the revision fails to completely put the new results in context with prior work, which together has largely shown similar results; a still-incomplete characterization of a new alpha protocadherin KO mouse (a minor point but it should still be addressed); and a lack of demonstration of protein levels in electroporated brains. Because of the unique organization and expression pattern of the gamma protocadherins, it is unlikely that these results will be directly applicable to the broader understanding of the role of cell adhesion molecules in synapse development. However, the methodology, which is now better described, should be applicable more broadly, and the improved demonstration of the negative role of gamma protocadherins in cortical synaptogenesis is helpful.

      Thank you for the positive comments on our work. We have taken your suggestion into account and expanded our discussion to contextualize our research within the broader field of PCDH. Additionally, we have included more data to further illustrate the decrease in αPCDH expression in Pcdha conditional knockout mice. Your feedback has been invaluable in enhancing our manuscript.

      Reviewer #3 (Public Review):

      In this study, Zhu and authors investigate the expression and function of the clustered Protocadherins (cPcdhs) in synaptic connectivity in the mouse cortex. The cPcdhs encode a large family of cadherin-related transmembrane molecules hypothesized to regulate synaptic specificity through combinatorial expression and homophilic binding between neurons expressing matching cPcdh isoforms. But the evidence for combinatorial expression has been limited to a few cell types, and causal functions between cPcdh diversity and wiring specificity have been difficult to test experimentally. This study addresses two important but technically challenging questions in the mouse cortex: 1) Do single neurons in the cortex express different cPcdh isoform combinations? and 2) Does Pcdh isoform diversity or particular combinations among pyramidal neurons influence their connectivity patterns? Focusing on the Pcdh-gamma subcluster of 22 isoforms, the group performed 5'end-directed single-cell RNA sequencing from dissociated postnatal (P11) cortex. To address the functional role of Pcdhg diversity in cortical connectivity, they asked whether the Pcdhgs and isoform matching influence the likelihood of synaptic pairing between 2 nearby pyramidal neurons. They performed simultaneous whole-cell recordings of 6 pyramidal neurons in cortical slices, and measured paired connections by evoked monosynaptic responses. In these experiments, they measured synaptic connectivity between pyramidal neurons lacking the Pcdhgs, or overexpressing dissimilar or matching sets of Pcdhg isoforms introduced by electroporation of plasmids encoding Pcdhg cDNAs.

      Overall, the study applies elegant methods that demonstrate that single cortical neurons express different combinations of Pcdh-gamma isoforms, including the upper layer Pyramidal cells that are assayed in paired recordings. The electrophysiology data demonstrate that nearby Pyramidal neurons lacking the entire Pcdhg cluster are more likely to be synaptically connected compared to the control neurons, and that overexpression of matching isoforms between pairs decreases the likelihood of synaptic connection. These are important and compelling findings that advance the idea that the Pcdhgs are important for cortical synaptic connectivity, and that the repertoire of isoforms expressed by neurons influences their connectivity patterns, potentially through a self/nonself discrimination mechanism. However, the findings are limited to probability in connectivity and they do not support the authors' conclusions that Pcdhg isoforms regulate synaptic specificity, 'by preventing synapse formation with specific cells' or to 'unwanted partners'. Characterizations of the cellular basis of these defects are needed to determine whether they are secondary to other roles in cell positioning, axon/dendrite branching and synaptic pruning, and overall synaptic formation. Claims that Pcdh-alpha and Pcdhg C-type isoforms are not functionally required are premature, due to limitations of the experiments. Moreover, claims that 'similarity level of γPCDH isoforms between neurons regulate the synaptic formation' are not supported due to weak statistical analyses presented in Fig. 4. The overstatements should be corrected. There was also a missed opportunity to clearly discuss these results in the context of other published work, including recent publications focused on the cortex.

      Thank you for your feedback on the strengths and weaknesses of our work. In terms of the cellular basis of affected synaptic connectivity caused by γ-PCDH isoforms, we have compared the probability of connectivity for neuronal pairs with similar range of distance. Our findings indicate that the manipulation primarily affects pairs within the 50-150 micrometer range, suggesting that cell positioning might be a critical factor for the impact of γ-PCDH on synapse formation. However, we acknowledge that we couldn't definitively determine whether the negative effect on synaptic connectivity stems directly from impaired synapse formation or indirectly from synaptic pruning or the influence of PCDHγ on axon/dendritic branching. We've added these limitations to our discussion to provide a more comprehensive view of our research. Furthermore, we've adjusted our statements to better reflect the significance of our findings. Your feedback has been instrumental in improving the clarity and depth of our manuscript.

      Strengths:

      • The 5' end sequencing with a Pcdhg-amplified library is a technical feat and addresses the pitfall of conventional scRNA-Seq methods due to the identical 3' sequences shared by all Pcdhg isoforms and the low abundance of the variable exons. New figures with annotated cell types confirm that several pyramidal and inhibitory cortical subpopulations were captured.

      Statistical assessment of co-occurrence of isoform expression within clusters is also a strength.

      • By establishing the combinatorial expression of Pcdhgs by maturing pyramidal cells, the study further substantiates the 'single neuron combinatorial code for cPcdhs' model. Although combinatorial expression is not universal (i.e., serotonergic neurons), there was limited evidence. The finding that individual pyramidal neurons express ~1-3 variable Pcdhg transcripts plus the C-type transcripts aligns with single-cell RT-PCR studies of Purkinje cells (Esumi et al 2005; Toyoda et al 2014). It differs from the findings by Lv et al 2022, where C-type expression was lower among pyramidal neurons. OSNs also do not substantially express C-type isoforms (Mountoufaris et al 2017; Kiefer et al 2023). Differences, and the advantages of the 5'-end-directed sequencing (vs. SmartSeq), could be raised in the discussion.

      • Simultaneous whole-cell recording with pairwise comparison of pyramidal neurons is a technically outstanding approach. They assess the effects of Pcdhg isoform OE on the probability of paired connections.

      • The connectivity assay between nearby pairs proved to be sensitive enough to quantify differences in connection probability in Pcdhg-cKO and overexpression mutants. The comparisons of connectivity across vertical vs lateral arrangement are also strengths. Overexpressing identical Pcdhg isoforms (whether 1 or 6) reduces the probability of connectivity, but there are caveats to the interpretations (see below).

      Weaknesses:

      nearby pairs but are not sufficient evidence for synapse specificity. The cPcdhs play multiple roles in neurite arborization, synaptic density, and cell positioning. Kostadinov 2015 also showed that starburst cells lacking the Pcdhgs maintained increased % connectivity at maturity, suggesting a lack of refinement in the absence of Pcdhgs. The known roles raise questions on how these manipulations might have primary effects in these processes and then subsequently impact the probability of connectivity. Investigations of morphological aspects of pyramidal development would strengthen the study and potentially refine the findings. The authors should more clearly relate their findings to the body of cPcdh studies in the discussion.

      Previous studies revealed the adverse effects of γ-PCDHs on dendritic spines, demonstrating that their absence results in increased dendritic spine density, while overexpression leads to a reduction. In our study, we consistently observed that γ-PCDHs exert a negative influence on synaptic connectivity. This consistency strengthens the overall body of evidence in support of the role of γ-PCDHs in synaptic connectivity and dendritic spine regulation. While we have previously mentioned this point in our discussion to highlight the concordance between our findings and prior research, your input is greatly appreciated in reinforcing the scientific context of our work.

      • Pcdhg cKO-dependent effects on connectivity occur between closely spaced somata (50-100 µm; Figure 2E), highlighting the importance of spatial arrangement to connectivity (also noted by Tarusawa 2016). Was distance considered for the overexpression (OE) assays, and did the authors note changes in cell distribution which might diminish the connectivity? Recent work by Lv et al 2022 reported that manipulating Pcdhgs influences the dispersion of clonally-related pyramidal neurons, which also impacts the likelihood of connections. Overexpression of Pcdhgc3 increased cell dispersion and decreased the rate of connectivity between pairs. Though these papers are mentioned, they should be discussed in more detail and related to this work.

      Our data indicated that variable γ-PCDH isoforms primarily influence synaptic connectivity in neuronal pairs within the 50-150 micrometer range. Notably, as the distance between neurons increases, we observed a corresponding reduction in synaptic connectivity, as illustrated in Figure 2E. We have also included additional discussion regarding potential variances among different C-type isoforms.

      • Though the authors added suggested citations and improved the contextualization of the study, several statements do not accurately represent the cited literature. This comes at the expense of crystallizing the novelty and importance of the present work. For instance, Garrett et al 2012 PMID: 22542181 was the first to describe roles for Pcdhgs in cortical pyramidal cells and dendrite arborization, and that pyramidal cell migration and survival are intact. Line 52 cited Wang et al 2002, but this was limited to gross inspection. Garrett et al is the correct citation for: 'The absence of γ-PCDH does not cause general abnormality in the development of the cerebral cortex, such as cell differentiation, migration, and survival (Wang et al., 2002).' Second, single cell cPcdh diversity is introduced very generally, as though all neuron types are expected to show combinatorial variable expression with ubiquitous C-type expression. But those initial studies were limited to Purkinje cells (Esumi 2005 and Toyoda 2014). Profiling of serotonergic neurons and OSNs reveals different patterns (citations needed for Chen 2017 PMID: 28450636; Mountoufaris et al PMID: 2845063; Canzio 2023 PMID: 37347873), raising the idea that cPcdh diversity and ubiquitous C-type expression are not universal. Thus, the authors missed the opportunity to emphasize the gap regarding cPcdh diversity in the cortex.

      We would like to extend our gratitude to the reviewer for pointing out the citation related to the roles of γ-PCDHs in the neocortex. After a thorough review of both papers, Wang et al., 2002 and Garrett et al., 2012, we concur that it would be more appropriate to cite both of these papers here. Your suggestion to underscore the diverse expression patterns of γPCDHs in neocortical neurons is well-received, and we have integrated this aspect of our findings with previous observations into a new paragraph within the discussion section. Your insights have greatly enriched the depth of our paper, and we genuinely appreciate your contribution.

      • They have not shown rigorously and statistically that the rate of connectivity changes with % isoform matching. In Figure 4D, comparisons of % isoform matching in OE assays show a single statistical comparison between the control and 100% groups, but not between the 0%, 11% and 33% groups. Is there a significant difference between the other groups? Significant differences are claimed in the results section, but statistical tests are not provided. The regression analysis in 4E suggests a correlation between % isoform similarity and connectivity probability, but this is not sound as it is based on a mere 4 data points from 4D. The authors previously explained that they cannot evaluate the variance in these recordings as they must pool data together. However, there should be some treatment of variability, especially given the low baseline rate of connectivity. Or at the very least, they should acknowledge the limitations that prevent them from assessing this relationship. Claims in lines 230+ are not supported: 'Overall, our findings demonstrate a negative correlation between the probability of forming synaptic connections and the similarity level of γPCDH isoforms expressed in neuron pairs (Fig. 4E)'.

      We employed a bootstrap method to estimate the potential variance in the analysis presented in Fig. 4E. It's important to note that due to methodological limitations, a comprehensive assessment of variance based solely on recordings from a single animal is challenging. As such, we have adjusted our claims to be more aligned with our observations.
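
      The response does not describe how the bootstrap was implemented. As a rough sketch of the general idea only, pooled pairwise outcomes (connected or not) can be resampled with replacement to obtain a percentile interval for the connection probability. The function name, the counts, and the 95% interval below are all assumptions made for illustration, not the authors' procedure or data.

```python
import random


def bootstrap_connection_probability(outcomes, n_resamples=2000, seed=0):
    """Percentile bootstrap interval (approximately 95%) for the fraction of
    connected pairs. `outcomes` holds 1 for a connected pair and 0 otherwise."""
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    for _ in range(n_resamples):
        # Resample the tested pairs with replacement and recompute the fraction.
        resample = rng.choices(outcomes, k=n)
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[int(0.975 * n_resamples) - 1]
    return lower, upper


# Hypothetical pooled data: 10 connected pairs out of 100 tested.
outcomes = [1] * 10 + [0] * 90
lower, upper = bootstrap_connection_probability(outcomes)
print(lower, upper)
```

      Because pairs from different animals are pooled in this sketch, the interval reflects sampling variability of the pooled pairs and not animal-to-animal variance, which matches the limitation acknowledged in the response.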

      • Figure 4 provides connectivity probability, but this result might be affected by overall synapse density. Did connection probability change with directionality (e.g., from red to green cells versus from green to red cells)?

      As suggested by the reviewer, we have conducted an analysis to assess the directionality of connections under different conditions. This analysis involved comparing the directionalities of connections following the overexpression of six variable isoforms, as depicted in Fig. 3E. Upon examining 33 connected OE-Ctrl pairs following the electroporation of these 6 isoforms, we observed 3 pairs with bidirectional connections, 19 pairs with connections from OE to Ctrl, and 11 with connections from Ctrl to OE. To assess the statistical significance of these observations, we applied a Chi-square test. The results from this analysis indicated that there was no significant difference in the directionality of connections. These findings offer further support for the idea that overexpressing multiple γ-PCDH isoforms within a single neuron might not be sufficient to alter its connections with other neurons.
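
      The response does not state how the chi-square test was configured. One plausible setup, sketched here purely for illustration, is a goodness-of-fit test of the 19 versus 11 unidirectional pairs against an equal split, leaving out the 3 bidirectional pairs. That choice and the function name are assumptions.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Goodness-of-fit test of two counts against a 50/50 split (1 degree of freedom)."""
    expected = (count_a + count_b) / 2
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts quoted in the response: 19 pairs connected OE to Ctrl, 11 connected Ctrl to OE.
statistic, p_value = chi_square_equal_split(19, 11)
print(f"chi2 = {statistic:.3f}, p = {p_value:.3f}")
```

      Under these assumptions the p-value is above 0.05, in line with the reported absence of a significant directional bias.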

      • Generally, the statistical approaches were not sufficiently described in the methods nor in the figure legends, making it difficult to assess the findings. They do not report on how they calculated FDR for connectivity data, when this is typically used for larger multivariate datasets.

      We employed the False Discovery Rate (FDR) correction, specifically the Benjamini-Hochberg method, to determine which values remained statistically significant. This method is widely accepted and involves inputting all the p-values and the total number, 'n.' Additional details about this correction are now provided in the Methods section for clarity.
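
      For reference, a minimal sketch of the standard Benjamini-Hochberg step-up procedure is given below. The p-values are hypothetical, the function name is illustrative, and the 0.05 level is an assumption, since the response does not state the level used.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    under the Benjamini-Hochberg procedure at FDR level `alpha`."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / n) * alpha.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / n * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * n
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values from four pairwise comparisons.
decisions = benjamini_hochberg([0.001, 0.04, 0.03, 0.20])
print(decisions)
```

      Only the p-values and their total number are needed as input, as the response notes. With the made-up values above, only the smallest p-value survives the correction.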

      • The possibility that the OE effects are driven by total Pcdhg levels, rather than isoform matching, should be examined. As shown by qRT-PCR in Fig. 3, expression of individual isoforms can vary. It is reasonable that protein levels cannot be measured by IHC, although epitope tags could be considered as C-terminal tagging of cPcdhs preserves the function in mice (see Lefebvre 2008). Quantification of constant Pcdhg RNA levels by qRT-PCR or sc-RT-PCR would directly address the potential caveat that OE levels vary with isoform combinations.

      Through a series of multiple whole-cell recordings, we examined neuronal pairs within the 0% group, where both neurons exhibited overexpression of different combinations of γPCDH isoforms. What we discovered is that the connectivity level within pairs of neurons where both neurons overexpressed γ-PCDH isoforms, pairs with only one neuron overexpressing these isoforms, and pairs with two control neurons (lacking overexpression) was remarkably similar. However, as we incrementally raised the similarity level between the recorded neurons by increasing the overlap in the combinatorial expression of γ-PCDH isoforms, we observed a gradual decrease in the connectivity probability between these neurons. Notably, the connectivity probability reached its minimum when the recorded cells had the exact same combinatorial expression of γ-PCDH isoforms at the 100% similarity level. These findings suggest that the similarity level between neurons, rather than the absolute expression level of γ-PCDH isoforms, plays a critical role in affecting synapse formation.

      -A caveat for the relative plasmid expression quantifications in Figure 3-S1 is that IHC was used to amplify the RFP-tagged isoform, and thus does not likely preserve the relationship between quantities and detection.

      We attempted to enhance the mNeongreen signal, known for its exceptional signal-to-noise ratio, by utilizing the 32f6-100 antibody from Chromotek. However, our observations did not reveal any additional cells through immunostaining compared to the images obtained solely based on the mNeongreen signal. This indicates that the application of the available antibody did not yield a significant improvement in cell detection.

      It's important to emphasize that if the RFP signal is overvalued, it would result in an increase in both the "red only" and "red in total" categories. However, it's worth noting that the "red only" category is more sensitive to the outcome than the "red in total" category. Therefore, an overvaluation of the RFP signal would lead to an underestimation of the total estimated plasmid content in electroporated neurons. Consequently, this would result in a lower estimate for the proportion of co-expression cells rather than a higher estimate. We have updated the calculation method in the "Estimating the numbers of overexpressed γ-PCDH isoform" section to reflect these considerations.

      • Figure 1 didn't change in response to reviews to improve clarity. New panels relating to the scRNASeq analyses were added to supplementary data but many are central and should be included in Figure 1 (ie. S1-Fig6D). In the Results, the authors state that neuronal subpopulations generally show a combinatorial expression of some variable RNA isoforms and near ubiquitous C-type expression. But they only show data for the Layer 2/3 neuron-specific cluster in S1-Fig-6D, and so it is not clear if this pattern applies to other clusters. Fig. S1-5 show a low number of expressed isoforms per cell, but specific descriptions on whether these include C-type isoforms would be helpful. Figure 1F showing isoform profile in all neurons is not particularly meaningful. There is a lot of interest in neuron-type specific differences in cPcdh diversity, and the authors could highlight their data from S1-5 accordingly.

      In addition to the layer 2/3 cluster, we observed a diverse combinatorial expression of various variable γ-PCDH isoforms alongside nearly ubiquitous C-type expression in all other clusters of cells. We have now explicitly mentioned this observation in the main text. To underscore this point further, we have included a new figure, Fig. 1-S6, which provides information on the similarity analysis for all other clusters. It's important to note that the data in previous Fig. S1-5 (now renumbered as S1-7) were solely related to "variable" isoforms. We apologize for any confusion and have made this clarification by including it in the title of the figure.

      • The concept of co-occurrence and results should be explained within the results section, to more clearly relate this concept to data and interpretations. Explanations are now found in the methods, but this did not improve the clarity of this otherwise very interesting aspect of the study.

      Thanks for your suggestion. We have incorporated some of the explanations from the methods section into the main text, mainly for the concept of “co-occurrence”.

      • The claim that C-type Pcdhgs do not functionally influence connectivity is premature. Tests were limited to PcdhgC4, which has unique properties compared to the other 2 C-type isoforms (Garrett et al 2019 PMID: 31877124; Mancia et al PMID: 36778455). The text should be corrected to limit the conclusion to PcdhgC4, and not generally to C-type. The authors should test PcdhgC3 and PcdhgC5 isoforms.

      We have revised the claim so that it applies to PcdhgC4 only, rather than to C-type isoforms in general, to better reflect our observations.

      • The group generated a novel conditional Pcdh-alpha mouse allele using CRISPR methods, and state that there were no changes in synaptic connectivity in these Pcdh-alpha mutants. But this claim is premature. The Southern blots validate the targeting of the allele. But further validations are required to establish that this floxed allele can be efficiently recombined, disrupting Pcdha protein levels and function. Pcdha alleles have been validated by western blots and by demonstration of the prominent serotonergic axonal phenotype of Pcdha-KO (ie. Chen 2017 PMID: 28450636; Ing-Esteves 2018 PMID: 29439167).

      We have obtained a new set of qRT-PCR data that confirms the decreased expression of α-PCDH in Pcdha cKO mice. These data have been integrated into Figure 2-S2D.

      • The Discussion would be strengthened by a deeper discussion of the findings to other cPcdh roles and studies, and of the limitations of the study. The idea that the Pcdhgs are influencing the rate of connectivity through a repulsion mechanism or synaptic formation (ie through negative interactions with synaptic organizers such as Nlgn - Molumby 2018, Steffen 2022) could be presented in a model, and supported by other literature.

      I would like to express my sincere appreciation to the reviewer for their invaluable comments and suggestions, which have led to extended discussions within our work. We have incorporated these suggestions into our paper to establish stronger connections between our observations and prior research findings.

      Reviewer #1 (Recommendations For The Authors):

      1) In Figure S6, the authors measured Euclidean distance from the single cell data to take account of the isoform expression levels in explaining diversity. However, it is hard to interpret the data without any control. The authors could measure the same value from a shuffled/randomized dataset for comparison (similarly to Fig 1F).

      We understand the reviewer's concern about the significance of the Euclidean distance analysis without an appropriate control. The inclusion of the Euclidean distance metric was initially a response to suggestions from other reviewers who recommended incorporating diverse methods for analyzing expression patterns among neurons.

      In response to your valuable feedback, we have taken measures to address these concerns. We have introduced shuffled data for comparison, thus enhancing the meaningful context for interpreting the results derived from the Euclidean distance analysis.
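
      As an illustration of what such a shuffled control can look like, the sketch below compares the mean pairwise Euclidean distance of an expression matrix with a null distribution obtained by permuting each isoform's values across cells. This is an assumption about one reasonable shuffling scheme, not a description of the analysis in the manuscript; the function names and the toy matrix are hypothetical.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(cells):
    """Mean Euclidean distance over all pairs of cells (rows of the matrix)."""
    pairs = list(combinations(cells, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def shuffle_within_isoform(cells, rng):
    """Permute each isoform (column) across cells independently.

    This keeps every isoform's expression distribution intact but breaks
    the combination of isoforms found within any single cell.
    """
    columns = [list(col) for col in zip(*cells)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def shuffled_null(cells, n_shuffles=1000, seed=0):
    """Mean pairwise distance for n_shuffles independently shuffled matrices."""
    rng = random.Random(seed)
    return [
        mean_pairwise_distance(shuffle_within_isoform(cells, rng))
        for _ in range(n_shuffles)
    ]


# Toy expression matrix (rows = cells, columns = isoforms); values are made up.
cells = [[5, 0, 3], [0, 4, 0], [5, 4, 3], [0, 0, 0]]
observed = mean_pairwise_distance(cells)
null = shuffled_null(cells, n_shuffles=200)
# Fraction of shuffles whose distance is at least as large as the observed one.
fraction_ge = sum(d >= observed for d in null) / len(null)
print(observed, fraction_ge)
```

      Permuting within each isoform, rather than across the whole matrix, is a deliberate choice here: it tests whether the combinations of isoforms per cell matter while leaving each isoform's overall expression level unchanged.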

      2) The authors need to clarify which cortical regions were used for electrophysiological experiments.

      Apologies for any confusion. To clarify, all recordings were conducted on neurons located in layer 2/3 of the neocortex without further discrimination. We have reinstated this information in both the main text and the methods section to ensure its clarity.

      Reviewer #2 (Recommendations For The Authors):

      There are still some issues that must be addressed.

      1) The references to gamma protocadherin repulsion are not correct in context. A repulsive role of homophilic interaction has been inferred from certain knockout phenotypes in a subset of neurons (not in cortical neurons). However, repulsion has never been shown to follow gamma protocadherin engagement. The authors present no new evidence that their results are attributable to cellular repulsion at nascent synaptic contacts. The mechanism is unknown. The references to repulsion to explain their results should make it clear that this is one possible explanation, but it is not shown. Also some references in the text are not correct. For example, line 63/64: the results of Molumby and Steffen are not involving homophilic adhesion or repulsion, but rather a cis interaction with neuroligins. Those papers should not be discussed as involving repulsion as in the reference to Lefebvre 2012. Also line 268/269 "Together with previous findings (Molumby,,,Tarusawa), our observations solidify repulsion effect of g-PCDH on synapse formation. . .". This is not the case. Neither Molumby nor Tarusawa demonstrated any such repulsion.

      Thank you to the reviewer for pointing out the errors in our citations. We have made the necessary corrections to the citations and have also refined the descriptions of our observations to improve clarity and accuracy.

      2) The discussion of the results when C4 is overexpressed must also be greatly toned down. C4 is a strange C-type protein--it cannot get to the cell surface alone but relies on other cPCDHs for this, and its primary role is in preventing cell death. It is odd that the authors used this isoform to represent C-types. They should have used C3, which two recent papers showed have specific roles at some synapses (Meltzer et al 2023, Ginty lab) and in dendrite branching (Steffen et al 2023, Weiner lab), or C5. It is entirely possible that just C4 has no role in synaptic matching--but C3 and C5 might. They should not conclude that the C-types have no such role and only A and B types do. That must be toned down (e.g., line 198/199, line 281).

      We acknowledge that using C4 to represent all three C-types (C3, C4, and C5) is not accurate. We have now modified the statement in the main text to rectify this.

      3) For the citation of Pcdhg flox/flox mice (line 126), Prasad et al., Development, 2008, Weiner lab, should also be cited as it fully characterized that line that was also used in Lefebvre et al 2008. They were co-published.

      Thank you for highlighting the missing citation, and we have now included it in the relevant section.

      4) The Pcdh alpha KO mouse characterization is still insufficient. The authors must show that alpha expression is gone following introduction of Cre, either by RT-PCR using alpha constant domain primers, or an alpha antibody on Western blot. The Southern and off-target sequencing do not confirm that all alpha gene expression is gone.

      Thank you for your feedback. We have conducted the qRT-PCR analysis as per your suggestion. The results clearly indicate a substantial reduction in α-PCDH expression within the neocortex of Pcdha cKO mice. We have thoughtfully incorporated this data into the manuscript, and it is visually represented in the new panel of Figure 2-S2D. Your valuable input has contributed to enhancing the quality of our work, and we sincerely appreciate the opportunity to address this important aspect.

      5) I do not understand something in Figure 4-S1A. Why with 0% matching is synaptic connectivity so low? This is not the same as in Figure 3E. This has to be explained because it does suggest that overexpression of ANY isoforms can inhibit synapse formation, which is consistent with Molumby 2017, even though this paper says it is not just the levels but the isoform specificity.

      The panel of Fig.4-S1A illustrates the connection rate between neurons with the same color (icons in upper left), representing cells that express the same combination of γ-PCDHs (100% of similarity). The X-axis (0%, 11%, 33%, and 100%) reflects the similarity level between the 2 populations of cells (GFP and RFP).

      6) There are still issues with the English grammar in the paper. It is not too bad in the main text but someone should re-edit it. However, the figure legends are indeed much worse and truly must be edited professionally before they are acceptable.

      We apologize for the quality of the English in the paper. We have now polished most of the manuscript, especially the figure legends.

      Reviewer #3 (Recommendations For The Authors):

      • This study has many strengths and innovative findings. Most comments above included suggestions to strengthen the paper. The overall message that Pcdhgs influence the rate of synaptic connectivity between nearby cells is compelling. How this Pcdhg-isoform-dependent process could influence synaptic specificity can be explored in a model in the discussion. But this study did not test a role in 'synaptic specificity'; this term should be removed from the title and line 81 in the intro.

      Thank you for your invaluable comments aimed at improving our paper. Regarding the title, we believe that "synaptic connectivity" might be a more suitable choice than "synaptic specificity." However, we're open to considering other alternatives as well.

      • The manuscript and overall quality of the science will be improved by removing those sections that are not adequately investigated (ie.Pcdh-a cKO; PcdhgC4 is assessed but findings can't be extended to other C-type isoforms) and by outlining limitations of the study.

      We have modified the related claim mentioned in the main text.

      • The studies negatively correlating between isoform matching and connectivity are not robust. Additional approaches are needed if the authors want to make this claim.

      In Figure 4E, we have implemented a bootstrapping method. Bootstrapping is a statistical technique falling under the broader category of resampling methods. It involves random sampling from the observed data with replacement, enabling the calculation of standard errors, confidence intervals, and supporting hypothesis testing.
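
      To make the resampling idea concrete, here is a minimal sketch of a percentile bootstrap for a connection probability, where each tested neuronal pair is coded 1 (connected) or 0 (not connected). It is a generic illustration rather than the exact procedure behind Figure 4E; the function name `bootstrap_ci`, the number of resamples, and the example counts are assumptions.

```python
import random


def bootstrap_ci(outcomes, n_resamples=10000, ci=0.95, seed=0):
    """Percentile bootstrap confidence interval for a proportion.

    outcomes: list of 0/1 values, one per tested pair (1 = connected).
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the pairs with replacement and recompute the rate each time.
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    # Read the interval bounds off the sorted bootstrap distribution.
    lo_idx = int((1 - ci) / 2 * n_resamples)
    hi_idx = int((1 + ci) / 2 * n_resamples) - 1
    return rates[lo_idx], rates[hi_idx]


# Hypothetical data: 10 connected pairs out of 50 tested.
outcomes = [1] * 10 + [0] * 40
lo, hi = bootstrap_ci(outcomes, n_resamples=2000)
print(f"observed rate = {sum(outcomes) / len(outcomes):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

      Comparing such intervals across similarity levels (for example 0%, 11%, 33%, and 100%) gives a sense of whether a decreasing trend exceeds what sampling noise alone would produce.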

      • Statistical approaches should be described in methods, figure legends.

      More information about statistical approaches has been added in the figure legends.

      • The discussion should elaborate on the limitations of the study, and relate to other studies, including Lv et al 2022.

      We have added more discussion to relate our observations to previous findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      R1.1) Although very robust and capable of handling several situations, the researcher has to keep in mind that processing has to follow some basic rules in order for this pipeline to work properly. For instance, fiducials and scales need to be included in the photograph, and the slabs must be photographed against a contrasting background.

      Our pipeline does indeed have some prerequisites in terms of data acquisition – at the very least, a ruler must be present in the photographs. A contrasting background is not strictly needed, but does definitely facilitate segmentation. We have edited the Introduction and Discussion to emphasize these prerequisites.

      R1.2) Also, only coronal slices can be used, which can be limiting for certain situations.

      While the 3D reconstruction based on Eq. 1 is quite general, the segmentation is indeed tailored to coronal slices of the cerebrum. As explained in the paper, this orientation is standard when slicing the cerebrum, but axial or sagittal slicing may also be of interest – particularly when dissecting the brainstem or cerebellum. We acknowledge this limitation in the Discussion of the revised manuscript.

      R1.3) In the future, segmentation of the histological slices could be developed and histological structures added (such as small brainstem nuclei, for instance). Also, dealing with axial and sagittal planes can be useful to some labs.

      While outside the scope of this paper, these are good ideas for future directions, and are considered in the Discussion of the revised version.

      Reviewer 2

      R2.1) The current method could only perform accurate segmentation on subcortical tissues. It is of more interest to accurately segment cortical tissues, whose morphometrics are more predictive of neuropathology. The authors also mentioned that they would extend the toolset to allow for cortical tissue segmentation in the future.

      We agree with the reviewer that cortical parcellation has high value. We have included a new option in Photo-SynthSeg to parcellate the cortex using a machine learning block already existing in SynthSeg 2.0 (Billot et al, PNAS, 2023); see example in Figure 2 of the revised manuscript. This parcellation is volumetric; more accurate methods based on surfaces are out of the scope of this article and remain as future work. The manuscript has been edited to reflect these changes.

      R2.2) Brain tissues are not rigid bodies, so dissected slices could be stretched or squeezed to some extent. Also, dissected slices that contain temporal poles may have several disjoined tissues. Therefore, each pixel in dissected photographs may go through slightly different transformations. The authors constrain that all pixels in each dissected photograph go through the same affine transform in the reconstruction step probably due to concerns of computational complexity. But ideally, dissected photographs should be transformed with some non-linear warping or locally linear transformations. Or maybe the authors could advise how to place different parts of dissected slices when taking dissection photographs to reduce such non-linearity of transforms.

      The reviewer is totally right. The problem with nonlinear warps is that, albeit trivial to implement, they compromise the robustness of the registration pipeline. This is because the nonlinear model introduces huge ambiguity in the space of solutions: for example, if one adds identical small nonlinear deformations to every slice, the objective function barely changes. The revised manuscript: (i) discusses this limitation more thoroughly; (ii) discusses nonlinear models for 3D reconstruction as future work; and (iii) makes recommendations about tissue placement to minimize errors around the temporal pole.

      R2.3) For the quantitative evaluation of the segmentation on UW-ARDC, the authors calculated 2D Dice scores on a single slice for each subject. Could the authors specify how this single slice is chosen for each subject? Is it randomly chosen or determined by some landmarks? It's possible that the chosen slice is between dissected slices so SAMSEG cannot segment accurately.

      The slice is chosen to be close to the mid-coronal plane, while maximizing visibility of subcortical structures. The chosen slice is always a “real” dissected slice (rather than a digital “virtual” slice) and cannot be located in a gap between slices. This is clarified in the Quantitative Evaluation section of the revised manuscript.

      R2.4) Also from Figure 3, it seems that SAMSEG outperforms Photo-SynthSeg on large tissues, WM/Cortex/Ventricle. Is there an explanation for this observation?

      Since we use a single central coronal slice when computing Dice, SAMSEG yields very high Dice scores for large structures with strong contrast (e.g., the lateral ventricles). However, Photo-SynthSeg provides better results across the board, particularly when considering 3D analysis (see Figure 2 and results on volume correlations). We have added a comment on this issue to the revised manuscript.
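
      For reference, the per-structure 2D Dice score discussed here can be sketched as follows. This is a generic illustration of the metric, 2|A ∩ B| / (|A| + |B|), and not the evaluation code from the paper; the function name and the tiny label maps are hypothetical.

```python
def dice_score(seg_a, seg_b, label):
    """Dice overlap for one label between two 2D label maps of equal shape.

    seg_a, seg_b: lists of rows, each row a list of integer labels.
    Returns NaN if the label is absent from both maps.
    """
    size_a = size_b = overlap = 0
    for row_a, row_b in zip(seg_a, seg_b):
        for a, b in zip(row_a, row_b):
            in_a = a == label
            in_b = b == label
            size_a += in_a
            size_b += in_b
            overlap += in_a and in_b
    if size_a + size_b == 0:
        return float("nan")
    return 2 * overlap / (size_a + size_b)


# Toy 2x3 label maps (0 = background, 1 = some structure); values are made up.
manual = [[0, 1, 1],
          [0, 1, 0]]
automated = [[0, 1, 0],
             [0, 1, 1]]
dice = dice_score(manual, automated, label=1)
print(round(dice, 3))
```

      Because the score is normalized by structure size, a boundary error of a given width removes a smaller fraction of the overlap for a large structure than for a small one, which is one reason large, high-contrast structures tend to score well on a single slice.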

      R2.5) In the third experiment, quantitative evaluation of 3D reconstruction, each digital slice went through random affine transformations and illumination fields only. However, it's better to deform digital slices using random non-linear warping due to the non-rigidity of the brain as mentioned in R2.2. So, the reconstruction errors estimated here are quite optimistic. It would be more realistic if digital slices were deformed using random nonlinear warping.

      We agree with the reviewer and, as we acknowledge in the manuscript, the validation of the reconstruction error with synthetic data is indeed optimistic. The problem with adding nonlinear warps is that the results will depend heavily on the strength of the simulated deformation. We keep the warps linear as we believe that the value of this experiment lies in the trends that the errors reflect, as a function of slice thickness and its variability (“jitter”). This has been clarified in the revised manuscript.

      Reviewer 2 (recommendations for the authors)

      AR2.1) In the abstract, the authors mentioned that the segmentations of the 3D reconstructed stack deal with 11 brain regions, however, in most sections, only 9 tissue masks were compared, such as in Table 1, 2, and Figure 3. Also in the supplementary video, there are only 10 rendered tissues. So, what are these 11 regions? Is the background non-brain region also counted as a region? And how were these 11 regions derived from the original 36 annotated tissues in T1-39?

      We particularly thank the reviewer for noticing this.

      The 11 regions are white matter, cortex, ventricle, thalamus, caudate, putamen, pallidum, hippocampus, amygdala, accumbens area, and ventral diencephalon. These are all bilateral labels, i.e., 22 regions in total. The original 36 labels include these 22 and: four labels for the cerebellum (left and right cortex and white matter); the brainstem; five labels for cerebrospinal fluid regions that we do not consider; the left and right choroid plexus; and two labels for white matter hypointensities in the left and right hemisphere.

      As in many other papers, we leave “ventral diencephalon” and “accumbens area” out of the validation as they are not very well defined.

      We note that all regions except the accumbens are visible in Figure 1d. The ventral diencephalon is easy to miss as only a small portion of it is visible (when picking a slice, one needs to compromise in terms of how much of each structure is visible). Moreover, it has a very similar color to the cortex in the FreeSurfer convention (see picture below).

      Author response image 1.

      The accumbens is visible at 1m45s in the supplementary video, segmented in orange (see capture below).

      Author response image 2.

      We have clarified these issues in the revised version of the manuscript.

      RA2.2) In Figure 1(f), why are the hippocampal volumes of confirmed AD subjects larger than those of the healthy controls? Is this a typo or is there any explanation for this?

      Yes, it is a typo. Again, thank you very much for noticing this.

      RA2.3) Typo on P3, "sex and gender were corrected" should be "age and gender were corrected".

      This has been corrected in the revised version.

      RA2.4) In the MADRC dataset, the authors mentioned that there are 18 full brains and 58 hemispheres, however, the total data size is 78. Is this a typo?

      Yes, it is. It has been corrected in the revised version.

      RA2.5) Comparing the binary masks in Figure 5(d) and the photographs in Figure 5(c), some tissues below the ventricles with high intensities are also removed from masks. Is this done by manual editing? If so, how long does it usually take to edit a clean mask for each subject?

      We used a combination of thresholding, morphological operations (erosion/dilation), and minor manual edits when needed – particularly to remove chunks of pial surface when they are visible, in the most anterior slices. The average is a couple of minutes per photograph. In the future, we plan to use these manually curated images to train a supervised convolutional neural network to perform the task automatically. These details are provided in the revised manuscript.

      RA2.6) In the method of 3d reconstruction, there are four weights for the optimization function. How did the authors determine such weights and do these weights have some impact on the reconstruction performance?

      The parameters were set by visual inspection of the output on a small pilot dataset, and do not have a strong impact on the reconstruction. The crucial aspect is to increase 𝜈 (the affine regularizer) and decrease 𝛼 (compliance with the external reference) when using a soft reference. These details have been added to the revised version.

      RA2.7) Finally for the deep learning-based segmentation, a U-Net was trained on GMM generated single-channel intensity synthetic images while the dissected photographs are color images with three channels. So, did the authors only input the grayscale photographs to the segmentation network? Are there any other preprocessing steps for color photographs, such as normalization? Is it possible to use GMM to generate color images as training data to better suit dissection photography?

      We did try simulating three channels during training, but the performance was actually worse than when simulating one channel and converting the RGB input to grayscale. This information has been added to the revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and for their insightful and constructive comments. We are pleased that the reviewers found that this study “opens the way for novel future work” and that the findings are “interesting”. We have experimentally addressed the points raised by the reviewers and have substantially revised the manuscript by modifying 30 figure panels. The reviewers’ points are specifically addressed below.

      1) The authors concluded that an accumulation of Ly6Clo monocytes occurred in the Rbpjfl/fl Lyz2cre/cre mouse by examining the percentage of cells among CD45+ cells in Figure 1. It would be helpful if the authors could give an account of the total cell count numbers of monocyte subsets per ml of blood and in the bone marrow to give the readers a better idea of the extent of increase as cell percentages among CD45+ cells may be influenced by the number of other immune subsets.

      We thank the reviewer for raising these points. In this study, we crossed Rbpjfl/fl mice with Lyz2-Cre mice, which carry the Cre recombinase inserted in the Lysozyme-M (Lyz2) gene locus; this results in the selective deletion of RBP-J in myeloid cells, such as monocytes, macrophages and granulocytes. We then examined the neutrophil levels in the bone marrow and blood. The percentage of neutrophils was similar to that of control mice, in line with the findings reported in the literature (Metzemaekers et al. 2020). Furthermore, the proportion of Ly6Chi monocytes in RBP-J deficient mice was similar to that of control mice, which is consistent with the literature (Ginhoux et al. 2014). Based on these results, we considered that the changes observed in the proportion of Ly6Clo monocytes could reliably indicate the alterations occurring in Ly6Clo monocytes within the Rbpjfl/flLyz2cre/cre mice.

      2) The authors demonstrated no significant differences in bone marrow progenitor and monocyte numbers, therefore concluding that monocyte egress from the bone marrow did not contribute to the increase in Ly6Clo monocyte numbers in the blood (Figure 1B-D). As it is unclear what is the exact cell number increase in the blood, the changes in bone marrow monocyte numbers might be too small to be reflected in their percentage calculations. In light that CCR2 was also found to play a role in Ly6Clo monocyte homeostasis in Rbpjfl/fl Lyz2cre/cre mice, could the authors demonstrate if Rbpj-deficient Ly6Clo monocytes might be more responsive to CCL2 through transwell experiments? This would also provide readers a more in-depth mechanism of how an increase in CCR2 on Rbpj-deficient Ly6Clo monocytes leads to their accumulation in the periphery.

      The experimental results regarding the proportion of monocytes and precursor cells in the bone marrow were derived from multiple experiments. The data obtained from individual experiments as well as the final integrated data did not reveal significant differences between the control mice and Rbpjfl/flLyz2cre/cre mice. Therefore, we believed that even if there were small changes in cell numbers, these differences could still be reflected through alterations in their proportions. We attempted transwell experiments, but unfortunately, they were not technically successful. Nearly all sorted Ly6Clo monocytes attached to the transwell membrane, making it challenging to draw a conclusion regarding the responsiveness of RBP-J deficient Ly6Clo monocytes to CCL2.

      3) In the parabiosis experiment conducted in Figure 3C-E, the authors provide conclusive evidence that the accumulation of Rbpj-deficient Ly6Clo monocytes was cell intrinsic as Rbpj-deficient Ly6Clo monocytes continued to accumulate in the blood of control counterparts. Monocytes have also been shown to accumulate in the spleen and re-enter or home back to the bone marrow. Assessing if there is a change in monocyte homing abilities in Rbpj-deficient Ly6Clo monocytes by examining their numbers in the spleen and bone marrow of control parabiotic mice would substantiate their claims that the defect was cell intrinsic and provide further understanding for the readers of why Rbpj-deficient Ly6Clo monocytes accumulate in the blood.

      We thank the reviewer for bringing out this interesting point. We also analyzed the proportions of GFP- Ly6Chi monocytes and Ly6Clo monocytes in the bone marrow of parabiotic mice. The experimental results revealed that there were no significant differences in the proportion of GFP- monocytes between the control mice and the KO animals (see the figure A below). We also detected the expression of CXCR4 in bone marrow Ly6Clo monocytes. Rbpjfl/flLyz2cre/cre mice exhibited normal expression of CXCR4 (see Author response image 1 below), which participates in the homing of classical and nonclassical monocytes to bone marrow and spleen monocyte reservoirs (Chong et al. 2016). The homing abilities of RBP-J deficient Ly6Clo monocytes may not have changed.

      Author response image 1.

      4) Authors should provide cell counts for Figure 5B to demonstrate the extent CCR2 depletion affects the number of Ly6Clo monocytes in Rbpjfl/fl Lyz2cre/cre mice as explained in point 1.

      As mentioned before, we believed that the proportion of circulating monocytes could, to some extent, provide evidence of the impact of CCR2 deficiency on Ly6Clo monocytes.

      Reviewer #2

      1) The confirmation of knockout in supplemental figure 1A shows only a two third knockdown when this should be almost totally gone. Perhaps poor primer design, cell sorting error or low Cre penetrance is to blame, but this is below the standard one would expect from a knockout.

      Kang et al (PMID: 31944217) evaluated the knockout efficiency of Rbpj in sorted colonic macrophages of Rbp-jfl/flLyz2cre/cre mice using qPCR and immunoblotting. The qPCR result indicated a two-third knockdown, while the immunoblotting results demonstrated efficient deletion of RBP-J protein in Rbp-jfl/flLyz2cre/cre mice. As pointed out by the reviewer, the observed two-third knockdown, which is lower than the expected complete knockout, may be attributed to primer design.

      2) Many figures (e.g. 1A) only show proportional data (%) when the addition of cell numbers would also be informative

      We appreciate the reviewer for bringing up these points. Indeed, multiple articles studying monocytes only show changes in cell proportions. As mentioned above, we believed that analyzing the proportion of circulating monocytes could offer valuable evidence of the influence of RBP-J deficiency on Ly6Clo monocytes.

      3) Many figures only have an n of 1 or 2 (e.g. 2B, 2C)

      Here, we employed annexin V (AnnV) and propidium iodide (PI) staining to evaluate apoptosis and cell death in Ly6Chi and Ly6Clo blood monocytes from control and RBP-J deficient mice. The results showed no significant difference in the levels of apoptosis and cell death between the two groups (see Author response image 2 below). The statistical data for Ki-67 expression were obtained from multiple experiments, and the expression of Ki-67 showed no significant difference between the control and RBP-J deficient mice (see the figure B below). In Figure 2C, each dot represents 2-3 mice, and there were no differences observed between control and RBP-J deficient mice at multiple time points during the repeated measurements.

      Author response image 2.

      4) Sometimes strong statements were based on the lack of statistical significance, when more n number could have changed the interpretation (e.g. 2G, 3E)

      We have derived the corresponding conclusions based on the observed experimental results.

      5) There is incomplete analysis (e.g. network analysis) and interpretation of RNA-sequencing results (Figure 4); the difference between the genotypes in both monocyte subsets would provide a more complete picture and potentially reveal mechanisms

      We thank the reviewer for raising this point. We agree that a more comprehensive analysis, including a comparison between the genotypes in both monocyte subsets, would provide a deeper understanding and potentially uncover underlying mechanisms. Having observed alterations in blood Ly6Clo monocytes in RBP-J deficient mice, we focused primarily on analyzing the differentially expressed genes within this subset of monocytes to gain further insight into its specific characteristics and behavior. We have also deposited the sequencing datasets in the Gene Expression Omnibus under accession number GSE208772 so that interested researchers can access and download the data.

      6) The experiments in Figures 5 and 7 are missing a control (Lyz2cre/cre Ccr2RFP/RFP or the Rbpj+/+ versions) and may have been misinterpreted. For example if the control (RBP-J WT, CCR2 KO) was used then it would almost certainly show falling Ly6C low numbers compared to RBP-J WT CCR2 WT, but RBP-J KO CCR2 KO would still have more Ly6c low monocytes than RBP-J WT, CCR2 KO - meaning that the RBP-J function is independent of CCR2. I.e. Ly6c low numbers are mostly dependent on CCR2 but this is irrespective of RBP-J.

      The diminished Ly6Clo monocytes in Rbpjfl/flLyz2cre/creCcr2RFP/RFP (DKO) mice can be divided into two distinct subpopulations: one portion originates from Ly6Chi monocytes, while the other comprises Ly6Clo monocytes characterized by heightened CCR2 expression. The Ly6Clo monocytes that remain in DKO mice exhibit CCR2 expression levels within the normal range when compared to Lyz2cre/cre mice, but lower levels compared to RBP-J deficient mice (Figure 5A). These findings suggest that RBP-J exerts regulatory influence over Ly6Clo monocytes, at least in part, through CCR2.

      7) Figure 6 was difficult to interpret because of the lack of shown gating strategy. This reviewer assumes that alveolar macrophages were gated out of analysis

      The gating strategy for lung interstitial macrophages in Figure 6 of the manuscript was consistent with the published work (Schyns et al, cited in the manuscript). We also measured alveolar macrophages (AM) in bronchoalveolar lavage fluid from control and RBP-J deficient mice. At the resting state, RBP-J deficient mice exhibited normal AM frequency and number (see Author response image 3 below).

      Author response image 3.

      8) The statements around Figure 7 are not completely supported by the evidence, i) a significant proportion of CD16.2+ cells were CCR2 independent and therefore potentially not all recently derived from monocytes, and ii) there is nothing to suggest that the source was not Ly6C high monocytes that differentiated - the manuscript in general seems to miss the point that the source of the Ly6C low cells is almost certainly the Ly6C high monocytes - which further emphasises the importance of both cells in the sequencing analysis

      Schyns et al and Sabatel et al showed that the numbers of IM and CD16.2+ cells were similar in Ccr2 sufficient and Ccr2-/- mice, demonstrating that CD16.2+ cells were Ccr2 independent. In contrast, the number of CD16.2+ cells was significantly reduced in Rbpjfl/flLyz2cre/creCcr2RFP/RFP mice as compared to Rbpjfl/flLyz2cre/cre mice, in line with the decreased numbers of lung Ly6Clo monocytes and blood Ly6Clo monocytes, showing that CD16.2+ cells depended on Ccr2 for their presence in Rbpjfl/flLyz2cre/cre mice.

      9) The authors did not refer to or cite a similar 2020 study that also investigated myeloid deletion of Rbpj (Qin et al. 2020 - https://doi.org/10.1096/fj.201903086RR). Qin et al identified that Ly6Clo alveolar macrophages were decreased in this model - it is intriguing to synthesise these two studies and hypothesise that the ly6c low monocytes steal the lung niche, but this was not discussed

      We thank the reviewer for bringing this study to our attention. According to their findings, myeloid-specific RBP-J deficiency resulted in a decrease in Ly6CloCD11bhi alveolar macrophages but an increase in Ly6CloCD11blo alveolar macrophages after bleomycin treatment, while the total number of alveolar macrophages showed no significant difference. These results suggest that RBP-J may play a role in regulating the balance between these specific alveolar macrophage subsets in response to bleomycin-induced injury, without affecting the overall population of alveolar macrophages. This may be different from what we observe in interstitial macrophages under resting conditions.

      Reviewer #3

      1) It is curious that the authors do not see the increase in circulating monocytes reflected in the spleen however, the n-number is 2. Increasing the n-number would enable the author to understand the data which is not interpretable at the moment. There are multiple other places in which a low n-number makes it hard to fully understand the biology (eg Figure 2C&E)

      Although we counted the number of splenic monocyte subsets in only two mice, the proportion of splenic monocyte subsets was calculated from a larger number of mice in our study.

      2) Given that Ly6Clow monocytes are thought to be longer lived than Ly6C+ and there is still considerable labelling of Ly6Clow monocytes at the end of the 96 hours analysed in the EdU experiment, it is not possible to determine from the data here whether RBPJ deficiency increases life span. Could it be that differences in %EdU+ cells would only be seen at later time points? If the timeline was extended, could it be that differences in %EdU+ become apparent?

      Based on the latex bead experiment, we observed that the presence of latex+ Ly6Clo monocytes at 7 days in control and RBP-J deficient mice did not differ, indicating that the lifespan of Ly6Clo monocytes did not increase.

      3) Similarly for the latex bead experiment. Given that there is only n=2 at the first time point and only ~30% of Ly6Clow monocytes are Latex+, it is very hard to conclusively claim that RBP-J does not influence monocyte survival or proliferation. An interesting experiment to assess whether RBP-J is increasing monocyte survival could be an adoptive transfer model in which Ly6Clow monocytes are injected into a congenic mouse and tracked over time.

      In RBP-J deficient mice, there was an increase in the proportion of Ly6Clo monocytes. We hypothesized that this lower proportion of latex+ cells might make it easier to observe differences, but clearly, in our experiment, no differences were observed between control and RBP-J deficient mice.

      4) RNA-seq: Ccr2 and Itgax are not the top hits. The authors do not investigate the top hits which may provide very interesting insight into how RBP-J influences monocyte biology.

      We thank the reviewer for raising these points. We also analyzed some of the top changed genes. The top two genes in the downregulated gene list are Hes1 and Nrarp, which are regulated by the Notch pathway (Krebs et al 2001 and Radtke et al 2010). We tested blood monocytes, but the populations of monocyte subsets displayed no differences between Hes1fl/flRbp-jfl/flLyz2cre/cre and Rbp-jfl/flLyz2cre/cre mice (data not shown). As shown in Figure 2-figure supplement 1A, expression of Nr4a1 showed no significant differences between control and RBP-J deficient mice. The top gene in the upregulated gene list is Erdr1, which has been reported to play a role in cellular survival (Soto et al 2017), while blood monocyte subsets in RBP-J deficient mice displayed normal survival.

      5) The PCA plot in figure 4C- it would be interesting to see where all the biological replicates fall.

      We agree with the reviewer’s assessment that observing the positions of all biological replicates on the PCA plot may indeed yield valuable insights. However, it is worth noting that the upregulated and downregulated genes also offer suggestive hints.

      6) Based on CCR2 expression and CD11c expression, monocytes from RBP-J deficient mice look more like Ly6C+ monocytes - could it be that RBP-J is increasing conversion from Ly6C+ monocytes to Ly6Clow? Or could it be that Ly6Clow monocytes are heterogeneous and RBP-J is increasing survival or conversion of one subtype of Ly6Clow monocytes but looking at all Ly6Clow monocytes together is masking this?

      Ly6Clo monocytes can be subdivided into different subpopulations depending on surface markers, such as CD43, MHC-II, CD11c and CCR2 (Jakubzick et al 2013 and Ginhoux et al. 2014). Carlin et al found that a subset of blood Ly6Clow cells was independent of both Ccr2 and Nr4a1. As the reviewer noted, Ly6Clo monocytes are heterogeneous. Therefore, it is possible that survival is altered in a particular subset of Ly6Clo monocytes.

      7) The data presented here suggest that lung CD16.2+ interstitial macrophages are derived from Ly6Clow monocytes which are increased via CCR2. Although the data are suggestive, they are not conclusive, lineage tracing and CCR2 blockade or better, conditional CCR2 deficiency would help to strengthen the claim.

      Schyns et al showed that the number of CD16.2+ cells was similar in Ccr2 sufficient and Ccr2-/- mice, demonstrating that CD16.2+ cells were Ccr2 independent. However, the number of CD16.2+ cells was significantly reduced in Rbpjfl/flLyz2cre/creCcr2RFP/RFP mice as compared to Rbpjfl/flLyz2cre/cre mice, in line with the decreased numbers of lung Ly6Clo monocytes and blood Ly6Clo monocytes. Moreover, the turnover of lung Ly6Chi and Ly6Clo monocytes was normal. These results indicated that CD16.2+ cells depended on Ccr2 for their presence in Rbpjfl/flLyz2cre/cre mice.

      8) The figures could do with more headings/ more detailed legends to help the reader, for example including what is BM, what is blood, what is spleen. Figure 2E needs the days labelled on or above the histograms.

      We thank the reviewer for raising this important point. We have now added additional detailed legends to the figure.

      9) Gating strategies should be included to help the reader understand which cells you are looking at, especially for Figure 6&7.

      The gating strategy for Figures 6 and 7 followed the method reported in the literature, which included the identification of alveolar macrophages. Additionally, we labeled the markers for cell populations in the figure.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1) Line 99-100 The authors claimed that IQCH is a novel IQ motif-containing protein, which is essential for spermiogenesis and fertilization. However, it is not clear if the currently published paper named an ancient testis-specific IQ motif containing H gene that regulates specific transcript isoform expression during spermatogenesis.

      Response: We thank the reviewer for the comment. Yes, IQCH is the ancient testis-specific IQ motif containing H gene. Following the reviewer’s suggestion, we have revised the statement to “Here, we revealed a testis-specific IQ motif containing H gene, IQCH, which is essential for spermiogenesis and fertilization” in the Introduction of the revised manuscript.

      2) Line 154-159 Immunofluorescence staining for the marker of the acrosome (peanut agglutinin: PNA) as well as the mitochondrial marker (Transcription Factor A, Mitochondrial: TFAM) was performed to confirm the deficiency of the acrosomes and mitochondria in the proband's spermatozoa. It seems that the spermatozoa acrosomes and mitochondria were severely defective in the proband. The authors should indicate IQCH's role in mitochondrial and acrosome function by explaining how IQCH is related to mitochondrial and acrosome deficiency. In addition to staining, other functional analyses should be performed to strengthen the claim of acrosome and mitochondrial defects.

      Response: We appreciate the reviewer's valuable suggestion. Indeed, in our study, the results of multiomics analysis of WT and Iqch KO testes, including LC-MS/MS analysis, proteomic analysis, and RNA-seq analysis, pointed to a potential role of IQCH in mitochondrial and acrosome function. GO analysis of these datasets indicated a significant enrichment in mitochondrial and acrosomal functions, including acrosomal vesicle, acrosome assembly, vesicle fusion with Golgi apparatus, mitochondrion organization, mitochondrial matrix, and so on. Among the enriched molecules, HNRNPK in particular is mainly expressed at the Golgi phase and Cap phase (Biggiogera et al. 1993). ANXA7 is a calcium-dependent phospholipid-binding protein that is a negative regulator of mitochondrial apoptosis (Du et al. 2015). Loss of SLC25A4 results in mitochondrial energy metabolism defects in mice (Graham et al. 1997). Furthermore, we confirmed by Co-IP that IQCH interacted with HNRNPK, ANXA7, and SLC25A4, and showed by immunofluorescence and western blotting that these proteins were downregulated in the sperm of Iqch KO mice. Moreover, IQCH can bind to HNRPAB, which could influence the mRNA levels of Catsper-family genes, such as Catsper1, Catsper2, and Catsper3, which are crucial for acrosome development (Jin ZR et al). In addition, we also detected HNRPAB binding to Dnhd1, which affects mitochondria development (Tan C et al). Therefore, in addition to staining, these other functional analyses also provide evidence of the acrosome and mitochondrial defects caused by IQCH absence.

      3) Line 180-182 IQCH knockout mice were generated. It is not clear why Mut-IQCH mice were not generated to be consistent with the human sequencing data.

      Response: We thank the reviewer for the comment. To understand IQCH's impact on fecundity in mice, we used CRISPR-Cas9 in an attempt to generate mice carrying the orthologous variant of IQCH c.387+1_387+10del detected in humans. Regrettably, due to sequence complexity, the specificity and efficiency of the designed sgRNA were low, which prevented successful construction of Iqch knock-in mice. Because IQCH c.387+1_387+10del results in absent expression, we instead generated Iqch knockout mice to explore IQCH's role in spermatogenesis.

      4) Line 241, Figure 5A: Gene Ontology (GO) analysis of the IQCH-bound proteins revealed a particular enrichment in fertilization, sperm axoneme assembly, mitochondrial organization, calcium channel, and RNA processing. But these GO functions are not shown in Figure 5A. The entire Figure 5 should be revised to enhance readability.

      Response: We sincerely apologize for the oversight. These GO functions were indeed identified during the analysis of IQCH-bound proteins. Regrettably, we unintentionally omitted them when creating the plots. We have revised the plots in Figure 5 of the revised manuscript to enhance readability.

      5) Line 242 "33 ribosomal proteins were identified (Fig. 5B), indicating that IQCH might be involved in protein synthesis". The authors should perform an analysis to support the claim of protein synthesis defects.

      Response: We thank the reviewer for the suggestion. First, we have added Co-IP experiments to confirm the interaction between IQCH and three ribosomal proteins (RPL4, RPS3, and RPS7), chosen from the pool of 33 ribosomal proteins based on different protein scores (Figure R1). In addition, the proteomic analysis revealed 807 upregulated proteins and 1,186 downregulated proteins in KO mice compared to WT mice. We confirmed the key downregulated proteins by western blotting and immunofluorescence staining in the previous manuscript. These results indicated that IQCH might interact with ribosomal proteins to regulate protein expression. The regulation of protein synthesis by IQCH will, of course, require further experiments for confirmation in future studies.

      Author response image 1.

      The interaction between IQCH and ribosomal proteins. Co-IP assays confirmed that IQCH interacted with RPL4, RPS3, and RPS7 in WT mouse sperm.

      6) Line 244 The authors mentioned too many GO functions without focus.

      Response: Following the reviewer’s suggestion, we have simplified the IQCH-associated GO functions in the revised manuscript.

      7) Figure 6, there are no negative controls in all co-IP experiments. Band sizes are not marked. Thus, all data can't be evaluated. This also raises concern about whether the LC-MS/MS experiment to identify IQCH interacting protein was well-controlled? All co-IP experiments were poorly designed to draw any conclusion.

      Response: We thank the reviewer for the comments. We have added negative controls to all Co-IP experiments and provided band sizes in Figure 6 of the revised manuscript.

      8) The authors mentioned that IQCH can bind to CaM. But they didn't detect CaM protein in Figure 5. Did the LC-MS/MS experiment really work?

      Response: We thank the reviewer for the comments. We detected the interaction of CaM protein with IQCH in the LC-MS/MS analysis, which has been submitted as new Data S1 in the revised manuscript. We also confirmed their binding in mouse sperm by Co-IP and immunofluorescence staining; these results were shown in Figure 6 and Figure S10 of the previous manuscript.

      9) Figure 6D. Because IQCH is lost in Iqch KO sperm, what is the point of showing in the Co-IP assay that CaM does not bind to IQCH in Iqch KO sperm?

      Response: Following the reviewer’s suggestion, we have deleted the Co-IP assay results showing that CaM could not bind to IQCH in Iqch KO sperm.

      10) Figure 6E. The Co-IP assay does not support the authors' claim that the decreased expression of HNRPAB was due to the reduced binding of IQCH and CaM by the knockout of IQCH or CaM.

      Response: We thank the reviewer for the expert comments. Indeed, the results in Figure 6E confirmed the interaction of IQCH and CaM in K562 cells, and also showed that the expression of HNRPAB was reduced when IQCH or CaM was knocked down, suggesting that IQCH or CaM might regulate HNRPAB expression. In Figure 6F, by contrast, the downregulation of HNRPAB caused by knocking down IQCH (or CaM) could not be rescued by overexpressing CaM (or IQCH), indicating that CaM (or IQCH) cannot mediate HNRPAB expression alone. Therefore, the reduced expression of HNRPAB in Figure 6E might result from the weakened interaction between IQCH and CaM, rather than simply from the downregulation of IQCH or CaM expression. To avoid confusion, we have modified the relevant description in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      1) Lines 117 and 129: Please provide the reference number (NM_xxx.x) for the IQCH isoform that was used to interpret this variant. This is key information. Also, please provide the predicted truncation consequence caused by this splicing variant to IQCH protein.

      Response: We thank the reviewer for the suggestions. We have added the reference number (NM_0010317152) of IQCH to the manuscript. We used splice site prediction tools, such as SpliceAI, RDDC, and varSEAK, to assess the expression consequences of this IQCH splicing variant. These tools could not predict the outcome of this splicing variant. However, the results of the minigene splicing assay showed that IQCH c.387+1_387+10del resulted in degradation of IQCH.

      2) Figure 1A: The deleted sequence indicated by the red box does not match IQCH c.387+1_387+10del. Please show a plot of the exon-intron boundary under the Sanger sequencing results of the WT allele.

      Response: We thank the reviewer for the suggestions. We apologize for the non-standard description of the Sanger sequencing results. Following the HGVS nomenclature (Figure R2), we have modified the red box to match IQCH c.387+1_387+10del and have added the exon-intron boundary to Figure 1A accordingly.

      Author response image 2.

      HGVS nomenclature description of the IQCH variant. The picture shows a detailed HGVS nomenclature description of IQCH c.387+1_387+10del.

      Minor comments:

      a) Manuscript title: It is suggested to change the title to "IQCH regulates spermatogenesis by interacting with CaM to promote the expression of RNA-binding proteins".

      Response: Following the reviewer’s suggestion, we have changed the title to “IQCH regulates spermatogenesis by interacting with CaM to promote the expression of RNA-binding proteins”.

      b) Line 116: Please introduce the abbreviation WES. Also, please introduce the other abbreviations (such as WT, SEM, TEM, etc.) the first time they appear.

      Response: We thank the reviewer for the suggestions. We have provided the full explanations for all abbreviations upon their initial appearance.

      c) Line 140, "Nonfunctional IQCH": Due to "the lack of IQCH expression" in Line 137, should "Nonfunctional IQCH" be changed into "IQCH deficiency"?

      Response: We thank the reviewer for the detailed review. We have modified this title in the Results section of the revised manuscript as follows: “IQCH deficiency leads to sperm with cracked axoneme structures accompanied by defects in the acrosome and mitochondria”

      d) The information on the following references is incomplete: Sechi et al., Tian et al., Wang et al., and Xu et al. Please provide issue/page/article numbers.

      Response: We apologize for the oversight. We have provided the missing issue/page/article numbers for the references.

      e) The title of Figure 1: Please emphasize that the male infertile-associated variant is "homozygous".

      Response: We thank the reviewer for the suggestion. We have revised the title of Figure 1 to emphasize the homozygous variant as follows: “Identification of a homozygous splicing mutation in IQCH in a consanguineous family with male infertility”.

      f) Table 1: Please provide the reference paper for the normal values.

      Response: We appreciate the reviewer's detailed checks. We have provided the reference paper for the normal values in Table 1.

      g) Figure 5F is distorted. Please make sure that it is a perfect circle.

      Response: We thank the reviewer for the suggestion. We have revised both the graphical representation and the layout of Figure 5 in the revised manuscript to ensure readability.

      Reviewer #3 (Recommendations For The Authors):

      While the writing is generally clear, there are multiple examples of where the writing could be improved for clarity.

      1) While some terms are defined throughout the manuscript, many abbreviations are not defined upon their first mention, such as WES, RT-PCR, TYH, HTF, KSOM, KEGG, RIPA, PMSE, SDS-PAGE, H&L, and HRP.

      Response: We thank the reviewer for the suggestions. We have provided the full explanations for all abbreviations upon their initial appearance.

      2) On line 44, the claim that spermatogenesis is the "most complex biological process" is rather subjective and hard to support with concrete data.

      Response: We thank the reviewer for the suggestion. We have modified this description in the Introduction section as follows: “Spermatogenesis is one of the most complex biological process in male organisms and functions to produce mature spermatozoa from spermatogonia in three phases: (i) spermatocytogenesis (mitosis), (ii) meiosis, and (iii) spermiogenesis.”

      3) On line 54, I think the authors meant "heterogeneous," not "heterologous."

      Response: We thank the reviewer for the comment. We have changed “heterologous” to “heterogeneous”.

      4) On line 156, I think the authors meant "deficiency," not "deficient."

      Response: We thank the reviewer for the comment and apologize for this mistake. We have made the correction in the revised version of the manuscript.

      5) On line 300, K562 cells are mentioned, but neither in the Methods nor the Results are any details about the biological origin of these cells (or rationale for their use other than co-expression of IQCH and CaM) provided.

      Response: We thank the reviewer for the suggestion. K562 is a human leukemia cell line that is enriched in the expression of IQCH and CaM; we therefore chose this cell line to make knockdown of IQCH and CaM easier. We have added details about the biological origin of these cells to the Methods section of the revised manuscript.

      6) For the Results section describing Figure 6H, it would be nice to provide some explanation of the results of IQCH overexpression alone relative to control situations and not just relative to the delta-IQ version or relative to simultaneous CaM manipulation.

      Response: Following the reviewer’s suggestion, we have added the co-transfection of control and CaM plasmids in HEK293T cells. The results showed that the expression of HNRPAB in cells co-transfected with control and CaM plasmids was similar to that in cells co-transfected with IQCH (△IQ)/CaM plasmids, but lower than that in cells overexpressing the WT-IQCH and CaM plasmids, confirming that the IQCH (△IQ) plasmid is nonfunctional. We have shown the results in Figure 6H of the revised manuscript.

      7) The sentence on lines 352-354 is confusing.

      Response: We apologize for any confusion caused by the sentence in question. We have revisited the sentence and made appropriate revisions to enhance its clarity as follows: “Our findings suggest that the fertilization function is the main action of IQ motif-containing proteins, while each specific IQ motif-containing protein also has its own distinct role in spermatogenesis.”

      8) The use of "employee" on line 371 is awkward and not very scientific.

      Response: We thank the reviewer for the comment. We have changed “employee” to “downstream effector protein” on line 376.

    1. Author Response

      We thank all the reviewers for their insightful and constructive comments, which were very helpful in improving the manuscript. We are encouraged by the many positive comments regarding the significance of our findings and the value of our data. Regarding the reviewers’ concern about cell classification, we used several additional marker genes to explain the identification of cell clusters and subclusters. We have performed further analysis and rewritten part of the text to address the concerns raised. Here is a point-by-point response to the reviewers’ comments and concerns. Figures R1-R9 were provided only as additional information for the reviewers and were not included in the revised manuscript.

      Reviewer #1 (Public Review):

      In the article "Temporal transcriptomic dynamics in developing macaque neocortex", Xu et al. analyze the cellular composition and transcriptomic profiles of the developing macaque parietal cortex using single-cell RNA sequencing. The authors profiled eight prenatal rhesus macaque brains at five timepoints (E40, E50, E70, E80, and E90) and obtained a total of around 53,000 high-quality cells for downstream analysis. The dataset provides a high-resolution view into the developmental processes of early and mid-fetal macaque cortical development and will potentially be a valuable resource for future comparative studies of primate neurogenesis and neural stem cell fate specification. Their analysis of this dataset focused on the temporal gene expression profiles of outer and ventricular radial glia and utilized pseudotime trajectory analysis to characterize the genes associated with radial glial and neuronal differentiation. The rhesus macaque dataset presented in this study was then integrated with prenatal mouse and human scRNA-seq datasets to probe species differences in ventricular radial glia to intermediate progenitor cell trajectories. Additionally, the expression profile of macaque radial glia across time was compared to those of mouse apical progenitors to identify conserved and divergent expression patterns of transcription factors.

      The main findings of this paper corroborate many previously reported and fundamental features of primate neurogenesis: deep layer neurons are generated before upper layer excitatory neurons, the expansion of outer radial glia in the primate lineage, conserved molecular markers of outer radial glia, and the early specification of progenitors. Furthermore, the authors show some interesting divergent features of macaque radial glial gene regulatory networks as compared to mouse. Overall, despite some uncertainties surrounding the clustering and annotations of certain cell types, the manuscript provides a valuable scRNA-seq dataset of early prenatal rhesus macaque brain development. The dynamic expression patterns and trajectory analysis of ventricular and outer radial glia provide valuable data and lists of differentially expressed genes (some consistent with previous studies, others reported for the first time here) for future studies.

      The major weaknesses of this study are the inconsistent dissection of the targeted brain region and the loss of more mature excitatory neurons in samples from later developmental timepoints due to the use of single-cell RNA-seq. The authors mention that they could observe ventral progenitors and even midbrain neurons in their analyses. Ventral progenitors should not be present if the authors had properly dissected the parietal cortex. The fact that they obtained even midbrain cells points to an inadequate dissection or poor cell classification. If this is the result of poor classification, it could be easily fixed by using more markers with higher specificity. However, if it is the result of a poor dissection, some of the cells in other clusters could potentially be from midbrain as well. The loss of more mature excitatory neurons is also problematic because on top of hindering the analysis of these neurons in later developmental periods, it also affects the cell proportions the authors use to support some of their claims. The study could also benefit from the validation of some of the genes the authors uncovered to be specifically expressed in different populations of radial glia.

      We thank the Reviewer for the comments and apologize for the shortcomings in tissue dissection and cell capture.

      We used more marker genes for major cell classification, such as SHOX2, IGFBP5, TAC1, PDYN, FLT1, and CYP1B, in the new Figure 1D, to improve the cell type annotation results. Specifically, we reassigned cluster 20 as ventral LGE-derived interneuron precursors based on the expression of IGFBP5, TAC1, and PDYN, and reassigned cluster 23 from meningeal cells to thalamus cells based on the expression of ZIC2, ZIC4, and SHOX2. These cell types were excluded from the follow-up analysis. Because EN8 was previously incorrectly defined as midbrain neurons, the dissection was misinterpreted as poor. After carefully reviewing the data analysis process, we determined that EN8 was a small group of cells in cluster 23 that had been mistakenly selected during the excitatory neuron analysis, as shown in Figure R5(A); this has been corrected in the revision. In the revised manuscript, we deleted the previous EN8 subcluster and renumbered the remaining excitatory neuron subclusters in the new Figure 2.

      In addition, we also improved the description of sample collection as follows: “We collected eight pregnancy-derived fetal brains of rhesus macaque (Macaca mulatta) at five prenatal developmental stages (E40, E50, E70, E80, E90) and dissected the parietal lobe cortex. Because of the different development times of rhesus monkeys, prenatal cortex size and morphology are different. To ensure that the anatomical sites of each sample are roughly the same, we use the lateral groove as a reference to collect the parietal lobe for single-cell sequencing (as indicated by bright yellow in Figure S1A) and do not make a clear distinction between the different regional parts including primary somatosensory cortex and association cortices in the process of sampling”. As shown in Figure S1A, due to the small volume of the cerebral cortex at early time points, especially in E40, a small number of cells beyond the dorsal parietal lobe, including the ventral cortex cells and thalamus cells, were collected during the sampling process with the brain stereotaxic instrument.

      In this study, the BD method was used to capture single cells. Due to the fixed size of the micropores, this method might be less efficient in capturing mature excitatory neurons. However, it has a good capture effect on newborn neurons at each sampling time point so that the generation of excitatory neurons at different developmental time points can be well observed, as shown in Figure 2, which aligns with our research purpose.

      To verify the reliability of our cell annotation results, we compared the cell type assignments in our study with those of recently published research (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652), using the scmap package to project the major cell types in our macaque development scRNA-seq dataset onto GSE226451. The river plot in Author response image 1 illustrates the broadly similar cell type classifications in the two datasets.

      Author response image 1.

      River plot illustrating the relationships between the major cell type annotations in this study and those in the recently published developing macaque telencephalon dataset.
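The comparison behind such a river plot can be summarized as a table of flows between the two sets of labels. The following is a minimal sketch in Python, not the scmap (R) workflow used in the response; the function name `label_flows`, the label values, and the "unassigned" category are illustrative assumptions, and the toy labels are not data from either study.

```python
import pandas as pd


def label_flows(source_labels, projected_labels):
    """Count cells for each (label in this study, label in reference) pair.

    Each non-zero cell of the returned table corresponds to one ribbon
    in a river plot; its value is the ribbon width.
    """
    return pd.crosstab(
        pd.Series(source_labels, name="this_study"),
        pd.Series(projected_labels, name="reference"),
    )


# Toy labels (hypothetical, for illustration only).
own = ["RG", "RG", "EN", "EN", "EN"]
projected = ["RG", "oRG", "EN", "EN", "unassigned"]

flows = label_flows(own, projected)
print(flows)
```

A table dominated by its matching-label entries would indicate broadly consistent annotations between the two datasets.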

      Furthermore, we used bioinformatic analysis to validate the genes specifically expressed in outer radial glia. We verified the terminal oRG differentiation genes in the recently published macaque telencephalic development dataset (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652) (GEO accession: GSE226451). Author response image 2 shows that gene expression proceeds through distinct states/stages. Most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of the ion channel genes ATP1A2 and SCN4B.

      Author response image 2.

      Heatmap showing the relative expression of genes displaying significant changes along the pseudotime axis from vRG to oRG in the dataset of Micali et al. 2023 (GEO: GSE226451). The columns represent cells ordered along the pseudotime axis.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Xu et al., is an interesting study aiming to identify novel features of macaque cortical development. This study serves as a valuable atlas of single cell data during macaque neurogenesis, which extends the developmental stages previously explored. Overall, the authors have achieved their aim of collecting a comprehensive dataset of macaque cortical neurogenesis and have identified a few unknown features of macaque development.

      Strengths:

      The authors have accumulated a robust dataset of developmental time points and have applied a variety of informatic approaches to interrogate this dataset. One interesting finding in this study is the expression of previously unknown receptors on macaque oRG cells. Another novel aspect of this paper is the temporal dissection of neocortical development across species. The identification that the regulome looks quite different, despite similar expression of transcription factors in discrete cell types, is intriguing.

      Weaknesses:

      Due to the focus on demonstrating the robustness of the dataset, the novel findings in this manuscript are underdeveloped. There is also a lack of experimental validation. This is a particular weakness for newly identified features (like receptors in oRG cells). It's important to show expression in relevant cell types and, if possible, perform functional perturbations on these cell types. The presentation of the data highlighting novel findings could also be clarified at higher resolution, and dissected through additional informatic analyses. Additionally, the presentation of ideas and goals of this manuscript should be further clarified. A major gap in the study rationale and results is that the data was collected exclusively in the parietal lobe, yet the rationale and interpretation of what this data indicates about this specific cortical area was not discussed. Last, a few textual errors about neural development are also present and need to be corrected.

      We thank you for your comments and suggestions concerning our manuscript. They are all valuable and helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and made corrections that we hope will meet with approval. We have endeavored to address each of the points raised by the referee.

      To support the reliability of our data and newly identified features, we verified the terminal oRG differentiation genes in the recently published macaque telencephalic development dataset (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652) (GEO accession: GSE226451). Figure R2 shows that gene expression proceeds through distinct states/stages. Most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of the ion channel genes ATP1A2 and SCN4B.

      Our results mainly explore the conserved features of neocortex development across species. Through cross-species comparison, we identified the types of neural stem cells in intermediate states, their generative trajectories, and the gene expression dynamics accompanying those trajectories. We further explored the stages of transcriptional dynamics as vRG generate oRG. Additional analysis was performed through transcription factor (TF) regulatory network analysis: we analyzed the TF regulatory network of human vRG with the pyscenic workflow. The top transcription factors at every time point in human vRG were calculated, and we used the top 10 TFs and their top 5 target genes to perform interaction analysis and generate the regulatory network of human vRG in the revised Figure 6. Comparing the pyscenic results for mouse, macaque, and human vRG, it was clear that the regulatory networks are not evolutionarily conserved. Compared with macaque, the regulatory network of transcription factors and target genes in humans is more complex. Some conserved regulatory relationships present in more than one species were identified, such as the HMGN3, EMX2, SOX2, and HMGA2 network at an early stage during deep-layer generation and the SOX10 and ZNF672 network at a late stage during upper-layer generation.
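The selection step described above (top 10 TFs, then the top 5 target genes of each) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a per-TF ranking score is already available (the response does not say which score was used) and an adjacency table with `TF`, `target`, and `importance` columns, as produced by the GRN inference step of a pySCENIC-style workflow. The function name and all toy values are hypothetical.

```python
import pandas as pd


def top_tf_target_edges(tf_scores, adjacencies, n_tfs=10, n_targets=5):
    """Keep the n_tfs best-scoring TFs and, for each, its n_targets
    highest-importance target genes. Returns an edge list for plotting."""
    top_tfs = tf_scores.nlargest(n_tfs).index
    edges = adjacencies[adjacencies["TF"].isin(top_tfs)]
    edges = (
        edges.sort_values("importance", ascending=False)
        .groupby("TF", sort=False)
        .head(n_targets)
    )
    return edges.reset_index(drop=True)


# Toy inputs with placeholder names (hypothetical, not study results).
tf_scores = pd.Series({"TF_A": 0.9, "TF_B": 0.8, "TF_C": 0.1})
adjacencies = pd.DataFrame({
    "TF": ["TF_A", "TF_A", "TF_B", "TF_B", "TF_C"],
    "target": ["G1", "G2", "G3", "G4", "G5"],
    "importance": [5.0, 1.0, 2.0, 3.0, 9.0],
})

edges = top_tf_target_edges(tf_scores, adjacencies, n_tfs=2, n_targets=1)
print(edges)
```

The resulting edge list can be passed to any network-drawing tool; comparing edge lists across species is one way to look for shared TF-target relationships.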

      The parietal lobe is the center of the somatic senses and is significant for interpreting words as well as for language understanding and processing. However, in this study, the parietal lobe area was selected mainly because of the convenience of sampling the dorsal neocortex, as we described in the Materials and Methods section: “Because of the different development times of rhesus monkeys, prenatal cortex size and morphology are different. To ensure that the anatomical sites of each sample are roughly the same, we use the lateral groove as a reference to collect the parietal lobe for single-cell sequencing (as indicated by bright yellow in Figure S1A) and do not make a clear distinction between the different regional parts including primary somatosensory cortex and association cortices in the process of sampling”.

      Thank you for carefully pointing out our manuscript's textual errors about neural development. We have corrected them, as described in the responses below.

      Reviewer #3 (Public Review):

      Summary: The study adds to the existing data that have established that cortical development in rhesus macaque is known to recapitulate multiple facets cortical development in humans. The authors generate and analyze single cell transcriptomic data from the timecourse of embryonic neurogenesis.

      Strengths:

      Studies of primate developmental biology are hindered by the limited availability and limit replication. In this regard, a new dataset is useful.

      The study analyzes parietal cortex, while previous studies focused on frontal and motor cortex. This may be the first analysis of macaque parietal cortex and, as such, may provide important insights into arealization, which the authors have not addressed.

      Weaknesses:

      The number of cells in the analysis is lower than recent published studies which may limit cell representation and potentially the discovery of subtle changes.

      The macaque parietal cortex data is compared to human and mouse pre-frontal cortex. See data from PMCID: PMC8494648 that provides a better comparison.

      A deeper assessment of these data in the context of existing studies would help others appreciate the significance of the work.

      We thank the reviewer for these suggestions and constructive comments. We agree with the reviewer that the cell number in our study is lower than in recently published studies. The scRNA sequencing in this study was completed between 2018 and 2019, the early stages of the single-cell sequencing technology application. Besides, we have been unable to get extra macaque embryos to enlarge the sample numbers recently since rhesus monkey samples are scarce. Therefore, the number of cells in our study is relatively small compared to recently published single-cell studies.

      The dataset suggested by the reviewer is extremely valuable, and we tried to perform the suggested analysis to explore temporal expression patterns in the parietal cortex of different species. The dataset from PMCID: PMC8494648 covers the developing human brain across regions from gestational week (GW) 14 to GW25. Since this dataset only covers the middle and late stages of prenatal neurogenesis, it did not fully match the developmental time points of our study for integration analysis. However, we cited the results of this study in the discussion section.

      The human regulatory analysis with the pyscenic workflow was added to the new Figure 6 to compare the vRG regulatory networks of different species. Compared with macaque, the regulatory network of transcription factors and target genes in humans is more complex. Some conserved regulatory relationships present in more than one species were identified, such as the HMGN3, EMX2, SOX2, and HMGA2 network at an early stage during deep-layer generation and the SOX10 and ZNF672 network at a late stage during upper-layer generation.

      In addition, we performed an integration analysis of our dataset with the recently published macaque neocortex development dataset (GEO accession: GSE226451) to verify the reliability of our cell annotation results and terminal oRG differentiation genes. The river plot in Figure R1 illustrates the broadly similar cell type classifications in the two datasets. Figure R2 shows that most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of the ion channel genes ATP1A2 and SCN4B.

      Reviewer #1 (Recommendations For The Authors):

      1) Throughout the manuscript, the term "embryonic" or "embryogenesis" is used in reference to all timepoints (E40-E90) in this study. The embryonic period is a morphologically and anatomically defined developmental period that ends ~E48-E50 in rhesus macaque. Prenatal or developing is a more accurate term when discussing all timepoints of this study.

      We thank the reviewer for pointing out this terminology that needs to be clarified. We have now replaced “embryonic” with “prenatal” as a more appropriate description for the sampling time points in the manuscript.

      2) Drosophila should be italicized in the introduction.

      Thank you for the suggestion. We have italicized “Drosophila” throughout the manuscript.

      3) Introduction - "In rodents, radial glia are found in the ventricular zone (VZ), where they undergo proliferation and differentiation." This sentence implies that only within rodents are radial glia found within the ventricular zone. Radial glia are present within the ventricular zone of all mammals.

      Thank you for the careful reading. This sentence has been corrected to: “In mammals, radial glial cells are found in the ventricular zone (VZ), where they undergo proliferation and differentiation.”

      4) Figure 1A - an image of the E40 brain is missing.

      We first sampled the prenatal developmental cortex of rhesus monkeys at the E40 timepoint. Unfortunately, we forgot to save the photo of the sampling at the E40 time point.

      5) Figure 1B and 1C - it is unclear why cluster 20 is not annotated in Figure 1 as in the text it is stated "Each of the 28 identified clusters could be assigned to a cell type identity..." This cluster expresses VIM and PAX6 suggestive of ventricular radial glia and is located topographically approximate to IPC cluster 8 and seems to bridge the gap between neural stem cells and the interneuron clusters. Additionally, cluster 20 appears to be subclustered by itself in the progenitor subcluster UMAP (Figure 3A) suggestive of a batch effect or cells with low quality. The investigation, quality control, and proper annotation of this cluster 20 is necessary.

      We appreciate the reviewer’s suggestion. We examined the marker genes of cluster 20 and found that cells in this cluster specifically expressed VIM, IGFBP5, and TAC1. According to the cell annotation results from a published study, we relabeled cluster 20 as ventral LGE-derived interneuron precursors (Yu, Yuan et al. Nat Neurosci. 2021. doi:10.1038/s41593-021-00940-3. PMID: 34737447). Cluster 20 cells have been removed from the new Figure 3A.

      6) Figure 1B UMAP - it is unexpected that meningeal cells would cluster topographically closer to the excitatory neuron cluster (one could even argue that the meningeal cell cluster is located within the excitatory neuron clusters) instead of next to or with the endothelial cell clusters. This is suspicious for a mis-annotated cell cluster. ZIC2 and ZIC3 were used as the principal marker genes for meningeal cells. However, these genes are not specific for meninges (PanglaoDB) and had not been identified as marker genes in a developmental sc-RNAseq dataset of the developing mouse meninges (DeSisto et al. 2020). Additional marker genes (COL1A1, COL1A2, CEMIP, CYP1B1, SLC13A3) may be helpful to delineate the identity of this cluster and provide more evidence for a meningeal origin.

      We thank the reviewer for the constructive advice. The violin plot in Author response image 3 shows the expression of additional marker genes, including COL1A1, COL1A2, CEMIP, and CYP1B1. Cluster 23 does not express these marker genes but specifically expresses the thalamus marker genes SHOX2 (Rosin, Jessica M et al. Dev Biol. 2015. doi:10.1016/j.ydbio.2014.12.013. PMID: 25528224) and TCF7L2 (Lipiec, Marcin Andrzej et al. Development. 2020. doi:10.1242/dev.190181. PMID: 32675279). Based on these results, we added the marker genes SHOX2 and CYP1B1 to the violin plot in the new Figure 1D and corrected the definition of cluster 23 from meningeal cells to thalamic cells in the revised manuscript and figures.

      Author response image 3.

      Violin plot of additional markers in cluster 23.

      7) From Figure 1A, it appears that astrocytes (cluster 13) are present at E40 and E50 timepoints. This is inconsistent with literature and experimental data of the timing of the neuron-glia switch in primates and inconsistent with the claim within the text that, "Collectively, these results suggested that cortical neural progenitors undergo neurogenesis processes during the early stages of macaque embryonic cortical development, while gliogenic differentiation... occurs in later stages." The clarification of the percentage of astrocytes at each timepoint would clarify this point.

      Following this suggestion, we calculated the percentage of astrocytes (cluster 13) at each time point. The proportion of astrocytes was as low as 0.1783% and 0.1046% at the E40 and E50 time points, respectively, and increased significantly at E80 and E90, suggesting that the onset of macaque gliogenesis is around E80 to E90. This result is consistent with published research on the timing of the neuron-glia transition in primates (Rash, Brian G et al. Proc Natl Acad Sci U S A. 2019. doi:10.1073/pnas.1822169116. PMID: 30894491). In addition, we think that the cells in cluster 13 captured at the E40 to E50 time points, fewer than 200 in total, may be astrocyte precursor cells expressing the AQP4 gene (Yang, Lin, et al. Neuroscience bulletin. 2022. doi:10.1007/s12264-021-00759-9. PMID: 34374948).
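The per-time-point percentage reported above is a simple proportion: cells in cluster 13 divided by all cells captured at that stage. A minimal sketch in Python, assuming per-cell metadata with hypothetical `stage` and `cluster` columns (the column names and toy values are illustrative, not the study's data):

```python
import pandas as pd


def cluster_percentage_by_stage(cells: pd.DataFrame, cluster: int) -> pd.Series:
    """Percentage of the cells at each stage that belong to `cluster`."""
    is_cluster = cells["cluster"].eq(cluster)
    # The mean of a boolean column is the fraction of True values.
    return is_cluster.groupby(cells["stage"]).mean().mul(100)


# Toy metadata (hypothetical values for illustration only).
cells = pd.DataFrame({
    "stage": ["E40", "E40", "E40", "E40", "E90", "E90", "E90", "E90"],
    "cluster": [1, 2, 3, 13, 13, 13, 2, 13],
})

pct = cluster_percentage_by_stage(cells, cluster=13)
print(pct)
```

Normalizing within each stage, rather than over the whole dataset, keeps the comparison fair when stages contribute different numbers of cells.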

      8) A subcluster of ExN neurons was identified and determined to be of midbrain origin based on expression of TCF7L2. Did this subcluster express other known markers of the developing midbrain (OTX2, LMX1A, NR4A2, etc...)? Additionally, does this subcluster suggest that the limits of the dissection extended to the midbrain in samples E40 and E50?

      We apologize for the inadequate excitatory neuron annotation in the previous version of the manuscript, in which we misidentified the EN8 cells as midbrain cells. Following the reviewer’s suggestion, we examined the expression of more tissue-specific marker genes in EN8. As the violin plot in Author response image 4 shows, the developing midbrain markers OTX2, NR4A2, and PAX7 were not expressed in EN8, whereas the thalamus marker genes SHOX2, TCF7L2, and NTNG1 were highly expressed. In addition, the dorsal cortex excitatory neuron markers NEUROD2, NEUROD6, and EMX1 were not expressed in EN8, which suggests that EN8 might not consist of cortical cells. After carefully reviewing the data analysis process, we determined that EN8 was a small group of cells from cluster 23 mistakenly selected during excitatory neuron analysis, as shown in Figure R5(A); this has been corrected in the revision. In the revised manuscript, we have removed the previous EN8 subcluster from the analysis of excitatory neurons and renumbered the remaining excitatory neuron subclusters in the new Figure 2 and Figure S3.

      Author response image 4.

      (A) Modified diagram of the clustering of excitatory neuron subclusters collected at all time points, visualized via UMAP, related to Figure 2A. (B) Violin plot of different marker genes in EN8.

      9) "These data suggested that the cell fate determination by diverse neural progenitors occurs in the embryonic stages of macaque cortical development and is controlled by several key transcriptional regulators" The authors present a list of differentially expressed genes specific to the various radial glia clusters along pseudotime. Some of these radial glia DEGs are known and have been characterized by previous literature while other DEGs they have identified had not been previously shown to be associated with radial glia specification/maturation. However, this list of DEGs does not support the claim that cell fate determination is controlled by several key transcriptional regulators. What were the transcriptional regulators of radial glia specification identified in this study and how were they validated?

      We agree with the reviewer and acknowledge that the description of this part in the previous manuscript was inaccurate. The description has been deleted in the revised manuscript.

      10) "Comparing vRG to IPC trajectory between human, macaque, and mouse, we found this biological process of vRG-to-IPC is very conserved across species, but the vRG to oRG trajectory is divergent between species. The latter process is almost invisible in mice, but it is very similar in primates and macaque." Firstly, macaques are primates, and the text should be updated to reflect this. Secondly, from Figure 5C., it seems there were no outer radial glia detected at all within the vRG-oRG and vRG-IPC developmental trajectories. This would imply that oRGs are not "almost invisible" in mice, but rather do not exist. The authors need to clarify the presence or absence of identifiable outer radial glia in the integrated dataset and relate the relative abundance of these cells to their interpretation of the developmental trajectories for each species.

      We apologize for the inaccurate description in the manuscript and thank the reviewer for pointing out these errors. Following both suggestions, the description has been corrected in the revised manuscript to: "Comparing vRG to IPC trajectory between human, macaque, and mouse, we found this biological process of vRG-to-IPC is very conserved across species. However, the vRG to oRG trajectory is divergent between species because the oRG population was not identified in the mouse dataset. The latter process is almost invisible in mice but similar in humans and macaques".

      Although several published studies have shown that oRG-like progenitor cells are present in the mouse embryonic neocortex (Wang, Xiaoqun et al. Nature neuroscience. 2011. doi:10.1038/nn.2807; Vaid, Samir et al. Development. 2018. doi:10.1242/dev.169276. PMID: 30266827), oRG cells were barely detected in the scRNA-seq datasets of mouse cortical development studies (Ruan, Xiangbin et al. Proc Natl Acad Sci U S A. 2021. doi:10.1073/pnas.2018866118. PMID: 33649223; Di Bella, Daniela J et al. Nature. 2021. doi:10.1038/s41586-021-03670-5. PMID: 34163074; Chen, Ao et al. Cell. 2022. doi:10.1016/j.cell.2022.04.003. PMID: 35512705). No oRG populations were detected in the mouse embryonic cortical development dataset (GEO: GSE153164) used for integration analysis in our study.

      11) "Ventral radial glia cells generate excitatory neurons by direct and indirect neurogenesis" This should be corrected to dorsal radial glia cells as this paper is discussing radial glia of the dorsal pallium.

      13) Editorially, gene names need to be italicized in the text, figures, and figure legends.

      14) Figure 5B - a scale bar showing the scale of the relative expression denoted by the dark blue color would be beneficial.

      15) Figure S7D is mislabeled in the figure legend.

      Merged response to points 11 to 15: Thank you for kindly pointing out the errors in our manuscript. We have corrected the above four points in the revised version.

      Reviewer #2 (Recommendations For The Authors):

      Specific suggestions for authors:

      In the abstract the authors state: "thicker upper-layer neurons". I think it's important to be clear in the language by stating either that the layers are thicker or the neurons are most dense.

      Thank you for this helpful comment. The description “thicker upper-layer neurons” was corrected to “the thicker supragranular layer” in the revised manuscript. The supragranular layer is much thicker in primates than in rodents, both in absolute thickness and in proportion to the thickness of the whole neocortex (Hutsler, Jeffrey J et al. Brain research. 2005. doi:10.1016/j.brainres.2005.06.015. PMID: 16018988).

      The introduction needs additional clarification regarding the vRG vs oRG discussion. I was unclear what the main takeaway for readers should be. Similarly, the discussion of previous studies and the importance for comparing human and macaque could be clarified.

      We appreciate the suggestion and apologize for the shortcomings of the introduction part. We have rewritten the section and added additional clarification in the revised introduction. In the revised manuscript, the contents of the introduction are as follows:

      “The neocortex is the center for higher brain functions, such as perception and decision-making. Therefore, the dissection of its developmental processes can be informative of the mechanisms responsible for these functions. Several studies have advanced our understanding of the neocortical development principles in different species, especially in mice. Generally, the dorsal neocortex can be anatomically divided into six layers of cells occupied by distinct neuronal cell types. The deep-layer neurons project to the thalamus (layer VI neurons) and subcortical areas (layer V neurons), while neurons occupying more superficial layers (upper-layer neurons) preferentially form intracortical projections1. The generation of distinct excitatory neuron cell types follows a temporal pattern in which early-born neurons migrate to deep layers (i.e., layers V and VI), while the later-born neurons migrate and surpass early-born neurons to occupy the upper layers (layers II-IV) 2. In Drosophila, several transcription factors are sequentially explicitly expressed in neural stem cells to control the specification of daughter neuron fates, while very few such transcription factors have been identified in mammals thus far. Using single-cell RNA sequencing (scRNA-seq), Telley and colleagues found that daughter neurons exhibit the same transcriptional profiles of their respective progenitor radial glia, although these apparently heritable expression patterns fade as neurons mature3. However, the temporal expression profiles of neural stem cells and the contribution of these specific temporal expression patterns in determining neuronal fate have yet to be wholly clarified in humans and non-human primates. Over the years, non-human primates (NHP) have been widely used in neuroscience research as mesoscale models of the human brain. Therefore, exploring the similarities and differences between NHP and human cortical neurogenesis could provide valuable insight into unique features during human neocortex development.

      In mammals, radial glial cells are found in the ventricular zone (VZ), where they undergo proliferation and differentiation. The neocortex of primates exhibits an extra neurogenesis zone known as the outer subventricular zone (OSVZ), which is not present in rodents. As a result of evolution, the diversity of higher mammal cortical radial glia populations increases. Although ventricular radial glia (vRG) is also found in humans and non-human primates, the vast majority of radial glia in these higher species occupy the outer subventricular zone (OSVZ) and are therefore termed outer radial glia (oRG). Outer radial glial (oRG) cells retain basal processes but lack apical junctions 4 and divide in a process known as mitotic somal translocation, which differs from vRG 5. VRG and oRG are both accompanied by the expression of stem cell markers such as PAX6 and exhibit extensive self-renewal and proliferative capacities 6. However, despite functional similarities, they have distinct molecular phenotypes. Previous scRNA-seq analyses have identified several molecular markers, including HOPX for oRGs, CRYAB, and FBXO32 for vRGs7. Furthermore, oRGs are derived from vRGs, and vRGs exhibit obvious differences in numerous cell-extrinsic mechanisms, including activation of the FGF-MAPK cascade, SHH, PTEN/AKT, and PDGF pathways, and oxygen (O2) levels. These pathways and factors involve three broad cellular processes: vRG maintenance, spindle orientation, and cell adhesion/extracellular matrix production8.

      Some transcription factors have been shown to participate in vRG generation, such as INSM and TRNP1. Moreover, the cell-intrinsic patterns of transcriptional regulation responsible for generating oRGs have not been characterized.

      ScRNA-seq is a powerful tool for investigating developmental trajectories, defining cellular heterogeneity, and identifying novel cell subgroups9. Several groups have sampled prenatal mouse neocortex tissue for scRNA-seq 10,11, as well as discrete, discontinuous prenatal developmental stages in human and non-human primates 7,12 13,14. The diversity and features of primate cortical progenitors have been explored 4,6,7,15. The temporally divergent regulatory mechanisms that govern cortical neuronal diversification at the early postmitotic stage have also been focused on 16. Studies spanning the full embryonic neurogenic stage in the neocortex of humans and other primates are still lacking. Rhesus macaque and humans share multiple aspects of neurogenesis, and more importantly, the rhesus monkey and human brains share more similar gene expression patterns than the brains of mice and humans17-19. To establish a comprehensive, global picture of the neurogenic processes in the rhesus macaque neocortex, which can be informative of neocortex evolution in humans, we sampled neocortical tissue at five developmental stages (E40, E50, E70, E80, and E90) in rhesus macaque embryos, spanning the full neurogenesis period. Through strict quality control, cell type annotation, and lineage trajectory inference, we identified two broad transcriptomic programs responsible for the differentiation of deep-layer and upper-layer neurons. We also defined the temporal expression patterns of neural stem cells, including oRGs, vRGs, and IPs, and identified novel transcription factors involved in oRG generation. These findings can substantially enhance our understanding of neocortical development and evolution in primates.”

      Why is this study focused on the parietal lobe? This should be discussed in the introduction and interpretation of the data should be contextualized in the context of this cortical area.

      In this study, samples were collected from the parietal lobe area mainly for the following reasons:

      (1) To ensure that the cortical anatomical parts collected at each time point are consistent, we used the lateral cerebral sulcus as a marker to collect the parietal lobe tissue above the lateral sulcus for single-cell sequencing sample collection. Besides, the parietal region is also convenient for sampling the dorsal cortex.

      (2) Previous studies have clearly established the timeline of macaque parietal lobe formation during prenatal development (Finlay, B L, and R B Darlington. Science. 1995. doi:10.1126/science.7777856. PMID: 7777856), which is another essential reason for using the parietal lobe as the research object.

      Figure 1:

      Difficult to appreciate how single cell expression reflects the characterization of layers described in Figure 1A. A schematic for temporal development would be helpful. Also, how clusters correspond to discrete populations of excitatory neurons and progenitors would improve figure clarity. Perhaps enlarge and annotate the UMAPS on the bottom of Figure 1A.

      We thank the reviewer for the suggestion and apologize that Figure 1A did not convey the relationship between single-cell expression and neocortex layer formation. In the revised manuscript, the time point information associated with each layer is labeled on the diagram in Figure S1A. The UMAPs at the bottom of Figure 1A were enlarged in the revised manuscript as the new Figure 1C.

      Labels on top of clusters for 1A/1B would be helpful as it's difficult to see which colors the numbers correspond to on the actual UMAP.

      Many thanks to the reviewer for the careful reading and helpful suggestions. We have adjusted the visualization of the UMAP in the revised version. The numbers in the label bar of Figure 1B have been moved to the side of the dots so that the dots can be seen more clearly.

      Microglia and meninges are also non-neural cells. This needs to be changed in the discussion of the results.

      Thanks for the suggestion. We have revised the manuscript as the reviewer suggested. The description in the revised manuscript now reads as follows: “According to the expression of the marker genes, we assigned clusters to cell type identities of neurocytes (including radial glia (RG), outer radial glia (oRG), intermediate progenitor cells (IPCs), ventral precursor cells (VP), excitatory neurons (EN), inhibitory neurons (IN), oligodendrocyte progenitor cells (OPC), oligodendrocytes, astrocytes, ventral LGE-derived interneuron precursors and Cajal-Retzius cells) or non-neuronal cell types (including microglia, endothelial, meninge/VALC(vascular cell)/pericyte, and blood cells). Based on the expression of the marker gene, cluster 23 was identified as thalamic cells, which are small numbers of non-cortical cells captured in the sample collection at earlier time points. Each cell cluster was composed of multiple embryo samples, and the samples from similar stages generally harbored similar distributions of cell types.”

      It's important to define the onset of gliogenesis in the text and figure. What panels/ages show this?

      We identified the onset of gliogenesis by statistically analyzing the percentage of astrocytes (cluster 13) at each time point and added the result to Figure S1. The results showed that the proportion of astrocytes was very low at E40 and E50 and increased significantly at E80 and E90, suggesting that the onset of macaque gliogenesis might be around embryonic days 80 to 90. This result is consistent with published research on the timing of the neuron-glial transition in primates (Rash, Brian G et al. Proceedings of the National Academy of Sciences of the United States of America 201. doi:10.1073/pnas.1822169116. PMID: 30894491).
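
      The per-time-point proportion described above can be sketched as follows. This is a minimal illustration only, not the code used in the study: the function name and the toy labels are hypothetical, and the toy values are invented solely to show the calculation.

```python
from collections import Counter


def cluster_fraction_by_timepoint(timepoints, clusters, target_cluster):
    """Return {timepoint: fraction of cells at that timepoint in target_cluster}."""
    # Total number of cells captured at each time point.
    totals = Counter(timepoints)
    # Number of cells at each time point assigned to the cluster of interest.
    hits = Counter(tp for tp, cl in zip(timepoints, clusters) if cl == target_cluster)
    return {tp: hits[tp] / totals[tp] for tp in totals}


# Toy per-cell labels (hypothetical, NOT data from the study).
timepoints = ["E40", "E40", "E40", "E40", "E90", "E90", "E90", "E90"]
clusters = [1, 2, 3, 5, 13, 13, 2, 13]

fractions = cluster_fraction_by_timepoint(timepoints, clusters, target_cluster=13)
for tp, frac in sorted(fractions.items()):
    print(tp, frac)
```

      A rise in this fraction between adjacent time points is the kind of signal used above to place the onset of gliogenesis; any significance test would be applied on top of these proportions.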

      Figure 2:

      Why are there so few neurons at E90? Is it capture bias, dissociation challenges (as postulated for certain neuronal subtypes in the discussion), or programmed cell death at this time point?

      We think this is because mature neurons at E90, which have abundant axons and processes, are hard to settle into the microwells used by the BD method for single-cell capture. Because the BD Rhapsody microwells have a fixed size, this single-cell capture method might be less efficient at capturing mature excitatory neurons, but it captures newborn neurons well at each sampling time point. In conclusion, given this feature of the BD capture method, immature neurons at each time point are more easily captured than mature neurons in our study, so the generation of excitatory neurons at different developmental time points can be well observed, as shown in Figure 2, which aligns with our research purpose.

      The authors state: "We then characterized temporal changes in the composition of each EN subcluster. While the EN 5 and EN 11 (deep-layer neurons) subclusters emerged at E40 and E50 and disappeared in later stages, EN subclusters 1, 2, 3, and 4 gradually increased in population size from E50 to E80 (Figure 2D)." What about EN7? It's labeled as an upper layer neuron that is proportionally highest at E40. Could this be an interesting, novel finding? Does this indicate something unique about macaque corticogenesis? The authors don't describe/discuss this cell type at all.

      We apologize for the manuscript’s lack of detailed description of the EN results. In our study, EN7 is identified as a CUX1-positive, PBX3-positive, and ZFHX3-positive excitatory neuron subcluster. The results in Fig. 2B show that EN7 was mainly captured from the early time point (E40/E50) samples. The above description has been added to the revised manuscript.

      A Pbx3/Zfhx3-positive excitatory neuron subtype was reported in the study by Moreau et al. on mouse neocortex development (Moreau, Matthieu X et al. Development. 2021. doi:10.1242/dev.197962. PMID: 34170322). Our study verified that Pbx3/Zfhx3-positive cortical excitatory neurons also exist in the early stage of prenatal macaque cortex development.

      Is there any unique gene expression in identified subtypes that are surprising? Did the comparison against human data, in later figures, inform any unique features of gene expression?

      Based on our analysis of the excitatory neuron subclusters, we found no surprising results. In subsequent integrated cross-species analyses, macaque excitatory neurons showed transcriptional characteristics similar to those of human excitatory neurons. In general, excitatory neurons tend to have a greater diversity in the cortex of animals that are more advanced in evolution (Ma, Shaojie et al. Science. 2022. doi:10.1126/science.abo7257. PMID: 36007006; Wei, Jia-Ru et al. Nat Commun. 2022. doi:10.1038/s41467-022-34590-1. PMID: 36371428; Galakhova, A A et al. Trends Cogn Sci. 2022. doi:10.1016/j.tics.2022.08.012. PMID: 36117080; Berg, Jim et al. Nature. 2021. doi:10.1038/s41586-021-03813-8. PMID: 34616067). Since only single-cell transcriptome data were analyzed in this study, we did not find any unique features of excitatory neurons in the prenatal developing macaque cortex in the comparison against the human dataset, owing to the limited dimensions of information available.

      Figure 3:

      The identification of terminal oRG differentiation genes is interesting. The confirmation of known gene expression as well as novel markers that indicate different states/stages of oRG cells is a valuable resource. As the identification of described ion channel expression is a novel finding, it should be explored more and would be strengthened by validation in tissue samples and, if possible, functional assays.

      E is the most novel part of this figure, but it's very hard to read. I think increasing the focus of this figure onto this finding and parsing these results more would be informative.

      Thanks for the positive comments. We apologize for the lack of clarity and conciseness in the figure visualizations. We divided the vRG-to-oRG trajectory into three hypothesized phases: onset, commitment, and terminal. The main information conveyed by Figure 3E is the dynamic gene expression along the developmental trajectory from vRG to oRG. Specific genes were selected and are shown in the schematic diagram of the new Figure 3.

      We verified the terminal oRG differentiation genes in the recently published macaque telencephalic development dataset (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652) (GEO accession: GSE226451). The results in Author response image 2 show that gene expression differed across states/stages. Most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of ion channel genes ATP1A2, ATP1A2, and SCN4B.

      I'm curious about the granularity of the oRG_C12 terminal cluster. Are there ways to subdivide the different cells that seem to be glial-committed vs actively dividing vs neurogenically committed to IPCs? In the text, the authors referred to different oRG populations, but they are annotated as the same cluster and cell type. The authors should clarify this.

      Following the reviewer's suggestion, we subdivided oRG_C12 into eight subclusters. Based on the marker genes in Author response image 5C, subclusters 1, 2, and 4 might be glial-committed, with AQP4/S100B-positive expression; subclusters 3, 6, and 7 might be neurogenically committed to IPCs, with NEUROD6-positive expression; and subclusters 0, 3, 5, 6, and 7 might be in an actively dividing state, with MKI67/TOP2A-positive expression.

      Author response image 5.

      Subdivision analysis of oRG_C12. (A) and (B) Subdivision of oRG_C12 visualized via UMAP. Cells are colored according to time point (A) and subcluster identity (B). (C) Violin plot of molecular markers for the subclusters.

      Figure 4:

      Annotating/labeling the various EN clusters (even as deep/upper) would help improve the clarity of this and other figures. It's clear what each progenitor subtype is but it's hard to read the transitions. Why are all the EN groups in pink/red? It makes the data challenging to interpret.

      In Figure 4A, we use different yellow/orange colors for the deep-layer excitatory neuron subclusters (EN5 and EN10) and different red/pink colors for the upper-layer excitatory neuron subclusters (EN1, EN2, EN3, EN4, EN6, EN7, EN8 and EN9). We have added this information to the legend of Figure 4 in the revised manuscript.

      E50 seems to be unique - what's EN11?

      Based on the molecular markers for EN subclusters in Author response image 2, we recognized EN11 as a deep-layer excitatory neuron subcluster expressing BCL11B and FEZF2. As explained in the reply above, the BD microwell plate captures newborn neurons well at each time point. EN11 consisted mainly of newborn excitatory neurons at the E50 time point, which makes the subcluster seem unique.

      Author response image 6.

      Vlnplot of different markers in EN8.

      Figure 4E - the specificity of gene expression for deep vs upper layer markers seems to be over stated given the visualized gene expression pattern (ex FEZF2). Could the right hand panels be increased to better appreciate the data and confirm the specificity, as described.

      In our study, we used the slingshot method to infer cell lineages and pseudotimes; this method has been used to identify biological signals for different branching trajectories in many scRNA-seq studies. We apologize for the lack of visual clarity in Figure 4E. Because of the size limit on uploaded files, the file was compressed, which reduced the clarity of the image. Below, we provide Figure 4E at higher resolution and, following the reviewer's suggestion, add slingshot branching tree results for several more genes.

      Figure 5:

      There are some grammatical typos at the bottom of page 8. In this section, it also feels like there is a missing logical step between expansion of progenitors through elongated developmental windows that impact long-term expansion of the upper cortical layers.

      We apologize for the grammatical typos and have corrected them in the revised manuscript. We understand the reviewer’s concern. Primates have much longer gestation than rodents, and previous evidence has shown that extending neurogenesis by transplanting mouse embryos into a rat mother specifically increases the number of upper-layer cortical neurons, with a concomitant abundance of neurogenic progenitors in the subventricular zone (Stepien, Barbara K et al. Curr Biol. 2020. doi:10.1016/j.cub.2020.08.046. PMID: 32888487). We think this mechanism could also explain the greatly expanded abundance of upper-layer neurons in primates.

      I'm curious about the IPCs that arise from the oRGs. Lineage trajectory shows vRG decision to oRG or IPC, but oRGs also differentiate into IPCs. Could the authors conjecture why they are not in this dataset or are indistinguishable from vRG-derived IPCs.

      Several published experiments have shown that oRG can generate IPCs in the developing human and macaque neocortex (Hansen, David V et al. Nature. 2010. doi:10.1038/nature08845. PMID: 20154730; Betizeau, Marion et al. Neuron. 2013. doi:10.1016/j.neuron.2013.09.032. PMID: 24139044). However, clearly distinguishing vRG-derived from oRG-derived IPCs at the transcriptional level in our single-cell transcriptome dataset is difficult. We hypothesize that the IPCs produced by both pathways have highly similar transcriptional features. Owing to the limits of the scRNA-seq analysis algorithms used in this study, we could not distinguish the two kinds of IPC by either pseudo-time trajectory reconstruction or transcriptional data.

      Figure 6 :

      How are the types 1-5 in 6A defined? Were they defined in one species and then applied across the others?

      We applied the same analysis to each species. We first selected the vRG cells in each species dataset and screened for differentially expressed genes (DEGs) between adjacent developmental time points using the “FindMarkers” function (with min.pct = 0.25, logfc.threshold = 0.25). After normalizing the DEG expression matrix of each species dataset separately, we used the “standardise” function from the Mfuzz package to standardize the data. The vRG DEGs of each species were then grouped into five clusters using the fuzzy c-means algorithm in the Mfuzz package in R.
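
      For readers unfamiliar with the clustering step, the sketch below illustrates per-gene standardization followed by fuzzy c-means on temporal expression profiles. It is a from-scratch NumPy illustration under stated assumptions, not the Mfuzz implementation used in the study: the function names, the fuzzifier value m = 2.0, the random initialization, and the toy profiles are all chosen here for illustration.

```python
import numpy as np


def standardise(profiles):
    """Scale each gene (row) to mean 0 and standard deviation 1 across time points."""
    profiles = np.asarray(profiles, dtype=float)
    mean = profiles.mean(axis=1, keepdims=True)
    sd = profiles.std(axis=1, ddof=1, keepdims=True)
    return (profiles - mean) / sd


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships); each row of memberships sums to 1."""
    rng = np.random.default_rng(seed)
    # Random initial soft memberships, normalised per gene.
    u = rng.random((data.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the profiles.
        w = u ** m
        centers = (w.T @ data) / w.sum(axis=0)[:, None]
        # Euclidean distance from every gene to every center.
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        # Standard membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


# Toy profiles over four time points (hypothetical, NOT data from the study):
# three rising genes followed by three falling genes.
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [1, 2, 3, 5],
    [4, 3, 2, 1],
    [8, 6, 4, 2],
    [5, 3, 2, 1],
]
data = standardise(profiles)
centers, u = fuzzy_c_means(data, n_clusters=2)
labels = u.argmax(axis=1)  # hard assignment from the soft memberships
print(labels)
```

      Standardizing first means genes are grouped by the shape of their temporal pattern rather than by absolute expression level, which is why profiles such as 1, 2, 3, 4 and 2, 4, 6, 8 become indistinguishable before clustering.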

      The temporal dynamics in the highlighted section in B have interesting, consistent patterns of gene expression of the genes described, but what about the genes below that appear less consistent temporally? What processes do not appear to be conserved, given those gene expression differences?

      Many thanks for the constructive comments. The genes in the lower part of Figure 6B are transcription factors whose temporal dynamics are not conserved among the vRG of the three species. We performed a functional enrichment analysis of these transcription factors with the PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System (https://www.pantherdb.org/), and the results are shown in Author response image 7. The gene ontology (GO) analysis shows that the non-conserved transcription factors were related to different biological processes, cellular components, and molecular functions. However, subsequent experiments are still needed to verify specific genes.

      Author response image 7.

      Gene Ontology (GO) analysis of transcription factors with non-conserved temporal patterns among mouse, macaque and human vRG cells.

      The identification of distinct regulation of gene networks, despite conservation of transcription factors in discrete cell types, is interesting. What does the comparison between humans and macaques indicate about regulatory differences evolutionarily?

      We thank the reviewer for the comments. We performed a TF regulatory network analysis of human vRG with the pySCENIC workflow. The top transcription factors at every time point in human vRG were calculated, and we used the top 10 TFs and their top 5 target genes to perform interaction analysis and generate the regulatory network of human vRG in the revised Figure 6. Comparing the pySCENIC results for mouse, macaque and human vRG, it was clear that the regulatory networks were not evolutionarily conserved. Compared with macaque, the regulatory network of transcription factors and target genes in humans is more complex. Some conserved regulatory relationships present in more than one species were identified, such as the HMGN3, EMX2, SOX2, and HMGA2 network at an early stage, during deep-layer generation, and the SOX10, ZNF672, ZNF672 network at a late stage, during upper-layer generation.

      Reviewer #3 (Recommendations For The Authors):

      The data should be compared to a similar brain region in human and mouse, if available. (See data from PMCID: PMC8494648).

      We appreciate the reviewer’s suggestions. In the species-integration analysis in Figure 6, the mouse data were from the somatosensory cortex, the macaque data in this study were mainly from the parietal lobe, and the human data included the frontal lobe (FL), parietal lobe (PL), occipital lobe (OL), and temporal lobe (TL). PMC8494648 offers high-quality data covering gestation weeks 14 to 25. However, the developmental stages of the rhesus monkeys in our study are E40-E90, corresponding to pcw8-pcw21 in humans, so the developmental processes covered by PMC8494648 do not perfectly match the developmental window of the macaque cortex that we focused on. It is therefore challenging to integrate this dataset (PMCID: PMC8494648) into our data analysis. We have, however, cited the results of this valuable study (PMCID: PMC8494648) in the discussion of the revised manuscript.

      A deeper assessment of these data in the context of existing studies would help distinguish the work and enable others to appreciate the significance of the work.

      We appreciate the reviewer’s constructive suggestions. The human regulatory analysis with the pySCENIC workflow was added to the new Figure 6 to compare the vRG regulatory networks of the different species. Analysis of the regulatory activity during human, macaque and mouse prenatal neocortical neurogenesis indicated commonalities in the roles of classical developmental TFs such as GATA1, SOX2, HMGN3, TCF7L1, ZFX, EMX2, SOX10, NEUROG1, NEUROD1 and POU3F1. The top 10 TFs at each time point in human, macaque, and mouse vRG, together with their top 5 target genes identified by pySCENIC, were used as input to construct the transcriptional regulatory networks (Figure 6 D, F and H). Some conserved regulatory TFs present in more than one species were identified, such as HMGN3, EMX2, SOX2, and HMGA2 at an early stage, during deep-layer generation, and SOX10, ZNF672, and ZNF672 at a late stage, during upper-layer generation.

      In addition, we performed some comparative analyses between our macaque dataset and the newly published macaque telencephalon development dataset. The results are provided only as additional information for the reviewers and were not included in the revised manuscript.

      To verify the reliability of our cell annotation results, we compared the cell-type assignments in our study with those in recently published research (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652), using the scmap package to project the major cell types in our macaque development scRNA-seq dataset onto GSE226451. The river plot in Author response image 1 illustrates the broadly similar cell type classifications of the two datasets. We also used more marker genes for cell annotation to improve the cell type definitions in the new Figure 1D, and the description of distinct excitatory neuronal types has been improved in the new Figure 2.

      Furthermore, we verified the terminal oRG differentiation genes in the recently published macaque telencephalic development dataset (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652) (GEO accession: GSE226451). The results in Author response image 2 show that gene expression differed across states/stages. Most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of ion channel genes ATP1A2, ATP1A2, and SCN4B.

    1. Author Response

      Note to the editor and reviewers.

      All the authors would like to thank the editorial team and the two anonymous reviewers for their efforts and thoughtfulness in assessing our manuscript. We very much appreciate it and we all believe that the manuscript has been much improved in addressing the comments and suggestions made.

      General considerations on the revised manuscript

      We have applied extensive modifications to the manuscript with our main goal being the improvement of clarity. The Introduction has been changed mainly to introduce precisely our terminology and we have stuck to it in the rest of the manuscript. The Results section has been divided up into more defined sections. The discussion has been extensively re-written to improve clarity, following the suggestion of the reviewers. Main figures 1 and 4 have been modified with clearer schematics. Supplementary figures and legends have been modified and several supplementary schematic figures have been added to clearly present our interpretations for various data. We have added a Supplementary Discussion where the most detailed technical parts of our discussion are presented to avoid unnecessarily weighing down the main discussion, where our main conclusions are outlined. We have presented our mass photometry mixing experiment in a new supplementary figure, with detailed explanation. We have also expanded our discussion of in vivo and general relevance of our study.

      Response to manuscript evaluation

      Our manuscript has been evaluated as a valuable study and presenting solid experimental evidence. We appreciate the recognition of our work.

      Two weaknesses were identified by the reviewers: 1) our experiments do not completely exclude the possibility of an alternative nucleophile, which relates to the evaluation of our experimental evidence; 2) our study does not address the in vivo relevance of the interface swapping phenomenon, which relates to the value of the study for the community.

      Response to the evaluation of experimental evidence (Weakness #1):

      We argued in the original manuscript that we have completely excluded the presence of an alternative nucleophile. This conclusion is based on a series of experiments that were presented in the originally submitted manuscript. These experiments are not discussed by the reviewers in relation to this main conclusion, and we therefore suggest that they have not been properly evaluated. We believe our conclusion to be appropriately supported by these data (see our response to reviewer #1). In addition, the criticism of our gel-filtration data by reviewer #2 was based on a misinterpretation of Supplementary figure 1 b. We accept, of course, that the way the data were presented could be misleading, and we assume responsibility for this. We have attempted to correct this by changing the main text and the figure legends and annotations. In conclusion, we believe that the evaluation of the experimental evidence as presented in the revised manuscript could be upgraded to “convincing”.

      Response to the evaluation of our study's general relevance (Weakness #2):

      We agree with both reviewers that the in vivo relevance of our observation is an important question that has not been addressed so far. Indeed, in vivo data would greatly increase the value of our study and make it of interest to a wider audience. However, we would like to argue that our study would interest a wider audience than initially stated, for the following reasons: 1) Our study is the first evidence of interface swapping in vitro and will constitute a basis for investigating this phenomenon both in vivo and in vitro. It will therefore interest a wide audience because of the potential involvement of interface swapping in a wide range of processes, such as recombination, evolution, and drug targeting (see also below). 2) DNA cleavage is the central mode of action of antibiotics targeting bacterial type II topoisomerases (i.e. topoisomerase “poisons”). This already established target is one of the few to have produced new scaffolds, and too few new antibacterials are in production to fulfill medical needs. The role of interface stability is also emerging as a modulator of the efficiency of topoisomerase poisons. See for instance (Germe, Voros et al. 2018, Bandak, Blower et al. 2023). By shedding light on interface dynamics, our study will be of interest to scientists interested in the development of these drugs. In addition, the heterodimer system can potentially produce detailed mechanistic information (Gubaev, Weidlich et al. 2016, Hartmann, Gubaev et al. 2017, Stelljes, Weidlich et al. 2018) not only on gyrase but also on other dimeric type II topoisomerases, or even on other dimeric enzymes in general. We have amended the manuscript to make these points clearer. Therefore, we believe that the evaluation of the revised manuscript’s relevance could be upgraded to “important”.

      Point-by-point response to the reviewer

      Reviewer #1 (Public Review):

      Germe and colleagues have investigated the mode of action of bacterial DNA gyrase, a tetrameric GyrA2GyrB2 complex that catalyses ATP-dependent DNA supercoiling. The accepted mechanism is that the enzyme passes a DNA segment through a reversible double-stranded DNA break formed by two catalytic Tyr residues, one from each GyrA subunit. The present study sought to understand an intriguing earlier observation that gyrase with a single catalytic tyrosine that cleaves a single strand of DNA, nonetheless has DNA supercoiling activity, a finding that led to the suggestion that gyrase acts via a nicking-closing mechanism. Germe et al used bacterial co-expression to make the wild-type and mutant heterodimeric BA(fused).A complexes with only one catalytic tyrosine. Whether the Tyr mutation was on the A side or BA fusion side, both complexes plus GyrB reconstituted fluoroquinolone-stabilized double-stranded DNA cleavage and DNA supercoiling. This indicates that the preparations of these complexes sustain double strand DNA passage. Of possible explanations, contamination of heterodimeric complexes or GyrB with GyrA dimers was ruled out by the meticulous prior analysis of the proteins on native PAGE gels, by analytical gel filtration and by mass photometry. Involvement of an alternative nucleophile on the Tyr-mutated protein was ruled unlikely by mutagenesis studies focused on the catalytic ArgTyrThr triad of residues. Instead, results of the present study favour a third explanation wherein double-strand DNA breakage arises as a consequence of subunit (or interface/domain) exchange. The authors showed that although subunits in the GyrA dimer were thought to be tightly associated, addition of GyrB to heterodimers with one catalytic tyrosine stimulates rapid DNA-dependent subunit or interface exchange to generate complexes with two catalytic tyrosines capable of double-stranded DNA breakage. Subunit exchange between complexes is facilitated by DNA bending and wrapping by gyrase, by the ability of both GyrA and GyrB to form higher order aggregates and by dense packing of gyrase complexes on DNA. By addressing a puzzling paradox, this study provides support for the accepted double strand break (strand passage) mechanism of gyrase and opens new insights on subunit exchange that may have biological significance in promoting DNA recombination and genome evolution.

      The conclusions of the work are mostly well supported by the experimental data.

      Strengths:

      The study examines a fundamental biological question, namely the mechanism of DNA gyrase, an essential and ubiquitous enzyme in bacteria, and the target of fluoroquinolone antimicrobial agents.

      The experiments have been carefully done and the analysis of their outcomes is comprehensive, thoughtful and considered.

      The work uses an array of complementary techniques to characterize preparations of GyrA, GyrB and various gyrase complexes. In this regard, mass photometry seems particularly useful. Analysis reveals that purified GyrA and GyrB can each form multimeric complexes and highlights the complexities involved in investigating the gyrase system.

      The various possible explanations for the double-strand DNA breakage by gyrase heterodimers with a single catalytic tyrosine are considered and addressed by appropriate experiments.

      The study highlights the potential biological importance of interactions between gyrase complexes through domain or subunit exchange.

      We thank the reviewer for their support, effort, and comments. The above is a great summary.

      Weaknesses:

      The mutagenesis experiments described do not fully eliminate the perhaps unlikely participation of an alternative nucleophile.

      We agree that the mutagenesis experiment on its own does not fully eliminate the possibility of an alternative nucleophile. The number of residues mutated is limited, and therefore it is possible we have missed a putative alternative nucleophile.

      However, we have other data and experiments supporting the conclusion that no alternative nucleophile exists, and we want to stress that this conclusion is based on these additional data. These data and experiments are not discussed by either reviewer despite being present in the original manuscript. This puzzled us, and we have modified the manuscript and the figures in the hope that they, and their significance, will not be missed.

      Briefly:

      1) We have performed cleavage-based labeling of the nucleophile responsible for cleavage. This experiment is depicted in Figure 4. The nucleophilic activity of the residue involved results in a covalent link between the polypeptide (that includes the residue) and radiolabeled DNA. Therefore, a polypeptide that includes an active nucleophile will be radiolabeled and visible, whereas a polypeptide that is missing an active nucleophile will remain unlabeled and invisible. We can distinguish the BA and the A polypeptides by their size. In the case of the BA.A complex, both the BA polypeptide and the A polypeptide are radiolabeled, and therefore both have an active nucleophile. In the case of the BAF.A complex, the unmutated A polypeptide is labeled, meaning that a nucleophile is still active. In contrast, the BAF polypeptide shows no detectable labeling. This result means that removing the hydroxyl group from the catalytic tyrosine abolishes any protein-DNA covalent link, suggesting that no other nucleophile from the BA polypeptide chain can substitute for the catalytic tyrosine hydroxyl group. This experiment excludes the possibility of an alternative nucleophile coming from the polypeptide chain of either GyrA or GyrB. This experiment, described in Figure 4, is not discussed by the reviewer. It is similar in principle to early experiments identifying the catalytic tyrosine in topoisomerases. See for instance, (Shuman, Kane et al. 1989).

      2) The experiment above does not exclude a nucleophile coming from the solvent. To exclude this possibility, we have used T5 exonuclease (which needs a free 5’ DNA end to digest) and ExoIII (which needs a free 3’ DNA end to digest). We have shown that the reconstituted cleavage is not sensitive to T5 but is sensitive to ExoIII. This shows that the 5’ ends of the cleaved sites are protected by a bulky polypeptide that impairs T5 activity; T5 itself is active in our reaction, as shown by the digestion of a control DNA fragment. This experiment shows that the reconstituted cleavage is very unlikely to come from a small nucleophile potentially provided by the solvent. This experiment is described in the main text and the results are shown in supplementary figure 5. It is not mentioned by either reviewer.

      3) Finally, we would like to emphasize our experiment comparing the BAF.A59 to BALLL.A59. The BALLL.A59 complex displays increased cleavage compared to BAF.A59. If this increased cleavage was due to an alternative nucleophile on the BALLL side, we would expect an accompanying increase in supercoiling activity since the BALLL.A59 possesses one CTD, which is sufficient for supercoiling. The fact that no increased supercoiling activity is observed strongly suggests subunit exchange reconstituting an A59 dimer, inactive for supercoiling but active for cleavage. We believe this somewhat complex observation to be quite significant and we have attempted to clarify the manuscript and discuss its full significance in several places.

      Reviewer #1 (Recommendations For The Authors):

      An interesting paper on DNA gyrase that explains a puzzling paradox in terms of the double-strand break mechanism.

      Major points

      1) The authors consider several mechanisms that could potentially explain their data. On page 15, the authors present the evidence against the nicking closing mechanism proposed by Gubaev et al. Throughout the manuscript, they indicate where their experimental results agree with this earlier work but should also indicate and account for differences. For example, Gubaev et al describe cross linking experiments that they claim rule out subunit exchange. These aspects should be clearly explained.

      Thank you for the suggestion. We have re-written the discussion to address this point. We now discuss the experiments from (Gubaev, Weidlich et al. 2016) extensively and offer our interpretation of the apparently conflicting results. We suggest that their experiments are basically consistent with our data when correctly interpreted. To keep the main manuscript clear, we have added a supplementary discussion where the experiments from (Gubaev, Weidlich et al. 2016) are discussed further in relation to our data.

      2) Page 9. The experiments done to rule out the perhaps unlikely alternative nucleophile hypothesis relate to the possible role of the Arg and Threonine of the RYT triad. These residues are close to the DNA and therefore are prime candidates and attractive targets for mutagenesis. However, strictly speaking, the mutant enzyme data presented do not rule all possibilities. For example, Serine is often the nucleophile used by resolvases to effect DNA recombination via subunit exchange. The ideal experiment to rule out/rule in other nucleophiles would be to identify the residue(s) that become attached to DNA in the cleavage reaction.

      Please see above. We have effectively ruled out an alternative nucleophile with our cleavage-based labeling experiment and with others that were present and discussed in the original manuscript but were missed. We have modified the manuscript and figures to make this point clearer than before.

      3) p17. The readout for subunit exchange used by the authors is double-stranded DNA cleavage. Attempts to directly detect the formation of the DNA cleaving complexes GyrA2B2 and (GyrBA)2 (arising from subunit exchange between heterodimers) by mass photometry were not successful. Perhaps FRET would have been another approach to try as it could also detect interface and domain interchanges.

      Directly detecting interface exchange with a proximity experiment would be extremely useful. FRET would have to be done in the BAF.A + GyrB configuration, where the amount of interface exchange is important. At present we do not have the tools to do that, and developing them would be outside the scope of the study. We propose that cross-linking experiments be done in the future. We argue that the manuscript is convincing without these for now. This point, and other possible future experiments, are now discussed in the Discussion section.

      4) The underlying canvas of this paper is the strand passage mechanism of gyrase. It would seem appropriate to include the papers first proposing it - Brown P.O and Cozzarelli N.R. (1979) and Mizuuchi K et al (1980).

      We very much agree. These papers have now been added in the introduction as appropriate, highlighting the relationship between double-strand cleavage and the strand-passage mechanism.

      5) Figure 1. The quality of the insets is poor. It is difficult to pick out the key catalytic residues and their disposition vis-a-vis DNA.

      We agree, Figure 1 has been re-done and the schematic theme has been harmonized throughout the whole manuscript. We very much hope that clarity has improved. Thank you for the suggestion.

      6) The experimental work is a very detailed analysis of a specific feature of engineered gyrase heterodimers. Making the work accessible to the general reader will be important. Using shorter paragraphs each with a specific theme might help. In particular, the second paragraph of the Results on p7, the section on p9 and bottom of p11, p13 and the first paragraph of the Discussion on p14 are each a page or more long. A shorter manuscript that avoids overinterpretation of the smaller details would also help.

      We agree. We have now split long paragraphs into individual sections, with titles, in the Results. This structure is recapitulated at the beginning of the discussion, and we have split the discussion into shorter paragraphs, each with a unique point being made.

      7) The impact of the Gubaev et al (2016) paper for the field in general, and as the catalyst for the present work should be better documented. Mention of this earlier paper and its significance at the beginning of the Abstract and elsewhere e.g in the Introduction might also help with a more logical organization of the current findings and result in a shorter paper (which would be easier to read).

      We have added a reference to (Gubaev, Weidlich et al. 2016) in the abstract and have expanded our introduction.

      Minor points

      1) Legends for Figs 2 and 6; Supplementary Figs 1 and 8. The designation of subfigures as a, b, c, d , e etc appears to be incorrect. Check throughout and in the text.

      The manuscript has been checked for such errors.

      2) Figure 2, and first paragraph p8. Peaks in Fig 2c should be labelled to facilitate discussion on p8.

      Agreed, this has been done.

      3) Supplementary Fig 4 and elsewhere in the manuscript. A variety of notations are used to denote phenylalanine mutants e.g. AsubscriptF, AsuperscriptF and AF. Check and use one format throughout.

      Done

      4) Figures showing gels include the label '+EtBr, +cipro'. This is somewhat confusing because EtBr was contained in the gel (not the samples) whereas cipro was included in the reaction. Modify or describe in the legend.

      We have re-written the figure legend.

      5) Supplementary Fig 4b describes a small effect on the ratio of linear to nicked DNA for the triple LLL mutant. Is this significant? How many times was the measurement made?

      This was addressed in the supplementary data of the original manuscript. In terms of quantification, the experiment was done 3 times for each prep, with the same GyrB prep and concentration. The standard error is displayed on the figure. This result is very reproducible and has been reproduced more than 3 times. No LLL cleavage assay showed more single-strand than double-strand cleavage. For the phenylalanine mutant, no cleavage assay showed more double-strand than single-strand cleavage.

      6) Supplementary Fig 5 legend. Should 'L' read 'size markers' (and give their sizes)?

      Yes indeed, we have modified the figure to clarify.

      7) p11 line 5. Is this statement correct?

      Yes, it is correct, although we hope we are on the same page. When the tyrosine is mutated on one side only of the heterodimer, both single- and double-strand cleavage are protected from T5 exonuclease digestion.

      8) 12 last line should read...and supercoiling activity (not shown)..were

      Thank you, done.

      There are a number of typos throughout the text, for example:

      Page 3 line..Difficult to conclude...what?

      Page 3 para 3...Lopez....and Blazquez

      We have corrected these typos and checked the whole manuscript.

      Reviewer #2 (Public Review):

      DNA gyrase is an essential enzyme in bacteria that regulates DNA topology and has the unique property to introduce negative supercoils into DNA. This enzyme contains 2 subunits GyrA and GyrB, which forms an A2B2 heterotetramer that associates with DNA and hydrolyzes ATP. The molecular structure of the A2B2 assembly is composed of 3 dimeric interfaces, called gates, which allow the cleavage and transport of DNA double stranded molecules through the gates, in order to perform DNA topology simplification. The article by Germe et al. questions the existence and possible mechanism for subunit exchange in the bacterial DNA gyrase complex.

      The complexes are purified as a dimer of GyrA and a fusion of GyrB and GyrA (GyrBA), encoded by different plasmids, to allow the introduction of targeted mutations on one side only of the complex. The conclusion drawn by the authors is that subunit exchange does happen, favored by DNA binding and wrapping. They propose that the accumulation of gyrase in higher-order oligomers can favor rapid subunit exchange between two active gyrase complexes brought into proximity.

      The authors are also debating the conclusions of a previous article by Gubaev, Weidlich et al 2016 (https://doi.org/10.1093/nar/gkw740). Gubaev et al. originally used this strategy of complex reconstitution to propose a nicking-closing mechanism for the introduction of negative supercoils by DNA gyrase, an alternative mechanism that precludes DNA strand passage, previously established in the field. Germe et al. incriminate in this earlier study the potential subunit swapping of the recombinant protein with the endogenous enzyme, that would be responsible for the detected negative supercoiling activity.

      Accordingly, the authors also conclude that they cannot completely exclude the presence of endogenous subunits in their samples as well.

      Strengths

      The mix of gyrase subunits is plausible; this mechanism has been suggested by Ikeda et al, 2004 and also for the human Top2 isoforms, with the formation of Top2a/Top2b hybrids being identified in HeLa cells (doi: 10.1073/pnas.93.16.8288).

      Germe et al have used extensive and solid biochemical experiments, together with thorough experimental controls, involving :

      • the purification of gyrase subunits including mutants with domain deletion, subunit fusion or point mutations.

      • DNA relaxation, cleavage and supercoiling assays

      • biophysical characterization in solution (size exclusion chromatography, mass photometry, mass spectrometry)

      Together the combination of experimental approaches provides solid evidence for subunit swapping in gyrase in vitro, despite the technical limitations of standard biochemistry applied to such a complex macromolecule.

      We thank the reviewer for their supportive and considered comments.

      Weaknesses

      The conclusions of this study could be strengthened by in vivo data to identify subunit swapping in the bacteria, as proposed by Ikeda et al, 2004. Indeed, if shown in vivo, together with this biochemical evidence, this mechanism could have a substantial impact on our understanding of bacterial physiology and resistance to drugs.

      Thank you for this comment. Indeed, whether this interface exchange can happen in vivo and lead to recombination is a very important question. However, we believe that this is outside the scope of this study, simply because of the amount of work one can fit into one paper. Proving that interface exchange can happen in vitro has already necessitated a number of non-trivial experiments, and likewise investigating interface exchange in vivo will require a careful, long-term study (see our reply to the comment of reviewer #2, who also raised this point). We cannot address it with one additional experiment using the tools we have. However, we very much hope to do it in the future.

      Reviewer #2 (Recommendations For The Authors):

      Specific questions and comments for the authors:

      1) Complex identification during purification

      The statement line 236-237 that "Our heterodimer preparation showed a single-peak on a gel-filtration column, distinct from the GyrA dimer peak" is not entirely clear. In Fig supp 1 b, how can the authors conclude from the superose 6 that GyrBA is separated from the GyrA dimer? Since they seem close in size 160/180kDa, they are unlikely to be well separated in a superose 6 gel filtration column. The SDS-PAGE seems to show both species in the same fractions #15-17 therefore it would not be possible to distinguish GyrBA. A from A2.

      There appears to be some confusion about what Supp Fig. 1b shows. First, in all our gel filtration conditions, neither GyrBA nor GyrA can exist as a monomer at a significant concentration. Therefore, we can never observe the GyrBA monomer on a gel filtration column. Supp Fig. 1b shows the gel filtration profile of the BA.A heterodimer only. This is the output of the last, polishing step in the reaction. We analyze these results using SDS-PAGE. Therefore, the BA.A heterodimer will be denatured and separated into 2 polypeptides, GyrBA and GyrA, which migrate according to their size in an SDS-PAGE gel and form two bands. These two bands do not represent two separate species in solution. They represent the separation of one species only, the BA.A heterodimer, into its two denatured subunits: GyrA and GyrBA. We do not conclude from Supp Fig. 1 as a whole that GyrBA and the GyrA dimer are well separated, and this is not stated in the manuscript. We conclude that the BA.A dimer is fairly well separated from the GyrA dimer. They have significantly different sizes (~260 kDa and ~180 kDa respectively) and form different peaks on a gel filtration column. The BA.A heterodimer has a GyrA subunit and therefore will show a GyrA band on an SDS-PAGE gel, like the GyrA dimer, but the two are obviously distinct in their quaternary structure. We hope that our new schematics and the rewrite of some of the results and figure legends will clarify this.

      Panel 6 shows a different elution volume for the 2 species BA.A and A2 on an analytical S200 column, which appears better at separating the complexes in this size range.

      Did the authors consider using a S200 column instead of superose 6 for the sample preparation, to optimize the separation of GyrBA. A from A2?

      This statement is not necessarily true (see above). We have not run the GyrA dimer on a Superose 6 column. The analysis was done on an S200 because extensive data for the GyrA dimer were already available with this already calibrated column. We do not expect the Superose 6 to be worse in this size range. In fact, it might even be better. The Superose 6 profile in Supp. Fig. 1b shows BA.A only and no GyrA dimer. We have clarified the annotations in the figure to make this clearer.

      Regarding the analytical gel filtration experiment, there is however an overlap in the elution volume in the analytical column, therefore how can the authors ensure there is no excess free A2 complex in the GyrBA. A sample?

      Indeed, there is an overlap, but we argue that it is overstated. The important part of the overlap is where the maximum height of the GyrA peak is positioned compared to the BA.A trace, not where the traces intersect. This overlap is minimal. If a contaminating GyrA peak were hidden in the BA.A peak, it would have to be at least 10 times less intense than the BA.A peak. Since the BA.A and GyrA dimers have roughly the same extinction coefficient, this means that a contamination would be detectable at 10% or even less. Our mass photometry further excludes such contamination.

      Alternatively, the addition of a larger (cleavable) tag at the C-terminal end of the BA construct (therefore not disturbing dimer association) could allow to better distinguish the 2 populations already at the size exclusion step.

      This is true and could allow cleaner purification. There are also other ways to achieve cleaner purification, such as adding a secondary tag. However, as we argue in the manuscript, our contamination is already minimal. It is questionable what benefit could be gained by changing the protocol. We also argue that the tandem tag method does not completely exclude contamination (Supplementary Discussion), and therefore we are not sure that this would be worth the time and expenditure.

      2) GyrA and GyrB Oligomers:

      In the mass photometry experiment, the authors explain that the low concentration of the proteins promotes dissociation of GyrA dimers, hence the detection of GyrA monomers instead of GyrA dimers, which are also detected in the GyrBA.A sample.

      However, it cannot be concluded that the GyrA dimer is not formed in the condition of the gel filtration chromatography, at higher concentration.

      In our mass photometry experiment, the BA.A sample is not as diluted as the GyrA dimer and is much closer to our experimental conditions. Since we have calculated the dissociation constant, we can calculate the expected level of dissociation (or reassociation). The level of dissociation is minimal in these conditions. If some dissociation is expected from the BA.A heterodimers, a very low amount of GyrBA monomer should also be present, and yet it is not observed. We presume that this is because mass photometry is much more sensitive to GyrA (see the mixing mass photometry experiment that we have added). If the GyrA monomer were to reassociate at higher concentration, it would do so either with itself (forming a GyrA dimer) or with the GyrBA monomer, re-forming the heterodimer. Assuming that both the GyrA dimer and the heterodimer have the same dissociation constant, roughly one third of the GyrA monomers would reassociate with themselves. Even assuming complete reassociation, the GyrA dimer would account for only 2% of the prep.

      Another interpretation would be to assume that GyrBA monomers are not present at all and that GyrA monomer are reassociating only with themselves. This is not valid because of the following thermodynamic reason:

      Since the profiles for the GyrA dimer are collected at equilibrium, we should expect a ratio between GyrA monomers and dimers that follows the dissociation constant. In other words, if the GyrA monomer were in equilibrium with the GyrA dimer, we should already expect a much higher dimer concentration, as the GyrA monomers are not as dilute. We do not observe a GyrA dimer peak in the BA.A profile, even though we can detect a low amount of GyrA dimer mixed with BA.A. Therefore, we conclude that the observed GyrA monomer must be in equilibrium with another dimerization partner, which is most probably the GyrBA monomer (see above). Therefore, only a minimal amount of GyrA dimer is expected to be formed at higher concentration by direct reassociation. This could probably increase if we let this solution-based exchange carry on for a long time at dissociation equilibrium. We have actually shown that this solution-based exchange is very slow and takes several days because of the low dissociation at equilibrium.
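
      The equilibrium argument above can be illustrated with a minimal sketch of a simple homodimer equilibrium (2M <-> D, with Kd = [M]^2/[D]). The function name and the concentrations used below are hypothetical and expressed in units of Kd; they are not the measured GyrA values, which are not restated here. The sketch only shows the general trend that the monomer fraction grows on dilution and shrinks at higher concentration.

```python
import math


def monomer_fraction(total_protomer_conc: float, kd: float) -> float:
    """Fraction of protomers present as free monomer for a homodimer 2M <-> D.

    Assumes Kd = [M]^2 / [D] and the mass balance total = [M] + 2[D].
    Substituting [D] = [M]^2 / Kd gives 2[M]^2/Kd + [M] - total = 0,
    whose positive root is [M] = Kd * (sqrt(1 + 8*total/Kd) - 1) / 4.
    """
    if total_protomer_conc <= 0 or kd <= 0:
        raise ValueError("concentration and Kd must be positive")
    ratio = total_protomer_conc / kd
    monomer = kd * (math.sqrt(1.0 + 8.0 * ratio) - 1.0) / 4.0
    return monomer / total_protomer_conc


if __name__ == "__main__":
    kd = 1.0  # hypothetical Kd; all concentrations are in units of Kd
    for total in (400.0, 40.0, 4.0, 1.0):  # illustrative dilution series
        print(f"total = {total:6.1f} x Kd -> monomer fraction = "
              f"{monomer_fraction(total, kd):.3f}")
```

      Under these assumptions, a sample whose total concentration sits well above Kd is almost entirely dimeric, which is the reasoning used above to argue that a monomer population in equilibrium with its own dimer should be accompanied by a visible dimer peak.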

      The mass spectrometry analysis in Fig 2 confirms the presence of (monomeric) GyrA in the sample, despite different experimental conditions.

      The concentration of heterodimer in the mass spectrometry experiment is actually higher than in the mass photometry experiment. This shows that self-reassociation of the GyrA monomer as suggested above is undetectable with mass spectrometry at higher concentration.

      We considered that the “GyrA monomer” peak could be a contaminating GyrB monomer, which is ~90 kDa, which would explain the lack of reassociation. However, the mass spectrometry peak shows precisely the expected molecular weight of GyrA so we interpret this peak as arising from very limited dissociation of the BA.A heterodimer. The reassociation is limited at high concentration due simply to the fact that the difference in concentration between the mass photometry and our other experimental conditions is not that high. The GyrA dimer had to be diluted 400 times to see significant dissociation and yet even at this very low concentration the dissociation is far from complete.

      Our general conclusion on the two points above is that we cannot completely exclude the presence of GyrA dimers, although they are undetectable in our working conditions by mass photometry (lower concentration), mass spectrometry (higher concentration) or even gel filtration (even higher concentration, see above). For the mass photometry, we have established that our detection threshold for a contamination is very low (see our mixing experiment).

      Figure 2A: the authors state in the introduction that GyrB is a monomer in solution and then explain that the upper bands in the native gel are multimer of GyrB. Could the authors comment and provide the size exclusion profile of the Gyr B purification?

      We have expanded our discussion of this. However, we have not been successful in collecting a gel filtration profile for GyrB. This is likely due to excessive oligomerization at the concentration we are using for gel filtration. We suggest that our mass photometry and Blue-Native PAGE experiments show clearly that GyrB can be detected as a monomer in solution at the appropriate dilution. However, GyrB tends to oligomerize in a regular fashion (consider especially Supp Fig. 8a), which suggests that it could align heterodimers on DNA in a linear, regular orientation. We have added a discussion of this.

      Together the relevance of the oligomeric state of purified GyrA or GyrB should be clarified, relative to their role in subunit swapping.

      We have added an explanation in our discussion, while also trying not to be too speculative. Basically, we believe that GyrB oligomerization is likely to be involved. It is difficult to conclude for GyrA since no experiment has allowed us to test it. Therefore, the role of GyrA oligomerization, if any, is unclear. The GyrA tetramer is very prominent, though, and forms very easily. GyrB, on the contrary, forms longer oligomers more readily than GyrA, and we surmise that this would help interface exchange. However, the structure of these GyrA and GyrB oligomers is not clear, which makes it difficult to go beyond speculation on this. It would be a very interesting experiment if we were able to suppress GyrB oligomerization whilst conserving its ability to promote strand-passage and cleavage. The same goes for GyrA. Unfortunately, we are unable to do that at this time.

      4) Subunit exchange

      Line 320: the concept of subunit exchange in this context should be clearly explained. If one understands correctly, the authors mean that the BAF polypeptide, part of the BAF.A complex, could be replaced by a combination of B+A therefore forming a fully functional WT A2B2 gyrase complex.

      Thank you for the suggestion. We have harmonized and clearly defined our terminology for interface swapping and subunit exchange in the introduction and attempted to be much more rigorous when referring to it.

      A great effort has been done in this study to explain all the pros and cons of the experimental design but the length of the explanations may prevent readers outside of the field to fully appreciate the conclusions. This article would benefit from the addition of a few schematics to summarize the working hypothesis.

      Thanks for the suggestion. We have added a series of schematics to illustrate our interpretation for each construct. As mentioned above the terminology has been more rigorously defined and updated throughout the manuscript.

      5) Presence of endogenous GyrA

      Line 419-425: it is quite difficult to follow the explanations regarding the possible contamination of the sample by endogenous GyrA.

      Maybe these points should rather be addressed in the discussion, when debating the conclusions of Gubaev et al.

      We agree. We have re-organized the Discussion doing just that. We added a Supplementary Discussion in which we further discuss the contamination problem in relation to (Gubaev, Weidlich et al. 2016).

      Production of the subunits in another (non bacterial) expression system or a cell free system may prevent the association of endogenous protein.

      Absolutely. We are planning on addressing this in the future, using the yeast expression system.

      6) Mechanism for subunit swapping

      Lines 588-595: As described by the authors the BA fusion shows decreased activity when compared with the WT probably due to limited conformational flexibility in absence of an additional linker sequence between the fused subunits.

      The affinity of BA for A may possibly be reduced compared to the free A2B2 complex, due to a relative stiffness of the fusion upon full association with a free B subunit, as rightfully pointed by the authors.

      If subunit exchange do happen in vitro, at least in the conditions of this study, the authors could assess the affinity of BA for A, when compared to the association of free B and A subunits

      Experiments using analytical ultracentrifugation or surface plasmon resonance (SPR) may allow to determine the relative affinity of the BA +(A+B) compared to the A2B2 complex. This could be done also for the BALLL mutant and association with A59.

      It would be extremely useful to measure the affinity of BA for A. However, this is difficult because of the high affinity of the interface. To measure a dissociation constant, one has to be able to measure the concentration of the monomer and the dimer at equilibrium. Because of this, the complex must be diluted enough to see any dissociation, making detection difficult. In practice, this also means that we cannot purify monomeric versions of these subunits. We therefore cannot perform “on-rate” studies on an SPR surface, which would require flowing monomers over a partner subunit tethered to the SPR surface. We could perform “off-rate” studies, but the dissociation time is likely to be very long, making the measurement difficult. We have not tried it, though, and it could turn out to be informative. Analyses of antibody off-rates done in the past could provide a guideline for us to perform this experiment. Analytical ultracentrifugation is an excellent technique and could in theory provide information. In practice, however, it would still be necessary to dilute the complex enough to obtain significant dissociation at equilibrium, making detection difficult. As far as we are aware, analytical ultracentrifugation relies on UV absorbance for protein detection, and therefore we probably would not detect our material at the necessary dilution. We are, however, open-minded about techniques with very sensitive detection methods that could be used.
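
      To give a sense of why a slow off-rate makes such a measurement long, the following minimal sketch assumes simple first-order dissociation with no rebinding, so that the bound fraction decays as exp(-k_off * t) and the half-life is ln(2)/k_off. The k_off values are hypothetical and chosen only to span several orders of magnitude; no off-rate has been measured for the gyrase interface here.

```python
import math


def dissociation_half_life(k_off: float) -> float:
    """Half-life (s) of a complex under first-order dissociation, no rebinding."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off


def fraction_bound(k_off: float, t_seconds: float) -> float:
    """Fraction of complex still bound after t_seconds (first-order decay)."""
    return math.exp(-k_off * t_seconds)


if __name__ == "__main__":
    # Hypothetical off-rates in s^-1, for illustration only.
    for k_off in (1e-3, 1e-5, 1e-6):
        hours = dissociation_half_life(k_off) / 3600.0
        print(f"k_off = {k_off:.0e} s^-1 -> half-life = {hours:.1f} h")
```

      Under these assumptions, an off-rate in the lower part of this range would require an observation window of many hours to days, which is consistent with the practical difficulty described above.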

      9) In vivo relevance

      The study does not conclude on the subunits exchange in vivo, which have been suggested by earlier studies by Ikeda et al. To elaborate further on the relevance of such mechanism in the bacteria, experiments involving the fluorescent labeling of endogenous / exogenous mutant subunits may be required to provide further information on this phenomenon.

      We completely agree that the in vivo relevance of such phenomena is the central question. Addressing this directly is not trivial, though. Expressing both BA and A in vivo will result in random partnering and lead to a mix of dimers: A2 (1/4), BA2 (1/4) and BA.A (1/2), assuming equal interface affinity. Therefore, to see subunit exchange in the same way as in vitro, one would have to get rid of the BA2 and A2 dimers together, or the BA.A dimer only. Our initial strategy would be to engineer a specific dimer so that it is uniquely targeted for degradation. This could allow us to “get rid” of, for instance, the BA.A dimer. Subsequently, we would turn off the degradation and translation together and observe the rate of subunit exchange. This is not trivial, though, and would be the subject of a further study.
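
      The 1/4 : 1/4 : 1/2 ratio quoted above follows from random pairing of two subunit types. A minimal sketch, assuming equal interface affinity and, for the quoted ratio, equal abundance of BA and A (the function name and the abundance parameter are hypothetical):

```python
from fractions import Fraction


def dimer_fractions(frac_ba):
    """Expected dimer populations under random pairing of BA and A subunits.

    frac_ba is the fraction of all subunits that are BA; the rest are A.
    With no pairing preference, the populations follow p^2 : 2pq : q^2.
    """
    p = Fraction(frac_ba)
    if not 0 <= p <= 1:
        raise ValueError("frac_ba must be between 0 and 1")
    q = 1 - p
    return {"BA2": p * p, "BA.A": 2 * p * q, "A2": q * q}


if __name__ == "__main__":
    for name, value in dimer_fractions(Fraction(1, 2)).items():
        print(f"{name}: {value}")
```

      Unequal expression of the two subunits would shift these proportions, which is one more reason the in vivo experiment would need careful design.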

      10) Figure 3: I guess the "intact" label refers to the supercoiled DNA (SC) ? It also appears as "uncleaved" in supp Figure 6. The same label for this topoisomer should be used throughout.

      Thank you for pointing that out. It has now been corrected.

      Bandak, A. F., T. R. Blower, K. C. Nitiss, R. Gupta, A. Y. Lau, R. Guha, J. L. Nitiss and J. M. Berger (2023). "Naturally mutagenic sequence diversity in a human type II topoisomerase." Proceedings of the National Academy of Sciences 120(28).

      Germe, T., J. Voros, F. Jeannot, T. Taillier, R. A. Stavenger, E. Bacque, A. Maxwell and B. D. Bax (2018). "A new class of antibacterials, the imidazopyrazinones, reveal structural transitions involved in DNA gyrase poisoning and mechanisms of resistance." Nucleic Acids Res.

      Gubaev, A., D. Weidlich and D. Klostermeier (2016). "DNA gyrase with a single catalytic tyrosine can catalyze DNA supercoiling by a nicking-closing mechanism." Nucleic Acids Res 44(21): 10354-10366.

      Hartmann, S., A. Gubaev and D. Klostermeier (2017). "Binding and Hydrolysis of a Single ATP Is Sufficient for N-Gate Closure and DNA Supercoiling by Gyrase." J Mol Biol 429(23): 3717-3729.

      Shuman, S., E. M. Kane and S. G. Morham (1989). "Mapping the active-site tyrosine of vaccinia virus DNA topoisomerase I." Proc Natl Acad Sci U S A 86(24): 9793-9797.

      Stelljes, J. T., D. Weidlich, A. Gubaev and D. Klostermeier (2018). "Gyrase containing a single C-terminal domain catalyzes negative supercoiling of DNA by decreasing the linking number in steps of two." Nucleic Acids Res.

    1. Author Response

      Reviewer #3 (Public Review):

      Strengths:

      NanoPDLIM2, nanotechnologies that efficiently deliver lentivirus overcomes resistance to chemotherapy and anti-PD-1 immunotherapy. This is a new strategy for enhancing the efficiency of immune checkpoint inhibitors.

      This finding is important from a clinical translation perspective, but I have several minor concerns.

      Weaknesses:

      1) Please describe the mechanism of increased MHC class I and PD-L1 by PDLIM2.

      Our previous studies showed that PDLIM2 induces MHC-I expression by decreasing STAT3, whereas it is dispensable for PD-L1 expression (Sun et al, 2019, PMID: 31757943). In line with these studies, PD-L1 is induced by chemotherapeutic drugs, but not by NanoPDLIM2 (Figure 6A). Together with the roles of PDLIM2 in repressing RelA-dependent MDR1 induction by chemotherapy and in preventing expression of cell survival and proliferation genes by targeting both RelA and STAT3 (Sun et al, 2019, PMID: 31757943), these findings provide the mechanistic basis for the combination and synergistic effect of nanoPDLIM2, anti-PD-1 and chemo drugs. This point has now been incorporated into the manuscript.

      2) Please describe the mechanism of decreased MDR1, nuclear RelA and STAT3 by PDLIM2.

      Our previous studies demonstrated that PDLIM2 reduces MDR1 expression by degrading nuclear RelA (Sun et al, 2019, PMID: 31757943).

      3) Please determine whether PDLIM2 expression directly impacts immune cells (function and number)?

      As shown in Figure 5, NanoPDLIM2 increased the number and activation of tumor-infiltrating lymphocytes (TILs); and in a prior study, PDLIM2 knockout repressed the numbers of TILs and inhibited the activation of CD4+ and CD8+ T cells, while its re-expression in lung tumors led to T cell activation (Sun et al. 2019, PMID: 31757943). On the other hand, selective deletion of PDLIM2 in immune cells, and in particular myeloid cells, repressed the numbers and activation of TILs (Li et al, 2021, PMID: 33539325; PMCID: PMC8021114). Thus, PDLIM2 may impact immune cells both directly and indirectly, particularly as nanoparticles can deliver PDLIM2 into both tumor cells and tumor-associated immune cells (although PDLIM2 is delivered into far fewer immune cells than tumor cells).

      4) What is the efficiency of PDLIM2 delivery? Does delivery efficiency determine anti-tumor effect?

      As shown in the manuscript, the dose of PDLIM2 used already shows high delivery (20-30 copies per tumor cell in Figure 3B) and therapeutic efficacy in the mouse model of refractory lung cancer, particularly when combined with anti-PD-1 and chemo drugs. It is of interest to test different doses in the model for the best delivery and efficacy, which is actively being pursued in the lab.

      5) Authors used a non-immunogenic tumor model. Can you demonstrate the combination effect with PDLIM2 in immunogenic lung cancer models to determine whether the combination of PDLIM2 with anti-PD-1 Ab confers a synergistic effect without chemotherapy?

      Yes, it is of interest to demonstrate the combination of PDLIM2 and anti-PD-1 in immunogenic lung cancer models with chemotherapy, although a synergy is highly expected. The greatest challenge in the lung cancer field is the low response of non-immunogenic tumors, which is the focus of the current manuscript.

      6) On page 11, % change can make one over-interpret data.

      The % change has been removed from the manuscript.

      7) In Figure 5, what is the difference between 5A and 5D?

      Figure 5A shows the increase of TILs by nanoPDLIM2 in animals that did not receive PD-1 blockade immunotherapy, whereas Figure 5D shows the increase of TILs by nanoPDLIM2 in animals that received PD-1 blockade immunotherapy.

      8) It is unclear whether PDLIM2 confers an additive or a synergistic effect with anti-PD-1/chemo.

      PDLIM2 nanotherapy confers a synergistic effect with chemotherapy on increasing apoptosis in tumors (Figure 4B) and on tumor reduction (Figure 4A and 6E, left panel, tumor number). With anti-PD-1, it confers a synergistic effect on increasing CD4+ and CD8+ TILs (Figure 5A and 5D) and apoptosis in tumors (Figure 5F), and an additive effect on tumor reduction (Figure 5C and 6E). With chemotherapy plus anti-PD-1, it confers a synergistic effect on increasing CD4+ and CD8+ TILs (Figure 5A and 6F) and on tumor reduction (Figure 6E, left panel, tumor number).

      9) Have the authors tested any toxicity in normal lungs?

      As in tumor-bearing lungs, no obvious toxicity has been observed in normal lungs.

      Reviewer #1 (Recommendations For The Authors):

      The paper is clear and well-written, although some minor edits are needed. For example, the title could be changed to reflect both human and mouse studies in the manuscript for more general readers. Moreover, 'lung cancer' should be used instead of 'lung cancers'. The manuscript could be further improved by validating their findings in a different model and particularly the syngeneic model of metastatic lung cancer for a better overall survival time by the new combination therapy, given the fact that clinical trial studies usually start in patients with metastatic tumors. But this is optional because the therapeutic effect on primary lung cancer is already significant.

      Thanks for the correction and wonderful suggestions. “Lung cancers” has been replaced with “lung cancer”, and the title has been changed to “Improving PD-1 blockade plus chemotherapy for complete remission of lung cancer by nanoPDLIM2”.

      Reviewer #2 (Recommendations For The Authors):

      1) What is the rationale for i.v. injection of nanoparticles containing PDLIM2 plasmid? Intranasal administration of nanoparticles may potentially target nanoPDLIM2 specifically to the lungs. Another potential option is intranasal infection of mice with adenovirus expressing PDLIM2.

      The rationale for i.v. injection of nanoPDLIM2 is that i.v.-injected nanoPDLIM2 first reaches the lung and, more importantly, the tumor tissues. In addition, i.v. injection in mice is convenient and highly efficient, particularly when multiple injections are needed, and it is much less stressful for mice than intranasal or even intratracheal administration. Adenovirus can be used only once, because it initiates an anti-viral immune response in mice.

      2) The authors examine PDLIM2 expression in lung tumors 1 week after i.v. administration of nanoparticles (Fig. 3A). Do all tumor cells express PDLIM2 after nanoPDLIM2 treatment? How long does PDLIM2 persist in the tumors? The kinetics of PDLIM2 expression may be informative to help interpret the results from the various combination treatments given to the mice. Multiple rounds of nanoPDLIM2 treatment could potentially improve the efficacy of the treatment.

      For all the sections examined (n=6), PDLIM2 was re-expressed in most but not all lung cancer cells 1 week after i.v. administration. Accordingly, nanoPDLIM2 was injected weekly. We are examining whether PDLIM2 re-expression can last longer. We are also testing different doses to identify the one with the best efficacy.

      3) Does the plasmid DNA from nanoparticles trigger an innate immune response in the lung that contributes to anti-tumor responses?

      In line with previous studies showing no effect on immune responses (Bonnet et al. 2008. PMID: 18709489), the dose used in the current study does not significantly affect immune cells in the lung, suggesting that nanoparticles with empty plasmid have no obvious effect on the innate immune response.

      4) In Fig. 4, does the combination of nanoPDLIM2 and chemotherapy diminish STAT3 nuclear staining?

      NanoPDLIM2 alone decreased nuclear STAT3 in tumor cells (Figure 2C); it also diminished nuclear STAT3 in tumor cells when combined with chemotherapy.

    1. Author Response

      On behalf of my co-authors, I thank you very much for sending our manuscript (# eLifeRP-RA-2023-91223) entitled “Elimination of subtelomeric repeat sequences exerts little effect on telomere functions in Saccharomyces cerevisiae” for review and providing us an opportunity for revision. We also thank the reviewers for their critical and constructive comments and suggestions which have helped us to strengthen our study. We have performed more experiments to address the concerns the reviewers raised, and we have also revised or corrected some of our statements as the reviewers suggested.

      Reviewer #1

      1) The author’s data indicate that cells with many chromosomes are more dependent on possibly homologous recombination than SY12 cells with three chromosomes. Telomerase-deficient cells exhibit the type I and type II telomere structures, whereas telomerase-deficient SY12 cells often generate different telomere structures (named Type X survivors or atypical survivors). Type I survivor depends on Rad51 possessing tandem Y' elements whereas Type II survivor depends on Rad59 carrying long TG sequences (line 60-70). Both types require Rad52 (line 66-70). At the moment, it is not determined how Type X or atypical survivors are generated in telomerase-deficient SY12 cells.

      The authors need to determine whether Type X or atypical survivors depend on other repair pathways from Type I and Type II, and what DNA sequences are retained adjacent to telomeres in Type X or atypical survivors by sequencing analysis (Fig. 2).

      We thank the reviewer for the valuable comments and suggestions. An atypical survivor is a subtype of survivor that exhibits non-uniform telomere patterns, distinct from those observed in Type I, Type II, Type X, or circular survivors. To further determine its genetic requirements, we deleted RAD52 in SY12 tlc1Δ, SY12YΔ tlc1Δ, SY12XYΔ tlc1Δ, and SY12XYΔ+Y tlc1Δ strains. Southern blotting showed that neither Type I nor Type II survivors were found in this series of strains. Circular survivors predominated; besides these, some survivors exhibited non-uniform telomere patterns, suggesting that they were atypical survivors. These results have been presented as Figure 2—figure supplement 6B, Figure 5—figure supplement 2B and Figure 6—figure supplement 4B in the revised version. The results showed that atypical survivors still emerged when the Rad52 pathway was inactivated, indicating that the formation of atypical survivors does not strictly rely on homologous recombination.

      Given that "atypical" clones exhibit non-uniform telomere patterns, it is not surprising that their chromosome structures are variable and tangled. Consequently, it is difficult for us to amplify and sequence the DNA retained adjacent to telomeres.

      Since no Type X survivor was detected in SY12 tlc1Δ rad52Δ strain (Author response image 1A), we deleted RAD50 or RAD51 in SY12 tlc1Δ strain to investigate which pathway the formation of Type X survivors relies on. The results showed that Type X survivors emerged in the absence of Rad51 but not in the absence of Rad50, suggesting that the formation of Type X survivors depends on the Rad50 pathway. These results have been presented as Figure 2—figure supplement 7.

      To determine the chromosomal end structure of the Type X survivor, we randomly selected a typical Type X survivor and performed PCR-sequencing analysis. The results revealed intact chromosome ends for I-L, X-R, XIII-L, XI-R, and XIV-R, albeit with some mismatches compared with the S. cerevisiae S288C genome, which possibly arose from recombination events that occurred during survivor formation. Notably, the sequence of the Y’-element in XVI-L could not be detected, while the X-element remained intact. These results are presented as Figure 2—figure supplement 5 in the revised manuscript.

      2) Survivor generation of each type (Type I, Type II, Type X or atypical and circularization) needs to be accurately quantitated. The authors concluded that X or Y' elements are not strictly necessary for survivor formation (Fig. 5 and Fig. 6). However, their removal appears to increase atypical survivor and chromosome circularization (Fig. 2 vs Fig. 5 and 6).

      We are grateful for the reviewer’s critical and constructive suggestions. As the reviewer requested, we quantified each type of survivor in SY12 tlc1Δ, SY12YΔ tlc1Δ, SY12XYΔ tlc1Δ and SY12XYΔ+Y tlc1Δ strains (Figure 2D, 5C, 6A and 6B). In SY12 tlc1Δ strain, Type I survivors accounted for 16%, Type II survivors for 2%, Type X survivors for 24%, circular survivors for 20% and atypical survivors for 38%. In SY12YΔ tlc1Δ strain, 4% were Type II survivors, 52% were circular survivors and 44% were atypical survivors.

      For the SY12XYΔ tlc1Δ strain, 8% were Type II survivors, 48% were circular survivors and 44% were atypical survivors. In SY12XYΔ+Y tlc1Δ strain, the proportions of Type II, circular and atypical survivors were 14%, 44%, and 42%, respectively (Author response image 1).

      In comparing SY12YΔ with SY12XYΔ, we observed a similar ratio of circular and atypical survivors. This result indicates that the removal of X-elements exerts little effect on the formation of circular and atypical survivors. Similarly, in SY12XYΔ+Y strain, the proportions of circular and atypical survivors were comparable to those in SY12XYΔ strain, indicating that Y’-elements also have little effect on the formation of circular and atypical survivors. However, because the frequency of survivor formation is unknown, alternative explanations of these data are possible. For example, subtelomeric elements previously suggested to have no impact on the formation of any survivor types might influence every type to similar extents, leading to similar ratios across all survivor types. With our present data, it is still unclear whether the absence of X and Y'-elements enhances the formation of circular and atypical survivors. Therefore, we did not present these results in the revised manuscript.

      Author response image 1.

      Quantitation of each survivor type in SY12 subtelomere-engineered strains. The ratio of survivor types in SY12 tlc1Δ, SY12YΔ tlc1Δ, SY12XYΔ tlc1Δ and SY12XYΔ+Y tlc1Δ strains. Type I, purple; Type II, green; Type X, gray; atypical survivor, orange; circular survivor, blue.
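
      Because the image itself is not reproduced here, the percentages quoted in the response above can be collected into a small table. The Python sketch below is only a convenience summary of those quoted numbers; the dictionary layout and variable names are illustrative assumptions, and survivor types that are not listed for a strain are simply omitted.

```python
# Survivor-type percentages as quoted in the response text.
# Types not reported for a strain are omitted; the listed values already sum to 100.
survivor_pct = {
    "SY12 tlc1Δ": {"Type I": 16, "Type II": 2, "Type X": 24, "circular": 20, "atypical": 38},
    "SY12YΔ tlc1Δ": {"Type II": 4, "circular": 52, "atypical": 44},
    "SY12XYΔ tlc1Δ": {"Type II": 8, "circular": 48, "atypical": 44},
    "SY12XYΔ+Y tlc1Δ": {"Type II": 14, "circular": 44, "atypical": 42},
}

for strain, pct in survivor_pct.items():
    # Sanity check: the quoted percentages for each strain should account for all clones.
    assert sum(pct.values()) == 100, strain
    summary = ", ".join(f"{name} {value}%" for name, value in pct.items())
    print(f"{strain}: {summary}")
```

      Laid out this way, the circular and atypical fractions of the three engineered strains can be read side by side, which is the comparison the response relies on.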

      3) The authors asked whether X and Y' elements are required for cell proliferation, stress response, telomere length control and telomere silencing (Fig. 4). Similar studies have been previously carried out by using synthetic chromosomes (see PMID: 28300123). The authors need to discuss this point.

      Thanks for your suggestion. We have added the information in the revised version. (p.24 line 449-453)

      4) The Fig. 7 data support that circular chromosomes do not require Ku-dependent DNA end protection. This is consistent with the current view that Ku binds and protects DNA ends. This finding by itself does not contribute significantly to our understanding of telomere maintenance. The authors need to more extensively discuss the significance of their findings in SY12 cells compared to wild-type cells with 16 chromosomes.

      We agree with the logic that this reviewer has pointed out. Our results demonstrate that combinatorial deletion of YKU70 and TLC1 caused synthetic lethality in SY12 cells, which possess three linear chromosomes. However, it did not affect the viability of "circular survivors", supporting the notion that telomere deprotection leads to the synthetic lethality in yku70Δ tlc1Δ double mutants. Nevertheless, this conclusion merely confirms the current view observed in wild-type cells that Ku binds and protects DNA ends.

      To avoid confusing readers and maintain the logical flow of the manuscript, we have deleted this section in the revised version.

      Minor issues:

      1) Line 112-113: " for SY13, which contains two chromosomes, could also have a high probability of circularizing all chromosomes for survival": The reference or the supplemental data are required.

      We thank the reviewer for the suggestion. Following the reviewer’s comments, we performed a Southern blotting assay to examine the types of survivors in SY13 tlc1Δ strain. We found that the majority of SY13 tlc1Δ clones exhibited hybridization signals similar to those of SY14 tlc1Δ circular survivors, pointing to the possibility that the two chromosomes in these survivors may undergo intra-chromosomal fusions. This result has been added to Figure 1D in the revised version.

      2) Line 349-350: The BY4742 mre11Δ haploid strain serves as a negative control. The authors need to explain why mre11 cells serve as a negative control.

      We thank the reviewer for the comment. We employed mre11Δ as a negative control because Mre11 is a member of the RAD52 epistasis group, which is involved in the repair of DNA double-strand breaks, and MRE11 mutants exhibit defects in the repair of DNA damage caused by DNA-damaging drugs (Krogh and Symington, 2004; Lewis et al., 2004; Symington, 2002). (p.23 line 420-422)

      Reviewer #2

      1) The qualification of survivor types mostly relies on molecular patterns in Southern blots. While this is a valid method for a standard strain, it might be more difficult to apply to the strains used in this study. For example, in SY8, SY11 and SY12, the telomere signal at 1-1.2 kb can be very faint due to the small number of terminal Y' elements left. As another example, for the Y'-less strain, it might seem obvious that no Type I survivor can emerge given that Y' amplification is a signature of Type I, but maybe Type-I-specific molecular mechanisms might still be used. To reinforce the characterization of survivor types, an analysis of the genetic requirements for Type I and Type II survivors (e.g. RAD51, RAD54, RAD59, RAD50) could complement the molecular characterization in specific result sections.

      We thank this reviewer for his/her constructive comments and suggestions. To investigate whether Type-I-specific molecular mechanisms are still utilized in the survivor formation in Y'-less strain, we deleted RAD51 in SY12XYΔ tlc1Δ. SY12XYΔ tlc1Δ rad51Δ strain was able to generate three types of survivors, including Type II survivor, circular survivor and atypical survivor, similar to the observations in SY12XYΔ tlc1Δ strain. However, the ratios of circular and atypical survivors were 36% and 32%, respectively, lower than the 48% and 44% observed in SY12XYΔ tlc1Δ strain (supplementary file 5). This result indicates that Type-I-specific molecular mechanisms contribute to the survivor formation. Given that our work primarily focuses on the function of subtelomeric elements, we chose not to include this result in our revised manuscript to maintain a coherent logical flow.

      To reinforce the characterization of survivor types, we deleted RAD50, RAD51 and RAD52 in SY12 tlc1Δ strain, respectively. Southern blotting assay revealed that in the absence of Rad51, no Type I survivor was detected; in the absence of Rad50, neither Type I nor Type X survivor was detected. However, circular and atypical survivors still emerged in the absence of Rad52, suggesting that the RAD52-mediated homologous recombination is not strictly necessary for the formation of circular and atypical survivors. These results have been presented as Figure 2—figure supplement 6 and Figure 2—figure supplement 7.

      2) In the title, the abstract and throughout the discussion, the authors chose to focus on the effect of X- and Y'-element deletion on different phenotypes and on survivor formation, as the main message to convey. While it is a legitimate and interesting message, other important results of this work might benefit from more spotlight. Namely, the observation that strains with different chromosome numbers show different survivor patterns and that several survival strategies beyond Type I and II exist and can reach substantial frequencies depending on the chromosomal context.

      Thanks for your valuable suggestion. While we value your suggestion to highlight additional aspects of our work, we would like to express our perspective on the current emphasis on the effect of X- and Y'-element deletion. We believe that by maintaining this focus, we can present a more coherent and impactful narrative for our readers. Additionally, we recognize that the relationship between chromosome numbers and survivor type frequencies is complex and warrants further experimental validation. We are considering exploring this aspect in more detail in our future projects. However, we fully acknowledge the importance of the observations you raised concerning strains with different chromosome numbers and the diversity of survival strategies.

      3) In SY12 strain, while X- and Y'-elements are not essential for survivor emergence, they do modulate the frequency of each type of survivors, with more chromosome circularization events observed for SY12YΔ, SY12XYΔ and SY12XYΔ+Y strains. This result should be stated and discussed, maybe alongside the change in survivor patterns in the other SY strains, to more accurately assess the roles of these subtelomeric elements.

      Following the reviewer’s suggestion, we compared the circular survivor ratios in SY12 tlc1Δ, SY12YΔ tlc1Δ, SY12XYΔ tlc1Δ and SY12XYΔ+Y tlc1Δ strains (supplementary file 5). It appears that the formation of circular survivors is less efficient in the SY12 tlc1Δ strain, with a ratio of 20%, much lower than that in SY12YΔ tlc1Δ, SY12XYΔ tlc1Δ or SY12XYΔ+Y tlc1Δ strains. However, it should be noted that SY12 tlc1Δ can generate Type I and Type X survivors, potentially decreasing the ratio of circular survivors.

      Therefore, we further compared the circular survivor ratios in SY12YΔ tlc1Δ, SY12XYΔ tlc1Δ and SY12XYΔ+Y tlc1Δ strains. In the SY12YΔ tlc1Δ strain, circular survivors accounted for 52% (26/50), comparable to 48% (24/50) in the SY12XYΔ tlc1Δ strain, indicating that X-elements exert little effect on the formation of circular survivors. Additionally, the ratio of circular survivors was 44% (22/50) in SY12XYΔ+Y tlc1Δ strain, also comparable to 48% (24/50) in the SY12XYΔ tlc1Δ strain, suggesting that the Y’-element also has little effect on chromosome circularization. However, because the frequency of survivor formation is unknown, alternative explanations of these data are possible. For example, subtelomeric elements previously suggested to have no impact on the formation of any survivor types might influence every type to similar extents, resulting in similar ratios across all survivor types. With our current data, it is still uncertain whether X and Y'-elements modulate the frequency of each type of survivor. Therefore, we did not include these results in the revised manuscript.
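
      One way to put a number on "comparable" for the counts quoted above (26/50, 24/50 and 22/50) is a two-sided Fisher exact test on each pair of strains. The Python sketch below is purely illustrative and is not an analysis reported in the response; the function name `fisher_exact_two_sided` and the choice of test are assumptions made here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalised hypergeometric weight of the table whose top-left cell is k.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    # Sum the weights of all tables that are no more likely than the observed one.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


n = 50  # clones scored per strain, as quoted in the response
circular = {"SY12YΔ": 26, "SY12XYΔ": 24, "SY12XYΔ+Y": 22}
reference = circular["SY12XYΔ"]

for strain in ("SY12YΔ", "SY12XYΔ+Y"):
    k = circular[strain]
    p = fisher_exact_two_sided(k, n - k, reference, n - reference)
    print(f"{strain} vs SY12XYΔ: {k}/{n} vs {reference}/{n}, p = {p:.2f}")
```

      Both comparisons give p-values far above conventional significance thresholds, which is consistent with describing the ratios as comparable. Such a test says nothing about the caveat raised in the response: without the absolute frequency of survivor formation, equal ratios cannot rule out an equal effect on every survivor type.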

      4) The authors might want to update some general information about subtelomere structure and their diversity across yeast strain with the recent paper by O'Donnell et al. 2023 Nature Genetics, "Telomere-to-telomere assemblies of 142 strains characterize the genome structural landscape in Saccharomyces cerevisiae".

      Thanks for your advice. We have added this information in the revised manuscript. (p.3 line 51-54)

      5) Although it is cited in the discussion, the recent work by the Malkova lab (Kockler et al. 2021 Mol Cell) could be mentioned in the introduction as it conceptually changes our views on survivor formation, its dynamics and the categorization into Type I and Type II.

      Thanks for your advice. We have added this information in the revised manuscript. (p.5 line 75-78)

      6) p.7 line 128-130: rather than chromosome number, the ratio of survivor types might be controlled by the fraction of subtelomeres with Y'-elements and their relative configuration across chromosomes. A map of the structure of remaining subtelomeres in the SYn strains might be good to have.

      We have added this information in supplementary file 2 in the revised manuscript.

      7) Fig. 1C: in SY9 tlc1Δ, the lane with triangle mark looks like a type II.

      The hybridization pattern of SY9 tlc1Δ clone 2 shows both an amplified Y’L-element and long heterogeneous TG1-3 repeats; it might be the “hybrid” survivor described by Kockler and colleagues (Kockler et al., 2021). Therefore, we classify it as a non-classical survivor.

      8) p.9 line 149: the title of this result section "Y'-element is not essential for the viability of cells carrying linear chromosomes" doesn't reflect well the content of the section, which is more about characterizing the survivor pattern in SY12.

      Thanks for your advice. We have changed the title of this section into “Characterizing the survivor pattern in SY12” in the revised manuscript. (p.9 line 155)

      9) p.10 line 167: that type I can emerge in SY12 indicates that multiple Y'-elements in tandem are not required for type I recombination. I am not sure if this was already known, but it could be noted.

      We appreciate the reviewer’s comment. We have added this information in the revised manuscript. (p.10 175-177)

      10) p.18 line 318-320: the deletion of the Y' element also seems to remove the centromere-proximal telomere sequence adjacent to it. Maybe it should be stated as well. Even more importantly, in lines 327-329, the Y'-element that is reintroduced in the strain does not include the centromere-proximal short telomere sequence. This is important to interpret the Southern blots.

      We thank the reviewer for this critical suggestion. The deletion of the Y'-element includes both the Y’- and X-element sequences in XVI-L (supplementary file 4), and the Y’-element in XVI-L does not contain the centromere-proximal telomere sequence. The Y'-element reintroduced into the left arm of Chr 3 in the SY12XYΔ strain was cloned from the native left arm of XVI in the SY12 strain, which does not contain the centromere-proximal short telomere sequence. Besides listing these details in supplementary file 4, we also emphasize them in the revised manuscript (p.21 line 397-398).

      11) p.29 lines 496-497: it seems that X and Y'-elements tend to inhibit formation of circular survivors either directly (by participating in end protection), or by promoting type I and type II, thus reducing the fraction of circular survivors. Maybe this could be added to the conclusion of this section.

      We thank the reviewer for his/her comments and have analyzed survivor types in SY12 tlc1Δ, SY12YΔ tlc1Δ, SY12XYΔ tlc1Δ and SY12XYΔ+Y tlc1Δ strains (supplementary file 5). Circular survivor formation appears less efficient in the SY12 tlc1Δ strain, with a ratio of 20%, considerably lower than in SY12YΔ tlc1Δ, SY12XYΔ tlc1Δ or SY12XYΔ+Y tlc1Δ strains. However, it is noteworthy that SY12 tlc1Δ can generate Type I and Type X survivors, potentially impacting the circular survivor ratio.

      We further compared circular survivor ratios in SY12YΔ tlc1Δ, SY12XYΔ tlc1Δ and SY12XYΔ+Y tlc1Δ strains. SY12YΔ tlc1Δ had 52% circular survivors, similar to SY12XYΔ tlc1Δ with 48%, indicating minimal impact of X-elements. Additionally, SY12XYΔ+Y tlc1Δ had 44% circular survivors, also similar to SY12XYΔ tlc1Δ, suggesting that the Y’-element has little effect on chromosome circularization. However, because the frequency of survivor formation is unknown, alternative explanations, such as subtelomeric elements affecting all types of survivors similarly, are possible. With our current data, it remains unclear whether X and Y'-elements are involved in end protection and consequently inhibit the formation of circular survivors.

      Therefore, these results were not included in the revised manuscript.

      12) p.32 line 533: this result section doesn't really fit with the rest of the paper, does it?

      Thanks for your valuable advice. To avoid confusing readers and to maintain the logical flow of the manuscript, we have deleted this section in the revised version.

      13) The methods section does not describe the experiments sufficiently and it often lacks specific details such as the manufacturer or references. Some sections of the methods are more exhaustive than others. They should all be written with the same level of detail in my opinion.

      Thanks for your advice. We have described the experiments in more detail and added the manufacturers or references in the ‘materials and methods’ part of the revised manuscript. (p.41 line 741-745, p.42 line 755-756, p.42 line 762-770, p.43 line 788 and p.45 line 812-813)

      Minor comments, typos and grammatical errors:

      p.3 line 33: "INTROUDUCTION" should be "INTRODUCTION".

      We have corrected it in the revised manuscript. (p.3 line 33)

      p.4 line 54: "S, cerevisiae", use dot instead of comma.

      We have corrected it in the revised manuscript. (p.4 line 57)

      p.4 line 55: I believe TLC1 as the RNA moiety should be in (non-italicized) capital letters and not written as a protein.

      We have corrected it in the revised manuscript. (p.4 line 58)

      p.7 line 115: please indicate that pRS316 uses URA3 as a marker, otherwise the counterselection with 5'-FOA is not obvious.

      We thank the reviewer for the comment. We have added this statement in the revised manuscript. (p.7 line 121-122)

      p.12 line 206: tlc1Δ should be in italic.

      We have corrected it in the revised manuscript. (p.10 line 184)

      p.13 lines 227-229: "where only one hybridization signal", a verb seems to be missing.

      We thank the reviewer for the kind reminder and have corrected the mentioned errors in the revised manuscript. (p.14 line 254-255)

      Reviewer #3

      1) A weakness of the manuscript is the analysis of telomere transcriptional silencing. They state: "The results demonstrated a significant increase in the expression of the MPH3 and HSP32 upon Sir2 deletion, indicating that telomere silencing remains effective in the absence of X and Y'-elements". However, there are no statistical analyses performed as far as I can see. For some of the strains, the significance of the increased expression in sir2 (especially for MPH3) looks questionable. In addition, a striking observation is that the SY12 strain (with only three chromosomes) express much less of both MPH3 and HSP32 than the parental strain BY4742 (16 chromosomes), both in the presence and absence of Sir2. In fact, the expression of both MPH3 and HSP32 in the SY12 sir2 strain is lower than in the BY4742 SIR2+ strain. In addition, relating this work to previous studies of subtelomeric sequences in other organisms would make the discussion more interesting.

      First, I enjoyed reading your manuscript. It would be great if you performed the statistical analysis on the RT-qPCR data in figure 4B and addressed the issue of the difference of the BY4742 and SY12 strains. A model could be that this is a titration effect of silencing proteins due to fewer telomeres, which could be investigated by performing the analyses on more SY-strains with variable numbers of telomeres.

      We highly appreciate the reviewer’s valuable comments and suggestions, which included a point that has also puzzled us. We conducted statistical analyses on the RT-qPCR data, and the t-test results revealed that upon the deletion of Sir2, SY12YΔ, SY12XYΔ and SY12XYΔ+Y strains exhibited a significant increase in MPH3 expression (located on the right arm of chr X) with a P value < 0.05. In the case of SY12, the deletion of Sir2 resulted in an increase in gene expression (P value < 0.1). Similar tendencies were observed in the BY4742 strain. The statistical analyses of RT-qPCR results on XVI-L mirrored those of X-R.

      The results demonstrated a significant increase in MPH3 and HSP32 expression upon SIR2 deletion in SY12YΔ, SY12XYΔ and SY12XYΔ+Y strains, leading to the conclusion that telomere silencing remains effective in the absence of X- and Y’-elements. However, as the reviewer has pointed out, no statistically significant differences in MPH3 and HSP32 expression were observed between the SY12 and SY12 sir2Δ strains. For HSP32, this lack of significance may be attributed to the greater distance between HSP32 and telomere XVI-L in SY12 compared to SY12YΔ, SY12XYΔ or SY12XYΔ+Y strains, resulting in a weaker telomere position effect on HSP32 and a non-significant increase in gene expression in SY12. However, this explanation does not apply to MPH3, as SY12YΔ, with the same distance between MPH3 and telomere X-R as in SY12, still exhibits an effective telomere position effect on MPH3. We cannot provide a compelling explanation at this moment, and we suspect that the lack of statistically significant differences may be due to random clonal variation.

      Additionally, the SY12 strain (with three chromosomes) exhibited lower expression levels of both MPH3 and HSP32 compared to the parental strain BY4742 (with 16 chromosomes). Notably, it has been reported that the expression of genes coding for silencing proteins in SY14 (with one chromosome) was nearly identical to that in BY4742 (with 16 chromosomes) (Shao et al., 2018). Consequently, relative to the reduced chromosome number, the silencing proteins appear to be overexpressed. Therefore, as pointed out by the reviewer, this observed phenomenon may be attributed to a titration effect of silencing proteins due to fewer telomeres. We have added the results of the statistical analyses to Figure 4B.
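
      The titration argument above can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes that the total pool of silencing proteins in SY12 matches that of BY4742 (the response cites this only for SY14, so extending it to SY12 is an assumption) and that the pool is shared evenly among telomeres; the function name `telomeres` is illustrative.

```python
def telomeres(linear_chromosomes):
    # Each linear chromosome has two ends, hence two telomeres.
    return 2 * linear_chromosomes


by4742_telomeres = telomeres(16)  # parental strain with 16 chromosomes
sy12_telomeres = telomeres(3)     # SY12 with three linear chromosomes

# Assumption: same silencing-protein pool in both strains, divided evenly among telomeres.
fold = by4742_telomeres / sy12_telomeres

print(f"BY4742: {by4742_telomeres} telomeres; SY12: {sy12_telomeres} telomeres")
print(f"Silencing protein available per telomere in SY12 relative to BY4742: {fold:.1f}x")
```

      Under these simplifying assumptions each SY12 telomere would have access to several times more silencing protein than a BY4742 telomere, which is the direction of effect needed to explain the lower MPH3 and HSP32 expression in SY12.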

      We have related our work with previous studies of subtelomeric sequences in fission yeast in the discussion part. (p.37 line 655-676)

      Minor points are to correct the figure legend for Figure 6 supplement 1 (the strain designations) and line 55, RNAs are written with all caps, i.e. TLC1, and line 537 delete the "which" in the sentence.

      Thanks for your advice. We have corrected them in the revised manuscript.

      1) The strain has been replaced with SY12XYΔ+Y (p.35 line 617, 618 and 620)

      2) “Tlc1” has been replaced with “TLC1” (p.4 line 58).

      3) We have deleted the section “Circular chromosome maintain stable when double knockout of yku70 and tlc1” following the suggestions of reviewers 1 and 2; the deleted section contained the sentence in line 537 that you mentioned.

      Kockler, Z.W., Comeron, J.M., and Malkova, A. (2021). A unified alternative telomere-lengthening pathway in yeast survivor cells. Molecular Cell 81, 1816-1829.e1815.

      Krogh, B.O., and Symington, L.S. (2004). Recombination proteins in yeast. Annu Rev Genet 38, 233-271.

      Lewis, L.K., Storici, F., Van Komen, S., Calero, S., Sung, P., and Resnick, M.A. (2004). Role of the nuclease activity of Saccharomyces cerevisiae Mre11 in repair of DNA double-strand breaks in mitotic cells. Genetics 166, 1701-1713.

      Shao, Y., Lu, N., Wu, Z., Cai, C., Wang, S., Zhang, L.L., Zhou, F., Xiao, S., Liu, L., Zeng, X., et al. (2018). Creating a functional single-chromosome yeast. Nature 560, 331-335.

      Symington, L.S. (2002). Role of RAD52 epistasis group genes in homologous recombination and double-strand break repair. Microbiol Mol Biol Rev 66, 630-670.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study reports on the potential of neural networks to emulate simulations of human ventricular cardiomyocyte action potentials for various ion channel parameters with the advantage of saving simulation time in certain conditions. The evidence supporting the claims of the authors is solid, although the inclusion of open analysis of drop-off accuracy and validation of the neural network emulators against experimental data would have strengthened the study. The work will be of interest to scientists working in cardiac simulation and quantitative pharmacology.

      Thank you for the kind assessment. It is important for us to point out that, while limited, experimental validation was performed in this study and is thoroughly described in the work.

      Reviewer 1 - Comments

      This manuscript describes a method to solve the inverse problem of finding the initial cardiac activations to produce a desired ECG. This is an important question. The techniques presented are novel and clearly demonstrate that they work in the given situation. The paper is well-organized and logical.

      Strengths:

      This is a well-designed study, which explores an area that many in the cardiac simulation community will be interested in. The article is well written and I particularly commend the authors on transparency of methods description, code sharing, etc. - it feels rather exemplary in this regard and I only wish more authors of cardiac simulation studies took such an approach. The training speed of the network is encouraging and the technique is accessible to anyone with a reasonably strong GPU, not needing specialized equipment.

      Weaknesses:

      Below are several points that I consider to be weaknesses and/or uncertainties of the work:

      C I-(a) I am not convinced by the authors’ premise that there is a great need for further acceleration of cellular cardiac simulations - it is easy to simulate tens of thousands of cells per day on a workstation computer, using simulation conditions similar to those of the authors. I do not really see an unsolved task in the field that would require further speedup of single-cell simulations. At the same time, simulations offer multiple advantages, such as the possibility to dissect mechanisms of the model behaviour, and the capability to test its behaviour in a wide array of protocols - whereas a NN is trained for a single purpose/protocol, and does not enable a deep investigation of mechanisms. Therefore, I am not sure the cost/benefit ratio is that strong for single-cell emulation currently.

      An area that is definitely in need of acceleration is simulations of whole ventricles or hearts, but it is not clear how much potential for speedup the presented technology would bring there. I can imagine interesting applications of rapid emulation in such a setting, some of which could be hybrid in nature (e.g. using simulation for the region around the wavefront of propagating electrical waves, while emulating the rest of the tissue, which is behaving more regularly/predictably, and is likely to be emulated well), but this is definitely beyond the scope of this article.

      Thank you for this point of view. Simulating a population of a few thousand cells is completely feasible on single desktop machines, and for fixed, known parameters, emulation may not fill one's need. Yet we still foresee a great untapped potential for rapid evaluations of ionic models, such as for the gradient-based inverse problem presented in the paper. Such inverse optimization requires several thousand evaluations per cell, and thus finding maximum conductances for the presented experimental data set (13 cell pairs control/drug → 26 APs) purely through simulations would require roughly a day of simulation time even in a very conservative estimation (3.5 seconds per simulation, 1000 simulations per optimization). Additionally, the emulator provides local sensitivity information between the AP and maximum conductances in the form of the gradient, which enables a whole new array of efficient optimization algorithms [Beck, 2017]. To further emphasize these points, we added the number of emulations and runtime of each conducted experiment in the specific section and a paragraph in the discussion that addresses this point:
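
      The "roughly a day" estimate follows directly from the three figures quoted above. A minimal sketch of that arithmetic, assuming the simulations run one after another on a single core (the variable and function names are illustrative only):

```python
# Back-of-the-envelope cost of solving the inverse problem purely by simulation.
# The three inputs are the figures quoted in the response; names are illustrative.
N_ACTION_POTENTIALS = 26        # 13 control/drug cell pairs
SIMS_PER_OPTIMIZATION = 1000    # conservative number of evaluations per AP
SECONDS_PER_SIMULATION = 3.5    # conservative runtime of one simulation


def total_simulation_hours(n_aps, sims_per_ap, seconds_per_sim):
    """Return the serial wall-clock time in hours (no parallelization assumed)."""
    return n_aps * sims_per_ap * seconds_per_sim / 3600.0


hours = total_simulation_hours(
    N_ACTION_POTENTIALS, SIMS_PER_OPTIMIZATION, SECONDS_PER_SIMULATION
)
print(f"{hours:.1f} h")  # 26 * 1000 * 3.5 s = 91,000 s, i.e. 25.3 h
```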

      "Cardiomyocyte EP models are already very quick to evaluate, on the scale of seconds (see Section 2.3.1), but the achieved runtime of emulations allows time-consuming simulation protocols to be solved markedly more efficiently. One such scenario is the presented inverse maximum conductance estimation problem (see Section 3.1.2 and Section 3.1.3), where for estimating maximum conductances of a single AP, we need to emulate the steady state AP at least several hundred times as part of an optimization procedure. Further applications include the probabilistic use of cardiomyocyte EP models with uncertainty quantification [Chang et al., 2017, Johnstone et al., 2016] where thousands of samples of parameters are potentially necessary to compute a distribution of the steady-state properties of subsequent APs, and the creation of cell populations [Muszkiewicz et al., 2016, Gemmell et al., 2016, Britton et al., 2013]." (Section 4.2)

      We believe that rapid emulations are valuable for several use-cases, where thousands of evaluations are necessary. These include the shown inverse problem, but similarly arise in uncertainty quantification, or cardiomyocyte population creation. Similarly, new use-cases may arise as such efficient tools become available. Additionally, we provided the number of evaluations along with the runtimes for each of the conducted experiments, showing how essential these speedups are to realize these experiments in reasonable timeframes. Utilizing these emulations in organ-level electrophysiological models is a possibility, but the potential problems in such scenarios are much more varied and depend on a number of factors, making it hard to pin-point the achievable speed-up using ionic emulations.

      C I-(b) The authors run a cell simulation for 1000 beats, training the NN emulator to mimic the last beat. It is reported that the simulation of a single cell takes 293 seconds, while emulation takes only milliseconds, implying a massive speedup. However, I consider the claimed speedup achieved by emulation to be highly context-dependent, and somewhat too flattering to the presented method of emulation. Two specific points below:

      First, it appears that a not overly efficient (fixed-step) numerical solver scheme is used for the simulation. On my (comparable, also a Threadripper) CPU, using the same model ("ToR-ORd-dyncl"), but a variable step solver ode15s in Matlab, a simulation of a cell for 1000 beats takes ca. 50 seconds, rather than 293 of the authors. This can be further sped up by parallelization when more cells than available cores are simulated: on 32 cores, this translates into ca. 2 seconds amortized time per cell simulation (I suspect that the NN-based approach cannot be parallelized in a similar way?). By amortization, I mean that if 32 models can be simulated at once, a simulation of X cells will not take X × 50 seconds, but (X/32) × 50 seconds (with only minor overhead, as this task scales well across cores).
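
      The amortization argument can be written out as a short sketch. It assumes ideal scaling across cores and that cells are processed in full waves of 32, which is a simplification of the reviewer's "(X/32) × 50" estimate; the function names are illustrative only:

```python
import math


def batch_runtime_seconds(n_cells, seconds_per_cell=50.0, n_cores=32):
    """Wall-clock time when cells are simulated in waves of n_cores.

    Assumes perfect scaling and no overhead (an idealization).
    """
    return math.ceil(n_cells / n_cores) * seconds_per_cell


def amortized_seconds_per_cell(n_cells, seconds_per_cell=50.0, n_cores=32):
    """Average wall-clock cost attributed to a single cell."""
    return batch_runtime_seconds(n_cells, seconds_per_cell, n_cores) / n_cells


# 3200 cells fill exactly 100 waves of 32 cells, each wave taking 50 s.
print(batch_runtime_seconds(3200))       # 5000.0
print(amortized_seconds_per_cell(3200))  # 1.5625, i.e. "ca. 2 seconds"
```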

      Second, and this is perhaps more important - the reported speed-up critically depends on the number of beats in the simulation - if I am reading the article correctly, the runtime compares a simulation of 1000 beats versus the emulation of a single beat. If I run a simulation of a single beat across multiple simulated cells (on a 32-core machine), the amortized runtime is around 20 ms per cell, which is only marginally slower than the NN emulation. On the other hand, if the model was simulated for aeons, comparing this to a fixed runtime of the NN, one can get an arbitrarily high speedup.

      Therefore, I’d probably emphasize the concrete speedup less in an abstract and I’d provide some background on the speedup calculation such as above, so that the readers understand the context-dependence. That said, I do think that a simulation for anywhere between 250 and 1000 beats is among the most reasonable points of comparison (long enough for reasonable stability, but not too long to beat an already stable horse; pun with stables was actually completely unintended, but here it is...). I.e., the speedup observed is still valuable and valid, albeit in (I believe) a somewhat limited sense.

      We agree that the speedup comparison only focused on a very specific case and needs to be more thoroughly discussed and benchmarked. One of the main strengths of the emulator is to cut the time of prepacing to steady state, which is known to be a potential bottleneck for the speed of the single-cell simulations. The time it takes to reach the steady state in the simulator is heavily dependent on the actual maximum conductance configuration, and the speed-up thus varies strongly from case to case. The differences in architecture of the simulator and emulator further make direct comparisons very difficult. In the revised version we now go into more detail regarding the runtime calculations and also compare it to an adaptive time stepping simulation (Myokit [Clerx et al., 2016]) in a new subsection:

      "The simulation of a single AP (see Section 2.1) sampled at a resolution of 20kHz took 293s on one core of an AMD Ryzen Threadripper 2990WX (clock rate: 3.0GHz) in CARPentry. Adaptive timestep solvers of variable order, such as those implemented in Myokit [Clerx et al., 2016], can significantly lower the simulation time (30s for our setup) by using small step sizes close to the depolarization (phase 0) and increasing the time step in all other phases. The emulation of a steady state AP sampled at a resolution of 20kHz for t ∈ [−10, 1000]ms took 18.7ms on an AMD Ryzen 7 3800X (clock rate: 3.9GHz) and 1.2ms on an Nvidia A100 (Nvidia Corporation, USA), including synchronization and data copy overhead between CPU and GPU.

      "The number of beats required to reach the steady state of the cell in the simulator has a major impact on the runtime and is not known a-priori. On the other hand, both simulator and emulator runtimes depend linearly on the time resolution, but since the output of the emulator is learned, the time resolution can be chosen arbitrarily without affecting the AP at the sampled times. This makes direct performance comparisons between the two methodologies difficult. To still be able to quantify the speed-up, we ran Myokit using 100 beats to reach steady state, taking 3.2s of simulation time. In this scenario, we witnessed a speed-up of 171 and 2 · 10³ of our emulator on CPU and GPU respectively (again including synchronization and data copy overhead between CPU and GPU in the latter case). Note that both methods are similarly expected to have a linear parallelization speedup across multiple cells.

      For the inverse problem, we parallelized the problem for multiple cells and keep the problem on the GPU to minimize the overhead, achieving emulations (including backpropagation) that run in 120µs per AP at an average temporal resolution of 2kHz. We consider this the peak performance which will be necessary for the inverse problem in Section 3.1.2." (Section 2.3.1)

      Note that the mentioned parallelization across multiple machines/hardware applies equally to the emulator and simulator (linear speed-up), though the utilization for single cells is most likely different (single vs. multi-cell parallelization).
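
      The speed-up factors can be reproduced from the runtimes quoted in the new subsection. The sketch below simply divides the reference runtime by the emulator runtime; the constant names are illustrative, and the computed GPU ratio should be read as an order-of-magnitude check against the quoted figure rather than an exact match:

```python
def speedup(reference_seconds, emulator_seconds):
    """Ratio of a reference simulation runtime to the emulator runtime."""
    return reference_seconds / emulator_seconds


# Runtimes quoted in the response (Section 2.3.1 of the revised manuscript).
MYOKIT_100_BEATS_S = 3.2     # adaptive-step simulation, 100 beats
EMULATOR_CPU_S = 18.7e-3     # one emulated AP on the CPU
EMULATOR_GPU_S = 1.2e-3      # one emulated AP on the GPU, incl. copy overhead

cpu_speedup = speedup(MYOKIT_100_BEATS_S, EMULATOR_CPU_S)
gpu_speedup = speedup(MYOKIT_100_BEATS_S, EMULATOR_GPU_S)

print(round(cpu_speedup))  # 171
print(round(gpu_speedup))  # 2667, the same order of magnitude as quoted
```

      Because the reference runtime grows with the number of prepacing beats while the emulator runtime stays fixed, the ratio depends directly on how many beats the comparison assumes.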

      C I-(c) It appears that the accuracy of emulation drops off relatively sharply with increasing real-world applicability/relevance of the tasks it is applied to. That said, the authors are to be commended on declaring this transparently, rather than withholding such analyses. I particularly enjoyed the discussion of the not-always amazing results of the inverse problem on the experimental data. The point on low parameter identifiability is an important one and serves as a warning against overconfidence in our ability to infer cellular parameters from action potentials alone. On the other hand, I’m not that sure the difference between small tissue preps and single cells which authors propose as another source of the discrepancy will be that vast beyond the AP peak potential (probably much of the tissue prep is affected by the pacing electrode?), but that is a subjective view only. The influence of coupling could be checked if the simulated data were generated from 2D tissue samples/fibres, e.g. using the Myokit software.

      Given the points above (particularly the uncertain need for further speedup compared to running single-cell simulations), I am not sure that the technology generated will be that broadly adopted in the near future.

      However, this does not make the study uninteresting in the slightest - on the contrary, it explores something that many of us are thinking about, and it is likely to stimulate further development in the direction of computationally efficient emulation of relatively complex simulations.

      We agree that the parameter identifiability is an important point of discussion. While the provided experimental data gave us great insights already, we still believe that given the differences in the setup, we cannot draw conclusions about the source of inaccuracies with absolute certainty. The suggested experiment to test the influence of coupling is of interest for future works and has been integrated into the discussion. Further details are given in the response to the recommendation R III-(t).

      Reviewer 2 - Comments

      Summary:

      This study provided a neural network emulator of the human ventricular cardiomyocyte action potential. The inputs are the corresponding maximum conductances and the output is the action potential (AP). It used the forward and inverse problems to evaluate the model. The forward problem was solved for synthetic data, while the inverse problem was solved for both synthetic and experimental data. The NN emulator tool enables the acceleration of simulations, maintains high accuracy in modeling APs, effectively handles experimental data, and enhances the overall efficiency of pharmacological studies. This, in turn, has the potential to advance drug development and safety assessment in the field of cardiac electrophysiology.

      Strengths:

      1) Low computational cost: The NN emulator demonstrated a massive speed-up of more than 10,000 times compared to the simulator. This substantial increase in computational speed has the potential to expedite research and drug development processes.

      2) High accuracy in the forward problem: The NN emulator exhibited high accuracy in solving the forward problem when tested with synthetic data. It accurately predicted normal APs and, to a large extent, abnormal APs with early afterdepolarizations (EADs). High accuracy is a notable advantage over existing emulation methods, as it ensures reliable modeling and prediction of AP behavior.

      C II-(a) Input space constraints: The emulator relies on maximum conductances as inputs, which explain a significant portion of the AP variability between cardiomyocytes. Expanding the input space to include channel kinetics parameters might be challenging when solving the inverse problem with only AP data available.

      Thank you for this comment. We consider this limitation a major drawback, as discussed in Section 4.3. Identifiability is already an issue when only considering the most important maximum conductances. Further extending the problem to include kinetics will most likely only increase the difficulty of the inverse problem. For the forward problem though, it might be of interest to people studying ionic models to further analyze the effects of channel kinetics.

      C II-(b) Simplified drug-target interaction: In reality, drug interactions can be time-, voltage-, and channel state-dependent, requiring more complex models with multiple parameters compared to the oversimplified model that represents the drug-target interactions by scaling the maximum conductance at control. The complex model could also pose challenges when solving the inverse problem using only AP data.

      Thank you for pointing out this limitation. We slightly adapted Section 4.3 to further highlight some of these limitations. Note however that the experimental drugs used have been shown to be influenced by this drug interaction in varying degrees [Li et al., 2017] (e.g. dofetilide vs. cisapride). However, the discrepancy in identifiability was mostly channel-based (0%-100%), whereas the variation in identifiability between drugs was much lower (39%-66%).

      C II-(c) Limited data variety: The inverse problem was solved using AP data obtained from a single stimulation protocol, potentially limiting the accuracy of parameter estimates. Including AP data from various stimulation protocols and incorporating pacing cycle length as an additional input could improve parameter identifiability and the accuracy of predictions.

      The proposed emulator architecture currently only considers the discussed maximum conductances as input and thus can only compensate when using different stimulation protocols. However, the architecture itself does not prohibit including any of these as parameters for future variants of the emulator. We potentially foresee future works extending on the architecture with modified datasets to include other parameters of importance, such as channel kinetics, stimulation protocols and pacing cycle lengths. These will however vary between the actual use-cases one is interested in.

      C II-(d) Larger inaccuracies in the inverse problem using experimental data: The reasons for this result are not quite clear. Hypotheses suggest that it may be attributed to the low parameter identifiability or to the training data set having been collected in small tissue preparations.

      The low parameter identifiability on some channels (e.g. GK1) poses a problem, for which we state multiple potential reasons. As of yet, no final conclusion can be drawn, warranting further research in this area.

      Reviewer 3 - Comments

      Summary:

      Grandits and colleagues were trying to develop a new tool to accelerate pharmacological studies by using neural networks to emulate the human ventricular cardiomyocyte action potential (AP). The AP is a complex electrical signal that governs the heartbeat, and it is important to accurately model the effects of drugs on the AP to assess their safety and efficacy. Traditional biophysical simulations of the AP are computationally expensive and time-consuming. The authors hypothesized that neural network emulators could be trained to predict the AP with high accuracy and that these emulators could also be used to quickly and accurately predict the effects of drugs on the AP.

      Strengths:

      One of the study’s major strengths is that the authors use a large and high-quality dataset to train their neural network emulator. The dataset includes a wide range of APs, including normal and abnormal APs exhibiting EADs. This ensures that the emulator is robust and can be used to predict the AP for a variety of different conditions.

      Another major strength of the study is that the authors demonstrate that their neural network emulator can be used to accelerate pharmacological studies. For example, they use the emulator to predict the effects of a set of known arrhythmogenic drugs on the AP. The emulator is able to predict the effects of these drugs, even though it had not been trained on these drugs specifically.

      C III-(a) One weakness of the study is that it is important to validate neural network emulators against experimental data to ensure that they are accurate and reliable. The authors do this to some extent, but further validation would be beneficial. In particular for the inverse problem, where the estimation of pharmacological parameters was very challenging and led to particularly large inaccuracies.

      Thank you for this recommendation. Further experimental validation of the emulator in the context of the inverse problem would be definitely beneficial. Still, an important observation is that the identifiability varies greatly between channels. While the inverse problem is an essential reason for utilizing the emulator, it is also empirically validated for the pure forward problem and synthetic inverse problem, together with the (limited) experimental validation. The sources of problems arising in estimating the maximum conductances of the experimental tissue preparations are important to discuss in future works, as we now further emphasize in the discussion. See also the response to the recommendations R III-(t).

      Reviewer 1 - Recommendations

      R I-(a) Could further detail on the software used for the emulation be provided? E.g. based on section 2.2.2, it sounds like a CPU, as well as GPU-based emulation, is possible, which is neat.

      Indeed as suspected, the emulator can run on both CPUs and GPUs and features automatic parallelization (per-cell, but also multi-cell), which is enabled by the engineering feats of PyTorch [Paszke et al., 2019]. This is now outlined in a bit more detail in Sec. 2 and 5.

      "The trained emulator is provided as a Python package, heavily utilizing PyTorch [Paszke et al., 2019] for the neural network execution, allowing it to be executed on both CPUs and NVidia GPUs." (Section 5)

      R I-(b) I believe that a potential use of NN emulation could be also in helping save time on prepacing models to stability - using the NN for ”rough” prepacing (e.g. 1000 beats), and then running a simulation from that point for a smaller amount of time (e.g. 50 beats). One could monitor the stability of states, so if the prepacing was inaccurate, one could quickly tell that these models develop their state vector substantially, and they should be simulated for longer for full accuracy - but if the model was stable within the 50 simulated beats, it could be kept as it is. In this way, the speedup of the NN and accuracy and insightfulness of the simulation could be combined. However, as I mentioned in the public review, I’m not sure there is a great need for further speedup of single-cell simulations. Such a hybrid scheme as described above might be perhaps used to accelerate genetic algorithms used to develop new models, where it’s true that hundreds of thousands to millions of cells are eventually simulated, and a speedup there could be practical. However one would have to have a separate NN trained for each protocol in the fitness function that is to be accelerated, and this would have to be retrained for each explored model architecture. I’m not sure if the extra effort would be worth it - but maybe yes to some people.

      Thank you for this valuable suggestion. As pointed out in C I-(a), one goal of this study was to reduce the time-consuming task of prepacing. Still, in its current form the emulator could not be utilized for prepacing simulators, as only the AP is computed by the emulator. For initializing a simulation at the N-th beat, one would additionally need all computed channel state variables. However, a simple adaptation of the emulator architecture would allow it to also output the mentioned state variables.

      R I-(c) Re: ”Several emulator architectures were tried on the training and validation data sets and the final choice was hand-picked as a good trade-off between high accuracy and low computational cost” - is it that the emulator architecture was chosen early in the development, and the analyses presented in the paper were all done with one previously selected architecture? Or is it that the analyses were attempted with all considered architectures, and the well-performing one was chosen? In the latter case, this could flatter the performance artificially and a test set evaluation would be worth carrying out.

      We apologize for the unclear description of the architectural validation. The validation was in fact carried out with 20% of the training data (data set #1), which is however completely disjoint with the test set (#2, #3, #4, formerly data set #1 and #2) on which the evaluation was presented. To further clarify the four different data sets used in the study, we now dedicated an additional section to describing each set and where it was used (see also our response below R I-(d)), and summarize them in Table 1, which we also added at R II-(a). The cited statement was slightly reworked.

      "Several emulator architectures were tried on the training and validation data sets and the final choice was hand-picked as a good trade-off between high accuracy on the validation set (#1) and low computational runtime cost." (Section 2.2.2)

      R I-(d) When using synthetic data for the forward and inverse problem, with the various simulated drugs, is it that split of the data into training/validation test set was done by the drug simulated (i.e., putting 80 drugs and the underlying models in the training set, and 20 into test set)? Or were the data all mixed together, and 20% (including drugs in the test set) were used for validation? I’m slightly concerned by the potential of ”soft” data leaks between training/validation sets if the latter holds. Presumably, the real-world use case, especially for the inverse problem, will be to test drugs that were not seen in any form in the training process. I’m also not sure whether it’s okay to reuse cell models (sets of max conductances) between training and validation tests - wouldn’t it be better if these were also entirely distinct? Could you please comment on this?

      We completely agree with the main points of apprehension that training, validation and test sets all serve a distinct purpose and should not be arbitrarily mixed. However, this is only a result of the sub-optimal description of our datasets, which we heavily revised in Section 2.2.1 (Data, formerly 2.3.1). We now present the data using four distinct numbers: The initial training/validation data, now called data set #1 (formerly no number), is split 80%/20% into training and validation sets (for architectural choices) respectively. The presented evaluations in Section 2.3 (Evaluation) are purely performed on data set #2 (normal APs, formerly #1), #3 (EADs, formerly #2) and #4 (experimental).
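
      As an illustration of the 80%/20% split of data set #1, the following sketch partitions sample indices into disjoint training and validation subsets. How the split was actually drawn is not stated in the response, so the random shuffling, the seed and the function name are assumptions made for illustration:

```python
import random


def split_train_validation(sample_ids, validation_fraction=0.2, seed=0):
    """Split sample ids into disjoint training and validation lists.

    Hypothetical helper: shuffling with a fixed seed is an assumption,
    not the authors' documented procedure.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_validation = round(len(ids) * validation_fraction)
    return ids[n_validation:], ids[:n_validation]


train_ids, validation_ids = split_train_validation(range(100))
print(len(train_ids), len(validation_ids))  # 80 20
```

      Data sets #2, #3 and #4 would be kept entirely outside this procedure, so that architectural choices made on the validation subset cannot leak into the reported evaluation.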

      R I-(e) For the forward problem on EADs, I’m not sure if the 72% accuracy is that great (although I do agree that the traces in Fig 12-left also typically show substantial ICaL reactivation, but this definitely should be present, given the IKr and ICaL changes). I would suggest that you also consider the following design for the EAD investigation: include models with less severe upregulation of ICaL and downregulation of IKr, getting a population of models where a part manifests EADs and a part does not. Then you could run the emulator on the input data of this population and be able to quantify true/false positive/negative detections. I think this is closer to a real-world use case where we have drug parameters and a cell population, and we want to quickly assess the arrhythmic risk, with some drugs being likely entirely nonrisky, some entirely risky, and some between (although I still am not convinced it’s that much of an issue to just simulate this in a couple of thousands of cells).

      Thank you for pointing out this alternative to address the EAD identification task. Even though the values chosen in Table 2 seem excessively large, we still only witnessed EADs in 171 of the 950 samples. Border cases in particular, which are close to exhibiting EADs, are hardest to estimate for the NN emulator. As suggested, we now include the study with the full 950 samples (non-EAD & EAD) and classify the emulator AP into one of the labels for each sample. The mentioned 72.5% now represents the sensitivity, whereas our accuracy in such a scenario becomes 90.8% (total ratio of correct classifications):

      "The data set #3 was used second and Appendix C shows all emulated APs, both containing the EAD and non-EAD cases. The emulation of all 950 APs took 0.76s on the GPU specified in Section 2.2.3. We show the emulation of all maximum conductances and the classification of the emulation. The comparison with the actual EAD classification (based on the criterion outlined in Appendix A) results in true-positive (EAD both in the simulation and emulation), false-negative (EAD in the simulation, but not in the emulation), false-positive (EAD in the emulation, but not in the simulation) and true-negative (no EAD both in the emulation and simulation). The emulations achieved 72.5% sensitivity (EAD cases correctly classified) and 94.9% specificity (non-EAD cases correctly classified), with an overall accuracy of 90.8% (total samples correctly classified). A substantial amount of wrongly classified APs showcase a notable proximity to the threshold of manifesting EADs. Figure 7 illustrates the distribution of RMSEs in the EAD APs between emulated and ground truth drugged APs. The average RMSE over all EAD APs was 14.5mV with 37.1mV being the maximum. Largest mismatches were located in phase 3 of the AP, in particular in emulated APs that did not fully repolarize." (Section 3.1.1)
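
      The relation between sensitivity, specificity and accuracy can be checked with a small sketch. The sample counts (950 in total, 171 with EADs) are taken from the response; the true-positive and true-negative counts are not reported and are back-calculated here as the integers consistent with the stated percentages, so they should be treated as an assumption:

```python
N_SAMPLES = 950          # valid samples in data set #3 (from the response)
N_EAD = 171              # samples showing EADs in the simulation (from the response)
TRUE_POSITIVES = 124     # assumption: inferred from the 72.5% sensitivity
TRUE_NEGATIVES = 739     # assumption: inferred from the 94.9% specificity


def classification_metrics(tp, tn, n_positive, n_total):
    """Derive the remaining confusion-matrix entries and summary rates."""
    n_negative = n_total - n_positive
    return {
        "false_negatives": n_positive - tp,
        "false_positives": n_negative - tn,
        "sensitivity": tp / n_positive,
        "specificity": tn / n_negative,
        "accuracy": (tp + tn) / n_total,
    }


metrics = classification_metrics(TRUE_POSITIVES, TRUE_NEGATIVES, N_EAD, N_SAMPLES)
for name in ("sensitivity", "specificity", "accuracy"):
    print(f"{name}: {100 * metrics[name]:.1f}%")  # 72.5%, 94.9%, 90.8%
```

      Since non-EAD cases make up most of the 950 samples, the overall accuracy is dominated by the specificity, which is why it sits well above the sensitivity.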

      R I-(f) Figure 1 - I think a large number of readers will understand the mathematical notation describing inputs/outputs; that said, there may be a substantial number of readers who may find that hard to read (e.g. lab-based researchers, or simulation-based researchers not familiar with machine learning). At the same time, this is a very important part of the paper to explain what is done where, so I wonder whether using words to describe the inputs/outputs would not be more practical and easier to understand (e.g. ”drug-based conductance scaling factor” instead of ”s” ?). It’s just an idea - it needs to be tried to see if it wouldn’t make the figure too cluttered.

      We agree that the mathematical notation may be confusing to some readers. As a compromise between using verbose wording and mathematical notation, we introduced a legend in the lower right corner of the figure that shortly describes the notation in order to help with interpreting the figure.

      R I-(g) ”APs with a transmembrane potential difference of more than 10% of the amplitude between t = 0 and 1000 ms were excluded” - I’m not sure I understand what exactly you mean here - could you clarify?

      With this criterion, we try to discard data that is far away from fully repolarizing within the given time frame, which applies to 116 APs in data set #1 and 50 APs in data set #3. We added a small side note into the text:

      "APs with a transmembrane potential difference of more than 10% of the amplitude between t = 0 and 1000ms (indicative of an AP that is far away from full repolarization) were excluded." (Section 2.2.1)
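
      One possible reading of this exclusion criterion is sketched below. It assumes that each AP is given as a list of voltage samples whose first and last entries correspond to t = 0 and t = 1000ms, and that the amplitude is the difference between the maximum and minimum sampled potential; both are assumptions, as is the function name:

```python
def is_far_from_repolarized(voltages_mv, threshold_fraction=0.10):
    """Flag an AP whose end potential differs too much from its start potential.

    voltages_mv: samples of one AP, first entry at t = 0 ms, last at t = 1000 ms.
    The amplitude definition (max minus min) is an assumption of this sketch.
    """
    amplitude = max(voltages_mv) - min(voltages_mv)
    if amplitude == 0:
        return False  # flat trace: nothing to compare against
    drift = abs(voltages_mv[-1] - voltages_mv[0])
    return drift > threshold_fraction * amplitude


repolarized = [-88.0, 40.0, 10.0, -60.0, -87.0]   # returns close to rest
incomplete = [-88.0, 40.0, 20.0, 0.0, -20.0]      # still depolarized at the end

print(is_far_from_repolarized(repolarized))  # False
print(is_far_from_repolarized(incomplete))   # True
```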

      R I-(h) Speculation (for the future) - it looks like a tool like this could be equally well used to predict current traces, as well as action potentials. I wonder, would there be a likely benefit in feeding back the currents-traces predictions on the input of the AP predictor to provide additional information? Then again, this might be already encoded within the network - not sure.

      Although not possible with the chosen architecture (see also R I-(b)), it is worth thinking about an implementation in future works and to study differences to the current emulator.

      Entirely minor points:

      R I-(i) ”principle component analysis” → principal component analysis

      Fixed

      R I-(j) The paper will be probably typeset by elife anyway, but the figures are often quite far from their sections, with Results figures even overflowing into Discussion. This can be often fixed by using the !htb parameters (\begin{figure}[!htb]), or potentially by using ”\usepackage[section]{placeins}” and then ”\FloatBarrier” at the start and end of each section (or subsection) - this prevents floating objects from passing such barriers.

      Thank you for these helpful suggestions. We tried reducing the spacing between the figures and their references in the text, hopefully improving the reader’s experience.

      R I-(k) Alternans seems to be defined in Appendix A (as well as repo-/depolarization abnormalities), but is not really investigated. Or are you defining these just for the purpose of explaining what sorts of data were also included in the data?

      We defined alternans since this was an exclusion criterion for generating simulation data.

      Reviewer 2 - Recommendations

      R II-(a) Justification for methods selection: Explain the rationale behind important choices, such as the selection of specific parameters and algorithms.

      Thank you for this recommendation, we tried to increase transparency of our choices by introducing a separate data section that summarizes all data sets and their use cases in Section 2.2.1 and also collect many of the explanations there. Additionally we added an overview table (Table 1) of the utilized data.

      Author response table 1.

      Table 1: Summary of the data used in this study, along with their usage and the number of valid samples. Note that each AP is counted individually, also in cases of control/drug pairs.

      R II-(b) Interpretation of the evaluation results: After presenting the evaluation results, consider interpretations or insights into what the results mean for the performance of the emulator. Explain whether the emulator achieved the desired accuracy or compare it with other existing methods.

      In the revised version, we tried to further expand the discussion on possible applications of our emulator (Section 4.2). See also our response to C I-(a). To the best of our knowledge, there are currently no out-of-the-box methods available for directly comparing all experiments we considered in our work.

      Reviewer 3 - Recommendations

      R III-(a) In the introduction (Page 3) and then also in the 2.1 paragraph authors speak about the ”limit cycle”: Do you mean steady state conditions? In that case, it is more common to use steady state.

      When speaking about the limit cycle, we refer to what is also sometimes called the steady state, depending on the field of research and/or personal preference. We now mention both terms at the first occurrence, but stick with the limit cycle terminology, which can also be found in other works, see e.g. [Endresen and Skarland, 2000].

      R III-(b) On page 3, while comparing NN with GP emulators, I still don’t understand the key reason why NN can solve the discontinuous functions with more precision than GP.

      The potential problems in modeling sharp discontinuities using GPs are further explained in the referenced work [Ghosh et al., 2018] and further references therein:

      "Statistical emulators such as Gaussian processes are frequently used to reduce the computational cost of uncertainty quantification, but discontinuities render a standard Gaussian process emulation approach unsuitable as these emulators assume a smooth and continuous response to changes in parameter values [...] Applying GPs to model discontinuous functions is largely an open problem. Although many advances (see the discussion about non-stationarity in [Shahriari et al., 2016] and the references in there) have been made towards solving this problem, a common solution has not yet emerged. In the recent GP literature there are two specific streams of work that have been proposed for modelling non-stationary response surfaces including those with discontinuities. The first approach is based on designing nonstationary processes [Snoek et al., 2014] whereas the other approach attempts to divide the input space into separate regions and build separate GP models for each of the segmented regions. [...]"([Ghosh et al., 2018])

      We integrated a short segment of this explanation into Section 1.

      R III-(c) Why do authors prefer to use CARPentry and not directly openCARP?

      The use of CARPentry is purely a practical choice since the simulation pipeline was already set up. However, as we now point out in Sec. 2.1 (Simulator), simulations can also be performed using any openly available ionic simulation tool, such as Myokit [Clerx et al., 2016], OpenCOR [Garny and Hunter, 2015] and openCARP [Plank et al., 2021]. We emphasized this in the text.

      "Note, that the simulations can also be performed using open-source software such as Myokit [Clerx et al., 2016], OpenCOR [Garny and Hunter, 2015] and openCARP [Plank et al., 2021]." (Section 2.1)

      R III-(d) In paragraph 2.1:

      (a) In this sentence: ”Various solver and sampling time steps were applied to generate APs and the biomarkers used in this study (see Appendix A)” this reviewer suggests putting the Appendix reference near “biomarkers”. In addition, a figure that shows the test of various solver vs. sampling time steps could be interesting and can be added to the Appendix as well.

      (b) Why did the authors set the relative difference below 5% for all biomarkers? Please give a reference to that choice. Instead, why choose 2% for the time step?

      1) We adjusted the reference to be closer to “biomarkers”. While we agree that further details on the influence of the sampling step would be of interest to some of the readers, we feel that it is far beyond the scope of this paper.

      2) There is no specific reference we can provide for the choice. Our goal was to reach 5% relative difference, which we surpassed by the chosen time steps of 0.01 ms (solver) and 0.05 ms (sampling), leading to only 2% difference. We rephrased the sentence in question to make this clear.

      "We considered the time steps with only 2% relative difference for all AP biomarkers (solver: 0.01ms; sampling: 0.05ms) to offer a sufficiently good approximation." (Section 2.1)

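      The time-step criterion described above can be sketched as a per-biomarker relative difference against a reference run. This is only a minimal illustration: the function names, the biomarker names and all numeric values below are hypothetical and are not taken from the manuscript.

```python
def relative_difference(reference, candidate):
    """Relative difference |candidate - reference| / |reference| for each biomarker."""
    return {
        name: abs(candidate[name] - reference[name]) / abs(reference[name])
        for name in reference
    }


def within_tolerance(reference, candidate, tolerance=0.05):
    """True if the worst-case relative difference stays at or below the tolerance."""
    return max(relative_difference(reference, candidate).values()) <= tolerance


# Hypothetical biomarker values: a reference run with very fine time steps
# and a candidate run with coarser solver/sampling time steps.
reference_run = {"APD90": 200.0, "dVmMax": 100.0}
candidate_run = {"APD90": 202.0, "dVmMax": 98.0}

differences = relative_difference(reference_run, candidate_run)
accepted = within_tolerance(reference_run, candidate_run, tolerance=0.05)
```

      With these made-up numbers the largest relative difference is 2%, so the candidate time steps would pass a 5% target.
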
      R III-(e) In the caption of Figure 1 authors should include the reference for AP experimental data (are they from Orvos et al. 2019 as reported in the Experimental Data section?)

      We added the missing reference as requested. As correctly assumed, they are from [Orvos et al., 2019].

      R III-(f) Why do authors not use experimental data in the emulator development/training?

      For the supervised training of our NN emulator, we need to provide the maximum conductances of our chosen channels for each AP. While it would be beneficial to also include experimental data in the training to diversify the training data, the exact maximum conductances in the considered retrospective experiments are not known. If such data were available with low measurement uncertainty, they could be included.

      R III-(g) What is TP used in the Appendix B? I could not find the acronymous explanation.

      We are sorry for the oversight. TP refers to the time-to-peak and is now described in Appendix A.

      R III-(h) Are there any reasons for only using ST and no S1? Maybe are the same?

      The global sensitivity analysis is further outlined in Appendix B, which also shows S1 (first-order effects) and ST (total effects, including the variance of all interactions) together (Figure 11) [Herman and Usher, 2017], as well as their differences (e.g. in TP). Since S1 only captures first-order effects, it may fail to capture higher-order interactions between the maximum conductances; we therefore favored ST.
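
      The difference between S1 and ST can be illustrated with a toy model that consists purely of an interaction. The sketch below is not the SALib-based analysis from Appendix B; it is a plain-Python Monte Carlo estimate (Saltelli-style estimator for S1, Jansen estimator for ST), and the model, the uniform input range [-1, 1] and the sample size are assumptions chosen only for illustration.

```python
import random


def sobol_indices(model, num_inputs, num_samples=20000, seed=0):
    """Monte Carlo estimates of first-order (S1) and total-order (ST) Sobol indices.

    Inputs are assumed to be independent and uniform on [-1, 1].
    """
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(-1.0, 1.0) for _ in range(num_inputs)]

    a_rows = [draw() for _ in range(num_samples)]
    b_rows = [draw() for _ in range(num_samples)]
    f_a = [model(row) for row in a_rows]
    f_b = [model(row) for row in b_rows]

    outputs = f_a + f_b
    mean = sum(outputs) / len(outputs)
    variance = sum((y - mean) ** 2 for y in outputs) / len(outputs)

    first_order, total_order = [], []
    for i in range(num_inputs):
        s1_sum = 0.0
        st_sum = 0.0
        for a_row, b_row, y_a, y_b in zip(a_rows, b_rows, f_a, f_b):
            mixed = list(a_row)
            mixed[i] = b_row[i]  # take input i from B, all other inputs from A
            y_mixed = model(mixed)
            s1_sum += y_b * (y_mixed - y_a)
            st_sum += (y_a - y_mixed) ** 2
        first_order.append(s1_sum / num_samples / variance)
        total_order.append(st_sum / (2 * num_samples) / variance)
    return first_order, total_order


def product_model(x):
    """Toy model whose output depends on the inputs only through their interaction."""
    return x[0] * x[1]


s1, st = sobol_indices(product_model, num_inputs=2)
```

      For this toy model the first-order indices are close to zero while the total-order indices are close to one, which is the situation in which relying on S1 alone would hide the influence of both inputs.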

      R III-(i) In Training Section Page 8. It is not clear why it is necessary to resample data. Can you motivate?

      The resampling part is motivated by exactly capturing the swift depolarization dynamics, whereas the output from CARPentry is uniformly sampled. This is now further highlighted in the text.

      "Then, the data were non-uniformly resampled from the original uniformly simulated APs, to emphasize the depolarization slope with a high accuracy while lowering the number of repolarization samples. For this purpose, we resamled the APs [...]" (Section 2.2.1)

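      A non-uniform resampling of this kind can be sketched as linear interpolation of a uniformly sampled trace onto a grid that is dense early (around depolarization) and sparse later. The split time, the step sizes and the toy trace below are hypothetical; the actual resampling scheme is the one given in Section 2.2.1 of the manuscript.

```python
from bisect import bisect_right


def interpolate(times, values, t):
    """Linearly interpolate a trace sampled at increasing `times` at time t."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    k = bisect_right(times, t)
    t0, t1 = times[k - 1], times[k]
    weight = (t - t0) / (t1 - t0)
    return values[k - 1] + weight * (values[k] - values[k - 1])


def nonuniform_grid(t_split, t_end, fine_step, coarse_step):
    """Fine steps on [0, t_split), coarse steps on [t_split, t_end]."""
    num_fine = round(t_split / fine_step)
    grid = [i * fine_step for i in range(num_fine)]
    num_coarse = round((t_end - t_split) / coarse_step)
    grid += [t_split + j * coarse_step for j in range(num_coarse + 1)]
    return grid


# Toy uniformly sampled trace (a straight line, so interpolation is exact).
uniform_times = [float(t) for t in range(11)]
uniform_values = [2.0 * t for t in uniform_times]

grid = nonuniform_grid(t_split=2.0, t_end=10.0, fine_step=0.5, coarse_step=4.0)
resampled = [interpolate(uniform_times, uniform_values, t) for t in grid]
```
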
      R III-(j) For the training of the neuronal network, the authors used the ADAM algorithm: have you tested any other algorithm?

      For training neural networks, ADAM has become the current de-facto standard and is certainly a robust choice for training our emulator. While there may exist slightly faster, or better-suited training algorithms, we witnessed (qualitative) convergence in the training (Equation (2)). We thus strongly believe that the training algorithm is not a limiting factor in our study.

      R III-(k) What is the amount of the drugs tested? Is the same dose reported in the description of the second data set or the values are only referring to experimental data? Moreover, it is not clear if in the description of experimental data, the authors are referring to newly acquired data (since they described in detail the protocol) or if they are obtained from Orvos et al. 2019 work.

      In all scenarios, we tested 5 different drugs (cisapride, dofetilide, sotalol, terfenadine, verapamil). We revised our previous presentation of the available data, and now try to give a concise overview of the utilized data (Section 2.2.1 and Table 1) and of the drug comparison with the CiPA distributions (Table 5, former 4). Note that in the latter case, the available expected channel scaling factors by the CiPA distributions vary, but are now clearly shown in Table 5.

      R III-(l) In Figure 4, I will avoid the use of “control” in the legend since it is commonly associated with basal conditions and not with the drug administration.

      The terminology “control” in this context is in line with works from the CiPA initiative, e.g. [Li et al., 2017], and refers to the state of cell conditions before the drug wash-in. We added a minor note the first time we use the term control in the introduction to emphasize that we refer to the state of the cell before administering any drugs.

      "To compute the drugged AP for given pharmacological parameters is a forward problem, while the corresponding inverse problem is to find pharmacological parameters for given control (before drug administration) and drugged AP." (Section 1)

      R III-(m) In Table 1 when you referred to Britton et al. 2017 work, I suggest adding also 10.1371/journal.pcbi.1002061.

      We added the suggested article as a reference.

      R III-(n) For the minimization problem, only data set #1 has been used. Have you tested data set #2?

      In the current scenario, we only tested the inverse problem for data set #2 (former #1). The main purpose of data set #3 (former #2) was to test the possibility of emulating EAD APs. Given the overall lower performance in comparison to data set #2 (former #1), we also expect deteriorated results in comparison to the existing inverse synthetic problem.

      R III-(o) In Figure 6 you should have the same x-axis (we could not see any points in the large time scale for many biomarkers). Why dVmMax is not uniformed distributed compared to the others? Can you comment on that?

      As suggested, we re-adjusted the x-range to show the center of distributions. Additionally, we denoted in each subplot the number of outliers which lie outside of the shown range. The error distribution on dVmMax exhibits a slightly off-center, left-tailed normal distribution, which we now describe a bit more in the revised text:

      "While the mismatches in phase 3 were simply a result of imperfect emulation, the mismatches in phase 0 were a result of the difficulty in matching the depolarization time exactly. [...] Likewise, the difficulty in exactly matching the depolarization time leads to elevated errors and more outliers in the biomarkers influenced by the depolarization phase (TP and dVmMax)," (Section 3.1.1)

      R III-(p) Page 14. Can the authors better clarify ”the average RMSE over all APs 13.6mV”: is it the mean for all histograms in Figure 7? (In Figure 5 is more evident the average RMSE).

      The average RMSE uses the same definition for Figures 5 and 7: It is the average over all the RMSEs for each pair of traces (simulated/emulated), though the number of samples is much lower for the EAD data set and their distribution is not normal.
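
      As a minimal sketch of this definition, the snippet below computes one RMSE per simulated/emulated pair and then averages over all pairs. The function names and the toy traces are hypothetical and assume that both traces of a pair are sampled at the same time points.

```python
import math


def rmse(simulated, emulated):
    """Root-mean-square error between two equally sampled traces."""
    squared = [(s - e) ** 2 for s, e in zip(simulated, emulated)]
    return math.sqrt(sum(squared) / len(squared))


def average_rmse(pairs):
    """Average of the per-pair RMSEs over all (simulated, emulated) pairs."""
    return sum(rmse(sim, emu) for sim, emu in pairs) / len(pairs)


# Two toy pairs of traces: the first differs by 3 everywhere, the second matches.
toy_pairs = [
    ([0.0, 0.0], [3.0, 3.0]),
    ([1.0, 1.0], [1.0, 1.0]),
]
mean_error = average_rmse(toy_pairs)
```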

      R III-(q) In Table 4, the information on which drugs are considered should be added.

      For each channel, we added the names of the drugs for which respective data from the CiPA initiative were available.

      R III-(r) Pag. 18, second paragraph, there is a repetition of ”and”.

      Fixed

      R III-(s) The pair’s combination of scaling factors for simulating synthetic drugs reported in Table 2, can be associated with some effects of real drugs? In this case, I suggest including the information or justifying the choice.

      The scaling factors in Table 2 are used to create data set #3 (former #2), which is meant to provide several APs that exhibit EADs. This is described in more detail in the new data section, Section 2.2.1:

      "Data set #3: The motivation for creating data set #3 was to test the emulator on data of abnormal APs showing the repolarization abnormality EAD. This is considered a particularly relevant AP abnormality in pharmacological studies because of their role in the genesis of drug-induced ventricular arrhythmia’s [Weiss et al., 2010]. Drug data were created using ten synthetic drugs with the hERG channel and the Cav1.2 channel as targets. To this end, ten samples with pharmacological parameters for GKr and PCa (Table 2) were generated and the synthetic drugs were applied to the entire synthetic cardiomyocyte population by scaling GKr and PCa with the corresponding pharmacological parameter. Of the 1000 APs simulated, we discarded APs with a transmembrane potential difference of more than 10% of the amplitude between t = 0 and 1000ms (checked for the last AP), indicative of an AP that does not repolarize within 1000ms. This left us with 950 APs, 171 of which exhibit EAD (see Appendix C)." (Section 2.2.1)

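      The exclusion criterion quoted above can be sketched as a simple check on the last simulated AP. The function name and the toy traces are hypothetical, and the sketch assumes that the trace spans exactly t = 0 to 1000 ms and that the amplitude is the difference between the maximum and minimum potential.

```python
def repolarizes(trace, max_fraction=0.10):
    """True if the potentials at the start and end of the trace differ by at most
    `max_fraction` of the AP amplitude (assumed here to be max minus min)."""
    amplitude = max(trace) - min(trace)
    return abs(trace[-1] - trace[0]) <= max_fraction * amplitude


# Toy traces in mV: the first returns close to rest, the second does not.
returning_trace = [-85.0, 30.0, -84.0]
stuck_trace = [-85.0, 30.0, -40.0]

kept = [trace for trace in (returning_trace, stuck_trace) if repolarizes(trace)]
```
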
      R III-(t) A general comment on the work is that the authors claim that their study highlights the potential of NN emulators as a powerful tool for increased efficiency in future quantitative systems pharmacology studies, but they wrote ”Larger inaccuracies were found in the inverse problem solutions on experimental data highlight inaccuracies in estimating the pharmacological parameters”: so, I was wondering how they can claim the robustness of NN use as a tool for more efficient computation in pharmacological studies.

      The discussed robustness directly refers to efficiently emulating steady-state/limit cycle APs from a set of maximum conductances (forward problem, Section 3.1.1). We extensively evaluated the algorithm and feel that given the low emulation RMSE of APs (< 1 mV), the statement is warranted. The inverse estimation, enabled through this rapid evaluation, performs well on synthetic data, but shows difficulties for experimental data. Note however that at this point there are multiple potential sources for these problems, as highlighted in the Evaluation section (Section 4.1). Table 5 (former 4) shows the difference in accuracy of estimating per-channel maximum conductances, revealing a potentially large discrepancy. The emulator also offers future possibilities to incorporate additional information in the form of either priors or more detailed measurements (e.g. calcium transients) and can potentially be improved to a point where the inverse problem can also be satisfactorily solved in experimental preparations, though further analysis will be required.

      References

      [Beck, 2017] Beck, A. (2017). First-order methods in optimization. SIAM.

      [Britton et al., 2013] Britton, O. J., Bueno-Orovio, A., Ammel, K. V., Lu, H. R., Towart, R., Gallacher, D. J., and Rodriguez, B. (2013). Experimentally calibrated population of models predicts and explains intersubject variability in cardiac cellular electrophysiology. Proceedings of the National Academy of Sciences, 110(23).

      [Chang et al., 2017] Chang, K. C., Dutta, S., Mirams, G. R., Beattie, K. A., Sheng, J., Tran, P. N., Wu, M., Wu, W. W., Colatsky, T., Strauss, D. G., and Li, Z. (2017). Uncertainty quantification reveals the importance of data variability and experimental design considerations for in silico proarrhythmia risk assessment. Frontiers in Physiology, 8.

      [Clerx et al., 2016] Clerx, M., Collins, P., de Lange, E., and Volders, P. G. A. (2016). Myokit: A simple interface to cardiac cellular electrophysiology. Progress in Biophysics and Molecular Biology, 120(1):100–114.

      [Endresen and Skarland, 2000] Endresen, L. and Skarland, N. (2000). Limit cycle oscillations in pacemaker cells. IEEE Transactions on Biomedical Engineering, 47(8):1134–1137.

      [Garny and Hunter, 2015] Garny, A. and Hunter, P. J. (2015). OpenCOR: a modular and interoperable approach to computational biology. Frontiers in Physiology, 6.

      [Gemmell et al., 2016] Gemmell, P., Burrage, K., Rodríguez, B., and Quinn, T. A. (2016). Rabbit-specific computational modelling of ventricular cell electrophysiology: Using populations of models to explore variability in the response to ischemia. Progress in Biophysics and Molecular Biology, 121(2):169–184.

      [Ghosh et al., 2018] Ghosh, S., Gavaghan, D. J., and Mirams, G. R. (2018). Gaussian process emulation for discontinuous response surfaces with applications for cardiac electrophysiology models.

      [Herman and Usher, 2017] Herman, J. and Usher, W. (2017). SALib: An open-source python library for sensitivity analysis. J. Open Source Softw., 2(9):97.

      [Johnstone et al., 2016] Johnstone, R. H., Chang, E. T., Bardenet, R., de Boer, T. P., Gavaghan, D. J., Pathmanathan, P., Clayton, R. H., and Mirams, G. R. (2016). Uncertainty and variability in models of the cardiac action potential: Can we build trustworthy models? Journal of Molecular and Cellular Cardiology, 96:49–62.

      [Li et al., 2017] Li, Z., Dutta, S., Sheng, J., Tran, P. N., Wu, W., Chang, K., Mdluli, T., Strauss, D. G., and Colatsky, T. (2017). Improving the in silico assessment of proarrhythmia risk by combining hERG (human ether-à-go-go-related gene) channel–drug binding kinetics and multichannel pharmacology. Circulation: Arrhythmia and Electrophysiology, 10(2).

      [Muszkiewicz et al., 2016] Muszkiewicz, A., Britton, O. J., Gemmell, P., Passini, E., Sánchez, C., Zhou, X., Carusi, A., Quinn, T. A., Burrage, K., Bueno-Orovio, A., and Rodriguez, B. (2016). Variability in cardiac electrophysiology: Using experimentally-calibrated populations of models to move beyond the single virtual physiological human paradigm. Progress in Biophysics and Molecular Biology, 120(1):115–127.

      [Orvos et al., 2019] Orvos, P., Kohajda, Z., Szlovák, J., Gazdag, P., Árpádffy-Lovas, T., Tóth, D., Geramipour, A., Tálosi, L., Jost, N., Varró, A., and Virág, L. (2019). Evaluation of possible proarrhythmic potency: Comparison of the effect of dofetilide, cisapride, sotalol, terfenadine, and verapamil on hERG and native iKr currents and on cardiac action potential. Toxicological Sciences, 168(2):365–380.

      [Paszke et al., 2019] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.

      [Plank et al., 2021] Plank, G., Loewe, A., Neic, A., Augustin, C., Huang, Y.-L., Gsell, M. A., Karabelas, E., Nothstein, M., Prassl, A. J., Sánchez, J., Seemann, G., and Vigmond, E. J. (2021). The openCARP simulation environment for cardiac electrophysiology. Computer Methods and Programs in Biomedicine, 208:106223.

      [Shahriari et al., 2016] Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. (2016). Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proceedings of the IEEE, 104(1):148–175.

      [Snoek et al., 2014] Snoek, J., Swersky, K., Zemel, R., and Adams, R. (2014). Input Warping for Bayesian Optimization of Non-Stationary Functions. In Proceedings of the 31st International Conference on Machine Learning, pages 1674–1682. PMLR. ISSN: 1938-7228.

      [Weiss et al., 2010] Weiss, J. N., Garfinkel, A., Karagueuzian, H. S., Chen, P.-S., and Qu, Z. (2010). Early afterdepolarizations and cardiac arrhythmias. Heart Rhythm, 7(12):1891–1899.

    1. Author Response

      We thank you for your careful review of our manuscript and helpful comments and suggestions. We have carefully considered each point and have addressed them by adding changes to the manuscript and figures. The text below details our responses and edits.

      Reviewer #1 (Public Review):

      Summary:

      Liao et al. leveraged two powerful genomics techniques (CUT&RUN and RNA sequencing) to identify genomic regions bound by and activated or inactivated by SMAD1, SMAD5, and the progesterone receptor during endometrial stromal cell decidualization.

      Strengths:

      The authors utilized powerful next generation sequencing and identified important transcriptional mechanisms of SMAD1/5 and PGR during decidualization in vivo.

      Weaknesses:

      Overall, the manuscript and study are well structured and provide critical mechanistic updates on the roles of SMAD1/5 in decidualization and preparation of the maternal endometrium for pregnancy. Please consider the following to improve the manuscript:

      • Figure 4: A and C show bar graphs, not histograms. Please alter this phrasing.

      Figure legends were adjusted as suggested.

      • What post hoc test was performed on qPCR analyses? (Figure 6). It is evident that any assumptions of equal variance need to be negated due to the wide dispersion in experimental response invalidating the assumptions of a one-way ANOVA.

      Yes, a Tukey’s post hoc test was performed on the qPCR analyses. To address the reviewer’s question regarding equal variance, normality of the dataset was examined by the D’Agostino & Pearson test in GraphPad Prism. The data demonstrated a normal distribution pattern, thus justifying the one-way ANOVA test.

      • Figure 6: what data points are plotted? Are these technical replicates from individual wells or qPCR technical replicates?

      The dataset represents three technical and three biological data points.

      • Figure 6: Consider changing graph colors to increase visibility of error bars and data points.

      Thank you for this suggestion. The colors of the error bars in Figure 6 have been changed to increase visibility. Additionally, different shapes have been utilized to distinguish between different groups.

      • Figure 6 legend: no histograms are shown in this figure. Refer to all gene names utilizing proper nomenclature and conventions (gene names should be italicized).

      The legend was adjusted as suggested with the correct nomenclature implemented.

      • qPCR analyses: qPCR normalization should be done to at least two internal control genes, preferably three according to the MIQE guidelines (PMID: 19246619).

      As suggested, we have performed additional qPCR analysis with normalization done to three internal controls.

      • Supplement figure 2: graphs are bar graphs, not histograms.

      The legends have been changed as suggested.

      Reviewer #2 (Public Review):

      Summary:

      Liao and colleagues generated tagged SMAD1 and SMAD5 mouse models and identified genome occupancy of these two factors in the uterus of these mice using the CUT&RUN assay. The authors used integrative bioinformatic approaches to identify putative SMAD1/5 direct downstream target genes and to catalog the SMAD1/5 and PGR genome co-localization pattern. The role of SMAD1/5 on stromal decidualization was assayed in vitro on primary human endometrial stromal cells. The new mouse models offer opportunities to further dissect SMAD1 and SMAD5 functions without the limitation from SMAD antibodies, which is significant. The CUT&RUN data further support the usefulness of these mouse models for this purpose.

      Strengths:

      The strength of this study is the novelty of new mouse models and the valuable cistromic data derived from these mice.

      Weaknesses:

      The weakness of the present version of the manuscript includes the self-limited data analysis approaches such as the proximal promoter based bioinformatic filter and a missed opportunity to investigate the role of SMAD1/5 on determining the genome occupancy of major uterine transcription regulators.

      Thank you for the comments. We addressed the limitation of the promoter-based analysis in the discussion and pointed out the possibility of analyzing additional genomic features (Lines 548-551). Based on the suggestions, we also included an analysis in which we compared SMAD1/5 binding activities in this study to the binding activities of known major uterine transcription regulators (namely, SOX17 and NR2F2) using published ChIP-seq data in the mouse uterus. Results from this analysis are discussed in Lines 426-436. Content from the adjusted manuscript is copied below.

      Lines 548-551:

      “From pathway enrichment analysis, we demonstrate that genes with SMAD1/5 and PR bound at the promoter regions are enriched for key pathways in directing the decidualization process, such as WNT and relaxin signaling pathways. Future studies can benefit from analyzing binding events beyond the promoter regions.”

      Lines 426-436:

      “To further evaluate the key roles of SMAD1/5 as major uterine transcription regulators, we cross-compared the genomic binding sites of SMAD1/5 with known key transcription factors, namely aforementioned SOX17 (Supplement Figure 1E), as well as NR2F2 (Supplement Figure 1F), an essential regulator of hormonal response, using our CUT&RUN data sets and published mouse uterine SOX17 and NR2F2 ChIP-seq data sets (GSE118328, GSE232583). Among the annotated genes, 5402 genes are shared between SMAD1/5 and SOX17, and 1922 genes are shared between SMAD1/5 and NR2F2. Such observations indicate a potential co-regulatory mechanism between SMAD1/5 and other key uterine transcription factors in maintaining appropriate uterine functions. Overall, our analyses demonstrate that the transcriptional activity of SMAD1, SMAD5, and PR coordinate the expression of key genes required for endometrial receptivity and decidualization.”
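
      The gene-level comparison described in this passage amounts to intersecting lists of annotated genes. The sketch below shows only that set operation; the function name and the placeholder gene identifiers are hypothetical and do not correspond to the actual CUT&RUN or ChIP-seq annotations.

```python
def shared_genes(genes_a, genes_b):
    """Return the sorted list of gene names annotated in both data sets."""
    return sorted(set(genes_a) & set(genes_b))


# Placeholder gene identifiers standing in for peak-annotated gene lists.
factor_one_genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
factor_two_genes = ["GeneC", "GeneB", "GeneE"]

overlap = shared_genes(factor_one_genes, factor_two_genes)
num_shared = len(overlap)
```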

      Reviewer #3 (Public Review):

      Summary:

      As SMAD1/5 activities have previously been indistinguishable, these studies provide a new mouse model to finally understand unique downstream activation of SMAD1/5 target genes, a model useful for many scientific fields. Using CUT&RUN analyses with gene overlap comparisons and signaling pathway analyses, specific targets for SMAD1 versus SMAD5 were compared, identified, and interpreted. These data validate previous findings showing strong evidence that SMADs directly govern critical genes required for endometrial receptivity and decidualization, including cell adhesion and vascular development. Further, SMAD targets were overlapped with progesterone receptor binding sites to identify regions of potential synergistic regulation of implantation. The authors report strong correlations between progesterone receptor and SMAD1/5 direct targets to cooperatively promote embryo implantation. Finally, the authors validated SMAD1/5 gene regulation in primary human endometrial stromal cells. These studies provide a data-rich survey of SMAD family transcription, defining its role as a governor of early pregnancy.

      Strengths:

      This manuscript provides a valuable survey of SMAD1/5 direct transcriptional events at the time of receptivity. As embryo implantation is controlled by extensive epithelial to stromal molecular crosstalk and hormonal regulation in space and time, the authors state a strong, descriptive narrative defining how SMAD1/5 plays a central role at the site of this molecular orchestration. The implementation of cutting-edge techniques and models and simple comparative analyses provide a straightforward, yet elegant manuscript.

      Although the progesterone receptor exists as a major regulator of early pregnancy, the authors have demonstrated clear evidence that progesterone receptor with SMAD1/5 work in concert to molecularly regulate targets such as Sox17, Id2, Tgfbr2, Runx1, Foxo1 and more at embryo implantation. Additionally, the authors pinpoint other critical transcription factor motifs that work with SMADs and the progesterone receptor to promote early pregnancy transcriptional paradigms.

      Weaknesses:

      Although a wonderful new tool to ascertain SMAD1 versus SMAD5 downstream signaling, the importance of these factors in governing early pregnancy is not novel. Furthermore, functional validation studies are needed to confirm interactions at promoter regions. Additionally, the authors presume that all overlapped genes are shared between progesterone receptor and SMAD1/5, yet some peak representations do not overlap. Although transcriptional activation can occur at the same time, it may not occur in the same complex. Thus, further confirmation of these transcriptional events is warranted.

      Thank you for the review; we appreciate these valuable comments. Although we used an overlap approach to investigate the gene regulatory networks between SMAD1/5 and PR at the gene level, we functionally validated the regulatory effect in an in vitro decidualization model using a qPCR approach. We acknowledge that gene activations may not occur at the exact same complex, but functional validation screenings at the promoter level are beyond the scope of the study. However, we added the discussion about the possibility of proposed investigations in Lines 553-558. Our current dataset and validation studies support our conclusions with robust evidence. Content from Lines 553-558 is copied below.

      Lines 553-558: “In this study, we determined the overlapped transcriptional control between SMAD1/5 and PR at the gene level, and functionally validated the regulatory effect at the transcript level in a human stromal cell decidualization model. While we observe a subset of peak representations that do not overlap at the base pair level in the promoter regions, future functional screenings at the promoter level, such as luciferase reporter assays to assess transcriptional co-activation by SMAD1/5 and PR, will advance this study.”

      • Since whole murine uterus was used for these studies, the specific functions of SMAD1/5 in the stroma versus the epithelium (versus the myometrium) remain unknown. Specific roles for SMAD1/5 in the uterine stroma and epithelial compartments still need to be examined. Also, further work is needed to delineate binding and transcriptional activation of SMAD1/5 and the progesterone receptor in stromal versus epithelial uterine compartments.

      Thank you for the comments. Indeed, our study was performed in the whole mouse uterus, which includes stroma, epithelium and myometrium. Our previous data show that nuclear SMAD1/5 are localized to both the stroma and epithelium in the decidua zone during the decidualization process at 4.5 dpc (PMID:34099644). Published in vivo studies also demonstrate the essential role of SMAD1/5 in the uterine epithelium and stroma compartments, respectively (PMIDs:35383354/27335065/17967875). Although we believe the binding/transcriptional activation of SMAD1/5 and PR occurs in both compartments based on the mouse phenotypic data, we acknowledge the opportunity for further compartment-specific analysis and added a discussion of such investigations (Lines 501-513). Content from Lines 501-513 is copied below.

      Lines 501-513:

      “Published studies have shown that nuclear SMAD1/5 localize to the stroma and epithelium during the decidualization process at 4.5dpc during the window of implantation. Conditional deletion of SMAD1/5 exclusively in the uterine epithelium using lactoferrin-icre (Ltf-icre) results in severe subfertility due to impaired implantation and decidual development. Conditional deletion of SMAD1/5/4 exclusively in the cells from mesenchymal lineage (including uterine stroma) using anti-Mullerian hormone type 2 receptor cre (Amhr2-cre) results in infertility with defective decidualization. Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that transcriptional co-regulation by SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out the potential coregulation of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future single-cell sequencing (i.e., digital cytometry) and/or spatial transcriptomic analysis.”

      • There are asynchronous gene responses in the SMAD1/5 ablated mouse model compared to the siRNA-treated human endometrial stromal cells. These differences can be confounding, and more clarity is required in understanding the meaning of these differences and as they relate to the entire SMAD transcriptome.

      Thank you for the comments. From the mouse models with SMAD1/5 conditional deletions, we observed phenotypic defects at 4.5 dpc, which is the beginning of decidualization in the mouse. Our study used human endometrial stromal cells as a model to validate our findings functionally, aiming to mimic the specific time point during decidualization. Differences between the two models may arise from the strategy used to perturb SMAD1/5; in the mouse, a complete knockout of SMAD1/5 was used, resulting in failed decidualization, while the human endometrial stromal cells used an siRNA knockdown approach, which decreased the potential for decidualization. As such, this information needs to be considered when evaluating genome-wide effects on the transcriptome. We added a discussion of this point to Lines 564-572. Content from Lines 564-572 is copied below.

      Lines 564-572:

      “Since mice only undergo decidualization upon embryo implantation whilst human stromal cells undergo cyclic decidualization in each menstrual cycle in response to rising levels of progesterone, asynchronous gene responses may occur in comparison between mouse models and human cells. However, cellular transformation during decidualization is conserved between mice and humans, which makes findings in the mouse models a valuable and transferable resource to be evaluated in human tissues. Accordingly, our functional validation studies were performed using human endometrial stromal cells induced to decidualize in vitro for four days, which models the early phases of decidualization. Additional transcriptomic studies of the SMAD1/5 perturbations in human endometrial stromal cells will be of great resource in understanding the entire SMAD1/5 regulomes in humans.”

      Reviewer #1 (Recommendations For The Authors):

      • Minor grammatical errors requiring attention such as inserting punctuation at the end of sentences and including figure legends prior to the end of sentence punctuation.

      Thanks for the comments. Additional proofreading was conducted for the revision.

      Reviewer #2 (Recommendations For The Authors):

      1) Between SMAD1 and SMAD5, does losing one SMAD affect the other SMAD's genome occupancy?

      Thanks for the comments. Based on the mouse phenotypic data showing that conditional deletion of SMAD1 in the uterus does not affect female fertility, while conditional deletion of SMAD5 leads to subfertility and conditional deletion of both SMAD1 and SMAD5 leads to complete infertility, we believe that losing one SMAD will affect the other SMAD's genome occupancy. This point is discussed in Lines 514-517, with contents copied below.

      Lines 514-517: “Although our studies herein confirm that SMAD1 and SMAD5 proteins have distinct transcriptional regulatory activities, our previous studies demonstrated that while SMAD5 can functionally replace SMAD1, SMAD1 cannot replace SMAD5 in the uterus. How this epistatic relationship is established in a tissue-specific manner still needs to be determined by further biochemical investigations.”

      2) In light of SMAD1/5 and PGR co-occupied cis-acting elements and coregulating uterine transcriptome, does loss of SMAD1/5 alter the PGR and ESR1 genome occupancy?

      Thanks for the comments. In the SMAD1/5 double conditional knockout mice, we observe the hyposensitivity towards progesterone and unopposed estrogen responses. We hypothesize that loss of SMAD1/5 alters PR genome occupancy and subsequently ER genome occupancy is altered as a secondary effect. To functionally address this question, genomic profiling studies need to be performed in the SMAD1/5 knockout mice, and, ideally, also performed in the PR knockout mice. However, such large-scale studies are beyond the scope of the current study and will not affect our conclusions under physiological conditions. We did include additional discussion regarding this comment in Lines 551-553, with the contents copied below.

      Lines 551-553: “Profiling the PR genome occupancy in the SMAD1/5 deficient mice would provide an interesting perspective to reevaluate the major regulatory roles of SMAD1/5 in mediating uterine transcriptomes.”

      3) In terms of investigating the impact of SMAD1/5 on cell type composition, perhaps the digital cytometry approach (e.g., PMID: 31061481) could provide unbiased inferences.

      Thank you for the comments. We included expression analysis of a subset of SMAD1/5 direct target genes over different uterine compartments (Figure 4E). We also added the discussion of the opportunities for further compartment-specific analysis, including but not limited to the digital cytometry approach in Lines 506-513, with the contents copied below.

      Lines 506-513:

      “Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that the transcriptional co-regulatory roles of SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out potential co-regulatory roles of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future single-cell sequencing (i.e., digital cytometry) and/or spatial transcriptomic analysis.”

      4) The limitation of focusing on the promoter occupied SMADs should be discussed.

      Additional discussion of the limitation of focusing on the promoter regions was added in Lines 548-551, with contents copied below.

      Lines 548-551:

      “From pathway enrichment analysis, we demonstrate that genes with SMAD1/5 and PR bound at the promoter regions are enriched for key pathways in directing the decidualization process, such as WNT and relaxin signaling pathways. Future studies can benefit from analyzing binding events beyond the promoter regions.”

      5) Methods: The reagent and the condition for PGR CUT&RUN is missing.

      Information added in Line 153.

      6) Line 260: Please clarify the statement of "suggesting the transcriptional of PR depends on BMP/SMAD1/5 signaling".

      Thanks for the suggestion. The sentence was rephrased to (Lines 258-261) “Our previous studies revealed that conditional ablation of SMAD1 and SMAD5 in the uterus decreased P4 response during the peri-implantation period, suggesting that the transcriptional activities of PR depend on BMP/SMAD1/5 signaling.”

      7) Line 280-289: This statement belongs to the discussion section.

      The statement was moved as suggested.

      8) Figure 4E is not cited in the result section.

      Figure 4E was cited in the results section in the revised version. (Line 386)

      9) Figures 3C, 3D, 3E, 3F, 5B and 5D: please include the full lists in the supplemental data so that labs with limited bioinformatic capabilities could use these findings to facilitate scientific discovery.

      Data regarding the aforementioned figures were included in Supplement Tables 3-8 and Supplement Files 1-2.

      10) Figure 2B and Figure 5A: the heatmaps without further grouping on common and distinct genome occupancy among assayed factors provided minimum useful information. Please reconsider the presentation format in order to deliver more meaningful results.

      Figure 2B and Figure 5A were replotted with clustering using the k-means algorithm. Methods and legends were updated accordingly.
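
      As an illustration of what k-means grouping of heatmap rows involves, the sketch below clusters a handful of regions by their binding signal for two factors and then orders the rows by cluster. It is a minimal example under stated assumptions: the signal values, the two unnamed factors, the choice of k = 3 and the deterministic initialisation are all hypothetical, and the response does not state which software or settings were used for the replotted figures.

```python
import math

def kmeans(rows, k, iterations=50):
    """Plain Lloyd's algorithm; centroids start at k rows spread evenly through the input."""
    step = len(rows) // k
    centroids = [list(rows[i * step]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(row, centroids[c])) for row in rows]
        # Update step: each centroid moves to the mean of its current members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical per-region signal for two factors (columns: factor A, factor B).
signal = [
    (9.0, 8.5),  # bound by both
    (0.5, 9.2),  # factor B only
    (8.8, 0.4),  # factor A only
    (9.4, 9.1),  # bound by both
    (0.3, 8.7),  # factor B only
    (9.1, 0.6),  # factor A only
]

labels = kmeans(signal, k=3)

# Ordering rows by cluster label is what turns an unsorted heatmap into
# blocks of shared and factor-specific occupancy.
ordered = sorted(zip(labels, signal))
for label, row in ordered:
    print(label, row)
```

      In practice the number of clusters and the initialisation both influence the grouping, so they are worth reporting alongside the figure.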

      Reviewer #3 (Recommendations For The Authors):

      To delineate specific roles for SMAD1/5 in the uterine stroma and epithelial compartments, methods such as single cell sequencing or spatial transcriptomic analysis may be warranted.

      The manuscript now includes the discussion of future opportunities in investigating the roles of SMAD1/5 in different uterine compartments using single-cell sequencing and/or spatial transcriptomic analysis (Lines 498-513), with contents copied below.

      Lines 498-513:

      “Our studies also examined the role of SMAD1/5 in mediating progesterone responses at the genomic and transcription levels. Similarly, our analysis was based on data sets generated from the whole mouse uterus, which contains multiple compartments of the uterine structures, including but not limited to epithelium and stroma. Published studies have shown that nuclear SMAD1/5 localize to the stroma and epithelium during the decidualization process at 4.5 dpc, during the window of implantation. Conditional deletion of SMAD1/5 exclusively in the uterine epithelium using lactoferrin-icre (Ltf-icre) results in severe subfertility due to impaired implantation and decidual development. Conditional deletion of SMAD1/5/4 exclusively in the cells from mesenchymal lineage (including uterine stroma) using anti-Mullerian hormone type 2 receptor cre (Amhr2-cre) results in infertility with defective decidualization. Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that the transcriptional co-regulatory roles of SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out potential co-regulatory roles of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future single-cell sequencing (i.e., digital cytometry) and/or spatial transcriptomic analysis.”

    1. Author Response

      We would like to thank the editor and the reviewers for their constructive comments and the chance to revise the manuscript. The suggestions have allowed us to improve our manuscript. We have been able to fulfil all reviewer comments and added new statistical analyses to examine associations for subsets of data. Whilst suggested by a reviewer, we did not perform large-scale experiments to confirm the viability of low sporozoite densities at different time-points post salivary gland colonization. For these assays there are currently no satisfactory in vitro models for sporozoites harvested from single mosquitoes and setting up and validating such experiments could be a PhD project in itself. We do consider this suggestion very relevant but beyond the scope of the current work.

      Relevantly, during the time the manuscript was under review at eLife, we have been able to examine the multiplicity of infection in our field experiments. This was, as written in the original manuscript, a key reason to also perform experiments in the field where there is a greater diversity of parasite lines. We have successfully performed AMA-1 amplicon deep sequencing on infected mosquito salivary glands and infected skins. Although this does not change the key messages of the manuscript and is secondary to our main hypothesis, we do consider it a relevant addition since we were able to demonstrate that for some infected mosquitoes from the Burkina Faso study, multiple clones were expelled by mosquitoes during probing on a single piece of artificial skin. We have added a short paragraph to our revised manuscript and updated the acknowledgement section to include the supporting researcher who conducted those experiments.

      Reviewer #1 (Public Review):

      Summary: There is a long-held dogma in the malaria field: a mosquito infected with a single oocyst is as infectious to humans as a mosquito with many oocysts. This belief has been used for goal setting (and modelling) of malaria transmission-blocking interventions. While recent studies using rodent malaria suggest that the dogma may not be true, there was no such study with human P. falciparum parasites. In this study, the numbers of oocysts and sporozoites in each mosquito and the number of sporozoites expelled into artificial skin by that mosquito were quantified individually. There was a significant correlation between sporozoite burden in the mosquitoes and expelled sporozoites. In addition, this study showed that highly infected mosquitoes expelled sporozoites sooner.

      Strengths:

      • The study was conducted using two different parasite-mosquito combinations; one was lab-adapted parasites with Anopheles stephensi and the other was parasites, which were circulated in infected patients, with An. coluzzii. Both combinations showed statistically significant correlations between sporozoite burden in mosquitoes and the number of expelled sporozoites.

      • Usually, this type of study is done on a group basis (e.g., counting oocysts and sporozoites at different time points using different mosquitoes from the same group). However, this study determined the numbers on an individual basis after multiple rounds of optimization and validation of the approach. This individual approach significantly increases the power of correlation analysis.

      Weaknesses:

      • In a natural setting, most mosquitoes have less than 5 oocysts. Thus, the conclusion is more convincing if the authors perform additional analysis for the key correlations (Fig 3C and 4D) excluding mosquitoes with very high total sporozoite load (e.g., more than 5-oocyst equivalent load).

      In the revised manuscript, we have also performed our analysis including only the subset of mosquitoes with low oocyst burden. In our Burkina Faso experiments, where we could not control oocyst density, 48% (15/31) of skins were from mosquitoes with <5 oocyst sheets. Whilst low oocyst densities were thus not uncommon, we acknowledge that this may have rendered some comparisons underpowered. At the same time, we observe a strong positive trend between oocyst density and sporozoite density and between salivary gland sporozoite density and mosquito inoculum. This makes it very likely that this trend is also present at lower oocyst densities. An association where sporozoite inoculation saturates at high densities is plausible and has been observed before for rodent malaria (DOI: 10.1371/journal.ppat.1008181), whilst we consider it less likely that sporozoite expelling would be more efficient at low (unmeasured) sporozoite densities.

      • As written as the second limitation of the study, this study did not investigate whether all expelled sporozoites were equally infectious. For example, Day 9 expelled sporozoites may be less infectious than Day 11 sporozoites, or expelled sporozoites from high-burden mosquitoes may be less infectious because they experience low nutrient conditions in a mosquito. Ideally, it is nice to test the infectivity by ex vivo assays, such as hepatocyte invasion assay, and gliding assay at least for salivary sporozoites. But are there any preceding studies where the infectivity of sporozoites from different conditions was evaluated? Citing such studies would strengthen the argument.

      We appreciate this thought and can see the value of these experiments. We are not aware of any studies that examined sporozoite viability in relation to the day of salivary gland colonization or sporozoite density.

      One previous study assessed the NF54 sporozoite infectivity on different days post infection (days 12-13-14-15-16-18) and observed no clear differences in ‘per sporozoite hepatocyte invasion capacity’ over this period (DOI: 10.1111/cmi.12745). We nevertheless agree that it is conceivable that sporozoites require maturation in the salivary glands and might not all be equally infectious. While hepatocyte invasion experiments are conducted with bulk harvesting of all the sporozoites that are present in the salivary glands, it would be even more interesting to assess the invasion capacity of the smaller population of sporozoites that migrate to the proboscis to be expelled. This would, as the reviewer will appreciate, be a major endeavour. To do this well the expelled sporozoites would need to be harvested from the salivary glands/proboscis and used in the best and most natural environment for invasion. The suggested work would thus depend on the availability of primary hepatocytes since conventional cell-lines like HC-04 are likely to underestimate sporozoite invasion. Importantly, there are currently no opportunities to include the barrier of the skin environment in invasion assays whilst this may be highly important in determining the likelihood that sporozoites manage to achieve invasion and give rise to secondary infections. In short, we agree with the reviewer that these experiments are of interest but consider these well beyond the scope of the current work. We have added a section to the Discussion section to highlight these future avenues for research: ‘Of note, our assessments of EIP and of sporozoite expelling did not confirm the viability of sporozoites. Whilst the infectivity of sporozoites at different time-points post infection has been examined previously (https://doi.org/10.1111/cmi.12745), these experiments have never been conducted with individual mosquito salivary glands. To add to this complexity, such experiments would ideally retain the skin barrier that may be a relevant determinant for invasion capacity and primary hepatocytes.’

      • Since correlation analyses are the main points of this paper, it is important to show 95% CI of Spearman rank coefficient (not only p-value). By doing so, readers will understand the strengths/weaknesses of the correlations. The p-value only shows whether the observed correlation is significantly different from no correlation or not. In other words, if there are many data points, the p-value could be very small even if the correlation is weak.

      We appreciate this comment and agree that this is indeed insightful. We have added the 95% confidence intervals to all figure legends and main text. We also provide them below.

      Fig 3b: 95% CI: 0.74, 0.85

      Fig 3c: 95% CI: 0.17, 0.50

      Fig 4c: 95% CI: 0.80, 0.95

      Fig 4d: 95% CI: 0.52, 0.82

      Supp Fig 5a: 95% CI: 0.74, 0.85

      Supp Fig 5b: 95% CI: 0.73, 0.93

      Supp Fig 6: 95% CI: 0.11, 0.48

      Supp Fig 7: 95% CI: -0.12, 0.16
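
      To make concrete how an interval of this kind can be obtained, the sketch below computes a Spearman rank coefficient and an approximate 95% CI using the Fisher z-transformation. It is illustrative only: the response does not state which method was used for the intervals above, the paired values are hypothetical, and the standard error 1/sqrt(n - 3) is a common large-sample approximation that can be somewhat narrow for rank correlations in small samples (a bootstrap is a frequent alternative).

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_ci(x, y, level=0.95):
    """Spearman's rho with a Fisher-z confidence interval (requires n > 3 and |rho| < 1)."""
    rho = pearson(average_ranks(x), average_ranks(y))
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(len(x) - 3)
    z = math.atanh(rho)
    return rho, math.tanh(z - half_width), math.tanh(z + half_width)

# Hypothetical paired observations per mosquito (not data from the study).
salivary_gland_spz = [120, 450, 900, 2300, 5100, 8000, 15000, 42000]
expelled_spz = [3, 1, 20, 15, 80, 60, 400, 350]

rho, lower, upper = spearman_ci(salivary_gland_spz, expelled_spz)
print(f"rho = {rho:.2f}, 95% CI: {lower:.2f}, {upper:.2f}")
```

      With only eight pairs the interval is wide even though the coefficient is high, which is the reviewer's point: the interval conveys how precisely the strength of the correlation is known, whereas the p-value only tests against no correlation.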

      Reviewer #2 (Public Review):

      Summary: The malaria parasite Plasmodium develops into oocysts and sporozoites inside Anopheles mosquitoes, in a process called sporogony. Sporozoites invade the insect salivary glands in order to be transmitted during a blood meal. An important question regarding malaria transmission is whether all mosquitoes harbouring Plasmodium parasites are equally infectious. In this paper, the authors investigated the progression of P. falciparum sporozoite development in Anopheles mosquitoes, using a sensitive qPCR method to quantify sporozoites and an artificial skin system to probe for parasite expelling. They assessed the association between oocyst burden, salivary gland infection intensity, and sporozoites expelled.

      The data show that higher sporozoite loads are associated with earlier colonization of salivary glands and a higher prevalence of sporozoite-positive salivary glands and that higher salivary gland sporozoite burdens are associated with higher numbers of expelled sporozoites. Intriguingly, there is no clear association between salivary gland burdens and the prevalence of expelling, suggesting that most infections reach a sufficient threshold to allow parasite expelling during a mosquito bite. This important observation suggests that low-density gametocyte carriers, although less likely to infect mosquitoes, could nevertheless contribute to malaria transmission.

      Strengths: The paper is well written and the work is well conducted. The authors used two experimental models, one using cultured P. falciparum gametocytes and An. stephensi mosquitoes, and the other one using natural gametocyte infections in a field setup with An. coluzzii mosquitoes. Both studies gave similar results, reinforcing the validity of the observations. Parasite quantification relies on a robust and sensitive qPCR method, and parasite expelling was assessed using an innovative experimental setup based on artificial skin.

      Weaknesses: There is no clear association between the prevalence of sporozoite expelling and the parasite burden. However, high total sporozoite burdens are associated with earlier and more efficient colonization of the salivary glands, and higher salivary gland burdens are associated with higher numbers of expelled sporozoites. While these observations suggest that highly infected mosquitoes could transmit/expel parasites earlier, this is not directly addressed in the study. In addition, whether all expelled sporozoites are equally infectious is unknown. The central question, i.e. whether all infected mosquitoes are equally infectious, therefore remains open.

      We agree that the manuscript provides important steps forward in our understanding of what makes an infectious mosquito but does not conclusively demonstrate that highly infected mosquitoes are more likely to initiate a secondary infection. We consider this to be beyond the scope of the current work although the current work lays the foundation for these important future studies. For human Plasmodium infections the most satisfactory answer on the infectiousness of low versus high infected mosquitoes comes from controlled human infection models. In response to reviewer comments, we have extended our Discussion section to highlight this importance. To accommodate the (very fair) reviewer comments, we have avoided any phrasings that suggest that our findings demonstrate differences in transmission.

      Reviewer #3 (Public Review):

      Summary: This study uses a state-of-the-art artificial skin assay to determine the quantity of P. falciparum sporozoites expelled during feeding using mosquito infection (by standardised membrane feeding assay SMFA) using both cultured gametocytes and natural infection. Sporozoite densities in salivary glands and expelled into the skin are quantified using a well-validated molecular assay. These studies show clear positive correlations between mosquito infection levels (as determined by oocyst numbers), sporozoite numbers in salivary glands, and sporozoites expelled during feeding. This indicates potentially significant heterogeneity in infectiousness between mosquitoes with different infection loads and thus challenges the often-made assumption that all infected mosquitoes are equally infectious.

      Strengths: Very rigorously designed studies using very well validated, state-of-the-art methods for studying malaria infections in the mosquito and quantifying load of expelled sporozoites. This resulted in very high-quality data that was well-analyzed and presented. Both sources of gametocytes (cultures vs. natural infection) show consistent results further strengthening the quality of the results obtained.

      Weaknesses: As is generally the case when using SMFAs, mosquito infection levels are often relatively high compared to wild-caught mosquitoes (e.g. Bombard et al 2020 IJP: median 3-4), and the strength of the observed correlations between oocyst sheet and salivary gland sporozoite load, and even more so between salivary gland sporozoite load and expelled sporozoite number, may be dominated by results from mosquitoes with infection levels rarely observed in wild-caught mosquitoes. This could result in an overestimation of the importance of these well-observed positive relationships under natural transmission conditions. The results obtained from these excellently designed and executed studies support their conclusion very well, with a slight caveat regarding their application to natural transmission scenarios.

      For efficiency and financial reasons, we have worked with an approach to enhance mosquito infection rates. If we had worked with gametocytes at physiological concentrations and a small number of donors, we would probably have had considerably lower mosquito infection rates. Whilst this would indeed result in lower infection burdens in the sparse infected mosquitoes, addressing the reviewer concern, it would have made the experiments highly inefficient and expensive. The skin mimic was initially provided free of charge when the matrix was close to the expiry date but for the experiments in Burkina Faso we had to purchase the product at market value. Whilst we consider the biological question sufficiently important to justify this investment – and think our findings prove us right – it remained important to avoid using skins for uninfected mosquitoes. Since oocyst prevalence and density are strongly correlated (doi: 10.1016/j.ijpara.2012.09.002; doi: 10.7554/eLife.34463), a low oocyst density in natural infections typically coincides with a high proportion of negative mosquitoes.

      Of note, our approach did result in the inclusion of 15 skins from infected mosquitoes with 1-4 oocysts. This number may be modest but we did include observations from this low oocyst range which is, we agree, highly important for better understanding malaria epidemiology.

      This work very convincingly highlights the potential for significant heterogeneity in the infectiousness between individual P. falciparum-infected mosquitoes. Such heterogeneity needs to be further investigated and if again confirmed taken into account both when modelling malaria transmission and when evaluating the importance of low-density infections in sustaining malaria transmission.

      Reviewer #4 (Public Review):

      Summary: The study compares the number of sporozoites expelled by mosquitoes with different Plasmodium infection burden. To my knowledge this is the first report comparing the number of expelled P. falciparum sporozoites and their relation to oocyst burden (intact and ruptured) and residual sporozoites in salivary glands. The study provides important evidence on malaria transmission biology although conclusions cannot be drawn on direct impact on transmission.

      Strengths: Although there is some evidence from malaria challenge studies that the burden of sporozoites injected into a host is directly correlated with the likelihood of infection, this has been done using experimental infection models which administer sporozoites intravenously. It is unclear whether the same correlation occurs with natural infections and what the actual threshold for infection may be. Host immunity and other host related factors also play a critical role in transmission and need to be taken into consideration; these have not been mentioned by the authors. This is of particular importance as host immunity is decreasing with reduction in transmission intensity.

      Weaknesses: The natural infections reported in the study were not natural as the authors described. Gametocyte enrichment was done to attain high oocyst infection numbers. Studying natural infections would have been better without the enrichment step. The infected mosquitoes have much larger infection burden than what occurs in the wild.

      Nevertheless, the findings support the same results as in the experiments conducted in the Netherlands and therefore are of interest. I suggest the authors change the wording. Rather than calling these "natural" infections, they could be called, for example, "experimental infections with wild parasite strains".

      We have addressed these concerns and, in the process, also changed our manuscript title. The following sentences have been changed:

      “It is currently unknown whether all Plasmodium falciparum infected mosquitoes are equally infectious. We assessed sporogonic development using cultured gametocytes in the Netherlands and natural infections in Burkina Faso”.

      Now reads: “It is currently unknown whether all Plasmodium falciparum infected mosquitoes are equally infectious. We assessed sporogonic development using cultured gametocytes in the Netherlands and experimental infections with naturally circulating parasite strains in Burkina Faso”.

      226-228 “Experimental infections with naturally circulating parasite strains show comparable correlation between oocyst density, salivary gland density and sporozoite inoculum”.

      Has now replaced the original phrasing: “Natural infected mosquitoes by gametocyte carriers in Burkina Faso show comparable correlation between oocyst density, salivary gland density and sporozoite inoculum”.

      I do not believe the study results generate sufficient evidence to conclude that lower infection burden in mosquitoes is likely to result in changes to transmission potential in the field. In the study limitations section, the authors say "In addition, our quantification of sporozoite inoculum size is informative for comparisons between groups of high and low-infected mosquitoes but does not provide conclusive evidence on the likelihood of achieving secondary infections. Given striking differences in sporozoite burden between different Plasmodium species - low sporozoite densities appear considerably more common in mosquitoes infected with P. yoelii and P. berghei - the association between sporozoite inoculum and the likelihood of achieving secondary infections may be best examined in controlled human infection studies." However, in the abstract conclusion the authors state "Whilst sporozoite expelling was regularly observed from mosquitoes with low infection burdens, our findings indicate that mosquito infection burden is associated with the number of expelled sporozoites and may need to be considered in estimations of transmission potential." Kindly consider ending the sentence at "expelled sporozoites." Future studies on CHMI can be recommended as a conclusion if the authors see fit.

      We agree that we need to be very cautious with conclusions on the impact of our findings for the infectious reservoir. We have rephrased parts of our abstract and have updated the Discussion section following the reviewer suggestions. We agree with the reviewer that CHMI studies are recommended and have expanded the Discussion section to make this clearer. The sentence in the abstract now ends as:

      "Whilst sporozoite expelling was regularly observed from mosquitoes with low infection burdens, our findings indicate that mosquito infection burden is associated with the number of expelled sporozoites. Future work is required to determine the direct implications of these findings for transmission potential."

      Reviewer #1 (Recommendations For The Authors):

      • Prevalence data shown in Fig 2A and Table S1 are different. For example, >50K at Day 11, Fig 2A shows ~85% prevalence, but Table S1 says 100%. If the prevalence in Table S1 shows a proportion of observations with positive expelled sporozoites (instead of a proportion of positive mosquitoes shown in Fig 2A), then the prevalence for <1K at Day 11 cannot be 6.7% (either 0 or 20% as there were a total of 5 observations). So in either case, it is not clear why the numbers shown in Fig 2A and Table S1 are different.

      Figure 2A and Table S2 show prevalences and odds ratios estimated from an additive logistic regression model (i.e. excluding the interaction between day and sporozoite category). Table S1 includes this interaction when estimating prevalences and odds ratios and, as can be seen, some categories in the interaction were extremely small, resulting in inflated confidence intervals, especially on day 11. Table S1 and Fig 2A thus show the results of two different models. Whilst our results are thus correct, we can understand the confusion and have added a sentence to the figure/table legends to explain the model used.

      Figure 2. Extrinsic Incubation Period in high versus low infected mosquitoes. A. Total sporozoites (SPZ) per mosquito in body plus salivary glands (x-axis) were binned by infection load <1k; 1k-10k; 10k-50k; >50k and plotted against the proportion of mosquitoes (%) that were sporozoite positive (y-axis) as estimated from an additive logistic regression model with factors day and SPZ categories.

      Supplementary Table S1. The extrinsic incubation period of P. falciparum in An. stephensi estimated by quantification of sporozoites on day 9, 10, 11 by qPCR. Based on infection intensity, which was assessed by combining sporozoite densities in the mosquito body and salivary gland, mosquitoes were binned into four categories (<1k, 1k-10k, 10k-50k, >50k). Prevalences and odds ratios were estimated from a logistic regression model with factors day, SPZ category and their interaction.
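
      The difference between the two models can be illustrated with a toy calculation. The sketch below fits an additive logistic regression (day + burden category) to grouped counts by Newton-Raphson and compares the fitted prevalences with the raw cell proportions, which are what a model that also includes the day x category interaction reproduces when the two factors are fully crossed. Everything here is an assumption made for illustration: the counts are invented, the design is reduced to two days and two categories, and the fitting routine is a generic one rather than the software used for the manuscript.

```python
import math

# Hypothetical grouped data: (late_day, high_burden) -> (mosquitoes tested, sporozoite-positive).
cells = {
    (0, 0): (20, 4),
    (0, 1): (20, 12),
    (1, 0): (5, 1),
    (1, 1): (20, 19),
}

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_additive(cells, iterations=25):
    """Newton-Raphson fit of logit(p) = b0 + b1 * late_day + b2 * high_burden."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for (day, cat), (n, pos) in cells.items():
            x = [1.0, float(day), float(cat)]
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for i in range(3):
                score[i] += x[i] * (pos - n * p)
                for j in range(3):
                    info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

def additive_prevalence(beta, day, cat):
    return 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * day + beta[2] * cat)))

beta = fit_additive(cells)
fitted = {cell: additive_prevalence(beta, *cell) for cell in cells}
for cell, (n, pos) in cells.items():
    # With the interaction included, the fitted prevalence equals the cell proportion.
    print(f"cell={cell} n={n}: with interaction {pos / n:.3f}, additive {fitted[cell]:.3f}")
```

      The additive model borrows information across days and categories, so its estimate for a sparsely populated cell can differ noticeably from that cell's own proportion, while the interaction model relies on the few observations in the cell and therefore yields much wider intervals there.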

      There are 3 typos in the paper. Please fix them.

      Line 464; ...were counted using a using an incident....

      Line 473; Supplementary Figure 7 should be Fig S8.

      Line 508: ...between days 9 and 10 using a (t=-2.0467)....

      We appreciate the rigour in reviewing our text and have corrected all typos.

      Reviewer #2 (Recommendations For The Authors):

      High infection burdens may result in earlier expelling capacity in mosquitoes, which would reflect more accurately the EIP. The fact that earlier colonization of SG and correlation between SG burden and numbers expelled suggest it could be the case, but it would be interesting to directly measure the prevalence of expelling over time to directly assess the effect of the sporozoite burden (not just at day 15 but before). This could reveal how the parasite burden in mosquitoes is a determinant of transmission.

      We appreciate this suggestion and will consider this for future experiments. It adds another variable that is highly relevant but will also complicate comparisons where sporozoite expelling is related to both time since infectious blood meal and salivary gland sporozoite density (that is also dependent on time since infectious bloodmeal). Moreover, we then consider it important to measure this over the entire duration of sporozoite expelling, including late time-points post infectious bloodmeal. This may form part of a follow-up study.

      Another question is whether all sporozoites (among expelled parasites) are equally infective, i.e. susceptible to induce secondary infection. If not, this could reconcile the data of this study and previous results in the rodent model where high burdens were associated with an increased probability to transmit.

      As also indicated above, we are aware of a single study that assessed NF54 sporozoite infectivity on different days post infection (days 12-13-14-15-16-18) and observed no clear differences in ‘per sporozoite hepatocyte invasion capacity’ over this period (DOI: 10.1111/cmi.12745). We nevertheless agree that it is conceivable that sporozoites require maturation in the salivary glands and might not all be equally infectious. While hepatocyte invasion experiments are conducted with bulk harvesting of all the sporozoites that are present in the salivary glands, it would be even more interesting to assess the invasion capacity of the smaller population of sporozoites that migrate to the proboscis to be expelled. This would, as the reviewer will appreciate, be a major endeavour. To do this well, the expelled sporozoites would need to be harvested from the salivary glands/proboscis and used in the best and most natural environment for invasion. The suggested work would thus depend on the availability of primary hepatocytes, since conventional cell lines like HC-04 are likely to underestimate sporozoite invasion. Importantly, there are currently no opportunities to include the barrier of the skin environment in invasion assays, whilst this may be highly important in determining the likelihood that sporozoites manage to achieve invasion and give rise to secondary infections. In short, we agree with the reviewer that these experiments are of interest but consider them well beyond the scope of the current work. We have added a section to the Discussion to highlight these future avenues for research: ‘Of note, our assessments of EIP and of sporozoite expelling did not confirm the viability of sporozoites. Whilst the infectivity of sporozoites at different time-points post infection has been examined previously (ref), these experiments have never been conducted with individual mosquito salivary glands.
To add to this complexity, such experiments would ideally retain the skin barrier that may be a relevant determinant for invasion capacity and primary hepatocytes.’

      The authors evaluated oocyst rupture at day 18, i.e. 3 days after feeding experiments (performed at day 15). Did they check in control experiments that the prevalence of rupture oocysts does not vary between day 15 and day 18?

      We did not do this and consider it very unlikely that there is a noticeable increase in the number of ruptured oocysts between days 15 and 18. We observe that salivary gland invasion plateaus around day 12, and the provision of a second bloodmeal that is known to accelerate oocyst maturation and rupture (doi: 10.1371/journal.ppat.1009131) makes it even less likely that a relevant fraction of oocysts ruptures very late. Perhaps most compellingly, the time of oocyst rupture will depend on nutrient availability, and rupture could thus occur later for oocysts from a heavily infected gut compared to oocysts from mosquitoes with a low infection burden. We observe a very strong association between salivary gland sporozoite density (day 15) and oocyst density (assessed at day 18) without any evidence for change in the number of sporozoites per oocyst for different oocyst densities. In our revised manuscript we have also assessed correlations for different ranges of oocyst intensities and see highly consistent correlation coefficients and find no evidence for a change in ‘slope’. If oocyst rupture regularly happened between days 15 and 18 and this late rupture were more common in heavily infected mosquitoes, we would expect this to affect the associations presented in figures 3B and 4C. This is not the case.

      The authors report higher sporozoite numbers per oocyst and a higher proportion of SG invasion as compared to previous studies (30-50% rather than 20%). How do they explain these differences? Is it due to the detection method and/or second blood meal? Or parasite species?

      We were also intrigued by these findings in light of existing literature. To address potential discrepancies, it is indeed possible that the 2nd bloodmeal made a difference. In addition, NF54 is known to be a highly efficient parasite in terms of gametocyte formation and transmission. And there are marked differences in these performances between NF54 isolates and definitely between NF54 and its clone 3D7 that is regularly used. We also used a molecular assay to detect and quantify sporozoites but consider it less likely that this is a major factor in terms of explaining SG invasion since sporozoite densities were typically within the range that would be detected by microscopy. We can only hypothesize that the 2nd bloodmeal may have contributed to these findings and acknowledge this in the revised Discussion section.

      The median numbers of expelled sporozoites seem to be higher in the natural gametocyte infection experiments as compared to the cultures. Is it due to the mosquito species (An. coluzzii versus An. stephensi?).

      The added value of our field experiments, a more relevant mosquito species and more relevant parasite isolates, is also a weakness in terms of understanding possible differences between in vitro experiments and field experiments with naturally circulating parasite strains. We only conclude that our in vitro experiments do not over-estimate sporozoite expelling by using a highly receptive mosquito source and artificially high gametocyte densities. We have clarified this in the revised Discussion.

      39% of sporozoite-positive mosquitoes failed to expel, irrespective of infection densities. Could the authors discuss possible explanations for this observation?

      In paragraph 304-307 we now write: “This finding broadly aligns with an earlier study of Medica and Sinnis that reported that 22% of P. yoelii infected mosquitoes failed to expel sporozoites. For highly infected mosquitoes, this inefficient expelling has been related to a decrease of apyrase in the mosquito saliva.”

      In Figure 3, it would be interesting to zoom in the 0-1k window, below the apparent threshold for successful expelling.

      We have generated correlation estimates for different ranges of oocyst and sporozoite densities and added these in Supplementary Table 5. We agree that this helps the reader to appreciate the contribution of different ranges of parasite burden to the observed associations.
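      The range-restricted correlations described above can be sketched as follows. This is a minimal illustration only, assuming paired per-mosquito measurements held in two plain lists; the function names (`average_ranks`, `spearman`, `spearman_within_range`) and any cut-off values passed to them are hypothetical and are not taken from the authors' analysis code.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; None if either variable has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_within_range(x, y, lo, hi):
    """Spearman's rho using only pairs with lo <= x < hi (hypothetical cut-offs)."""
    pairs = [(a, b) for a, b in zip(x, y) if lo <= a < hi]
    if len(pairs) < 3:
        return None  # too few observations for a meaningful estimate
    xs, ys = zip(*pairs)
    return spearman(list(xs), list(ys))
```

      Calling `spearman_within_range` once per density band yields one coefficient per band; confidence intervals would need an additional step (for example bootstrapping) that is not shown here.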

      In Fig S8. Did they observe intact oocysts with fixed samples? These could be shown as well in the figure.

      We have incorporated this comment. An intact oocyst from fixed samples has now been added to Fig S10.

      Minor points

      • line 119: LOD and LOQ could be defined here.

      We agree that this should have been defined. We changed line 119 to define LOD and LOQ: “…the limit of detection (LOD) and limit of quantification (LOQ)…”.

      • line 126: the title does not reflect the content of this paragraph.

      We have changed the title from “Immunolabeling allows quantification of ruptured oocysts” to “A comparative analysis of oocyst densities using mercurochrome staining and anti-CSP immunostaining”.

      • line 269: infectivity is not appropriate. The data show colonization of SG.

      Line 269: “infectivity” has been replaced with “colonization of salivary glands”.

      There seems to be a problem with Fig S6. The graph seems to be the same as Fig 3C. Please check whether the graph and legends are correct.

      Supplementary Figure 6 shows the sporozoite expelling density in relation to infection burden with a threshold set at > 20 sporozoites, while Fig 3C shows the total sporozoite density (residual salivary gland sporozoites + sporozoites expelled, X-axis) in relation to the number of expelled sporozoites (Y-axis) by COX-1 qPCR without any threshold density. We have explained this in more detail in the revised supplemental figure, where we now state:

      “Of note, this figure differs from Figure 3C in the main text in the following manner. This figure presents sporozoite expelling density in relation to infection burden with a threshold set at > 20 sporozoites to conclude sporozoite positivity while Figure 3C shows the total sporozoite density (residual salivary gland sporozoites + sporozoites expelled, X-axis) in relation to the number of expelled sporozoites (Y-axis) by COX-1 qPCR without any threshold density and thus includes all observations with a qPCR signal”

      Reviewer #3 (Recommendations For The Authors):

      Congratulations to the authors for the really excellently designed and rigorously conducted studies.

      My main concern is in regards to the relatively high oocyst numbers in their experimental mosquitoes (from both sources of gametocytes) compared to what has been reported from wild-caught mosquitoes in previous studies in Burkina Faso.

      We have addressed this concern above. For completeness, we include the main points here again. We enriched gametocytes for efficiency reasons; experiments on gametocytes at physiological concentrations would have resulted in a lower oocyst density (and thus a more ‘natural’ one), although a minority of individuals achieves very high oocyst densities in all studies that included a broad range of oocyst densities (e.g. doi: 10.1016/j.exppara.2014.12.010; doi: 10.1016/S1473-3099(18)30044-6). Of note, we did include 15 skins from mosquitoes with low oocyst densities (1-4 oocysts). Whilst low oocyst densities were thus not very uncommon in our sample set, we acknowledge that this may have rendered some comparisons underpowered. At the same time, we observe a strong positive trend between oocyst density and sporozoite density and between salivary gland sporozoite density and mosquito inoculum. This makes it very likely that this trend is also present at lower oocyst densities. An association where sporozoite inoculation saturates at high densities is plausible and has been observed before for rodent malaria (DOI: 10.1371/journal.ppat.1008181), whilst we consider it less likely that sporozoite expelling would be more efficient at low (unmeasured) sporozoite densities. In the revised manuscript we have also performed our analysis including only the subset of mosquitoes with low oocyst burden.

      The best way to address this would be to do comparable artificial skin-feeding experiments on such wild-caught mosquitoes, but I appreciate that this is very difficult to do.

      This would indeed be difficult to do, mostly because infection status can only be examined post-hoc and it is likely that >95% of mosquitoes are sporozoite negative at the moment experiments are conducted (in many settings this will even be >99%). Importantly, also in wild-caught mosquitoes very high oocyst burdens are observed in a small but relevant subset of mosquitoes (doi: 10.1016/j.ijpara.2020.05.012).

      Instead, I would suggest the authors conduct additional analyses of their data using different cut-offs for maximum oocyst numbers (e.g. <5, <10, <20) to determine if these correlations hold across the entire range of observed oocyst numbers and salivary gland sporozoite load.

      We have provided these calculations for the proposed range of oocyst numbers. In addition, we also provided them for a range of sporozoite densities. These findings are now provided in

      A minor point on the regression lines in Figures 3 & 4: both variables in these plots have inherent variation (measurement & natural), so regression techniques such as reduced major axis (RMA) regression that allow error in both x and y variables may be preferable to a standard linear regression. Also, as it is implausible that mosquitoes with zero sporozoites in salivary glands expel several hundred sporozoites at feeding, the regression should probably also be constrained to pass through the 0,0 point.

      Since the main priority of the analyses is the correlation, and not the fit of the regression line – which is only for indication, and also because of the availability of software, we did not change the type of regression. We have however added a disclaimer to the legend, and we have also forced the intercept to 0 – which does indeed better reflect the biological association. Additionally we added 95% confidence intervals to all Spearman’s correlation coefficients in the legends.
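      For readers unfamiliar with the two fitting options discussed in this exchange, the sketch below contrasts them. It is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the inputs are plain lists of paired values, and the reduced major axis (RMA) slope uses the textbook form sign(r) · sd(y) / sd(x).

```python
from statistics import mean, stdev

def slope_through_origin(x, y):
    """Ordinary least-squares slope for y = b * x, intercept fixed at 0."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def rma_slope_intercept(x, y):
    """Reduced major axis fit, which treats x and y as both measured with error.

    The slope is sign(r) * sd(y) / sd(x); the line passes through the
    point of means, which gives the intercept.
    """
    mx, my = mean(x), mean(y)
    # the sign of the covariance equals the sign of the correlation
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = 1.0 if sxy >= 0 else -1.0
    slope = sign * stdev(y) / stdev(x)
    return slope, my - slope * mx
```

      The through-origin fit encodes the biological constraint that zero salivary gland sporozoites implies zero expelled sporozoites, whereas the RMA fit addresses error in both variables; the two constraints are separate choices and are shown separately here.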

    1. Author Response

      Reviewer #1 (Public Review):

      The authors propose a hypothesis for ovarian carcinogenesis based on epidemiological data, and more specifically they suggest that the latter relates to ascending genital tract "infection" or "dysbiosis", the resulting fallopian tube inflammation ultimately predisposing to ovarian cancer.

      While this hypothesis would ideally be addressed in a longitudinal set-up with repeated female genital tract sampling, such an approach is obviously hard to realize. Rather, the authors present this hypothesis as a rationale for a cross-sectional study involving 81 patients with ovarian cancer (most with the most common subtype of high grade serous ovarian carcinoma, though other subtypes were also included), as well as 106 control patients with various non-infectious conditions including endometriosis and benign ovarian cysts. In all patients there was comprehensive microbiome sampling of ovarian surface/fallopian tube, cervix and peritoneal cavity, as well as sampling of a number of potential sources of contamination, including surgery sites, ambient environment, consumables used in the DNA extraction and sequencing pipeline, etc. In line with the hypothesis presented at the outset, species with a threshold of at least 100 reads in both at least one cervical and at least one fallopian tube sample, while absent from environmental swabs, were considered relevant to the postulated pathway.
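      The selection rule summarised in the paragraph above can be expressed as a short filter. This is a minimal sketch assuming per-species read counts grouped by sampling site; the data layout, the site keys and the function name `candidate_species` are hypothetical, and "absent from environmental swabs" is interpreted here as zero reads in every environmental sample, which is an assumption rather than something the text specifies.

```python
def candidate_species(counts, min_reads=100):
    """Select species meeting the read-count rule.

    counts maps species name -> {"cervix": [...], "fallopian_tube": [...],
    "environment": [...]}, each list holding read counts per sample.
    """
    keep = []
    for species, by_site in counts.items():
        # at least one cervical sample reaches the read threshold
        in_cervix = any(n >= min_reads for n in by_site.get("cervix", []))
        # at least one fallopian tube sample reaches the read threshold
        in_ft = any(n >= min_reads for n in by_site.get("fallopian_tube", []))
        # assumption: any read in any environmental swab counts as present
        in_env = any(n > 0 for n in by_site.get("environment", []))
        if in_cervix and in_ft and not in_env:
            keep.append(species)
    return sorted(keep)
```

      A species with high read counts at both anatomical sites is still excluded as soon as a single environmental swab contains it, which reflects the contamination control emphasised in the review.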

      Remarkably, fallopian tube microbiota in ovarian cancer patients tended to cluster more closely to those retrieved from the paracolic gutter, than fallopian tube microbiota in non-cancer controls, which showed more relative similarity to vaginal/genital tract microbiota.

      Although not really addressed by the authors, there also seem to be quite a few differences, at least in terms of abundance, in cervical microbiota between ovarian cancer patients and controls as well, which is an interesting finding, even when accounting for differences in age distribution between ovarian cancer patients and included control patients.

      Overall, very few data are available thus far on the upper genital tract/fallopian tube microbiome, and those that exist are invariably controversial, as it has proven extremely difficult to obtain pelvic samples in a valid, "sterile" manner, i.e. without affecting the resident low-biomass microbiome to be analyzed. The authors took a number of measures to counter this, and in this respect, this is likely the largest and most valid study on the subject, even though biases and contamination can never be completely excluded in this context.

      As such, I believe the strength of this study and paper primarily relates to the rigour of the methodology, thereby giving us a valuable insight in the presumed fallopian tube/ovarian surface microbiome, which may definitely serve as an impetus and a reference to future translational ovarian cancer research, or ovarian microbiome research for that matter.

      I believe that the authors should acknowledge in more detail that the data obtained from their cross-sectional study, valid as these are, do not provide any direct support to the hypothesis - albeit also plausible - set forth, a discussion that I somehow missed to a certain extent. It is important to realize in this and related contexts that neoplasia may well induce microbiome alterations through a variety of mechanisms, hence microbiome alterations are not per se causative. Conclusions should therefore be more reserved. Along the same lines, potential biases introduced through the selection of control patients (some detail here would be insightful) also deserve some discussion, as it is not known whether other conditions such as benign ovarian cysts or endometriosis have some relationship with the human microbiome, be it causative or 'reversely causative', see for instance very recent work in Science Translational Medicine.

      We appreciate the reviewer’s detailed review and thoughtful comments. We have added the following sentences in the Discussion to address the reviewer’s concern: “Due to the cross-sectional nature of the study, we have limited ability to link specific bacteria to ovarian carcinogenesis, as we would need to demonstrate that exposure to bacteria precedes the cancer. However, identifying associations between FT microbiota and OC is a critical first step. Further investigations, especially backed by in vitro studies, are needed to test our initial hypotheses.”

      Reviewer #2 (Public Review):

      The authors aimed to investigate the microbiota present in the fallopian tubes (FT) and its potential association with ovarian cancer (OC). They collected swabs intraoperatively from the FT and other surgical sites as controls to profile the FT microbiota and assess its relationship with OC.

      They observed a clear shift in the FT microbiota of OC patients compared to non-cancer patients. Specifically, the FT of OC patients had more types of bacteria typically found in the gastrointestinal tract and the mouth. In contrast, vaginal bacterial species were more prevalent in non-cancer patients. Serous carcinoma, the most common OC subtype, showed a higher prevalence of almost all FT bacterial species compared to other OC subtypes.

      The strengths of the study include its large sample size, rigorous collection methods, and use of controls to identify the possible contaminants. Additionally, the study employed advanced sequencing techniques for microbiota analysis. However, there are some weaknesses to consider. The study relied on swabs collected intraoperatively, which may not fully represent the microbiota in the FT during normal physiological conditions. The study also did not establish causality between the identified bacteria and OC but rather demonstrated an association. Regardless, the findings are important and these questions need to be addressed by future studies. A few additions in data representation and analysis are instead recommended.

      Overall, the authors achieved their aims of identifying the FT microbiota and assessing its relationship with OC. The results support the conclusion that there is a clear shift in the FT microbiota in OC patients, paving the way for further investigations into the role of these bacteria in the pathogenesis of ovarian cancer.

      The identification of specific bacterial species associated with OC could contribute to the development of novel diagnostic and therapeutic approaches. The study design and the data generated here can be valuable to the research community studying the microbiota and its impact on cancer development. However, further research is needed to validate these findings and elucidate the underlying mechanisms linking the FT microbiota shift and OC.

      We appreciate the reviewer’s detailed review and positive comments.

      Reviewer #3 (Public Review):

      The findings of Bo Yu and colleagues titled "Identification of fallopian tube microbiota and its association with ovarian cancer: a prospective study of intraoperative swab collections from 187 patients" describe the identification of the fallopian tube microbiome and its relationship with ovarian cancer. The studies are highly rigorous, obtaining specimens from the fallopian tube, ovarian surfaces and paracolic gutter of patients with known or suspected ovarian cancer or benign tumors. The investigators took great care to ensure there was no or limited contamination, including testing the surgical suite air, as the test locations are from low abundance microbiota. The findings provide evidence that the microbiota in the fallopian tube, especially in ovarian cancer, has similarities to gut microbial communities. This is a potentially novel observation.

      The studies investigate the microbiome of >1000 swabs from 81 ovarian cancer and 106 non-cancer patients. The sites collected are low biomass microbiota, making the study particularly challenging. The studies provide descriptive evidence that the ovarian cancer fallopian tube microbiota contain species that are similar to the gut microbiota. In contrast, the fallopian tube microbiota of non-cancer patients exhibit more similarity to the uterine/cervical microbiota. This may be a relevant observation but is highly descriptive with limited insights on the functional relevance.

      The data indicate the presence of low biomass FT microbiota. The findings support the existence of FT microbiota in ovarian cancer that appears to be related to gut microbial species. While interesting, there are no insights on how and why these microbial species are found in the FT. The studies only identify the species, but there is no transcriptomic analysis to provide an indication on whether the bacteria are activating DNA damage pathways. This is an interesting observation that requires more insights to address how these bacteria reach the fallopian tube, and a related question is whether these bacteria are found in the peritoneum.

      An additional concern is whether these data can be used to develop biomarkers of disease and early detection of disease. Can the investigators detect the ovarian cancer FT microbiota in cervical/vaginal secretions? That may yield more significant insights for the field.

      We appreciate the reviewer’s detailed review and thoughtful comments. We have added the following sentences in the Discussion to acknowledge the reviewer’s concern: “Due to the cross-sectional nature of the study, we have limited ability to link specific bacteria to ovarian carcinogenesis, as we would need to demonstrate that exposure to bacteria precedes the cancer. However, identifying associations between FT microbiota and OC is a critical first step. Further investigations, especially backed by in vitro studies, are needed to test our initial hypotheses.”

      Reviewer #1 (Recommendations For The Authors):

      I have no additional comments here.

      Reviewer #2 (Recommendations For The Authors):

      The data analysis and data representation could be improved by the following points:

      1. To compare the microbiota and assess the overall microbiota structure difference between the cancer vs non cancer cohort alpha- and beta-diversity of the microbial communities can be conducted.

      2. A differential abundance analysis could also be conducted to assess the differences at the genera and taxa level between the cancer vs non cancer cohorts.

      3. The analysis suggested above can also be conducted in the serous vs non serous cancer cohorts.

      4. In Figure 4 and 5 it would be more intuitive to show the predominant niche of each bacterium by color coding

      We appreciate these helpful suggestions from the reviewer. We have added Figure 2B to address the diversity as well as the differences between cancer versus non-cancer cohorts. We have added in the Results section the description of our findings in Figure 2B. We have added color coding to Figure 4 and 5 as the reviewer suggested.

      Reviewer #3 (Recommendations For The Authors):

      These studies are interesting but are very descriptive with no obvious approaches for understanding the mechanisms of FT microbiota in ovarian cancer. The identification of these bacteria is not sufficient to draw implications on their impact on ovarian cancer development or progression. This needs to be addressed.

      We agree with the reviewer and have added the following sentences in the Discussion to acknowledge the reviewer’s concern: “Due to the cross-sectional nature of the study, we have limited ability to link specific bacteria to ovarian carcinogenesis, as we would need to demonstrate that exposure to bacteria precedes the cancer. However, identifying associations between FT microbiota and OC is a critical first step. Further investigations, especially backed by in vitro studies, are needed to test our initial hypotheses.”

    1. Author Response

      Responses to public reviews

      Reviewer 1

      We thank the reviewer for the valuable and constructive comments and are pleased that the reviewer finds our study timely and our behavioral results clear.

      1) The RSA basically asks on the lowest level, whether neural activation patterns (as measured by EEG) are more similar between linked events compared to non-linked events. At least this is the first question that should be asked. However, on page 11 the authors state: "We examined insight-induced effects on neural representations for linked events [...]". Hence, the critical analysis reported in the manuscript fully ignores the non-linked events and their neural activation patterns. However, the non-linked events are a critical control. If the reported effects do not differ between linked and non-linked events, there is no way to claim that the effects are due to experimental manipulation - neither imagination nor observation. Hence, instead of immediately reporting on group differences (sham vs. control) in a two-way interaction (pre vs. post X imagination vs. observation), the authors should check (and report) first, whether the critical experimental manipulation had any effect on the similarity of neural activation patterns in the first place.

      We completely agree that the non-link items are a critical control. Therefore, we had reported not only the results for linked but also for non-linked events on page 15, lines 336-350. We clarified this important point now on page 12 lines 283-286:

      “Subsequently, we examined insight-induced effects on neural representations for linked (vs. non-linked) events by comparing the change from pre- to post-insight (post-pre) and the difference between imagination and observation (imagination - observation) between cTBS and sham groups using an independent cluster-based permutation t-test.”

      Moreover, to directly compare linked and non-linked events we performed a four-way interaction including link vs. non-link. This analysis yielded a significant four-way interaction, showing that the interaction of time (pre vs. post), mode of insight (imagination vs. observation) and cTBS differed for linked vs. non-linked items. We then report the follow-up analyses, separately for linked and non-linked events. Please see pages 12-13, lines 287-294:

      “First, we included the within-subject factors time (pre vs. post), mode of insight (imagination vs. observation) and link (vs. non-link) by calculating the difference waves. Subsequently we conducted a cluster-based permutation test comparing the cTBS and the sham groups. This analysis yielded a four-way interaction within a negative cluster in a fronto-temporal region (electrode: FT7; p = 0.007, ci-range = 0.00, SD = 0.00). This result indicates that the impact of cTBS over the angular gyrus on the neural pattern reconfiguration following imagination- vs. observation-based insight may differ between linked and non-linked events. For linked events, this analysis yielded a […]”
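      The logic of collapsing the within-subject factors into a single difference score and then comparing groups can be sketched as below. This is a deliberately simplified illustration: it computes one scalar contrast per participant and applies a plain two-sided permutation test on group means, not the cluster-based permutation test over time × time maps and electrodes that the quoted passage describes. All names (`interaction_score`, `permutation_p`, the dictionary keys) are hypothetical.

```python
import random
from statistics import mean

def interaction_score(sim):
    """Collapse time x mode x link into one contrast for a participant.

    sim maps (time, mode, link) -> a similarity value, with
    time in {"pre", "post"}, mode in {"imagination", "observation"},
    link in {"link", "nonlink"}.
    """
    def pre_post(mode, link):
        # change in similarity from pre- to post-insight
        return sim[("post", mode, link)] - sim[("pre", mode, link)]

    def mode_diff(link):
        # imagination minus observation, applied to the pre/post change
        return pre_post("imagination", link) - pre_post("observation", link)

    # linked minus non-linked completes the within-subject contrast
    return mode_diff("link") - mode_diff("nonlink")

def permutation_p(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of group membership
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)
```

      Comparing the per-participant scores of the two stimulation groups with `permutation_p` then corresponds to testing the group factor on top of the three within-subject factors, which is what makes the overall contrast four-way.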

      2) Overall, the focus on the targeted three-way interaction is poorly motivated. Also, a functional interpretation is largely missing.

      In order to better explain our motivation for the three-way interaction, we emphasized in the introduction the importance of disentangling potential differences due to the mode of insight, given the known role of the angular gyrus in imagination on pages 4-5, lines 107-115:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to interfere with the increase in neural similarity for linked events and with the decrease of neural similarity for non-linked event.”

      As discussed on page 21 (starting from line 478; see also the intro on page 4), we expected that the angular gyrus would be particularly implicated in imagination-based insight, given its known role in imagination (e.g.: Thakral et al., 2017). Moreover, given the angular gyrus’s strong connectivity with other regions, the results observed may not be driven by this region alone but also by interconnected regions, such as the hippocampus. We clarified these important points at the very end of the discussion on pages 23-24, lines 543-560:

      “Furthermore, the differential impact of cTBS to the angular gyrus on neural reconfigurations between events linked via imagination and those linked via observation may be attributed to its crucial role in imaginative processes (Ramanan et al., 2018; Thakral et al., 2017). Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014).”

      3) "Interestingly, we observed a different pattern of insight-related representational pattern changes for non-linked events." It is not sufficient to demonstrate that a given effect is present in one condition (linked events) but not the other (non-linked events). To claim that there are actually different patterns, the authors would need to compare the critical conditions directly (Nieuwenhuis et al., 2011).

      We completely agree and have now compared the two conditions directly. Specifically, we now report the significant four-way interaction, including the factor link vs. non-link, before delving into separate analyses for linked and non-linked events on pages 12-13, lines 287-294:

      “First, we included the within-subject factors time (pre vs. post), mode of insight (imagination vs. observation) and link (vs. non-link) by calculating the difference waves. Subsequently we conducted a cluster-based permutation test comparing the cTBS and the sham groups. This analysis yielded a four-way interaction within a negative cluster in a fronto-temporal region (electrode: FT7; p = 0.007, ci-range = 0.00, SD = 0.00). This result indicates that the impact of cTBS over the angular gyrus on the neural pattern reconfiguration following imagination- vs. observation-based insight may differ between linked and non-linked events. For linked events, this analysis yielded a […]”
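
      The logic of collapsing the within-subject factors into a single difference score per participant and then comparing the two groups by permutation can be sketched for a single electrode and time window as follows. This is a simplified illustration only, not the analysis code used in the study: it omits the clustering across electrodes and time points, and the function names, factor labels and data layout are assumptions made for the example.

```python
import random
from statistics import mean


def interaction_score(cell):
    """Collapse link x mode x time into one score per participant.

    `cell` maps (link, mode, phase) -> similarity value, where
    link is "linked"/"nonlinked", mode is "imag"/"obs" and
    phase is "pre"/"post" (labels are hypothetical).
    """
    def change(link, mode):
        # pre-to-post change for one link/mode combination
        return cell[(link, mode, "post")] - cell[(link, mode, "pre")]

    linked = change("linked", "imag") - change("linked", "obs")
    nonlinked = change("nonlinked", "imag") - change("nonlinked", "obs")
    return linked - nonlinked


def permutation_p(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # add-one correction so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

      Comparing the per-participant scores of the cTBS and sham groups in this way tests the group × time × mode × link interaction in one step, which is the direct comparison the reviewer asked for.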

      4) "This analysis yielded a negative cluster (p = 0.032, ci-range = 0.00, SD = 0.00) in the parieto-temporal region (electrodes: T7, Tp7, P7; Fig. 3B)." (p. 11). The authors report results with specificity for certain topographical locations. However, this is in stark contrast to the fact that the authors derived time X time RSA maps.

      We did derive time × time similarity maps for each electrode within each participant, which allowed us to find a cluster consisting of specific electrodes. We apologize for not making this aspect clear enough and have, therefore, modified the respective part of our methods section on page 38, lines 951-952:

      “In total, this analysis produced eight Representational Dissimilarity Matrices (RDMs) for each electrode and each participant.”

      5) "These theta power values were then combined to create representational feature vectors, which consisted of the power values for four frequencies (4-7 Hz) × 41 time points (0-2 seconds) × 64 electrodes. We then calculated Pearson's correlations to compare the power patterns across theta frequency between the time points of linked events (A with B), as well as between the time points of non-linked events (A with X) for the pre- and the post-phase separately, separately for stories linked via imagination and via observation. To ensure unbiased results, we took precautions not to correlate the same combination of stories twice, which prevented potential inflation of the data. To facilitate statistical comparisons, we applied a Fisher z-transform to the Pearson's rho values at each time point. This yielded a global measure of similarity on each electrode site. We, thus, obtained time × time similarity maps for the linked events (A and B) and the non-linked events (A and X) in the pre- and post-phases, separately for the insight gained through imagination and observation." (p. 34+35).

      If RSA values were calculated at each time point and electrode, the Pearson correlations would have been computed effectively between four samples only, which is by far not enough to derive reliable estimates (Schönbrodt & Perugini, 2013). The problem is aggravated by the fact that due to the time and frequency smoothing inherent in the time-frequency decomposition of the EEG data, nearby power values across neighboring theta frequencies are highly similar to start with (e.g., Schönauer et al., 2017; Sommer et al., 2022).

      Alternative approaches would be to run the correlations across time for each electrode (resulting in the elimination of the time dimension) or to run the correlations at each time point across electrodes (resulting in the elimination of topographic specificity).

      At least, the authors should show raw RSA maps for linked and non-linked events in the pre- and post-phases separately for the insight gained through imagination and observation in each group, to allow for assessing the suitability of the input data (in the supplements?) before progressing to reporting the results of three-way interactions.

      Although we do see the reviewer’s point, we think that an RSA specific to the theta range yielding electrode-specific time × time similarity maps must be run this way; otherwise, as the reviewer pointed out, one or the other dimension is compromised. Running an RSA across time for each electrode will lead to computing a similarity measure between the events without information on when these stimuli become more or less similar, thereby ignoring the temporal dynamics crucial to EEG data and not taking advantage of the high temporal resolution. Conversely, conducting an RSA across electrodes might result in an overall similarity measure per participant, disregarding the spatial distribution and potential variations among electrodes. Although EEG has limited spatial resolution, different electrodes can capture differences that may aid in understanding neural processing. However, as suggested by the reviewer, we included the raw RSA maps for linked and non-linked events separately for pre- and post-phases, imagination and observation and link and non-link in the supplement and refer to these data in the results section on pages 12-13, lines 293-295:

      “For linked events, this analysis yielded a negative cluster (p = 0.032, ci-range = 0.00, SD = 0.00) in the parieto-temporal region (electrodes: T7, Tp7, P7; Fig. 3B; Figure 3 – Figure supplement 1).”

      And on page 15, lines 339-341:

      “This analysis yielded a positive cluster (p = 0.035, ci-range = 0.00, SD = 0.00) in a fronto-temporal region (electrode: FT7; Fig. 3C; Figure 3 – Figure supplement 2).”
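
      To make the computation discussed under this point concrete, the per-electrode time × time similarity map described in the quoted methods passage can be sketched as below. This is an illustration under stated assumptions, not the analysis code used in the study: it assumes, as the reviewer infers, that each correlation is computed over the four theta-frequency power values at one time point of each event, and the function names, the clipping constant `eps` and the condition labels are hypothetical.

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fisher_z(r, eps=1e-7):
    """Fisher z-transform; r is clipped so that |r| = 1 stays finite."""
    r = max(-1 + eps, min(1 - eps, r))
    return math.atanh(r)


def similarity_map(power_a, power_b):
    """Time x time similarity map for one electrode.

    power_a[t] and power_b[t] hold the theta power values
    (one per frequency, e.g. 4-7 Hz) at time point t of each event.
    """
    return [[fisher_z(pearson(fa, fb)) for fb in power_b] for fa in power_a]


# One map per design cell: 2 (link) x 2 (phase) x 2 (mode) = 8 per electrode
design_cells = list(product(("linked", "non-linked"),
                            ("pre", "post"),
                            ("imagination", "observation")))
```

      With four frequencies per vector, each entry of the map rests on four paired values, which is the sample-size concern the reviewer raises; the sketch shows where that number comes from rather than resolving the concern.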

      Reviewer 2

      We thank the reviewer for the very helpful and constructive comments and appreciate that the reviewer finds our study relevant to all areas of cognitive research.

      1) While the observed memory reconfiguration/changes are attributed to the angular gyrus in this study, it remains unclear whether these effects are solely a result of the AG's role in reconfiguration processes or to what extent the hippocampus might also mediate these memory effects (e.g., Tambini et al., 2018; Hermiller et al., 2019).

      We agree that, in addition to the critical role of the angular gyrus, there may be an involvement of the hippocampus. We now point explicitly to the modulatory capacities of angular gyrus stimulation on the hippocampus. Please see page 4, lines 81-88:

      “One promising candidate that may contribute to insight-driven memory reconfiguration is the angular gyrus. The angular gyrus has extensive structural and functional connections to many other brain regions (Petit et al., 2023), including the hippocampus (Coughlan et al., 2023; Uddin et al., 2010). Accordingly, previous studies have shown that stimulation of the angular gyrus resulted in altered hippocampal activity (Thakral et al., 2020; Wang et al., 2014). Furthermore, the angular gyrus has been implicated in a myriad of cognitive functions, including mental arithmetic, visuospatial processing, inhibitory control, and theory-of-mind (Cattaneo et al., 2009; Grabner et al., 2009; Lewis et al., 2019; Schurz et al., 2014).”

      We further added a new paragraph to the discussion pointing at the possibility that not solely the angular gyrus but another brain region, such as the hippocampus, may have mediated the changes observed in our study on pages 23-24, lines 546-562:

      “Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      2) Another weakness in this manuscript is the use of different groups of participants for the key TMS intervention, along with underspecified or incomplete hypotheses/predictions.

      In our view, the chosen between-subjects design is to be preferred over a crossover design for several reasons. First, our choice aimed to eliminate potential sequence effects that may have adversely affected performance in the narrative-insight task (NIT). Second, this approach ensured consistency in expectations regarding the story links while also mitigating potential differences induced by fatigue. Additionally, we accounted for the potential advantage of a within-subject design – the stimulation of the same brain – by utilizing neuro-navigated TMS for targeting the stimulation coordinate. Finally, it is important to note that we measured the event representations pre- and post-insight and that also the mode of insight was manipulated within-subject. Thus, our design did include a within-subject component and we are convinced that the chosen paradigm balances the different strengths and weaknesses of within-subject and between-subjects designs in the best possible manner. We specified our rationale for choosing a between-subjects approach in the introduction on page 5, lines 122-126:

      “We intentionally adopted a mixed design, combining both between-subjects and within-subject methodologies. The between-subjects approach was chosen to minimize the risk of carry-over effects and sequence biases. Simultaneously, we capitalized on the advantages of a within-subject design by altering the pre- to post-insight comparison and the mode of insight (imagination vs. observation) within each participant.”

      Moreover, to provide a comprehensive portrayal of the two groups, we incorporated descriptions concerning trait and state variables alongside age and motor thresholds and included t-test comparisons between these variables on page 7, lines 157-160:

      “Notably, the groups did not differ on levels of subjective chronic stress (TICS), state and trait anxiety (STAI-S, STAI-T), depressive mood (BDI), imaginative capacities (FFIS), personality dimensions (BFI), age, and motor thresholds (for descriptive statistics see Table 1; all p > 0.053).”

      And further included age and motor thresholds as control variables in Table 1 on page 18, lines 402-404:

      “Overall, levels of subjective chronic stress, anxiety, and depressive mood were relatively low and not different between groups. The groups did further not differ in terms of personality traits, imagination capacity, age or motor thresholds (all p > 0.053; see Table 1).”

      For greater precision in outlining our hypotheses, we specified these at the end of the introduction on pages 4-5, lines 107-118:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to interfere with the increase in neural similarity for linked events and with the decrease of neural similarity for non-linked events. We further predicted that cTBS to the left angular gyrus would reduce the impact of (imagination-based) insight into the link of initially unrelated events on memory performance during free recall, given its higher variability compared to other memory measures.”

      3) Furthermore, in some instances, the types of analyses used do not appear to be suitable for addressing the questions posed by the current study, and there is limited explanation provided for the choice of analyses and questionnaires.

      We addressed this concern by inserting a new section “control variables” in the methods explaining our rationale for employing the different questionnaires as control variables on pages 40-41, lines 1003-1019:

      “Control variables. In order to ensure that the observed effects were solely attributable to the TMS manipulation and not influenced by other factors, we comprehensively evaluated several trait and state variables. To account for potential variations in anxiety levels that could impact our results, we specifically measured state and trait anxiety using STAI-S and STAI-T (Laux et al., 1981), thus minimizing the potential confounding effects of anxiety on our findings (Charpentier et al., 2021). Additionally, we evaluated participants’ chronic stress levels using the TICS (Schulz & Schlotz, 1999) to exclude any group variations that might explain the effect on memory, considering the well-established impact of stress on memory (Sandi & Pinelo-Nava, 2007; Schwabe et al., 2012). Moreover, we assessed participants’ depressive symptoms employing the BDI (Hautzinger et al., 2006), to guarantee group comparability on this clinical measure. We further assessed fundamental personality dimensions using the BFI-2 (Danner et al., 2016) to exclude any potential group discrepancies that could account for differences observed. Lastly, we assessed participants’ imaginative capacities using the FFIS (Zabelina & Condon, 2019), to ensure uniformity across groups regarding this central variable, considering the significant role of imagination in relation to the cTBS-targeted angular gyrus (Thakral et al., 2017).”

      We further specified why we chose to analyze our behavioral data using LMMs on page 34, lines 849-851:

      “For our behavioral analyses we opted to employ linear-mixed models (LMM), given their high robustness regarding the underlying distribution and high sensitivity to individual variation (Pinheiro & Bates, 2000; Schielzeth et al., 2020).”

      Moreover, we added an explanation on why we opted for the RSA approach in the methods section on page 37, lines 920-923:

      “This method is ideally suited to measure neural representation changes and was specifically chosen as it has been previously identified as the preferred approach for quantifying insight-induced neural changes (Grob et al., 2023b; Milivojevic et al., 2015).”

      To clarify the rationale behind our coherence analysis, we incorporated an explanatory sentence in the methods section on page 39, lines 966-967:

      “Due to the robust connectivity between the angular gyrus and other brain regions (Petit et al., 2023; Seghier, 2013), we proceeded with a connectivity analysis as a next step.”

      Reviewer 3

      We thank the reviewer for the constructive and very helpful comments. We are pleased that the reviewer considered our experimental design to be strong and our behavioral results to be striking.

      1) My major criticism relates to the main claim of the paper regarding causality between the angular gyrus and the authors' behavior of interest. Specifically, I am not convinced by the evidence that the effects of stimulation noted in the paper are attributable specifically to the angular gyrus, and not other regions/networks.

      While our results showed specific changes after cTBS over the angular gyrus, demonstrating a causal involvement of the angular gyrus in these effects, we completely agree that this does not rule out an involvement of additional areas. In particular, there is evidence suggesting that cTBS over parietal regions, such as the angular gyrus, could potentially influence hippocampal functioning. We now address this issue in a new paragraph that we have added to the discussion, on pages 23-24, lines 546-564:

      “Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus. Expanding upon this idea, it is conceivable that targeting a more dorsal segment of the angular gyrus might exert a stronger influence on observation-based linking – an aspect that warrants future investigations.”

      Responses to reviewer recommendations

      Reviewer 1

      1) On page 26, the authors write: "[...] different video events (A, B, and X) were recalled from day one [...]". I may have missed this point, but I had the impression that the task was conducted within one day.

      Indeed, this study was conducted within a single day. We rephrased the respective statement accordingly. Please see page 7, lines 149-153:

      “To test this hypothesis and the causal role of the angular gyrus in insight-related memory reconfigurations, we combined the life-like video-based narrative-insight task (NIT) with representational similarity analysis of EEG data and (double-blind) neuro-navigated TMS over the left angular gyrus in a comprehensive investigation within a single day.”

      We further included this information in the methods section on page 27, lines 634-635:

      “In total, the experiment took about 4.5 hours per participant and was completed within a single day.”

      Reviewer 2

      1) There is a substantial disconnection between the introduction and the methods/results section. One reason is that there is not sufficient detail regarding the hypotheses/predictions and the specific types of analyses chosen to test these hypotheses/predictions. Additionally, it is not explained what comparisons and outcomes would be informative/expected. This should be made clear. Second and related to the above, the rationale for conducting certain types of analyses (correlation, coherence, see below) sometimes is not specified.

      To address this concern, we elaborated on our hypotheses incorporating specific predictions for the free recall, given its higher variability than the other memory measures, and for imagination vs. observation at the end of the introduction on pages 4-5, lines 107-122:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to interfere with the increase in neural similarity for linked events and with the decrease of neural similarity for non-linked events. We further predicted that cTBS to the left angular gyrus would reduce the impact of (imagination-based) insight into the link of initially unrelated events on memory performance during free recall, given its higher variability compared to other memory measures. Considering the high connectivity profile of the angular gyrus within the brain (Seghier, 2013), we conducted an EEG connectivity analysis building upon prior findings concerning alterations in neural reconfigurations. To establish a link between neural and behavioral findings, we chose a correlational approach to relate observations from these two domains.”

      Moreover, we made our rationale for the employed analyses more explicit and specified why we chose to analyze our behavioral data using LMMs on page 34, lines 849-851:

      “For our behavioral analyses we opted to employ linear-mixed models (LMM), given their high robustness regarding the underlying distribution and high sensitivity to individual variation (Pinheiro & Bates, 2000; Schielzeth et al., 2020).”

      Moreover, we added an explanation on why we opted for the RSA approach in the methods section on page 37, lines 920-923:

      “This method is ideally suited to measure neural representation changes and was specifically chosen as it has been previously identified as the preferred approach for quantifying insight-induced neural changes (Grob et al., 2023b; Milivojevic et al., 2015).”

      To clarify the rationale behind our coherence analysis, we incorporated an explanatory sentence in the methods section on page 39, lines 966-967:

      “Due to the robust connectivity between the angular gyrus and other brain regions (Petit et al., 2023; Seghier, 2013), we proceeded with a connectivity analysis as a next step.”

      2) The authors suggest that besides Branzi et al. (2021), this is one of the first studies showing that memory update is linked to the AG. I suggest having a look at work from Tambini, Nee, & D'Esposito, 2018, JoCN, and other papers from Joel Voss' group that target a similar region of AG/Inferior parietal cortex. Many studies, using multiple TMS protocols, have now shown this brain region is causally involved in episodic and associative memory encoding.

      As mentioned above, further consideration of this literature is important as it delves into the region's hippocampal connectivity (and other network properties), and how that mediates the memory effects. Indeed because of the nature of the methods employed in this study, we do not know if the memory-related behavioural effects are due to TMS-changes induced at the AG's versus the hippocampus's level, or both. How do the current findings square with the existing TMS effects from this region? Can the connectivity profile of the target region highlighted by previous studies provide further insight into how the current behavioural effect arises? Some comments on this could be added to the discussion.

      We completely agree that the other studies showing enhanced associative memory after TMS to parietal regions need to be addressed. Therefore, we updated the discussion on page 20, lines 449-453:

      “Interestingly, recent work has additionally indicated that targeting parietal regions with TMS led to alterations in hippocampal functional connectivity, thereby enhancing associative memory (Nilakantan et al., 2017; Tambini et al., 2018; Wang et al., 2014), potentially shedding light on the underlying mechanisms involved.”

      Moreover, we included a section specifically addressing the possibility that the effects observed may pertain to having modulated other regions via the targeted region and updated the discussion on pages 23-24, lines 543-562:

      “Furthermore, the differential impact of cTBS to the angular gyrus on neural reconfigurations between events linked via imagination and those linked via observation may be attributed to its crucial role in imaginative processes (Ramanan et al., 2018; Thakral et al., 2017). Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      3) Another comment I have regards the results observed for the observation vs imagination insight conditions. The authors mention that the 'changes in representational similarity for the observation condition should be interpreted with caution, as these seemingly opposite changes appeared to be at least in part driven by group differences already in the pre-phase before participants gained insight.' I wonder what these group differences are and whether the authors have any hypothesis about what factors determined them.

      We could only speculate about the basis of the observed pre-insight phase differences. However, we now provide the raw RSA data as supplemental material to make the pattern of the (raw) RSA findings in the pre- and post-insight phases more transparent. We refer the interested reader to this material on pages 12-13, lines 293-295:

      “For linked events, this analysis yielded a negative cluster (p = 0.032, ci-range = 0.00, SD = 0.00) in the parieto-temporal region (electrodes: T7, Tp7, P7; Fig. 3B; Figure 3 – Figure supplement 1).”

      And on page 15, lines 339-341:

      “This analysis yielded a positive cluster (p = 0.035, ci-range = 0.00, SD = 0.00) in a fronto-temporal region (electrode: FT7; Fig. 3C; Figure 3 – Figure supplement 2).”

      Furthermore, the age of participants is not reported separately for the two groups (cTBS to AG vs Sham), I think. This should be reported including a t-test showing that the two groups have the same age.

      We agree and now report explicitly that groups did not significantly differ in relevant control variables including age. Please see page 7, lines 157-160:

      “Notably, the groups did not differ on levels of subjective chronic stress (TICS), state and trait anxiety (STAI-S, STAI-T), depressive mood (BDI), imaginative capacities (FFIS), personality dimensions (BFI), age, and motor thresholds (for descriptive statistics see Table 1; all p > 0.053).”

      And further included age and motor thresholds as control variables in Table 1 on page 18, lines 402-412:

      “Overall, levels of subjective chronic stress, anxiety, and depressive mood were relatively low and not different between groups. The groups did further not differ in terms of personality traits, imagination capacity, age or motor thresholds (all p > 0.053; see Table 1).”

      The fact this study is not a within-subject design makes difficult the interpretation of the results and this should be recognised as an important limitation of the study.

      As outlined above, a within-subject design would in our view come with several disadvantages, such as significant sequence/carry-over effects. Moreover, the neural representation change was measured in a pre-post design, enabling us to measure the insight-driven neural reconfiguration at the individual level.

      We clarify our rationale for the between-subjects factor TMS in the introduction on page 5, lines 122-126:

      “We intentionally adopted a mixed design, combining both between-subjects and within-subject methodologies. The between-subjects approach was chosen to minimize the risk of carry-over effects and sequence biases. Simultaneously, we capitalized on the advantages of a within-subject design by altering the pre- to post-insight comparison and the mode of insight (imagination vs. observation) within each participant.”

      Furthermore, we included our rationale for choosing a between-subjects approach for the crucial TMS manipulation in the methods section on page 25, lines 601-604:

      “We implemented a mixed-design including the within-subject factors link (linked vs. non-linked events), session (pre- vs. post-link), and mode (imagination vs. observation) as well as the between-subjects factor group (cTBS to the angular gyrus vs. sham) to mitigate the risk of carry-over effects and sequence biases of the crucial cTBS manipulation.”

      4) The angular gyrus is a heterogeneous region with multiple graded subregions. The one targeted in the present study is the ventral AG which has strong connections with the episodic-hippocampal memory system. I was wondering if this might explain why the AG TMS effects on representational changes have been observed for events linked via imagination but not direct observation. Perhaps the stimulation of a more 'visual' AG subregion (see Humphreys et al., 2020, Cerebral Cortex) would have resulted in a different (opposite) pattern of results. It would be good to add some comments on this in the discussion.

      We appreciate this interesting perspective offered regarding the potential outcomes of our study, particularly in relation to the activation of a more ventral subregion of the angular gyrus. We incorporated this idea into our discussion, alongside considerations regarding the potential effects of a more dorsal angular gyrus stimulation on observation-based linking. However, caution is warranted, recognizing the inherent limitations posed by the precision of TMS manipulations, which is further underscored by our electric field simulations, utilizing a 10 mm radius. We included this section in the discussion on pages 23-24, lines 546-569:

      “Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus. Expanding upon this idea, it is conceivable that targeting a more dorsal segment of the angular gyrus might exert a stronger influence on observation-based linking – an aspect that warrants future investigations. Yet, while acknowledging the functional heterogeneity within the angular gyrus (Humphreys et al., 2020), pinpointing specific sub regions via TMS remains challenging due to its limited focal precision at the millimeter level (Deng et al., 2013; Thielscher & Kammer, 2004), as reinforced by our electric field simulations utilizing a 10 mm radius. Hence, drawing definitive conclusions regarding distinct angular gyrus sub regions requires future research employing rigorous checks to assess the focality of their stimulation.”

      5) Regarding the methods section, I have the following specific queries. It is unclear what is the purpose of the coherence and correlation analyses (pages 35, 36). Could the authors provide further clarification on this? These analyses seem not to be mentioned anywhere in the introduction. This should be clarified briefly in the introduction and then in the methods section. The same for the questionnaires (anxiety, stress, etc): It is unclear the reason for collecting this type of data. This should be clarified in the introduction as well.

      We agree, and have updated the introduction as follows on page 5, lines 118-122:

      “Considering the high connectivity profile of the angular gyrus within the brain (Seghier, 2013), we conducted an EEG connectivity analysis building upon findings from the RSA analyses concerning alterations in neural reconfigurations. To establish a link between neural and behavioral findings, we chose a correlational approach to relate observations from these two domains.”

      We additionally provided an explanation for including these questionnaires in the introduction on page 5, lines 126-129:

      “To control for any group differences beyond the TMS manipulation, we gathered various control variables through questionnaires, including trait- and state-anxiety, depressive symptoms, chronic stress levels, personality dimensions, and imaginative capacities.”

      Moreover, we elaborated on the underlying rationale guiding our chosen analytical approaches. Specifically, we explained why we chose to analyze our behavioral data using LMMs on page 34, lines 849-851:

      “For our behavioral analyses we opted to employ linear-mixed models (LMM), given their high robustness regarding the underlying distribution and high sensitivity to individual variation (Pinheiro & Bates, 2000; Schielzeth et al., 2020).”

      Furthermore, we added an explanation on why we opted for the RSA approach in the methods section on page 37, lines 920-923:

      “This method is ideally suited to measure neural representation changes and was specifically chosen as it has been previously identified as the preferred approach for quantifying insight-induced neural changes (Grob et al., 2023b; Milivojevic et al., 2015).”

      To clarify the rationale behind our coherence analysis, we incorporated an explanatory sentence in the methods section on page 39, lines 966-967:

      “Due to the robust connectivity between the angular gyrus and other brain regions (Petit et al., 2023; Seghier, 2013), we proceeded with a connectivity analysis as a next step.”

      6) The preregistration webpage is in German. This is not ideal as it means that the information is available only to German speakers.

      This webpage can easily be switched to English by changing the settings in the top right corner.

      To address this issue, we included a description of how to set the webpage to English in the methods section on page 25, lines 581-582:

      “For translation to English, please adjust the page settings located in the top right corner.”

      7) Page 18. 'NIT' and 'MAT' - avoid abbreviations when possible.

      We included the full name for the narrative-insight task (NIT) on page 7, line 151, line 153, and line 165; page 8, lines 177-178 and line 187; page 19, line 427; page 26, line 615, line 629, and line 632; page 27, line 653; page 30, lines 730-731; page 31, line 754; page 35, line 870 and line 873; and page 36, line 885.

      We further included the full name for the multi-arrangements task (MAT) on page 19, lines 428-429.

      8) Line 21....we further observed DECREASED...should be replaced with INCREASED, if I am not wrong.

      We checked the sentence again and it looks correct to us, since it describes the change for observation-based insight, not imagination-based insight. We clarified that this finding pertains to observation-based linking by modifying the sentence on page 23, lines 525-528, as follows:

      “Following cTBS to the angular gyrus, we further observed decreased pattern similarity for non-linked events in the observation-based condition, resembling the pattern change observed in the sham group for linked events, which may highlight the role of the angular gyrus in representational separation during observation-based linking”

      Reviewer 3

      1) The major claim of the paper is that the angular gyrus is causally involved in insight-driven memory reconfiguration. To the authors' credit, they localized stimulation to the angular gyrus using an anatomical scan, the strength of the estimated electromagnetic field in the angular gyrus correlated with their behavioral results, and there were also brain-behavior correlations involving sensors located in the parietal lobe. However, the minimum evidence needed to claim causality is 1) evidence of a behavioral change (which the authors found) and 2) evidence of target engagement in the angular gyrus. It is also important to show brain-behavior correlations between target engagement and behavior. Although the authors stimulated the angular gyrus, that does not mean that rTMS specifically affected this region or that the behavioral results can be attributed to rTMS effects on the angular gyrus. As the authors point out, the angular gyrus has dense connections with other regions such as the hippocampus. In fact, several studies have shown that angular gyrus (or near AG) stimulation affects the hippocampal network (Wang et al., 2014, Science; Freedberg et al. 2019, eNeuro; Thakral et al., 2020, PNAS). EEG also has a poor spatial resolution, so even though the results were attributable to parieto-temporal sensors, this is not sufficient evidence to claim that the angular gyrus was modulated. Source localization would be required to reconstruct the signal specifically from the AG. Thus, with the manuscript written as is, the authors can claim that "cTBS to the angular gyrus modulates insight-driven memory reconfiguration," but the current claim is not sufficiently substantiated.

      While acknowledging the potential role of the angular gyrus in driving the observed changes, we recognize that the available evidence may not be sufficient. Consequently, we have introduced several modifications within our manuscript to address this concern.

      In the revised Introduction, we now explicitly address the possibility of a stimulation of the hippocampus via the angular gyrus on page 4, lines 84-85:

      “Accordingly, previous studies have shown that stimulation of the angular gyrus resulted in altered hippocampal activity (Thakral et al., 2020; Wang et al., 2014).”

      Additionally, we included relevant evidence demonstrating previous instances of targeted stimulation of the angular gyrus, which led to alterations in hippocampal connectivity and associative memory. These insights have been included in the discussion on page 20, lines 449-453:

      “Interestingly, recent work has additionally indicated that targeting parietal regions with TMS led to alterations in hippocampal functional connectivity, thereby enhancing associative memory (Nilakantan et al., 2017; Tambini et al., 2018; Wang et al., 2014), potentially shedding light on the underlying mechanisms involved.”

      Next, we have integrated crucial modifications essential for establishing a conclusive inference of causality in our study. Moreover, we now explore the potential mediation of the effects observed from angular gyrus stimulation through other brain regions, like the hippocampus. In addition, we have highlighted prior work where such stimulation coincided with alterations in associative memory. For the updated discussion section, please see pages 23-24, lines 538-562:

      “Although our study provided evidence suggesting a causal role of the angular gyrus in insight-driven memory reconfigurations – highlighted by behavioral changes after cTBS to the angular gyrus, neural changes in left parietal regions, and relevant brain-behavior associations – it is important to acknowledge the limitations imposed by the spatial resolution of EEG. Consequently, the precise source of the observed signal changes in the parietal regions remains uncertain, potentially tempering the definitive nature of these findings. Furthermore, the differential impact of cTBS to the angular gyrus on neural reconfigurations between events linked via imagination and those linked via observation may be attributed to its crucial role in imaginative processes (Ramanan et al., 2018; Thakral et al., 2017). Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      We further replaced terms that imply inhibition of the angular gyrus with a more operationally descriptive phrase:

      “cTBS to the angular gyrus”

      2) The authors frequently claim that cTBS is "inhibitory stimulation" and that inhibition of the angular gyrus caused their effects. There is a common misconception within the cognitive neuroscience literature that stimulation is either "inhibitory" or "excitatory," but there is no such thing as either. The effects of rTMS are dependent on many physiological, state, and trait-specific variables and the location of stimulation. For example, while cTBS does reproducibly inhibit behavior supported by the motor cortex (Wilkinson et al., 2010, Cortex; Rosenthal et al., 2009, J Neurosci), cTBS of the posterior parietal cortex reproducibly enhances hippocampal network functional connectivity and episodic memory (Hermiller et al., 2019, Hippocampus; Hermiller et al., 2020, J Neurosci). The authors reference the Huang et al. (2005) paper as evidence of its inhibitory effects but work in this paper is not sufficient to broadly categorize cTBS as inhibitory. First, Huang et al. stimulated the motor cortex and measured the effects on corticospinal excitability, which is significantly different from what the current authors are measuring. Furthermore, this oft-cited study only included 9 subjects. Other studies have found that the effects of theta-burst are significantly more variable when more subjects are used. For example, intermittent theta-burst, which is assumed to be excitatory based on the Huang paper, was found to produce unreliable excitatory effects when more subjects were examined (Lopez-Alonso, 2014, Brain Stimulation). Thus, the a priori assumption that stimulation would be inhibitory is weak and cTBS should not be discussed as "inhibitory."

      We agree and have now included a statement in the methods section that explicitly states that cTBS effects may be region-specific on page 33, lines 817-819:

      “Nonetheless, the effects of cTBS appear to vary based on the targeted region, with cTBS to parietal regions demonstrating the capability to enhance hippocampal connectivity (Hermiller et al., 2019, 2020).”

      We further substituted all terminology suggestive of an inhibitory effect with the phrase:

      “cTBS to the angular gyrus”.

      However, it is important to note that while other studies (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014) found increased hippocampal connectivity after rTMS to a parietal region as well as enhanced associative memory, we observed impaired memory for the linked events. We included this clarification in the discussion on page 24, lines 558-562:

      “In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      3) The hypothesis at the end of the introduction did not strike me as entirely clear. From this hypothesis, it seems that the authors are just comparing the differences in memory and reconfiguration during imagination-based insight links. However, the authors also include observation-based links and a non-linking condition, which seem ancillary to the main hypothesis. Thus, I am confused about why these extra factors were included and exactly what statistical results would confirm the authors' hypothesis.

      We agree, and have clarified our hypotheses on pages 4-5, lines 107-115:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to reduce the increase in neural similarity for linked events and increase of neural dissimilarity for non-linked events.”

      4) Many of the distributions throughout the paper do not look normal. Was normality checked? Are non-parametric stats warranted?

      We evaluated and reported the normality assumption in our behavioral analyses. Despite the non-normal distribution of our data, we chose to utilize linear-mixed models due to their robust performance even in case of deviations from normal distributions. This update in our methods section can be found on page 36, lines 890-896:

      “After outlier correction, we identified non-normality in our data using a Shapiro-Wilk test (narrative-insight task: W = 0.92, p < 0.001; multi-arrangements task: W = 0.94, p < 0.001; forced-choice recognition: W = 0.50, p < 0.001; free recall details: W = 0.85, p < 0.001; free recall naming of linking events: W = 0.94, p < 0.001). However, we mitigated this by employing linear-mixed models (LMMs), recognized for their robustness even with non-normally distributed data (Schielzeth et al., 2020).”

      We recalculated the correlational analysis between the RSA data and the behavioral recall of linking events by using the Spearman method on page 13, lines 306-308:

      “Furthermore, to address a deviation from the normality assumption, the correlational analysis was repeated using the Spearman method, which indicated an even stronger correlation (r(59) = 0.32, p = 0.012).”

      We further recalculated the correlation between the change in coherence for linked events and the recall of details for events linked via imagination on page 16, lines 376-378:

      “Please note that for addressing a deviation from the normality assumption, the correlational analysis was repeated using the Spearman method, which yielded a significant correlation of similar strength (r(59) = 0.31, p = 0.015).”
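
      As a rough illustration of the Spearman reanalyses described above, the sketch below ranks both variables (averaging tied ranks) and then computes the Pearson correlation of the ranks. The function names and the data values are invented for illustration only; they are not taken from the study, and this is not the authors' analysis code.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: a monotonic but strongly non-linear relationship.
neural_change = [0.1, 0.4, 0.2, 0.9, 0.5]
recall_score = [1.0, 8.0, 2.0, 100.0, 9.0]
rho = spearman(neural_change, recall_score)
```

      Because only the ordering of the values enters the calculation, the extreme value in the hypothetical recall scores does not distort the coefficient, which is the property that makes the rank-based method attractive when normality is violated.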

      Our EEG analyses, including RSA and coherence analyses, utilized a cluster-based permutation test (Fieldtrip; Oostenveld et al., 2011). These tests do not assume a normal distribution because they rely on empirical sampling for statistical inference. This approach ensures robustness without constraints imposed by specific distributional assumptions. Subsequent t-tests, stemming from significant clusters identified in the initial non-parametric analyses, were extensions of the robust non-parametric approach and did not require additional normality testing.
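
      To make the distribution-free logic concrete, here is a deliberately simplified sketch of a label-shuffling permutation test on a single summary statistic. It omits the cluster-forming step that FieldTrip performs across sensors and time points, and the function name, parameters, and data are hypothetical; it only shows how a p-value can be obtained from an empirical null distribution rather than from a normality assumption.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (count + 1) / (n_permutations + 1)

# Hypothetical per-participant values for two groups.
sham = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 0.95]
ctbs = [3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.1, 2.95]
p_separated = permutation_p_value(sham, ctbs)
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```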

      5) Can the authors include more detail about the sham coil? Was it subthreshold? Did the EMF cross the skull?

      The sham coil, also obtained from MAG & More GmbH, München, Germany, provided a similar sensory experience; however, the company did not specify any field strength (n.a.) as this coil was purposefully designed to prevent the induction of an electromagnetic field (EMF) capable of penetrating the skull, thereby ensuring it had no impact on the brain. We clarified this point in the methods section on pages 31-32, lines 772-778:

      “Two identically looking but different 70 mm figure-of-eight-shaped coils were used depending on the TMS condition: The PMD70-pCool coil (MAG & More GmbH, München, Germany) with a 2T maximum field strength was used for cTBS, while the PMD70-pCool-SHAM coil (MAG & More GmbH, München, Germany), with minimal magnetic field strength, was employed for sham, providing a similar sensory experience, with stimulation pulses being scattered over the scalp and not penetrating the skull.”

      6) There are differences between exclusion criteria in pre-registration and report. For example, BMI is an exclusion factor in the report, but not in the pre-registration. Can the authors provide a reason for this deviation?

      This discrepancy is due to (partial) participant recruitment from previous fMRI studies conducted in our lab that involved a stress induction protocol (as a structural MRI image was needed for the ‘neuronavigated’ TMS). Owing to the distinct cortisol stress reactivity observed in individuals with varying body mass indices (BMIs), participants with a BMI below 19 or above 26 kg/m² were excluded from these studies. To maintain consistency within our sample, only participants meeting these criteria were included. We elaborated on this point in the methods section on page 25, lines 586-592:

      “Participants were screened using a standardized interview for exclusion criteria that comprised a history of neurological and psychiatric disease, medication use and substance abuse, cardiovascular, thyroid, or renal disease, evidence of COVID-19 infection or exposure, and any contraindications to MRI examination or TMS. Additionally, participants with a body mass index (BMI) below 19 or above 26 kg/m² were excluded. This decision stemmed from recruiting some participants from prior studies that incorporated stress induction protocols, which imposed this specific criterion (Herhaus & Petrowski, 2018; Schmalbach et al., 2020).”
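
      For readers who want the BMI criterion spelled out, the following minimal sketch applies the stated thresholds (below 19 or above 26 kg/m²). The function names and the example weights and heights are hypothetical and are not part of the study's screening procedure.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m²."""
    return weight_kg / height_m ** 2

def excluded_by_bmi(weight_kg, height_m, lower=19.0, upper=26.0):
    """True if the BMI falls outside the inclusive range [lower, upper]."""
    value = bmi(weight_kg, height_m)
    return value < lower or value > upper

# Hypothetical participants (weight in kg, height in m).
in_range = excluded_by_bmi(70, 1.75)   # BMI of about 22.9
too_low = excluded_by_bmi(50, 1.80)    # BMI of about 15.4
too_high = excluded_by_bmi(95, 1.75)   # BMI of about 31.0
```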

      7) Were impedances monitored and minimized during EEG?

      Yes, they were monitored. We clarified this point in the methods section on page 34, lines 845-847:

      “We maintained impedances within a range of ± 20 μV using the common mode sense (CMS) and driven right leg (DRL) electrodes, serving as active reference and ground, respectively”

      8) I think there may be a typo related to the Thakral coordinates. I believe Thakral used MNI coordinates -48,-64, 30, whereas the authors stated they used -48,-67,30. Is this a mistake?

      Upon reevaluation of our study coordinates, we identified a slight deviation in our stimulation coordinates compared to those reported by Thakral et al. (2017; +3 mm on the y-axis). This variance resulted from the required MNI to Talairach (TAL) transformations necessary for utilizing the neuronavigation software Powermag View! (MAG & More GmbH, München, Germany). Notably, this deviation was consistent across all participants in our study. While TMS is more precise than tDCS, its focality does not extend down to the millimeter level. Despite this, our electric field simulations, adopting a 10 mm radius, effectively encompassed the original coordinates specified by Thakral et al. (2017). This radius ensured coverage over the intended target area, mitigating the impact of this minor deviation on the overall study outcomes. We updated the methods section accordingly on page 33, lines 800-806:

      “Based on the individual T1 MR images, we created 3D reconstructions of the participants' heads, allowing us to precisely locate the left angular gyrus coordinate (MNI: -48, -67, 30), initially derived from previous work (Thakral et al., 2017), for TMS stimulation. Despite a minor deviation in coordinates due to necessary MNI to Talairach transformations for software compatibility (Powermag View! by MAG & More GmbH, München, Germany), our methodology ensured precise localization of the angular gyrus target area.”
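
      The coverage argument can be checked with simple arithmetic. The sketch below computes the Euclidean distance between the coordinate the reviewer attributes to Thakral et al. (2017) and the coordinate reported in the manuscript, and compares it with the 10 mm simulation radius. Treating the simulated region as a sphere centered on the applied coordinate is a simplifying assumption made here for illustration; the variable names are hypothetical.

```python
from math import dist

thakral_mni = (-48, -64, 30)   # coordinate cited by the reviewer
applied_mni = (-48, -67, 30)   # coordinate reported in the manuscript
simulation_radius_mm = 10      # radius used in the electric field simulations

# Euclidean distance between the two MNI coordinates, in millimeters.
offset_mm = dist(thakral_mni, applied_mni)
within_radius = offset_mm <= simulation_radius_mm

print(f"offset = {offset_mm:.1f} mm, within radius: {within_radius}")
# prints: offset = 3.0 mm, within radius: True
```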

      9) How was the tail of the coil positioned during stimulation? Was it individualized so that the lobes of the coil are perpendicular to the nearest gyrus, as is commonly done?

      The coil handle always pointed upwards to maintain optimal positioning with the coil holder. We followed the positioning procedure in the neuronavigation software Powermag View!, which did not indicate any positioning of the coil handle but specified the position and angle of the coil itself. To incorporate this aspect, we updated the legend of figure 2 on page 11, lines 260-261:

      “Please note that in the study, the coil handle was oriented upwards; however, in this illustration, it has been intentionally depicted as pointing downwards for better visibility purposes.”

      We further updated the method section on page 33, lines 723-824:

      “The coil was positioned tangentially on the head and mechanically fixed in a coil holder, with its handle pointing upwards to maintain its position”

    1. Author Response

      We are grateful for the insightful suggestions and comments provided by the reviewers. Your constructive feedback has been valuable, and we are thankful for the opportunity to address each point.

      We appreciate both reviewers’ recognition of our devotion to rigorous methodology and experimental control in this study, as evidenced by the comments: “remarkable efforts were made to isolate peripheral confounds”, “a clear strength of the study is the multitude of control conditions … that makes results very convincing”, and “thorough design of the study”. Indeed, we hope to have provided evidence that is not merely solid but compelling for sound-driven motor inhibitory effects of online TUS. We hope that this will be reflected in the assessment. Our conclusions are supported by multiple experiments across multiple institutions using exemplary experimental control including (in)active controls and multiple sound-sham conditions. This contrasts with the sole use of flip-over sham or no-stimulation conditions used in the majority of work to date. Indeed, the current study communicates that substantiated inferences on the efficacy of ultrasonic neuromodulation cannot be made under insufficient experimental control.

      In response to the reviewers' comments, we have substantially changed our manuscript. Specifically, we have open-sourced the auditory masking stimuli and specified them in better detail in the text, we have improved the figures to reflect the data more closely, we have clarified the intracranial dose-response relationship, we have elaborated in the introduction, and we have further discussed the possibility of direct neuromodulation. We hope that you agree these changes have helped to substantially improve the manuscript.

      Public reviews

      1.1) Despite the main conclusion of the authors stating that there are no dose-response effects of TUS on corticospinal inhibition, both the comparison of Isppa and MEP decrease for Exp 1 and 2, and the linear regression between MEP decrease (relative to baseline) and the estimated Isppa are significant, arguing the opposite, that there is a dose-response function which cannot be fully attributed to difference in sound (since the relationship is inversed, lower intracranial Isppa leads to higher MEP decrease). These results suggest that the dose-response function needs to be further studied in future studies.

      We thank the reviewer for bringing up this point. While we are convinced our study provides no evidence for a direct neuromodulatory dose-response relationship, we have realized that the manuscript could benefit from improved clarity on this point.

      A dose-response relationship between TUS intensity and motor cortical excitability was assessed by manipulating free-water Isppa (Figure 4C). Here, no significant effect of free-water stimulation intensity was observed for Experiment I or II, thus providing no evidence for a dose-response relationship (Section 3.2). To aid in clarity, ‘N.S.’ has been added to Figure 4C in the revised manuscript.

      However, it is likely that the efficacy of TUS would depend on realized intracranial intensity, which we estimated with 3D simulations for on-target stimulation. These simulations resulted in an estimated intracranial intensity for each applied free-water intensity (i.e., 6.35 and 19.06 W/cm²), for each participant. We then tested whether inter-individual differences in intracranial intensity during on-target TUS affected MEP amplitude. We have realized that the original visualization used to display these data and its explanation was unintuitive. Therefore, we have completely revised Supplementary Figure 6. Because of the substantial length of this section, we have not copied it here. Please see the Supplementary material for the implemented improvements.

      In brief, we now show MEP amplitudes on the y-axis, rather than expressing values as %change. This plot depicts how individuals with higher intracranial intensities during on-target TUS exhibit higher MEP amplitudes. However, this same relationship is observed for active control and sound-sham conditions. If there were a direct neuromodulatory dose-response relationship of TUS, this would be reflected as the difference between on-target and control conditions changing as the estimated intracranial intensity increases. This was not the case. Further, the fact that the difference between on-target stimulation and baseline changes across intracranial intensities is notable, but this occurs to an equal degree in the control conditions. Therefore, these data cannot be interpreted as evidence for a dose-response relationship.
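
      The reasoning above can be sketched numerically: a direct dose-response effect would appear as a non-zero slope of the on-target minus control difference against intracranial intensity, not merely as a non-zero slope within the on-target condition. The example below uses invented numbers in which both conditions rise with intensity at the same rate; the values, variable names, and the simple least-squares helper are hypothetical and do not reproduce the authors' linear models.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

# Hypothetical estimated intracranial intensities (W/cm²) and MEP amplitudes (mV).
intensity = [1.0, 2.0, 3.0, 4.0, 5.0]
mep_on_target = [0.8, 1.0, 1.2, 1.4, 1.6]
mep_control = [0.9, 1.1, 1.3, 1.5, 1.7]

# Slope within each condition: both increase with intensity.
slope_on_target = ols_slope(intensity, mep_on_target)
slope_control = ols_slope(intensity, mep_control)

# Slope of the condition difference: this is the quantity that would
# indicate a direct dose-response effect if it departed from zero.
difference = [a - b for a, b in zip(mep_on_target, mep_control)]
slope_difference = ols_slope(intensity, difference)
```

      In this toy example each condition has a slope of about 0.2, while the slope of the difference is essentially zero, mirroring the pattern the response describes: a shared relationship with intensity that is not specific to on-target stimulation.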

      We hope the changes in Supplementary Figure 6 will make it clear that there is no evidence for direct intracranial dose-response effects.

      1.2) Other methods to test or mask the auditory confound are possible (e.g., smoothed ramped US wave) which could substantially solve part of the sound issue in future studies or experiments in deaf animals etc... 

      We agree with the reviewer’s statement. We aimed to replicate the findings of online motor cortical inhibition reported in prior work using a 1000 Hz square wave modulation frequency. While ramping can effectively reduce the auditory confound, as noted in the discussion, this is not feasible for the short pulse durations (0.1-0.3 ms) employed in the current study (Johnstone et al., 2021). We have further clarified this point in the methods section of the revised manuscript as follows:

      “While ramping the pulses can in principle mitigate the auditory confound (Johnstone et al., 2021; Mohammadjavadi et al., 2019), doing so for such short pulse durations (<= 0.3 ms) is not effective. Therefore, we used a rectangular pulse shape to match prior work.”

      Mitigation of the auditory confound by testing deaf subjects is a valid approach, and has now been added to the revised manuscript in the discussion as follows:

      “Alternative approaches could circumvent auditory confounds by testing deaf subjects, or perhaps more practically by ramping the ultrasonic pulse to minimize or even eliminate the auditory confound.”

      1.3) Dose-response function is an extremely important feature for a brain stimulation technique. It was assessed in Exp II by computing the relationship between the estimated intracranial intensities and the modulation of corticospinal excitability (Fig. 3b, 3c). It is not clear why data from Experiment I could not be integrated in a global intracranial dose-response function to explore wider ranges of intracranial intensities and MEP variability.

      We chose not to combine data from Experiment 1 in a global intracranial dose-response function because TUS was applied at different fundamental frequencies and focal depths (Experiment I: 500 kHz, 35 mm; Experiment II: 250 kHz, 28 mm). We have now explicitly communicated this under Supplementary Figure 6:

      “It was not appropriate to combine data from Experiments I and II given the different fundamental frequencies and stimulation depths applied… we ran simple linear models for Experiment II, which had a sufficient sample size (n = 27) to assess inter-individual variability.”

      1.4) Furthermore, the dose response function as computed with the MEP change relative to baseline shows a significant effect (6.35 W/cm²) or a trend (19.06 W/cm²) for a positive linear relationship. This comparison cannot disentangle the auditory confound from the pure neuromodulatory effect but given the direction of the relationship (lower Isppa associated with larger neuromodulatory effect), it is unlikely that it is driven by sound. This relationship is absent for the Active control condition or the Sound Sham condition, more or less matched for peripheral confound. This needs to be further discussed.

      Please refer to point 1.1

      1.5) The clear auditory confound arises from TUS pulsing at audible frequencies, which can be highly subject to inter-individual differences. Did the authors individually titrate the auditory mask to account for this intra- and inter-individual variability in auditory perception? 

      In Experiments I-III, the auditory mask was identical between participants. In Experiment IV, the auditory mask volume and signal-to-noise ratio were adjusted per participant. In the discussion we recommend individualized mask titration. However, we do note that masking successfully blinded participants in Experiment II, despite using uniform masking stimuli (Supplementary Figure 5).

      1.6) How different is the masking quality when using bone-conducting headphones (e.g., Exp. 1) compared to in-ear headphones (e.g., Exp. 2)?

      In our experience, bone conducting headphones produce a less clear, fuzzier, sound than in-ear headphones. However, in-ear headphones block the ear canal and likely result in the auditory confound being perceived as louder. We have included this information in the discussion of the revised manuscript:

      “Titrating auditory mask quality per participant to account for intra- and inter-individual differences in subjective perception of the auditory confound would be beneficial. Here, the method chosen for mask delivery must be considered. While bone-conducting headphones align with the bone conduction mechanism of the auditory confound, they might not deliver sound as clearly as in-ear headphones or speakers. Nevertheless, the latter two rely on air-conducted sound. Notably, in-ear headphones could even amplify the perceived volume of the confound by obstructing the ear canal.”

      1.7) I was not able to find any report on the blinding efficacy of Exp. 1. Do the authors have some data on this? 

      We do not have blinding data available for Experiment I. Following Experiment I, we decided it would be useful to include such an assessment in Experiment II.

      1.8) Was the possibility to use smoothed ramped US wave form ever tested as a control condition in this set of studies, to eventually reduce audibility? For such a fast PRF, the slope would still need to be steep to deliver the same power (AUC), so it might not be as efficient. 

      We indeed tested smoothing (ramping) the waveform. There was no perceptible impact on the auditory confound volume. Indeed, prior research has also indicated that ramping over such short pulse durations is not effective (Johnstone et al., 2021). Taken together, we chose to continue with a square wave modulation as in prior TUS-TMS studies. We have updated the methods section of the manuscript with the following:

      “While ramping the pulses can in principle mitigate the auditory confound (Johnstone et al., 2021; Mohammadjavadi et al., 2019), doing so for such short pulse durations (<= 0.3 ms) is not effective. Therefore, we used a rectangular pulse shape to match prior work.”

      Importantly, our research shows that auditory co-stimulation can confound effects on motor excitability, and this likely occurred in multiple seminal TUS studies. While some preliminary work has been done on the efficacy of ramping in humans, future work is needed to determine what ramp shapes and lengths are optimal for reducing the auditory confound.
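
      The trade-off raised in point 1.8 can be made concrete with a small Python sketch. It builds the amplitude envelope of a single 0.3 ms pulse with and without linear on/off ramps and compares the area under each envelope. The linear ramp shape, the 0.1 ms ramp length, the 10 MHz sampling rate, and the function names are illustrative assumptions, and envelope area is only a simple proxy for delivered dose; the sketch says nothing about audibility.

```python
import numpy as np

def pulse_envelope(pulse_s, ramp_s, fs=10_000_000):
    """Amplitude envelope of one pulse with linear on/off ramps of length ramp_s.

    ramp_s = 0 gives a rectangular pulse. Assumes 2 * ramp_s <= pulse_s.
    """
    n = int(round(pulse_s * fs))
    t = (np.arange(n) + 0.5) / fs              # sample midpoints
    if ramp_s <= 0:
        return np.ones(n)
    # Rise linearly from the start, fall linearly to the end, clip at 1.
    return np.minimum(1.0, np.minimum(t, pulse_s - t) / ramp_s)

def envelope_area(env, fs=10_000_000):
    """Area under the envelope (amplitude x seconds)."""
    return float(env.sum() / fs)

pulse_s = 0.3e-3                                # pulse duration from the text
rect = pulse_envelope(pulse_s, ramp_s=0.0)
ramped = pulse_envelope(pulse_s, ramp_s=0.1e-3)  # hypothetical ramp length

ratio = envelope_area(ramped) / envelope_area(rect)
print(f"area ratio (ramped / rectangular): {ratio:.3f}")
print(f"peak scaling needed to match area: {1 / ratio:.2f}")
```

      For a linear ramp of length r on each side of a pulse of length T, the envelope area shrinks from T to T − r, so a longer ramp on such a short pulse requires either a higher peak amplitude or a steeper slope to keep the area constant.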

      1.9) There are other models or experiments that need to be discussed in order to clearly disassociate the TUS effect from the auditory confound effect, for instance, testing deaf animal models or participants, or experiments with multi-region recordings (to rule out the effects of the dense structural connectivity between the auditory cortex and the motor cortex). 

      The suggestion to consider multi-region recording in future experiments is important. Indeed, the effects of the auditory confound are expected to vary between brain regions. In the primary motor cortex, we observe a learned inhibition, which is perhaps supported by dense structural connectivity with the auditory system. In contrast, in perceptual areas such as the occipital cortex, one might expect tuned attentional effects in response to the auditory cue. We suggest that it is likely that the impact of the auditory confound also operates on a more global network level. It is reasonable to propose that, in a cognitive task for example, the confound will affect task performance and related brain activity, ostensibly regardless of the extent of direct structural connectivity between the auditory cortex and the (stimulated) region of interest.

      Regarding the testing of deaf subjects, this has been included in the revised discussion as follows:

      “Alternative approaches could circumvent auditory confounds by testing deaf subjects, or perhaps more practically by ramping the ultrasonic pulse to minimize or even eliminate the auditory confound.”

      1.10) The concept of stochastic resonance is interesting but traditionally refers to a mechanism whereby a particular level of noise actually enhances the response of non-linear systems to weak sensory signals. Whether it applies to the motor system when probed with suprathreshold TMS intensities is unclear. Furthermore, whether higher intensities induce higher levels of noise is not straightforward neither considering the massive amount of work coming from other NIBS studies in particular. Noise effects are indeed a function of noise intensity, but exhibit an inverted U-shape dose-response relationship (Potok et al., 2021, eNeuro). In general SR is rather induced with low stimulation intensities in particular in perceptual domain (see Yamasaki et al., 2022, Neuropsychologia).  In the same order of ideas, did the authors compare inter-trials variability across the different conditions? 

      We thank the reviewer for these insightful remarks. Indeed, stochastic resonance is a concept first formalized in the sensory domain. Recently, the same principles have been shown to apply in other domains as well. For example, transcranial electric noise (tRNS) exhibits similar stochastic resonance principles as sensory noise (Van Der Groen & Wenderoth, 2016). Indeed, tRNS has been applied to many cortical targets, including the motor system. In the current manuscript, we raise the question of whether TUS might engage with neuronal activity following principles similar to tRNS. One prediction of this framework would be that TUS might not modulate excitation/inhibition balance overall, but instead exhibit an inverted U-shape dose-dependent relationship with stochastic noise. Please note, we do not use the ‘suprathreshold TMS intensity’ to quantify whether noise could bring a sub-threshold input across the detection threshold, nor whether it could bring a sub-threshold output across the motor threshold. Instead, we use the MEP read-out to estimate the temporally varying excitability itself. We argue that MEP autocorrelation captures the mixture of temporal noise and temporal structure in corticospinal excitability. Building on the non-linear response of neuronal populations, low stochastic noise might strengthen weakly present excitability patterns, while high stochastic noise might override pre-existing excitability. It is therefore not the overall MEP amplitude, but the MEP timeseries that is of interest to us. Here, we observe a non-linear dose-dependent relationship, matching the predicted inverted U-shape. Importantly, we did not intend to assume stochastic resonance principles in the motor domain as a given. We have now clarified in the revised manuscript that we propose a putative framework and regard this as an open question:

      “Indeed, human TUS studies have often failed to show a global change in behavioral performance, instead finding TUS effects primarily around the perception threshold where noise might drive stochastic resonance (Butler et al., 2022; Legon et al., 2018). Whether the precise principles of stochastic resonance generalize from the perceptual domain to the current study is an open question, but it is known that neural noise can be introduced by brain stimulation (Van Der Groen & Wenderoth, 2016). It is likely that this noise is state-dependent and might not exceed the dynamic range of the intra-subject variability (Silvanto et al., 2007). Therefore, in an exploratory analysis, we exploited the natural structure in corticospinal excitability that exhibits as a strong temporal autocorrelation in MEP amplitude.”

      Following the above reasoning, we felt it critical to estimate noise in the timeseries, operationalized as a t-1 autocorrelation, rather than capture inter-trial variability that ignores the timeseries history and requires data aggregation, thereby reducing statistical power. Importantly, we would expect the latter index to capture global variability, putatively masking the temporal relationships which we were aiming to test. The reviewer raises an interesting option, inviting us to wonder if inter-trial variability might be sensitive enough, nonetheless. To this end, we compared inter-trial variability as suggested. This was achieved by first calculating the inter-trial variability for each condition, and then running a three-way repeated measures ANOVA on these values with the independent variables matching our autocorrelation analyses, namely, procedure (on-target/active control) × intensity (6.35/19.06 W/cm2) × masking (no mask/masked). This analysis did not reveal any significant interactions or main effects.

      Author response table 1.
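
      The distinction between the two indices can be sketched in Python. The example assumes that the t-1 autocorrelation is the Pearson correlation between consecutive trials and that inter-trial variability is the coefficient of variation; both are assumptions about the operationalization, and the MEP amplitudes are simulated. Shuffling the trial order leaves the coefficient of variation unchanged but removes the temporal structure that the autocorrelation is meant to capture.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Pearson correlation between each trial and the preceding trial."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

def coefficient_of_variation(x):
    """Inter-trial variability as SD / mean; ignores trial order."""
    x = np.asarray(x, dtype=float)
    return float(np.std(x, ddof=1) / np.mean(x))

# Simulated MEP amplitudes (mV) with temporal structure: a hypothetical
# first-order autoregressive process around 1 mV. Values are invented.
rng = np.random.default_rng(0)
n_trials = 200
mep = np.empty(n_trials)
mep[0] = 1.0
for i in range(1, n_trials):
    mep[i] = 1.0 + 0.7 * (mep[i - 1] - 1.0) + rng.normal(0, 0.1)

shuffled = rng.permutation(mep)   # same values, temporal order destroyed

print("lag-1 autocorrelation:", lag1_autocorrelation(mep), lag1_autocorrelation(shuffled))
print("coefficient of variation:", coefficient_of_variation(mep), coefficient_of_variation(shuffled))
```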

      1.11) State-dependency/Autocorrelations: These values were extracted from Exp2 which has baseline trials. Can the authors provide autocorrelation values at baseline, with and without auditory mask?  Can the authors comment on the difference between the autocorrelation profiles of the active TUS condition at 6.35W/cm2 or at 19.06W/cm2. They should somehow be similar to my understanding.  Besides, the finding that TUS induces noise only when sound is present and at lower intensities is not well discussed. 

      In the revised manuscript, we have now included baseline in the figure (Figure 4D). Regarding baseline with and without a mask, we must clarify that baseline involves only TMS (no mask), and sham involves TMS + masking stimulus (masked).

      The dose-dependent relationship of TUS intensity with autocorrelation is critical. One possible observation would have been that TUS at both intensities decreased autocorrelation, with higher intensities evoking a greater reduction. Here, we would have concluded that TUS introduced noise in a linear fashion.

      However, we observed that lower-intensity TUS in fact strengthened pre-existing temporal patterns in excitability (higher autocorrelation), while during higher-intensity TUS these patterns were overridden (lower autocorrelation). This non-linear relationship is not unexpected, given the non-linear responses of neurons.

      If this non-linear dependency is driven by TUS, one could expect it to be present during conditions both with and without auditory masking. However, the preparatory inhibition effect of TUS likely depends on the salience of the cue, that is, the auditory confound. In trials without auditory masking, the salience of the confound is highly dependent on (transmitted) intensity, with higher intensities being perceived as louder. In contrast, when trials are masked, the difference in cue salience between lower and higher intensity stimulation is minimized. Therefore, we would expect any nuanced dose-dependent direct TUS effect to be best detectable when the difference in dose-dependent auditory confound perception is minimized via masking. Indeed, the dose-dependent effect of TUS on autocorrelation is most prominent when the auditory confound is masked.

      “In sum, these preliminary exploratory analyses could point towards TUS introducing temporally specific neural noise to ongoing neural dynamics in a dose-dependent manner, rather than simply shifting the overall excitation-inhibition balance. One possible explanation for the discrepancy between trials with and without auditory masking is the difference in auditory confound perception, where without masking the confound’s volume differs between intensities, while with masking this difference is minimized. Future studies might consider designing experiments such that temporal dynamics of ultrasonic neuromodulation can be captured more robustly, allowing for quantification of possible state-dependent or nondirectional perturbation effects of stimulation.”

      1.12) Statistical considerations. Data from Figure 2 are considered in two-by-two comparisons. Why not reporting the ANOVA results testing the main effect of TUS/Auditory conditions as done for Figure 3. Statistical tables of the LMM should be reported. 

      Full-factorial analyses and main effects for TUS/Auditory conditions are discussed from Section 3.2 onwards. These are the same data supporting Figure 2 (now Figure 3). We would like to note that the main purpose of Figure 2 is to demonstrate to the reader that motor inhibition was observed, thus providing evidence that we replicated motor inhibitory effects of prior studies. A secondary purpose is to visually represent the absence of direct and spatially specific neuromodulation. However, the appropriate analyses to demonstrate this are reported in following sections, from Section 3.2 onwards, and we are concerned that mentioning these analyses earlier will negatively impact comprehensibility.

      Statistical tables of the LMMs are provided within the open-sourced data and code reported at the end of the paper, embedded within the output which is accessible as a pdf (i.e., analysis/analysis.pdf).

      1.13) Startle effects: The authors dissociate two mechanisms through which sound cuing can drive motor inhibition, namely some compensatory expectation-based processes or the evocation of a startle response. I find the dissociation somehow artificial. Indeed, it is known that the amplitude of the acoustic startle response habituates to repetitive stimulation. Therefore, sensitization can well explain the stabilization of the MEP amplitude observed after a few trials. 

      Thank you for bringing this to our attention. Indeed, an acoustic startle response would habituate over repetitive stimulation. A startle response would result in MEP amplitude being significantly altered in early trials. As the participant would habituate to the stimulus, the startle response would decrease. MEP amplitude would then return to baseline levels. However, this is not the pattern we observe. An alternative possibility is that participants learn the temporal contingency between the stimulus and TMS. Here, compensatory expectation-based change in MEP amplitude would be observed. In this scenario, there would be no change in MEP amplitude during early trials because the stimulus has not yet become informative of the TMS pulse timing. However, as participants learn how to predict TMS timing by the stimulus, MEP amplitude would decrease. This is also the pattern we observe in our data. We have clarified these alternatives in the revised manuscript as follows:

      “Two putative mechanisms through which sound cuing may drive motor inhibition have been proposed, positing either that explicit cueing of TMS timing results in compensatory processes that drive MEP reduction (Capozio et al., 2021; Tran et al., 2021), or suggesting the evocation of a startle response that leads to global inhibition (Fisher et al., 2004; Furubayashi et al., 2000; Ilic et al., 2011; Kohn et al., 2004; Wessel & Aron, 2013). Critically, we can dissociate between these theories by exploring the temporal dynamics of MEP attenuation. One would expect a startle response to habituate over time, where MEP amplitude would be reduced during startling initial trials, followed by a normalization back to baseline throughout the course of the experiment as participants habituate to the startling stimulus. Alternatively, if temporally contingent sound-cueing of TMS drives inhibition, MEP amplitudes should decrease over time as the relative timing of TUS and TMS is being learned, followed by a stabilization at a decreased MEP amplitude once this relationship has been learned.”

      1.14) Can the authors further motivate the drastic change in intensities between Exp1 and 2? Is it due to the 250-500 kHz carrier difference? Is this coming from the power loss at 500 kHz? 

      The change in intensities between Experiments I and II was not an intentional experimental manipulation. Following completion of data acquisition, our TUS system received a firmware update that differentially corrected the 250 kHz and 500 kHz stimulation intensities. In this manuscript, we report the actual free-water intensities applied during our experiments.

      1.15) Exp 3: Did 4 separate blocks of TUS-TMS and normalized for different TMS intensities used with respect to baseline. But how different was it. Why adjusting and then re adjusting intensities? 

      The TMS intensities required to evoke a 1 mV MEP under the four sound-sham conditions significantly differed from the intensities required for baseline. In the revised appendix, we have now included a figure depicting the TMS intensities for these conditions, as well as statistical tests demonstrating each condition required a significantly higher TMS intensity than baseline.

      TMS intensities were re-adjusted to avoid floor effects when assessing the efficacy of on-target TUS. Sound-sham conditions themselves attenuate MEP amplitude. This is also evident from the higher TMS intensities required to evoke a 1 mV MEP under these conditions. If direct neuromodulation by TUS had further decreased MEP amplitude, the concern was that effects might not be detectable within such a small range of MEP amplitudes.

      1.16) In Exp 4, TUS targeted the ventromedial WM tract. Since direct electrical stimulation on white matter pathways within the frontal lobe can modulate motor output probably through dense communication along specific white matter pathways (e.g., Vigano et al., 2022, Brain), how did the authors ensure that this condition is really ineffective? Furthermore, the stimulation might have covered a lot more than just white matter. Acoustic and thermal simulations would be helpful here as well. 

      Thank you for pointing out this possibility. Ultrasonic and electrical stimulation have quite distinct mechanisms of action. Therefore, it is challenging to directly compare these two approaches. There is a small amount of evidence that ultrasonic neuromodulation of white matter tracts is possible. However, the efficacy of white matter modulation is likely much lower, given the substantially lesser degree of mechanosensitive ion channel expression in white matter as opposed to gray matter (Sorum et al., 2020, PNAS). Further, recent work has indicated that ultrasonic neuromodulation of myelinated axonal bundles occurs within the thermal domain (Guo et al., 2022, SciRep), which is not possible with the intensities administered in the current study. Nevertheless, based on Experiment IV in isolation, it cannot be definitively excluded that TUS induced direct neuromodulatory effects in addition to confounding auditory effects. However, Experiment IV does not possess sufficient inferential power on its own and must be interpreted in tandem with Experiments I-III. Taken together with those findings, it is unlikely that a veridical neuromodulation effect is seen here, given the equivalent or lower stimulation intensities, the substantially deeper stimulation site, and the absence of an additional control condition in Experiment IV. This likelihood is further decreased by the fact that inhibitory effects under masking descriptively scale with the audibility of TUS.

      Off-target effects such as unintended co-stimulation of gray matter when targeting white matter is always an important factor to consider. Unfortunately, individualized simulations for Experiment IV are not available. However, the same type of transducer and fundamental frequency was used as in Experiment II, for which we do have simulations. Given the size of the focus and the very low in-situ intensities extending beyond the main focal point, it is incredibly unlikely that effective stimulation was administered outside white matter in a meaningful number of participants. Nevertheless, the reviewer is correct that this can only be directly confirmed with simulations, which remain infeasible due to both technical and practical constraints. We have included the following in the revised manuscript:

      “The remaining motor inhibition observed during masked trials likely owes to, albeit decreased, persistent audibility of TUS during masking. Indeed, MEP attenuation in the masked conditions descriptively scales with participant reports of audibility. This points towards a role of auditory confound volume in motor inhibition (Supplementary Fig. 8). Nevertheless, one could instead argue that evidence for direct neuromodulation is seen here. This is unlikely for a number of reasons. First, white matter contains a lesser degree of mechanosensitive ion channel expression and there is evidence that neuromodulation of these tracts may occur primarily in the thermal domain (Guo et al., 2022; Sorum et al., 2021). Second, Experiment IV lacks sufficient inferential power in the absence of an additional control and must therefore be interpreted in tandem with Experiments I-III. These experiments revealed no evidence for direct neuromodulation using equivalent or higher stimulation intensities and directly targeting grey matter while also using multiple control conditions. Therefore, we propose that persistent motor inhibition during masked trials owes to continued, though reduced, audibility of the confound (Supplementary Fig. 8). However, future work including an additional control (site) is required to definitively disentangle these alternatives.”

      1.17) Still for Exp 4, the rationale for the 100% MSO or 120% of rMT is not clear, especially with respect to Exp 1 and 2. Equipment is similar, as are raw MEP amplitudes, therefore the different EMG gain might have artificially increased TMS intensities. Could it have impacted the measured neuromodulatory effects?

      Experiment IV was conducted independently at a different institute than Experiments I-II. In contrast to Experiments I-II, a gel pad was used to couple TUS to the participant’s head. The increased TMS-to-cortex distance introduced by the gel pad necessitates higher TMS intensities to compensate for the increased offset. In fact, in 9/12 participants, the intended intensity at 120% rMT exceeded the maximum stimulator output. In those cases, we defaulted to the maximum stimulator output (i.e., 100% MSO). We have clarified in the revised supplementary material as follows:

      “We aimed to use 120% rMT (n = 3). However, if this intensity surpassed 100% MSO, we opted for 100% MSO instead (n = 9). The mean %MSO was 94.5 ± 10.5%. The TMS intensities required in this experiment were higher than those required in Experiment I-II using the same TMS coil, though still within approximately one standard deviation. This is likely due to the use of a gel pad, which introduces more distance between the TMS coil and the scalp, thus requiring a higher TMS intensity to evoke the same motor activity.”
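
      The intensity rule described in this passage amounts to a capped multiplication, sketched below in Python. The function name and the example rMT values are hypothetical. Because 100 / 1.2 ≈ 83.3, any participant with a resting motor threshold above roughly 83.3% MSO would be stimulated at the maximum stimulator output under this rule.

```python
def tms_intensity_pct_mso(rmt_pct_mso, target_factor=1.2, max_output=100.0):
    """Return the TMS intensity in % MSO: 120% of rMT, capped at 100% MSO."""
    return min(target_factor * rmt_pct_mso, max_output)

# Hypothetical resting motor thresholds (% MSO), for illustration only.
for rmt in (70.0, 83.0, 90.0):
    intensity = tms_intensity_pct_mso(rmt)
    capped = intensity >= 100.0
    print(f"rMT {rmt:.0f}% MSO -> stimulate at {intensity:.1f}% MSO (capped: {capped})")
```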

      Regarding the EMG gain, this did not affect TMS intensities and did not impact the measured neuromodulatory effects. The EMG gain at acquisition is always considered during signal digitization and further analyses.

      1.18) Exp. 4. It would be interesting to provide the changes in MEP amplitudes for those subjects who rated "inaudible" in the self-rating compared to the others. That's an important part of the interpretation: inaudible conditions lead to inhibition, so there is an effect. The auditory confound is not additive to the TUS effect. 

      Previously, we only provided participants’ ratings of audibility, and showed that conditions that were rated as inaudible more often showed less inhibition, descriptively indicating that inaudible stimulation does not lead to inhibition. This interpretation is in line with our conclusion that the TUS auditory confound acts as a cue signaling the upcoming TMS pulse, thus leading to preparatory inhibition.

      We have now included an additional plot and discussion in Supplementary Figure 8 (Subjective Report of TUS Audibility). Here, we show the change in MEP amplitude from baseline for the three continuously masked TUS intensities as in the main manuscript, but now split by participant rating of audibility. Descriptively, less audible sounds result in no marked change or a smaller change in MEP amplitude. This supports our conclusion that direct neuromodulation is not being observed here. When participants were unsure whether they could hear TUS, or when they did hear TUS, more inhibition was observed. However, this is still to a lesser degree than unmasked stimulation which was nearly always audible, and likely also more salient. This also supports our conclusion that these results indicate a role of cue salience rather than direct neuromodulation. Regarding masked conditions where participants were uncertain whether they heard TUS, the sound was likely sufficient to act as a cue, albeit potentially subliminally. After all, preparatory inhibition is not a conscious action undertaken by the participant either. We would also like to note that participants reported perceived audibility after each block, not after each trial, so self-reported audibility was not a fine-grained measurement. The data from Experiment IV suggest that the volume of the cue has an impact on motor inhibition. Taken together with the points mentioned in 1.16, it is not possible to conclude there is evidence for direct neuromodulation in Experiment IV.

      1.19) I suggest to re-order sub panels of the main figures to fit with the chronologic order of appearance in the text. (e.g Figure 1 with A) Ultrasonic parameters, B) 3D-printed clamp, C) Sound-TMS coupling, D) Experimental condition). 

      We have restructured the figures in the manuscript to provide more clarity and to have greater alignment with the eLife format.

      2.1) Although auditory confounds during TUS have been demonstrated before, the thorough design of the study will lead to a strong impact in the field.

      We thank the reviewer for recognition of the impact of our work. They highlight that auditory confounds during TUS have been demonstrated previously. Indeed, our work builds upon a larger research line on auditory confounds. The current study extends on the confound’s presence by quantifying its impact on motor cortical excitability, but perhaps more importantly by invalidating the most robust and previously replicable findings in humans. Further, this study provides a way forward for the field, highlighting the necessity of (in)active control conditions and tightly matched sham conditions for appropriate inferences in future work. We have amended the abstract to better reflect these points:

      “Primarily, this study highlights the substantial shortcomings in accounting for the auditory confound in prior TUS-TMS work where only a flip-over sham control was used. The field must critically reevaluate previous findings given the demonstrated impact of peripheral confounds. Further, rigorous experimental design via (in)active control conditions is required to make substantiated claims in future TUS studies.”

      2.2) A few minor [weaknesses] are that (1) the overview of previous related work, and how frequent audible TUS protocols are in the field, could be a bit clearer/more detailed

      We have expanded on previous related work in the revised manuscript:

      “Indeed, there is longstanding knowledge of the auditory confound accompanying pulsed TUS (Gavrilov & Tsirulnikov, 2012). However, this confound has only recently garnered attention, prompted by a pair of rodent studies demonstrating indirect auditory activation induced by TUS (Guo et al., 2022; Sato et al., 2018). Similar effects have been observed in humans, where exclusively auditory effects were captured with EEG measures (Braun et al., 2020). These findings are particularly impactful given that nearly all TUS studies employ pulsed protocols, from which the pervasive auditory confound emerges (Johnstone et al., 2021).”

      2.3) The acoustic control stimulus can be described in more detail

      We have elaborated upon the masking stimulus for each experiment in the revised manuscript as follows:

      Experiment I: “In addition, we also included a sound-only sham condition that resembled the auditory confound. Specifically, we generated a 1000 Hz square wave tone with 0.3 ms long pulses using MATLAB. We then added white noise at a signal-to-noise ratio of 14:1. This stimulus was administered to the participant via bone-conducting headphones.”

      Experiment II: “In this experiment, the same 1000 Hz square wave auditory stimulus was used for sound-only sham and auditory masking conditions. This stimulus was administered to the participant over in-ear headphones.”

      Experiment III: “Auditory stimuli were either 500 or 700 ms in duration, the latter beginning 100 ms prior to TUS (Supplementary Fig. 3.3). Both durations were presented at two pitches. Using a signal generator (Agilent 33220A, Keysight Technologies), a 12 kHz sine wave tone was administered over speakers positioned to the left of the participant as in Fomenko and colleagues (2020). Additionally, a 1 kHz square wave tone with 0.5 ms long pulses was administered as in Experiments I, II, IV, and prior research (Braun et al., 2020) over noise-cancelling earbuds.”

      Experiment IV: “We additionally applied stimulation both with and without a continuous auditory masking stimulus that sounded similar to the auditory confound. The stimulus consisted of a 1 kHz square wave with 0.3 ms long pulses. This stimulus was presented through wired bone-conducting headphones (LBYSK Wired Bone Conduction Headphones). The volume and signal-to-noise ratio of the masking stimulus were increased until the participant could no longer hear TUS, or until the volume became uncomfortable.”

      In the revised manuscript we have also open-sourced the audio files used in Experiments I, II, and IV, as well as a recording of the output of the signal generator for Experiment III:

      “Auditory stimuli used for sound-sham and/or masking for each experiment are accessible here: https://doi.org/10.5281/zenodo.8374148.”
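
      For readers who want to approximate a stimulus of the kind described for Experiment I, the following Python sketch generates a 1 kHz pulse train with 0.3 ms pulses and adds white noise. It is not the authors' MATLAB code. The 48 kHz sampling rate, the 0.5 s duration, the function names, and the reading of "14:1" as a signal-to-noise power ratio are all assumptions; the open-sourced audio files remain the authoritative reference.

```python
import numpy as np

def pulse_train(duration_s, fs=48_000, prf_hz=1000.0, pulse_s=0.3e-3):
    """Rectangular pulse train: pulses of length pulse_s repeated at prf_hz."""
    n = np.arange(int(round(duration_s * fs)))
    samples_per_period = fs / prf_hz
    # A sample is "on" while its position within the period is inside the pulse.
    return ((n % samples_per_period) < pulse_s * fs).astype(float)

def add_white_noise(signal, snr_power_ratio=14.0, seed=0):
    """Add Gaussian white noise scaled so signal power / noise power = snr_power_ratio."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    target_noise_power = signal_power / snr_power_ratio
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise, noise

tone = pulse_train(duration_s=0.5)          # assumed duration
stimulus, noise = add_white_noise(tone)     # 14:1 taken as a power ratio (assumption)
stimulus /= np.max(np.abs(stimulus))        # normalize to [-1, 1] for playback
print(f"duty cycle: {tone.mean():.3f}, samples: {stimulus.size}")
```

      At 48 kHz the 0.3 ms pulse is rounded to a whole number of samples, so the realized duty cycle is close to, but not exactly, 30%; a higher sampling rate reduces this rounding error.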

      2.4) The finding that remaining motor inhibition is observed during acoustically masked trials deserves further discussion.

      We agree. Please refer to points 1.16 and 1.18.

      2.5) In several places, the authors state to have "improved" control conditions, yet remain somewhat vague on the kind of controls previous work has used (apart from one paragraph where a similar control site is described). It would be useful to include more details on this specific difference to previous work.

      In the revised manuscript, we have clarified the control condition used in prior studies as follows:

      Abstract:

      “Primarily, this study highlights the substantial shortcomings in accounting for the auditory confound in prior TUS-TMS work where only a flip-over sham control was used.”

      Introduction:

      “To this end, we substantially improved upon prior TUS-TMS studies implementing solely flip-over sham by including both (in)active control and multiple sound-sham conditions.”

      Methods:

      “We introduced controls that improve upon the sole use of flip-over sham conditions used in prior work. First, we applied active control TUS to the right-hemispheric face motor area, allowing for the assessment of spatially specific effects while also better mimicking on-target peripheral confounds. In addition, we also included a sound-only sham condition that closely resembled the auditory confound.”

      2.6) I also wondered how common TUS protocols are that rely on audible frequencies. If they are common, why do the authors think this confound is still relatively unexplored (this is a question out of curiosity). More details on these points might make the paper a bit more accessible to TUS-inexperienced readers. 

      Regarding the prevalence of the auditory confound, please refer to point 2.2.

      Peripheral confounds associated with brain stimulation can have a strong impact on outcome measures, often even overshadowing the intended primary effects. This is well known from electromagnetic stimulation. For example, the click of a TMS pulse can strongly modulate reaction times (Duecker et al., 2013, PLoS ONE) with effect sizes far beyond that of direct neuromodulation. Unfortunately, this consideration has not yet been fully embraced by the ultrasonic neuromodulation community. This is despite the long-known auditory effects of TUS (Gavrilov & Tsirulnikov, 2012, Acoustical Physics). It was not until the auditory confound was shown to impact brain activity by Guo et al. and Sato et al. (2018, Neuron) that the field began to attend to this phenomenon. Mohammadjavadi et al. (2019, BrainStim) then showed that neuromodulation persisted even in deaf mice and, importantly, also demonstrated that ramping ultrasound pulses could reduce the auditory brainstem response (ABR). Braun and colleagues (2020, BrainStim) were the first to bring attention to the auditory confound in humans, while also discussing masking stimuli. This was followed by a study from Johnstone and colleagues (2021, BrainStim), who did preliminary work assessing both masking and ramping in humans. Recently, Liang et al. (2023) proposed a new form of masking colourfully titled the ‘auditory Mondrian’. Further research into the peripheral confounds associated with TUS is on the way.

      However, we agree that the confound remains relatively unexplored, particularly given the substantial impact it can have, as demonstrated in this paper. What is currently lacking is an assessment of the reproducibility of previous work that did not sufficiently consider the auditory confound. The current study constitutes a strong first step to addressing this issue, and indeed shows that results are not reproducible when using control conditions that are superior to flip-over sham, like (in)active control conditions and tightly matched sound-sham conditions. This is particularly important given the fundamental nature of this research line, where TUS-TMS studies have played a central role in informing choices for stimulation protocols in subsequent research.

      We would speculate that, with TUS opening new frontiers for neuroscientific research, there comes a rush of enthusiasm wherein laying the groundwork for a solid foundation in the field can sometimes be overlooked. Therefore, we hope that this work sends a strong message to the field regarding how strong an impact peripheral confounds can have, including in prior work. Indeed, at the current stage of the field, we see no justification for omitting proper experimental controls moving forward. Only when we can dissociate peripheral effects from direct neuromodulatory effects can our enthusiasm for the potential of TUS be warranted.

      2.7) Results, Fig. 2: Why did the authors not directly contrast target TUS and control conditions? 

      Please refer to point 1.1.

      2.8) The authors observe no dose-response effects of TUS. Does increasing TUS intensity also produce an increase in TUS-produced sounds? If so, should this not also lead to dose-response effects?

      We thank the reviewer for this insightful question. Yes, increasing TUS intensity results in an increased volume of the auditory confound. Under certain circumstances this could lead to ‘dose-response’ effects. In the manuscript, we propose that the auditory confound acts as a cue for the upcoming TMS pulse, thus resulting in MEP attenuation once the cue is informative (i.e., when TMS timing can be predicted by the auditory confound). In this scenario, volume can be taken as the salience of the cue. When the auditory confound is sufficiently salient, it should cue the upcoming TMS pulse and thus result in a reduction of MEP amplitude.

      If we take Experiment II as an example (Figure 3B), the 19.06 W/cm² stimulation would be louder than the 6.35 W/cm² stimulation. However, as both intensities are audible, they both cue the upcoming TMS pulse. One could speculate that the very slight (nonsignificant) further decrease for 19.06 W/cm² stimulation could be due to more salient cueing.

      One might notice that MEP attenuation is less strong in Experiment I, even though higher intensities were applied. Directly contrasting intensities from Experiments I and II was not feasible due to differences in transducers and experimental design. From the perspective of sound cueing of the upcoming TMS pulse, the auditory confound cue was less informative in Experiment I than Experiment II, because TUS stimulus durations of both 100 and 500 ms were administered, rather than solely 500 ms durations. This could explain why descriptively less MEP attenuation was observed in Experiment I, where cueing was less consistent.

      Perhaps more convincing evidence of a sound-based ‘dose-response’ effect comes from Experiment IV (Figure 4B). Here, we propose that continuous masking reduced the salience of the auditory confound (cue), and thus less MEP attenuation was observed. Indeed, we see less MEP change for masked stimulation. For the lowest administered volume during masked stimulation, there was no change in MEP amplitude from baseline. For higher volumes, however, there was a significant inhibition of MEP amplitude, though the attenuation was still smaller than for unmasked stimulation. These results indicate a ‘dose-response’ effect of volume. When the volume (intensity) of the auditory confound was low enough, it was inaudible over the continuous mask (as also reported by participants), and thus it did not act as a cue for the upcoming TMS pulse and did not result in motor inhibition. When the volume (intensity) was higher, fewer participants reported not being able to hear the stimulation, so the cue was to some extent more salient, and, in line with the cueing hypothesis, more inhibition was observed.

      In summary, because the volume of the auditory confound scales with the intensity of TUS, there may be dose-response effects of the auditory confound volume. Along the border of (in)audibility of the confound, as in masked trials of Experiment IV, we may observe dose-response effects. However, at clearly audible intensities (e.g., Experiment I & II), the size of such an effect would likely be small, as both volumes are sufficiently audible to act as a cue for the upcoming TMS pulse leading to preparatory inhibition.

      2.9) I wonder if the authors could say a bit more on the acoustic control stimulus. Some sound examples would be useful. The authors control for audibility, but does the control sound resemble the one produced by TUS? 

      Please refer to point 2.3.

      2.10) The authors' claim that the remaining motor inhibition observed during masked trials is due to persistent audibility of TUS relies "only" on participants' descriptions. I think this deserves a bit more discussion. Could this be evidence that there is a TUS effect in addition to the sound effect? 

      Please refer to points 1.16 and 1.18.

    1. Author Response

      Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “The cation channel mechanisms of subthreshold inward depolarizing currents in the VTA dopaminergic neurons and their roles in the depression-like behavior”. These comments are constructive and very helpful for improving our manuscript. We have studied the comments carefully and have made provisional revisions, which we hope will meet with approval. We respond to the reviewers’ comments point by point below.

      Reviewer #1 (Public Review):

      Comment 1:

      The pharmacological tools used in this study are highly non-selective. Gd3+, used here to block NALCN, is actually more commonly used to block TRP channels. 2-APB inhibits not only TRPC channels, but also TRPM and IP3 receptors while stimulating TRPV channels (Bon and Beech, 2013), while FFA actually stimulates TRPC6 channels while inhibiting other TRPCs (Foster et al., 2009).

      We agree with the reviewer that the substances mentioned are not specific. Although we performed shRNA experiments against NALCN and TRPC6, we also used more specific pharmacological modulators for these two channels, L703,606 (the antagonist of NALCN)[1] and larixyl acetate (a potent TRPC6 inhibitor)[2]. The results are shown in figure 3E, F and figure 4C, E.

      Comment 2:

      -The multimodal approach including shRNA knockdown experiments alleviates much of the concern about the non-specific pharmacological agents. Therefore, the authors' claim that NALCN is involved in VTA dopaminergic neuron pacemaking is well-supported.

      -However, the claim that TRPC6 is the key TRPC channel in VTA spontaneous firing is somewhat, but not completely supported. As with NALCN above, the pharmacology alone is much too non-specific to support the claim that TRPC6 is the TRP channel responsible for pacemaking. However, unlike the NALCN condition, there is an issue with interpreting the shRNA knockdown experiments. The issue is that TRPC channels often form heteromers with TRPC channels of other types (Goel, Sinkins and Schilling, 2002; Strübing et al., 2003). Therefore, it is possible that knocking down TRPC6 is interfering with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Our single-cell RNA-seq results show that TRPC7 and TRPC4 are not as broadly expressed as TRPC6 in VTA DA neurons. In experiments using single-cell PCR (sFig. 9A), only a very small proportion of TRPC6-positive DA cells (DAT+) expressed TRPC4 (sFig. 9Bi) or TRPC7 (sFig. 9Bii), consistent with the single-cell RNA-seq results (Fig. 2). Therefore, it is likely that knocking down TRPC6 does not interfere with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Comment 3:

      The claim that TRPC6 channels in the VTA are involved in the depressive-like symptoms of CMUS is supported.

      • However, the connection between the mPFC-projecting VTA neurons, TRPC6 channels, and the chronic unpredictable stress model (CMUS) of depression is not well supported. In Figure 2, it appears that the mPFC-projecting VTA neurons have very low TRPC6 expression compared to VTA neurons projecting to other targets. However, in figure 6, the authors focus on the mPFC-projecting neurons in their CMUS model and show that it is these neurons that are no longer sensitive to pharmacological agents non-specifically blocking TRPC channels (2-APB, see above comment). Finally, in figure 7, the authors show that shRNA knockdown of TRPC6 channels (in all VTA dopaminergic neurons) results in depressive-like symptoms in CMUS mice. Due to the low expression of TRPC6 in mPFC-projecting VTA neurons, the authors' claim of "broad and strong expression of TRPC6 channels across VTA DA neurons" is not fully supported. Because of the messy pharmacological tools used, it cannot be claimed that TRPC6 in the mPFC-projecting VTA neurons is altered after CMUS. And because the knockdown experiments are not specific to mPFC-projecting VTA neurons, it cannot be claimed that reducing TRPC6 in these specific neurons is causing depressive symptoms.

      The reason we focused on the mPFC-projecting VTA DA neurons is that this pathway is implicated in depressive-like behaviors of the CMUS model[3-5]. Although mPFC-projecting VTA DA neurons seem to have a lower level of TRPC6, we reason that the channels are still functional there. However, we do agree with the reviewer that the statement “broad and strong expression of TRPC6 channels across VTA DA neurons" is not fully supported. We have changed the statements based on the reviewer’s suggestion. Furthermore, we selectively knocked down TRPC6 in the mPFC-projecting VTA DA neurons and then studied the behavior (Fig. 8).

      Comment 4:

      It is important to note that the experiments presented in Figure 1 have all been previously performed in VTA dopaminergic neurons (Khaliq and Bean, 2010) including showing that low calcium increases VTA neuron spontaneous firing frequency and that replacement of sodium with NMDG hyperpolarizes the membrane potential.

      We agree with the reviewer that similar experiments have been performed previously [6]; we present them for the flow of our manuscript and for general readers.

      Comment 5:

      -The authors' explanation for the increase in firing frequency in 0 calcium conditions is that calcium-activated potassium channels would no longer be activated. However, there is a highly relevant finding that low calcium enhances the NALCN conductance through the calcium sensing receptor from Dejian Ren's lab (Lu et al., 2010) which is not cited in this paper. This increase in NALCN conductance with low calcium has been shown in SNc dopaminergic neurons (Philippart and Khaliq, 2018), and is likely a factor contributing to the low-calcium-mediated increase in spontaneous VTA neuron firing.

      We agree with the reviewer and are grateful for the suggestion. A discussion of this point has been added.

      Comment 6:

      -One of the only demonstrations of the expression and physiological significance of TRPCs in VTA DA neurons was published by (Rasmus et al., 2011; Klipec et al., 2016) which are not cited in this paper. In their study, TRPC4 expression was detected in a uniformly distributed subset of VTA DA neurons, and TRPC4 KO rats showed decreased VTA DA neuron tonic firing and deficits in cocaine reward and social behaviors.

      We thank the reviewer for the suggestion. The references and a discussion of this point have been added.

      Comment 7:

      • Out of all seven TRPCs, TRPC5 is the only one reported to have basal/constitutive activity in heterologous expression systems (Schaefer et al., 2000; Jeon et al., 2012). Other TRPCs such as TRPC6 are typically activated by Gq-coupled GPCRs. Why would TRPC6 be spontaneously/constitutively active in VTA DA neurons?

      In the complex neuronal environment where VTA DA neurons are located, multiple modulatory factors, including GPCRs, could be dynamically active; this could lead to the activation of TRP channels, including TRPC6.

      Comment 8:

      A new paper from the group of Myoung Kyu Park (Hahn et al., 2023) shows in great detail the interactions between NALCN and TRPC3 channels in pacemaking of SNc DA neurons.

      The reference mentioned has been added. We thank the reviewer.

      Reviewer #2 (Public Review):

      Comment 1:

      These results do not show that TRPC6 mediates stress effects on depression-like behavior. As stated by the authors in the first sentence of the final paragraph, "downregulation of TRPC6 proteins was correlated with reduced firing activity of the VTA DA neurons, the depression-like behaviors, and that knocking down of TRPC6 in the VTA DA neurons confer the mice with depression behaviors." Therefore, the results show associations between TRPC6 downregulation and stress effects on behavior, occlusion of the effects of one by the other on some outcome measures, and cell manipulation effects that resemble stress effects. There is no experiment that shows reversal of stress effects with cell/circuit-specific TRPC6 manipulations. Please adjust the title, abstract and interpretation accordingly.

      We agree with the reviewer’s suggestion. The title has been changed to “The cation channel mechanisms of subthreshold inward depolarizing currents in the VTA dopaminergic neurons and their roles in the chronic stress-induced depression-like behavior”, and the abstract and interpretation have been adjusted accordingly.

      Comment 2:

      Statistical tests and results are unclear throughout. For all analyses, please report specific tests used, factors/groups, test statistic and p-value for all data analyses reported. In some cases, the chosen test is not appropriate. For example, in Figure 6E, it is not clear how an experiment with 2 factors (stress and drug) can be analyzed with a 1-way RM ANOVA. The potential impact of inappropriate statistical tests on results makes it difficult to assess the accuracy of data interpretation.

      We have redone the statistical analysis as suggested by the reviewer and added specific tests used, factors/groups, test statistic and p-value for all data analyses into the figure legends of the revised manuscript.

      Comment 3:

      Why were only male mice used? Please justify and discuss in the manuscript. Also, change the title to reflect this.

      Although most similar previous studies used male mice or rats[7, 8], we agree with the reviewer that female animals should also be tested, given the possible role of sex hormones. We therefore repeated some key experiments on female mice (sFig. 1, 6, 8 and 13).

      Comment 4:

      Number of recorded cells is very low in Figure 1. Where in VTA did recordings occur? Given the heterogeneity in this brain region, this n may be insufficient. Additional information (e.g., location within VTA, criteria used to identify neurons) should be included. Report the number of mice (i.e., n = 6 cells from X mice) in all figures.

      Yes indeed, the number here is not high. More experiments were performed to increase the N/n numbers. The location of recorded cells in the VTA and the number of mice used are now shown in all figures; the criteria used to identify neurons are stated in the Methods section “Identification of DA neurons and electrophysiological recordings”. At the end of the electrophysiological recordings, the recorded VTA neurons were collected for single-cell PCR. VTA DA neurons were identified by single-cell PCR for the presence of TH and DAT.

      Comment 5:

      Authors refer to VTA DA neurons as those that are DAT+ in line 276, although TH expression is considered the standard of DAergic identity, and studies (e.g., Lammel et al, 2008) have shown that a subset of VTA DA neurons have low levels of DAT expression. Authors should reword/clarify that these are DAT-expressing VTA DA neurons.

      The study published by Lammel[9] in 2015 showed the low dopamine specificity of transgene expression in the ventral midbrain of TH-Cre mice; on the other hand, DAT-Cre mice exhibit dopamine-specific Cre expression patterns, although DAT-Cre mice are likely to suffer from their own limitations (for example, low DAT expression in mesocortical DA neurons may make it difficult to target this subpopulation, see Lammel et al., 2008[10]). Hence, in our study, DAT was used as the criterion to identify DA neurons. In addition, both TH and DAT were tested by single-cell PCR to determine whether the recorded cells were DA neurons.

      Comment 6:

      Neuronal subtype proportions should be quantified and reported (Fig. 1Aii).

      Neuronal subtype proportions are now quantified and reported in Fig. 1Aii.

      Comment 7:

      In addition to reporting projection specificity of neurons expressing specific channels, it would be ideal to report these data according to spatial location in VTA.

      The spatial locations of recorded cells in the VTA are now shown in all figures.

      Comment 8:

      The authors state that there are a small number of Glut neurons in VTA, then they state that a "significant proportion" of VTA neurons are glutamatergic.

      Thanks, “a significant proportion of neurons” has been changed to “less than half of sequenced DA neurons”.

      Comment 9:

      It is an overstatement that VTA DA neurons are the key determinant of abnormal behaviors in affective disorders.

      Thanks, we have amended the statement to “Dopaminergic (DA) neurons in the ventral tegmental area (VTA) play an important role in mood, reward and emotion-related behaviors”.

      Reviewer #3 (Public Review):

      Comment 1:

      The authors of this study have examined which cation channels specifically confer to ventral tegmental area dopaminergic neurons their autonomic (spontaneous) firing properties. Having brought evidence for the key role played by NALCN and TRPC6 channels therein, the authors aimed at measuring whether these channels play some role in so-called depression-like (but see below) behaviors triggered by chronic exposure to different stressors. Following evidence for a down-regulation of TRPC6 protein expression in ventral tegmental area dopaminergic cells of stressed animals, the authors provide evidence through viral expression protocols for a causal link between such a down-regulation and so-called depression-like behaviors. The main strength of this study lies on a comprehensive bottom-up approach ranging from patch-clamp recordings to behavioral tasks. However, the interpretation of the results gathered from these behavioral tasks might also be considered one main weakness of the abovementioned approach. Thus, the authors make a confusion (widely observed in numerous publications) with regard to the use of paradigms (forced swim test, tail suspension test) initially aimed (and hence validated) at detecting the antidepressant effects of drugs and which by no means provide clues on "depression" in their subjects. Indeed, in their hands, the authors report that stress elicits changes in these tests which are opposed to those theoretically seen after antidepressant medication. However, these results do not imply that these changes reflect "depression" but rather that the individuals under scrutiny simply show different responses from those seen in nonstressed animals. These limits are even more valid in nonstressed animals injected with TRPC6 shRNAs (how can 5-min tests be compared to a complex and chronic pathological state such as depression?). 
      With regard to anxiety, as investigated with the elevated plus-maze and the open field, the data, as reported, do not allow the authors' interpretation to be checked, as anxiety indices are either not correctly provided (e.g. absolute open arm data instead of percentages of open arm visits, without mention of closed arm behaviors) or subject to possible biases (lack of distinction between central and peripheral components of the apparatus).

      We agree with the reviewer that it is debatable whether the behavior tests we used here represent a real depression state; this is an open question that could be discussed from different perspectives. Since these tests (forced swimming and tail suspension), as the reviewer noted, are “widely observed in numerous publications”, we used these seemingly only available options to reflect a “depression-like” state. One could argue that since these tests were initially used for testing antidepressants (“validated”), with decreased immobility time as an indication of antidepressant effects, an increased immobility time could likewise reflect a “depression-like” state. As for the anxiety tests, the data concerning the elevated plus-maze have also been changed based on the reviewer’s suggestion.

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      Reviewer #1 (Recommendations For The Authors):

      Recommendation 1 for improving the paper:

      -The paper needs extensive editing for both overall structural clarity and for the high number of typos and grammatical errors.

      We thank the reviewer for the suggestion. The revised manuscript has been edited extensively.

      Recommendation 2 for improving the paper:

      -Retrobeads are often toxic to cells and build up with increasing time. It is surprising that the authors wait 14-21 days for retrobead expression in their target cells. It is also a problem that the mPFC projecting cells have a longer time with the retrobeads than the other projection-targeting cells because the toxicity could be more extensive with the longer wait time thus confounding the results. The authors should repeat some mPFC experiments at the 14 day time point to confirm that the longer time with the beads is not influencing the differential effects in these cells.

      Following the methods published by Stephan Lammel and Jochen Roeper (“For sufficient labeling, survival periods for retrograde tracer transport depended on respective injection areas: DS and NAc lateral shell, 7 days; NAc core, NAc medial shell, and BLA, 14 days; and mPFC, 21 days”[10]), we performed the experiments on mPFC-projecting cells at the 21-day time point. Consistent with this, labeling of mPFC-projecting cells at the 14-day time point was insufficient compared with that at the 21-day time point, as shown below.

      Author response image 1.

      Confocal images showing the anatomical distribution of mPFC-projecting DA neurons labelled with retrobeads (red) in the VTA after DAT-immunofluorescence (green) staining at different time points (A, 14 d; B, 21 d) after retrobead injection. Scale bars = 10 μm.

      Recommendation 3 for improving the paper:

      -The experiment with FFA in Figure 4E seems weird. Why is there no baseline before the FFA application? And why is the baseline trending downward immediately? The authors should explain why this example experiment is presented differently from all the others.

      We apologize that this example time course is not typical. Since FFA is not a specific antagonist of TRPC6 and actually stimulates TRPC6 channels, we repeated the experiments with a more specific pharmacological modulator of TRPC6, larixyl acetate (LA), and the results are shown in Figure 4C and 4F.

      Recommendation 4 for improving the paper:

      -It would be much more useful to see exact p values in the text, as it aids in interpreting the 'insignificance' of specific comparisons. Specifically, in Figure 5F, the 2-APB looks like it is having a small effect, and the already low firing rate (due to the TRPC6 knockdown) makes a big effect less likely. It would be useful to know what the actual p value is here (and everywhere).

      OK. We now report all P values in the figure legends of the revised version.

      Recommendation 5 for improving the paper:

      -In the results, it should be explained that the "RMP" of VTA DA neurons was obtained by treating the cells with TTX.

      A sentence indicating the presence of TTX when measuring “RMP” has been added to the Results section of the revised version.

      Recommendation 6 for improving the paper:

      -The spacing of the panels in the figures is somewhat odd. The figures could be more compact.

      Thanks, we have re-arranged all figures.

      Recommendation 7 for improving the paper:

      The paper is difficult to read because of significant grammatical errors. Here are some examples by line number, but this list is not at all exhaustive.

      We thank the reviewer for pointing out the grammatical errors, which we have corrected.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 1 for improving the paper:

      Fix typos: e.g., change HCH to HCN, change EMP to EPM, "these finding", "compact par" should read "pars compacta", "substantial" in line 475 should read "substantia", Incomplete sentences on line 73 and line 107, etc. Also, what is meant by "autonomic" firing activity? What is meant by "expression files"? Change "depression behaviors" to depression-like behaviors. "The HCN" as written in line 69 is a bit misleading, as HCN channels in the heart and brain are different members of a family of channels, although as written in the text, it seems that they are identical. In Figure 2, rearrange order of brain regions (e.g., from "BLA-VTA" to "VTA-BLA"), because as written, it seems that the focus is on projections into the VTA from each brain region, rather than VTA neurons that project to each respective region.

      We thank the reviewer for pointing out these errors, which we have corrected. “Autonomic firing activity” has been changed to “spontaneous firing activity”. “Expression files” has been changed to “expression levels”. All instances of “depression behavior” have been changed to “depression-like behaviors”. In Figure 2, all “xx-VTA” labels have been changed to “VTA-xx”.

      Reviewer #3 (Recommendations For The Authors):

      Recommendation 1 for improving the paper:

      Methodology: as opposed to sFig. 8, where the order in which mice were repeatedly tested is precisely given, such key information is lacking in Fig. 6 as well as in the Methods section (for example, when is such a traumatic stress as forced swimming performed with regard to the other tests?). Relevant to this point is the possible bias triggered by such chronological testing, as exposure to the forced swim test likely affects the behaviors recorded in the other tests. Furthermore, the way this test is conducted is appealing, as it is mentioned that the water depth was set to 10 cm, which is quite low given that immobility scores might be affected by the ability of mice to stand on their tails.

      With regard to the elevated plus-maze, data are erroneously provided. Absolute values regarding open arm behaviors should be provided as percentages of the number of visits (or time spent therein) over the total (open + closed) number of arm visits. Indeed, closed arm visits should also be provided. This variable, also considered an index of locomotor activity, would allow the reader to exclude any effect of locomotion on the exploration in the open field.

      As they stand, data in the open field seem to indicate parallel changes at the center (center time) and the periphery (total distance), hence suggesting locomotor effects rather than anxiogenic effects. Data related to the center and the periphery should be clearly distinguished. Lastly, the number of weeks allowed for the mice to recover from surgeries aimed at delivering viruses is not mentioned. This is important as it could have affected the amplitude of the sensitivity to the stressors.

      We thank the reviewer for the suggestion. The missing information in Figure 6 and the Methods is now supplied. We apologize for the wrong number of “10 cm” in the forced swimming test; this has been corrected. The data concerning the elevated plus-maze have also been changed based on the reviewer’s suggestion. To address a possible locomotor effect, we tested the mice on the rota-rod test. The results show no difference in locomotor activity between control and depression-like mice (sFig. 10G, sFig. 12I and sFig. 13G). We modified the experimental procedure timeline in Figure 6, and in the Methods section “AAV for gene knockdown or overexpression and viral construct and injection” we added “Mice were singly housed with enough food and water to recover for 4-5 weeks after injection of virus, before behavior tests and electrophysiological recordings.” to report the number of weeks allowed for the mice to recover from surgeries aimed at delivering virus.
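
      The elevated plus-maze index requested in this recommendation (open-arm visits or time expressed as a percentage of the open + closed total) can be illustrated with a minimal sketch. The function name, argument names, and example numbers are hypothetical and are not taken from the manuscript or its analysis code.

```python
def open_arm_percentages(open_entries, closed_entries, open_time_s, closed_time_s):
    """Return (% open-arm entries, % open-arm time) over the open + closed totals.

    Hypothetical helper for illustration. Time spent on the central platform
    is assumed to be excluded from both inputs.
    """
    total_entries = open_entries + closed_entries
    total_time = open_time_s + closed_time_s
    # With no arm visits at all the index is undefined, so return NaN.
    pct_entries = 100.0 * open_entries / total_entries if total_entries else float("nan")
    pct_time = 100.0 * open_time_s / total_time if total_time else float("nan")
    return pct_entries, pct_time


# Made-up example: 5 open and 15 closed entries, 60 s open and 180 s closed.
pct_entries, pct_time = open_arm_percentages(5, 15, 60.0, 180.0)
```

      Reporting the closed-arm entry count alongside these percentages, as the reviewer suggests, gives a separate index of locomotor activity.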

      Recommendation 2 for improving the paper:

      Results/conclusions: as already mentioned, the authors make a confusion in the interpretation of their tail suspension tests and forced swimming tests. I acknowledge that such a confusion is frequent, but it is important to note that the tests used by the authors were INITIALLY aimed at detecting the antidepressant effects of drugs under investigation. However, it is not because a test reveals such antidepressant properties that it also provides indices of depression. The authors will surely agree that it is unlikely that a 5-min test provides a model of a chronic pathology accounted for by a complex interplay between genetic and environmental factors. I would suggest that the authors read, for example, Molendijk and De Kloet (Eur J Neurosci 2022). I think that the authors should just neutrally mention their results without any interpretation related to depression. On the other hand, what could have been interesting is to test whether the so-called "depressive-like" responses recorded in the study were sensitive to chronic antidepressant treatments. This would have allowed the authors to further suggest some relevance (if any) with depression-like pathologies.

      As we discussed above, we again agree with the reviewer’s concern. However, if as stated by the reviewer that “However, it is not because a test reveals such antidepressant properties that they also provide indices of depression”, then the experiments suggested by the reviewer “….. to test whether the so-called "depressive-like" responses recorded in the study were sensitive to chronic antidepressant treatments”

      Recommendation 3 for improving the paper:

      A close examination of the responses to CMUS or chronic restraint suggests that indeed two populations of animals were detected, possibly sensitive and resilient to these stressors. Did the authors try to examine this possibility?

      Based on the results of the behavior tests in CMUS and CRS, the animals might be divided into two populations: highly sensitive and moderately sensitive ones.

      Recommendation 4 for improving the paper:

      There are some text changes that need to be performed:

      Page 2 line 46: ref 4 uses a social stress model which brings no clearcut evidence for it being a "depression" model. Indeed, this model can also be suggested to be a model of chronic anxiety (Kalueff et al., Science 2006; Chaouloff, Cell tissue Res 2013), hence indicating that VTA dopaminergic neurons might also be involved in anxiety.

      page 11, line 329: the references supporting the hypothesis that VTA DA neurons are linked to depression cannot be found in the reference list (10-15 do not correspond to the appropriate references).

      page 11, line 341: reference 47 does not fit with the authors' assertion as it did not include any behavior.

      Fig. S8: body weight data are likely provided as changes rather than absolute values (e.g. 8 g)

      We agree with the reviewer’s comments. In line 46, “……such as depression states” has been changed to “such as depression- or anxiety-related states”. We also corrected the references in lines 329 and 341. Finally, the body weight data are now reported as the change in body weight.

      References:

      1. Um, K.B., et al., TRPC3 and NALCN channels drive pacemaking in substantia nigra dopaminergic neurons. Elife, 2021. 10.

      2. Urban, N., et al., Identification and Validation of Larixyl Acetate as a Potent TRPC6 Inhibitor. Mol Pharmacol, 2016. 89(1): p. 197-213.

      3. Zhong, P., et al., HCN2 channels in the ventral tegmental area regulate behavioral responses to chronic stress. Elife, 2018. 7.

      4. Liu, D., et al., Brain-derived neurotrophic factor-mediated projection-specific regulation of depressive-like and nociceptive behaviors in the mesolimbic reward circuitry. Pain, 2018. 159(1): p. 175.

      5. Walsh, J.J. and M.H. Han, The Heterogeneity of Ventral Tegmental Area Neurons: Projection Functions in a Mood-Related Context. Neuroscience, 2014. 282: p. 101-108.

      6. Khaliq, Z.M. and B.P. Bean, Pacemaking in dopaminergic ventral tegmental area neurons: depolarizing drive from background and voltage-dependent sodium conductances. J Neurosci, 2010. 30(21): p. 7401-13.

      7. Li, L., et al., Selective targeting of M-type potassium K(v) 7.4 channels demonstrates their key role in the regulation of dopaminergic neuronal excitability and depression-like behaviour. Br J Pharmacol, 2017. 174(23): p. 4277-4294.

      8. Friedman, A.K., et al., Enhancing depression mechanisms in midbrain dopamine neurons achieves homeostatic resilience. Science, 2014. 344(6181): p. 313-9.

      9. Lammel, S., et al., Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons. Neuron, 2015. 85(2): p. 429-38.

      10. Lammel, S., et al., Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron, 2008. 57(5): p. 760-73.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      This work describes a new and powerful approach to a central question in ecology: what are the relative contributions of resource utilisation vs interactions between individuals in the shaping of an ecosystem? This approach relies on a very original quantitative experimental set-up whose power lies in its simplicity, allowing an exceptional level of control over ecological parameters and of measurement accuracy.

      In this experimental system, the shared resource corresponds to 10^12 copies of a fixed single-stranded target DNA molecule to which 10^15 random single-stranded DNA molecules (the individuals populating the ecosystem) can bind. The binding process is cycled, with a 1000x-PCR amplification step between successive binding steps. The composition of the population is monitored via high-throughput DNA sequencing. Sequence data analysis describes the change in population diversity over cycles. The results are interpreted using estimated binding interactions of individuals with the target resource, as well as estimated binding interactions between individuals and also self-interactions (that can all be directly predicted as they correspond to DNA-DNA interactions). A simple model provides a framework to account for ecosystem dynamics over cycles. Finally, the trajectory of some individuals with high frequency in late cycles is traced back to the earliest cycles at which they are detected by sequencing. Their propensities to bind the resource, to form hairpins, or to form homodimers suggest how different interaction modes shape the composition of the population over cycles.

      The authors report a shift from selection for binding to the resource to interactions between individuals and self-interactions over the course of cycles as the main drivers of their ecosystem. The outcome of the experiment is far from trivial as the individual resource binding energy initially determines the relative enrichment of individuals, and then seems to saturate. The richness of the population dynamics observed with this simple system is thus comparable to that found in some natural ecosystems. The findings obtained with this new approach will likely guide the exploration of natural ecosystems in which parameters and observables are much less accessible.

      My review focuses mainly on the experimental aspects of this work given my own expertise. The introduction exposes very convincingly the scientific context of this work, justifying the need for such an approach to address questions pertaining to ecology. The manuscript describes very clearly and rigorously the experimental setup. The main strengths of this work are (i) the outstanding originality of the experimental approach and (ii) its simplicity. With this setup, central questions in ecology can be addressed in a quantitative manner, including the possibility of running trajectories in parallel to generalize the findings, as reported here. Technical aspects have been carefully implemented, from the design of random individuals bearing flanking regions for PCR amplification, binding selection and (low error) amplification protocols, and sequencing read-out whose depth is sufficient to capture the relevant dynamics.

      We thank the reviewer for summarizing our work and the main findings in a very clear and effective manner.

      One missing aspect in the data analysis is the quantification of the effect of PCR amplification steps in shaping the ecosystem (to be modeled if significant). In addition, as it stands the current work does not fully harness the power of the approach. For instance, with this setup, one can tune the relative contributions of binding selection vs amplification for instance (to disentangle forces that shape the ecosystem). One can also run cycles with new DNA individuals, designed with arbitrarily chosen resource binding vs self-binding, that are predicted to dominate depending on chosen ecological parameters. I have three main recommendations to the authors:

      1) PCR amplification steps (and not only binding selection steps) should be taken into account when interpreting the outcome of experiments.

      2) More generally, a systematic analysis of the possible modes of propagation of a DNA molecule from one cycle to the next, including those considered as experimental noise, would help with interpreting the results.

      3) Testing experimentally the predictions from the analysis and the modelling of results would strengthen the case for this approach.

      Despite its conceptual simplicity, our approach has indeed a few experimental handles that enable exploring a relevant variety of conditions much beyond those described in this paper, of which we are very aware. These involve selection vs. amplification or set the stage to explore competition, parasitism or cooperation among specific species, as the reviewer points out, but also introduce mutations and explore the kinetics of evolution in static or dynamic environments. Ongoing experiments are considering some of these conditions. We modified the text to mention more explicitly these possibilities, which are now mentioned in p11 lines 376-378 and lines 416-417. The three points raised by the reviewer helped us to further improve and clarify strengths and limitations of our work, as detailed below.

      Regarding the first point, here are my suggestions :

      • Run one cycle of just amplification vs 'binding + amplification', or simply increase the number of PCR cycles (and subsample the product) to check whether it impacts the population composition, in particular for sequences with predictions derived from the current analysis.

      The point raised by the reviewer is indeed very relevant and not discussed in our manuscript. Prompted by the reviewer’s comment, we performed two new experiments to distinguish resource-binding selection from PCR amplification effects.

      First, we performed a negative control experiment in which the “selection step” was carried out with bare beads, i.e. beads with no DNA grafted on them. We then compared the results with the corresponding results of the original experiments on Oligo 1 and 2.

      After 6 cycles, the most abundant sequence in the negative dataset has a relative occurrence of 0.05%, whereas the dominant strand in Oligo 1 and Oligo 2 has an abundance of 8% and 16%, respectively, i.e. 160-320 times larger.

      This indicates that the drift due to non-specific binding + PCR amplification is at least two orders of magnitude smaller than the selection induced by the affinity with the resource.

      These results are now cited in p14 lines 468-470, and described in Appendix 1, Experimental controls.

      Second, we tested the effect of PCR amplification on the selection process. We exploited the fact that we have aliquots for each generation of our evolution experiment, which we sampled and saved after PCR and before sequencing. We chose a specific generation (generation 9 from the Oligo 1 experiments), performed another PCR round, and proceeded directly to sequencing with no bead-selection step. We then compared the ensemble of oligos obtained in this way, which we named Oligo 1 “cycle 9 replica”, with both the original Oligo 1 cycle 9 and Oligo 1 cycle 10.

      We sampled 20 times 4 x 10^5 sequences from the cycle 9 dataset, from cycle 9 replica and from cycle 10 with a bootstrap approach. To compare the three systems, we extracted the fraction of each population covered by its 10 most abundant individuals. The results are shown in Figure 2-figure supplement 4; further details on the analysis can be found in the figure caption. The similarity between cycle 9 and cycle 9 replica and the marked difference between cycle 9 replica and cycle 10 indicate that the relevant part of the selection is indeed performed by the resource-binding mechanism, while drifts induced by PCR play a secondary role.

      As a further check, we compared the specific sequences across the 20 samples in the cycle 9 and cycle 9 replica datasets and found that the 10 most abundant sequences are almost always the same. In particular, the top 8 or 9 are always the same, possibly in a different order.
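
      The bootstrap comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the toy read list and the choice of resampling with replacement are assumptions, and a real run would use about 4 x 10^5 reads per resample rather than 100.

```python
import random
from collections import Counter

def top_k_fraction(reads, k=10):
    """Fraction of all reads covered by the k most abundant sequences."""
    counts = Counter(reads)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(reads)

def bootstrap_top_k(reads, sample_size, n_boot=20, k=10, seed=0):
    """Resample reads with replacement n_boot times; return the top-k fractions."""
    rng = random.Random(seed)
    return [
        top_k_fraction(rng.choices(reads, k=sample_size), k)
        for _ in range(n_boot)
    ]

# Toy dataset (hypothetical): three species with 50, 30 and 20 reads.
toy_reads = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
full_fraction = top_k_fraction(toy_reads, k=2)
boot_fractions = bootstrap_top_k(toy_reads, sample_size=100, k=2)
print(full_fraction, min(boot_fractions), max(boot_fractions))
```

      Comparing the spread of the resampled fractions between two datasets gives a rough sense of whether their difference exceeds sampling noise.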

      These new pieces of evidence are now cited in p14 lines 483-484 and described in Appendix 1, Experimental controls.

      • Sequencing read-out includes the same PCR protocol as the one used for amplification steps, so read-out potentially has an effect on the composition of the ecosystem. Again, varying the number of PCR cycles is a direct way to test this.

      The PCR amplification involved in the read-out might have a minor effect on the sequencing outcome but not on the composition of the ecosystem. In fact, the sample that undergoes sequencing is taken from the pool at each cycle and is not inserted back into it. Thus, it does not participate in the following selection steps. This is specified in the text at p3 line 104.

      • Could self-interactions (hairpins or homodimers) benefit individuals during amplification steps? The role of self-interactions during binding selection steps could also be tested directly over one cycle (again varying the relative weight of the binding vs amplification to disentangle both).

      Our conditions for PCR amplification were chosen to minimize effects of this type. PCR amplification is carried out at 68 °C, a temperature at which, given the level of self and mutual complementarity in the sequences analyzed in the text, hairpins or homodimers should be melted and thus have no effect. This is specified in the text at p. 14 lines 479-480. However, if an effect is present, it gives a disadvantage (rather than an advantage) to self-interacting individuals. For the amplification step we used Q5® Hot Start High-Fidelity DNA Polymerase, which does not possess strand displacement activity. Therefore, in theory, if the polymerase encounters a double-stranded portion during amplification, it stops and synthesizes only a truncated product, which is then lost during the purification step. In other words, sequences with secondary and/or tertiary structures are less likely to be amplified during the polymerization step. As a consequence, a DNAi characterized by this kind of structure will be negatively selected even in the case of optimal binding to the resource, and will be underrepresented in the pool.

      About the second point:

      • Regarding the effect of sampling (sequencing read-out), PCR amplification errors: explicitly check the consistency of observations with the expected outcome, in the methods section (right now these aspects are only briefly mentioned in the main text), which would highlight again the level of control and accuracy of the system.

      Hoping that we have interpreted the request correctly, we performed a technical replicate by sequencing Oligo 1 cycle 9 again and analyzed the sequences that have at least 100 reads (corresponding to 27.42% of the total reads). We find that among the 800 DNA species that have at least 100 reads, 93.6% are found in both replicates. All the non-overlapping sequences have very low abundance, close to 100 reads.

      Moreover, we compared the population size of each DNA species between the two replicates, after having equalized the database sizes. The results are now cited in p14 lines 509-510, described in Appendix 1, Experimental controls, and shown in Figure 2-figure supplement 3, where we plot the ratio of the number of reads in the two replicates for each sequence as a function of the number of reads in one. We found an average ratio of 0.965 with a standard deviation of 0.119. The largest fluctuations are found in the rarest species, as expected.
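
      A replicate comparison of this kind could be sketched as below. The code is illustrative only: the function name, the toy counts and the rule that a species counts as shared when it passes the read threshold in both replicates are assumptions, since the response does not spell out the exact criterion.

```python
from statistics import mean, stdev

def compare_replicates(rep_a, rep_b, min_reads=100):
    """Compare two read-count tables (dict: sequence -> reads).

    Returns the fraction of above-threshold species found in both replicates
    and the per-species read ratios b/a after equalizing the total read counts.
    """
    scale = sum(rep_a.values()) / sum(rep_b.values())  # equalize database sizes
    in_a = {s for s, n in rep_a.items() if n >= min_reads}
    in_b = {s for s, n in rep_b.items() if n >= min_reads}
    shared = in_a & in_b
    ratios = [rep_b[s] * scale / rep_a[s] for s in sorted(shared)]
    return len(shared) / len(in_a | in_b), ratios

# Toy read counts (hypothetical); both replicates total 1605 reads.
rep_a = {"s1": 1000, "s2": 500, "s3": 100, "s4": 5}
rep_b = {"s1": 980, "s2": 520, "s3": 90, "s4": 15}
shared_fraction, ratios = compare_replicates(rep_a, rep_b)
print(shared_fraction, mean(ratios), stdev(ratios))
```

      In this toy case the species that drops out ("s3") sits right at the threshold, which mirrors the observation that non-overlapping sequences have abundances close to 100 reads.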

      We think this evaluation indeed strengthens the solidity of our results.

      • I have a small concern about target resource accessibility: is there any spacer between the ssDNA and the bead? The methods section does not mention any, and I would expect such a proximity between the target DNA and the bead to yield steric repulsion that impedes interactions with random DNA individuals.

      Yes, there is a 12-carbon spacer between the bead and the resource, which was inserted precisely to make the resource more accessible. This information is now available in Table 1 of Supplementary Information detailing the sequences used in the experiment. However, as now described in the text (p8 lines 284-286), we observe that the interaction with the resource is always shifted toward the 3' end, the terminus furthest from the bead, indicating some residual issue of accessibility to the resource sections closest to the bead.

      • Regardless of the existence of a spacer, binding of random DNA molecules to beads instead of the target DNA constitutes a potential source of noise (described for now as '1-x' in the IBEE model), which can be probed by swapping targets, selecting without target etc.

      This issue is addressed by the test with bare beads described above, in which we found little effect, corresponding to a small 1−𝑥 value.

      • Is there any recombination potentially occurring during amplification steps? This could be tested with a set of known molecules amplified over 24 amplification steps in a row (no binding step).

      It is possible for recombination to occur during the amplification steps. In Appendix 2, the section "By-Product Formation from PCR Amplification" discusses PCR by-products as aberrant forms of amplification, such as recombination events. We adopted several strategies to limit by-product formation, such as: i) use of “blockers” characterized by a phosphate group at the 3’ end (thus inhibiting their usage during the amplification and allowing a better control of the reaction conditions over the PCR cycles), ii) a high annealing temperature (to limit the possibility of a spurious primer annealing to the random region), iii) fewer PCR cycles, iv) a high primer concentration, v) a very short elongation step (all these strategies have been implemented to avoid a possible mispriming event between different DNAi, and the formation of concatemers). However, the formation of by-products is a problem inherent to the technique: in fact, it is a known issue for classical SELEX technology (Tolle et al. 2014), mainly due to the random region within the DNAi. Q5® Hot Start High-Fidelity DNA Polymerase (New England Biolabs, Ipswich, MA, USA) has an error rate of <0.44 x 10^-6 per base.

      In classic SELEX technology, the average number of selection cycles is 10. This limitation is partly due to the increase in PCR by-products. As we can see from Figure 2-figure supplement 1, the percentage of PCR by-products is less than 20% at cycle 12, and then increases dramatically in the following cycles. We are performing a series of experiments with known and limited sequences to verify and better understand the phenomenon for future applications of the SEDES platform. On this issue we decided not to modify the manuscript since we think it is already well discussed in Appendix 1.

      And the third point:

      • Perform one cycle (or a few cycles) with random DNA individuals, the most frequent individuals at the end of the current experiment, newly designed individuals with higher binding affinity to the target than currently dominating individuals, newly designed individuals with higher propensity to form hairpins or to form homodimers. Such experimental testing of predictions from the data analysis/modeling, typical of a physics approach, would illustrate the level of understanding one can reach with a simple yet powerful experimental setup.

      We perfectly agree that the approach we propose and the set of results we obtained call for further investigations that could strengthen analysis and modeling. The final aim we envisage is the understanding, within this simplified approach, of key evolutionary factors such as fitness. Indeed, becoming able to write an explicit fitness function would be a significant new contribution to the understanding of evolutionary processes, even within the limited settings of the ADSE approach, as discussed in the conclusions of the manuscript.

      However, undertaking such an analysis is a long and expensive job, which we have started but which will not be completed in the near future. For this reason, given the already significant body of results we are presenting here, we prefer to keep this paper confined to the study of the evolution of a random DNAi population and discuss in a future contribution the behavior of smaller designed sets of competing, collaborating or parasitic individuals.

      Looking ahead, additional stages of investigations will also include mutations - to investigate the kinetics of speciation, and, in an even further stage, the interplay between evolution kinetics and dynamical mutation of resources.

      I have a few smaller points:

      • It would be very useful to provide the expected dynamic range of binding free energies (in terms of DeltaG and omega): what is the maximum binding free energy for the perfect complement?

      The NUPACK-computed binding free energy of a 20-base-long oligomer complementary to the resource (𝜔=20) is -24.36 kcal/mol for Oligo 1 and -23.08 kcal/mol for Oligo 2. This is the best answer we can offer to the reviewer’s request, since the maximum binding free energy of DNAi individuals (much longer than the target strand) would include contributions from the unpaired bases. Indeed, the values given above are approached by the left tail of the distribution of Fig. 3a, which however includes DNAi self-energies.

      The perfect complement binding free energy is now cited in the text as a reference for the dynamic range of DeltaG (p4 lines 151-152).

      • How is the number of captured DNA molecules quantified? Is 10^12 measured, estimated, or hypothesized?

      The number of sequences was calculated from data obtained from 260 nm absorbance quantification. We have now added this information in the Methods, “Selection Phase” section.
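
      For orientation, a conversion from a 260 nm absorbance reading to a number of ssDNA molecules might look like the sketch below. It is not the authors' procedure: it relies on two common rules of thumb (about 33 µg/mL of ssDNA per absorbance unit at a 1 cm path length, and about 330 g/mol per nucleotide), and the absorbance, volume and function name are hypothetical. The 100 nt length follows from the 25 + 50 + 25 nt design described in this response.

```python
AVOGADRO = 6.02214076e23          # molecules per mole
SSDNA_UG_PER_ML_PER_A260 = 33.0   # rule-of-thumb value for ssDNA, 1 cm path (assumption)
G_PER_MOL_PER_NT = 330.0          # approximate average nucleotide mass (assumption)

def ssdna_molecules(a260, volume_ml, length_nt):
    """Estimate the number of ssDNA molecules in a sample from its A260 reading."""
    mass_g = a260 * SSDNA_UG_PER_ML_PER_A260 * 1e-6 * volume_ml
    moles = mass_g / (length_nt * G_PER_MOL_PER_NT)
    return moles * AVOGADRO

# Hypothetical reading: A260 = 0.1 in 50 µL for a 100 nt oligomer.
n_molecules = ssdna_molecules(a260=0.1, volume_ml=0.05, length_nt=100)
print(f"{n_molecules:.3e}")
```

      A sequence-specific extinction coefficient would give a more accurate estimate than these averages.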

      Reviewer #2:

      Summary:

      In this manuscript, the authors introduced ADSE, a SELEX-based protocol to explore the mechanism of emergence of species. They used DNA hybridization (to the bait pool, "resources") as the driving force for selection and quantitatively investigated the factors that may contribute to survival over the generations of evolution (progress of the SELEX cycles), revealing that besides individual-resource binding, the inter- and intra-individual interactions were also important features along with mutualism and parasitism.

      Strengths:

      The design of using pure biochemical affinity assay to study eco-evolution is interesting, providing an important viewpoint to partly explain the molecular mechanism of evolution.

      Weaknesses:

      Though the evidence of the study is somewhat convincing, some aspects still need to be improved, mostly technical issues.

      Major:

      1) There are a few technical issues that the authors should clarify in the manuscript to make the analysis more transparent:

      1.1) To my understanding, it is difficult to guarantee the even distribution of different species (individuals) in the initial individual pool. Even though the authors have shown in Fig. 2a that the top 10 sequences take up ~0% in the pool, it remains unclear how abundant these top and bottom representative sequences are, given the huge number of the pool (10^15). Can the author show the absolute number of these sequences in different quantiles? Please show both Oligo sets.

      First, we thank the reviewer for both positive and critical comments that have guided us in reformulating or clarifying some messages of our work.

      As for this specific point: 10^15 is a small number compared to 4^50 ≈ 10^30, the number of possible sequences of length 50. Thus, we do not expect more than one individual per sequence in the initial pool. However, sequencing requires a preparatory amplification, which may lead to detecting a few sequences with more than one individual.
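
      The size argument above can be checked with a few lines of arithmetic. The sketch assumes a uniformly random 50-nt region and a pool of 10^15 molecules, as stated in the response; the variable names are illustrative.

```python
# Number of distinct sequences for a 50-nt random region over a 4-letter alphabet.
sequence_space = 4 ** 50

# Size of the initial random pool quoted in the response.
pool_size = 10 ** 15

# Expected number of copies of any given sequence under uniform random synthesis.
expected_copies = pool_size / sequence_space

print(f"sequence space  = {sequence_space:.3e}")
print(f"expected copies = {expected_copies:.3e}")
```

      The expected copy number per sequence is many orders of magnitude below one, so duplicates seen in the sequenced initial pool are more plausibly introduced by the preparatory amplification than by synthesis.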

      Specifically, in the initial pool of Oligo 1, the most abundant individual (of sequence GAACTAAAGGGGCGGTGTCCACTTGCCTGTAGTGGTTATCAGTCCGGTTG) has 3 copies. About 0.7% of the sequences have 2 copies, while the vast majority of strings (99.3% of a sample of about 1.5 x 10^6 sequenced DNAi) are present in one copy only. A similar situation holds for Oligo 2, with 4 DNAi present in 3 copies and 0.8% of the sequences (in a pool of 2 x 10^6 DNAi) in 2 copies.

      It is worth noticing that none of the 10 most abundant species in the last cycle is present in the sample. Indeed, the fraction of the pool which is sequenced is removed from the population that undergoes evolution (as now specified in p2, line 104). We specified in the text (p2, lines 69-70, p3 lines 94-96) the fact that in the initial pool no sequence is expected to be present in more than one individual.

      1.2) The author claimed that they used two different oligo sets (Oligo1 and Oligo2) in this study. It is unclear which data was used in the presentation. How reproducible are they? Similar to this concern, how reproducible if the same oligo set was used to repeat the experiment?

      The oligo used in the main text was declared in Methods, Replica section. It is now declared also in the main text (p3 lines 106-108 and in the captions of Figure 2, Figure 3 and Figure 4). Reproducibility is addressed in: Figure 2-figure supplement 5; Figure 2-figure supplement 6; Appendix 2: Results of the experimental replica.

      It should also be noted that two starting pools of random 50mers are necessarily disjoint sets for the same reason discussed in the previous answer: the probability of common sequences in two selections of 10^15 from a space of 10^30 is negligibly small. Thus, it is expected that each time a new evolution experiment is started, different dominant sequences are found. However, the statistical properties of the DNAi pool during the evolution process of Oligo 1 and Oligo 2 are similar, as discussed in Appendix 2 of the paper.

      1.3) PCR and Illumina sequencing themselves introduce selection bias. How would the analysis eliminate them? The authors only discussed the errors created during PCR cycles (page 3, lines 115-122). However, the PCR itself would prefer to amplify some sequences over others (e.g. those with high GC content). Similarly, Illumina sequencing has difficulty with low-complexity sequences. How would this be circumvented?

      Yes, both PCR and Illumina sequencing have some known biases (e.g. sequencing of homopolymers or amplification of GC-rich sequences) that are intrinsic to the techniques used. Regarding PCR, we implemented a thermal protocol optimized for our chosen experimental setup, characterized by very short denaturation, annealing and amplification steps performed at high temperatures. Regarding Illumina sequencing, we cannot rule out a bias against specific sequences (e.g., homopolymers), which however should not be captured during the selection step, due to the design of the resource. Also, the libraries subjected to sequencing are characterized by a low complexity: according to the experimental design, the first and last 25 nucleotides are the same for all DNAi, the only differences being in the central 50 nt-long sequence. It is known that a low complexity library might encounter problems during sequencing due to the design of Illumina instruments: nucleotide diversity, especially in the first sequencing cycles, is critical for cluster filtering, optimal run performance and high-quality data generation. To overcome this limitation, the obtained libraries were run together with more complex and diverse library preparations: the ADSE sequences were about 1-2% of the total reads per run, corresponding to only a few million reads.

      This discussion is now in Appendix 1, Intrinsic limitations of the molecular approach.

      1.4) Some DNA sequences would bind to the beads instead of the resource sequence coated on them. Should the author run the experiment using bead alone as a control?

      We performed a negative control experiment in which the “selection step” was carried out with bare beads, i.e. beads with no DNA grafted on them. We then compared the results with the corresponding results of the original experiments on Oligo 1 and 2.

      After 6 cycles, the most abundant sequence in the negative dataset has a relative occurrence of 0.05%, whereas the dominant strand in Oligo 1 and Oligo 2 has an abundance of 8% and 16%, respectively, i.e. 160-320 times larger.

      This indicates that the drift due to non-specific binding (+ PCR amplification) is at least two orders of magnitude smaller than the selection induced by the affinity with the resource.

      This part is now discussed in Appendix 1, Experimental controls.

      2) It would be interesting to study the impact of environmental factors, for example, changing pH, salt concentration, and detergent. Would these factors accelerate/decelerate the evolution?

      We agree that the approach we propose and the set of results we obtained call for further investigations. However, performing these additional experiments, which would require a minimum of 6 generations each, is a long and expensive job, which we have started and will not be completed in the near future. For this reason, given the already significant body of results we are presenting here, we prefer to keep this paper confined to the study of the evolution of a random DNAi population in the selected conditions and leave the exploration of new conditions, potentially opening new evolutionary scenarios, to a future contribution. In fact, our aim was to show that through our platform we can indeed observe fundamental elements of evolution in a non-biological system, which, in the set of chosen parameters, we do.

      3) The concentration of individual oligo is apparently one of the most important factors in determining the interactions. In later cycles, some oligos become dominant, namely with extremely higher concentrations compared to their concentration in earlier cycles. This would definitely affect its interaction with resources, or self-interaction, or interaction with other oligos in the pool. However, the authors failed to discuss this factor, which may explain the exponential enrichment in later cycles.

      We agree with the reviewer that this is an important point, but we disagree that we have not discussed it. We introduce the topic at the end of the “Null Model and Eco-evolutionary Algorithm” section, where we comment on the change of the gamma parameter by saying that there must be a shift in the evolution process, first dominated by the interactions with the resources, and in later stages by some other factors (lines 227-230) that we then discuss in “Self and mutual DNAi interactions are evolutionary drivers”. In this latter section and in the following one, we indeed discussed the effects of mutual and self-interactions between DNAi.

      Indeed, a key point in our paper is the change in the gamma parameter necessary to match the IBEE model to experiments, as it is now more openly stated (p5 lines 217-218, where we also mention Figure 2-figure supplement 8, which clearly shows the necessity of a variable gamma). The two regimes highlighted by the gamma value must reflect a change in the competition for the resources and interactions among species. In the first generations, where the diversity of species is large (there are few strings for each species) and binding to the resources is generally very weak (small <omega>), the affinity with the resource is the main driving force (fast growth of <omega>), while mutual interactions remain too random to favor any species in particular. In the later cycles instead, when <omega> becomes large enough to provide a significant stability to the resource-binding of the majority of species, the dominating species compete more intensively on the basis of their structure and capacity of self-defense, parasitism and mutualism, a condition in which evolution affects more modifications in sequences than in <omega>.

      Certainly, our understanding of this shift rests on statistical behavior and is inferential, based on the study of specific DNAi described in the last part of the manuscript. For a better molecular model, more experiments with selected DNAi competing, cooperating or being parasitic would be necessary, with the final aim of defining a predictive fitness function. Alas, this requires months of further investigation.

      4) The author observed the different behaviors of medium 𝜔 in early and late cycles, referring to Fig 2h. Using the IBEE model, they found out it is the change of gamma. However, the authors did not further discuss the molecular mechanism. It could be very interesting to understand the evolutionary change of these individuals.

      This comment might be related to the previous one. It is true that our discussion and understanding of the whole process is statistical, and misses a molecular model to predict the value of gamma.

      However, the specific behavior that the reviewer asks about (those in Fig. 2h) is not related to the change in gamma. Even if gamma remains as in the first part of the evolution (gamma = 3), the species with overlap between 6 and 10 would first grow in number and later decrease. Indeed, during the first cycles they have an advantage with respect to the majority of species with lower maximum overlap, a condition that favors their amplification. However, in the second stage of the evolution dominant species with a larger affinity emerge and outcompete the individuals of this class. We added a sentence in the text to clarify this point (p7 lines 227-229).

      5) In Figure 2f, some high w become quite missing. Should the authors give some interpretation? It is not observed in cycle 12 though (panel e).

      Such an effect is just due to under-sampling. In a pool of 10^n oligomers, any sequence with a given 𝜔 with P(𝜔) < 10^-n will have a vanishing probability of appearing in that sample. At cycle 12 the overall number of sequenced strands is larger than at cycle 24, due to the growing presence of PCR by-products. Thus, the right tail of the cyan distribution at the last cycle is sampled with less accuracy than at cycle 12. We have added a sentence in the revised manuscript (p5 lines 177-178) to clarify this point.
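      The under-sampling argument above can be illustrated with a short numerical sketch. It assumes that sequencing reads are independent draws from the pool, so a sequence with pool frequency p is seen at least once among N reads with probability 1 - (1 - p)^N. The function name and the example values of p and N are hypothetical choices for illustration, not values taken from the manuscript.

```python
import math


def prob_observed(p: float, n_reads: int) -> float:
    """Probability that a sequence with pool frequency p appears at least
    once among n_reads independent reads: 1 - (1 - p)**n_reads."""
    # log1p/expm1 keep the result accurate when p is very small.
    return -math.expm1(n_reads * math.log1p(-p))


# Hypothetical sequencing depth, for illustration only.
n_reads = 10**6
for p in (1e-4, 1e-6, 1e-8):
    print(f"p = {p:.0e}: P(seen at least once) = {prob_observed(p, n_reads):.4f}")
```

      Under this simplified model, sequences whose frequency is well below 1/N are almost never sampled, which is the sense in which the tail of the distribution is lost at lower sequencing depth.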

      6) It would be interesting to further explore if another type of selection resource is used, for example protein that binds to particular sequences, i.e. transcription factors. Previous studies have used a large amount of sequence-specific transcription factors to run SELEX. Since the data have existed there, why not explore?

      This is an interesting suggestion: can we use data from “ordinary” SELEX favoring specific sequences to explore sequence evolution? Two limitations make us a bit skeptical about this path: first, the consensus sequences of DNA-binding proteins are rather short and typically target dsDNA rather than ssDNA; second, the free energy of interaction is known only for the consensus sequence but not for sequences with all possible mutations with respect to the consensus sequence, making it very hard to develop any molecular understanding of the process.

      Minor:

      1) There is no figure legend or in-text citation of Figure 2b.

      2) Please correct "⁃C" with "{degree sign}C" in lines 470, 471, 472, 477 et al.

      3) Typos and grammar issues should be corrected. Examples are shown below (but not limited to these only):

      • mixed use of past and present tense.

      • Line 152, "basis" should be "bases".

      • Line 277, "a impediment" should be "an impediment"

      • Line 278, "a major deadly threats" should be "major deadly threats"

      We are sorry for the mistakes, and we have corrected them. Many thanks to the reviewer!

    1. Author Response

      We appreciate your consideration of our manuscript entitled “Deciphering molecular heterogeneity and dynamics of neural stem cells in human hippocampal development, aging, and injury” (eLife-RP-RA-2023-89507). We thank all the reviewers for their valuable and thoughtful comments and suggestions. We have carefully considered all the comments and revised our manuscript (eLife-VOR-RA-2023-89507) accordingly. You can find our point-by-point responses here. In the revised manuscript, we have addressed most of the issues and concerns raised by the reviewers. We hope that the changes will better illustrate the quality of our snRNA-seq data and the criteria used for cell type identification. However, due to the scarcity of stroke and neonatal human brain samples, we cannot strengthen our findings and conclusions by adding more hippocampal tissue of this type to the analysis within the expected timeframe. With these improvements and limitations, we would like to ask whether we could get a better judgment from the reviewers.

      Reviewer #1 (Public Review):

      In this manuscript, Yao et al. explored the transcriptomic characteristics of neural stem cells (NSCs) in the human hippocampus and their changes under different conditions using single-nucleus RNA sequencing (snRNA-seq). They generated single-nucleus transcriptomic profiles of human hippocampal cells from neonatal, adult, and aging individuals, as well as from stroke patients. They focused on the cell groups related to neurogenesis, such as neural stem cells and their progeny. They revealed genes enriched in different NSC states and performed trajectory analysis to trace the transitions among NSC states and towards astroglia and neuronal lineages in silico. They also examined how NSCs are affected by aging and injury using their datasets and found differences in NSC numbers and gene expression patterns across age groups and injury conditions. One major issue of the manuscript is questionable cell type identification. For example, in Figure 2C, more than 50% of the cells in the astroglia lineage clusters are NSCs, which is extremely high and inconsistent with classic histology studies.

      We appreciate the concerns raised by Reviewer 1 regarding the cell type identification. We suggest that the identification of the 16 main cell types in our study is accurate, as supported by the differential gene expression and the similarity of transcriptional profiles across species (Figure 1B to D, Figure Supplement 1C to E, and Figure 2A and B).

      While we appreciate the reviewer for bringing up the concern regarding the high proportion of NSCs within the astroglia lineage clusters, it is worth mentioning that distinguishing hippocampal qNSCs from astrocytes by transcription profiling poses a significant challenge in the field due to their high transcriptional similarity. In the previous global UMAP analysis, AS1 (adult specific) can be separated from qNSCs, but AS2 (NSC-like astrocytes) cannot. Therefore, the data presented in Figure 2C to G aimed to further distinguish the qNSCs from AS2 by using gene set score analysis. Based on the different scores, we categorized the qNSC/AS lineages into qNSC1, qNSC2 and AS2. Figure 2C presented the UMAP plot of the qNSC/AS2 population from the neonatal sample only. We apologize for not clarifying this in the figure legend. We have now clarified this information in the figure legend of Figure 2C. More importantly, we have added UMAP plots and quantifications for the other groups in Figure 2-Supplement 2A and B, including adult, aging, and injury samples. This supplementary figure provides more complete information on the cell type composition and its dynamic variations during aging and injury. Although the ratio of NSCs in the astroglia lineage clusters remains higher than in classic histology studies, the trends indicate a reduction in qNSCs and an increase in astrocytes during aging and injury, which supports that cell type identification by gene set score analysis is effective, although still not optimal. Combined methods to accurately distinguish between qNSCs and astrocytes are required in the future, and we also discuss this in the corresponding text.

      Major comments:

      In Figure 1E, the authors should provide supporting quality control of their snRNAseq dataset in the corresponding supplementary figures. Specifically, they should show that the average number of genes and transcripts detected in each cluster are similar across different conditions. This would rule out the possibility that the stem cell gene enrichment is an artifact of increased global gene expression.

      Thanks for the suggestion. We have provided the supporting quality control of our snRNA-seq dataset in Figure1-Supplement 1A, B and F. The detailed data presented in Figure 1-Supplement 1A and Figure 1-source data 1 show that more than 2000 genes per cell were detected in all donor samples and mitochondrial genes accounted for less than 5%, suggesting that most cells were viable before freezing and underwent minimal RNA degradation. The hippocampi were dissected and collected from donors with a short post-mortem interval of about 3-4 hours to ensure low levels of RNA degradation and cellular apoptosis rates in the collected samples. For subsequent transcriptome analysis, we removed cells with fewer than 200 genes or more than 8600 genes (potentially indicating cell debris and doublets) and those with more than 20% of transcripts generated from mitochondrial genes, as shown in Figure 1-Supplement 1A and B. Figure 1-Supplement 1F provides evidence supporting that the average number of genes detected in each neurogenic cell type (AS2/qNSC, pNSC, aNSC, NB and GC) is similar across different conditions. This suggests that the enrichment of stem cell genes is not simply an artifact of increased global gene expression.
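      The filtering thresholds described above can be sketched as a simple per-nucleus check. This is only an illustration of the stated cut-offs (fewer than 200 genes, more than 8600 genes, or more than 20% mitochondrial transcripts are removed); it is not the authors' actual pipeline, and the function name and the toy nuclei below are hypothetical.

```python
def passes_qc(n_genes: int, pct_mito: float,
              min_genes: int = 200, max_genes: int = 8600,
              max_pct_mito: float = 20.0) -> bool:
    """Return True if a nucleus survives the stated filters: it is removed
    when it has fewer than min_genes or more than max_genes detected genes,
    or more than max_pct_mito percent mitochondrial transcripts."""
    return min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito


# Hypothetical toy nuclei: (identifier, genes detected, % mitochondrial transcripts).
nuclei = [
    ("nucleus_a", 150, 1.0),    # too few genes (possible debris)
    ("nucleus_b", 3000, 1.8),   # typical nucleus
    ("nucleus_c", 9000, 2.0),   # too many genes (possible doublet)
    ("nucleus_d", 2500, 25.0),  # too many mitochondrial transcripts
    ("nucleus_e", 200, 20.0),   # exactly on both boundaries, retained
]

kept = [name for name, n_genes, pct_mito in nuclei if passes_qc(n_genes, pct_mito)]
print(kept)
```

      Whether values exactly at a threshold are retained is an assumption of this sketch; the response only states which nuclei were removed.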

      In Figure 2A, the authors performed a cross-species comparative analysis of neurogenic cell clusters by integrating their datasets with published datasets from mice, pigs, and macaques. They assigned cell types to the clusters based on their similarity to the same cell group across species. However, they did not address why a previous study by Franjic et al. (Neuron 2022) using the same method and analysis did not detect any neurogenic clusters in human hippocampal and entorhinal cells. This discrepancy could have implications for the validity of their approach and the interpretation of their results. The authors should provide possible explanations for the different outcomes.

      We appreciate the valuable feedback provided by the reviewer. In our dataset, we sequenced 24,671 GC nuclei and 92,966 total DG cell nuclei, which also includes neonatal samples. The number of nuclei we sequenced is 4.5 times higher than that of Wang et al. (Cell Research, 2022), who also detected NBs. Thus, it is reasonable to conclude that we were able to detect NBs. Moreover, the presence of these rare cell types has been demonstrated in our study through immunostaining techniques, which provides further evidence. In addition, we downloaded the snRNAseq data from Franjic et al. (Neuron 2022) and mapped the dataset onto our snRNAseq dataset using the “multimodal reference mapping” method. Based on the mapping analysis, astrocytes, qNSCs, and aNSCs were identified in Franjic’s data with varying correlation efficiencies, but neuroblasts or immature neurons could not be detected (Figure 6-figure supplement 11 A to G). Therefore, we speculated that the discrepancies between our study and Franjic’s might be caused by health state differences across hippocampi, which subsequently lead to different degrees of hippocampal neurogenesis and immature neuron maintenance.

      In Figure 2C-2J, the authors examined the astroglia lineage clusters to identify NSC subpopulations and their gene features. However, they did not use consistent cell types for the analysis. Some comparisons involved quiescent NSCs (qNSCs) and differentiated astrocytes, while others involved primed NSCs (pNSCs), and active NSCs (aNSCs). This could introduce bias and affect the results. The authors should consistently include all astroglia cell clusters in their analysis, such as q, p, a NSCs and astrocytes.

      We understand the concerns raised by the reviewer; we use different cell types as the starting points of the developmental trajectory for specific reasons. pNSCs represent an intermediate state between quiescence and activation. During embryonic development, pNSCs demonstrate the greatest similarity to RGLs. Subsequently, pNSCs progressively exit the cell cycle and transition into qNSCs during the postnatal stage. These qNSCs have the ability to re-enter the cell cycle upon activation by stimuli. Based on this knowledge, we have set the pNSC population as the root of the developmental trajectory in the neonatal sample, which aligns more closely with the actual developmental process. In the adult injury sample, however, setting qNSCs as the root of the NSC developmental trajectory better fits the process of adult neurogenesis.

      In addition, the authors’ identification of qNSCs, pNSCs and aNSCs is very questionable in Figure 2. For instance, qNSC2 cells in Figure 2G express MBP, PLP1, and MOBP, which are markers of mature oligodendrocytes. They receive low scores in RGL gene module scoring in Figure 2E, even lower than those of astrocytes. These cells are likely misclassified mature oligodendrocytes. In Figure 2H-I, the authors did not present the DEGs in pNSCs and aNSCs, the GO terms of these clusters are very similar. To confirm their results, the authors should either use histology or cite literature that supports the differentiation of pNSCs and aNSCs by these genes.

      We appreciate the reviewer’s observation regarding the high expression of oligodendrocyte (OL) genes in the qNSC2 population, and we acknowledge that we currently do not have a clear explanation for this finding. However, despite the expression of OL genes in qNSC2, when we conducted a transcriptional similarity analysis comparing qNSC2 to other cell populations, we still observed a higher similarity between qNSC2 and qNSC1, as well as between qNSC2 and astrocytes, rather than oligodendrocytes. Therefore, qNSC2 are not misclassified mature oligodendrocytes (Figure 2-figure supplement 2C).

      Regarding pNSCs and aNSCs, both cell types share similar molecular characteristics, with a key distinction in their proliferation abilities. Notably, aNSCs primarily reside in the S/G2/M phase and highly express the cell cycle-related gene CCND2, reflecting active mitosis. Owing to their capacity to differentiate into neuroblasts/immature granule cells, aNSCs also express a small subset of genes associated with neuronal differentiation, including STMN2, SOX11, and SOX4 (Figure 1C, D, and Figure 2J). As per the reviewer’s request, we have presented the DEGs in pNSCs and aNSCs (Figure 2-figure supplement 2D, Figure 2-source data 2). The results of GO analysis reveal that pNSC is more associated with the Wnt signaling pathway, axonogenesis, and Hippo signaling, while aNSC is more associated with G2/M transition of mitotic cell cycle, neuron projection development, axon development, and dendritic spine organization (Figure 2-figure supplement 2E, Figure 2-source data 2).

      As Figure 2C illustrates, the authors isolated qNSCs and differentiated astrocytes from the astroglia lineage clusters to identify DEGs. However, more than 50% of the cells in the astroglia lineage clusters are NSCs, which is extremely high and inconsistent with classic histology studies. This could be due to cluster misclassification or over-representation of neonatal NSCs in the NSC cluster. The authors should stratify their data by age groups and provide corresponding UMAP plots and quantification. They should also compare DEGs between NSCs and astrocytes within each age group in all of the analyses, as neonatal, adult, and aging NSCs may have different properties and outputs.

      While we appreciate the reviewer for bringing up the concern regarding the high proportion of NSCs within the astroglia lineage clusters, it is worth mentioning that distinguishing hippocampal qNSCs from astrocytes by transcription profiling poses a significant challenge in the field due to their high transcriptional similarity. In the previous global UMAP analysis, AS1 (adult specific) can be separated from qNSCs, but AS2 (NSC-like astrocytes) cannot. Therefore, the data presented in Figure 2C to G aimed to further distinguish the qNSCs from AS2 by using gene set score analysis. Based on the different scores, we categorized the qNSC/AS lineages into qNSC1, qNSC2 and AS2. Figure 2C presented the UMAP plot of the qNSC/AS2 population from the neonatal sample only. We apologize for not clarifying this in the figure legend. We have now clarified this information in the figure legend of Figure 2C. More importantly, we have added UMAP plots and quantifications for the other groups in Figure 2-Supplement 2A and B, including adult, aging, and injury samples. This supplementary figure provides more complete information on the cell type composition and its dynamic variations during aging and injury. Although the ratio of NSCs in the astroglia lineage clusters remains higher than in classic histology studies, the trends indicate a reduction in qNSCs and an increase in astrocytes during aging and injury, which supports that cell type identification by gene set score analysis is effective, although still not optimal. Combined methods to accurately distinguish between qNSCs and astrocytes are required in the future, and we also discuss this in the corresponding text. (The same question has been answered in the first part of this letter.)

      In Figure 3, the authors discuss the important issues of shared gene expression between interneurons and NB/im-GCs. In the published work (Zhou et al. Nature 2022; Wang et al. Cell Research 2022), however, NBs and im-GCs are not located in the interneuron cluster. This needs to be stated to avoid confusion. Specifically, this suggests the limitation of using a few preselected markers for cell type identification. The author should also examine whether these shared markers are indeed expressed in human interneurons by immunostaining as one application of these markers will be in histology for the field.

      Thanks for the reviewer’s comments. We agree that single-nucleus transcriptome analysis is capable of effectively distinguishing between immature neurons and interneurons. In our UMAP plot, the NBs and im-GCs are not located in the interneuron cluster either. When we compared the granule cell lineage, which contains NB/immature GC, and the interneuron population at the whole-transcriptome level between our dataset and published mouse (Hochgerner et al. 2018), macaque and human (Franjic et al. 2022) transcriptome datasets, we found high transcriptomic congruence across the different datasets (Figure 3-figure supplement 3A). Specifically, the human GABA-INs we identified closely resembled the well-annotated interneurons in the different species (similarity scores > 0.95) (Figure 3-figure supplement 3A). The point we want to convey here is that many markers previously used to identify immature neurons are also expressed in interneurons. Therefore, when these markers are used for staining and identification, an interneuron may be mistaken for an immature neuron. Hence, when selecting markers for immature neurons, we need to be aware of this and exclude genes that are highly expressed in interneurons. To support our view, we conducted co-immunostainings of DCX (a traditional neuroblast marker) and SST (a typical interneuron marker). Our results demonstrate that SST-positive interneurons can indeed be stained by the traditional neuroblast marker DCX in primates. Please see Figure 3-figure supplement 4A-C.

      In Figure 4, the authors' classification of cell subpopulations in the neuronal lineage is not convincing. They claim to have identified two subpopulations of granule cells (GCs) that derive from neuroblasts in Figure 4A-4D. However, this is inconsistent with previous single-cell transcriptomic studies of human hippocampus, which only identified one GC cluster. The differentially expressed genes (DEGs) that they used to distinguish the two GC subpopulations are not supported by prior research. This could be a result of over-classification or technical bias. CALB1 marks mature neurons whereas CALB2 marks immature neurons. However, in Figure 4F, it suggests that CALB1 is expressed in cells that have similar pseudotime scores as CALB2, both of which reside in an intermediate position during the differentiation trajectory. This does not match the known expression patterns of these markers in GCs. The authors should explain this discrepancy and provide additional evidence to support their claims. In addition, for Figure 4F, the authors should address how the different cell fate groups correspond to cell clusters.

      We appreciate the concerns raised by the reviewer. Unfortunately, despite trying various strategies to confirm the identity of the two subpopulations of granule cells (GCs) derived from neuroblasts, we were unable to find a clear answer. As a result, we can only provide an objective description of the differences in gene expression and developmental trajectory and speculate that these differences may be related to their degree of maturity but are not aligned on the same trajectory.

      Regarding the expression of CALB1 and CALB2, the original Figure 4F did not provide precise positional information for these genes due to the compression of a large amount of gene information. In order to address this, we conducted a separate trajectory analysis specifically for CALB1 and CALB2 (Figure4-figure supplement 6B). The results of this analysis are in line with previous literature reports: CALB2 was found to be enriched in immature neurons, while CALB1 exhibited a delayed expression pattern and was enriched in mature neurons.

      The authors compared NSCs in different age groups in Figure 5, but their analysis in Figure S5A-D only included neonatal and aging stages, omitting adult stages. They should perform cross-age analyses with all three stages for consistency.

      Thank you for the reviewer's comments. We have now included the differentially expressed genes (DEGs) of the neurogenic lineage in the adult stage. Please see Figure 5-supplement 8.

      In Figure 6E, the authors should separate the data by age and calculate the proportion of the re-clustered cell groups, as they did in Figure 6B. In the re-clustered groups, how do the aNSCs and reactive astrocytes change with age?

      Thanks for the reviewer's comments. We have removed the previous Figure 6B and recalculated the proportions of the re-clustered cell groups, including reactive astrocytes (AS). The changes in the proportions of qNSC1, qNSC2, pNSC, aNSCs, and reactive astrocytes with age are now shown in Figure 6E of the updated version. We observed that the proportion of aNSCs decreases with age but increases after injury. Reactive astrocytes primarily appear in the injury group, while their proportion is very low in the other groups.

      In Figure 6E-H, the authors assert that the aNSC group in stroke injury can produce oligodendrocytes in vivo based on trajectory analysis, which is a bold claim and lacks literature support. Their evidence is insufficient, as it relies on a single in vitro study.

      Thanks for the reviewer's comments. We have provided more references to support our claim (e.g., El Waly, Cayre, and Durbec 2018; Parras et al. 2004; Enric Llorens-Bobadilla et al. 2015b; Koutsoudaki et al. 2016). These studies have indicated that under injury conditions, neural stem cells have potentials to differentiate into oligodendrocytes.

      In Figure S8 and the Discussion section, they compared their dataset with Zhou et al. (Nature 2022), a published snRNA-seq dataset of the human hippocampus across the lifespan. The authors speculated that the new neurons identified in the EdU in vitro culture analysis in Zhou et al. might be related to epilepsy, but they did not provide any evidence for this claim. To partially validate their speculation, the authors should conduct the same integrative analysis with Ayhan et al. (Neuron 2021), which examined snRNA-seq data from epileptic patient hippocampi, to demonstrate that they could detect the injury-induced aNSC population and injury-associated genes. Furthermore, they should also conduct the same integrative analysis with the other two published human hippocampal datasets, namely Franjic et al. (Neuron 2022) and Wang et al. (Cell Research 2022).

      Thanks for the reviewer's comments. As requested by the reviewer, we downloaded the snRNA-seq data from Zhou et al. (Nature 2022), Wang et al. (Cell Research, 2022a), Franjic et al. (Neuron 2022) and Ayhan et al. (Neuron 2021) for integrative analysis. Except for the dataset from Zhou et al. (Nature 2022), which utilized machine learning and made it difficult to extract cell type information for fitting with our own data, the datasets from the other three laboratories were successfully mapped onto our dataset. Different levels of correlation were observed, confirming the presence of astrocytes, qNSCs, aNSCs, and NBs (Figure 6-figure supplement 11 E to G).

      There are a few minor concerns that the authors could improve upon. In Fig. 5D, the HOPX immunostaining pattern does not look like NSCs. In Figure 5B and 6B, the same data were presented twice. And proper statistical tests are missing in Figure 6B.

      Thanks for the reviewer's comments. We have added arrowheads to indicate the typical HOPX immunostaining, which clearly shows its nuclear localization. This observation is consistent with previous reports on the subcellular distribution of HOPX protein. In the updated version, Figure 5B and 6D are distinct and not repetitive. The inclusion of the proportions of reactive astrocytes in Figure 6D provides valuable information about their distribution within the different groups. Unfortunately, statistical tests cannot be conducted for the neonatal and injury samples since only one sample is available in each case.

      Reviewer #2:

      Major points:

      1) The number of sequenced nuclei is lower than the calculated numbers of nuclei required for detecting rare cell types according to a recent meta-analysis of five similar datasets (Tosoni et al., Neuron, 2023). However, Yao et al report succeeding in detecting rare populations, including several types of neural stem cells in different proliferation states, which have been demonstrated to be extremely scarce by previous studies. It would be very interesting to read how the authors interpret these differences.

      We appreciate the valuable comments from the reviewer. We understand the reviewer’s concern and have also noticed that according to the computational modeling conducted by Tosoni et al. (Neuron, 2023), at least 21 neuroblast cells (NBs) can be identified out of 30,000 granule cells (GCs) from a total of 180,000 dentate gyrus (DG) cells. In our dataset, we sequenced 24,671 GC nuclei and 92,966 total DG cell nuclei, which also includes neonatal samples. The number of nuclei we sequenced is 4.5 times higher than that of Wang et al. (Cell Research, 2022), who also detected NBs. Therefore, it is reasonable to conclude that we were able to detect NBs. Moreover, the presence of these rare cell types has been demonstrated in our study through immunostaining techniques, which provides further evidence. We have implemented strict quality control measures to support the reliability of our sequencing data. These measures include: 1. Immediate collection of tissue samples after postmortem (3-4 hrs) to ensure the quality of isolated nuclei. 2. Only nuclei expressing more than 200 genes but fewer than 5000-8600 genes (depending on the peak of enrichment genes) were considered. On average, each cell detected around 3000 genes. 3. The average proportion of mitochondrial genes in each sample was approximately 1.8%, with no sample exceeding 5%. The related supplementary information has been included in Figure 1-supplement 1A, B and F, and Figure 1-source data 1.
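      As a rough illustration of the scale involved, the quoted reference figure (21 NBs per 30,000 GCs) can be rescaled to the 24,671 GC nuclei reported here. The sketch below assumes simple linear proportionality, which is a simplification introduced for this example and not the model of Tosoni et al.; the function name is hypothetical.

```python
def scaled_rare_cell_count(ref_rare: float, ref_total: float, observed_total: float) -> float:
    """Linearly rescale a reference count of rare cells (ref_rare out of
    ref_total) to a dataset containing observed_total cells of the parent type."""
    return ref_rare * observed_total / ref_total


# Figures quoted in the response: 21 NBs per 30,000 GCs; 24,671 GC nuclei sequenced.
estimate = scaled_rare_cell_count(21, 30_000, 24_671)
print(f"NBs expected under linear scaling: {estimate:.1f}")
```

      This only conveys the order of magnitude implied by the quoted numbers; the actual detectability of rare populations depends on sequencing depth, sample age, and clustering choices that the linear assumption ignores.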

      2) The information regarding the donors including in this study is very scarce. Factors such as chronic conditions, medication, lifestyle parameters, inflammatory levels should be provided.

      Thanks for the reviewer's comments. We have incorporated additional details about the donors. However, we would like to clarify that information regarding lifestyle parameters has not been collected. Please refer to Figure 1-source data 1 for the updated information.

      3) The number of donors included per group is insufficient: neonatal group n=1; adult group n=2; stroke n=1. Although the scarcity and value of each human brain sample is a factor to be considered, the authors must explain why and how the results obtained from individuals can be extrapolated to the population at these low numbers, especially considering that the rate of adult hippocampal neurogenesis is assumed to be very variable across individuals (Tosoni et al., Neuron, 2023).

      Thanks for the reviewer's comments. We acknowledge these limitations and understand that the inclusion of a larger number of donors would strengthen the statistical power and generalizability of our findings. However, due to the scarcity of stroke or neonatal human samples, it was not feasible to collect a larger sample size within the expected timeframe. To explain why and how we could identify the rare neurogenic populations, we have shown that the number of cells captured from individual samples and the average number of genes detected per cell are sufficient, indicating overall good sequencing quality (Figure 1-supplement 1A and B, and Figure 1-source data 1). Additionally, we have further confirmed the presence of these cell types with low abundance by integrating immunofluorescence staining (Figure 4E and Figure 6F), cell type-specific gene expression (Figure 1C and D), overall transcriptomic characteristics (Figure 1-supplement 1E), and developmental potential (Figure 4A-D, Figure 6A-D).

      4) The definition of primed NSCs (pNSCs) is poor and questionable. "Primed" may be interpreted as a loaded term and the authors only make an effort to follow them into their neurogenic trajectory while figure 4A suggest that they also, if not preferentially judging on the directionality of the RNA velocity vectors, generate astrocytes and quiescent NSCs.

      Thanks for the reviewer's comments. We apologize for not clearly explaining the definition of pNSC in our study. We have now included an explanation in the text and added supplementary information to highlight the features of pNSC and aNSC (Figure 2H to J, Figure2-figure supplement 2D and E). The results of GO analysis reveal that pNSC is more associated with the Wnt signaling pathway, axonogenesis, and Hippo signaling, while aNSC is more associated with G2/M transition of mitotic cell cycle, neuron projection development, axon development, and dendritic spine organization (Figure2-figure supplement 2E, Figure 2-source data 2). The pNSCs referred to in this study represent an intermediate state between quiescence and activation. During embryonic development, pNSCs exhibit the greatest similarity to RGLs. Subsequently, pNSCs gradually exit the cell cycle and transition into qNSCs during the postnatal development (Figure 2J). Thus, in Figure 4A, for the neonatal sample analysis, some pNSCs are shown to enter the neurogenic trajectory, while others exit the cell cycle and transition into qNSCs or become astrocytes (AS2) during postnatal development, indicating a bidirectional trajectory.

      5) The experimental definition of quiescent NSCs (qNSC1) is poor and questionable. The qNSC1 cluster is defined by the expression of HOPX (page 6), which the authors indicate is a "quiescence NSC gene". However, at least in mice, HOPX colocalizes with BrdU in proliferative NSCs (Deqiang Li et al, Stem Cell Res. 2015).

      Thank you for providing the information about the study conducted by Deqiang Li et al (Stem Cell Res. 2015). We have carefully reviewed their findings. They propose that Hopx is specifically expressed in RGL cells, which are predominantly in a quiescent state. Additionally, they observed that Hopx-positive cells are long-term BrdU-label retaining cells, and Hopx-null NSCs show enhanced neurogenesis, as evidenced by an increased number of BrdU-positive cells. These results suggest that high expression of Hopx in NSCs indicates their quiescence. Furthermore, other studies have provided further support for using high expression of the HOPX gene as a marker to identify quiescent NSCs (Jaehoon Shin et al., Cell Stem Cell 2015; Daniel A. Berg et al., Cell 2019).

      6) The term quiescent is never defined in the text, and the reader is forced to assume that they refer to the absence of active proliferation genes, most commonly MKI67. Is that what the authors intended? this should be clarified.

      Thanks for the reviewer's comments. We apologize for not clearly explaining the definition of qNSC in our study. We have now included an explanation in the text. qNSCs exhibit reversible cell cycle arrest and display a low rate of metabolic activity. However, they still possess a latent capacity to generate neurons and glia when they receive activation signals. They express genes such as GFAP, ALDH1L1, ID4, and HOPX (Figure 2B). The absence or low expression of active proliferation genes is one feature of qNSCs. The main difference lies in the state of the cell cycle and metabolism.

      7) They find cell clusters that express the proliferation marker MKI67. However, previous studies have indicated the difficulty of snRNA-seq techniques to detect proliferation marker transcripts, especially MKI67, even in hippocampal samples from human infants (for example see the snRNAseq studies from Wang and from Zhou cited by the authors and previously mentioned meta-analysis).

      Thanks for the reviewer's comments. We could detect MKI67 in our snRNA-seq data, albeit with a very low number of cells (not clustered) expressing it. Here, we are providing the feature plot in Author response image 1 to illustrate the expression of MKI67. In our Figure 5C, we compared the expression level of MKI67 in the neurogenic lineage among neonatal, adult and aged groups, and observed its high expression in neonatal rather than adult and aged groups. However, the fraction of cells expressing MKI67 is still very low. We apologize for the confusion. We did not claim that we identified specific cell clusters expressing MKI67 in our study.

      Author response image 1.

      8) The authors observe declining numbers of proliferating cells with aging and interpret this as evidence of declining neurogenesis. However, they also observe sustained neuroblast numbers in the aged brains they analyzed. Wouldn't these neuroblasts support neurogenesis? This is unclear and should be discussed.

      Thanks for the reviewer's question. We will revise the inaccurate description to clarify that the number of proliferating NPCs, rather than immature neurons, is dramatically reduced with aging. This is because, compared to rodents, immature neurons in primates are indeed retained for a longer period and possess the potential to further develop into mature neurons (Kohler, S.J., et al., PNAS, 2011). We have discussed this in the corresponding texts (Figure 5).

      9) The authors indicate that they find DCX transcript expression in interneurons. This is a potentially interesting observation. However, the authors should be very clear to state that in most studies that use DCX as a marker of immature granule cells, DCX's expression is detected by immunohistochemistry. Therefore, the fact that DCX transcripts may be present in other immature neurons does not necessarily disqualify its use as a protein marker of immature granule cells. This clarification will help to prevent misinterpretations of the data presented by the authors.

      Thanks for the reviewer's suggestion. We have clarified that we observed DCX transcripts present in interneurons in addition to immature neurons by snRNAseq. In this revised version, we conducted co-immunostainings of DCX (a traditional neuroblast marker) and SST (a typical interneuron marker). Our results demonstrate that SST-positive interneurons are indeed capable of being stained by the traditional neuroblast marker DCX in primates. Please see Figure 3-figure supplement 4A-C. A similar result has also been reported by Franjic et al. (Neuron 2022).

    1. Author Response

      Reviewer #1 (Public Review):

      The goal of the current study was to evaluate the effect of neuronal activity on blood-brain barrier permeability in the healthy brain, and to determine whether changes in BBB dynamics play a role in cortical plasticity. The authors used a variety of well-validated approaches to first demonstrate that limb stimulation increases BBB permeability. Using in vivo-electrophysiology and pharmacological approaches, the authors demonstrate that albumin is sufficient to induce cortical potentiation and that BBB transporters are necessary for stimulus-induced potentiation. The authors include a transcriptional analysis and differential expression of genes associated with plasticity, TGF-beta signaling, and extracellular matrix were observed following stimulation. Overall, the results obtained in rodents are compelling and support the authors' conclusions that neuronal activity modulates the BBB in the healthy brain and that mechanisms downstream of BBB permeability changes play a role in stimulus-evoked plasticity. These findings were further supported with fMRI and BBB permeability measurements performed in healthy human subjects performing a simple sensorimotor task. While there are many strengths in this study, there is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions. The authors only used males in this study and do not discuss whether they would also expect to see sex differences in stimulation-evoked BBB changes in the healthy brain. Another minor limitation is the authors did not address the potential impact of anesthesia which can impact neurovascular coupling in rodent studies. The authors could have also better integrated the RNAseq findings into mechanistic experiments, including testing whether the upregulation of OAT3 plays a role in cortical plasticity observed following stimulation. Overall, this study provides novel insights into how neurovascular coupling, BBB permeability, and plasticity interact in the healthy brain.

      While there are many strengths in this study, there is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions. The authors only used males in this study and do not discuss whether they would also expect to see sex differences in stimulation-evoked BBB changes in the healthy brain.

      We agree with the reviewer regarding the importance of examining sex differences in stimulation-evoked BBB changes. To address this issue we have: (1) clarified in the methods section that the human study involved both males and females; (2) added a section to the discussion highlighting the male bias as a key limitation of our animal experiments; and (3) stated that future work should examine whether stimulation-evoked BBB changes differ between males and females.

      Another minor limitation is the authors did not address the potential impact of anesthesia which can impact neurovascular coupling in rodent studies.

      We are grateful for this comment and agree with the reviewer that the potential effects of anesthesia should be discussed. We have added the following discussion paragraph:

      “A key limitation of our animal experiments is the fact they were performed under anesthesia, due to the complex nature of the experimental setup (i.e., simultaneous cortical imaging and electrophysiological recordings). Anesthetic agents can affect various receptors within the NVU, potentially altering neuronal activity, SEPs, CBF, and vascular responses (Aksenov et al., 2015; Lindauer et al., 1993; Masamoto & Kanno, 2012). To minimize these effects, we used ketamine-xylazine anesthesia, which unlike other anesthetics, was shown to generate robust BOLD and SEP responses to neuronal activation (Franceschini et al., 2010; Shim et al., 2018).”

      Reviewer #2 (Public Review):

      Summary:

      This study builds upon previous work that demonstrated that brain injury results in leakage of albumin across the blood-brain barrier, resulting in activation of TGF-beta in astrocytes. Consequently, this leads to decreased glutamate uptake, reduced buffering of extracellular potassium, and hyperexcitability. This study asks whether such a process can play a physiological role in cortical plasticity. They first show that stimulation of a forelimb for 30 minutes in a rat results in leakage of the blood-brain barrier and extravasation of albumin on the contralateral but not ipsilateral cortex. The authors propose that the leakage is dependent upon neuronal excitability and is associated with an enhancement of excitatory transmission. Inhibiting the transport of albumin or the activation of TGF-beta prevents the enhancement of excitatory transmission. In addition, gene expression associated with TGF-beta activation, synaptic plasticity, and extracellular matrix are enhanced on the "stimulated" hemisphere. That this may translate to humans is demonstrated by a breakdown in the blood-brain barrier following activation of brain areas through a motor task.

      Strengths:

      This study is novel and the results are potentially important as they demonstrate an unexpected breakdown of the blood-brain barrier with physiological activity and this may serve a physiological purpose, affecting synaptic plasticity.

      The strengths of the study are:

      1) The use of an in vivo model with multiple methods to investigate the blood-brain barrier response to a forelimb stimulation.

      2) The determination of a potential functional role for the observed leakage of the blood-brain barrier from both a genetic and electrophysiological viewpoint.

      3) The demonstration that inhibiting different points in the putative pathway from activation of the cortex to transport of albumin and activation of the TGF-beta pathway, the effect on synaptic enhancement could be prevented.

      4) Preliminary experiments demonstrating a similar observation of activity-dependent breakdown of the blood-brain barrier in humans.

      Weaknesses:

      There are both conceptual and experimental weaknesses.

      1) The stimulation is in an animal anesthetized with ketamine, which can affect critical receptors (ie NMDA receptors) in synaptic plasticity.

      We agree that the potential effects of anesthesia should be considered. The Discussion was revised to address this point: “A key limitation of our animal experiments is the fact they were performed under anesthesia, due to the complex nature of the experimental setup (i.e., simultaneous cortical imaging and electrophysiological recordings). Anesthetic agents can affect various receptors within the NVU, potentially altering neuronal activity, SEPs, CBF, and vascular responses (Aksenov et al., 2015; Lindauer et al., 1993; Masamoto & Kanno, 2012). To minimize these effects, we used ketamine-xylazine anesthesia, which unlike other anesthetics, was shown to generate robust BOLD and SEP responses to neuronal activation (Franceschini et al., 2010; Shim et al., 2018)”

      2) The stimulation protocol is prolonged and it would be helpful to know if briefer stimulations have the same effect or if longer stimulations have a greater effect ie does the leakage give a "readout" of the stimulation intensity/length.

      Thank you for this important comment. We are also very curious about the potential relationship between stimulation magnitude/duration and subsequent leakage and have added the following statement to the discussion:

      “Future studies should also explore the effects of stimulation magnitude/duration on BBB modulation, as well as the stimulation threshold between physiological and pathological increase in BBB permeability.”

      Our current findings indicate that a one-minute stimulation does not affect vascular permeability or SEP and we aim to test additional stimulation paradigms in future studies.

      3) For some of the experiments (see below), the numbers of animals are low and the statistical tests used may not be the most appropriate, making the results less clear cut.

      We appreciate this comment and have revised the statistical analysis of Figure 1J,K. We now use a nested t-test to test for differences between rats (as opposed to sections). The differences remain significant (EB, p=0.0296; Alexa, p=0.0229). The text was modified accordingly.
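      The reviewer's concern, that sections from the same rat are not independent observations, can be illustrated with a minimal sketch. It is a simplified stand-in for a nested t-test and not the authors' exact procedure: section values are first averaged within each rat, and the contralateral and ipsilateral hemispheres are then compared with a paired t statistic computed on the per-rat means. All values, rat labels, and function names below are hypothetical and serve only as illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical tracer intensities (arbitrary units), NOT data from the study:
# rat -> hemisphere -> one value per brain section.
sections = {
    "rat1": {"contra": [1.8, 2.1, 1.9], "ipsi": [1.0, 1.1, 0.9]},
    "rat2": {"contra": [1.5, 1.6, 1.7], "ipsi": [1.1, 1.0, 1.2]},
    "rat3": {"contra": [2.2, 2.0, 2.4], "ipsi": [0.9, 1.0, 1.1]},
    "rat4": {"contra": [1.7, 1.9, 1.8], "ipsi": [1.2, 1.0, 1.1]},
}


def per_animal_means(data):
    """Collapse the sections of each rat into one mean per hemisphere."""
    return {
        rat: {hemi: mean(values) for hemi, values in hemis.items()}
        for rat, hemis in data.items()
    }


def paired_t(data):
    """Paired t statistic on per-rat means, so that n is the number of rats."""
    means = per_animal_means(data)
    diffs = [m["contra"] - m["ipsi"] for m in means.values()]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom


t_stat, dof = paired_t(sections)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```

      The degrees of freedom follow the number of animals rather than the number of sections, which is the point of the reviewer's comment. Obtaining a p-value would additionally require the t distribution, for example from a statistics package.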

      4) The experimental paradigms are not entirely clear, especially the length of time of drug application and the authors seem to try to detect enhancement of a blocked SEP.

      Thank you for pointing this out. Figures 2&3 were revised for clarification and a ‘Drug Application’ subsection was added to the methods section.

      5) It is not clear how long the enhancement lasts. There is a remark that it lasts longer than 5 hours but there is no presentation of data to support this.

      Thank you for this comment. As the length of experiments differed between animals, the exact length could not be specifically stated. To clarify this point, we revised the text to indicate that LTP was recorded until the end of each experiment (between 1.5-5 hours, depending on the condition the animal was in). We also added a panel to figure 2 (Figure 2d) with exemplary data showing potentiation 60, 90, and 120 min post stimulation.

      6) The spatial and temporal specificity of this effect is unclear (other than hemispheric in rats) and even less clear in humans.

      Our animal experiments (using both in vivo imaging and histological analysis) showed no evidence of BBB modulation outside the cortical somatosensory area corresponding to the limbs. We looked at the entirety of the coronal section of the brain and found enhancement solely in the somatosensory area corresponding to limb. The right side of panels h and i in Figure 1 show an x20 magnification of the section, focusing on the enhanced area. The whole section was not shown, as no fluorescence was found outside the magnified area. Moreover, our quantification showed that the enhancement was specific to the contralateral and not ipsilateral somatosensory cortex (Figure 1 j-k).

      We agree that temporal specificity needs to be further explored, and we have now stated that in the discussion: “Future studies are needed to explore the BBB modulating effects of additional stimulation protocols – with varying durations, frequencies, and magnitudes. Such studies may also elucidate the temporal and ultrastructural characteristics that may differentiate between physiological and pathological BBB modulation.”

      We also agree that larger studies are needed to better understand the specificity of the observed effect in humans, and to account for potential inter-human variability in vascular integrity and brain function due to different schedules, diets, exercise habits, etc.

      8) The experimenters rightly use separate controls for most of the experiments but this is not always the case, also raising the possibility that the application of drugs was not done randomly or interleaved, but possibly performed in blocks of animals, which can also affect results.

      Thank you for pointing out this lack of clarity. We have now highlighted that drug application was done randomly.

      9) Methyl-beta-cyclodextrin clears cholesterol so the effect on albumin transport is not specific, it could be mediating its effect through some other pathway.

      We agree that the effect of mβCD may not be specific. To mitigate this issue, we used a very low mβCD concentration (10 µM). Notably, this is markedly lower than the concentrations reported by Koudinov et al, showing that cholesterol depletion is only observed at a concentration of 5mM (Koudinov & Koudinova, 2001). This point was added to the discussion.

      10) Since the breakdown of the blood-brain barrier can be inhibited by a TGF-beta inhibitor, then this implies that TGF-beta is necessary for the breakdown of the blood-brain barrier. This does not sit well with the hypothesis that TGF-beta activation depends upon blood-brain barrier leakage.

      Thank you for pointing out this lack of clarity. We have added a discussion paragraph that clarifies our hypothesis: “As mentioned above, albumin is a known activator of TGF-β signaling, and TGF-β has a well-established role in neuroplasticity. Interestingly, emerging evidence suggests that TGF-β also increases cross-BBB transcytosis (Betterton et al., 2022; Kaplan et al., 2020; McMillin et al., 2015; Schumacher et al., 2023). Hence, we propose the following two-part hypothesis for the TGF-β/BBB-mediated synaptic potentiation observed in our experiments: (1) prolonged stimulation triggers TGF-β signaling and increased caveolae-mediated transcytosis of albumin; and (2) extravasated albumin induces further TGF-β signaling, leading to synaptogenesis and additional cross-BBB transport – in a self-reinforcing positive feedback loop. Future research is needed to examine the validity of this hypothesis.”

      Reviewer #3 (Public Review):

      Summary:

      This study used prolonged stimulation of a limb to examine possible plasticity in somatosensory evoked potentials induced by the stimulation. They also studied the extent that the blood-brain barrier (BBB) was opened by prolonged stimulation and whether that played a role in the plasticity. They found that there was potentiation of the amplitude and area under the curve of the evoked potential after prolonged stimulation and this was long-lasting (>5 hrs). They also implicated extravasation of serum albumin, caveolae-mediated transcytosis, and TGFb signalling, as well as neuronal activity and upregulation of PSD95. Transcriptomics was done and implicated plasticity-related genes in the changes after prolonged stimulation, but not proteins associated with the BBB or inflammation. Next, they address the application to humans using a squeeze ball task. They imaged the brain and suggested that the hand activity led to an increased permeability of the vessels, suggesting modulation of the BBB.

      Strengths:

      The strengths of the paper are the novelty of the idea that stimulation of the limb can induce cortical plasticity in a normal condition, and it involves the opening of the BBB with albumin entry. In addition, there are many datasets and both rat and human data.

      Weaknesses:

      The conclusions are not compelling, however, because of a lack of explanation of methods and quantification. It also is not clear whether the prolonged stimulation in the rat was normal conditions. To their credit, the authors recorded the neuronal activity during stimulation, but it seemed excessive excitation. Since seizures open the BBB, this result calls into question one of the conclusions: that the results reflect a normal brain. The authors could either conduct studies with stimulation that is more physiological or discuss the caveats of using a supraphysiological stimulus to infer healthy brain function.

      The conclusions are not compelling however because of a lack of explanation of methods and quantification.

      Thank you for this comment. In the revised paper, we expanded the Methods section to better describe the procedures and approaches we used for data analysis.

      It also is not clear whether the prolonged stimulation in the rat was normal conditions.

      We believe that the used stimulation protocol is within the physiological range (and relevant to plasticity, learning and memory) for the following reasons:

      1) In our continuous electrophysiological recordings, we did not observe any form of epileptiform or otherwise pathological activity.

      2) Memory/training/skill acquisition experiments in humans often involve similar training duration or longer (Bengtsson et al., 2005), e.g., a 30 min thumb training session performed by (Classen et al., 1998).

      3) The levels of SEP potentiation we observed are similar to those reported in:

      a) Rats following a 10-minute whisker stimulation (one hour post stimulation, (Mégevand et al., 2009)).

      b) Humans following a 15 min task (McGregor et al., 2016).

      This important point is now presented in the discussion.

      Reviewer #1 (Recommendations For The Authors):

      The discussion would benefit from additional discussion of the potential impacts of sex and anesthesia in their findings.

      We agree with the reviewer and have added the following paragraph to the discussion:

      “A key limitation of our animal experiments is the fact they were performed under anesthesia, due to the complex nature of the experimental setup (i.e., simultaneous cortical imaging and electrophysiological recordings). Anesthetic agents can potentially alter neuronal activity, SEPs, CBF, and vascular responses (Aksenov et al., 2015; Lindauer et al., 1993; Masamoto & Kanno, 2012). To minimize these effects, we used ketamine-xylazine anesthesia, which unlike other anesthetics, was shown to maintain robust BOLD and SEP responses to neuronal activation (Franceschini et al., 2010; Shim et al., 2018). Another limitation of our animal study is the potentially non-specific effect of mβCD – an agent that disrupts caveola transport but may also lead to cholesterol depletion (Keller & Simons, 1998). To mitigate this issue, we used a very low mβCD concentration (10 µM), orders of magnitude below the concentration reported to deplete cholesterol (Koudinov et al). Lastly, our animal study is limited by the inclusion of solely male rats. While our findings in humans did not point to sex-related differences in stimulation-evoked BBB modulation, larger animal and human studies are needed to examine this question.”

      The figure text is quite small.

      Thank you for pointing this out, we revised all figures and increased font size for clarity.

      Including pharmacological concentrations within the figure legends would improve the readability of the manuscript.

      Thank you for this suggestion, the figure legends were modified accordingly.

      In methods for immunoassays the 5 groups could be more clear by stating that there are 3 timepoints for stimulation experiments. There is a typo in this section where the 24-hour post is stated twice in the same sentence.

      Thank you for pointing this out, the text was modified accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1) In Figure 1, J and K seem to indicate that in these experiments the statistics were done per slice and not per animal. This is not a reasonable approach; a repeated-measures ANOVA or averaging for each animal are more appropriate statistical approaches.

      We thank the reviewer for pointing this out. The statistical analysis for Figure 1j,k was modified. We now use a nested t-test to test for differences between rats and not sections. The differences are still significant (EB, p=0.0296; Alexa, p=0.0229). The manuscript was modified accordingly.

      2) In Figure 2, the protocol does not seem to give much idea about time course. There was a stimulation test for 1 minute before and then 1 minute after the 30-minute stimulation train. How was potentiation assessed for the next 5 hours and where are the data?

      Potentiation was assessed by repeating the 1 min test stimulation every 30 min for the duration of the experiment. We added a panel to show late potentiation (see response above).

      3) In Figure 2, there is a notable lack of controls eg the effect of sham stimulation and application of saline. These are important as the drift of response magnitude can be a problem in long experiments.

      We did test for the potential presence of response drift, by examining whether SEPs of non-stimulated animals change over time (at baseline, 30 or 60 minutes of recording; n=6). No statistical differences were found. Our analysis focused on using each animal as its own control (i.e., comparing baseline SEP to SEP post albumin perfusion), because SEP studies highlight the importance of comparing each animal to its own baseline, due to the large inter-animal variability (All et al., 2010; Mégevand et al., 2009; Zandieh et al., 2003).

      4) Figure 3 a is not clear – were the drugs applied throughout?

      Thank you for pointing this out. We have revised Figure 3 a to show that the drugs were applied for 50 min before the stimulation.

      5) In Figure 3 panel d is repeated in panel j. This needs correcting

      Thank you. This mistake was fixed.

      6) In LTP-type experiments usually the antagonist is applied during the stimulation and then washed out. This avoids the problem in this figure in which CNQX effectively blocks transmission and so it is not possible to detect any enhancement if it were there. Eg in panel e, CNQX blocks transmission, and then the assessment is performed when the AMPA receptors are blocked after 30 minutes of stimulation. If receptors are blocked no enhancement will be detectable. Moreover, surely the question is the ratio of the effect of 30-minute stimulation on the SEP in the presence of CNQX and so the statistics should be done on the fold change in the SEP following 30-minute stimulation in the presence of CNQX.

      Thank you. The protocol might have been misrepresented in the original figure. We modified Fig 3a to clarify that the antagonists were indeed washed out upon stimulation start to make sure the receptors are not blocked during the test stimulation following the 30 min stimulation. In addition, we tested for the difference in fold change between 30 min stim, and 30 min stimulation following antagonists wash-in (Fig 3f and Fig S2a).

      7) Interestingly, in panel f, stimulation, albumin, and AP5 all seem to have the same enhancement of the SEP. Is the lack of effect of 30-minute stimulation in the presence of AP5 a ceiling effect, ie AP5 has enhanced the SEP, and no further enhancement from stimulation is possible?

      This is a very interesting point that will require further research.

      8) SJN seems to block neurotransmission. What is the mechanism? The same analysis as for CNQX should be performed ie what is the fold change not compared to baseline but in the presence of SJN.

      Our quantification showed that SJN did not significantly reduce the SEP max amplitude, and we therefore did not include this graph in the figure.

      9) Please acknowledge that the effect of mbetaCD is non-specific. There is a large literature on the effects of cholesterol depletion on LTP.

      We agree that the effect of mβCD may not be specific. To mitigate this issue, we used a very low mβCD concentration (10µM). Notably, this is markedly lower than the concentrations reported by Koudinov et al, showing that cholesterol depletion is only observed at a concentration of 5mM (Koudinov & Koudinova, 2001). This point is now discussed under the discussion paragraph describing the study’s limitations.
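      The size of the gap between the two concentrations quoted above can be checked with a short calculation. This is only an arithmetic sketch using the 10 µM and 5 mM values stated in the response; the variable names are illustrative.

```python
from math import log10

# Concentrations in micromolar, taken from the response text above.
MBCD_USED_UM = 10               # 10 µM applied in the experiments
DEPLETION_REPORTED_UM = 5000    # 5 mM, cited for cholesterol depletion

# How many times lower the applied concentration is than the cited one.
fold_below = DEPLETION_REPORTED_UM / MBCD_USED_UM
orders_of_magnitude = log10(fold_below)

print(f"{fold_below:.0f}-fold lower, about {orders_of_magnitude:.1f} orders of magnitude")
```

      Under these numbers the applied concentration is 500-fold lower than the cited one, which is between two and three orders of magnitude.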

      10) k&l seem to have used the same control in which case they should not be analysed separately (they are all part of the same experiment).

      We agree with the reviewer and have revised the figure accordingly.

      11) The difference in gene expression in Figure 4 would be more convincing if it could be prevented by for example a TGF-beta inhibitor.

      We agree and acknowledge the impact such experiments could provide. We plan to incorporate these experiments into our future studies.

      12) Figure 5 seems to indicate bilateral and widespread BBB modulation arguing that this may be a non-specific effect. Panel g should look at other neocortical regions eg occipital cortex.

      We agree and thank the reviewer for this comment. We revised the figure to include other cortical areas, such as the frontal and occipital cortices (Figure 5g).

      Minor comments

      1) Paired data eg in Fig 2D are better represented by pairing the dots usually with a line.

      2) Please correct the %fold baseline in axes in graphs which show % change for baseline.

      3) Figure 4 is not correctly referred to in the text.

      We agree with all the points raised by the reviewer and revised the figures and text accordingly.

      Reviewer #3 (Recommendations For The Authors):

      The conclusions are not compelling, however, because of a lack of explanation of methods and quantification. It also is not clear whether the prolonged stimulation in the rat was normal conditions. To their credit, the authors recorded the neuronal activity during stimulation, but it seemed excessive excitation. Since seizures open the BBB, this result calls into question one of the conclusions: that the results reflect a normal brain. The authors could either conduct studies with stimulation that is more physiological or discuss the caveats of using a supraphysiological stimulus to infer healthy brain function.

      Major concerns:

      Methods need more explanation. Rationales need more justification. Examples are provided below.

      Throughout many sections of the paper, sample sizes and stats are often missing. For stats, please provide p-values and other information (tcrit, U statistic, F, etc.)

      Thank you, we added the relevant information where it was missing throughout the manuscript.

      For transcriptomics, they might have found changes in BBB-related genes if they assayed vessels but they assayed the cortex.

      We agree with the reviewer that this would be a very interesting future direction. The present study could not include this kind of analysis due to lack of access to vasculature isolation methods or single-cell RNA seq.

      What were the inclusion/exclusion criteria for the subjects?

      Thank you for pointing out this lack of clarity. The methods section (under ‘Magnetic Resonance Imaging’ – ‘Participants’) was expanded to include the following:

      “Male and female healthy individuals, aged 18-35, with no known neurological or psychiatric disorders were recruited to undergo MRI scanning while performing a motor task (n=6; 3 males and 3 females). MRI scans of 10 sex- and age-matched individuals (with no known neurological or psychiatric disorders) who did not perform the task were used as control data (n=10; 5 males and 5 females).”

      Were they age and sex-matched?

      They were, indeed, age and sex-matched. This was now clarified in the relevant Methods section.

      Were there other factors that could have influenced the results?

      Certainly. Human subjects are difficult to control for due to different schedules, diets, exercise habits, and other factors that may impact vascular integrity and brain function. Larger multimodal studies are needed to better understand the observed phenomenon.

      Fig. 1. Images are very dim. Text here and in other figures is often too small to see. Some parts of the figures are not explained.

      Our apologies. Figures and legends were revised accordingly.

      Fig 2a, f. I don't see much difference here- do the authors think there was?

      We agree that the difference may not be visually obvious. The quantification of trace parameters (amplitude and area under curve) does, however, reveal a significant SEP difference in response to both stimulation (panels X and y) and albumin (panels z and q).

      Fig 3 d and j seem the same.

      We thank the reviewer for noticing. This was a copying error that has now been corrected.

      Lesser concerns and examples of text that need explanation:

      Introduction

      Insulin-like growth factor is transported. From where to where?

      The text was edited to clarify that this was cross-BBB influx of insulin-like growth factor-I.

      RMT that underlies the transport of plasma proteins was induced by physiological or non-physiological stimulation.

      This was shown without stimulation, in normal physiology of young and aged healthy mice. The text was edited to clarify this point.

      What was the circadian modulation that was shown to implicate BBB in brain function?

      The text was edited for clarity.

      Results

      When the word stimulation is used please be specific if whiskers are moved by an experimenter, an electrode is used to apply current, etc.

      We have now moved the ‘Stimulation protocol’ section closer to the beginning of the Methods and emphasized that we administered electrical stimulation to the forepaw or hindlimb using subdermal needle electrodes.

      Please explain how the authors are convinced they localized the vascular response.

      The vascular response was localized via: (1) visual detection of arterioles that dilated in response to stimulation (due to functional hyperemia / neurovascular coupling) [figure 1 d]; and (2) quantitative mapping of increased hemoglobin concentration (Bouchard et al., 2009) [Figure 1 b]. This is now mentioned in the methods (under ‘In vivo imaging’) and results (under the ‘Stimulation increases BBB permeability’).

      "30 min of limb stimulation" means what exactly? 6 Hz 2mA for 30 min?

      Thank you. The text was revised for clarity (Methods under ‘Stimulation protocol’):

      “The left forelimb or hind limb of the rat was stimulated using an Isolated Stimulator device (AD Instruments) attached to two subdermal needle electrodes (0.1 ms square pulses, 2-3 mA) at 6 Hz frequency. Test stimulation consisted of 360 pulses (60 s) and was delivered before (as baseline) and after long-duration stimulation (30 min, referred to throughout the text as ‘stimulation’). In control and albumin rats, only short-duration stimulations were performed. Under sham stimulation, electrodes were placed without delivering current.”
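The pulse counts implied by this protocol can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 30-min figure assumes that the 6 Hz train ran continuously for the full period, which the excerpt does not state explicitly.

```python
def pulse_count(frequency_hz: float, duration_s: float) -> int:
    """Number of pulses delivered at a fixed frequency over a duration."""
    return round(frequency_hz * duration_s)


def duty_cycle(pulse_width_s: float, frequency_hz: float) -> float:
    """Fraction of time during which current is actually flowing."""
    return pulse_width_s * frequency_hz


# Test stimulation: 6 Hz for 60 s (matches the 360 pulses quoted above).
test_pulses = pulse_count(6, 60)

# Long-duration stimulation: 6 Hz for 30 min (assumes a continuous train).
long_pulses = pulse_count(6, 30 * 60)

# 0.1 ms square pulses at 6 Hz.
duty = duty_cycle(0.1e-3, 6)

print(test_pulses, long_pulses, f"{duty:.4%}")
```

Under these assumptions, current flows for well under 0.1% of the stimulation period.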

      Histology that was performed to confirm extravasation needs clarification because if tissue was removed from the brain, and fixed in order to do histology, what is outside the vessels would seem likely to wash away.

      Thank you for pointing out the need to clarify this point. The Histology description in the Methods section was revised in the following manner:

      “Albumin extravasation was confirmed histologically in separate cohorts of rats that were anesthetized and stimulated without craniotomy surgery. Assessment of albumin extravasation was performed using a well-established approach that involves peripheral injection of either labeled-albumin (bovine serum albumin conjugated to Alexa Fluor 488, Alexa488-Alb) or albumin-labeling dye (Evans blue, EB – a dye that binds to endogenous albumin and forms a fluorescent complex), followed by histological analysis of brain tissue (Ahishali & Kaya, 2020; Ivens et al., 2007; Lapilover et al., 2012; Obermeier et al., 2013; Veksler et al., 2020). Since extravasated albumin is taken up by astrocytes (Ivens et al., 2007; Obermeier et al., 2013), it can be visualized in the brain neuropil after brain removal and fixation (Ahishali & Kaya, 2020; Ivens et al., 2007; Lapilover et al., 2012; Veksler et al., 2020). Five rats were injected with Alexa488-Alb (1.7 mg/ml) and five with EB (2%, 20 mg/ml, n=5). The injections were administered via the tail vein. Following injection, rats were transcardially perfused with…”

      It is not clear why there was extravasation contralateral but not ipsilateral if there are cortical-cortical connections.

      Interestingly, we also did not observe ipsilateral SEP in response to limb stimulation, with evidence of SEP and BBB permeability only in the contralateral sensorimotor region. This finding is consistent with electrophysiological and fMRI studies showing that peripheral stimulation results in predominantly contralateral potentials (Allison et al., 2000; Goff et al., 1962).

      After injection of Evans blue or Alexa-Alb, how was it shown that there was extravasation?

      Extravasation in cortical sections was visualized using a fluorescent microscope (Figure 1 h-i). Since extravasated albumin is taken up by astrocytes, fluorescent imaging can be used for visualizing and quantifying labeled albumin (Ahishali & Kaya, 2020; Ivens et al., 2007; Knowland et al., 2014). Here is the relevant methods excerpt:

      “Coronal sections (40-μm thick) were obtained using a freezing microtome (Leica Biosystems) and imaged for dye extravasation using a fluorescence microscope (Axioskop 2; Zeiss) equipped with a CCD digital camera (AxioCam MRc 5; Zeiss).”

      How is a sham control not stimulated - what is the sham procedure?

      In the sham stimulation protocol electrodes were placed, but current was not delivered. A section titled ‘Stimulation protocol’ was added to the methods to clarify this point.

      What was the method for photothrombosis-induced ischemia?

      The procedure for photothrombosis-induced ischemia is described under the Methods section ‘Immunoassays’ – ‘Enzyme-linked immunosorbent assay (ELISA) for albumin extravasation’:

      “Rats were anesthetized and underwent … photothrombosis stroke (PT) as previously described (Lippmann et al., 2017; Schoknecht et al., 2014). Briefly, Rose Bengal was administered intravenously (20 mg/kg) and a halogen light beam was directed for 15 min onto the intact exposed skull over the right somatosensory cortex.”

      Fig 1d. All parts of d are not explained.

      Thank you for pointing this out. In the revised manuscript, the panels of this figure were slightly reordered, and we made sure all panels are explained in the legend.

      e. Is the LFP a seizure? How physiological is this- it does not seem very physiological.

      Thank you for your comment. We believe that this activity is not a seizure because it lacks the typical slow activity that corresponds to the “depolarization shift” observed during seizures (Ivens et al., 2007; Milikovsky et al., 2019; Zelig et al., 2022).

      f. Permeability index needs explanation. How was the area chosen for each rat? Randomly? Was it the same across rats?

      We have now revised the Methods section to provide a clearer description of the permeability index calculation and the choice of the imaging area:

      “Across all experiments, acquired images were the same size (512 × 512 pixel, ~1x1 mm), centered above the responding arteriole. Images were analyzed offline using MATLAB as described (Vazana et al., 2016). Briefly, image registration and segmentation were performed to produce a binary image, separating blood vessels from extravascular regions. For each extravascular pixel, a time curve of signal intensity over time was constructed. To determine whether an extravascular pixel had tracer accumulation over time (due to BBB permeability), the pixel’s intensity curve was divided by that of the responding artery (i.e., the arterial input function, AIF, representing tracer input). This ratio was termed the BBB permeability index (PI), and extravascular pixels with PI > 1 were identified as pixels with tracer accumulation due to BBB permeability.”
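The permeability index described in this excerpt can be illustrated with a minimal sketch. The excerpt does not specify how the ratio of the two intensity curves is reduced to a single number per pixel, so the code below assumes the ratio of summed intensities over the recording; this reduction, the function names, and the example values are all assumptions made for illustration and are not the published MATLAB analysis (Vazana et al., 2016).

```python
def permeability_index(pixel_curve, aif_curve):
    """Ratio of a pixel's summed intensity to the summed arterial input function.

    Assumption: the curves are reduced by summation over time; the letter
    does not state which reduction the original analysis uses.
    """
    if len(pixel_curve) != len(aif_curve):
        raise ValueError("pixel and AIF curves must have the same length")
    aif_total = sum(aif_curve)
    if aif_total <= 0:
        raise ValueError("AIF must have positive total intensity")
    return sum(pixel_curve) / aif_total


def permeable_fraction(extravascular_curves, aif_curve, threshold=1.0):
    """Fraction of extravascular pixels whose PI exceeds the threshold."""
    flags = [permeability_index(c, aif_curve) > threshold
             for c in extravascular_curves]
    return sum(flags) / len(flags)


# Illustrative (made-up) intensity curves sampled at five time points.
aif = [10, 8, 6, 4, 2]            # tracer washes out of the artery
accumulating = [2, 5, 8, 9, 10]   # tracer builds up outside the vessel
clearing = [3, 2, 2, 1, 1]        # no accumulation

print(permeability_index(accumulating, aif))
print(permeable_fraction([accumulating, clearing], aif))
```

With these example values, only the accumulating pixel exceeds the PI > 1 criterion quoted above.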

      g. For Evans blue and Alexa-Alb was the sample size rats or sections?

      Thank you for this question. We revised the statistical analysis for Figure 1j,k to appropriately assess the differences between rats. We used a nested t-test to test for differences between rats (and not sections). The differences remained significant (EB, p=0.0296; Alexa, p=0.0229) and the text was modified accordingly.

      h, i, j need more contrast and/or brightness to appreciate the images. Arrows would help. The text is too small to read.

      Thank you. This issue was addressed in the revised paper.

      To induce potentiation, 6 Hz 2 mA stimuli were used for 30 min. Please justify this as physiological.

      Thank you for the comment. We believe that the stimulation protocol used is within the physiological range (and relevant to plasticity, learning and memory) for the following reasons:

      1. In our continuous electrophysiological recordings, we did not observe any form of epileptiform or otherwise pathological activity.

      2. Memory/training/skill acquisition experiments in humans often involve similar training duration or longer (Bengtsson et al., 2005), e.g., a 30 min thumb training session performed by (Classen et al., 1998).

      3. The levels of SEP potentiation we observed are similar to those reported in:

      a. Rats following a 10-minute whisker stimulation (one hour post stimulation, (Mégevand et al., 2009)).

      b. Humans following a 15 min task (McGregor et al., 2016).

      We have revised the Discussion of the paper to clarify this important point.

      The test stimulus to evoke somatosensory evoked potentials was 1 min. Was this 6 Hz 2 mA for 1 min? Please justify.

      Yes. We chose these parameters because this range was shown to induce the largest changes in blood flow (measured with laser Doppler flowmetry) and summated SEP (Ngai et al., 1999), in agreement with our findings. We also show that these stimulation parameters induce neither changes in BBB permeability nor synaptic potentiation, and they therefore served as a test control.

      How long after the 30 min was the test stimulus triggered- immediately? 30 sec afterwards?

      The test stimulus was applied 5 min afterwards to allow for the BBB imaging protocol (now explained in the Methods section).

      How were amplitude and AUC measured? Baseline to peak? For AUC is it the sum of the upward and downward deflections comprising the LFP?

      Yes, and yes. This is now clarified in the ‘Analysis of electrophysiological recordings’ section in the Methods.
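To make the two measures concrete, the sketch below computes a baseline-to-peak amplitude and an absolute area under the curve (summing upward and downward deflections) for a sampled trace. It is a simplified illustration: the function names, the trapezoidal integration, and the example trace are assumptions, not the analysis code used in the study.

```python
def peak_amplitude(trace, baseline=0.0):
    """Largest absolute deflection from baseline."""
    return max(abs(v - baseline) for v in trace)


def abs_auc(trace, dt, baseline=0.0):
    """Trapezoidal area of the rectified trace, so that upward and
    downward deflections both contribute positively."""
    rectified = [abs(v - baseline) for v in trace]
    return sum((a + b) / 2 * dt for a, b in zip(rectified, rectified[1:]))


def percent_change(after, before):
    """Change relative to the baseline measurement, in percent."""
    return 100 * (after - before) / before


# Made-up trace with one upward and one downward deflection.
trace = [0, 1, 0, -2, 0]
print(peak_amplitude(trace))          # largest deflection
print(abs_auc(trace, dt=1.0))         # both deflections counted
print(percent_change(4.5, 3.0))       # e.g. |AUC| after vs. before
```

Rectifying before integration prevents the upward and downward deflections from cancelling each other in the sum.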

      How was the same site in the somatosensory cortex recorded for each animal? Potentiation was said to last >5 hrs. How often was it measured? Was potentiation the same for the amplitude and the AUC?

      The location of the cranial window over the somatosensory cortex was the same in all rats. The location of the specific responding arteriole may change between animals, but the recording electrode was placed near the responding arteriole at the same approach angle and depth for all animals.

      As the length of experiments differed between animals, the exact length could not be specifically stated. We therefore revised the text to clarify that LTP was recorded until the end of each experiment (depending on the animal condition, between 1.5-5 hours) and added a panel to figure 2 (Figure 2f) with exemplary data showing potentiation 120 min (2hr) post stimulation.

      Why was 25% of the serum level of albumin selected- does the brain ever get exposed to that much? Was albumin dissolved in aCSF or was aCSF chosen as a control for another reason?

      Yes, albumin was dissolved in aCSF and the solution was allowed to diffuse through the brain. The relatively high concentration of albumin was chosen to account for factors that lower its effective tissue concentration:

      1. The low diffusion rate of albumin (Tao & Nicholson, 1996).

      2. The likelihood of albumin to encounter a degradation site or a cross-BBB efflux transporter (Tao & Nicholson, 1996; Zhang & Pardridge, 2001).

      Figure 2.

      a. Please show baseline, the stimulus, and after the stimulus.

      Please point out when there was stimulation.

      What is the inset at the top?

      The inset on top is the example trace of the stimulus waveform, the legend of the figure was modified for clarity.

      b. Please show when the stimulus artifact occurred. The end of the 1-minute test stimulus period is fine. Why are the SEPs different morphologies? It suggests the different locations in the cortex were recorded.

      What is shown is the averaged SEP response over the 1-min test stimulus; each SEP is time-locked to its stimulus. Regarding the SEP waveform, it does indeed show different morphology between animals, as sometimes different arterioles respond to the stimulation, and we localize the recording to the responding vessel in each rat. However, in each rat the recording is only from one location. Once the electrode was positioned near the responding arteriole it was not moved.

      d, e. What are the stats?

      h, i. Add stats. Are all comparisons Wilcoxon? Please provide p values.

      The comparisons were performed with the Wilcoxon test. We now state that and provide the exact p values.
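For readers unfamiliar with the test, the sketch below shows how an exact two-sided p-value can be obtained for a paired comparison. It assumes the paired (signed-rank) form of the Wilcoxon test, which the letter does not specify; the function names and the data are made up for illustration and are not results from the study.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W+, exact two-sided p) for paired samples.

    Zero differences are dropped; the p-value enumerates all sign
    assignments, so this is only practical for small samples.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Made-up paired measurements from six animals (before vs. after).
before = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3]
after = [1.5, 1.9, 1.2, 1.7, 1.1, 2.1]
w_plus, p_value = wilcoxon_signed_rank(before, after)
print(w_plus, p_value)
```

With six pairs that all change in the same direction, the smallest attainable two-sided p-value is 2/64, which illustrates why exact p-values are informative for small samples.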

      j. What was selected from the baseline and what was selected during Albumin and how long of a record was selected?

      What program was used to create the spectrogram?

      What is meant by changes at frequencies above 200 Hz, the frequencies of HFOs?

      The Method section (under ‘Electrophysiology – Data acquisition and analyses’) has been revised for clarification. Spectrogram was created with MATLAB and graphed with Prism. For analysis, we selected a 10 min recorded segment before starting albumin perfusion, and 10 min after terminating albumin perfusion.

      When the cortical window was exposed to drugs, what were concentrations used that were selective for their receptor? How long was the exposure?

      Was the vehicle tested?

      We have revised the Methods section (under ‘Animal preparation and surgical procedures - Drug application’) to clarify the duration and concentration used and justification. All blockers were exposed for 50 min. The vehicle was an artificial cerebrospinal fluid solution (aCSF).

      For PSD-95, what was the area of the cortex that was tested?

      Were animals acutely euthanized and the brain dissected, frozen, etc?

      We have revised the Methods section (under ‘Immunoassays’) for clarity.

      What is mbetaCD?

      The full term was added to the results section. It is also mentioned in the Methods.

      Is SJN specific at the concentration that was chosen? Did it inhibit the SEP?

      In the concentration used in our experiments, SJN is a selective TGF-β type I receptor ALK5 inhibitor (see (Gellibert et al., 2004)).

      Fig. 3b. It looks like CNQX increased the width of the vessels quite a bit. Please explain.

      For AP5, very large vessels were imaged, making it hard to compare to the other data.

      The vascular dilation in response to the stimulation under CNQX was similar to that seen under “normal” conditions (i.e. aCSF). As for AP5, in some experiments the responding arteriole was in close proximity to a large venule that could not be avoided during imaging. For quantification we always measured arterioles within the same diameter range.

      e. Sometimes CNQX did not block the response after 30 min stimulation. Why?

      CNQX is washed out before the 30 min stimulation starts, so it is not expected to block the response to stimulation. However, in some cases the response to stimulation was lower in amplitude, likely due to residual CNQX that did not wash out completely.

      Regarding DEGs, on the top of p 10 what are the percentages of?

      In this analysis we tested, in each hemisphere, how many genes were differentially expressed between 1 and 24 hours post stimulation (either up- or down-regulated). The results were presented as the percentages of differentially expressed genes in each hemisphere (13.2% contralateral, and 7.3% ipsilateral). The text was rephrased for clarity.

      Please add a ref for the use of the JSD metric methods and support for its use as the appropriate method. Other methods need explanation/references.

      References were added to the text to clarify. The Jensen-Shannon Divergence (JSD) metric is commonly used to calculate the statistical pairwise distance between two distributions (Sudmant et al., 2015). When we compared several distance metrics, including JSD, our results were similar irrespective of the metric applied. Therefore, we use JSD to demonstrate the variability between paired samples of stimulated and non-stimulated cortex of each animal at two time points following stimulation (24 h vs. 1 h).
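As a reminder of what the metric computes, the sketch below implements the standard Jensen-Shannon divergence between two discrete distributions using base-2 logarithms, so that values fall between 0 and 1. The function names and input vectors are illustrative assumptions; the study's own pipeline and any normalization applied to the expression data are not reproduced here.

```python
import math


def normalize(counts):
    """Scale non-negative counts so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence: the mean KL divergence of p and q from their mixture."""
    p, q = normalize(p), normalize(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Made-up expression profiles over four genes.
stimulated = [10, 20, 30, 40]
non_stimulated = [40, 30, 20, 10]
print(jensen_shannon_divergence(stimulated, non_stimulated))
print(jensen_shannon_divergence(stimulated, stimulated))   # identical profiles
print(jensen_shannon_divergence([1, 0], [0, 1]))           # disjoint profiles
```

The square root of this quantity is a true distance metric, which is the form often reported when JSD is described as a distance.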

      What synaptic plasticity genes were selected for assay and what were not?

      What does "largely unaffected" mean? Some of the genes may change a small amount but have big functional effects.

      The selected genes of interest were taken from a large list compiled from previous publications (see (Cacheaux et al., 2009; Kim et al., 2017)) and are well documented in gene ontology databases and tools (e.g., Metascape, (Zhou et al., 2019)).

      We agree that the term ‘largely unaffected’ is suboptimal, and we rephrased this section of the results to indicate that “No significant differences were found in BBB or inflammation related genes between the hemispheres”. We also agree that a small number of genes can have big functional effects. Future studies are needed to better understand the genes underlying the observed BBB modulation.

      Please note that Slc and ABCs are not only involved in the BBB.

      Thank you. We modified the text to no longer specify that these are BBB-specific transporters.

      Please explain the choice of the stress ball squeeze task, and DCE.

      DCE is a well-established method for BBB imaging in living humans, and it is cited throughout the manuscript. The ball squeeze task was chosen as it is presumed to involve primarily sensory motor areas, without high-level processing (Halder et al., 2005). This is now stated in the discussion.

      What is Gd-DOTA?

      Gd-DOTA is a gadolinium-based contrast agent (gadoterate meglumine, AKA Dotarem). Text was revised for clarity. Please see the Methods section under ‘Magnetic Resonance Imaging’ - ‘Data Acquisition’.

      What does a higher percentage of activated regions mean- how was activation defined and how were regions counted?

      Higher percentage of activated regions refers to regions in which voxels showed significant BOLD changes due to the motor task performed. The statistical approaches and analyses are detailed in the Methods section under ‘Magnetic Resonance Imaging - Preprocessing of functional data, and fMRI Localizer Motor Task’.

      Figure. 4

      Was stimulation 1 min or 30 min.?

      30 min. The text has been revised for clarity.

      What is the Wald test and how were p values adjusted-please add to the Stats section.

      The Methods section under ‘Statistical analysis’ was revised to clarify this point.

      Is there a reason why p values are sometimes circles and otherwise triangles?

      The legend was revised to explain that “Circles represent genes with no significant differences between 1 and 24 h post-stimulation. Upward and downward triangles indicate significantly up- and down-regulated genes, respectively.”

      How can a p-value be zero? Please explain abbreviations.

      The p-value is very low (~10⁻¹⁰) and therefore appears to be zero due to the scale of the y-axis.

      Fig. 5b.

      There are unexplained abbreviations.

      The x on the ball and hand is not clear relative to the black ball and hand.

      Thank you for noticing. We revised the figure for clarity.

      c. What was the method used to make an activation map and what is meant by localizer task?

      The explanation of the “fMRI Localizer Motor Task” section in the methods was revised for added clarity.

      f. What is the measurement "% area" that indicates " BBB modulation"?

      Is it in f, the BBB permeable vessels (%)? f. Please explain: "Heatmap of BBB modulated voxels percentage in motor/sensory-related areas of task vs. controls."

      The %area measurement indicates the percentage of voxels within a specific brain region that have a leaky BBB. See Methods.
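The %area measure is a simple proportion, as the sketch below illustrates. The per-voxel value and the threshold that classifies a voxel as leaky are defined in the Methods and are not reproduced in this letter, so both are treated here as hypothetical inputs; the function name and example values are likewise made up.

```python
def percent_area(voxel_values, threshold):
    """Percentage of voxels in a region whose value exceeds the threshold.

    Both the per-voxel measure and the threshold are placeholders for the
    definitions given in the study's Methods.
    """
    if not voxel_values:
        raise ValueError("region contains no voxels")
    leaky = sum(1 for v in voxel_values if v > threshold)
    return 100.0 * leaky / len(voxel_values)


# Made-up per-voxel values for one brain region.
region = [0.1, 0.5, 0.9, 1.2]
print(percent_area(region, threshold=0.8))
```

Expressing the count relative to the region's own voxel count allows regions of different sizes to be compared in a single heatmap.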

      Is Task - the control?

      Yes.

      Supplemental Fig. 2.

      Why is AUC measured, not amplitude?

      The amplitude, and now also the AUC are shown in Figure 3.

      b. There is no comparison to baseline. The arrowhead points to the start of stimulation but there is no arrowhead marking the end.

      In the revised paper we added a grey shade over the stimulation period to better visualize the difference to baseline. In this panel we wanted to show that NMDA receptor antagonist did not block the SEP, while AMPA receptor antagonist did.

      c. In the blot there are two bands for PSD95- which is the one that is PSD95? There is no increase in PSD95 until 24 hrs but in the graph in d there is. In the blot, there is a strong expression of PSD95 ipsilateral compared to contralateral in the sham-why?

      What is the percent change fold?

      The PSD-95 is the top and larger band. The lower band was disregarded in the analysis. The example we show may not fully reflect the group statistics presented in panel d. Upon quantification of 8 animals, PSD-95 is significantly higher 30 min and 24 hours post stimulation in the contralateral hemisphere. No significant changes were found in sham animals. The % change fold refers to the AUC change compared to baseline. This panel was now incorporated in Figure 3 (panel h), and the title was corrected to “|AUC|, % change from baseline”.

      Supplemental Fig. 4.

      a. If ipsilateral and contralateral showed many changes why do the authors think the effects were only contralateral?

      Our gene analysis was designed to complement our in vivo and histological findings, by assessing the magnitude of change in differentially expressed genes (DEGs). This analysis showed that: (1) the hemisphere contralateral to the stimulus has significantly more DEGs than the ipsilateral hemisphere; and (2) the DEGs were related to synaptic plasticity and TGF-β signaling. These findings strengthen the hypothesis raised by our in vivo and histological experiments.

      Supplemental Fig. 5 includes many processes not in the results. Examples include dorsal cuneate and VPL, dynamin, Kir, mGluR, etc. The top right has numbers that are not mentioned. If the drawings are from other papers they should be cited.

      The drawings of Figure 5 are original and were not published before. This hypothesis figure points to mechanisms that may drive the phenomena described in the paper. The legend of the figure was revised to include references to mechanisms that were not tested in this study.

      Papers referenced in this letter:

      Ahishali, B., & Kaya, M. (2020). Evaluation of Blood-Brain Barrier Integrity Using Vascular Permeability Markers: Evans Blue, Sodium Fluorescein, Albumin-Alexa Fluor Conjugates, and Horseradish Peroxidase. Methods in Molecular Biology, 2367, 87–103. https://doi.org/10.1007/7651_2020_316

      Aksenov, D. P., Li, L., Miller, M. J., Iordanescu, G., & Wyrwicz, A. M. (2015). Effects of anesthesia on BOLD signal and neuronal activity in the somatosensory cortex. Journal of Cerebral Blood Flow and Metabolism, 35(11), 1819–1826. https://doi.org/10.1038/jcbfm.2015.130

      All, A. H., Agrawal, G., Walczak, P., Maybhate, A., Bulte, J. W. M., & Kerr, D. A. (2010). Evoked potential and behavioral outcomes for experimental autoimmune encephalomyelitis in Lewis rats. Neurological Sciences, 31(5), 595–601. https://doi.org/10.1007/s10072-010-0329-y

      Allison, J. D., Meador, K. J., Loring, D. W., Figueroa, R. E., & Wright, J. C. (2000). Functional MRI cerebral activation and deactivation during finger movement. Neurology, 54(1), 135–142. https://doi.org/10.1212/wnl.54.1.135

      Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8(9), 1148–1150. https://doi.org/10.1038/nn1516

      Betterton, R. D., Abdullahi, W., Williams, E. I., Lochhead, J. J., Brzica, H., Stanton, J., Reddell, E., Ogbonnaya, C., Davis, T. P., & Ronaldson, P. T. (2022). Regulation of Blood-Brain Barrier Transporters by Transforming Growth Factor-β/Activin Receptor-Like Kinase 1 Signaling: Relevance to the Brain Disposition of 3-Hydroxy-3-Methylglutaryl Coenzyme A Reductase Inhibitors (i.e., Statins). Drug Metabolism and Disposition, 50(7), 942–956. https://doi.org/10.1124/dmd.121.000781

      Bouchard, M. B., Chen, B. R., Burgess, S. A., & Hillman, E. M. C. (2009). Ultra-fast multispectral optical imaging of cortical oxygenation, blood flow, and intracellular calcium dynamics. Optics Express, 17(18), 15670. https://doi.org/10.1364/oe.17.015670

      Cacheaux, L. P., Ivens, S., David, Y., Lakhter, A. J., Bar-Klein, G., Shapira, M., Heinemann, U., Friedman, A., & Kaufer, D. (2009). Transcriptome profiling reveals TGF-β signaling involvement in epileptogenesis. Journal of Neuroscience, 29(28), 8927–8935. https://doi.org/10.1523/JNEUROSCI.0430-09.2009

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. Journal of Neurophysiology, 79(2), 1117–1123. https://doi.org/10.1152/jn.1998.79.2.1117

      Franceschini, M. A., Radhakrishnan, H., Thakur, K., Wu, W., Ruvinskaya, S., Carp, S., & Boas, D. A. (2010). The effect of different anesthetics on neurovascular coupling. NeuroImage, 51(4), 1367–1377. https://doi.org/10.1016/j.neuroimage.2010.03.060

      Gellibert, F., Woolven, J., Fouchet, M. H., Mathews, N., Goodland, H., Lovegrove, V., Laroze, A., Nguyen, V. L., Sautet, S., Wang, R., Janson, C., Smith, W., Krysa, G., Boullay, V., De Gouville, A. C., Huet, S., & Hartley, D. (2004). Identification of 1,5-naphthyridine derivatives as a novel series of potent and selective TGF-β type I receptor inhibitors. Journal of Medicinal Chemistry, 47(18), 4494–4506. https://doi.org/10.1021/jm0400247

      Goff, W. R., Rosner, B. S., & Allison, T. (1962). Distribution of cerebral somatosensory evoked responses in normal man. Electroencephalography and Clinical Neurophysiology, 14(5), 697–713. https://doi.org/10.1016/0013-4694(62)90084-6

      Halder, P., Sterr, A., Brem, S., Bucher, K., Kollias, S., & Brandeis, D. (2005). Electrophysiological evidence for cortical plasticity with movement repetition. European Journal of Neuroscience, 21(8), 2271–2277. https://doi.org/10.1111/J.1460-9568.2005.04045.X

      Ivens, S., Kaufer, D., Flores, L. P., Bechmann, I., Zumsteg, D., Tomkins, O., Seiffert, E., Heinemann, U., & Friedman, A. (2007). TGF-β receptor-mediated albumin uptake into astrocytes is involved in neocortical epileptogenesis. Brain, 130(2), 535–547. https://doi.org/10.1093/brain/awl317

      Kaplan, L., Chow, B. W., & Gu, C. (2020). Neuronal regulation of the blood–brain barrier and neurovascular coupling. In Nature Reviews Neuroscience (Vol. 21, Issue 8, pp. 416–432). Nature Research. https://doi.org/10.1038/s41583-020-0322-2

      Keller, P., & Simons, K. (1998). Cholesterol is required for surface transport of influenza virus hemagglutinin. Journal of Cell Biology, 140(6), 1357–1367. https://doi.org/10.1083/jcb.140.6.1357

      Kim, S. Y., Senatorov, V. V., Morrissey, C. S., Lippmann, K., Vazquez, O., Milikovsky, D. Z., Gu, F., Parada, I., Prince, D. A., Becker, A. J., Heinemann, U., Friedman, A., & Kaufer, D. (2017). TGFβ signaling is associated with changes in inflammatory gene expression and perineuronal net degradation around inhibitory neurons following various neurological insults. Scientific Reports, 7(1), 7711. https://doi.org/10.1038/s41598-017-07394-3

      Knowland, D., Arac, A., Sekiguchi, K. J., Hsu, M., Lutz, S. E., Perrino, J., Steinberg, G. K., Barres, B. A., Nimmerjahn, A., & Agalliu, D. (2014). Stepwise Recruitment of Transcellular and Paracellular Pathways Underlies Blood-Brain Barrier Breakdown in Stroke. Neuron, 82(3), 603–617. https://doi.org/10.1016/j.neuron.2014.03.003

      Koudinov, A. R., & Koudinova, N. V. (2001). Essential role for cholesterol in synaptic plasticity and neuronal degeneration. The FASEB Journal, 15(10), 1858–1860. https://doi.org/10.1096/r.00-0815re

      Lapilover, E. G., Lippmann, K., Salar, S., Maslarova, A., Dreier, J. P., Heinemann, U., & Friedman, A. (2012). Periinfarct blood-brain barrier dysfunction facilitates induction of spreading depolarization associated with epileptiform discharges. Neurobiology of Disease, 48(3), 495–506. https://doi.org/10.1016/j.nbd.2012.06.024

      Lindauer, U., Villringer, A., & Dirnagl, U. (1993). Characterization of CBF response to somatosensory stimulation: Model and influence of anesthetics. American Journal of Physiology - Heart and Circulatory Physiology, 264(4 33-4), H1223–H1228. https://doi.org/10.1152/ajpheart.1993.264.4.h1223

      Lippmann, K., Kamintsky, L., Kim, S. Y., Lublinsky, S., Prager, O., Nichtweiss, J. F., Salar, S., Kaufer, D., Heinemann, U., & Friedman, A. (2017). Epileptiform activity and spreading depolarization in the blood-brain barrier-disrupted peri-infarct hippocampus are associated with impaired GABAergic inhibition and synaptic plasticity. Journal of Cerebral Blood Flow and Metabolism, 37(5), 1803–1819. https://doi.org/10.1177/0271678X16652631

      Masamoto, K., & Kanno, I. (2012). Anesthesia and the quantitative evaluation of neurovascular coupling. In Journal of Cerebral Blood Flow and Metabolism (Vol. 32, Issue 7, pp. 1233–1247). SAGE Publications. https://doi.org/10.1038/jcbfm.2012.50

      McGregor, H. R., Cashaback, J. G. A., & Gribble, P. L. (2016). Functional Plasticity in Somatosensory Cortex Supports Motor Learning by Observing. Current Biology, 26(7), 921–927. https://doi.org/10.1016/j.cub.2016.01.064

      McMillin, M. A., Frampton, G. A., Seiwell, A. P., Patel, N. S., Jacobs, A. N., & DeMorrow, S. (2015). TGFβ1 exacerbates blood-brain barrier permeability in a mouse model of hepatic encephalopathy via upregulation of MMP9 and downregulation of claudin-5. Laboratory Investigation, 95(8), 903–913. https://doi.org/10.1038/labinvest.2015.70

      Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M., & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. Journal of Neuroscience, 29(16), 5326–5335. https://doi.org/10.1523/JNEUROSCI.5965-08.2009

      Milikovsky, D. Z., Ofer, J., Senatorov, V. V., Friedman, A. R., Prager, O., Sheintuch, L., Elazari, N., Veksler, R., Zelig, D., Weissberg, I., Bar-Klein, G., Swissa, E., Hanael, E., Ben-Arie, G., Schefenbauer, O., Kamintsky, L., Saar-Ashkenazy, R., Shelef, I., Shamir, M. H., … Friedman, A. (2019). Paroxysmal slow cortical activity in Alzheimer’s disease and epilepsy is associated with blood-brain barrier dysfunction. Science Translational Medicine, 11(521), eaaw8954–eaaw8954. https://doi.org/10.1126/scitranslmed.aaw8954

      Ngai, A. C., Jolley, M. A., D’Ambrosio, R., Meno, J. R., & Winn, H. R. (1999). Frequency-dependent changes in cerebral blood flow and evoked potentials during somatosensory stimulation in the rat. Brain Research, 837(1–2), 221–228. https://doi.org/10.1016/S0006-8993(99)01649-2

      Obermeier, B., Daneman, R., & Ransohoff, R. M. (2013). Development, maintenance and disruption of the blood-brain barrier. In Nature Medicine (Vol. 19, Issue 12, pp. 1584–1596). Nature Publishing Group. https://doi.org/10.1038/nm.3407

      Schoknecht, K., Prager, O., Vazana, U., Kamintsky, L., Harhausen, D., Zille, M., Figge, L., Chassidim, Y., Schellenberger, E., Kovács, R., Heinemann, U., & Friedman, A. (2014). Monitoring stroke progression: In vivo imaging of cortical perfusion, blood-brain barrier permeability and cellular damage in the rat photothrombosis model. Journal of Cerebral Blood Flow and Metabolism, 34(11), 1791–1801. https://doi.org/10.1038/jcbfm.2014.147

      Schumacher, L., Slimani, R., Zizmare, L., Ehlers, J., Kleine Borgmann, F., Fitzgerald, J. C., Fallier-Becker, P., Beckmann, A., Grißmer, A., Meier, C., El-Ayoubi, A., Devraj, K., Mittelbronn, M., Trautwein, C., & Naumann, U. (2023). TGF-Beta Modulates the Integrity of the Blood Brain Barrier In Vitro, and Is Associated with Metabolic Alterations in Pericytes. Biomedicines, 11(1), 1–19. https://doi.org/10.3390/biomedicines11010214

      Shim, H. J., Jung, W. B., Schlegel, F., Lee, J., Kim, S., Lee, J., & Kim, S. G. (2018). Mouse fMRI under ketamine and xylazine anesthesia: Robust contralateral somatosensory cortex activation in response to forepaw stimulation. NeuroImage, 177, 30–44. https://doi.org/10.1016/J.NEUROIMAGE.2018.04.062

      Sudmant, P. H., Alexis, M. S., & Burge, C. B. (2015). Meta-analysis of RNA-seq expression data across species, tissues and studies. Genome Biology, 16(1), 287. https://doi.org/10.1186/s13059-015-0853-4

      Tao, L., & Nicholson, C. (1996). Diffusion of albumins in rat cortical slices and relevance to volume transmission. Neuroscience, 75(3), 839–847. https://doi.org/10.1016/0306-4522(96)00303-X

      Vazana, U., Veksler, R., Pell, G. S., Prager, O., Fassler, M., Chassidim, Y., Roth, Y., Shahar, H., Zangen, A., Raccah, R., Onesti, E., Ceccanti, M., Colonnese, C., Santoro, A., Salvati, M., D’Elia, A., Nucciarelli, V., Inghilleri, M., & Friedman, A. (2016). Glutamate-mediated blood–brain barrier opening: Implications for neuroprotection and drug delivery. Journal of Neuroscience, 36(29), 7727–7739. https://doi.org/10.1523/JNEUROSCI.0587-16.2016

      Veksler, R., Vazana, U., Serlin, Y., Prager, O., Ofer, J., Shemen, N., Fisher, A. M., Minaeva, O., Hua, N., Saar-Ashkenazy, R., Benou, I., Riklin-Raviv, T., Parker, E., Mumby, G., Kamintsky, L., Beyea, S., Bowen, C. V., Shelef, I., O’Keeffe, E., … Friedman, A. (2020). Slow blood-to-brain transport underlies enduring barrier dysfunction in American football players. Brain, 143(6), 1826–1842. https://doi.org/10.1093/brain/awaa140

      Zandieh, S., Hopf, R., Redl, H., & Schlag, M. G. (2003). The effect of ketamine/xylazine anesthesia on sensory and motor evoked potentials in the rat. Spinal Cord, 41(1), 16–22. https://doi.org/10.1038/sj.sc.3101400

      Zelig, D., Goldberg, I., Shor, O., Ben Dor, S., Yaniv-Rosenfeld, A., Milikovsky, D. Z., Ofer, J., Imtiaz, H., Friedman, A., & Benninger, F. (2022). Paroxysmal slow wave events predict epilepsy following a first seizure. Epilepsia, 63(1), 190–198. https://doi.org/10.1111/epi.17110

      Zhang, Y., & Pardridge, W. M. (2001). Mediated efflux of IgG molecules from brain to blood across the blood–brain barrier. Journal of Neuroimmunology, 114(1–2), 168–172. https://doi.org/10.1016/S0165-5728(01)00242-9

      Zhou, Y., Zhou, B., Pache, L., Chang, M., Khodabakhshi, A. H., Tanaseichuk, O., Benner, C., & Chanda, S. K. (2019). Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications, 10(1), 1–10. https://doi.org/10.1038/s41467-019-09234-6

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding for the treatment of PCCs by sequencing 16 tumor specimens from five patients with pheochromocytomas by single-cell transcriptomics and proposing a new molecular classification criterion based on the sequencing results and characterization of tumor microenvironmental features. The evidence supporting the claims of the authors is solid, although the inclusion of more patient samples would strengthen the study's conclusions. The work will be of interest to clinicians or medical biologists working on rare pheochromocytomas (PCCs).

      Firstly, we sincerely appreciate the positive feedback from the editor and extend our gratitude to the three reviewers for their meticulous review and valuable comments. Our detailed responses to each recommendation are outlined below.

      Response to reviewers’ recommendations

      Reviewer #1 (Recommendations for The Authors):

      1) Transcriptomal clonal dynamics of different PCCs is well written. However for conclusion sample size needs to be more.

      Acknowledging the rarity of PCCs with an incidence of approximately 0.2 to 0.6 cases per 100,000 person-years (Farrugia & Charalampopoulos, 2019; Neumann et al, 2019), our study recognizes the limitation in sample size, as discussed in the limitations section (Page 22). In response to this concern, we are committed to undertaking further research with an expanded sample size to bolster the robustness of our conclusions, seeking a more comprehensive understanding of tumor microenvironment characterization and molecular classification in PCCs. We appreciate the valuable guidance provided by the reviewer.

      2) Clinical, biochemistry data of 5 cases can be analysed. Any findings in different categories as per postulated classification can be noted for further studies. Example: epinephrine levels

      We have now included the clinical information of 5 PCC patients, encompassing signs and symptoms, the tumor size, and laboratory test results in the revised manuscript as Supplemental Table S3 (Page 11-12). Notably, our analysis revealed that the kinase-type PCC patient (P4) exhibited higher blood pressures and plasma levels of catecholamine metabolites (3-methoxytyramine and normetanephrine) compared to metabolism-type PCC patients (P1-P3, and P5). This observation aligns with the elevated expression of phenylethanolamine N-methyltransferase (PNMT), an enzyme involved in the biosynthesis of catecholamine and linked to hypertension, in P4, as identified in the scRNA-seq data (Figure 4B and 4D) (Kennedy et al, 1993; Konosu-Fukaya et al, 2018; Nguyen et al, 2015). As suggested, we plan to conduct further research to explore the correlation of our molecular classification with plasma levels of catecholamine metabolites, and the relevant points have been discussed in the revision (Page 20).

      We would like to take this chance to again thank the reviewer for the careful review and very helpful guidance about how to improve our study.

      References for Reviewer #1:

      Farrugia FA, Charalampopoulos A (2019) Pheochromocytoma. Endocrine regulations 53: 191-212

      Neumann HPH, Young WF, Jr., Eng C (2019) Pheochromocytoma and Paraganglioma. The New England journal of medicine 381: 552-565

      Kennedy B, Elayan H, Ziegler MG (1993) Glucocorticoid hypertension and nonadrenal phenylethanolamine N-methyltransferase. Hypertension (Dallas, Tex: 1979) 21: 415-419

      Konosu-Fukaya S, Omata K, Tezuka Y, Ono Y, Aoyama Y, Satoh F, Fujishima F, Sasano H, Nakamura Y (2018) Catecholamine-Synthesizing Enzymes in Pheochromocytoma and Extraadrenal Paraganglioma. Endocrine pathology 29: 302-309

      Nguyen P, Khurana S, Peltsch H, Grandbois J, Eibl J, Crispo J, Ansell D, Tai TC (2015) Prenatal glucocorticoid exposure programs adrenal PNMT expression and adult hypertension. The Journal of endocrinology 227: 117-127

      Reviewer #2 (Recommendations for The Authors):

      1) Please revise all references to "malignant potential", "malignant behavior", etc. throughout the article, including the abstract and introduction, and replace them with the word "metastasis" as appropriate. Since all PCCs are malignant non-epithelial neuroendocrine neoplasms originating from the paraganglia, which are themselves malignant tumors, it is unacceptable to describe them as "malignant potential" or "malignant potential". Please review the 2022 WHO/IARC classification and description of pheochromocytoma/paraganglioma (reference: Mete O, Asa SL, Gill AJ, Kimura N, de Krijger RR, Tischler A. Overview of the 2022 WHO Classification of Paragangliomas and Pheochromocytomas. Endocr Pathol. 2022;33(1):90-114. doi:10.1007/s12022-022-09704-6).

      As suggested, we have replaced all occurrences of “malignant potential” or “malignant behavior” with “metastasis” throughout the revised manuscript. We have also included a citation to the 2022 WHO/IARC classification for further clarity.

      • Similarly, it is not advisable to use the PASS score to predict "malignant" PCC; this type of scoring system evaluates the "metastasis risk" or the "metastasis potential" of PCC.

      We appreciate the reviewer for this insight and have revised our statements accordingly.

      • Also, "MALIGNANT CHROMAFFIN CELLS" needs to be modified; in fact, it is the "tumor cell of PCC" that the authors are trying to express.

      As suggested, we have amended the term “malignant chromaffin cells” to “PCC cells” in the revised manuscript (Page 9-10).

      2) How does the PASS score specifically relate to intra-tumor heterogeneity as reflected by scRNA-seq? In fact, the PASS score evaluates the histological or pathological invasiveness of PCC, and different sections of the same tumor tissue may have different histological manifestations, which may affect the score; however, scRNA-seq analyzes the cellular composition of the tumor, which is not the same as the information reflected by the PASS score. Both represent different levels and dimensions of intra-tumor heterogeneity and should be analyzed together. Please specifically list, one by one, the proportion of each item score of the PASS system and cell type of scRNA-seq for each sample and the results of the comparisons with each other to better present the conclusions.

      As suggested, we have included the proportion of each item score from the PASS system in the revised manuscript as Supplemental Table S2 (Page 8). Integrating these data with the cell type composition of each sample from Figure 2B, our analysis suggests that the intra-tumor heterogeneity captured by the PASS system is more extensive than that captured by scRNA-seq. We concur with the reviewer’s judgement that scRNA-seq analysis and the PASS score represent different levels and dimensions of intra-tumor heterogeneity, and we have adjusted our claim throughout the revised manuscript accordingly (Page 8, 9, and 19).

      3) Where is the specific mutation site of the VHL gene in patient 5? Please advise.

      The VHL gene mutation site, c.499C>T (missense mutation), in patient 5 was identified through whole exome sequencing (WES) analysis. We have now added the information to Supplemental Table S1 in the revised manuscript (Page 6).

      4) Please revise Supplementary Figure 1, the scale should not appear in the picture of the staining result of P5.

      As suggested, we have adjusted the position of the scale bar.

      Author response image 1.

      Hematoxylin-eosin staining and immunohistochemistry staining of CGA marker in formalin-fixed paraffin-embedded PCC tissue sections matched to scRNA-seq specimens. Scale bar, 100 μm.

      5) What were the clinical presentation and biochemical findings in the five patients?

      The information regarding tumor sizes, signs and symptoms, and plasma levels of catecholamine metabolites [3-methoxytyramine (3-MT), metanephrine (MN), and normetanephrine (NMN)] has been added to the revised manuscript as Supplemental Table S3 (Page 11-12).

      • Were there any preoperative symptoms of hypertension?

      With the exception of P2, preoperative symptoms of hypertension were observed in all PCC patients. The information has been added to the revised manuscript as Supplemental Table S3 (Page 11-12).

      • What was the size and catecholamine secretion phenotype of each tumor? What was the relationship between these data and the scRNA-seq results?

      The secretion phenotype showed that the kinase-type PCC patient (P4) exhibited higher plasma levels of catecholamine metabolites (3-methoxytyramine and normetanephrine) compared to metabolism-type PCC patients (P1-P3, and P5). This observation aligns with the elevated expression of phenylethanolamine N-methyltransferase (PNMT), an enzyme involved in the biosynthesis of catecholamine and linked to hypertension, in P4, as identified in the scRNA-seq data (Figure 4B and 4D) (Kennedy et al, 1993; Konosu-Fukaya et al, 2018; Nguyen et al, 2015). In contrast, we did not observe a correlation between tumor size and molecular classification. We have now included tumor sizes and laboratory test results of 5 PCC patients in the revised manuscript as Supplemental Table S3 (Page 11-12), and the relevant points have been discussed in the revision (Page 20).

      6) Please revise Figure 1A, the meaning shown in the figure appears to dissociate the tissues of the patient's normal adrenal glands, which can be misleading.

      We appreciate the reviewer for raising this concern. The schematic in Figure 1A has been revised accordingly.

      Author response image 2.

      (1A) Schematic of the experimental pipeline. 11 tumor specimens and 5 adjacent normal adrenal medullary specimens were isolated from 5 PCC patients, dissociated into single-cell suspensions, and analyzed using 10x Genomics Chromium droplet scRNA-seq.

      • Please revise the figure note for Figure 1B, where the symbol (B) appears twice.

      As suggested, we have revised the figure legends for Figure 1B and 1C (Page 42).

      7) Please indicate in the figure legends and text what exactly is meant by "adjacent specimens"? medulla? cortex? normal tissue? I believe the authors mean adjacent normal adrenal medullary tissue, please check the article.

      As suggested, we have revised the term “adjacent specimens” to “adjacent normal adrenal medullary tissues” throughout the revised manuscript.

      8) Please review the pathologic diagnostic criteria of this study in light of the 2022 WHO/IARC guidelines for pathologic diagnosis: "For the pathological diagnosis, the inclusion criteria were neuroendocrine neoplasm originating from the adrenal medulla and retroperitoneal origin, i.e. pheochromocytoma and paraganglioma, with consistent morphologic and immunohistochemical confirmation in relevant cases and positivity for chromogranin A and synaptophysin. The exclusion criteria were adrenocortical neoplasm and metastatic tumors." It is not rigorous enough to diagnose a tumor as PCC based on positive CgA immunohistochemical staining results alone.

      We have revised the statements about pathologic diagnostic criteria in accordance with the suggestion and have cited the reference (Page 6).

      We would like to express our gratitude to the reviewer for the thorough review and invaluable guidance provided to enhance the quality of our study.

      References for Reviewer #2:

      Kennedy B, Elayan H, Ziegler MG (1993) Glucocorticoid hypertension and nonadrenal phenylethanolamine N-methyltransferase. Hypertension (Dallas, Tex: 1979) 21: 415-419

      Konosu-Fukaya S, Omata K, Tezuka Y, Ono Y, Aoyama Y, Satoh F, Fujishima F, Sasano H, Nakamura Y (2018) Catecholamine-Synthesizing Enzymes in Pheochromocytoma and Extraadrenal Paraganglioma. Endocrine pathology 29: 302-309

      Nguyen P, Khurana S, Peltsch H, Grandbois J, Eibl J, Crispo J, Ansell D, Tai TC (2015) Prenatal glucocorticoid exposure programs adrenal PNMT expression and adult hypertension. The Journal of endocrinology 227: 117-127

      Reviewer #3 (Recommendations For The Authors):

      I have several concerns and suggestions, which if addressed would improve the manuscript.

      1) The statements of “plasmas” in the manuscript and figures are confusing, which should be revised as “plasma cells”.

      As suggested, we have revised the terminology from “plasmas” to “plasma cells” throughout the revised manuscript and figures.

      2) The marker genes used for defining plasma cells (IGHG1 and IGLC2) showed low expressing percentage in Figure 1D. Please consider providing other genes as the marker of plasma cells.

      As suggested, we performed additional analysis to pinpoint marker genes for accurate definition of plasma cells. Applying stricter statistical criteria (cut-off p-value < 0.05, log2 fold change ≥ 1.5, and expressing percentage ≥ 0.6), we identified XBP1 (a transcription factor playing key roles in the final stages of plasma cell development) and IGKC (an immunoglobulin light-chain gene) (Todd et al, 2009; Poulsen et al, 2002) as the most significant differentially expressed genes (DEGs) suitable for defining plasma cells. These data are now presented as Figure 1D in the revised manuscript (Page 7).
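The three cut-offs quoted above can be illustrated with a minimal sketch. It assumes a differential-expression table has already been computed; the record layout, the function name `select_markers`, and the example genes and statistics are hypothetical and are not taken from the study. Whether the p-value is adjusted for multiple testing is not stated in the response, so the sketch simply compares whatever p-value the table carries.

```python
# Hypothetical sketch: keep genes that pass all three cut-offs quoted in the response.
# Field names and example values are illustrative assumptions, not data from the study.

def select_markers(deg_rows, max_p=0.05, min_log2fc=1.5, min_pct=0.6):
    """Return the names of genes with p < max_p, log2FC >= min_log2fc, pct >= min_pct."""
    return [
        row["gene"]
        for row in deg_rows
        if row["p_value"] < max_p
        and row["log2fc"] >= min_log2fc
        and row["pct_expressing"] >= min_pct
    ]


example_rows = [
    {"gene": "GENE_A", "p_value": 0.001, "log2fc": 2.3, "pct_expressing": 0.82},
    # Fails the expressing-fraction cut-off.
    {"gene": "GENE_B", "p_value": 0.030, "log2fc": 1.7, "pct_expressing": 0.41},
    # Fails the p-value cut-off.
    {"gene": "GENE_C", "p_value": 0.200, "log2fc": 3.0, "pct_expressing": 0.90},
]

print(select_markers(example_rows))
```

Requiring a minimum expressing fraction in addition to fold change addresses the reviewer's concern directly, since a gene can show a large fold change while being detected in only a small share of the cluster.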

      Author response image 3.

      (1D) Dot plot of representative marker genes for each cell type. The color scale represents the average marker gene expression level; dot size represents the percentage of cells expressing a given marker gene.

      3) The statement “Our clustering and cell type annotation analysis identified diverse adrenal cells, stromal cells, and immune cells within the PCC microenvironment” seems not be exhibited in Figure 1, so the clustering result of adrenal cells, stromal cells, and immune cells need to be added.

      As suggested, we performed clustering analysis for adrenal cells, stromal cells, and immune cells (including lymphocytes and myeloid cells), and visualized by the Uniform Manifold Approximation and Projection (UMAP) plot. These data have been added to the revised manuscript as Supplemental Figure S3 (Page 8).

      Author response image 4.

      Integration Analysis across 5 PCC Patients Revealing the Cell Type Composition of the PCC Microenvironment. UMAP plot depicting the distribution of adrenal cells, stromal cells, and immune cells (including lymphocytes and myeloid cells) within the PCC microenvironment.

      4) Given the classification of “metabolism-type PCCs” and “kinase-type PCCs” have not been presented in Figure 2D, the statement “Combined with our findings of a higher proportion of neutrophils and monocyts/macrophages in metabolism-type as compared with kinase-type” in Result 6 should be supported by using additional data.

      As suggested, we performed additional analysis to evaluate the proportion of neutrophils and monocytes/macrophages in metabolism-type and kinase-type PCC patients. These data have been added to the revised manuscript as Supplemental Figure S4 (Page 14).

      Author response image 5.

      The frequency distribution of cell types within the microenvironment of metabolism-type and kinase-type PCC patients.

      5) What makes the difference of scRNA-seq analysis and multispectral immunofluorescent staining in judging the immune escape of PCCs? Please provide an explanation.

      We appreciate the reviewer's concern. scRNA-seq lacks spatial detail, whereas multispectral immunofluorescent staining is limited in the number of proteins it can detect. To address this, we employed both methods. scRNA-seq revealed limited communication between tumor cells and T cells, with lower HLA-I expression in kinase-type PCCs than in metabolism-type PCCs. This was supported by multispectral staining with antibodies against markers of CD4+ T cells, CD8+ T cells, M1 macrophages, and M2 macrophages, which indicated sparse immune cell infiltration around tumor cells, mainly in the stroma (Figure 7A and 7B). This dual approach strengthens our understanding of immune escape in both PCC types. The explanation has been added to the revised manuscript (Page 21).

      6) Figure 7G missed the scale bar for the staining results of marker proteins. Please add the scale bar into the figure.

      As suggested, we have added the scale bar accordingly.

      7) In the method part of the manuscript, the authors should describe the minimum and maximum number used for quality control of the number of genes and the percentage of mitochondrial genes.

      For quality control, we established a minimum threshold of no less than 200 genes and a maximum threshold of no more than 5000 genes. Additionally, the quality control process included a maximum threshold of 30% for mitochondrial genes. These specific criteria have been added to the methods section of the revised manuscript (Page 25-26).
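The quality-control thresholds stated above can be written as a simple per-cell check. This is only a sketch under stated assumptions: the function name `passes_qc` and the example cells are hypothetical, and the boundaries are read as inclusive ("no less than 200", "no more than 5000", "maximum ... 30%"); the authors' actual pipeline is not described in this response.

```python
# Hypothetical sketch of the stated QC thresholds; names and example cells are assumptions.

def passes_qc(n_genes, pct_mito, min_genes=200, max_genes=5000, max_pct_mito=30.0):
    """True if a cell has 200-5000 detected genes (inclusive) and at most 30% mitochondrial reads."""
    return min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito


# (detected genes, percent mitochondrial reads) for four made-up cells.
example_cells = [(150, 5.0), (1200, 8.5), (5200, 3.0), (900, 42.0)]
kept = [cell for cell in example_cells if passes_qc(*cell)]
print(kept)
```

In this illustration only the second cell is retained: the first has too few genes, the third too many, and the fourth exceeds the mitochondrial threshold.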

      We express our gratitude to the reviewer for their supportive recommendations and invaluable guidance on enhancing the rigor of our data.

      References for Reviewer #3:

      Todd DJ, McHeyzer-Williams LJ, Kowal C, Lee AH, Volpe BT, Diamond B, McHeyzer-Williams MG, Glimcher LH (2009) XBP1 governs late events in plasma cell differentiation and is not required for antigen-specific memory B cell development. The Journal of experimental medicine 206: 2151-2159

      Poulsen TS, Silahtaroglu AN, Gisselø CG, Tommerup N, Johnsen HE (2002) Detection of illegitimate rearrangements within the immunoglobulin light chain loci in B cell malignancies using end sequenced probes. Leukemia 16: 2148-2155

    1. Author Response

      Reviewer #1 (Public Review)

      Midbrain dopamine neurons have attracted attention as a part of the brain's reward system. A different line of research, on the other hand, has shown that these neurons are also involved in higher cognitive functions such as short-term memory. However, these neurons are thought not to encode short-term memory itself because they just exhibit a phasic response in short-term memory tasks, which cannot seem to maintain information during the memory period. To understand the role of dopamine neurons in short-term memory, the present study investigated the electrophysiological property of these neurons in rodents performing a T-maze version of a short-term memory task, in which a visual cue indicated which arm (left or right) of the T-maze was associated with a reward. The animal needed to maintain this information while they were located between the cue presentation position and the selection position of the T-maze. The authors found that the activity of some dopamine neurons changed depending on the information while the animals were located in the memory position. This dopamine neuron modulation was unable to explain the motivation or motor component of the task. The authors concluded that this modulation reflected the information stored as short-term memory.

      I was simply surprised by their finding because these dopamine neurons are similar to neurons in the prefrontal cortex that store memory information with sustained activity. Dopamine neurons are an evolutionally conserved structure, which is seen even in insects, whereas the prefrontal cortex is developed mainly in the primate. I feel that their findings are novel and would attract much attention from readers in the field. But the authors need to conduct additional analyses to consolidate their conclusion.

      We thank reviewer #1 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Reviewer #1 (Recommendations to The Authors)

      (1) The authors found the dopamine neuron modulation that reflected the memory information during the delay period. Here the dopamine neuron activity was aligned by the position, not by time, in which the animals needed to maintain the information. Usually, the activity was aligned by time, and many studies found that dopamine neurons exhibited a short duration burst in response to rewards and behaviorally relevant stimuli including visual cues presented in short-term memory tasks. For comparison, I (and probably other readers) want to see the time-aligned dopamine neuron modulation that reflected the memory information. Did the modulation still exist? Did it have a long duration? The authors just showed the time-aligned "population" activity that exhibited no memory-dependent modulation.

      We agree that the point raised by the reviewer is important. To address this question, we added a new paragraph to the Methods section titled “Methodological considerations” (in line 793 of the revised manuscript), where we explain the caveats of using time alignment in the T-maze task. We also created a new Supplementary Figure 5 to clarify our argument. As the figure shows, we did not observe major differences in the firing rates whether they were aligned by position or by time. More importantly, we did not detect brief bursts of activity in response to the visual cue, which could have reflected an RPE signaling scheme. Our interpretation is that in the T-maze task, DA neurons encode “miniature” RPE signals between successive states in the T-maze, which are hard to detect, especially when neurons receive a continuous sensory input during trials.

      (2) Several studies have reported that dopamine neurons at different locations encode distinct signals even within the VTA or SNr. Were the locations of dopamine neurons maintaining the memory information different from those of other dopamine neurons?

      We thank the reviewer for this comment. Indeed, there is evidence from recent studies demonstrating that DA neurons form functional and anatomical clusters in the VTA and SN. Following the reviewer’s advice, we report the anatomical organization of memory-specific and non-memory-specific neurons in the revised manuscript. These results appear in the paragraph “Anatomical organization of trajectory-specific neurons.” in the “Results” section (in line 383 of the revised manuscript) and in the new Supplementary Figure 11. We observed a clear functional-anatomical segregation only in GABA neurons, not in DA neurons. We should note, however, that the absence of segregation in the DA neurons could be accounted for by the fact that we recorded mostly from the lateral VTA; therefore, we do not have any numbers from the medial VTA.

      (3a) Did the dopamine neurons maintaining the memory information respond to reward?

      We believe that we have already provided data that can partially answer this question by correlating the firing rate difference between the reward and memory delay sections. This result was described in the “Neuronal activities in delay and reward are unrelated.” paragraph and in Figure 6. Moreover, motivated by the reviewer’s question, we also performed an additional analysis, which is included in the revised manuscript. Briefly, we clustered significant responses between the memory delay and reward sections (Category 1: Left-signif, R-signif or No-signif / Category 2: Memory delay or Reward). We discovered that only a very small number of neurons showed the same significant trajectory preference in the memory delay and reward sections (i.e., significant preference for left trials in the memory delay and significant preference for the left reward). In fact, more significant neurons showed a preference for opposite trajectories (i.e., significant preference for left trials in memory delay and a significant preference for right rewards). A description of the new results is included in the “Neuronal activities in delay and reward are unrelated.” paragraph (in line 349 of the revised manuscript) and in the new Supplementary Figure 11.

      (3b) Did they encode reward prediction error? The relationship between the present data and the conventional theory may be valuable.

      We understand that the readers of this study will come up with the question of how memory-specific activities are related to RPE signaling. However, the T-maze task we used in this research was designed for studying working memory and was not adequate to extract information about the RPE signaling of DA neurons.

      RPE signaling is mainly studied in Pavlovian conditioning. These are low-dimensional tasks with usually four (4) states (state1: ITI, state2: trial start, state3: stimulus presentation, state4: reward delivery). Evidence of RPE signaling is extracted from the firing activity of states 3 and 4 (which is theorized to be related to the difference in the values for states 3 and 4).
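To make the four-state description concrete, the sketch below computes a textbook temporal-difference prediction error, delta = r + gamma * V(next) - V(current), for each transition. This formulation, the function name `td_errors`, and the state values and rewards are illustrative assumptions (a fully learned cue-reward association with a reward of 1); it is not the model or analysis used in the study.

```python
# Hypothetical sketch: temporal-difference prediction errors in a four-state Pavlovian trial.
# State values and rewards are made-up illustration values for a fully learned association.

STATES = ["ITI", "trial start", "stimulus presentation", "reward delivery"]


def td_errors(values, rewards_on_entry, gamma=1.0):
    """Return delta = r(s') + gamma * V(s') - V(s) for each transition s -> s'.

    values[i] is V(state i); rewards_on_entry[i] is the reward received on entering state i.
    """
    return [
        rewards_on_entry[i + 1] + gamma * values[i + 1] - values[i]
        for i in range(len(values) - 1)
    ]


values = [0.0, 0.0, 1.0, 0.0]            # the cue fully predicts the upcoming reward
rewards_on_entry = [0.0, 0.0, 0.0, 1.0]  # reward arrives on entering the last state

for (src, dst), delta in zip(zip(STATES, STATES[1:]), td_errors(values, rewards_on_entry)):
    print(f"{src} -> {dst}: delta = {delta}")
```

Under these assumptions the only non-zero error occurs at stimulus presentation, while the fully predicted reward produces none. With many more states, as in a maze, the same total change in value would be spread over many small errors.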

      However, in the T-maze task, the number of states is hard to define and practically countless. In these conditions, it has been suggested that numerous small RPEs are signaled while the mice navigate the maze; thus, they are very difficult to detect. To our knowledge, only Kim et al. (2020, Cell, vol. 183, p. 1600) managed to detect the RPE signaling activity of DA neurons while mice were teleported in a virtual corridor.

      Another confounding factor in extracting RPE signals in the T-maze task is that the environment is high-dimensional and DA neurons are multitasking. Therefore, it is likely that RPE signaling could be masked by other parallel encoding schemes.

      We have added these descriptions in the “Methodological considerations” (in line 793 of the revised manuscript).

      (4) Did the dopamine neurons maintaining the memory information (left or right) prefer a contralateral direction like neurons in the motor cortex?

      We thank the reviewer for this comment. Indeed, the majority of the memory-specific DA neurons showed a preference for the contralateral direction. We report this result in the legend of the new Supplementary Figure 10 (in line 1668 of the revised manuscript).

      (5) As shown in Table S2, the proportion of GABA neurons maintaining the memory information (left or right during delay) was much larger than that of dopamine neurons. It seems to be strange because the main output neurons in the VTA are dopaminergic. What is the role of these GABA neurons?

      We thank the reviewer for pointing this out. The present study shows that in both populations a sizeable portion of neurons show memory-specific encoding activities. However, the percentage of memory-encoding GABA neurons is more than twice as large as in the DA neurons. Moreover, we show that GABA neurons are functionally and anatomically segregated.

      From this evidence, one could raise the hypothesis that the GABA neurons have a primary role and that the activity of DA neurons is a collateral phenomenon, triggered in a sequence of events within the VTA network. To characterize the (1) role and (2) importance of GABA neurons in memory-guided behavior, one should first identify the afferent and efferent projections of these cells in great detail. Unfortunately, we do not provide anatomical evidence.

      So far, with the electrophysiological data we have collected (unit and field recordings), we can address an alternative hypothesis. It has been reported earlier (and we have also observed) that the VTA circuit engages in behaviorally related network oscillations ranging from 0.4 Hz up to 100 Hz. Converging evidence from different brain regions, from in vitro preparations as well as in vivo recordings, indicates that local networks of inhibitory neurons are crucial for the generation, maintenance, and spectral control of network oscillations. Ongoing analysis, which we hope will lead to a publication, is looking for the behavioral correlates of network oscillations on the T-maze task, as well as the correlation of single-unit firing activity to the field oscillations. We expect to detect a higher field-unit coherence in GABA neurons, which could explain their stronger engagement in memory-specific encoding activity.

      The potential role of GABA neurons in network oscillations is discussed in the revised manuscript in a newly added paragraph in line 564.

      Reviewer #2 (Public Review)

      The authors phototag DA and GABA neurons in the VTA in mice performing a t-maze task, and report choice-specific responses in the delay period of a memory-guided task, more so than in a variant task w/o a memory component. Overall, I found the results convincing. While showing responses that are choice selective in DA neurons is not entirely novel (e.g. Morris et al NN 2006, Parker et al NN 2016), the fact that this feature is stronger when there is a memory requirement is an interesting and novel observation.

      I found the plots in 3B misleading because it looks like the main result is the sequential firing of DA neurons during the T-maze. However, many of the neurons aren't significant by their permutation test. Often people either only plot the neurons that are significant, or plot with cross-validation (i.e., sort by half of the trials, and plot the other half).

      Relatedly, the cross-task comparisons of sequences (Figs. 4 and 5) are hampered by the fact that they sort in one task, then plot in the other, which will make the sequences look less robust even if they were equally strong. What happens if they swap which task's sequences they use to order the neurons? I do realize they also show statistical comparisons of modulated units across tasks, which is helpful.
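      The cross-validated plotting the reviewer describes (sort on one half of the trials, plot the other half) could be sketched roughly as follows. This is a minimal illustration, not the analysis used in the manuscript: the function names, the `rates[neuron][trial][bin]` layout, and the even/odd trial split are all assumptions made for the example.

```python
from statistics import fmean


def mean_profile(trials):
    """Average a list of equal-length trials bin by bin."""
    return [fmean(bin_values) for bin_values in zip(*trials)]


def cross_validated_sort(rates):
    """Order neurons on one half of the trials, return the other half.

    rates[neuron][trial][bin] holds firing rates per position bin
    (hypothetical layout; each neuron needs at least two trials).
    Neurons are ordered by the peak bin of their mean activity on
    even-indexed trials; the returned profiles are the means of the
    odd-indexed trials, arranged in that order.
    """
    sort_profiles = [mean_profile(neuron[0::2]) for neuron in rates]
    plot_profiles = [mean_profile(neuron[1::2]) for neuron in rates]
    peak_bins = [profile.index(max(profile)) for profile in sort_profiles]
    order = sorted(range(len(rates)), key=lambda n: peak_bins[n])
    return order, [plot_profiles[n] for n in order]
```

      If a sequence is genuine, the held-out profiles should still show a diagonal band of peaks; if the apparent sequence only reflects sorting noise, the band largely disappears in the held-out half.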

      We thank reviewer #2 for the valuable and constructive comments on our manuscript. If the rate difference between left and right trajectories were the only result we wanted to claim, it would be reasonable to show only the neurons with a significant left-right difference. However, the sequential activity is also one of the points we wanted to display. We did not emphasize this result because it has already been shown by Engelhard et al. 2019. However, after reading the reviewer's comments, we decided to add a few lines in the "Results" (in lines 205 - 215 of the revised manuscript) and "Discussion" (in line 453 of the revised manuscript) describing the sequential activity of the VTA circuit. In those lines, we explained that DA activity is position-specific (resulting in sequential activity) and that a fraction of the DA neurons also have left-right specificity.

      Overall, the introduction was scholarly and did a good job covering a vast literature. But the explanation of t-maze data towards the end of the introduction was confusing. In Line 87, I would not say "in the same task" but "in a similar task" because there are many differences between the tasks in question.

      We thank the reviewer for pointing out this mistake. In the revised manuscript, we replaced “in the same task” with “in a similar task” (in line 85 of the revised manuscript).

      And not clear what is meant by "by averaging neuronal population activities, none of these computational schemes would have been revealed. " There was trial averaging, at least in Harvey et al. I thought the main result of that paper related to coding schemes was that neural activity was sequential, not persistent. I think it would help the paper to say that clearly.

      We admit that this sentence leaves room for misunderstanding. We were mainly referring to DA studies using microdialysis or fiber photometry techniques. We decided to delete this sentence in the revised manuscript.

      Also, I'm not aware it was shown that choice selectivity diminishes when the memory demand of the task is removed - please clarify if that is true in both referenced papers.

      The reviewer’s remark is correct. None of these reports show explicitly that memory-specific activities are diminished without the memory component. Therefore, we deleted this sentence in the revised manuscript.

      If so, an interpretation of this present data could be found in Lee et al biorxiv 2022, which presents a computational model that implies that the heterogeneity in the VTA DA system is a reflection of the heterogeneity found in upstream regions (the state representation), based on the idea that different subsets of DA neurons calculate prediction errors with respect to different subsets of the state representation.

      We thank the reviewer for sharing this interpretation. We agree that this theory would support our results. In the revised manuscript we briefly discuss the Lee et al. report (in line 460 of the revised manuscript).

      I am surprised only 28% of DA neurons responded to the reward - the reward is not completely certain in this task. This seems lower than other papers in mice (even Pavlovian conditioning, when the reward is entirely certain). It would be helpful if the authors comment on how this number compares to other papers.

      In Pavlovian conditioning, neuronal responses to rewards are compared to a relatively quiet period of firing activity (usually the inter-trial interval epoch). As the reviewer pointed out, in the present study, the number of DA neurons responding to reward is smaller compared to the earlier studies. We hypothesize that this is due to our comparison method. We compared the post-reward response to an epoch when the animal was running along the side arms and the majority of neurons were highly active, instead of comparing it to a quiescent baseline epoch.

      Reviewer #2 (Recommendations to The Authors)

      Can you clarify what disparity you are referring to here? "Disparities between this and our study in the proportions of modulated neurons could be attributed to the different recording techniques applied as well as the maze regions of interest; for example, Engelhard et al. analyzed neuronal firing activities in the visual-cue period (Engelhard et al., 2019), whereas we focused on memory delay.". Is it the fact that Engelhard et al did not report choice-selective activity? They did report cue-side-selective activity, with some neurons responsive to cues on one side, and other neurons responsive to cues on the other side. Because there are more cues on the left when the mouse turns left, these neurons do indeed have choice-selective responses.

      We thank the reviewer for this comment. We agree that we need to clarify further our argument. As the reviewer pointed out, Engelhard et al identified choice-specific DA neurons. However, they reported the encoding properties of DA neurons only in the visual-cue period and the reward period. Remarkably, although the task has a memory delay, they did not report the neuronal firing activities for this delay period. Instead, in the present study we dedicated most of our analysis to characterizing the firing properties of VTA neurons in the delay period.

      Also, in response to your comment, we edited the paragraph where we describe the disparities between our study and Engelhard et al (in line 466 in the revised manuscript).

      I don't think this sentence of intro is needed since it doesn't really contain new info: "Therefore, we looked for hints of memory-related encoding activities in single DA and GABA neurons by characterizing their firing preference for opposite behavioral choices.".

      We agree with the reviewer. Therefore, we deleted this sentence in the revised manuscript.

      I didn't understand this line of discussion: "Our evidence does not question the validity of this computational model, since we do not provide evidence of how the selective preference for one response over the other translates into the release site.".

      The gating theory is based on experimental evidence of neuronal firing activities of DA neurons but also takes into consideration (to a lesser degree) the pre- and post-synaptic processes at the DA release sites (inverted U-shape of D1R activity). We thought that the reader may come to the conclusion that we question the validity of the gating theory. But this is not our intention, especially when we do not provide important evidence such as (1) the projection sites of DA and GABA neurons and (2) the sequence of events that take place at the synaptic triads following the DA and GABA release.

      After reading your comment we came to the conclusion that this sentence should be omitted because it is not within the scope of this study to question the validity of the gating theory. Instead, we dedicated a few lines of text to explaining which components of the gating theory (“update”, “maintenance & manipulation” and “motor preparation”) could be attributed to the trajectory-specific activities in the memory delay of the T-maze task. (section “Activities of midbrain DA neurons in short-term memory” in line 417 of the revised manuscript).

      In 1B, please illustrate when the light pulses are on & off?

      Following the reviewer’s instruction, we added colored bars on top of the raster plots in Figure 1B, indicating the light induction conditions.

      In legend for 6C, please clarify it's a correlation between the difference in R and L choice activity across the epochs (if my understanding is correct).

      The reviewer’s understanding is correct. We took this advice into consideration to further clarify the methods of analysis that led to the plot in Figure 6C (in line 1246 in the revised manuscript).

    1. Author Response

      We thank you for the time you took to review our work and for your feedback!

      The major changes to the manuscript are:

      1. We have extended the range of locomotion velocity over which we compare its dependence with cholinergic activity in Figures 2E and S2H.

      2. We have quantified the contributions of cholinergic stimulation on multiplicative and additive gains on visual responses (Figure S7).

      3. We have provided single cell examples for the change in latency to visual response (Figure S12).

      4. We have added an analysis to compare layer 2/3 and layer 5 locomotion onset responses as a function of visuomotor condition (Figure S8).

      A detailed point-by-point response to all reviewer concerns is provided below.  

      Reviewer #1 (Public Review):

      The paper submitted by Yogesh and Keller explores the role of cholinergic input from the basal forebrain (BF) in the mouse primary visual cortex (V1). The study aims to understand the signals conveyed by BF cholinergic axons in the visual cortex, their impact on neurons in different cortical layers, and their computational significance in cortical visual processing. The authors employed two-photon calcium imaging to directly monitor cholinergic input from BF axons expressing GCaMP6 in mice running through a virtual corridor, revealing a strong correlation between BF axonal activity and locomotion. This persistent activation during locomotion suggests that BF input provides a binary locomotion state signal. To elucidate the impact of cholinergic input on cortical activity, the authors conducted optogenetic and chemogenetic manipulations, with a specific focus on L2/3 and L5 neurons. They found that cholinergic input modulates the responses of L5 neurons to visual stimuli and visuomotor mismatch, while not significantly affecting L2/3 neurons. Moreover, the study demonstrates that BF cholinergic input leads to decorrelation in the activity patterns of L2/3 and L5 neurons.

      This topic has garnered significant attention in the field, drawing the interest of many researchers actively investigating the role of BF cholinergic input in cortical activity and sensory processing. The experiments and analyses were thoughtfully designed and conducted with rigorous standards, leading to convincing results which align well with findings in previous studies. In other words, some of the main findings, such as the correlation between cholinergic input and locomotor activity and the effects of cholinergic input on V1 cortical activity, have been previously demonstrated by other labs (Goard and Dan, 2009; Pinto et al., 2013; Reimer et al., 2016). However, the study by Yogesh and Keller stands out by combining cutting-edge calcium imaging and optogenetics to provide compelling evidence of layer-specific differences in the impact of cholinergic input on neuronal responses to bottom-up (visual stimuli) and top-down inputs (visuomotor mismatch).

      We thank the reviewer for their feedback.

      Reviewer #2 (Public Review):

      The manuscript investigates the function of basal forebrain cholinergic axons in mouse primary visual cortex (V1) during locomotion using two-photon calcium imaging in head-fixed mice. Cholinergic modulation has previously been proposed to mediate the effects of locomotion on V1 responses. The manuscript concludes that the activity of basal forebrain cholinergic axons in visual cortex provides a signal which is more correlated with binary locomotion state than locomotion velocity of the animal. Cholinergic axons did not seem to respond to grating stimuli or visuomotor prediction error. Optogenetic stimulation of these axons increased the amplitude of responses to visual stimuli and decreased the response latency of layer 5 excitatory neurons, but not layer 2/3 neurons. Moreover, optogenetic or chemogenetic stimulation of cholinergic inputs reduced pairwise correlation of neuronal responses. These results provide insight into the role of cholinergic modulation to visual cortex and demonstrate that it affects different layers of visual cortex in a distinct manner. The experiments are well executed and the data appear to be of high quality. However, further analyses are required to fully support several of the study's conclusions.

      We thank the reviewer for their feedback.

      1) In experiments analysing the activity of V1 neurons, GCaMP6f was expressed using a ubiquitous Ef1a promoter, which is active in all neuronal cell types as well as potentially non-neuronal cells. The manuscript specifically refers to responses of excitatory neurons but it is unclear how excitatory neuron somata were identified and distinguished from that of inhibitory neurons or other cell types.

      This might be a misunderstanding. The Ef1α promoter has been reported to drive highly specific expression in neurons (Tsuchiya et al., 2002) with 99.7% of labeled cells in layer 2/3 of rat cortex being NeuN+ (a neuronal marker), with only 0.3% of labeled cells being GFAP+ (a glial marker) (Yaguchi et al., 2013). This bias was even stronger in layer 5 with 100% of labeled cells being NeuN+ and none GFAP+ (Yaguchi et al., 2013). The Ef1α promoter in an AAV vector, as we use it here, also biases expression to excitatory neurons. In layer 2/3 of mouse visual cortex, we have found that 96.8% ± 0.7% of labeled neurons are excitatory three weeks after viral injection (Attinger et al., 2017). Similar results have also been found in rats (Yaguchi et al., 2013), where, when GFP was expressed under the Ef1α promoter delivered by lentivirus, 95.2% of labeled neurons in layer 2/3 and 94.1% in layer 5 were excitatory. These numbers are comparable to those obtained with promoters commonly used to target expression to excitatory neurons. For this purpose, two variants of promoters based on the transcription start region of the CaMKIIα gene are typically used. The first, the CaMKIIα-0.4 promoter, results in 95% excitatory specificity (Scheyltjens et al., 2015). The second, the CaMKIIα-1.3 promoter, results in only 82% excitatory specificity (Scheyltjens et al., 2015), and is thus not far from chance. We have clarified this in the manuscript. Nevertheless, we have removed the qualifier “excitatory” when talking about neurons in most instances throughout the manuscript.

      2) The manuscript concludes that cholinergic axons convey a binary locomotion signal and are not tuned to running speed. The average running velocity of mice in this study is very slow - slower than 15 cm/s in the example trace in Figure 1D and speeds <6 cm/s were quantified in Figure 2E. However, mice can run at much faster speeds both under head-fixed and freely moving conditions (see e.g. Jordan and Keller, 2020, where example running speeds are ~35 cm/s). Given that the data in the present manuscript cover such a narrow range of running speeds, it is not possible to determine whether cholinergic axons are tuned to running speed or convey a binary locomotion signal.

      Our previous analysis window of 0-6.25 cm/s covered approximately 80% of all data. We have increased the analysis window to 0-35 cm/s that now covers more than 99% of the data (see below). Also, note that very high running speeds are probably overrepresented in the Jordan and Keller 2020 paper as mice had to be trained to run reliably before all experiments given the relatively short holding times of the intracellular recordings. The running speeds in our current dataset are comparable to other datasets we have acquired in similar experiments.

      Figure 2E has now been updated to reflect the larger range of data. Please note, as the number of mice that contribute to the data now differs as a function of velocity (some mice run faster than others), we have now switched to a variant of the plot based on hierarchical bootstrap sampling (see Methods). This does not overtly change the appearance of the plot. See Author response image 1 for a comparison of the original plot, the extended range without bootstrap sampling, and the extended range with bootstrap sampling currently used in the paper.

      Author response image 1.

      Average activity of cholinergic axons as a function of locomotion velocity. (A) As in the previous version of the manuscript. (B) As in A, but with the extended velocity range. (C) As in B, but using hierarchical bootstrap sampling to estimate median (red dots) and 95% confidence interval (shading) for each velocity bin.
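      For readers unfamiliar with hierarchical bootstrap sampling, the idea for a single velocity bin can be sketched as below. This is only an illustration under stated assumptions: the exact procedure is the one described in the Methods, and the function name, the two-level structure (mice, then measurements within a mouse), the use of the mean as the per-resample statistic, and the percentile indexing are choices made for this example.

```python
import random
from statistics import fmean, median


def hierarchical_bootstrap(data_by_mouse, n_boot=1000, seed=0):
    """Median and 95% interval of a two-level bootstrap distribution.

    data_by_mouse maps a mouse ID to the measurements that mouse
    contributes to one velocity bin (hypothetical layout). Mice that
    never reached this velocity are simply absent from the mapping.
    """
    rng = random.Random(seed)
    mice = list(data_by_mouse)
    estimates = []
    for _ in range(n_boot):
        pooled = []
        # Level 1: resample mice with replacement.
        for mouse in rng.choices(mice, k=len(mice)):
            values = data_by_mouse[mouse]
            # Level 2: resample that mouse's measurements with replacement.
            pooled.extend(rng.choices(values, k=len(values)))
        estimates.append(fmean(pooled))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return median(estimates), (lower, upper)
```

      Resampling at the level of mice first means that bins to which only a few fast-running mice contribute receive wider intervals, which is the motivation for switching to this variant once the number of contributing mice varies with velocity.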

      3) The analyses in Figure 4 only consider the average response to all grating orientations and directions. Without further analysing responses to individual grating directions it is unclear how stimulation of cholinergic inputs affects visual responses. Previous work (e.g. Dadarlat and Stryker, 2017) has shown that locomotion can have both additive and multiplicative effects and it would be valuable to determine the type of modulation provided by cholinergic stimulation.

      We thank the reviewer for this suggestion. To address this, we quantified how cholinergic stimulation influenced the orientation tuning of V1 neurons. The stimuli we used were full field sinusoidal drifting gratings of 4 different orientations (2 directions each). For each neuron, we identified the preferred orientation and plotted responses relative to this preferred orientation as a function of whether the mouse was running, or we were stimulating cholinergic axons. Consistent with previous work, we found a mixture of multiplicative and additive components during running. With cholinergic axon stimulation, the multiplicative effect was stronger than the additive effect. This is now quantified in Figure S7.
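      One common way to separate multiplicative from additive modulation of a tuning curve is to regress the modulated responses on the control responses, reading the slope as the multiplicative gain and the intercept as the additive offset. The sketch below illustrates that approach only; it is not necessarily the quantification used for Figure S7, and the function name and example numbers are invented for illustration.

```python
from statistics import fmean


def gain_components(control, modulated):
    """Least-squares fit of modulated = slope * control + offset.

    control and modulated are a neuron's mean responses to the same set
    of orientations (aligned to its preferred orientation) without and
    with the manipulation. The control responses must not all be equal,
    otherwise the slope is undefined.
    """
    mean_x, mean_y = fmean(control), fmean(modulated)
    sxx = sum((x - mean_x) ** 2 for x in control)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(control, modulated))
    slope = sxy / sxx                 # multiplicative gain
    offset = mean_y - slope * mean_x  # additive component
    return slope, offset


# Made-up tuning curve: a 1.5x gain plus a 0.1 offset.
control = [1.0, 0.5, 0.2, 0.5]
modulated = [1.5 * r + 0.1 for r in control]
slope, offset = gain_components(control, modulated)
```

      A slope above 1 with an offset near 0 indicates predominantly multiplicative scaling, whereas a slope near 1 with a positive offset indicates a predominantly additive shift.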

      4) The difference between the effects of locomotion and optogenetic stimulation of cholinergic axons in Figure 5 may be confounded by differences in the visual stimulus. These experiments are carried out under open-loop conditions, where mice may adapt their locomotion based on the speed of the visual stimulus. Consequently, locomotion onsets are likely to occur during periods of higher visual flow. Since optogenetic stimulation is presented randomly, it is likely to occur during periods of lower visual flow speed. Consequently, the difference between the effect of locomotion and optogenetic stimulation may be explained by differences in visual flow speed and it is important to exclude this possibility.

      We find that in general locomotion is unaffected by visual flow in open loop conditions in this type of experiment (in this particular dataset, there was a small negative correlation between locomotion and visual flow in the open loop condition, Author response image 2).

      Author response image 2.

      Correlation between visual flow and locomotion in open loop conditions. Average correlation of locomotion velocity and visual flow speed in open loop for all mice in Figure 5. Each dot is an imaging site. In the open loop, the correlation between locomotion and visual flow speed is close to zero, but significantly negative in this dataset.

      However, to directly address the concern that our results are influenced by visual flow, we can restrict our analysis only to locomotion onsets that occurred in the absence of visual flow (Author response image 3A and 3B). These responses are not substantially different from those when including all data (Figures 5A and 5B). Thus, the difference between the effect of locomotion and optogenetic stimulation cannot be explained by differences in visual flow speed.

      Author response image 3.

      Open loop locomotion onset responses without visual flow. (A) Average calcium response of layer 2/3 neurons in visual cortex to locomotion onset in open loop in the absence of visual flow. Shading indicates SEM. (B) As in A, but for layer 5 neurons.

      5) It is unclear why chemogenetic manipulations of cholinergic inputs had no effect on pairwise correlations of L2/3 neuronal responses while optogenetic stimulation did.

      This is correct – we do not know why that is the case and can only speculate. There are at least two possible explanations for this difference:

      1) Local vs. systemic. The optogenetic manipulation is relatively local, while the chemogenetic manipulation is systemic. It is not clear how cholinergic release in other brain regions influences the correlation structure in visual cortex. It is conceivable that a cortex-wide change in cholinergic release results in a categorically different state with a specific correlation structure in layer 2/3 neurons different from the one induced by the more local optogenetic manipulation.

      2) Layer-specificity of activation. Cholinergic projections to visual cortex arrive both in superficial and deep layers. We activate the axons in visual cortex optogenetically by illuminating the cortical surface. Thus, in our optogenetic experiments, we are primarily activating the axons arriving superficially, while in the chemogenetic experiment, we are likely influencing superficial and deep axons similarly. Thus, we might expect the optogenetic activation to be biased toward influencing superficial layers more strongly than the chemogenetic activation does.

      6) The effects of locomotion and optogenetic stimulation on the latency of L5 responses in Figure 7 are very large - ~100 ms. Indeed, typical latencies in mouse V1 measured using electrophysiology are themselves shorter than 100 ms (see e.g. Durand et al., 2016). Visual response latencies in stationary conditions or without optogenetic stimulation appear surprisingly long - much longer than reported in previous studies even under anaesthesia. Such large and surprising results require careful analysis to ensure they are not confounded by artefacts. However, as in Figure 4, this analysis is based only on average responses across all gratings and no individual examples are shown.

      This is correct and we speculate this is the consequence of a combination of different reasons.

      1) Calcium imaging is inherently slower than electrophysiological recordings. When spiking responses are measured using electrophysiology, response latencies on the order of 100 ms have indeed been reported, as the reviewer points out. Using calcium imaging these latencies are typically 4 times longer (Kuznetsova et al., 2021). This is likely a combination of a) calcium signals that are slower than electrical changes, b) delays in the calcium sensor itself, and c) temporal sampling used for imaging that is about 3 orders of magnitude slower than what is typically used for electrophysiology.

      2) Different neurons included in analysis. The calcium imaging likely has very different biases than electrophysiological recordings. Historically, the fraction of visually responsive neurons in visual cortex based on extracellular electrophysiological recordings has been systematically overestimated (Olshausen and Field, 2005). One key contributor to this is the fact that recordings are biased to visually responsive neurons. The criteria for inclusion of “responsive neurons” strongly influence the “average” response latency. In addition, calcium imaging has biases that relate to the vertical position of the somata in cortex. Both layer 2/3 and layer 5 recordings are likely biased to superficial layer 2/3 and superficial layer 5 neurons. Conversely, electrical recordings are likely biased to layer 4 and layer 5 neurons. Thus, comparisons at this level of resolution between data obtained with these two methods are difficult to make.

      We have added example neurons as Figure S12, as suggested.  

      Reviewer #1 (Recommendations For The Authors):

      While the study showcases valuable insights, I have a couple of concerns regarding the novelty of their research and the interpretation of results. By addressing these concerns, the authors can clarify the positioning of their research and strengthen the significance of their findings.

      (Major comments)

      1) Page 1, Line 21: The authors claim, "Our results suggest that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, enabling faster switching between internal representations during locomotion." However, it is not clear which specific data or results support the claim of "switching between internal representations." Overall, their study primarily presents responses averaged across all neurons imaged, lacking a detailed exploration of individual neuron response patterns. Population analysis, such as PCA and decoding, can be used to assess the encoding of each stimulus by V1 neurons - "internal representation."

      To strengthen their claim regarding "switching between internal representations," the authors could consider an experiment measuring the speed at which the population activity pattern A transitions to the population activity pattern B when the visual stimulus switches from A to B. Such experiments would significantly enhance the impact of their study, providing a clearer understanding of how BF cholinergic input influences the dynamic representation of stimuli during locomotion.

      We thank the reviewer for bringing this up. That acetylcholine enables a faster switching between internal representations in layer 5 is a speculation. We have attempted to make this clearer in the discussion. Our speculation is based on the finding that the population response in layer 5 to sensory input is faster under high levels of acetylcholine (Figures 4D and 7B). In line with the reviewer’s intuition, the neuronal response to a change in visual stimulus, in our experiment from a uniform grey visual stimulus to a sinusoidal grating stimulus, is indeed faster. Based on evidence in favor of layer 5 encoding internal representation (Heindorf and Keller, 2023; Keller and Mrsic-Flogel, 2018; Suzuki and Larkum, 2020), we interpret the decrease in latency of the population response as a faster change in internal representation. We are not sure a decoding analysis would add much to this, given that a trivial decoder simply based on mean population response would already find a faster transition. We have expanded on our explanation of these points in the manuscript.

      2) Page 4, Line 103: "..., a direct measurement of the activity of cholinergic projection from basal forebrain to the visual cortex during locomotion has not been made." This statement is incorrect. An earlier study by Reimer et al. indeed imaged cholinergic axons in the visual cortex of mice running on a wheel. They found that "After walking onset, ... ACh activation, and a large pupil diameter, were sustained throughout the walking period in both cortical areas V1 and A1." Their findings are very similar to the results presented by Yogesh and Keller - that is, BF cholinergic axons exhibited locomotion state-dependent activity. The authors should clarify the positioning of this study relative to previous studies.

      Reimer, J., McGinley, M., Liu, Y. et al. Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortex. Nat Commun 7, 13289 (2016). https://doi.org/10.1038/ncomms13289

      We have clarified this as suggested. However, we disagree slightly with the reviewer here. The key question is whether the cholinergic axons imaged originate in basal forebrain. While Reimer et al. 2016 did set out to do this, we believe a number of methodological considerations prevent this conclusion:

      1) In their analysis, Reimer et al. 2016 combine data from mice with cholinergic axons labeled either by viral injection into basal forebrain or by a germline cross of ChAT-cre mice with a reporter line. Unfortunately, it is unclear what the exact number of mice labeled with either strategy was. Based on the information in the paper, we can conclude that of the 6 mice used for experiments, between 2 and 5 were germline crosses. The problem with germline labeling of ChAT positive neurons is that when using a cross, VIP-ChAT+ neurons in cortex are also labeled. Because Reimer et al. 2016 find an anticipatory increase in activity on locomotion onset, which is also seen by Larsen et al. 2018 (who use a germline cross strategy) but which we do not see in our data, we speculate that a significant part of the signals reported in the Reimer et al. 2016 paper are from local VIP-ChAT+ neurons.

      2) In their analysis, Reimer et al. 2016 also combine all imaging data obtained from both primary auditory cortex and primary visual cortex. Given the heterogeneity in the basal forebrain cholinergic neuronal population and their projection selectivity, to better understand these signals, it’s important to acquire the signals from cholinergic axons selectively in specific cortical regions, which we do in visual cortex. Based on the information provided in their paper, we were unfortunately not able to discern the injection location for their viral labeling strategy. Given the topographic selectivity in projection from basal forebrain, this could give hints as to the relative contribution of cholinergic projections to A1 vs V1 in their data. The injection coordinates given in the methods of the Reimer paper, of 4 mm lateral and 0.5 mm posterior to bregma to target basal forebrain, are likely wrong (they fall outside the head of the mouse).

      Given the heterogeneity in the basal forebrain cholinergic neuronal population and their projection selectivity, to better understand these signals, it’s important to acquire the signals from cholinergic axons both selectively in a cortical region, as we do in visual cortex, and purely originating from basal forebrain. Collins et al. 2023 inject more laterally and thus characterize cholinergic input to S1 and A1, while Lohani et al. 2022 use GRAB sensors which complement our findings. Please note, we don’t think there is any substantial disagreement in the results of previous studies and ours, with very few exceptions, like the anticipatory increase in cholinergic activity that precedes locomotion onset in the Reimer et al. 2016 data, but not in ours. This is a rather critical point in the context of the literature of motor-related neuronal activity in mouse V1. Based on early work on the topic, it is frequently assumed that motor-related activity in V1 is driven by a cholinergic input. This is very likely incorrect given our results, hence we feel it is important to highlight this methodological caveat of earlier work.

      3) Fig. 4H: The authors found that L5 neurons exhibit positive responses at the onset of locomotion in a closed-loop configuration. Moreover, these responses are further enhanced by photostimulation of BF axons.

      In a previous study from the same authors' group (Heindorf and Keller, 2023), they reported 'negative' responses in L5a IT neurons during closed-loop locomotion. This raises a question about the potential influence of different L5 neuron types on the observed results between the two studies. Do the authors think that the involvement of the other neuronal type in L5, the PT neurons, might explain the positive responses seen in the present study? Discussing this point in the paper would provide valuable insights into the underlying mechanisms.

      Yes, we do think the positive response observed on locomotion onset in closed loop is due to non-Tlx3+ neurons. Given that Tlx3-cre only labels a subset of inter-telencephalic (IT) neurons (Gerfen et al., 2013; Heindorf and Keller, 2023), it’s not clear whether the positive response is explained by the pyramidal tract (PT) neurons, or the non-Tlx3+ IT neurons. Dissecting the response profiles of different subsets of layer 5 neurons is an active area of research in the lab and we hope to be able to answer these points more comprehensively in future publications. We have expanded on this in the discussion as suggested.

      Furthermore, it would be valuable to investigate whether the effects of photostimulation of BF axons vary depending on neuronal responsiveness. This could help elucidate how neurons with positive responses, potentially putative PT neurons, differ from neurons with negative responses, putative IT neurons, in their response to BF axon photostimulation during locomotion.

      We have attempted an analysis of the form suggested. In short, we found no relationship between a neuron’s response to optogenetic stimulation of ChAT axons and its response to locomotion onset, or its mean activity. Based on their response to locomotion onset in closed loop, we split layer 5 neurons into three groups, 30% most strongly decreasing (putative Tlx3+), 30% most strongly increasing, and the rest. We did not see a response to optogenetic stimulation of basal forebrain cholinergic axons in any of the three groups (Author response image 4A). We also found no obvious relationship between the mean activity of neurons and their response to optogenetic stimulation (Author response image 4B).

      Author response image 4.

      Neither putative layer 5 cell types nor neuronal responsiveness correlates with the response to optogenetic stimulation of cholinergic axons. (A) Average calcium response of layer 5 neurons split into putative Tlx3 (closed loop locomotion onset suppressed) and non-Tlx3 like (closed loop locomotion onset activated) to optogenetic stimulation of cholinergic axons. (B) Average calcium response of layer 5 neurons to optogenetic stimulation of cholinergic axons as a function of their mean response throughout the experimental session. Left: Each dot is a neuron. Right: Average correlation in the response of layer 5 to optogenetic stimulation and mean activity over all neurons per imaging site. Each dot is an imaging site.
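
      For illustration only, the three-way grouping described above (30% most strongly decreasing, 30% most strongly increasing, and the rest) can be sketched as follows. This is a minimal sketch, not the authors' analysis code: the function names and all example values are hypothetical, and it assumes one scalar locomotion-onset response and one scalar optogenetic response per neuron.

```python
def split_by_onset_response(responses, fraction=0.3):
    """Split neuron indices into (decreasing, increasing, rest).

    responses: one locomotion-onset response per neuron (hypothetical input).
    fraction: share of neurons assigned to each extreme group.
    """
    n = len(responses)
    k = int(round(fraction * n))
    # Sort neuron indices from most negative to most positive response.
    order = sorted(range(n), key=lambda i: responses[i])
    decreasing = order[:k]        # putative Tlx3+ like (onset suppressed)
    increasing = order[n - k:]    # onset activated
    rest = order[k:n - k]
    return decreasing, increasing, rest


def mean(values):
    """Arithmetic mean; NaN for an empty group."""
    return sum(values) / len(values) if values else float("nan")


# Made-up example values for ten neurons (not data from the study).
onset = [-0.8, 0.5, 0.1, -0.2, 0.9, 0.0, -0.5, 0.3, 0.2, -0.1]
opto = [0.01, -0.02, 0.00, 0.03, -0.01, 0.02, 0.00, -0.03, 0.01, 0.00]

dec, inc, rest = split_by_onset_response(onset)
for name, group in (("decreasing", dec), ("increasing", inc), ("rest", rest)):
    # Average optogenetic response within each putative group.
    print(name, mean([opto[i] for i in group]))
```

      Comparing the per-group means (or full response traces) then shows whether any group responds to the optogenetic stimulation.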

      (Minor comments)

      1) It is unclear which BF subregion(s) were targeted in this study.

      Thanks for pointing this out. We targeted the entire basal forebrain (medial septum, vertical and horizontal limbs of the diagonal band, and nucleus basalis) with our viral injections. All our axonal imaging data comes from visual cortex and given the sensory modality-selectivity of cholinergic projections to cortex, the labeled axons originate from medial septum and the diagonal bands (Kim et al., 2016). We have now added the labels for basal forebrain subregions targeted next to the injection coordinates in the manuscript.

      2) Page 43, Line 818: The journal name of the cited paper Collins et al. is missing.

      Fixed.

      3) In the optogenetic experiments, how long is the inter-trial interval? Stimulation of BF is known to have long-lasting effects on cortical activity and plasticity. It is, therefore, important to have a sufficient interval between trials.

      The median inter-trial intervals for the different stimulation events are as follows:

      • Optogenetic stimulation only: 15 s

      • Optogenetic stimulation + grating: 12 s

      • Optogenetic stimulation + mismatch: 35 s

      • Optogenetic stimulation + locomotion onset: 45 s

      We have added this information to the methods in the manuscript.

      Assuming locomotion is the primary driver of acetylcholine release (as we argue in Figures 1 and 2), the frequency of stimulation roughly corresponds to the frequency of acetylcholine release experienced endogenously. It is of course possible that being awake and mobile puts the entire system in a long-lasting acetylcholine-driven state different from what would be observed during long-term quiet wakefulness or during sleep. But the main focus of the optogenetic stimulation experiments we performed was to investigate the consequences of the rapid acetylcholine release driven by locomotion.

      4) Page 11, Line 313: "..., we cannot exclude the possibility of a systemic contribution to the effects we observe through shared projections between different cortical and subcortical target." This possibility can be tested by examining the effect of optogenetic stimulation of cholinergic axons on locomotor activity, as they did for the chemogenetic experiments (Fig. S7). If the optogenetic manipulation changes locomotor activity, it is likely that this manipulation has some impact on subcortical activity and systemic contribution to the changes in cortical responses observed.

      Based on the reviewer's suggestion, we tested this and found no change in the locomotor activity of the mice on optogenetic stimulation of cholinergic axons locally in visual cortex (we have added this as Figure S5 to the manuscript). Please note, however, that we cannot, of course, exclude a systemic contribution based on this.

      5) Fig. 4 and 5: In a closed-loop configuration, L2/3 neurons exhibit a transient increase in response at the onset of locomotion, while in an open-loop configuration, their response is more prolonged. On the other hand, L5 neurons show a sustained response in both configurations. Do the authors have any speculation on this difference?

      This is correct. Locomotion onset responses in layer 2/3 are strongly modulated by whether the locomotion onset occurs in closed loop or open loop configurations (Widmer et al., 2022). This difference is absent in our layer 5 data here. We suspect this is a function of a differential within-layer cell type bias in the different recordings. In the layer 2/3 recordings we are likely biased strongly towards superficial L2/3 neurons that tend to be negative prediction error neurons (top-down excited and bottom-up inhibited), see e.g. (O’Toole et al., 2023). A reduction of locomotion onset responses in closed loop is what one would expect for negative prediction error neurons. While layer 5 neurons exhibit mismatch responses, they do not exhibit opposing top-down and bottom-up input that would result in such a suppression (Jordan and Keller, 2020).

      We can illustrate this by splitting all layer 2/3 neurons based on their response to gratings and to visuomotor mismatch into a positive prediction error (PE) type (top 30% positive grating response), a negative prediction error type (top 30% positive visuomotor mismatch response), and the rest (remaining neurons and neurons responsive to both grating and visuomotor mismatch). Plotting the response of these neurons to locomotion onset in closed loop and open loop, we find that negative PE neurons have a transient response to locomotion onset in closed loop while positive PE neurons have a sustained increase in response in closed loop. In open loop the response of the two populations is indistinguishable. Splitting the layer 5 neurons using the same criteria, we don’t find a striking difference between closed and open loop between the two groups of neurons. We have added this as Figure S8.
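
      As a rough illustration of the split described above, the following sketch labels neurons as positive PE (top 30% grating response), negative PE (top 30% visuomotor mismatch response), or rest (everything else, including neurons in both top groups). It is a minimal sketch under stated assumptions, not the authors' code: the function names and example values are hypothetical, and each neuron is assumed to be summarized by one grating response and one mismatch response.

```python
def top_fraction(values, fraction=0.3):
    """Return the set of indices with the largest values (top `fraction`)."""
    k = int(round(fraction * len(values)))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:k])


def classify_pe_types(grating, mismatch, fraction=0.3):
    """Label each neuron 'positive PE', 'negative PE', or 'rest'."""
    top_grating = top_fraction(grating, fraction)
    top_mismatch = top_fraction(mismatch, fraction)
    labels = []
    for i in range(len(grating)):
        if i in top_grating and i not in top_mismatch:
            labels.append("positive PE")
        elif i in top_mismatch and i not in top_grating:
            labels.append("negative PE")
        else:
            # Remaining neurons, plus neurons responsive to both stimuli.
            labels.append("rest")
    return labels


# Made-up example values for ten neurons (not data from the study).
grating = [0.9, 0.8, 0.7, 0.1, 0.0, 0.1, 0.2, 0.0, 0.1, 0.0]
mismatch = [0.0, 0.1, 0.9, 0.8, 0.7, 0.0, 0.1, 0.2, 0.0, 0.1]
print(classify_pe_types(grating, mismatch))
```

      Locomotion-onset responses in closed loop and open loop can then be averaged separately for each label.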

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      1) As a ubiquitous promoter was used to drive GCaMP expression, please explain how excitatory neurons were identified.

      2) As the data cover a very small range of running speeds, it is important to confirm that the binary locomotion signal model still applies when mice run at higher speeds - either by selecting recordings where mice have a wider range of running speeds or conducting additional experiments. In addition, please show the running speed tuning of individual axons.

      3) Please provide a more detailed analysis of the effects of locomotion and cholinergic modulation on visual responses. How does cholinergic modulation affect orientation and direction tuning? Are the effects multiplicative or additive? How does this compare to the effects of locomotion on single neurons?

      4) To ensure that the analyses in Figure 5 are not confounded by differences in the visual stimulus, please include average visual flow speed traces for each condition.

      5) Please clarify why chemogenetic manipulations of cholinergic inputs had no effect on pairwise correlations in L2/3.

      6) The latency effect is quite an extraordinary claim and requires careful analysis. Please provide examples of single neurons illustrating the latency effect - including responses across individual grating orientations/directions. One possible confound is that grating presentation could itself trigger locomotion or other movements. In the stationary / noOpto conditions, the grating response might not be apparent in the average trace until the animal begins to move. Thus the large latency in the stationary / noOpto conditions may reflect movement-related rather than visual responses.

      Please see our responses to these points in the public review part above.

      There are some minor points where text and figures could be improved:

      1) When discussing the decorrelation of neuronal responses by cholinergic axon activation, it is important to make it clear that Figure 6D quantifies the responses of layer 5 apical dendrites rather than neurons.

      We have added this information to the results section.

      2) In Figure S7, please clarify why velocity is in arbitrary units.

      This was an oversight and has been fixed.

      3) Please clarify how locomotion and stationary trials are selected in Figure 4.

      We thank the reviewers for pointing this out. Trials were classified as occurring during locomotion or while mice were stationary as follows. We used a time-window of -0.5 s to +1 s around stimulus onset. If mice exhibited uninterrupted locomotion above a threshold of 0.25 cm/s in this time-window, we considered the stimulus as occurring during locomotion, otherwise it was defined as occurring while the mice were stationary. Note, the same criteria to define locomotion state were used to isolate visuomotor mismatch events, and also during control optogenetic stimulation experiments. We have added this information to the methods.
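
      The classification rule above can be sketched as follows. This is a minimal sketch, not the authors' code: the function name, the sampling scheme, and the example traces are hypothetical; only the window (-0.5 s to +1 s) and the threshold (0.25 cm/s) come from the text.

```python
def is_locomotion_trial(speed, times, onset, threshold=0.25, window=(-0.5, 1.0)):
    """Return True if running speed stays above `threshold` (cm/s) for every
    sample within `window` (s) around the stimulus onset time."""
    in_window = [
        s for s, t in zip(speed, times)
        if onset + window[0] <= t <= onset + window[1]
    ]
    # An empty window cannot be classified as locomotion.
    return bool(in_window) and all(s > threshold for s in in_window)


# Hypothetical speed traces sampled every 0.25 s.
times = [i * 0.25 for i in range(20)]
running = [1.0] * 20
paused = running[:]
paused[8] = 0.1  # a brief stop at t = 2.0 s interrupts locomotion

print(is_locomotion_trial(running, times, onset=2.0))
print(is_locomotion_trial(paused, times, onset=2.0))
```

      A single sub-threshold sample inside the window is enough to classify the trial as stationary, which is how "uninterrupted" is interpreted here.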

      4) When testing whether cholinergic activation is sufficient to explain locomotion-induced decorrelation in Figure 6G-H, please show pre-CNO and post-CNO delta-correlation, not just their difference.

      We can do that, but the results are harder to parse this way. We have added this as Figure S11 to the manuscript. The problem with parsing the figure is that the pre-CNO levels are different in different groups. This is likely a function of mouse-to-mouse variability and makes it harder to identify what the CNO induced changes are. Using the pre-post difference removes the batch influence. Hence, we have left this as the main analysis in Figure 6G and 6H.

    1. Author Response

      eLife assessment

      The important work by Aballay et al. significantly advances our understanding of how G protein-coupled receptors (GPCRs) regulate immunity and pathogen avoidance. The authors provide convincing evidence for the GPCR NPR-15 to mediate immunity by altering the activity of several key transcription factors. This work will be of broad interest to immunologists.

      The authors express their sincere appreciation to Timothy Behrens (Senior Editor), the Reviewing Editor, and the original reviewers for their considerate and favorable assessment of our manuscript.

      Reviewer #1 (Public Review):

      Summary:

      Otarigho et al. presented a convincing study revealing that in C. elegans, the neuropeptide Y receptor GPCR/NPR-15 mediates both molecular and behavioral immune responses to pathogen attack. Previously, three npr genes were found to be involved in worm defense. In this study, the authors screened mutants in the remaining npr genes against P. aeruginosa-mediated killing and found that npr-15 loss-of-function improved worm survival. npr-15 mutants also exhibited enhanced resistance to other pathogenic bacteria but displayed significantly reduced avoidance to S. aureus, independent of aerotaxis, pathogen intake and defecation. The enhanced resistance in npr-15 mutant worms was attributed to upregulation of immune and neuropeptide genes, many of which were controlled by the transcription factors ELT-2 and HLH-30. The authors found that NPR-15 regulates avoidance behavior via the TRPM gene, GON-2, which has a known role in modulating avoidance behavior through the intestine. The authors further showed that both NPR-15-dependent immune and behavioral responses to pathogen attack were mediated by the NPR-15-expressing neurons ASJ. Overall, the authors discovered that the NPR-15/ASJ neural circuit may regulate distinct defense mechanisms against pathogens under different circumstances. This study provides novel and useful information to researchers in the fields of neuroimmunology and C. elegans research.

      The authors are grateful for the thoughtful and insightful comments on our manuscript. Your feedback has been instrumental in refining our work, and we appreciate the time and expertise you have invested in evaluating our study.

      Strengths:

      1) This study uncovered specific molecules and neuronal cells that regulate both molecular immune defense and behavior defense against pathogen attack and indicate that the same neural circuit may regulate distinct defense mechanisms under different circumstances. This discovery is significant because it not only reveals regulatory mechanisms of different defense strategies but also suggests how C. elegans utilize its limited neural resources to accomplish complex regulatory tasks.

      The authors express gratitude to the reviewer for recognizing that the present study revealed specific molecules and neuronal cells involved in regulating both molecular immune defense and behavioral defense against pathogen attacks. Additionally, the acknowledgment that the same neural circuit may oversee distinct defense mechanisms under different circumstances is appreciated.

      2) The conclusions in this study are supported by solid evidence, which are often derived from multiple approaches and/or experiments. Multiple pathogenic bacteria were tested to examine the effect of NPR-15 loss-of-function on immunity; the impacts of pharyngeal pumping and defecation on bacterial accumulation were ruled out when evaluating defense; RNA-seq and qPCR were used to measure gene expression; gene inactivation was done in multiple strains to assess gene function.

      The authors thank the reviewer for appreciating that this study is supported by solid evidence.

      3) Gene differential expression, gene ontology, and pathway analyses were performed to demonstrate that NPR-15 controls immunity by regulating immune pathways.

      The authors thank the reviewer for appreciating the gene differential expression, gene ontology, and pathway analyses performed in the study.

      4) Elegant approaches were employed to examine avoidance behavior (partial lawn, full lawn, and lawn occupancy) and the involvement of neurons in regulating immunity and avoidance (the use of a diverse array of mutant strains).

      The authors thank the reviewer for appreciating the approaches used in this study.

      5) Statistical analyses were appropriate and adequate.

      The authors thank the reviewer for appreciating the statistical analyses used in this study.

      Reviewer #2 (Public Review):

      Summary:

      The authors are studying the behavioral response to pathogen exposure. They and others have previously described the role that G-protein coupled receptors in the nervous system play in detecting pathogens and initiating behavioral patterns (e.g. avoidance/learned avoidance) that minimize contact. The authors study this problem in C. elegans, which is amenable to genetic and cellular manipulations and allows the authors to define cellular and signaling mechanisms. This paper extends the original idea to now implicate signaling and transcriptional pathways within a particular neuron (ASJ) and the gut in mediating avoidance behaviour.

      Strengths:

      The work is rigorous and elegant and the data are convincing. The authors make superb use of mutant strains in C. elegans, as well as tissue-specific gene inactivation and expression and genetic methods of cell ablation, to demonstrate how a gene, NPR-15, controls behavioral changes in pathogen infection. The results suggest that ASJ neurons and the gut mediate such effects. I expect the paper will constitute an important contribution to our understanding of how the nervous system coordinates immune and behavioral responses to infection.

      The authors sincerely thank the reviewer for the thoughtful and positive review of our manuscript. We greatly appreciate the time and effort you dedicated to evaluating our work, and we are pleased that you find our study to be a rigorous and elegant contribution to the understanding of behavioral responses to pathogen exposure.

      Reviewer #1 (Recommendations For The Authors):

      The authors have adequately addressed my concerns and questions. I have no more comments or recommendations for the authors.

      The authors thank the reviewer for the constructive comments on the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The authors have adequately addressed my concerns.

      The authors express their appreciation to the reviewer for the valuable and constructive comments provided on the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the effects of two sensory stimuli (visual and somatosensory) on fMRI responsiveness during absence seizures were investigated in GAERS rats with concurrent EEG recordings. SPM analysis of fMRI showed a significant reduction in whole-brain responsiveness during the ictal period compared to the interictal period under both stimuli, and this phenomenon was replicated in a structurally constrained whole-brain computational model of rat brains.

      The conclusion of this paper is that whole-brain responsiveness to both sensory stimuli is inhibited and spatially impeded during seizures.

      I also suggest the manuscript should be written in a way that is more accessible to readers who are less familiar with animal experiments. In addition, the implementation and interpretation of brain simulations need to be more careful and clear.

      Several sections of the manuscript were clarified and simplified to be more accessible. Also, implementation and interpretations of brain simulations were modified to be more precise.

      Strengths:

      1) ZTE imaging sequence was selected over traditional EPI sequence as the optimal way to perform fMRI experiments during absence seizures.

      2) A detailed classification of stimulation periods is achieved based on the relative position in time of the stimulation period with respect to the brain state.

      3) A whole-brain model embedded with a realistic rat connectome is simulated on the TVB platform to replicate fMRI observations.

      We thank the reviewer for indicating the strengths of our manuscript.

      Weaknesses:

      1) The analysis in this paper does not directly answer the scientific question posed by the authors, which is to explore the mechanisms of the reduced brain responsiveness to external stimuli during absence seizures (in terms of altered information processing), but merely characterizes the spatial involvement of such reduced responsiveness. The same holds for the use of mean-field modeling, which merely reproduces experimental results without explaining them mechanistically as what the authors have claimed at the head of the paper.

      We agree with the reviewer that the manuscript does not specifically address the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was how whole-brain responsiveness to stimulus differs between ictal and interictal states. The sentence in the abstract that could lead to misinterpretation, “The mechanism underlying the reduced responsiveness to external stimulus remains unknown.”, was therefore changed to “The whole-brain spatial and temporal characteristics of reduced responsiveness to external stimulus remains unknown”.

      2) The implementations of brain simulations need to be more specific.

      Contribution:

      The contribution of this paper is performing fMRI experiments under a rare condition that could provide fresh knowledge in the imaging field regarding the brain's responsiveness to environmental stimuli during absence seizures.

      Reviewer #2 (Public Review):

      Summary:

      This study examined the possible effect of spike-wave discharges (SWDs) on the response to visual or somatosensory stimulation using fMRI and EEG. This is a significant topic because SWDs often are called seizures and because there is non-responsiveness at this time, it would be logical that responses to sensory stimulation are reduced. On the other hand, in rodents with SWDs, sensory stimulation (a noise, for example) often terminates the SWD/seizure.

      In humans, these periods of SWDs are due to thalamocortical oscillations. A certain percentage of the normal population can have SWDs in response to photic stimulation at specific frequencies. Other individuals develop SWDs without stimulation. They disrupt consciousness. Individuals have an absent look, or "absence", which is called absence epilepsy.

      The authors use a rat model to study the responses to stimulation of the visual or somatosensory systems during and in between SWDs. They report that the response to stimulation is reduced during the SWDs. While some data show this nicely, the authors also report on lines 396-8 "When comparing statistical responses between both states, significant changes (p<0.05, cluster-) were noticed in somatosensory auditory frontal..., with these regions being less activated in interictal state (see also Figure 4)." That statement is at odds with their conclusion.

      We thank the reviewer for noting this discrepancy. The statement should have been written vice versa and it has been corrected as: “When comparing statistical responses between both states, significant changes (p<0.05, cluster-level corrected) were noticed in the somatosensory, auditory and frontal cortices: these regions were less activated in ictal than in interictal state (see also Figure 4).”

      They also conclude that stimulation slows the pathways activated by the stimulus. I do not see any data proving this. It would require repeated assessments of the pathways in time.

      We agree with the reviewer that there are no data showing slowing of the pathways in response to stimulus. However, we are unsure which part of the conclusion section this comment refers to. We did not intend to claim in the manuscript that stimulation slows the activated pathways.

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data.

      Hemodynamic response functions were studied for two reasons:

      • To account for possible change in HRF during the detection of activated regions. Indeed, a physiological change in HRF can mask the detection of an activation when the software uses a standard HRF to convolve the design matrix (David et al. 2008).

      • To characterize the shape and polarity of fMRI activations in brain regions that we found to be differently activated between ictal and interictal states, and to evaluate whether the alteration in activation was associated with an alteration in hemodynamics.

      The observed decrease (rather than increase) of the HRF in the cortex when stimulation was applied during SWD was discussed in section 4.4, where we speculated that neuronal suppression caused by SWD can prevent responsiveness. In this case, the decreased HRF could be either a consequence or a cause of the observed neuronal suppression. The assumption that the HRF reduction is causal would be supported by a possible vascular steal effect from other activated regions. However, we did not state this in the conclusion section, and therefore the following sentence was added to the conclusions: “Moreover, the detected decreases in the cortical HRF when sensory stimulation was applied during spike-and-wave discharges, could play a role in decreased sensory perception. Further studies are required to evaluate whether this HRF change is a cause or a consequence of the reduced neuronal response”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The conclusion is that the modeling supports the conclusions of the study, which is useful.

      Details about the model were added.

      Strengths:

      Use of fMRI and EEG to study SWDs in rats.

      Weaknesses:

      Several aspects of the Methods and Results are unclear.

      Reviewer #3 (Public Review):

      Summary:

      This is an interesting paper investigating fMRI changes during sensory (visual, tactile) stimulation and absence seizures in the GAERS model. The results are potentially important for the field and do suggest that sensory stimulation may not activate brain regions normally during absence seizures. However the findings are limited by substantial methodological issues that do not enable fMRI signals related to absence seizures to be fully disentangled from fMRI signals related to the sensory stimuli.

      Strengths:

      Investigating fMRI brain responses to sensory stimuli during absence seizures in an animal model is a novel approach with the potential to yield important insights.

      The use of an awake, habituated model is a valid and potentially powerful approach.

      Weaknesses:

      The major difficulty with interpreting the results of this study is that the duration of the visual and auditory stimuli was 6 seconds, which is very close to the mean seizure duration per Table 1. Therefore the HRF model looking at fMRI responses to visual or auditory stimuli occurring during seizures was simultaneously weighting both seizure activity and the sensory (visual or auditory) stimuli over the same time intervals on average. The resulting maps and time courses claiming to show fMRI changes from visual or auditory stimulation during seizures will therefore in reality contain some mix of both sensory stimulation-related signals and seizure-related signals. The main claim that the sensory stimuli do not elicit the same activations during seizures as they do in the interictal period may still be true. However the attempts to localize these differences in space or time will be contaminated by the seizure-related signals.

      The claims that differences were observed for example between visual cortex and superior colliculus signals with visual stim during seizures vs. interictal are unconvincing due to the above.

      We understand the reviewer's concern and agree that seizure-related signals must be considered in the analysis when studying stimulation responses. Therefore, in modelling the responses in the SPM framework, we considered both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. Thereby, the effects caused by the stimulation should, in theory, be separated as much as possible from the effects caused by the seizure itself. Additionally, the cases where stimulation occurred fully inside a seizure (included in Figure 3, “...stimulation during ictal state”) had a longer average seizure duration of 45 ± 60 s, much longer than the 6 s average duration taken over all seizures.

      However, we acknowledge that there is a potential that some leftover effects from a seizure are still present, and we have noted this caution in the “Physiologic and methodologic considerations” section: “We note a caution that presented maps and time courses showing fMRI changes from visual or whisker stimulation during seizures may contain mixture of both sensory stimulation-related signals and seizure-related signals. To minimize this contamination, we considered in SPM both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. Thereby, the effects caused by the seizure itself should be separated as much as possible from the effects caused by stimulation.”
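
      To illustrate the general idea of modelling seizure-related signal with its own regressor, the following sketch fits an ordinary least-squares model with one stimulation regressor and one seizure regressor. It is a toy example under strong assumptions, not the SPM pipeline used in the study: the regressors are plain boxcars without HRF convolution, the signal is noiseless, and all names and numbers are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_glm(y, regressors):
    """Least-squares betas for y ~ intercept + regressors (normal equations)."""
    cols = [[1.0] * len(y)] + [[float(v) for v in r] for r in regressors]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * v for p, v in zip(ci, y)) for ci in cols]
    return solve_linear(xtx, xty)


# Hypothetical boxcars: a stimulus that partly overlaps a seizure.
stim = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0]
seizure = [0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Synthetic signal: baseline 0.5, stimulus effect +2, seizure effect -1.
signal = [0.5 + 2.0 * s - 1.0 * z for s, z in zip(stim, seizure)]

intercept, beta_stim, beta_seizure = fit_glm(signal, [stim, seizure])
print(intercept, beta_stim, beta_seizure)
```

      Because the seizure regressor absorbs the seizure-related part of the signal, the stimulation beta reflects the stimulus effect alone, provided the two regressors are not collinear. When stimulation and seizure cover nearly the same interval, this separation weakens, which is the concern raised by the reviewer.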

      The maps shown in Figure 3 do not show clear changes in the areas claimed to be involved.

      We clarified the overall appearance of Figure 3, by enlarging the selected cross sections for better anatomical differentiation and added anterior and posterior directions on all images.

      Reviewer #1 (Recommendations For The Authors):

      1) The implementations of brain simulations need to be more specific: How is the stimulation applied in the mean-field model in terms of its mathematical expression? The state variable of the model is the rate of neuronal firing, but how is it subsequently converted into fMRI responses? How are the statistical plots calculated? How much does this result depend on the model parameter?

      Further details and explanations about the model have now been added to the manuscript. The stimulation of a specific region is simulated as an increase in the excitatory input to the corresponding node. In particular, we use a square function to represent the stimulus (see, for example, panel A in Figure 6–figure supplement 1). As the referee mentions, the model describes the dynamics of the neuronal firing rates. These provide direct information about neuronal activity and responsiveness, which is why all the statistical analyses of the simulations shown in the paper were performed on the firing rates. For these analyses, no conversion to fMRI was needed. To build the statistical maps, an ANOVA (analysis of variance) test was used. The ANOVA test is designed to assess the significance of a change in the mean between samples, and is calculated via an F-test as the ratio of the variance between samples to the variance within samples. In our case, it allowed us to assess the impact of the stimulation on the ongoing neuronal activity by comparing the time series of the firing rate with and without stimulation (this was done independently for each state). For the results presented in this paper, the ANOVA was performed using the “f_oneway” function of the scipy.stats module in Python.

      Regarding the dependence on the model parameters, the main results of our paper concern the responsiveness of the system under two quantitatively different types of ongoing dynamics: an asynchronous irregular activity (interictal period) and an oscillatory SWD type of dynamics (ictal period). In particular, we show that for the SWD dynamics the activity evoked by the stimulus is overshadowed by the ongoing activity, which strongly limits the response of the system and the propagation of the stimulus. In this sense, the main results of the simulations are very general: no significant dependence on specific cellular or network parameters was observed within a physiologically relevant range, nor should one be expected. Nevertheless, we point out that, as mentioned in the text, the key parameter that triggers the transition between the two types of dynamics is the strength of the adaptation current (in particular the strength of the spike-triggered adaptation parameter ‘b’ described in the Supplementary information), which in addition can control the frequency of the oscillations. In the paper, this parameter was set such that the SWD frequency falls within the range observed in GAERS (between 7 and 12 Hz). We believe that further analysis around the region of transition between states, in particular from a dynamical point of view, could be of relevance for future work.
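
      The F ratio underlying the ANOVA comparison, and the intuition that a stimulus is harder to detect on top of large ongoing oscillations, can be illustrated with a toy example. This is a minimal sketch and not the mean-field model: the "firing rates" are simple sine waves with a square pulse added, all parameter values are hypothetical, and the F statistic is computed by hand (the between-sample to within-sample mean-square ratio, the same quantity that scipy.stats.f_oneway reports as its statistic) so that the sketch runs without SciPy.

```python
import math


def f_statistic(*groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def square_pulse(n_steps, start, stop, amplitude):
    """Square stimulus: `amplitude` for start <= t < stop, zero elsewhere."""
    return [amplitude if start <= t < stop else 0.0 for t in range(n_steps)]


n_steps = 200
pulse = square_pulse(n_steps, start=50, stop=150, amplitude=2.0)

# Toy ongoing activity: small fluctuations vs. large oscillations.
interictal = [5.0 + 0.5 * math.sin(0.3 * t) for t in range(n_steps)]
ictal = [5.0 + 10.0 * math.sin(0.9 * t) for t in range(n_steps)]

# Compare each state without and with the same added stimulus.
f_interictal = f_statistic(interictal, [r + p for r, p in zip(interictal, pulse)])
f_ictal = f_statistic(ictal, [r + p for r, p in zip(ictal, pulse)])
print(f_interictal, f_ictal)
```

      In this toy setting the same pulse yields a much larger F value against the low-variance background than against the large oscillation, because the within-sample variance in the denominator grows with the amplitude of the ongoing activity.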

      2) In the abstract, what exactly does "typical information flow in functional pathways" mean and which part of the results does this refer to?

      We note that this sentence was overly complicated. By “typical information flow”, we were referring to sensory responsiveness during interictal state. Therefore, we made the following modifications to the abstract: “These results suggest that sensory processing observed during an interictal state can be hindered or even suppressed by the occurrence of an absence seizure, potentially contributing to decreased responsiveness.”

      3) Figure 4 - Figure Supplement 1 performed an analysis of comparing states between 'when stimulation ended a seizure' and 'stimulation during an ictal period'. The authors should explain more clearly in the manuscript what is the reason and significance of considering the state of 'when stimulation ended a seizure'. And how is a seizure considered to be terminated by stimulation rather than ending spontaneously?

      We have now added an explanation to manuscript section 2.5.3 of why this state was also of interest: “The case when stimulation ended a seizure is particularly interesting for studying the spatial and temporal aspects explaining shift from ictal, i.e. non-responsiveness state, to non-ictal, i.e. responsiveness state.” We agree that it is possible that seizures ended spontaneously at the same time as the stimulus was applied, but we argue that these seizures most probably ended because of the stimulation, based on previously published results (https://doi.org/10.1016/j.brs.2012.05.009).

      4) In Section 3.1, some detailed descriptions of methods should be moved to Section 2, e.g. how the spatial and temporal SNR is obtained and the description of bad quality data. Also, I suggest the significance of selecting the optimal MRI sequence be stated earlier in the paper, as Section 3.1 cannot be expected from reading the abstract and introduction.

      We moved some technical explanations of SNRs from section 3.1. to section 2.4.1. Significance of the selection of the MRI sequence is also now stated earlier in the introduction section: “For this purpose, the functionality of ZTE sequence was first piloted, and selected over traditional EPI sequence for its lower acoustic noise and reduced magnetic susceptibility artefacts. The selected MRI sequence thus appeared optimal for awake EEG-fMRI measurements.”

      Some minor issues:

      1) How is ROI defined in this paper? What type of atlas is used?

      Anatomical ROIs were drawn based on the Paxinos and Watson rat brain atlas, 7th edition. A region was selected if statistically significant activations were detected inside it, based on the activation maps. We clarified the definition of ROI as follows: “Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps.”

      2) Section 4.3.2, "In addition, some responses were seen in the somatosensory cortex during the seizure state, which may be due to the fact that the linear model used did not completely remove the effect of the seizure itself" What is the reason for the authors to make such comments?

      This claim was made because we saw a similar trend of responses (deactivation) in the somatosensory cortex in F-contrast maps when comparing the “stimulation during ictal state” maps to the "seizure map", leading us to assume that the effect of the seizure was still apparent in the maps (even though “seizure only” states were used as nuisance regressors). However, as this claim is highly speculative, we have decided to delete this sentence from the manuscript.

      3) Abbreviations such as SPM, HRF, CBF, etc. are not defined in the manuscript.

      Definitions for these abbreviations were added.

      4) Supplementary information-AdEx mean-field model, 've and vi', e and i should be subscripted.

      Subscripts were added.

      Reviewer #2 (Recommendations For The Authors):

      Below are more detailed questions and concerns. Many questions are about the Methods, which seem to be written by a specialist. However, there are also questions about the experimental approach and conclusions.

      One of the strengths of the study is the use of fMRI and EEG. However, to allow rats to be still in the magnet, isoflurane was used, and then as soon as rats recovered they were imaged. However isoflurane has effects on the brain long after the rats have appeared to wake up. Moreover, to train rats to be still, repetitive isoflurane sessions had to be used. Repetitive isoflurane should have a control of some kind, or be discussed as a limitation.

      The repetitive use of isoflurane is indeed an important limiting factor that was not yet discussed in the manuscript. We have added the following sentences to the “Physiologic and methodologic considerations” section:

      “As the used awake habituation and imaging protocol didn’t allow us to avoid the usage of isoflurane during the preparation steps, we cannot rule out the possible effect of using repetitive anesthesia on brain function. However, duration (~15 min) and concentration of anesthesia (~1.5%) during these steps were still moderate, whereas extended durations (1-3 h) of either single or repetitive isoflurane exposures have been used in previous studies where long-term effects on brain function have been observed (Long II et al., 2016; Stenroos et al., 2021). Moreover, there was a 5-15 min waiting period between the cessation of anesthesia and initiation of fMRI scan, to avoid the potential short-term effects of isoflurane that has been found to be most prominent during the 5 min after isoflurane cessation (Dvořáková et al., 2022).”

      An assumption of the study is that interictal periods are normal. However, they may not be. A control is necessary. One also wants to know how often GAERS have spontaneous spike-wave discharges (SWDs), what the authors call seizures. The reason is that the more common the SWDs, the less likely interictal periods are normal. It seems from the Methods that rats were selected if they had frequent seizures so many could be captured in a recording session. Those without frequent seizures were discarded.

      A good control would be a normal rat that has spontaneous SWDs, since almost all rat strains have them, especially with age and in males (PMID: 7700522). However, whether they are frequent enough might be a problem. Alternatively, animals could be studied with rare seizures to assess the normal baseline, and compared to interictal states in GAERS.

      We appreciate this concern raised by the Reviewer. Even though it would be interesting to study different strains and the dependence on SWD frequency, the aim of this study was to compare interictal and ictal states in this specific animal model. We also understand that interictal periods do not necessarily model a “normal” state, and we therefore went through the manuscript again to remove any claims to that effect.

      About the mechanisms of SWDs, the authors should update their language which seems imprecise and lacks current citations (starting on line 71):

      "Although the origin of absence seizures is not fully understood, current studies on rat models of absence seizures suggest that they arise from atypical excitatory-inhibitory patterns in the barrel field of the somatosensory cortex (Meeren et al. 2002; Polack et al. 2007) and lead to synchronous cortico-thalamic activity (Holmes, Brown, and Tucker 2004)."

      Some of the best explanations for SWDs that I know of are from the papers of John Huguenard. His reviews are excellent. They discuss the mechanisms of thalamocortical oscillations.

      We have reformatted the sentences discussing the mechanism of SWDs and included the explanations provided by manuscripts from Huguenard and McCafferty et al.: “Although the origin of absence seizures is not fully understood, current studies on rat models of absence seizures suggest that they arise from excitatory drive in the barrel field of the somatosensory cortex (Meeren et al. 2002; Polack et al. 2007, 2009, David et al., 2008) and then propagate to other structures (David et al., 2008) including thalamus, knowing to play an essential role during the ictal state (Huguenard, 2019). Notably, the thalamic subnetwork is believed to play a role in coordinating and spacing SWDs via feedforward inhibition together with burst firing patterns. These lead to the rhythms of neuronal silence and activation periods that are detected in SWD waves and spikes (McCafferty et al., 2018; Huguenard, 2019).”

      The following also is not precise:

      "Although seizures are initially triggered by hyperactive somatosensory cortical neurons, the majority of neuronal populations are deactivated rather than activated during the seizure, resulting in an overall decrease in neuronal activity during SWD (McCafferty et al. 2023)." What neuronal populations? Cortex? Which neurons in the cortex? Those projecting to the thalamus? What about thalamocortical relay cells? Thalamic gabaergic neurons?

      Lines 85-8: "In addition, a previous fMRI study on GAERS, which measured changes in cerebral blood volume, found both deactivated and activated brain areas during seizures (David et al. 2008)." Which areas and conditions led to reduced activity? Increased activity? How was it surmised?

      "concurrent stimuli and therefore could contribute to the alterations in behavioral responsiveness" - This idea has been raised before by others (Logthetis, Barth). Please discuss these as the background for this study.

      The particular section was modified to the following:

      “Previous results on GAERS have indicated that, during an absence seizure, hyperactive electrophysiological activity in the somatosensory cortex can contribute to bilateral and regular SWD firing patterns in most parts of the cortex. These patterns propagate to different cortical areas (retrosplenial, visual, motor and secondary sensory), basal ganglia, cerebellum, substantia nigra and thalamus (David et al. 2008; Polack et al. 2007). Although SWDs are initially triggered by hyperactive somatosensory cortical neurons, neuronal firing rates, especially in majority of frontoparietal cortical and thalamocortical relay neurons, are decreased rather than increased during SWD, resulting in an overall decrease in activity in these neuronal populations (McCafferty et al. 2023). Previous fMRI studies have demonstrated blood volume or BOLD signal decreases in several cortical regions including parietal and occipital cortex, but also, quite surprisingly, increases in subcortical regions such as thalamus, medulla and pons (David et al., 2008; McCafferty et al., 2023). In line with these findings, graph-based analyses have shown an increased segregation of cortical networks from the rest of the brain (Wachsmuth et al. 2021). Altogether, alterations in these focal networks in the animal models of epilepsy impairs cognitive capabilities needed to process specific concurrent stimuli during SWD and therefore could contribute to the lack of behavioral responsiveness (Chipaux et al. 2013; Luo et al. 2011; Meeren et al. 2002; Studer et al. 2019), although partial voluntary control in certain stimulation schemes can be still present (Taylor et al., 2017).”

      Please discuss the mean-field model more. What are its assumptions? What is its validation? Do other models also provide the same result?

      We have now extended the discussion and explanation of the mean-field model, both in the main text and in the Supplementary information. The mean-field model is a statistical tool to estimate the mean activity of large neuronal populations, and as such its main assumptions concern the size of the population analyzed and the characteristic times of the neuronal dynamics under study. It has been shown that the formalism is valid for characteristic times of neuronal dynamics with a lower bound on the order of a few milliseconds and for population sizes on the order of thousands of neurons (see El Boustani and Destexhe, Neural computation 2009; and Di Volo et al, Neural computation 2019), with both conditions satisfied in the simulations made for this work. Regarding the validation, the model has been extensively validated and used for simulating different brain states (Di Volo et al. 2009; Goldman et al. 2023), signal propagation in cortical circuits (Zerlaut et al, 2018) and to perform whole-brain simulations (Goldman et al, 2023). The standard validation of the mean-field consists of comparing it with the activity obtained from the corresponding spiking neural network. For completeness, we show in Author response image 1 an example of the SWD type of dynamics obtained from a spiking neural network together with the one obtained from the mean-field. This figure has now been added to the Supplementary information of the paper. Regarding the extension of the results to other models, we think that the generality of our results is an interesting point of our work. The main results obtained from our simulation are related to the responsiveness of the system during two different types of ongoing activity: in the interictal state, the stimulation evokes a significant variation in the ongoing activity that is propagated to other regions, while in the SWD state the evoked activity is overshadowed by the ongoing activity, which imposes a strong limit on the responsiveness of the system and the propagation of the signal. In this sense, the results of the simulations are very general and should be extensible to other models. Of course, the advantage of using a model like ours is its capability of reproducing the different states, its applicability to large-scale simulations, and the fact that it is built from biologically relevant single-cell models (AdEx).

      Author response image 1.

      Comparison of the SWD dynamics in the mean-field model and the underlying spiking neural network of AdEx neurons. A) Raster plot (top) and mean firing rate (bottom) from an SWD type of dynamics obtained from the spiking-network simulations. The network is made of 8000 excitatory neurons and 2000 inhibitory neurons. Neurons in the network are randomly connected with probability p=0.05 for inhibitory-inhibitory and excitatory-inhibitory connections, and p=0.06 for excitatory-excitatory connections. Cellular parameters correspond to the ones used in the mean-field, with spike-triggered adaptation for excitatory neurons set to b=200 pA. We show the results for excitatory (green) and inhibitory (red) neurons. B) Mean firing rate obtained from a single mean-field model. We see that, although the amplitude of oscillations is larger in the spiking network, the mean-field can correctly capture the general dynamics and frequency of the oscillations.

      Line 11: "rats were equally divided by gender." Given n=11, does that mean 5 males and 6 females or the opposite?

      Out of 11 animals, 6 were males, and 5 females. This is now mentioned in the manuscript.

      What was the type of food?

      Type of food was added to the manuscript (Extrudat, vitamin-fortified, irradiated > 25 kGy)

      What were the electrodes?

      This was provided in the manuscript. Carbon fiber filament was produced by World Precision Instruments. The tips of this filament were spread to brush-like shape to increase the contact surface above the skull.

      "low noise zero echo time (ZTE) MRI sequence"- please explain for the non-specialist or provide references.

      Reference added.

      Lines 148-150: "The length of habituation period was selected based on pilot experiments and was sufficient for rats to be in low-stress state and produce absence seizures inside the magnet." How do the authors know the rats were in a low-stress state?

      This claim was based on two factors. First, by the end of the habituation protocol, the motion of the animals had decreased considerably, in line with a previous study using a similar restraint/habituation protocol (DOI: 10.3389/fnins.2018.00548). In that study, the decreased motion was also correlated with decreased blood corticosterone levels, which returned to baseline (indicating a low-stress state) after 4 days of habituation. Second, when epileptic rodents are continuously recorded for 24 h, most SWDs occur during a state of passive wakefulness or drowsiness (Lannes et al. 1988, Coenen et al. 1991). Either way, as we cannot provide direct evidence of a low-stress state, we modified the sentence to the following:

      “The length of habituation period was selected based on pilot experiments to provide low-motion data therefore giving rats a better chance to be in a low-stress state and thus produce absence seizures inside the magnet.”

      Lines 150-2: "Respiration rate and motion were monitored during habituation sessions using a pressure pillow and video camera to estimate stress level." What were the criteria for a high stress level?

      Criteria for high (or low) stress levels were based mostly on motion levels, according to a previous study (DOI: 10.1016/s0149-7634(05)80005-3). Still, as we did not measure stress directly, we modified the sentence to the following:

      “Pressure pillow and video camera were used to estimate physiological state, via breathing rate, and motion level, respectively.”

      Lines 152-3: "During the last habituation session, EEG was measured to confirm that the rats produced a sufficient amount of absence seizures (10 or more per session)." If 10 min, the rats would basically be seizing the entire session, leading to doubt about what the interictal state was.

      The last habituation session lasted 60 min and the fMRI scan 45 min. Given that rats produced ~40-50 seizures during the fMRI scan, they produced on average ~1 seizure/min, with each seizure lasting 5-6 s on average, giving ~45 s periods for interictal states. A threshold of 10 or more seizures was chosen, based on pilot experiments, to give statistically meaningful findings.

      Line 153: "Total of 2-5 fMRI experiments were conducted per rat within a 1-3-week period." What was the schedule for each animal? A table would be useful. If it varied, how do the authors know this was justified?

      Please see Figure 1–figure supplement 2 for examples of habituation timelines for individual rats.

      We found an error in our statement of 2-5 fMRI experiments; it should be 3-5 fMRI experiments. This was corrected. Our aim was to acquire 12-14 sessions per stimulation condition, and once a sufficient number of sessions had been acquired, some of the animals were not used further. Two animals that had good-quality EEG and produced sufficient numbers of SWDs were kept and briefly retrained for the later second stimulation condition experiments. This was done to replace animals that needed to be excluded from the second stimulation condition due to bad-quality EEG or a lost implant. Extended use of some animals could theoretically bring slight variation to the results, but could actually be an advantage, as these animals were already well trained and provided low-motion data.

      "Before and after each habituation session, rats were given a treat of sugar water and/or chocolate cereals as positive reinforcement. " How much and what was the concentration of sugar water; chocolate cereal?

      Rats were given 3 chocolate cereals and/or 1% sugar water. This was added to the manuscript now.

      Line 188: "We relied on pilot calibration of the heated water to maintain the body temperature" Please explain.

      Sentence was clarified:

      “We relied on pilot calibration of the temperature of heated water circulating inside animal bed to maintain the normal body temperature of ~37 °C”

      Line 190: "After manual tuning and matching of the transmit-receive coil, shimming and anatomical imaging" Please explain for the non-specialist.

      Sentence was simplified:

      “After routine preparation steps in the MRI console were done”

      Lines 199-201: "Anatomical imaging was conducted with a T1-FLASH sequence (TR: 530 ms, TE: 4 ms, flip angle 18°, bandwidth 39,682 kHz, matrix size 128 x 128, 51 slices, field-of-view 32 x 32 mm², resolution 0.25 x 0.25 x 0.5 mm³). fMRI was performed with a 3D ZTE sequence (TR: 0.971 ms, TE: 0 ms, flip angle 4°, pulse length 1 µs, bandwidth 150 kHz, oversampling 4, matrix size 60 x 60 x 60, field-of-view 30 x 30 x 60 mm³, resolution of 0.5 x 0.5 x 1 mm³, polar under sampling factor 5.64 nr. of projections 2060 resulting to a volume acquisition time of about 2 s). A total of 1350 volumes (45 min) were acquired." Please explain for the non-specialist.

      These technical parameters are provided for the sake of repeatability. Section was however clarified as the following and citation was added:

      “Anatomical imaging was conducted with a T1-FLASH sequence (repetition time: 530 ms, echo time: 4 ms, flip angle 18°, bandwidth 39,682 kHz, matrix size 128 x 128, 51 slices, field-of-view 32 x 32 mm², spatial resolution 0.25 x 0.25 x 0.5 mm³). fMRI was performed with a 3D ZTE sequence (repetition time: 0.971 ms, echo time: 0 ms, flip angle 4°, pulse length 1 µs, bandwidth 150 kHz, oversampling 4, matrix size 60 x 60 x 60, field-of-view 30 x 30 x 60 mm³, spatial resolution of 0.5 x 0.5 x 1 mm³, polar undersampling factor 5.64, number of projections 2060, resulting in a volume acquisition time of about 2 s (see Wiesinger & Ho, 2022 for parameter explanations)). A total of 1350 volumes (45 min) were acquired.”
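As an aid for the non-specialist, the quoted timing and resolution figures can be cross-checked with simple arithmetic. The sketch below assumes that each projection takes one repetition time, which is our reading of the parameters and not a statement taken from the manuscript; all variable names are illustrative.

```python
# ZTE fMRI parameters as quoted above
tr_s = 0.971e-3          # repetition time per projection (s)
n_projections = 2060     # projections per volume
n_volumes = 1350         # volumes per scan

# Assumption: one projection is acquired per repetition time
volume_time_s = tr_s * n_projections
scan_time_min = n_volumes * volume_time_s / 60

# Voxel size = field-of-view / matrix size, per axis (mm)
zte_voxel_mm = tuple(fov / n for fov, n in zip((30, 30, 60), (60, 60, 60)))
flash_inplane_mm = 32 / 128

print(f"volume time ~ {volume_time_s:.2f} s, scan ~ {scan_time_min:.1f} min")
print(f"ZTE voxel: {zte_voxel_mm} mm, T1-FLASH in-plane: {flash_inplane_mm} mm")
```

Under this assumption the numbers agree with the stated volume acquisition time of about 2 s, the 45 min scan duration, and the quoted spatial resolutions.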

      "Visual (n=14 sessions, 5 rats) and somatosensory whisker (n=14 sessions, 4 rats)" - Please explain how multiple sessions were averaged for a single rat. Please justify the use of different numbers of sessions per rat.

      All sessions belonging to the same stimulus scheme (multiple sessions per rat) were entered together as sessions in the SPM analysis, along with all the stimulus conditions belonging to these sessions. Justifications for using a different number of sessions per rat were given above.

      Lines 205-206: "For the visual stimulation, light pulses (3 Hz, 6 s total length, pulse length 166 ms) were produced by a blue LED, and light was guided through two optical fibers to the front of the rat's eyes." What wavelength of blue? Why blue? Is the stimulation strong? Weak?

      The wavelength was 470 nm and the brightness 7065 mcd at a current of 20 mA. Blue was selected because it lies in the wavelength range that rats can discriminate, and this color has been used in previous literature (https://doi.org/10.1016/j.neuroimage.2020.117542, https://doi.org/10.1016/j.jneumeth.2021.109287).
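To make the visual stimulation timing concrete, the sketch below lists the on/off times implied by the quoted parameters (3 Hz, 6 s total length, 166 ms pulses). The helper `pulse_train` is hypothetical and only illustrates the arithmetic; it is not the stimulation software used in the study.

```python
def pulse_train(freq_hz, train_s, pulse_s):
    """Return (onset, offset) times in seconds for a regular pulse train."""
    n_pulses = round(freq_hz * train_s)
    period_s = 1.0 / freq_hz
    return [(i * period_s, i * period_s + pulse_s) for i in range(n_pulses)]

pulses = pulse_train(freq_hz=3, train_s=6, pulse_s=0.166)
duty_cycle = 0.166 * 3   # fraction of each period during which the LED is on

print(f"{len(pulses)} pulses, duty cycle ~ {duty_cycle:.2f}")
print("first pulse:", pulses[0], "last pulse:", pulses[-1])
```

With these values each 6 s block contains 18 light pulses, and the LED is on for roughly half of each 333 ms period.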

      Line 212: "Stimulation parameters were based on previous rat stimulation fMRI studies to produce robust responses" What is a robust response? One where a lot of visual cortical voxels are activated?

      Sentence was corrected as the following:

      “Stimulation parameters were based on previous rat stimulation fMRI studies and chosen to activate voxels widely in visual and somatosensory pathways, correspondingly.”

      Line 245: "Seizures were confirmed as SWDs if they had a typical regular pattern, had at least double the amplitude compared to baseline signal..." What was the "typical" pattern? What baseline signal was it compared to? Was the baseline measured as an amplitude? Peak to trough?

      Sentence was corrected to the following:

      “Seizures were confirmed as SWDs if they had a typical regular spike and wave pattern with 7-12 Hz frequency range and had at least double the amplitude compared to baseline signal. All other signals were classified as baseline i.e. signal absent of a distinctive 7-12 Hz frequency power but spread within frequencies from 1 to 90 Hz.”
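The two criteria in the corrected sentence (dominant 7-12 Hz power and at least double the baseline amplitude) could be expressed in code roughly as follows. This is only an illustrative sketch: the use of peak-to-peak amplitude, the 0.5 band-power threshold, and the function names are assumptions made here, not details taken from the manuscript, where seizures were confirmed from the EEG trace and its power spectrum.

```python
import cmath
import math

def band_fraction(signal, fs_hz, lo_hz, hi_hz, max_hz=90.0):
    """Fraction of 1-max_hz spectral power lying within [lo_hz, hi_hz] (naive DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    in_band = total = 0.0
    for k in range(1, n // 2):
        freq = k * fs_hz / n
        if freq < 1.0 or freq > max_hz:
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        total += power
        if lo_hz <= freq <= hi_hz:
            in_band += power
    return in_band / total if total else 0.0

def looks_like_swd(segment, baseline, fs_hz, min_band_fraction=0.5):
    """Assumed rule: peak-to-peak amplitude >= 2x baseline and dominant 7-12 Hz power."""
    amplitude = max(segment) - min(segment)
    baseline_amplitude = max(baseline) - min(baseline)
    return (amplitude >= 2 * baseline_amplitude
            and band_fraction(segment, fs_hz, 7, 12) >= min_band_fraction)

# Synthetic 1 s traces sampled at 200 Hz (hypothetical data)
fs = 200
times = [i / fs for i in range(fs)]
swd_like = [3.0 * math.sin(2 * math.pi * 9 * s) for s in times]
baseline = [0.4 * (math.sin(2 * math.pi * 3 * s) + math.sin(2 * math.pi * 25 * s)
                   + math.sin(2 * math.pi * 60 * s)) for s in times]
print(looks_like_swd(swd_like, baseline, fs), looks_like_swd(baseline, baseline, fs))
```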

      "using rigid, affine, and SYN registrations" Please explain for the non-specialist.

      Corrected as the following:

      “using rigid, affine (linear) and SYN (non-linear) registrations”

      Line 274-5: "However, there were also intermediate cases where the seizure started or ended during the stimulation block (Figure 1 - Figure Supplement 1). These intermediate cases were modeled as confounds" Why confounds? They could be very interesting because the stimulation may not be affected if timed at the end of the seizure. What was the definition of start and end? Defining the onset and end of seizures is tricky.

      We agree that these cases are also highly interesting. Indeed, all the intermediate cases were also analyzed separately but were not included in the manuscript (other than the case when stimulation immediately ended a seizure), as no statistically significant findings emerged when comparing these cases to the baseline. For example, when stimulation was applied towards the end of a seizure, it produced weakened responses that were nevertheless stronger than when stimulation was applied fully during a seizure (indicating some responsiveness after the cessation of the seizure). As these intermediate cases led to results with higher variance, we considered them as confounds in the general linear model (i.e. to remove unwanted variance from the results of interest).

      Definition of onset and end of seizure can be difficult in some cases. When looking at the signal itself, especially towards the end of seizure the amplitude of SWDs can get weaker and thus the shift from seizure to baseline signal can be more problematic to differentiate. However, when looking at the power spectrum the boundaries were more easily detectable. Thus, in the definitions of onsets and ends of seizure we relied on both the signal and power spectrum (stated in the manuscript).

      "in the SPM analysis" Please explain for the non-specialist.

      Definition of SPM together with a link to software site was added.

      Line 276: "of fMRI data (see 2.5.3.) and thus explained variance that was not accounted for by the main effects of interest. " Please clarify.

      Clarified as:

      “Intermediate cases, where the seizure started or ended during the stimulation block (Figure 1–figure supplement 1), were considered as confounds of no-interest in the SPM analysis of fMRI data and the explained variance caused by the confounds were reduced from the main effects of interests”

      Line 277: "Additionally, a contrast..." What is meant?

      This chapter in 2.5.3. was modified as a whole to be more clear.

      Line 278-9: "...was given to two cases: i) when stimulation ended a seizure (0-2 s between stimulation start and seizure end)..." Again, how is the seizure onset and end defined?

      See the comment above.

      Lines 281-2: "Stimulations that did not fully coincide with a seizure were considered as nuisance regressors in the second level analysis." What is meant by nuisance regressor?

      Reference to SPM 12 manual was given for technical terms referring to analysis software.

      Lines 283-8: "Motion periods were also included as multiple regressors (not convolved with a basis function) to be used as nuisance regressors. Stimulations that coincided with a motion above 0.3% of the voxel size were not considered stimulation inputs. Stimulation and seizure inputs were convolved with "3 gamma distribution basis functions" (i.e. 3rd order gamma) in SPM (option: basis functions, gamma functions, order: 3), to account for temporal and dispersion variations in the hemodynamic response. The choice of 3rd order gamma was based on the expectation that time-to peak and shape of HRFs of seizure could vary across voxels (David et al. 2008)." Please explain the technical terms.

      Reference for SPM 12 manual was given for technical terms referring to analysis software, and HRF was defined.

      "BAMS rat connectome" - Please explain the technical terms.

      Modified as:

      “…connection matrix of the rat nervous system (BAMS rat connectome, Bota, Dong, and Swanson 2012).”

      Results

      After removing problematic animals and sessions, was there sufficient power? There probably wasn't enough to determine sex differences.

      After removing problematic sessions, we found statistically significant (multiple-comparison-corrected) results in both activation maps and hemodynamic responses. There were not enough animals to detect statistically significant sex differences (p>0.05).

      Figure 2 - I don't understand "tSNR" here. What is the point here?

      B vs C. Are these different brain areas or the same but SNR was adjusted?

      D. Where is FD explained? I think explaining what the parts of the figure show would be helpful.

      tSNR, the temporal signal-to-noise ratio, describes the behavior of noise over time. Readers who plan to replicate the awake fMRI protocol with the single loop coil may be interested in data quality and in the coil's ability to separate signal from noise, as this is one of the most important factors in fMRI designs where small signal changes have to be distinguished from background noise.
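For illustration, tSNR is commonly computed per voxel as the temporal mean of the signal divided by its temporal standard deviation. The sketch below follows that common definition; the use of the sample standard deviation and the example values are assumptions made here, and the manuscript's exact computation may differ.

```python
from statistics import fmean, stdev

def tsnr(voxel_timeseries):
    """Temporal SNR of one voxel: mean over time / standard deviation over time."""
    return fmean(voxel_timeseries) / stdev(voxel_timeseries)

# Hypothetical signal intensities of one voxel across six volumes
example = [100, 102, 98, 100, 101, 99]
print(f"tSNR = {tsnr(example):.1f}")
```

A higher tSNR means that a small stimulus-evoked signal change is easier to distinguish from scan-to-scan fluctuations.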

      B and C illustrate the same brain area, but B was acquired with high resolution anatomical scanning (T1 FLASH), and C was acquired with low resolution ZTE scanning. We clarified the figure legend to the following:

      “…spatial signal-to-noise ratios of an illustrative high resolution anatomical T1-FLASH (B), and low resolution ZTE image (C)”

      FD was explained in section 2.5.1. Some parts of the explanation were clarified: “Framewise displacement (FD) (Figure 2E) was calculated as follows. First, the differential of successive motion parameters (x, y, z translation, roll, pitch, yaw rotation) was calculated. Then absolute value was taken from each parameter and rotational parameters were divided by 5 mm (as estimate of the rat brain radius) to convert degrees to millimeters (Power et al. 2012). Lastly, all the parameters were summed together.”
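The steps described in the FD definition can be sketched as below. One assumption is made explicit: rotations are converted to millimeters as arc length on a sphere of 5 mm radius (radius times angle in radians), following the convention of Power et al. 2012; the exact scaling used in the manuscript may differ, and the function name and example values are hypothetical.

```python
import math

def framewise_displacement(motion, radius_mm=5.0):
    """FD per volume from (x, y, z, roll, pitch, yaw) rows.

    Translations are in mm and rotations in degrees. Rotations are converted
    to mm as arc length on a sphere of radius_mm (assumed convention).
    """
    fd = [0.0]  # no displacement is defined for the first volume
    for previous, current in zip(motion, motion[1:]):
        # Absolute difference between successive motion parameters
        diffs = [abs(c - p) for p, c in zip(previous, current)]
        translation_mm = sum(diffs[:3])
        rotation_mm = sum(radius_mm * math.radians(angle) for angle in diffs[3:])
        fd.append(translation_mm + rotation_mm)
    return fd

# Hypothetical motion parameters for three volumes
motion = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.0, 0.0, 0.0),   # 0.1 mm translation in x
    (0.1, 0.0, 0.0, 1.0, 0.0, 0.0),   # 1 degree roll
]
fd = framewise_displacement(motion)
print([round(value, 4) for value in fd])
```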

      Table 1 has no statistical comparisons.

      Table 1 is purely an illustration of stimulation and seizure occurrence. There is no specific interest to compare stimulation types (in what state of seizure it occurred) as it does not provide any meaningful inferences to the study.

      Statistical activation maps - it is not clear how this was done.

      The creation of statistical maps is explained in section 2.5.3.

      Line 384-5: "In addition, some responses were observed in the somatosensory cortex during a seizure state, probably due to incomplete nuisance removal of the effect of the seizure itself by the linear model used." I don't see why the authors would not suggest that the result is logical given that stimuli should activate the somatosensory cortex.

      Sentence was modified as the following:

      “In addition, responses were observed in the somatosensory cortex during a seizure state”

      Fig 3 "F-contrast maps." Please explain.

      The creation of statistical maps is explained in section 2.5.3.

      HRF- please define. The ROI selection is unclear - it "was based on statistical differences seen in activation maps." But how were ROIs drawn? Also, why were HRFs examined at the end of seizures?

      HRF was defined, and definitions of HRF and ROI were moved from results section 3.3. to method section 2.5.3.

      Definition of ROI was clarified:

      “Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps.”

      HRFs were additionally estimated at the end of seizures because the shift of brain state from ictal to interictal was of specific interest. This shift also yielded statistically significant findings, in the sense that brain responses differed from those to ictal stimulation.

      Line 421: "Interestingly, the response amplitude was higher when the stimulation ended a seizure compared to when it did not" Why is this interesting?

      The word “interestingly” was changed to “additionally” to avoid any inferences in the results section.

      Line 427: "Notably, HRFs amplitudes were both negatively and positively signed during the ictal state, depending on the brain region." Why is this notable?

      The word “notably” was removed to avoid any inferences in the results section.

      Please explain the legends of Figures 4 and 6 more clearly.

      The legends of Figure 4 and Figure 4 – figure supplement 1 were clarified:

      “HRFs was calculated in selected ROI, belonging to visual or somatosensory area, by multiplying gamma basis functions (Figure 1–figure supplement 1, B) with their corresponding average beta values over a ROI and taking a sum of these values.”
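
      The reconstruction described in this legend amounts to a weighted sum of basis functions. The following is a minimal sketch under stated assumptions: the function names, the unit-scale gamma density, and the shape parameters are hypothetical placeholders and are not taken from the manuscript.

```python
import math


def gamma_basis(times, shape):
    """Gamma density with unit scale, evaluated at each time point.

    The shape parameter is a hypothetical choice for illustration only.
    """
    return [t ** (shape - 1) * math.exp(-t) / math.gamma(shape) for t in times]


def reconstruct_hrf(basis_functions, mean_betas):
    """Sum the basis functions, each weighted by its ROI-averaged beta value."""
    if len(basis_functions) != len(mean_betas):
        raise ValueError("need exactly one beta value per basis function")
    n_points = len(basis_functions[0])
    return [
        sum(beta * basis[i] for beta, basis in zip(mean_betas, basis_functions))
        for i in range(n_points)
    ]


# Usage with made-up numbers: three gamma basis functions and three mean betas
times = [0.5 * i for i in range(40)]
bases = [gamma_basis(times, shape) for shape in (2.0, 4.0, 8.0)]
hrf = reconstruct_hrf(bases, [0.8, -0.3, 0.1])
```

      Because the beta values are averaged over the ROI before the sum is taken, a single HRF estimate is obtained per ROI and condition.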

      Using the comments above as a guide, please revise the Discussion to be more precise and more clear about what was shown and what can be concluded in light of limitations. Please ensure the literature is cited where appropriate.

      Some parts of the discussion and conclusion sections were modified.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      Formatting: fMRI maps in Figures 3 and 5 should be more clearly labeled, indicating anterior and posterior directions on all images, and the cross sections should be enlarged to enable anatomical areas to be more clearly differentiated.

      Anterior and posterior directions were added, and cross sections were enlarged.

      The Methods section 2.41 and other places in the text, and Figure 2 - Figure Supplement 1 say that there was less artifact on the EEG with ZTE than with GE-EPI. However the EEG shown in Figure 2 - Figure Supplement 1 Part C shows much more artifact in the left (ZTE) trace than the right (GE-EPI) trace. This apparent contradiction should be resolved.

      The figure was intended to demonstrate the relative change in the signal when the MRI sequences were on, and by this standard, ZTE produced smaller changes in both amplitude and frequency than EPI. In the example figure, the baseline fluctuations in the EEG trace on the left were higher in amplitude than those on the right, which could give the misleading impression that ZTE produces more noise. The figure legend was clarified to highlight the relative change:

      “ZTE also caused relatively less artificial noise on EEG signal, keeping both amplitude of the signal and frequencies relatively more intact, which improved live detection of absence seizures.”

      Figure 2 - Supplement 1, part B horizontal axis should provide units.

      Units were added.

      Figure 2 - Supplement 1, legend last sentence says arrows mark the beginning of each "sequence." Is this a typo and should this instead say "each seizure"?

      It should state “each fMRI sequence”; this was corrected.

      Line 307, Methods "to reveal brain areas where ictal stimulation provided higher amplitude response than interictal" - should this be reversed, ie weren't the authors analyzing a contrast to determine where interictal signals were higher than ictal signals?

      This should indeed be reversed and has been corrected. Thank you for noting this.

      Figure 6 - Figure Supplement 1, the scales are very different for many of the plots so they are hard to compare. Especially in the ictal periods (D, E, F) it is hard to see if any changes are happening during ictal stimulation similar to interictal stimulation due to very different scales. The activity related to SWD is so large that it overshadows the rest and perhaps should be subtracted out.

      We point out that Figure 6 - Figure Supplement 1 reproduces, in greater detail, the results shown in Figure 6 of the main text, where all signals are plotted on the same scale. The difference between the scales used in this figure is intentional: its purpose is to highlight the large differences in ongoing activity and in the evoked response between the two states (ictal and interictal). In interictal periods, the ongoing activity is characterized by fluctuations around a baseline level whose variance is strongly affected by the application of the stimulus. In contrast, ictal periods are characterized by large oscillations, with periods of high, synchronized activity followed by periods of nearly no activity. Here, as the referee mentions, the effect of the stimulus is overshadowed by the ongoing dynamics (from both local and afferent nodes), which imposes a strong limit on the responsiveness of the system and the propagation of the signal.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript entitled 'Safb1 regulates cell fate determination in adult neural stem cells by enhancing Drosha cleavage of NFIB mRNA' by Iffländer et al, represents a solid piece of work addressing a non-canonical function of Drosha on NFIB mRNA processing via a newly identified Drosha partner, Safb1. The authors provide particularly systematic and convincing evidence on the biochemical interactions among the key players in this cascade. However, the significance of these interactions for NSC fate determination is not adequately supported by the data, hence, I have some remarks that would need to be addressed in order to clarify the impact of these events on NSC biology.

      1) One of my main concerns is related to the nature of the DG NSCs used in all in vitro assays. The authors refer to their previous work on how these cells are isolated using a Hes5 mouse reporter line. However, both recent scRNAseq data (http://linnarssonlab.org/dentate/ from Hochgerner et al) and the authors' own immunostainings (Fig. 7A), clearly show that Hes5 does not label only adult NSCs in the DG, but also (if not primarily) astrocytes. Considering that the initial cultures could contain a high proportion of mature astrocytes, most of the major conclusions and hypotheses should be reformulated.

      We thank the reviewer for their comment. We think that there is a misunderstanding about how the DG neural stem cells were isolated and cultured. In this manuscript we did not use the Hes5::GFP allele to isolate the stem cells. We isolated DG neural stem cells from C57Bl6 mice according to the protocol of Babu et al. (Babu et al. 2007 doi: 10.1371/journal.pone.0000388) and maintained and differentiated these according to our previous manuscripts (Ronaldo et al. 2016). This was not clear in the methods section of the original manuscript and, therefore, we have added the reference Babu et al. In order to address potential contamination with astrocytes, we have added images of the stem cells and their progeny immunostained with astrocytic markers (GFAP and S100b) in undifferentiated and differentiated states. These new data show that these neurogenic cells and their progeny do not express astrocytic markers until differentiation is induced.

      2) Along these lines, Safb1 expression is quite widespread in the mouse DG (Fig. 7A) and does not display any specificity towards any type of progenitor cells compared to its expression in DGCs within the GCL. The authors should discuss this and integrate this expression information into their conclusions and interpretations, highlighting all pertinent limitations.

      We appreciate and agree with the reviewer’s comment. SAFB1 is indeed broadly expressed by most if not all cells in the hippocampus. We quantified levels of SAFB1 expression across progenitors, astrocytes and neurons in the adult DG and in the SVZ, and show that SAFB1 levels differ across different neural stem cell populations and neural cells. We believe that our data show, both in vitro and in vivo, that the levels of SAFB1 are critical for determining the function of SAFB1 in regulating neural stem cell fate. We also showed that elevating SAFB1 levels in SVZ-derived neural stem cells suppresses their differentiation into oligodendrocytes. We have made this clearer in the text. However, how cells sense the levels of SAFB1 remains to be shown, and it is difficult to speculate on the mechanism.

    1. Author Response

      Reviewer #1 (Public Review):

      In this analysis derived from the BLADE study, a Phase IV investigation using the LHRH antagonist Degarelix, the authors revealed additional insights into the relationship between FSH and body composition.

      The primary strength of the study lies in its prospective nature and the utilization of human subjects.

      We thank the reviewer for the positive evaluation.

      However, some weaknesses exist in the study.

      First, the authors presented results from a simple correlation study without accounting for potential confounding factors in fat metabolism. Particularly, readers may be intrigued to understand how testosterone or estradiol interact with FSH in relation to fat mass.

      Unfortunately, the protocol did not include measurement of circulating testosterone and estradiol levels. The evaluation of testosterone, in particular, would have required mass spectrometry, as testosterone values during therapy with degarelix fall below the sensitivity of the methods used in clinical practice. Therefore, a correlation/association analysis between testosterone and body composition would not have been reliable and would not have been useful for the study. All patients were considered to have hypogonadism on the basis of the significant decrease in PSA values and the limited testosterone data available.

      The inverse relationship between ALMI/FBM was previously documented in a paper by the same group (Palumbo et al, Prostate Cancer Prostatic Dis 2021). In that earlier publication, the authors reported no correlation between FSH and lean mass or ALMI, suggesting that the significance of the correlation between FSH and ALMI/FBM arises from changes in fat body mass, a factor somehow not included in the prior paper, and not necessarily from sarcopenia.

      The referee is correct, as there is no correlation between lean mass and FSH, nor between lean mass variations and FSH variations. The correlation between ALMI/FBM and FSH is mostly due to the effect on fat mass. The text now includes a statement that emphasizes this concept (see Discussion page 8, lines 19-22).

      Reviewer #2 (Public Review):

      This manuscript reports the results of an ancillary study of a prospective trial assessing the effects of androgen deprivation therapy (ADT) with Degarelix (a GnRH antagonist) on body composition in patients with prostate cancer. An interesting relationship between FSH levels, that were suppressed by Degarelix treatment, and body composition parameters (particularly fat body mass) was described after 12 months of therapy. Therefore, the authors conclude that FSH could be a promising marker to monitor the risk of sarcopenic obesity and cardiovascular complications in prostate cancer patients undergoing ADT. As acknowledged by the Authors, the main limitation of the study is the limited sample of patients. However, since testosterone levels were not assessed it is not possible to firmly establish whether the changes in fat mass observed with treatment are directly or indirectly associated with a reduction in FSH (and therefore in the latter case mediated by testosterone). Moreover, it is not clear whether the effect of the change in FSH levels during the study and the body composition parameters achieved at 12 months was evaluated (instead of assessing the relationship between FSH changes and changes in body composition parameters). Finally, tests on bone muscle mass and strength were not performed, so the hypothesis that variation of FSH levels in prostate cancer patients in ADT may affect sarcopenia remains speculative.

      We appreciate the reviewer's positive assessment of our manuscript. As requested by the reviewer, we evaluated the correlation between FSH changes and body composition values after 12 months of Degarelix. No significant correlation was observed (see the attached table). Therefore, we have decided not to include this analysis in the revised paper.

    1. Author Response

      Reviewer #1 (Public Review):

      Using a HFD mouse model, the authors examined the H3K4me3 mark in sperm and placental tissues followed by correlation to the transcriptomic changes in the placental tissues of the male and female offspring. The hypothesis that the authors tried to test was that sperm histone epimutations affect placental function, thereby leading to metabolic disorders in offspring. The strength of this work includes the interesting idea and the initial data generated. However, the entire study remains purely correlative without any validation experiment to support the correlation. The conclusion needs to be further supported by bigger sample size and more functional analyses demonstrating the causal relationship among the histone epimutations detected, the dysregulated mRNA expression in the placenta, and the phenotypes in offspring.

      Functional data: We acknowledge that we should have stated more clearly that we had indeed phenotyped the placentas and the metabolic health of offspring from the same model from which the placental tissue was derived, as reported in (Jazwiec et al., 2022) (PMID: 35377412). This was referenced in our submitted manuscript (Lines 105-107; 131-133; 135-139; 147-150; 232-235; 270-273; 297-300; 384-386; 433-435; 441-448; 507-514). We have made this more apparent in the manuscript by expanding our description of the offspring phenotypes in the introduction and by clarifying that the placentas used in this study were derived from this model (Jazwiec et al., 2022) (PMID: 35377412).

      Regarding effect and sample size: It appears that, on review, the animal numbers used for the ChIP-seq were confused with the number of replicates. These details were in Supplementary file 1a. There were 3 replicates per experimental group, and each replicate contained sperm from pooled samples that were equalized in cell number and comprised sperm from n=7 control males or n=16 HFD males. For the RNA-seq, n=4 placentas were used from each experimental group from both males and females, for a total N of 16. Although the sample size is moderate, we followed the Canadian Council on Animal Care guideline, which calls for the use of the lowest animal number that elicits significant effects (CCAC guidelines p6 “Consideration must also be given to reduction, to determine the fewest number of animals appropriate to provide valid information and statistical power, while still minimizing the welfare impact for each animal”).

      Validation: We used a high standard of computational validation and visualization strategies, to ensure confidence in genomic data. This also allowed for a comprehensive understanding of the biological and physiological impacts of paternal obesity on the sperm epigenome and placenta transcriptome. In our experimental design we also included biological and technical replicates. Together these methods provide robustness checks of the experimental data and support our conclusions. These are the validation strategies we used:

      Technical and experimental validation

      • We evaluated the quality of sequencing data using metrics of read quality, alignment and coverage. These are summarized in Supplementary file 1a.

      • Visualized and statistically analyzed the data to check for anomalies and discrepancies. Pearson correlation analysis, shown as heatmaps, was used to look for variance and patterns across samples, all of which were highly correlated (Figure 2 – Figure supplement 1 B and Figure 4 – Figure supplement 1 A). We checked for batch effects and normalized the data (Figure 4 – Figure supplement 1 B), and we used PCA plots as a second check for samples behaving oddly (Figure 2 – Figure supplement 1 C and Figure 4 – Figure supplement 1 C).

      • We used a deconvolution approach to improve the biological meaning of our bulk RNA-seq data (Figure 6, Figure 5 – Figure supplement 1 and 2).

      • Performed functional enrichment analysis to gain insight into biological functions, pathways, and genome ontology and visualized individual regions identified to be altered as a confirmation (Figure 2 D and 2 E; Figure 4 E and F; Figure 6, Figure 2 – Figure supplement 1 E; Figure 3 – Figure supplement 1).

      Comparison to external data sets:

      • We compared our data with external data sets from the same tissues and cell types, and with our prior studies: a) we compared ChIP-seq data from this obesity model with our former obesity ChIP-seq data (Figure 2 – Figure supplement 1); b) we re-analyzed and compared placenta RNA-seq data from an in utero hypoxia exposure model that shared offspring and placenta phenotypes similar to those we observed in the obesity model (Figure 6 and Figure 6 – Figure supplement 1).

      Statistical Significance and False Discovery Rate (FDR):

      • We applied statistical tests and multiple testing corrections to reduce the likelihood of false positives (See also response 1 for additional testing added to the revised manuscript)

      Causation versus correlation: We agree that the relationship between the sperm epigenome and placenta transcriptome is correlative; however, this is the current state of the field for studies of paternal epigenetic transmission of environmental information. Taking this study to the point where causation can be inferred would require the generation of a sperm epigenome-edited mouse model in which we target genes implicated in placental function. Indeed, this targeting approach is well underway in our research program.

      Reviewer #2 (Public Review):

      This study follows up on previous work from this group, and others, relating paternal diet to changes in sperm epigenetics, and offspring phenotypes. The authors focus on paternal diet (high-fat diet versus a control chow), sperm chromatin, and molecular changes in the placenta associated with offspring development.

      The text is well written and the figures are generally well presented and clear. The sperm epigenetic analyses and analysis of the placenta epigenetics and gene expression are generally well performed. The study provides new insight into how paternally mediated intergenerational epigenetic inheritance could involve placenta-embryo signaling.

      A major weakness is that the high-fat diet used was from a different manufacturer than the control (lower fat) diet. Therefore, it is difficult to judge whether the effects are due to a change in fat levels, or the many other molecules that are likely to differ in chow between different manufacturers. Other weaknesses include lack of methodological detail in parts, low n values for some experiments, and the need for more mechanistic data.

      Diets: It is worth noting that we are studying the effects of obesity and not diet. Indeed, the HFD induces metabolic dysfunction while the control diet does not. Although it is fair to point out that the composition of the control diet should be kept in mind, the diets elicited the desired phenotypic effects and thus served as a model for obesity within the scope of the study. We see this experimental design as a strength: in this study we compared this model to our previously published obesity model (Pepin, Lafleur, Lambrot, Dumeaux, & Kimmins, 2022) (PMID: 35183795), and there was significant overlap in the regions of differential enrichment detected in both models, even though they were conducted in different research settings, with different mouse substrains and different diet combinations. In our opinion, this demonstrates that we are measuring robust effects of paternal obesity that can be replicated under different conditions. This comparative study design has been lacking in the field of epigenetic inheritance.

      Animal numbers and replicates: It appears that, on review, the animal numbers used for the ChIP-seq were confused with the number of replicates. These details were in Supplementary file 1a. There were 3 replicates per experimental group, and each replicate contained sperm from pooled samples that were equalized in cell number and comprised sperm from n=7 control males or n=16 HFD males. For the RNA-seq, n=4 placentas were used from each experimental group from both males and females, for a total N of 16. Although the sample size is moderate, we followed the Canadian Council on Animal Care guideline, which calls for the use of the lowest animal number that elicits significant effects (CCAC guidelines p6 “Consideration must also be given to reduction, to determine the fewest number of animals appropriate to provide valid information and statistical power, while still minimizing the welfare impact for each animal”).

      Whilst the authors may have achieved their aims, more data is needed to inform a potential mechanism.

      It is difficult in studies of paternal epigenetic inheritance to attribute a mechanism, and we agree that the relationship between the obesity-altered sperm epigenome and the placental abnormalities is correlative. However, the novelty of our study is that we postulate a new mechanism for paternal transmission of metabolic disease that implicates the placenta, and we demonstrate this via an altered placenta transcriptome and the placenta developmental abnormalities described here and in our previous paper on this model ((Jazwiec et al., 2022); PMID: 35377412). The next step for the field in addressing causation and mechanism requires the generation of a sperm epigenome-edited mouse model in which histone methylation changes are induced at specific genes and tracked to the tissues of the next generation. Indeed, this targeting approach is underway in our research program.

      Reviewer #3 (Public Review):

      This study represents a useful addition to the authors' previous study examining the effects of paternal high-fat diet on offspring metabolism and gene expression in offspring (PMID: 35183795). It differs from the previous study in some of the details of the experimental model (age of sire when exposed to the diet manipulation, mouse substrain, and the nature of the control diet) and the results are largely in line with previous findings. The major finding is that many genes at which sperm H3K4me3 signal is altered also have altered expression in the placenta; some of these genes are paternally imprinted, providing a paternal-specific epigenetic signature. Strengths of the study include establishment of an important dataset correlating the sperm epigenome with gene expression in placental tissue, leading to an interesting and provocative conclusion. Weaknesses include a relatively superficial analysis of the dataset, revealing broad patterns but few specific conclusions, reliance on correlative analysis to draw conclusions, and absence of validation studies. Deconvolution analysis of bulk RNA-seq data helps to account for differences in cell composition between placental datasets, but does not add additional insight toward the central question of how sperm epigenetic state contributes to offspring gene expression. Overall the advance over previous work is relatively small.

      Specific points:

      1) The analysis as it stands is limited. To compare sperm H3K4me3 and placental expression, numbers of overlapping genes are provided, but no statistical analysis is done to indicate the significance of the overlap.

      We performed Fisher’s exact test on the overlap between paternal obesity-associated differentially enriched regions of H3K4me3 (deH3K4me3) and differentially expressed genes in female and male placentas (Figure 4 – Figure supplement 1 Di and ii).
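
      For readers unfamiliar with this kind of overlap test, the idea can be sketched as follows. This is an illustrative example only, not the authors' analysis: the function name and all gene counts are hypothetical, and the sketch computes the one-sided Fisher's exact test (enrichment of overlap) as a hypergeometric tail probability.

```python
from math import comb


def overlap_p_value(n_universe, n_set_a, n_set_b, n_overlap):
    """One-sided Fisher's exact test for the overlap of two gene sets.

    Returns the probability of observing at least n_overlap shared genes when
    n_set_b genes are drawn at random from a universe of n_universe genes,
    of which n_set_a belong to the first set (hypergeometric upper tail).
    """
    max_overlap = min(n_set_a, n_set_b)
    total = comb(n_universe, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20000 genes in total, 500 genes with altered H3K4me3,
# 400 differentially expressed genes, 40 genes shared between the two lists
p = overlap_p_value(20000, 500, 400, 40)
```

      A small p-value indicates that the two gene lists share more genes than expected by chance given the size of the gene universe, which is itself a choice that should be reported alongside the result.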

      2) There is little direct connection to biological systems or validation of differential enrichment/expression analysis. Gene ontology enrichments for genes differentially enriched for H3K4me3 in sperm or differentially expressed in placenta (broken up by sex) are performed, but the biological significance of these categories is not clear.

      We used a high standard of computational validation and visualization strategies, to ensure confidence in genomic data. This also allowed for a comprehensive understanding of the biological and physiological impacts of paternal obesity on the sperm epigenome and placenta transcriptome. In our experimental design we also included biological and technical replicates. Together these methods provide robustness checks of the experimental data and support our conclusions. The validation strategies we used are detailed in response 17.

      We revised the text to expand discussion on the observed enriched gene ontology terms, as well as the biological significance and functions of the genes we refer to in this section:

      Lines 222-227: “The placenta is a rich source of hormone production, is highly vascularized, and secretes neurotransmitters (Hemberger, Hanna, & Dean, 2020; Rosenfeld, 2021). Disruption in these functions is suggested in the significantly enriched pathways that included genes involved in the transport of cholesterol, angiogenesis, and neurogenesis (Figure 4 C-D, Supplementary file 1e-f). Other significantly enriched processes included genes implicated in nutrient and vitamin transport (Figure 4 C-D).”

      Lines 441-463:“Many of the DEGs in the paternal obese-sired placentas were involved in the regulation of the heart and brain. This is in line with paternal obesity associated to the developmental origins of neurological, cardiovascular, and metabolic disease in offspring (Andescavage & Limperopoulos, 2021; Binder, Beard, et al., 2015; Binder et al., 2012; Chambers et al., 2016; Cropley et al., 2016; de Castro Barbosa et al., 2016b; T. Fullston et al., 2012; Tod Fullston et al., 2013; Grandjean et al., 2015; Huypens et al., 2016; Jazwiec et al., 2022; Mitchell, Bakos, & Lane, 2011; Ng et al., 2010; Pepin et al., 2022; Perez-Garcia et al., 2018; Terashima et al., 2015; Thornburg et al., 2016; Thornburg & Marshall, 2015; Ueda et al., 2022; Wei et al., 2014). The brain-placenta and heart-placenta axes refer to their developmental linkage to the trophoblast which produces various hormones, neurotransmitters, and growth factors that are central to brain and heart development (Parrettini, Caroli, & Torlone, 2020; Rosenfeld, 2021). This is further illustrated in studies where placental pathology is linked to cardiovascular and heart abnormalities (Andescavage & Limperopoulos, 2021; Thornburg et al., 2016; Thornburg & Marshall, 2015). For example, in a study of the relationship between placental pathology and neurodevelopment of infants, possible hypoxic conditions were a significant predictor of lower Mullen Scales of Early Learning (Ueda et al., 2022). A connecting factor between the neural and cardiovascular phenotypes is the neural crest cells which make a critical contribution to the developing heart and brain (Hemberger et al., 2020; Perez-Garcia et al., 2018). Notably, neural crest cells are of ectodermal origin which arises from the TE (Prasad, Charney, & García-Castro, 2019), which is in turn governed by paternally-driven gene expression. 
It is worth considering the routes by which TE dysfunction may be implicated in the paternal origins of metabolic and cardiovascular disease. First, altered placenta gene expression beginning in the TE could influence the specification of neural crest cells which are a developmental adjacent cell lineage in the early embryo. TE signaling to neural crest cells could alter their downstream function. Second, altered trophoblast endocrine function will influence cardiac and neurodevelopment (Hemberger et al., 2020).”

      3) The overall effect size is small. In most cases the magnitude of differences is minor, and it is not clear which of these changes are significant over noise. For example, the y-axis for the metagene plots in Figure 2B does not start at zero, so the total range of the difference in H3K4me3 is small. In Figure 6C, DEGs detected in hypoxic placenta after deconvolution analysis do not look very different compared to control.

      Thank you for pointing out that the scales were different in Figure 2 Bi and ii. They have been revised to show the same Y-axis scale beginning at zero, allowing comparison of regions that gained and lost H3K4me3 and making the differences in H3K4me3 more readily visible. The heatmap shown in Figure 6 C visualizes the DEGs in hypoxic versus control placenta, where 1477 DEGs were identified in our re-analysis using a deconvolution approach applied to the bulk RNA-seq data set from Chu et al., 2019. We do not share the view that they are not well visualized in the heatmap.

      4) Deconvolution analysis was done on bulk RNA-seq data from placenta, and the numbers of DEGs identified with this analysis compared to the original analysis are shown, but is not clear how the deconvolution analysis changes the specific biological conclusions. In addition, the reference dataset for deconvolution is a published dataset generated in another lab, and it is unclear how comparable the reference sample is to the samples analyzed in this study, or how robust this analysis is when using a dataset generated under different conditions.

      The deconvolution analysis allows us to infer the cellular composition within a tissue. It suggests that there are changes in cell-type proportions that could alter placental function, and it improves the detection of differentially expressed genes (Aliee & Theis, 2021; Campbell et al., 2023; Kuhn, Thu, Waldvogel, Faull, & Luthi-Carter, 2011) (PMID: 34293324; 36914823; 21983921).

      The published dataset used as the reference for the deconvolution analysis was ideal: we specifically chose it because the tissue of origin matched the mouse strain and developmental time points of our samples and of those used in the Chu et al., 2019 analysis. We used the Chu et al., 2019 data set for comparative validation, and to further explore whether the biological effects of paternal obesity were like those of a hypoxic placenta. We have revised the text to show more clearly the biological relevance and interpretation of this analysis (see author response 12).

      We revised the text to clarify the biological implications of this analysis:

      Lines 282-290: “This reduction in the number of detected DEGs before versus after accounting for cellular composition suggests that changes in cell-type proportions at least partly drive tissue-level differential expression. This is consistent with the recent finding that preeclampsia-associated cellular heterogeneity in human placentas mediates previously detected bulk gene expression differences (Campbell et al., 2023). There were similarities between the bulk RNA-seq and deconvoluted analysis in that there was overlap of DEGs detected before and after adjusting for cell-type proportions (Figure 5 – Figure supplement 3 G and H, Fisher’s exact test P=1.8e-105 and P=0e+00, respectively). This differential gene expression analysis accounting for cellular composition provides insight into how paternal obesity may impact placental development and function and underscores the contribution of cellular heterogeneity in this process.”

      Reviewer #4 (Public Review):

      The members of the Kimmins lab perform a dietary study in mice to investigate the impact of obesity of fathers on the development of their offspring. To do so, they expose male mice to a high fat diet and determine the distribution and occupancy levels of the histone H3 lysine 4 trimethylation (H3K4me3) mark in spermatozoa and perform gene expression studies on placenta tissue obtained from mouse embryos during mid-gestation development. The authors report changes in H3K4me3 occupancy in sperm as well as in transcriptomes of placentas of male and female embryonic offspring. While the authors perform extensive computational analysis of the transcriptomic and chromatin immunoprecipitation data, the authors do not go much beyond making correlative statements at mainly the genome wide level between changes for H3K4me3 in sperm and transcriptional changes in placenta, the latter of which are in part related to changes in cellular composition (as deduced from transcriptional data). Given that both parental mice had the same genetic background, it was not possible to deduce parental specific contributions to transcriptional changes as observed in placentas of offspring. In all, the study falls short in increasing mechanistic insights into this important biological phenomenon.

      It is difficult in studies on paternal epigenetic inheritance to attribute a mechanism, and we agree that the relationship between the obesity-altered sperm epigenome and the placental abnormalities is correlative. However, the novelty of our study is that we postulate a new mechanism for paternal transmission of metabolic disease that implicates the placenta, and we demonstrate this via an altered placenta transcriptome and placenta developmental abnormalities described here and in our previous paper on this model (Jazwiec et al., 2022; PMID: 35377412). The next steps for the field to address causation/mechanism require generation of a sperm epigenome edited mouse model in which we induce and track histone methylation changes at specific genes to the tissues in the next generation. Indeed, this targeting approach is underway in our research program.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We appreciate the constructive comments made by the editor and the reviewers. We have corrected errors and provided additional experimental data and analysis to address the latest criticisms raised by the reviewers and provided point-by-point response to the reviewers as below.

      Reviewer #1 (Recommendations For The Authors):

      I do acknowledge the work the authors put into this manuscript and I can accept the fact that the authors decided on a minimum of additional experiments. However, I would recommend the authors to be more concise by adding more information in the method and result sections about how they performed their experiments such as which Nav and AMPAR DNA constructs they used, the age of the mice, how long time they exposed the patches to quinidine, information on how many times they repeated their pull downs etc.

      Answer: We thank the reviewer for the comments. We have incorporated the suggested modifications into our revised manuscript. Specifically, we have included detailed information on the NaV and AMPAR constructs in the Methods section. The homozygous NaV1.6 knockout mice and the wild-type littermate controls were postnatal (P0-P1) (see the Results and Methods sections). Prior to the application of step pulses, cells were exposed to the bath solution containing quinidine for approximately one minute (see the Methods section). Additionally, the co-immunoprecipitation assays for Slack and NaV1.6 were repeated three times (see the Methods section).

      Minor detail in line 263: "...KCNT1 (Slack) have been identified to related to seizure..." I guess this should have been "...KCNT1 (Slack) have been identified and related to seizure..."?

      Answer: We thank the reviewer for raising this point. We have corrected it in the revised manuscript.

      Also, and again minor detail, I had a comment about the color coding in Fig 4 and by mistake, I added 4B, but I meant the use of colors in the entire figure, and mainly the use of colors in 4C, G and I.

      Answer: We apologize for the confusion. We have changed the color coding of Figure 4 in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      While the paper is improved, several concerns do not seem to have been addressed. Some may have been missed because there is no response at all, but others may have been unclear because the response does not address the concern, but a related issue. Details are below.

      Answer: We thank the reviewer for the criticisms. We have made changes to our manuscript to address the concerns.

      Original issue:

      3) Remove the term in vivo.

      Answer: We thank the reviewer for raising this point. In our experiments, although we did not conduct experiments directly in living organisms, our results demonstrated the coimmunoprecipitation of NaV1.6 with Slack in homogenates from mouse cortical and hippocampal tissues (Fig. 3C). This result may support that the interaction between Slack and NaV1.6 occurs in vivo.

      New comment from reviewer:

      The argument to use the term in vivo is not well supported by what the authors have said. Just because tissues are used from an animal does not mean experiments were conducted in vivo. As the authors say, they did not conduct experiments in living organisms. Therefore the term in vivo should be avoided. This is a minor point.

      Answer: We thank the reviewer for pointing this out. We have removed the term “in vivo” in the revised manuscript.

      Original:

      4) Figure 1C Why does Nav1.2 have a small inward current before the large inward current in the inset?

      Answer: We apologize for the confusion. We would like to clarify that the small inward current can be attributed to the current of membrane capacitance (slow capacitance or C-slow). The larger inward current is mediated by NaV1.2.

      New comment:

      This is not well argued. Please note why the authors know the current is due to capacitance. Also, how do they know the larger current is due to NaV1.2? Please add that to the paper so readers know too.

      Answer: We thank the reviewer for the comment. To provide a clearer representation of NaV1.2-mediated currents in Fig. 1C, we have replaced the original example trace with a new one in which only one inward current is observed.

      Original:

      The slope of the rising phase of the larger sodium current seems greater than Nav1.6 or Nav1.5. Was this examined?

      Answer: Additionally, we did not compare the slope of the rising phase of NaV subtypes sodium currents but primarily focused on the current amplitudes.

      New comment:

      This is not a strong answer. There seems to be an effect that the authors do not mention and evidently did not quantify that argues against their conclusion, which weakens the presentation.

      Answer: We thank the reviewer for the comment. To assess the slope of the rising phase of NaV subtype currents, we compared the activation time constants of NaV1.2, NaV1.5, and NaV1.6 peak currents in HEK293 cells co-expressing NaV channel subtypes with Slack. The results showed no significant differences (Author response image 1). We have included this analysis (see Fig. S9A) and the corresponding fitting equation (see the Methods section) in the revised manuscript.

      Author response image 1.

      The activation time constants of peak sodium currents in HEK293 cells co-expressing NaV1.2 (n=6), NaV1.5 (n=5), and NaV1.6 (n=5) with Slack, respectively. ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      Original:

      2D-E For Nav1.5 the sodium current is very large compared to Nav1.6. Is it possible the greater effect of quinidine for Nav1.6 is due to the lesser sodium current of Nav1.6?

      Answer: We thank the reviewer for raising this point. We would like to clarify that our results indicate that transient sodium currents contribute to the sensitization of Slack to quinidine blockade (Fig. 2C,E). Therefore, it is unlikely that the greater effect observed for NaV1.6 in sensitizing Slack is due to its lower sodium currents.

      New comment:

      I am not sure the question I was asking was clear. How can the authors discount the possibility that quinidine is more effective on NaV1.6 because the NaV1.6 current is relatively weak?

      Answer: We thank the reviewer for raising this point. We have examined the sodium current amplitudes of NaV1.5, NaV1.5/1.6 chimeras, and NaV1.6 upon co-expression of NaV with Slack. Our analysis revealed no significant difference between NaV1.5 and NaV1.5/6N, both of which exhibited much larger current amplitudes than NaV1.6 (Author response image 2). However, only NaV1.5/6N replicates the effect of NaV1.6 in sensitizing Slack to quinidine blockade (Fig. 4H-I), suggesting that the observed differences between NaV1.5 and NaV1.6 in sensitizing Slack are unlikely to be attributed to NaV1.6's lower sodium currents but may instead involve NaV1.6's N-terminus-induced physical interaction. We have included this analysis in the revised manuscript (see Fig. S9B).

      Author response image 2.

      Comparison of peak sodium current amplitudes of NaV1.5 (n=9), NaV1.5/6NC (n=13), NaV1.5/6N (n=10), and NaV1.6 (n=8) upon co-expression with Slack in HEK293 cells. ns, p > 0.05, * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001; one-way ANOVA followed by Bonferroni’s post hoc test.

      Original:

      The differences between WT and KO in G -H are hard to appreciate. Could quantification be shown? The text uses words like "block" but this is not clear from the figure. It seems that the replacement of Na+ with Li+ did not block the outward current or effect of quinidine.

      Answer: We apologize for the confusion. We would like to clarify the methods used in this experiment. The lithium ion (Li+) is a much weaker activator of sodium-activated potassium channel Slack than sodium ion (Na+)1,2.

      1. Zhang Z, Rosenhouse-Dantsker A, Tang QY, Noskov S, Logothetis DE. The RCK2 domain uses a coordination site present in Kir channels to confer sodium sensitivity to Slo2.2 channels. J Neurosci. Jun 2 2010;30(22):7554-62. doi:10.1523/JNEUROSCI.0525-10.2010

      2. Kaczmarek LK. Slack, Slick and Sodium-Activated Potassium Channels. ISRN Neurosci. Apr 18 2013;2013(2013)doi:10.1155/2013/354262

      Therefore, we replaced Na+ with Li+ in the bath solution to measure the current amplitudes of sodium-activated potassium currents (IKNa)3.

      3. Budelli G, Hage TA, Wei A, et al. Na+-activated K+ channels express a large delayed outward current in neurons during normal physiology. Nat Neurosci. Jun 2009;12(6):745-50. doi:10.1038/nn.2313

      The following equation was used for quantification:

      Furthermore, the remaining IKNa after application of 3 μM quinidine in the bath solution was measured as the following:

      The quantification results were presented in Fig. 1K. The term "block" used in the text referred to the inhibitory effect of quinidine on IKNa.

      New comment:

      The fact remains that the term "block" is too strong for an effect that is incomplete. Also, the authors should add to the paper that Li+ is a weaker activator, so the reader knows some of the caveats to the approach.

      Answer: We thank the reviewer for raising this point. We have added related citations and replaced the term “block” with “inhibit” in the revised manuscript.

      Original:

      1. In K, for the WT, why is the effect of quinidine only striking for the largest currents?

      Answer: We thank the reviewer for raising this point. After conducting an analysis, we found no correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (p = 0.6294) (Author response image 3). Therefore, the effect of quinidine is not solely limited to targeting the larger currents.

      Author response image 3.

      The correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (data from manuscript Fig. 1K). r = 0.1555, p=0.6294, Pearson correlation analysis.
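      For readers who want to reproduce this kind of check, the Pearson coefficient can be sketched as below. This is a minimal sketch, assuming paired per-neuron values of baseline IKNa amplitude and fractional inhibition; the function name and the example numbers are hypothetical and are not the data from Fig. 1K.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for illustration only (not measurements from the study).
baseline_amplitude_na = [0.8, 1.5, 2.1, 3.4, 4.0]
fraction_inhibited = [0.30, 0.22, 0.35, 0.28, 0.31]
r = pearson_r(baseline_amplitude_na, fraction_inhibited)
```

      A coefficient near zero, as reported in the caption above, indicates that the size of the baseline current does not predict the strength of inhibition.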

      New comment:

      Please add this to the paper and the figure as Supplemental.

      Answer: We thank the reviewer for raising this point. We have added this figure as Fig.S3B in the revised manuscript.

      Original:

      5) Figure 2 A. The argument could be better made if the same concentration of quinidine were used for Slack and Slack + Nav1.6. It is recognized a greater sensitivity to quinidine is to be shown but as presented the figure is a bit confusing.

      Answer: We apologize for the confusion. We would like to clarify that the presented concentrations of quinidine were chosen to be near the IC50 values for Slack and Slack+NaV1.6.

      New comment:

      Please add this to the paper.

      Answer: We thank the reviewer for raising this point. We have added the clarification about the presented concentrations in the revised manuscript.

      Original:

      2C. Can the authors add the effect of quinidine to the condition where the prepulse potential was -90?

      Answer: We apologize for the confusion. We would like to clarify that the condition of prepulse potential at -90 mV is the same as the condition in Fig. 1. We changed only one experimental condition: the prepulse potential was changed from -90 mV to -40 mV.

      New comment:

      There was no confusion. The authors should consider adding the condition where the prepulse potential was -90.

      Answer: We thank the reviewer for raising this point. We have added the clarification about the voltage condition in the revised manuscript (see in Fig. 2A caption).

      Original:

      2A. Clarify these 6 panels.

      Answer: We thank the reviewer for raising this point. We have clarified the captions of Fig. 3A in the revised manuscript.

      New comment: Clarification is needed. What is the blue? DAPI? What area of hippocampus? Please label cell layers. What area of cortex? Please label layers.

      Answer: We thank the reviewer for raising this point. We have included the clarification in the Figure caption.

      Original:

      Figure 7. The images need more clarity. They are very hard to see. Text is also hard to see.

      Answer: We apologize for the lack of clarity in the images and text. We would like to provide a concise summary of the key findings shown in this figure.

      Figure 7 illustrates an innovative intervention for treating SlackG269S-induced seizures in mice by disrupting the Slack-NaV1.6 interaction. Our results showed that blocking NaV1.6-mediated sodium influx significantly reduced Slack current amplitudes (Fig. 2D,G), suggesting that the Slack-NaV1.6 interaction contributes to the current amplitudes of epilepsy-related Slack mutant variants, aggravating the gain-of-function phenotype. Additionally, Slack’s C-terminus is involved in the Slack-NaV1.6 interaction (Fig. 5D). We hypothesized that overexpressing Slack’s C-terminus can disrupt the Slack-NaV1.6 interaction (by competing with Slack) and thereby counteract the increased current amplitudes of epilepsy-related Slack mutant variants.

      In HEK293 cells, overexpression of Slack’s C-terminus indeed significantly reduced the current amplitudes of epilepsy-related SlackG288S and SlackR398Q upon co-expression with NaV1.5/6NC (Fig. 7A,B). Subsequently, we evaluated this intervention in an in vivo epilepsy model by introducing the Slack G269S variant into C57BL/6N mice using AAV injection, mimicking the human Slack mutation G288S that we previously identified (Fig. 7C-G).

      New comment:

      The images do not appear to have changed. Consider moving labels above the images so they can be distinguished better. Please label cell layers. Consider adding arrows to the point in the figure the authors want the reader to notice. The study design and timeline are unclear. What is (1) + (3), (2), etc.?

      Answer: We thank the reviewer for pointing this out. We have modified Figure 7 in the revised manuscript and included the cell layer information in the Figure caption.

      Original:

      It is not clear how data were obtained because injection of kainic acid does not lead to a convulsive seizure every 10 min for several hours, which is what appears to be shown. Individual seizures are just at the beginning and then they merge at the start of status epilepticus. After the onset of status epilepticus the animals twitch, have varied movements, sometimes rear and fall, but there is not a return to normal behavior. Therefore one cannot call them individual seizures. In some strains of mice, however, individual convulsive seizures do occur (even if the EEG shows status epilepticus is occurring) but there are rarely more than 5 over several hours and the graph has many more. Please explain.

      Answer: We apologize for the confusion. Regarding the data acquisition in relation to kainic acid injection, we initiated the timing following intraperitoneal injection of kainic acid and recorded the seizure score of each mouse at ten-minute intervals, following the methodology described in previous studies4.

      4. Huang Z, Walker MC, Shah MM. Loss of dendritic HCN1 subunits enhances cortical excitability and epileptogenesis. J Neurosci. Sep 2 2009;29(35):10979-88. doi:10.1523/JNEUROSCI.1531-09.2009

      The seizure scores were determined using a modified Racine, Pinel, and Rovner scale5,6: (1) Facial movements; (2) Head nodding; (3) Forelimb clonus; (4) Dorsal extension (rearing); (5) Loss of balance and falling; (6) Repeated rearing and falling; (7) Violent jumping and running; (8) Stage 7 with periods of tonus; (9) Dead.

      5. Pinel JP, Rovner LI. Electrode placement and kindling-induced experimental epilepsy. Exp Neurol. Jan 15 1978;58(2):335-46. doi:10.1016/0014-4886(78)90145-0

      6. Racine RJ. Modification of seizure activity by electrical stimulation. II. Motor seizure. Electroencephalogr Clin Neurophysiol. Mar 1972;32(3):281-94. doi:10.1016/0013-4694(72)90177-0

      New comment:

      This was clear. Perhaps my question was not clear. The question is how one can count individual seizures if animals have continuous seizures. It seems like the authors did not consider or observe status epilepticus but individual seizures. If that is true the data are hard to believe because too many seizures were counted. Animals do not have nearly this many seizures after kainic acid.

      Answer: We appreciate the reviewer’s clarification. Our methodology involved assessing the maximum seizure scale during 10-minute intervals per mouse as previously described7, rather than counting individual seizures. For instance, if a mouse exhibited loss of balance and falling multiple times within the 30-40 minute interval, we recorded the seizure scale as 5 for that time interval.

      7. Kim EC, Zhang J, Tang AY, et al. Spontaneous seizure and memory loss in mice expressing an epileptic encephalopathy variant in the calmodulin-binding domain of Kv7.2. Proc Natl Acad Sci U S A. Dec 21 2021;118(51)doi:10.1073/pnas.2021265118
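      The scoring rule described in this answer (one value per mouse per 10-minute interval, equal to the highest stage observed in that interval) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the (minute, stage) input format, and the example observations are hypothetical, and intervals with no observed behavior are simply omitted.

```python
def max_score_per_interval(observations, interval_min=10):
    """Reduce timed behavioral observations to one score per interval.

    observations: iterable of (minutes_after_injection, stage) pairs,
                  where stage follows the 1-9 scale quoted above.
    Returns a dict mapping each interval's start minute to the maximum
    stage seen in that interval; repeated events are not counted.
    """
    scores = {}
    for minute, stage in observations:
        start = int(minute // interval_min) * interval_min
        scores[start] = max(scores.get(start, 0), stage)
    return dict(sorted(scores.items()))


# Hypothetical observations for one mouse: several falls (stage 5)
# between 30 and 40 min yield a single score of 5 for that interval.
example = [(31, 5), (34, 5), (38, 3), (42, 2)]
scores = max_score_per_interval(example)
```

      Because each interval contributes exactly one value, the number of plotted points reflects the number of intervals rather than the number of seizures.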

      Reviewer #3 (Recommendations For The Authors):

      While the authors have improved the manuscript, several outstanding issues still need to be addressed. Some may have been missed because there is no response at all, but others may have been unclear.

      Answer: We thank the reviewer for the criticisms. We have added additional experimental data and analysis to address the concerns.

      Original issue from Public Review:

      1. Immunolabeling of the hippocampus CA1 suggests sodium channels as well as Slack colocalization with AnkG (Fig 3A). Proximity ligation assay for NaV1.6 and Slack or a super-resolution microscopy approach would be needed to increase confidence in the presented colocalization results. Furthermore, coimmunoprecipitation studies on the membrane fraction would bolster the functional relevance of NaV1.6-Slack interaction on the cell surface.

      Answer: We thank the reviewer for the good suggestions. We acknowledge that employing proximity ligation assay and high-resolution techniques would significantly enhance our understanding of the localization of the Slack-NaV1.6 coupling.

      At present, the technical capabilities available in our laboratory and institution do not support high-resolution testing. However, we are enthusiastic about exploring potential collaborations to address these questions in the future. Furthermore, we fully recognize the importance of conducting coimmunoprecipitation (Co-IP) assays from membrane fractions. While we have already completed Co-IP assays for total protein and quantified the FRET efficiency values between Slack and NaV1.6 in the membrane region, the Co-IP assays on membrane fractions will be conducted in our future investigations.

      New comment from reviewer: so far, the authors have not demonstrated that Nav1.6 and Slack interact on the cell surface.

      Answer: We thank the reviewer for pointing this out. We acknowledge that our data did not directly demonstrate interaction between NaV1.6 and Slack on the cell surface, and we have removed the related terminology in the revised manuscript. Notably, our patch-clamp experiments in Fig. 2D,G and Fig. S10B showed a Na+-mediated membrane current coupling of Slack and NaV1.6. Additionally, the FRET efficiency values between Slack and NaV1.6 were quantified in the membrane region. These findings suggest that membrane-near Slack interacts with NaV1.6.

      1. Although hippocampal slices from Scn8a+/- were used for studies in Fig. S8, it is not clear whether Scn8a-/- or Scn8a+/- tissue was used in other studies (Fig 1J & 1K). It will be important to clarify whether genetic manipulation of NaV1.6 expression (Fig. 1K) has an impact on sodium-activated potassium current, level of surface Slack expression, or that of NaV1.6 near Slack.

      Answer: We thank the reviewer for pointing this out. In Fig. 1G,J,K, primary cortical neurons from homozygous NaV1.6 knockout (Scn8a-/-) mice were used. We will clarify this information in the revised manuscript. In terms of the effects of genetic manipulation of NaV1.6 expression on IKNa and surface Slack expression, we compared the amplitudes of IKNa measured from homozygous NaV1.6 knockout (NaV1.6-KO) neurons and wild-type (WT) neurons. The results showed that homozygous knockout of NaV1.6 does not alter the amplitudes of IKNa (Author response image 4). The level of surface Slack expression will be tested further.

      Author response image 4.

      The amplitudes of IKNa in WT and NaV1.6-KO neurons (data from manuscript Fig. 1K). ns, p > 0.05, unpaired two-tailed Student’s t test.

      New comment from reviewer: The current version of the manuscript does not contain these pertinent details and needs to be updated to include the information pertaining to homozygous NaV1.6 knockouts. What age were these homozygous NaV1.6 knockout mice? These details need to be clearly stated in the manuscript.

      Answer: We thank the reviewer for pointing this out. We have included this analysis in the revised manuscript (see Fig. S3A). The homozygous NaV1.6 knockout mice were P0-P1, and we have added this detail in the revised manuscript.

      1. Did the epilepsy-related Slack mutations have an impact on NaV1.6-mediated sodium current?

      Answer: We thank the reviewer for the question. We examined the amplitudes of NaV1.6 sodium current upon expression alone or co-expression of NaV1.6 with epilepsy-related Slack mutations (K629N, R950Q, K985N). The results showed that the tested epilepsy-related Slack mutations do not alter the amplitudes of NaV1.6 sodium current (Author response image 5).

      Author response image 5.

      The amplitudes of NaV1.6 sodium currents upon co-expression of NaV1.6 with epilepsy-related Slack mutant variants (SlackK629N, SlackR950Q, and SlackK985N). ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      New comment from reviewer: Figure with the functional effect of co-expression of NaV1.6 with epilepsy-related Slack mutations should be included in the revised manuscript

      Answer: We thank the reviewer for pointing this out. We have included this analysis in the revised manuscript (see Fig. S10A).

      Original issue from Recommendations For The Authors:

      1. A reference to homozygous knockout is made in the abstract; however, only heterozygous mice are mentioned in the methods section. The genotype of the mice needs to be made clear in the manuscript. Furthermore, at what age were these mice used in the study. Since homozygous knockout of NaV1.6 is lethal at a very young age (<4 wks), it would be important to clarify that point as well.

      Answer: We thank the reviewer for pointing this out. In the revised manuscript, we have included information about the source of the primary cortical neurons used in our study. These neurons were obtained from postnatal homozygous NaV1.6 knockout C3HeB/FeJ mice and their wild-type littermate controls.

      New comment from reviewer: The answer that postnatal homozygous NaV1.6 knockout C3HeB/FeJ mice were used is insufficient. What age were these mice? This needs to be clearly stated in the manuscript.

      Answer: We thank the reviewer for pointing this out. The postnatal homozygous NaV1.6 knockout C3HeB/FeJ mice and their wild-type littermate controls were at P0-P1. We have included this information in the revised manuscript.

      1. How long were the cells exposed to quinidine before the functional measurement were performed?

      Answer: We thank the reviewer for pointing this out. The cells were exposed to the bath solution with quinidine for about one minute before applying step pulses.

      New comment from reviewer: This needs to be clearly stated in the manuscript.

      Answer: We thank the reviewer for pointing this out. We have included this information in the revised manuscript (see the Methods section).

      1. In Fig. 6B-D, it is not clear to what extent co-expression of Slack mutants and NaV1.6 increases sodium-activated potassium current.

      Answer: We thank the reviewer for pointing this out. We notice that the current amplitudes of Slack mutants exhibit a considerable degree of variation, ranging from less than 1 nA to over 20 nA (n = 58). To accurately measure the effects of NaV1.6 on increasing current amplitudes of Slack mutants, we plan to apply tetrodotoxin in the bath solution to block NaV1.6 sodium currents upon co-expression of Slack mutants with NaV1.6.

      New comment from reviewer: Were these experiments with TTX completed? If so, they should be added to the revised manuscript.

      Answer: We thank the reviewer for pointing this out. We compared the current amplitudes of epilepsy-related Slack mutant (SlackR950Q) before and after bath-application of 100 nM TTX upon co-expression with NaV1.6 in HEK293 cells. The results showed that bath-application of TTX significantly reduced the current amplitudes of SlackR950Q at +100 mV by nearly 40% (Author response image 6), suggesting NaV1.6 contributes to the current amplitudes of SlackR950Q. We have included this data in the revised manuscript (see Fig. S10B).

      Author response image 6.

      The current amplitudes of SlackR950Q before and after bath-application of 100 nM TTX upon co-expression with NaV1.6 in HEK293 cells (n=5). ***p < 0.001, Two-way repeated measures ANOVA followed by Bonferroni’s post hoc test.

      Additionally, we have corrected some errors in the methods and figure captions section:

      1. Line 513, bath solution “5 glucose” should be “10 glucose.”

      2. Figure 3A caption, the description “hippocampus CA1 (left) and neocortex (right)” was flipped and we have corrected it.

      References

      1. Zhang Z, Rosenhouse-Dantsker A, Tang QY, Noskov S, Logothetis DE. The RCK2 domain uses a coordination site present in Kir channels to confer sodium sensitivity to Slo2.2 channels. J Neurosci. Jun 2 2010;30(22):7554-62. doi:10.1523/JNEUROSCI.0525-10.2010

      2. Kaczmarek LK. Slack, Slick and Sodium-Activated Potassium Channels. ISRN Neurosci. Apr 18 2013;2013(2013)doi:10.1155/2013/354262

      3. Budelli G, Hage TA, Wei A, et al. Na+-activated K+ channels express a large delayed outward current in neurons during normal physiology. Nat Neurosci. Jun 2009;12(6):745-50. doi:10.1038/nn.2313

      4. Huang Z, Walker MC, Shah MM. Loss of dendritic HCN1 subunits enhances cortical excitability and epileptogenesis. J Neurosci. Sep 2 2009;29(35):10979-88. doi:10.1523/JNEUROSCI.1531-09.2009

      5. Pinel JP, Rovner LI. Electrode placement and kindling-induced experimental epilepsy. Exp Neurol. Jan 15 1978;58(2):335-46. doi:10.1016/0014-4886(78)90145-0

      6. Racine RJ. Modification of seizure activity by electrical stimulation. II. Motor seizure. Electroencephalogr Clin Neurophysiol. Mar 1972;32(3):281-94. doi:10.1016/0013-4694(72)90177-0

      7. Kim EC, Zhang J, Tang AY, et al. Spontaneous seizure and memory loss in mice expressing an epileptic encephalopathy variant in the calmodulin-binding domain of Kv7.2. Proc Natl Acad Sci U S A. Dec 21 2021;118(51)doi:10.1073/pnas.2021265118

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Benner et al. identify OVO as a transcriptional factor instrumental in promoting the expression of hundreds of genes essential for female germline identity and early embryo development. Prior data had identified both ovo and otu as genes activated by OVO binding to the promoters. By combining ChIP-seq, RNA-seq, and analysis of prior datasets, the authors extend these data to hundreds of genes and therefore propose that OVO is a master transcriptional regulator of oocyte development. They further speculate that OVO may function to promote chromatin accessibility to facilitate germline gene expression. Overall, the data compellingly demonstrate a much broader role for OVO in the activation of genes in the female germline than previously recognized. By contrast, the relationship between OVO, chromatin accessibility, and the timing of gene expression is only correlative, and more work will be needed to determine the mechanisms by which OVO promotes transcription.

      We fully agree with this summary.

      Strengths:

      Here Benner et al. convincingly show that OVO is a transcriptional activator that promotes expression of hundreds of genes in the female germline. The ChIP-seq and RNA-seq data included in the manuscript are robust and the analysis is compelling.

      Importantly, the set of genes identified is essential for maternal processes, including egg production and patterning of the early embryo. Together, these data identify OVO as a major transcriptional activator of the numerous genes expressed in the female germline, deposited into the oocyte and required for early gene expression. This is an important finding as this is an essential process for development and prior to this study, the major drivers of this gene expression program were unknown.

      We are delighted that this aspect of the work came across clearly. Understanding the regulation of maternal effect genes has been something of a black-box, despite the importance of this class of genes in the history of developmental genetics. The repertoire of essential oogenesis/embryonic development genes that are bound by and respond to OVO are well characterized in the literature, but nothing is known about how they are transcriptionally regulated. We feel the manuscript will be of great interest to readers working on these genes.

      Weaknesses:

      The novelty of the manuscript is somewhat limited as the authors show that, like two prior, well-studied OVO target genes, OVO binds to promoters of germline genes and activates transcription. The fact that OVO performs this function more broadly is not particularly surprising.

      Clearly, transcription factors regulate more than one or two genes. Nevertheless, we were surprised at how many of the genes involved in oogenesis per se and how many maternal effect genes were OVO targets. It was our hypothesis that OVO would have a transcriptional effect genome-wide; however, it was less clear whether OVO would always bind at the core promoter, as is the case with ovo and otu. Our results strongly support the idea that core promoter proximal binding is essential for OVO function, a conclusion of work done decades ago that has not been revisited using modern techniques.

      A major challenge to understanding the impact of this manuscript is the fact that the experimental system for the RNA-seq, the tagged constructs, and the expression analysis that provides the rationale for the proposed pioneering function of OVO are all included in a separate manuscript.

      This is a case where we ended up with a very, very long manuscript which included a lot of revisiting of legacy data. It was a tough decision on how to break up all the work we had completed on ovo to date. In our opinion, it was too much to put everything into a single manuscript unless we wanted a manuscript length supplement (we were also worried that supplemental data is often overlooked and sometimes poorly reviewed). We therefore decided to split the work into a developmental localization/characterization paper and a functional genomics paper. As it stands both papers are long. Certainly, readers of this manuscript will benefit from reading our previous OVO paper, which we submitted before this one. The earlier manuscript is under revision at another journal and we hope that this improved manuscript will be published and accessible shortly.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Benner et al. interrogate the transcriptional regulator OVO to identify its targets in the Drosophila germline. The authors perform ChIP-seq in the adult ovary and identify established as well as novel OVO binding motifs in potential transcriptional targets of OVO. Through additional bioinformatic analysis of existing ATAC-seq, CAGE-seq, and histone methylation data, the authors confirm previous reports that OVO is enriched at transcription start sites and suggest that OVO does not act as part of the core RNA polymerase complex. Benner et al. then perform bulk RNA-seq in OVO mutant and "wildtype" (GAL4 mediated expression of OVO under the control of the ovo promoter in OVO mutants) ovaries to identify genes that are differentially expressed in the presence of OVO. This analysis supports previous reports that OVO likely acts at transcription start sites as a transcriptional activator. While the authors propose that OVO activates the expression of genes that are important for egg integrity, maturation, and for embryonic development (nanos, gcl, pgc, bicoid), this hypothesis is based on correlation and is not supported by in vivo analysis of the respective OVO binding sites in some of the key genes. A temporal resolution for OVO's role during germline development and egg chamber maturation in the ovary is also missing. Together, this manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis but lacks important in vivo experimental evidence that would validate the high-quality datasets.

      We thank reviewer 2 for the appreciation of the genomics data and analysis. Some of the suggested in vivo experiments are clear next steps, which are well underway. These are beyond the scope of the current manuscript.

      Temporal analysis of ovo function in egg chamber development is not easy, as only the weakest ovo alleles have any egg chambers to examine. However, we will also point out the long-known phenotypes of some of those weak alleles in the text (e.g. ventralized chambers in ovoD3/+). We will need better tools for precise rescue/degradation during egg chamber maturation.

      Strengths:

      The manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis

      Thank you. We went to great lengths to do our highly replicated experiments in multiple ways (e.g. independent pull-down tags) and spent considerable time coming up with an optimized and robust informatic analysis.

      Weaknesses:

      1) The authors propose that OVO acts as a positive regulator of essential germline genes, such as those necessary for egg integrity/maturation and embryonic/germline development. Much of this hypothesis is based on GO term analysis (and supported by the authors' ChIP-seq data). However accurate interpretation of GO term enrichment is highly dependent on using the correct background gene set. What control gene set did the authors use to perform GO term analysis (the information was not in the materials and methods)? If a background gene set was not previously specified, it is essential to perform the analysis with the appropriate background gene set. For this analysis, the total set of genes that were identified in the authors' RNA-seq of OVO-positive ovaries would be an ideal control gene set for which to perform GO term analysis. Alternatively, the total set of genes identified in previous scRNA-seq analysis of ovaries (see Rust et al., 2020, Slaidina et al., 2021 among others) would also be an appropriate control gene set for which to perform GO term analysis. If indeed GO term analysis of the genes bound by OVO compared to all genes expressed in the ovary still produces an enrichment of genes essential for embryonic development and egg integrity, then this hypothesis can be considered.

      We feel that this work on OVO as a positive regulator of genes like bcd, osk, nos, png, gnu, plu, etc., is closer to a demonstration than a proposition. These are textbook examples of genes required for egg and early embryonic development. Hopefully, this is not lost on the readers by an over-reliance on GO term analysis, which is required but not always useful in genome-wide studies.

      We used GO term enrichment analysis as a tool to help focus the story on some major pathways that OVO is regulating. To the specific criticism of the reference gene set, the GO term enrichment analysis in this work is robust to the choice of background gene set. We will update the GO term enrichment analysis text to indicate this fact, add a table using expressed genes in our RNA-seq dataset as the background, and clarify gene set robustness in greater detail in the methods of the revision. We will also try to focus the reader’s attention on the actual target genes rather than the GO terms in the revised text.
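
      To make the background-set question concrete, the following is a minimal sketch of a one-sided hypergeometric enrichment test in which the background is restricted to an explicit gene list (for example, genes expressed in the RNA-seq data). It is an illustration only and not the tool or settings used in the manuscript; the function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background_size, annotated_in_background,
                           target_size, annotated_in_target):
    """Upper-tail probability P(X >= annotated_in_target) when target_size
    genes are drawn without replacement from the background."""
    upper = min(annotated_in_background, target_size)
    total = comb(background_size, target_size)
    tail = sum(
        comb(annotated_in_background, k)
        * comb(background_size - annotated_in_background, target_size - k)
        for k in range(annotated_in_target, upper + 1)
    )
    return tail / total


def go_enrichment(target_genes, background_genes, term_genes):
    """Restrict both the target list and the GO term to the chosen
    background before testing, so the result depends on that background."""
    background = set(background_genes)
    target = set(target_genes) & background
    term = set(term_genes) & background
    return hypergeom_enrichment_p(
        len(background), len(term), len(target), len(target & term)
    )


# Toy example with made-up gene identifiers (assumption, not real data).
background = [f"gene{i}" for i in range(10)]
term = ["gene0", "gene1", "gene2"]
targets = ["gene0", "gene1", "gene2"]
p_value = go_enrichment(targets, background, term)
```

      Rerunning such a test with different background lists (all annotated genes versus expressed genes only) is one way to check whether an enrichment is robust to the background.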

      2) The authors provide important bioinformatic analysis of new and existing datasets that suggest OVO binds to specific motifs in the promoter regions of certain germline genes. While the bioinformatic analysis of these data is thorough and appropriate, the authors do not perform any in vivo validation of these datasets to support their hypotheses. The authors should choose a few important potential OVO targets based on their analysis, such as gcl, nanos, or bicoid (as these genes have well-studied phenotypes in embryogenesis), and perform functional analysis of the OVO binding site in their promoter regions. This may include creating CRISPR lines that do not contain the OVO binding site in the target gene promoter, or reporter lines with and without the OVO binding site, to test if OVO binding is essential for the transcription/function of the candidate genes.

      Exploring mechanism using in vivo phenotypic assays is awesome, so this is a very good suggestion. But, it is not essential for this work -- as has been pointed out in the reviews, in vivo validation of OVO binding sites has been comprehensively done for two target genes, ovo and otu. The “rules” appear similar for both genes. That said, we are already following up specific OVO target genes and the detailed mechanism of OVO function at the core promoter. We removed some of our preliminary in vivo figures from the already long current manuscript. We continue to work on OVO and expect to include this type of analysis in a new manuscript.

      3) The authors perform de novo motif analysis to identify novel OVO binding motifs in their ChIP-seq dataset. Motif analysis can be significantly strengthened by comparing DNA sequences within peaks to sequences that are just outside of peak regions, thereby generating motifs that are specific to peak regions compared to other regions of the promoter/genome. For example, the 200 nt sequence on either side of an OVO peak could be used as a negative control sequence set. What control sequence set did the authors use for their de novo motif analysis? More detail on this is necessary in the materials and methods section. Re-analysis with an appropriate negative control sequence set is suggested if not previously performed.

      We apologize for being unclear on negative sequence controls in the methods. We used shuffled OVO ChIP-seq peak sequences as the background for the de novo motif analysis, which we will better outline in the methods of the revision. This is a superior background set of sequences as it exactly balances GC content in the query and background sequences. We are not fond of the idea of using adjacent DNA that won’t be controlled for GC content and shadow motifs. Furthermore, the de novo OVO DNA binding motifs are clear, statistically significant variants of the characterized in vitro OVO DNA binding motifs previously identified (Lu et al., 1998; Lee and Garfinkel, 2000; Bielinska et al., 2005), which lends considerable confidence. We also show that the OVO ChIP-seq read density is highly enriched for all our identified motifs, as well as the in vitro motifs. We provide multiple lines of evidence, through multiple methods, that the core OVO DNA binding motif is 5’-TAACNGT-3’. We have high confidence in the motif data.
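
      The GC-balancing property of a shuffled background can be sketched as follows. This is a simple mononucleotide shuffle written for illustration; the response does not state which shuffling scheme or software was used, so the function names, the seed, and the example sequences are assumptions.

```python
import random


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def shuffled_background(peak_sequences, seed=0):
    """Return one shuffled copy of each peak sequence.

    Permuting the bases within a sequence keeps its length and base
    composition (and therefore its GC content) unchanged while
    scrambling any motif it contained.
    """
    rng = random.Random(seed)
    background = []
    for seq in peak_sequences:
        bases = list(seq)
        rng.shuffle(bases)
        background.append("".join(bases))
    return background


# Made-up example sequences (assumption, not real peak sequences).
peaks = ["GGCTAACTGTACGC", "ATATTAACGGTCCG"]
background = shuffled_background(peaks)
```

      Each background sequence has exactly the same base counts as its query sequence, which is the sense in which a shuffled set balances GC content.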

      4) The authors mention that OVO binding (based on their ChIP-seq data) is highly associated with increased gene expression (lines 433-434). How many of the 3,094 peaks (conservative OVO binding sites), and what percentage of those peaks, are associated with a significant increase in gene expression from the RNA-seq data? How many are associated with a decrease in gene expression? This information should be added to the results section.

      Not including the numbers of the overlapping ChIP peaks and expression changes in the text was an oversight on our part. The numbers that relate to this (666 peaks overlapping genes that significantly increased in expression, a significant enrichment according to Fisher's exact test; 564 peaks overlapping genes that significantly decreased in expression, a significant depletion according to Fisher's exact test) are found in figure 4C and will be added to the text.

      5) The authors mention that a change in endogenous OVO expression cannot be determined from the RNA-seq data due to the expression of the OVO-B cDNA rescue construct. Can the authors see a change in endogenous OVO expression based on the presence/absence of OVO introns in their RNA-seq dataset? While intronic sequences are relatively rare in RNA-seq, even a 0.1% capture rate of intronic sequence is likely to be enough to determine the change in endogenous OVO expression in the rescue construct compared to the OVO null.

      This is a good point. The GAL4 transcript is downstream of ovo expression in the hypomorphic ovoovo-GAL4 allele. We state in the text that there is a nonsignificant increase in GAL4 expression with ectopic rescue OVO, although the trend is positive. We calculated the RPKM of RNA-seq reads mapping to the intron spanning exon 3 and exon 4 in ovo-RA and found that there is also a nonsignificant increase in intronic RPKM with ectopic rescue OVO (we will add to the results in the revision). We would expect OVO to be autoregulatory and potentially increase the expression of GAL4 and/or intronic reads, but the ovoovo-GAL4>UASp-OVOB is not directly autoregulatory like the endogenous locus. It is not clear to us how the intervening GAL4 activity would affect OVOB activity in the artificial circuit. Dampening? Feed-forward? Is there an effect on OVOA activity? Regardless, this result does not change our interpretation of the other OVO target genes.
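
      For reference, the RPKM value mentioned above follows the standard definition: reads mapped to a feature, per kilobase of feature length, per million mapped reads in the library. A minimal sketch is given below; the function name and the example numbers are hypothetical and are not taken from the dataset.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical numbers: 100 reads on a 1,000 bp intron in a library of
# 1,000,000 mapped reads.
example_rpkm = rpkm(100, 1000, 1_000_000)
```

      Because the value is scaled by both feature length and library size, intronic signal can be compared between the rescue and mutant libraries even when sequencing depth differs.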

      6) The authors conclude with a model of how OVO may participate in the activation of transcription in embryonic pole cells. However, the authors did not carry out any experiments with pole cells that would support/test such a model. It may be more useful to end with a model that describes OVO's role in oogenesis, which is the experimental focus of the manuscript.

      We did not complete any experiments in embryonic pole cells in this manuscript and base our discussion on the potential dynamics of OVO transcriptional control and our previous work showing maternal and zygotic OVO protein localization in the developing embryonic germline. Obviously, we are highly interested in this question and continue to work on the role of maternal OVO. We agree that we extended too far and will remove the embryonic germ cell model in the figure. We will instead focus on the possible mechanisms of OVO gene regulation in light of the evidence we have shown in the adult ovary, as suggested.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Drawing on insights from preceding studies, the researchers pinpointed mutations within the spag7 gene that correlate with metabolic aberrations in mice. The precise function of spag7 has not been fully described yet, and thus the primary objective of this investigation is to unravel its pivotal role in the development of obesity and metabolic disease in mice. First, they generated a mouse model lacking spag7 and observed that KO mice exhibited diminished birth size, which subsequently progressed to manifest obesity and impaired glucose tolerance upon reaching adulthood. This behaviour was primarily attributed to a reduction in energy expenditure. In fact, KO animals demonstrated compromised exercise endurance and muscle functionality, stemming from a deterioration in mitochondrial activity. Intriguingly, none of these effects was observed when using a tamoxifen-induced KO mouse model, implying that Spag7's influence is predominantly confined to the embryonic developmental phase. Explorations within placental tissue unveiled that mice afflicted by Spag7 deficiency experienced placental insufficiency, likely due to aberrant development of the placental junctional zone, a phenomenon that could impede optimal nutrient conveyance to the developing fetus. Overall, the authors assert that Spag7 emerges as a crucial determinant orchestrating accurate embryogenesis and subsequent energy balance in the later stages of life.

      The study boasts several noteworthy strengths. Notably, it employs a combination of animal models and a thorough analysis of metabolic and exercise parameters, underscoring a meticulous approach. Furthermore, the investigation encompasses a comprehensive evaluation of fetal loss across distinct pregnancy stages, alongside a transcriptomic analysis of skeletal muscle, thereby imparting substantial value. However, a pivotal weakness of the study centres on its translational applicability. While the authors claim that "SPAG7 is well-conserved with 97% of the amino acid sequence being identical in humans and mice", the precise role of spag7 in the human context remains enigmatic. This limitation hampers a direct extrapolation of findings to human scenarios. Additionally, the study's elucidation of the molecular underpinnings behind the spag7-mediated anomalous development of the placental junction zone remains incomplete. Finally, the hypothesis positing a reduction in nutrient availability to the fetus, though intriguing, requires further substantiation, leaving an aspect of the mechanism unexplored.

      Hence, in order to fortify the solidity of their conclusions, these concerns necessitate meticulous attention and resolution in the forthcoming version of the manuscript. Upon the comprehensive addressing of these aspects, the study is poised to exert a substantial influence on the field, its significance reverberating significantly. The methodologies and data presented undoubtedly hold the potential to facilitate the community's deeper understanding of the ramifications stemming from disruptions during pregnancy, shedding light on their enduring impact on the metabolic well-being of subsequent generations.

      Thanks to this reviewer for their thoughtful analysis and commentary. Human mutations in SPAG7 are exceedingly rare (SPAG7 | pLoF (genebass.org)), potentially because of the deleterious effects of SPAG7-deficiency on prenatal development. This makes investigation into the causative effects of SPAG7 in humans challenging. There exist mutations in the SPAG7 region of the genome that are associated with BMI, but no direct coding variants within the spag7 gene itself have been studied.

      We agree with the reviewer that the precise role of spag7 in the placenta remains unknown. However, given its robust expression and high protein levels in the placenta, including in key cells, such as the syncytiotrophoblast (https://www.proteinatlas.org/ENSG00000091640-SPAG7/tissue/Placenta), it is highly likely that spag7 is critical for normal placenta development and function. Multiple studies (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9716072/) have recently shown that sperm associated RNAs play a critical role in embryonic and early placenta development. Our findings will provide the basis for future studies that can elucidate the role of spag7 in human placenta.

      Reviewer #2:

      Summary:

      The authors of this manuscript are interested in discovering and functionally characterizing genes that might cause obesity. To find such genes, they conducted a forward genetic screen in mice, selecting strains which displayed increased body weight and adiposity. They found a strain, with germ-line deficiency in the gene Spag7, which displayed significantly increased body weight, fat mass, and adipose depot sizes manifesting after the onset of adulthood (20 weeks). The mice also display decreased organ sizes, leading to decreased lean body mass. The increased adiposity was traced to decreased energy expenditure at both room temperature and thermoneutrality, correlating with decreased locomotor activity and muscle atrophy. Major metabolic abnormalities such as impaired glucose tolerance and insulin sensitivity also accompanied the phenotype. Unexpectedly, when the authors generated an inducible, whole body knockout mouse using a globally expressed Cre-ERT2 along with a globally floxed Spag7, and induced Spag7 knockout before the onset of obesity, none of the phenotypes seen in the original strain were recapitulated. The authors trace this discrepancy to the major effect of Spag7 being on placental development.

      Strengths:

      Strengths of the manuscript are its inherently unbiased approach, using a forward genetic screen to discover previously unknown genes linked to obesity phenotypes. Another strong aspect of the work was the generation of an independent, complementary, strain consisting of an inducible knockout model, in which the deficiency of the gene could be assessed in a more granular form. This approach enabled the discovery of Spag7 as a gene involved in the establishment of the mature placenta, which determines the metabolic fate of the offspring. Additional strengths include the extensive array of physiological parameters measured, which provided a deep understanding of the whole-body metabolic phenotype and pinpointed its likely origin to muscle energetic dysfunction.

      Weaknesses:

      Weaknesses that can be raised are the lack of molecular mechanistic understanding of the numerous phenotypic observations. For example, the specific role of Spag7 to promote placental development remains unclear. Also, the reason why placental developmental abnormalities lead to muscle dysfunction, and whether indeed the entire metabolic phenotype of the offspring can be attributed solely to decreased muscle energetics is not fully explored.

      Overall, the authors achieved a remarkable success in identifying genes associated with development of obesity and metabolic disease, discovering the role of Spag7 in placental development, and highlighting the fundamental role of in-utero development in setting future metabolic state of the offspring.

      We thank this reviewer for their thoughtful analysis and commentary. Significant effort has been made to understand the causes of the metabolic phenotypes observed in SPAG7-deficient mouse models. It is clear that hyperphagia is not the cause and the muscle energetics deficit is likely not the sole cause. We expect that decreased access to nutrition in utero will lead to widespread and varied metabolic adaptation.

      We agree with the reviewer that further work can be done to understand the molecular mechanism driving the metabolic phenotypes of SPAG7-deficient animals. We believe that a full investigation of the processes behind the developmental abnormalities is beyond the scope of this paper and is best addressed in a separate paper.

      Reviewer #3:

      Summary:

      The manuscript by Flaherty III S.E. et al identified SPAG7 gene in their forward mutagenetic screening and created the germline knockout and inducible knockout mice. The authors reported that the SPAG7 germline knockout mice had lower birth weight likely due to intrauterine growth restriction and placental insufficiency. The SPAG7 KO mice later developed obesity phenotype as a result of reduced energy expenditure. However, the inducible SPAG7 knockout mice had normal body weight and composition.

      Strengths:

      In this reviewer's opinion, this study has high significance in the field of metabolic research for the following reasons.

      1) The authors' findings are significant in the field of obesity research, especially from the perspective of maternal-fetal medicine. The authors created and analyzed the SPAG7 KO mice and found that the KO mice had a "thrifty phenotype" and developed obesity.

      2) SPAG7 gene function hasn't been thoroughly studied. The reported phenotype will fill the gap of knowledge.

      Overall, the authors have presented their results in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings.

      Weaknesses:

      The manuscript can be further strengthened with more clarification on the following points.

      1) The germline whole-body KO mice were female mice (Line293), however the inducible knockout mice were male mice (Line549). Sexual dimorphism is often observed in metabolic studies, therefore the metabolic phenotype of both female and male mice needs to be reported for the germline and inducible knockouts in order to make the justified conclusion.

      2) SPAG7 has an NLS. Does this protein function in gene expression? Whether the overall metabolic phenotype is the direct cause of SPAG7 ablation is unclear. For example, the Hsd17b10 gene was downregulated in all tissues in the KO mice. Could this have been coincidentally selected for and thus be the cause of the developmental issues and adulthood obesity? Do the iSpag7 mice demonstrate reduced expression of Hsd17b10?

      3) Figure 2c should display the energy expenditure normalized to body weight (or lean body mass).

      4) Please provide more information for the figure legend, including the statistical test that was conducted for each data set, animal numbers for each genotype and sexes.

      5) The authors should report how long after treatment the data was collected for figures 4F-M.

      6) The authors should justify ending the data collection after 8 weeks for the iSPAG7 mice in Figures 4C-E. In the WT vs germline KO mice, there was no clear difference in body weight or lean mass at 15 weeks of age.

      Response to point #1 (Weakness): We thank the reviewer for their thoughtful analysis and commentary. All inducible KO animals described in the paper are female (the typo in Line 549 has been corrected). We did perform studies in both male and female animals for both of these lines. Males display similar metabolic phenotypes, though not as robustly as the females. A table summarizing key data from male and female germline KO animals and inducible KO animals has been included below.

      Author response table 1.

      Author response table 2.

      Response to point #2 (Weakness): SPAG7 contains an R3H domain, which is predicted to bind polynucleotides, and other proteins that contain R3H domains are known to bind RNA or ssDNA. The iSPAG7 mice do display decreased hsd17b10 expression (to a lesser degree than the germline KOs) in the tissues examined. When we knock down SPAG7 in specific tissues, we also see hsd17b10 expression decrease specifically in those tissues. These data all suggest that hsd17b10 expression is, at least, linked to spag7 expression. They also raise the question of why these animals have no metabolic phenotype. Some possible explanations are that hsd17b10 expression is essential only during early development, or that the lower magnitude of downregulation of hsd17b10 in the iSPAG7 mice is insufficient to produce the metabolic phenotypes seen in the germline KOs, which show a higher magnitude of downregulation.

      Response to point #3 (Weakness): How best to normalize total energy expenditure data is a subject of debate within the energy expenditure field. As the animals have increased body weight and decreased lean mass, normalizing to either will skew the results in different directions. We have included the data normalized to body weight and to lean mass below. The decrease in total energy expenditure remains significant in either scenario.
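
      The way the two normalizations pull in opposite directions can be illustrated with a small worked example. All values below are invented purely for illustration (a KO animal with higher body weight, lower lean mass, and lower total energy expenditure than WT); they are not measurements from the study, and the function names are hypothetical.

```python
def percent_change(ko_value, wt_value):
    """Percent difference of KO relative to WT."""
    return 100.0 * (ko_value - wt_value) / wt_value


def compare_normalizations(wt, ko):
    """Return the KO-vs-WT percent change for raw energy expenditure and
    for energy expenditure divided by body weight or by lean mass."""
    return {
        "raw": percent_change(ko["ee"], wt["ee"]),
        "per_body_weight": percent_change(
            ko["ee"] / ko["body_weight"], wt["ee"] / wt["body_weight"]
        ),
        "per_lean_mass": percent_change(
            ko["ee"] / ko["lean_mass"], wt["ee"] / wt["lean_mass"]
        ),
    }


# Invented values (assumption): energy expenditure in kcal/day, masses in g.
wt = {"ee": 12.0, "body_weight": 25.0, "lean_mass": 20.0}
ko = {"ee": 10.0, "body_weight": 32.0, "lean_mass": 17.0}
changes = compare_normalizations(wt, ko)
```

      With these invented numbers, dividing by body weight makes the decrease look larger than the raw difference, while dividing by lean mass makes it look smaller, which is why reporting both is informative.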

      Author response image 1.

      Response to point #4 (Weakness): The information has been added to all figures.

      Response to point #5 (Weakness): Weeks after treatment have been added to the figure legends for Figures 4F-M.

      Response to point #6 (Weakness): Highly significant changes in fat mass, glucose tolerance and insulin sensitivity are already present in the germline SPAG7 KO mice at 15 weeks of age or earlier. Tamoxifen injection effectively induced SPAG7 gene KO in less than a week in the iSPAG7 KO mice. Given the absence of significant changes, or any trends towards significance, in the glucose and insulin tolerance tests and in other metabolic tests in the iSPAG7 KO mice at 15 weeks of age (the same age at which these changes were observed in the germline KO) and 8 weeks after SPAG7 gene KO, we did not anticipate seeing changes beyond this point and decided to stop the study at 9 weeks after treatment.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public Review)

      Weaknesses

      1) The usage of young growing mice (8-10 weeks) versus adult mice (>4 months) in the murine mechanical overload experiments. The usage of adult mice would be preferable for these experiments given that maturational growth may somehow affect the outcomes.

      The basis for this critique is not clear as it has been shown that the longitudinal growth of bones is complete by ~8 weeks of age (e.g., PMID: 28326349, and 31997656). These studies, along with others, also indicate that 8 weeks is a post-pubescent age in mice. For these reasons, 8 weeks of age was viewed as being representative of the human equivalent of when people start to perform resistance exercise with the goal of increasing muscle mass. Also, it’s important to consider that the mice were 10-12 weeks of age when the muscles were collected, which would be equivalent to a human in their early 20s. In our human study, the mean age of the subjects was 23. Given the above points, it’s hard for us to appreciate why the use of mice that started at 8-10 weeks of age is viewed as a weakness. With that being said, we recognize that there may be age-related changes in mechanisms of mechanical load-induced growth, but it was not our intent to address this topic.

      1b) No consideration for biological sex.

      We appreciate this point and we agree that sex is an important variable to consider. In this study, we explored an uncharted topic and therefore we wanted to minimize as many known variables as possible. We did that, in part, by focusing specifically on male subjects. In the future, it will certainly be important to explore whether sex (and age) impact the structural adaptations that drive the mechanical load-induced growth of muscle fibers.

      2) Information on whether myofibrillogenesis is dependent on hypertrophy induced by loading, or just hypertrophy in general. To provide information on this, the authors could use, for instance, inducible Myostatin KO mice (a model where hypertrophy and force production are not always in lockstep) to see whether hypertrophy independent from load induces the same result as muscle loading regarding myofibrillogenesis.

      This is a great suggestion, but it goes beyond the intended scope of our study. Nevertheless, with the publication of our FIM-ID methodology, the answer to this and related questions can now be obtained in a time- and cost-effective manner.

      3) Limited information on Type 1 fiber hypertrophy. A "dual overload" model is used for the mouse where the soleus is also overloaded, but presumably, the soleus was too damaged to analyze. Exploring hypertrophy of murine Type 1 fibers using a different model (weight pulling, weighted wheel running, or forced treadmill running) would be a welcome addition.

      The point is well taken and further studies that are aimed at determining whether there are differences in how Type I vs. Type II fibers grow would be an excellent subject for future studies.

      Reviewer #3 (Public Review)

      1) Supplemental Figure 1 is not very clear.

      Supplemental Figure 1 is now presented as Supplemental Figure 2. We carefully reexamined this figure and, in our opinion, the key points have been appropriately conveyed. We would be more than happy to revise the figure, but we would need guidance with respect to which aspect(s) of the figure were not clear to the reviewer.

      Reviewer #1 (Recommendations For The Authors)

      Introduction.

      1) I do not think the first paragraph is really necessary. Cell growth is a fundamental property of cell biology that requires no further justification.

      We believe that it is essential to remind all readers about the importance of skeletal muscle research. For some, the detrimental impact of skeletal muscle loss on one’s quality of life and the greater burden on the healthcare system may not be known.

      2) I prefer "fundamental" over "foundationally".

      All mentions of the word “foundational” and “foundationally” have been changed to “fundamental” and “fundamentally.”

      3) As usual for the Hornberger lab, the authors do an excellent job of providing the (historical) context of the research question.

      Thank you for this positive comment.

      4) I prefer “Goldspink” as “Dr. Goldspink” feels too personal especially when you are critical of his studies.

      All instances of “Dr.” have been removed when referring to the works of others. This includes Dr. Goldspink and Dr. Tokuyasu.

      5) Fourth paragraph, after reference #17. I felt like this discussion was not necessary and did not really add any value to the introduction.

      We believe that this discussion should remain since it highlights the widely accepted notion that mechanical loading leads to an increase in the number of myofibrils per fiber, yet there is no compelling data to support this notion. This discussion highlights the need for documented evidence for the increase in myofibril number in response to mechanical loading and, as such, it serves as a major part of the premise for the experiments that were conducted in our manuscript.

      6) The authors do a nice job of laying out the challenge of rigorously testing the Goldspink model of myofiber hypertrophy.

      Thank you!

      Results

      1) For the EM images, can the authors provide a representative image of myofibril tracing? From the EM image provided, it is difficult to evaluate how accurate the tracing is.

      Representative images and an explanation of myofibril calculation have been provided in Supplemental Figure 5.

      2) In the mouse, how does the mean myofibril CSA compare between EM and FIM-ID?

      Author response image 1.

      The above figures compare the myofibril CSA and fiber CSA measurements that were obtained with EM and FIM-ID for all analyzed fibers, as well as the same fibers separated according to the fiber type (i.e., Ox vs. Gly). The above figure shows that the FIM-ID measurements of myofibril CSA were slightly, yet significantly, lower than the measurements obtained with EM. However, we believe that it would be misleading to present the data in this manner. Specifically, as shown in Fig. 4C, a positive linear relationship exists between myofibril CSA and fiber CSA. Thus, a direct comparison of myofibril CSA measurements obtained from EM and FIM-ID would only be meaningful if the mean CSA of the fibers that were analyzed were the same. As shown on the panel on the right, the mean CSA of the fibers analyzed with FIM-ID was slightly, yet significantly, lower than the mean CSA of the fibers analyzed with EM. As such, we believe that the most appropriate way to compare the measurements of the two methods is to express the values for the myofibril CSA relative to the fiber CSA and this is how we presented the data in the main figure (i.e., Fig. 4E).

      3) Looking at Fig. 3D, how is intermyofibrillar space calculated when a significant proportion of the ROI is odd-shaped myofibrils that are not outlined? It is not clear how the intermyofibrillar space between the odd-shaped myofibrils is included in the total intermyofibrillar space calculation for the fiber.

      The area occupied by the intermyofibrillar components is calculated by using our custom “Intermyofibrillar Area” pipeline within CellProfiler. Briefly, the program creates a binary image of the SERCA signal. The area occupied by the white pixels in the binary image is then used to calculate the area that is occupied by the intermyofibrillar components. To help readers, an example of this process is now provided in Supplemental Figure 4.
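      As a rough illustration of the calculation described above, the following minimal sketch counts the "white" pixels of a thresholded image and converts the count into an area. It is not the CellProfiler pipeline itself: the function name, the fixed threshold, and the pixel area are hypothetical values chosen for illustration, and the thresholding method actually used in the pipeline is not stated here.

```python
def intermyofibrillar_area(image, threshold, pixel_area_um2):
    """Return (area in square micrometers, fraction of ROI) above threshold.

    image: 2D list of pixel intensities (hypothetical SERCA signal).
    threshold: assumed fixed cutoff; pixels above it count as "white".
    pixel_area_um2: assumed area of one pixel in square micrometers.
    """
    n_white = sum(1 for row in image for value in row if value > threshold)
    n_total = sum(len(row) for row in image)
    return n_white * pixel_area_um2, n_white / n_total


# Toy 4x4 image: a bright border (SERCA-positive) around a dark interior.
toy_image = [
    [9, 9, 9, 9],
    [9, 1, 1, 9],
    [9, 1, 1, 9],
    [9, 9, 9, 9],
]
area_um2, fraction = intermyofibrillar_area(toy_image, threshold=5, pixel_area_um2=0.01)
print(area_um2, fraction)
```

      Reporting the fraction alongside the absolute area keeps the value comparable between fibers of different size.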

      4) What is the average percentage of each ROI that was not counted by CP (because a myofibril did not fit the shape criteria)? The concern is that the method of collection is biasing the data. In looking at EM images of myofibrils (from other studies), it is apparent that myofibrils are not always oval; in fact, it appears that often myofibrils have a more rectangular shape. These odd-shaped myofibrils are excluded from the analysis yet they might provide important information; maybe these odd-shaped myofibrils always hypertrophy such that their inclusion might change the overall conclusion of the study. I completely understand the challenges of trying to quantify odd-shaped myofibrils. I think it is important the authors discuss this important limitation of the study.

      First, we would like to clarify that myofibrils of a generally rectangular shape were not excluded. The intent of the filtering steps was to exclude objects that exhibited odd shapes because of an incomplete closure of the signal from SERCA. To illustrate this point we have annotated the images from Figure 3B-D with a red arrow which points to a rectangular object and blue arrows which point to objects that most likely consisted of two or more individual myofibrils that were falsely identified as a single object.

      Author response image 2.

      We appreciate the reviewer's concern that differences in the exclusion rates between groups could have biased the outcomes. Indeed, this was something that we were keeping a careful eye on during our analyses, and we hope that the reviewer will take comfort in knowing that objects were excluded at a very similar rate in both the mouse and human samples (44% vs. 46% for SHAM vs. MOV in mice, and 47% vs. 47% for PRE vs. POST in humans). We realize that these important data should have been included in our original submission, and they are now contained within the results section of the revised version of our manuscript. Hopefully the explanation above, along with the inclusion of these data, will alleviate the reviewer's concern that differences between the groups may have been biased by the filtering steps.

      Discussion.

      1) I think the authors provided a balanced interpretation of the data by acknowledging the limitation of having only one time-point. i.e., not being able to assess the myofibril splitting mechanism.

      Thank you!

      2) I think a discussion on the important limitation of only quantifying oval-shaped myofibrils should be included in the discussion.

      Please refer to our response to comment #4 of the results section.

      Reviewer #2 (Recommendations For The Authors)

      Overall, this is a thoughtful, clear, and impactful manuscript that provides valuable tools and information for the skeletal muscle field. My specific comments are as follows:

      1) In the introduction, I really appreciate the historical aspect provided on myofibrillogenesis. As written, however, I was expecting the authors to tackle the myofibril "splitting" question in greater detail with their experiments given the amount of real estate given to that topic, but this was not the case. Consider toning this down a bit as I think it sets a false expectation.

      We acknowledge that the study does not directly address the question about myofibril splitting. However, we believe that it is important to highlight the background of this untested theory since it serves as a major part of the premise for the experiments that were performed.

      2) In the introduction, is it worth citing this study? https://rupress.org/jcb/articlepdf/111/5/1885/1464125/1885.pdf.

      This is a very interesting study but, despite the title, we do not believe that it is accurate to say that this study investigated myofibrillogenesis. Instead (as illustrated by the author in Fig. 9) the study focused on the in-series addition of new sarcomeres at the ends of the pre-existing myofibrils (i.e., it studied in-series sarcomerogenesis). In our opinion, the study does not provide any direct evidence of myofibrillogenesis, and we are not aware of any studies that have shown that the chronic stretch model employed by the authors induces myofibrillogenesis. However, numerous studies have shown that chronic stretch leads to the in-series addition of new sarcomeres.

      3) Is there evidence for myofibrillogenesis during cardiac hypertrophy that could be referenced here?

      This is a great question, and one would think that it would have been widely investigated. However, direct evidence for myofibrillogenesis during load-induced cardiac hypertrophy is just as sparse as the evidence for myofibrillogenesis during load-induced skeletal muscle hypertrophy.

      4) In the introduction, perhaps mention that prolonged fixation is another disadvantage of EM tissue preparation. This typically prevents the usage of antibodies afterwards, whereas the authors have been able to overcome this using their method, which is a great strength.

      Thank you for the suggestion. This point has been added to the 5th paragraph of the introduction.

      5) In the introduction, are there not EM-compatible computer programs that could sidestep the manual tracing and increase throughput? Why could software such as this not be used? https://www.nature.com/articles/s41592-019-0396-9

      While we agree that automated pipelines have been developed for EM, such methods require a high degree of contrast between the measured objects. With EM, the high degree of contrast required for automated quantification is rarely observed between the myofibrils and the intermyofibrillar components (especially in glycolytic fibers). Moreover, one of the primary goals of our study was to develop a time and cost-effective method for identifying and quantifying myofibrils. As such, we developed a method that would not require the use of EM. We only incorporated EM imaging and analysis to validate the FIM-ID method. Therefore, utilizing an EM-compatible program to sidestep the manual tracing would have sped up the validation step, but it would not have accomplished one of the primary goals of our study.

      6) In the results, specifically for the human specimens, were "hybrid" fibers detected and, if so, how did the pattern of SERCA look? Also, did the authors happen to notice centrally nucleated muscle fibers in the murine plantaris after overload? If so, how did the myofibrils look? Could be interesting.

      For the analysis of the human fibers, two distinct immunolabeling methods were performed. One set of sections was stained for SERCA1 and dystrophin, while the other set was stained for SERCA2 and dystrophin. In other words, we did not perform dual immunolabeling for SERCA1 and SERCA2 on the same sections. Therefore, during the analysis of the human fibers, we did not detect the presence of hybrid fibers. Furthermore, while we did not perform nuclear staining on these sections, it should be noted that nuclei do not contain SERCA, and to the best of our recollection, we did not detect any SERCA-null objects within the center of the fibers. Moreover, our previous work has shown that the model of MOV used in this study does not lead to signs of degeneration/regeneration (You, Jae-Sung et al. (2019). doi:10.1096/fj.201801653RR). Therefore, it can be safely assumed that very few (if any) of the fibers analyzed in this study were centrally nucleated.

      7) In the Results, fixed for how long? This is important since, at least in my experience, with 24+ hours of fixation, antibody reactivity is significantly reduced unless an antigen retrieval step is performed (even then, not always successful). Also, presumably these tissues were drop-fixed? These details are in the Methods but some additional detail here could be warranted for the benefit of the discerning and interested reader.

      For both the mouse and human, the samples were immersion-fixed (presumably the equivalent of “drop-fixed”) in 4% paraformaldehyde in 0.1M phosphate buffer solution for a total of 24 hours (as described in the Methods section). We agree that prolonged aldehyde fixation can affect antibody reactivity; however, the antibodies used for FIM-ID did not require an antigen retrieval step.

      8) In the results regarding NADH/FAD autofluorescence imaging, a complementary approach in muscle was recently described and could be cited here: https://journals.physiology.org/doi/full/10.1152/japplphysiol.00662.2022

      We appreciate the reviewer’s recommendation to add this citation for the support of our method for fiber type classification and have added it to the manuscript in the second paragraph under the “Further refinement and validation of the automated measurements with FIM-ID” subsection of the Results as citation number 57.

      9) In the results, "Moreover, no significant differences in the mean number of myofibrils per fiber CSA were found when the results from the FIM-ID and EM-based measurements were directly compared, and this point was true when the data from all analyzed fibers was considered..." Nit-picky, but should it be "were considered" since data is plural?

      Thanks, this error was corrected.

      10) In the discussion, are the authors developing a "methodology" or a "method"? I think it may be the latter.

      We agree that “method” is the correct term to use. Instances of the word “methodology” have been replaced with “method.”

      11) In the discussion, since the same fibers were not being tracked over time, I'm not sure that saying "radial growth" is strictly correct. It is intuitive that the fibers were growing during loading, of course, but it may be safer to say "larger fibers versus control or the Pre sample" or something of the like. For example, "all the fiber types that were larger after loading versus controls" as opposed to "showed significant radial growth"

      While we agree that the fiber size was not tracked over time, the experiments were designed to test for a main effect of mechanical loading. Therefore, we are attributing the morphological adaptations to the mechanical loading variable (i.e., mechanical load-induced growth). Terms like “the induction of radial growth” or “the induction of hypertrophy” are commonly used in studies that employ the methods used in this study. Respectfully, we believe that it would be more confusing for the readers if we used the suggested terms like "all the fiber types that were larger after loading versus controls". For instance, if I were the reader I would think to myself… but there were fiber types that were larger than others before loading (e.g., Ox vs. Gly), so what are the authors really trying to talk about?

      12) I would suggest making a cartoon summary figure to complement and summarize the Methods/Results/Discussion

      Thank you for this suggestion. We created a cartoon that summarizes the overall workflow for FIM-ID and this cartoon is now presented in Supplemental Figure 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public Review):

      Making state-of-the-art (super-resolution) microscopy widely available has been the subject of many publications in recent years as correctly referenced in the manuscript. By advocating the ideas of open-microscopy and trying to replace expensive, scientific-grade components such as lasers, cameras, objectives, and stages with cost-effective alternatives, interested researchers nowadays have a number of different frameworks to choose from. In the iteration of the theme presented here, the authors used the existing modular UC2 framework, which consists of 3D printable building blocks, and combined a cheapish laser, detector and x,y,(z) stage with expensive filters/dichroics and a very expensive high-end objective (>15k Euros). This particular choice raises a first technical question, to which extent a standard NA 1.3 oil immersion objective available for <1k would compare to the chosen NA 1.49 one.

      Measurement of the illumination quality (e.g. the spectral purity) of low-budget lasers convinced us of the necessity of spectral filtering. The filters cannot be replaced with lower-budget alternatives while still retaining the sensitivity necessary to image single molecules. As expected, the high-quality objectives are able to produce high-quality data. Lower-budget alternatives (<500 €) to replace the objective have been tried out. Image quality is reduced, but key features in fluorescent images can be identified (see figure S1). The usage of a low-budget objective for SMLM imaging is possible, but quality benchmarks such as identifying railroad tracks along microtubule profiles cannot be met. Such objectives are not optimal for applications aiming to visualize single molecules and might find better application in teaching projects.

      The choice of using the UC2 framework has the advantage, that the individual building blocks can be 3D printed, although it should be mentioned that the authors used injection-molded blocks that will have a limited availability if not offered commercially by a third party. The strength of the manuscript is the tight integration of the hardware and the software (namely the implementations of imSwitch as a GUI to control data acquisition, OS SMLM algorithms for fast sub-pixel localisation and access to Napari).

      The injection-molded cubes can be acquired through the OpenUC2 platform. Alternatively, the 3D printable version of the cubes is freely available and just requires the user to have a 3D printer. https://github.com/openUC2/UC2-GIT/tree/master/CAD/CUBE_EmptyTemplate

      The presented experimental data is convincing, demonstrating (1) extended live cell imaging both using bright-field and fluorescence in the incubator, (2) single-particle tracking of quantum dots, and (3) STORM measurements in cells stained against tubulin. In the following I will raise two aspects that currently limit the clarity and the potential impact of the manuscript.

      First, the manuscript would benefit from further refinement. Elements in Figure 1d/e are not described properly. Figure 2c is not described in the caption. GPI-GFP is not introduced. MMS (moment scaling spectrum) could benefit from a one sentence description of what it actually is. In Figure 6, the size of the STORM and wide-field field of views are vastly different, the distances between the peaks on the tubuli are given in micrometers rather than nanometers. (more in the section on recommendations for the author)

      Second, and this is the main criticism at this point, is that although all the information and data is openly available, it seems very difficult to actually build the setup due to a lack of proper documentation (as of early July 2023).

      1) The bill of materials (https://github.com/openUC2/UC2-STORM-and-Fluorescence#bill-of-material) should provide a link to the commercially available items. Some items are named in German. Maybe split the BoM in commercially available and 3D printable parts (I first missed the option to scroll horizontally).

      2) The links to the XY and Z stage refer to the general overview site of the UC2 project (https://github.com/openUC2/) requiring a deep dive to find the actual information.

      3) Detailed building instructions are unfortunately missing. How to assemble the cubes (pCad files showing exploded views, for example)? Trouble shooting?

      4) Some of the hardware details (e.g. which laser was being used, lenses, etc) should be mentioned in the manuscript (or SI)

      I fully understand that providing such level of detail is very time consuming, but I hope that the authors will be able to address these shortcomings.

      1) The bill of materials has been improved and will continue to be improved in the future. The items have been sorted into UC2 printed parts and externally acquired parts. The combination of part name and provider enables users to find and acquire the same parts. Additionally, depending on the country where the user is located, different providers of a given part might be advantageous, as delivery means and costs might vary.

      2) The Z-stage now has a specific repository offering different solutions with different levels of movement precision. Depending on the user and their budget, different solutions can be optimal for the endeavor.

      https://github.com/openUC2/UC2-Zstage

      The XY stage now also has a detailed repository, as the motorizing of the stage requires a fair amount of tinkering. The video tutorials and the detailed instructions on stage motorizing should help any user to reproduce the stage shown within this manuscript. https://github.com/openUC2/UC2-Motorized-XY-Table

      3) The updated repository has a short video showing the general assembly of the cubes and the layers. Additionally, figure S2 shows all the pieces that are included in every layer (as a photograph as well as CAD). An exploded view of the complete setup would certainly be a helpful visualization of the complete setup. We however hope that the presented assembly tutorials and documents are sufficient to successfully reproduce the U.C.STORM setup.

      First, we want to thank the reviewers for their effort to help us improve our work. We apologize for any trivial mistakes we had overlooked. Please find below our answers to the very constructive and helpful comments of the editors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      To complement the current data set:

      Figure 2(a & b): Panels i & ii, were chosen on the area where the distribution of the laser appears to be flatter. Can the authors select microtubules from a different section? Otherwise, it is reasonable to also crop the field-of-view along the flatter area (as done in Fig 6).

      Figure 2 was changed according to the reviewer’s suggestions. Microtubules from a different section have similar profiles, but the region with the best illumination, and thus the best SNR of the profile, has been used for the figure.

      Figure 2(c): The current plot shows the gaussian distribution which does not appear to be centered. Instead of a horizontal line, can the authors provide a diagonal profile across the field of view and update the panel below?

      A diagonal cross-section of the illuminated FOV is provided in figure 2 to replace the previous horizontal profile. The pattern seems not to be perfectly radially symmetric, and more light seems to be blocked at the bottom of the illumination pattern compared to the top. A possible improvement can be provided by a fiber-coupled laser, that could provide a more homogeneous illumination while being easier to handle in the assembly process.

      Author response image 1.

      Diagonal cross-section of the illuminated FOV. Pixel-size (104nm) is the same as in figure 2. Intensity has been normalized according to the maximal value.

      Figure 2(d): The system presents a XY drift of ~500nm over the course of a couple of hours. However, is not clear how the focus is being maintained. Can the authors clarify this point and add the axial drift to the plot?

      The axial position of the sample could be maintained over a prolonged period of time without correcting for drift. Measurements where an axial shift was induced by tension pulses in the electronics have been discarded, but the stability of the stage seems to be sufficient to allow for imaging without lateral and axial drift correction. The XY drift measurement displayed in Figure 2(d) can be extended by measuring the σ of the PSF over time. The increase of σ would suggest an axial displacement in relation to the focus plane. In these measurements, a slight axial drift can be seen, the fluorescent beads however can still be localized over the whole course of the measurement.

      A separate experiment was performed, using the same objective on the UC2 setup and on a high-quality setup equipped with a piezo actuator able to move in 10 nm steps. The precise Z steps of the piezo make it possible to sweep reproducibly through the PSF shape and to estimate the axial displacement of the sample from the changes in PSF FWHM (Full Width at Half Maximum). When superimposing the graph with the UC2 measurement of fluorescent beads acquired with the smallest possible Z step, an estimate of the relative axial position of the sample can be provided. The accuracy of the stage, however, remains limited.
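      The idea of reading an axial position off a FWHM calibration curve can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for the figures: the calibration table is made up, the function names are hypothetical, and only one monotonic branch of the curve is used, so the sketch returns the distance from focus without its sign. The only fixed fact used is the Gaussian relation FWHM = 2·sqrt(2·ln 2)·σ.

```python
import math

# For a Gaussian profile, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_from_sigma(sigma_nm):
    """Convert the fitted Gaussian sigma of a PSF into its FWHM."""
    return SIGMA_TO_FWHM * sigma_nm


def estimate_abs_z(fwhm_nm, calib_z_nm, calib_fwhm_nm):
    """Linearly interpolate the distance from focus for a measured FWHM.

    calib_fwhm_nm must be strictly increasing (one branch of the curve),
    with calib_z_nm holding the matching piezo positions.
    Values outside the calibrated range are clamped to its ends.
    """
    if fwhm_nm <= calib_fwhm_nm[0]:
        return calib_z_nm[0]
    for i in range(1, len(calib_fwhm_nm)):
        if fwhm_nm <= calib_fwhm_nm[i]:
            f0, f1 = calib_fwhm_nm[i - 1], calib_fwhm_nm[i]
            z0, z1 = calib_z_nm[i - 1], calib_z_nm[i]
            return z0 + (z1 - z0) * (fwhm_nm - f0) / (f1 - f0)
    return calib_z_nm[-1]


# Hypothetical calibration from a piezo Z stack (values are illustrative only).
calib_z_nm = [0.0, 100.0, 200.0]
calib_fwhm_nm = [300.0, 320.0, 380.0]

z_estimate_nm = estimate_abs_z(350.0, calib_z_nm, calib_fwhm_nm)
print(z_estimate_nm)
```

      Resolving the sign of the displacement would need additional information, such as an astigmatic PSF.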

      Author response image 2.

      Drift Figure: a. Drift of fluorescent TS beads on the UC2 setup positioned on an optical table over a duration of two hours. Beads are localized and the resulting displacements in i. and ii. are plotted in the graphs below. The procedure is repeated in b. with the microscope placed on a laboratory bench instead. c. (for the optical table i.) and d. (for the laboratory bench i.) show the variation in the sigma value of the localized beads over the measurement duration. As the sigma value changes when the beads are out of focus, the stability of the setup can be confirmed, since sigma remains practically unchanged over the measurement duration.

      Author response image 3.

      Z-focus Figure: Estimation of the axial position of TS beads on the UC2 setup. a. The change in PSF FWHM was quantified by acquiring a Z stack of a beads sample. The homebuilt high-quality setup (HQ) was used as a reference, by using the same objective and TS sample. The PSF FWHM on the UC2 setup was measured using the lowest possible axial stage displacement. A Z-position can thus be estimated for single molecules, as displayed in b.

      Addressing the seemingly correlated behavior of the X and Y drift:

      Further measurements show less correlation between drift in X and in Y. Simultaneous motion in X and Y seems to indicate that the stage or the sample is tilted. The collective movement in X and Y seems accentuated by bigger jumps, probably originating from vibrations (as shown more predominantly in the measurements on the laboratory bench compared to the optical table). Tension fluctuations inducing motion of the stage are possible but are highly unlikely to have induced the drift in the displayed measurements.
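      One simple way to quantify how strongly the X and Y drift move together is the Pearson correlation of the frame-to-frame displacements. The sketch below is only an illustration with made-up traces. The function names are hypothetical, and comparing increments rather than raw positions is our assumption, made so that a slow trend shared by both axes does not dominate the result.

```python
import math


def steps(trace):
    """Frame-to-frame displacements of a drift trace."""
    return [b - a for a, b in zip(trace, trace[1:])]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up drift traces in nm: y follows x, y_anti mirrors it.
x_drift = [0.0, 2.0, 3.0, 7.0, 8.0]
y_drift = [2.0 * x for x in x_drift]
y_anti = [-x for x in x_drift]

r_same = pearson_r(steps(x_drift), steps(y_drift))
r_anti = pearson_r(steps(x_drift), steps(y_anti))
print(r_same, r_anti)
```

      Values near +1 or -1 indicate coupled motion, as expected for a tilted stage or sample, while values near 0 indicate independent drift in the two axes.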

      Figure 3: Can the authors comment on the effect or otherwise potential effect of the incubator (humidity, condensation etc) may have on the system (e.g., camera, electronics etc)?

      When moving the microscope into the incubator, the first precaution is to check that the electronics used are able to operate at 37 °C. Placing the microscope inside the incubator can also induce condensation of water droplets at the cold interfaces, potentially damaging the electronics or reducing imaging quality. This can be prevented by preheating the microscope for a few hours, e.g. in an incubator without humidity, before placing it within the functional incubator. The incubator should also be checked for air streams (used to distribute the CO2), and direct exposure of the setup to the air stream should be prevented. A layer of foam material (e.g. polyurethane) under the microscope helps to reduce possible effects of incubator vibrations on the microscope. The hydrophilic character of PLA and its reduced thermal stability make its usage within the incubator challenging. The temperature also inherently reduces the mechanical stability of 3D printed parts. Using a less hydrophilic and more thermally stable plastic, such as ABS, combined with a higher percentage of infill is the empirical solution to this challenge. Further options and designs to improve the usage of the microscope within the incubator are still in development.

      Figure 5: Can the authors perform single molecule experiments with an alternative tag such as Alexa647?

      The SPT experiments were performed with QDs to make use of their photostability and brightness. The dSTORM experiment suggests that imaging single AF647 molecules with sufficient SNR is possible. The usage of AF647 for SPT is possible but would reduce the accuracy of the localization and shorten the acquired track lengths, due to the blinking properties of AF647 when illuminated. The tracking experiment with the QDs was thus a proof of concept that SPT experiments are possible and that they reproduce the diffusion coefficients published in the literature. The usage of alternative tags can be an interesting extension that users can explore for their own applications.
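      For readers less familiar with how a diffusion coefficient is obtained from a track, a generic textbook estimate is sketched below. It assumes free two-dimensional Brownian motion, for which MSD(τ) = 4Dτ, and it is not the MMS-based analysis used in the manuscript. The function names and all numbers are hypothetical.

```python
def mean_squared_displacement(track_xy, max_lag):
    """Time-averaged MSD of one 2D track for lags 1..max_lag (in frames)."""
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (x1 - x0) ** 2 + (y1 - y0) ** 2
            for (x0, y0), (x1, y1) in zip(track_xy, track_xy[lag:])
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def fit_diffusion_coefficient(lag_times_s, msd_um2):
    """Least-squares slope of MSD versus lag time, divided by 4 (2D diffusion).

    The fitted intercept is discarded; it absorbs a constant offset such as
    the one caused by localization error.
    """
    n = len(lag_times_s)
    mean_t = sum(lag_times_s) / n
    mean_m = sum(msd_um2) / n
    numerator = sum((t - mean_t) * (m - mean_m) for t, m in zip(lag_times_s, msd_um2))
    denominator = sum((t - mean_t) ** 2 for t in lag_times_s)
    return (numerator / denominator) / 4.0


# Illustrative MSD values in square micrometers: slope 2.0, offset 0.01.
lag_times_s = [0.1, 0.2, 0.3]
msd_um2 = [0.21, 0.41, 0.61]
d_um2_per_s = fit_diffusion_coefficient(lag_times_s, msd_um2)
print(d_um2_per_s)
```

      Shorter tracks, as expected for a blinking dye such as AF647, leave fewer displacement pairs per lag, which makes the MSD values and the fitted slope noisier.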

      Figure 6: The authors demonstrate dSTORM of microtubules. It would enhance the paper to also demonstrate 3D imaging (e.g., via cylindrical lens).

      The usage of a cylindrical lens for 3D imaging was not performed yet. The implementation would not be difficult, given the high modularity of the setup in general. The calibration of the PSF shape with astigmatism might however be challenging as the vertical scanning of the Z-stage lacks reliability in its current build. Methods such as biplane imaging might also be difficult to implement, as the halved number of photons in each channel leads to losses in the accuracy of localization. As a future improvement of the setup, the option of providing 3D information with single molecule accuracy is definitely desirable and will be tried out. In the following figure, two concepts for introducing 3D imaging capabilities in the detection layer of the microscope are presented.

      Author response image 4.

      3D concept Figure: Two possible setup modifications to provide axial information when imaging single molecules. a. A cylindrical lens can be placed to induce an asymmetry between the PSF FWHM in x and in y. Every Z position can be identified by two distinct PSF FWHM values in X and Y. b. By splitting the beam in two and defocusing one path, every PSF will have a specific set of values for its FWHM on the two detectors.

      Imaging modalities section: Regarding the use of cling film to diffuse; can the authors comment on the continual use of this approach, including its degradation over time?

      The cling foil was only used as a diffuser for broadening the laser profile. A detailed analysis of the constitution of the foil was not done, as no visible changes could be seen in the illumination pattern or in the foil itself. The piece of cling foil is attached to a rotor. Detachment of the cling foil and vibrations originating from the rotor need to be minimized. By keeping the rotation speed to the necessary minimum and attaching the cling foil correctly to the rotor, a usable solution can be created. The low price of the cling foil makes it possible to exchange the foil on a regular basis, keeping it in optimal condition.

      Author response image 5.

      Profile Figure: By moving a combination of pinhole and photometer to scan through the laser profile with a translational mount, the shape of the laser beam can be estimated. The cling foil plays the same role as a diffuser in other setups.

      Reviewer #2 (Recommendations for The Authors):

      lines

      20, add "," after parts

      110, rotating cling foil?

      112/116, "custom 3D printed" I thought they were injection molded, please finalize

      113, "puzzle pieces" rephrase and they are also barely visible

      119, not clear that the stage is a manual stage that was turned into a motorised one by adding belts

      123-126, detail for SI,

      132, replace Arduino-coded with Arduino-based

      143, add reference to Napari

      146, (black) cardboard seems to be a cheaper and quicker alternative

      153, dichroic

      151-155, reads more like a blog post than a paper (maybe add a section on trouble shooting)

      156, antibody?

      167/189, moderate, please be specific

      194, layer of foam material, specify

      221, add description/reference to GPI. What is that? why is it relevant?

      226: add one sentence description of MMS

      318, add "," after students

      332-334, as mentioned earlier, not clear, you bought a manual stage and connected belts, correct?

      376-377, might be difficult to understand for the layman

      391, what laser was used?

      Figure 1, poor contrast between components, components visible should be named as much as possible, maybe provide the base layer in a different shade. To me, the red and blue labels look like fluorophores.

      Figure 1. looks like d is the excitation layer and not e, please fix.

      Figure 2, caption a-c, figure 1-d!, btw, why is the drift so anti-correlated?

      Figure 6 (line 259) nanometer I guess, not micrometer

      We now incorporated all the above-mentioned changes in the manuscript. Furthermore we added the supplementary Figures as below.

      Author response image 6.

      Basic concept of the UC2 setup: Left: Cubes (green) are connected to one another via puzzle pieces (white). Middle: 3D printed mounts have been designed to adapt various optics (right) to the cube framework. Combined usage of cubes and design of various mounts allows to interface various optics for the assembly.

      Author response image 7.

      Building the UC2 widefield microscope: a. Photograph of the complete setup. b. All pieces necessary to build the setup. A list of the components can be found in the bill of materials. c. Bottom emission layer of the microscope before assembly. d. Emission layer after assembly. Connection between cubes is doubled by using a layer of puzzles on the top and the bottom of the emission layer. e. CAD schematic of the emission layer and the positioning of the optics. f. Middle excitation layer of the microscope before assembly. Beam magnifier and homogenizer have been left out for clarity. g. Excitation layer after assembly is also covered by a puzzle layer. h. CAD schematic of the excitation layer and the positioning of the optics. i. Z-stage photograph and corresponding CAD file. Motor of the stage is embedded within the bottom cube. j. A layer of empty cubes supports the microscope stage. k. At this stage of the assembly, the objective is screwed into the objective holder. l. Finally, the stage is wired to the electronics and can then be mounted on top of the microscope (see a.).

      Author response image 8.

      Measurements performed on the UC2 setup with lower-budget objectives. The imaged sample is HeLa cells, stably transfected to express CLC-GFP, then labelled with AF647 through immunostaining. The setup has been kept identical except for the objectives. Scale bars represent 30 µm.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      This paper now provides a convincing presentation of valuable results of the drivers of nest construction for one termite species, and they briefly discuss possible relevance to other termite species. However, the authors have not yet addressed how their results may be important outside the field of termite nest construction. I could imagine the significance of the paper being elevated to important if there is a broader discussion about the impact of this work, e.g., the relevance of the results, the approach, and/or next steps to related fields outside of termite nest construction.

      Reading our manuscript again, we have to agree with the reviewer that we mostly focused the discussion of our results in the context of termite construction, without attempting to generalise to other systems. To some extent we still defend this choice, as we prefer not to make too many claims on the relevance of our results beyond what we can reasonably support with our own experimental results. However, we thought that it would be appropriate – as suggested by the reviewer – to add at least one paragraph to indicate how our results could be extrapolated to other systems. This new paragraph is now at the end of the discussion section.

      Here we elaborate a bit further on this point. While termites certainly build the most complex structures found in the natural world, few other animals are capable of collectively building complex structures. Typically, collective building activity is limited to highly social (typically eusocial) animals. However, other social insects, such as ants and wasps, are phylogenetically distant from termites, and their nests are often different: the large majority of ant nests only comprise excavated galleries with little construction, while wasp nests tend to comprise multiple repeated patterns that could be produced from stereotyped individual behaviour. Because of these differences, drawing a comparison between the mechanisms that regulate termite architecture and those that regulate other forms of animal architecture would be too speculative. One domain, however, where mechanisms similar to those that we describe here could operate is that of pattern formation at the cellular and tissue level, where surface curvature was shown to drive different phenomena from cell migration to tissue growth. A comment on this has now been added to the manuscript at the very end of the discussion.

      Similarly, on a related note, as someone not directly in the field of termite nest construction but wanting to understand the system (and the results) presented here in a broader context, I found the additional information about species and natural habitat very helpful and interesting, though I was rather disappointed to find it relegated to supplementary material where most readers will not see it.

      We considered this suggestion to move more information about the natural nesting habits of the termites that we study into the main text, but eventually decided to leave it in the supplementary material only. We feel that the nesting habits of the termites that we studied here are not too central to the problem that we want to focus on, namely how they coordinate their building activity. In fact, there is a large variety of nesting habits across termite genera and species, but we believe that, at a basic level, the mechanisms that we describe here would also apply to species with different nesting habits, because our results are consistent with what is described in the scientific literature for other termite species. As our introduction is already a bit long, we left this description of Coptotermes nesting habits in the supplementary material, where, hopefully, it will still be accessible and useful to readers interested in finding this information.

      When providing responses to reviewers, please directly address the reviewers’ comments point-by-point rather than summarizing comments and responding to summaries.

      We apologize for the way we previously responded to comments, and we thank the reviewer for this remark as we learn to navigate the eLife reviewing system (where some comments are repeated in the overall assessment and in the feedback of individual reviewers).

      Figure 2 colors: Panels A and E and maybe B do not seem colorblind-friendly. I suggest modifying the colormaps to address this.

      We have changed the colormaps of panels A, B and E of Figure 2, which are now colorblind-friendly.

      Line 180: This system is not in equilibrium. Perhaps the authors mean "steady-state?" I suggest reviewing language to ensure that the correct technical terms are used.

      We have now corrected this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      HP1 plays a pivotal role in orchestrating chromatin packaging through the creation of biomolecular condensates. The existence of distinct homologs offers an intriguing avenue for investigating the interplay between genetic sequence and condensate formation. In this study, the authors conducted extensive coarse-grained simulations to delve into the phase separation behavior of HP1 paralogs. Additionally, the researchers delved into the captivating possibility of various HP1 paralogs co-localizing within assemblies composed of multiple components. Importantly, the study also delved into the critical role of DNA in finely tuning this complex process.

      Strengths:

      I applaud the authors for their methodical approach in conducting simulations aimed at dissecting the contributions of hinges, CTE, NTE, and folded regions. The comprehensive insights unveiled in Figure 3 compellingly substantiate the significance of these protein components in facilitating the process of phase separation.

      This systematic exploration has yielded several innovative revelations. Notably, the authors uncovered a nuanced interplay between the folded and disordered domains. Although disordered regions have traditionally been linked to driving phase separation through their capacity for forming multivalent interactions, the authors have demonstrated that the contribution of the CD cannot be overlooked, as it significantly impacts the saturation concentration.

      The outcomes of this study serve to elucidate the intricate mechanisms and regulatory aspects governing HP1 LLPS.

      Weaknesses:

      The authors do not provide an assessment of the quantitative precision of their model. To illustrate, HP1a is anticipated to undergo phase separation primarily under low salt concentrations. Does the model effectively capture this sensitivity to salt conditions? Regrettably, the specific salt conditions employed in the simulations are not explicitly stated. While I anticipate that numerous findings in the manuscript remain valid, it could be beneficial to acknowledge potential limitations tied to the simulations. For instance, might the absence of quantitative precision impact certain predictions, such as the CD's influence on phase separation?

      We thank the reviewer for their kind feedback and for highlighting the essential mechanistic insights obtained from our study. We have addressed the concerns raised by the reviewer below, and the specific amendments made in the manuscript are also delineated.

      We appreciate the reviewer's comment on our model. Our coarse-grained (CG) physics-based model integrates electrostatic and short-range interactions, parametrized based on the Urry hydrophobicity scale. This approach effectively bridges the timescale gap between simulation and experiment, offering a transferable framework to compute protein phase diagrams in temperature-concentration space that can be compared to experimental phase behavior (1). Additionally, the vdW contact probability per residue correlation between AA and CG simulations (Fig. S1 f-h) underscores our model’s capability to uncover the mechanistic insights into the phase separation of HP1 paralogs. Despite its simplicity and widespread adoption for studying sequence-dependent phase separation in biomolecular condensates, we recognize that our CG model does not yet fully replicate experimental observations or the nuanced effects of local secondary structures on phase-separation propensities. We are actively refining our methods and exploring new strategies to enhance the accuracy and efficiency of CG models for the study of biological phase separation.

      In assessing the influence of salt on the LLPS of HP1α, we note that Wang et al. (2) demonstrated that HP1α can undergo LLPS at a low salt concentration (50 mM KCl). Furthermore, Wohl et al. (3) showed that the CG HPS (Kapcha-Rossky) model can capture the salt-dependent LLPS behavior through the electrostatic screening in HP1a, a Drosophila homolog of human HP1α. In our CG model, the salt concentration is captured by the Debye-Hückel term with tunable screening lengths, which allows for the simulations of salt-dependent effects in the low salt regime. We have added Figure S5 to illustrate the influence of salt on the LLPS propensity of HP1α. In the low-salt regime (50 mM), the Csat of HP1α was reduced by twofold compared to that at 100 mM. Increasing the salt concentration to 150 mM raised the Csat and started destabilizing the condensate. In the high salt regime (200-500 mM), HP1α did not undergo phase separation, consistent with the experimental observations (2, 4–6).

      Author response image 1.

      Salt-dependent effects on the LLPS of HP1α homodimer. (a, b) Density profiles and snapshots of HP1α homodimer simulation with the box dimensions of 170x170x1190 Å³ at differing salt concentrations, 50, 100, 150, 200, 250, and 500 mM, respectively. The simulations were conducted at 320 K using the HPS-Urry model.
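
      To make the role of the screening term more concrete, the following minimal Python sketch computes the Debye screening length for the salt concentrations listed in the caption above. It is an illustration only and not the authors' simulation code: the function names, the fixed relative permittivity of 80, and the treatment of the salt as a simple 1:1 electrolyte are assumptions made here. The default temperature of 320 K matches the stated simulation temperature.

```python
import math

# Physical constants in SI units.
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
NA = 6.02214076e23           # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19   # elementary charge, C


def debye_length_angstrom(salt_mM, temperature_K=320.0, rel_permittivity=80.0):
    """Debye screening length in angstrom for a 1:1 salt.

    Assumption: the relative permittivity is held fixed at 80 regardless
    of temperature, which is a simplification.
    """
    if salt_mM <= 0:
        raise ValueError("salt concentration must be positive")
    # For a monovalent salt the ionic strength equals the concentration,
    # and 1 mM is exactly 1 mol/m^3.
    ionic_strength = salt_mM
    kappa_sq = (2.0 * NA * E_CHARGE ** 2 * ionic_strength) / (
        EPS0 * rel_permittivity * KB * temperature_K
    )
    return 1e10 / math.sqrt(kappa_sq)  # metres -> angstrom


def screening_factor(r_angstrom, salt_mM, **kwargs):
    """Factor exp(-r / lambda_D) that damps a Coulomb pair term at distance r."""
    return math.exp(-r_angstrom / debye_length_angstrom(salt_mM, **kwargs))


if __name__ == "__main__":
    for conc in (50, 100, 150, 200, 250, 500):
        print(f"{conc:4d} mM -> Debye length {debye_length_angstrom(conc):5.2f} angstrom")
```

      Under these assumptions the screening length is roughly 10 Å at 100 mM and halves between 50 and 200 mM, which is the qualitative trend behind the weakening of electrostatic attraction at high salt.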

      However, the primary objectives of our study are to elucidate the molecular interactions and to delineate the domain contributions that dictate the distinct phase-separation behaviors of the HP1 paralogs. To this end, we standardized our simulation conditions to a physiological salt concentration of 100 mM for all paralog constructs, facilitating a direct comparison and enabling physiologically relevant predictions, including those for the CD domain. We have added the salt concentration used in the CG simulations in the Materials and Methods section, relevant figure captions, and the following sentence in the third paragraph of the Discussions section to improve clarity.

      “…Our CG simulations corroborate these experimental observations, indicating that a low salt concentration (50 mM) promotes the LLPS of HP1α. Raising the salt concentration weakens the electrostatic interactions and increases the Csat, eventually precluding HP1α’s phase separation at high salt regimes (200-500 mM) (Fig. S5).”

      Reviewer #2 (Public Review):

      In this paper, Phan et al. investigate the properties of human HP1 paralogs, their interactions and abilities to undergo liquid-liquid phase separation. For this, they use a coarse-grained computational approach (validated with additional all-atom simulations) which allows them to explore complex mixtures. Matching (wet-lab) experimental results, HP1 beta (HP1b) exhibits different properties from HP1 alpha and gamma (HP1a,g), in that it does not phase separate. Using domain switch experiments, the authors determine that the more negatively charged hinge in HP1b, compared to HP1a and HP1g, is mainly responsible for this effect. Exploring heterotypic complexes, mixtures between HP1 subtypes and DNA, the authors further show that HP1a can serve as a scaffold for HP1b to enter into condensed phases and that DNA can further stabilize phase separated compartments. Most interestingly, they show that a multicomponent mixture containing DNA, and HP1a and HP1b generates spatial separation between the HP1 paralogs: due to increased negative charge of DNA within the condensates, HP1b is pushed out and accumulates at the phase boundary. This represents an example of how complex assemblies could form in the cell.

      Overall, this is purely computational work, which however builds on extensive experimental results (including from the authors). The methods showcase how coarse-grained models can be employed to generate and test hypotheses about how proteins can condense. Applied to HP1 proteins, the results from this tour-de-force study are consistent and convincing, within the experimental constraints. Moreover, they generate further models to test experimentally, in particular in light of multicomponent mixtures.

      There are, of course, some limitations to these models.

      First, the CG models employed probably will not be able to pick up more complex structure-driven interactions (i.e. specific binding of a peptide in a protein cleft, including defined H-bonds, or induced structural elements). Some of those interactions (i.e. beyond charge-charge or hydrophobics) may also play a role in HP1, and might be ignored here. There is also the question of specificity, i.e. how can diverse phases coexist in cells, when the only parameters are charge and hydrophobicity? Does the arrangement of charges in the NTD, hinges and CTDs matter or are only the average properties important?

      We thank the reviewer for the thoughtful comments. We also appreciate the opportunity to incorporate the feedback on the reviewer’s concerns below.

      We agree that the interaction picture becomes more sophisticated, and many interaction modes may be involved in the phase coexistence in the cell environment. However, due to system sizes and required sampling, studying LLPS at an atomistic resolution remains challenging with the current state-of-the-art computer hardware. Our approach employs the CG model to reduce the computational cost but still capture the predominant interactions at the residue level. We have added the plots (Fig. S1 f-h) to show the correlation of the vdW contact probability per residue for each paralog between AA and CG simulation. The Pearson correlation coefficient is approximately 0.86, suggesting a strong positive linear correlation in the contact propensity between AA and CG simulations.

      Author response image 2.
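
      The correlation quoted above can be illustrated with a short, hedged Python sketch. It assumes that each model yields one contact probability per residue, obtained here by averaging each row of a square contact-probability map; how the authors actually reduced their maps is not described in this response, and the function names are hypothetical.

```python
import math


def per_residue_profile(contact_map):
    """Average each row of a square contact-probability map.

    Assumption: the per-residue contact probability is the mean over
    all partner residues. Other reductions (e.g. sums) are possible.
    """
    return [sum(row) / len(row) for row in contact_map]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Pearson r is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

      Comparing per-residue profiles rather than full maps reduces the impact of sparse all-atom sampling, because each residue's value pools contacts with all of its partners.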

      Our sequence analysis reveals a high fraction of charged residues in HP1 paralogs, with Arg, Lys, Glu, and Asp constituting 39-45% of the total amino acid count in the sequence. This property may explain why the electrostatic interactions are predominantly involved in the phase-separation behaviors of HP1 paralogs. Our findings on electrostatically driven phase separation and co-localization of HP1 paralogs are consistent with experimental observations by Larson et al. and Keenen et al. (5, 6). Significantly, we observe that the charge patterning in the disordered regions (NTE, hinge, and CTE) plays a critical role in the LLPS of HP1 paralogs, as articulated in the second paragraph of the Discussions section. Modifying this charge patterning, such as by phosphorylating serine residues in HP1α, excising the HP1α CTE, or substituting four acidic residues with basic ones in the HP1β hinge, can profoundly augment the LLPS of these proteins (4, 5, 7). Our in silico molecular details, complemented by in vitro observations, lay a solid foundation for future experiments. These future investigations may delve deeper into the specificity of interactions and the role of structural elements in modulating HP1 phase separation.
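
      As a rough illustration of the sequence statistic mentioned above, the sketch below counts Arg, Lys, Glu and Asp in a one-letter amino acid sequence and reports their fraction and the resulting net charge. The helper names and the toy sequence in the usage note are hypothetical and are not HP1 sequences; histidine and the chain termini are ignored for simplicity.

```python
# Formal charges assigned to the four residue types named in the text.
CHARGED = {"R": 1, "K": 1, "D": -1, "E": -1}


def charged_fraction(sequence):
    """Fraction of residues that are Arg, Lys, Asp or Glu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for aa in seq if aa in CHARGED) / len(seq)


def net_charge(sequence):
    """Sum of formal charges over Arg, Lys, Asp and Glu residues."""
    return sum(CHARGED.get(aa, 0) for aa in sequence.upper())
```

      For the toy sequence `MKKREEDDGS`, `charged_fraction` gives 0.7 and `net_charge` gives -1. Applying the same functions separately to the NTE, hinge and CTE segments would give a first, coarse view of how charge is distributed along a paralog, although it does not capture charge patterning within a segment.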

      Second, the authors fix CSD-CSD dimers, whereas these interactions are expected to be quite dynamic. In the particular example of HP1 proteins, having dimerization equilibria may change the behavior of complex mixtures significantly, e.g. in view of the proposed accumulation of HP1b at a phase boundary. This point would warrant more discussion in the paper. Moreover, the biological plausibility of such a behavior would be interesting. Is there any experimental data supporting such assemblies?

      We appreciate the reviewer's insightful comment regarding the dynamic nature of CSD-CSD interactions in HP1 proteins. Our assumption of fixing CSD-CSD dimers is grounded on reported dissociation constant (Kd) values for HP1α and HP1β, which are within the nanomolar range, indicative of strong dimerization affinity (4, 8). While the precise Kd values for HP1γ are not available, a study has demonstrated that HP1γ dimerization is crucial for its interaction with chromatin, suggesting a similar strong dimerization tendency as its paralogs (9, 10). Furthermore, evidence from the literature underscores the dimeric functionality of HP1 paralogs facilitated by their ChromoShadow Domains (CSD), which are instrumental in forming stable genomic domains and engaging in crucial interactions within chromatin architecture (5, 6, 11).
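
      The argument that a nanomolar Kd justifies treating the dimers as fixed can be checked with a simple monomer-dimer equilibrium. The sketch below is an assumption-laden illustration, not part of the authors' model: it considers a single homodimerization reaction 2M ⇌ D with Kd = [M]²/[D], ignores heterodimers and crowding, and uses a hypothetical function name.

```python
import math


def dimer_fraction(total_protein, kd):
    """Fraction of protomers found in dimers for 2M <-> D.

    total_protein and kd must be in the same concentration units.
    Mass balance: total = [M] + 2[D], with [D] = [M]^2 / kd, so
    2[M]^2 + kd*[M] - kd*total = 0.
    """
    if total_protein <= 0 or kd <= 0:
        raise ValueError("concentrations must be positive")
    monomer = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_protein)) / 4.0
    return 1.0 - monomer / total_protein
```

      With hypothetical values of Kd = 100 nM and a total protomer concentration of 100 µM (both in nM: `dimer_fraction(100_000, 100)`), about 98% of protomers are dimeric. Concentrations inside a condensate are typically much higher than this, which would push the fraction even closer to one.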

      However, we acknowledge that despite the strong dimerization affinity, the CSD-CSD interactions exhibit dynamics, which may influence the behavior of complex mixtures, particularly at phase boundaries. A study by Nielsen et al. (12) shows that mammalian HP1 paralogs can interact directly with one another to form heterodimers. Moreover, the CSD-CSD interface has been shown to act as a hub for transient interactions with diverse binding partner proteins (5, 13). These experimental observations reflect the dynamic nature of CSD-CSD interactions. However, due to the computational constraints and the focus of our study, a simplified static model was employed to gain initial insights into the phase separation behaviors of HP1 paralogs. We believe that the dynamic nature of CSD-CSD interactions and its implications for phase behavior in complex mixtures form an exciting avenue for future computational and experimental studies.

      In light of the reviewer’s comment, we have expanded our discussion in the 6th paragraph of the Discussions Section:

      “... It is important to emphasize that our model is predicated on the assumption that HP1 proteins establish stable chromoshadow domain (CSD-CSD) dimers, a hypothesis supported by their Kd values being in the nanomolar range (13, 53). While this simplification serves as a useful starting point, it may not fully capture the dynamic nature of HP1 dimerization. Further computational and experimental studies are needed to understand better the behavior of the complex mixtures of HP1 paralogs, particularly at phase boundaries.”

      References:

      1) R. M. Regy, J. Thompson, Y. C. Kim, J. Mittal, Improved coarse‐grained model for studying sequence dependent phase separation of disordered proteins. Protein Sci., doi: 10.1002/pro.4094 (2021).

      2) L. Wang, Y. Gao, X. Zheng, C. Liu, S. Dong, R. Li, G. Zhang, Y. Wei, H. Qu, Y. Li, C. D. Allis, G. Li, H. Li, P. Li, Histone Modifications Regulate Chromatin Compartmentalization by Contributing to a Phase Separation Mechanism. Mol. Cell 76, 646-659.e6 (2019).

      3) S. Wohl, M. Jakubowski, W. Zheng, Salt-Dependent Conformational Changes of Intrinsically Disordered Proteins. J. Phys. Chem. Lett. 12, 6684–6691 (2021).

      4) C. Her, T. M. Phan, N. Jovic, U. Kapoor, B. E. Ackermann, A. Rizuan, Y. C. Kim, J. Mittal, G. T. Debelouchina, Molecular interactions underlying the phase separation of HP1α: role of phosphorylation, ligand and nucleic acid binding. Nucleic Acids Res., gkac1194 (2022).

      5) A. G. Larson, D. Elnatan, M. M. Keenen, M. J. Trnka, J. B. Johnston, A. L. Burlingame, D. A. Agard, S. Redding, G. J. Narlikar, Liquid droplet formation by HP1α suggests a role for phase separation in heterochromatin. Nature 547, 236–240 (2017).

      6) M. M. Keenen, D. Brown, L. D. Brennan, R. Renger, H. Khoo, C. R. Carlson, B. Huang, S. W. Grill, G. J. Narlikar, S. Redding, HP1 proteins compact DNA into mechanically and positionally stable phase separated domains. eLife 10, 1–38 (2021).

      7) W. Qin, A. Stengl, E. Ugur, S. Leidescher, J. Ryan, M. C. Cardoso, H. Leonhardt, HP1β carries an acidic linker domain and requires H3K9me3 for phase separation. Nucleus 12, 44–57 (2021).

      8) S. V. Brasher, The structure of mouse HP1 suggests a unique mode of single peptide recognition by the shadow chromo domain dimer. EMBO J. 19, 1587–1597 (2000).

      9) X. Li, S. Wang, Y. Xie, H. Jiang, J. Guo, Y. Wang, Z. Peng, M. Hu, M. Wang, J. Wang, Q. Li, Y. Wang, Z. Liu, Deacetylation induced nuclear condensation of HP1γ promotes multiple myeloma drug resistance. Nat. Commun. 14, 1290 (2023).

      10) Y. Mishima, C. D. Jayasinghe, K. Lu, J. Otani, M. Shirakawa, T. Kawakami, H. Kimura, H. Hojo, P. Carlton, S. Tajima, I. Suetake, Nucleosome compaction facilitates HP1γ binding to methylated H3K9. Nucleic Acids Res. 43, 10200–10212 (2015).

      11) D. O. Trembecka-Lucas, J. W. Dobrucki, A heterochromatin protein 1 (HP1) dimer and a proliferating cell nuclear antigen (PCNA) protein interact in vivo and are parts of a multiprotein complex involved in DNA replication and DNA repair. Cell Cycle 11, 2170–2175 (2012).

      12) A. L. Nielsen, M. Oulad-Abdelghani, J. A. Ortiz, E. Remboutsika, P. Chambon, R. Losson, Heterochromatin formation in mammalian cells: Interaction between histones and HP1 Proteins. Mol. Cell 7, 729–739 (2001).

      13) A. Thiru, D. Nietlispach, H. R. Mott, M. Okuwaki, D. Lyon, P. R. Nielsen, M. Hirshberg, A. Verreault, N. V. Murzina, E. D. Laue, Structural basis of HP1/PXVXL motif peptide interactions and HP1 localisation to heterochromatin. EMBO J. 23, 489–499 (2004).

      14) P. Yu Chew, J. A. Joseph, R. Collepardo-Guevara, A. Reinhardt, Thermodynamic origins of two-component multiphase condensates of proteins. Chem. Sci. 14, 1820–1836 (2023).

      Recommendations for the authors:

      In this important work, the authors apply a residue-resolution protein coarse-grained model to investigate the differences in molecule dimensions and phase behaviour of three HP1 paralogs, HP1 paralog mixtures, and HP1/DNA mixtures. The simulations are well designed to investigate the impact of HP1 sequence on its phase behaviour. The work reveals that electrostatic interactions are a key determinant of HP1 paralog phase behaviour; hence advancing our understanding of the molecular mechanisms driving the phase separation behaviour of HP1 paralogs. Notably, the authors uncovered a nuanced interplay between the folded and disordered domains of HP1. Although disordered regions have traditionally been linked to driving phase separation through their capacity for forming multivalent interactions, the authors demonstrate that the contribution of the CD cannot be overlooked, as it significantly impacts the saturation concentration.

      Essential revisions (based on reviewers' assessment below):

      1) The manuscript describes the results of both single-molecule simulations and direct coexistence simulations. However, it is not very easy for the reader to determine which types of simulations were performed in each section. The details on the simulation input parameters are also missing. Such details are needed throughout, i.e. to allow readers to follow the work and its implications. For instance, the specific salt conditions employed in the simulations are not explicitly stated. Since HP1 charge is presented as a key regulator for the modulation of HP1 paralogs' radii of gyration and their phase behaviour, it is crucial for the authors to explicitly describe the salt concentration used for the different simulations and highlight how the relative differences observed are expected to change as the salt concentration decreases/increases.

      We have turned the first sentences in the paragraphs into subtitles to describe the results of single homodimers in dilute phase and multi-dimers in phase coexistence simulations.

      “Sequence variation affects the conformations of HP1 paralogs in the dilute phase.”

      “Sequence variation in HP1 paralogs leads to their distinct phase separation behaviors.”

      To improve the clarity, we have also added the following sentence to Fig. 2 caption.

      “… Figs. 2a-e show the results obtained under dilute conditions, while Figs. 2f-m illustrate the conditions of phase coexistence.”

      We have specified the salt concentration used in the CG simulations in the Materials and Methods section and the relevant figure captions to improve clarity. We also addressed the reviewer’s comment on salt concentration in the public review above.

      2) Since direct coexistence simulations suffer from important finite-size effects, especially for multi-component mixtures as those investigated here, describing how many proteins/DNA copies were used per system, the size of the simulation, and which checks were done to check for finite-size effects is important. Regarding this point, estimating C_sat from Direct Coexistence simulations is extremely challenging, given the sensitivity of the dilute phase concentration to the box dimensions. Hence, it would be valuable if the authors clarify that the differences on C_sat provided represent a qualitative comparison and are sensitive to the simulation conditions. Importantly, the observation of spatial segregation of components in multi-component condensates could be an artefact of the box dimensions, relative copies of the various components, and overall system density.

      We appreciate the reviewer’s concern regarding the finite-size effects in phase coexistence simulations and potential artifacts arising from box dimensions and system composition. In response to this, we have expanded the Materials and Methods section to elaborate on the specific checks performed to examine the finite-size effects. The new text and additional SI figures are shown below.

      “Previous studies have demonstrated that slab geometry can help mitigate finite-size effects and facilitate efficient sampling of the phase diagram (41). To assess the potential impact of finite-size effects with our chosen box dimensions, we conducted a test using the HP1α homodimer, which serves as a representative system given the comparable sequence lengths of HP1 paralogs and their chimeras. By reducing the system size by 30% and constructing its phase diagram, we observed that both the original system size (50 dimers) and the reduced counterpart (35 dimers) produced similar phase diagrams, with critical temperatures of 353.3 K and 352.1 K, respectively, as shown in Figs. S4a,b.

      We further evaluated the influence of the xy cross-sectional area on the measurement of Csat. With the z-direction box length fixed at 1190 Å, we varied the xy cross-sectional areas (120x120, 150x150, and 200x200 Ų) while maintaining the protein density consistent with the control case (170x170 Ų). Given that HP1 dimers are multidomain proteins, a 120x120 Ų cross-section was the minimum size feasible to prevent particle overlap in HOOMD simulations due to the constraints of the small box size. Our findings indicate that the condensates remained stable across all tested cross-sectional areas and that there were no significant differences in Csat measurements within the margin of error, as depicted in Figs. S4c,d. These results confirm that our chosen box size is sufficiently large to minimize finite-size effects, thus ensuring the robustness of our results.”
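
      One common way to extract a critical temperature from slab coexistence densities is to fit the scaling law ρ_dense − ρ_dilute = A (Tc − T)^β. Whether the critical temperatures quoted above were obtained in exactly this way is not stated in this response, so the sketch below should be read as an assumption: it fixes β = 0.325, linearizes the scaling law, and uses hypothetical function names.

```python
BETA = 0.325  # assumed critical exponent (3D Ising universality class)


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def critical_temperature(temps, rho_dense, rho_dilute, beta=BETA):
    """Estimate Tc from coexistence densities.

    Raising the density gap to the power 1/beta gives a quantity that is
    linear in T: gap**(1/beta) = A**(1/beta) * (Tc - T), so Tc is the
    temperature at which the fitted line crosses zero.
    """
    gaps = [h - l for h, l in zip(rho_dense, rho_dilute)]
    if any(g <= 0 for g in gaps):
        raise ValueError("dense-phase density must exceed dilute-phase density")
    ys = [g ** (1.0 / beta) for g in gaps]
    slope, intercept = linear_fit(temps, ys)
    if slope >= 0:
        raise ValueError("density gap does not shrink with temperature")
    return -intercept / slope
```

      Running the same fit on the full and the reduced system and comparing the two estimates is one way to express a finite-size check of this kind. The scaling law is only expected to hold close to the critical point, so the choice of temperature window affects the estimate.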

      Author response image 3.

      Finite-size analysis. (a) Phase diagrams for the HP1α homodimer (50 dimers) and for a system reduced in size by 30% (35 dimers), with critical temperatures of 353.3 K and 352.1 K, respectively. (b) Density profiles of HP1α and its reduced size counterpart at various temperatures. (c, d) Density profiles and snapshots of HP1α homodimer simulation with box dimensions of 170x170x1190 Å³ and for systems with z-direction length fixed at 1190 Å and varying cross-sectional areas: 120x120, 150x150, and 200x200 Å². The black dashed line shows the simulated saturation concentration of wildtype HP1α homodimer in the box dimensions of 170x170x1190 Å³. The simulations were conducted at 320 K and 100 mM salt concentrations. The error bars represent the standard deviation from triplicate simulation sets.
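
      A saturation concentration can be read off a slab density profile by averaging the density in the region far from the condensate. The following sketch shows one simple way to do this. It is not the authors' analysis script: the threshold-and-margin scheme and the function name are assumptions, and it presumes the slab is centred in the box so that periodic wrap-around can be ignored.

```python
def dilute_phase_density(z, density, dense_threshold, margin):
    """Mean density of bins lying outside the dense slab.

    Bins with density >= dense_threshold define the slab. Bins within
    `margin` (same units as z) of the slab edges are excluded so that
    the interface does not bias the dilute-phase average.
    """
    if len(z) != len(density):
        raise ValueError("z and density must have the same length")
    dense_z = [zi for zi, d in zip(z, density) if d >= dense_threshold]
    if not dense_z:
        raise ValueError("no bin exceeds the dense-phase threshold")
    lower = min(dense_z) - margin
    upper = max(dense_z) + margin
    dilute = [d for zi, d in zip(z, density) if zi < lower or zi > upper]
    if not dilute:
        raise ValueError("no dilute-phase bins remain; reduce the margin")
    return sum(dilute) / len(dilute)
```

      Repeating this estimate over independent replicas gives a mean and standard deviation that can be compared across box cross-sections.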

      In response to the observed spatial segregation in our multi-component condensates, we have carefully considered finite-size effects and are confident that the segregation reflects genuine phase behavior rather than an artifact of simulation parameters. This interpretation is supported by findings from Chew et al. (14), who observed similar multilayered condensates and conducted thorough validations to verify these phases. To clarify our approach, we have included additional details in the Materials and Methods section of our manuscript.

      “... By selecting a box size that minimizes finite-size effects, we can ensure that the spatial segregation observed in our multi-component condensates reflects genuine phase behavior. This finding aligns with Chew et al. (66), who also reported well-separated multilayered condensates and conducted thorough validations to confirm these phases.”

      3) The authors should provide a clearer assessment of the quantitative precision of their model. For instance, the authors use all-atom simulations to compare with CG interaction maps. The all-atom maps are sparser due to less sampling, but the authors state that the maps are 'in good agreement'. How do the authors judge this? The issue of model validation is very important: to illustrate, HP1a is anticipated to undergo phase separation primarily under low salt concentrations. Does the model effectively capture this sensitivity to salt conditions? While numerous findings in the manuscript likely remain valid, it could be beneficial to acknowledge potential limitations tied to the simulations. For instance, might the absence of quantitative precision impact certain predictions, such as the CD's influence on phase separation?

      The CG models employed do not consider the specific binding of a peptide in a protein cleft, including defined H-bonds, or induced structural elements. Thus, the authors should discuss whether specific interactions (i.e. beyond charge-charge or hydrophobics) may also play a role in the phase behaviour of HP1, and why it makes sense to ignore them in this study. If the only important parameters are charge and hydrophobicity, how can diverse phases coexist in cells? Does the arrangement of charges in the NTD, hinges and CTDs matter or are only the average properties important?

      This is similar to the point made by Reviewer 2 in the Public Review. We have addressed these questions in the public review and incorporated new plots (Fig. S1 f-h) in the SI.

      4) The authors fix CSD-CSD dimers, whereas these interactions are expected to be quite dynamic. In the particular example of HP1 proteins, having dimerization equilibria may change the behaviour of complex mixtures significantly, e.g. in view of the proposed accumulation of HP1b at a phase boundary. This point warrants more discussion in the paper.

      We have addressed the comment in the public review and extended the discussion in the Discussion section.

      Reviewer #2 (Recommendations For The Authors):

      The authors use all-atom simulations to validate their CG model. In Figure S1, they compare interaction maps. Of course, the AA maps are sparser due to less sampling, but the authors state that the maps are 'in good agreement'. How do the authors judge this (they do not look very similar to me, e.g. the NTD-hinge interactions are mostly lacking)?

      This is similar to Reviewer 1’s concern. We agree that the AA simulations are moderately limited over 5 μs due to the large size of the HP1 proteins (~400 residues in a dimer). However, the expansion trends of the average dimensions of the HP1 paralogs agree with the CG simulations (Fig. S1 a,b). Regarding the AA contact maps, we agree that they are relatively sparse, which makes it difficult to compare them to the CG maps. We have added new plots (Fig. S1 f-h) to show the correlation of the vdW contact probability per residue for each paralog in the AA and CG simulations. The Pearson correlation coefficients are approximately 0.86, suggesting a strong positive linear correlation in the contact propensity between AA and CG simulations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This valuable study, of interest for students of the biology of genomes, uses simulations in combination with published data to examine how many TADs remain after cohesin depletion. The authors suggest that a significant subset of chromosome conformations do not require cohesin, and that knowledge of specific epigenetic states can be used to identify regions of the genome that still interact in the absence of cohesin. The theoretical approaches and quantitative analysis are state-of-the-art, and the data quality and strength of the conclusions are solid. However, because "physical boundaries (of domains?)" in the model appear to be a consequence of preserved TADs, rather than the other way around, the functional insights are limited.

      Summary of the reviewer discussion for the authors:

      While the simulations are state of the art and the reviewers appreciated that the approaches used here might help to resolve apparent discrepancies between conclusions from single-cell and bulk/ensemble techniques to study chromosome conformation, the work would benefit from clarification of what precisely is meant with "physical boundaries" and from a comparison of CCM and HIPPS models to understand commonalities and differences between them. In addition, more discussion of the relation of the current work to previous studies, such as Schwarzer et al., 2017, and Nuebler et al., 2018, would elevate the work and make the key claims more compelling. Please see also the detailed comments from the expert reviewers.

      We thank the editor for the assessment and the reviewers for the incisive comments. We will address these comments one by one. In particular, we attempt to clarify the concept of “physical boundaries” and its relevance in our study. We hope our responses are satisfactory. We believe that the manuscript has benefitted substantially from the revisions made in response to the reviewers' comments.

      Below is our point-by-point response to the comments:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Jeong et al. investigate the prevalence and cause of TADs that are preserved in eukaryotic cells after cohesin depletion. The authors perform an extensive analysis of previously published Hi-C data, and find that roughly 15% of TADs are preserved in both mouse liver cells and in HCT-116 cells. They confirm previous findings that epigenetic mismatches across the boundaries of TADs can cause TAD preservation. However, the authors also find that not all preserved TADs can be explained this way. Jeong et al. provide an argument based on polymer simulations that "physical boundaries" in 3D structures provide an additional mechanism that can lead to TAD preservation. However, in its current form, we do not find the argumentation and evidence that leads to this claim to be fully compelling.

      Strengths:

      We appreciate the extensive statistical analysis performed by the authors on the extent to which TAD's are preserved; this does seem like a novel and valuable contribution to the field.

      We thank the reviewer for a succinct and comprehensive summary of our work and for appreciating the value of our work.

      Weaknesses:

      1) As the authors briefly note, the fact that compartmentalization due to epigenetic mismatches can cause TAD-like structures upon cohesin depletion has already been discussed in the literature; see for example Extended Data Figure 8 in (Schwarzer et al., 2017) or the simulation study (Nuebler et al., 2018). We are hence left with the impression that the novelty of this finding is somewhat overstated in this manuscript.

      From the results in Extended Data Figure 8, it is unclear to us that the authors have shown that epigenetic mismatches cause TAD-like structures. As far as we can discern, the data, without a quantitative analysis, show that new TAD-like structures, which are not in the wild type, may appear when cohesin is deleted.

      The studies by Schwarzer et al 2017 and Nuebler et al 2018 are relevant to our own investigation, which we undertook after scrutinizing the experiments in Schwarzer et al 2017 and the related work by Rao et al. in 2017 on a different cell line. In the summary of the Reviewer discussion, it is suggested we discuss the relation to the experimental study by Schwarzer et al 2017 and the computational work by Nuebler et al 2018.

      (1) The results and the corresponding discussion in these two studies are different from (and may be complementary to) our results. When referring to the Extended Data Figure 8, Schwarzer and co-authors state in the main text, “The finer compartmentalization explains most of the remaining or new domains and boundaries seen in Nipbl Hi-C maps”. We are not 100% sure what “remaining” means in this context. The Extended Data Fig. 8(a) shows that the “common boundaries” are correlated with the eigenvectors of compartmentalization. If this indeed is what the reviewer is referring to, we believe that our study differs from theirs in two important ways: First, Extended Data Fig. 8(a) is a statistical analysis at the “ensemble” level. In our study, we examined the preservation of TADs at both the individual and ensemble levels with more detailed analysis. Second, in Extended Data Fig. 8(a), the “common boundaries” (incidentally we are uncertain how they were calculated) are compared to the eigenvectors of PCA analysis of the compartments (larger length scales). In contrast, in our study, we examined the correlation between TAD boundaries and the epigenetic profiles. We believe that this is an important distinction. The PCA analysis of compartments and “common boundaries” defined using (presumably) the insulation score are both derived from the Hi-C contact map. The epigenetic profile, on the other hand, is independent of Hi-C data. We believe our contribution is to build the connection between epigenetic profiles and the preservation of TADs, and to link it to 3D structures. For these reasons, we assert that our results are novel, and are not contained (or even implied) in the Schwarzer et al 2017 study.

      The simulations in Nuebler et al 2018, which were undertaken to rationalize the experiments, revealed that compartmentalization of small segments is enhanced after cohesin depletion, while TADs disappear, which supports the broad claims that are made in the experiments. They assert that the structures generated are non-equilibrium. They address neither the emergence of preserved TADs nor the observation of TAD-like structures at the single cell level. In contrast, our goal was to elucidate the reasons for the preservation of TADs. By that we mean the reasons why certain TADs are present in both wild-type and cohesin-depleted cells. Through detailed analyses of two cell lines and polymer simulations, we have provided a structural basis to answer the question. Finally, we have provided a plausible link between TAD preservation and maintenance of enhancer-promoter interactions by analyzing the Micro-C data. For all these reasons, we believe that our study is different from the results in the Extended Data Figure 8 or the simulations described by Nuebler.

      Let us summarize the new results in our study that are not contained in the studies referred to by this Reviewer. (1) We showed by analyzing the Hi-C data for mouse liver and HCT-116 that a non-negligible fraction of TADs is preserved, which set in motion our detailed investigation. (2) Then, using polymer simulations on a different cell type, we generated quantitative insights (epigenetic mismatches as well as structural basis) for the preservation of TADs. Although not emphasized, we showed that deletion of cohesin in the GM12878 cells also gives rise to P-TADs, a prediction that suggests that the observations might be “universal”. (3) Rather than perform time-consuming polymer simulations, we calculated 3D structures directly from Hi-C data for the mouse liver and HCT-116 cells, which provided a structural basis for TAD preservation. (4) The 3D structures also showed how TAD-like features appear at the single cell level, which is in accord with imaging experiments. (5) Finally, we suggest that P-TADs may be linked to the maintenance of enhancer-promoter and promoter-promoter interactions by calculating the 3D structures using the recent Micro-C data.

      For the reasons given above, we assert that our results are novel, and bring new perspectives that are not in the aforementioned insightful studies cited by the Reviewer.

      2) It is not quite clear what the authors conceptually mean by "physical boundaries" and how this could offer additional insight into preserved TADs. First, the authors use the CCM model to show that TAD boundaries correlate with peaks in the single cell boundary probability distribution of the model. This finding is consistent with previous reports that TAD-like structures are present in single cells, and that specific TAD boundaries only arise as a population average.

      The finding based on the CCM simulations hence seems to be that preserved TADs also arise as a population average in cohesin-depleted cells, but we do not follow what the term "physical boundaries" refers to in this context. The authors then use the Hi-C data to infer a maximum-entropy-based HIPPS model. They find that preserved TADs often have boundaries that correspond to peaks in the single cell boundary probabilities of the inferred model. The authors seem to imply that these peaks in the boundary probability correspond to "physical boundaries" that cause the preservation of TADs. This argument seems circular; the model is based on inferring interaction strengths between monomers, such that the model recreates the input Hi-C map. This means that the ensemble average of the model should have a TAD boundary where one is present in the input Hi-C data. A TAD boundary in the Hi-C data would then seem to imply a peak in the model's single-cell boundary probability. (The authors do display two examples where this is not the case in Fig.3h, but looking at these cases by eye, they do not seem to correspond to strong TAD boundaries.) "Physical boundaries" in the model are hence a consequence of the preserved TADs, rather than the other way around, as the authors seem to suggest. At the very least the boundary probability in the HIPPS model is not an independent statistic from the Hi-C map (on which their model is constrained), so we have concerns about using the physical boundaries idea to understand where some of the preserved TADs come from.

      There are many statements in this long comment that we need to unpack separately. First, using both the CCM simulations and, even more importantly, a data-driven approach, we established that TAD-like structures are present in single cells with and without cohesin. The latter finding is fully consistent with imaging experiments. We are unaware of other computational efforts, before our work, demonstrating that this is the case. Perhaps the Reviewer can point to the relevant papers in the literature.

      Regarding the statement that our arguments are circular, and the lack of clarity about the meaning of physical boundary, please allow us to explain. First, we apologize for the confusion. Let us clarify our approach. We first used CCM to understand the potential origin of a substantial fraction of P-TADs in the GM. The simulations allowed us to generate plausible mechanisms for the origin of P-TADs. Because the CCM does reproduce the Hi-C data, we surmised that the general mechanisms inferred from these simulations could be profitably used to analyze the experiments. The simulations also showed that knowledge of 3D structures produces a much-needed structural basis of P-TADs, and potentially for the emergence of new TADs upon cohesin depletion.

      Because 3D coordinates are needed to obtain structural insights into the role of cohesin, we use a novel method to obtain them without the need for simulations. In particular, we used the HIPPS method to obtain 3D coordinates with the Hi-C data as the sole input, which allowed us to calculate directly the boundary probabilities. The excellent agreement between the predicted 3D structures and imaging experiments suggests that meaningful information, not available in Hi-C, may be gleaned from the ensemble of calculated 3D structures.

      Although “physical boundary”, a notion introduced by Zhuang, is defined in the Methods section, it was apparently unclear, for which we apologize. Because this is an important technical tool, we have added a summary in the main text in the revision. We did not mean to imply that the physical boundaries cause the preservation of TADs, although we found that maintenance of the enhancer-promoter contacts (see Fig. 8 in the revision) could be one of the potential reasons for the emergence of physical boundaries. We agree with the reviewer that physical boundaries are structural evidence of preserved TADs (not the cause); that is, when a TAD is preserved, we can detect it by a prominent physical boundary. The purpose and benefit of physical boundary analysis and using HIPPS in general is to obtain three-dimensional structures of chromosomes. Although both CCM simulations and HIPPS use Hi-C contact maps, three-dimensional structures provide additional information that is not present in the Hi-C data.

      The arguments that the authors use to justify their claims could be clarified and strengthened. Here are some suggestions:

      • Explain the concept of "physical boundaries" more clearly in the main text.

      As explained above, we have revised the text to clarify the concept and purpose of physical boundaries analysis. See Page 7.

      • Justify why the boundary probabilities and the physical boundaries concept can be used to offer novel insight into where preserved TADs may come from.

      Boundary probabilities and physical boundaries provide previously unavailable 3D structural information on TAD structures at both the single-cell and population levels. This provides a direct structural basis for determining which TADs are preserved. But in order to understand where P-TADs may come from, physical boundary analysis alone is not sufficient. As we have shown in the analysis of enhancer-promoter contacts, using physical boundary analysis from 3D structures, we can conclude that conservation of enhancer-promoter contacts could be one of the reasons for the P-TADs.

      • Explain more clearly what the additional value of using the HIPPS model to study TAD preservation is.

      Our goal, as announced in the title, is to elucidate the structural basis for the emergence of P-TADs. The HIPPS method, which avoids doing simulations (like CCM and other polymer models used in the literature), provides an ensemble of 3D conformations using the averaged contact map generated in Hi-C experiments. Even more importantly, HIPPS produces an ensemble of structures, which can be the basis for predicting the outcomes at the single-cell level. The accuracy of the generated structures has been shown in our previous work (Shi and Thirumalai, PRX 2021). In ensemble-averaged Hi-C experiments, TADs appear to be relatively stable. However, imaging experiments (Bintu et al., 2018) have revealed that TADs are not fixed structures present in every single cell, but instead exhibit variability at the single-cell level. TAD-like structures with distinct boundaries are observed in individual cells, and the location of these boundaries varies from cell to cell. However, these TAD-like structures still show a preferential positioning in 3D structures. Interestingly, the preferential positioning often corresponds to TAD boundaries observed in population-averaged Hi-C data. This suggests that while cohesin is involved in establishing the overall organization of TADs, other factors and mechanisms could also contribute to TAD formation at the individual cell level. In this study, we showed that some boundaries of P-TADs upon cohesin loss in the Hi-C maps align with preferential boundaries in individual 3D structures of chromosomes. This makes the finding that a subset of TADs is preserved upon cohesin loss robust.

      From a technical perspective, the use of HIPPS avoids time-consuming polymer simulations. HIPPS is rapid and can be used to generate an arbitrarily large ensemble of structures, allowing us to calculate properties at both the single cell and ensemble levels.

      In addition, we'd like to offer the following feedback to the authors.

      3) The discussion of enhancer-promoter loops as a cause of TAD preservation is interesting, but it would be interesting to know what fraction of preserved TADs enhancer-promoter loops might explain.

      We thank the reviewer for the excellent suggestion. We have done the suggested calculation. The results are shown in a new Figure 8 in the main text. We also moved the results on enhancer-promoter interactions from the Discussion section to the main results section.

      4) The last paragraph of the introduction seems to state that only the HIPPS model was used to find single-cell 3D structures and boundary probabilities. However, the main text suggests that the CCM model was also used for these purposes.

      We have revised the text to clarify this point on pages 3-4. Also please see the discussion on the utility of HIPPS above.

      5) When referring to the boundary probability, it would be useful if the authors always specified whether they refer to the boundary probability before or after cohesin depletion (or loop depletion in the CCM model). Statements such as "This implies that peaks in the boundary probabilities should correspond to P-TADs" are ambiguous; it is unclear if the authors mean that boundary probabilities before cohesin depletion predict that the boundary will be preserved, rather than that preserved TAD boundaries correlate with peaks in the boundary probability after cohesin depletion.

      We thank the reviewer for the suggestion. Indeed, it may be confusing. Hence, we have revised the text in numerous places to clarify this point.

      6) It would be interesting to analyze all TAD boundaries that are present after cohesin depletion, rather than just those that overlap with TAD boundaries in WT cells. This would give better statistics for answering the question what causes TAD-like structures in cells without cohesin.

      We thank the reviewer for this excellent suggestion. First, we believe this would deviate from the primary goal of this study: what leads to TAD preservation after cohesin deletion? Second, this has to be done very systematically, as we did here for P-TADs, in order to draw meaningful conclusions. This is a very useful study for another occasion.

      7) The use of a plethora of acronyms (P-TAD, CM, DM, CCM, HLM...) makes the paper difficult to read.

      We have revised the text to change “CM” to “contact map” and “DM” to “distance map”. For P-TADs, CCM, and WLM, we would argue that P-TAD is a rather clear and intuitive abbreviation, and that CCM/WLM refer to specific methods/models, so replacing them with full names would make the text more difficult to read. We hope the reviewer is okay with us keeping these acronyms.

      Reviewer #2 (Public Review):

      Summary:

      Here Jeong et al., use a combination of theoretical and experimental approaches to define molecular contexts that support specific chromatin conformations. They seek to define features that are associated with TADs that are retained after cohesin depletion (the authors refer to these TADs as P-TADs). They were motivated by differences between single cell data, which suggest that some TADs can be maintained in the absence of cohesin, whereas ensemble HiC data suggest complete loss of TADs. By reanalyzing a number of HiC datasets from different cell types, the authors observe that in ensemble methods, a significant subset of TADs are retained. They observe that P-TADs are associated with mismatches in epigenetic state across TAD boundaries. They further observe that "physical boundaries" are associated with P-TAD maintenance. Their structure/simulation based approach appears to be a powerful means to generate 3D structures from ensemble HiC data, and provide chromosome conformations that mimic the data from single-cell based experiments. Their results also challenge current dogma in the field about epigenetic state being more related to compartment formation rather than TAD boundaries. Their analysis is particularly important because limited amounts of imaging data are presently available for defining chromosome structure at the single-molecule level, however, vast amounts of HiC and ChIP-seq data are available. By using HiC data to generate high quality simulated structural data, they overcome this limitation. Overall, this manuscript is important for understanding chromosome organization, particularly for contacts that do not require cohesin for their maintenance, and for understanding how different levels of chromosome organization may be interconnected. I cannot comment on the validity of the provided simulation methods and hope that another reviewer is qualified to do this.

      We appreciate the reviewer for a comprehensive summary of our work, and we are happy that the reviewer finds our work important, which provides valuable insights to the field.

      Specific comments

      • It is unclear what defines a physical barrier. From reading the text and the methods, it is not entirely clear to me how the authors have designated sites of physical barriers. It may help to define this on pg 7, second to last paragraph, when the authors first describe instances of P-TAD maintenance in the absence of epigenetic mismatch.

      We thank the reviewer for the suggestions. The details of physical boundary designation are provided in the appendix data analysis. To make the concept and idea of physical boundary easy to understand, we have revised the text on page 7 in the revised main text.

      • Figure 7 adds an interesting take to their approach. Here the authors use microC data to analyze promoter-enhancer/promoter-promoter contacts. These data are included as part of the discussion. I think this data could be incorporated into the main text, particularly because it provides a biological context where P-TADs would have a rather critical role.

      We thank the reviewers for the suggestion. We also agree that results in Figure 7 provide novel insights on TAD formation and its possible preservation upon perturbation. We have followed the reviewer’s suggestion to move it to an independent section in the main results section as the last subsection.

      • Figure 3a- the numbers here do not match the text (page 6, second to last paragraph). The numbers have been flipped for either chromosome 10 or chromosome 13 in the text or the figures.

      We thank the reviewer for pointing out this error. In the revised main text, it has been corrected.

      Reviewer #3 (Public Review):

      This manuscript presents a comprehensive investigation into the mechanisms that explain the presence of TADs (P-TADs) in cells where cohesin has been removed. In particular, to study TADs in wildtype and cohesin depleted cells, the authors use a combination of polymer simulations to predict whole chromosome structures de novo and from Hi-C data. Interestingly, they find that those TADs that survive cohesin removal contain a switch in epigenetic marks (from compartment A to B or B to A) at the boundary. Additionally, they find that the P-TADs are needed to retain enhancer-promoter and promoter-promoter interactions.

      Overall, the study is well-executed, and the evidence found provides interesting insights into genome folding and interpretations of conflicting results on the role of cohesin on TAD formation.

      We are pleased with the reviewer’s positive assessment of our work.

      To strengthen their claims, the authors should compare their de-novo prediction approach to their data-driven predictions at the single cell level.

      We thank the reviewer for the very good suggestion. We are assuming that the Reviewer is asking us to compare the CCM simulations with HIPPS generated structures at the single cell level. We have shown, using the GM12878 cell data, that the polymer simulations reproduce the Hi-C contact maps (an average quantity) well (see Appendix Fig. 2 and Fig. 3). In addition, we show in Appendix Fig. 8 the comparison with ensemble averaged distance maps as well as at the single cell level for Chr 13 from the GM12878 cell. There are TAD-like structures at the single cell level just as we find for HCT-116 cell (Fig. 5 in the main text). Thus, the conclusions from de-novo prediction and data-driven predictions are consistent. In addition, in our previous publication introducing HIPPS in Phys Rev X 11: 011051 (2021), we showed that the method is quantitatively accurate in reproducing experimental data for all the interphase chromosomes.

      Having demonstrated this consistency, we used computationally simple data-driven predictions to analyze HCT-116 and mouse liver cell lines, for which Hi-C data with and without cohesin are available, rather than perform multiple laborious polymer simulations.

      Please see below for our response to specific comments.

      1) It is confusing that the authors change continuously their label for describing B-A and A-B switches. They should choose one expression. I think that the label "switch" between A and B is more precise than "mismatch".

      We have revised the text to make it consistent. Now it all reads “A-B”. Yes, the suggestion that we use switch is good but we think that mismatch is more concise. We trust that this Reviewer will indulge us on this point.

      2) In the Abstract, the authors mention HCT-116 cells but do not specify which cells are these.

      We have changed “HCT-116” in the abstract to “human colorectal carcinoma cell line”.

      3) In the Abstract, it is unclear what the authors mean by "without any parameters"

      In the theoretically based HIPPS method, there is no “free” parameter. In other words, the only parameter is uniquely determined. To avoid confusion, we have removed “without any parameters” from the abstract.

      4) In Results, what do the authors mean by 16% (26%)?

      This refers to the percentage of TADs that are preserved after Nipbl and RAD21 removal in mouse and HCT-116 cells, respectively. Using the TopDom method, we identified TAD boundaries in wild-type and cohesin-depleted cells. We found that 16% (959 out of 4176 – Fig. 1a) and 26% (1266 out of 4733 – Fig. 1b) of TADs are preserved after Nipbl and RAD21 removal in mouse and HCT-116 cells, respectively. We removed the percentages in the revised version.

      5) In Results, the authors mention "more importantly, we did tune the value of any parameter to fit the experimental CMs". Did they mean that instead they didn't tune any parameter?

      We apologize for the confusion. In the CCM, there is a single controlled parameter. We have changed the sentence to reflect this correctly.

      6) In Results, section "CCM simulations reproduce wild-type Hi-C maps", Kullback-Leibler (KL) divergence is used to assess the correlation between two loci, but it is unclear what the value 0.04 stands for; is it a good or a bad correlation?

      The value of the Kullback-Leibler divergence can vary from 0 to infinity, with 0 corresponding to identical distributions (perfect agreement). Thus, a value of 0.04 means that the correlation is excellent.
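
      To make the scale of this quantity concrete, here is a minimal sketch of a discrete Kullback-Leibler divergence between two contact maps that have been flattened into lists and normalized to sum to one. It is an illustration only: the function name `kl_divergence`, the normalization, and the small floor `eps` used to avoid dividing by zero are assumptions, not the procedure used in the manuscript.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(P || Q) in nats.

    p, q: non-negative weights of equal length (e.g. flattened contact maps).
    Both are normalized to sum to 1 before comparison (an assumption here).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("p and q must have positive total weight")
    total = 0.0
    for a, b in zip(p, q):
        pa, qb = a / sum_p, b / sum_q
        if pa > 0:
            # Entries with pa == 0 contribute nothing; qb is floored at eps.
            total += pa * math.log(pa / max(qb, eps))
    return total


# Identical maps give 0; the value grows as the maps diverge.
reference_map = [4.0, 2.0, 1.0, 1.0]
similar_map = [3.9, 2.1, 1.0, 1.0]
print(kl_divergence(reference_map, reference_map))
print(kl_divergence(reference_map, similar_map))
```

      Because the divergence is unbounded above, a value close to zero is informative on its own, although what counts as "small" depends on the normalization chosen.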

      7) The authors use two techniques to obtain 3D structures, one is CCM, which takes the cohesin as constraints, and another is HIPPS, which reconstructs from Hi-C maps. Both seem to have good agreement with the Hi-C contact maps. However, did the authors compare the CCM with the HIPPS 3D structures?

      This is detailed in our response at the start of the reply to this Reviewer. As detailed in that response, as well as in the main text, we used the CCM to generate hypotheses for the origin of P-TADs. In the process, we established the accuracy of CCM, which gives us confidence in the hypotheses. As explained above and emphasized in the revised version, CCM simulations are time-consuming, whereas generating 3D structures using HIPPS is computationally simple. Because HIPPS is also accurate, we used it to analyze the Hi-C data on mouse liver and HCT-116 as well as the Micro-C data on mESC.

      In our paper in Phys Rev X 11: 011051 (2021) we showed that HIPPS reproduces Hi-C data. In the current manuscript, we showed in Appendix Fig. 2 and Fig. 3 as well as in a study in 2018 (Shi and Thirumalai, Nat Comm.) that CCM is accurate as well. Thus, there is little doubt about the accuracies of the methods that we have developed.

      8) In Results, section "P-TADs have prominent spatial domain boundaries", the authors constructed individual spatial distance matrices (DMs) using 10,000 simulated 3D structures. What are the differences among these 10,000 simulations? Do they start them with different initial structures?

      The structures are generated using HIPPS, which is a data-driven method that uses the Hi-C contact map as constraints. The method, which uses maximum entropy theory, samples from a distribution that describes the structural ensemble of the chromosome. The 10,000 structures are randomly sampled and are independent of each other. The HIPPS method is not a simulation, and hence the issue of initial structures does not arise.
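
      Given such an ensemble of independent structures, a per-locus boundary probability can be estimated as the fraction of structures in which that locus is called as a domain boundary. The sketch below illustrates only this counting step and is not the authors' implementation: the function name `boundary_probability` and the toy input are hypothetical, and it assumes that boundaries have already been identified in each individual structure by some separate procedure.

```python
def boundary_probability(boundaries_per_structure, n_loci):
    """Fraction of structures in which each locus is called a boundary.

    boundaries_per_structure: one iterable of boundary locus indices per
        sampled structure (assumed to be precomputed elsewhere).
    n_loci: number of loci along the chromosome segment.
    """
    n_structures = len(boundaries_per_structure)
    if n_structures == 0:
        raise ValueError("need at least one structure")
    counts = [0] * n_loci
    for boundaries in boundaries_per_structure:
        # set() guards against a locus being listed twice for one structure.
        for locus in set(boundaries):
            if not 0 <= locus < n_loci:
                raise ValueError(f"locus index {locus} out of range")
            counts[locus] += 1
    return [count / n_structures for count in counts]


# Toy ensemble of four structures over four loci (illustrative values only).
toy_ensemble = [{1, 3}, {3}, {0, 3}, {2}]
print(boundary_probability(toy_ensemble, n_loci=4))
```

      Peaks in the resulting profile mark loci that act as boundaries in many individual structures, even though the boundary positions vary from structure to structure.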

      9) In Methods, when the authors mention the "unknown parameter", do they use one parameter for all simulations (+/- cohesin) or is this parameter different for each system? Would this change the results?

      We apologize for the confusion. The “unknown parameter” is the energy scale 𝜖 that describes the interaction strength between chromosome loci. We have revised the text in the Methods (page 27) to clarify it. The same value of 𝜖 is used for all CCM simulations, with or without cohesin.

      10) In Methods, when the authors perform DBSCAN clustering, they mention that they optimize the clustering parameters for each system. However, if they want to compare between different systems, the clustering parameters should be the same.

      The purpose of DBSCAN is to capture the spatial clustering topology of chromosome loci. However, different cell types and chromosomes may have different overall densities, which will affect the average distance between loci. If the same parameters were used for all systems, such global changes would dominate the clustering result and distort the intended spatial clustering topology. Hence, we tune the clustering parameters for each system in order to factor out the global effect and capture only the local clustering topology of chromosome loci.
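
      One hypothetical way to factor out a global density change is to express the DBSCAN neighbourhood radius in units of each system's typical nearest-neighbour distance. The sketch below shows that idea only; it is not the tuning procedure used in the manuscript, and the function name `scaled_eps` and the default `factor` are assumptions made for illustration.

```python
import math
import statistics


def scaled_eps(points, factor=1.5):
    """Neighbourhood radius proportional to the median nearest-neighbour distance.

    points: list of 3D coordinates (tuples) for the loci of one system.
    factor: hypothetical multiplier, kept the same across systems.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    nearest = []
    for i, p in enumerate(points):
        # Brute-force nearest neighbour; adequate for a small illustration.
        nearest.append(
            min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
    return factor * statistics.median(nearest)


# A uniformly expanded copy of the same structure gets a proportionally
# larger radius, so the clustering topology is unaffected by global scale.
compact = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (4, 0, 0)]
expanded = [(2 * x, 2 * y, 2 * z) for x, y, z in compact]
print(scaled_eps(compact), scaled_eps(expanded))
```

      The returned value could then be supplied as the `eps` argument of a DBSCAN implementation, with the multiplier held fixed so that only the local arrangement of loci, and not the overall compaction, determines the clusters.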

      Grammar comments:

      1) "structures, with sharp boundaries are present, at.."

      We thank the reviewer for pointing out the error. We have fixed it.

      2) "Three headlines emerge from these studies are:"

      We have fixed it.

      3) "both the cell lines"

      We have fixed it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Since they used PBMCs, without other assays to confirm the cell subtypes, I am not sure if any of the heterogeneity they detected in 6 cytokine secretion would be able to relate back to biology.

      We agree with the reviewer that we cannot relate cytokine secretion back to specific cell populations and that part of the heterogeneity observed is due to various cellular populations and subpopulations. However, we would argue that the results obtained from measuring PBMCs especially relate to biology, not cellular identity, and provide useful information on how PBMCs will respond to a specific challenge since they offer more clinical relevance in patient stratification and monitoring. Thus, the possibility of identifying trends in polyfunctional cytokine secretion is not hindered by the isolated view of one specific cellular subpopulation. However, we agree that future experiments must identify the polyfunctional cells and decipher the extent of heterogeneity within the population.

      In addition, the two panels were measured on separate cells, I am not sure it is meaningful to make any comparisons of the two panels as they are on different cells.

      Thank you for mentioning this point. If this refers to Figure 3, where we compare the percentage of secreting cells across incubation times, the data points are all individual cells that were then pooled. It is true that, potentially, these could be similar cell types (a cell co-secreting TNFa/IL-6 could also co-secrete IL-8/MIP-1a). Since they originate from the same cell batch and stimulation, and were only divided before encapsulation, we think it is a valid comparison, as this would also be done in ELISpot or similar techniques.

      Reviewer 2

      The conclusions of the study are based on samples from a single donor, which makes the conclusions on secretion patterns difficult to interpret. The choice of cytokines is explained, but the justification of the groupings of the antibodies into the two panels is missing.

      Thank you for highlighting this valid criticism. We chose to use cells from one donor to examine the secretion patterns observed in one individual, as cells from different individuals might respond differently. The focus of the experiments in this study was to describe secretion patterns with respect to incubation time and secreted cytokine; including multiple donors would address a different question (i.e., how polyfunctionality differs between individuals). The cytokines were grouped according to expected secretion to observe overlaps between different cell types (to increase the chance of seeing secretion from both panels simultaneously). We have added complementary text discussing the justification of cytokine grouping in the updated manuscript.

      It would further be helpful to discuss how the single cell incubation might affect the secretion dynamics vs. the influence of co-culture of all cell types during the 24 h activation.

      Thank you for this input. We discussed this potential limitation in detail in a previous publication (Portmann et al., Cell Reports Methods, 2023) and added sentences addressing it to the discussion.

      The authors compare average secretion rates and levels. However, the right panel in Fig. 6 looks like there might be two different populations of mono- or polyfunctional cells that have two secretion rates. As the authors have single-cell data, I would find the separation into these populations more meaningful than comparing the mean values. In line with this comment, comparing the mean values for these cytokines instead of the mean of the populations with distinct secretion properties might actually show stronger differences than the authors report here.

      Thank you for this addition. This plot focuses on describing the relationship between secretion and incubation times. We agree that the data can be further divided into high and low secretion and the respective average plot. However, we finally decided against such a solution to avoid bias due to small event counts in certain high- and low-polysecreting populations. We checked whether dynamics are different between these populations, and the individual averages largely follow the overall trend, although on different plateaus – indeed, high-secreting cells will reach a plateau due to saturation. We have added the plot for IFNy here to visualize this point.

      Author response image 1.

      Is the plateau of the cytokine concentration caused by the fluorescence signal saturating the camera, saturation of the magnetic beads, exhaustion of the fluorescent antibodies, or constant cytokine concentrations?

      Thank you for raising this point. On the individual cell level, the plateau is caused by assay capacity limitations for high-secreting cell populations, i.e., the capacity of the nanoparticles. For low-secreting populations, the plateau is caused by a cease in secretion, whereas for high-secreting cells, the capacity will be limiting. This has been extensively discussed in Portmann et al., Cell Reports Methods, 2023.

      The high number of non-CSCs and the limited number of droplets decrease the statistical power of the method. The authors discuss their choice to use PBMCs and not solely T cells, but this aspect is missing in the discussion.

      As mentioned above, we chose PBMCs for their better representability and heterogeneity in clinical settings. Indeed, focusing on secreting cell subpopulations would increase the percentage of CSCs and the number, but we found the method to be sufficiently statistically powerful for our measurements. However, we also agree with the comment raised by reviewer 1 that a focus on a specific cell population might be interesting for many questions and applications. We have added respective text to the discussion section.

      The absolute cell number is missing. This might also answer the question of whether polyfunctional cells turn into monofunctional cells after stimulation for 24 hours or if the monofunctional population expands more.

      We are unsure of this comment. If the reviewer refers to a potential expansion ex vivo over 24 h, we have checked this for different conditions and could not observe cellular expansion within this timeframe – the numbers remained mostly stable, sometimes decreasing and only increasing in CD3/CD28. However, an overall change in cell counts does not necessarily relate to the functionalities of individual cells. This observation, combined with our results, hints towards a dynamic cellular restriction of polyfunctionality, but is no direct evidence for such a hypothesis as individual cells need to be followed in such an experiment over a much larger time frame.

      Fig. 4: Using a divergent colour scheme would be helpful. Fig. 6: Adding labels with the stimulation next to the plots would be helpful.

      We have changed the figures accordingly.

      A limitation of the approach is that the detection of polyfunctionality relies on how the three cytokines in each panel are selected and comparisons between the two panels are not otherwise helpful. Can the authors discuss how many panels would be needed to fully explore polyfunctionality among the six cytokines?

      Thank you for this comment. We agree that the identification of polyfunctional cells depends on the panel selection and its composition. We had to select respective panels, and based our initial choice for this study on expected secretion behavior from PBMCs, instead of engineering panels specific for one cell type. However, these panels can be adapted to study additional questions. Regarding the number of panels, this is an interesting point: grouping 6 cytokines into panels of 3 allows for 20 possible combinations. However, we very rarely see triple-positive polyfunctional cells, and not all combinations would make sense due to cellular restrictions and differences in stimulations.
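      The count of 20 possible combinations can be checked with a short enumeration. The sketch below is only illustrative: the labels C1 to C6 are placeholders rather than the cytokines measured in the study, and the variable names are assumptions made for this example.

```python
from itertools import combinations
from math import comb

# Placeholder labels (assumption): they stand in for any six cytokines.
cytokines = ["C1", "C2", "C3", "C4", "C5", "C6"]

# Every possible panel of three cytokines drawn from the six.
panels = list(combinations(cytokines, 3))

# Every pair of cytokines whose co-secretion one might want to observe.
pairs = list(combinations(cytokines, 2))

# How many of the possible panels contain a given pair
# (the third member is chosen from the remaining four cytokines).
panels_per_pair = {
    pair: sum(1 for panel in panels if set(pair) <= set(panel))
    for pair in pairs
}

print(len(panels), "possible three-cytokine panels")
print(len(pairs), "cytokine pairs,", comb(3, 2), "pairs covered per panel")
```

      Each panel covers three of the fifteen pairs, which is why two panels can only report on a subset of the possible co-secretion patterns.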

      Is there any way to increase the number of cytokines that could be detected in one droplet?

      This can be done on a lower throughput scale by removing the Cell Trace violet stain. This would allow the current method to measure up to 4 cytokines. An alternative would be adding different fluorophores without spectral overlap so that the throughput could increase to around 6-7 max, allowing us to measure polyfunctionality in a less biased manner. Other solutions are needed if >6-7 cytokines should be measured. Our experiments (with high-throughput cytokine detection systems, Fireplex and Isoplexis, i.e., 17-18 cytokines) showed that cells rarely secreted more than three cytokines at a time.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study explores the relationship between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 pathogenic strains. G4 structures were found to be non-randomly distributed within PAIs and conserved within the same strains. Positive correlations were observed between G4s and GC content across various genomic features, suggesting a link between G4 structures and GC-rich regions. Differences in GC content between PAIs and the core genome underscored the unique nature of PAIs. High-confidence G4 structures in Escherichia coli's regulatory regions were identified, influencing DNA integration within PAIs. These findings shed light on the molecular mechanisms of G4-PAI interactions, enhancing our understanding of bacterial pathogenicity and G4 structures in infectious diseases.

      Strengths:

      The findings of this study hold significant implications for our understanding of bacterial pathogenicity and the role of guanine-quadruplex (G4) structures. Molecular Mechanisms of Pathogenicity: The study highlights that G4 structures are not randomly distributed within pathogenicity islands (PAIs), suggesting a potential role in regulating pathogenicity. This insight into the uneven distribution of G4s within PAIs provides a basis for further research into the molecular mechanisms underlying bacterial pathogenicity.

      Conservation of G4 Structures: The consistent conservation of G4 structures within the same pathogenic strains suggests that these structures might play a vital and possibly conserved role in the pathogenicity of these bacteria. This finding opens doors for exploring how G4s influence virulence across different pathogens. Unique Nature of PAIs: The differences in GC content between PAIs and the core genome underscore the unique nature of PAIs. This distinction suggests that factors such as DNA topology and G4 structures might contribute to the specialized functions and characteristics of PAIs, which are often associated with virulence genes. Regulatory Role of G4s: The identification of high-confidence G4 structures within regulatory regions of Escherichia coli implies that these structures could influence the efficiency or specificity of DNA integration events within PAIs. This finding provides a potential mechanism by which G4s can impact the pathogenicity of bacteria.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Overall, the study provides fundamental insights into the pathogenicity island and conservation of G4 motifs.

      Thank you for your thorough review of our manuscript exploring the relationship between G4 structures and PAIs in 89 pathogenic strains. We appreciate your recognition of the strengths of our study and its potential implications for understanding bacterial pathogenicity. We are pleased that you highlighted the significance of our findings in revealing the non-random distribution and conservation of G4 structures within PAIs across various pathogenic strains.

      Your insightful comments about the molecular mechanisms of pathogenicity, the conservation of G4 structures, the unique nature of PAIs, and the regulatory role of G4s within Escherichia coli are invaluable. We are encouraged by your positive evaluation of these aspects, which underscores the potential impact of our work on advancing the understanding of bacterial pathogenicity.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript entitled "The Intricate Relationship of G-Quadruplexes and Pathogenicity Islands: A Window into Bacterial Pathogenicity" Bo Lyu explored the interactions between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 bacterial genomes through a rigorous computational approach. This paper handles an intriguing and complex topic in the field of pathogenomics. It has the potential to contribute significantly to the understanding of G4-PAI interactions and bacterial pathogenicity.

      Strengths:

      • The chosen research area.

      • The summarizing of the results through neat illustrations.

      Weaknesses:

      This reviewer did not find any significant weaknesses.

      Thank you for your positive and encouraging feedback on our manuscript. We appreciate your specific mention of the strengths, particularly highlighting the chosen research area and the effectiveness of our illustrations in summarizing the results. Your acknowledgment of these aspects is motivating, and we are pleased that the content and presentation resonated well with you.

      Reviewer #3 (Public Review):

      The main problem with the work is that the results are only descriptive and do not allow any inferences or conclusions about the importance of the function of G4 structures. The discussion and conclusions are poor. The results are preliminary and in order to try to make the analysis more interesting, it should be further extended and the data must be explored in a much greater depth.

      Thank you for your constructive feedback on our manuscript; we appreciate the time and effort you dedicated to evaluating our work. We acknowledge your concern regarding the descriptive nature of the results and the limitations in making inferences about the importance of G4 structures. To address this, we plan to enhance the depth of our analysis and provide more insightful interpretations in the discussion and conclusion sections. It's important to note that this study is intentionally a short report, emphasizing data mining findings rather than laboratory results. We understand the value of in-depth investigations and concur that our work lays the groundwork for more extensive studies in this area, aiming to provide a real-world scenario. We are committed to addressing your comments and refining our manuscript to contribute meaningfully to this field. Your insights are invaluable, and we look forward to presenting an improved version of our study.

      Reviewer #2 (Recommendations For The Authors):

      The authors could try a higher G-quadruplex score of 1.4 or higher values to substantiate their findings or pick up the bacterial genomes that relied on G4s for their pathogenicity.

      We acknowledge your recommendation to explore a higher G-quadruplex score, and we would like to assure you that we have already conducted analyses using thresholds of 1.4 and 1.6. The findings consistently support the observations presented in the manuscript. We have updated the text to reflect this additional analysis, and the results are included in the revised version of the manuscript (Figure S1).
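      To illustrate how a score threshold changes which sequences are retained, the following is a minimal sketch of G4Hunter-style scoring as described in the original G4Hunter publication (each G in a run scores the run length capped at 4, each C scores the negative equivalent, A and T score 0, and the mean is taken over a sliding window). It is not the software used in the manuscript; the function names, the 25-nt window, and the example sequence are assumptions chosen for illustration.

```python
def g4hunter_base_scores(seq):
    """Per-base scores: G runs positive, C runs negative, capped at 4."""
    seq = seq.upper()
    scores = []
    i = 0
    while i < len(seq):
        # Find the end of the run of identical bases starting at i.
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run = j - i
        if seq[i] == "G":
            value = min(run, 4)
        elif seq[i] == "C":
            value = -min(run, 4)
        else:
            value = 0
        scores.extend([value] * run)
        i = j
    return scores


def windows_above(seq, threshold, window=25):
    """Return (start, mean score) for windows whose |mean| meets the threshold."""
    scores = g4hunter_base_scores(seq)
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if abs(mean) >= threshold:
            hits.append((start, mean))
    return hits


# Hypothetical 25-nt example: four GGG runs separated by TTA, plus one G.
example = "GGGTTAGGGTTAGGGTTAGGGTTAG"
hits_at_1_4 = windows_above(example, threshold=1.4)
hits_at_1_6 = windows_above(example, threshold=1.6)
```

      In this toy case the single window has a mean score of 37/25, so it passes a threshold of 1.4 but not 1.6, which is the kind of stringency change the reviewer's suggestion probes.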

      Reviewer #3 (Recommendations For The Authors):

      Minor points

      Introduction

      Q1. The introduction is shallow. The concept and the importance of PAIs is vague. Why should these genes be different from other genes?

      A1: Thank you for your valuable feedback and we have incorporated additional content to provide a more comprehensive understanding of PAIs and their distinctiveness from other genes in the Introduction section.

      Changes: Lines 44-49 “G4 structures are ...innovative technologies.” were added.

      Lines 51-55 “PAIs are distinct...such as plasmids.” were added.

      Lines 60-66 “PAIs typically contain...recipient genome” were added.

      Lines 77-80 “Growing evidence has...CpG islands, and PAIs” were added.

      Material and Methods

      Q2. It is not clear if the author used the TBTools or the G4Hunter software to identify G4 structures. It would be interesting to include references to published articles that used this software.

      A2: Thank you! Corrected and added more references that used TBTools to extract sequences and G4Hunter to identify G4 structures.

      Q3. The statistical significance must not be based only on p-values. P-values are influenced by sample sizes. I strongly recommend the use of other parameters such as confidence interval and ROC analysis.

      A3: Thank you! We have incorporated confidence intervals and ROC analysis to complement p-values, enhancing the robustness of our statistical analysis.

      Changes: Lines 265-267 “The correlation's significance... sensitivity and specificity.” were added.

      Results and discussion

      Q4. The stability of G4 structures seems to be important for its function (doi:10.1111/febs.15065). Therefore it would be interesting if the analysis were carried out separating the G4 according to stability.

      A4: Thank you for highlighting the importance of G4 structure stability for its function and suggesting an analysis based on stability. We have carefully reviewed the referenced paper (doi:10.1111/febs.15065) and note that their study focused on the stability analysis of individual G4s. In our current study, we identified a large number of G4s, and while stability analysis for each G4 is indeed an interesting avenue, it goes beyond the scope of this particular investigation. However, we agree that exploring the relationship between G4 stability and function is a valuable topic. We plan to delve deeper into this aspect in future work, as discussed in our response to your previous comment.

      Changes: Lines 217-221 “Lastly, the stability of G4...molecular engineering.” were added.

      Q5. The quality of the figures is poor. Is not possible to read the correlation and p-values from Figure 2.

      A5: The revised figure is now submitted with enhanced clarity to ensure that correlation and p-values can be easily discerned.

      Q6. The analysis of promoter regions should be performed taking into account the distance between the G4 and the beginning of the gene.

      A6: Thank you and we have elaborated more in the revision.

      Changes: Lines 198-106 “Additionally, considering the distance...of G4 structures in promoters.” were added.

      Q7. The topic "Putative origin, transfer mechanisms, and functions of G4s in PAIs". The comments made on this topic are purely speculative and not backed up by data or any type of experimental analysis.

      A7: We appreciate the feedback and have revised the title to emphasize the focus on the functions of G4s in PAIs. We acknowledge that the content related to the putative origin and transfer mechanisms of G4s in PAIs is purely descriptive and speculative, so we have relocated this information to the discussion section for a more appropriate treatment.

      Q8. The supplemental material is hard to follow. The meaning of each column should be better explained. Why was the data divided into 10 parts?

      A8: Following your suggestion, we have revised the tables for better clarity. To address concerns about the division into 10 parts, we have decided to remove this data from the tables as it was deemed unnecessary for presentation.

      Q9. Why was the data of E. coli strains 1 and 2 shown in Tables S3 and S4 and the other bacterial strains were not?

      A9: We appreciate your inquiry. The data of E. coli strains 1 and 2 were specifically highlighted in Tables S3 and S4 as illustrative examples to demonstrate the putative functions of G4s in PAIs within the scope of our study. Given the extensive nature of function annotation analyses across various pathogenic strains, presenting additional tables for each strain would have resulted in an impractical volume of supplementary material.

      Q10. The Results and Discussion should be separated.

      A10: Thank you! Corrected as suggested.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major changes:

      Removed any claim of label-free detection, clarifying that ADeS can predict apoptotic events without apoptotic probes

      Provided a github repository with the executable code ( https://github.com/mariaclaudianicolai/ADeS )

      Uploaded all imaging data used to train and benchmark ADeS on Zenodo ( https://zenodo.org/uploads/10260643 )

      Added supplementary movie showing degraded performance on noisy movie in vivo (Supplementary Movie 3)

      Generated a supplementary figure showing the effect of noise on prediction accuracy (Supplementary Figure 4)

      Minor changes:

      Line 6: added Benjamin Grädel and Mariaclaudia Nicolai to the list of authors

      Line 44: dynamics

      Line 54: updated reference to a published paper

      Line 65: fixed spelling of "chronic"

      Line 74: fixed spelling of "limitations"

      Line 76: changed “biochemical reporters” to “fluorescent probes”

      Line 77: changed “label-free” to “probe-free”

      Line 85: “can apply” to "can be applied"

      Line 109: The citation is updated to appear in the reference

      Lines 143-144: Fixed statement about apoptotic cells having non-significant displacement compared to arrested cells

      Line 156: Figure 3 is cited

      Line 185 and Fig 3 legends: “chore” to "core"

      Lines 187 and 248: “withouth” to "without"

      Lines 177-178: introduced acronyms for deep learning networks

      Lines 276-277: Added interval ranges to clarify subgroups observed in Figure 6F

      Line 284: substituted “SNR” with “signal-to-noise ratio”

      Line 286: mentioned “Supplementary Movie 3”

      Line 515: explicitly defined “field of view” instead of “FOVs”

      Lines 604-606: Added data availability section

      Line 822: modified caption of Figure 1D to explain the estimation of nuclear area over time

      Lines 911-912: Explained gray area in caption of figure 8B-C

      Supplementary figure 1: removed “Neu” and “Eos” acronyms from caption. Introduced definition of “FOV” and “SNR” acronyms

      Editorial assessment

      This valuable work by Pulfer et al. advances our understanding of spatial-temporal cell dynamics both in vivo and in vitro. The authors provide convincing evidence for their innovative deep learning-based apoptosis detection system, ADeS, that utilizes the principle of activity recognition. Nevertheless, the work is incomplete due to the authors' claim that their system is valid for non-fluorescently labeled cells, without evidence supporting this notion. After revisions, this work will be of broad interest to cell biologists and neuroscientists.

      We acknowledge that the “label-free” claim was misleading, and in the revised manuscript we addressed this aspect by stating that ADeS is “probe-free”, not requiring any apoptotic marker. For this reason we kindly ask the editor to modify their assessment concerning the work being incomplete, as our tool was specifically meant for fluorescent microscopy.

      Reviewer #1 (Public Review):

      Summary:

      Pulfer et al., describe the development and testing of a transformer-based deep learning architecture called ADeS, which the authors use to identify apoptotic events in cultured cells and live animals. The classifier is trained on large datasets and provides robust classification accuracies in test sets that are comparable to and even outperform existing deep learning architectures for apoptosis detection. Following this validation, the authors also design use cases for their technique both in vitro and in vivo, demonstrating the value of ADeS to the apoptosis research space.

      Strengths:

      ADeS is a powerful tool in the arsenal of cell biologists interested in the spatio-temporal co-ordinates of apoptotic events in vitro, since live cell imaging typically generates densely packed fields of view that are challenging to parse by manual inspection. The authors also integrate ADeS into the analysis of data generated using different types of fluorescent markers in a variety of cell types and imaging modalities, which increases its adaptability by a larger number of researchers. ADeS is an example of the successful deployment of activity recognition (AR) in the automated bioimage analysis space, highlighting the potential benefits of AR to quantifying other intra- and intercellular processes observable using live cell imaging.

      Weaknesses:

      A major drawback was the lack of access to the ADeS platform for the reviewers; the authors state that the code is available in the code availability section, which is missing from the current version of the manuscript. This prevented an evaluation of the usability of ADeS as a resource for other researchers.

      We acknowledge that having access to the code is pivotal, and therefore in this revised version we deposited the Python code deploying our DL model on GitHub ( https://github.com/mariaclaudianicolai/ADeS ). Moreover, we included in the revised manuscript the training datasets (in vitro and in vivo), as well as all the testing videos used to benchmark ADeS.

      The authors also emphasize the need for label-free apoptotic cell detection in both their abstract and their introduction but have not demonstrated the performance of ADeS in a true label-free environment where the cells do not express any fluorescent markers.

      The system was developed primarily to analyze data acquired via fluorescent microscopy, which relies on fluorescent staining to visualize cells. Therefore, it is not possible to evaluate our methodology in a 100% label-free environment. What we meant by the term “label-free” is that our method can detect apoptotic events based exclusively on morphological cues, without the use of fluorescent apoptotic reporters. We acknowledge that this terminology was misleading and we apologize for the misunderstanding. To amend this, in our revised paper we avoid using the term “label-free”, referring instead to “probe-free” detection.

      While Pulfer et al., provide a wealth of information about the generation and validation of their DL classifier for in vitro movies, and the utility of ADeS is obvious in identifying apoptotic events among FOVs containing ~1700 cells, the evidence is not as strong for in vivo use cases. They mention the technical challenges involved in identifying apoptotic events in vivo, and use 3D rotation to generate a larger dataset from their original acquisitions. However, it is not clear how this strategy would provide a suitable training dataset for understanding the duration of apoptotic events in vivo since the temporal information remains the same.

      One of the main challenges encountered in vivo was the difficulty of capturing rare events such as apoptosis in physiological conditions. Moreover, the lack of publicly available datasets further prevented us from collecting an extended training dataset suitable for data-hungry techniques such as supervised deep learning. Resorting to 3D rotations was a strategy to exploit the visual information within acquisition volumes to train our classifiers for 2D detection. This approach is a common data augmentation technique that can naturally increase the size of a dataset by displaying the same object from different angles. However, this technique does not explicitly address temporal aspects of the apoptotic events, such as their duration. The duration of the apoptotic events was empirically estimated to obtain a temporal window suitable for detection (Supplementary Figure 1K-L).
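      As a rough illustration of the general idea (not the ADeS pipeline itself), the sketch below rotates a small 3D volume in 90-degree steps and takes a 2D projection after each step, so that one acquisition yields several 2D views. The use of a maximum-intensity projection, the 90-degree step, and all function names are assumptions made for this example; the actual angles and projection used by the authors are not stated here.

```python
def max_projection(volume):
    """Collapse a [z][y][x] volume to a [y][x] image by taking the max over z."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    return [[max(volume[z][y][x] for z in range(nz)) for x in range(nx)]
            for y in range(ny)]


def rotate_90_about_y(volume):
    """Rotate a [z][y][x] volume by 90 degrees about the y axis."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # The old x axis becomes the new z axis; the old z axis (reversed)
    # becomes the new x axis.
    return [[[volume[nz - 1 - k][y][i] for k in range(nz)]
             for y in range(ny)]
            for i in range(nx)]


def augmented_views(volume):
    """Return four 2D views of the same volume, one per 90-degree rotation."""
    views = []
    current = volume
    for _ in range(4):
        views.append(max_projection(current))
        current = rotate_90_about_y(current)
    return views


# Toy volume (2 slices, 2 rows, 3 columns) with one bright voxel.
volume = [
    [[0, 0, 0], [0, 0, 0]],
    [[0, 0, 9], [0, 0, 0]],
]
views = augmented_views(volume)
```

      Opposite rotation steps produce mirrored images in this sketch, and every view comes from the same time point, which matches the point made above that such augmentation adds viewpoints but no new temporal information.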

      The authors also provide examples of in vivo acquisitions in their paper, where the cell density appears to be quite low, questioning the need for automated apoptotic detection in those situations. In the use cases for in vivo apoptotic detection using ADeS (Fig 8), it appears that the location of the apoptotic event itself was obvious and did not need ADeS, as in the case of laser ablation in the spleen and the sparse distribution of GFP labeled neutrophils in the lymph nodes.

      Before addressing the need for these methodologies in vivo, we provide a proof of concept for their applicability. Accordingly, in vivo acquisitions present several visual artifacts and challenges that can hamper activity recognition techniques. Therefore, from a computer vision perspective, the successful implementation of ADeS in vivo is an achievement per se.

      Concerning its need, we showed in Supplementary Figure 3 that ADeS is robust to increasingly populated fields of view, and might be useful in detecting hindered apoptotic events as well as in reducing human bias.

      Finally, the authors also mention that video quality altered the sensitivity of ADeS in vivo (Fig 6L) but fail to provide an example of ADeS implementation on a video of poor quality, which would be useful for end users to assess whether to adopt ADeS for their own live cell movies.

      In Figure 6L we quantitatively showed that low video quality negatively affected the sensitivity of ADeS. In this revised version we included a supplementary movie (Supplementary Movie 3) depicting ADeS performance in low signal-to-noise conditions. We also addressed this aspect in vitro, by generating a synthetic degradation of the movie quality and measuring the effect on performance (Supplementary Figure 4).

      Reviewer #2 (Public Review):

      Summary:

      Pulfer A. et al. developed a deep learning-based apoptosis detection system named ADeS, which outperforms the currently available computational tools for in vitro automatic detection. Furthermore, ADeS can automatically identify apoptotic cells in vivo in intravital microscopy time-lapses, preventing manual labeling with potential biases. The authors trained and successfully evaluated ADeS in packed epithelial monolayers and T cells distributed in 3D collagen hydrogels. Moreover, in vivo, training and evaluation were performed on polymorphonucleated leukocytes in lymph nodes and spleen.

      Strengths:

      Pulfer A. and colleagues convincingly presented their results, thoroughly evaluated ADeS for potential toxicity assay, and compared its performance with available state-of-the-art tools.

      Weaknesses:

      The use of ADeS is still restricted to samples where cells are fluorescently labeled either in the cytoplasm or in the nucleus, which limits its use for in vitro toxicity assays that are performed on primary cells or organoids (e.g., iPSCs-derived systems) that are normally harder to transfect. In conclusion, ADeS will be a useful tool to improve output quality and accelerate the evaluation of assays in several research areas with basic and applied aims.

      As addressed in the answer to reviewer one, we primarily focused on fluorescent microscopy, which implies fluorescent labeling of the cells. The application to other imaging platforms was not the scope of our study. However, a model to infer apoptosis within other imaging solutions, e.g. brightfield, could be explored in future analogue studies.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their remarks. Please find our detailed answers below.

      1) The authors' continued refusal to acknowledge the other reports before the final sentence of the Discussion, which has been pointed out in two previous rounds of review as a major flaw, detracts from the manuscript significantly.

      We now acknowledge and discuss the other SIRT6-nucleosome reports in the introduction as requested by the reviewer.

      2) While some of the grammatical errors in previous versions have been corrected, many remain, especially in the Methods section

      We corrected the remaining grammatical errors.

      3) Multiple statements of fact not supported by data shown in this work continue to lack appropriate references.

      We added references where facts were not supported by our data.

    1. Author Response

      We appreciate the thoughtful comments from the reviewers. All reviewers express common support for the study’s meaningful contribution to understanding interoceptive neurocircuitry in health and in psychiatric disorders. Specifically, the reviewers highlight the strong theoretical backing and the novel combination of tasks and analytical methods. In turn, the reviewers identify several areas for improvement that we plan to address in our resubmission. These include a more detailed demographic characterization of the study participants, increased clarity when describing the statistics that support each conclusion, and additional discussion when interpreting the resting state findings, as we did not include a separate control condition for the effect of time. One reviewer commented that we largely cite our previous work with the isoproterenol paradigm; while we will provide an updated and broader view of the literature in our resubmission, there remains a limited number of comparable interoceptive perturbation studies. Finally, one comment referred to our reliance on ratings of interoceptive intensity without including additional behavioral measures. While our measures of interest were chosen for their relevance to our hypotheses, we will consider adding additional measures such as interoceptive accuracy (correspondence between heart rate and dial ratings) that were collected during the perturbation task, should they provide additional insight into the insular responses of the participants.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript presents the first evidence for a plastic enhancement in the response of pial cortical arterioles to external stimulation. Specifically, they show (p8; Figure 3A-C) that repeated application of a visual stimulus at 0.25 Hz, at the upper edge of the vasomotor response, leads to a greater change in the diameter of pial arterioles at that frequency. This adds to the earlier, referenced work of Mateo et al (2017) that showed locking - or entrainment of pial arteriole vasomotion - by stimuli at different (0.0 to 0.3 Hz) frequencies.

      We thank the reviewer for positively identifying the value of our manuscript.

      The manuscript has a major flaw. Much as there is plasticity that leads to an increase in the amplitude of vasomotion at the drive frequency, the authors need to show reversibility. This could possibly be accomplished by driving the visual system at a different frequency, say 0.15 Hz, and observing if the 0.25 Hz response is then diminished. The authors could then test if their observation is repeatable by again driving at 0.25 Hz. Unless I missed the presentation on this point, there is no evidence for reversibility.

      The reviewer has raised a very important point. In our experiments, the visually induced vasomotion (or visual stimulus-triggered vasomotion) was always entrained by repeated trials of the 0.25 Hz temporal frequency stimuli. When the visual stimulation stops, the vasomotion frequency lock to 0.25 Hz quickly dissipates. After saturated training with this stimulus, the parameters of the visual stimulus were switched, for example to 0.15 Hz. The animal quickly adapted to this new stimulus paradigm and the vasomotion was frequency-locked to 0.15 Hz. The adaptation to this new paradigm occurred well within 5 minutes. In Fig. 5, various paradigms were randomly tested. In some of the trials, a 0.25 Hz stimulus was tested after 0.15 Hz. The vasomotion also quickly adapted back to 0.25 Hz. We agree with the reviewer that this reversibility could have been explicitly documented in the manuscript.

      Reviewer #2 (Public Review):

      Sasaki et al. investigated methods to entrain vasomotion in awake wild-type mice across multiple regions of the brain using a horizontally oscillating visual pattern which induces an optokinetic response (HOKR) eye movement. They found that spontaneous vasomotion could be detected in individual vessels of their wild-type mice through either a thinned cranial window or intact skull preparation using a widefield macro-zoom microscope. They showed that low-resolution autofluorescence signals coming from the brain parenchyma could be used to capture vasomotion activity using a macro-zoom microscope or optical fibre, as this signal correlates well with the intensity profile of fluorescently-labelled single vessels. They show that vasomotion can also be entrained across the cortical surface using an oscillating visual stimulus with a range of parameters (with varying temporal frequencies, amplitudes, or spatial cycles), and that the amplitude spectrum of the detected vasomotion frequency increases with repeated training sessions. The authors include some control experiments to rule out fluorescence fluctuations being due to artifacts of eye movement or screen luminance and attempt to demonstrate some functional benefit of vasomotion entraining as HOKR performance improves after repeat training. These data add in an interesting way to the current knowledge base on vasomotion, as the authors demonstrate the ability to entrain vasomotion across multiple brain areas and show some functional significance to vasomotion with regards to information processing as HOKR task performance correlates well with vascular oscillation amplitudes.

      We thank the reviewer for summarizing the value of our study and recognizing its significance.

      The aims of the paper are mostly well supported by the data, but some streamlining of the data presentation would improve overall clarity. The third aim to establish the functional significance of vasomotion in relation to plasticity in information processing could be better supported by the inclusion of some additional control experiments.

      We thank the reviewer for recognizing the large amount of data supporting our findings. We agree that better data presentation could have improved the clarity of the manuscript.

      Specifically:

      1) The clarity and comprehensibility of the paper could be significantly enhanced by incorporating additional details in both the introduction and discussion sections. In the introduction, a succinct definition of the frequency range of vasomotion should be provided, as well as a better description of the horizontal optokinetic response (i.e. as they have in the results section in the first paragraph below the 'Entrainment of vasomotion with visual stimuli presentation' sub-heading). The discussion would benefit from the inclusion of a clear summary of the results presented at the start, and the inclusion of stronger justification (i.e. more citations) with regards to the speculation about vasomotion and neuronal plasticity (e.g. paragraph 5 includes no citations).

      We agree that a better description of vasomotion and horizontal optokinetic response could have been provided in the introduction. As the reviewer suggests, the discussion could also have started with the following summary of the results.

      “We show that visually induced vasomotion can be frequency-locked to the visual stimulus and can be entrained with repeated trials. The initial drive for the vasomotion, or the sensory-evoked hyperemia, must be coming from the neuronal activity in the visual system. The vasomotion is likely triggered by activation of the neurovascular interaction (Kayser, 2004; van Veluw et al., 2020). Surprisingly, the entrained vasomotion was observed not only in the visual cortex but also widely throughout the surface of the brain and deep in the cerebellar flocculus. The global entrainment could be realized through separate mechanisms from the local neurovascular coupling. What is also unknown is where the plasticity occurs. The neuronal visual response in the primary visual cortex could potentially decrease with repeated visual stimulation presentation as the adaptive movement of the eye should decrease the retinal slip. With repeated training sessions, a more static projection of the presented image will likely be shown to the retina. The neurovascular coupling could be enhanced with increased responsiveness of the vessels and vascular-to-vascular coupling could also be potentiated.”

      2) The novel methods for detecting vasomotion using low-resolution imaging techniques are discussed across the first four figures, but this gets a little bit confusing to follow as the authors jump back and forth between the different imaging and analysis techniques they have employed to capture vasomotion. The data presentation could be better streamlined - for instance by presenting only the methods most relevant for the functional dataset (in Figures 5-7), with the additional information regarding the various controls to establish the use of autofluorescence intensity imaging as a valid method for capturing vasomotion reduced to fewer figure panels, or moved to supplementary figures so as to not detract from the main novel findings contributed in this study.

      We apologize for the confusing presentation of the data. Many of the initial figures were technical; however, we feel that following these steps was necessary to logically conclude that shadow imaging of the autofluorescence could be used as an indicator of vasomotion. We do agree with the reviewer that going back and forth between different techniques can be confusing. We could have added separate supplementary figures to introduce the various methods used upfront before going into the findings.

      3) The authors heavily rely on representative traces from individual vessels to illustrate their findings, particularly evident in Figures 1-4. While these traces offer a valuable visualization, augmenting their approach by presenting individual data points across the entire dataset, encompassing all animals and vessels, would significantly enhance the robustness of their claims. For instance, in Figures 1 and 2, where average basal and dilated traces are depicted for a representative vessel, supplementing these with graphs showcasing peak values across all measured vessels would enable the authors to convey a more holistic representation of their data. Or in Figure 3, where the amplitude spectrum is presented for individual Texas red fluorescence intensity changes in V1 across novice, trained, and expert mice, incorporating a summary graph featuring the amplitude spectrum value at 0.25Hz for each individual trace (across animals/imaging sessions), followed by statistical analysis, would fortify the strength of their assertions. Moreover, providing explicit details on sample sizes for each individual figure panel (where not a representative trace), including the number of animals or vessels/imaging sessions, would contribute to transparency and aid readers in assessing the generalisability of the findings.

      We agree with the reviewer that summarization of the data across a number of vessels/imaging sessions would lead to more generalization of the findings. However, contrary to what the reviewer described, we did summarize the vessel diameter expansion events across multiple vessel observations in Fig. 1F, G. The vasomotion parameters were not summarized for the intact-skull observations shown in Fig. 2. However, this figure was intended just to show that the vessel boundary cannot be well defined in intact-skull imaging and that Texas Red intensity or autofluorescence intensity fluctuation gives a better indication of vessel diameter fluctuation. In Fig. 3G, the peak ratio of 0.25 Hz was calculated for individual animals at Novice, Trained, and Expert levels and summarized for n = 5 animals. Statistical analysis was also done. The variability between imaging sessions within individual animals was not analyzed, and this could have been stated explicitly.
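
      As an illustration of the kind of per-trace summary value discussed here, the sketch below computes a single-sided amplitude spectrum of a fluorescence trace and a ratio of the amplitude at the stimulus frequency to the mean amplitude of neighbouring frequency bins. This is a minimal sketch under stated assumptions: the exact definition of the "peak ratio" in Fig. 3G is not given in this response, so the `peak_ratio` definition, the `half_width` parameter, and the function names are hypothetical.

```python
import numpy as np


def amplitude_spectrum(signal, fs):
    """Single-sided amplitude spectrum of a mean-subtracted trace.

    fs is the sampling rate in Hz. Returns (freqs, amplitudes).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC offset
    n = x.size
    amps = np.abs(np.fft.rfft(x)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amps


def peak_ratio(signal, fs, f_stim=0.25, half_width=0.1):
    """Hypothetical metric: amplitude at f_stim over mean neighbouring amplitude.

    Neighbours are all non-zero-frequency bins within half_width Hz of f_stim,
    excluding the bin closest to f_stim itself.
    """
    freqs, amps = amplitude_spectrum(signal, fs)
    idx = int(np.argmin(np.abs(freqs - f_stim)))
    neighbours = (np.abs(freqs - f_stim) <= half_width) & (freqs > 0)
    neighbours[idx] = False
    return amps[idx] / amps[neighbours].mean()
```

      Computing one such value per animal (or per imaging session) at each training level would give the individual data points that a summary graph and statistical test could be built on.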

      4) In the experiments where mice are classed as "novice", "trained" or "expert", the inclusion of the specific range of the number of training sessions for each category would improve replicability.

      We agree with the reviewer that classification on the level of training should have been explicitly indicated. Mice experiencing the first visual training session were defined as “Novice”. The mice that have experienced 3 training sessions are the “Trained” mice and the performance of the “Trained” mice during the 4th training session was evaluated. Mice that experienced 8 to 11 rounds of visual training sessions are the “Expert” mice.
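
      The classification above can be written down as a small lookup, which may help with replication. This is a minimal sketch that assumes the category is determined by the number of training sessions a mouse has completed before the evaluated session; the function name `training_level` is hypothetical, and session counts not covered by the definitions above are returned as unclassified rather than guessed.

```python
from typing import Optional


def training_level(completed_sessions: int) -> Optional[str]:
    """Map completed training sessions to the category described in the text.

    Assumption: 0 completed sessions means the mouse is in its first session.
    """
    if completed_sessions < 0:
        raise ValueError("completed_sessions must be non-negative")
    if completed_sessions == 0:
        return "Novice"   # evaluated during the first session
    if completed_sessions == 3:
        return "Trained"  # evaluated during the 4th session
    if 8 <= completed_sessions <= 11:
        return "Expert"   # 8 to 11 rounds of training
    return None           # not assigned to any category in the text
```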

      5) The authors don't state whether mice were habituated to the imaging set-up prior to the first data collection, as head-fixation and restraint can be stress-inducing for animals, especially upon first exposure, which could impact their neurovascular coupling responses differentially in "novice" versus "trained" imaging sessions (e.g. see Han et al., 2020, DOI: https://doi.org/10.1523/JNEUROSCI.1553-20.2020). The stress associated with a tail vein injection prior to imaging could also partially explain why mice didn't learn very well if Texas Red was injected before the training session. If no habituation was conducted in these experiments, the study would benefit from the inclusion of some control experiments where "novice" responses were compared between habituated and non-habituated animals.

      We agree with the reviewer that stress could well affect spontaneous vasomotion as well as visually induced vasomotion (or visual stimulus-triggered vasomotion). As the reviewer suggested, we could have compared the initial visually induced vasomotion response between habituated and non-habituated mice. In addition, whether an experimentally induced increase in stress would interfere with the vasomotion could also be studied. With the Texas Red experiments, we observed that tail-vein injection stress appeared to interfere with the HOKR learning process. In the experiments presented in Fig. 3, Texas Red was injected before session 1. Vasomotion entrainment likely progressed during training sessions 2 and 3. Before session 4, Texas Red was injected again to visualize the vasomotion. The vasomotion was clearly observed in session 4, indicating that the stress induced by tail-vein injection did not prevent the generation of visually induced vasomotion.

      6) The experiments regarding the brain-wide vasomotion entrainment across the cortical surface would benefit from some additional information about how brain regions were identified (e.g. particularly how V1 and V2 were distinguished given how close together they are).

      The brain regions were identified by referring to the Mouse Brain Atlas. As the skull was intact, the locations of bregma, lambda, and the midline were clearly visible. We agree with the reviewer that strict separation of V1 and V2 could be difficult if we rely on the brain atlas alone. However, what we wanted to emphasize was that there was no specific localization of the vasomotion entrainment effect.

      7) Whilst the authors show that HOKR task performance and vasomotion amplitude are increased with repeated training to provide some support to their aim of investigating the functional significance of vasomotion with regards to information processing plasticity, the inclusion of some additional control experiments would provide stronger evidence to address this aim. For instance, if vasomotion signalling is blocked or reduced (e.g. using optogenetics or in an AD mouse model where arteriole amyloid load restricts vasomotion capacity), does flocculus-dependent task performance (e.g. HOKR eye movements) still improve with repeated exposure to the external stimulus.

      We agree that experimental intervention in vasomotion would be ideal for testing its functional significance. As pharmacological intervention lacks specificity, we are currently exploring the optogenetic approach. We had not considered using the AD mouse as a model of vasomotion restricted by amyloid, and we agree this would be an interesting model to study. However, the AD mouse model would also have deficits other than the restricted vasomotion. On the other hand, we could test whether the repeated presentation of slowly oscillating visual stimuli can have beneficial effects in improving the cognitive abilities of AD model mice.

      Reviewer #3 (Public Review):

      Summary:

      Here the authors show global synchronization of cerebral blood flow (CBF) induced by oscillating visual stimuli in the mouse brain. The study validates the use of endogenous autofluorescence to quantify the vessel "shadow" to assess the magnitude of frequency-locked cerebral blood flow changes. This approach enables straightforward estimation of artery diameter fluctuations in wild-type mice, employing either low magnification wide-field microscopy or deep-brain fibre photometry. For the visual stimuli, awake mice were exposed to vertically oscillating stripes at a low temporal frequency (0.25 Hz), resulting in oscillatory changes in artery diameter synchronized to the visual stimulation frequency. This phenomenon occurred not only in the primary visual cortex but also across a broad cortical and cerebellar surface. The induced CBF changes adapted to various stimulation parameters, and interestingly, repeated trials led to plastic entrainment. The authors control for different artefacts that may have confounded the measurements such as light contamination and eye movements but found no influence of these variables. The study also tested horizontally oscillating visual stimuli, which induce the horizontal optokinetic response (HOKR). The amplitude of eye movement, known to increase with repeated training sessions, showed a strong correlation with CBF entrainment magnitude in the cerebellar flocculus. The authors suggest that parallel plasticity in CBF and neuronal circuits is occurring. Overall, the study proposes that entrained "vasomotion" contributes to meeting the increased energy demand associated with coordinated neuronal activity and subsequent neuronal circuit reorganization.

      We thank the reviewer for providing a thorough summarization of our manuscript.

      Strengths:

      • The paper describes a simple and useful method for tracking vasomotion in awake mice through an intact skull.

      • The work controls for artefacts in their primary measurements.

      • There are some interesting observations, including the nearly brain-wide synchronization of cerebral blood flow oscillations to visual stimuli and that this process only occurs after mice are trained in a visual task.

      • This topic is interesting to many in the CBF, functional imaging, and dementia fields.

      We thank the reviewer for positively recognizing the strength of the paper.

      Weaknesses:

      • I have concerns with the main concepts put forward, regarding whether the authors are actually studying vasomotion as they state, as opposed to functional hyperemia which is sensory-induced changes in blood flow, which is what they are actually doing. I recommend several additional experiments/analyses for them to explore. This is mostly further characterizing their effect which will benefit the interpretations.

      We recognized that the terminology used in our paper was not explicitly explained. Traditionally, “vasomotion” is defined as the dilation and constriction of the blood vessels that occurs spontaneously at low frequencies in the 0.1 Hz range without any apparent external stimuli. Sensory-induced changes in the blood flow are usually called “hyperemia”. However, in our paper, we used the term, vasomotion, literally, to indicate both forms of “vascular” “motion”. Therefore, the traditional vasomotion was called “spontaneous vasomotion” and the hyperemia induced with slow oscillating visual stimuli was called “visually induced vasomotion”.

      Using our newly devised methods, we show the presence of “spontaneous vasomotion”. However, this spontaneous vasomotion was often fragmented and did not last long at a specific frequency. With visual stimuli that slowly oscillated at temporal frequencies close to the frequency of spontaneous vasomotion, oscillating hyperemia, or “visually induced vasomotion” was observed.

      • Neuronal calcium imaging would also benefit the study and improve the interpretations.

      In our paper, we mainly studied the visually induced vasomotion (or visual stimulus-triggered vasomotion). Therefore, visual stimulation must first activate the neurons and, through neurovascular coupling, the initial drive for vasomotion is likely triggered. However, visually induced vasomotion is not observed in novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to neuronal activity in the primary visual cortex. We also do not know how the synchronized vasomotion can spread throughout the whole brain. Where the plasticity for vasomotion entrainment occurs is also unknown. To identify the extent of the neuronal contribution to the vasomotion triggering, whole brain synchronization, and vasomotion entrainment, simultaneous neuronal calcium imaging would be ideal. However, because fluorescent Ca2+ indicators expressed in neurons would also be distorted by the “shadow” effect from the vasomotion, exquisite imaging techniques would be required.

      • The plastic effects in vasomotion synchronization that occur with training are interesting but they could use an additional control for stress. Is this really a plastic effect, or is it caused by progressively decreasing stress as trials progress? I recommend a habituation control experiment.

      As also pointed out by reviewer #2, we agree that whether stress affects visually induced vasomotion could be studied. Studying the visually induced vasomotion in mice well-habituated to the experimental apparatus would give an idea of whether stress could truly be a confounding factor affecting vasomotion. On the other hand, whether acutely induced stress can interfere with the already entrained vasomotion could also be studied. In the experiments presented in Fig. 3, Texas Red was injected via the tail vein, which would be quite stressful for the mouse. However, in the trained mouse, visually induced vasomotion could be observed regardless of the stress. It is likely that stress can interfere with the acquisition of vasomotion entrainment, but the already acquired entrainment will not be canceled by acute stress induced by tail-vein injection. We agree that the relationship between stress, vasomotion, and the plasticity related to vasomotion entrainment could be investigated further.

      Appraisal

      I think the authors have an interesting effect that requires further characterization and controls. Their interpretations are likely sound and additional experiments will continue to support the main hypothesis. If brain-wide synchrony of blood flow can be trained and entrained by external stimuli, this may have interesting therapeutic potential to help clear out toxic proteins from the brain as seen in several neurodegenerative diseases.

      We thank the reviewer for the positive evaluation of our manuscript. Strong entrainment of visually induced vasomotion was observed with a simple presentation of slowly oscillating visual stimuli for several days. This is a totally non-invasive method to train the vasomotion capacity. As the reviewer recognizes, potential benefits for the treatment of dementia and neurodegenerative diseases could be evaluated with further studies.

    1. Author Response:

      We thank the reviewers and editor for their careful analysis of our manuscript and their appreciation of its strengths. Our plans to address the reviewers’ concerns regarding the weaknesses of the study are outlined below.

      Reviewing Editor (Public Review):

      “Weaknesses mainly concern the experiments and arguments leading to the authors' notion that Cav3 channels may partially compensate for the loss of Cav1.4 calcium currents in cone synapses. It is possible that the non-conducting Cav1.4 variant supports synapse development and the Cav3 channel then provides the calcium influx. However, in its current state, the study does not unequivocally assess Cav3 expression in wild-type cones, it lacks direct evidence of Cav3 expression and upregulation, e.g. via single cell transcriptomics, immunolabeling, or an elaboration on electrophysiology, and it does not test the authors' earlier idea that Cav1.4 might couple to intracellular calcium stores at photoreceptor synapses.”

      Current transcriptomic studies indicate that Cav3 transcripts are present at extremely low levels compared to that for Cav1.4 in cones of young mice (PMID 26000488, summarized in PMID 35650675), adult mice (PMID: 36807640), macaque (PMID 30712875), and human (PMID 31075224). Thus, it was somewhat surprising that Davison et al reported the presence of low voltage activated (LVA) Cav3-like currents with amplitudes that were ~50% of that for the Cav1 current in mouse cones at -40 mV (PMID 35803735). Using similar pharmacological criteria as Davison et al, we did not find functional evidence for a LVA current in cones of wild-type (WT) mouse retina: the Ca2+ current in our recordings was suppressed by the Cav1 antagonist isradipine (Fig 3a) but minimally affected in the expected voltage range by the Cav3 antagonist ML218 (Fig 3b). In WT mouse, voltage clamp steps from -90 mV to more depolarized voltages failed to show a transient inward current at onset (Fig 2e), which is a hallmark of LVA calcium currents. In addition, by standard physiological and pharmacological criteria, we could not identify LVA currents in cones of ground squirrel (Fig.3c,d) and macaque retina (Supp. Fig.S3). Our results argue against a significant role for LVA currents in mammalian cones.

      A problem that we discovered (as did Davison et al, their Fig.2C) was that Cav3 blockers (e.g., ML218 and Z944) have non-specific actions on the high voltage activated (HVA) Ca2+ current (presumably mediated by Cav1.4) in WT mouse cones. This is clearly shown in our Supp. figure S1a-b where ML218 causes a dose-dependent negative shift in the I-V relationship but also inhibition of current density in HEK293T cells transfected with Cav1.4. We are planning a second study to thoroughly characterize these actions of ML218 and Z944 on Cav1 channels as the results are important for understanding the actions of these drugs in cell-types with mixed populations of Cav1 and Cav3 channels.

      A second problem is that dihydropyridines (DHP) used in both our study and that of Davison et al (e.g., isradipine, nifedipine) incompletely and slowly block Cav1 channels at negative membrane potentials (PMID: 12853422). Due to the slow kinetics of DHP block, Cav1 currents in the presence of such blockers can appear to inactivate rapidly (see Fig.6A in PMID 11487617). Thus, the Cav current recorded in the presence of DHP blockers in WT mouse cones may represent unblocked Cav1.4-mediated currents that appear rapidly inactivating and may therefore be misconstrued as being mediated by Cav3 channels.

      Given the caveats of the pharmacological approach, we agree that stronger evidence is needed to rule out a small contribution of Cav3 channels in WT mouse cones. As mentioned in our text, we have found that currently available Cav3 antibodies produce similar patterns of immunofluorescence in WT and corresponding Cav3 KO retina so analysis at the level of Cav proteins is not possible. Thus, we are planning to compare the relative expression of Cav channel genes in cones using drop-seq experiments of G369i KI and WT mouse retina. We also plan to elaborate on our electrophysiological dissection of the HVA and LVA currents.

      Among the 3 Cav3 subtypes, Cav3.2 was the only one detected in mouse cones by Davison et al using nested RT-PCR (PMID 35803735). Thus, we obtained the Cav3.2 mouse strain from JAX (B6;129-Cacna1htm1Kcam/J) and generated a Cav3.2 KO/G369i KI double mutant mouse strain. If the Cav3 current that appears in the G369i KI cones is mediated by Cav3.2, then it should be undetectable in cones of the double mutant mice. Moreover, if these Cav3.2 channels contribute to the residual cone synaptic responses in G369i KI mice, then the double mutant mice should be deficient in this regard. We will test these predictions in patch clamp recordings and ERGs.

      Finally, we will conduct Ca2+ imaging experiments in cone terminals of the WT vs G369i KI mice to test whether increased coupling of Cav channels to intracellular Ca2+ release may be involved in cone synaptic responses of the G369i KI mice.

      Reviewer #1 (Public Review):

      Weaknesses:

      “The major criticism that I have of the study is that it infers Ca channel molecular composition based solely on pharmacological analysis, which, as the authors note, is confounded by the cross-reactivity of many of the "specific" channel-type antagonists. The authors note that Cav3 mRNAs have been found in cones, but here, they do not perform any analysis to examine Cav3 transcript expression after G369i-KI nor do they examine Ca channel transcript expression in monkey or squirrel cones, which serve as controls of sorts for the G369i-KI (i.e. like WT mouse cones, cones of these other species do not seem to exhibit LVA Ca currents).”

      Actually, we also used non-pharmacological (i.e., electrophysiological) criteria to back up our interpretation that Cav3 channels contribute to the Cav current in cones primarily in the absence of functional Cav1.4 channels. For example, in Fig.2, we show that the Ca2+ current in G369i KI and Cav1.4 KO mice exhibit the hallmarks of the Cav3 channel (negative activation and inactivation voltages and window current, rapid inactivation), which are quite distinct from the Ca2+ currents in WT cones. In recordings of ground squirrel and macaque cones (Supp.Figs.S2-3), negative holding voltages do not unmask a LVA current according to various criteria. In addition to the transcriptomic approaches described above, we plan to elaborate on the electrophysiological evidence for the absence of a LVA current in WT mouse cones as part of the revision.

      “Secondarily, in Maddox et al. 2020, the authors raise the possibility that G369i-KI, by virtue of having a functional voltage-sensing domain, might couple to intracellular Ca2+ stores, and it seems appropriate that this possibility be considered experimentally here.”

      We will conduct Ca2+ imaging experiments in cone terminals of the WT vs G369i KI mice to test whether increased coupling of Cav channels to intracellular Ca2+ release may be involved in cone synaptic responses of the G369i KI mice.

      “As a minor point: the authors might wish to note, in comparison to another retinal ribbon synapse, that Zhang et al. 2022 (in J. Neuroscience), in a study of mouse rod bipolar cells, found a number of LVA and HVA Ca conductances in addition to the typical L-type conductance mediated by Cav1-containing channels.”

      We are aware of the extensive evidence for the expression of Cav3 channels in retinal bipolar cells (PMID 11604141, 22909426, 19275782, 35896423) and our recordings of cone bipolar cells in ground squirrel confirm this (Supp. Fig.S2D). We could add reference to this work in our revision.

      Reviewer #2 (Public Review):

      Weaknesses:

      “The major critiques are related to the description of the Cav1.4 knock-in mouse as "sparing" function, which can be remedied in part by a simple rewrite, and in certain places, the data may need to be examined more critically. In particular, the authors should address features in the data presented in Figures 6 and 7 that seem to indicate that the retina of the Cav1.4 knock-in is not intact, but the interpretation given by the authors as "intact" is not appropriate and made without rigorous statistical testing.”

      We intended to use “sparing” and “intact” to indicate that cone synapses are present and to some extent functional, in contrast to their complete absence in the Cav1.4 KO mouse. However, we recognize this may be misinterpreted as “normal”. As suggested by the reviewer, we will revise our statistical analyses and text to clarify that cone synaptic responses do indeed differ significantly in G369i KI as compared to WT mice. We feel that this will be a strong addition to the study and will emphasize the key point that Cav3 cannot fully compensate for loss of Cav1.4 with respect to cone synapse structure and function.

      Reviewer #3 (Public Review):

      Weaknesses:

      “The study has been expertly performed but remains descriptive without deciphering the underlying molecular mechanisms of the observed phenomena, including the proposed homeostatic switch of synaptic calcium channels. Furthermore, a relevant part of the data in the present paper (presence of T-type calcium channels in cone photoreceptors) has already been identified/presented by previous studies of different groups (Macosko et al., 2015; pmid 26000488; Davison et al., 2021; pmid 35803735; Williams et al., 2022; pmid 35650675). The degree of novelty of the present paper thus appears limited.”

      We respectfully disagree that our paper lacks novelty. As indicated by Reviewer 2, a major advance of our study is in providing a mechanism that can explain the longstanding conundrum that congenital stationary night blindness type 2 mutations that would be expected to severely compromise Cav1.4 function do not produce complete blindness. We also disagree that the presence of T-type channels in cone photoreceptors has been unequivocally demonstrated, as the non-biased transcriptomic approaches show very little Cav3 transcript expression in mouse cones (PMIDs 26000488, 35650675, 36807640), macaque cones (PMID 30712875), and human cones (PMID 31075224). Transcription may not equate to translation, particularly at low expression levels. We also note that the one study to date that suggests a functional contribution of Cav3 channels in mouse cones (Davison et al., 2021; pmid 35803735) used a DHP to isolate the “LVA” current, which is problematic as described above. Our demonstration of minimal or undetectable Cav3-type currents in mammalian cones using physiological and pharmacological approaches, while a negative result, adds important context to the recent literature. As described in our response to the editor’s review, our planned revisions include testing whether Cav3 transcripts are upregulated in G369i KI cones and whether the Cav3.2 subtype suggested to be present in cones (PMID 35803735) contributes to Cav currents in these cells using Cav3.2 KO and Cav3.2 KO/G369i KI double mutant mice.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviews

      We would like to extend our thanks to the reviewers who took the time to carefully read our paper and provide thoughtful insights and suggestions on how to strengthen our conclusions. All reviewers agreed that our study presented strong data supporting a role for triglyceride lipase brummer (bmm) in regulating testis lipid droplets and spermatogenesis in Drosophila, and that our findings advance our understanding of lipid biology during sperm development. Reviewers made several helpful suggestions on how to strengthen our manuscript even further. Below, we outline how we revised our manuscript in response to reviewer comments to ensure we clearly communicate our data and conclusions with readers, and properly contextualize our findings.

      REVIEWER 1

      In this study, the authors investigate the role of triglycerides in spermatogenesis. This work is based on their previous study (PMID: 31961851) on triglyceride sex differences in which they showed that somatic testicular cells play a role in whole body triglyceride homeostasis. In the current study, they show that lipid droplets (LDs) are significantly higher in the stem and progenitor cell (pre-meiotic) zone of the adult testis than in the meiotic spermatocyte stages. The distribution of LDs anti-correlates with the expression of the triglyceride lipase Brummer (Bmm), which has higher expression in spermatocytes than early germline stages. Analysis of a bmm mutant (bmm[1]) - a P-element insertion that is likely a hypomorphic - and its revertant (bmm[rev]) as a control shows that bmm acts autonomously in the germline to regulate LDs. In particular, the number of LDs is significantly higher in spermatocytes from bmm[1] mutants than from bmm[rev] controls. Testes from males with global loss of bmm (bmm[1]) are shorter than controls and have fewer differentiated spermatids. The zone of bam expression, typically close to the niche/hub in WT, is now many cell diameters away from the hub in bmm[1] mutants. There is an increase in the number of GSCs in bmm[1] homozygotes, but this phenotype is probably due to the enlarged hub. However, clonal analyses of GSCs lacking bmm indicate that a greater percentage of the GSC pool is composed of bmm[1]-mutant clones than of bmm[rev]-clones. This suggests that loss of bmm could impart a competitive advantage to GSCs, but this is not explored in greater detail. Despite the increase in number of GSCs that are bmm[1]-mutant clones, there is a significant reduction in the number of bmm[1]-mutant spermatocyte and post-meiotic clones. This suggests that fewer bmm[1]-mutant germ cells differentiate than controls. To gain insights into triglyceride homeostasis in the absence of bmm, they perform mass spec-based lipidomic profiling. 
Analyses of these data support their model that triglycerides are the class of lipid most affected by loss of bmm, supporting their model that excess triglycerides are the cause of spermatogenetic defects in bmm[1]. Consistent with their model, a double mutant of bmm[1] and a diacylglycerol O-acyltransferase 1 called midway (mdy) reverts the bmm-mutant germline phenotypes.

      There are numerous strengths of this paper. First, the authors report rigorous measurements and statistical analyses throughout the study. Second, the authors utilize robust genetic analyses with loss-of-function mutants and lineage-specific knockdown. Third, they demonstrate the appropriate use of controls and markers. Fourth, they show rigorous lipidomic profiling. Lastly, their conclusions are appropriate for the results. In other words, they don't overstate the results.

      We thank the Reviewer for their positive assessment of our paper.

      There are a few weaknesses. Although the results support the germline autonomous role of bmm in spermatogenesis, one potential caveat is that the mdy rescue was global, i.e., in both somatic and germline lineages. The authors did not recover somatic bmm clones, suggesting that bmm may be required for somatic stem self-renewal and/or niche residency. While this is beyond the scope of this paper, it is possible that somatic bmm does impact germline differentiation in a global bmm mutant.

      In the revised manuscript, we made several changes to address these points.

      1) We now clearly state when we used global versus germline-only loss of mdy to rescue bmm mutant phenotypes in the testis.

      “Notably, at least some of the effects of global loss of mdy on bmm1 males can be attributed to the germline:

      RNAi-mediated knockdown of mdy in the germline of bmm1 males partially rescued the defects in testis size (Figure 4I; Kruskal-Wallis rank sum test with Dunn’s multiple comparison test) and GSC variance (Figure S5J; p = 4.5 × 10⁻⁵ and 8.2 × 10⁻³ by F-test from the GAL4- and UAS-only crosses, respectively).”

      “Importantly, testes isolated from males with global loss of both bmm and mdy (mdyQX25/k03902;bmm1) had fewer LD than testes dissected from bmm1 males (Figures 5D, S5I; one-way ANOVA with Tukey multiple comparison test).”

      2) We also discuss the possibility that somatic bmm may play a role in germline differentiation in a global bmm mutant, and present phenotypic data on somatic bmm1 clones.

      “We also reveal a potential non-cell-autonomous role for somatic bmm. While there was no difference in the ratio of Zd-1-positive cells between homozygous clones and heterozygous clones in animals carrying the bmm1 or bmmrev alleles at 14 days post clone induction (Figure S4O; Kruskal-Wallis rank sum test), the distance from the hub to the Zd-1-positive clones was significantly decreased in bmm1 homozygous clones (Figure S4P; Kruskal-Wallis rank sum test). Together, these data indicate bmm may play a cell-autonomous role in germline cells, and potentially a non-cell-autonomous role in somatic cells, to regulate spermatogenesis.”

      3) Finally, we clarify that we were unable to assess somatic LD. Specifically, this was a technical issue as the dye we use to visualize testis LD is incompatible with staining protocols to identify somatic cells. As a result, we were unable to count LD in somatic clones with confidence.

      “While we were unable to assess LD in bmm1 somatic clones, our data when taken together reveals a previously unrecognized cell-autonomous role for bmm as a regulator of testis LD in germline cells.”

      Regarding data presentation, I have a minor point about Fig. 3L: why aren't all data shown as box plots (only Day 14 bmm[rev] does).

      In our revised manuscript Figure 4L does present a boxplot across all genotypes and times; the appearance of ‘no boxes’ is simply due to the large number of datapoints with a value of zero, which compress the box near the X-axis.
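
      The effect described here, where a box collapses onto the axis when most values are zero, can be illustrated with a minimal sketch. The counts below are hypothetical and are not data from the study; `box_summary` is an illustrative helper, and the quartile method is an assumption rather than the one used to draw Figure 4L.

```python
from statistics import quantiles

def box_summary(values):
    """Return (q1, median, q3), the three values that define a boxplot's box."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return q1, med, q3

# Hypothetical clone counts per testis (not data from the study).
# Most testes contain zero clones, so all three quartiles are zero
# and the box has no visible height.
mostly_zero = [0] * 16 + [1, 2, 3, 5]

# A hypothetical group with more spread, for comparison.
spread_out = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]

zero_box = box_summary(mostly_zero)
spread_box = box_summary(spread_out)

print("mostly zero:", zero_box)
print("spread out: ", spread_box)
```

      When at least three quarters of the values are zero, the lower quartile, median, and upper quartile all equal zero, so only the non-zero points drawn beyond the box remain visible.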

      Finally, the authors provide a detailed pseudotime analysis of snRNA-seq of the testis in Fig. S2A-D, but this analysis is not sufficiently discussed in the text.

      In the revised manuscript we added text to describe our pseudotime analysis of single-cell RNA seq data in more detail.

      “Using pseudotime analysis, we arranged the germline (Figure S2A) and the somatic cells (Figure S2B) based on their annotated developmental trajectory. The expression pattern of bmm in the germline matched our observation with bmm-GFP reporter (Figure S2C). While levels of the bmm-GFP reporter were lower in somatic cells, single-cell RNA sequencing data identified bmm expression in the somatic lineage that was higher in cells at later stages of development (Figure S2D). Additional neutral lipid- and lipid droplet-associated genes such as lipid storage droplet-2, Seipin, Lipin, and midway also showed differential regulation during differentiation (Figure S2C, S2D). Combined with our data on the location of testis LD, these data suggest that bmm upregulation in both somatic and germline cells during differentiation corresponds to the downregulation of testis LD. Supporting this, germline GFP levels were negatively correlated with testis LD in bmm-GFP flies (Figure 2A, 2C), suggesting regions with higher bmm expression had fewer LD.”

      Overall, the many strengths of this paper outweigh the relatively minor weaknesses. The rigorously quantified results support the major aim that appropriate regulation of triglycerides are needed in a germline cell-autonomous manner for spermatogenesis.

      This paper should have a positive impact on the field. First and foremost, there is limited knowledge about the role of lipid metabolism in spermatogenesis. The lipidomic data will be useful to researchers in the field who study various lipid species. Going forward, it will be very interesting to determine what triglycerides regulate in germline biology. In other words, what functions/pathways/processes in germ cells are negatively impacted by elevated triglycerides. And as the authors point out in the discussion, it will be important to determine what regulates bmm expression such that bmm is higher in later stages of germline differentiation.

      We agree with the reviewer about the many interesting future directions for this project. We added a model figure in the revised manuscript to visualize our findings and highlight remaining questions about how bmm and triglycerides support normal spermatogenesis in Drosophila (Fig. 6).

      REVIEWER 2

      Summary:

      Here, the authors show that neutral lipids play a role in spermatogenesis. Neutral lipids are components of lipid droplets, which are known to maintain lipid homeostasis, and to be involved in non-gonadal differentiation, survival, and energy. Lipid droplets are present in the testis in mice and Drosophila, but not much is known about the role of lipid droplets during spermatogenesis. The authors show that lipid droplets are present in early differentiating germ cells, and absent in spermatocytes. They further show a cell autonomous role for the lipase brummer in regulating lipid droplets and, in turn, spermatogenesis in the Drosophila testis. The data presented show that a relationship between lipid metabolism and spermatogenesis is congruous in mammals and flies, supporting Drosophila spermatogenesis as an effective model to uncover the role lipid droplets play in the testis.

      We thank the Reviewer for their positive assessment of our paper.

      Strengths and weaknesses:

      The authors do a commendably thorough characterization of where lipid droplets are detected in normal testes: located in young somatic cells, and early differentiating germ cells. They use multiple control backgrounds in their analysis, including w[1118], Canton S, and Oregon R, which adds rigor to their interpretations. The authors employ markers that identify which lipid droplets are in somatic cells, and which are in germ cells. The authors use these markers to present measured distances of somatic and germ cell-derived lipid droplets from the hub. Because they can also measure the distance of somatic and germ cells with age-specific markers from the hub, these results allow the authors to correlate position of lipid droplets with the age of cells in which they are present. This analysis is clearly shown and well quantified.

      The quantification of lipid droplet distance from the hub is applied well in comparing brummer mutant testes to wild type controls. The authors measure the number of lipid droplets of specific diameters, and the spatial distribution of lipid droplets as a function of distance from the hub. These measurements quantitatively support their findings that lipid droplets are present in an expanded population of cells further from the hub in brummer mutants. The authors further quantify lipid droplets in germline clones of specified ages; the quantitative analysis here is displayed clearly, and supports a cell autonomous role for brummer in regulating lipid droplets in spermatocytes.

      Data examining testis size and number of spermatids in brummer mutants clearly indicates the importance of regulating lipid droplets to spermatogenesis. The authors show beautiful images supported by rigorous quantification supporting their findings that brummer mutants have both smaller testes with fewer spermatids at both 29 and 25°C. There is also significant data supporting defects in testis size for 14-day-old brummer mutant animals compared to controls. The comparison of number of spermatids at this age is not significant, which does not detract from the story but does not support sperm development defects specifically caused by brummer loss at 14 days. Their analysis clearly shows an expanded region beyond the testis apex that includes younger germ cells, supporting a role for lipid droplets influencing germ cell differentiation during spermatogenesis.

      We thank the reviewer for pointing out this inaccuracy in our manuscript. In the revised manuscript we chose more precise language to describe defects in 14-day-old bmm mutants:

      “Defects in testis size were also observed at 14 days post eclosion, suggesting testis size defects persist later into the life course (Figure S4C; Welch two-sample t-test). In contrast, the number of spermatid bundles per testis was not significantly different between bmm1 and bmmrev males at this age (Figure S4D; Welch two-sample t-test), potentially due to a large decrease in the number of spermatid bundles in 14-day-old bmmrev males (Figure 4C, S4D).”
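
      For readers less familiar with the Welch two-sample t-test named above, the following is a minimal sketch of how its statistic and degrees of freedom are computed. It is not the authors' analysis code; the function name `welch_t` and the numbers are hypothetical. Unlike Student's t-test, the Welch form does not assume equal variances in the two groups.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical measurements for two genotypes (not data from the study).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

      Converting the statistic to a p-value requires the t distribution, which a statistics package would normally supply; that step is omitted from this sketch.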

      The authors present a series of data exploring a cell autonomous role for brummer in the germline, including clonal analysis and tissue specific manipulations. The clonal data indicating increased lipid droplets in spermatocyte clones, and a higher proportion of brummer mutant GSCs at the hub are convincing and supported by quantitation. The authors also show a tissue specific rescue of the brummer testis size phenotype by knocking down mdy specifically in germ cells, which is also supported by statistically significant quantitation. The authors present data examining the number of spermatocyte and post-meiotic clones 14 days after clonal induction. While data they present is significant with a 95% confidence interval and a p value of 0.0496, its significance is not as robust as other values reported in the study, and it is unclear how much information can be gained from that specific result.

      We thank the reviewer for raising this point. In the revised manuscript we displayed the p-value clearly in the text and on the figure to ensure our statistical output is clear for readers to evaluate our conclusions regarding bmm mutant clones 14 days after clone induction. We also state that the finding should be reproduced by others given that the statistical significance of this result was not as strong as our other data.

      “Because we observed significantly fewer bmm1 spermatocyte and spermatid clones at 14 days after clone induction (Figure 4K,4L; p = 0.0496, Kruskal-Wallis rank sum test), these effects on germline development may represent a cell-autonomous role in regulating spermatogenesis for bmm in this cell type. Given that the statistical significance of this finding was not as strong as for our other data, future studies should repeat this experiment with more samples.”
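
      To make the test behind this p-value concrete, here is a minimal sketch of a Kruskal-Wallis rank sum test for two groups, with tie correction and the usual chi-square approximation. It is not the authors' analysis code; the function name `kruskal_wallis` and the counts are hypothetical. Because clone counts often contain many tied zeros, the tie correction matters for data of this kind.

```python
import math
from collections import Counter

def kruskal_wallis(*groups):
    """Return (H, p) with tie correction; this sketch handles two groups only."""
    if len(groups) != 2:
        raise ValueError("this sketch only computes a p-value for two groups")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Give each distinct value the mean of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction (undefined if every observation is identical).
    ties = Counter(pooled).values()
    h /= 1 - sum(t**3 - t for t in ties) / (n**3 - n)
    h = max(h, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p

# Hypothetical counts for two genotypes (not data from the study).
control = [1, 2, 3, 4, 5]
mutant = [6, 7, 8, 9, 10]

h_stat, p_value = kruskal_wallis(control, mutant)
same_h, same_p = kruskal_wallis([1, 2, 3], [1, 2, 3])
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

      With small samples the chi-square approximation is rough, which is one more reason to treat a p-value close to 0.05 cautiously and to repeat the experiment with more samples.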

      The authors do a beautiful job of validating where they detect brummer-GFP by presenting their own pseudotime analysis of publicly available single cell RNA sequencing data. Their data is presented very clearly, and supports expression of brummer in older somatic and germline cells of the age when lipid droplets are normally not detected. The authors also present a thorough lipidomic analysis of animals lacking brummer to identify triglycerides as an important lipid droplet component regulating spermatogenesis.

      Impact:

      The authors present data supporting the broad significance of their findings across phyla. This data represents a key strength of this manuscript. The authors show that loss of a conserved triglyceride lipase impacts testis development and spermatogenesis, and that these impacts can be rescued by supplementing diet with medium chain triglycerides. The authors point out that these findings represent a biological similarity between Drosophila and mice, supporting the relevance of the Drosophila testis as a model for understanding the role of lipid droplets in spermatogenesis. The connection buttresses the relevance of these findings and this model to a broad scientific community.

      We thank the Reviewer very much for their positive assessment of our paper!

      REVIEWER 3

      In this manuscript, Chao et al seek to understand the role of brummer, a triglyceride lipase, in the Drosophila testis. They show that Brummer regulates lipid droplet degradation during differentiation of germ and somatic cells, and that this process is essential for normal development to progress. These findings are interesting and novel, and contribute to a growing realisation that lipid biology is important for differentiation.

      We thank the Reviewer for their positive comments about our manuscript.

      Major comments:

      1) The data in Figs 1 and 2, while helpful in setting the scene, do not add much to what was previously shown by the same group, namely that lipid droplets are present in both early germ cells and early somatic cells in the testis, and that Bmm regulates their degradation (PMID: 31961851). Measuring the distance of lipid droplets from the hub, while helpful in quantifying what is apparent, that only stem and early differentiated stages have lipid droplets, is not as informative as the way data are presented later (Fig. 2I), where droplets in specific stages are measured. Much of this could be condensed without much overall loss to the manuscript.

      We thank the reviewer for this comment. In our revised manuscript we edited the first part of the paper while still preserving the detailed characterization that builds upon our previous paper.

      2) It would be important to show images of the clones from which the data in Fig. 2I are generated. The main argument is that Bmm regulates lipid droplets in a cell autonomous manner; these data are the strongest argument in support of this and should be emphasised at the expense of full animal mutants (which could be moved to supplementary data).

      We thank the reviewer for this comment. In the revised manuscript we added a figure showing lipid droplets in control and bmm mutant spermatocyte clones in Fig. 3A, 3B with a quantification of this data in Figure 3C.

      Similarly, the title of Fig. S2 ("brummer regulates lipid droplets in a cell autonomous manner") should be changed as the figure has no experiments with cell (or cell-type)-specific knockdowns/mutants. This figure does show changes in lipid droplets in both lineages in bmm mutants, so an appropriate title could be "brummer regulates lipid droplets in both germ and soma".

      We thank the reviewer for this comment; we adjusted the Figure 2 legend title in the revised manuscript to “brummer regulates lipid droplets in both germline and somatic cells of the testis”.

      3) Interestingly, the clonal data show that bmm is dispensable in germ cells until spermatocyte stages, as no increase in lipid droplet number is seen until then. This should be more clearly stated, as it indicates that the important function of Bmm is to degrade lipid droplets at the transition from spermatogonial to spermatocyte stages. This is consistent with the phenotypes observed in which late stage germ cells are reduced or missing. However, the effect on niche retention of the mutant GSCs at the expense of neighbouring wildtype GSCs is hard to explain. Are lipid droplets in mutant GSCs larger than in control? Is there any discernible effect of bmm mutation on lipids in GSCs? Additionally, bam expression is delayed, suggesting that bmm may have roles on cell fate in earlier stages than its roles that can be detected on lipid droplets.

      We thank the reviewer for this comment. We included more text in the revised manuscript to clarify the key role bmm plays in regulating lipid droplets at the spermatogonia-spermatocyte transition.

      “Because we observed no significant effect of cell-autonomous bmm loss on LD at any other stage of germline development (Figure 3C), this suggests bmm function is not required to regulate LD at early stages of germ cell development. Instead, our data suggests bmm plays a role in regulating LD at the spermatogonia-spermatocyte transition.”

      We also added more detail to our description of how bmm affects lipid droplets in cells at the earliest stages of germline development.

      “Given that we detected no effect of cell-autonomous bmm loss on the number of GSC LD (Fig. 3C), more work will be needed to understand how bmm regulates GSC at a stage prior to its effects on LD number.”

      4) The bmm loss-of-function phenotype could be better described. Some of the data is glossed over with little description in the text (see for example the reference to Fig. 3A-C). For instance, in the discussion, the text states "loss of bmm delays germline differentiation leading to an accumulation of early-stage germ cells" (p13, l.259-260). However, this accumulation has not been clearly shown, or at least described in the manuscript. Most of the data show a reduction (or almost complete absence) of differentiated cell types. This could indeed be due to delayed differentiation, or alternatively to a block in differentiation or to death of the differentiated cells. The clonal data presented show a decrease in the number of cells recovered, but do not allow inferences as to the timing of differentiation, making it hard to distinguish between the various possibilities for the lack of differentiated spermatids. Apart from data showing that GSCs are more likely to remain at the niche, no further data are shown to support the fact that mutant germ cells accumulate in early stages. While additional experiments could help resolve some of these issues, much of this could also be resolved by tempering the conclusions drawn in the text.

      We thank the reviewer for these comments. In the revised manuscript we temper our conclusions regarding bmm’s precise role in spermatogenesis by discussing different mechanisms (e.g. differentiation or death) that could lead to the phenotypes we observe.

      “This regulation is important for sperm development, as our data indicates that loss of bmm causes a decrease in the number of differentiated cell types. This reduction in differentiated cell types may be attributed to a delay in differentiation, a block in differentiation, or to a loss of differentiated cells through cell death. Future studies will therefore be essential to resolve why bmm loss causes a reduction in differentiated cell types.”

      5) In the discussion (p.14, l-273 onwards), the authors suggest that products of triglyceride breakdown are important for spermatogenesis. However, an alternative interpretation of the results presented here (especially those using the midway mutant) could be that triglycerides impede normal differentiation directly. Indeed, preventing the cells' ability to produce triglycerides in the first place can rescue many of the defects observed. A better discussion of these results with a model for the function of triglycerides and their by-products would be a great improvement to this manuscript.

      We thank the reviewer for this comment. To ensure our data is clearly communicated with readers, we added a model to the paper suggesting how triglyceride and its by-products influence spermatogenesis (Fig. 6) and text to clarify that triglyceride could potentially impede differentiation.

      “It will also be important to determine whether it is the loss of metabolites produced by bmm’s enzymatic action, or an increase in triglycerides, that leads to the reduction in differentiated cell types during spermatogenesis. Together, these experiments will provide critical insight into how triglyceride stored within testis LD contributes to overall cellular lipid metabolism during spermatogenesis.”

      Together, these changes will strengthen our overall finding that bmm-mediated regulation of testis triglyceride is important for normal sperm development. Because our findings in flies align with and extend data from rodent models, the developmental mechanisms we uncovered about how triglyceride lipase bmm regulates testis lipid droplets and sperm development will likely operate in other species.  

      Reviewer #1 (Recommendations For The Authors):

      I have a minor concern about methodology: how were spermatocytes identified? I ask because data in Figure 3 indicate that there is a significant delay in germline differentiation in the bmm[1] mutant, with relatively smaller germ cells throughout the apical half of the testis. Typical large spermatocyte-like cells are not clearly obvious to me in Fig. 3.

      We thank the Reviewer for suggesting we add more clarity to how we identified spermatocytes. We state in the revised manuscript how we identify spermatocytes:

      “Cells in the testis region occupied by primary spermatocytes were identified by their large cell size and decondensed chromosome staining occupying three nuclear domains [120].”

      Also, we note that while it is difficult to see where the bmm1 testes have spermatocytes in Fig. 4E, this is due to the large number of early-stage cells in this close-up image. The spermatocytes can be more easily seen in Fig. 4I and 4I’ when the whole testis is included in the image.

      Reviewer #2 (Recommendations For The Authors):

      • Lines 197-198 mention "Boule-positive area," "individualization complexes," and "waste bags." It would be helpful to the reader to explain what these measurements are to help contextualize the data shown related to these statements.

      We thank the Reviewer for this comment. We added the following text to the revised manuscript:

      “Because Boule-positive area, individualization complexes, and waste bags are all markers for later stages in sperm development, these data indicate the loss of bmm causes a reduction in differentiated cell types.”

      • Line 162 states a defect in sperm development observed in 14-day-old bmm[1] males, but the data presented in Figure S3D does not show a significant difference. The words "sperm development" should be removed from this sentence.

      We thank the Reviewer for pointing out this inaccurate statement. We fixed the statement as follows in the revised manuscript:

      “Defects in testis size were also observed at 14 days post eclosion, suggesting testis size defects persist later into the life course (Figure S4C; Welch two-sample t-test). In contrast, the number of spermatid bundles per testis was not significantly different between bmm1 and bmmrev males at this age (Figure S4D; Welch two-sample t-test), potentially due to a large decrease in the number of spermatid bundles in 14-day-old bmmrev males (Figure 4C, S4D).”

      • Line 294 has a typo: "regulating" should likely be "regulated"

      We thank the Reviewer for pointing out this mistake, which we corrected.

      • Line 456 should include the length of time for heat shock

      We thank the Reviewer for pointing out this omission. We now include these details:

      “Adult males were collected at 3-5 days post-eclosion and heat-shocked three times at 37°C for 30 min followed by a 10 min rest period at room temperature between heat shocks.”

      • Methods section beginning on Line 442 might include an explanation of how hub area was quantified.

      We thank the Reviewer for this suggestion. We now include the following information:

      “Hub size was measured by quantifying FasIII-positive area of the testis.”

      • Figure 1 legend could benefit from adding a statement on how spermatocytes (arrowheads) were identified

      We thank the Reviewer for this suggestion; we now refer the reader to the more detailed description in the methods section.

      • Figure 2A should present the merged panel in A' first. The legend states that Panel A shows Lipid Droplets, but LipidTox is not shown until A'.

      We thank the Reviewer for this suggestion; we now clarify that the text refers to panels A-A''''.

      • Figure 2I would benefit from a key, to emphasize that these are individual cell clones, highlighting the idea of cell autonomous effects of bmm in the spermatocytes. Showing example images of spermatocyte clones with increased lipid droplets could also emphasize this result. The legend for this panel should note the statistical test done to confirm significance in the SC result.

      We agree with the Reviewer and have added images of the LD in bmm1 spermatocyte clones in Figure 3B, and the quantification in Figure 3C. We explicitly state the significance of this result and the statistical test in Figure 3 legend.

      • In Figure 3, the cell autonomous data clearly indicates that there are higher proportions of bmm mutant GSCs occupying the hub compared to control GSCs. It could be worth stating whether this observation indicates an increased ability of bmm mutant GSCs to compete for occupying space at the hub.

      We thank the Reviewer for pointing out this potential implication of our data, which we acknowledge in the revised version of our manuscript:

      “Future studies will also need to confirm whether bmm1 mutant GSCs show an increased ability to occupy space at the hub.”

      • In Figure 4, I suggest changing the title of Panel B to "Proportion of significant species in each lipid class" for clarity.

      We made this change in the Figure 5 legend (Figure 5 is the corresponding figure in the revised manuscript).

      • It could be valuable to quantify the number of spermatids in the germline specific mdy knockdown, which would lend additional support to a cell autonomous requirement for bmm in spermatogenesis

      We added a sentence to the revised manuscript recognizing that this is an interesting experiment for studies on the role of germline triglyceride in promoting spermatogenesis.

      “While future studies will need to test whether germline-specific loss of mdy also rescues spermatid number defects in bmm1 males, our data suggest bmm-mediated regulation of testis triglyceride plays a previously unrecognized role in regulating sperm development.”

      Reviewer #3 (Recommendations For The Authors):

      1) bmm-GFP does not show expression in somatic cells yet previous work by the same group has shown a requirement for bmm in the testis soma using C587-Gal4.

      We thank the Reviewer for raising this issue. While the reporter shows low GFP expression in the somatic cells, the single-cell RNA sequencing data we analyze suggests bmm is expressed in these cells. We address this issue in the revised manuscript as follows:

      “While levels of the bmm-GFP reporter were lower in somatic cells, single-cell RNA sequencing data identified bmm expression in the somatic lineage that was higher in cells at later stages of development (Figure S2D).”

      2) p.11 l.200-202 "Because we recovered fewer bmm1 spermatocyte and spermatid clones 14 days after clone induction (Figure 3K,3L; Kruskal-Wallis rank sum test), this effect on germline development represents a cell-autonomous role for bmm." This sentence should be rephrased as the phenotype could be a combination of autonomous roles within the germline and non-autonomous roles in supporting cyst cells.

      “We also reveal a potential non-cell-autonomous role for somatic bmm. While there was no difference in the ratio of Zd-1-positive cells between homozygous clones and heterozygous clones in animals carrying the bmm1 or bmmrev alleles at 14 days post clone induction (Figure S4O; Kruskal-Wallis rank sum test), the distance from the hub to where the Zd-1-positive clones reside was significantly decreased in bmm1 homozygous clones (Figure S4P; Kruskal-Wallis rank sum test). Together, these data indicate bmm may play a cell-autonomous role in germline cells, and potentially a non-cell-autonomous role in somatic cells, to regulate spermatogenesis.”

      3) The labelling in Fig. 3 is confusing - presumably the graph in 3C refers to spermatid bundles [this comment applies to other figures showing spermatid bundle numbers], not individual spermatids, while the graph in 3G refers to the proportion of the total GSC pool that is contained within the clone. The data in Fig. 3C are not described in the main text.

      We adjusted the confusing labelling to ‘spermatid bundles’ from ‘number of spermatids’, as suggested. We also changed the title of panel Fig. 3G (now 4G) as suggested and mentioned Fig. 3C (now Fig. 4C) in the text.

      4) On p.9, comments are speculative or seek to draw comparisons with the broader literature and would seem to belong more to the discussion (eg "our data suggests flies are a good model to study how bmm/ATGL influences sperm development" - also there is a typo, it should be "suggest").

      We thank the Reviewer for raising concern about our speculative statement; we changed the text as follows in the revised manuscript:

      “This identifies similarities between flies and mice in fertility-related phenotypes associated with whole-body loss of bmm/ATGL.”

      5) The length of the heat shocks used for clone induction should be specified in the methods (rather than just the period in between heat shocks).

      We now include more information on clone induction:

      “Adult males were collected at 3-5 days post-eclosion and heat-shocked three times at 37°C for 30 min followed by a 10 min rest period at room temperature between heat shocks. After heat-shock, the flies were incubated at room temperature until dissection.”

      6) p.8 l.132 "bmm-GFP accurately reproduces changes to bmm mRNA levels". This sentence should be rephrased.

      We thank the Reviewer for this comment and rephrased the sentence:

      “We first examined bmm expression in the testis by isolating this organ from flies carrying a bmm promoter driven GFP transgene (bmm-GFP) that recapitulates many aspects of bmm mRNA regulation [77].”

      7) p.9 l.172 "we used germline-specific marker" should read "we used an antibody against the germline-specific marker".

      We corrected this inaccurate statement in our revised manuscript.

      8) p.10 several lines, "GSC" should be "GSCs".

      We corrected this inaccurate use of GSC in our revised manuscript.

      9) p.13 l.247 should read "variance in GSC numbers".

      Thank you, this error was fixed.

    1. Author Response

      We thank the editors and the reviewers for their assessment of our revised manuscript. Please see below our answers to the recommendations of Reviewer #2.

      Figure S2F - Seems like a very narrow range of parameters. Is there some fine tuning here?

      The range of values of tau_P that yields previous-trial biases is bounded from below and from above for the following reasons. Above a certain value of tau_P (and therefore for a long integration time), the bump that had formed in the previous trial is not strong enough to remain stable for a long time, and therefore dissipates by the time the current trial starts (especially when adaptation is fast, towards the left of the third panel). Below a certain value, instead, the integration timescale is short enough for a representation of the current trial to form quickly, so the bump from the previous trial quickly dissipates (due to mutual inhibition). This interplay between the integration and adaptation timescales, together with the fact that we are considering a phenomenon that is bounded in time (how close the activity bump is to the second stimulus of the previous trial, which is presented between -22.4 and -5.6 seconds from the moment we are considering), yields a bounded region for tau_P. This region, however, appears narrow due to the limited number of points we have considered for the simulation grid.

      Regarding my comment on lapse at the boundaries (old line 221). Lapse parameters in psychometric curves correspond to errors on the "easy" trials. But the mechanistic explanation for lapse trials is that there is a non-zero probability for the subject to respond in a manner that is random and independent of the stimulus. In the case of extreme stimuli, this is the only reason for errors, and thus looking at the edges of the psychometric curves allows to calculate lapse rate. But - the usual assumption for underlying mechanism is that the subject lapses in all trials, regardless of stimulus. If I understand correctly, this is different than the mechanistic reason for lapses in the network model, which was described as something that happens more in the edges than in the center. Or more generally, to be a stimulus-dependent effect.

      We thank the reviewer for this clarification. The reviewer is right that in our mechanistic model, lapses (as defined by errors on easy trials) are more likely to occur for extreme stimuli, due to the vicinity to the boundary of the attractor. Such errors also occur for non-extreme stimuli, when delay intervals are long enough for the bump in PPC to drift to the boundaries. In experiments, lapse trials as described by the reviewer occur for multiple different reasons; for lapses that are independent of the stimuli, mechanisms such as attention are thought to play a role. These mechanisms, however, are not included in our model.

      What are the parameters for the distributions (skewed, bimodal, ...)?

      These parameters are reported in the legend of Fig.6, where the distributions appear.

      Bump with adaptation. Sorry for the draft-like comment. I don't think the existing studies are in the form you describe. I do think it might be useful to point readers to these studies. If an interested reader wishes to understand network dynamics in this and similar scenarios, it might be useful to have the pointers. The reference I had in mind was Romani, S., & Tsodyks, M. (2015). Short‐term plasticity based network model of place cells dynamics. Hippocampus, 25(1), 94-105.

      We thank the reviewer for the clarification, and we will include this reference in the Version of Record.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study about the mechanisms underlying our capacity to represent and hold recent events in our memory and how they are influenced by past experiences. A key aspect of the model put forward here is the presence of discrete jumps in neural activity with the posterior parietal region of the cortex. The strength of evidence is largely solid, with some weaknesses noted in the methodology. Both reviewers suggested ways in which this aspect of the model can be tested further and resolve conflicts with previously published experimental results, in particular the study by Papadimitriou et al 2014 in Journal of Neurophysiology.

      We thank the editors for their assessment. As mentioned in the cover letter, we have addressed all the reviewers’ concerns and would like to request an update of the assessment to reflect the revisions we have made.

      Public Reviews:

      We thank both reviewers for their careful reading and feedback that helped clarify many aspects of the model. Below, we address their comments.

      Reviewer #1 (Public Review):

      This paper aims to explain recent experimental results that showed deactivating the PPC in rats reduced both the contraction bias and the recent history bias during working memory tasks. The authors propose a two-component attractor model, with a slow PPC area and a faster WM area (perhaps mPFC, but unspecified). Crucially, the PPC memory has slow adaptation that causes it to eventually decay and then suddenly jump to the value of the last stimulus. These discrete jumps lead to an effective sampling of the distribution of stimuli, as opposed to a gradual drift towards the mean that was proposed by other models. Because these jumps are single-trial events, and behavior on single events is binary, various statistical measures are proposed to support this model. To facilitate this comparison, the authors derive a simple probabilistic model that is consistent with both the mechanistic model and behavioral data from humans and rats. The authors show data consistent with model predictions: longer interstimulus intervals (ISIs) increase biases due to a longer effect over the WM, while longer intertrial intervals (ITIs) reduce biases. Finally, they perform new experiments using skewed or bimodal stimulus distributions, in which the new model better fits the data compared to Bayesian models.

      The mechanistic proposed model is simple and elegant, and it captures both biases that were previously observed in behavior, and how these are affected by the ISI and ITI (as explained above). Their findings help rethink whether our understanding of contraction bias is correct.

      On the other hand, the main proposal - discrete jumps in PPC - is only indirectly verified.

      We agree with the reviewer that the evidence for discrete jumps in PPC has been provided in behavioural results (short-term, n-back trial biases), and not from neural data. However, we believe electrophysiological investigations are out of the scope of the current manuscript, and future work is needed to further verify the results.

      The model predicts a systematic change in bias with inter-trial-interval. Unless I missed it, this is not shown in the experimental data. Perhaps the self-paced nature of the experiments allows to test this?

      We thank the reviewer for this great suggestion.

      We had not previously looked at this in the data for the reason that in the simulations, the ITI is set to either 2.2, 6 or 11 seconds, whereas the experiment is self-paced. Therefore, any comparison with the simulation should be made carefully.

      However, after the reviewer’s suggestion, we did look at the change in the bias with the inter-trial interval, by dividing trials according to ITIs lower than 3 seconds (“short” ITI), and higher than 3 seconds (“long” ITI). This choice was motivated by the shape of the distribution of ITIs, which is bimodal, with a peak around 1 second, and another after 3 seconds (new Fig 8F). Hence, we chose 3 seconds as it seemed a natural division. However, 3 seconds also happens to be approximately the 75th percentile of the distribution, and this means that there is much more data in the “short” ITI than the “long” ITI set. In order to have sufficient data in the “long” ITI for clearer effects, we used all of our dataset – the negatively skewed, and also two bimodal distributions (of which only one was shown in the manuscript, for succinctness). This larger dataset allows us to clearly see not only a decreasing contraction bias with increasing ITI (Fig 8G), but also a decreasing one-trial-back attractive bias with increasing ITI (Fig 8H). We have uploaded all the datasets as well as scripts used to analyze them to this repository: https://github.com/vboboeva/ParametricWorkingMemory_Data.
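
      The short/long ITI split described above can be sketched as follows. This is a minimal illustration rather than the analysis script in the repository: the function name `split_by_iti`, the toy data, and the choice to place trials with an ITI of exactly 3 seconds in the "long" group are all assumptions made for the example.

```python
import numpy as np

def split_by_iti(iti_seconds, per_trial_values, threshold=3.0):
    """Return (short, long) arrays of per-trial values split at an ITI threshold.

    Trials with an ITI exactly equal to the threshold go to the "long" group;
    this tie-breaking rule is an assumption of the sketch.
    """
    iti = np.asarray(iti_seconds, dtype=float)
    values = np.asarray(per_trial_values, dtype=float)
    if iti.shape != values.shape:
        raise ValueError("iti_seconds and per_trial_values must have the same shape")
    short_mask = iti < threshold
    return values[short_mask], values[~short_mask]

# Toy data (made up for illustration): ITIs in seconds and 0/1 correctness.
iti = [1.0, 0.8, 3.5, 4.2]
correct = [1, 0, 1, 1]
short, long_ = split_by_iti(iti, correct)
short_accuracy = short.mean()  # fraction correct on "short" ITI trials
long_accuracy = long_.mean()   # fraction correct on "long" ITI trials
```

      Any per-trial quantity (correctness, or a regressor used to estimate the bias) can be passed in place of `correct`; the unequal group sizes noted above are then visible directly from `len(short)` and `len(long_)`.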

      The data in some of the figures in the paper are hard to read. For instance, Figure 3B might be easier to understand if only the first 20 trials or so are shown with larger spacing. Likewise, Figure 5C contains many overlapping curves that are hard to make out.

      We have limited the dynamics in Fig 3B to the first 50 trials for better visibility. Likewise, as suggested, we report the standard error of the mean instead of the standard deviation in old Fig 5C (new Fig 6C) – this allows for the different curves to be better discernible.

      There is a gap between the values of tau_PPC and tau_WM. First - is this consistent with reports of slower timescales in PFC compared to other areas?

      Recent studies by Xiao-Jing Wang and colleagues (Refs. 1-3 below) suggest that this may be the case. In Wang et al 2023 (Ref 1 below), the authors use a generative model to study the concept of bifurcation in space in working memory, which is accompanied by an inverted-V shape of the time constants as a function of cortical hierarchy.

      Briefly, they propose a generative model of the cortex with modularity, incorporating repeats of a canonical local circuit connected via long-range connections. In particular, the authors define a hierarchy for each local circuit. At a critical point in this hierarchy axis, there is a phase transition from monostability to bistability in the firing rate. This means that a local circuit situated below the critical point will only display a low activity steady state, while those above the critical point additionally display a persistent activity steady state.

      The model predicts a critical slowing down of the neural fluctuations at the critical point, resulting in an inverted-V shape of the time constants as a function of the hierarchy. They test the predictions of their model – the bifurcation in space and the inverted-V-shaped time constants as a function of the hierarchy – on connectome-based models of the macaque and mouse cortex. Interestingly, both datasets show similar behavior. In particular, during working memory, frontal areas (higher in the hierarchy, e.g. area 24c in macaques) have a smaller time constant relative to posterior parietal areas (lower in the hierarchy, like LIP or f7). We have now cited this new work.

      [1] https://www.biorxiv.org/content/10.1101/2023.06.04.543639v1

      [2] https://elifesciences.org/articles/72136

      [3] https://www.biorxiv.org/content/10.1101/2022.12.05.519094v3.abstract

      Second - is it important for the model, or is it mostly the adaptation timescale in PPC that matters?

      We have run simulations producing a phase diagram with tau_theta^P on the x-axis, tau^P on the y-axis, and in color, the fraction of trials in which the bump is in the vicinity of a target (Fig S2 F), before the network is presented with the second stimulus. This target can be the first stimulus s_1 (left), the mean over stimuli (middle) or the previous trial’s stimulus (right). The white point corresponds to the parameters of the default network.

      In this phase diagram, the lowest value that tau_P takes is tau_WM=0.01. When tau_P=tau_WM, the bump is rarely in the vicinity of the 1-trial-back stimulus, and we can see that tau_PPC should be greater than tau_WM in order for the model to yield 1-trial-back effects. We conclude that it is indeed important that tau_PPC > tau_WM.

      We have included this in Fig S2 F of the manuscript.

      Regarding the relation to other models, the model by Hachen et al (Ref 45) also has two interacting memory systems. It could be useful to better state the connection, if it exists.

      The model proposed by Hachen et al is conceptually different in that one module stores the mean of the sensory stimulus; it could be related to a variant of our model where adaptation is turned off in the PPC network (Fig S2 A). However, the task they model is also different: subjects have to learn the location of a boundary according to which the stimulus is classified as ‘weak’ or ‘strong’, set by the experimenter. Hence, it is a task where learning is needed - this contrasts with the task we are modelling, where only working memory is required. How task demands reconfigure existing circuits via dynamics and/or learning to perform different computations is a fascinating area of research that is outside the scope of this work.

      Reviewer #2 (Public Review):

      Working memory is not error free. Behavioral reports of items held in working memory display several types of bias, including contraction bias and serial dependence. Recent work from Akrami and colleagues demonstrates that inactivating rodent PPC reduces both forms of bias, raising the possibility of a common cause.

      In the present study, Boboeva, Pezzotta, Clopath, and Akrami introduce circuit and descriptive variants of a model in which the contents of working memory can be replaced by previously remembered items. This volatility manifests as contraction bias and serial dependence in simulated behavior, parsimoniously explaining both sources of bias. The authors validate their model by showing that it can recapitulate previously published and novel behavioral results in rodents and neurotypical and atypical humans.

      Both the modeling and the experimental work is rigorous, providing compelling evidence that a model of working memory in which reports sometimes sample past experience can produce both contraction bias and serial dependence, and that this model is consistent with behavioral observations across rodents and humans in the parametric working memory (PWM) task.

      Evidence for the model advanced by the authors, however, remains incomplete. The model makes several bold predictions about behavior and neural activity, untested here, that either conflict with previous findings or have yet to be reported but are necessary to appropriately constrain the model.

      First, in the most general (descriptive) formulation of the Boboeva et al. model, on a fraction of trials items in working memory are replaced by items observed on previous trials. In delayed estimation paradigms, which allow a more direct behavioral readout of memory items on a trial-by-trial basis than the PWM task considered here, reports should therefore be locked to previous items on a fraction of trials rather than display a small but consistent bias towards previous items. However, the latter has been reported (e.g., in primate spatial working memory, Papadimitriou et al., J Neurophysiol 2014). The ready availability of delayed estimation datasets online (e.g., from Rademaker and colleagues, https://osf.io/jmkc9/) will facilitate in-depth investigation and reconciliation of this issue.

      As pointed out by the reviewer, in the PWM task that we are modelling here, the activity in the network is used to make a binary decision. However, it is possible to directly analyse the network activity before the onset of the second stimulus.

      In their manuscript, Papadimitriou et al. study a memory-guided saccade task in nonhuman primates and argue that the animals display a small but consistent bias towards previous items (Fig 2). In that figure, the authors compute the error as the difference between the saccade direction and target direction in each trial. They compute this error for all trials in which the preceding trial’s target direction is between 35° and 85° relative to the current trial (counterclockwise with respect to the current trial’s target). They discover that the residual error distribution is unimodal with a mode at 1.29° and a mean at 2.21° (positive, so towards the preceding target’s direction), from which they deduce a small but systematic bias towards previous trial targets.

      We have computed a similar measure for our network with default parameters (Table 1), by subtracting the location of the bump at the end of the delay interval (s_hat(t), ‘saccade’) from the initial location of the first stimulus in the current trial (s1(t) or the ‘target’). We have done this for all trials where s1(t)=0.2, and where s2(t-1) takes specific values. These distributions are characterized by two modes. The first corresponds to those trials where the bump is not displaced in WM (i.e. mean of zero). We can also see the appearance of a second mode at the location of s1(t) - s2(t-1), corresponding to the displacements towards the preceding trial’s stimulus described in the main text. If, instead, we limit the analysis to a small range of previous trials close to s1(t) (similar to Papadimitriou et al) then the distribution of residual errors will appear unimodal, as the two modes merge. Importantly, note that there is a large variability around the second mode, expressing a more complex dynamics in the network. As can be seen in Fig 3B, the location of the bump is not always slaved to the one in the PPC in a straightforward way -- due to the adaptation in the PPC, the global inhibition in the connectivity kernel, as well as interleaved design for various delay intervals, the WM bump can be displaced in nontrivial ways (see also Recommendation no 4), yielding the dispersion around the second peak. It remains to be seen whether such patterns can be observed in the data from previous works on continuous working memory recall (including Papadimitriou et al). However, to our knowledge, such detailed and full analysis of errors at the level of individual trials has not been done.
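
      The conditioned residual-error measure described above can be sketched as follows. This is a minimal illustration, not the code used for the manuscript: the function name `residual_errors`, the tolerance used to match stimulus values, and the toy trial sequence are assumptions. The sign convention follows the text, so a bump displaced onto the previous trial's second stimulus gives an error of s1(t) - s2(t-1).

```python
import numpy as np

def residual_errors(s1, s2, s_hat, s1_value, s2_prev_value, tol=1e-9):
    """Errors s1(t) - s_hat(t) on trials with s1(t) == s1_value and s2(t-1) == s2_prev_value.

    The first trial is always excluded because it has no preceding trial.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    mask = np.zeros(s1.shape, dtype=bool)
    # s2[:-1] is the second stimulus of the trial preceding each of s1[1:].
    mask[1:] = (np.abs(s1[1:] - s1_value) < tol) & (np.abs(s2[:-1] - s2_prev_value) < tol)
    return s1[mask] - s_hat[mask]

# Toy sequence (made up): on the third trial the bump has jumped to s2(t-1) = 0.6.
s1 = [0.2, 0.2, 0.2, 0.5]
s2 = [0.6, 0.6, 0.3, 0.4]
s_hat = [0.2, 0.2, 0.6, 0.5]
errors = residual_errors(s1, s2, s_hat, s1_value=0.2, s2_prev_value=0.6)
```

      In this toy sequence the two selected trials give one error at zero (the memory is kept) and one at s1(t) - s2(t-1) = -0.4 (the bump is displaced). A histogram of `errors` over many simulated trials is where the two modes would appear.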

      In summary, this analysis shows that the type of dynamics in our network is not one of the two cases: 1) small and systematic bias in each and every trial or 2) large error that occurs only rarely; rather, the dispersion around both modes suggests that the dynamics in our model are a mixture of these two limit cases.

      We have also performed another typical analysis, reported in several continuous recall tasks (e.g. Jazayeri and Shadlen 2010) where contraction bias has been reported. We plot WM bump locations after the delay period for every trial (s_hat(t)), and their averages, against the nominal value of s1(t). We see that the mean WM location deviates from the identity line toward the mean values of s1(t), again showing contraction bias as an average effect, while individual trials follow the dynamics explained above.

      We have now included a new section on continuous recall (Sect. 1.5 and a new figure (Fig 5)), which details the two above-mentioned analyses. The analysis of freely available datasets of delayed estimation tasks, unfortunately, is out of the scope of this work, and we leave such analyses to future studies.

      Second, the bulk of the modeling efforts presented here are devoted to a circuit-level description of how putative posterior parietal cortex (PPC) and working-memory (WM) related networks may interact to produce such volatility and biases in memory. This effort is extremely useful because it allows the model to be constrained by neural observations and manipulations in addition to behavior, and the authors begin this line of inquiry here (by showing that the circuit model can account for effects of optogenetic inactivation of rodent PPC).

      Further experiments, particularly electrophysiology in PPC and WM-related areas, will allow further validation of the circuit model. For example, the model makes the strong prediction that WM-related activity should display 'jumps' to states reflecting previously presented items on some trials. This hypothesis is readily testable using modern high-density recording techniques and single-trial analyses.

      As mentioned in response to the previous comment, we note again that in the WM network, the bump ‘displacement’ has a complex dynamics -- the examples we have provided in Fig 1A and 2B mainly show the cases in which jumps occur in the WM network, but this is not the only type of dynamics we observe in the model. We do have instances in which the continuity of the model causes drift across values, and we have now replaced the right panel in Fig 2B with one such instance, in order to emphasize that this displacement towards the previous trial’s stimulus (s2(t-1)) can occur in various ways. For a more thorough analysis, we have analyzed the distance between s1(t) and the position of the bump in the WM network at the end of the delay period s_hat(t), conditioned on specific values of s1(t) and s2(t-1) (Fig 5C). In this figure, we can see the appearance of two modes: one centered around 0, corresponding to the correct trials where the stimulus is kept in WM (s1(t) = s_hat(t)), and another mode centered around s2(t-1), the location of the second stimulus of the previous trial, where the bump is displaced. Note, as we explain in Sect. 1.5, the large dispersion around this second mode, which suggests that the bump is not always displaced to that specific location and may undergo drift.

      We agree with the reviewer that future electrophysiological experiments (or analysis of existing datasets) are necessary for validation of these results.

      Finally, while there has been a refreshing movement away from an overreliance on p-values in recent years (e.g., Amrhein et al., PeerJ 2017), hypothesis testing, when used appropriately, provides the reader with useful information about the amount of variability in experimental datasets. While the excellent visualizations and apparently strong effect sizes in the paper mitigate the need for p-values to an extent, the paucity of statistical analysis does impede interpretation of a number of panels in the paper (e.g., the results for the negatively skewed distribution in 5D, the reliability of the attractive effects in 6a/b for 2- and 3- trials back).

      We share the reviewer’s criticism towards the misuse of p-values – in order for a clearer interpretation of old Fig 5D (new Fig 7E), we have looked at the 2 and 3 trials-back biases by using all of our dataset – the negatively skewed, and also two bimodal distributions (of which only one was shown in the manuscript). This larger dataset of 43 subjects (approximately 17,200 trials) allows us to clearly see the 2 and 3 trial back attractive biases, and the effect that the delay interval exerts on them.

      Reviewer #1 (Recommendations For The Authors):

      Fig 5 A&C - It might be beneficial to separate the distribution of stimuli from the performance. It is hard to read the details of the performance, especially with error bars.

      Following the next recommendation, we have replaced the standard deviation with the standard error of the mean; we hope this makes the performance easier to read.

      Fig 5C. The number of participants should be written. Perhaps standard errors instead of standard deviation?

      We have now changed the standard deviation to standard errors of the mean and included the number of participants in the figure.
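
      For reference, the standard error of the mean across n participants is the sample standard deviation divided by the square root of n, which is why the error bars shrink relative to the standard deviation. A minimal sketch, with a hypothetical helper name `mean_and_sem` and made-up per-participant values:

```python
import numpy as np

def mean_and_sem(values):
    """Return the mean and the standard error of the mean of a 1-D sample."""
    x = np.asarray(values, dtype=float)
    if x.size < 2:
        raise ValueError("need at least two values to estimate the SEM")
    sd = x.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)
    return x.mean(), sd / np.sqrt(x.size)

# Made-up per-participant performance values, for illustration only.
mean, sem = mean_and_sem([0.70, 0.80, 0.75, 0.85])
```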

      Fig 2B - hard to understand, because there is no marking of where "perfect" memory of s1 would be.

      The perfect memory of s1 is shown in the upper panel as black bars.

      Fig 3B. dot number 9 (blue, around 0.7) - why is WM higher than stimulus?

      This trial has a long ISI (blue means 10s). During this delay, the bump in the PPC, under the influence of adaptation, drifts far below the first stimulus (note that the previous trial also had its first stimulus in the same location, as a result of which the adaptive thresholds have built up significantly, causing the bump to move away from that location). During this delay period, neurons in the WM network receive inputs from the PPC network: if this input is strong enough, it can disrupt an existing bump; if not, this input still exerts an inhibiting influence on the existing bump via the global inhibition in the connectivity. This can cause an existing bump to slowly drift in a random direction, and finally dissipate. Note that the lines in Fig 2B represent the neuron with the maximal activity; this activity may be a stable bump, or an unstable bump that may soon dissipate.

      Other examples with similar dynamics include trials 43 and 54.

      L167 fewer -> smaller

      We have now corrected this.

      Fig 3C - bump can also be in between. Is this binned?

      We have not binned the length of the attractor; to produce that figure, we check whether the position of the neuron with the maximal firing rate is within a distance of ±5% of the length of the whole line attractor from the target location.
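
      The vicinity criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `bump_near_target`, the use of neuron indices as positions along the line attractor, and the Gaussian toy activity profile are ours and are not taken from the simulation code.

```python
import numpy as np

def bump_near_target(firing_rates, target_index, tolerance=0.05):
    """True if the most active neuron lies within +/- tolerance * N of the target.

    Positions are measured in neuron indices along a line attractor of N neurons.
    """
    rates = np.asarray(firing_rates, dtype=float)
    peak_index = int(np.argmax(rates))
    return abs(peak_index - target_index) <= tolerance * rates.size

# Toy activity profile (made up): a bump centred on neuron 42 of 100.
rates = np.exp(-0.5 * ((np.arange(100) - 42) / 3.0) ** 2)
near = bump_near_target(rates, target_index=40)  # 2 neurons away, within 5
far = bump_near_target(rates, target_index=30)   # 12 neurons away, outside 5
```

      Because the criterion uses only the position of the maximally active neuron, a bump sitting between two targets simply counts as near neither of them.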

      L221 Lapse at the boundary of attractor. This seems very different from behavior. Specifically, if it is in the boundaries, it should be stimulus dependent.

      Very sorry, we did not manage to understand the reviewer’s comment.

      L236 are -> is

      We have now corrected this.

      Fig S4 - should be mostly in main text.

      Part of this figure is in Fig 6A, but given the amount of detail, we think Supplementary Material is better suited.

      L253-254. Differences across all distributions - very minor except the bimodal case.

      That is correct, this is why we conducted the experiment with the bimodal distribution, to better differentiate the predictions of the two models.

      L273 extra comma after "This probability"

      We have now corrected this.

      ITI was only introduced in section 1.5.2. Perhaps worth mentioning the default 5s value earlier in the paper.

      We have now mentioned this in line 97-98.

      Fig S6B title: perhaps "previous stimuli"?

      We have now corrected this.

      L364 i"n A given trial"

      Equation 2 - no decay term?

      Thank you for pointing out this error, we have now corrected this.

      Equation 5,6 are j^W and j^P indices of neurons in those populations?

      Yes, j^W indexes neurons in the WM network, and j^P those in the PPC. We have now added this in the text for clarity.

      Bump with adaptation - other REFs? Sandro?

      We are aware of continuous bump attractors implementing short-term synaptic plasticity in various studies (including by Sandro Romani), but not in the form we have described. Could the reviewer kindly point us towards the relevant literature?

      Free boundary - what is the connectivity for neurons 1 and N? Is it weaker than others? Is the integral still 1? Does this induce some bias on the extreme values?

      The connectivity of the network is all-to-all. However, as expressed by Eq. (3), the distance-dependent contribution to the weights, K, decreases exponentially as we move from neuron 1 onwards, and from neuron N down. The sum (or integral, in the large-N limit) of the K_ij for j on either side of neuron i is unity only when i is sufficiently far from 1 or N. We have rephrased the paragraph starting in line 516 to make this clearer.

      The presence of a boundary could introduce a bias in theory, but in practice, it affects the dynamics only when the bump drifts sufficiently close to it. The smallest stimulus in the simulated task has amplitude 0.2, with width 0.05, which implies the activation of 50 neurons on either side of neuron 400. If one compares this with the width of the kernel K in stimulus space (d_0 = 0.02), which spans ~10 neurons, we can see that the bump of activity stays mostly far from the boundary. It is possible, though it is observed rarely, when several consecutive long delay intervals happen to occur, that the bump in PPC drifts beyond the location corresponding to either the minimum or maximum stimulus.
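
      The effect of the free boundary on the summed weights can be illustrated with a simple stand-in kernel. This sketch does not reproduce Eq. (3) of the manuscript: it assumes a purely exponential kernel K_ij proportional to exp(-|x_i - x_j| / d_0), normalized so that the sum over an infinite line would be exactly one, and the network size of 1000 neurons is an illustrative choice.

```python
import numpy as np

def exponential_kernel(n_neurons, d0):
    """Exponential distance kernel on a line with free (non-periodic) boundaries.

    The normalization makes the row sum equal to one on an infinite line, so any
    shortfall in a row sum is due to the missing neighbours beyond the boundary.
    """
    positions = np.arange(n_neurons) / n_neurons
    r = np.exp(-(1.0 / n_neurons) / d0)  # decay per neuron of separation
    norm = (1.0 - r) / (1.0 + r)         # 1 / sum over all integers k of r**|k|
    distances = np.abs(positions[:, None] - positions[None, :])
    return norm * np.exp(-distances / d0)

K = exponential_kernel(n_neurons=1000, d0=0.02)
row_sums = K.sum(axis=1)
interior_sum = row_sums[500]  # neuron far from both boundaries
edge_sum = row_sums[0]        # neuron at the boundary
```

      Under these assumptions the interior row sum is one to within numerical precision, while the row sum at the boundary is close to one half because one side of the kernel is missing. The summed weights therefore deviate from unity only within a few kernel widths of neuron 1 or N.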

      Code availability?

      Code simulating the dynamics of the network as well as analysing the resulting data can be found in the following repository: https://github.com/vboboeva/ParametricWorkingMemory

      Code used to analyse human behavioural data and fit them with our statistical model can be found in this repository: https://github.com/vboboeva/ParametricWorkingMemory_Data

      Code used to run the auditory PWM experiments with human subjects (adapted from Akrami et al 2018) can be found here: https://github.com/vboboeva/Auditory_PWM_human

      L547 stimuli

      We have now corrected this.

      Equation 14 uses both stimuli. Was this the same for the rest of analysis in the paper (first figures for instance)?

      This equation was used for all GLM analyses (Figs 9 and S6).

      D0 is very small (0.02). Does this mean that activity is essentially discrete in the model? Fig 1A & 2B - the two examples of model activity suggest this is the case. In other words - are there cases where the continuity of the model causes drift across values? Can you show an example (similar to Fig 1A)?

      Since this point has been raised beforehand, we refer to the first comment, Fig 2B and Sect. 1.5 for the response to this question.

      Table 1 - inter trial interval 6. Text says 5

      We have now corrected this in the text.

      Reviewer #2 (Recommendations For The Authors):

      In addition to my review above, I just have a few minor comments:

      • If I understood correctly, the squares inside the purple rectangle in Figure 1B are meant to show a gradation from red to blue, but this was hard to make out in the pdf.

      Actually the squares are all on one side or the other of the diagonal, therefore they do not have any gradation.

      • line 164: "The resulting dynamics... [are]?"

      We have corrected this in the text.

      • Fig 7B legend: "The network performance is on average worse for longer ITIs" – correct?

      This was a mistake; we have replaced "worse" with "better".

      Other comments

      We realized that the colorbar reported the incorrect fraction classified in Figs 1B, 2C, 7B (new 8B), S2C, S3A, S5B. We have corrected this in the new version of the manuscript.

      We also found a minor mistake in one of our analysis codes that computed the n-trial back biases for different delay intervals. This did not change our results; in fact, it made the effects clearer. The figures concerned are Fig 3F and new Fig 7E.

    1. Author Response

      eLife assessment

      This study presents important findings for understanding cortical processing of color, binocular disparity, and naturalistic textures in the human visual cortex at the spatial scale of cortical layers and columns using state-of-the-art high-resolution fMRI methods at ultra-high magnetic field strength (7 T). Solid evidence supports an interesting layer-specific informational connectivity analysis to infer information flow across early visual areas for processing disparity and color signals. While the question of how the modularity of representation relates to cortical hierarchical processing is interesting and fundamental, the finding that texture does not map onto previously established columnar architecture in V2 is suggestive but would benefit from further controls. The successful application of high-resolution fMRI methods to study the functional organization along cortical columns and layers is relevant to a broad readership interested in general neuroscience.

      Thank you for your assessment of our manuscript "Mesoscale functional organization and connectivity of color, disparity, and naturalistic texture in human second visual area". We have carefully considered the public reviews and have outlined our plans for revision by providing point-by-point responses to the reviewers’ comments.

      Reviewer #1 (Public Review):

      To support the finding that texture is not represented in a modular fashion, additional possibilities must be considered. These include (a) the effectiveness and specificity of the texture stimulus and control stimuli, (b) further analysis of possible structure in images that may have been missed, and (c) limitations of imaging resolution.

      Thank you for your suggestions. We will provide evidence and additional analyses to show that there was indeed a large difference in high-order statistical information between the texture and control stimuli in our study, and thus the contrast between the two stimuli should be effective in localizing the processing of high-order texture information. Compared to the previous studies, another reason for the weaker texture selectivity in the current study could be the smaller number of images used and the slower rate of image presentation. Although our fMRI result at 1-mm isotropic resolution did not show a modular processing of naturalistic texture in CO-stripe columns, this does not exclude the possibility that smaller modules exist beyond the current fMRI resolution. We will discuss these limitations in the revised manuscript.

      More in-depth analysis of subject data is needed. The apparent structure in the texture images in peripheral fields of some subjects calls for more detailed analysis. e.g. Relationship to eccentricity and the need for a 'modularity index' to quantify the degree of modularity. A possible relationship to eccentricity should also be considered.

      We will perform further analysis based on your suggestion, especially regarding the relationship between eccentricity and modulation index. We will discuss this possibility in the revised manuscript.

      Given what is known as a modular organization in V4 and V3 (e.g. for color, orientation, curvature), did images reveal these organizations? If so, connectivity analysis would be improved based on such ROIs. This would further strengthen the hierarchical scheme.

      Thank you for your suggestion. The informational connectivity analyses used highly informative voxels by feature selection, which may already represent information from the modular organizations in these higher visual areas. We will examine the functional maps for possible modular organizations.

      Reviewer #2 (Public Review):

      In lines 162-163, it is stated that no clear columnar organization exists for naturalistic texture processing in V2. In my opinion, this should be rephrased. As far as I understand, Figure 2B refers to the analysis used to support the conclusion. The left and middle bar plots only show a circular analysis since ROIs were based on the color and disparity contrast used to define thin and thick stripes. The interesting graph is the right plot, which shows no statistically significant overlap of texture processing with thin, thick, and pale stripe ROIs. It should be pointed out that this analysis does not dismiss a columnar organization per se but instead only supports the conclusion of no coincidence with the CO-stripe architecture.

      Reviewer #1 also raised a similar concern. We agree that there may be a smaller functional module of textures in area V2 at a finer spatial scale than our fMRI resolution. We will rephrase our conclusions to be more precise.

      In Figure 3, cortical depth-dependent analyses are presented for color, disparity, and texture processing. I acknowledge that the authors took care of venous effects by excluding outlier voxels. However, the GE-BOLD signal at high magnetic fields is still biased to extravascular contributions from around larger veins. Therefore, the highest color selectivity in superficial layers might also result from the bias to draining veins and might not be of neuronal origin. Furthermore, it is interesting that cortical profiles with the highest selectivity in superficial layers show overall higher selectivity across cortical depth. Could the missing increase toward the pial surface in other profiles result from the ROI definition or overall smaller signal changes (effect size) of selected voxels? At least, a more careful interpretation and discussion would be helpful for the reader.

      We will discuss the limitations of cortical depth-dependent analysis using GE-BOLD fMRI. All our stimuli produced robust activations in these visual areas; thus, the flat laminar profiles of the modulation indices are unlikely to be caused by smaller signal changes. We will show the original BOLD responses in addition to the modulation index.

      I was slightly surprised that no retinotopy data was acquired. The ROI definition in the manuscript was based on a retinotopy atlas plus manual stripe segmentation of single columns. Both steps have disadvantages because they neglect individual differences and are based on subjective assessment. A few points might be worth discussing: (1) In lines 467-468, the authors state that V2 was defined based on the extent of stripes. This classical definition of area V2 was questioned by a recent publication (Nasr et al., 2016, J Neurosci, 36, 1841-1857), which showed that stripes might extend into V3. Could this have been a problem in the present analysis, e.g., in the connectivity analysis? (2) The manual segmentation depends on the chosen threshold value, which is inevitably arbitrary. Which value was used?

      The retinotopic atlas on the standard surface is usually quite accurate in defining the boundaries of early visual areas. Although some stripes may extend into V3, these patterns should be more robust in V2. In our analysis, we selected only those with clear organizations within the retinotopic atlas. Thus, the signal contribution from V3 is likely to be small and would not affect the pattern of results. In addition, because the results in V3 and V2 could be very different, we will compare the pattern of results from these areas in additional analyses. The threshold for segmentation is abs(T)>2; we will clarify this in the Methods.

      The use of 1-mm isotropic voxels is relatively coarse for cortical depth-dependent analyses, especially in the early visual cortex, which is highly convoluted and has a small cortical thickness. For example, most layer-fMRI studies use a voxel size of around isotropic 0.8 mm, which has half the voxel volume of 1 mm isotropic voxels. With increasing voxel volume, partial volume effects become more pronounced. For example, partial volume with CSF might confound the analysis by introducing pulsatility effects.

      We agree that the 1-mm isotropic voxel is much larger in volume than the 0.8-mm isotropic voxel, but the difference in resolution along the cortical depth is not large. In addition to our study, other studies have shown that fMRI at 1-mm isotropic resolution is capable of resolving cortical depth-dependent signals. Also, our fMRI slices were oriented perpendicular to the calcarine sulcus, so the higher in-plane resolution will also help in resolving depth-dependent signals. We will discuss these issues about fMRI resolution in the revised manuscript.
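
      The arithmetic behind this exchange can be checked with a short sketch. The two voxel sizes come from the text; the cortical thickness of 2 mm is a hypothetical round number used only to show how many voxel edges span the cortex, and is not a value reported by the authors.

```python
def voxel_volume(edge_mm):
    """Volume in cubic millimetres of an isotropic voxel with the given edge."""
    return edge_mm ** 3


small_edge, large_edge = 0.8, 1.0            # mm, from the text
volume_ratio = voxel_volume(small_edge) / voxel_volume(large_edge)
edge_ratio = small_edge / large_edge

# Hypothetical cortical thickness, used for illustration only.
assumed_thickness_mm = 2.0
voxels_across_small = assumed_thickness_mm / small_edge
voxels_across_large = assumed_thickness_mm / large_edge

print("volume ratio (0.8 mm vs 1 mm):", volume_ratio)
print("edge ratio along depth       :", edge_ratio)
print("voxel edges across cortex    :", voxels_across_small, "vs", voxels_across_large)
```

      The volume roughly halves when going from 1 mm to 0.8 mm, whereas the sampling step along any single axis, including cortical depth, changes by only 20%.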

      The SVM analysis included a feature selection step stated in lines 531-533. Although this step is reasonable for the training of a machine learning classifier, it would be interesting to know if the authors think this step could have reintroduced some bias to remaining draining vein contributions.

      Several precautions have been taken in the ROI definition to reduce the influence of large draining veins. The same number of voxels were selected from each cortical depth for the SVM analysis, thus there was no bias from the superficial layers susceptible to draining veins. Also, since both feedforward and feedback connections involved the superficial voxels, the remaining influence of large draining veins should be comparable between the two connections.
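
      A minimal sketch of depth-balanced feature selection, as described above, is given below. It is not the authors' pipeline: the per-voxel score, the depth labels, and the function name are hypothetical, and the actual selection criterion in the manuscript may differ. The toy scores are deliberately larger in the superficial bin to mimic a draining-vein bias.

```python
def select_equal_per_depth(scores, depths, k):
    """Return indices of the k highest-scoring voxels within each depth bin."""
    by_depth = {}
    for idx, (score, depth) in enumerate(zip(scores, depths)):
        by_depth.setdefault(depth, []).append((score, idx))
    selected = []
    for depth in sorted(by_depth):
        ranked = sorted(by_depth[depth], reverse=True)   # best score first
        selected.extend(idx for _, idx in ranked[:k])
    return sorted(selected)


# Toy data: 4 voxels per depth bin, superficial scores inflated.
depths = ["deep"] * 4 + ["middle"] * 4 + ["superficial"] * 4
scores = [1.0, 0.5, 0.8, 0.2,
          1.5, 1.1, 0.9, 1.3,
          3.0, 2.5, 2.8, 2.2]

balanced = select_equal_per_depth(scores, depths, k=2)

# Naive alternative: take the 6 best voxels regardless of depth.
naive = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:6]
naive_superficial = sum(1 for i in naive if depths[i] == "superficial")

print("balanced selection:", balanced)
print("naive selection   :", sorted(naive))
print("superficial voxels in naive selection:", naive_superficial)
```

      In this toy case the naive ranking is dominated by superficial voxels, while the balanced selection draws the same number from every depth.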

      Reviewer #3 (Public Review):

      The authors tend to overclaim their results.

      Thank you for your comments. We will add more control analyses to strengthen our findings, and discuss the results appropriately.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This article describes a useful python-based image-analysis tool for bacteria growing in the 'mother-machine' microfluidic device. This new method for image segmentation and tracking offers a user-friendly graphical interface based on the previously developed, promising environment for image analysis 'Napari'. The authors demonstrate the usefulness of their software and its robust performance by comparing it to other methods used for the same purpose. The comparison provides solid support for the new method, although it would have been even stronger if tested using data sets from other groups. This article will be of interest for scientists who utilize the 'mother machine', not least because it also provides a short overview of how to set up this widely used device.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors aim to develop an easy-to-use image analysis tool for the mother machine that is used for single-cell time-lapse imaging. Compared with related software, they tried to make this software more user-friendly for non-experts with a design of "What You Put Is What You Get". This software is implemented as a plugin of Napari, which is an emerging microscopy image analysis platform. The users can interactively adjust the parameters in the pipeline with good visualization and interaction interface.

      Strengths:

      • Updated platform with great 2D/3D visualization and annotation support.

      • Integrated one-stop pipeline for mother machine image processing.

      • Interactive user-friendly interface.

      • The users can have a visualization of intermediate results and adjust the parameters.

      We thank the reviewer for their positive comments.

      Weaknesses:

      • Based on the presentation of the manuscript, it is not clear that the goals are fully achieved.

      • Although there is great potential, there is little evidence that this tool has been adopted by other labs.

      • The comparison of Otsu and U-Net results does not make much sense to me. The systematic bias could be adjusted by threshold change. The U-Net output is a probability map with floating point numbers. This output is probably thresholded to get a binary mask, which is not mentioned in the manuscript. This threshold could also be adjusted. Actually, Otsu is a segmentation method and U-Net is an image transformation method and they should not be compared together. U-Net output could also be segmented using Otsu.

      We agree that the comparison of the classical and U-Net results may be misleading. As the reviewer points out, the issue ultimately comes down to thresholding. Indeed, the threshold of both the Otsu and U-Net outputs could be adjusted to bring them into line with each other. The comparison between the Otsu pipeline and U-Net pipeline is meant to illustrate that any pipeline (making use of a variety of methods) may be highly susceptible to the value of a user-input (or hard-coded) threshold.

      We have clarified the discussion to emphasize that the comparison is not specifically between U-Net and Otsu but between the two pipelines (lines 238 - 257).

      We have also clarified that the U-Net probability map output was binarized with a threshold of 0.5 (lines 538-541). We note the same activation function and threshold are used in DeLTA. As the reviewer points out, Otsu’s method could indeed be applied to threshold the U-Net output as well. What we referred to as the “Otsu” MM3 method itself uses Otsu thresholding coupled with a Euclidean distance transform and a Random Walker algorithm. For clarity we now refer to it as a classical or non-learning method in the text.
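
      The threshold sensitivity discussed in this exchange can be illustrated with a small sketch. This is not code from napari-MM3 or DeLTA: the probability profile is made-up toy data, the helper names are hypothetical, and Otsu's method is re-implemented in plain Python only to keep the example self-contained.

```python
def binarize(values, threshold):
    """Mark each value as foreground if it exceeds the threshold."""
    return [v > threshold for v in values]


def otsu_threshold(values, n_bins=256):
    """Threshold that maximizes the between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in values:
        hist[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (b + 0.5) * width for b in range(n_bins)]
    total = len(values)
    total_sum = sum(h * c for h, c in zip(hist, centers))
    best_threshold, best_score = lo, -1.0
    count0, sum0 = 0, 0.0
    for b in range(n_bins - 1):
        count0 += hist[b]
        sum0 += hist[b] * centers[b]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue
        mean0 = sum0 / count0
        mean1 = (total_sum - sum0) / count1
        score = count0 * count1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score = score
            best_threshold = lo + (b + 1) * width   # upper edge of bin b
    return best_threshold


# Toy 1D probability profile: background, a soft cell edge, cell interior.
prob_map = [0.05] * 20 + [0.2, 0.35, 0.5, 0.65, 0.8] + [0.95] * 20

# Segmented size (number of foreground pixels) at several fixed thresholds.
sizes = {thr: sum(binarize(prob_map, thr)) for thr in (0.3, 0.5, 0.7)}
otsu_thr = otsu_threshold(prob_map)
otsu_size = sum(binarize(prob_map, otsu_thr))

print("sizes at fixed thresholds:", sizes)
print("Otsu threshold and size  :", otsu_thr, otsu_size)
```

      The measured cell size shifts by a few pixels as the threshold moves across the soft edge, which is the kind of systematic offset that a user-input or hard-coded threshold can introduce in either pipeline.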

      • The diversity of datasets used in this study is limited.

      We have added a section “Testing napari-MM3 on other datasets” (lines 187-196) evaluating the performance of MM3 on 4 datasets (3 E. coli, 1 Corynebacterium glutamicum) from outside our lab, demonstrating its versatility.

      • There is some ambiguity in the main point of this manuscript: the title and figures illustrate a complete pipeline, including imaging, image segmentation, and analysis, while the abstract focuses only on the software MM3. If only MM3 is the focus and contribution of this manuscript, more of the presentation should focus on this software tool. It is also not clear whether the analysis features are also integrated with MM3 or not.

      We have added a line (lines 160-162) clarifying that final analysis and plotting must be done outside of napari. MM3 itself processes raw microscopy images, segments cells and reconstructs cell lineages (Figure 2).

      • The impact of this work depends on the adoption of the software MM3. Napari is a promising platform with an expanding community. With good software user experience and long-term support, there is a good chance that this tool could be widely adopted in the mother machine image analysis community.

      We thank the reviewer for their endorsement of MM3’s potential.

      • The data analysis in this manuscript is used as a demo of MM3 features, rather than scientific research.

      Reviewer #2 (Public Review):

      The authors present an image-analysis pipeline for mother-machine data, i.e., for time-lapses of single bacterial cells growing for many generations in one-dimensional microfluidic channels. The pipeline is available as a plugin of the Python-based image-analysis platform Napari. The tool comes with two different previously published methods to segment cells (classical image transformation and thresholding as well as U-Net-based analysis), which compare qualitatively and quantitatively well with the results of widely accessible tools developed by others (BACMMAN, DeLTA, Omnipose). The tool comes with a graphical user interface and example scripts, which should make it valuable for other mother-machine users, even if this has not been demonstrated yet.

      We thank the reviewer for their positive comments.

      The authors also add a practical overview of how to prepare and conduct mother-machine experiments, citing their previous work and giving more advice on how to load cells using centrifugation. However, the latter part lacks detailed instructions.

      We have added a more detailed experimental protocol, including the procedure we use for cell loading, to the lab GitHub page https://github.com/junlabucsd/mother-machine-protocols (linked in the main text).

      Finally, the authors emphasize that machine-learning methods for image segmentation reproduce average quantities of training datasets, such as the length at birth or division. Therefore, differences in training can propagate to differences in measured average quantities. This result is not surprising and is normally considered a desired property of any machine-learning algorithm as also commented on below.

      Points for improvement:

      Different datasets: The authors demonstrate the use of their method for bacteria growing in different growth conditions in their own microscope. However, they don't provide details on whether they had to adjust image-analysis parameters for each dataset. Similarly, they say that their method also works for other organisms including yeast and C. elegans (as part of the Results section) but they don't show evidence nor do they write whether the method needs to be tuned/trained for those datasets. Finally, they don't demonstrate that their method works on data from other labs, which might be different due to differences in setup or imaging conditions.

      We have added a section “Testing napari-MM3 on other datasets” (lines 187-196) evaluating the performance of MM3 on 4 datasets (3 E. coli, 1 Corynebacterium glutamicum) from outside our lab, demonstrating its versatility. We provide details of the procedure and parameters used in the Methods section. (“Analysis of external datasets” lines 476-486).

      Bias due to training sets:

      The bias in ML-methods based on training datasets is not surprising but arguably a desired property of those methods. Similarly, threshold-based classical segmentation methods are biased by the choice of threshold values and other segmentation parameters. A point that would have profited from discussion in this regard: How to make image segmentation unbiased, that is, how to deliver physical cell boundaries? This can be done by image simulations and/or by comparison with alternative methods such as fluorescence microscopy.

      We agree this is an important point. We have revised the relevant sections (lines 238 - 270) to add context to the discussion of bias in both classical and deep learning methods. We have added a subsection (lines 401 - 410) discussing methods to this end, such as synthetic training data generation or calibrating the segmentation to fluorescence images.

      The authors stress the user-friendliness of their method in comparison to others. For example, they write: 'Unfortunately, many of these tools present a steep learning curve for most biologists, as they require familiarity with command line tools, programming, and image analysis methods.' I suggest instead emphasizing that many of the tools published in recent years are designed to be very user-friendly. And as with all methods, MM3 also comes at a price, which is to install Napari followed by the installation of MM3, which, according to their own instructions, is not easy either.

      We have modified our language to acknowledge that indeed recent software such as DeLTA and BACMMAN make a point to be user-friendly and accessible (lines 52-53).

      Reviewer #1 (Recommendations For The Authors):

      • The resources, including documentation and code, are referenced and are not easy to find. It should be easier for readers to curate them in a separate Resources section.

      We have created a Resources section in the Methods (top of first page) with the documentation, code and protocols hyperlinked.

      • It would be easier to understand the usage of MM3 with a screen recording video. I found a video from the GitHub page, but the resolution is a bit low. Attaching a high-resolution screenshot video would be helpful.

      A high-resolution tutorial video has been made more visible on the GitHub page.

      • In Table 1, AMD GPU is used which is not easy to use for Deep Learning. It is not clear whether the GPU is used for Deep Learning training and inference.

      We have clarified this point in the Table 1 caption, and linked to a reference on how to use AMD GPUs with Tensorflow on Macs.

      • Some paragraphs in the Discussion section are like blogs with general recommendations. Although the suggestions look pretty useful, it is not the focus of this manuscript. It might be more appropriate to put it in the GitHub repo or a documentation page. The discussion should still focus on the software, such as features, software maintenance, software development roadmap, and community adoption.

      • It would be easier for reviewers to add line numbers in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Software Installation: This might be something for the GitHub forum, but briefly trying to install the plugin myself, I already failed at the first line of the GitHub instructions, which is to use mamba for installation. This relates to my point above: Any program that is not stand-alone requires some user-savviness and trial-and-error, which is just hard to avoid for any method. I suggest being less critical of 'other methods' and instead focus on the advantage of the mother-machine-specific aspects of napari-mm3.

      The authors write 'Still, most labs do not have the time and resources to evaluate other tools they do not use critically, [...]'. The sentence is not very clear. Evaluating tools not used is obviously difficult/impossible.

      We have reworded this sentence to be more clear (lines 54-55).

      The authors write: 'The supervised learning method uses a convolutional neural net (CNN) with the U-Net architecture [20].' Can the authors cite previous work that has taken advantage of this approach before (e.g., DeLTA)?

      We have added citations to DeLTA and other previous software (line 151).

      Cell tracking and lineage reconstruction should be described in more detail and/or with reference to previous work.

      We have added more details to the SI (lines 554 - 567) discussing the method in the context of existing mother machine analysis software.

      The authors provide a figure for a '3D printed cell loader', but as far as I can see they don't give instructions including a CAD file and the model of the fan used for spinning. The same holds for the stage insert (which, as far as I see, is not referred to in the manuscript text nor described in a figure caption).

      Thank you for pointing out this omission. The centrifuge is referenced in Box 1. We have updated the manuscript with a link to a GitHub repository containing CAD files & details of the centrifuge construction. We decided to remove the stage insert from the figure.

      Figure S3: Is the asymmetry in growth rate due to the expression of a fluorescent protein, due to strain differences, or due to imaging artifacts? Maybe this is impossible to tell based on the available datasets, but this could be discussed.

      Based on previous work (DOI 10.1099/mic.0.057240-0) it is likely due to the expression of the fluorescent protein and fluorescence imaging. We have added a brief discussion in the Figure S3 caption.

    1. Author Response

      The authors appreciate the reviewers' thoughtful and constructive feedback. We are pleased to have the opportunity to address their comments through a revised version to strengthen our work. In particular:

      (1) As suggested, we will add references/details in Methods to further help readers to establish the cohort as population-derived and clarify details about the analysis and specificity of results.

      (2) We agree that reserve, inefficiency, and compensation are complex issues needing more discussion. We will add definitions and discussion to clarify our approaches, including multivariate/univariate analyses and addressing the specificity of results. We also appreciate the suggestions for future research directions.

      A revised version addressing these valuable recommendations will improve our study's contribution towards quantitative methods for understanding reserve and compensation in healthy cognitive ageing.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, the authors have explored how treating C. albicans fungal cells with EDTA affects their growth and virulence potential. They then explore the use of EDTA-treated yeast as a whole-cell vaccine in a mouse model of systemic infection. In general, the results of the paper are unsurprising. Treating yeast cells with EDTA affects their growth and the addition of metals rescues the phenotype. Because of the significant growth defects of the cells, they don't infect mice and you see reduced virulence. Injection with these cells effectively immunises the mice, in the same way that heat-killed yeast cells would. The data is fairly sound and mostly well-presented, and the paper is easy to follow. However, I feel the data is an incremental advance at best, and the immune analysis in the paper is very basic and descriptive.

      Strengths:

      Detailed analysis of EDTA-treated yeast cells

      Weaknesses:

      • Basic immune data with little advance in knowledge.

      • No comparison between their whole-cell vaccine and others tried in the field.

      • The data is largely unsurprising and not novel.

      Thank you so much for appreciating our effort to generate a live whole-cell vaccine by treating with EDTA. Also, we appreciate your comment that the manuscript is sound and well-presented. However, we are afraid that the respected reviewer assumed the CAET cells to be dead cells. CAET cells are live; they simply replicate more slowly than the wild type. Since the respected reviewer presumed CAET to be a dead strain similar to heat-killed cells, most of his/her comments were partly negative.

      Reviewer #2 (Public Review):

      Summary:

      Invasive fungal infections are very difficult to treat with limited drug options. With the increasing concern of drug resistance, developing an antifungal vaccine is a high priority. In this study, the authors studied the metal metabolism in Candida albicans by testing some chelators, including EDTA, to block the metal acquisition and metabolism by the fungus. Interestingly, they found EDTA-treated yeast cells grew poorly in vitro and were non-pathogenic in vivo in a murine model. Mice immunized by EDTA-treated Candida (CAET) were protected against challenge with wild-type Candida cells. RNA-Seq analysis to survey the gene expression profile in response to EDTA treatment in vitro revealed upregulation of genes in metal homeostasis and downregulation of ribosome biogenesis. They also revealed an induction of both pro- and anti-inflammatory cytokines involved in Th1, Th2 and Th17 host immune response in response to CAET immunization. Overall, this is an interesting study with translational potential.

      Strengths:

      The main strength of the report is that the authors identified a potential whole-cell live vaccine strain that can provide full protection against candidiasis. Abundant data both on in vitro phenotype, gene expression profile, and host immune response have been presented.

      Weaknesses:

      A weakness is that the immune mechanism of CAET-mediated host protection remains unclear. The immune data is somewhat confusing. The authors only checked cytokines and chemokines in blood. The immune response in infected tissues and antibody response may be investigated.

      Thank you very much for appreciating our work and finding our strain to be a live whole-cell vaccine strain with translational potential. Since the current study focused on the identification and detailed characterization of a non-genetically modified live attenuated strain and its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. We are in the process of developing another manuscript where we describe both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find a vaccine solution for invasive candidiasis.

      Strengths:

      The testing of the antifungal activity of EDTA on Candida is not new as many other papers have examined this effect. The novelty here is the use of this EDTA-treated strain as a vaccine to protect against a secondary challenge with wild-type Candida.

      Weaknesses:

      However, data presented in Figure 5 and Figure 6 are not convincing and need further experimental controls and analysis as the authors do not show a time-dependent effect on the CFU of their vaccine formulation. The methodology used is also an issue. As it stands, the impact is minor.

      Thank you so much for appreciating our efforts to develop a novel vaccine against fungal infections. Although Figs. 5 and 6 are the main strength of the paper, we are afraid that this respected reviewer found them unconvincing.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper by Perovic and colleagues describes how important blood vessels called collaterals form during development and remodel/expand upon injury to the brain. These vessels are conduits between arteries that do not have strong blood flow physiologically but upon injury can compensate for conduit loss. Published work by others is largely descriptive and does not address the cellular sources of collaterals over time. Here elegant lineage tracing is used to better understand the source of vascular endothelial cells during embryonic development, and how these lineages contribute to remodeling upon injury. The work is ambitious and important as collateral capacity can strongly influence the trajectory of outcomes with vascular blockage. The work reveals that proliferative arterial EC is the primary contributor to the collaterals developmentally, with a small contribution from capillary/venous EC, and that this shifts to almost completely arterial contribution from birth onward. There are several aspects of the work that, if addressed, would strengthen the study and better support the interesting and novel conclusions, including analysis of non-collateral lineage contributions, more careful interpretation of fixed image data, and more careful annotation of the image panels.

      We thank the reviewer for appreciating the ambition, importance and novelty of our work, and for the constructive suggestions for improvements.

      Reviewer #2 (Public Review):

      Pial collateral vessels are anastomotic connections that cross-connect distal arterioles of the middle, anterior, and posterior cerebral arteries. With respect to ischemic stroke, good pial collateral flow positively correlates with decreased infarct volume and improved recovery; accordingly, optimizing collateral flow represents an important intervention for limiting stroke damage. The goal of this study was to determine the endothelial cell (EC) subtype(s) that contribute to the embryonic and neonatal development of pial collaterals and their expansion in response to stroke. To this end, the authors used lineage tracing methods in the mouse, labeling arterial endothelial cells (using Bmx-CreERT on switch line, R26mTmG) or venous and microvascular endothelial cells (using Vegfr3-CreERT on R26mTmG) and assessing pial collaterals via confocal microscopy. The authors convincingly demonstrate that arterial-lineage ECs comprise the majority of pial collateral ECs during development and in adulthood, with a minor contribution from pial plexus-derived microvascular ECs that decline over time. They also convincingly demonstrate that pial collateral outward remodeling after experimentally-induced stroke (distal middle cerebral artery occlusion, or dMCAO) involves, at least in part, local proliferation of arterial-lineage ECs. The latter is intriguing given that arterial ECs generally leave the cell cycle. While these conclusions are quite solid, some key details are missing that could improve analysis, and some important caveats are not addressed. Moreover, less convincing are mechanistic claims that pial collaterals form via a migratory process of "mosaic colonization" of a preexisting vessel.

      We thank the reviewer for the careful assessment and suggestions for improvements. Claiming migratory behaviour from static images is indeed always tricky and comes with caveats. Our conclusions, however, are based on the appearance of cells in locations where they are not found at earlier stages. Given that we could exclude persistent recombination, a sound conclusion must be that cells appear in the new location through some means of translocation. Given our experience with the morphology of migrating cells in vivo, the appearance of polarized filopodial structures coinciding with the direction of observed appearance of cells at progressively later stages strongly suggests active migration. Moreover, these highly migrating cells also exhibit ICAM2 positivity, suggesting that they are directly lining the pre-collateral lumen. In our explanation of how the immigration might occur, we would need to consider solitary cell migration through interstitial space, or rather intercalation movement. The active participation of migrating cells in lumen formation of the nascent pre-collateral suggests intercalation, but further analysis needs to be performed (such as a detailed analysis of cell-cell junctions or sustained apico-basal polarity). The conclusion that such a process highlights mosaic colonization of preexisting vessels is tightly linked to the demonstration of continuous lumen, whilst being found in a vessel without lineage marker, but beginning expression of arterial markers such as Cx40.

      1) It is difficult to understand whether individual collaterals are truly mosaic vessels, or whether arterial or venous/microvascular lineage ECs predominate in any particular region of the pial collateral vasculature. This is due to a number of methodological reasons: arterial and venous/microvascular contributions to pial collaterals were assessed independently, only a few (and in some cases, just one) collaterals were analyzed in each mouse, and regionality/location of collaterals was not addressed. Additionally, the inefficiency and variability of EC labeling, especially with the Vegfr3-CreERT line (Fig. S1, ~6-30%), compounds this problem.

      Factual error: 6 - 22% (not 30)

      The reviewer is correct in their statement that the independent assessment of contribution makes it difficult to locally demonstrate mosaicism. However, we are not aware of a method that could trace two different populations from different sources using recombination genetics simultaneously. Mosaicism however can be concluded from two observations independently. One, we find contribution from an alternative source that at the time point of labelling does not colocalize with arterial BMX lineage cells. Second, the BMX-lineage labelling is never complete in the collaterals, at least at developmental stages. Future work using scRNA seq may shed more light onto the degree of mosaicism. However at this point, the data strongly suggest mosaicism, even if the majority of the cells are of the BMX-lineage. The comment on inefficiency or variability of labelling in particular with the Vegfr3-CreERT line is interesting. At this point, we cannot rule out that the observed variability is due to intrinsic variability in expression, rather than inefficient recombination, or variability thereof. With our current tools we cannot easily distinguish between the two. Again, we hope that future studies with scRNA seq will be able to shed more light onto this interesting biology. Finally, we have not carefully assessed regionality, but have not seen obvious correlations with the degree of mosaicism. It is however important to note that in no case did we just examine one collateral per hemisphere. Each data point is an average of all collaterals from a part of a given collateral zone (imaging region). Usually, it is possible to image 2-4 collateral regions in each embryo. We always imaged multiple collaterals per animal, but sometimes only one region was imaged (due to technical issues).

      2) The identification of "pre-collateral" vessels requires further support. The authors define these vessels by their connection to the feeding artery, their (often) larger diameter, and their more pronounced ICAM2 expression. While most of these criteria are demonstrated in Figure S3, it is not apparent how these vessels were defined in Figure 4, which lacks specific annotation of each of these identifying criteria. As the identification of these novel vessels is one of the key findings of this paper, a more robust method of unambiguously defining them is warranted.

      We agree that it would be fabulous to have a unique marker at hand that identifies pre-collaterals. Our careful analysis of the distribution of the markers we tested, firmly established that the levels of ICAM2 expression nicely highlight structures that become colonized by these BMX lineage cells. Cx40 staining also confirmed this impression. We will attempt better annotation based on these markers to help the reader appreciate these findings. The combination of anatomical location and connection pattern with the stronger ICAM2 staining in our hands is a highly reliable and unambiguous identifier of what we called “pre-collaterals”.

      3) The conclusion that collateral-forming ECs migrate in the direction of flow into preexisting vessels is not well supported. The authors state that the presence of filopodial projections (Figure 4) supports this conclusion. However, filopodia number and directional polarization/orientation were not quantified, and "intercalation movements"/migration, per se, cannot be inferred from these static images.

      The reviewer is correct that claiming migration from static images is always difficult. As stated above, we base our conclusions on the progressive appearance of cells exhibiting migratory behavior, as well as the morphology including filopodia. Although we indeed didn’t quantify filopodia, these structures are in our experience not found on endothelial cells that do not engage in migration. Their consistent presence and directionality are strongly suggestive of movement. We will attempt to clarify this better in the text and the figures.

      4) In Figure 5, the simplest explanation for relative Cx40 expression in different vessels is the absence (low expression) or presence (high expression) of flow. This figure provides little mechanistic insight beyond this already-known relationship, and it is unclear how many times this experiment was performed (there is no N, no quantification or correlation).

      Flow is indeed one component of what regulates Cx40. However, a key point of this figure is to show that Cx40 expression can precede the recruitment of BMX lineage cells. This is important to distinguish whether arterial identity is only achieved by recruitment of BMX lineage cells, or exists in certain vessels (for example because they may have more flow) already before this colonization event. It suggests that the BMX population may rather serve to consolidate arterial state, as other structures that may have been Cx40-positive before, but do not become colonized, lose arterial identity. We disagree that this finding does not contribute important information. If only BMX-lineage cells expressed Cx40, the conclusion would be very different. This is not a question of how much, but of whether arterialization requires the recruitment of particular cells, or is induced in vessels that adopt arterial identity. This is not a singular observation and we will add the N number to the figure legend.

      5) There is no statistical analysis in this work. This is justified by the authors by their admission that the study is of a "descriptive nature and...exploratory design."

      This is correct.

      Reviewer #3 (Public Review):

      Summary:

      These studies focus on a very interesting, understudied phenomenon in vascular development - the formation of pial collaterals between cerebral arteries. Understanding the mechanism(s) that regulates this process during normal development could provide important insights for the treatment of adult stroke patients, for which repair is highly dependent on collateral formation. Insights may also be relevant to other collateral-dependent diseases, such as heart disease and chronic peripheral ischemia.

      Strengths:

      The investigators use lineage tracing and 3D imaging to show that, in mouse embryos, endothelial cells (ECs) predominantly from Bmx+ arteries and some from the Vegfr3+ microvasculature, invade pre-existing pre-collateral vascular structures in a process they termed "mosaic colonization", and arterialization of the vessel segments is said to occur concurrently with colonization, although details about EC phenotypes are lacking. Growth of the collaterals in response to ischemic injury relies on local replication of the ECs within the collaterals and not further recruitment from veins and the microvasculature. Although detailed molecular mechanisms are not provided, demonstration of the "cellular mechanism" of pial collateral vascularization is novel.

      Weaknesses:

      Nonetheless, there are some issues that should be addressed, particularly to clarify the phenotype of the ECs forming the collaterals and expanding in response to injury; only their "origin" was traced and not their identity/growth after labeling in Bmx+ vessels.

      We thank the reviewer for pointing out the importance and novelty of our findings, and for the constructive suggestions for improvements. We indeed focussed here on origin and an attempt to distinguish how the cells arrive in their location rather than on their phenotype. We have performed detailed phenotypic analysis including EM analysis of collaterals but without the ability to connect these to the traced lineages. We therefore chose to leave these data for a separate manuscript. Future work will attempt to fully characterize these populations including their transcriptome using scRNA seq. However, isolating collateral ECs to faithfully characterize them is very challenging, and will not be a part of this manuscript. We have performed stainings for various arterial markers, with variable success. Nevertheless, a full functional study will be part of future work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: The authors study the appearance of oscillations in motifs of linear threshold systems, coupled in specific topologies. They derive analytical conditions for the appearance of oscillations, in the context of excitatory and inhibitory links. They also emphasize the higher importance of the topology, compared to the strength of the links. Finally, the results are confirmed with WC oscillators, which are also linear. The findings are to some extent confirmed with spiking neurons, though here results are less clear, and they are not even mentioned in the Discussion.

      Overall, the results are sound from a theoretical perspective, but I still find it hard to believe that they are of significant relevance for biological networks, or in particular for the oscillations of BG-thalamus-cortex loop in PD. I find motifs in general to be too simplistic for multiscale and generally large networks as is the case in the brain. Moreover, the division of regions is more or less arbitrary by definition, and having such a strong dependence on an odd/even number of inhibitory links is far from reality. Another limitation is the fact that the cortex is considered a single node. Similarly, decomposing even such a coarse network in all possible (238 in this case) motifs doesn't seem of much relevance, when I assume that the emergence of pathological rhythms is more of an emergent phenomenon.

      Strengths:

      From the point of view of nonlinear dynamics, the results are solid, and the intuition behind the proofs of the theorems is well explained.

      Weaknesses:

      As stated in the summary, I find the work to be too theoretical without a real application in biological systems or the brain, where the networks are generally very large.

      We respectfully disagree with the reviewer here. The second half of the paper is all about explaining a biological problem. We have shown the validity of our theoretical results (which indeed were obtained in idealized settings) to explain emergence of oscillations in the basal ganglia. We clearly show that our theoretical results hold both in a rate-based model and in a network model with spiking neurons. The model with spiking neurons is one of the most complete network models of the basal ganglia available in the literature. So we emphasize that we have provided a clear application of our results for the brain networks.

      It is not the problem in the simplicity of the model or of the topology, it is often the case that the phenomena are explained by very reduced systems, but the problem is that the applicability of the finding cannot be extended. E.g. the Kuramoto model uses all-to-all coupling, or similar with QIF neurons which also need to follow a Lorentzian distribution in order to derive a mean field.

      We do not understand this comment. There is no need to extend these results to a network of Kuramoto models because in that setting we already assume that individual nodes/populations are oscillating – there is no problem of emergence of oscillations. Here, we are specifically considering a setting in which nodes themselves are not oscillators. We agree that we, at this point, have no insight into how to extend our analytical proof to a situation where individual nodes are spiking.

      But in those cases, relaxing the strict conditions that were necessary for the derivations, still conserves the main findings of the analysis, which I don't see being the case here. The odd/even number rule is too strict, and talking about a fixed and definite number of cycles in the actual brain seems too simplistic.

      We have clearly relaxed most of our assumptions when we considered a network model of basal ganglia in which each subpopulation is a collection of spiking neurons. And as we have shown our results still hold (see Figure 5). Again our model is about oscillations in a network of networks i.e. network of brain regions.

      At the meso-scale it is not unreasonable to find such cycles and even-odd number rules. We have shown this for the case of a cortico-basal ganglia model. We can also extend this to cortico-thalamic networks and so on. We have already emphasized this point in the introduction to avoid any confusion: see lines 62-66 – “We prove this conjecture for the threshold-linear network (TLN) model without delays which can closely capture the dynamics of neural populations. Therefore, it is implicit that our results do not hold at the neuronal level but rather at the level of neuron populations/brain regions e.g. the basal ganglia (BG) network which can be described a network of different nuclei.” and lines 69-70 – “Within the framework of the odd-cycle theory, distinct nuclei are associated with either excitatory or inhibitory nodes.”

      Being linear is another strong assumption, and it is not clear how much of the results are preserved for spiking neurons, even though there is such an analysis, or maybe for other nonlinear types of neuronal masses.

      Clearly our results hold in a network of spiking neurons (see Figure 5). It is of course interesting to ask whether our results hold in a network where individual spiking neurons have more complex spiking behavior like AdEx or Quadratic IF. But that kind of analysis deserves a full manuscript on its own.

      Delays are also mentioned, and their impact on the oscillatory networks is as expected: it reduces the amplitude, but there is no link to the literature, where this is an established phenomenon during synchronization. Finally, the authors should also discuss the time-delays as a known phenomenon to cause or amplify oscillations at different frequencies in a network of coupled oscillators, e.g Petkoski & Jirsa Network Neuroscience 2022, Tewarie et al. NeuroImage 2019, Davis et al. Nat Commun 2021.

      This is indeed a weakness of our model. But as the reviewer already knows, dynamical systems with delays are very difficult to analyze analytically. We have mentioned this in the limitations of the model and the analysis. In our simulations we have considered delays and when the delays are within reasonable limits our results hold.

      Reviewer #2 (Public Review):

      Summary:

      The authors present here a mathematical and computational study of the topological/graph theory requirements to obtain sustained oscillations in neural network models. A first approach mathematically demonstrates that a given network of interconnected neural populations (understood in the sense of dynamical systems) requires an odd number of inhibitory populations to sustain oscillations. The authors extend this result via numerical simulations of (i) a simplified set of Wilson-Cowan networks, (ii) a simplified circuit of the cortico-basal ganglia network, and (iii) a more complex, spike-based neural network of basal ganglia network, which provides insight on experimental findings regarding abnormal synchrony levels in Parkinson's Disease (PD).
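      The odd-inhibition claim summarized above can be illustrated with a minimal sketch. It is not the authors' model: it assumes a three-node threshold-linear ring, dx_i/dt = -x_i + max(0, drive + sign_i * weight * x_(i-1)), with hypothetical parameter values (weight = 3, drive = 1), Euler integration, and an asymmetric initial state. For this particular toy ring with three inhibitory links, linearizing around the symmetric fixed point gives instability for weight > 2, so sustained oscillations are expected; the ring with two inhibitory links (EII) is expected to settle to a fixed point.

```python
def simulate_ring(signs, weight=3.0, drive=1.0, dt=0.01, steps=20000):
    """Euler-integrate dx_i/dt = -x_i + max(0, drive + signs[i] * weight * x_{i-1}).

    signs[i] is +1 (excitatory) or -1 (inhibitory) for the connection into
    node i from its predecessor on the ring. All values are illustrative
    assumptions, not parameters taken from the manuscript.
    """
    n = len(signs)
    # Asymmetric start: a perfectly symmetric state would stay symmetric.
    x = [0.1 * (i + 1) for i in range(n)]
    history = []
    for _ in range(steps):
        # x[i - 1] wraps around for i = 0, which closes the ring.
        inputs = [drive + signs[i] * weight * x[i - 1] for i in range(n)]
        x = [xi + dt * (-xi + max(0.0, u)) for xi, u in zip(x, inputs)]
        history.append(x[0])
    return history


def late_amplitude(history, fraction=0.25):
    """Peak-to-peak amplitude of node 0 over the last part of the run."""
    tail = history[-int(len(history) * fraction):]
    return max(tail) - min(tail)


iii = late_amplitude(simulate_ring([-1, -1, -1]))  # three inhibitory links
eii = late_amplitude(simulate_ring([-1, +1, -1]))  # two inhibitory links
print("III late amplitude:", iii)
print("EII late amplitude:", eii)
```

      A late amplitude well above zero for III and essentially zero for EII is what the odd-cycle rule would predict in this delay-free setting; the sketch says nothing about delays or spiking networks.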

      Strengths:

      The work elegantly and effectively combines solid mathematical proof with careful numerical simulations at different levels of description, which is uncommon and provides additional layers of confidence to the study. Furthermore, the authors included detailed sections to provide intuition about the mathematical proof, which will be helpful for readers less inclined to the perusal of mathematical derivations. Its insightful and well-informed connection with a practical neuroscience problem, the presence of strong beta rhythms in PD, elevates the potential influence of the study and provides testable predictions.

      Weaknesses:

      In its current form, the study lacks a more careful consideration of the role of delays in the emergence of oscillations. Although they are addressed at certain points during the second part of the study, there are sections in which this could have been done more carefully, perhaps with additional simulations to solidify the authors' claims. Furthermore, there are several results reported in the main figures which are not explained in the main text. From what I can infer, these are interesting and relevant results and should be covered. Finally, the text would significantly benefit from a revision of the grammar, to improve the general readability at certain sections. I consider that all these issues are solvable and this would make the study more complete.

      This point has been made by the first reviewer as well. So we repeat our answer:

      This is indeed a weakness of our model. But as the reviewer already knows, dynamical systems with delays are very difficult to analyze analytically. We have mentioned this in the limitations of the model and the analysis. In our simulations we have considered delays and when the delays are within reasonable limits our results hold.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in my comments above, I think that the work is already quite solid and relevant but would significantly improve if some issues were addressed:

      We would like to thank the reviewer for valuable comments and constructive feedback which has helped us greatly improve the manuscript.

      1) While the authors acknowledge early on the limitations of this study in terms of not considering plasticity or neuron biophysics (line 72), I think that the absence of propagation delays should be explicitly included here. This absence leads to inaccuracies --for example, the sentence "Consider a small network of two nodes. If we connect them mutually with excitatory synapses, intuitively we can say that the two-population network will not oscillate" (line 74) is only correct if the delays (or signal latencies) are zero. With a proper delay, two excitatory neurons can engage in oscillations with a period given by two times the value of the delay.

      A similar situation happens for inhibitory neurons, where the winner-take-all dynamics described in line 77 is only valid for zero delay. It is known that a homogeneous population of inhibitory spiking neurons with delayed synapses can lead to fast oscillations (Brunel and Hakim 1999), something which is also valid for the equivalent inhibitory single node with delayed self-inhibition. Indeed, a circuit of two inhibitory populations with delayed self- and cross-inhibition can generate oscillations, contradicting the main conclusion of the odd number of inhibitory nodes needed for oscillations.

      Because of these considerations, I think the authors should be more careful when explaining the effects of delays, and state that their main results on the link between oscillations and having an odd number of inhibitory nodes are not valid when delays are considered. They could modify the sentences in lines 72-77 above and include a supplementary figure right after their simulation study for the Wilson-Cowan (to explain the examples above, and also the one in the next point).

      The reviewer has brought up a critical point regarding the impact of propagation delays, and we completely concur with your assessment. In our study, we indeed did not comprehensively consider the effects of propagation delays in cycles with even inhibition, which may introduce inaccuracies in our conclusions.

      We note that in the Wilson-Cowan model with delays, certain cycles with an even number of inhibitory links can also generate oscillations with a period equal to twice the delay value. However, in our hands such oscillations were transient and dissipated quickly.
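      The period of twice the delay can be seen in a deliberately simplified toy, sketched here under stated assumptions: two mutually excitatory threshold-linear nodes updated as a pure discrete delay map (no leak term, no Wilson-Cowan nonlinearity), with a single unit pulse given to node 1 at t = 0. The function names and parameter values are hypothetical and chosen only for illustration.

```python
def simulate_delayed_pair(weight, delay, steps):
    """Toy delay map for two mutually excitatory threshold-linear nodes.

    x1[t] = max(0, weight * x2[t - delay])
    x2[t] = max(0, weight * x1[t - delay])
    Node 1 receives a unit pulse at t = 0; everything else starts at 0.
    """
    x1 = [0.0] * steps
    x2 = [0.0] * steps
    x1[0] = 1.0  # initial kick (assumed, for illustration only)
    for t in range(delay, steps):
        x1[t] = max(0.0, weight * x2[t - delay])
        x2[t] = max(0.0, weight * x1[t - delay])
    return x1, x2


def active_times(trace, eps=1e-12):
    """Time steps at which a node is active."""
    return [t for t, v in enumerate(trace) if v > eps]


x1, x2 = simulate_delayed_pair(weight=1.0, delay=5, steps=25)
print(active_times(x1))  # node 1 fires every 2 * delay steps
print(active_times(x2))  # node 2 fires in between

damped, _ = simulate_delayed_pair(weight=0.5, delay=5, steps=25)
print([damped[t] for t in (0, 10, 20)])  # amplitude shrinks each round trip
```

      In this toy the pulse bounces between the nodes, so each node is active once every 2 × delay steps. The activity persists only when the weight is exactly 1 and shrinks by a factor of weight² per round trip otherwise, which is in the spirit of the transient, tuning-dependent oscillations described above, although it is no substitute for the delayed Wilson-Cowan simulations.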

      To better reflect the limitations of our research, we have made significant modifications to the relevant sections in our manuscript.

      In line 100, we've added text to explicitly state that we considered delays in our simulations and acknowledged their potential to generate oscillations ("Given the importance of delays in biological network such as BG, we will consider them in the simulations.").

      In line 102, we've clarified that our conclusions are based on a scenario without delays ("In this following, we give simple examples of the possibility of oscillation (or not) based on the connectivity characteristics of small networks without delays. Let us start with a network of two nodes.").

      Additionally, in line 230, we've included a reference to figure supplement 3-2 to highlight the outcomes in terms of oscillations ("EII network only resulted in transient oscillations (Fig. 3, figure supplement 3-1, figure supplement 3-2)").

      In lines 234-237, we've added a sentence discussing the role of synaptic delays in generating transient oscillations in cycles with an even number of inhibitory components, referring to figure supplement 3-2 ("In networks with even number of inhibitory connections (e.g. EII, EEE, II), synaptic delays are the sole mechanism for initiating oscillations, however, unless delays are precisely tuned such oscillations will remain transient (see Supplementary figure supplement 3-2)").

      Moreover, in response to the reviewer’s suggestion, we have included an additional figure supplement 3-2 to illustrate how cycles with even inhibitory components generate transient oscillations when propagation delays are taken into account. This figure provides a visual representation of the phenomenon and enhances the clarity of our findings.

      2) In Figure 3, two motifs (III and EII) are explored to demonstrate the validity of the results across different parameters. Delays don't seem to play a disruptive role in these two cases, but the results seem to be different for other motifs not considered here. Aside from the examples mentioned above, I can imagine how a motif of EEE (i.e. a circle of three excitatory Wilson-Cowan neurons) would display oscillations when delays are included, as the activation would 'circulate' along the ring. However, this EEE motif has an even number of inhibitory units (or perhaps zero is considered an exception, but if so it's not mentioned in the text).

      We thank the reviewer for this observation regarding Figure 3. Indeed, the impact of delays may differ for other motifs not considered in our study. For example, as the reviewer has correctly anticipated, a motif of EEE (a circular network of three excitatory Wilson-Cowan neurons) would exhibit oscillations when delays are included, as activation could 'circulate' along the ring.

      To address this concern, we have performed new simulations (added as a new figure supplement 3-2). As illustrated in figure supplement 3-2, oscillations may indeed arise in the EEE motif when delays are introduced. However, these oscillations will eventually dissipate – at least with our settings.

      3) Figures 1b, 1c, and 4e display interesting results, but these are absent from the main text. Please include the description of those results. Particularly the case of Figs 1b and 1c seems very relevant to understanding the main results in the context of more complex networks, in which multiple loops with odd and even numbers of inhibitory units would coexist in the network. Does the number of odd-inhibitory loops in a given network affect somehow the power or frequency of the resulting network oscillations? It would be interesting to show this.

      Indeed, we did not explain Figs 1b,c and 4e properly. Now we have revised the manuscript in the following way to incorporate these results:

      In lines 124-128, we added the following text to introduce the concept: "We can generalize these results to cycles of any size, categorizing them into two types based on the count of their inhibitory connections in one direction (referred to as the odd cycle rule, as illustrated in Fig. 1b). More complex networks can also be decomposed into cycles of size 2…N (where N is number of nodes), and predict the ability of the network to oscillate (as shown in Fig. 1c)"

      In line 298, we included the following text to highlight the relevant result: "Next, we removed the STN output (equivalent to inhibition of STN), the Proto-D2-Arky subnetwork generated oscillations for weak positive inputs to the D2-SPNs (Fig.4e, bottom)."
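      The decomposition described in the quoted text can be sketched as follows. This is only an illustration of the bookkeeping, not the authors' code: it assumes a network given as a dictionary of signed directed edges (+1 excitatory, -1 inhibitory), enumerates its directed simple cycles, and flags those with an odd number of inhibitory connections. The node labels and the example network are hypothetical, and the connection-strength conditions of the theorem are not checked.

```python
def simple_cycles(edges):
    """Enumerate directed simple cycles of a graph given as {(src, dst): sign}.

    Each cycle is reported once, starting from its smallest node label.
    """
    graph = {}
    for (src, dst) in edges:
        graph.setdefault(src, []).append(dst)
    nodes = sorted(set(graph) | {d for dsts in graph.values() for d in dsts})
    cycles = []

    def extend(path):
        start, last = path[0], path[-1]
        for nxt in graph.get(last, []):
            if nxt == start:
                cycles.append(list(path))
            elif nxt > start and nxt not in path:
                extend(path + [nxt])

    for start in nodes:
        extend([start])
    return cycles


def inhibitory_count(cycle, edges):
    """Number of inhibitory (negative) connections along one cycle."""
    pairs = zip(cycle, cycle[1:] + cycle[:1])
    return sum(1 for pair in pairs if edges[pair] < 0)


def classify_cycles(edges):
    """Return (cycle, inhibitory count, is_odd) for every simple cycle."""
    result = []
    for cycle in simple_cycles(edges):
        n_inh = inhibitory_count(cycle, edges)
        result.append((cycle, n_inh, n_inh % 2 == 1))
    return result


# Hypothetical three-node network: +1 = excitatory, -1 = inhibitory.
example = {("A", "B"): 1, ("B", "A"): -1, ("B", "C"): -1, ("C", "A"): -1}
for cycle, n_inh, is_odd in classify_cycles(example):
    print(cycle, n_inh, "odd" if is_odd else "even")
```

      In this example the two-node cycle A-B contains one inhibitory connection (odd), whereas the three-node cycle A-B-C contains two (even), so under the odd cycle rule only the former would be a candidate for sustaining oscillations in the delay-free setting.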

      How the number of odd/even loops affects the frequency is an interesting question. Intuitively there should be a relation between the two. A complete treatment of this question is beyond the scope of the manuscript, but we think that in a network with identical node properties, more odd cycles should imply higher oscillation power.

      4) The cortico-BG model is focused on how inactivating STN could suppress (or not) beta oscillations, following experimental observations. However, besides mechanisms for extinguishing oscillations, it would be interesting to see if the progressive emergence of pathological beta oscillations could be explained by the modification of some of the nodes in the model (for example, explicitly mimicking the loss of dopaminergic neurons in the substantia nigra). This could be a very interesting additional figure in the main text.

      This is an interesting suggestion. Something similar has already been done – e.g. Kumar et al. (2010) showed that a progressive increase of inhibition of GPe can lead to oscillations. Similarly, Holgado et al. (2008) showed how a progressive change in the mutual connectivity between STN and GPe can cause oscillations. More recently, Ortone et al. (PLoS Comput. Biol. 2023) and Azizpour et al. (bioRxiv 2023) have also shown the effect of progressive change in individual node properties on oscillations in basal ganglia using numerical simulations. Our work in a way provides the theoretical backing to their work. Therefore, we think it is not necessary to again show these results in our model. Instead we have cited these papers (lines 392-396).

      5) I observed some grammatical inconsistencies in the text, some of them are indicated below. I would suggest carefully going through the text to correct those issues or seeking help with editing.

      -line 32 "...which can closely capture the neural population dynamics". Which population dynamics? Do the authors refer to general neural dynamics?

      -line 33 "long term behavior" -> long-term behavior

      -line 68 "given the ionic channel composition" -> "given its ionic channel composition"

      We apologize for the grammatical inconsistencies in our manuscript. We have made the necessary corrections to improve the clarity and accuracy of our text.

      Reviewer #3 (Recommendations For The Authors):

      This manuscript is useful for analytically showing that a cyclic network of threshold-linear neural populations can only oscillate if it has an odd number of inhibitory nodes with strong enough connections. Establishing this result, which holds under rather narrow assumptions, relies on standard tools from dynamical system theory. I find the strength of support for this result to be incomplete for the reasons detailed below:

      Although the mathematical arguments used appear to be correct, the manuscript lacks in rigor and clarity. For instance, the main result presented in theorem 2 is stated in a very unclear fashion: aside from the oddity of the number of inhibitory nodes, there are two conditions to check, which determines four cases. This can be explained in a much more straightforward way without introducing four relations in equations 4-7.

      We acknowledge the reviewer’s concern regarding the presentation of the main result in Theorem 2.

      We would like to emphasize that the introduction of four relations in equations 4-7 was intended to provide a detailed and transparent exposition of the conditions for the main result. While we understand that this approach may appear less straightforward, it allows for a more comprehensive understanding of the underlying logic and the multiple factors influencing the outcomes.

      However, we are open to suggestions for more concise and clear ways to express these conditions if the reviewer has specific recommendations or if there are alternative approaches that the reviewer believes would be more effective in conveying the information.

      Moreover, equation 3 in that same theorem is clearly wrong.

      We sincerely apologize for the typographical error in equation 3 within the same theorem. We thank the reviewer for noticing this. We have revised the text to rectify this mistake. The equation has now been corrected to ensure its accuracy.

      The proof of theorem 2 relies on standard linear algebra and can be improved as well: there are typos, approximations, and missing words (see line 664). The rigor of the exposition is also unsatisfactory. For instance, the proof of Lemma 1 ends with the sentence: "Similarly as before, the convergence of the dynamics driven by the left and right terms ends the proof". I don't know what this means.

      We thank the reviewer for the comments and suggestions. We have made the necessary adjustments to enhance the rigor and clarity of our mathematical reasoning in the revised manuscript.

      In line 644, we have provided clarification for the sentence you found unclear. The revised version now offers a more precise explanation that should help in understanding the proof.

      At the same time, the intuitive arguments presented in the main text are vague at best and do not really help grasping the possible generalizability of the results. For instance, I do not understand the message of panel B in Figure 2 and there seems to be no explanation about it in the main text.

      The main purpose of Figure 2B is to offer a visual representation of the concept and to serve as an aid for readers who may prefer a graphical illustration over extensive equations. While we understand that the figure may not provide a complete explanation on its own, it is intended to complement the text and mathematical content presented in the main text. In the revised version we have added the explanation of Figure 2B.

      Aside from the analytical result, most of the paper consists in simulating networks with distinct inhibitory cyclic structure to validate the theoretical argument. I do not find this approach particularly convincing due to the qualitative nature of the numerical results presented. There is little quantitative analysis of the network structure in relation to the emergence of oscillations. It is also hard to judge whether the examples discussed are cherry picked or truly representative of a large class of dynamics.

      The reviewer has a valid concern about numerical simulations and qualitative nature of the results. We would like to provide some perspective on our approach.

      In our paper, the primary focus is on the mathematical proof, which rigorously establishes the existence of our results. However, we understand that numerical simulations are valuable for illustrating the applicability of the theoretical framework and providing insights into the practical implications.

      If we get into the quantitative description of all the results, the manuscript will become prohibitively long. We acknowledge that there is a balance to be struck between theory and numerical examples in a research paper. We believe that, in conjunction with the mathematical proof, the numerical simulations serve the purpose of illustrating the existence of our results in specific examples. While we cannot provide an exhaustive exploration of all possible network structures, we have chosen representative cases to demonstrate the applicability of our findings. Some of these are already provided in figure supplements S3-1 and S3-3. In the absence of specific suggestions from the reviewer we would like to leave it as is.
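
      For readers following this exchange, the parity condition under discussion can be illustrated with a toy simulation. The sketch below is not the model or code from the manuscript: it assumes a rate equation of the form tau * dx/dt = -x + max(0, b + s * w * x_prev) on a three-node ring, and the function names, the weight w = 3, the drive b = 1, and the Euler step are all hypothetical choices made for illustration only.

```python
def simulate_ring(signs, w=3.0, b=1.0, tau=1.0, dt=0.01, steps=20000):
    """Euler-integrate a ring of threshold-linear units (illustrative model).

    signs[j] is the sign of the single outgoing connection of node j,
    which projects to node (j + 1) % n. A sign of -1 marks an inhibitory node.
    Returns the activity trace of node 0.
    """
    n = len(signs)
    x = [0.1 * (i + 1) for i in range(n)]  # fixed, asymmetric initial state
    trace = []
    for _ in range(steps):
        new_x = []
        for i in range(n):
            src = (i - 1) % n  # each node listens to its predecessor only
            drive = b + signs[src] * w * x[src]
            new_x.append(x[i] + dt / tau * (-x[i] + max(0.0, drive)))
        x = new_x
        trace.append(x[0])
    return trace


def late_peak_to_peak(trace):
    """Peak-to-peak amplitude over the second half of the trace."""
    late = trace[len(trace) // 2:]
    return max(late) - min(late)


odd_ring = late_peak_to_peak(simulate_ring([-1, -1, -1]))   # three inhibitory nodes
even_ring = late_peak_to_peak(simulate_ring([-1, -1, +1]))  # two inhibitory nodes
print("odd number of inhibitory nodes, late amplitude:", odd_ring)
print("even number of inhibitory nodes, late amplitude:", even_ring)
```

      Under these assumed parameters, the ring with three inhibitory nodes keeps fluctuating while the ring with two settles to a fixed point. This is a single example, so it illustrates the statement rather than supporting it quantitatively.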

      Moreover, the authors apply their cycle analysis to real-world networks by considering cycles of inhibitory nodes independently, whereas the same nodes can belong to several cycles. I find it hard to believe that considering these cycles independently should be enough to make predictions about the emergence of oscillations, as these cycles must interact with one another via shared nodes. I do not understand the color coding used to mark distinct cycles in supplementary figures. There is also not enough information to understand figures in the main text. For instance, I do not understand what the grids are representing in panel B, Figure 4.

      We have clarified the color coding and added more information to help readers understand the figures. We appreciate the reviewer’s concern about our application of cycle analysis to real-world networks and the clarity of our figures. It is not a matter of belief: we have provided a mathematical proof and complemented it with illustrative examples from a real-world network, i.e., the cortico-basal ganglia network, with both rate-based and spiking neurons. Clearly our results hold.

      Regarding the color coding in the supplementary figures, we have revised the color scheme to make it more intuitive and informative, and we now explain it in the caption of Figure 4: different colors mark the potential oscillators in each motif in the BG, and each color corresponds to an oscillator from panel a. For more details, see figure supplements 4-1–4-6. The colors now represent distinct cycles more clearly, helping readers better interpret the figures.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents potentially useful findings describing how activity in the corticotropin-releasing hormone neurons in the paraventricular nucleus of the hypothalamus modulates sevoflurane anesthesia, as well as a phenomenon the authors term a "general anesthetic stress response". The technical approaches are solid and the data presented are largely clear. However, the primary conclusion, that the PVHCRH neurons are a mechanism of sevoflurane anesthesia, is inadequately supported.

      We appreciate the editors and reviewers for their thorough assessment and constructive feedback. We have provided clarifications and updated the manuscript to better interpret our results; please see below. We have revised the primary conclusion to state that PVH CRH neurons potently modulate states of anaesthesia in sevoflurane general anesthesia, being a part of the anaesthesia regulatory network of sevoflurane.

      Combined Public Review:

      This study describes a group of CRH-releasing neurons, located in the paraventricular nucleus of the hypothalamus, which, in mice, affects both the state of sevoflurane anesthesia and a grooming behavior observed after it. PVH-CRH neurons showed elevated calcium activity during the post-anesthesia period. Optogenetic activation of these PVH-CRH neurons during sevoflurane anesthesia shifts the EEG from burst-suppression to a seemingly activated state (an apparent arousal effect), although without a behavioral correlate. Chemogenetic activation of the PVH-CRH neurons delays sevoflurane-induced loss of righting reflex (another apparent arousal effect). On the other hand, chemogenetic inhibition of PVH-CRH neurons delays recovery of the righting reflex and decreases sevoflurane-induced stress (an apparent decrease in the arousal effect). The authors conclude that PVH-CRH neurons are a common substrate for sevoflurane-induced anesthesia and stress. The PVH-CRH neurons are related to behavioral stress responses, and the authors claim that these findings provide direct evidence for a relationship between sevoflurane anesthesia and sevoflurane-mediated stress that might exist even when there is no surgical trauma, such as an incision. In its current form, the article does not achieve its intended goal.

      Thank you for the detailed review. We have carefully considered your comments and have revised the manuscript to provide a clearer interpretation of our findings. Our findings indicate that PVH CRH neurons integrate the anesthetic effect and post-anesthesia stress response of sevoflurane (GA), providing new evidence for understanding the neuronal regulation of sevoflurane GA and identifying a potential brain target for further investigation into modulating the post-anesthesia stress response. However, we did not propose that there was a direct relationship between sevoflurane anesthesia and sevoflurane-mediated stress in the absence of incision. Our results mainly concluded that PVH CRH neurons integrate the anaesthetic effect and post-anaesthesia stress response of sevoflurane GA, which offers new evidence for the neuronal regulation of sevoflurane GA and provides an important but ignored potential cause of the post-anesthesia stress response.

      Strengths:

      The manuscript uses targeted manipulation of the PVH-CRH neurons, and is technically sound. Also, the number of experiments is substantial.

      Thank you.

      Weaknesses:

      The most significant weaknesses are a) the lack of consideration and measurement of GABAergic mechanisms of sevoflurane anesthesia, b) the failure to use another anesthetic as a control, c) a failure to document a compelling post-anesthesia stress response to sevoflurane in humans, d) limitations in the novelty of the findings. These weaknesses are related to the primary concerns described below:

      Concerns about the primary conclusion, that PVH-CRH neurons mediate "the anesthetic effects and post-anesthesia stress response of sevoflurane GA".

      Thanks for the advice. Our responses are below:

      1) Just because the activity of a given neural cell type or neural circuit alters an anesthetic's response, this does not mean that those neurons play a role in how the anesthetic creates its anesthetic state. For example, sevoflurane is commonly used in children. Its primary mechanism of action is through enhancement of GABA-mediated inhibition. Children with ADHD on Ritalin (a dopamine reuptake inhibitor) who take it on the day of surgery can often require increased doses of sevoflurane to achieve the appropriate anesthetic state. The mesocortical pathway through which Ritalin acts is not part of the mechanism of action of sevoflurane. Through this pathway, Ritalin is simply increasing cortical excitability making it more challenging for the inhibitory effects of sevoflurane at GABAergic synapses to be effective. Similarly, here, altering the activity of the PVHCRH neurons and seeing a change in anesthetic response to sevoflurane does not mean that these neurons play a role in the fundamental mechanism of this anesthetic's action. With the current data set, the primary conclusions should be tempered.

      Thank you for your comments. Our results show that PVH CRH neurons modulate the state of consciousness as well as the stress response in sevoflurane GA, but they are insufficient to demonstrate that these neurons play a role in the underlying mechanism of sevoflurane anesthesia. We have revised our conclusions to make them more specific. The primary conclusion now states that PVH CRH neurons potently modulate states of anaesthesia in sevoflurane GA, being a part of the anaesthesia regulatory network of sevoflurane.

      2) It is important to compare the effects of sevoflurane with at least one other inhaled ether anesthetic. Isoflurane, desflurane, and enflurane are ether anesthetics that are very similar to each other, as well as being similar to sevoflurane. It is important to distinguish whether the effects of sevoflurane pertain to other anesthetics, or, alternatively, relate to unique idiosyncratic properties of this gas that may not be a part of its anesthetic properties.

      For example, one study cited by the authors (Marana et al., 2013) concludes that there is weak evidence for differences in stress-related hormones between sevoflurane and desflurane, with lower levels of cortisol and ACTH observed during the desflurane intraoperative period. It is not clear that this difference in some stress-related hormones is modeled by post-sevoflurane excess grooming in the mice, but using desflurane as a control could help determine this.

      Thank you for your suggestions. We completely agree on the importance of determining whether the effects of sevoflurane apply to other anesthetics or arise from unique idiosyncratic attributes separate from its anesthetic properties. However, it is challenging to definitively conclude whether the effects of sevoflurane observed in our study extend to other inhaled anesthetics, even with desflurane as a control. While sevoflurane shares many common anesthetic properties with other inhalation agents, it also exhibits distinct characteristics and potential idiosyncrasies that set it apart from its counterparts. Regarding studies related to desflurane's impact on hormone levels or stress-like behaviors, one study involving 20 women scheduled for elective total abdominal hysterectomy demonstrated that there was no significant correlation between the intra-operative depth of anesthesia achieved with desflurane and the extent of the endocrine-metabolic stress response (as indicated by the concentrations of plasma cortisol, glucose, and lactate) [1]. In addition, a study conducted in mice suggested that abilities related to sensorimotor function, anxiety, and depression did not change significantly 7 days after anesthesia with 8.0% desflurane for 6 h [2]. Furthermore, a study involving 50 Caucasian women undergoing laparoscopic surgery for benign ovarian cysts demonstrated that in low-stress surgery, desflurane, when compared to sevoflurane, exhibited superior control over the intraoperative cortisol and ACTH response [3]. Based on these findings, we propose that the effect we observed in this study is likely attributed to the unique idiosyncratic properties of sevoflurane. We will conduct additional experiments to investigate this proposal with other commonly used anaesthetics in our future studies.

      Concerns about the clinical relevance of the experiments

      In anesthesiology practice, perioperative stress observed in patients is more commonly related to the trauma of the surgical intervention, with inadequate levels of antinociception or unconsciousness intraoperatively and/or poor post-operative pain control. The authors seem to be suggesting that the anesthetic itself is causing stress, but there is no evidence of this from human patients cited. We were not aware that this is a documented clinical phenomenon. It is important to know whether sevoflurane effectively produces behavioral stress in the recovery room in patients that could be related to the putative stress response (excess grooming) observed in mice. For example, in surgeries or procedures that required only a brief period of unconsciousness that could be achieved by administering sevoflurane alone (comparable to the 30 min administered to the mice), is there clinical evidence of post-operative stress?

      Thank you for your question. There is currently no direct evidence available. Studies on sevoflurane in humans primarily focus on its use during surgical interventions, making it difficult to find studies that solely administer sevoflurane, as was done in our study with mice. Generally, a short anesthesia time refers to procedures that last less than one hour, while a long anesthesia time could be considered for procedures lasting several hours or more [4]. A study published in eLife investigated the patterns of reemerging consciousness and cognitive function in 30 healthy adults who underwent GA for three hours [5]. This finding suggests that the cognitive dysfunction observed immediately and persistently after GA in healthy animals may not necessarily apply to humans, and that postoperative neurocognitive disorders could be influenced by factors other than GA, such as surgery or patient comorbidity. Therefore, further studies are needed to verify post-operative stress after short, sevoflurane-only anesthesia.

      Indeed, stress after surgeries can result from multiple factors aside from anesthesia, including pain, anxiety, and inflammation, but what we want to illustrate in this study is that anesthesia could be one of these factors, one that was ignored in previous studies. In our current study, we did not propose that there was a direct relationship between sevoflurane anesthesia and sevoflurane-mediated stress without incision. We observed stress-related behavioural changes after exposure to sevoflurane GA in a mouse model, indicating that sevoflurane-mediated stress might exist without surgical trauma. Importantly, whether anesthetic administration alone causes post-operative stress is worth studying in different species, especially humans.

      Patients who receive sevoflurane as the primary anesthetic do not wake up more stressed than if they had had one of the other GABAergic anesthetics. If there were signs of stress upon emergence (increased heart rate, blood pressure, thrashing movements) from general anesthesia, the anesthesiologist would treat this right away. The most likely cause of post-operative stress behaviors in humans is probably inadequate anti-nociception during the procedure, which translates into inadequate post-op analgesia and likely delirium. It is the case that children receiving sevoflurane do have a higher likelihood of post-operative delirium. Perhaps the authors' studies address a mechanism for delirium associated with sevoflurane, but this is not considered. Delirium seems likely to be the closest clinical phenomenon to what was studied.

      We agree with your idea. We aim to establish a connection between post-operative delirium in humans and stress-like behaviors observed in mice following sevoflurane anesthesia. Specifically, we have observed that the increased grooming behavior exhibited by mice after sevoflurane anesthesia resembles the fuzzy state of consciousness experienced during post-operative delirium [6]. In our discussion, we also emphasized the occurrence of sevoflurane-induced emergence agitation, a common phenomenon reported in clinical studies with an incidence of up to 80%. This state is characterized by hyperactivity, confusion, delirium, and emotional agitation [7,8]. Meanwhile, in our experimental tests, namely the open field test (OFT) and elevated plus maze (EPM) test, we observed that mice exposed to sevoflurane inhalation displayed reduced movement distances during both the OFT and EPM tests (Figure 7G and I). These findings suggest a decline in behavioral activity similar to what is observed in cases of delirium.

      Concerns about the novelty of the findings

      CRH is associated with arousal in numerous studies. In fact, the authors' own work, published in eLife in 2021, showed that stimulating the hypothalamic CRH cells leads to arousal and their inhibition promotes hypersomnia. In both papers, the authors use fos expression in CRH cells during a specific event to implicate the cells, then manipulate them and measure EEG responses. In the previous work, the cells were active during wakefulness; here- they were active in the awake state that follows anesthesia (Figure 1). Thus, the findings in the current work are incremental.

      Thank you for acknowledging our previous work focusing on the changes in the sleep-wake state of mice when PVH CRH neurons are manipulated. In this study, our primary objective was to identify the neuronal mechanisms mediating the anesthetic effects and post-anesthetic stress response of sevoflurane GA. While our previous study showed that activation of PVH CRH neurons leads to arousal, the current study provides evidence that PVH CRH neurons may play a role in the regulation of conscious states in GA. Our current findings uncover that PVH CRH neurons modulate the state of consciousness as well as the stress response in sevoflurane GA, and that the modulation of PVH CRH neurons bidirectionally altered the induction and recovery of sevoflurane GA. This identifies a new brain region involved in sevoflurane GA that goes beyond the arousal-related regions.

      The activation of CRH cells in PVN has already been shown to result in grooming by Jaideep Bains (cited as reference 58). Thus, the involvement of these cells in this behavior is expected. The authors perform elaborate manipulations of CRH cells and numerous analyses of grooming and related behaviors. For example, they compare grooming and paw licking after anesthesia with those after other stressors such as forced swim, spraying mice with water, physical attack, and restraint. However, the relevance of these behaviors to humans and generalization to other types of anesthetics is not clear.

      The hyperactivity of PVH CRH neurons and behavior (e.g., excessive self-grooming) in mice may partially mirror the observed agitation and underlying mechanisms during emergence from sevoflurane GA in patients. As mentioned in the Discussion section (page 16, lines 371-374), sevoflurane-induced emergence agitation represents a prevalent manifestation of the post-anesthesia stress response. It is frequently observed, with an incidence of up to 80% in clinical reports, and is characterized by hyperactivity, confusion, delirium, and emotional agitation [7,8]. Our aim in this study is to distinguish the excessive stress responses of patients to sevoflurane GA from stress triggered by other factors. Other stimuli, such as forced swimming, can be considered sources of both physical and emotional stress, which are associated with depression and anxiety in humans.

      Regarding generalization to other types of anesthetics, we propose that the stress-related behavioral effects observed in this study might also occur with certain other anesthetics. For example, one study showed that intravenous ketamine infusion (10 mg/kg, 2 hours) elevated plasma corticosterone and progesterone levels in rats and reduced locomotor activity (sedation) [9]. Intravenous anesthesia with propofol combined with sevoflurane caused greater postoperative stress than propofol alone [10]. However, desflurane, a common inhaled ether anesthetic, was associated with better control of the intraoperative cortisol and ACTH response than sevoflurane in low-stress surgeries [3]. Thus, the behaviors observed after exposure to sevoflurane GA may be related to the post-anesthesia stress response in humans, which might also occur with certain other anesthetics.

      Recommendations for the authors:

      Reviewer 1

      1) The CRH-Cre mouse line should be validated. There are several lines of these mice, and their fidelity varies.

      The CRH-Cre mouse line we used in this study is from The Jackson Laboratory (https://www.jax.org/strain/012704) with the name B6(Cg)-Crhtm1(cre)Zjh/J (Strain #: 012704). These CRH-ires-CRE knock-in mice have Cre recombinase expression directed to CRH positive neurons by the endogenous promoter/enhancer elements of the corticotropin releasing hormone locus (Crh). We have done standard PCR to validate the mouse line following genotyping protocols provided by the Jackson Laboratory. The protocol primers were: 10574 (SEQUENCE 5' → 3': CTT ACA CAT TTC GTC CTA GCC); 10575 (SEQUENCE 5' → 3': CAC GAC CAG GCT GCG GCT AAC); 10576 (SEQUENCE 5' → 3': CAA TGT ATC TTA TCA TGT CTG GAT CC). The 468-bp CRH-specific PCR product was amplified in mutant (CRH-Cre+/+) mice; in heterozygote (CRH-Cre+/-) mice, both the 468-bp and the 676-bp PCR products were detected; in wild type (WT) mice, only the 676-bp WT allele-specific PCR product was amplified. An example of PCR results is presented below. The heterozygote and mutant mice were included in our study.

      Author response image 1.
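
      The band-to-genotype rule described above can be written out as a small lookup. This is only an illustrative sketch: the function and constant names are hypothetical, and matching on exact band sizes is a simplification, since bands on a real gel are read against a ladder rather than reported as exact base-pair values.

```python
# Band sizes (bp) as stated in the response above.
CRE_BAND_BP = 468  # CRH-Cre-specific product
WT_BAND_BP = 676   # wild-type allele-specific product


def call_genotype(observed_bands_bp):
    """Map the set of observed PCR band sizes to a genotype label."""
    bands = set(observed_bands_bp)
    has_cre = CRE_BAND_BP in bands
    has_wt = WT_BAND_BP in bands
    if has_cre and has_wt:
        return "heterozygote (CRH-Cre+/-)"
    if has_cre:
        return "mutant (CRH-Cre+/+)"
    if has_wt:
        return "wild type"
    return "no call"  # neither expected band was seen


def included_in_study(genotype):
    """Heterozygote and mutant mice were included, per the response above."""
    return genotype.startswith(("heterozygote", "mutant"))


for bands in ([468], [468, 676], [676]):
    genotype = call_genotype(bands)
    print(bands, "->", genotype, "| included:", included_in_study(genotype))
```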

      2) It would be very helpful to validate the CRH antibody. Using any antiserum at 1:800 suggests that it may not be potent or highly specific.

      As requested, we used the same CRH antibody at a concentration of 1:800, following the methods described in the Method section. The results are displayed below.

      Author response image 2.

      3) In Figure 1C, the control sections are out of focus, any cells are blurry, reducing confidence in the analyses (locus ceruleus cells appear confluent in the control?)

      We apologize for the confusing figure. We have revised the control section panels of Figure 1C:

      Author response image 3.

      Reviewer 2

      1) In the Abstract, to say that "General anesthetics benefit patients undergoing surgeries without consciousness. ..." is a gross understatement of the essential role that general anesthesia plays today to make surgery not only tolerable but humane. This opening sentence should be rewritten. General anesthesia is a fundamental process required to undertake safely and humanely a high fraction of surgeries and invasive diagnostic procedures.

      As requested, we rewrote this opening sentence; please see the following:

      GA is a fundamental process required to undertake surgeries and invasive diagnostic procedures safely and humanely. However, the undesired stress response associated with GA can lead to delayed recovery and even increased morbidity in clinical settings.

      2) In the Abstract, when discussing the response of the PVN-CRH neurons to chemogenetic inhibition, say exactly what the "opposite effect" is.

      Thanks for your insights. We have rewritten our abstract as follows:

      Chemogenetic activation of these neurons delayed the induction and accelerated emergence from sevoflurane GA, whereas chemogenetic inhibition of PVH CRH neurons promoted induction and prolonged emergence from sevoflurane GA.

      3) In all spectrograms the dynamic range is compressed between 0.5 and 1. Please make use of the full range, as some details might be missed because of this compression.

      We are sorry for the incorrect unit of the spectrograms. We have provided corrected spectrograms with the full range; please see below:

      Author response image 4.

      Author response image 5.

      4) The spectrogram in Figure 2D has several frequency chirps that do not seem physiological.

      Thank you for your comments. The frequency chirps in the spectrogram during the During and Post 1 phases were caused by recording noise. To avoid confusion, we have deleted the spectrogram in Figure 2D.

      5) The 3D plots in Figures 3G and H are not helpful.

      Thanks for the comment. We'd like to keep the 3D plots as they aid visual comparison of three different features of grooming, which complements other panels in Figure 3.

      6) The spectrograms in Figures 5A and B are too small, while the spectra in Figures 5C and D are too large. Please invert this relationship, as it is interesting and important to see the details in the spectrograms. The same happens in Figure 6.

      We adjusted the layout of Figures 5 and 6 as requested; please see below:

      Author response image 6.

      Author response image 7.

      7) In Figure 6H, the authors compute the burst-suppression ratio during a period that seemingly has no bursts or suppressions (Figure 6B).

      The burst-suppression ratio was computed from data with the minimum duration of burst and suppression periods set at 0.5 s. Sorry for the confusion. We added a new supplementary figure (Figure 6-figure supplement 8) displaying a 40-second EEG with a burst suppression period to better visualize the burst suppression.

      Author response image 8.
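
      To make the role of the 0.5 s minimum duration concrete, the following is a minimal sketch of one way a burst-suppression ratio could be computed. It is not the pipeline used in the manuscript: the amplitude-threshold rule, the two-pass handling of short segments, and all names and parameters other than the 0.5 s minimum duration are assumptions made for illustration.

```python
def _runs(labels):
    """Yield (start, end, value) for each run of equal values in labels."""
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            yield start, i, labels[start]
            start = i


def burst_suppression_ratio(signal, fs, amp_threshold, min_duration=0.5):
    """Fraction of samples labelled as suppression.

    signal: EEG samples; fs: sampling rate in Hz;
    amp_threshold: hypothetical amplitude cut-off below which a sample
    counts as low-amplitude; min_duration: minimum length (s) of both
    suppression and burst segments.
    """
    if not signal:
        raise ValueError("signal must not be empty")
    min_len = int(round(min_duration * fs))
    suppressed = [abs(v) < amp_threshold for v in signal]

    # Pass 1: low-amplitude runs shorter than min_len are not suppressions.
    for start, end, value in list(_runs(suppressed)):
        if value and end - start < min_len:
            suppressed[start:end] = [False] * (end - start)

    # Pass 2: bursts shorter than min_len that sit between two
    # suppression segments are absorbed into the suppression.
    for start, end, value in list(_runs(suppressed)):
        interior = start > 0 and end < len(suppressed)
        if not value and interior and end - start < min_len:
            suppressed[start:end] = [True] * (end - start)

    return sum(suppressed) / len(suppressed)


# Synthetic 5 s trace at 10 Hz, for illustration only (not recorded data).
toy = [1.0] * 10 + [0.0] * 10 + [1.0] * 2 + [0.0] * 10 + [1.0] * 10 + [0.0] * 3 + [1.0] * 5
print(burst_suppression_ratio(toy, fs=10, amp_threshold=0.5))
```

      In the synthetic trace, the 0.3 s low-amplitude gap is too short to count as suppression, and the 0.2 s burst between two long suppressions is absorbed, so 22 of 50 samples are labelled as suppression.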

      8) The data analyses are done in terms of p-values. They should be reported as confidence intervals so that any effect the authors wish to establish is measured along with its uncertainty.

      Thank you for your valuable suggestions regarding our manuscript. We appreciate your thoughtful consideration of our work. We understand your concern, but we would like to provide some justification for our choice of reporting p-values and explain why we believe they are appropriate for our study. First, the use of p-values for hypothesis testing and significance assessment is a common practice in our field. Many previous studies in our area of research also report results in terms of p-values. For example, Xu et al. [11] suggested in 2020 that sevoflurane inhibits MPB neurons through postsynaptic GABAA-Rs and background potassium channels, and Ao et al. [12] demonstrated that activation of the TH:LC-PVT projections helps facilitate the transition from isoflurane anesthesia to an arousal state; both studies reported their analyses in terms of p-values. By adhering to this convention, we ensure that our findings are consistent with the existing body of literature. This makes it easier for readers to compare and integrate our results with previous work. Second, while confidence intervals can provide a measure of effect size and uncertainty, p-values offer a concise way to communicate statistical significance. They help readers quickly assess whether an effect is statistically significant or not, which is often the primary concern when interpreting research findings. We hope that by providing these reasons for our choice of reporting p-values, we can address your concern while maintaining the integrity and consistency of our study. If you believe there are specific instances where reporting confidence intervals would be more informative, please feel free to highlight those, and we will consider your suggestion on a case-by-case basis.
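
      For reference, the kind of interval the reviewer asks for can be sketched with a percentile bootstrap for a difference in group means. This is not an analysis from the manuscript: the function name, the number of resamples, and the two small groups of values are hypothetical and are used only to show the calculation.

```python
import random
from statistics import mean


def bootstrap_mean_diff_ci(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b).

    Returns (point_estimate, lower, upper). Each group is resampled
    with replacement independently; seed makes the result reproducible.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resampled_a = [rng.choice(group_a) for _ in group_a]
        resampled_b = [rng.choice(group_b) for _ in group_b]
        diffs.append(mean(resampled_a) - mean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return mean(group_a) - mean(group_b), lower, upper


# Hypothetical values for illustration only (not data from the study).
treated = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9]
control = [9.8, 10.2, 10.5, 9.9, 10.1, 10.4]
estimate, lower, upper = bootstrap_mean_diff_ci(treated, control)
print(f"difference = {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

      An interval reported this way conveys the size of an effect together with its uncertainty, and it can be presented alongside a p-value rather than in place of it.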

      References

      1. Baldini, G., Bagry, H. & Carli, F. Depth of anesthesia with desflurane does not influence the endocrine-metabolic response to pelvic surgery. Acta Anaesthesiol Scand 52, 99-105, doi:10.1111/j.1399-6576.2007.01470.x (2008).
      2. Niikura, R. et al. Exploratory analyses of postanesthetic effects of desflurane using behavioral test battery of mice. Behav Pharmacol 31, 597-609, doi:10.1097/fbp.0000000000000567 (2020).
      3. Marana, E. et al. Desflurane versus sevoflurane: a comparison on stress response. Minerva Anestesiol 79, 7-14 (2013).
      4. Vutskits, L. & Xie, Z. Lasting impact of general anaesthesia on the brain: mechanisms and relevance. Nat Rev Neurosci 17, 705-717, doi:10.1038/nrn.2016.128 (2016).
      5. Mashour, G. A. et al. Recovery of consciousness and cognition after general anesthesia in humans. Elife 10, doi:10.7554/eLife.59525 (2021).
      6. Mattison, M. L. P. Delirium. Ann Intern Med 173, Itc49-itc64, doi:10.7326/aitc202010060 (2020).
      7. Dahmani, S. et al. Pharmacological prevention of sevoflurane- and desflurane-related emergence agitation in children: a meta-analysis of published studies. Br J Anaesth 104, 216-223, doi:10.1093/bja/aep376 (2010).
      8. Lim, B. G. et al. Comparison of the incidence of emergence agitation and emergence times between desflurane and sevoflurane anesthesia in children: A systematic review and meta-analysis. Medicine (Baltimore) 95, e4927, doi:10.1097/MD.0000000000004927 (2016).
      9. Radford, K. D. et al. Association between intravenous ketamine-induced stress hormone levels and long-term fear memory renewal in Sprague-Dawley rats. Behav Brain Res 378, 112259, doi:10.1016/j.bbr.2019.112259 (2020).
      10. Yang, L., Chen, Z. & Xiang, D. Effects of intravenous anesthesia with sevoflurane combined with propofol on intraoperative hemodynamics, postoperative stress disorder and cognitive function in elderly patients undergoing laparoscopic surgery. Pak J Med Sci 38, 1938-1944, doi:10.12669/pjms.38.7.5763 (2022).
      11. Xu, W. et al. Sevoflurane depresses neurons in the medial parabrachial nucleus by potentiating postsynaptic GABA(A) receptors and background potassium channels. Neuropharmacology 181, 108249, doi:10.1016/j.neuropharm.2020.108249 (2020).
      12. Ao, Y. et al. Locus Coeruleus to Paraventricular Thalamus Projections Facilitate Emergence From Isoflurane Anesthesia in Mice. Front Pharmacol 12, 643172, doi:10.3389/fphar.2021.643172 (2021).
    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The work is a useful contribution towards understanding the role of archaeal and plant D-aminoacyl-tRNA deacylase 2 (DTD2) in deacylation and detoxification of D-Tyr-tRNATyr modified by various aldehydes produced as metabolic byproducts in plants. It integrates convincing results from both in vitro and in vivo experiments to address the long-standing puzzle of why plants outperform bacteria in handling reactive aldehydes and suggests a new strategy for stress-tolerant crops. The impact of the paper is limited by the fact that only one modified D-aminoacyl tRNA was examined, in lack of evidence that plant eEF1A mimics EF-Tu in protecting L-aminoacyl tRNAs from modification, and in failure to measure accumulation of toxic D-aminoacyl tRNAs or impairment of translation in plant cells lacking DTD2.

      We have now addressed all the drawbacks as follows:

      ‘only one modified D-aminoacyl tRNA was examined’

      We wish to clarify that only D-Leu (Yeast), D-Asp (Bacteria, Yeast), D-Tyr (Bacteria, Cyanobacteria, Yeast) and D-Trp (Bacteria) show toxicity in vivo in the absence of a known DTD (Soutourina J. et al., JBC, 2000; Soutourina O. et al., JBC, 2004; Wydau S. et al., JBC, 2009). D-Tyr-tRNATyr is used as a model substrate to test DTD activity in the field because of the conserved toxicity of D-Tyr in various organisms. DTD2 has been shown to recycle D-Asp-tRNAAsp and D-Tyr-tRNATyr with the same efficiency both in vitro and in vivo (Wydau S. et al., NAR, 2007), and it also recycles acetaldehyde-modified D-Phe-tRNAPhe and D-Tyr-tRNATyr in vitro, as shown in our earlier work (Mazeed M. et al., Science Advances, 2021). We have earlier shown that DTD1, another conserved chiral proofreader across bacteria and eukaryotes, acts via a side chain independent mechanism (Ahmad S. et al., eLife, 2013). To check the biochemical activity of DTD2 on D-Trp-tRNATrp, we have now done D-Trp, D-Tyr and D-Asp toxicity rescue experiments by expressing the archaeal DTD2 in dtd null E. coli cells. We found that DTD2 rescued D-Trp toxicity as efficiently as it rescued D-Tyr and D-Asp toxicity (Figure: 1). Given that DTD2 acts on side chains of differing chemistry and size, it can be proposed with reasonable confidence that DTD2 also operates in a side chain independent manner.

      Author response image 1.

      DTD2 recycles multiple D-aa-tRNAs with different side chain chemistry and size. Growth of wildtype (WT), dtd null strain (∆dtd), and Pyrococcus horikoshii DTD2 (PhoDTD2) complemented ∆dtd strains of E. coli K12 cells with 500 µM IPTG along with A) no D-amino acids, B) 2.5 mM D-tyrosine, C) 30 mM D-aspartate and D) 5 mM D-tryptophan.

      ‘lack of evidence that plant eEF1A mimics EF-Tu in protecting L-aminoacyl tRNAs from modification’

      To understand the role of plant eEF1A in protecting L-aa-tRNAs from aldehyde modification, we have done a thorough sequence and structural analysis. We analysed the aa-tRNA bound elongation factor structure from bacteria (PDB id: 1TTT) and found that the side chain of the amino acid in the amino acid binding site of EF-Tu projects outward (Figure: 2A; 3A). In addition, the amino group of the amino acid is tightly selected by the main chain atoms of the elongation factor, leaving no space for aldehydes to enter and modify the L-aa-tRNAs and Gly-tRNAs (Figure: 2B; 3B). Modelling of a D-amino acid (D-phenylalanine and the smallest chiral amino acid, D-alanine) in the same site shows serious clashes with main chain atoms of EF-Tu, indicating D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2C-E). Next, we superimposed the tRNA bound mammalian eEF-1A cryoEM structure (PDB id: 5LZS) on the bacterial structure to understand the structural differences in terms of tRNA binding and found that the two elongation factors bind tRNA in a similar way (Figure: 3C-D). Modelling of D-alanine in the amino acid binding site of eEF-1A shows serious clashes with main chain atoms, indicating a general theme of D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2F; 3E). Structure-based sequence alignment of elongation factors from bacteria, archaea and eukaryotes (both plants and mammals) shows strict conservation of the amino acid binding site (Figure: 2G). This suggests that eEF-1A will mimic EF-Tu in protecting L-aa-tRNAs from reactive aldehydes. Minor differences near the amino acid side chain binding site (as indicated in Wolfson and Knight, FEBS Letters, 2005) might induce amino acid specific binding differences (Figure: 3F). However, those changes will have no influence when a D-chiral amino acid enters the pocket, as the whole side chain would clash with the active site. We have now included this sequence and structural conservation analysis in our revised manuscript (in text: line no 107-129; Figure: 2 and S2). Overall, our structural analysis suggests a conserved mode of aa-tRNA selection by elongation factor across life forms and therefore, our biochemical results with bacterial elongation factor Tu (EF-Tu) reflect the protective role of elongation factor in general across species.

      Author response image 2.

      Elongation factor enantio-selects L-aa-tRNAs through D-chiral rejection mechanism. A) Surface representation showing the cocrystal structure of EF-Tu with L-Phe-tRNAPhe. Zoomed-in image showing the binding of L-phenylalanine with side chain projected outside of binding site of EF-Tu (PDB id: 1TTT). B) Zoomed-in image of amino acid binding site of EF-Tu bound with L-phenylalanine showing the selection of amino group of amino acid through main chain atoms (PDB id: 1TTT). C) Modelling of D-phenylalanine in the amino acid binding site of EF-Tu shows severe clashes with main chain atoms of EF-Tu. Modelling of smallest chiral amino acid, alanine, in the amino acid binding site of EF-Tu shows D) no clashes with L-alanine and E) clashes with D-alanine. F) Modelling of D-alanine in the amino acid binding site of eEF-1A shows clashes with main chain atoms. (*Represents modelled molecule). G) Structure-based sequence alignment of elongation factor from bacteria, archaea and eukaryotes (both plants and animals) showing conserved amino acid binding site residues. (Key residues are marked with red star).

      Author response image 3.

      Elongation factor protects L-aa-tRNAs from aldehyde modification. A) Cartoon representation showing the cocrystal structure of EF-Tu with L-Phe-tRNAPhe (PDB id: 1TTT). B) Zoomed-in image of amino acid binding site of EF-Tu bound with L-phenylalanine (PDB id: 1TTT). C) Cartoon representation showing the cryoEM structure of eEF-1A with tRNAPhe (PDB id: 5LZS). D) Image showing the overlap of EF-Tu:L-Phe-tRNAPhe crystal structure and eEF-1A:tRNAPhe cryoEM structure (r.m.s.d. of 1.44 Å over 292 Cα atoms). E) Zoomed-in image of amino acid binding site of eEF-1A with modelled L-alanine (PDB id: 5LZS). (*Modelled) F) Overlap showing the amino acid binding site residues of EF-Tu and eEF-1A. (EF-Tu residues are marked in black and eEF-1A residues are marked in red).

      ‘failure to measure accumulation of toxic D-aminoacyl tRNAs or impairment of translation in plant cells lacking DTD2’

      We agree that measuring the accumulation of D-aa-tRNA adducts in plant cells lacking DTD2 is important. We made extensive attempts to characterise these adducts in dtd2 mutant plants through Northern blotting as well as mass spectrometry. However, because the affected tissue (root or shoot), the identity of the aa-tRNA and the location of the aa-tRNA (cytosolic or organellar) are unknown, we have so far been unsuccessful in identifying them in plants. Efforts are still underway to identify them in plant systems lacking DTD2. Meanwhile, we have used a bacterial surrogate system, E. coli, as used earlier in Mazeed M. et al., Science Advances, 2021, to show the accumulation of D-aa-tRNA adducts in the absence of dtd. We could identify the accumulation of both formaldehyde- and MG-modified D-aa-tRNA adducts via mass spectrometry (Figure: 4). These results are now included in the revised manuscript (in line no: 190-197 and Figure: S5).

      Author response image 4.

      Loss of DTD results in accumulation of modified D-aminoacyl adducts on tRNAs in E. coli. Mass spectrometry analysis showing the accumulation of aldehyde modified D-Tyr-tRNATyr in A) Δdtd E. coli, B) formaldehyde and D-tyrosine treated Δdtd E. coli, and C) MG and D-tyrosine treated Δdtd E. coli. ESI-MS based tandem fragmentation analysis for unmodified and aldehyde modified D-Tyr-tRNATyr in D) Δdtd E. coli, E) and F) formaldehyde and D-tyrosine treated Δdtd E. coli, G) and H) MG and D-tyrosine treated Δdtd E. coli.

      Response to Public Reviews:

      We are grateful for the reviewers’ positive feedback and their comments and suggestions on this manuscript. Reviewer 1 has indicated two weaknesses and Reviewer 2 has none. We have now addressed all the concerns of the Reviewers.

      Reviewer #1 (Public Review):

      Summary:

      This work is an extension of the authors' earlier work published in Sci Adv in 2021, wherein the authors showed that DTD2 deacylates N-ethyl-D-aminoacyl-tRNAs arising from acetaldehyde toxicity. In this study, the authors investigate the role of archaeal/plant DTD2 in the deacylation/detoxification of D-Tyr-tRNATyr modified by multiple other aldehydes and methylglyoxal (produced by plants). Importantly, the authors take their biochemical observations to plants, to show that deletion of DTD2 gene from a model plant (Arabidopsis thaliana) makes them sensitive to the aldehyde supplementation in the media especially in the presence of D-Tyr. These conclusions are further supported by the observation that the model plant shows increased tolerance to the aldehyde stress when DTD2 is overproduced from the CaMV 35S promoter. The authors propose a model for the role of DTD2 in the evolution of land plants. Finally, the authors suggest that the transgenic crops carrying DTD2 may offer a strategy for stress-tolerant crop development. Overall, the authors present a convincing story, and the data are supportive of the central theme of the story.

      We are happy that the reviewer found our work convincing, and we thank the reviewer for finding our data supportive of the central theme of the manuscript.

      Strengths:

      Data are novel and they provide a new perspective on the role of DTD2, and propose possible use of the DTD2 lines in crop improvement.

      We are grateful for this positive comment on the manuscript.

      Weaknesses:

      (a) Data obtained from a single aminoacyl-tRNA (D-Tyr-tRNATyr) have been generalized to imply that what is relevant to this model substrate is true for all other D-aa-tRNAs (term modified aa-tRNAs has been used synonymously with the modified Tyr-tRNATyr). This is not a risk-free extrapolation. For example, the authors see that DTD2 removes modified D-Tyr from tRNATyr in a chain-length dependent manner of the modifier. Why do the authors believe that the length of the amino acid side chain will not matter in the activity of DTD2?

      We thank the reviewer for bringing up this important point. As mentioned above, we wish to clarify that only half of the aminoacyl-tRNA synthetases are known to charge D-amino acids and only D-Leu (Yeast), D-Asp (Bacteria, Yeast), D-Tyr (Bacteria, Cyanobacteria, Yeast) and D-Trp (Bacteria) show toxicity in vivo in the absence of a known DTD (Soutourina J. et al., JBC, 2000; Soutourina O. et al., JBC, 2004; Wydau S. et al., JBC, 2009). D-Tyr-tRNATyr is used as a model substrate to test DTD activity in the field because of the conserved toxicity of D-Tyr in various organisms. DTD2 has been shown to recycle D-Asp-tRNAAsp and D-Tyr-tRNATyr with the same efficiency both in vitro and in vivo (Wydau S. et al., NAR, 2007). Moreover, we have previously shown that it recycles acetaldehyde-modified D-Phe-tRNAPhe and D-Tyr-tRNATyr in vitro (Mazeed M. et al., Science Advances, 2021). We have earlier shown that DTD1, another conserved chiral proofreader across bacteria and eukaryotes, acts via a side chain independent mechanism (Ahmad S. et al., eLife, 2013). To check the biochemical activity of DTD2 on D-Trp-tRNATrp, we have now done D-Trp, D-Tyr and D-Asp toxicity rescue experiments by expressing the archaeal DTD2 in dtd null E. coli cells. We found that DTD2 rescued D-Trp toxicity as efficiently as it rescued D-Tyr and D-Asp toxicity (Figure 1). Given that DTD2 acts on side chains of differing chemistry and size, it can be proposed with reasonable confidence that DTD2 also operates in a side chain independent manner.

      (b) While the use of EFTu supports that the ternary complex formation by the elongation factor can resist modifications of L-Tyr-tRNATyr by the aldehydes or other agents, in the context of the present work on the role of DTD2 in plants, one would want to see the data using eEF1alpha. This is particularly relevant because there are likely to be differences in the way EFTu and eEF1alpha may protect aminoacyl-tRNAs (for example see description in the latter half of the article by Wolfson and Knight 2005, FEBS Letters 579, 3467-3472).

      We thank the reviewer for bringing up this important point. As mentioned above, to understand the role of plant eEF1A in protecting L-aa-tRNAs from aldehyde modification, we have done a thorough sequence and structural analysis. We analysed the aa-tRNA bound elongation factor structure from bacteria (PDB id: 1TTT) and found that the side chain of the amino acid in the amino acid binding site of EF-Tu projects outward (Figure: 2A; 3A). In addition, the amino group of the amino acid is tightly selected by the main chain atoms of the elongation factor, leaving no space for aldehydes to enter and modify the L-aa-tRNAs and Gly-tRNAs (Figure: 2B; 3B). Modelling of a D-amino acid (D-phenylalanine and the smallest chiral amino acid, D-alanine) in the same site shows serious clashes with main chain atoms of EF-Tu, indicating D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2C-E). Next, we superimposed the tRNA bound mammalian eEF-1A cryoEM structure (PDB id: 5LZS) on the bacterial structure to understand the structural differences in terms of tRNA binding and found that the two elongation factors bind tRNA in a similar way (Figure: 3C-D). Modelling of D-alanine in the amino acid binding site of eEF-1A shows serious clashes with main chain atoms, indicating a general theme of D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2F; 3E). Structure-based sequence alignment of elongation factors from bacteria, archaea and eukaryotes (both plants and mammals) shows strict conservation of the amino acid binding site (Figure: 2G). Minor differences near the amino acid side chain binding site (as indicated in Wolfson and Knight, FEBS Letters, 2005) might induce amino acid specific binding differences (Figure: 3F). However, those changes will have no influence when a D-chiral amino acid enters the pocket, as the whole side chain would clash with the active site. We have now included this sequence and structural conservation analysis in our revised manuscript (in text: line no 107-129; Figure: 2 and S2). Overall, our structural analysis suggests a conserved mode of aa-tRNA selection by elongation factor across life forms and therefore, our biochemical results with bacterial elongation factor Tu (EF-Tu) reflect the protective role of elongation factor in general across species.

      Reviewer #2 (Public Review):

      In bacteria and mammals, metabolically generated aldehydes become toxic at high concentrations because they irreversibly modify the free amino group of various essential biological macromolecules. However, these aldehydes can be present in extremely high amounts in archaea and plants without causing major toxic side effects. This fact suggests that archaea and plants have evolved specialized mechanisms to prevent the harmful effects of aldehyde accumulation.

      In this study, the authors show that the plant enzyme DTD2, originating from archaea, functions as a D-aminoacyl-tRNA deacylase. This enzyme effectively removes stable D-aminoacyl adducts from tRNAs, enabling these molecules to be recycled for translation. Furthermore, they demonstrate that DTD2 serves as a broad detoxifier for various aldehydes in vivo, extending its function beyond acetaldehyde, as previously believed. Notably, the absence of DTD2 makes plants more susceptible to reactive aldehydes, while its overexpression offers protection against them. These findings underscore the physiological significance of this enzyme.

      We thank the reviewer for the positive comments on the manuscript.

      Response to recommendation to authors:

      Reviewer #1 (Recommendations For The Authors):

      I enjoyed reading the manuscript entitled, "Archaeal origin translation proofreader imparts multi aldehyde stress tolerance to land plants" from the Sankaranarayanan lab. This work is an extension of their earlier work published in Sci Adv in 2021, wherein they showed that DTD2 deacylates N-ethyl-D-aminoacyl-tRNAs arising from acetaldehyde toxicity. Now, the authors of this study (Kumar et al.) investigate the role of archaeal/plant DTD2 in the deacylation/detoxification of D-Tyr-tRNATyr modified by multiple other aldehydes and methylglyoxal (which are produced during metabolic reactions in plants). Importantly, the authors take their biochemical observations to plants, to show that deletion of DTD2 gene from a model plant (Arabidopsis thaliana) makes them sensitive to the aldehyde supplementation in the media especially in the presence of D-Tyr. These conclusions are further supported by the observation that the model plant shows increased tolerance to the aldehyde stress when DTD2 is overproduced from the CaMV 35S promoter. The authors propose a model for the role of DTD2 in the evolution of land plants. Finally, the authors suggest that the transgenic crops carrying DTD2 may offer a strategy for stress-tolerant crop development. Overall, the authors present a convincing story, and the data are supportive of the central theme of the story.

      We are happy that the reviewer enjoyed our manuscript and found our work convincing. We would also like to thank the reviewer for finding our data supportive of the central theme of the manuscript.

      I have the following observations that require the authors' attention.

      1) The title of the manuscript will be more appropriate if revised to, "Archaeal origin translation proofreader, DTD2, imparts multialdehyde stress tolerance to land plants".

      Both reviewers suggested changing the title. We have now changed the title based on Reviewer 2’s suggestion.

      2) Abstract (line 19): change, "physiologically abundantly produced" to "physiologically produced".

      As per the reviewer’s suggestion, we have now changed it to "physiologically produced".

      3) Introduction (line 50): delete, 'extremely'.

      We have removed the word 'extremely' from the Introduction.

      4) Line 79: change, "can be utilized" to "may be explored".

      We have changed "can be utilized" to "may be explored" as suggested by the reviewers.

      5) Results in general:

      (a) Data obtained from a single aminoacyl-tRNA (D-Tyr-tRNATyr) have been generalized to imply that what is relevant to this model substrate is true for all other D-aa-tRNAs (term modified aa-tRNAs has been used synonymously with the modified D-Tyr-tRNATyr). This is a risky extrapolation. For example, the authors see that DTD2 removes modified D-Tyr from tRNATyr in a chain-length dependent manner of the modifier. Why do the authors believe that the length of the amino acid side chain will not matter in the activity of DTD2?

      We thank the reviewer for bringing up this important point. As mentioned above, we wish to clarify that only half of the aminoacyl-tRNA synthetases are known to charge D-amino acids and only D-Leu (Yeast), D-Asp (Bacteria, Yeast), D-Tyr (Bacteria, Cyanobacteria, Yeast) and D-Trp (Bacteria) show toxicity in vivo in the absence of a known DTD (Soutourina J. et al., JBC, 2000; Soutourina O. et al., JBC, 2004; Wydau S. et al., JBC, 2009). D-Tyr-tRNATyr is used as a model substrate to test DTD activity in the field because of the conserved toxicity of D-Tyr in various organisms. DTD2 has been shown to recycle D-Asp-tRNAAsp and D-Tyr-tRNATyr with the same efficiency both in vitro and in vivo (Wydau S. et al., NAR, 2007). Moreover, we have previously shown that it recycles acetaldehyde-modified D-Phe-tRNAPhe and D-Tyr-tRNATyr in vitro (Mazeed M. et al., Science Advances, 2021). We have earlier shown that DTD1, another conserved chiral proofreader across bacteria and eukaryotes, acts via a side chain independent mechanism (Ahmad S. et al., eLife, 2013). To check the biochemical activity of DTD2 on D-Trp-tRNATrp, we have now done D-Trp, D-Tyr and D-Asp toxicity rescue experiments by expressing the archaeal DTD2 in dtd null E. coli cells. We found that DTD2 rescued D-Trp toxicity as efficiently as it rescued D-Tyr and D-Asp toxicity (Figure 1). Given that DTD2 acts on side chains of differing chemistry and size, it can be proposed with reasonable confidence that DTD2 also operates in a side chain independent manner.

      (b) Interestingly, the authors do suggest (in the Materials and Methods section) that the experiments were performed with Phe-tRNAPhe as well as Ala-tRNAAla. If what is stated in Materials and Methods is correct, these data should be included to generalize the observations.

      We regret the confusing statement. We wish to clarify that L- and D-Tyr-tRNATyr were used for the TLC-based aldehyde modification assays, EF-Tu based protection assays and deacylation assays; D-Phe-tRNAPhe was used to characterise aldehyde-based modification by mass spectrometry; and L-Ala-tRNAAla was used to check the modification propensity of multiple aldehydes. We used multiple aa-tRNAs to emphasize that aldehyde-based modifications do not depend on the identity of the aa-tRNA. All the data obtained with the respective aa-tRNAs are included in the manuscript.

      (c) While the use of EFTu supports that the ternary complex formation by the elongation factor can resist modifications of L-Tyr-tRNATyr by the aldehydes or other agents, in the context of the present work on the role of DTD2 in plants, one would want to see the data using eEF1alpha. This is particularly relevant because there are likely to be differences in the way EFTu and eEF1alpha may protect aminoacyl-tRNAs (for example see description in the latter half of the article by Wolfson and Knight 2005, FEBS Letters 579, 3467-3472).

      We thank the reviewer for bringing up this important point. As mentioned above, to understand the role of plant eEF1A in protecting L-aa-tRNAs from aldehyde modification, we have done a thorough sequence and structural analysis. We analysed the aa-tRNA bound elongation factor structure from bacteria (PDB id: 1TTT) and found that the side chain of the amino acid in the amino acid binding site of EF-Tu projects outward (Figure: 2A; 3A). In addition, the amino group of the amino acid is tightly selected by the main chain atoms of the elongation factor, leaving no space for aldehydes to enter and modify the L-aa-tRNAs and Gly-tRNAs (Figure: 2B; 3B). Modelling of a D-amino acid (D-phenylalanine and the smallest chiral amino acid, D-alanine) in the same site shows serious clashes with main chain atoms of EF-Tu, indicating D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2C-E). Next, we superimposed the tRNA bound mammalian eEF-1A cryoEM structure (PDB id: 5LZS) on the bacterial structure to understand the structural differences in terms of tRNA binding and found that the two elongation factors bind tRNA in a similar way (Figure: 3C-D). Modelling of D-alanine in the amino acid binding site of eEF-1A shows serious clashes with main chain atoms, indicating a general theme of D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2F; 3E). Structure-based sequence alignment of elongation factors from bacteria, archaea and eukaryotes (both plants and mammals) shows strict conservation of the amino acid binding site (Figure: 2G). Minor differences near the amino acid side chain binding site (as indicated in Wolfson and Knight, FEBS Letters, 2005) might induce amino acid specific binding differences (Figure: 3F). However, those changes will have no influence when a D-chiral amino acid enters the pocket, as the whole side chain would clash with the active site. We have now included this sequence and structural conservation analysis in our revised manuscript (in text: line no 107-129; Figure: 2 and S2). Overall, our structural analysis suggests a conserved mode of aa-tRNA selection by elongation factor across life forms and therefore, our biochemical results with bacterial elongation factor Tu (EF-Tu) reflect the protective role of elongation factor in general across species.

      6) Results (line 89): Figure: 1C-G (not B-G).

      As correctly pointed out by the reviewer(s), we have changed it to Figure: 1C-G.

      7) Results (line 91): Figure: S1B-G (not C-G).

      We wish to clarify that this is correct.

      8) Line 97: change, "propionaldehyde" to "propionaldehyde (Figure: 1H)".

      As per the reviewer’s suggestion, we have now changed, "propionaldehyde" to "propionaldehyde (Figure: 1H)".

      9) Line 124: The statement, "DTD2 cleaved all modified D-aa-tRNAs at 50 pM to 500 nM range (Figure: 2A_D)" is not consistent with the data presented. For example, Figure 2D does not show any significant cleavage. Figure S2A-B also does not show cleavage.

      We thank the reviewers for pointing this out. We have changed the sentence to “DTD2 cleaved majority of aldehyde modified D-aa-tRNAs at 50 pM to 500 nM range".

      10) Line 131: Cleavage observed in Fig. S2E is inconsistent with the generalized statement on DTD1.

      We wish to clarify that the minimal activity seen in Fig. S2E is inconsistent with the general trend of DTD1’s biochemical activity seen on modified D-aa-tRNAs. In addition, we have earlier shown that D-aa-tRNA fits snugly in the active site of DTD1 (Ahmad S. et al., eLife, 2013) whereas the modified D-aa-tRNA cannot bind due to the space constraints in the active site of DTD1 (Mazeed M. et al., Science Advances, 2021). Therefore, this minimal activity could be the result of a technical error during this biochemical experiment and could be considered as no activity.

      11) Lines 129-133: Citations of many figure panels particularly in the supplementary figures are inconsistent with generalized statements. This section requires a major rewrite or rearrangement of the figure panels (in case the statements are correct).

      We thank the reviewers for bringing forth this point and we have accordingly modified the statement into “DTD2 from archaea recycled short chain aldehyde-modified D-aa-tRNA adducts as expected (Figure: 3E-G) and, like DTD2 from plants, it did not act on aldehyde-modified D-aa-tRNAs longer than three chains (Figure: 3H; S3C-D; S4G-L)”.

      12) Line 142: I don't believe one can call PTH a proofreader. Its job is to recycle tRNAs from peptidyl-tRNAs.

      We thank the reviewers for raising this very important point. This is now corrected.

      13). Line 145: change, "DTD2 can exert its protection for" to "DTD2 may exert protection from".

      As per the reviewer’s suggestion, we have now changed "DTD2 can exert its protection for" to "DTD2 may exert protection from".

      14) Line 148: change, "a homozygous line (Figure: 3A) and checked for" to "homozygous lines (Figure: 3A) and checked them for".

      As per the reviewer’s suggestion, we have now changed, "a homozygous line (Figure: 3A) and checked for" to "homozygous lines (Figure: 3A) and checked them for".

      15) Line 148: Change, the sentence beginning with dtd2 as follows. Similar to earlier results30-32, dtd2-/- (dtd2 hereafter) plants were susceptible to ethanol (Figure: S4A) confirming the non-functionality DTD2 gene in dtd2 plants.

      As per the reviewer’s suggestion, we have now changed the sentence accordingly.

      16) Line 161: change, "linked" to "associated".

      As per the reviewer’s suggestion, we have now changed "linked" to "associated".

      17) Lines 173-176: It would be interesting to know how well the DTD2 OE lines do in comparison to the other known transgenic lines developed with, for example, ADH, ALDH, or AOX lines. Any ideas would help appreciate the observation with DTD2 OE lines!

      We greatly appreciate the reviewer’s suggestion. We have not done any comparison experiment with any transgenic lines so far. However, it can be potentially done in further studies with DTD2 OE lines.

      18) Line 194: change, "necessary" with "present".

      As per the reviewer’s suggestion, we have now changed "necessary" to "present".

      19) Line 210: what is meant by 'huge'? Would 'significant' sound better?

      As per the reviewer’s suggestion, we have now changed "huge" to "significant".

      20) Lines 239-243: This needs to be rephrased. Isn't alpha carbonyl of the carboxyl group that makes ester bond with the -CCA end of the tRNA required for DTD2 activity as well? Are you referring to the carbonyl group in the moiety that modifies the alpha-amino group? Please clarify. The cited reference (no. 64) of Atherly does not talk about it.

      We regret the confusing statement. To clarify, we were referring to the carbonyl carbon of the modification post amino group of the amino acid in aa-tRNAs (Figure: 5). We have now included a figure (Figure: S4Q of the revised manuscript) comparing the position of the carbonyl group, for better clarity. The cited reference, Atherly A. G., Nature, 1978, shows the activity of PTH on peptidyl-tRNAs, and peptidyl-tRNAs possess a carbonyl carbon at the alpha position post amino group of the amino acid in L-aa-tRNAs.

      Author response image 5.

      Figure showing the difference in the position of carbonyl carbon in acetonyl and acetyl modification on aa-tRNAs.

      21) Line 261: thrive (not thrives).

      As per the reviewer’s suggestion, we have now changed it to thrive.

      22) In Fig3A: second last lane, it should be dtd-/-:: AtDTDH150A (not dtd-/-:: AtDTDH150A).

      We thank the reviewers for pointing this out; we have corrected it.

      23). Materials and methods: Please clarify which experiments used tRNAPhe, tRNAAla, PheRS, etc. Also, please carefully check all other details provided in this section.

      As per the reviewer’s suggestion, we would like to provide a table below explaining the use of different substrates as well as enzymes in our experiments.

      Author response table 1.

      24) Figure legends (many places): p values higher than 0.05 (not less than) are denoted as ns.

      We thank the reviewers for pointing this out. We have corrected it.

      Reviewer #2 (Recommendations For The Authors):

      I have only minor comments for the authors:

      Title: I would replace "Archeal origin translation proofreader" with " A translation proofreader of archeal origin"

      As per the reviewer’s suggestion, we have now changed the title.

      Abstract: This section could benefit from some rewriting. For instance, at the outset, the initial logical connection between the first and second sentences of the abstract is somewhat unclear. At the very least, I would suggest swapping their order to enhance the narrative flow. Later in the text, the term "chiral proofreading systems" is introduced; however, it is only in a subsequent sentence that these systems are explained to be responsible for removing stable D-aminoacyl adducts from tRNA. Providing an immediate explanation of these systems would enhance the reader's comprehension. The authors switch from the past participle tense to the present tense towards the end of the text. I would recommend that they choose one tense for consistency. In the final sentence, I would suggest toning down the statement and replacing "can be used" with "could be explored." (https://www.nature.com/articles/d41586-023-02895-w). The same comment applies to the introduction, line 79.

      As per the reviewer’s suggestion, we have now changed the abstract appropriately.

      General note: Conventionally, the use of italics is reserved for the specific species "Arabidopsis thaliana," while the broader genus "Arabidopsis" is not italicized.

      We thank the reviewer for this pertinent suggestion. This is now corrected in the revised version of our manuscript.

      General note: I would advise the authors against employing bold characters in conjunction with colors in the figures.

      We thank the reviewer for this suggestion. We have now changed it appropriately in the revised version of our manuscript.

      Figure 1A: I recommend including the concentrations of the various aldehydes used in the experiment within the figure legend. While this information is available in the materials and methods section, it would be beneficial to have it readily accessible when analyzing the figure.

      As per the reviewer’s suggestion, we have now included the concentrations in the figure legend.

      Figure 1I, J: some error bars are invisible.

      We thank the reviewers for pointing this out; we have corrected it.

      Figure 2M: The table could be simplified by removing aldehydes for which it was not feasible to demonstrate activity. The letter "M" within the cell labeled "aldehydes" appears to be a typographical error, presumably indicating the figure panel.

      As per the reviewer’s suggestion, we have now changed this appropriately.

      Figure 3: For consistency with the other panels in the figure, I recommend including an additional panel to display the graph depicting the impact of MG on germination.

      As per the reviewer’s suggestion, we have now changed this appropriately.

      Figure 4: Considering that only one plant is presented, it would be beneficial to visualize the data distribution for the other plants used in this experiment, similar to what the authors have done in panel A of the same figure.

      We thank the reviewer for bringing up this point. We wish to clarify that we performed the experiment with multiple plants. However, for the sake of clarity, we have included only representative images. Moreover, we have included the quantitative data for multiple plants in Figure 3C-G.

      Figure 5E: The authors may consider presenting a chronological order of events as they believe they occurred during evolution.

      We thank the reviewer for the suggestion. However, it is very difficult to pinpoint the chronology of the events. Aldehydes are lethal for living systems due to their hyperreactivity, and such systems would require immediate solutions to survive. Therefore, we think that both the problem (toxic aldehyde production) and its solution (expansion of the aldehyde-metabolising repertoire and recruitment of archaeal DTD2) might have appeared simultaneously.

      Figure 6: The model appears somewhat crowded, which may affect its clarity and ease of interpretation. The authors might also consider dividing the legend sentence into two separate sentences for better readability.

      As per the reviewer’s suggestion, we have now changed this appropriately.

      Line 149: I recommend explicitly stating that ethanol metabolism produces acetaldehyde. This clarification will help the general reader immediately understand why DTD2 mutant plants are sensitive to ethanol.

      As per the reviewer’s suggestion, we have now changed this appropriately.

      Line 289: there is a typographical error, "promotor" instead of the correct term "promoter.".

      We thank the referee for pointing this out; we have now corrected it.

      Figure S5: The root morphology of DTD2 OE plants appears to exhibit some differences compared to the WT, even in the absence of a high concentration of aldehydes. It would be valuable if the authors could comment on these observed differences unless they have already done so, and I may have overlooked it.

      We thank the referee for pointing this out. We do see minor differences in root morphology, but they are more pronounced with aldehyde treatments. The reason for this phenotype remains elusive, and we are trying to understand the role of DTD2 in root development in detail in further studies.

      Some Curiosity Questions (not mandatory for manuscript acceptance):

      1) Do DTD2 OE plants display an earlier flowering phenotype than wild-type Col-0?

      We have not done detailed phenotyping of DTD2 OE plants. However, our preliminary observations suggest no differences in flowering pattern as compared to wild-type Col-0.

      2) What is the current understanding of the endogenous regulation of DTD2?

      We have not done detailed analysis to understand the endogenous regulation of DTD2.

      3) Could the protective phenotype of DTD2 OE plants in the presence of aldehydes be attributed to additional functions of this enzyme beyond the removal of stable D-aminoacyl adducts from tRNAs?

      Based on the available evidence regarding the biochemical activity and in vivo phenotypes of DTD2, it appears that removal of stable D-aminoacyl adducts from tRNA is key for the protective phenotype of DTD2 OE.

      A Suggestion for Future Research (not required for manuscript acceptance):

      The authors could explore the possibility of overexpressing DTD2 in pyruvate decarboxylase transgenic plants and assess whether this strategy enhances flood tolerance without incurring a growth penalty under normal growth conditions.

      We thank the referee for this interesting suggestion for future research. We will surely keep this in mind while exploring the flood tolerance potential of DTD2 OE plants.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study advances our understanding of the forces that shape the genomic landscape of transposable elements. By exploiting both long-read sequencing of mutation accumulation lines and in vivo transposition assays, the authors offer compelling evidence that structural variation rather than transposition largely shapes transposable element copy number evolution in budding yeast. The work will be of interest to the transposable element and genome evolution communities.

      Public Reviews:

      Reviewer #1 (Public Review):

      Henault et al build on their own previous work investigating the longstanding hypothesis that hybridization between divergent populations can activate transposable element mobilization (transposition). Previously they created crosses of increasing sequence divergence, using both intra- and inter-species hybrids, and passaged them neutrally for hundreds of generations. Their previous work showed that neither hybrids isolated from natural environments nor hybrids from their mutation accumulation lines showed consistent evidence of increased transposable element content. Here, they sequence and assemble long-read genomes of 127 of their mutation-accumulation lines and annotate all existing and de novo transposable elements. They find only a handful of de novo transposition events, and instead demonstrate that structural variation (ploidy, aneuploidy, loss of heterozygosity) plays a much larger role in the transposable element load in a given strain. They then created transposable element reporter constructs using two different Ty1 elements from S. paradoxus lineages and measured the transposition rate in a number of intraspecific crosses. They demonstrate that the transposition rate is dependent on both the Ty1 sequence and the copy number of genomic transposable elements, the latter of which is consistent with what has been observed in the literature on transposable element copy number control in Saccharomyces. To my knowledge, others have not directly tested the effect of Ty1 sequence itself (have not created diverse Ty1 reporter constructs), and so this is an interesting advance. Finally, the authors show that mitotype has a moderate effect on transposition rate, which is an intriguing finding that will be interesting to explore in future work.

      This study represents a large effort to investigate how genetic background can influence transposable element load and transposition rate. The long read sequencing, assembly, and annotation, and the creation of these reporter constructs are non-trivial. Their results are straightforward, well supported, and a nice addition to the literature.

      The authors state that the results from their current work support results taken from their previous study using short-read sequencing data of the same lines. The argument that follows is whether the authors gained anything novel from long-read sequencing. I would like to see the authors make a stronger argument for why this new work was necessary, and a more detailed view of similarities or differences from their previous study (when should others choose to do long read vs. short read of evolved lines?).

      We thank the reviewer for the suggestion. While we initially aimed to justify the relevance and novelty of the current work in relation to our previous study, we understand that this justification may not have been strong enough.

      In the second paragraph of the introduction, we explain how the multidimensional nature of TE load makes it more complex to characterize than simply reporting the abundance of a given TE family in a given genome. We added the following concluding sentence to further emphasize the importance of long reads in TE-focused genome inference:

      “As such, ongoing technological and computational advances in genome inference, including long-read sequencing, will certainly be key to getting a detailed understanding of the dynamics of TEs and the underpinning evolutionary forces.”

      In the penultimate introductory paragraph, we summarize our previous work from 2020 and highlight that the evolution of Ty contents in MA lines was inferred from aggregate measures of genomic abundance of TE families using short reads. We then make the point that combinations of multiple SVs could affect the landscape of TEs in ways that are not reflected by crude short-read measures. We added the following sentence to further emphasize this point and contrast it with the necessity of using more powerful methodologies for genome resolution:

      “Under this scenario, measuring Ty family abundance would yield no significant net change, and the dissection of the underlying SVs using short reads could often be challenging.”

      Relatedly, the authors should report the rates of structural variants that they observe. How are these results similar/different from other mutation-accumulation work in S. cerevisiae?

      Since this work does not attempt to provide an exhaustive report of all the SVs in the MA lines, but rather focuses on attributing an SV type to individual loci occupied by TEs, we cannot include these estimates, except for de novo transposition itself (see below). We added the following sentence to the Results section on the classification of Ty loci by SV types:

      “We note that the current methodology does not aim at providing an exhaustive quantification of all SVs in the MA lines, as previously done for some SV types (Marsit et al., 2021), but focuses solely on loci containing Ty elements.”

      We added estimates of the average retrotransposition rate in the MA experiment based on the number of de novo insertions detected in the MA lines genomes.

      Figure 4:

      “The average retrotransposition rates estimated from the counts of de novo insertions (per line per generation per element) are the following: CC1, 1.0×10⁻⁵; CC2, 4.9×10⁻⁶; CC3, 7.6×10⁻⁶; BB1, 1.5×10⁻⁵; BC2, 1.7×10⁻⁵; BA1, 6.5×10⁻⁶; BA2, 2.2×10⁻⁵; BSc1, 3.6×10⁻⁵.”
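As a rough illustration of how a rate expressed per line, per generation and per element can be derived from insertion counts, the sketch below divides the number of de novo insertions by the total opportunity for transposition. The function name, argument names and input values are hypothetical and are not taken from the manuscript; the exact denominators used by the authors are not stated in this response.

```python
def retrotransposition_rate(n_insertions, n_lines, n_generations, n_elements):
    """Average rate per line per generation per element.

    Assumption (not from the manuscript): every line carries the same
    number of elements and was propagated for the same number of
    generations, so the denominator is a simple product.
    """
    opportunities = n_lines * n_generations * n_elements
    if opportunities <= 0:
        raise ValueError("the denominator must be positive")
    return n_insertions / opportunities


# Hypothetical inputs, chosen only to show the arithmetic.
rate = retrotransposition_rate(
    n_insertions=2, n_lines=10, n_generations=500, n_elements=20
)
print(f"{rate:.1e}")
```

With unequal generation numbers or element counts across lines, the product in the denominator would be replaced by a sum over lines.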

      We added the following paragraph in the Discussion section to specifically discuss these estimates in relation to the in vivo measurements.

      “We note that while the CC crosses tend to have the lowest retrotransposition rates as estimated from the de novo insertions (~1×10⁻⁵ per line per generation per element; Figure 4), these values are several orders of magnitude higher than the in vivo measures in SpC backgrounds. The discrepancy between these estimates could be due to uncharacterized biases inherent to each method. They could also be linked to differences between the parental genotypes used to generate the MA crosses and the fluctuation assays. One major difference is the use of ade2 genotypes in the MA parents, a strategy that was initially adopted to provide a marker for the loss of mitochondrial respiration (Joseph and Hall, 2004; Lynch et al., 2008). It has been shown that the induction of adenine starvation through minimal adenine concentration in the medium and deletion of ADE2, which inactivates the adenine de novo biosynthesis pathway, increases Ty1 transcript levels (Todeschini et al., 2005), resulting in higher transposition rates. Rich complex medium like the one that was used for the MA experiment (YPD) can exhibit substantial variation in adenine concentration (VanDusen et al., 1997), and adenine can quickly become the limiting nutrient for ade2 strains (Kokina et al., 2014). Thus, we cannot exclude that the choice of initial ade2 genotypes could have inflated the transposition rates in the MA experiment.”

      Since the authors show a small, but consistent influence of mitotype on transposition rates, adding further evidence for the role of mtDNA in regulating transposition, I'm curious what the transposition rate of a p0 strain is. I think including these results could make this observation more compelling.

      We agree that measuring in vivo transposition rates in ρ0 backgrounds would be an interesting avenue. However, there is a large distinction between having non-functional mitochondrial respiration in ρ0 strains and inheriting diverse functional mtDNA haplotypes. The effects we show are all linked to the reciprocal inheritance of intact mtDNAs, producing ρ+ strains that are all respiration-competent, as shown by our growth confirmations on non-fermentable carbon sources for all the diploid backgrounds generated. While potentially interesting, adding transposition rates measures for the ρ0 backgrounds seems hard to justify in the context of our results.

      Reviewer #2 (Public Review):

      This is an interesting follow-up study that uses long-read sequencing to examine previously constructed mutation accumulation lines between wild populations of S. cerevisiae and S. paradoxus. They also complement this work with reporter assays in hybrid backgrounds. The authors are attempting to test the hypothesis that hybridization leads to genome shock and unrestrained transposition. The paper largely confirms previous results (suggesting hybridization does not increase transposition) that are well cited and discussed in the paper, both from this group and from the Smukowski Heil/Dunham group but extends them to a new set of species/hybrids and with some additional resolution via the long read sequencing. The paper is well written and clear and I have no serious complaints.

      In the abstract, the authors make three primary claims:

      Structural variation plays a strong role in TE load.

      Transposition plays only a minor role in shaping the TE landscape in MA lines.

      Transposition rates are not increased by hybridization but are affected by genotype-specific factors.

      I found all three claims supported, albeit with some minor questions below:

      Structural variation plays a strong role in TE load.

      Convinced of this result. However:

      Line 185-187/Figure 3C: I'm curious given that the changes in Ty count are so often linked to changes in gross DNA sequence whether the count per total DNA sequence is actually changing on average in these genomes. Ie., does hybridization tend to increase TE count via CNV or does hybridization tend to increase DNA content in the MA lines and TEs come along for the ride?

      The Ty content definitely “rides along” with the rest of the genome that is affected by retrotransposition-unrelated SVs. To further highlight this point, we added a panel (E) to Figure 3 in which we correlate the net Ty copy number change (same as panel D, formerly C) to the corresponding genome size, which reflects the amount of DNA lost/gained by all SV types. We added the following to the results section:

      “The distributions of net Ty CN change per MA line showed that most crosses had significant gains (Figure 3D), suggesting that Ty load can often increase as a result of random genetic drift. Some (but not all) of these crosses also exhibited significant increases in genome size after evolution (Supplemental Figure S7A). The net Ty CN changes per MA line subgenome were globally correlated to the corresponding changes in subgenome size (Figure 3E). Even after excluding polyploid lines (which have the largest changes in both Ty CN and genome size), we found a significant relationship between the two variables (mixed linear model with random intercepts and slopes for MA crosses, P-value=3.71×10⁻⁹; Supplemental Figure S7B), indicating that SVs affecting large portions of the genome have a substantial impact on the Ty landscape.”

      One question about ploidy (lines 175-177):

      Both aneuploidy and triploidy seem easy to call from this data. A 3:1 tetraploidy as well. However, in Figure 2B there are tetraploids that are around the 1:1 line. How are the authors calling ploidy for these strains? This was not clear to me from the text.

      This detail was indeed missing from the manuscript. The ploidy level of all MA lines was previously measured by DNA staining and flow cytometry, and the ploidy level of the subgenomes of each polyploid MA line was previously inferred from short-read sequencing. We modified the figure captions and the main text to include this along with the corresponding references:

      Figure 2:

      “The ploidy level of each line was previously determined by DNA staining and flow cytometry (Charron et al., 2019; Marsit et al., 2021).”

      Main text:

      “The ratio of classified bases per subgenome was consistent with the corresponding ploidy levels: triploid BC lines had two copies of the SpC subgenome, while tetraploid lines had both SpC subgenomes duplicated (Charron et al., 2019; Marsit et al., 2021) (Figure 2B).”

      “Finally, we used the ploidy level of each MA line subgenome as previously measured by flow cytometry and short-read sequencing (Charron et al., 2019; Marsit et al., 2021).”

      Reviewer #3 (Public Review):

      Henault et al. address the important open question of whether hybridization could trigger TE mobilization. To do this they analysed MA lines derived from crosses of Saccharomyces paradoxus and Saccharomyces cerevisiae using long-read sequencing. These MA lines were already analysed in a previous publication using Illumina short-read data but the novelty of this work is the long-read sequencing data, which may reveal previously missed information. It is an interesting message of this study that hybridization between the two species did not lead to much TE activity. Due to this low activity, the authors performed an additional TE activity assay in vivo to measure transposition rates in hybrid backgrounds. The study is well written and I cannot spot any major problems. The study provides some important messages (like the influence of the genotype and mitochondrial DNA on transposition rates).

      Major comments

      • What I miss the most in this work is the perspective of the host defence against TEs in Saccharomyces. Based on such a mechanistic perspective, why do the authors think that hybridization could lead to a TE reactivation? For example, in Drosophila, small RNAs important for the defence against a TE are solely maternally transmitted. Hybrid offspring will thus solely have small RNAs complementary to the TEs of the mother but not to the TEs of the father; therefore, a reactivation of the paternal TEs may be expected. I was thus wondering, what is the situation in yeast? Why would we expect an upregulation of TEs? Without such a mechanistic explanation, the hypothesis that TEs should be upregulated in hybrids is a bit vague, based on a hunch.

      We agree with the reviewer that in the first version of the manuscript, the justification for the investigation of the reactivation hypothesis in the first place was not self-sufficient and relied too much on our previous work, upon which this article builds. We extensively remodeled the introduction to better justify the investigation of this hypothesis in the context of the current knowledge on the regulation of Ty elements in Saccharomyces.  

      Reviewer #1 (Recommendations For The Authors):

      It's interesting that the net change in transposable element copy number in mutation accumulation lines is either insignificant or gain, and never a significant loss. I think this could make a nice discussion point regarding the roles of drift and selection on TE load.

      We thank the reviewer for the suggestion and agree that this is an interesting perspective that we did not explore in the first version of the manuscript. We thus included a short discussion point in the Results:

      “The distributions of net Ty CN change per MA line showed that most crosses had significant gains (Figure 3D), suggesting that Ty load can often increase as a result of random genetic drift.”

      We also added the following paragraph to the discussion section:

      “Our experiments illustrate how under weakened natural selection efficiency, TE load can increase in hybrid genomes by the action of transposition-unrelated SVs. This offers a nuanced perspective on the classical interpretation of the transposition-selection balance model (Charlesworth et al., 1994; Charlesworth and Langley, 1989), in which increased TE load would be predominantly driven by the relaxation of purifying selection against TE insertions generated by de novo transposition. Our results suggest that SVs arising in the context of hybridization can act as a significant source of TE insertion polymorphisms which natural selection can purge more or less efficiently, depending on the population genetic context. This is closely related to the idea that sexual reproduction could favor the spread of TE families, contributing to their evolutionary success (Hickey, 1982; Zeyl et al., 1996). Since the insertion polymorphisms that contribute to increase TE load mostly originate from standing genetic variation, they could be less deleterious and thus harder for natural selection to purge efficiently.”

      The point about the role of LOH in TE load is cool!

      We thank the reviewer for their enthusiasm, it is one of our favorite results as well.

      Figure 1: Add a figure component of the green box and label it Ty1 or TE.

      We modified Figure 1 accordingly.

      Figure 2C: what is the assembly size ratio?

      We added the following sentence to the figure caption to clarify what we define as assembly size ratio:

      “Assembly size ratio refers to the ratio of subgenome assembly size to the corresponding parental assembly size.”
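For readers less familiar with these assembly metrics, the following minimal sketch computes the assembly size ratio as defined in the sentence above, together with N50, the contiguity metric mentioned in the reviewer's next comment. It assumes that assemblies are represented simply as lists of contig lengths; the function names and the example lengths are illustrative and do not come from the manuscript.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs provided")


def assembly_size_ratio(subgenome_contigs, parental_contigs):
    """Ratio of subgenome assembly size to the corresponding parental assembly size."""
    parental_size = sum(parental_contigs)
    if parental_size == 0:
        raise ValueError("parental assembly is empty")
    return sum(subgenome_contigs) / parental_size


# Hypothetical contig lengths (arbitrary units).
subgenome = [6, 3]
parent = [8, 4]
print(n50(parent), assembly_size_ratio(subgenome, parent))
```

A ratio close to 1 would indicate that the subgenome assembly recovers roughly the full length of the parental assembly.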

      Something cut off in the N50 plot axis

      Unfortunately, we can’t seem to understand what the reviewer meant with this comment, nothing seems cut out of the figure panel 2C in any of our versions of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      These are all minor comments/suggestions that the authors can take or leave.

      Line 42: "fuels" should be "fuel".

      Since the verb refers to “source” and not “variants”, we believe it should be in the third person singular.

      Line 43: unclear what the authors mean by "regroup".

      We understand how this phrasing may sound strange. We modified the sentence accordingly:

      “Structural variation is a term that encompasses a broad variety of large-scale sequence alterations”

      Line 51-52: There are a couple of really nice papers that could be cited here from Anna Selmecki's group (Todd et al. 2020, Todd and Selmecki 2019, both in eLife).

      We thank the reviewer for the suggestions, we included some of these references in the manuscript.

      Figure 1: This is a nice cartoon! I'd suggest spelling out LOH here for a truly naive reader.

      We modified the Figure 1 accordingly.

      Figure 3A: One thing that is slightly lost here in the presentation is the relative frequency of the different events because of the changing scales across 3A. I can see why you want to do it this way, but would consider whether there may be a way to present this that makes it more obvious how much more frequent polyploidy is than excision for example.

      We agree with the reviewer that the focus of this visualization is to compare crosses and individual MA lines within SV types, and fails to display the relative importance of each SV type. We solved this by including an additional panel (new 3A) that shows how the number of Ty loci affected by each SV type scales in comparison to others.

      Figure 5: I'm not a fan of the gray bars highlighting the individual strains. This made the graph less intuitively readable for me.

      We tend to agree with the reviewer and rolled back to a previous version of Figure 5 that was lighter on annotations.

      One thing I would like to see in the future from this data (definitely not in this paper) is genome rearrangements within these hybrid MA lines. How often are there structural changes and how often are those changes mediated by repeats including TEs?

      We completely agree with the reviewer that this would be a very interesting avenue, with a distinct (and likely higher) set of challenges at the analysis level compared to simply focusing on TE sequences like we did here. We hope to be able to tackle this goal in the future of this project.

      Reviewer #3 (Recommendations For The Authors):

      • I'm not from the yeast field. But why this focus on the Ty-load? Are Ty's the only active TEs in yeast? Provide some background on the TE landscape in yeast and a justification for focusing on Ty's.

      We agree with the reviewer that this point was only implicit in the introduction. We modified the introductory segment on Saccharomyces yeasts to mention that Ty retrotransposons are the only TEs found in these genomes, thus explaining the exclusive focus on them. It now reads as follows:

      “In the case of Saccharomyces cerevisiae, the only TEs found are five families of long terminal repeat (LTR) retrotransposons families named Ty1-Ty5 (Kim et al., 1998).”

      • 56 I would argue that Petrov et al 2003 is not the best citation for arguing that TEs can lead to genomic rearrangement through ectopic recombination. Petrov solely showed that some long TE families are at lower population frequency than short TE families. This could be due to many reasons (e.g. recent activity of long TEs - mostly LTRs) but Petrov interpreted the data as being due to ectopic recombination. Petrov, therefore, did not demonstrate any direct evidence for the involvement of ectopic recombination.

      We agree with the reviewer that this reference is not the best choice to simply support the role of TEs in generating ectopic recombination events and modified the references accordingly.

      • For the assembly the authors used two steps: 1) separate the reads based on similarity to a subgenome and 2) assemble the reads from the resulting two sets separately. This is probably the only viable approach, but I'm wondering if this step can lead to some biases (many reads may not be assigned to one sub-genome or assigned to the wrong sub-genome). An alternative, possibly less biased approach, would be to use one of the emerging assemblers that promise to assemble sub-genomes. Maybe discuss why this approach was not pursued.

      We completely agree that our method has some level of bias. We adopted it because it seemed the most appropriate to answer our question, which required resolving individual TE insertions at the level of single haplotype sequences. One specific challenge of this dataset is that we have a relatively wide range of nucleotide divergence between parental subgenomes in the different MA crosses, from <1% to ~15%. The efficiency of haplotype separation from tools that are not necessarily designed to be tunable with respect to the level of nucleotide divergence seemed uncertain, which is why we opted for a custom methodology. Although read non-classification remains a problem that is hard to solve (and would remain so using orthogonal strategies), we believe that read misclassification is minimized by our stringent criteria for read classification. The goal of this study was not to develop a tool nor to benchmark our approach against existing diploid assembly tools. It yielded phased genome representations that were of sufficient completeness and contiguity to confidently answer our questions, and we believe that pushing the discussion towards technical considerations would fall outside of our main objective.

      • The authors used a decision tree to classify Ty loci. What were the training data? How were the trees validated? Decision tree is a technical term for a classifier in machine learning. I do not think the authors used machine learning in this work, but rather an "an ad-hoc set of rules". The term decision tree in this study is misleading.

      We believe that the term “decision tree” can simply refer to a hierarchy of conditional rules implemented as a classification algorithm. As the reviewer pointed out, it is clear from the manuscript that none of the analyses performed include any form of training or fitting of a machine learning classifier. However, we agree that its specific reference to the machine learning classifier can create unnecessary confusion. We thus removed this term from the manuscript and replaced all its instances with “a hierarchy of binary rules”.
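To make the distinction concrete, a hierarchy of binary rules can be written as a fixed sequence of yes/no questions with no training data or model fitting involved. The sketch below is a toy example only: the inputs, the order of the questions and the output labels are hypothetical and do not reproduce the rules actually used in the manuscript.

```python
def classify_ty_locus(in_parent, in_evolved, parental_copies, evolved_copies):
    """Toy hierarchy of binary rules for a single locus.

    Every branch is a hand-written yes/no question evaluated in a fixed
    order; nothing is learned from data. Labels are illustrative only.
    """
    if not in_parent:
        # Locus absent from the parental genome.
        return "de novo insertion" if in_evolved else "absent"
    if not in_evolved:
        # Locus present in the parent but missing after evolution.
        return "loss"
    if evolved_copies > parental_copies:
        return "copy number gain"
    if evolved_copies < parental_copies:
        return "copy number loss"
    return "unchanged"


# Hypothetical loci: (in_parent, in_evolved, parental_copies, evolved_copies)
loci = [(False, True, 0, 1), (True, True, 1, 2), (True, False, 1, 0)]
for locus in loci:
    print(locus, "->", classify_ty_locus(*locus))
```

Because the rules are explicit, each classification can be traced back to the specific question that produced it, which is not generally true of a fitted classifier.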

      • 272: as it is the CNC explanation does not make a lot of sense to me; some information is missing, is p22 expression increasing with copy numbers?

      Yes, p22 expression correlates positively with the CN of p22-expressing Ty1 elements.

      Why are the two alternative downstream codons important?

      We thought it would be useful to mention the two start codons at this point because later in the discussion, we bring the conservation of the first start codon as an observation consistent with the putative expression of p22 in S. paradoxus. We also thought that it helped clarify the mechanism by which the N-truncated version of the protein is expressed.

      p22 interferes with assembly viral particles when in high copy numbers, but what happens when at low copy numbers, is it essential for retroviral activity? Is it even necessary for the virus or just some garbage product (they mention N-truncated).

      To our knowledge, these questions regarding the potential molecular functions of p22 other than as a retrotransposition restriction factor are still open. We added details to the background on CNC in the Introduction and Results sections to help clarify some of the points raised:

      Introduction:

      “The best known regulation mechanism in yeast is termed copy number control (CNC) and was characterized in the Ty1 family of S. cerevisiae. This mechanism is a potent copy-number dependent negative feedback loop by which increasing the CN of Ty1 elements strengthens their repression (Czaja et al., 2020; Garfinkel et al., 2003; Saha et al., 2015).”

      Results:

      “The mechanism of negative copy-number dependent self-regulation of retrotransposition (CNC) was characterized in the Ty1 family of S. cerevisiae (Garfinkel et al., 2016). This mechanism relies on the expression of an N-truncated variant of the Ty1 capsid/nucleocapsid Gag protein (p22) from two downstream alternative start codons (Nishida et al., 2015; Saha et al., 2015). p22 expression scales up with the CN of Ty1 elements that encode it (Tucker et al., 2015), which gradually interferes with the assembly of the viral-like particles essential for Ty1 replication (Cottee et al., 2021; Saha et al., 2015). Thus, CNC yields a steep negative relationship between the retrotransposition rate measured with a tester element and the number of Ty1 copies in the genome (Garfinkel et al., 2003; Tucker et al., 2015).”

      • mtDNA influences transposition, is anything known about the mechanism?

      When presenting this result, we make it clear that this finding is not new and was previously observed in S. cerevisiae x S. uvarum hybrids by Smukowski-Heil et al. (2021). In this reference, the authors discuss multiple mechanisms by which mitochondrial biology and mito-nuclear interplay may affect transposition rate, although their data cannot support one specific hypothesis. Our data do not allow us to further dissect the mechanistic basis of the mtDNA effect, any more than that of the effect of distinct Ty1 natural variants. Since we simply provide new independent evidence for the mtDNA effect, it seems to us that repeating the discussion on putative mechanisms while bringing no support to any given hypothesis would be of limited relevance.

      • During the first reading, I got quite confused about what CN means (copy number as it turned out). I suggest using abbreviations only if absolutely necessary, and I'm not entirely convinced it is necessary here. But I leave this to the discretion of the authors.

      We agree that the excessive use of abbreviations in manuscripts is annoying. However, in this case, “copy number” is used so extensively that its abbreviation seemed to improve the reading experience. Thus, we would prefer to keep it unchanged.

      • Fig 3D: Wilcoxon Rank sum test. It is not clear to me what was tested here? Which data were used?

      We confirm that the statistical test employed is the Wilcoxon signed-rank test, and not the Wilcoxon rank-sum test (also known as Mann-Whitney U-test). The Wilcoxon signed-rank test is used here as a non-parametric one-sample test against the null hypothesis that the distribution is centered around zero.
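
      For readers unfamiliar with the one-sample use of this test, the following is a minimal sketch in Python of the signed-rank statistic and its exact two-sided p-value. It is an illustration only and not the analysis code used for Fig 3D: the function name `signed_rank_test` and the example values are hypothetical, and the sketch assumes no zero values and no tied absolute values.

```python
def signed_rank_test(values):
    """One-sample Wilcoxon signed-rank test against a distribution centered on zero.

    Simplifying assumptions (hypothetical, for illustration): no zeros and
    no tied absolute values among the observations.
    Returns (W_plus, W_minus, exact two-sided p-value).
    """
    n = len(values)
    # Rank observations by absolute value (rank 1 = smallest magnitude).
    order = sorted(range(n), key=lambda i: abs(values[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if values[i] > 0)
    total = n * (n + 1) // 2
    w_minus = total - w_plus

    # Exact null distribution: under H0 each rank carries a positive sign
    # with probability 1/2, so count the subsets of {1..n} reaching each sum.
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]

    w = min(w_plus, w_minus)
    p_value = min(1.0, 2 * sum(counts[: w + 1]) / 2 ** n)
    return w_plus, w_minus, p_value


# Hypothetical example: eight paired differences, seven of them positive.
example = [1.2, 0.8, 2.1, 0.5, 1.7, -0.3, 0.9, 1.4]
print(signed_rank_test(example))  # (35, 1, 0.015625)
```

      In contrast, the rank-sum (Mann-Whitney) test compares two independent samples, which is why the two names should not be used interchangeably.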

      • de novo -> italics

      We choose to follow the recommendation of the general style conventions of the ACS guide for scholarly communications not to italicize common Latin terms like “de novo”, “e.g.” and “i.e.”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      The reviewers make some suggestions aimed towards increasing the clarity of the manuscript, and I suggest that the authors examine those carefully. In particular, the figure is difficult to read and could contain additional information to help the reader's interpretation. For example, Reviewer 1 suggests including sample age estimates alongside depth, while Reviewer 3 also notes that there is missing information in the figure. Apart from the figure, Reviewer 1 suggests two additional analysis to help explain the amount of mammoth DNA recovered, which they observe is much higher than previous similar investigations. This would seem to be an important issue to address, given the surprising nature of the findings. In addition to this larger issue, the Reviewer makes a few important suggestions for supplementary material that may be needed to support the authors' statements.

      Some additional recommended edits -- in particular to the text and included references to related studies -- are suggested by Reviewers 2 and 3, and both commented on the lack of a publicly-available data repository. The authors may also wish to comment on or revisit their differential treatment of wooly mammoth vs. wooly rhinoceros samples, though I suspect this has more to do with low read numbers for the rhinos.

      Thank you very much for the positive assessment of our manuscript and clear suggestions for revision. We address these points below.

      Reviewer #1 (Recommendations For The Authors):

      I have a few suggestions that might further improve the manuscript:

      It is difficult for the reader to follow which core slices exactly have been sampled and sequenced. The authors mention 23 samples were taken from core LK-001 and 16 samples from core LK-007. From the text it remains unclear to me what the exact age of each of these samples is. Figure 1 shows the depth at which the LK-001 core was sampled, maybe sample age estimates could be included here.

      Thanks for pointing this out. We have added approximate ages to Figure 1, added the depth range to the text (“from 1.5 to 80 cm”; l. 73-74, caption Figure 1), and reworked the table of the sampling depths in the supplement.

      Line 84-87. The authors mention the retrieval of DNA from several expected Arctic taxa, however no further data regarding these findings is given in the manuscript. It would be useful to report the same numbers for these species as the ones given for the Mammuthus and woolly rhinoceros, which would allow for a comparison of the relative abundance of the DNA between these species. Are the expected Arctic species for instance at much higher (DNA) abundance in the samples? It would also be interesting to know if the authors discovered DNA from extant species that are unlikely to have occurred in the geographic region. A (supplementary)table listing the number of mapped reads to each of the respective mitogenomes for each sequence library would be useful for the reader.

      We added a supplementary table (S8) indicating the numbers of reads assigned to mammals.

      Line 90: I am somewhat amazed by the amount of mammoth DNA the authors recovered from these cores. A total depth of over 400X of the mitogenome is quite extraordinary and I am not aware of any ancient sediment study to date that has retrieved a similar amount of data. For instance, the Wang et al. 2021 paper, which the authors cite, sequenced over 400 samples and did not find any mammoth DNA in 70% of those. For the 30% of samples showing signs of mammoth DNA they retrieved on average 530 sequence reads. In this study the authors find on average ~20.000 reads, in 22 out of the 23 sequence libraries. This makes me wonder if the way the mapping was performed has been too lenient, resulting in possible spurious mappings? To really confirm the authenticity of the mammoth (and woolly rhino data) I would suggest two additional analysis:

      1) Mapping all the sequence libraries to a reference consisting of the complete Asian-elephant genome (for instance https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_024166365.1/), the complete human genome (+mitogenome) and the Asian elephant mitogenome. This could possibly reduce spurious mappings as conserved regions between the genomes are filtered out and could also reduce the possible mapping of NUMTS. If the authors could show that after such a mapping approach a significant number of reads are still assigned to the Asian elephant part (including the mitogenome) of the reference, the reported findings would be strengthened.

      2) I also suggest to construct a mitochondrial haplotype network from the obtained DNA, while also including previously published Asian and African elephants as well as previously published mammoth mitogenomes. If the obtained haplotypes indeed show that they cluster within the known haplotype diversity of mammoth, that would be strong support for the authenticity of the data

      The same analysis could be considered for the woolly rhino data, although the lower read numbers might make this analysis challenging.

      We agree that the amount of mammoth DNA is surprising, which is why we opted for further laboratory experiments for confirmation of the hybridization capture results of the first core, i.e., 1) DNA extraction from a second core of a different lake, 2) a quantitative PCR approach (ddPCR), and 3) metabarcoding. Our results of the highly specific ddPCR and metabarcoding assays confirmed considerable amounts of mammoth DNA in two sediment cores of different lakes, thus we have no doubts regarding the authenticity of the data. Considering the large amount of mammoth DNA, the high number of reads, and particularly the high mitogenome coverage, we argue that the effect of some spurious mapping is negligible and does not affect the main outcome and conclusions of our study. Although we agree that a haplotype network would be interesting, such analyses would stretch beyond the focus of this publication.

      Line 91: The authors mention negative controls (extraction and library blanks) did not produce any reads assigned to mammals. This is quite remarkable, as in my experience low levels of (human)contamination are almost always present in the blanks. Could the authors comment on why they think the blanks did not show any signal of mammalian DNA?

      The hybridization capture enrichment and the filtration and mapping procedures likely eliminated human contamination. Also, the data were mapped against Arctic mammal mitogenomes, which did not include human reference sequences. However, six of the sediment samples contained human sequences (now shown in supplementary table S8), albeit at low read counts (mean = 65).

      Line 97: "mapping suggested that the sequences throughout the core originated from multiple individuals" The authors do not provide any supporting data showing this. I think that an analysis (for instance based on allele frequencies) has to be included in manuscript to support this claim.

      We agree that this claim was not sufficiently supported. We performed further analyses including genomic data of previously retrieved mammoth remains and assigned our data to these haplogroups; the results were added to the main text and are shown as a figure (Fig. 2).

      Line 98: "Signatures of post-mortem DNA decay were comparably minor."

      Do the authors know if the used hybridisation enrichment method can distort the measurement of post-mortem damage? Are for instance reads with C-T substitutions less likely to be captured by the baits?

      To our knowledge, there is no study suggesting that damaged sites are less likely to be captured. In general, the hybridization capture procedure is not overly specific, and studies report that DNA is readily and preferentially captured as long as the difference between baits and DNA is not above 10%.

      Line 100: "The proportions of bases did not suggest a substantial deviation from those in the reference genomes or in the closest extant relative of Mammuthus, the Asian elephant (Elephas maximus)."

      It is not clear to me what the authors mean by this. Could the authors explain how this was measured and what their interpretation of this result is?

      We realize that the sentence was unclear. We meant that the nucleotide composition was similar to that of the reference genomes or the closest extant relative. However, as we do not consider this important for the argument, we have removed this sentence from the manuscript.

      Given the high number of recovered mammoth reads in the samples, it would be interesting to know how much mammoth reads are present in the sample before enrichment capture with the baits. Shotgun sequencing the raw extract of one of the samples with the highest number of mammoth reads might allow for a rough estimate of mammoth DNA abundance compared to the other extant species (e.g. reindeer, Arctic lemming and hare) found in the sample(s). This could give further clarification about the extent of stratigraphy disturbance and its overall effect on the DNA based community reconstruction. However, this is just a suggested additional analysis and not something I believe crucial for supporting the overall findings in this manuscript.

      We fully agree that this would be a highly interesting and informative additional analysis to perform. It was, however, not possible to perform this additional analysis in the course of the current experiments.

      Finally, I could not find a public link to the (sequence)data produced in this study. I strongly encourage the authors to make their data publicly available.

      Thank you for pointing this out. We have added a Data Availability paragraph, including the respective reference.

      Reviewer #2 (Recommendations For The Authors):

      In the Discussion it is mentioned that the reasons for Mammoth extinction are not entirely clear but are largely attributed to sudden climate warming (and add some relevant citations). However, there is also abundant literature that suggest humans also played a role in their extinction (for instance, a recent one, Damien et al. (2022) at Ecology Letters 25: 127-137).

      We agree with the reviewer and have added the recent citation highlighting the possible influence of humans.

      One possibility to add further interest to this paper would be to conduct a phylogenetic tree with the Mammoth mitogenome(s) retrieved and a reference dataset; it could be interesting to know where do they fall in the phylogeny -already abundant with tens of individuals- and maybe it could be even possible to roughly estimate their date. There are some papers that report many Mammoth mitogenomes, including of course some from Siberia; for instance Chang et al. (2017) at Sci Reports and also Fellow Yates et al. (2017) also at Sci Reports (the latter mainly from Central Europe).

      We are well aware of the number of mt genomes available for mammoth, and such an analysis would be an interesting addition, potentially also offering the possibility to date the DNA. However, the analysis was hampered and would be less secure for this dataset, as our sequences display considerable variation among each other, suggesting that we have a mix of multiple mt genomes, which we cannot readily distinguish. We thus refrain from this, also because we instead provide multiple lines of evidence for the existence of the mammoth DNA in the surface sediment core (metabarcoding, ddPCR).

      Minor points:

      -Correct wooly to woolly

      Revised.

      -In the sampling description it is not totally clear if the samples were taken at 1 cm each (it is mentioned that core LK-001 is sliced in the field at 1-cm steps for radiometric dating and later it is explained that 23 samples were analyzed from this core, but it is unclear if they represent 23 cm of core)

      -Maybe the authors could briefly define some terms such as "talik"

      Revised.

      Reviewer #3 (Recommendations For The Authors):

      Maybe I missed this but I could not find a data availability statement or the location of the repository

      We have added a Data Availability paragraph, including the respective reference.

      It would be good to see some additional analysis on the distribution of the woolly rhinoceros DNA through the sediment core - like the figure for the mammoth i.e read numbers vs depth.

      We have added to the supplements a table showing the numbers of assigned mammal reads over the core depths (Table S8). However, as rhinoceros reads are considerably rarer in our results, we did not produce a figure.

      Would it be possible to be more explicit about the multiple mammoth individuals, could you calculate a minimum number or haplotypes for example.

      We agree that this claim was not sufficiently supported and added results from additional analyses (incl. Fig. 2). Please see our response above.

      Based on the aim stated in the introduction, the analysis of the Arctic biodiversity of this area is missing, it would be nice to see these result added or maybe the focus needs to be changed for clarity.

      We now explicitly state that this objective pertains to a different study, which is currently still in preparation for publication.

      The single main figure needs a bit more consideration. For example in panel A - there was no information on the transformation performed or what the general trend line refers to. Do the results in panel B refer to all 22 libraries? What is the x-axis in Panel C and what do the coloured lines refer to? Additionally, I think the figure needs to be in higher resolution with increased text size on all axes.

      We revised the figure and the caption for clarity and readability.

      Finally this might be an accidental typo - but when referring to the sample aged at around 8,677 years in text it states this the 36.5 cm sample (line 130 and 192), but the supplementary says this is the 51cm sample (Table S6). This would maybe impact potential conclusions. Would you be able to clarify this.

      Thank you for noting this error; we have corrected it.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Alonso-Calleja and colleagues explore the role of TGR5 in adult hematopoiesis at both steady state and post-transplantation. The authors utilize two different mouse models including a TGR5-GFP reporter mouse to analyze the expression of TGR5 in various hematopoietic cell subsets. Using germline Tgr5-/- mice it's reported that loss of Tgr5 has no significant impact on steady-state hematopoiesis, with a small decrease in trabecular bone fraction, associated with a reduction in proximal tibia adipose tissue, and an increase in marrow phenotypic adipocytic precursors. The authors further explored the role of stroma TGR5 expression in the hematopoietic recovery upon bone marrow transplantation of wild-type cells, although the studies supporting this claim are weak. Overall, while most of the hematopoietic phenotypes have negative results or small effects, the role of TGR5 in adipose tissue regulation is interesting to the field.

      We thank Reviewer 1 for having identified some strengths and weaknesses of our study. As summarized below, we will work to address the weaknesses of our study.

      Strengths:

      • This is the first time the role of TGR5 has been examined in the bone marrow.

      • This paper supports further exploration of the role of bile acids in bone marrow transplantation and possible therapeutic strategies.

      Weaknesses:

      • The authors fail to describe whether niche stroma cells or adipocyte progenitor cells (APCs) express TGR5.

      We are currently working to address this question using our reporter model and expect to be able to provide the data in the next version of the reviewed preprint.

      • Although the authors note a significant reduction in bone marrow adipose tissue in Tgr5-/- mice, they do not address whether this is white or brown adipose tissue especially since BA-TGR5 signaling has been shown to play a role in beiging.

      The nature of BMAT and how it relates to brown, white or brown/beige adipose tissue has been a persistent question in the field. Our understanding is that BMAT is currently considered a distinct adipose depot that is neither white nor brown/beige. BMAT does not express UCP1 to an appreciable extent, and reports of its expression possibly reflect contamination by tissues surrounding bone (Craft et al., 2019). Beyond this consideration, as the regulated BMAT in Tgr5-/- mice is almost absent, determination of the brown/beige vs white nature of the regulated BMAT remains technically challenging.

      In Figure 1, the authors explore different progenitor subsets but stop short of describing whether TGR5 is expressed in hematopoietic stem cells (HSCs).

      Figure 1 of the originally submitted manuscript described TGR5 expression in committed myeloid progenitors (CMP, GMP and MEP). Below we provide the requested data (expression in MPPs and HSCs in Author response image 1) and we have further expanded our data with the expression in megakaryocyte progenitors (MkProg - Lin-cKit+Sca1-CD41+CD150+) as shown in Author response image 2.

      Author response image 1.

      Frequencies of GFP+ cells in MPPs and HSCs in the BM of 8-12-week-old male TGR5:GFP mice and their controls (n=9 for Wild-type control mice, n=11 for TGR5:GFP mice). Results represent the mean ± s.e.m., n represents biologically independent replicates. Two-tailed Student’s t-test was used for statistical analysis. p-values (exact value) are indicated.

      Author response image 2.

      A, representative flow cytometry gating strategy used to identify megakaryocyte progenitors (MkProg) and GFP positivity in TGR5:GFP mice and their wild-type controls. B, frequencies of GFP+ cells in MkProg population in the BM of 8-12-week-old male TGR5:GFP mice and their controls (n=3 for Wild-type control mice, n=4 for TGR5:GFP mice). Results represent the mean ± s.e.m., n represents biologically independent replicates. Two-tailed Student’s t-test (B) was used for statistical analysis. p-values (exact value) are indicated.

      • Are there more CD45+ cells in the BM because hematopoietic cells are proliferating more due to a direct effect of the loss of Tgr5 or is it because there is just more space due to less trabecular bone?

      While we do not have direct evidence to address this question, we see approximately an average 20% increase in CD45+ cell counts in the baseline Tgr5-/- mice. The absolute volume of bone and BMAT lost in these animals does not account for 20% of the total volume of the medullary cavity, so we speculate that the increase in CD45+ counts is not due exclusively to an increase in available volume.

      • In Figure 4 no absolute cell counts are provided to support the increase in immunophenotypic APCs (CD45-Ter119-CD31-Sca1+CD24-) in the stroma of Tgr5-/- mice. Accordingly, the absolute number of total stromal cells and other stroma niche cells such as MSCs, ECs are missing.

      We initially chose not to report the total number of cells per leg, as the processing of the bones for stroma isolation is less homogeneous than that of the HSPC populations (which we do by crushing whole bones with a mortar and pestle). Regardless of these considerations, the data for absolute counts of APCs (left panel), the stroma-enriched fraction (CD45-Ter119-CD31- - middle panel) and endothelial cells (CD45-Ter119-CD31+ - right panel) are provided in Author response image 3. Note that the number of cells plated for CFU-F and BMSC in vitro differentiation is constant between the genotypes, thus confirming the importance of the relative abundance data shown in the submitted version of the manuscript. In conclusion, we have prioritized the data showing the relative overrepresentation of APC progenitors in the BM stroma as measured by flow cytometry on a per-cell basis, which is in line with the functional in vitro data. Further studies could address the specific question through 3D wholemount studies once APC in situ markers are firmly characterized.

      Author response image 3.

      Left panel: absolute number of adipocyte progenitor cells (APCs) in the CD45-Ter119-CD31- BM stromal gate for both Tgr5+/+ and Tgr5−/− (n=5). Middle panel: absolute number of cells isolated from the stroma-enriched BM fraction (CD45-Ter119-CD31-) in the same mice. Right panel: absolute number of endothelial cells, defined as CD45-Ter119-CD31+, in the same BM isolates.

      • There are issues with the reciprocal transplantation design in Fig 4. Why did the authors choose such a low dose (250 000) of BM cells to transplant? If the effect is true and relevant, the early recovery would be observed independently of the setup and a more robust engraftment dataset would be observed without having lethality post-transplant. On the same note, it's surprising that the authors report ~70% lethality post-transplant from wild-type control mice (Fig 4E), according to the literature 200 000 BM cells should ensure the survival of the recipient post-TBI. Overall, the results even in such a stringent setup still show minimal differences and the study lacks further in-depth analyses to support the main claim.

      We thank the reviewer for this comment. On the one hand, we disagree on the relevance of the effect size, as Tgr5-/- mice recover from low levels of platelets significantly faster than the Tgr5+/+ controls. Underlining the relevance, in a clinical setting, G-CSF is administered to patients routinely even if the acceleration of recovery is of 1-2 days (Trivedi et al., 2009).

      Regarding mortality, we agree that it is higher than expected. We have suffered from cases of swollen muzzles syndrome in our facilities that have greatly hampered our ability to perform myeloablation experiments (Garrett et al., 2019), as even sublethal doses have resulted in the appearance of severe side effects that are reasons for euthanasia under Swiss legislation. For example, a strong reduction in mobility requires immediate euthanasia. All experiments were performed blinded to genotype allocation, so we can reasonably exclude experimenter bias. Finally, it could be argued that mice with more marked symptomatology leading to euthanasia are more likely to have hematopoietic deficits, which in our case was mostly seen for Tgr5+/+ animals. We have therefore chosen to report mortality together with the longitudinal assessment of peripheral blood counts.

      • Mechanistically, how does the loss of Tgr5 impact hematopoietic regeneration following sublethal irradiation?

      The question of a non-lethal hematopoietic stress is a very relevant one. Unfortunately, and as delineated in the previous point, we have been seriously constrained by cases of swollen muzzles syndrome (Garrett et al., 2019) that have stopped us from proceeding with more irradiation studies. We will profit from the change of animal facility that will be consolidated during the upcoming year (Laboratory of Regenerative Hematopoiesis) to address this point in follow-up studies.

      • Only male mice were used throughout this study. It would be beneficial to know whether female mice show similar results.

      We agree with this comment, and we expect to include the characterization of BM microenvironment (Figure 3 of the current manuscript) in females in the reviewed version of the manuscript when a suitable cohort becomes available.

      Reviewer #2 (Public Review):

      Summary: In this manuscript, the authors examined the role of the bile acid receptor TGR5 in the bone marrow under steady-state and stress hematopoiesis. They initially showed the expression of TGR5 in hematopoietic compartments and that loss of TGR5 doesn't impair steady-state hematopoiesis. They further demonstrated that TGR5 knockout significantly decreases BMAT, increases the APC population, and accelerates the recovery upon bone marrow transplantation.

      Strengths: The manuscript is well-structured and well-written.

      We thank Reviewer #2 for this comment.

      Weaknesses: The mechanism is not clear, and additional studies need to be performed to support the authors' conclusion.

      We agree with Reviewer #2 that more studies are needed to understand what the role of TGR5 in the hematopoietic system is. We have been hampered in our studies of stress hematopoiesis because of frequent cases of swollen muzzles syndrome (Garrett et al., 2019), which has made it difficult to continue with experiments involving myelosuppression (see response to Reviewer #1 as well). Further studies are planned or ongoing, including determining the role of the microbiome on the observed TGR5 bone and hematopoiesis stress phenotypes, but will be the focus of a separate study.

      References

      Craft, C.S., Robles, H., Lorenz, M.R., Hilker, E.D., Magee, K.L., Andersen, T.L., Cawthorn, W.P., MacDougald, O.A., Harris, C.A., Scheller, E.L., 2019. Bone marrow adipose tissue does not express UCP1 during development or adrenergic-induced remodeling. Sci Rep 9, 17427. https://doi.org/10.1038/s41598-019-54036-x

      Garrett, J., Sampson, C.H., Plett, P.A., Crisler, R., Parker, J., Venezia, R., Chua, H.L., Hickman, D.L., Booth, C., MacVittie, T., Orschell, C.M., Dynlacht, J.R., 2019. Characterization and Etiology of Swollen Muzzles in Irradiated Mice. Radiat Res 191, 31–42. https://doi.org/10.1667/RR14724.1

      Trivedi, M., Martinez, S., Corringham, S., Medley, K., Ball, E.D., 2009. Optimal use of G-CSF administration after hematopoietic SCT. Bone Marrow Transplant 43, 895–908. https://doi.org/10.1038/bmt.2009.75

    1. Author Response

      The following is the authors’ response to the original reviews.

      Answers to reviewers’ comments

      Peer Reviewers 2 and 3 criticized the name of the antibody – hvCADab - and the lack of proof that it recognized a classic cadherin. These criticisms were justified and in the intervening months the issue has been resolved. hvCADab does not recognize the cadherin protein, although it was made to an 18 amino acid sequence from the intracellular domain of the H. vulgaris cadherin protein. Newly available genome sequences from two other species, Hydra oligactis and Hydra viridissima, now show that the 18 amino acid antigen sequence is not present in these species.

      Nonetheless, the nerve net in both species is strongly stained by the antibody. Hence we have renamed the antibody PNab (pan-neuronal antibody). The antigen is currently not known. Nevertheless the antibody is an excellent reagent for imaging the nerve net in Hydra.

      We have revised the section on antibody preparation in Materials and Methods to state explicitly that PNab does not recognize classic cadherin. To support this conclusion we have added a sequence comparison (Suppl Fig 3) of the intracellular domains of classic cadherins from H. vulgaris, H. oligactis and H. viridissima, which show that the 18aa antigen sequence is only present in the H. vulgaris classic cadherin and not in the cadherin sequences from H. oligactis and H. viridissima. All three sequences have highly conserved p120/delta-catenin and beta-catenin binding domains. The sequence between these domains is highly variable and the 18aa antigen sequence used for antibody production is clearly not present in the H. oligactis and H. viridissima sequences.

      Both reviewers also criticized our evidence for pan-neuronal staining as inadequate. Hence we have now included additional data. We have stained a transgenic strain expressing NeonGreen under the control of a pan-neuronal alpha-tubulin promoter (Primak et al 2023). 684/684 transgenic nerve cells were stained with PNab. We consider this convincing evidence, in addition to the evidence presented previously, that PNab stains all nerve cells in Hydra. The first paragraph of Results has been revised to include these data.

      Reviewer 2 suggested moving gap junction/innexin data (Suppl Fig 3 and 4) from the Discussion to Results. These are indeed new results and we have followed this suggestion. Fig 12 (new) clearly shows gap junctions between neurites in bundles. It also shows that nerve cells in bundles express cell type specific innexins and hence can form cell type specific gap junctions. We have also added new images (Fig 11) of a transgenic Hym176B strain stained with PNab. These show that neurite bundles in the ectoderm contain neurites from different nerve cell types = neural circuits and hence that neurite links must be specific, e.g. gap junctions.

      As suggested by Reviewer 2 we have now provided a 3D interactive version of the block face SEM reconstruction (Suppl Fig 4). This shows that connections between neurites in bundles consist of thin overlapping fingers rather than “conventional” terminal contacts. It also shows that the purple neurite extends past the green nerve cell body and does not end on it.

      Reviewer 2 suggested deleting discussion of possible functions for the endodermal nerve net (Discussion). We disagree with this suggestion. Our imaging results showed no connections between ectodermal and endodermal nerve nets. We also presented quantitative data for the absence of contact between the nerve nets in the gastric region. Consistent with our observations, Dupre and Yuste (2017) found no functional connection between the ectodermal and endodermal nerve nets based on neural activity measurements. Nevertheless, Giez et al (2023) in a recent preprint have described contact between specific endodermal and ectodermal nerve cells in the hypostome involved in the mouth opening response to glutathione. Both their observation and ours may be correct. The issue is not resolved. Hence we have included a discussion of possible functions for ectodermal and endodermal nerve nets. Importantly, our conclusions incorporate the difference in connectivity between muscle processes and nerve cells in the two nerve nets.

      Specific comments / Recommendations

      Reviewer 2

      Novelty: two preprints (Giez et al 2023) became available after the submission of our preprint. These include the results cited by the reviewer. These were not available to us at the time of submission.

      hvCADab has been re-named (see above). The differentiating nerve cell in Fig 11B is indeed stained by PNab. We have adjusted the intensities of red and green channels to show this more clearly.

      We consider the very clear black space between ectoderm and endoderm e.g. Fig 2B or Fig 4A to be an adequate marker for mesoglea. Use of an anti-mesoglea antibody would reduce the clarity of the image.

      It is always possible to look at more parts of Hydra tissue for possible nerve connections between ectoderm/endoderm. Nevertheless we provide the first quantitative data on the lack of contacts between 133 nerve cells (57 ectodermal and 76 endodermal) in the body column. Such data has not been previously available. And the EM result (Westfall 1973) cited by the reviewer is anecdotal at best. In later serial sectioning results on the hypostome/tentacle region from the Westfall lab no mention is made of nerve connections between the ectoderm and the endoderm. However, based on the results in the cited preprints (Giez et al) a closer examination of the hypostome/tentacle region in particular is warranted.

      To strengthen our conclusion that there are no contacts between the ectodermal and endodermal nerve nets, we now explicitly cite results from Dupre and Yuste (2017) on a calcium reporter strain demonstrating the absence of any cross-correlation between the firing patterns of the ectodermal RP1 network and the endodermal RP2 network. There was also no correlation between the activity of the second ectodermal nerve net CB and the endodermal RP2 network. These results demonstrate the absence of functional contacts between ectodermal and endodermal nerve nets.

      The reviewer criticizes the absence of trans-mesoglea links between ectodermal and endodermal epithelial cells in our EM images, e.g. Fig 9A. We can assure the reviewer that such links are frequently observed, although not in the image we chose for Fig 9A. This image, however, clearly documents two neurite bundles next to ectodermal muscle fibers.

      We agree with the reviewer that neurite bundles are an important discovery. And they raise the question of synaptic connections between neurites in bundles. Unfortunately, it is not possible to scan along the block face reconstruction (Fig 10) and count synapses. The resolution is not sufficient. Although scattered dense core vesicles (DCV) are observed in neurites, clustered DCV described by Westfall et al (1971) as synapses were not observed. We did, however, observe gap junctions between neurites in bundles (noted in Suppl Fig 3). These data have now been moved to the main body of the paper as Fig 12 together with the scRNAseq results on innexin gene expression in nerve cells. These results make it clear that neurites in bundles are connected via gap junctions and that these gap junctions are specific for neural circuits.

      The reviewer suggests that neurite bundles are an artifact of their interaction with muscle processes at the base of epithelial cells. We disagree with this statement. Muscle processes are temporary structures. They are withdrawn and reformed during every epithelial cell division, which occurs approximately every three days. Bundles are almost certainly more stable structures. Furthermore, neurite bundles in the endoderm are distant from endodermal muscle fibers (Fig 4B and Fig 9D) and their polygonal pattern (Fig 2D) is completely different from the circumferential bands of endodermal muscle fibers.

      Reviewer 3

      Specific comments and suggestions have been answered above. Importantly, we show that the PNab antibody does not recognize cadherin and that it clearly stains all nerve cells in Hydra.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Dubicka and co-workers on calcification in miliolid foraminifera presents an interesting piece of work. The study uses confocal and electron microscopy to show that the traditional picture of calcification in porcelaneous foraminifera is incorrect.

      Strengths:

      The authors present high-quality images and an original approach to a relatively solid (so I thought) model of calcification.

      Weaknesses:

      There are several major shortcomings. Despite the interesting subject and the wonderful images, the conclusions of this manuscript are simply not supported at all by the results. The fluorescent images may not have any relation to the process of calcification and should therefore not be part of this manuscript. The SEM images, however, do point to an outdated idea of miliolid calcification. I think the manuscript would be much stronger with the focus on the SEM images and with the speculation of the physiological processes greatly reduced.

      Reply: We would like to thank the reviewer for all of the highly valuable comments. Prior to our study, we were also convinced that the calcification model of Miliolid (porcelaneous) foraminifera was relatively solid. Nevertheless, our SEM imaging results surprisingly contradicted the old model. The main difference is the in situ biomineralization of calcitic needles that precipitate within the chamber wall after deposition of ACC-bearing vesicles. We agree that our fluorescence studies presented in the paper are not conclusive evidence for the calcification model used by the studied Miliolid species. However, our fluorescent results show that “the old model” (sensu Hemleben et al., 1986) is not completely outdated. Most of the fluorescent imaging data show a vesicular transport of substrates necessary for calcification. This transport is shown by Calcein labelling experiments (Movie 1), which reveal a high number of dynamic endocytic vesicles of seawater circulating within the cytoplasm. These very fine Calcein-labelled vesicles are most likely responsible for transport and deposition of Ca2+ ions. This is partly consistent with the model presented by Hemleben et al. (1986). We may speculate that calcite nucleation is already occurring within the transported vesicles, but at this stage of research we have no evidence for this phenomenon.

      Further live imaging fluorescence data show that vesicles autofluorescing upon excitation at 405 nm (emission 420–480 nm) are associated with acidic vesicles marked by pH-sensitive LysoGlow84, which may hint at an association of ACC-bearing vesicles with acidic vesicles. Such spatial association of these vesicles may indicate a mechanism of pH elevation in the vesicles transporting Ca2+-rich gel to the calcifying wall of the new chamber.

      We will do our best to limit the physiological interpretation presented based on fluorescence studies in the revised version of the manuscript. We are convinced that our fluorescent live imaging experiments provide important observations in biomineralizing Miliolid foraminifera, which are still missing in the existing literature. It should be stressed that all the fluorescent experiments and SEM observations were based on specimens constructing and biomineralizing new chambers. All of them belong to the same species and come from the same culture. For these reasons, it is worthwhile presenting these complementary results of our study. In the future they may be helpful in further exploration and understanding of all aspects of calcification in foraminifera.

      Reviewer #2 (Public Review):

      Summary:

      Dubicka et al. in their paper entitled "Biocalcification in porcelaneous foraminifera" suggest that in contrast to the traditionally claimed two different modes of test calcification by rotaliid and porcelaneous miliolid foraminifera, both groups produce calcareous tests via the intravesicular mineral precursors (Mg-rich amorphous calcium carbonate). These precursors are proposed to be supplied by endocytosed seawater and deposited in situ as mesocrystals formed at the site of new wall formation within the organic matrix. The authors did not observe the calcification of the needles within the transported vesicles, which challenges the previous model of miliolid mineralization. Although the authors argue that these two groups of foraminifera utilize the same calcification mechanism, they also suggest that these calcification pathways evolved independently in the Paleozoic.

      Reply: We would like to acknowledge the review and all valuable comments. We do not argue that Miliolida and Rotaliida utilize an identical calcification mechanism, but rather that both groups utilize less divergent crystallization pathways, where mesocrystalline chamber walls are created by accumulating and assembling particles of a pre-formed liquid amorphous mineral phase.

      Strengths:

      The authors document various unknown aspects of calcification of Pseudolachlanella eburnea and elucidate some poorly explained phenomena (e.g., translucent properties of the freshly formed test); however, there are several problematic observations/interpretations which in my opinion should be carefully addressed.

      Weaknesses:

      1) The authors (line 122) suggest that "characteristic autofluorescence indicates the carbonate content of the vesicles (Fig. S2), which are considered to be Mg-ACCs (amorphous MgCaCO3) (Fig. 2, Movies S4 and S5)". Figure S2 which the authors refer to shows only broken sections of organic sheath at different stages of mineralization. Movie S4 shows that only in a few regions some vesicles exhibit red autofluorescence interpreted as Mg-ACC (S5 is missing but probably the authors were referring to S3). In their previous paper (Dubicka et al 2023: Heliyon), the authors used exactly the same methodology to suggest that these are intracellularly formed Mg-rich amorphous calcium carbonate particles that transform into a stable mineral phase in rotaliid Amphistegina lessonii. However, in Figure 1D (Dubicka et al 2023) the apparently carbonate-loaded vesicles show the same red autofluorescence as the test, whereas in their current paper, no evidence of autofluorescence of Mg-ACC grains accumulated within the "gel-like" organic matrix is given. The S3 and S4 movies show circulation of various fluorescing components, but no initial phase of test formation is observable (numerous mineral grains embedded within the organic matrix - Figures 3A and B - should be clearly observed also as autofluorescence of the whole layer). Thus the crucial argument supporting the calcification model (Figure 5) is missing. There is no support for the following interpretation (lines 199-203) "The existence of intracellular, vesicular intermediate amorphous phase (Mg-ACC pools), which supply successive doses of carbonate material to shell production, was supported by autofluorescence (excitation at 405 nm; Fig. 2; Movies S3 and S4; see Dubicka et al., 2023) and a high content of Ca and Mg quantified from the area of cytoplasm by SEM-EDS analysis (Fig. S6)."

      Reply: We used the 405 nm laser line and multiphoton excitation to detect ACCs. These wavelengths (partly) permeate the shell to excite ACC autofluorescence. The autofluorescence of the shells is present as well, but it is not clearly visible in Movie S4 as the fluorescence of ACCs is stronger. This may be related to the plane/section of the cell which is shown. The laser permeates the shell above the ACCs (short distance), but to excite the shell CaCO3 around foraminifera in the same three-dimensional section where ACCs are shown, the light must pass a thick CaCO3 area due to the three-dimensional structure of the foraminifera shell. Therefore, the laser light intensity is reduced. In a revised version a movie/image with reduced threshold will be shown.

      2) The authors suggest that "no organic matter was detected between the needles of the porcelain structures (Figures 3E; 3E; S4C, and S5A)". Such a suggestion, which is highly unusual considering that biogenic minerals almost by definition contain various organic components, was made based only on FE-SEM observation. The authors should either provide clearcut evidence of the lack of organic matter (unlikely) or may suggest that intense calcium carbonate precipitation within organic matrix gel ultimately results in a decrease of the amount of the organic phase (but not its complete elimination), much as pure calcium carbonate crystals are separated from the remaining liquid with impurities ("mother liquor"). On the other hand, if (249-250) "organic matrix involved in the biomineralization of foraminiferal shells may contain collagen-like networks", such "laminar" organization of the organic matrix may partly explain the arrangement of carbonate fibers parallel to the surface as observed in Fig. 3E1.

      Reply: We agree with the reviewer that biogenic minerals should, by definition, contain some organic components. We wrote that "no organic matter was detected between the needles of the porcelain structures" as we did not detect any organic structures based only on our FE-SEM observations. We are convinced that the shell incorporates a limited amount of organic matrix. We will rephrase this part of the text to avoid further confusion.

      3) The authors' observations indeed do not show the formation of individual skeletal crystallites within intracellular vesicles; however, they also do not explain what the structure of individual skeletal crystallites is or how they are formed. In particular, what are the structures observed in polarized light (and interpreted as calcite crystallites) by De Nooijer et al. 2009? The authors' explanation of the process (lines 213-216) is not particularly convincing: "we suspect that the OM was removed from the test wall and recycled by the cell itself".

      Reply: Thank you for this comment. We will do our best to supplement our explanations. We are aware of the structures observed in polarized light by De Nooijer et al. (2009). However, Goleń et al. (2022, Protist, https://doi.org/10.1016/j.protis.2022.125886) showed that organic polymers may also exhibit light polarization. Additional experimental studies are needed to distinguish these types of polarization. We will aim to investigate this issue in our future research.

      4) The following passage (lines 296-304) which deals with the concept of mesocrystals is not supported by the authors' methodology or observations. The authors state that miliolid needles "assembled with calcite nanoparticles, are unique examples of biogenic mesocrystals (see Cölfen and Antonietti, 2005), forming distinct geometric shapes limited by planar crystalline faces" (later in the same passage the authors say that "mesocrystals are common biogenic components in the skeletons of marine organisms"; are they thus unique or are they common?). It is my suggestion to completely eliminate this concept here until various crystallographic details of the miliolid test formation are well documented.

      Reply: Our intention was to express that mesocrystals are common biogenic components in the skeletons of marine organisms; however, Miliolid needles that form distinct geometric shapes limited by planar crystalline faces are a unique type of mesocrystal.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their valuable feedback and comments. Below we have addressed all points carefully and have, when needed, revised the manuscript accordingly.

      Note that we have taken the opportunity to correct minor typos and unclear text in the revised manuscript.

      Of importance to the editors and reviewers, we detected a few minor factual errors in the method section, which we have now corrected. The first error was that we wrongly stated that our final dataset had 6358 unique TCRs, whereas it in fact had 6353 unique TCRs. The second error was that we stated that the maximum length of CDR1β was 5, whereas it was in fact 6. The last error was that we stated that we used a Levenshtein distance of at least 3 to discard similar peptides when swapping the TCRs to generate negatives. This should have been a Levenshtein distance greater than 3, to match the script we used to generate negatives (though no peptides had a Levenshtein distance of exactly 3).
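
      The corrected rule for generating swapped negatives can be illustrated with a minimal sketch. This is not the script referred to above: the function names, the `(tcr_id, peptide)` data layout, and the choice to pair each positive's peptide with TCRs drawn from other peptides are assumptions made for illustration. Only the "Levenshtein distance greater than 3" filter and the five negatives per positive are taken from the responses in this letter.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def swapped_negatives(positives, n_per_positive=5, seed=0):
    """Hypothetical helper: build swapped negatives from (tcr_id, peptide) pairs.

    For each positive, sample up to `n_per_positive` TCRs whose own peptide
    has a Levenshtein distance strictly greater than 3 from the target
    peptide, and pair them with the target peptide as negatives.
    """
    rng = random.Random(seed)
    negatives = []
    for _tcr, peptide in positives:
        donors = [t for t, p in positives if levenshtein(p, peptide) > 3]
        for donor in rng.sample(donors, min(n_per_positive, len(donors))):
            negatives.append((donor, peptide))
    return negatives
```

      Because the filter is strict, a peptide pair at a distance of exactly 3 would be treated as too similar and would contribute no negatives, which is the only case where "at least 3" and "greater than 3" differ.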

      eLife assessment

      This important study reports on an improved deep-learning-based method for predicting TCR specificity. The evidence supporting the overall method is compelling, although the inclusion of real-world applications and clear comparisons with the previous version would have further strengthened the study. This work will be of broad interest to immunologists and computational biologists.

      It is not fully clear to us what is meant by “clear comparisons with the previous version”. In the manuscript we consistently compare the performance of each novel approach introduced to that of the ancestor NetTCR-2.1. Further, we concluded the manuscript with a performance comparison to a large set of current state-of-the-art methods by training and evaluating the novel modeling framework on the IMMREP22 benchmark data.

      We agree that the manuscript can be improved by including a brief discussion of real-life applications of models for prediction of TCR specificity, and have included a brief text in the introduction.

      Reviewer #1 (Recommendations For The Authors):

      It was a great pleasure to read this article. All the concepts and motivations are clearly defined. I have just a few questions.

      What was the motivation behind employing a 1:5 positive-negative ratio? Could it be the cause of worse performance in the case of outliers?

      The ratio 1:5 is based on results from earlier work [36561755]. In this work, negatives were constructed as a mix of swapped and true (i.e. measured) negatives with a ratio 1:5 for each. This work demonstrated a slight gain when including both types of negatives compared to only using swapped. In a subsequent publication [https://doi.org/10.1016/j.immuno.2023.100024], it was demonstrated that optimal performance was obtained when only including swapped negatives (again in a ratio 1:5). Given this, we maintained this approach in the current work. It is clear that this choice is somewhat arbitrary, and that further work is needed to fully address this issue and the general issue of how to best generate negatives for ML of TCR specificity. Such work is in our view however beyond the scope of the current manuscript.

      Why is the patience of 200 epochs for peptide-specific models and 100 epochs for pan-specific and pre-trained models used in the context of the early stopping mechanism?

      We observed that the loss curve was overall very stable in the case of pan-specific training, likely due to the large amount of data included in this training. Therefore, these models were less likely to become stuck in a local minimum during training, meaning that a lower patience for early stopping would not prevent the model from learning optimally. In contrast, we found for some peptides that the loss curve was very erratic, and would sometimes become stuck in a local minimum for an extended time. To resolve this, the patience was increased from 100 to 200, which resulted in a better chance to escape these minima, as well as a better overall performance.
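
      The effect of the patience setting can be shown with a small sketch of a generic early-stopping rule. The function name and the toy loss curve are hypothetical and are not taken from the NetTCR code; the sketch only shows how a larger patience lets training ride out a plateau before a later improvement.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, last_epoch_run) for per-epoch validation losses.

    Training stops once `patience` epochs have passed without the
    validation loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # patience exhausted
    return best_epoch, len(val_losses) - 1  # ran all epochs


# Toy erratic curve: a plateau after epoch 1, then a late improvement.
toy_curve = [1.0, 0.9, 0.95, 0.96, 0.97, 0.5]
short = early_stopping_epoch(toy_curve, patience=2)  # stops on the plateau
long = early_stopping_epoch(toy_curve, patience=5)   # reaches the late improvement
```

      With the short patience the run ends on the plateau and keeps the epoch-1 model, while the longer patience reaches the later, lower loss.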

      Why is weight 3.8 used in the weighted loss function in the pan-specific model?

      The weighted loss was scaled with a division factor (c) of 3.8, in order to get an overall loss that was comparable to training without sample weights. This was primarily done to better compare the two approaches (scaling and no scaling) in terms of loss, and not so much to improve the training itself, as we already use a relatively conservative sample weight scaling based on log2. We have added a brief sentence to clarify this in the manuscript.
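
      The role of the division factor can be sketched as follows. The exact weight formula is not given in this response, so the form used here, a per-peptide weight of log2(N / n_p) divided by c (with N the total number of samples and n_p the number for peptide p), is an assumption for illustration, as are the function names. The point of the sketch is only that dividing every weight by a constant rescales the overall loss without changing the relative weighting between peptides.

```python
import math


def sample_weights(peptide_counts, c=3.8):
    """Assumed form: weight_p = log2(N / n_p) / c (hypothetical helper)."""
    total = sum(peptide_counts.values())
    return {pep: math.log2(total / n) / c for pep, n in peptide_counts.items()}


def weighted_bce(y_true, y_pred, weights, eps=1e-7):
    """Mean binary cross-entropy with one weight per sample."""
    losses = []
    for y, p, w in zip(y_true, y_pred, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-w * (y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)
```

      Under this assumed form, peptides with fewer samples receive larger weights, and c only sets the overall scale so that weighted and unweighted losses land in a comparable range.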

      Reviewer #2 (Recommendations For The Authors):

      This work is the evolution of previous studies that developed the NetTCR platform, and in a previous paper cited in this study, the authors explore the paired dataset approach with "paired α/β TCR sequence data". In this manuscript, the authors should make clear what advances were made when compared to the previous study. This is not clear, although extensive reference is made to NetTCR 2.0 and 2.1. Differences are scattered throughout the manuscript, so I would suggest a section or paragraph clearly delineating the advances in model architecture and training when compared to previous versions recently published.

      It is not clear to us what the reviewer is referring to when stating “the authors should make clear what advances were made when compared to the previous study”. Throughout the manuscript we consistently compare the performance of each novel approach introduced to that of the ancestor NetTCR-2.1. In addition, we briefly discuss all of the changes to the architecture and training at the start of the discussion section. Further, we concluded the manuscript with a performance comparison to a large set of current state-of-the-art methods by training and evaluating the novel modeling framework on the IMMREP22 benchmark data. It is correct that the advances are described progressively by introducing each novel approach one by one (i.e. refining the machine learning model architecture and training setup, data denoising in terms of outlier identification in the training data, new model architectures combining the properties of a pan- and peptide-specific model, and integration of a similarity-based approach to boost model performance). We believe this helps better justify the relevance of each of the novel approaches introduced.

      In Figure 3, the colors have labels, but they are not explained in the legend or in the text. This makes it very difficult to understand the data in the various columns. Also, since it represents the Mean AUC, the data would be best displayed with a boxplot or a mean and bars for variance.

      We agree, and have changed Figure 3 and its corresponding AUC 0.1 figure (Supplementary Figure 1) into a boxplot. We also further clarified what the different models were in the figure text.

      Given the potential impact of this work on bioengineering and biotechnology, I would suggest adding a paragraph or section to the discussion where potential applications of the current model, or examples of applications of previous (or competing) models have been used to further biological research.

      We agree and have added a brief sentence in the introduction to outline biotechnological applications of models for prediction of TCR specificity.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Trenker et al. report cryo-EM structures of HER4/HER2 heterodimers and HER4 homodimers bound to Neuregulin-1b (Nrg1b) and Betacellulin (BTC). As observed for prior cryo-EM structures of full-length or near full-length HER-family receptors only the extracellular regions are visualized, presumably owing to flexibility in the relative orientation of extra- and intra-cellular regions. The authors observe no appreciable differences between Nrg1b and BTC bound heterodimers, both ligands, in this case being high-affinity ligands, and modest "scissor-like" differences in the subunit relationships in HER4 homodimers with Nrg1b and BTC bound.

      The authors also show that, as they showed for HER3, the HER4 dimerization arm is not indispensable for forming heterodimers with HER2 despite the HER4 dimerization arm forming a more canonical interaction with HER2. Perhaps most interestingly, the authors observe glycan interactions that appear to stabilize intra- and inter-subunit interactions in HER4 homodimers but that inter-subunit glycans are not present in HER2/HER4 heterodimers. The authors speculate that these glycan interactions may contribute to the apparent propensity of HER4 to homodimerize vs. heterodimerize with HER2.

      I realize that an important role of reviewers is to provide authors with informed and critical comments, but I found this manuscript a well-written, thoughtful, and important contribution. My only note is that I am not an electron microscopist so have assumed the microscopy has been carried out expertly and rely on other reviewers to vet structure determinations.

      We thank the reviewer for sharing our enthusiasm and the positive assessment of our manuscript. We have carefully reviewed all the microscopy-related concerns while responding to the assessment of reviewer #2.

      Reviewer #2 (Public Review):

      With the data presented in this manuscript, the authors help complete the set of high-resolution HER2-associated complex heterodimer structures as well as HER4 homodimer structures in the presence of NRG1b and BTC. Purification of HER2-HER4 heterodimers appears to be inherently challenging due to the propensity of HER4 to form homodimers. The authors have used an effective scheme to isolate these HER2-HER4 heterodimers and have employed graphene-oxide grid chemistry to presumably overcome the issues of low sample yield for solving cryo-EM structures of these complexes. The authors conclude HER2-HER4 heterodimers with either ligand are conformationally homogeneous relative to the HER4 homodimers. The HER2-HER4 heterodimers also appear to be better stabilized compared to other published HER2 heterodimers. The ability to model glycans in the context of HER4 homodimers is exciting to see and provides a strong rationale for the stability of these structures. Overall, the work is of great interest and the methods described in this work would benefit a wide variety of structural biology projects.

      We thank the reviewer for their positive assessment of our manuscript.

      Major comments:

      1) The HER2-HER4 heterodimer with BTC appears to be the lowest resolution of the reported structures. Although the authors claim the overall structure is similar to the HER2-HER4 heterodimer with NRG1b, it is therefore unclear whether the lower resolution of the BTC is due to challenging data collection conditions, sample preparation, or conformational dynamics not discernible due to the lower resolution. The authors should minimally clarify where they see the possible issues arising for the lower resolution as this is a key aspect of the work.

      The most likely reason for the lower resolution of the HER2/HER4/BTC reconstruction is not the underlying fundamental biology but a certain degree of preferred orientations in the sample, as can be seen from the directional FSC curves in the supplemental materials (Figure S3). We would like to note that while the overall resolution of the HER2/HER4/BTC reconstruction may be comparatively lower than other reconstructions presented in the manuscript, it remains of sufficiently high quality to substantiate our key claims. Specifically, our analysis indicates a close resemblance between the HER2/HER4/BTC reconstruction and the HER2/HER4/NRG reconstruction. For example, individual beta strands can still be well resolved, allowing their accurate placement. There may be differences in features at higher resolution than 4.5 Å between these two reconstructions which we cannot observe due to the lower resolution of the HER2/HER4/BTC map, but these would amount to side chain motions rather than larger secondary structure movements. In the manuscript, we only draw comparisons between domain movements in different heterodimer structures and do not see any conformational variability in the final reconstructions, nor in their 3D classification analyses. Thus, we do not attribute the lower resolution of the HER2/HER4/BTC reconstruction to increased dynamics at resolution scales that are discussed in the manuscript. What is more likely is that variability in data quality, which we commonly observe between different GO grids, contributes to differences in resolution between different samples and potentially to the different orientation distributions. To comment on these possibilities, we added the following text to the manuscript (italic, underlined):

      Page 8 top paragraph:

      “Despite the diverse sequences of the NRG1β and BTC ligands, the larger-scale domain conformation of the HER2/HER4 heterodimers stabilized by each ligand is identical with only small differences in the ligand binding pockets (Figure 1d). Due to the lower resolution of the HER2/HER4/BTC complex, we cannot exclude the possibility of differences in side-chain arrangements between the two structures. However, we attribute the lower resolution to variability in data collection on GO grids, which we frequently observe, rather than differences in conformational heterogeneity of HER2/HER4/BTC.”

      Page 10, second paragraph:

      “Our cryo-EM structures of the full-length HER2/HER4 complexes bound to either NRG1β or BTC, did not reveal discernible differences at the receptor dimerization interface and larger-scale domain arrangements (Figure 1d).”

      2) For all maps, authors should display Euler angle plots from their final refinements to assess the degree of preferred orientation. Judging by the sphericity, it appears all the structures, except HER2-HER4-BTC, have well-sampled projection distributions. However, a formal clarification would be useful to the reader.

      We thank the reviewer for pointing this out. We regarded the 3DFSC curves included in our original submission as sufficient measure for projection distributions. In the revised manuscript, we now also include Euler angle plots from respective CryoSPARC refinements in the supplemental Figures.

      3) The authors should also include map-model FSCs to ascertain the quality of the map with respect to model building, as this is currently missing in the submission.

      We included map-model FSCs from Phenix validation runs in our supplemental material.

      Minor comments:

      1) With respect to complex formation, is there a reason why HER2 expression is dramatically lower than HER4?

      The expression of HER2 and HER4 in Expi293F cells, and consequently the amount of HER2 and HER4 receptors at the beginning of our first purification step, which is the NRG1b-mediated pulldown of HER4, is not noticeably different. After this initial purification step, a significant portion of HER2 is lost because HER2/HER4 complexes constitute only a small fraction of the total HER complexes, as HER4 homodimers form preferentially. This is the reason why HER4 levels after the first purification step shown on the gel in Figure S1b are significantly higher than those of HER2. In the revised manuscript, in Figure S1d, we now show that both receptors are expressed at comparable levels at the beginning of purification. In this experiment, levels of HER2-MBP-TS and HER4-TS purified separately from equivalent volumes of transfected Expi293F cell culture via their shared TS-tags (MBP=Maltose Binding Protein, TS=Twin-Strep) are evaluated on a Coomassie-stained gel. When equal volumes of these elutions are then mixed and either subjected to HER4-directed pulldown using NRG1b-coated Flag-resin (lane 3, Figure S1d of the revised manuscript) or HER2-MBP-directed pulldown using amylose resin in the presence of NRG1b (lane 4, Figure S1d of the revised manuscript), neither of these pulldowns reveals substantial HER2/HER4 heterodimerization, indicating that HER4 homodimerization is favored.

      2) Figures S1e authors should clarify if HER2 substitutions are VR alone or do these include GD substitutions as well. These should be suitably clarified in the main text.

      The HER2 constructs used in all cellular assays do not include the G778D mutation. We clarified this in Figure S1e, in the Materials and Methods section and in the main text on page 6.

      3) The validation reports for all 4 reported structures suggest the user-provided FSC-derived resolutions are different from those calculated by the deposition server. Are the masks deposited significantly different compared to the ones generated within cryoSPARC?

      The user-provided FSC-derived resolutions differ from those calculated by the server because the server only calculates the resolution of unmasked curves from half maps, whereas we provide the resolution derived from masked FSCs. These were all calculated using masks generated within the respective refinement job in cryoSPARC. However, we did notice that our author-provided FSC curves were from unmasked maps, and we replaced these unmasked FSCs with masked FSCs as generated in cryoSPARC. The FSC plots in the validation reports are now consistent with the author-provided resolution and with the plots generated by cryoSPARC shown in Figures S2, S3, S9 and S10.

      4) For interpretation regarding activation through phosphorylation in Figure 2e, have the authors considered HER4 could homodimerize as well? It appears from the data presented in Figure 4 and S12 that the propensity to form homodimers is greater for HER4 than to heterodimerize with HER2, despite the VR/IQ substitutions. This also appears to be supported by the reasonable amount of signal for pERK in lanes with HER4-IQ alone in the presence of NRG1b. It is recommended that the authors comment on this possibility.

      The IQ mutation, originally engineered to disrupt the receiver interface in EGFR, has been shown to have residual activity, which is greater than that of the mutation on the opposite side of the asymmetric dimer interface (VR) (PMID:16777603). This might be because this mutation partially destabilizes an inactive state of HER kinases by disrupting the hydrophobic interactions, which are important both for kinase inhibition and for stabilization of the active dimer. While the IQ mutation is significantly inhibitory, as evidenced by the fact that we do not detect NRG1b-dependent HER4 phosphorylation in cells expressing HER4-IQ alone, it is possible that undetectable levels of phosphorylated HER4 cause the small increase in pERK signal. To acknowledge this possibility, we added the following sentence to the appropriate paragraph on page 10 in the main text:

      “Small increases in pERK levels in cells expressing the HER4-IQ construct are consistent with previous observations that the IQ mutation in HER kinase domains has small residual activity through homodimerization (PMID:16777603).”

      5) In the following line, "NRG1b-induced phosphorylation of HER2, HER4, ERK and AKT was not notably affected by substitution of the HER4 dimerization arm to a GS-arm relative to wild type receptors", it is unclear what the authors mean by wild-type receptors? There is presently no wildtype HER2 and/or HER4 tested in this blot.

      We thank the reviewer for pointing this out. Wild type receptors here refer to WT dimerization arm sequences in contrast to GS-arm mutants. We corrected the language in the appropriate place in the main text:

      “NRG1b-induced phosphorylation of HER2, HER4, ERK and AKT was not notably affected by substitution of the HER4 dimerization arm to a GS-arm relative to receptors featuring wild type dimerization arm sequences, indicating that the HER4 dimerization arm is not required for assembly and activation of HER2/HER4 heterodimers (Figure 2e).”

      6) Considering the asparagine residues can potentially mediate stabilization of HER2-HER4 dimers through glycosylation, the authors should include western blot data for receptor-activation for mutants where glycosylation can be disrupted. This could minimally instruct the reader on how functionally relevant the identified interactions like N576-N358 are.

      We agree with the Reviewer that this is a very interesting and important point, and it is the subject of our future investigations. The different spectra of glycosylation that we observe between HER4 homodimers and HER2/HER4 heterodimers suggest that glycans will modulate these interactions differently. We speculate that glycans will likely be more important for HER4 homodimerization, where glycosylation is more pronounced in our reconstructions. Investigating how these interactions change in the absence of single glycan modifications or their combinations will also require taking into consideration how glycan mutations alter the equilibrium between HER4 homodimerization and HER2/HER4 heterodimerization. Such studies will require months of mutagenesis and optimization of controlled expression of such mutants, ideally the generation of stable cell lines, and likely structural follow-up studies. We respectfully argue that this undertaking is beyond the main scope of the current manuscript, and conceptually constitutes a separate, very important question that we are working on.

      Reviewer #1 (Recommendations For The Authors):

      The structural coordinates should be deposited in the RCSB.

      The coordinates will be released upon publication of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) Figure S1b authors should ideally include a silver stain gel to assess the purity of the heterodimer-ligand complex. Although HER subunits are discernible, there is no clear band for NRG1b.

      Given its small size (9.7 kDa) our NRG1b construct is typically difficult to detect in our samples, but we would like to respectfully argue that the fact that we can resolve it at high resolution in our cryo-EM reconstructions provides sufficient evidence that it is present. Likewise, we argue that the Coomassie-stained gel we present in the manuscript is sufficient. It demonstrates that our purifications yield a stoichiometric complex of enough purity to obtain a high resolution cryo-EM reconstruction. Since we are not making any other claims about these preparations, we respectfully argue that providing a silver stain gel is not necessary to support conclusions of our study.

      We thank the reviewer for pointing this out. To best reflect what we wanted to convey, we changed it to: “and is the same as observed in structures of an isolated HER2 ectodomain.”

      3) Page 8 first paragraph line 3, although one can deduce where the ligand binding pocket is, it would be clearer if this is marked in Figure 1d.

      We added arrows in the figure to indicate the ligand-binding pocket.

      4) Figure 2b inset A needs to be labeled 'A'.

      The inset was already labelled but in a different corner. We rearranged the label to make it clearer.

      5) Figure S5c will benefit from inset images zooming into the dimerization arm. It is hard to visualize the subtleties of the structural changes in the current format.

      Figure S5c predominantly shows side views of various heterodimer overlays to highlight subtle differences in larger-scale assembly that correlate with differences in dimerization arm engagement. This side orientation is not suitable for zooming into the dimerization arm regions, which can only be effectively visualized in front views (the view of the heart-shaped dimer illustrated in Figure 1a). We show a zoomed-in view of this representation in main Figure 2c, which is what we understand the Reviewer is requesting.

      6) Fig 3e is it A102 or A202 in the bottom-most panel.

      This is now corrected, thank you.

      7) Fig S9 revisit the color code for NRG1b, it appears there is no blue subunit of NRG1b. Also revisit the RMSD in the figure legend, since the text appears to suggest a different set of RMSDs for the 3 overlays.

      We fixed the color code in the Figure, thank you.

      In reference to Figure S9 (Figure S11 in the revised manuscript) we discuss two types of RMSDs:

      1) RMSDs between our cryo-EM homodimers and the crystal structure homodimers. The structure overlays are shown in Figure S9a and RMSD values were mentioned in the Figure legends. However, in the original manuscript we did not explicitly mention these values in the main text but have now added them to the main text of the revised version of the manuscript.

      2) RMSDs between monomers within our cryo-EM structures and within monomers of the crystal structure. Figure S11b and Figure S11c of the revised manuscript show these overlays for the cryo-EM structures only and the values are present in the Figure legend. We do not show the respective overlay for the crystal structures, which is why the values are not mentioned in the Figure legends, but we discuss the values in the main text.

      We recognize that this is confusing and added RMSD values for 1. to the main text and discuss this more carefully:

      “Our cryo-EM structures of the HER4/NRG1b homodimer differs slightly from the three HER4/NRG1b homodimers per asymmetric unit in the 3U7U crystal structure in which each monomer adopts a different orientation of the domain IV relative to the rest of the ectodomain (Figure S9a, RMSD: 5.438 Å, 5.435 Å and 3.662 Å). Notably, our two cryo-EM HER4 homodimer structures are more symmetric than the crystal structures of the HER4/NRG1β ectodomain homodimer. RMSDs for monomers within our cryo-EM structures are 1.42 Å in the cryo-EM HER4/NRG1b homodimer and 1.58 Å in the HER4/BTC homodimer (Figure S9b+c) compared to the monomers in the crystal structures which align with RMSDs of 1.67 Å, 5.76 Å and 2.38 Å”
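
      For readers less familiar with the metric, the RMSD values quoted above can be illustrated with a minimal sketch. It assumes the two structures have already been superposed and that atoms are paired one-to-one; the function name and the toy coordinates are hypothetical and are not taken from the deposited models or from the software used in the study.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    coordinates in angstroms. Assumes the structures are already superposed
    and that atom i in coords_a corresponds to atom i in coords_b."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))

# Hypothetical toy coordinates (not from any real structure).
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
model_b = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]
print(rmsd(model_a, model_b))
```

      In practice the superposition step (for example, a least-squares fit over C-alpha atoms) determines which atoms are compared, which is why RMSDs between whole dimers and between individual monomers can differ substantially.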

      8) Page 12 paragraph 2 last line, expand on the abbreviation NAG.

      It is now expanded.

      9) What is the slit width used for the energy filter during data collection?

      The slit width was 20 eV. We added this information to the Methods section.

      10) The crosslinking conditions of 0.2% glutaraldehyde for 40 min on ice, with no quenching seems rather harsh. Have the authors attempted other crosslinking conditions? Do milder conditions or GraFix not help with complex stabilization?

      We thank the Reviewer for pointing this out. The reaction was quenched after 40 min by the addition of 40 µl of 1M Tris pH 7.4 buffer. This information is now included in the Methods section. We screened crosslinking conditions for HER4 homodimers, and previously for HER2/HER3 heterodimers, and found that these were the mildest conditions that achieved complete crosslinking as assessed by SDS-PAGE.

      11) Have the authors used default parameters for all their data processing steps? Were additional steps like local per-particle CTF refinement and global defocus refinement employed during refinement?

      We did not perform any per-particle CTF refinement, as we have previously not observed any improvement from running such refinement on particles of this size on top of per-patch CTF estimation, which already takes into account local CTF differences within each micrograph. To make the manuscript clearer in this regard, we added the following statement to the Methods section: “Unless specifically mentioned here or in the processing workflow, default parameters in CryoSPARC were used for each processing step.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Yamanaka et al.'s research investigates the impact of volatile organic compounds (VOCs), particularly diacetyl, on gene expression changes. By inhibiting histone deacetylase (HDAC) enzymes, the authors were able to observe changes in the transcriptome of various models, including cell lines, flies, and mice. The study reveals that HDAC inhibitors not only reduce cancer cell proliferation but also provide relief from neurodegeneration in fly Huntington's disease models. Although the findings are intriguing, the research falls short in providing a thorough analysis of the underlying mechanisms.

      HDAC inhibitors have been previously shown to induce gene expression changes as well as control cell division and demonstrated to work on disease models. The authors demonstrate diacetyl as a prominent HDAC inhibitor. Though the demonstration of diacetyl is novel, several similar molecules have been used before.

      In this manuscript we are not trying to understand the mechanisms by which HDAC inhibitors affect Huntington’s disease or cancer, since these have been studied in detail before and are outside the scope of this manuscript. Our focus is to demonstrate that volatile odorants commonly found in the environment can inhibit HDACs, alter gene expression, and have downstream physiological effects. To the best of our knowledge, this unusual effect of odorants has not been systematically described before.

      Reviewer #2 (Public Review):

      The study by Sachiko et al. presents strong evidence that implicates environmental volatile odorants, particularly diacetyl, in an alternate role as inhibitors of HDAC proteins and modulators of gene expression. HDACs are histone deacetylases that generally have a repressive role in gene expression. In this paper the authors test the hypothesis that diacetyl, a compound emitted by rotting food sources, can diffuse through the blood-brain barrier and cell membranes to directly modulate HDAC activity and alter gene expression in a neural activity-independent manner. This work is significant because the authors also link modulation of HDAC activity by diacetyl exposure to transcriptional and cellular responses, presenting it as a potential therapeutic agent for neurological diseases, for example through inhibition of neuroblastoma and neurodegeneration.

      The authors first demonstrate that exposure to diacetyl, and some other odorants, inhibits deacetylation activity of specific HDAC proteins using in vitro assays, and increases acetylation of specific histones in cultured cells. Consistent with a role for diacetyl in HDAC inhibition, the authors find dose-dependent alterations in gene expression in different fly and mouse tissues in response to diacetyl exposure. In flies they first identify a decrease in the expression of chemosensory receptors in olfactory neurons after exposure to diacetyl. Subsequently, they also observe large gene expression changes in the lungs, brain, and airways in mice. In flies, some of the gene expression changes in response to diacetyl are partially reversible and show an overlap with genes that alter expression in response to treatment with other HDAC inhibitors. Given the use of HDAC inhibitors as chemotherapy agents and treatment methods for cancers and neurodegenerative diseases, the authors hypothesize that diacetyl as an HDAC inhibitor can also serve similar functions. Indeed, they find that exposure of mice to diacetyl leads to a decrease in the brain expression of many genes normally upregulated in neuroblastomas, and selectively inhibits proliferation of cell lines which are derived from neuroblastomas. To test the potential for diacetyl in treatment of neurodegenerative diseases, the authors use the fly Huntington's disease model, utilizing the overexpression of Huntingtin protein with expanded poly-Q repeats in the photoreceptor rhabdomeres, which leads to their degeneration. Exposing these flies to diacetyl significantly decreases the loss of rhabdomeres, suggesting a potential for diacetyl as a therapeutic agent for neurodegeneration.

      The findings are very intriguing and highlight environmental chemicals as potent agents which can alter gene expression independent of their action through chemosensory receptors.

      We thank the reviewer for the encouraging comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      1) The results section for figure 1 seems poorly written with errors in figure citations. Please rewrite this section.

      We thank the reviewer for pointing it out and have now rewritten the results section as well as made concomitant changes in the introduction to address this comment.

      2) Discussion could be more focused and could speculate mechanistic details of HDAC inhibitors in rescue of neurodegeneration.

      We have added information about the mechanistic role of HDAC inhibition in the rescue of neurodegeneration: “Exposure to diacetyl volatiles in the fly model of Huntington’s disease reduces cell degeneration, as has been previously observed with orally administered HDAC inhibitors like sodium butyrate and SAHA in this genetic model (27). Previous studies indicate that the inhibition of HDACs counters the acetyltransferase inhibitory activity of the polyglutamine domain of the human Htt protein, which binds to p300, P/CAF and CBP (27).”

      A few minor comments are:

      1) Figure 1 is not properly cited in the text (Eg: line 137- It's not relevant to Fig 1B and it's to IC)

      We thank the referee for pointing out our error and have now corrected it.

      2) Some Abbreviations were not expanded at the first sight, which made difficult in understanding the statement (Eg: Line 51- VOC, 111- Or

      We have now defined abbreviations the first time they appear in the manuscript.

      3) Line 98- What was the unit when you mention 0.01%?

      We have added (v/v) in the text to represent the standard volume / total volume. We have also described it in the method section.

      4) Line 138- there is no comparative study done with b-HB, but the authors have claimed it was comparable. If it’s from a previous study, a relative comparative statement could be given.

      We apologize for the confusion. We have added the IC50 values previously reported for b-hydroxybutyrate, “IC50 for HDAC1: 5.3 mM and HDAC3 2.4 mM”, as reported in reference #21.

      5) In lines 146-150, more details of what are the compounds and how similar they are to diacetyl could be added

      We have added representative structures and names for the chemicals tested in Figure 1C.

      6) In line 160, Why specifically they increase H3K14 acetylation?

      The observed increase in H3K9 (not H3K14) acetylation levels is identical to what has previously been reported for b-hydroxybutyrate. We have added a sentence pointing out this similarity: “preferable acetylation of H3K9 was also observed in HEK293 cells with b-hydroxybutyrate (reference #21)”.

      7) In line 317, How HDAC inhibitors reverse the PolyQ disorder? What is its mechanism? Can at least discuss in the discussion section.

      Our assay is based on a previous publication using the Drosophila model (Ref #27), which evaluated the mechanisms in detail. We have now added a section in the Discussion describing the past findings: “Exposure to diacetyl volatiles in the fly model of Huntington’s disease reduces cell degeneration, as has been previously observed with orally administered HDAC inhibitors like sodium butyrate and SAHA in this genetic model (27). Previous studies indicate that the inhibition of HDACs counters the acetyltransferase inhibitory activity of the polyglutamine domain of the Htt protein, which binds to p300, P/CAF and CBP (27).”

      8) In figures, 1C and 1D, proper labeling of drug molecules is missing. Check 1D- Could have included Diacetyl for comparison, Where is the uninhibited control (negative)?

      We have added the name of the chemical compounds to Figure 1C and 1D. Each compound tested has a separate blank control, which forms the basis for calculation of the percentage inhibition. The negative control is therefore part of each column.
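
      The blank-normalized calculation described above can be sketched as follows. This is only an illustration under the assumption that inhibition is computed as the fractional loss of assay signal relative to the compound's own blank control; the function name and the readings are hypothetical, and the exact normalization used in the assay kit may differ.

```python
from statistics import mean

def percent_inhibition(signal_with_compound, signal_blank):
    """Percentage inhibition relative to a matched blank (uninhibited) control.
    Assumes a higher assay signal means more deacetylase activity."""
    if signal_blank <= 0:
        raise ValueError("blank control signal must be positive")
    return 100.0 * (1.0 - signal_with_compound / signal_blank)

# Hypothetical replicate readings (arbitrary fluorescence units), N=2.
replicates = [(460.0, 500.0), (465.0, 500.0)]
average = mean(percent_inhibition(s, b) for s, b in replicates)
print(f"average inhibition: {average:.2f}%")
```

      Because each column is normalized to its own blank, a value of 0% corresponds to the uninhibited control, which is why no separate negative-control column appears in the figure.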

      Reviewer #2 (Recommendations For The Authors):

      As specific feedback for the authors, I have a few questions/recommendations about the main point of the paper:

      a. Throughout the manuscript, the authors demonstrate gene expression differences in different tissues in flies and mice in response to exposure to diacetyl using both transgenic reporter expression and RNAseq. The authors mention they were able to show that these gene expression changes are independent of neural activity, yet I am not sure which experiment specifically demonstrates this. How do the authors know that these changes in gene expression are due to diacetyl reaching the brain after passing the blood-brain barrier and not due to changes in gene expression with olfactory circuit activity? I acknowledge that proving that the gene expression differences are independent of neural activity is difficult, but one question is whether inhibiting neural activity results in changes in the expression of overlapping genes in the same direction. Or for example, if one inhibits neural activity in Gr21a neurons, do they reversibly shut down expression of the receptor after a few days? Is this true for other ORs or specific to Gr21a and Gr63a?

      While it is difficult to completely rule out contributions of olfactory effects in the brain, we also report differential gene expression in the lungs of mice, where we do not expect olfactory circuit activity (Fig 3D-G). The overlap in DEGs between the organs is highly statistically significant, suggesting at least some commonality in mechanism (Fig 5D). We recently evaluated a Drosophila tissue that does not express odorant receptors or have olfactory connections, the ovaries, and also found substantial evidence of diacetyl-induced modulation of gene expression. While the data are intended for a different publication, we found up to 123 upregulated and 61 downregulated DEGs (FDR cutoff <0.05 and log2 fold change cutoffs of 1 and -1). These data should also be viewed together with the in vitro HDAC inhibition data and the increased histone acetylation seen in cell lines.
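
      As an illustration of the thresholds quoted above (FDR < 0.05 and log2 fold change of at least 1 in either direction), the following minimal sketch splits a results table into up- and downregulated genes. The record type, gene names and values are hypothetical, and treating the fold-change cutoff as inclusive is an assumption; the actual analysis pipeline is not described in this response.

```python
from typing import NamedTuple

class GeneResult(NamedTuple):
    gene: str
    log2_fold_change: float
    fdr: float

def split_degs(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Return (upregulated, downregulated) gene names passing both cutoffs.
    The fold-change cutoff is treated as inclusive (an assumption)."""
    significant = [r for r in results if r.fdr < fdr_cutoff]
    up = [r.gene for r in significant if r.log2_fold_change >= lfc_cutoff]
    down = [r.gene for r in significant if r.log2_fold_change <= -lfc_cutoff]
    return up, down

# Hypothetical toy rows, not data from the study.
toy_results = [
    GeneResult("geneA", 2.3, 0.001),   # significant, upregulated
    GeneResult("geneB", -1.5, 0.020),  # significant, downregulated
    GeneResult("geneC", 0.4, 0.001),   # significant but small effect
    GeneResult("geneD", 3.0, 0.200),   # large effect but not significant
]
up_genes, down_genes = split_degs(toy_results)
print(up_genes, down_genes)
```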

      b. Is diacetyl detected by any chemosensory receptors in flies or mice? RNA profiles from these receptor mutants can be used to distinguish whether the gene expression changes are occurring due to neural activity or direct ability of diacetyl to alter HDAC activity. One relatively simple experiment would be to test whether differentially expressed genes in the orco mutant antennae overlap at all with antennal RNA profiles from diacetyl exposed flies.

      Diacetyl can be detected by multiple chemosensory receptors in flies and mice. In flies, the neurons expressing the Gr21a+Gr63a complex are inhibited by diacetyl as indicated, and Or9a, Or43b, Or59b, Or67a, and Or85b are activated receptors (Hallem, Cell, 2006). It would be an extremely resource- and time-consuming process to create and evaluate single mutants or combinations of mutants as suggested. In response to the previous point, we noted examples of tissues without olfactory receptors or olfactory circuits showing DEGs upon diacetyl exposure.

      As suggested by the referee, we compared DEGs from RNASeq data of Orco mutant antenna (N=2 replicates) generated for another project. There is very little overlap between antennal DEGs from Orco mutants and from the diacetyl-exposed flies (labelled in the chart as d4on_up and d4on_down). These data suggest that large-scale silencing of antennal neurons in Orco mutants does not alter expression of the same genes as those altered by exposure to diacetyl.

      Author response image 1.

      c. The comparison of DEGs from individuals exposed to diacetyl versus the other two HDAC inhibitors shows some overlap. The overlap is greater for DEGs shared between the two HDAC inhibitors. Yet, there is still a substantial number of genes that are unique to diacetyl exposure. For example, if you compare SB to VA exposure, each condition has about 150-200 genes uniquely misexpressed for each condition with about 55 genes shared. However, the number of uniquely misexpressed genes is over 600 for diacetyl exposed individuals, with only 30 and 100 genes shared with either SB and VA respectively. I would have expected a higher overlap in DEGs if these compounds all inhibit similar HDACs. Do they inhibit different HDACs? Can this explain the significant number of uniquely misexpressed genes in each condition?

      It is difficult to judge the significance of overlap between DEG sets from the raw numbers alone, without statistical analysis, given that the genome has around 13,000 genes. We noted this analysis in the text: “A pairwise analysis using the Fisher’s exact test of each gene set revealed a statistically significant overlap of diacetyl-induced genes with SB-induced genes (p=6x10^-11) and with VA-induced genes (p=2x10^-65) (Figure 4F).”

      We have also further clarified in the text “This highly significant overlap among upregulated genes lends further support to our model that diacetyl vapors act as an HDAC inhibitor in vivo. As expected, each of the 3 treatments also modulated a substantial number of unique genes (Figure 4G,H), suggesting that differences in delivery format (oral vs vapor delivery), molecular structure and inhibition profile across the repertoire of HDACs may contribute to differences in gene regulation.”
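
      To illustrate why raw overlap counts can be misleading without a test, here is a minimal sketch of a one-sided Fisher's exact test for gene-set overlap, written as the upper tail of the hypergeometric distribution. The genome size of 13,000 comes from the response above; the set sizes and overlap in the example are hypothetical, and whether the authors used a one- or two-sided test is not stated, so this should be read as one reasonable choice rather than their exact procedure.

```python
from math import comb

def overlap_p_value(n_genome, n_set_a, n_set_b, n_shared):
    """One-sided Fisher's exact test for enrichment of overlap.

    Returns P(overlap >= n_shared) when n_set_b genes are drawn at random
    from a genome of n_genome genes, n_set_a of which belong to set A.
    """
    max_overlap = min(n_set_a, n_set_b)
    # Exact integer count of favourable draws; divided once at the end.
    favourable = sum(
        comb(n_set_a, k) * comb(n_genome - n_set_a, n_set_b - k)
        for k in range(n_shared, max_overlap + 1)
    )
    return favourable / comb(n_genome, n_set_b)

# Hypothetical set sizes; only the genome size is taken from the text.
p_value = overlap_p_value(n_genome=13000, n_set_a=700, n_set_b=250, n_shared=100)
print(f"p = {p_value:.3g}")
```

      With these illustrative numbers, chance alone would predict roughly 700 x 250 / 13,000, or about 13 shared genes, so an overlap of 100 is far in the tail even though it is a minority of either set.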

      d. The authors show changes in RNA profiles in response to diacetyl exposure in different tissues and suggest that these are due to changes in histone acetylation without direct comparison of genes that show up or down regulation with acetylation patterns. They do show in the beginning that diacetyl inhibits HDAC function in vitro and in cell culture. Yet it is critical that they also show a general increase in acetylation levels within tissues profiled for RNA. Additional experiments profiling chromatin and histone acetylation patterns in the tissues where RNA is profiled from would strengthen the argument of the paper.

      We agree with the referee’s suggestion and appreciate it. However, given the heterogeneity of the cell types, and therefore of histone marks in chromatin, within the tissues that we analyzed, we estimate that it will require substantial effort to purify or enrich specific cell populations before performing ChIP-Seq. Future studies of this kind will examine correlations between up- and down-regulated genes and histone acetylation patterns in cells. This effort will require significant resources and time, which we feel are outside the scope of this manuscript.

      e. The rhabdomere experiments might benefit from a negative control. Can the authors expose the flies to another volatile and show neurodegeneration is not affected?

      We exposed the negative control group to headspace odorants of paraffin oil which is a mixture of hydrocarbons.

      f. The same is true for the initial HDAC activity profiles from Figure 1. Can the authors show an HDAC activity that is not affected by diacetyl exposure?

      We exposed the negative control group to headspace odorants of paraffin oil which is a mixture of hydrocarbons. Diacetyl shows very little inhibition (Average inhibition = 7.69%; N=2) in purified human HDAC4 when tested at the 15mM concentration.

      g. One point that might require some explanation in the discussion is why diacetyl exposure only increases acetylation of certain histones but not others in Figure 2, especially given that many HDACs are inhibited by diacetyl in Figure 1.

      Please see response to comment #6, Reviewer 1.

      h. Figure S1C is missing descriptions of what different histogram colors signify.

      We apologize for the oversight and have now indicated it in the Figure legend.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      To the Senior Editor and the Reviewing Editor:

      We sincerely appreciate the valuable comments provided by the reviewers, the reviewing editor, and the senior editor. After carefully reviewing and considering the comments, we have addressed the key concerns raised by the reviewers and made appropriate modifications to the article in the revised manuscript.

      The main revisions made to the manuscript are as follows:

      1) We have added comparison experiments with TNDM (see Fig. 2 and Fig. S2).

      2) We conducted new synthetic experiments to demonstrate that our conclusions are not a by-product of d-VAE (see Fig. S2 and Fig. S11).

      3) We have provided a detailed explanation of how our proposed criteria, especially the second criterion, can effectively exclude the selection of unsuitable signals.

      4) We have included a semantic overview figure of d-VAE (Fig. S1) and a visualization plot of latent variables (Fig. S13).

      5) We have elaborated on the model details of d-VAE, as well as the hyperparameter selection and experimental settings of other comparison models.

      We believe these revisions have significantly improved the clarity and comprehensibility of the manuscript. Thank you for the opportunity to address these important points.

      Reviewer #1

      Q1: “First, the model in the paper is almost identical to an existing VAE model (TNDM) that makes use of weak supervision with behaviour in the same way [1]. This paper should at least be referenced. If the authors wish they could compare their model to TNDM, which combines a state space model with smoothing similar to LFADS. Given that TNDM achieves very good behaviour reconstructions, it may be on par with this model without the need for a Kalman filter (and hence may achieve better separation of behaviour-related and unrelated dynamics).”

      Our model significantly differs from TNDM in several aspects. While TNDM also constrains latent variables to decode behavioral information, it does not impose constraints to maximize behavioral information in the generated relevant signals. The trade-off between the decoding and reconstruction capabilities of generated relevant signals is the most significant contribution of our approach, which is not reflected in TNDM. In addition, the backbone network of signal extraction and the prior distribution of the two models are also different.

      It's worth noting that our method does not require a Kalman filter. The Kalman filter is used for post hoc assessment of the linear decoding ability of the generated signals. Please note that extracting and evaluating relevant signals are two distinct stages.

      Heeding your suggestion, we have incorporated comparison experiments involving TNDM into the revised manuscript. Detailed information on model hyperparameters and training settings can be found in the Methods section of the revised manuscript.

      Thank you for your valuable feedback.

      Q2: “Second, in my opinion, the claims regarding identifiability are overstated - this matters as the results depend on this to some extent. Recent work shows that VAEs generally suffer from identifiability problems due to the Gaussian latent space [2]. This paper also hints that weak supervision may help to resolve such issues, so this model as well as TNDM and CEBRA may indeed benefit from this. In addition however, it appears that the relative weight of the KL Divergence in the VAE objective is chosen very small compared to the likelihood (0.1%), so the influence of the prior is weak and the model may essentially learn the average neural trajectories while underestimating the noise in the latent variables. This, in turn, could mean that the model will not autoencode neural activity as well as it should, note that an average R2 in this case will still be high (I could not see how this is actually computed). At the same time, the behaviour R2 will be large simply because the different movement trajectories are very distinct. Since the paper makes claims about the roles of different neurons, it would be important to understand how well their single trial activities are reconstructed, which can perhaps best be investigated by comparing the Poisson likelihood (LFADS is a good baseline model). Taken together, while it certainly makes sense that well-tuned neurons contribute more to behaviour decoding, I worry that the very interesting claim that neurons with weak tuning contain behavioural signals is not well supported.”

      We do not think our distilled signals are average neural trajectories without variability. The quality of single-trial reconstruction can be seen in Fig. 3i and Fig. S4, where the neural trajectories show that our distilled signals are not average neural trajectories. Furthermore, if each trial's activity closely matched the average neural trajectory, the Fano factor (FF) should theoretically approach 0. However, our distilled signals depart notably from this expectation, as is evident in Figure 3c, d, g, and f.

      Regarding the diminished influence of the KL divergence: Given that the ground truth of the latent variable distribution is unknown, even a learned prior distribution might not accurately reflect the true distribution. We found that a pronounced KL divergence term was detrimental to decoding and reconstruction performance. As a result, we opted to reduce the weight of the KL divergence term. Even so, the KL divergence can still effectively align the distribution of latent variables with the prior distribution, as illustrated in Fig. S13. Notably, our goal is to extract behaviorally-relevant signals from given raw signals rather than to generate diverse samples from the prior distribution. When aiming to separate relevant signals, we recommend reducing the influence of the KL divergence.

      Regarding comparing the Poisson likelihood: We compared the Poisson log-likelihood among the different methods (except PSID, since its extracted signals have negative values), and the results show that d-VAE outperforms the other methods.

      Author response image 1.

      Regarding how R2 is computed: $R^2 = 1 - \frac{\sum_i (x^{(i)} - x_r^{(i)})^2}{\sum_i (x^{(i)} - \bar{x})^2}$, where $x^{(i)}$, $x_r^{(i)}$, and $\bar{x}$ denote the ith sample of raw signals, the ith sample of distilled relevant signals, and the mean of raw signals. If the distilled signals exactly match the raw signals, the sum of squared errors is zero, thus R2=1. If the distilled signals are always equal to the mean of raw signals, R2=0. If the distilled signals are worse than the mean estimation, R2 is negative; negative R2 is set to zero.
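
      The three cases described above can be checked with a small sketch. It assumes the standard coefficient of determination computed over a single signal (whether the mean is taken per neuron or over all neurons is not stated here); the function name `clipped_r2` and the toy values are hypothetical.

```python
import numpy as np

def clipped_r2(raw, distilled):
    """Coefficient of determination between raw and distilled signals,
    with negative values set to zero."""
    raw = np.asarray(raw, dtype=float)
    distilled = np.asarray(distilled, dtype=float)
    ss_res = np.sum((raw - distilled) ** 2)   # sum of squared errors
    ss_tot = np.sum((raw - raw.mean()) ** 2)  # error of the mean estimate
    return float(max(1.0 - ss_res / ss_tot, 0.0))

raw = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = clipped_r2(raw, raw)                         # exact match
r2_mean = clipped_r2(raw, np.full_like(raw, raw.mean()))  # mean estimate
r2_worse = clipped_r2(raw, raw[::-1])                     # worse than the mean
```

      Here `r2_perfect` is 1, `r2_mean` is 0, and `r2_worse` would be -3 before clipping and is reported as 0.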

      Thank you for your valuable feedback.

      Q3: “Third, and relating to this issue, I could not entirely follow the reasoning in the section arguing that behavioural information can be inferred from neurons with weak selectivity, but that it is not linearly decodable. It is right to test if weak supervision signals bleed into the irrelevant subspace, but I could not follow the explanations. Why, for instance, is the ANN decoder on raw data (I assume this is a decoder trained fully supervised) not equal in performance to the relevant distilled signals? Should a well-trained non-linear decoder not simply yield a performance ceiling? Next, if I understand correctly, distilled signals were obtained from the full model. How does a model perform trained only on the weakly tuned neurons? Is it possible that the subspaces obtained with the model are just not optimally aligned for decoding? This could be a result of limited identifiability or model specifics that bias reconstruction to averages (a well-known problem of VAEs). I, therefore, think this analysis should be complemented with tests that do not depend on the model.”

      Regarding “Why, for instance, is the ANN decoder on raw data (I assume this is a decoder trained fully supervised) not equal in performance to the relevant distilled signals? Should a well-trained non-linear decoder not simply yield a performance ceiling?”: In fact, the decoding performance of raw signals with ANN is quite close to the ceiling. However, due to the presence of significant irrelevant signals in raw signals, decoding models like deep neural networks are more prone to overfitting when trained on noisy raw signals compared to behaviorally-relevant signals. Consequently, we anticipate that the distilled signals will demonstrate superior decoding generalization. This phenomenon is evident in Fig. 2 and Fig. S1, where the decoding performance of the distilled signals surpasses that of the raw signals, albeit not by a substantial margin.

      Regarding “Next, if I understand correctly, distilled signals were obtained from the full model. How does a model perform trained only on the weakly tuned neurons? Is it possible that the subspaces obtained with the model are just not optimally aligned for decoding?”: Distilled signals (involving all neurons) are obtained by d-VAE. Subsequently, we use an ANN to evaluate the decoding performance of the smaller and larger R2 neurons. Please note that separating relevant signals and evaluating them are two distinct stages.

      Regarding the reasoning in the section arguing that smaller R2 neurons encode rich information, we would like to provide a detailed explanation:

      1) After extracting relevant signals through d-VAE, we specifically selected neurons characterized by smaller R2 values (Here, R2 signifies the proportion of neuronal activity variance explained by the linear encoding model, calculated using raw signals). Subsequently, we employed both KF and ANN to assess the decoding performance of these neurons. Remarkably, our findings revealed that smaller R2 neurons, previously believed to carry limited behavioral information, indeed encode rich information.

      2) In a subsequent step, we employed d-VAE to exclusively distill the raw signals of these smaller R2 neurons (distinct from the earlier experiment where d-VAE processed signals from all neurons). We then employed KF and ANN to evaluate the distilled smaller R2 neurons. Interestingly, we observed that we could not attain the same richness of information solely through the use of these smaller R2 neurons.

      3) Consequently, we put forth and tested two hypotheses: First, that larger R2 neurons introduce additional signals into the smaller R2 neurons that do not exist in the real smaller R2 neurons. Second, that larger R2 neurons aid in restoring the original appearance of impaired smaller R2 neurons. Our proposed criteria and synthetic experiments substantiate the latter scenario.

      Thank you for your valuable feedback.

      Q4: “Finally, a more technical issue to note is related to the choice to learn a non-parametric prior instead of using a conventional Gaussian prior. How is this implemented? Is just a single sample taken during a forward pass? I worry this may be insufficient as this would not sample the prior well, and some other strategy such as importance sampling may be required (unless the prior is not relevant as it weakly contributed to the ELBO, in which case this choice seems not very relevant). Generally, it would be useful to see visualisations of the latent variables to see how information about behaviour is represented by the model.”

      Regarding "how to implement the prior?": Please refer to Equation 7 in the revised manuscript; we have added detailed descriptions in the revised manuscript.

      Regarding "Generally, it would be useful to see visualizations of the latent variables to see how information about behavior is represented by the model.": Note that our focus is not on latent variables but on distilled relevant signals. Nonetheless, at your request, we have added the visualization of latent variables in the revised manuscript. Please see Fig. S13 for details.

      Thank you for your valuable feedback.

      Recommendations: “A minor point: the word 'distill' in the name of the model may be a little misleading - in machine learning the term refers to the construction of smaller models with the same capabilities.

      It should be useful to add a schematic picture of the model to ease comparison with related approaches.”

      In the context of our model's functions, it operates as a distillation process, eliminating irrelevant signals and retaining the relevant ones. Although the name of our model may be a little misleading, it faithfully reflects what our model does.

      We have added a schematic picture of d-VAE in the revised manuscript. Please see Fig. S1 for details.

      Thank you for your valuable feedback.

      Reviewer #2

      Q1: “Is the apparently increased complexity of encoding vs decoding so unexpected given the entropy, sparseness, and high dimensionality of neural signals (the "encoding") compared to the smoothness and low dimensionality of typical behavioural signals (the "decoding") recorded in neuroscience experiments? This is the title of the paper so it seems to be the main result on which the authors expect readers to focus. ”

      We use the term "unexpected" due to the disparity between our findings and the prior understanding concerning neural encoding and decoding. For neural encoding, as we said in the Introduction, in previous studies, weakly-tuned neurons are considered useless, and smaller variance PCs are considered noise, but we found they encode rich behavioral information. For neural decoding, the nonlinear decoding performance of raw signals is significantly superior to linear decoding. However, after eliminating the interference of irrelevant signals, we found the linear decoding performance is comparable to nonlinear decoding. Rooted in these findings, which counter previous thought, we employ the term "unexpected" to characterize our observations.

      Thank you for your valuable feedback.

      Q2: “I take issue with the premise that signals in the brain are "irrelevant" simply because they do not correlate with a fixed temporal lag with a particular behavioural feature hand-chosen by the experimenter. As an example, the presence of a reward signal in motor cortex [1] after the movement is likely to be of little use from the perspective of predicting kinematics from time-bin to time-bin using a fixed model across trials (the apparent definition of "relevant" for behaviour here), but an entire sub-field of neuroscience is dedicated to understanding the impact of these reward-related signals on future behaviour. Is there a method sophisticated enough to see the behavioural "relevance" of this brief, transient, post-movement signal? This may just be an issue of semantics, and perhaps I read too much into the choice of words here. Perhaps the authors truly treat "irrelevant" and "without a fixed temporal correlation" as synonymous phrases and the issue is easily resolved with a clarifying parenthetical the first time the word "irrelevant" is used. But I remain troubled by some claims in the paper which lead me to believe that they read more deeply into the "irrelevancy" of these components.”

      In this paper, we employ terms like ‘behaviorally-relevant’ and ‘behaviorally-irrelevant’ only regarding behavioral variables of interest measured within a given task, such as arm kinematics during a motor control task. A similar definition can be found in the PSID[1].

      Thank you for your valuable feedback.

      [1] Sani, Omid G., et al. "Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification." Nature Neuroscience 24.1 (2021): 140-149.

      Q3: “The authors claim the "irrelevant" responses underpin an unprecedented neuronal redundancy and reveal that movement behaviors are distributed in a higher-dimensional neural space than previously thought." Perhaps I just missed the logic, but I fail to see the evidence for this. The neural space is a fixed dimensionality based on the number of neurons. A more sparse and nonlinear distribution across this set of neurons may mean that linear methods such as PCA are not effective ways to approximate the dimensionality. But ultimately the behaviourally relevant signals seem quite low-dimensional in this paper even if they show some nonlinearity may help.”

      The evidence that the “useless” responses underpin an unprecedented neuronal redundancy is shown in Fig. 5a, d and Fig. S9a. Specifically, for relevant signals (red bar), the sum of the decoding performance of smaller R2 neurons and larger R2 neurons is significantly greater than the decoding performance of all neurons, demonstrating that movement parameters are encoded very redundantly in the neuronal population. In contrast, we cannot find this degree of neural redundancy in raw signals (purple bar).

      The evidence that the “useless” responses reveal that movement behaviors are distributed in a higher-dimensional neural space than previously thought is shown in the left plot (involving KF decoding) of Fig. 6c, f and Fig. S9f. Specifically, the improvement in KF decoding from using secondary signals is significantly higher than that from using raw signals composed of the same number of dimensions as the secondary signals. These results demonstrate that these dimensions, spanning roughly from ten to thirty, encode much information, suggesting that behavioral information exists in a higher-dimensional subspace than anticipated from raw signals.

      Thank you for your valuable feedback.

      Q5: “there is an apparent logical fallacy that begins in the abstract and persists in the paper: "Surprisingly, when incorporating often-ignored neural dimensions, behavioral information can be decoded linearly as accurately as nonlinear decoding, suggesting linear readout is performed in motor cortex." Don't get me wrong: the equivalency of linear and nonlinear decoding approaches on this dataset is interesting, and useful for neuroscientists in a practical sense. However, the paper expends much effort trying to make fundamental scientific claims that do not feel very strongly supported. This reviewer fails to see what we can learn about a set of neurons in the brain which are presumed to "read out" from motor cortex. These neurons will not have access to the data analyzed here. That a linear model can be conceived by an experimenter does not imply that the brain must use a linear model. The claim may be true, and it may well be that a linear readout is implemented in the brain. Other work [2,3] has shown that linear readouts of nonlinear neural activity patterns can explain some behavioural features. The claim in this paper, however, is not given enough”

      Due to the limitations of current observational methods and our incomplete understanding of brain mechanisms, it is indeed challenging to ascertain the specific data the brain acquires to generate behavior and whether it employs a linear readout. Conventionally, the neural data recorded in the motor cortex do encode movement behaviors and can be used to analyze neural encoding and decoding. Based on these data, we found that the linear decoder KF achieves comparable performance to that of the nonlinear decoder ANN on distilled relevant signals. This finding has undergone validation across three widely used datasets, providing substantial evidence. Furthermore, we conducted experiments on synthetic data to show that this conclusion is not a by-product of our model. In the revised manuscript, we added a more detailed description of this conclusion.

      Thank you for your valuable feedback.

      Q6: “Relatedly, I would like to note that the exercise of arbitrarily dividing a continuous distribution of a statistic (the "R2") based on an arbitrary threshold is a conceptually flawed exercise. The authors read too much into the fact that neurons which have a low R2 w.r.t. PDs have behavioural information w.r.t. other methods. To this reviewer, it speaks more about the irrelevance, so to speak, of the preferred direction metric than anything fundamental about the brain.”

      We chose the R2 threshold in accordance with the guidelines provided in reference [1]. It's worth mentioning that this threshold does not exert any significant influence on the overall conclusions.

      Thank you for your valuable feedback.

      [1] Inoue, Y., Mao, H., Suway, S.B., Orellana, J. and Schwartz, A.B., 2018. Decoding arm speed during reaching. Nature communications, 9(1), p.5243.

      Q7: “I am afraid I may be missing something, as I did not understand the fano factor analysis of Figure 3. In a sense the behaviourally relevant signals must have lower FF given they are in effect tied to the temporally smooth (and consistent on average across trials) behavioural covariates. The point of the original Churchland paper was to show that producing a behaviour squelches the variance; naturally these must appear in the behaviourally relevant components. A control distribution or reference of some type would possibly help here.”

      We agree that including reference signals could provide more context. The Churchland paper said stimulus onset can lead to a reduction in neural variability. However, our experiment focuses specifically on the reaching process, and thus, we don't have comparative experiments involving different types of signals.

      Thank you for your valuable feedback.

      Q8: “The authors compare the method to LFADS. While this is a reasonable benchmark as a prominent method in the field, LFADS does not attempt to solve the same problem as d-VAE. A better and much more fair comparison would be TNDM [4], an extension of LFADS which is designed to identify behaviourally relevant dimensions.”

      We have added the comparison experiments with TNDM in the revised manuscript (see Fig. 2 and Fig. S2). The details of model hyperparameters and training settings can be found in the Methods section of the revised manuscript.

      Thank you for your valuable feedback.

      Reviewer #3

      Q1.1: “TNDM: LFADS is not the best baseline for comparison. The authors should have compared with TNDM (Hurwitz et al. 2021), which is an extension of LFADS that (unlike LFADS) actually attempts to extract behaviorally relevant factors by adding a behavior term to the loss. The code for TNDM is also available on Github. LFADS is not even supervised by behavior and does not aim to address the problem that d-VAE aims to address, so it is not the most appropriate comparison. ”

      We have added the comparison experiments with TNDM in the revised manuscript (see Fig. 2 and Fig. S2). The details of model hyperparameters and training settings can be found in the Methods section of the revised manuscript.

      Thank you for your valuable feedback.

      Q1.2: “LFADS: LFADS is a sequential autoencoder that processes sections of data (e.g. trials). No explanation is given in Methods for how the data was passed to LFADS. Was the moving averaged smoothed data passed to LFADS or the raw spiking data (at what bin size)? Was a gaussian loss used or a poisson loss? What are the trial lengths used in each dataset, from which part of trials? For dataset C that has back-to-back reaches, was data chopped into segments? How long were these segments? Were the edges of segments overlapped and averaged as in (Keshtkaran et al. 2022) to avoid noisy segment edges or not? These are all critical details that are not explained. The same details would also be needed for a TNDM comparison (comment 1.1) since it has largely the same architecture as LFADS.

      It is also critical to briefly discuss these fundamental differences between the inputs of methods in the main text. LFADS uses a segment of data whereas VAE methods just use one sample at a time. What does this imply in the results? I guess as long as VAEs outperform LFADS it is ok, but if LFADS outperforms VAEs in a given metric, could it be because it received more data as input (a whole segment)? Why was the factor dimension set to 50? I presume it was to match the latent dimension of the VAE methods, but is the LFADS factor dimension the correct match for that to make things comparable?

      I am also surprised by the results. How do the authors justify LFADS having lower neural similarity (fig 2d) than VAE methods that operate on single time steps? LFADS is not supervised by behavior, so of course I don't expect it to necessarily outperform methods on behavior decoding. But all LFADS aims to do is to reconstruct the neural data so at least in this metric it should be able to outperform VAEs that just operate on single time steps? Is it because LFADS smooths the data too much? This is important to discuss and show examples of. These are all critical nuances that need to be discussed to validate the results and interpret them.”

      Regarding “Was the moving averaged smoothed data passed to LFADS or the raw spiking data (at what bin size)? Was a gaussian loss used or a poisson loss?”: All models received data processed with the same preprocessing procedure, namely moving-average smoothing over three bins with a bin size of 100 ms. For all models except PSID, we used a Poisson loss.
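
      As a rough illustration of this preprocessing, the sketch below applies a three-bin moving average to binned spike counts. It assumes a causal window (the current bin and the two preceding bins) that is shortened at the start of the recording; whether the actual pipeline uses a causal or a centered window is not stated here, and the function name `moving_average` is hypothetical.

```python
import numpy as np

def moving_average(binned, window=3):
    """Causal moving average along the time axis (axis 0).

    binned: array of shape (n_bins, n_neurons), e.g. counts in 100 ms bins.
    The first bins are averaged over the samples available so far.
    """
    binned = np.asarray(binned, dtype=float)
    tail = binned.shape[1:]
    # Zero-pad the start so that every output bin has a full-length window.
    padded = np.concatenate([np.zeros((window - 1,) + tail), binned], axis=0)
    csum = np.concatenate([np.zeros((1,) + tail), np.cumsum(padded, axis=0)], axis=0)
    sums = csum[window:] - csum[:-window]
    # Number of real (non-padded) samples contributing to each output bin.
    n_used = np.minimum(np.arange(1, binned.shape[0] + 1), window)
    return sums / n_used.reshape((-1,) + (1,) * (binned.ndim - 1))

counts = np.array([[3.0], [6.0], [9.0], [12.0]])  # 4 bins, 1 neuron
smoothed = moving_average(counts, window=3)
```

      For the toy input, the smoothed values are 3, 4.5, 6, and 9.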

      Regarding “What are the trial lengths used in each dataset, from which part of trials? For dataset C that has back-to-back reaches, was data chopped into segments? How long were these segments? Were the edges of segments overlapped and averaged as in (Keshtkaran et al. 2022) to avoid noisy segment edges or not?”:

      For datasets A and B, a trial length of eighteen is set. Trials with lengths below the threshold are zero-padded, while trials exceeding the threshold are truncated to the threshold length from their starting point. In dataset A, there are several trials with lengths considerably longer than that of most trials. We found that padding all trials with zeros to reach the maximum length (32) led to poor performance. Consequently, we chose a trial length of eighteen, effectively encompassing the durations of most trials and leading to the removal of approximately 9% of samples. For dataset B (center-out), the trial lengths are relatively consistent with small variation, and the maximum length across all trials is eighteen. For dataset C, we set the trial length as ten because we observed the video of this paradigm and found that the time for completing a single trial was approximately one second. The segments are not overlapped.
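
      The fixed-length procedure for datasets A and B can be sketched as follows. It assumes that zero-padding is appended at the end of short trials (the position of the padding is not stated above); the function name `fix_trial_length` and the toy shapes are hypothetical.

```python
import numpy as np

def fix_trial_length(trial, target_len=18):
    """Truncate from the trial start or zero-pad at the end to `target_len` bins.

    trial: array of shape (n_bins, n_neurons).
    """
    trial = np.asarray(trial, dtype=float)
    n_bins = trial.shape[0]
    if n_bins >= target_len:
        # Keep the first `target_len` bins, counted from the trial start.
        return trial[:target_len]
    pad = np.zeros((target_len - n_bins,) + trial.shape[1:])
    return np.concatenate([trial, pad], axis=0)

short_trial = np.ones((12, 5))  # shorter than the threshold: padded
long_trial = np.ones((32, 5))   # longer than the threshold: truncated

short_fixed = fix_trial_length(short_trial)
long_fixed = fix_trial_length(long_trial)
```

      Both outputs have eighteen bins, so they can be stacked into a single (trials, bins, neurons) array for a sequential model.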

      Regarding “Why was the factor dimension set to 50? I presume it was to match the latent dimension of the VAE methods, but is the LFADS factor dimension the correct match for that to make things comparable?”: We performed a grid search over latent dimensions in {10,20,50} and found that 50 performed best.

      Regarding “I am also surprised by the results. How do the authors justify LFADS having lower neural similarity (fig 2d) than VAE methods that operate on single time steps? LFADS is not supervised by behavior, so of course I don't expect it to necessarily outperform methods on behavior decoding. But all LFADS aims to do is to reconstruct the neural data so at least in this metric it should be able to outperform VAEs that just operate on single time steps? Is it because LFADS smooths the data too much?”: As you pointed out, we found that LFADS tends to produce excessively smooth and consistent data, which can lead to a reduction in neural similarity.

      Thank you for your valuable feedback.

      Q1.3: “PSID: PSID is linear and uses past input samples to predict the next sample in the output. Again, some setup choices are not well justified, and some details are left out in the 1-line explanation given in Methods.

      Why was a latent dimension of 6 chosen? Is this the behaviorally relevant latent dimension or the total latent dimension (for the use case here it would make sense to set all latent states to be behaviorally relevant)? Why was a horizon hyperparameter of 3 chosen? First, it is important to mention fundamental parameters such as latent dimension for each method in the main text (not just in methods) to make the results interpretable. Second, these hyperparameters should be chosen with a grid search in each dataset (within the training data, based on performance on the validation part of the training data), just as the authors do for their method (line 779). Given that PSID isn't a deep learning method, doing a thorough grid search in each fold should be quite feasible. It is important that high values for latent dimension and a wider range of other hyperparameters are included in the search, because based on how well the residuals (x_i) for this method are shown to predict behavior in Fig 2, the method seems to not have been used appropriately. I would expect ANN to improve decoding for PSID versus its KF decoding since PSID is fully linear, but I don't expect KF to be able to decode so well using the residuals of PSID if the method is used correctly to extract all behaviorally relevant information from neural data. The low neural reconstruction in Fig 2d could also partly be due to using too small of a latent dimension.

      Again, another important nuance is the input to this method and how it differs from the input to VAE methods. The learned PSID model is a filter that operates on all past samples of input to predict the output in the "next" time step. To enable a fair comparison with VAE methods, the authors should make sure that the last sample "seen" by PSID is the same as the input sample seen by VAE methods. This is absolutely critical given how large the time steps are, otherwise PSID might underperform simply because it stopped receiving input 300ms earlier than the input received by VAE methods. To fix this, I think the authors can just shift the training and testing neural time series of PSID by 1 sample into the past (relative to the behavior), so that PSID's input would include the input of VAE methods. Otherwise, VAEs outperforming PSID is confounded by PSID's input not including the time step that was provided to VAE.”

      Thank you for suggesting that PSID should see the current neural observations. We implemented this as suggested and then performed a grid search over the PSID hyperparameters. Specifically, we searched the horizon hyperparameter in {2,3,4,5,6,7}. Since the relevant latent dimension cannot exceed the horizon times the dimension of the behavior variables (two-dimensional velocity in this paper), and since performance saturates as the dimension increases, we directly set the relevant latent dimension to this maximum. The horizons for datasets A, B, C, and the synthetic dataset are 7, 6, 6, and 5, respectively.

      And thus the latent dimension of datasets A, B, and C and the synthetic dataset is 14, 12, 12 and 10, respectively.
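
      The reported latent dimensions follow directly from the selected horizons, as the short arithmetic sketch below shows. It assumes the relevant latent dimension is set to the horizon multiplied by the behavior dimension; the variable names are hypothetical.

```python
# Two-dimensional velocity is the behavior variable in this paper.
behavior_dim = 2

# Horizons selected by the grid search over {2, 3, 4, 5, 6, 7}.
horizons = {"A": 7, "B": 6, "C": 6, "synthetic": 5}

# Relevant latent dimension = horizon * behavior dimension.
latent_dims = {name: h * behavior_dim for name, h in horizons.items()}

for name, dim in latent_dims.items():
    print(f"dataset {name}: horizon={horizons[name]}, latent dimension={dim}")
```

      This reproduces the values 14, 12, 12, and 10 for datasets A, B, C, and the synthetic dataset.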

      Our experiments show that KF can decode information from irrelevant signals obtained by PSID. Although PSID extracts the linear part of raw signals, KF can still use the linear part of the residuals for decoding. The low reconstruction performance of PSID may be because the relationship between latent variables and neural signals is linear, and the relationship between latent variables and behaviors is also linear; this is equivalent to the linear relationship between behaviors and neural signals, and linear models can only explain a small fraction of neural signals.

      Thank you for your valuable feedback.

      Q1.4: “CEBRA: results for CEBRA are incomplete. Similarity to raw signals is not shown. Decoding of behaviorally irrelevant residuals for CEBRA is not shown. Per Fig. S2, CEBRA does better or similar ANN decoding in datasets A and C, is only slightly worse in Dataset B, so it is important to show the other key metrics otherwise it is unclear whether d-VAE has some tangible advantage over CEBRA in those 2 datasets or if they are similar in every metric. Finally, it would be better if the authors show the results for CEBRA on Fig. 2, just as is done for other methods because otherwise it is hard to compare all methods.”

      CEBRA is a non-generative model and therefore cannot generate behaviorally-relevant signals. We thus compared only the decoding performance of CEBRA's latent embeddings with that of d-VAE's signals.

      Thank you for your valuable feedback.

      Q2: “Given the fact that d-VAE infers the latent (z) based on the population activity (x), claims about properties of the inferred behaviorally relevant signals (x_r) that attribute properties to individual neurons are confounded.

      The authors contrast their approach to population level approaches in that it infers behaviorally relevant signals for individual neurons. However, d-VAE is also a population method as it aggregates population information to infer the latent (z), from which behaviorally relevant part of the activity of each neuron (x_r) is inferred. The authors note this population level aggregation of information as a benefit of d-VAE, but only acknowledge it as a confound briefly in the context of one of their analyses (line 340): "The first is that the larger R2 neurons leak their information to the smaller R2 neurons, causing them contain too much behavioral information". They go on to dismiss this confounding possibility by showing that the inferred behaviorally relevant signal of each neuron is often most similar to its own raw signals (line 348-352) compared with all other neurons. They also provide another argument specific to that result section (i.e., residuals are not very behavior predictive), which is not general so I won't discuss it in depth here. These arguments however do not change the basic fact that d-VAE aggregates information from other neurons when extracting the behaviorally relevant activity of any given neuron, something that the authors note as a benefit of d-VAE in many instances. The fact that d-VAE aggregates population level info to give the inferred behaviorally relevant signal for each neuron confounds several key conclusions. For example, because information is aggregated across neurons, when trial to trial variability looks smoother after applying d-VAE (Fig 3i), or reveals better cosine tuning (Fig 3b), or when neurons that were not very predictive of behavior become more predictive of behavior (Fig 5), one cannot really attribute the new smoother single trial activity or the improved decoding to the same single neurons; rather these new signals/performances include information from other neurons. 
Unless the connections of the encoder network (z=f(x)) are zero for all other neurons, one cannot claim that the inferred rates for the neuron are truly solely associated with that neuron. I believe this is a fundamental property of a population level VAE, and simply makes the architecture unsuitable for claims regarding inherent properties of single neurons. This confound is partly why the first claim in the abstract is not supported by data: observing that neurons that don't predict behavior very well would predict it much better after applying d-VAE does not prove that these neurons themselves "encode rich[er] behavioral information in complex nonlinear ways" (i.e., the first conclusion highlighted in the abstract) because information was also aggregated from other neurons. The other reason why this claim is not supported by data is the characterization of the encoding for smaller R2 neurons as "complex nonlinear", which the method is not well equipped to tease apart from linear mappings as I explain in my comment 3.”

      We acknowledge that we cannot obtain the exact single neuronal activity that does not contain any information from other neurons. However, we believe our model can extract accurate approximation signals of the ground truth relevant signals. These signals preserve the inherent properties of single neuronal activity to some extent and can be used for analysis at the single-neuron level.

      We believe d-VAE is a reasonable approach to extract effective relevant signals that preserve inherent properties of single neuronal activity for four key reasons:

      1) d-VAE is a latent variable model that adheres to the neural population doctrine. The neural population doctrine posits that information is encoded within interconnected groups of neurons, with the existence of latent variables (neural modes) responsible for generating observable neuronal activity [1, 2]. If we can perfectly obtain the true generative model from latent variables to neuronal activity, then we can generate the activity of each neuron from hidden variables without containing any information from other neurons. However, without a complete understanding of the brain’s encoding strategies (or generative model), we can only get the approximation signals of the ground truth signals.

      2) After the generative model is established, we need to infer the parameters of the generative model and the distribution of latent variables. During the inference process, inference algorithms such as variational inference or EM algorithms are used. Generally, the obtained latent variables are also approximations of the real latent variables. When inferring the latent variables, aggregating information across the neural population is inevitable, and latent variables are derived through weighted combinations of neuronal populations [3].

      This inference process is consistent with that of d-VAE (or VAE-based models).

      3) Latent variables are derived from raw neural signals and used to explain raw neural signals. Considering the unknown ground truth of latent variables and behaviorally-relevant signals, it becomes evident that the only reliable reference at the signal level is the raw signals. A crucial criterion for evaluating the reliability of latent variable models (including latent variables and generated relevant signals) is their capability to effectively explain the raw signals [3]. Consequently, we firmly maintain the belief that if the generated signals closely resemble the raw signals to the greatest extent possible, in accordance with an equivalence principle, we can claim that these obtained signals faithfully retain the inherent properties of single neurons. d-VAE explicitly constrains the generated signal to closely resemble the raw signals. These results demonstrate that d-VAE can extract effective relevant signals that preserve inherent properties of single neuronal activity.

      Based on the above reasons, we hold that generating single neuronal activities with the VAE framework is a reasonable approach. The remaining question is whether our model can obtain accurate relevant signals in the absence of ground truth. To our knowledge, in cases where the ground truth of relevant signals is unknown, there are typically two approaches to verifying the reliability of extracted signals:

      1) Conducting synthetic experiments where the ground truth is known.

      2) Validation based on expert knowledge (three criteria were proposed in this paper).

      Both our extracted signals and key conclusions have been validated using these two approaches.

      Next, we will provide a detailed response to the concerns regarding our first key conclusion that smaller R2 neurons encode rich information.

      We acknowledge that larger R2 neurons play a role in aiding the reconstruction of signals in smaller R2 neurons through their neural activity. However, considering that neurons are correlated rather than independent entities, we maintain the belief that larger R2 neurons assist damaged smaller R2 neurons in restoring their original appearance. Taking image denoising as an example, when restoring noisy pixels to their original appearance, relying solely on the noisy pixels themselves is often impractical. Assistance from their correlated, clean neighboring pixels becomes necessary.

      The case we need to be cautious of is that the larger R2 neurons introduce additional signals (m) that contain substantial information to smaller R2 neurons, which they do not inherently possess. We believe this case does not hold for two reasons. Firstly, logically, adding extra signals decreases the reconstruction performance, and the information carried by these additional signals is redundant for larger R2 neurons, thus they do not introduce new information that can enhance the decoding performance of the neural population. Therefore, it seems unlikely and unnecessary for neural networks to engage in such counterproductive actions. Secondly, even if this occurs, our second criterion can effectively exclude the selection of these signals. To clarify, if we assume that x, y, and z denote the raw, relevant, and irrelevant signals of smaller R2 neurons, with x=y+z, and the extracted relevant signals become y+m, the irrelevant signals become z-m in this case. Consequently, the irrelevant signals contain a significant amount of information. It's essential to emphasize that this criterion holds significant importance in excluding undesirable signals.
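
      The second-criterion argument above can be illustrated with a small numerical sketch. This is a toy construction, not the authors' analysis: the one-dimensional behavior variable, the signal amplitudes, and the helper `linear_r2` are all assumptions chosen only to show that if the extracted relevant signal is y+m, the residual z-m inherits behavioral information from m.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
behavior = rng.standard_normal(T)   # toy 1-D behavior (assumption)

y = 0.2 * behavior                  # true relevant signal of a smaller-R2 neuron
z = rng.standard_normal(T)          # true irrelevant signal, independent of behavior
x = y + z                           # raw signal, x = y + z as in the text
m = 1.0 * behavior                  # extra signal leaked from larger-R2 neurons


def linear_r2(signal, target):
    """In-sample R2 of a least-squares linear fit from signal to target."""
    A = np.column_stack([signal, np.ones_like(signal)])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    return 1.0 - resid.var() / target.var()


clean_residual = x - y              # equals z: no behavioral information
leaky_residual = x - (y + m)        # equals z - m: carries information from m

r2_clean = linear_r2(clean_residual, behavior)
r2_leaky = linear_r2(leaky_residual, behavior)
print(f"R2 from clean residual: {r2_clean:.3f}")
print(f"R2 from leaky residual: {r2_leaky:.3f}")
```

      In this toy setting the residual is decodable only when m has been added to the extracted signal, which is the situation the second criterion is meant to flag.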

      Furthermore, we conducted a synthetic experiment to show that d-VAE can indeed restore the damaged information of smaller R2 neurons with the help of larger R2 neurons, and the restored neuronal activities are more similar to ground truth compared to damaged raw signals. Please see Fig. S11a,b for details.

      Thank you for your valuable feedback.

      [1] Saxena, S. and Cunningham, J.P., 2019. Towards the neural population doctrine. Current opinion in neurobiology, 55, pp.103-111.

      [2] Gallego, J.A., Perich, M.G., Miller, L.E. and Solla, S.A., 2017. Neural manifolds for the control of movement. Neuron, 94(5), pp.978-984.

      [3] Cunningham, J.P. and Yu, B.M., 2014. Dimensionality reduction for large-scale neural recordings. Nature neuroscience, 17(11), pp.1500-1509.

      Q3: “Given the nonlinear architecture of the VAE, claims about the linearity or nonlinearity of cortical readout are confounded and not supported by the results.

      The inference of behaviorally relevant signals from raw signals is a nonlinear operation, that is x_r=g(f(x)) is nonlinear function of x. So even when a linear KF is used to decode behavior from the inferred behaviorally relevant signals, the overall decoding from raw signals to predicted behavior (i.e., KF applied to g(f(x))) is nonlinear. Thus, the result that decoding of behavior from inferred behaviorally relevant signals (x_r) using a linear KF and a nonlinear ANN reaches similar accuracy (Fig 2), does not suggest that a "linear readout is performed in the motor cortex", as the authors claim (line 471). The authors acknowledge this confound (line 472) but fail to address it adequately. They perform a simulation analysis where the decoding gap between KF and ANN remains unchanged even when d-VAE is used to infer behaviorally relevant signals in the simulation. However, this analysis is not enough for "eliminating the doubt" regarding the confound. I'm sure the authors can also design simulations where the opposite happens and just like in the data, d-VAE can improve linear decoding to match ANN decoding. An adequate way to address this concern would be to use a fully linear version of the autoencoder where the f(.) and g(.) mappings are fully linear. They can simply replace these two networks in their model with affine mappings, redo the modeling and see if the model still helps the KF decoding accuracy reach that of the ANN decoding. In such a scenario, because the overall KF decoding from original raw signals to predicted behavior (linear d-VAE + KF) is linear, then they could move toward the claim that the readout is linear. Even though such a conclusion would still be impaired by the nonlinear reference (d-VAE + ANN decoding) because the achieved nonlinear decoding performance could always be limited by network design and fitting issues. 
      Overall, the third conclusion highlighted in the abstract is a very difficult claim to prove and is unfortunately not supported by the results.”

      We aim to explore the readout mechanism of behaviorally-relevant signals, rather than raw signals. Theoretically, the process of removing irrelevant signals should not be considered part of the inherent decoding mechanisms of the relevant signals. Assuming that the relevant signals we extracted are accurate, the conclusion of linear readout is established. On the synthetic data where the ground truth is known, our distilled signals show a significant improvement in neural similarity to the ground truth when compared to raw signals (refer to Fig. S2l). This observation demonstrates that our distilled signals are accurate approximations of the ground truth. Furthermore, on the three widely-used real datasets, our distilled signals meet the stringent criteria we have proposed (see Fig. 2), also providing strong evidence for their accuracy.

      Regarding the assertion that we could create simulations in which d-VAE turns signals that are inherently nonlinearly decodable into linearly decodable ones: in reality, we cannot achieve this, as the second criterion rules out the selection of such signals. Specifically, suppose z=x+y=n^2+y, where z, x, y, and n denote the raw signals, relevant signals, irrelevant signals, and latent variables, respectively. If the relevant signals obtained by d-VAE are n, then these signals can be linearly decoded accurately. However, the corresponding irrelevant signals are z-n=n^2-n+y; thus, the irrelevant signals would contain substantial information, and these extracted relevant signals would not be selected. Furthermore, our synthetic experiments offer additional evidence supporting the conclusion that d-VAE does not make inherently nonlinearly decodable signals become linearly decodable ones. As depicted in Fig. S11c, there exists a significant performance gap between KF and ANN when decoding the ground truth signals of smaller R2 neurons. KF exhibits notably low performance, leaving substantial room for compensation by d-VAE. However, following processing by d-VAE, KF's performance on the distilled signals fails to surpass its already low ground truth performance and remains significantly inferior to ANN's performance. These results collectively confirm that our approach does not convert signals that are inherently nonlinearly decodable into linearly decodable ones, and the conclusion of linear readout is not a by-product of d-VAE.
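
      The residual in the example above follows from subtracting the hypothetically extracted signal from the raw signal. A short worked version, keeping the response's notation (z raw, x = n^2 relevant, y irrelevant, n latent) and introducing hatted symbols as our own shorthand for the extracted quantities:

```latex
\begin{align}
z &= x + y = n^{2} + y, \\
\hat{x} &= n \quad \text{(hypothetical extracted relevant signal)}, \\
\hat{y} &= z - \hat{x} = n^{2} - n + y .
\end{align}
```

      Because the extracted residual still depends on n, it would retain behavioral information, which is the condition the second criterion checks for.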

      Regarding the suggestion of using linear d-VAE + KF: as discussed in the Discussion section, removing the irrelevant signals requires a nonlinear operation, and linear d-VAE cannot effectively separate relevant and irrelevant signals.

      Thank you for your valuable feedback.

      Q4: “The authors interpret several results as indications that "behavioral information is distributed in a higher-dimensional subspace than expected from raw signals", which is the second main conclusion highlighted in the abstract. However, several of these arguments do not convincingly support that conclusion.

      4.1) The authors observe that behaviorally relevant signals for neurons with small principal components (referred to as secondary) have worse decoding with KF but better decoding with ANN (Fig. 6b,e), which also outperforms ANN decoding from raw signals. This observation is taken to suggest that these secondary behaviorally relevant signals encode behavior information in highly nonlinear ways and in a higher dimensions neural space than expected (lines 424 and 428). These conclusions however are confounded by the fact that A) d-VAE uses nonlinear encoding, so one cannot conclude from ANN outperforming KF that behavior is encoded nonlinearly in the motor cortex (see comment 3 above), and B) d-VAE aggregates information across the population so one cannot conclude that these secondary neurons themselves had as much behavior information (see comment 2 above).

      4.2) The authors observe that the addition of the inferred behaviorally relevant signals for neurons with small principal components (referred to as secondary) improves the decoding of KF more than it improves the decoding of ANN (red curves in Fig 6c,f). This again is interpreted similarly as in 4.1, and is confounded for similar reasons (line 439): "These results demonstrate that irrelevant signals conceal the smaller variance PC signals, making their encoded information difficult to be linearly decoded, suggesting that behavioral information exists in a higher-dimensional subspace than anticipated from raw signals". This is confounded by because of the two reasons explained in 4.1. To conclude nonlinear encoding based on the difference in KF and ANN decoding, the authors would need to make the encoding/decoding in their VAE linear to have a fully linear decoder on one hand (with linear d-VAE + KF) and a nonlinear decoder on the other hand (with linear d-VAE + ANN), as explained in comment 3.

      4.3) From S Fig 8, where the authors compare cumulative variance of PCs for raw and inferred behaviorally relevant signals, the authors conclude that (line 554): "behaviorally-irrelevant signals can cause an overestimation of the neural dimensionality of behaviorally-relevant responses (Supplementary Fig. S8)." However, this analysis does not really say anything about overestimation of "behaviorally relevant" neural dimensionality since the comparison is done with the dimensionality of "raw" signals. The next sentence is ok though: "These findings highlight the need to filter out relevant signals when estimating the neural dimensionality.", because they use the phrase "neural dimensionality" not "neural dimensionality of behaviorally-relevant responses".”

      Questions 4.1 and 4.2 are a combination of Q2 and Q3. Please refer to our responses to Q2 and Q3.

      Regarding question 4.3 about “behaviorally-irrelevant signals can cause an overestimation of the neural dimensionality of behaviorally-relevant responses”: Previous studies usually used raw signals to estimate the neural dimensionality of specific behaviors. We mean that using raw signals, which include many irrelevant signals, will cause an overestimation of the neural dimensionality. We have modified this sentence in the revised manuscript.

      Thank you for your valuable feedback.

      Q5: “Imprecise use of language in many places leads to inaccurate statements. I will list some of these statements”

      5.1) In the abstract: "One solution is to accurately separate behaviorally-relevant and irrelevant signals, but this approach remains elusive due to the unknown ground truth of behaviorally-relevant signals". This statement is not accurate because it implies no prior work does this. The authors should make their statement more specific and also refer to some goal that existing linear (e.g., PSID) and nonlinear (e.g., TNDM) methods for extracting behaviorally relevant signals fail to achieve.

      5.2) In the abstract: "we found neural responses previously considered useless encode rich behavioral information" => what does "useless" mean operationally? Low behavior tuning? More precise use of language would be better.

      5.3) "... recent studies (Glaser 58 et al., 2020; Willsey et al., 2022) demonstrate nonlinear readout outperforms linear readout." => do these studies show that nonlinear "readout" outperforms linear "readout", or just that nonlinear models outperform linear models?

      5.4) Line 144: "The first criterion is that the decoding performance of the behaviorally-relevant signals (red bar, Fig.1) should surpass that of raw signals (the red dotted line, Fig.1).". Do the authors mean linear decoding here or decoding in general? If the latter, how can something extracted from neural surpass decoding of neural data, when the extraction itself can be thought of as part of decoding? The operational definition for this "decoding performance" should be clarified.

      5.5) Line 311: "we found that the dimensionality of primary subspace of raw signals (26, 64, and 45 for datasets A, B, and C) is significantly higher than that of behaviorally-relevant signals (7, 13, and 9), indicating that behaviorally-irrelevant signals lead to an overestimation of the neural dimensionality of behaviorally-relevant signals." => here the dimensionality of the total PC space (i.e., primary subspace of raw signals) is being compared with that of inferred behaviorally-relevant signals, so the former being higher does not indicate that neural dimensionality of behaviorally-relevant signals was overestimated. The former is simply not behavioral so this conclusion is not accurate.

      5.6) Section "Distilled behaviorally-relevant signals uncover that smaller R2 neurons encode rich behavioral information in complex nonlinear ways". Based on what kind of R2 are the neurons grouped? Behavior decoding R2 from raw signals? Using what mapping? Using KF? If KF is used, the result that small R2 neurons benefit a lot from d-VAE could be somewhat expected, given the nonlinearity of d-VAE: because only ANN would have the capacity to unwrap the nonlinear encoding of d-VAE as needed. If decoding performance that is used to group neurons is based on data, regression to the mean could also partially explain the result: the neurons with worst raw decoding are most likely to benefit from a change in decoder, than neurons that already had good decoding. In any case, the R2 used to partition and sort neurons should be more clearly stated and reminded throughout the text and I Fig 3.

      5.7) Line 346 "...it is impossible for our model to add the activity of larger R2 neurons to that of smaller R2 neurons" => Is it really impossible? The optimization can definitely add small-scale copies of behaviorally relevant information to all neurons with minimal increase in the overall optimization loss, so this statement seems inaccurate.

      5.8) Line 490: "we found that linear decoders can achieve comparable performance to that of nonlinear decoders, providing compelling evidence for the presence of linear readout in the motor cortex." => inaccurate because no d-VAE decoding is really linear, as explained in comment 3 above.

      5.9) Line 578: ". However, our results challenge this idea by showing that signals composed of smaller variance PCs nonlinearly encode a significant amount of behavioral information." => inaccurate as results are confounded by nonlinearity of d-VAE as explained in comment 3 above.

      5.10) Line 592: "By filtering out behaviorally-irrelevant signals, our study found that accurate decoding performance can be achieved through linear readout, suggesting that the motor cortex may perform linear readout to generate movement behaviors." => inaccurate because it us confounded by the nonlinearity of d-VAE as explained in comment 3 above.”

      Regarding “5.1) In the abstract: "One solution is to accurately separate behaviorally-relevant and irrelevant signals, but this approach remains elusive due to the unknown ground truth of behaviorally-relevant signals". This statement is not accurate because it implies no prior work does this. The authors should make their statement more specific and also refer to some goal that existing linear (e.g., PSID) and nonlinear (e.g., TNDM) methods for extracting behaviorally relevant signals fail to achieve”:

      We believe our statement is accurate. Our primary objective is to extract accurate behaviorally-relevant signals that closely approximate the ground truth relevant signals. To achieve this, we strike a balance between the reconstruction and decoding performance of the generated signals, aiming to effectively capture the relevant signals. This crucial aspect of our approach sets it apart from other methods. In contrast, other methods tend to emphasize the extraction of valuable latent neural dynamics. We have provided elaboration on the distinctions between d-VAE and other approaches in the Introduction and Discussion sections.

      Thank you for your valuable feedback.

      Regarding “5.2) In the abstract: "we found neural responses previously considered useless encode rich behavioral information" => what does "useless" mean operationally? Low behavior tuning? More precise use of language would be better.”:

      In the analysis of neural signals, smaller variance PC signals are typically seen as noise and are often discarded. Similarly, smaller R2 neurons are commonly thought to be dominated by noise and are not further analyzed. Given these considerations, we believe that the term "considered useless" is appropriate in this context. Thank you for your valuable feedback.

      Regarding “5.3) "... recent studies (Glaser 58 et al., 2020; Willsey et al., 2022) demonstrate nonlinear readout outperforms linear readout." => do these studies show that nonlinear "readout" outperforms linear "readout", or just that nonlinear models outperform linear models?”:

      In this paper, we consider the two statements to be equivalent. Thank you for your valuable feedback.

      Regarding “5.4) Line 144: "The first criterion is that the decoding performance of the behaviorally-relevant signals (red bar, Fig.1) should surpass that of raw signals (the red dotted line, Fig.1).". Do the authors mean linear decoding here or decoding in general? If the latter, how can something extracted from neural surpass decoding of neural data, when the extraction itself can be thought of as part of decoding? The operational definition for this "decoding performance" should be clarified.”:

      We mean the latter. As we said in the section “Framework for defining, extracting, and separating behaviorally-relevant signals”, because raw signals contain too many behaviorally-irrelevant signals, deep neural networks are more prone to overfitting raw signals than relevant signals. Therefore, the decoding performance of relevant signals should surpass that of raw signals. Thank you for your valuable feedback.

      Regarding “5.5) Line 311: "we found that the dimensionality of primary subspace of raw signals (26, 64, and 45 for datasets A, B, and C) is significantly higher than that of behaviorally-relevant signals (7, 13, and 9), indicating that behaviorally-irrelevant signals lead to an overestimation of the neural dimensionality of behaviorally-relevant signals." => here the dimensionality of the total PC space (i.e., primary subspace of raw signals) is being compared with that of inferred behaviorally-relevant signals, so the former being higher does not indicate that neural dimensionality of behaviorally-relevant signals was overestimated. The former is simply not behavioral so this conclusion is not accurate.”:

      In practice, researchers usually use raw signals to estimate the neural dimensionality. We mean that using raw signals for this purpose would overestimate the neural dimensionality. Thank you for your valuable feedback.

      Regarding “5.6) Section "Distilled behaviorally-relevant signals uncover that smaller R2 neurons encode rich behavioral information in complex nonlinear ways". Based on what kind of R2 are the neurons grouped? Behavior decoding R2 from raw signals? Using what mapping? Using KF? If KF is used, the result that small R2 neurons benefit a lot from d-VAE could be somewhat expected, given the nonlinearity of d-VAE: because only ANN would have the capacity to unwrap the nonlinear encoding of d-VAE as needed. If decoding performance that is used to group neurons is based on data, regression to the mean could also partially explain the result: the neurons with worst raw decoding are most likely to benefit from a change in decoder, than neurons that already had good decoding. In any case, the R2 used to partition and sort neurons should be more clearly stated and reminded throughout the text and I Fig 3.”:

      When employing R2 to characterize neurons, it indicates the extent to which neuronal activity is explained by the linear encoding model [1-3]. Smaller R2 neurons have a lower capacity for linearly tuning (encoding) behaviors, while larger R2 neurons have a higher capacity for linearly tuning (encoding) behaviors. Specifically, the approach involves first establishing an encoding relationship from velocity to neural signal using a linear model, i.e., y=f(x), where f represents a linear regression model, x denotes velocity, and y denotes the neural signal. Subsequently, R2 is utilized to quantify the effectiveness of the linear encoding model in explaining neural activity. We have provided a comprehensive explanation in the revised manuscript. Thank you for your valuable feedback.

      [1] Collinger, J.L., Wodlinger, B., Downey, J.E., Wang, W., Tyler-Kabara, E.C., Weber, D.J., McMorland, A.J., Velliste, M., Boninger, M.L. and Schwartz, A.B., 2013. High-performance neuroprosthetic control by an individual with tetraplegia. The Lancet, 381(9866), pp.557-564.

      [2] Wodlinger, B., et al. "Ten-dimensional anthropomorphic arm control in a human brain− machine interface: difficulties, solutions, and limitations." Journal of neural engineering 12.1 (2014): 016011.

      [3] Inoue, Y., Mao, H., Suway, S.B., Orellana, J. and Schwartz, A.B., 2018. Decoding arm speed during reaching. Nature communications, 9(1), p.5243.
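
      The R2 grouping described in the response to 5.6 (fit a linear encoding model from velocity to each neuron's activity, then score it with R2) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `encoding_r2`, the toy two-neuron data, and the use of in-sample rather than cross-validated R2 are ours, not details confirmed by the manuscript.

```python
import numpy as np


def encoding_r2(velocity, neural):
    """Per-neuron R2 of a linear encoding model neural ~ velocity @ W + b.

    velocity: (T, d) array of behavior; neural: (T, N) array of activity.
    Returns an (N,) array of in-sample R2 values.
    """
    X = np.column_stack([velocity, np.ones(len(velocity))])  # add intercept column
    W, *_ = np.linalg.lstsq(X, neural, rcond=None)           # one fit per neuron
    pred = X @ W
    ss_res = ((neural - pred) ** 2).sum(axis=0)
    ss_tot = ((neural - neural.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


# Toy data (assumption): 2-D velocity, one strongly and one weakly tuned neuron
rng = np.random.default_rng(1)
velocity = rng.standard_normal((2000, 2))
strong = velocity @ np.array([1.0, -0.5]) + 0.3 * rng.standard_normal(2000)
weak = 0.1 * velocity[:, 0] + rng.standard_normal(2000)
neural = np.column_stack([strong, weak])

r2 = encoding_r2(velocity, neural)
order = np.argsort(r2)  # neuron indices sorted from smaller to larger R2
print(r2, order)
```

      Sorting by this score is what separates "smaller R2" from "larger R2" neurons in the sketch; the threshold between the two groups is not specified here.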

      Regarding Questions 5.7, 5.8, 5.9, and 5.10:

      We believe our conclusions are solid. The reasons can be found in our replies to Q2 and Q3. Thank you for your valuable feedback.

      Q6: “Imprecise use of language also sometimes is not inaccurate but just makes the text hard to follow.

      6.1) Line 41: "about neural encoding and decoding mechanisms" => what is the definition of encoding/decoding and how do these differ? The definitions given much later in line 77-79 is also not clear.

      6.2) Line 323: remind the reader about what R2 is being discussed, e.g., R2 of decoding behavior using KF. It is critical to know if linear or nonlinear decoding is being discussed.

      6.3) Line 488: "we found that neural responses previously considered trivial encode rich behavioral information in complex nonlinear ways" => "trivial" in what sense? These phrases would benefit from more precision, for example: "neurons that may seem to have little or no behavior information encoded". The same imprecise word ("trivial") is also used in many other places, for example in the caption of Fig S9.

      6.4) Line 611: "The same should be true for the brain." => Too strong of a statement for an unsupported claim suggesting the brain does something along the lines of nonlin VAE + linear readout.

      6.5) In Fig 1, legend: what is the operational definition of "generating performance"? Generating what? Neural reconstruction?”

      Regarding “6.1) Line 41: "about neural encoding and decoding mechanisms" => what is the definition of encoding/decoding and how do these differ? The definitions given much later in line 77-79 is also not clear.”:

      We would like to provide a detailed explanation of neural encoding and decoding. Neural encoding refers to how neuronal activity encodes behaviors, that is, y=f(x), where y denotes neural activity, x denotes behaviors, and f is the encoding model. Neural decoding refers to how the brain decodes behaviors from neural activity, that is, x=g(y), where g is the decoding model. For further elaboration, please refer to [1]. We have included references that discuss the concepts of encoding and decoding in the revised manuscript. Thank you for your valuable feedback.

      [1] Kriegeskorte, Nikolaus, and Pamela K. Douglas. "Interpreting encoding and decoding models." Current opinion in neurobiology 55 (2019): 167-179.

      Regarding “6.2) Line 323: remind the reader about what R2 is being discussed, e.g., R2 of decoding behavior using KF. It is critical to know if linear or nonlinear decoding is being discussed.”:

      This question is the same as Q5.6. Please refer to the response to Q5.6. Thank you for your valuable feedback.

      Regarding “6.3) Line 488: "we found that neural responses previously considered trivial encode rich behavioral information in complex nonlinear ways" => "trivial" in what sense? These phrases would benefit from more precision, for example: "neurons that may seem to have little or no behavior information encoded". The same imprecise word ("trivial") is also used in many other places, for example in the caption of Fig S9.”:

      We have revised this statement in the revised manuscript. Thanks for your recommendation.

      Regarding “6.4) Line 611: "The same should be true for the brain." => Too strong of a statement for an unsupported claim suggesting the brain does something along the lines of nonlin VAE + linear readout.”:

      We mean that removing the interference of irrelevant signals and decoding the relevant signals should logically be two stages. We have revised this statement in the revised manuscript. Thank you for your valuable feedback.

      Regarding “6.5) In Fig 1, legend: what is the operational definition of "generating performance"? Generating what? Neural reconstruction?”:

      We have replaced “generating performance” with “reconstruction performance” in the revised manuscript. Thanks for your recommendation.

      Q7: “In the analysis presented starting in line 449, the authors compare improvement gained for decoding various speed ranges by adding secondary (small PC) neurons to the KF decoder (Fig S11). Why is this done using the KF decoder, when earlier results suggest an ANN decoder is needed for accurate decoding from these small PC neurons? It makes sense to use the more accurate nonlinear ANN decoder to support the fundamental claim made here, that smaller variance PCs are involved in regulating precise control”

      We used the KF because, when the secondary signal is superimposed on the primary signal, the enhancement in KF performance is substantial. We wanted to explore which aspect of the behavior this KF performance improvement mainly reflects. In comparison, the improvement of ANN by the secondary signal is very small, so the same analysis would be uninformative for the ANN. Thank you for your valuable feedback.

      Q8: “A key limitation of the VAE architecture is that it doesn't aggregate information over multiple time samples. This may be why the authors decided to use a very large bin size of 100ms and beyond that smooth the data with a moving average. This limitation should be clearly stated somewhere in contrast with methods that can aggregate information over time (e.g., TNDM, LFADS, PSID) ”

      We have added this limitation in the Discussion in the revised manuscript. Thanks for your recommendation.

      Q9: “Fig 5c and parts of the text explore the decoding when some neurons are dropped. These results should come with a reminder that dropping neurons from behaviorally relevant signals is not technically possible since the extraction of behaviorally relevant signals with d-VAE is a population level aggregation that requires the raw signal from all neurons as an input. This is also important to remind in some places in the text for example:

      • Line 498: "...when one of the neurons is destroyed."

      • Line 572: "In contrast, our results show that decoders maintain high performance on distilled signals even when many neurons drop out."”

      We want to explore the robustness of real relevant signals in the face of neuron drop-out. The signals our model extracted are an approximation of the ground truth relevant signals and thus serve as a substitute for ground truth to study this problem. Thank you for your valuable feedback.

      Q10: “Besides the confounded conclusions regarding the readout being linear (see comment 3 and items related to it in comment 5), the authors also don't adequately discuss prior works that suggest nonlinearity helps decoding of behavior from the motor cortex. Around line 594, a few works are discussed as support for the idea of a linear readout. This should be accompanied by a discussion of works that support a nonlinear encoding of behavior in the motor cortex, for example (Naufel et al. 2019; Glaser et al. 2020), some of which the authors cite elsewhere but don't discuss here.”

      We have added this discussion in the revised manuscript. Thanks for your recommendation.

      Q11: “Selection of hyperparameters is not clearly explained. Starting line 791, the authors give some explanation for one hyperparameter, but not others. How are the other hyperparameters determined? What is the search space for the grid search of each hyperparameter? Importantly, if hyperparameters are determined only based on the training data of each fold, why is only one value given for the hyperparameter selected in each dataset (line 814)? Did all 5 folds for each dataset happen to select exactly the same hyperparameter based on their 5 different training/validation data splits? That seems unlikely.”

      We performed a grid search over {0.001, 0.01, 0.1, 1} for the hyperparameter beta and found that 0.001 is the best for all datasets. As for the model parameters, such as the number of hidden neurons, the model capacity has reached saturating decoding performance and does not influence the results.

      Regarding “Importantly, if hyperparameters are determined only based on the training data of each fold, why is only one value given for the hyperparameter selected in each dataset (line 814)? Did all 5 folds for each dataset happen to select exactly the same hyperparameter based on their 5 different training/validation data splits”: We selected the hyperparameter based on the average performance across the 5 folds on the validation sets. The selected value is the one that yields the highest average validation performance across the 5 folds.

      Thank you for your valuable feedback.
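
      The selection rule described in the response to Q11 (one value per dataset, chosen by the mean validation performance over the 5 folds) can be sketched as below. Only the beta grid comes from the response; the helper `select_hyperparameter` and the score table are hypothetical placeholders that show the mechanics, not reported results.

```python
import numpy as np


def select_hyperparameter(grid, fold_scores):
    """Pick the grid value with the highest mean validation score across folds.

    grid: sequence of candidate values.
    fold_scores: array of shape (len(grid), n_folds) of validation scores.
    """
    mean_scores = np.asarray(fold_scores).mean(axis=1)
    best = int(np.argmax(mean_scores))
    return grid[best], mean_scores


beta_grid = [0.001, 0.01, 0.1, 1]  # grid stated in the response

# Placeholder validation scores (NOT real results): rows are beta values,
# columns are the 5 folds.
scores = np.array([
    [0.80, 0.78, 0.82, 0.79, 0.81],
    [0.79, 0.80, 0.78, 0.77, 0.80],
    [0.70, 0.72, 0.69, 0.71, 0.70],
    [0.55, 0.58, 0.54, 0.56, 0.57],
])

best_beta, mean_scores = select_hyperparameter(beta_grid, scores)
per_fold_best = [beta_grid[i] for i in scores.argmax(axis=0)]
print(best_beta, per_fold_best)
```

      With placeholder scores like these, individual folds can prefer different values while the fold-averaged rule still returns a single value per dataset, which is how one value per dataset arises under this procedure.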

      Q12: “d-VAE itself should also be explained more clearly in the main text. Currently, only the high-level idea of the objective is explained. The explanation should be more precise and include the idea of encoding to latent state, explain the relation to pip-VAE, explain inputs and outputs, linearity/nonlinearity of various mappings, etc. Also see comment 1 above, where I suggest adding more details about other methods in the main text.”

      Our primary objective is to delve into the encoding and decoding mechanisms using the separated relevant signals. Therefore, providing an excessive amount of model details could potentially distract from the main focus of the paper. In response to your suggestion, we have included a visual representation of d-VAE's structure, input, and output (see Fig. S1) in the revised manuscript, which offers a comprehensive and intuitive overview. Additionally, we have expanded on the details of d-VAE and other methods in the Methods section.

      Thank you for your valuable feedback.

      Q13: “In Fig 1f and g, shouldn't the performance plots be swapped? The current plots seem counterintuitive. If there is bias toward decoding (panel g), why is the irrelevant residual so good at decoding?”

      The placement of the performance plots in Fig. 1f and 1g is accurate. When the model exhibits a bias toward decoding, it prioritizes extracting the most relevant features (latent variables) for decoding purposes. As a consequence, the model predominantly generates signals that are closely associated with these extracted features. This selective signal extraction and generation process may result in the exclusion of other potentially useful information, which will be left in the residuals. To illustrate this concept, consider the example of face recognition: if a model can accurately identify an individual using only the person's eyes (assuming these are the most useful features), other valuable information, such as details of the nose or mouth, will be left in the residuals, which could also be used to identify the individual.

      Thank you for your valuable feedback.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Summary:

      In this interesting work, the authors investigated an important topical question: when we see travelling waves in cortical activity, is this due to true wave-like spread, or due to sequentially activated sources? In simulations, it is shown that sequential brain module activation can show up as a travelling wave - even in improved methods such as phase delay maps - and a variety of parameters is investigated. Then, in ex-vivo turtle eye-brain preparations, the authors show that visual cortex waves observable in local field potentials are in fact often better explained as areas D1 and D2 being sequentially activated. This has implications for how we think about travelling wave methodology and relevant analytical tools.

      Strengths:

      I enjoyed reading the discussion. The authors are careful in their claims, and point out that some phenomena may still indeed be genuine travelling waves, but we should have a higher evidence bar to claim this for a particular process in light of this paper and Zhigalov & Jensen (2023) (ref 44). Given this careful discussion, the claims made are well-supported by the experimental results. The discussion also gives a nice overview of potential options in light of this and future directions.

      The illustration of different gaussian covariances leading to very different latency maps was interesting to see.

      Furthermore, the methods are detailed and clearly structured and the Supplementary Figures, particularly single trial results, are useful and convincing.

      We are glad the reviewer found our manuscript “interesting”, the questions we raise “important”, our claims “well-supported by the experimental results”, and our methods “detailed and clearly structured”.

      The details of the sequentially activated Gaussian simulations give some useful results, but the fundamental idea still appears to be "sequential activation is often indistinguishable from a travelling wave", an idea advanced e.g. by Zhigalov & Jensen (2023). It takes a while until the (in my opinion) more intriguing experimental results.

      To emphasize the experimental results, we swapped the order of the analytical and experimental results. Correspondingly, Figure 2 now illustrates the more intriguing experimental results and Figure 3 the analytical results. In addition, we added subtitles to the different sections of the Results to ease navigation through the paper and to enable readers to access the different sections more easily.

      One of the key claims is that the spikes are more consistent with two sequentially activated modules rather than a continuous wave (with Fig 3k and 3l key to support this). Whilst this is more consistent, it is worth mentioning that there seems to be stochasticity to this and between-trial variability, especially for spikes.

      In the revised manuscript we added the reviewer’s comment about stochasticity, and we discuss its possible origins:

      "The transition was also not clear when examining spiking responses in some of the trials (as indicated by high DIP scores, Figure 2K). However, the observation that temporal grouping became more pronounced when using ALSA (a more robust estimate of local excitability) (Figure 2L,N), suggests that high DIP values may result from variability in the spike times of single neurons, and not necessarily from the lack of modular activation. Such issues can be resolved by denser sampling of spiking activity in the tissue."

      Recommendations For The Authors:

      The eye-cortex turtle preparation is not the most common. I would add more context about how specific the results are to this preparation vs how comparable it is to human data.

      We added a sentence explaining the relevance of our preparation: “Finally, while the layered organization of turtle cortex is different than that of mammalian cortex, the basic excitability features of both tissues are similar (Connors and Kriegstein, 1986; Hemberger et al., 2019; Kriegstein and Connors, 1986; Larkum et al., 2008; Shein-Idelson et al., 2017b), and substantial differences in the manner by which field potentials and spikes spread through the tissue are not to be expected.”

      Philosophical question: when does a 'module' become small enough for it to count as a travelling wave? More on this could be added to the discussion. I think we are in the very early days for a true understanding of travelling waves, and I wonder if these sequentially activated modules will functionally correspond to the known cortical segregation, or if it varies by area/task.

      We agree with the reviewer that macroscopic waves could be composed of smaller modules (or single neurons at the smallest scale). Our results suggest that modular patterns can be classified as wave patterns both at large scales (of brain areas) and smaller scales of local neural circuits. Therefore, we believe it is necessary to make this distinction across different scales. We sharpened this point in the first paragraph of the discussion:

      "…We showed that LFP measurements indicative of waves propagating across turtle cortex are underlined by discrete and consecutively activated neuronal populations, and not by a continuously propagating wavefront of spikes (Figure 2). Similarly, activation profiles that resemble continuous travelling waves in EEG simulations can be underlined by consecutive activation of two discrete cortical regions (Figure 1). We replicated these results using an analytical model and demonstrated that a simple scenario of sequentially activated Gaussians can exhibit WLPs with a rich diversity of spatiotemporal profiles (Figure 3). Our results offer insight into the scenarios and conditions for WLP detection by identifying failure points that should be considered when identifying travelling waves and therefore suggest caution when interpreting continuous phase latency maps as microscopically propagating wave patterns. Such failure points may exist both when examining activity at the scale of brain regions (Figure 1) and smaller neural circuits (Figure 2). Therefore, our results suggest that the discrepancy between modular and wave activation should be examined across spatial scales. Specifically, it is not necessarily the case that at the fine grained (single neuron) scale activation patterns are modular, but, following coarse graining, smooth wave patterns emerge. Rather, modular activation may hierarchically exist across scales (Kaiser and Hilgetag, 2010; Meunier et al., 2010) and may be masked by smeared spatial supra-threshold excitability boundaries. Below we discuss these limitations across techniques and their implications.”

      I would advise the authors to focus on the experimental data, perhaps by putting the simulations second, and by putting some of the equation details that are in Methods into the Supplementary Information. Whilst the simulation parameter space is well-explored, the fundamental idea of spreading Gaussians is relatively simple, and the current manuscript organization detracted from the main message for me a little bit.

      Following the referee’s suggestion, we swapped the order of the section with the experimental data and the one with the analytic model (see response to comment 1). In addition, to make the Methods easier to read, we moved the mathematical derivation and related equations to Appendix 1.
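
      The "sequentially activated Gaussians" idea discussed in this exchange can be illustrated with a toy sketch. This is not the authors' analytical model: the one-dimensional electrode line, the Gaussian temporal pulses, and all parameter values are assumptions chosen for illustration. Two fixed sources with overlapping spatial footprints are activated one after the other, and the peak latency is read out at each recording position.

```python
import numpy as np

# Assumed toy geometry: 61 recording sites on a line, time in ms.
x = np.linspace(-3.0, 3.0, 61)
t = np.arange(-50.0, 60.0, 0.1)

# Two stationary sources (assumed values): centers, spatial width,
# activation times, and temporal width of each activation pulse.
x1, x2, sigma_x = -1.0, 1.0, 1.0
t1, t2, sigma_t = 0.0, 10.0, 10.0

def gauss(u, mu, sigma):
    # Unnormalized Gaussian profile.
    return np.exp(-0.5 * ((u - mu) / sigma) ** 2)

# Signal at each site is the sum of the two sources; nothing propagates.
signal = (
    gauss(x, x1, sigma_x)[:, None] * gauss(t, t1, sigma_t)[None, :]
    + gauss(x, x2, sigma_x)[:, None] * gauss(t, t2, sigma_t)[None, :]
)

# Latency map: time of the peak response at each recording site.
latency = t[np.argmax(signal, axis=1)]
```

      In this sketch the latency rises smoothly from about `t1` on one side to about `t2` on the other, even though only two discrete sources are active. A latency map of this kind could be read as a travelling wave if the modular structure were not known.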

      Things I thought about that you may also enjoy thinking about: Could we tell something about sequential sources vs travelling waves by the nature of the wave - e.g. shape or dispersion? If some wave properties are conserved whilst travelling, this could be evidence for travelling vs two sources.

      This is a wonderful suggestion. We are currently working on a follow up publication with a new approach to do exactly that! We think that this new body of work is outside the scope of this paper.

      Could synaptic potentials spread like waves, but spikes more in modular bursts? This would also explain the LFP vs spikes difference - maybe travelling waves of EPSPs are there priming the network, 'looking' for suitable modules to activate, which then activate sequentially. The current discussion is quite spike-focused - could some information be in synaptic potentials after all?

      This is an interesting idea with intriguing functional implications. We added this idea to our discussion (see paragraph below). In addition, to emphasize our discussion on synaptic potentials, we reorganized the paragraphs in the discussion to separate between our discussion on sub-threshold excitability (which is mostly synaptic) and supra-threshold excitability which is the focus of the second part of the discussion.

      “Variability in responses may also be explained by differences in propagation mechanisms (Ermentrout and Kleinfeld, 2001; Muller et al., 2018; Wu et al., 2008). Several reports suggest that waves are underlined by propagation along axonal collaterals (Muller et al., 2018, 2014). Both the transmembrane voltage-gated currents excited during action potentials as well as the post-synaptic currents along axonal boutons can potentially contribute to measured signals. However, such waves travel at high propagation speeds and are not compatible with the wide diversity of wave velocities and mechanisms of local neuronal interactions (Ermentrout and Kleinfeld, 2001; Feller et al., 1996). An intriguing possibility is that such axonal waves prime neuronal excitability by sub-threshold inputs that later result in modular supra-threshold activation. The ability to experimentally discriminate between axonal inputs and local spiking excitability (e.g. by reporters with different wavelengths) can potentially resolve such discrepancies.

      Our turtle cortex results (Figure 2) exemplify how contrasting sub-threshold LFP measurements with supra-threshold spiking measurements can yield different conclusions about the nature of activity spread….”

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      […] While this does not rule out criticality in the brain, it decidedly weakens the evidence for it, which was based on the following logic: critical systems give rise to power law behavior; power law behavior is observed in cortical networks; therefore, cortical networks operate near a critical point. Given, as shown in this paper, that power laws can arise from noncritical processes, the logic breaks. Moreover, the authors show that criticality does not imply optimal information transmission (one of its proposed functions). This highlights the necessity for more rigorous analyses to affirm criticality in the brain. In particular, it suggests that attention should be focused on the question "does the brain implement a dynamical latent variable model?".

      These authors are not the first to show that slowly varying firing rates can give rise to power law behavior (see, for example, Touboul and Destexhe, 2017; Priesemann and Shriki, 2018). However, to our knowledge they are the first to show crackling, and to compute information transmission in the critical state.

      We thank the reviewers for their thoughtful assessment of our paper.

      We would push back on the assessment that our model ‘has nothing to do with criticality,’ and that we observed ‘signatures of criticality [that] emerge through fundamentally non-critical mechanisms.’ This assessment partially stems from the definition of criticality provided in the Public Comment, that ‘criticality is a very specific set of phenomena in physics in which fundamentally local interactions produce unexpected long-range behavior.’

      Our disagreement is largely focused on this definition, which we do not think is standard. Taking the favorite textbook example, the Ising model, criticality is characterized by a set of power-law divergences in thermodynamic quantities (e.g., susceptibility, specific heat, magnetization) at the critical temperature, with exponents of these power laws governed by scaling laws. It is not defined by local interactions. The all-to-all Ising model is generally viewed as showing critical behavior at a certain temperature, even though its interactions are manifestly non-local. It is possible that, by “local” in the definition, the Public Comment meant that interactions are “collective” and among microscopic degrees of freedom. However, that same all-to-all Ising model is mathematically equivalent to the mean-field model, where criticality is achieved through large fluctuations of the mean field, but not through microscopic interactions.

      More commonly, criticality is defined by power laws and scaling relationships that emerge at a critical value of a parameter(s) of the system. That is, criticality is defined by its signatures. What is crucial in all such definitions is that this atypical, critical state requires fine tuning. For example, in the textbook example of the Ising model, a parameter (the temperature) must be tuned to a critical value for critical behavior to appear. In the branching process model that generates avalanche criticality, criticality requires tuning m=1. The key result of our paper is that all signatures expected for avalanche criticality (power laws, crackling, and, as shown below, estimates of the branching rate m), and hence the criticality itself, appear without fine-tuning.

      As we discussed in our introduction, there are a few other instances of signatures of criticality (and hence of criticality itself) emerging without fine-tuning. The first we are aware of was the demonstration of Zipf’s Law (by Schwab, et al. 2014, and Aitchison et al. 2016), a power-law relationship between rank and frequency of states, which was shown to emerge generically in systems driven by a broadly distributed latent variable. A second example, arising from applications of coarse-graining analysis to neural data (cf., Meshulam et al. 2019; also, Morales et al., 2023), was demonstrated in our earlier paper (Morrell et al. 2021). Thus, here we have a third example: the model in this paper generates signatures of criticality in the statistics of avalanches of activity, and it does so without fine-tuning (cf., Fig. 2-3).

      The rate at which these ‘criticality without fine-tuning’ examples are piling up may inspire revisiting the requirement of fine-tuning in the definition of criticality, and our ongoing work (Ngampruetikorn et al. 2023) suggests that criticality may be more accurately defined through large fluctuations (variance > 1/N) rather than through fine-tuning or scaling relations.

      References:

      • Schwab DJ, Nemenman I, Mehta P. “Zipf’s Law and Criticality in Multivariate Data without Fine-Tuning.” Phys Rev Lett. 2014 Aug; doi: 10.1103/PhysRevLett.113.068102.

      • Aitchison L, Corradi N, Latham PE. “Zipf’s Law Arising Naturally When There Are Underlying, Unobserved Variables.” PLOS Computational Biology. 2016 Dec; 12(12):1-32. doi: 10.1371/journal.pcbi.1005110.

      • Meshulam L, Gauthier JL, Brody CD, Tank DW, Bialek W. “Coarse Graining, Fixed Points, and Scaling in a Large Population of Neurons.” Phys Rev Lett. 2019 Oct; doi: 10.1103/PhysRevLett.123.178103.

      • Morales GB, di Santo S, Muñoz MA. “Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics.” Proceedings of the National Academy of Sciences. 2023; 120(9):e2208998120.

      • Morrell MC, Sederberg AJ, Nemenman I. “Latent Dynamical Variables Produce Signatures of Spatiotemporal Criticality in Large Biological Systems.” Phys Rev Lett. 2021 Mar; doi: 10.1103/PhysRevLett.126.118302.

      • Ngampruetikorn V, Nemenman I, Schwab D. “Extrinsic vs Intrinsic Criticality in Systems with Many Components.” arXiv:2309.13898 [physics.bio-ph].

      Major comments:

      1) For many readers, the essential messages of the paper may not be immediately clear. For example, is the paper criticizing the criticality hypothesis of cortical networks, or does the criticism extend deeper, to the theoretical predictions of "crackling" relationships in physical systems as they can emerge without criticality? Statements like "We show that a system coupled to one or many dynamical latent variables can generate avalanche criticality ..." could be misinterpreted as affirming criticality. A more accurate language is needed; for instance, the paper could state that the model generates relationships observed in critical systems. The paper should provide a clearer conclusion and interpretation of the findings in the context of the criticality hypothesis of cortical dynamics.

      Please see the response to the Public Review, above. To clarify the essential message that the dynamical latent variable model produces avalanche criticality without fine-tuning, we have made revisions to the abstract and introduction. This point was already made in the discussion (first sentence).

      Key sentences changed in the abstract:

      "… We find that populations coupled to multiple latent variables produce critical behavior across a broader parameter range than those coupled to a single, quasi-static latent variable, but in both cases, avalanche criticality is observed without fine-tuning of model parameters. … Our results suggest that avalanche criticality arises in neural systems in which activity is effectively modeled as a population driven by a few dynamical variables and these variables can be inferred from the population activity."

      In the introduction, we changed the final sentence to read:

      "These results demonstrate how criticality in neural recordings can arise from latent dynamics in neural activity, without need for fine-tuning of network parameters."

      2) On lines 97-99, the authors state that "We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from recurrent dynamics locally". This idea is also repeated at the beginning of the Summary section. Perhaps being agnostic isn't such a good idea: it's possible that the recurrent dynamics is in a critical regime, which would just push the problem upstream. Presumably you're thinking of recurrent dynamics with slow timescales that's not critical? Or are you happy if it's in the critical regime? This should be clarified.

      We have amended this sentence to clarify that any latent dynamics with large fluctuations would suffice:

      “We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from large fluctuations in local recurrent dynamics.”

      3) Even though the model in Equation 2 has been described in a previous publication and the Methods section, more details regarding the origin and justification of this model in the context of cortical networks would be helpful in the Results section. Was it chosen just for simplicity, or was there a deeper reason?

      This model was chosen for its simplicity: there are no direct interactions between neurons, coupling between neurons and latent variables is random, and simulation is straightforward. More complex latent dynamics or non-random structure in the coupling matrices could have been used, but our aim was to explore this model in the simplest setting possible.
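
      The ingredients listed above (no direct interactions between neurons, random coupling to a few latent variables, straightforward simulation) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The logistic firing probability and its sign convention are assumed. The Ornstein-Uhlenbeck latent dynamics and normally distributed couplings follow what is mentioned elsewhere in this exchange. The parameter values are chosen only so that the example runs quickly and are not the values from the paper's tables. The function names `simulate_latent_model` and `avalanche_sizes` are hypothetical.

```python
import numpy as np

def simulate_latent_model(n_cells=128, n_latent=5, n_steps=20000,
                          tau=1000.0, eps=8.0, eta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Random coupling of every cell to every latent variable.
    J = rng.normal(size=(n_cells, n_latent))
    # Discrete-time Ornstein-Uhlenbeck latent variables with unit
    # stationary variance and correlation time tau (in bins).
    a = np.exp(-1.0 / tau)
    noise = rng.normal(size=(n_steps, n_latent)) * np.sqrt(1.0 - a * a)
    h = np.empty((n_steps, n_latent))
    h[0] = rng.normal(size=n_latent)
    for step in range(1, n_steps):
        h[step] = a * h[step - 1] + noise[step]
    # Assumed form: cells are conditionally independent given h, and a
    # larger eps lowers the activity level.
    drive = eta * (h @ J.T) - eps
    p = 1.0 / (1.0 + np.exp(-drive))
    spikes = rng.random(p.shape) < p
    return spikes, h

def avalanche_sizes(spikes):
    # An avalanche is a run of consecutive non-silent bins; its size is
    # the total spike count in the run. A run still open at the end of
    # the recording is discarded.
    activity = spikes.sum(axis=1)
    sizes, current = [], 0
    for count in activity:
        if count > 0:
            current += count
        elif current > 0:
            sizes.append(current)
            current = 0
    return np.array(sizes)

spikes, h = simulate_latent_model()
sizes = avalanche_sizes(spikes)
```

      The only source of correlation between cells in this sketch is the shared, slowly varying input `h`, which is the sense in which the neurons are "non-interacting".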

      We have revised the Results (“Avalanche scaling in a dynamical latent variable model,” first paragraph) to justify the choice of the model:

      "We study a model of a population of neurons that are not coupled to each other directly but are driven by a small number of dynamical latent variables -- that is, slowly changing inputs that are not themselves measured (Fig. 1A). We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from large fluctuations in local recurrent dynamics. The model was chosen for its simplicity, and because we have previously shown that this model with at least about five latent variables can produce power laws under the coarse-graining analysis (Morrell et al. 2021)."

      We have added the following to the beginning of the Methods section expanding on the reasons for this choice:

      "We study a model from Morrell 2021, originally constructed as a model of large populations of neurons in mouse hippocampus. Neurons are non-interacting, receiving inputs reflective of place-field selectivity as well as input current arising from a random projection from a small number of dynamical latent variables, representing inputs shared across the population of neurons that are not directly measured or controlled. In the current paper, we incorporate only the latent variables (no place variables), and we assume that every cell is coupled to every latent variable with some randomly drawn coupling strength."

      4) The Methods section (paragraph starting on line 340) connects the time scale to actual time scales in neuronal systems, stating that "The timescales of latent variables examined range from about 3 seconds to 3000 seconds, assuming 3-ms bins". While bins of 3 ms are relevant for electrophysiological data from LFPs or high-density EEG/MEG, time scales above 10 seconds are difficult to generate through biophysically clear processes like ionic channels and synaptic transmission. The paper suggests that slow time scales of the latent variables are crucial for obtaining power law behavior resembling criticality. Yet, one way to generate such slow time scales is via critical slowing down, implying that some brain areas providing input to the network under study may operate near criticality. This pushes the problem toward explaining the criticality of those external networks. Hence, discussing potential sources for slow time scales in latent variables is crucial. One possibility you might want to consider is sources external to the organism, which could easily have time scales in the 1-24 hour range.

      As the reviewers note, it is a possibility that slow timescales arise from some other brain area in which dynamics are slow due to critical dynamics, but many other plausible sources exist. These include slowly varying sensory stimuli or external sources, as suggested by the reviewers. It is also possible to generate “effective” slow dynamics from non-critical internal sources. One example, from recordings in awake mice, is the slow change in the level of arousal that occurs on the scale of many seconds to minutes. These changes arise from release of neuromodulators that have broad effects on neural populations and correlations in activity (for a focused review, see Poulet and Crochet, 2019).

      We have added the following sentence to the Methods section where the timescales of the latent variables are discussed:

      "The timescales of latent variables examined range from about $3$ seconds to $3000$ seconds, assuming $3$-ms bins. Inputs with such timescales may arise from external sources, such as sensory stimuli, or from internal sources, such as changes in physiological state."

      5) It is common in neuronal avalanche analysis to calculate the branching parameter using the ratio of events in consecutive bins. Near-critical systems should display values close to 1, especially in simulations without subsampling. Including the estimated values of the branching parameter for the different cases investigated in this study could provide more comprehensive data. While the paper acknowledges that the obtained exponents in the model differ from those in a critical branching process, it would still be beneficial to offer the branching parameter of the observed avalanches for comparison.

      The reviewers requested that the branching parameter be computed in our model. We point out that, for the quasi-stationary latent variables (as in Fig. 3), a branching parameter of 1 is expected because the summed activity at time t+k is, on average, equal to the summed activity at time t, regardless of k. Numerics are consistent with this expectation. Following the methodology for an unbiased estimate of the branching parameter from Wilting and Priesemann (2018), we checked an example set of parameters (epsilon = 8, eta = 3) for quasi-stationary latent fields. We found that the naïve (biased) estimate of the branching parameter was 0.94, and that the unbiased estimate was exp(−1.4⋅10^(−8)) ≈ 0.999999986.

      For faster time scales, it is no longer true that summed activity is constant over time, as the temporal correlations in activity decay exponentially. Using the five-field simulation from Figure 2, we calculated the branching parameter for several values of tau. The biased estimates of m are 0.76 (𝜏=50), 0.79 (𝜏=500), and 0.79 (𝜏=5000). The corrected estimates are 0.98 (𝜏=50), 0.998 (𝜏=500), and 0.9998 (𝜏=5000).
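
      The difference between a naive and a corrected estimate of the branching parameter can be illustrated with a short sketch. It is a simplified illustration in the spirit of the multistep-regression approach of Wilting and Priesemann (2018), not their reference implementation and not the authors' analysis. The driven Poisson branching process, the binomial subsampling, the log-linear fit, and all parameter values are assumptions made here.

```python
import numpy as np

def simulate_branching(m, drive, n_steps, rng):
    # Driven branching process: each active unit triggers on average m
    # units in the next bin, plus Poisson external input.
    A = np.empty(n_steps, dtype=np.int64)
    A[0] = int(drive / (1.0 - m))
    for step in range(1, n_steps):
        A[step] = rng.poisson(m * A[step - 1] + drive)
    return A

def lagged_slope(a, k):
    # Slope of the linear regression of a[t+k] on a[t].
    x_past, x_future = a[:-k], a[k:]
    return np.cov(x_past, x_future)[0, 1] / np.var(x_past, ddof=1)

def naive_m(a):
    # Naive estimate: the lag-1 regression slope.
    return float(lagged_slope(a, 1))

def multistep_m(a, k_max=10):
    # Under subsampling the lag-k slope behaves as b * m**k with b < 1,
    # so m is recovered from the decay across lags, not from lag 1.
    ks = np.arange(1, k_max + 1)
    r = np.array([lagged_slope(a, k) for k in ks])
    slope, _ = np.polyfit(ks, np.log(r), 1)
    return float(np.exp(slope))

rng = np.random.default_rng(0)
full = simulate_branching(m=0.9, drive=10.0, n_steps=100_000, rng=rng)
sub = rng.binomial(full, 0.1)  # observe each event with probability 0.1

naive_full = naive_m(full)
naive_sub = naive_m(sub)
corrected_sub = multistep_m(sub)
```

      In this toy setting the naive estimate is close to the true value for the fully sampled series but is biased downward once the series is subsampled, while the multistep fit stays close to the true value. This shows one known source of such bias and is not a claim about the source of the bias in the authors' simulations.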

      6) In the Discussion (l 269), the paper suggests potential differences between networks cultured in vitro and in vivo. While significant differences indeed exist, it's worth noting that exponents consistent with a critical branching process have also been observed in vivo (Petermann et al 2009; Hahn et al. 2010), as well as in large-scale human data.

      We thank the reviewers for pointing out these studies, and we have added the missing one (Hahn et al. 2010) to our reference list. The following was added to the discussion, in the section “Explaining Experimental Exponents:”

      "A subset of the in vivo recordings analyzed from anesthetized cat (Hahn et al. 2010) and macaque monkeys (Petermann et al. 2009) exhibited a size distribution exponent close to 1.5."

      Along these lines, we noted two additional studies of high relevance that have been published since our initial submission (Capek et al. 2023, Lombardi et al. 2023), and we have added these references to the discussion of experimental exponents.

      Minor comments:

      1) The term 'latent variable' should be rigorously explained, as it is likely to be unfamiliar to some readers.

      Sentences and clauses have been added to the Introduction, Results and the Methods to clarify the term:

      Intro: “Numerous studies have reported relatively low-dimensional structure in the activity of large populations of neurons [refs], which can be modeled by a population of neurons that are broadly and heterogeneously coupled to multiple dynamical latent (i.e., unobserved) variables.”

      Results: “We studied a population of neurons that are not coupled to each other directly but are driven by a small number of dynamical latent variables -- that is, slowly changing inputs that are not themselves measured.”

      Methods: “Neurons are non-interacting, receiving inputs reflective of place-field selectivity as well as input current reflecting a random projection from a small number of dynamical latent variables, representing inputs shared across the population of neurons that are not directly measured.”

      2) There's a relatively important typo in the equations: Eq. 2 and Eq. 6 differ by a minus sign in the exponent. Eqs. 3 and 4 use the plus sign, but epsilon_0 on line 198 uses the minus sign. All very confusing until we figured out what was going on. But easy to fix.

      Thank you for catching this. We have made the following corrections:

      1) Figures adopted the sign convention that epsilon > 0, with larger values of epsilon decreasing the activity level. Signs in Eqs. 3 and 4 have been corrected to match.

      2) Equation 5 was missing a minus sign in front of the Hamiltonian. Restoring this minus sign fixed the discrepancy between 2 and 6.

      3) In Eq. 7, the left hand side is zeta'/zeta', which is equal to 1. Maybe it should be zeta'/zeta?

      Fixed, thank you.

      Additional comments:

      The authors are free to ignore these; they are meant to improve the paper.

      We are extremely grateful for the close reading of our paper and note the actions taken below.

      1) We personally would not use the abbreviation DLV; we find abbreviations extremely hard to remember. And DLV is not used that often.

      Done, thank you for the suggestion.

      2) l 198: epsilon_0 = -log(2^{1/N}-1) was kind of hard to picture -- we had to do a little algebra to make sense of it. Why not write e^{-epsilon_0} = 2^{1/N}-1 \approx log(2)/N, which in turn implies that epsilon_0 ~ log(N)?

      Thank you, good point. We have added a sentence now to better explain:

      "...which is maximized at $\epsilon_0 = - \log (2^{1/N} - 1)$, independent of $J_i$ and $\eta$. After some algebra, we find that $\epsilon_0 \sim \log N$ for large $N$."
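
      The algebra behind the large-$N$ statement can be written out explicitly. The following is a short sketch that uses only the quoted expression for $\epsilon_0$ and a first-order expansion of the exponential for large $N$:

```latex
\begin{align}
e^{-\epsilon_0} &= 2^{1/N} - 1 = e^{(\ln 2)/N} - 1 \approx \frac{\ln 2}{N}
  \qquad (N \gg 1), \\
\epsilon_0 &\approx \ln N - \ln(\ln 2) \approx \ln N + 0.37 .
\end{align}
```

      The additive constant does not depend on $N$, so $\epsilon_0 \sim \log N$ to leading order.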

      3) Typo on l 202: "We plot P_ava as a function of epsilon in Fig. 4B". 4B --> 4D.

      Done

      4) It would be easier on the reader if the tables were all in one place. It would be even nicer to put the parameters in the figure captions. Or at least N; that one is kind of important.

      Table placement was a Latex issue, which we have now fixed. We also have included links between tables and relevant figures and indicated network size.

      5) What's x_i in Eqs. 7 and 8?

      We added a sentence of explanation. These are the individual observations of avalanche sizes or durations, depending on what is being fit.

      6) The latent variables evolve according to an Ornstein-Uhlenbeck process. But we might equally expect oscillations or non-normal behavior coupling dynamical modes, and these are likely to give different behavior with respect to avalanches. It might be worth commenting on this.

      7) The model assumes a normal distribution of the coupling strengths between the latent variables and the binary units. Discussing the potential effects of different types of random coupling could provide interesting insights.

      Both 6 and 7 are interesting questions. At this point, we could speculate that the main results would be qualitatively unchanged, provided dynamics are sufficiently slow and that the distribution of coupling strengths is sufficiently broad (that is, there is variance in the coupling matrix across individual neurons). Further studies would be needed to make these statements more precise.

      8) In Fig 1, tau_f = 1E4 whereas in Fig 2 tau_f = 5E3. Why the difference?

      For Figure 1, we chose a set of parameters that gave clear scaling. In Figure 2, we saw some value in showing more than one example of scaling, hence the different parameters for the examples in Fig. 2 than in Fig. 1. Note that the Fig. 1 simulations are represented in Fig. 2 G-J, as the 5-field simulation with tau_f = 1e4.

  2. Jan 2024
    1. Author Response

      eLife assessment

      This study presents a valuable finding on a new role of Foxp3+ regulatory T cells in sensory perception, which may have an impact on our understanding of somatosensory perception. The authors identified a previously unappreciated action of enkephalins released by immune cells in the resolution of pain and several upstream signals that can regulate the expression of the proenkephalin gene PENK in Foxp3+ Tregs. However, whereas the generation of transgenic mice with conditional deletion of PENK in Foxp3+ cells and PENK fate-mapping is novel and generates compelling data, the analysis of Tregs in the control and transgenic mice is incomplete, and the study lacks proper tamoxifen controls and an examination of the role of PENK+ skin T cells to further support the hypothesis. Nonetheless, the study would be of interest to biologists working in the fields of neuroimmunology and inflammation.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors explore mechanisms through which T-regs attenuate acute pain using a heat sensitivity paradigm. Analysis of available transcriptomic data revealed expression of the proenkephalin (Penk) gene in T-regs. The authors explore the contribution of T-reg Penk in the resolution of heat sensitivity.

      Strengths:

      Investigating the potential role of T-reg Penk in the resolution of acute pain is a strength.

      Weaknesses:

      The overall experimental design is superficial and lacks sufficient rigor to draw any meaningful conclusions.

      For instance:

      1) There were no TAM controls. What is the evidence that TAM does not alter heat-sensitive receptors?

      Author response: Comparing panels A and C, heat sensitivity in controls (blue dots) appears slightly different before and after TMX administration, suggesting that heat-sensitive receptors are moderately altered by TMX per se. However, heat sensitivity is increased twofold in KO animals. Thus, a possible effect of TAM on heat receptors is not responsible for the heat hyperalgesia seen in KO animals, as shown in figures 4 and S3.

      2) There are no controls demonstrating that recombination actually occurred. How do the authors know a single dose of TAM is sufficient?

      Author response: These experiments are in progress. Specificity of the deletion will be presented in an updated version of the manuscript in the near future.

      3) Why was only heat sensitivity assessed? The behavioral tests are inadequate to derive any meaningful conclusions. Further, why wasn't the behavioral data plotted longitudinally?

      Author response: We respectfully point the reviewer to figure S3, where the longitudinal data are presented. New behavioral tests are being performed. The results will be presented in a revised version.

      Reviewer #2 (Public Review):

      Summary:

      The present study addresses the role of enkephalins, which are specifically expressed by regulatory T cells (Treg), in sensory perception in mice. The authors used a combination of transcriptomic databases available online to characterize the molecular signature of Treg. The proenkephalin gene Penk is among the most enriched transcripts, suggesting that Treg plays an analgesic role through the release of endogenous opioids. In addition, in silico analysis suggests that Penk is regulated by the TNFR superfamily; this being experimentally confirmed. Using flow cytometry analysis, the authors then show that Penk is mostly expressed in Treg of the skin and colon, compared to other immune cells. Finally, genetic conditional excision of Penk, selectively in Treg, results in heat hypersensitivity, as assessed by behavior analysis.

      Strengths:

      The manuscript is clear and reveals a previously unappreciated role of enkephalins, as released by immune cells, in sensory perception. The rationale in this manuscript is easy to follow, and conclusions are well supported by data.

      Weaknesses:

      The sensory deficit of Penk cKO appears to be quite limited compared to control littermates.

      Reviewer #3 (Public Review):

      Summary:

      Aubert et al investigated the role of PENK in regulatory T cells. Through the mining of publicly available transcriptome data, the authors confirmed that PENK expression is selectively enriched in regulatory but not conventional T cells. Further data mining suggested that OX40, 4-1BB as well as BATF, can regulate PENK expression in Tregs. The authors generated fate-mapping mice to confirm selective PENK expression in Tregs and activated effector T cells in the colon and spleen. Interestingly, transgenic mice with conditional deletion of PENK in Tregs resulted in hypersensitivity to heat, which the authors attributed to heat hyperalgesia.

      Strengths:

      The generation of transgenic mice with conditional deletion of PENK in foxp3 and PENK fate-mapping is novel and can potentially yield significant findings. The identification of upstream signals that regulate PENK is interesting but unlikely to be the main reason why PENK is predominantly expressed in Tregs as both BATF and TNFR are expressed in effector T cells.

      Weaknesses:

      There is a lack of direct evidence and detailed analysis of Tregs in the control and transgenic mice to support the authors' hypothesis. PENK was previously reported to be expressed in skin Tregs and play a significant role in regulating skin homeostasis: this should be considered as an alternative mechanism that may explain the changed sensitivity to heat observed in the paper.

      Author response: Supplementary figures are being prepared and new results are being collected to show that the KO does not perturb immune and/or skin homeostasis at the time of the experiments. These will be presented in a revised version.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Response to Reviewer Comments:

      We thank the editors and reviewers for their careful consideration of our revised manuscript. Reviewers 2 and 3 indicated that their previous comments had been satisfactorily addressed by our revisions. Reviewer 1 raised several points, and our point-by-point responses can be found below.

      Reviewer #1 (Recommendations For The Authors):

      1) Please clarify the terminology of spontaneous recovery in your study.

      Rescorla RA 2004 ( http://www.learnmem.org/cgi/doi/10.1101/lm.77504.) defines spontaneous recovery as follows: "with the passage of time following nonreinforcement, there is some 'spontaneous recovery' of the initially learned behavior." So in this study, I thought Test 2 measures spontaneous recovery while Test 1 is an extinction test, as in most studies. But the authors seem to define spontaneous recovery as the change from the last trial of Extinction 3 to the first trial of Test 1, which is confusing to me.

      We agree with the reviewer (and Rescorla, 2004) that spontaneous recovery is defined as the return of the initially learned behaviour after the passage of time. In our study, Test 1 is conducted 24-hours after the final extinction session (Extinction 3) and in our view, the return of responding following that 24-hour delay can be considered spontaneous recovery. Rescorla (2004 and elsewhere) also points out that the magnitude of spontaneous recovery may be greater with larger delays between extinction and testing. This in part motivated our second test 7 days following the last extinction session with optogenetic manipulation. We did not find evidence of greater spontaneous recovery in the test 7 days later, however, the additional extinction trials in Test 1 may have reduced the opportunity to detect such an effect.

      2) Why are E6-8 plots of Offset group in Figure 3E and F different?

      We apologise for this error and have corrected it. This was an artifact of an older version of the figure before final exclusions. The E6-8 data is now the same for panels 2E and 2F.

      3) Related to 2, please clarify what type of data are shown in Figure 3E, F and Figure 5H, I. If they are averages, please add error bars. Also, it's hard to see the statistical significance in the current figure style.

      The data in these panels are the mean lever presses per trial as labeled on the y-axis of the figures. In our view, in this instance, error bars (or lines and other markers of significance) detract from the visual clarity of the figure. The statistical approach and outcomes are included in the figure legend and when presented alongside the figure in the final version of the paper should directly clarify these points.

      Reviewer #2 (Recommendations For The Authors):

      The authors have addressed my previous comments to my satisfaction.

      Reviewer #3 (Recommendations For The Authors):

      The authors have adequately addressed each of the points raised in my original review. The paper will make a nice contribution to the field.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      • It would be interesting if the authors would do calcium imaging or electrophysiology from LC-NA neurons during appetitive extinction.

      Indeed these are interesting ideas. We have plans to pursue them but ongoing work is not yet ready for publication.

      • LC-NA neuronal responses during the omission period seem to be important for appetitive extinction as described in the manuscript (Park et al., 2013; Sara et al., 1994; Su & Cohen 2022). It would be nice to activate/inactivate LC-NA neurons during the omission period.

      Optogenetic manipulation was given for the duration of the stimulus (20 seconds; when reward should be expected contingent upon performance of the instrumental response). We believe the reviewer is suggesting briefer manipulation only at the precise time the pellet would have been expected but omitted. If so, this is complex to implement: because animals were trained on random ratio schedules, the time at which pellets were earned was variable, so it is difficult to know when the animal experiences “omission” with better temporal specificity than in the current experiments. We agree with the reviewer, however, that now that we have seen an effect of LC manipulation, future studies could alter the behavioral task so that the timing of reward is consistent (e.g., train the animals with fixed ratio schedules or continuous reinforcement, or use a Pavlovian paradigm). A reasonable assertion could then be made about when the outcome should occur, and thus when its absence would be detected, and the manipulation could be given at that time to address this point.

      • Does LC-NA optoinhibition affect the expression of the conditioned response (the lever presses at early trials of Extinction 1)? It's hard to see this from the average of all trials.

      The eNpHR group responded numerically less overall during extinction. This effect appears greatest in the first extinction session, but fails to reach statistical significance [F(1,15)= 3.512, p=0.081]. Likewise, analysis of the trial by trial data for the first extinction session failed to reveal any group differences [F(1,15)= 3.512, p=0.081] or interaction [trial x group; F(1,15)=0.550, p=0.470].

      Comparison of responding in the first trial also failed to reveal group differences [F(1,15)=1.209, p=0.289]. Thus, while there is a trend in the data, this is not borne out by the statistical analysis, even in early trials of the session.

      • While the authors manipulate global LC-NA neurons, many people find the heterogeneous populations in the LC. It would be great if the authors could identify the subpopulation responsible for appetitive extinction.

      We agree that it would be exciting to test whether specific subpopulation(s) of cells or pathway(s) are responsible for appetitive extinction, and to identify them. While related work has found that discrete populations of LC neurons mediate different behaviours and states, and may even have opposing effects, our initial goal was to determine whether the LC was involved in appetitive extinction learning. These are certainly ideas we hope to pursue in future work.

      Minor:

      • Why do the authors choose 10Hz stimulation?

      The stimulation parameters were based on previously published work. We have added these citations to the manuscript.

      Quinlan MAL, Strong VM, Skinner DM, Martin GM, Harley CW, Walling SG. Locus Coeruleus Optogenetic Light Activation Induces Long-Term Potentiation of Perforant Path Population Spike Amplitude in Rat Dentate Gyrus. Front Syst Neurosci. 2019 Jan 9;12:67. doi: 10.3389/fnsys.2018.00067. PMID: 30687027; PMCID: PMC6333706.

      Glennon E, Carcea I, Martins ARO, Multani J, Shehu I, Svirsky MA, Froemke RC. Locus coeruleus activation accelerates perceptual learning. Brain Res. 2019 Apr 15;1709:39-49. doi: 10.1016/j.brainres.2018.05.048. Epub 2018 May 31. PMID: 29859972; PMCID: PMC6274624.

      Vazey EM, Moorman DE, Aston-Jones G. Phasic locus coeruleus activity regulates cortical encoding of salience information. Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):E9439-E9448. doi: 10.1073/pnas.1803716115. Epub 2018 Sep 19. PMID: 30232259; PMCID: PMC6176602.

      • The authors should describe the behavior task before explaining Fig1e-g results.

      We agree that introducing the task earlier would improve clarity and have added a brief summary of the task at the beginning of the results section (before reference to Figure 1) and point the reader to the schematics that summarize training for each experiment (Figures 2A and 4D).

      NOTE R2 includes specific comments in their Public review. We have considered those as their recommendations and address them here.

      1) In such discrimination training, Pavlovian (CS-Food) and instrumental (LeverPress-Food) contingencies are intermixed. It would therefore be very interesting if the authors provided evidence of other behavioural responses (e.g. magazine visits) during extinction training and tests.

      In a discriminated operant procedure, the DS (e.g. clicker) indicates when the instrumental response will be reinforced (e.g., lever-pressing is reinforced only when the stimulus is present, and not when the stimulus is absent). This is distinct from something like a Pavlovian-instrumental transfer procedure, and so we wish to clarify that there is no Pavlovian phase where the stimuli are directly paired with food. After a successful lever-press the rat must enter the magazine to collect the food, but food is only delivered contingent upon lever-pressing, and so magazine entries here are not a clear indicator of Pavlovian learning as they may be in other paradigms.

      Nonetheless, we have compiled magazine entry data which although not fully independent of the lever-press response in this paradigm, still tells us something about the animals’ expectation regarding reward delivery.

      For the ChR2 experiment, largely paralleling the results seen in the lever-press data, there were no group differences in magazine responses at the end of training [F(2,40)=2.442, p=0.100].

      Responding decreased across days of extinction (when optogenetic stimulation was given) [F(2, 80)=38.070, p<0.001], but there was no effect of group [F(2,40)=0.801, p=0.456] and no interaction between day and group [F(4,40)=1.461, p=0.222]. Although a similar pattern is seen in the test data, group differences were not statistically different in the first [F(2,40)=2.352, p=0.108] or second [F(2,40)=1.900, p=0.166] tests, perhaps because magazine responses were quite low. Thus, overall, magazine data do not present a different picture than lever-pressing, but because of the lack of statistical effects during testing, we have chosen not to include these data in the manuscript.

      For the eNpHR experiment, again a similar pattern to lever-pressing was seen. There were no group differences at the end of acquisition [F(1,15)=0.290, p=0.598]. Responding decreased across days of extinction [F(2, 30)=4.775, p=0.016] but there was no main effect of group [F(1,15)=1.188, p=0.293], and no interaction between extinction and group [F(2,30)=0.070, p=0.932]. There were no group differences in the number of magazine entries in Test 1 [F(1,15)=1.378, p=0.259] or Test 2 [F(1,15)=0.319, p=0.580].

      Author response image 1.

      Author response image 2.

      2) In Figure 1, the authors show the behavioural data of the different groups of control animals which were later collapsed in a single control group. It would be very nice if the authors could provide the data for each step of the discrimination training.

      We are a little confused by this comment. Figure 1, panels E, F, and G show the different control groups at the end of training, for each day of extinction (when manipulations occurred) and for each test, respectively. It’s not clear if there is an additional step the reviewer is interested in? We note neural manipulation only occurred during extinction sessions.

      We chose to compare the control groups initially and, finding no differences, to collapse them for subsequent analyses, as this simplifies the statistical analysis substantially: when group differences are found, each of the subgroups has to be investigated, and keeping the controls separate means there are 5 groups instead of 3. Collapsing does not change the story, because we confirmed that there were no differences between the controls before collapsing them, but it makes the presentation of the statistical data much shorter and easier to follow.

      3) Inspection of Figures 2C & 2D shows that responding in control animals is about the same at test 2 as at the end of extinction training. Therefore, could the authors provide evidence for spontaneous recovery in control animals? This is of importance given that the main conclusion of the authors is that LC stimulation during extinction training led to an increased expression of extinction memory as expressed by reduced spontaneous recovery.

      To address this we have added analyses of trial data, specifically comparison of the final 3 trials of extinction to the subsequent three trials of each test. These analyses are included on page 5 of the manuscript and additional data figures can be found as panels 2E and 2F and pasted below.

      What we observe in the trial data for controls is an increase in responding from the end of extinction to the beginning of each test, thus demonstrating spontaneous recovery. Importantly, responding in the ChR2 group does not increase from the end of extinction to the beginning of the test, illustrating that LC stimulation during extinction prevents spontaneous recovery.

      Comparison of the final three trials of Extinction to the three trials of Test 1:

      Author response image 3.

      Comparison of the final three trials of Extinction to the three trials of Test 2:

      Author response image 4.

      Halorhodopsin Experiment Tests 1 and 2, respectively.

      Author response image 5.

      4) Current evidence suggests that there are differences in LC/NA system functioning between males and females. Could the authors provide details about the allocation of male and female animals in each group?

      More females had surgical complications (excess bleeding) than males, resulting in the following allocations: control group, 14 males and 8 females; ChR2 group, 8 males and 7 females; Offset group, 6 males.

      In our dataset, we did not detect sex differences in training [no main effect of sex: F(1,38)=1.097, p=0.302; no sex x group interaction: F(1,38)=1.825, p=0.185], extinction [no effect of sex: F(1,38)=0.370, p=0.547; no sex x extinction interaction: F(2,76)=0.701, p=0.499; no sex x extinction x group interaction: F(2,76)=2.223, p=0.115] or testing [Test 1, no effect of sex: F(1,38)=1.734, p=0.196; no sex x group interaction: F(1,38)=0.009, p=0.924; Test 2, no effect of sex: F(1,38)=0.661, p=0.421; no sex x group interaction: F(1,38)=0.566, p=0.456].

      5) The histology section in both experiments looks a bit unsatisfying. Could the authors provide more details about the number of counted cells and also their distribution along the anteroposterior extent of the LC. Could the authors also take into account the sex in such an analysis?

      The antero-posterior coordinates used for cell counts and calculation of % infection rates were between -9.68 and -10.04 (Paxinos and Watson, 2007, 6th Edition) as infection rates were most consistent in this region and it was well-positioned relative to the optic probe although TH and mCherry positive cells were observed both rostral and caudal to this area. For each animal, an average of ~116+/- 25 TH-positive LC neurons as determined by DAPI and GFP positive cells were identified. Viral expression was identified by colocalized mCherry staining. Animals that did not have viral expression in the LC were not included in the experimental groups. We have added these details to the histology results on page 4.

      Males and females showed very similar infection rates (Males, 74%; Females, 72%). While sex differences, such as total number of LC cells or total LC volume have been reported (Guillamon, A. et al. 2005), Garcia-Falgueras et al. (2005) reported no differences in LC volume or number of LC neurons between male and female Long-Evans rats. So while differences may exist in the LC of Long-Evans rats, the cell counts here were comparable between groups (males, 103 +/- 27; females, 129 +/- 17; t-test, p>0.05).

      References:

      1) Garcia-Falgueras, A., Pinos, H., Collado, P., Pasaro, E., Fernandez, R., Segovia, S., & Guillamon, A. (2005). The expression of brain sexual dimorphism in artificial selection of rat strains. Brain Research, 1052(2), 130–138. https://doi.org/10.1016/j.brainres.2005.05.066

      2) Guillamon, A., De Blas, M. R., & Segovia, S. (1988). Effects of sex steroids on the development of the locus coeruleus in the rat. Developmental Brain Research, 40, 306–310.

      Reviewer #3 (Recommendations For The Authors):

      MAJOR

      1) It is worth noting that responding in Group ChR2 decreased from Extinction 3 to Test 1, while responding in the other two groups appears to have remained the same. This suggests that there was no spontaneous recovery of responding in the controls; and, as such, something more must be said about the basis of the between-group differences in responding at test. This is particularly important as each extinction session involved eight presentations of the to-be-tested stimulus, whereas the test itself consisted of just three stimulus presentations. Hence, comparing the mean levels of performance to the stimulus across its extinction and testing overestimates the true magnitude of spontaneous recovery, which is simply not clear in the results of this study. That is, it is not clear that there is any spontaneous recovery at all and, therefore, that the basis of the difference between Group ChR2 and controls at test is in terms of spontaneous recovery.

      The reviewer is correct that there were a different number of trials in extinction vs. test sessions, making direct comparison difficult, and that displaying the data as averages of the test session does not demonstrate spontaneous recovery per se. To address this we have added analyses of trial data and comparison of the final 3 trials of extinction to the subsequent three trials of each test. These analyses are included on pages 5 and 6 of the manuscript and additional data figures can be found as panels 2E and 2F and 4H and 4I, and pasted below.

      What we observe in the trial data for controls is an increase in responding from the end of extinction to the beginning of each test, thus demonstrating spontaneous recovery. Importantly, responding in the ChR2 group does not increase from the end of extinction to the beginning of the test, illustrating that LC stimulation during extinction prevents spontaneous recovery.

      Comparison of the final three trials of Extinction to the three trials of Test 1:

      Author response image 6.

      Comparison of the final three trials of Extinction to the three trials of Test 2:

      Author response image 7.

      Halorhodopsin Experiment Tests 1 and 2, respectively.

      Author response image 8.

      2a) Did the manipulations have any effect on the rates of lever-pressing outside of the stimulus?

      We did not detect any effect of the optogenetic manipulations on rates of lever pressing outside of the stimulus. This is demonstrated in the pre-CS intervals collected on stimulation days (i.e., extinction sessions) where we see similar response rates between controls and the ChR2 and Offset groups as shown below. There was no effect of group [F(2,40)=0.156, p=0.856] or group x extinction day interaction [F(2,40)=0.146, p=0.865].

      Author response image 9.

      2b) Did the manipulations have any effect on rates of magazine entry either during or after the stimulus?

      For the ChR2 experiment, there were no group differences in magazine responses at the end of training [F(2,40)=2.442, p=0.100]. Responding decreased across days of extinction (when optogenetic stimulation was given) [F(2, 80)=38.070, p<0.001], but there was no effect of group [F(2,40)=0.801, p=0.456] and no interaction between day and group [F(4,40)=1.461, p=0.222]. Although a similar pattern is seen in the test data, group differences were not statistically different in the first [F(2,40)=2.352, p=0.108] or second [F(2,40)=1.900, p=0.166] tests, perhaps because magazine responses were quite low. Thus, overall, magazine data do not present a different picture than lever-pressing, but because of the lack of statistical effects during testing, we have chosen not to include these data in the manuscript.

      For the eNpHR experiment, again a similar pattern to lever-pressing was seen. There were no group differences at the end of acquisition [F(1,15)=0.290, p=0.598]. Responding decreased across days of extinction [F(2, 30)=4.775, p=0.016] but there was no main effect of group [F(1,15)=1.188, p=0.293], and no interaction between extinction and group [F(2,30)=0.070, p=0.932]. There were no group differences in the number of magazine entries in Test 1 [F(1,15)=1.378, p=0.259] or Test 2 [F(1,15)=0.319, p=0.580].

      Author response image 10.

      Author response image 11.

      2c) Did the manipulations affect the coupling of lever-press and magazine entry responses? I imagine that, after training, the lever-press and magazine entry responses are coupled: rats only visit the magazine after having made a lever-press response (or some number of lever-press responses). Stimulating the LC clearly had no acute effect on the performance of the lever-press response. If it also had no effect on the total number of magazine entries performed during the stimulus, it would be interesting to know whether the coupling of lever-presses and magazine entries had been disturbed in any way. One could assess this by looking at the joint distribution of lever-presses (or runs of lever-presses) and magazine visits in each extinction session, or across the three sessions of extinction. As a proxy for this, one could look at the average latency to enter the magazine following a lever-press response (or run of lever-presses). Any differences here between the Controls and Group ChR2 would be informative with respect to the effects of the LC manipulations: that is, the results shown in Figure indicate that stimulating the LC has no acute effects on lever-pressing but protects against something like spontaneous recovery; whereas the results shown in Figure 4 indicate that inhibiting the LC facilitates the loss of responding across extinction without protecting against spontaneous recovery. The additional data/analyses suggested here would indicate whether LC stimulation had any acute effects on responding that might explain the protection from spontaneous recovery; and whether LC inhibition specifically reduced lever-pressing across extinction or whether it had equivalent effects on rates of magazine entry.

      Lever-press and magazine response data were collected trial by trial but not with the temporal resolution required for the analyses suggested by the reviewer. We do not have timestamps for magazine entries nor latency data. We can collect this type of data in future studies. At the session or trial level, magazine entries generally correspond to lever-pressing; being trained on ratio schedules, and from informal observation, rats will do several lever-presses and then check the magazine. Rates of each decrease across extinction (magazine data included in response to comment 2b above). Optogenetic manipulation appeared to have no immediate effect on either response during extinction.

      PROCEDURAL

      1) Why were there three discriminative stimuli in acquisition: a light, white noise, and clicker?

      This was done to be consistent with and apply parameters similar to previous, related studies (Rescorla, 2006; Janak & Corbit, 2011) and to allow comparison to potential future studies that may involve stimulus compounds etc. (requiring training of multiple stimuli).

      2) Why were some rats extinguished to the noise while others were extinguished to the clicker? Were the effects of LC stimulation/inhibition dependent on the identity of the extinguished stimulus?

      Because the animals were trained with multiple stimuli, it allowed us some ability to choose amongst those stimuli to best balance response rates across groups before the key manipulations. The effects of LC manipulation did not differ between animals based on the identity of the extinguished stimulus.

      3) Did the acute effects of LC inhibition on extinction vary as a function of the stimulus identity?

      No

      4) Was the ITI in extinction the same as that in acquisition?

      Yes, the ITI was the same for acquisition and extinction sessions (variable, averaging to 90 seconds). We have added a sentence to the methods (p. 11) to reflect this.

      5) For Group Offset, when was the photo-stimulation applied in relation to the extinguished stimulus: was it immediately upon offset of the stimulus or at a later point in the ITI?

      The group label “Offset” was used to be consistent with Uematsu et al. (2017), who delivered stimulation 50-70s after a trial. Similarly, we mean it as discontinuous with the stimulus, not at the termination of the stimulus. We have revised the description of this group on page 11 to clarify the timing of the photostimulation as follows:

      “Animals in the Offset group (and relevant controls) underwent identical training with the exception that stimulation in extinction sessions occurred in the middle of the variable length ITI (45s after stimulus termination, on average).”

      MINOR

      1) "Such recovery phenomena undermine the success of extinction-based therapies..."

      ***Perhaps a different phrasing is needed here: "These phenomena show that extinction-based therapies are not always effective in suppressing an already-established response..."

      We have revised this sentence in line with the reviewer’s suggestion:

      “These phenomena mean that extinction-based therapies are not always successful in suppressing previously-established behaviours” (first paragraph of the introduction).

      2) Typo in para 1 of results: "F(2,19)=0.0.352"

      Thank you for finding this typo. It has been corrected. (p.4)

      3) "As another example of modular functional organization, no improvements to strategy set-shifting following global LC stimulation, but improvements were observed when LC terminals in the medial prefrontal cortex were targeted (Cope et al., 2019)." ***This sentence is missing a "there were" before "no improvements".

      Thank you for finding this error. It has been corrected. (p.8)

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The Roco proteins are a family of GTPases characterized by the conserved presence of an ROC-COR tandem domain. How GTP binding alters the structure and activity of Roco proteins remains unclear. In this study, Galicia C et al. took advantage of conformation-specific nanobodies to trap CtRoco, a bacterial Roco, in an active monomeric state and determined its high-resolution structure by cryo-EM. This study, in combination with the previous inactive dimeric CtRoco, revealed the molecular basis of CtRoco activation through GTP-binding and dimer-to-monomer transition.

      Strengths:

      The reviewer is impressed by the authors' deep understanding of the CtRoco protein. Capturing Roco proteins in a GTP-bound state is a major breakthrough in the mechanistic understanding of the activation mechanism of Roco proteins and shows similarity with the activation mechanism of LRRK2, a key molecule in Parkinson's disease. Furthermore, the methodology the authors used in this manuscript - using conformation-specific nanobodies to trap the active conformation, which is otherwise flexible and resistant to single-particle average - is highly valuable and inspiring.

      Weakness:

      Though written with good clarity, the paper will benefit from some clarifications.

      1) The angular distribution of particles for the 3D reconstructions should be provided (Figure 1 - Sup. 1 & Sup. 2).

      The supplementary figures will be adapted to include particle distribution plots.

      2) The B-factors for protein and ligand of the model, Map sharpening factor, and molprobity score should be provided (Table 1).

      The map used to interpret the model was post-processed by density modification, therefore no sharpening factor was obtained. This information will be included in Table 1, together with B-factors and molprobity scores.

      3) A supplemental Figure to Figure 2B, illustrating how a0-helix interacts with COR-A&LRR before and after GTP binding in atomic details, will be helpful for the readers to understand the critical role of a0-helix during CtRoco activation.

      A supplemental figure will be prepared to illustrate this in the revised document.

      4) For the following statement, "On the other hand, only relatively small changes are observed in the orientation of the Roc a3 helix. This helix, which was previously suggested to be an important element in the activation of LRRK2 (Kalogeropulou et al., 2022), is located at the interface of the Roc and CORB domains and harbors the residues H554 and Y558, orthologous to the LRRK2 PD mutation sites N1337 and R1441, respectively."

      It is not surprising the a3-helix of the ROC domain only has small changes when the ROC domain is aligned (Figure 2E). However, in the study by Zhu et al (DOI: 10.1126/science.adi9926), it was shown that a3-helix has a "see-saw" motion when the COR-B domain is aligned. Is this motion conserved in CtRoco from inactive to active state?

      We indeed describe the conformational changes from the perspective of the Roc domain. When using the COR-B domain for structural alignment, a rotational movement of Roc (including a “seesaw”-like movement of the α3-helix around His554) with respect to COR-B is correspondingly observed. We will include this in the revised document.

      5) A supplemental figure showing the positions of and distances between NbRoco1 K91 and Roc K443, K583, and K611 would help the following statement. "Also multiple crosslinks between the Nbs and CtRoco, as well as between both nanobodies were found. ... NbRoco1-K69 also forms crosslinks with two lysines within the Roc domain (K583 and K611), and NbRoco1-K91 is crosslinked to K583".

      A provisional figure displaying these crosslinks is already provided below, and we will also consider including this in the revised manuscript. However, in interpreting these crosslinks it should be taken into consideration that the additive length of the DSSO spacer and the lysine side chains leads to a theoretical upper limit of ∼26 Å for the distance between the α carbon atoms of cross-linked lysines (and even a cut-off distance of 35 Å when taking into account protein dynamics).

      Author response image 1.
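
      As a hedged illustration of the distance criterion described above, the sketch below classifies a cross-link by the Cα-Cα distance between the two lysines, using the ∼26 Å theoretical limit and the 35 Å cut-off quoted in the response. The coordinates, pair names and function names are hypothetical and are not taken from the CtRoco model; this is not the authors' analysis pipeline.

```python
import math

# Thresholds quoted in the response, as Cα-Cα distances in Å.
THEORETICAL_LIMIT = 26.0   # DSSO spacer plus two lysine side chains
DYNAMICS_CUTOFF = 35.0     # relaxed cut-off allowing for protein dynamics


def ca_distance(a, b):
    """Euclidean distance between two (x, y, z) Cα coordinates in Å."""
    return math.dist(a, b)


def classify_crosslink(distance):
    """Label a cross-link by which distance threshold it satisfies."""
    if distance <= THEORETICAL_LIMIT:
        return "within theoretical limit"
    if distance <= DYNAMICS_CUTOFF:
        return "within dynamics cut-off"
    return "exceeds both cut-offs"


# Hypothetical coordinates, NOT taken from the CtRoco model.
example_pairs = {
    "pair A": ((0.0, 0.0, 0.0), (12.0, 16.0, 0.0)),   # 20 Å apart
    "pair B": ((0.0, 0.0, 0.0), (0.0, 18.0, 24.0)),   # 30 Å apart
    "pair C": ((0.0, 0.0, 0.0), (24.0, 32.0, 0.0)),   # 40 Å apart
}

for name, (a, b) in example_pairs.items():
    d = ca_distance(a, b)
    print(f"{name}: {d:.1f} Å, {classify_crosslink(d)}")
```

      In practice the coordinates would be read from the deposited model. A cross-link that falls between the two thresholds is not necessarily inconsistent with the structure, since it may reflect flexibility in solution.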

      6) It would be informative to show the position of CtRoco-L487 in the NF and GTP-bound state and comment on why this mutation favors GTP hydrolysis.

      We will create an additional figure showing the position of L487, and discuss possible mechanisms for the observed effect of a mutation on GTPase activity.

      Reviewer #2 (Public Review):

      Summary

      The manuscript by Galicia et al describes the structure of the bacterial GTPyS-bound CtRoco protein in the presence of nanobodies. The major relevance of this study is in the fact that the CtRoco protein is a homolog of the human LRRK2 protein with mutations that are associated with Parkinson's disease. The structure and activation mechanisms of these proteins are very complex and not well understood. Especially lacking is a structure of the protein in the GTP-bound state. Previously the authors have shown that two conformational nanobodies can be used to bring/stabilize the protein in a monomer-GTPyS-bound state. In this manuscript, the authors use these nanobodies to obtain the GTPyS-bound structure and importantly discuss their results in the context of the mammalian LRRK2 activation mechanism and mutations leading to Parkinson's disease. The work is well performed and clearly described. In general, the conclusions on the structure are reasonable and well-discussed in the context of the LRRK2 activation mechanism.

      Strengths:

      The strong points are the innovative use of nanobodies to stabilize the otherwise flexible protein and the new GTPyS-bound structure that helps enormously in understanding the activation cycle of these proteins.

      Weakness:

      The strong point of the use of nanobodies is also a potential weak point; these nanobodies may have induced some conformational changes in a part of the protein that will not be present in a GTPyS-bound protein in the absence of nanobodies.

      Two major points need further attention.

      1) Several parts of the protein are very flexible during the monomer-dimer activity cycle. This flexibility is crucial for protein function, but obviously hampers structure resolution. Forced experiments to reduce flexibility may allow better structure resolution, but at the same time may impede the activation cycle. Therefore, careful experiments and interpretation are very critical for this type of work. This especially relates to the influence of the nanobodies on the structure that may not occur during the "normal" monomer-dimer activation cycle in the absence of the nanobodies (see also point 2). So what is the evidence that the nanobody-bound GTPyS-bound state is biochemically a reliable representative of the "normal" GTP-bound state in the absence of nanobodies, and therefore that the obtained structure can be confidently used to interpret the activation mechanism as done in the manuscript?

      See below for an answer to remarks 1 and 2.

      2) The obtained structure with two nanobodies reveals that the nanobodies NbRoco1 and NbRoco2 bind to parts of the protein by which a dimer is impossible, respectively to a0-helix of the linker between Roc-COR and LRR, and to the cavity of the LRR that in the dimer binds to the dimerizing domain CORB. It is likely the open monomer GTP-bound structure is recognized by the nanobodies in the camelid, suggesting that overall the open monomer structure is a true GTP-bound state. However, it is also likely that the binding energy of the nanobody is used to stabilize the monomer structure. It is not automatically obvious that in the details the obtained nanobody-Roco-GTPyS structure will be identical to the "normal" Roco-GTPyS structure. What is the influence of nanobody-binding on the conformation of the domains where they bind; the binding energy may be used to stabilize a conformation that is not present in the absence of the nanobody. For instance, NbRoco1 binds to the a0 helix of the linker; what is here the "normal" active state of the Roco protein, and is e.g. the angle between RocCOR and LRR also rotated by 135 degrees? Furthermore, nanobody NbRoco2 in the LRR domain is expected to stabilize the LRR domain; it may allow a position of the LRR domain relative to the rest of the protein that is not present without nanobody in the LRR domain. I am convinced that the observed open structure is a correct representation of the active state, but many important details have to be supported by e.g. their CX-MS experiments, and in the end probably need confirmation by more structures of other active Roco proteins or confirmation by a more dynamic sampling of the active states by e.g. molecular dynamics or NMR.

      Recently, nanobodies have increasingly been used successfully to obtain structural insights in protein conformational states (reviewed in Uchański et al, Curr. Opin. Struc. Biol. 2020). As Reviewer #2 points out, the concern is sometimes raised that antibodies could distort a protein into non-native conformations. Here, it is important to note that the nanobodies were raised by immunizing a llama with the fully native CtRoco protein bound to a non-hydrolysable GTP analogue, after which the nanobodies were selected by phage display using the same fully native and functional form of the protein. As clearly explained in Manglik et al. Annu Rev Pharmacol Toxicol. 2017, the probability of an in vivo matured nanobody inducing a non-native conformation of the antigen is low, although it is possible that it selects a high-energy, low-population conformation of a dynamic protein. Immature B cells require engagement of displayed antibodies with antigen to proliferate and differentiate during clonal selection. Antibodies that induce non-native conformations of the antigen pay a substantial energetic penalty in this process, and B cell clones displaying such antibodies will have a significantly lower probability of proliferation and differentiation into mature antibody-secreting B lymphocytes. Hence, many recent experiments and observations give credence to the notion that nanobodies bind antigens primarily by conformational selection and not induced fit (e.g. Smirnova et al. PNAS 2015).

      Extrapolated to the case of CtRoco, which is clearly very flexible in its GTP-bound form, this means that the nanobodies are able to trap and stabilize one conformational state that is representative of the “active state” ensemble of the protein. In this respect, it is clear from our experiments (XL-MS, affinity and effect on GTPase activity) that the effects of NbRoco1 and NbRoco2 are additive (or even cooperative), meaning that both nanobodies recognize different features of the same CtRoco “active state”. Correspondingly, the monomeric, elongated “open” conformation is also observed in the structure of CtRoco bound to NbRoco1 only (Figure 1 - supplement 2), albeit that this structure still displays more flexibility. The monomerization and conformational changes that we observe and describe in the current paper at high resolution are also in very good agreement with earlier observations for CtRoco in the GTP-bound form in the absence of any nanobodies, including negative stain EM (Deyaert et al. Nature Commun, 2017), hydrogen-deuterium exchange experiments (Deyaert et al. Biochem. J. 2019) and native MS (Leemans et al. Biochem J. 2020).

      In the revised document we will include some additional text to address and clarify these aspects.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The present study provides a phylogenetic analysis of the size of prefrontal areas in primates, aiming to investigate whether the relative size of the rostral prefrontal cortex (frontal pole) and dorsolateral prefrontal cortex volume vary according to known ecological or social variables.

      I am very much in favor of the general approach taken in this study. Neuroimaging now allows us to obtain more detailed anatomical data in a much larger range of species than ever before and this study shows the questions that can be asked using these types of data. In general, the study is conducted with care, focusing on anatomical precision in definition of the cortical areas and using appropriate statistical techniques, such as PGLS. That said, there are some points where I feel the authors could have taken their care a bit further and, as a result, inform the community even more about what is in their data.

      We thank the reviewer for this globally positive evaluation of our work, and we appreciate the advice to improve our manuscript.

      The introduction sets up the contrast of 'ecological' (mostly foraging) and social variables of a primate's life that can be reflected in the relative size of brain regions. This debate is for a large part a relic of the literature and the authors themselves state in a number of places that perhaps the contrast is a bit artificial. I feel that they could go further in this. Social behavior could easily be a solution to foraging problems, making them variables that are not in competition, but simply different levels of explanation. This point has been made in some of the recent work by Robin Dunbar and Susanne Shultz.

      Thank you for this constructive comment. We acknowledge that the contrast between the social and the ecological brain is relatively marginal here. Based also on the first remark by reviewer 3, we have reformulated the introduction to emphasize what we think is actually more critical: the link between cognitive functions as defined in laboratory conditions and socio-ecological variables measured in natural conditions, and the fact that we use brain measures here as a potential tool to relate these laboratory vs natural variables through a common scenario. Also, we were already mentioning the potential interaction between social and foraging processes in the discussion, but we are happy to add a reference to recent studies by S. Shultz and R. Dunbar (2022), which is indeed directly relevant. We thank the reviewer for pointing out this literature.

      In a similar vein, the hypotheses of relating frontal pole to 'meta-cognition' and dorsolateral PFC to 'working memory' is a dramatic oversimplification of the complexity of cognitive function and does a disservice to the careful approach of the rest of the manuscript.

      We agree that the formulation of which functions we were attributing to the distinct brain regions might not have been clear enough, but the functional relation between frontal pole and metacognition on the one hand, and DLPFC and working memory on the other hand, has been firmly established in the literature, both through laboratory studies and through clinical data. Clearly, no single brain region is necessary and sufficient for any cognitive operation, but decades of neuropsychology have demonstrated the differential implication of distinct brain regions in distinct functions, which is all we mean here. We have made a specific point on that topic in the discussion (cf p. 16). We have also reformulated the introduction to clarify that, even if the relation between these regions and their functions (FP/ metacognition; DLPFC/ working memory) was clear in laboratory conditions, it was not clear whether this mapping could be used for real life conditions. Whether that simplification was somehow justified beyond the lab (and the clinic), and whether these neuro-cognitive concepts could be applied to natural conditions, are therefore critical questions that we wanted to address. The central goal of the present study was precisely to evaluate the extent to which this brain/cognition relation could be used to understand more natural behaviors and functions, and we hope that it appears more clearly now.

      One can also question the predicted relationship between frontal pole meta-cognition and social abilities versus foraging, as Passingham and Wise show in their 2012 book that it is frontal pole size that correlates with learning ability-an argument that they used to relate this part of the brain to foraging abilities. I would strongly suggest the authors refrain from using such descriptive terms. Why not simply use the names of the variables actually showing significant correlations with relative size of the areas?

      We basically agree with the reviewer, and we acknowledge the lack of clarity in the introduction of the previous manuscript. There was indeed a lot of ambiguity in what we were referring to as ‘function’, associated with a given brain region. “Function” referred to way too many things! We have reformulated the introduction not only to clarify the different types of functions that were attributed to distinct brain regions in the literature but also to clarify how this study was addressing the question: by trying to articulate concepts from neuroscience laboratory studies with concepts from behavioral ecology and evolution using intuitive scenarios. We hope that the present version of the introduction makes that point clearer.

      The major methodological judgements in this paper are of course in the delineation of the frontal pole and dorsolateral prefrontal cortex. As I said above, I appreciate how carefully the authors describe their anatomical procedure, allowing researchers to replicate and extend their work. They are also careful not to relate their regions of interest to precise cytoarchitectonic areas, as such a claim would be impossible to make without more evidence. That said, there is a judgement call made in using the principal sulcus as a boundary defining landmark for FP in monkeys and the superior frontal sulcus in apes. I do not believe that these sulci are homologous. Indeed, the authors themselves go on to argue that dorsolateral prefrontal cortex, where studied using cytoarchitecture, stretches to the fundus of principal sulcus in monkeys, but all the way to the inferior frontal sulcus in apes. That means that using the fundus of PS is not a good landmark.

      We thank the reviewer for the kind remarks on our careful descriptions. That said, it is not clear whether our choice of using the principal sulcus as a boundary for FP in monkeys vs the superior frontal sulcus in apes is actually a judgement call. First, and foremost, there is no clear and unambiguous definition of what the boundaries of the FP should be. Cytoarchitectonic maps would provide one, but clearly this is out of reach here. In humans and great apes we used Bludau et al 2014 (i.e. superior frontal sulcus), and in monkeys, we chose a conservative landmark that eliminated area 9, which is traditionally associated with the DLPFC (Petrides, 2005; Petrides et al, 2012; Semendeferi et al, 2001).

      Of course, any definition will attract criticism, so the best solution might be to run the analysis multiple times, using different definitions for the areas, and see how this affects results.

      Indeed, functional maps indicate that the dorsal part of the anterior PFC in monkeys is functionally part of FP. But again, cytoarchitectonic maps also indicate that this part of the brain includes BA 9, which is traditionally associated with DLPFC (Petrides, 2005; Petrides et al, 2012). As already pointed out in the discussion, there is a functional continuum between FP and DLPFC and our goal when using PS as dorsal border was to be very conservative and to exclude the ambiguous area. But we agree with the reviewer that given that this decision is arbitrary, it was worth exploring other definitions of the FP volume. So, we did complete a new analysis with a less conservative definition of the FP, to include this ambiguous dorsal area, and it is now included in the supplementary material. Maybe as expected, including the ambiguous area in the FP volume shifted the relation with socio-ecological variables towards the pattern displayed by the DLPFC (i.e. the influence of population density decreased). The most parsimonious interpretation of these results is that when extending the border of the FP region to cover a part of the brain which might belong to the DLPFC, or which might be somehow functionally intermediate between the two, the specific relation of the FP with socio-ecological variables decreases. Thus, even if we agree that it was important to conduct this analysis, we believe that it only confirms the difficulty of identifying a clear boundary between FP and DLPFC. Again, we have clearly explained throughout the manuscript that we admit the lack of precision in our definitions of the functional brain regions. In that frame, the conservative option seems more appropriate and for the sake of clarity, the results of the additional analysis of a FP volume that includes the ambiguous area are only included in the supplementary material.

      If I understand correctly, the PGLS was run separately for the three brain measures (whole brain, FP, DLPFC). However, given that the measures are so highly correlated, is there an argument for an analysis that allows testing on residuals? In other words, to test effects of relative size of FP and DLPFC over and above brain size?

      Generally, using residuals as “data” (or pseudo-data) is not recommended in statistical analyses. Two widely cited references from the ecological literature are:

      Garcia-Berthou E. (2001) On the Misuse of Residuals in Ecology: Testing Regression Residuals vs. the Analysis of Covariance. Journal of Animal Ecology, 70 (4): 708-711.

      Freckleton RP. (2002). On the misuse of residuals in ecology: regression of residuals vs. multiple regression. Journal of Animal Ecology 71: 542–545. https://doi.org/10.1046/j.1365-2656.2002.00618.x.

      The main reason for this recommendation is that residuals are dependent on the fitted model, and thus on the particular sample under consideration and the eventual significant effects that can be inferred.
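
      A minimal simulation can make this concern concrete. The sketch below compares two ways of estimating the effect of a predictor `x2` on a response `y` when a correlated covariate `x1` is also present: multiple regression, and regressing the residuals of `y ~ x1` on `x2`. All names, sample sizes and coefficients are assumptions chosen for illustration; the data are simulated, ordinary least squares is used with no phylogenetic correction, and this is not the analysis performed in the study.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def simulate(n=5000, rho=0.8, b1=1.0, b2=1.0, seed=0):
    """Simulate y = b1*x1 + b2*x2 + noise, with corr(x1, x2) close to rho."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
    y = [b1 * a + b2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return x1, x2, y


def slope_multiple(x1, x2, y):
    """Coefficient of x2 in the two-predictor OLS fit y ~ x1 + x2."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    return (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 ** 2)


def slope_on_residuals(x1, x2, y):
    """Slope of x2 when the residuals of y ~ x1 are regressed on x2."""
    b = cov(x1, y) / cov(x1, x1)
    a = mean(y) - b * mean(x1)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x1, y)]
    return cov(x2, residuals) / cov(x2, x2)


x1, x2, y = simulate()
print("multiple regression slope for x2:", round(slope_multiple(x1, x2, y), 2))
print("residual regression slope for x2:", round(slope_on_residuals(x1, x2, y), 2))
```

      With correlated predictors, the residual approach is expected to shrink the estimate towards b2 × (1 − rho²), whereas multiple regression should recover a value close to the true b2. Including the covariate directly in the model avoids this bias.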

      In the discussion and introduction, the authors discuss how size of the area is a proxy for number of neurons. However, as shown by Herculano-Houzel, this assumption does not hold across species. Across monkeys and apes, for instance, there is a difference in how many neurons can be packed per volume of brain. There is even earlier work from Semendeferi showing how frontal pole especially shows distinct neuron-to-volume ratios.

      We appreciate the reviewer’s comment, but the references to Herculano-Houzel that we have in mind do indicate that the assumption is legitimate within primates.

      Herculano-Houzel et al (2007) show that the neuronal density of the cortex is well conserved across primate species (but only monkeys were studied). The conclusion of that study is that using volumes as a proxy for number of neurons, as a measure of computational capacity, should be avoided between rodents and primates (and as they showed later, even more so with birds, for which neuronal density is higher). Within primates, however, since neuronal densities are conserved, volume is a good predictor of number of neurons. Gabi et al (2016) provide evidence that the neuronal density of the PFC is well conserved between humans and non-human primates, which implies that including humans and great apes in the comparison is legitimate. In addition, the brain regions included in the analysis presumably include very similar architectonic regions (e.g. BA 10 for FP, BA 9/46 for DLPFC), which also suggests that the neuronal density should be relatively well conserved across species. Altogether, we believe that there is sufficient evidence to support the idea that the volume of a PFC region in primates is a good proxy for the number of neurons in that region, and therefore of its computational capacity.

      Semendeferi and colleagues (2001) pointed out some differences in cytoarchitectonic properties across parts of the FP and discussed how these properties could 1) be used to identify area 10 across species 2) be associated with distinct computational properties, with the idea that thicker ‘cell body free’ layers would leave more space for establishing connections (across dendrites and axons). This pioneering work, together with more recent imaging studies on functional connectivity (e.g. Sallet et al, 2013), emphasizes the critical contribution of connectivity patterns as a tool for comparative anatomy. But unfortunately, as pointed out in the discussion already, this is currently out of reach for us.

      We acknowledge the limitations, and to be fair, the notion of computational capacity itself is hard to define operationally. Based on the work of Herculano-Houzel et al, average density is conserved enough across primates (including humans) to justify our approximation. We have tried to define our regions of interest using both anatomical and functional maps and, thanks to the reviewer’s suggestions, we even tried several ways to segment these regions. Functional maps in macaques and humans do not exactly match cytoarchitectonic maps, presumably because functions rely not only upon the cytoarchitectonics but also on connectivity patterns (e.g. Sallet et al, 2013).

      In sum, we appreciate the reviewer’s point but feel that, given the current understanding of brain functions and the relative conservation of neuronal density across primate PFC regions, the volume of a PFC region seems to be a reasonable proxy for its number of neurons, and therefore its computational capacity. We have added these points to the discussion, and we hope that the reader will be able to get a fair sense of how legitimate that position is, given the literature.

      Overall, I think this is a very valuable approach and the study demonstrates what can now be achieved in evolutionary neuroscience. I do believe that the authors can be even more thorough and precise in their measurements and claims.

      Reviewer #2 (Public Review):

      In the manuscript entitled "Linking the evolution of two prefrontal brain regions to social and foraging challenges in primates" the authors measure the volume of the frontal pole (FP, related to metacognition) and the dorsolateral prefrontal cortex (DLPFC, related to working memory) in 16 primate species to evaluate the influence of socio-ecological factors on the size of these cortical regions. The authors select 11 socio-ecological variables and use a phylogenetic generalized least squares (PGLS) approach to evaluate the joint influence of these socio-ecological variables on the neuro-anatomical variability of FP and DLPFC across the 16 selected primate species; in this way, the authors take into account the phylogenetic relations across primate species in their attempt to discover the influence of socio-ecological variables on FP and DLPFC evolution.

      The authors run their studies on brains collected from 1920 to 1970 and preserved in formalin solution. Also, they obtained data from the Muséum National d'Histoire Naturelle in Paris and from the Allen Brain Institute in California. The main findings consist in showing that the volume of the FP, the DLPFC, and the Rest of the Brain (ROB) across the 16 selected primate species is related to three socio-ecological variables: body mass, daily traveled distance, and population density. The authors conclude that metacognition and working memory are critical for foraging in primates and that FP volume is more sensitive to social constraints than DLPFC volume.

      The topic addressed in the present manuscript is relevant for understanding human brain evolution from the point of view of primate research, which, unfortunately, is a shrinking field in neuroscience.

      We must not have been clear enough in our manuscript, because our goal is precisely not to separate humans from other primates. This is why, in contrast to other studies, we have included human and non-human primates in the same models. If our goal had been to study human evolution, we would have included fossil data (endocasts) from the human lineage.

      But the experimental design has two major weak points: the absence of lissencephalic primates among the selected species and the delimitation of FP and DLPFC. Also, a general theoretical and experimental frame linking evolution (phylogeny) and development (ontogeny) is lacking.

      We admit that lissencephalic species could not be included in this study because we use sulci as key landmarks. We believe that including lissencephalic primates would have introduced a bias and noise in our comparisons, as the delimitations and landmarks would have been different for gyrencephalic and lissencephalic primates. Concerning development, it is simply beyond the scope of our study.

      Major comments.

      1) Is the brain modular? Is there modularity in brain evolution?: The entire manuscript is organized around the idea that the brain is a mosaic of units that have separate evolutionary trajectories:

      "In terms of evolution, the functional heterogeneity of distinct brain regions is captured by the notion of 'mosaic brain', where distinct brain regions could show a specific relation with various socio-ecological challenges, and therefore have relatively separate evolutionary trajectories".

      This hypothesis is problematic for several reasons. One of them is that each evolutionary module of the brain mosaic should originate in embryological development from a defined progenitor (or progenitors) domain [see García-Calero and Puelles (2020)]. Also, each evolutionary module should comprise connections with other modules; in the present case, FP and DLPFC have not evolved alone but in concert with, at least, their corresponding thalamic nuclei and striatal sector. Did those nuclei and sectors also expand across the selected primate species? Can the authors relate FP and DLPFC expansion to a shared progenitor domain across the analyzed species? This would be key to proposing homology hypotheses for FP and DLPFC across the selected species. The authors use the comparative approach throughout but never explicitly state their criteria for defining homology of the cerebral cortex sectors analyzed.

      We do not understand what the referee is referring to with the word ‘module’, and why it relates to development. The same applies to the anatomical relation with subcortical structures. Yes, the identity of distinct functional cortical regions relies upon subcortical inputs during development, but addressing this is clearly neither technically feasible nor relevant here.

      We acknowledge, however, that our definition of functional regions was not precise enough, and we have updated the introduction to clarify that point. In short, we clearly do not want to make a strong case for the functional borders that we chose for the regions of interest here (FP and DLPFC), but rather use those regions as proxies for their corresponding functions as defined in laboratory conditions for a couple of species (rhesus macaques and humans, essentially).

      Contemporary developmental biology has shown that the selection of morphological brain features happens within severe developmental constraints. Thus, the authors need a hypothesis linking the evolutionary expansion of FP and DLPFC during development. Otherwise, the claims for the mosaic brain and modularity lack fundamental support.

      Once again, we do not think that our definition of modules matches what the reviewer has in mind, i.e. modules defined by populations of neurons that developed together (e.g. visual thalamic neurons innervating visual cortices, themselves innervating visual thalamic neurons). Rather, the notion of mosaic brain refers to the fact that different parts of the brain are susceptible to distinct (but not necessarily exclusive) sources of selective pressures. The extent to which these ‘developmental’ modules are related to ‘evolutionary’ modules is clearly beyond the scope of this paper.

      Our goal here was to evaluate the extent to which modules that were defined based on cognitive operations identified in laboratory conditions could be related (across species) to socio-ecological factors as measured in wild animals. Again, we agree that the way these modules/functional maps were defined in the paper was confusing, and we hope that the new version of the manuscript makes this point clearer.

      Also, the authors refer most of the time to brain regions, which is confusing because they are analyzing cerebral cortex regions.

      We do not understand why the term ‘brain’ is more confusing than ‘cerebral cortex’, especially for a wide audience.

      2) Definition and delimitation of FP and DLPFC: The preceding questions are also related to the definition and parcellation of FP and DLPFC. How are homologous cortical sectors defined across primate species? And then, how are those sectors parcellated?

      The authors delimited the FP:

      "...according to different criteria: it should match the functional anatomy for known species (macaques and humans, essentially) and be reliable enough to be applied to other species using macroscopic neuroanatomical landmarks".

      There is an implicit homology criterion here: two cortical regions in two primate species are homologs if these regions have similar functional anatomy based on cortico-cortical connections. Also, macroscopic neuroanatomical landmarks serve to limit the homologs across species.

      This is highly problematic. First, because similar function means analogy and not necessarily homology [for further explanation see Puelles et al. (2019); García-Cabezas et al. (2022)].

      We are not sure we follow the Reviewer’s point here. First, it is not clear what evolutionary scenario this comment implies (evolutionary divergence followed by reversion leading to convergence?). Second, based on the literature, both the DLPFC and the FP display strong similarities between macaques and humans, in terms of connectivity patterns (Sallet et al, 2013), in terms of lesion-induced deficits and in terms of task-related activity (Mansouri et al, 2017). These criteria are usually sufficient to call two regions functionally equivalent. We do not see how this explanation is "highly problematic" as it is clearly the most parsimonious based on our current knowledge.

      Second, because there are several lissencephalic primate species; in these primates, like marmosets and squirrel monkeys, the whole approach of the authors could not have been implemented. Should we suppose that lissencephalic primates lack FP or DLPFC?

      We understand neither the reviewer’s logic, nor the tone. We understand that the reviewer is concerned by the debate on whether some laboratory species are more relevant than others for studying the human prefrontal cortex, but this is clearly not the objective of our work. As explained in the manuscript, we identified FP and DLPFC based on functional maps in humans and laboratory monkeys (macaques), and we used specific gyri as landmarks that could be reliably used in other species. And, as rightly pointed out by reviewer 1, this is in and of itself not so trivial. Of course, lissencephalic animals could not be studied because we could not find these landmarks, but why would it mean that they do not have a prefrontal cortex? The reviewer implies that species that we did not study do not have a prefrontal cortex, which makes little sense. Standards in the field of comparative anatomy of the PFC, especially when it involves rodents (also lissencephalic), include cytoarchitectonic and connectivity criteria, but obviously we are not in a position to address it here. We have, however, included references to the seminal work of Angela Roberts and collaborators in the discussion on marmoset prefrontal functions, to reinforce the idea that the functional organization is relatively well conserved across all primates (with or without gyri on their brain) (Dias et al, 1996; Roberts et al, 2007).

      Do these primates have significantly more simplistic ways of life than gyrencephalic primates? Marmosets and squirrel monkeys have quite small brains; does it imply that they have not experienced the influence of socio-ecological factors on the size of FP, DLPFC, and the rest of the brain?

      Again, none of this is relevant here, because we could not draw conclusions on species that we cannot study for methodological reasons. The reviewer seems to believe that absence of evidence is equivalent to evidence of absence, but we do not.

      The authors state that:

      "the strong development of executive functions in species with larger prefrontal cortices is related to an absolute increase in number of neurons, rather than in an increase in the ration between the number of neurons in the PFC vs the rest of the brain".

      How does it apply to marmosets and squirrel monkeys?

      Again, we do not understand the reviewer’s point, since it is widely accepted that lissencephalic monkeys display both a prefrontal cortex and executive functions (again, see the work of Angela Roberts cited above). Our goal here was certainly not to get into the debate of what is the prefrontal cortex in a handful of laboratory species, but to evaluate the relevance of laboratory-based neuro-cognitive concepts for understanding primates in general, and in their natural environment.

      References:

      García-Cabezas MA, Hacker JL, Zikopoulos B (2022) Homology of neocortical areas in rats and primates based on cortical type analysis: an update of the Hypothesis on the Dual Origin of the Neocortex. Brain Struct Funct, online ahead of print. doi:10.1007/s00429-022-02548-0

      García-Calero E, Puelles L (2020) Histogenetic radial models as aids to understanding complex brain structures: The amygdalar radial model as a recent example. Front Neuroanat 14:590011. doi:10.3389/fnana.2020.590011

      Nieuwenhuys R, Puelles L (2016) Towards a New Neuromorphology. doi:10.1007/978-3-319-25693-1

      Puelles L, Alonso A, Garcia-Calero E, Martinez-de-la-Torre M (2019) Concentric ring topology of mammalian cortical sectors and relevance for patterning studies. J Comp Neurol 527 (10):1731-1752. doi:10.1002/cne.24650

      Reviewer #3 (Public Review):

      This is an interesting manuscript that addresses a longstanding debate in evolutionary biology - whether social or ecological factors are primarily responsible for the evolution of the large human brain. To address this, the authors examine the relationship between the size of two prefrontal regions involved in metacognition and working memory (DLPFC and FP) and socioecological variables across 16 primate species. I recommend major revisions to this manuscript due to: 1) a lack of clarity surrounding model construction; and 2) an inappropriate treatment of the relative importance of different predictors (due to a lack of scaling/normalization of predictor variables prior to analysis). My comments are organized by section below:

      We thank the reviewer for the globally positive evaluation and for the constructive remarks.

      Introduction:

      • Well written and thorough, but the questions presented could use restructuring.

      Again, we thank the reviewer, and we believe that this is consistent with some of the remarks of reviewer 1. We have extensively revised the introduction, toning down the social vs ecological brain issue to focus more on the objective of the work (evaluating the relevance of lab-based neuro-cognitive concepts for understanding natural behavior in primates).

      Methods:

      • It is unclear which combinations of models were compared or why only population density and distance travelled appear to have been included.

      The details of the model comparison analysis were presented as a table in the supplementary material (#3, details of the model comparison data), but we understand that this was not clear enough. We have provided more explanation both in the main manuscript and in the supplements. All variables were considered a priori; however, we first conducted exploratory analyses, which led us to exclude some variables because of their lack of resolution (not enough categories for qualitative variables) or strong cross-correlations with other quantitative variables. Many more than three variables were included in the models, but the combination of these three (body mass, daily traveled distance and population density) best predicted the size of the brain regions (it had the smallest AIC). We provide additional information about these exploratory analyses in the supplementary material, sections 2 and 3.
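      As an illustration only, the kind of AIC-based comparison described above can be sketched as follows. This is not the analysis code used in the study: it assumes ordinary least squares on simulated data rather than the phylogenetic models of the manuscript, and the function names, sample size and effect sizes are all hypothetical.

```python
import itertools
import numpy as np

def gaussian_aic(X, y):
    """AIC of an ordinary least-squares fit with Gaussian errors (up to a constant)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])      # add an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1                        # coefficients + residual variance
    return n * np.log(rss / n) + 2 * k

def rank_models(predictors, y):
    """Fit every non-empty subset of predictors and sort by increasing AIC."""
    names = list(predictors)
    results = []
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(names, r):
            X = np.column_stack([predictors[name] for name in subset])
            results.append((gaussian_aic(X, y), subset))
    return sorted(results)

# Simulated (hypothetical) data: three informative predictors, one uninformative.
rng = np.random.default_rng(1)
n = 200
predictors = {
    name: rng.normal(size=n)
    for name in ["body_mass", "daily_distance", "population_density", "noise_variable"]
}
y = (1.0 * predictors["body_mass"]
     + 0.8 * predictors["daily_distance"]
     + 0.6 * predictors["population_density"]
     + rng.normal(scale=0.3, size=n))

ranking = rank_models(predictors, y)
best_aic, best_subset = ranking[0]
print(best_subset, round(best_aic, 1))
```

      The model with the smallest AIC is retained; in a real analysis the same ranking would be computed from the phylogenetically informed model fits rather than from plain least squares.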

      • Brain size (vs. body size) should be used as a predictor in the models.

      We do not understand the theoretical reason for replacing body size by brain size in the models. Brain size is not a socio-ecological variable. And of course, that would be impossible for modeling brain size itself. Or is the reviewer suggesting that we use brain size as a covariate to evaluate the effects of other variables in the model over and above the effect on brain size? But what is the theoretical basis for this?

      • It is not appropriate to compare the impact of different predictors using their coefficients if the variables were not scaled prior to analysis.

      We thank the Reviewer for this comment; however, standardized coefficients are not unproblematic because their calculation is based on the estimated standard deviations of the variables, which are likely to be affected by sampling (in effect more than the means). We note that the method of standardized coefficients has attracted several criticisms in the literature (see the References section in https://en.wikipedia.org/wiki/Standardized_coefficient). Nevertheless, we now provide a table with these coefficients, which allows an easy comparison for the present study. We also updated tables 1, 2 and 3 to include standardized beta values.
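      To make the point about standardization concrete, here is a minimal sketch, assuming ordinary least squares and simulated data (not the models or data of the study; all names are hypothetical). It shows that a standardized coefficient is the raw slope rescaled by the ratio of the sample standard deviations of predictor and response, which is why it inherits the sampling variability of those estimates.

```python
import numpy as np

def ols_coefficients(X, y):
    """Return OLS slopes (intercept excluded) for predictors X and response y."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

def standardized_coefficients(X, y):
    """Rescale raw slopes by sd(x_j) / sd(y), using sample standard deviations."""
    raw = ols_coefficients(X, y)
    return raw * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Simulated (hypothetical) data: two predictors measured on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 50), rng.normal(0, 100, 50)])
y = 2.0 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 0.5, 50)

raw = ols_coefficients(X, y)           # slopes differ by about two orders of magnitude
std = standardized_coefficients(X, y)  # slopes become comparable once rescaled
print(raw, std)
```

      The raw slopes reflect the measurement units of each predictor, whereas the standardized ones are unit-free; the latter are identical to the slopes obtained by z-scoring all variables before fitting.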

      Reviewer #1 (Recommendations For The Authors):

      N/A

      Reviewer #2 (Recommendations For The Authors):

      Contemporary developmental biology has shown that the brain of all mammals, including primates, develops out of a bauplan (or blueprint) made of several fundamental morphological units that have invariant topological relations across species (Nieuwenhuys and Puelles 2016).

      At some point in the discussion the authors acknowledge that:

      "Our aim here was clearly not to provide a clear identification of anatomical boundaries across brain regions in individual species, as others have done using much finer neuroanatomical methods. Such a fine neuroanatomical characterization appears impossible to carry on for a sample size of species compatible with PGLS".

      I do not think it would be impossible to carry out such a neuroanatomical characterization. It would take time and effort, but it is feasible. Such a characterization, if performed within the framework of contemporary developmental biology, would allow for well-founded definition and delineation of cortical sectors across primate species, including lissencephalic ones, and would allow for meaningful homologies and interspecies comparisons.

      We do not see how our work would benefit from developmental biology at this point, because it is concerned with evolution, and these are very distinct biological phenomena. We do not understand the reviewer’s focus on lissencephalic species, because they are not so prevalent across primates, and it is unlikely that adding a couple of lissencephalic species would change the conclusions much.

      Minor points:

      • Please, format references according to the instructions of the journal.

      Ok - done

      • The authors could use the same color code across Figures 1, 2, and 3.

      Ok – done

      • The authors say that group hunting "only occurs in a few primate species", but it also occurs in wolves, whales, and other mammalian species.

      We focus on primates here; these other species are irrelevant. Again, this is beside the point.

      Reviewer #3 (Recommendations For The Authors):

      My comments are organized by section below:

      Introduction:

      • Well written and thorough

      • The two questions presented towards the end of the intro are not clear and do not guide the structure of the methods/results sections. I believe it would be more appropriate to ask: 1) if the relative proportions of the FP and DLPFC (relative to ROB) are consistent across primates; and 2) if the relative size of these regions is best predicted by social and/or ecological variables. Then, the results sections could be organized according to these questions (current results section 1 = 1; current results sections 2, 3, 4 = 2.1, 2.2, 2.3)

      As explained above, we agree with the reviewer that the introduction was somewhat misleading and we have edited it extensively. We do not, however, agree with the reviewer regarding the relative (vs absolute) measure. We have discussed this in our response to reviewer 1 regarding the comparison of regional volumes as proxies for number of neurons. The best predictor of the computing capacity of a brain region is its number of neurons, but there is no reason to believe that this capacity should decrease if the rest of the brain increases, as implied by the relative measure that the reviewer proposes. That debate is probably critical in the field of comparative neuroanatomy, and confronting different perspectives would surely be both interesting and insightful, but we feel that it is beyond the scope of the present article.

      Methods:

      • While the methods are straightforward and generally well described, it is unclear which combinations of models were compared or why only population density and distance travelled appear to have been included (in e.g., Fig SI 3.1) even though many more variables were collected.

      We agree that this was not clear enough, and we have tried to improve the description of our model comparison approach, both in the main text and in the supplementary material.

      • Why was body mass rather than ROB used as a predictor in the models? The authors should instead/also include analyses using ROB (so the analysis is of FP and DLPFC size relative to brain size). Using body mass confounds the analyses since they will be impacted by differences in brain size relative to body size.


      Again, we have addressed this issue above. First, body size is a socio-ecological variable (if anything, it especially predicts energetic needs and energy expenditure), but ROB is clearly not. We do not see the theoretical relevance of ROB in a socio-ecological model. Second, from a neurobiological point of view, since within primates the volume of a given brain region is directly related to its number of neurons (again, see work of Herculano-Houzel), which is a good proxy for its computing capacity, we do not see the theoretical reason for considering ROB.

      • It is not appropriate to compare the impact of different predictors using their coefficients if the variables were not scaled prior to analysis. The authors need to implement this in their approach to make such claims.

      We thank the reviewer again for pointing that out. We have addressed this question above.

      • Differences across primates in terms of frontal lobe networks throughout the brain should be acknowledged (e.g., Barrett et al. 2020, J Neurosci).

      We have added that reference to the discussion, together with other references showing that the difference between human and non-human primates is significant, but essentially quantitative, rather than qualitative (the building blocks are relatively well conserved, but their relative weight differs a lot). Thank you for pointing it out.

      I hope the authors find my comments helpful in revising their manuscript.

      We thank the reviewer again for the helpful and constructive comments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study identifies the homeodomain transcription factor and suspected autism-candidate gene Meis2 as a transcriptional regulator of maturation and end-organ innervation of low-threshold mechanoreceptors (LTMRs) in the dorsal root ganglia (DRG) of mice. For a few years, the view on autism spectrum disorders (ASD) has shifted from a disorder that exclusively affects the brain to a condition that also includes the peripheral somatosensory system, even though our knowledge about the genes involved is incomplete. The study by Desiderio and colleagues is therefore not only scientifically interesting but may also have clinical relevance. The work is convincing, with appropriate and validated methodology in line with current state-of-the-art and the findings contribute both to understanding and potential application.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work examined transcription factor Meis2 in the development of mouse and chick DRG neurons, using a combination of techniques, such as the generation of a new conditional mutant strain of Meis2, behavioral assays, in situ hybridization, transcriptomic study, immunohistochemistry, and electrophysiological (ex vivo skin-nerve preparation) recordings. The authors found that Meis2 was selectively expressed in A fiber LTMRs and that its disruption affects the A-LTMRs' end-organ innervation, transcriptome, electrophysiological properties, and light touch-sensation.

      Strengths:

      1) The authors utilized a well-designed mouse genetics strategy to generate a mouse model where Meis2 is selectively ablated from pre- and post-mitotic mouse DRG neurons. They used a combination of readouts, such as in situ hybridization, immunohistochemistry, transcriptomic analysis, skin-nerve preparation, electrophysiological recordings, and behavioral assays to determine the role of Meis2 in mouse DRG afferents.

      2) They observed a similar preferential expression of Meis2 in large-diameter DRG neurons during development in chicken, suggesting evolutionarily conserved functions of this transcription factor.

      3) The authors conducted several behavioral assays to probe the reduction of light-touch sensitivity in mouse glabrous and hairy skin. Their behavioral findings support the idea that the function of Meis2 is essential for the development and/or maturation of LTMRs.

      4) RNAseq data provide potential molecular pathways through which Meis2 regulates embryonic target-field innervation.

      5) Well-performed electrophysiological study using skin-nerve preparation and recordings from saphenous and tibial nerves to investigate physiological deficits of Meis2 mutant sensory afferents.

      6) Nice whole-mount IHC of the hairy skin, convincingly showing morphological deficits of Meis2 mutant SA- and RA-LTMRs.

      Overall, this manuscript is well-written. The experimental design and data quality are good, and the conclusion from the experimental results is logical.

      Weaknesses:

      1) Although the authors justify this study for the involvement of Meis2 in Autism and Autism associated disorders, no experiments really investigated Autism-like specific behavior in the Meis2 ablated mice.

      Indeed, in the first version of the manuscript, we used the current understanding of ASD in mouse models and associated sensory defects to articulate our introduction and discussion. As noted by reviewer 1, none of our experiments really investigated ASD. To avoid over-interpretation of the data, we have now removed sentences mentioning ASD and related references throughout the manuscript.

      2) For mechanical force sensing-related behavioral assays, the authors performed VFH and dynamic cotton swabs for the glabrous skin, and sticky tape on the back for the hairy skin. A few additional experiments involving glabrous skin plantar surfaces, such as sticky tape or flow texture discrimination, would make the conclusion stronger.

      We fully agree that performing more behavioral analyses, investigating in more detail the primary sensory defects as well as some ASD-related behaviors, would reinforce our conclusions. Our behavioral analysis clearly showed a loss of sensitivity in response to mechanical stimuli within the light touch range but not for higher range mechanical or noxious thermal stimuli. While the experiments suggested by the reviewer are interesting and would strengthen our conclusions, they are far from trivial and require large cohorts. Given the current laboratory conditions as stated at the outset, these unfortunately are not within reach.

      3) The authors considered von Frey filaments (1 and 1.4 g) as noxious mechanical stimuli (Figure 1E and statement on lines 181-183), which is questionable. Alligator clips or pinpricks are more certain to activate mechanical nociceptors.

      To avoid misinterpretation of the higher von Frey filament tests, we deleted the following statement on page 7: “In the von Frey test, the thresholds for paw withdrawal were similar between all genotypes when using filaments exerting forces ranging from 1 to 1.4g, which likely reflects the activation of mechanical nociception suggesting that Meis2 gene inactivation did not affect nociceptor function.”. The sentence “… while sparing other somatosensory behaviors” was also deleted.

      4) There are disconnections and inconsistencies among findings from morphological characterization, physiological recordings, and behavior assays. For example, Meis2 mutant SA-LTMRs show a deficiency in Merkel cell innervation in the glabrous skin but not in hairy skin. With no clear justification, the authors pooled recordings of SA-LTMRs from both glabrous and hairy skin and found a significant increase in mean vibration threshold. Will the results be significantly different if the data are analyzed separately? In addition, whole-mount IHC of Meissner's corpuscles showed morphological changes, but electrophysiological recordings didn't find significant alteration of RAI LTMRs. What does the morphological change mean then? Since the authors found that Meis2 mice are less sensitive to a dynamic cotton swab, which is usually considered as an RA-LTMR mediated behavior, is the SAI-LTMR deficit here responsible for this behavior? Connections among results from different methods are not clear, and the inconsistency should be discussed.

      We thank Reviewer 1 for the careful review of our data and fully agree with the weaknesses identified, weaknesses we were ourselves aware of at the time of submission, in particular the lack of stronger connections between histological and electrophysiological data. Electrophysiological studies were conducted on a first cohort of mice in which we focused mostly on WT and Meis2 mutant mice. The goal was to describe differences in electrophysiological properties of identified mechanoreceptors from these two genotypes. Since substantial differences between WT and Islet1-Cre mice were not expected, only very few mice with this genotype were examined at that time to confirm this assumption. We fully agree with reviewer 1 that confirming differences in SA-LTMR responses in hairy and glabrous skin at the electrophysiological level would be interesting and worthwhile. It is assumed that the physiological properties of SA-LTMRs are equivalent in glabrous and hairy skin. Indeed, direct comparisons have been made between glabrous and hairy skin SA-LTMRs, revealing that they have equivalent receptor properties (see Walcher et al J Physiol quoted in the manuscript). We had not recorded from a sufficient number of hairy and glabrous skin SA-LTMRs to make any statistically meaningful comparison. When we noticed the dramatic differences in the innervation patterns of Merkel cell complexes between glabrous and hairy skin, we immediately planned a second cohort of mice, but as explained at the outset of the Public Review, this cohort was sacrificed due to the pandemic lockdown. However, the obtained dataset clearly shows that in Meis2 mutant mice many SA-LTMRs had similar vibration thresholds to those of wild types.

      For Meissner corpuscle, histological analysis evidenced clear morphological differences that could of course be investigated at the level of the dual innervation previously reported by Neubarth et al. It is uncertain whether differences in their electrophysiological responses would be revealed by increasing the number of recorded fibers. For this reason, we clearly stated this limitation in the results section page 7 “There was a tendency for RA-LTMRs in Isl1Cre/+::Meis2LoxP/LoxP mutant mice to fire fewer action potentials to sinusoids and to the ramp phase of a series 2 second duration ramp and hold stimuli, but these differences were not statistically significant (Figure 5B). Nevertheless it is important to point out that an electrical search strategy revealed that many Aβ-fibers did not have mechanosensitive receptive fields. Thus by focusing on LTMRs with a mechanosensitive receptive field, we ignore the fact that fewer fibers are mechanosensitive. This is now more extensively discussed in the discussion section of the manuscript page 13:

      “Indeed, the electrophysiology methods used here can only identify sensory afferents that have a mechanosensitive receptive field. Primary afferents that have an axon in the skin but no mechanosensitvity can only be identified with a so-called electrical search protocol (45, 46) which was not used here. It is therefore quite likely that many primary afferents that failed to form endings would not be recorded in these experiments e.g. SA-LTMRs and RA-LTMRs that fail to innervate end-organs (Fig.4-6).”

      “From our data, we could not conclude whether SA-LTMR electrophysiological responses are differentially affected in the glabrous versus hairy skin of Meis2 mutants as suggested by histological analysis. Further electrophysiological analysis focused on SA-LTMRs selectively innervating the glabrous or hairy skin would be necessary to answer this question. Similarly, the decreased sensitivity of Meis2 mutant mice in the cotton swab assay and the morphological defects of Meissner corpuscles evidenced in histological analysis do not correlate with RA-LTMR electrophysiological responses, for which a tendency to decreased responses was however measured. The latter might result from an insufficient number of fiber recordings, whereas the former may be due to pooling SA-LTMRs from both the hairy and glabrous skin.”.

      Reviewer #2 (Public Review):

      Summary:

      Desiderio and colleagues investigated the role of the TALE (three amino acid loop extension) homeodomain transcription factor Meis2 during maturation and target innervation of mechanoreceptors and their sensation to touch. They start with a series of careful in situ hybridizations to examine Meis2 transcript expression in mouse and chick DRGs of different embryonic stages. By this approach, they identify Meis2+ neurons as slowly- and rapidly adapting A-beta LTMRs, respectively. Retrograde tracing experiments in newborn mice confirmed that Meis2-expressing sensory neurons project to the skin, while unilateral limb bud ablations in chick embryos in ovo showed that these neurons require target-derived signals for survival. The authors further generated a conditional knock-out (cKO) mouse model in which Meis2 is selectively lost in Islet1-expressing, postmitotic neurons in the DRG (Islet1Cre/+::Meis2flox/flox, abbreviated below as cKO). WT and Islet1Cre/+ littermates served as controls. cKO mice did not exhibit any obvious alteration in volume or cellular composition of the DRGs but showed significantly reduced sensitivity to touch stimuli and various innervation defects to different end-organ targets. RNA-sequencing experiments of E18.5 DRGs taken from WT, Islet1Cre/+, and cKO mice reveal extensive gene expression differences between cKO cells and the two controls, including synaptic proteins and components of the GABAergic signaling system. Gene expression also differed considerably between WT and heterozygous Islet1Cre/+ mice while several of the other parameters tested did not. These findings suggest that Islet1 heterozygosity affects gene expression in sensory neurons but not sensory neuron functionality. However, only some of the parameters tested were assessed for all three genotypes. Histological analysis and electrophysiological recordings shed light on the physiological defects resulting from the loss of Meis2. By immunohistochemical approaches, the authors describe distinct innervation defects in glabrous and hairy skin (reduced innervation of Merkel cells by SA1-LTMRs in glabrous but not hairy skin, reduced complexity of A-beta RA1-LTMRs innervating Meissner's corpuscles in glabrous skin, reduced branching and innervation of A-beta RA1-LTMRs in hairy skin). Electrophysiological recordings from ex vivo skin nerve preparations found that several, but not all of these histological defects are matched by altered responses to external stimuli, indicating that compensation may play a considerable role in this system.

      Strengths:

      This is a well-conducted study that combines different experimental approaches to convincingly show that the transcription factor Meis2 plays an important role in the perception of light touch. The authors describe a new mouse model for compromised touch sensation and identify a number of genes whose expression depends on Meis2 in mouse DRGs. Given that dysbalanced MEIS2 expression in humans has been linked to autism and that autism seems to involve an inappropriate response to light touch, the present study makes a novel and important link between this gene and ASD.

      Weaknesses:

      The authors make use of different experimental approaches to investigate the role of Meis2 in touch sensation, but the results obtained by these techniques could be connected better. For instance, the authors identify several genes involved in synapse formation, synaptic transmission, neuronal projections, or axon and dendrite maturation that are up- or downregulated upon targeted Meis2 deletion, but it is unresolved whether these changes can in any way explain the histological, electrophysiological, or behavioral deficits observed in cKO animals. The use of two different controls (WT and Islet1Cre/+) is unsatisfactory and it is not clear why some parameters were studied in all three genotypes (WT, Islet1Cre/+ and cKO) and others only in WT and cKO. In addition, Meis2 mutant mice apparently are less responsive to touch, whereas in humans, mutation or genomic deletion involving the MEIS2 gene locus is associated with ASD, a condition that, if anything, is associated with an elevated sensitivity to touch. It would be interesting to know how the authors reconcile these two findings. As a minor weakness, the manuscript suffers from some ambiguities and errors, but these can be easily corrected.

      We thank the reviewer for the insightful comments and suggestions.

      The use of two different controls (WT and Islet1Cre/+) is unsatisfactory and it is not clear why some parameters were studied in all three genotypes (WT, Islet1Cre/+ and cKO) and others only in WT and cKO.

      First, we identified a labelling mistake in figures 4D, 5A and 6A, where the controls shown are from Islet1+/Cre mice and not from WT as reported in the first version. We apologize for this mistake, which has now been corrected. This labelling error does not in any way affect our conclusion; on the contrary, it shows that innervation defects are not the consequence of Islet1 heterozygosity.

      The reviewer wonders why for some data both control genotypes are presented, and for some others only one is presented. It is quite possible that genes expression changes happen due to a synergistic effect of both heterozygous Meis2 deletion and heterozygous Islet1 deletion. However, we found no evidence that this led to defects in target-field innervation or to changes in the physiological properties of sensory neurons.

      Whereas it could be fairly envisaged that some gene expression is modified due to a synergistic effect of both heterozygous Meis2 deletion and heterozygous deletion of Islet1, several lines of evidence support that the defects in target-field innervation and electrophysiological responses are exclusively due to Meis2 deletion. Previous work on Islet1 specific deletion in DRG sensory neurons opens the possibility that some of the phenotypes we report here are in part due to an effect of Islet1 heterozygous deletion or a synergistic effect to Meis2 homozygous deletion.

      1) When Islet1 is conditionally deleted in mice using the Wnt1-Cre strain or at later stages using a tamoxifen-inducible Cre, homozygous pups die a few hours after birth. Early Islet1 deletion results in increased apoptosis in the DRG, a massive loss of DRG sensory neurons and sensory defects affecting mostly nociceptors and some touch neurons, while proprioceptive neurons are spared (Sun et al., 2008 now included in the revised version of the manuscript). There was a decrease in the number of Ntrk1+ and Ntrk2+ neurons, whereas the number of Ntrk3+ neurons appeared normal. When Islet1 is inactivated later in development, the numbers of Ntrk1+ and Ntrk2+ neurons were normal and only the expression of nociceptor-specific markers was decreased. Since neither the DRG volume nor the numbers of Ntrk1+, Ntrk2+ and Ntrk3+ neurons are changed in Meis2 cKO using the Islet1-Cre strain, an early significant effect of Islet1 heterozygous deletion is very unlikely.

      2) For distal innervation defects, it is clear from the Wnt1-Cre::Meis2 data (Figure 3E) that the distal innervation phenotype occurs when Meis2 is inactivated independently of Islet1 expression.

      3) Finally, the lack of differences between WT and Islet1+/Cre mice in behavioral assays and in electrophysiological characterization of RA-LTMR of the hairy skin (Figure 6C) and SA-LTMR (Figure 4B and C) argues for a lack of significant consequences of Islet1 heterozygous deletion on these parameters.

      4) For bulk RNAseq studies, all datasets have now been re-analyzed following Reviewer 2's specific comments (see below). To avoid misinterpretation of the data, the results are now presented differently (see pages 8 and 9) and more critically discussed (see pages 14 and 15). In particular, we have included and discussed references on Islet1 cKO mice.

      We also agree with reviewer 2 that our RNAseq study only provides cues on potential gene expression changes that could impact distal innervation and electrophysiological responses. However, proving which of those genes are fully responsible for the morphological and electrophysiological defects would require extensive mouse genetic investigations such as restoring their normal expression level in a Meis2 mutant context, which is beyond the scope of the present study.

      Finally, the reviewer questioned how we could reconcile the lower touch sensitivity in Meis2 mutant mice with the exacerbated touch sensitivity found in ASD patients and mouse models of ASD. As suggested by reviewer 1, our study did not really investigate ASD specifically. Therefore, to avoid overinterpretation of the data and to follow Reviewer 1's recommendation, we have removed all references to ASD in the revised version of the manuscript. Indeed, to our knowledge, none of the case reports on Meis2 mutant patients investigated sensory function in general and light touch in particular, maybe because of the severe intellectual disability characterizing these patients.

      Reviewer #1 (Recommendations For The Authors):

      In addition to the aforesaid suggestions in the section 2, there are some minor issues:

      We thank the reviewer for the careful reading and for identifying all these typos. All of them have been corrected in the revised version of the manuscript.

      1) There should not be a full stop mark in the title of the article.

      This has been corrected in the new version of the manuscript.

      2) Figure 1C, 1D, please correct the typo "controlateral' to "contralateral".

      This has been corrected in the new version of the manuscript.

      3) Figure 1D, lower graph, Y-axis, please correct the typo 'umber' to "number".

      This has been corrected in the new version of the manuscript.

      4) To make it easy for readers, add the names of the behavioral tests on top of the graphs in Fig 1E-H.

      The name of behavioral tests is now added to the figure.

      5) It would be easier to read the markers' names in IHC and ISH images if they were written outside of image panels. The blue staining color in image 1B could be easily mixed with the background. Suggest change colors.

      Markers for IHC and ISH images are now written outside the image panels, or colors have been changed in Figures 1 and 2 for better clarity.

      6) The font size of Genes' name in Figure 3B is too small and not readable.

      Figure 3 has now been changed following Reviewer 2 recommendation. The small font size in Figure 3B is no longer present in the figure.

      7) Quantification of Fig 3E (number of fibers innervating each dermal papilla or footpad, for example).

      Unfortunately, we did not keep the Wnt1Cre::Meis2LoxP/LoxP strain, which prevents further analysis (see onset of the answer to public review).

      8) In Figure 4, please arrange IHC images and their quantification results adjacent to each other.

      The figure has been reorganized and changes in the result section and figures legend were made accordingly.

      9) For consistency, please use either LTMR or LTM (See Figure 4F, 5A, 6C), but not both.

      This has been homogenized throughout the manuscript.

      10) Add arrows/heads to mark the overlaps in Figure 4D.

      Arrows are now added in Figure 4D to point at the overlap between Nefh and CK8 staining.

      11) Figure 5A, 6A, Lines 236, 240, 247, 258, 305, 308, 313, 347, and many more in Figure legends: please check in entire manuscript and make the mouse genotype nomenclature (+/Cre?) consistent. In some places, Cre is written in all upper case (Line 657).

      This has been homogenized throughout the manuscript.

      12) Figure 4G: Histogram color could be darker for better contrast.

      The color of the histograms has been changed in figures 6 and 5 for better clarity.

      13) Please add the figure number to the Figure 6.

      The figure number is now indicated on the figure.

      14) Figure 6B: Y-axis typo, correct "Nfeh" to Nefh.

      This typo is now corrected.

      15) Either explain Figure 2B information before that of Figure 2C (In lines 204-207) in the text or change the figure panel sequence to keep the consistent flow of contents.

      The figure has been modified and the panel sequence now follows that of the main text.

      16) Line 213 has a typo: change "form" to "from".

      This typo is now corrected.

      17) Line 423 has a typo. Correct "al" to "all".

      This typo is now corrected.

      18) Line 625 has a typo. Correct "fo" to "of".

      This typo is now corrected.

      19) Line 669 has a typo. Correct "Alexa Fluo" to "Fluor".

      This typo is now corrected.

      20) Line 744: To be consistent in the entire manuscript, write "Nfh" as "Nefh".

      This typo is now corrected.

      21) 740-749: Please add host names for all primary antibodies, as some are given but some are not for the current version.

      We now indicated the host species for all primary antibodies used in the study.

      22) Line 751 has a typo: change "a" to "as".

      This typo is now corrected.

      23) Line 754: what is for 20'?

      This typo is now corrected.

      24) Line 832: change "day test" to "testing day".

      The change has been made.

      25) Please mention for how many seconds the VFH was administered on the plantar surface in the method.

      A new sentence has been added to the “Von Frey withdrawal test” Methods section (page 30): “During each application, bend filament was maintained for approximately four to five seconds”.

      26) For the sticky tape test, in lieu of hind paw attending bouts, wet-dog shake behavior, the authors also found some scratching behaviors. Did they separately quantify these behaviors? It would be interesting to see exactly which behavior significantly reduced after Meis2 inactivation.

      Unfortunately, at the time of the design of the sticky tape test, we did not consider separating the behaviors considered as “positive” reactions. As these experiments were not video recorded, we are not able to extract this kind of information without generating a new mouse cohort and repeating this experiment.

      27) Line 344-345: consider rephrasing the sentence.

      This sentence has been removed.

      Reviewer #2 (Recommendations For The Authors):

      This is a beautiful and well-conducted study with all the strengths listed in the paragraphs above. Nevertheless, there are still some open questions, ambiguities in the presentation, and minor errors that I would recommend addressing.

      Major Points:

      1) The authors performed RNA-seq analysis from E18.5 mouse total DEGs from three different genotypes, WT, Islet1Cre/+ and cKO. Although this approach identified several interesting Meis2-dependent candidate genes, the presentation of the results is confusing, and the publication would gain impact if the RNA-seq results were better connected to the histological, behavioral, and electrophysiological data. Specific concerns:

      1.1) The gene expression profiles of WT and Islet1Cre/+ samples are remarkably divergent. According to Yang Development 2006, Islet1-Cre was generated by knocking in Cre into the endogenous Islet1 locus and replacing the Isl1 ATG, hence resulting in a heterozygous null for Islet1. When purely technical derivations can be excluded, the RNAseq results presented here suggest that heterozygous loss of Islet1 causes considerable gene expression changes in the postnatal DRG. For analysis of the RNAseq results, the authors focus on genes that are differentially expressed between one experimental condition (Islet1Cre/+::Meis2flox/flox) and either one of two controls (WT or Islet1Cre/+). Hence, they pool the genes that are differently expressed between cKO and Islet1Cre/+ with the genes that are different between cKO and WT. This approach mixes gene expression differences that result from two different genetic alterations, heterozygosity of Islet1 and targeted deletion of Meis2, respectively. It seems much more logical to compare the results pairwise.

      We agree with reviewer 2 that heterozygous deletion of Islet1 causes a significant change in gene expression that seems to correlate very little with any of the phenotypes we investigated in the study. When Islet1 is conditionally deleted in mice using the Wnt1-Cre strain, pups die a few hours after birth and display increased apoptosis in the DRG, a massive loss of DRG sensory neurons and sensory defects affecting mostly nociceptors and some touch neurons, while proprioceptive neurons are spared (Sun et al., 2008 now included in the revised version of the manuscript). There is a decrease in the numbers of Ntrk1+ and Ntrk2+ neurons, whereas the number of Ntrk3+ neurons appears normal. Later Isl1 inactivation does not induce changes in the number of neurons and does not change Ntrk1 and Ntrk2 expression. As explained in the answer to the public reviews, the bulk RNAseq data have now been reanalyzed following the reviewer's suggestions and are presented accordingly in the related figures.

      The study by Sun et al. also reported DEGs following Islet1 homozygous deletion, but data on Islet1 heterozygous deletion are not included. However, out of the 60 most dysregulated genes identified in their study, only 6 were differentially expressed in our datasets. Importantly, DEGs in their study were identified using microarrays. In another study, the same group showed that Brn3a (another transcription factor important for DRG neuron differentiation) and Islet1 exhibit negative epistasis on sensory gene expression (Dykes et al., 2011 now included in the revised version of the manuscript). Thus, we cannot rule out that similar rules apply for Islet1 and Meis2. However, given the high diversity of DRG sensory neurons, interpreting our bulk RNAseq analysis in such a direction might lead to misinterpretation.

      1.2) Along the same line, gene expression changes in Islet1Cre/+ DRGs seem to have little functional consequences, at least in the cases where all three genotypes were analyzed (target dependency (Fig. 1E), behavior (Fig. 1F), innervation (Fig. 4F, 6C)). Why were some parameters measured in all three genotypes and others only for WT and cKO? The authors probably reason that parameters that do not differ between WT and cKO animals will likely also not differ between WT and Islet1Cre/+. But what about parameters that do differ? Considering that the innervation of Merkel cells (Fig. 4E) and Meissner corpuscles (Fig. 5A) differ profoundly between WT and cKO, it would be interesting to know what this innervation looks like in Islet1Cre/+ DRGs. NEFH staining together with CK8 or S100beta from existing tissue sections should easily answer this question.

      As explained in the answer to the public reviews, there was a mistake in the annotation of the control in Figure 4D and E, and in Fig. 5, which has now been corrected. The target-dependency experiments were conducted in chick embryos and therefore have no associated genotype.

      1.3) Was a minimum cut-off for gene expression applied? The up-and downregulated genes in Fig. 3B list a number of pseudogenes and predicted genes. A quick (and incomplete) check for their expression in Fig2 Supple Table 1 shows that only a few reads were detected for most of them. With such low expression, even small changes will show up as significant differences.

      In our first analysis, a cut-off of 10 reads was applied. As reviewer 2 mentioned, this cut-off included several pseudogenes and predicted genes with low expression for which small changes were significant. We have now re-analyzed the dataset using a cut-off of 100 reads. This excluded most of the previous predicted genes and pseudogenes from the analysis and resulted in a much smaller number of DEGs for each dataset. As recommended by reviewer 2, we have also now performed the DAVID analysis separately. These results are now presented in Figure 3 and the corresponding supplementary figures.
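
      The effect of raising the read cut-off can be illustrated with a minimal sketch. It is not the pipeline used in the study: the gene names, the read counts, the pseudocount, and the choice to apply the cut-off to the summed reads of a gene are all assumptions made for illustration only.

```python
from math import log2


def filter_low_counts(counts, min_reads):
    """Keep genes whose summed read count across samples reaches min_reads.

    Applying the cut-off to the sum across samples is an assumption of this
    sketch; the response does not state how the cut-off was applied.
    """
    return {gene: reads for gene, reads in counts.items() if sum(reads) >= min_reads}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated over control, with a pseudocount to avoid log(0)."""
    return log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical read counts per gene: (control, cKO). Not data from the study.
counts = {
    "LowGene": (2, 9),       # 11 reads in total: passes a cut-off of 10, fails 100
    "HighGene": (400, 900),  # well-expressed gene: passes both cut-offs
}

kept_at_10 = filter_low_counts(counts, min_reads=10)
kept_at_100 = filter_low_counts(counts, min_reads=100)

# A shift of only 7 reads gives LowGene a larger fold change than HighGene,
# which is why weakly expressed genes can dominate a DEG list.
low_lfc = log2_fold_change(*counts["LowGene"])
high_lfc = log2_fold_change(*counts["HighGene"])
```

      In this toy example the weakly expressed gene survives the 10-read cut-off with a larger apparent fold change than the well-expressed gene, and only the 100-read cut-off removes it.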

      1.4) Given that bulk RNAseq from whole embryonic DRGs was performed, it would be interesting to know what cell type(s) express the Meis2-dependent transcripts. To address this question, the authors resort to published scRNAseq data by Usoskin Nat Neurosci 2015. They correlate the expression of all 488 DEGs (different between cKO and either WT or Islet1Cre/+) with the expression of Meis2 in the sensory neuron subtypes that were classified in the Usoskin paper. From that they conclude that many Meis2-dependent genes were expressed in the same sensory neuron classes as Meis2 itself. This is not apparent from Fig. 3 Supplementary 2. Neither do the 488 DEGs seem to be in any way enriched in the MEIS2-expressing cell clusters NF2/3/4/5, nor is cluster PEP1 particularly high in Meis2 expression. Immunostaining for MEIS2 together with a few selected DEGs would be a better way to assess co-expression.

      We agree with reviewer 2 that the correlation between DEGs and the expression of Meis2 in the sensory neuron subtypes was far from striking. In our opinion, the new analysis now shows a more robust correlation. However, it has to be kept in mind that not all DEGs are expected to be direct Meis2 target genes and therefore to be enriched in the same Meis2-expressing population. This also holds true for genes that could be de-repressed or induced following Meis2 inactivation. In addition, the scRNAseq by Usoskin et al. was performed on adult sensory neurons, whereas our bulk RNAseq was performed on E18.5 embryos. Thus, because gene expression in developing sensory neurons is well known to be highly dynamic, the transcriptional signature of sensory neuron subclasses in E18.5 embryos is not expected to perfectly match the transcriptional signature of adult subclasses. Finally, we agree that immunostaining for Meis2 together with a few selected DEGs would give a better answer on whether they co-localize or not, but our lack of experience with those antibodies, together with the lack of financial support for the proposal, precludes addressing this pertinent point.

      1.5) The authors identify Gabra1 and Gabra4 as upregulated and Gabrr1 as downregulated genes in MEIS2 cKO animals. Does this reflect a change in GABA-receptor subunit composition in LTMRs?

      This is an interesting point. First, in our new analysis, increasing the cut-off to 100 reads excluded Gabrr1 from the DEGs. Based on our results, we cannot conclude whether Gabra1 and Gabra4 up-regulation reflects a change in GABA receptor composition. However, in the GEO term associated to Gabaergic synapse, whereas Gabra1 and Gabra4 were up-regulated, the ionotropic glutamate receptor Grid1 was downregulated, arguing rather for an imbalanced GABA/glutamate transmission. Finally, the increased GABAR expression in the LTMRs might be expected to increase pre-synaptic inhibition on the LTMR synapses onto target neurons in the dorsal horn, thus decreasing synaptic transmission from these neurons into spinal circuits.

      2) The authors assessed SA-LTMR innervating Merkel cells in glabrous and hairy skin by IFC staining for neurofilament H and electrophysiological recordings. Due to the small sample size, they pooled recordings, reasoning that nerves that do not successfully innervate Merkel cells (i.e. cKO glabrous skin) do not evoke electrophysiological responses following a touch stimulus.

      2.1) It is undoubtedly true that non-innervating nerves will likely not show electrophysiological responses. However, by pooling the recordings of SA-LTMRs from glabrous and hairy skin, the data obtained from the 20% successful recordings of SA-LTMRs from glabrous cKO skin (according to Fig. 4E, upper panel) will be overrepresented and hence lead to a systematic bias. How many recordings were made from the glabrous and hairy skin of each genotype? In case the number of recordings from cKO/glabrous skin is the limiting factor, does the observed difference in vibration threshold hold true when only recordings from hairy skin are compared?

      As explained in the text and in our answers to reviewer 1, data for hairy and glabrous SAMs were initially pooled as no differences between them were expected, and the next planned electrophysiological experiments were compromised due to the Covid19 pandemic. We are sorry that at this point, we cannot provide additional experiments to clarify this important point. In addition, as mention

      3) From the IFC images shown in Fig. 6A, it is not clear how the authors quantified branch points and innervated hair follicles.

      Branch points correspond to each point at which a nerve splits into two or more nerves. Innervated follicles correspond to follicles that are entangled by circumferential and/or lanceolate Nefh+ endings.

      4) The quality of the data is very high, but there are several ambiguities and errors in their presentation.

      We apologize for this mistake. Figure 1 Supplementary 1 that reports data from Cat walk analysis is now appropriately included in the files.

      4.2) Fig. 3A is confusing and the figure legend just repeats what is already said in the text. What do yellow, blue, and pink represent?

      Figure 3 has now been fully remade. The legend is now better indicated in Figure 3A. We hope it is now clearer.

      4.3) What genotype do the black, grey, and white boxplots in Fig. 6C Fig. 3 Supplementary 1B correspond to?

      The legends were missing for Figure 6C and Figure 3 supplementary 1B. They are now appropriately included.

      4.4) Up- and downregulated genes are assigned differently in Fig. 3 and Fig. 3 Supplementary 2. The figure legend of Fig. 3 Suppl 2 lists panel B as up-regulated genes but the same genes are labeled down-regulated in Fig. 3.

      We apologize for this previous mistake. Figure 3 and corresponding supplementary figures have been redone in the new version.

      4.5) Fig. 3E would benefit from a more detailed description. One can easily appreciate that the neurofilament H staining in the cKO sample is different from that of the WT sample but what exactly can be seen here?

      We added the following sentence in the results section: “In WT newborn mice, numerous Nefh+ sensory fibers surround all dermal papillae of the hairy skin and footpad of the glabrous skin, whereas in Wnt1Cre::Meis2LoxP/LoxP littermates, very few Nefh+ sensory fibers are present and they poorly innervate the dermal papillae and footpads.“.

      4.6) The figure legend to Fig. 4A is unclear. Does the graph show the sum of all recordings performed? From the text, one would guess that the bars correspond to the cKO samples, but this is not specified. Do the controls correspond to WT, Islet1Cre/+ or a mixture of both? In addition, the graph in the lower panel is labeled % Ab fibers, the figure legend reads % of tap units among Ab fibers.

      The graphs show the number of tap units identified among all recorded Aβ fibers. Numbers show the number of tap units over the number of recorded fibers. This has now been reformulated in the last version of the manuscript.

      4.7) The abbreviation SAM in figure legends 4F, G is not introduced.

      This is now indicated in the figure legend.

      4.8) Readers who are not familiar with the traces above the graphs in 4F and 4G will find a more detailed description helpful.

      This is now indicated in the figure legend.

      4.9) Lines 274-275: Does the statement "Finally, consistent with the lack of neuronal loss in Isl1Cre/+::Meis2LoxP/LoxP, the number of recorded fibers were identical in WT and Isl1Cre/+::Meis2LoxP/LoxP." refer to Fig. 4G? This is not specified in the text.

      These data were not included in the first version of the manuscript as we thought they were not significantly informative. They just indicate the overall numbers of fibers that were recorded in electrophysiological experiments. The sentence has now been removed in the last version of the manuscript to avoid misunderstanding.

      4.10) There is no Fig. 6 supplementary 1.

      The typo is now corrected. The corresponding data were in fact in Figure 5 Supplementary 1.

      Minor points:

      • Gangfuß et al. report that a patient previously diagnosed with a range of neurological deficits including the diagnosis of severe infantile autism is heterozygous mutant for MEIS2. Although this study links MEIS2 gene function to ASD in the wider sense, adding a few additional references will make the link stronger. Examples are Shimojima et al., Hum Genome Var 2017 or Bae et al., Science 2022.

      These two references have been now included in the introduction section of the manuscript.

      • In some figures (e.g. Fig. 4) the numbering of the panels does not follow the order in which the respective data are mentioned in the text.

      Figure 4 is now re-organized so that panels follow the same order as in the results section.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Nitrogen metabolism is of fundamental importance to biology. However, the metabolism and biochemistry of guanidine and guanidine containing compounds, including arginine and homoarginine, have been understudied over the last few decades. Very few guanidine forming enzymes have been identified. Funck et al define a new type of guanidine forming enzyme. It was previously known that 2-oxoglutarate oxygenase catalysis in bacteria can produce guanidine via oxidation of arginine. Interestingly, the same enzyme that produces guanidine from arginine also oxidises 2-oxoglutarate to give the plant signalling molecule ethylene. Funck et al show that a mechanistically related oxygenase enzyme from plants can also produce guanidine, but instead of using arginine as a substrate, it uses homoarginine. The work will stimulate interest in the cellular roles of homoarginine, a metabolite present in plants and other organisms including humans and, more generally, in the biochemistry and metabolism of guanidines.

      1) Significance

      Studies on the metabolism and biochemistry of the small nitrogen rich molecule guanidine and related compounds including arginine have been largely ignored over the last few decades. Very few guanidine forming enzymes have been identified. Funck et al define a new guanidine forming enzyme that works by oxidation of homoarginine, a metabolite present in organisms ranging from plants to humans. The new enzyme requires oxygen and 2-oxoglutarate as cosubstrates and is related to, but distinct from, a known enzyme that oxidises arginine to produce guanidine, but which can also oxidise 2-oxoglutarate to produce the plant signalling molecule ethylene.

      Overall, I thought this was an exceptionally well written and interesting manuscript. Although a 2-oxoglutarate dependent guanidine forming enzyme is known (EFE), the discovery that a related enzyme oxidises homoarginine is really interesting, especially given the presence of homoarginine in plant seeds. There is more work to be done in terms of functional assignment, but this can be the subject of future studies. I also fully endorse the authors' view that guanidine and related compounds have been massively understudied in recent times. I would like to see the possibility that the new enzyme makes ethylene explored. Congratulations to the authors on a very nice study.

      Response: We thank the reviewer for the positive evaluation of our manuscript. In the revised version, we have emphasized more clearly that we found no evidence for ethylene production by the recombinant enzymes. The other suggestions of the reviewer are also considered in the revised version as detailed below.

      Reviewer #2 (Public Review):

      In this study, Dietmar Funck and colleagues have made a significant breakthrough by identifying three isoforms of plant 2-oxoglutarate-dependent dioxygenases (2-ODD-C23) as homo/arginine-6-hydroxylases, catalyzing the degradation of 6-hydroxyhomoarginine into 2-aminoadipate-6-semialdehyde (AASA) and guanidine. This discovery marks the very first confirmation of plant or eukaryotic enzymes capable of guanidine production.

      The authors selected three plant 2-ODD-C23 enzymes with the highest sequence similarity to bacterial guanidine-producing (EFE) enzymes. They proceeded to clone and express the recombinant enzymes in E. coli, demonstrating capacity of all three Arabidopsis isoforms to produce guanidine. Additionally, by precise biochemical experiments, the authors established these three 2-ODD-C23 enzymes as homoarginine-6-hydroxylases (and arginine-hydroxylase for one of them). Furthermore, the authors utilized transgenic plants expressing GFP fusion proteins to show the cytoplasmic localization of all three 2-ODD-C23 enzymes. Most notably, using T-DNA mutant lines and CRISPR/Cas9-generated lines, along with combinations of them, they demonstrate the guanidine-producing capacity of each enzyme isoform in planta. These results provide robust evidence that these three 2-ODD-C23 Arabidopsis isoforms are indeed homoarginine-6-hydroxylases responsible for guanidine generation.

      The findings presented in this manuscript are a significant contribution for our understanding of plant biology, particularly given that this work is the first demonstration of enzymatic guanidine production in eukaryotic cells. However, there are a couple of concerns and potential ways for further investigation that the authors should (consider) incorporate.

      Firstly, the observation of cytoplasmic and nuclear GFP signals in the transgenic plants may also indicate cleaved GFP from the fusion proteins. Thus, the authors should perform Western blot analysis to confirm the correct size of the 2-ODD-C23 fusion proteins in the transgenic protoplasts.

      Secondly, it may be worth measuring pipecolate (and proline?) levels under biotic stress conditions (particularly those that induce transcript changes of these enzymes, Fig S8). Given the results suggesting a potential regulation of the pathway by biotic stress conditions (eg. meJA), these experiments could provide valuable insights into the physiological role of guanidine-producing enzymes in plants. This additional analysis may give a significance of these enzymes in plant defense mechanisms.

      Response: We also thank reviewer 2 for the positive evaluation and useful suggestions. We performed the proposed GFP Western blot, which indeed indicated the presence of both full-length fusion proteins and free GFP, which can explain the partial nuclear localization. We fully agree that further experiments with biotic and abiotic stress will be required to determine the physiological function of the 2-ODD-C23 enzymes. However, the list of potential experiments is long and they are beyond the scope of the present manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Specific points

      Overall, I thought this was a very interesting study, comprising biochemical, cellular, and in vivo studies. Of course more could be done on each of these, and likely will be, but I think the assignment of biochemical function is very strong, across all three approaches. The one new experiment I would like to see is a clear demonstration of whether ethylene is produced - unlikely but should be tested.

      We had mentioned our failure to detect ethylene production by the plant enzymes in the previous version and have made it more prominent and reliable by including ethylene production as positive control in the new supplementary figure S5.

      Abstract

      Delete 'hitherto overlooked' - this is implicit 'but is more likely' to 'is likely'?

      Agreed and modified

      Introduction

      Second sentence - what about relevant small molecule primary metabolites including precursors of proteins/nucleic acids.

      We modified the sentence accordingly.

      Paragraph 2 - maybe also note EFE produces glutamate semi aldehyde, via arginine C-5 oxidation.

      Paragraph 2 has been re-phrased according to your suggestion.

      Overall, I thought the introduction was exceptionally well written.

      Perhaps either in the introduction, or later, note there are other 2OG oxygenases that oxidise arginine/arginine derivatives in various ways, e.g. clavaminate synthase/arginine hydroxylases/desaturases.

      We added a sentence mentioning the arginine hydroxylases VioC and OrfP to the introduction and included VioC into the sequence comparison in supplementary figure 2 to show that these enzymes, as well as NapI, are very different from EFE and the plant hydroxylases.

      Results

      Paragraph 1 - qualify similarity and refer to/give a structurally informed sequence alignment, including EFE

      A new supplemental figure S2 was added with sequence identity values and a structurally informed alignment. The text has been modified accordingly.

      Paragraph 2 - briefly state method of guanidine analysis

      We included a reference to the M&M section and mentioned LC-MS in paragraph 2.

      Figure 1 - trivial point - proteins are not expressed/genes are

      We have modified the legend to figure 1. However, we would like to point out that terms like “recombinant protein expression” are widely used in the field. A quick search with the Google Ngram Viewer shows that “protein expression” started to appear in the mid-1980s and that its frequency has stayed constant at about one-eighth of that of “gene expression”.

      Define errors clearly in all figure legends, clearly defining biological/technical repeats

      Page 6 - was the His-tag cleared to ensure no issues with Ni contamination?

      We treat individual plants or independent bacterial cultures as biological replicates. Only in the case of enzyme activity assays with NAD(P)H, technical replicates were used and this has been indicated in the legend of figure 6.

      Lower case 'p' in pentafluorobenzyl corrected

      In Figure 2 make clear the hydroxylated intermediates are not observed

      We now use grey color for the intermediates and have put them in brackets. Additionally we state in the figure legend that these intermediates were not detected.

      Pages 6-7 - I may have missed this but it's important to investigate what happens to the 2OG. Is succinate the only product or is ethylene also produced? This possibility should also be considered in the plant studies, i.e. is there any evidence for responses related to perturbed ethylene metabolism. The authors consider a signalling role relating to AASA/P6C, but seem to ignore a potential ethylene connection.

      As stated above, we checked for ethylene production, with a negative result. EFE produced 6 times more guanidine than the plant enzymes under the same conditions, but even 100-fold lower ethylene production would have been clearly detected.

      Page 12 - 'plants have been shown to....' Perhaps note how hydroxy guanidine is made?

      We now mention the canavanine-γ-lyase that cleaves canavanine into hydroxyguanidine and homoserine.

      Overall, I thought the discussion was good, but perhaps a bit long/too speculative on pages 12/13 and this detracted from the biochemical assignment of the enzyme. I'd suggest shortening the discussion somewhat - the precise roles of the enzyme can be the subject of future work. As indicated above, some discussion on potential links to ethylene would be appreciated.

      Since reviewer 2 wanted more (speculative) discussion on the role of the 2-ODD-C23 enzymes and there was no detectable ethylene production, we took the liberty of leaving the discussion largely unaltered.

      I'd also like to see some more consideration/metabolic analyses of guanidine related metabolism in the genetically modified plants.

      Such analyses will certainly be included in future experiments once we get an idea about the physiological role of the 2-ODD-C23 enzymes.

      Page 16 - mass spectrometry

      Corrected.

      Please add a structurally informed sequence alignment with EFE and other 2OG oxygenases acting on arginine/derivatives.

      An excerpt of the alignment is now presented in supplementary figure S2.

      Reviewer #2 (Recommendations For The Authors):

      I would like to see more discussion in the manuscript about the possible interconnection/roles between 2-ODD-C23 guanidine-producing, lysine- ALD1-Pipecolate producing, and proline metabolism pathways during both biotic and abiotic stresses.

      Since we were unable to detect pipecolate in any of our plant samples, and since our preliminary results with biotic stress did not produce any evidence for a function of the 2-ODD-C23 enzymes in the tested defense responses, we would like to postpone such extended discussion until we find a condition where the physiological function of these enzymes is evident.

      Fig. 4: Authors should change colors for Col-0, 0.2 HoArg and ctrl? They look too similar in my pdf file.

      We changed the colors in figure 4 and hope that the enhanced contrast is maintained during the production of the final version of our article.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides a fundamental contribution to the understanding of the role of intrinsically disordered proteins in circadian clocks and the potential involvement of phase separation mechanisms. The authors convincingly report on the structural and biochemical aspects and the molecular interactions of the intrinsically disordered protein FRQ. This paper will be of interest to scientists focusing on circadian clock regulation, liquid-liquid phase separation, and phosphorylation.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      "Phosphorylation, disorder, and phase separation govern the behavior of Frequency in the fungal circadian clock" is a convincing manuscript that delves into the structural and biochemical aspects of FRQ and the FFC under both LLPS and non-LLPS conditions. Circadian clocks serve as adaptations to the daily rhythms of sunlight, providing a reliable internal representation of local time.

      All circadian clocks are composed of positive and negative components. The FFC contributes negative feedback to the Neurospora circadian oscillator. It consists of FRQ, CK1, and FRH. The FFC facilitates close interaction between CK1 and the WCC, with CK1-mediated phosphorylation disrupting WCC:c-box interactions necessary for restarting the circadian cycle.

      Despite the significance of FRQ and the FFC, challenges associated with purifying and stabilizing FRQ have hindered in vitro studies. Here, researchers successfully developed a protocol for purifying recombinant FRQ expressed in E. coli.

      Armed with full-length FRQ, they utilized spin-labeled FRQ, CK1, and FRH to gain structural insights into FRQ and the FFC using ESR. These studies revealed a somewhat ordered core and a disordered periphery in FRQ, consistent with prior investigations using limited proteolysis assays. Additionally, p-FRQ exhibited greater conformational flexibility than np-FRQ, and CK1 and FRH were found in close proximity within the FFC. The study further demonstrated that under LLPS conditions in vitro, FRQ undergoes phase separation, encapsulating FRH and CK1 within LLPS droplets, ultimately diminishing CK1 activity within the FFC. Intriguingly, higher temperatures enhanced LLPS formation, suggesting a potential role of LLPS in the fungal clock's temperature compensation mechanism.

      Biological significance was supported by live imaging of Neurospora, revealing FRQ foci at the periphery of nuclei consistent with LLPS. The amino acid sequence of FRQ conferred LLPS properties, and a comparison of clock repressor protein sequences in other eukaryotes indicated that LLPS formation might be a conserved process within the negative arms of these circadian clocks.

      In summary, this manuscript represents a valuable advancement with solid evidence in the understanding of a circadian clock system that has proven challenging to characterize structurally due to obstacles linked to FRQ purification and stability. The implications of LLPS formation in the negative arm of other eukaryotic clocks and its role in temperature compensation are highly intriguing.

      Strengths:

      The strengths of the manuscript include the scientific rigor of the experiments, the importance of the topic to the field of chronobiology, and new mechanistic insights obtained.

      Weaknesses:

      This reviewer had questions regarding some of the conclusions reached.

      Recommendations For The Authors:

      The reviewer has a few questions for the authors:

      1) Concerning the reduced activity of sequestered CK1 within LLPS droplets with FRQ, to what extent is this decrease attributed to distinct buffer conditions for LLPS formation compared to non-LLPS conditions?

      We don’t believe that these buffer conditions significantly influence the change in FRQ phosphorylation by CK1 observed at elevated temperatures. The pH and ionic strength of the buffer are in keeping with physiological conditions (300 mM NaCl, 50 mM sodium phosphate, 10 mM MgCl2, pH 7.5); CK1 autophosphorylation is robust and generally increases with temperature under these conditions (Figure 7B). However, as LLPS increases, CK1 autophosphorylation remains high, whereas phosphorylation of FRQ dramatically decreases. In fact, we chose to alter temperature specifically to induce changes in phase behavior under constant buffer conditions. In this way LLPS could be increased, and FRQ phosphorylation evaluated, without altering the solution composition. Thus, we believe that the reduced CK1 kinase activity toward FRQ as a substrate is directly due to the impact of the generated LLPS milieu, i.e. the changes in structural/dynamic properties of FRQ and/or CK1 induced by being in a phase-separated microenvironment, which could be substantially different from a non-phase-separated buffer environment. For example, previous work on the disordered region of DDX4 [Brady et al. 2017, and Nott et al. 2015] shows that even the water content and the stability of biomolecules such as double-stranded nucleic acids encapsulated within the droplets differ between non-phase-separated and phase-separated DDX4 samples.

      Nott T.J. et al. Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles. Mol. Cell. 2015 57 936-947.

      Brady J.P. et al. Structural and hydrodynamic properties of an intrinsically disordered region of a germ cell-specific protein on phase separation. PNAS 2017 114 8194-8203.

      In the results section we have clarified the use of temperature to control LLPS, “We compared the phosphorylation of FRQ by CK1 in a buffer that supports phase separation under different temperatures, using the latter as a means to control the degree of LLPS without altering the solution composition.”

      On p. 16 of the discussion we have elaborated on the above point, “We believe that the reduced CK1 kinase activity toward FRQ as a substrate is directly due to the impact of the generated LLPS milieu, i.e. the changes in structural/dynamic properties of FRQ and/or CK1 induced by being in a phase-separated microenvironment, which could be substantially different from a non-phase-separated buffer environment. For example, previous work on the disordered region of DDX4 (Brady, et al., 2017; Nott, et al., 2015) shows that even the water content and the stability of biomolecules such as double-stranded nucleic acids encapsulated within the droplets differ between non-phase-separated and phase-separated DDX4 samples. Indeed, the spin-labeling experiments indicate that the dynamics of FRQ have been altered by LLPS (Fig. 7D).”

      2) The DEER technique demonstrated spatial proximity between FRH and CK1 when bound to FRQ in the FFC. Is there evidence suggesting their lack of proximity in the absence of FRQ? Also, how important is this spatial proximity to FFC function?

      We have additional data substantiating that FRH and CK1 do not interact in the absence of FRQ. In the revised paper we have included the results of a SEC-MALS experiment showing that FRH and CK1 elute separately when mixed in equimolar amounts and applied to an analytical S200 column coupled to a MALS detector (Figure 1 below and Fig. S8). The importance of the FRH and CK1 proximity is currently unknown, but there are reasons to believe that it could have functional consequences. For example, CK1, as recruited by FRQ, phosphorylates the White-Collar Complex (WCC) in the repressive arm of the circadian oscillator [e.g. He et al. Genes Dev. 20, 2552 (2006); Wang et al, Mol. Cell 74, 771 (2019)]. Interactions between the WCC and the FFC are mediated at least in part by FRH binding to White Collar-2 [Conrad et al. EMBO J. 35, 1707 (2016)]. Thus, FRH:FRQ may effectively bridge CK1 to the WCC to facilitate the phosphorylation of the latter by the former.

      He et al. CKI and CKII mediate the FREQUENCY-dependent phosphorylation of the WHITE COLLAR complex to close the Neurospora circadian negative feedback loop. Genes Dev. 2006 20, 2552-2565.

      Wang B. et al. The Phospho-Code Determining Circadian Feedback Loop Closure and Output in Neurospora Mol. Cell 2019 74, 771-784.

      Conrad et al. Structure of the frequency-interacting RNA helicase: a protein interaction hub for the circadian clock. EMBO J. 2016 35, 1707-1719.

      Author response image 1.

      Size-exclusion chromatography coupled to multi-angle light scattering (SEC-MALS) of a mixture of purified FRH and CK1. The proteins elute separately as monomers with no evidence of co-migration.

      3) Is there any indication that impairing FRQ's ability to undergo LLPS disrupts clock function?

      We do not currently have direct evidence that LLPS of FRQ is essential for clock function. These experiments are ongoing, but complicated by the fact that changes to FRQ predicted to alter LLPS behavior also have the potential to perturb its many other clock-related functions that include dynamic interactions with partners, dynamic post-translational modification and rates of synthesis and degradation. That said, the intrinsic disorder of FRQ is important for it to act as a protein interaction hub, and large intrinsically disordered regions (IDRs) very often mediate LLPS, as is certainly the case here. In this work, we argue that the ability of FRQ to sequester clock proteins during the TTFL may involve LLPS. Additionally, we show that the phosphorylation state of FRQ, which is a critical factor in clock period determination, depends on LLPS. Given that the conditions under which FRQ phase separates are physiological in nature and that live-cell imaging is consistent with FRQ phase separation in the nucleus, it seems likely that FRQ does phase separate in Neurospora. Furthermore, given that the sequence features of FRQ that mediate phase-separation are conserved not only across FRQ homologs but also in other functionally related clock proteins, it is probable, albeit worthy of further investigation, that LLPS has functional consequences for the clock. See the response to reviewer 3 for more discussion on this topic.

      Minor Points:

      Indeed, we have included a reference to this paper on p. 3: “Emerging studies in plants (Jung, et al., 2020), flies (Xiao, et al., 2021) and cyanobacteria (Cohen, et al., 2014; Pattanayak, et al., 2020) implicate LLPS in circadian clocks, and in Neurospora it has recently been shown that the Period-2 (PRD-2) RNA-binding protein influences frq mRNA localization through a mechanism potentially mediated by LLPS (Bartholomai, et al., 2022).”

      • On page 9, six lines from the top, please insert "of" between "distributions" and "p-FRQ".

      We have corrected this typo.

      Reviewer #2 (Public Review):

      Summary:

      This study presents data from a broad range of methods (biochemical, EPR, SAXS, microscopy, etc.) on the large, disordered protein FRQ relevant to circadian clocks and its interaction partners FRH and CK1, providing novel and fundamental insight into oligomerization state, local dynamics, and overall structure as a function of phosphorylation and association. Liquid-liquid phase separation is observed. These findings have bearings on the mechanistic understanding of circadian clocks, and on functional aspects of disordered proteins in general.

      Strengths:

      This is a thorough work that is well presented. The data are of overall high quality given the difficulty of working with an intrinsically disordered protein, and the conclusions are sufficiently circumspect and qualitative to not overinterpret the mostly low-resolution data.

      Weaknesses:

      None

      Recommendations For The Authors:

      1)Fig.2B: Beyond the SEC part (absorbance vs elution volume), I don't understand this plot, in particular the horizontal lines. They appear to be correlating molecular weight with normalized absorption at 280 nm, but the chromatogram amplitudes are different. Clarify, or modify the plot. There are also some disconnected line segments between 10-11 mL - these seem to be spurious.

      We apologize for the confusion. The horizontal lines are meant only to denote the average molecular weights of the elution peaks, not to correlate with the A280 values. The disconnected lines are the light-scattering molecular weight readouts from which the horizontal lines are derived. The difficulty with the figure is that the full elution traces and the MALS traces across the peaks call for different scales to best depict the relevant features of the data. We have reworked the figure and legend to make the key points clearer.

      2) It could be useful to add AF2 secondary structure predictions, pLDDT, and the helical propensity analysis to the sequence ribbon in Fig.1C.

      Thank you for the suggestion, we have updated the figure to incorporate the pLDDT scores into the linear sequence map, as well as the secondary structure predictions.

      3) Fig.3D: It would be better to show the raw data rather than the fits. At the same time, I appreciate the fact that the authors resisted the temptation to show distance distributions.

      Yes, we agree that it is important to show the raw data; it is included in the supplementary section. Depicting the raw data here unfortunately obscures the differences in the traces and we believe that showing the data as a superposition is quite useful to convey the main differences among the sites. However, we have now explicitly stated in the figure legend that the corresponding raw data traces are given in Figures S5-6.

      4) Fig.5: For all distance distributions, error intervals should be added (typically done in terms of shaded bands around the best-fit distribution). As shown, precision is visually overstated. The error analysis shown in the SI is dubious, as it shows some distances have no error whatsoever (e.g. 6nm in 370C-490C), which is not possible.

      We did previously show the error intervals in the SI, but we agree that it is better to include them here as well, and have done so in the new Figure 5. With respect to the error analysis, we are following the methodology described in the following paper:

      Srivastava, M. and Freed, J., Singular Value Decomposition Method To Determine Distance Distributions in Pulsed Dipolar Electron Spin Resonance: II. Estimating Uncertainty. J. Phys. Chem. A (2019) 123:359-370. doi: 10.1021/acs.jpca.8b07673.

      Briefly, the uncertainty we plot shows the "range" of singular values over which the singular value decomposition (SVD) solution remains converged. For most of the data displayed in this paper we used only the first few singular values (SVs), and the solution remained converged for ± 1 or 2 SVs near the optimum solution. For example, if the optimum solution was 4 SVs, then the range in which the solution remained converged is ~3-6 SVs. We plot three lines: the solution with the lowest number of SVs in the range, the solution with the highest number of SVs in the range, and the solution with the optimum number of SVs. In the SI figures the optimum SV solution is shown in black and the region between the converged solutions with the highest and lowest number of SVs is shaded in red. Owing to the point-wise reconstruction of the distance distribution, the SVD method enables localized uncertainty at each distance value. Therefore, some points will have high uncertainty, whereas others will have low uncertainty. A distance that appears to have no uncertainty actually has very low uncertainty, which can be seen on close inspection. In these cases, we observe "isosbestic"-type behavior in which the P(r) appears to change little across the acceptable solutions, and hence there is only a small range of P(r) values at that particular r. This behavior results from multimodal distributions wherein the change in SVs shifts neighboring peaks to lower and higher distances respectively, producing an apparent cancellation effect. What we believe is most important for the biochemical interpretation, and accurately reflected by this analysis, is the general width of the uncertainty across the distribution and how this impacts the error in both the mean and the overall skewing of the distribution at short or long distances.

      Details of the error treatment as described above have been added to the supplementary methods section.
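
      The singular-value range idea described above can be illustrated with a minimal sketch. It is not the implementation of Srivastava and Freed, which reconstructs the distribution point by point; it assumes a single uniform truncation level for the whole distribution, and the kernel `K`, the signal `s`, and the function names `tsvd_solution` and `sv_range_band` are hypothetical names introduced only for illustration.

```python
import numpy as np


def tsvd_solution(K, s, k):
    """Truncated-SVD solution of K @ p = s that keeps the first k singular values."""
    U, sig, Vt = np.linalg.svd(K, full_matrices=False)
    # Project the signal onto the k leading left singular vectors and
    # divide by the corresponding singular values.
    coeffs = (U[:, :k].T @ s) / sig[:k]
    # Expand the result in the k leading right singular vectors.
    return Vt[:k].T @ coeffs


def sv_range_band(K, s, k_opt, k_lo, k_hi):
    """Return the optimum solution and a band spanned by the two bracketing solutions.

    k_lo and k_hi are the lowest and highest numbers of singular values for
    which the solution is still considered converged (chosen by the analyst;
    an assumption of this sketch).
    """
    p_opt = tsvd_solution(K, s, k_opt)
    p_lo = tsvd_solution(K, s, k_lo)
    p_hi = tsvd_solution(K, s, k_hi)
    # The shaded region lies between the two bracketing solutions at each point.
    lower = np.minimum(p_lo, p_hi)
    upper = np.maximum(p_lo, p_hi)
    return p_opt, lower, upper
```

      In such a sketch, any point at which the two bracketing solutions happen to coincide receives a band of zero width, which is one way a distance can appear to carry no uncertainty in a plot of this kind.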

      5) The Discussion (p.13) states that the SAXS and DEER data show that disorder is greater than in a molten globule and smaller than in a denatured protein. Evidence to support this statement (molten globule DEER/SAXS reference data etc.) should be made explicit.

      We will make the statement more explicit by changing it to the following: “Notably, the shape of the Kratky plots generated from the SAXS data suggest a degree of disorder that is substantially greater than that expected of a molten globule (Kataoka, et al., 1997), but far from that of a completely denatured protein (Kikhney, et al., 2015; Martin, Erik W., et al., 2021). Similarly, the DEER distributions, though non-uniform across the various sites examined, indicate more disorder than that of a molten globule (Selmke et al., 2018) but more order than a completely unfolded protein (van Son et al. 2015).”

      van Son, M., et al. Double Electron−Electron Spin Resonance Tracks Flavodoxin Folding, J. Phys. Chem. B 2015, 119, 13507−13514. doi: 10.1021/acs.jpcb.5b00856.

      Selmke, B. et al. Open and Closed Form of Maltose Binding Protein in Its Native and Molten Globule State As Studied by Electron Paramagnetic Resonance Spectroscopy. Biochemistry 2018, 57, 5507−5512 doi: 10.1021/acs.biochem.8b00322.

      6) Fig. S11B could be promoted to the main paper.

      This comment makes a good point. Figure 8 is now an updated scheme, similar to the previous Fig. S11B. Thank you for the suggestion.

      Minor corrections:

      p.1: "composed from" -> "composed of"

      p.2: TFFLs -> TTFLs

      p.2: "and CK1 via" => "and to CK1 via"

      p.5: "Nickel" -> "nickel"

      p.5: "Size Exclusion Chromatography" -> "Size exclusion chromatography"

      p.5: "Multi Angle Light Scattering" -> "multi-angle light scattering"

      Fig.2 caption: "non-phosphorylated (np-FRQ)" -> "non-phosphorylated FRQ (np-FRQ)"

      Fig. S3: What are the units on the horizontal axis?

      Fig. 5H is too small

      Fig. S8, S9: all distance distribution plots show a spurious "1"

      Fig. 6A has font sizes that are too small to read

      p.11: "cytoplasm facing" -> "cytoplasm-facing"

      p.11: "temperature dependent" -> "temperature-dependent"

      p.12: "substrate-sequestration and product-release" -> "substrate sequestration and product release"

      p.12: "depend highly buffer composition" -> "depend highly on buffer composition"

      We thank the reviewer for finding these errors and their attention to detail. All of these minor points have been addressed in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript from Tariq and Maurici et al. presents important biochemical and biophysical data linking protein phosphorylation to phase separation behavior in the repressive arm of the Neurospora circadian clock. This is an important topic that contributes to what is likely a conceptual shift in the field. While I find the connection to the in vivo physiology of the clock to be still unclear, this can be a topic handled in future studies.

      Strengths:

      The ability to prepare purified versions of unphosphorylated FRQ and P-FRQ phosphorylated by CK-1 is a major advance that allowed the authors to characterize the role of phosphorylation in structural changes in FRQ and its impact on phase separation in vitro.

      Weaknesses:

      The major question that remains unanswered from my perspective is whether phase separation plays a key role in the feedback loop that sustains oscillation (for example by creating a nonlinear dependence on overall FRQ phosphorylation) or whether it has a distinct physiological role that is not required for sustained oscillation.

      The reviewer raises the key question regarding data suggesting LLPS and phase separated regions in circadian systems. To date, condensates have been seen in cyanobacteria (Cohen et al, 2014, Pattanayak et al, 2020) where there are foci containing KaiA/C during the night, in Drosophila (Xiao et al, 2021) where PER and dCLK colocalize in nuclear foci near the periphery during the repressive phase, and in Neurospora (Bartholomai et al, 2022) where the RNA binding protein PRD-2 sequesters frq and ck1a transcripts in perinuclear phase separated regions. Because the proteins responsible for the phase separation in cyanobacteria and Drosophila are not known, it is not possible to seamlessly disrupt the separation to test its biological significance (Yuan et al, 2022), so only in Neurospora has it been possible to associate loss of phase separation with clock effects. There, loss of PRD-2, or mutation of its RNA-binding domains, results in a ~3 hr period lengthening as well as loss of perinuclear localization of frq transcripts. A very recent manuscript (Xie et al., 2024) calls into question both the importance and the very existence of LLPS of clock proteins, at least as regards mammalian cells, noting that it may be an artefact of overexpression in some places where it is seen, and that at normal levels of expression there is no evidence for elevated levels at the nuclear periphery. Artefacts resulting from overexpression plainly cannot be a problem for our study nor for Xiao et al. 2021, as in both cases the relevant clock protein, FRQ or PER, was labeled at the endogenous locus and expressed under its native promoter. Also, it may be worth noting that although we called attention to enrichment of FRQ[NeonGreen] at the nuclear periphery, there remained abundant FRQ within the core of the nucleus in our live-cell imaging.

      Cohen SE, et al.: Dynamic localization of the cyanobacterial circadian clock proteins. Curr Biol 2014, 24:1836–1844, https://doi.org/10.1016/j.cub.2014.07.036.

      Pattanayak GK, et al.: Daily cycles of reversible protein condensation in cyanobacteria. Cell Rep 2020, 32:108032, https://doi.org/10.1016/j.celrep.2020.108032.

      Xiao Y, Yuan Y, Jimenez M, Soni N, Yadlapalli S: Clock proteins regulate spatiotemporal organization of clock genes to control circadian rhythms. Proc Natl Acad Sci U S A 2021, 118, https://doi.org/10.1073/pnas.2019756118.

      Bartholomai BM, Gladfelter AS, Loros JJ, Dunlap JC. 2022 PRD-2 mediates clock-regulated perinuclear localization of clock gene RNAs within the circadian cycle of Neurospora. Proc Natl Acad Sci U S A. 119(31):e2203078119. doi: 10.1073/pnas.2203078119.

      Yuan et al., Curr Opin Cell Biol 78: 102129, 2022. https://doi.org/10.1016/j.ceb.2022.102129

      Pancheng Xie, Xiaowen Xie, Congrong Ye, Kevin M. Dean, Isara Laothamatas, S K Tahajjul T Taufique, Joseph Takahashi, Shin Yamazaki, Ying Xu, and Yi Liu (2024). Mammalian circadian clock proteins form dynamic interacting microbodies distinct from phase separation. Proc. Natl. Acad. Sci. USA. In press.

      We have updated the discussion on p. 15 accordingly:

      “Live cell imaging of fluorescently-tagged FRQ proteins is consistent with FRQ phase separation in N. crassa nuclei. FRQ is plainly not homogeneously dispersed within nuclei, and the concentrated foci observed at specific positions in the nuclei indicate condensate behavior similar to that observed for other phase separating proteins (Bartholomai, et al., 2022; Caragliano, et al., 2022; Gonzalez, A., et al., 2021; Tatavosian, et al., 2019; Xiao, et al., 2021). While ongoing experiments are exploring more deeply the spatiotemporal dynamics of FRQ condensates in nuclei, the small size of fungal nuclei as well as their rapid movement with cytoplasmic bulk flow through the hyphal syncytium make these experiments difficult. Of particular interest is drawing comparisons between FRQ and the Drosophila Period protein, which has been observed in similar foci that change in size and subnuclear localization throughout the circadian cycle (Meyer, et al., 2006; Xiao, et al., 2021), although it must be noted that the foci we observed are considerably more dynamic in size and shape than those reported for PER in Drosophila (Xiao, et al., 2021). A very recent manuscript (Xie, et al., 2024) calls into question the importance and very existence of LLPS of clock proteins, at least with regard to mammalian cells, noting that it may be an artifact of overexpression in some instances where it is seen, and that at normal levels of expression there is no evidence for elevated levels at the nuclear periphery. Artifacts resulting from overexpression are unlikely to be a problem for our study and that of Xiao et al., as in both cases clock proteins were tagged at their endogenous locus and expressed from their native promoters. Although we noted enrichment of FRQmNeonGreen near the nuclear envelope in our live-cell imaging, there remained abundant FRQ within the core of the nucleus.”

      Recommendations For The Authors:

      The data in Fig 6 showing microscopy of Neurospora is suggestive but needs more information/controls. Does the strain that expresses FRQ-mNeonGreen have normal circadian rhythms? How were the cultures handled (in terms of circadian entrainment etc.) for imaging? Do samples taken at different clock times appear different in terms of punctate structures in microscopy? The authors cite the Xiao 2021 paper in Drosophila, but would be good to see if the in vivo picture is fundamentally similar in Neurospora.

      All of the live-cell images we report were from cells grown in constant light; in the dark, strains bearing FRQ[NeonGreen] have normally robust rhythms with a slightly elongated period length as measured by a frq Cbox-luc reporter. Although we are of course interested in whether, and if so how, the punctate structures change as a function of circadian time, this is work in progress and beyond the scope of the present study. That said, it is plain to see from the movie included as a Supplemental file here that the puncta we see are moving and fusing/splitting on a scale of seconds, whereas those reported in Drosophila by Xiao et al. (Xiao et al, 2021, above) were stable for many minutes; thus the FRQ foci seen in Neurospora are quite a bit more dynamic than those in Drosophila.

      We have updated the results section on p. 11 to provide this information more clearly: “FRQ thus tagged and driven by its own promoter is expressed at physiologically normal levels, and strains bearing FRQmNeonGreen as the only source of FRQ are robustly rhythmic with a slightly longer than normal period length. Live-cell imaging in Neurospora crassa offers atypical challenges because the mycelia grow as syncytia, with continuous rapid nuclear motion during the time of imaging. This constant movement of nuclei is compounded by the very low intranuclear abundance of FRQ and the small size of fungal nuclei, making visualization of intranuclear droplet fission/fusion cycles, or intranuclear fluorescence recovery after photobleaching (FRAP) experiments that could report on liquid-like properties, not readily feasible. Nonetheless, bright and dynamic foci-like spots were observed well inside the nucleus and near the nuclear periphery, which is delineated by the cytoplasm-facing nucleoporin Son-1 tagged with mApple at its C-terminus (Fig. 6D,E, Movie S1). Such foci are characteristic of phase separated IDPs (Bartholomai, et al., 2022; Caragliano, et al., 2022; Gonzalez, A., et al., 2021; Tatavosian, et al., 2019) and share similar patterning to that seen for clock proteins in Drosophila (Meyer, et al., 2006; Xiao, et al., 2021), although the foci we observed are substantially more dynamic than those reported in Drosophila.”

      Another issue where some commentary would be helpful: Fig 7 shows that phase separation behavior is strongly temperature dependent (not biophysically surprising). Is that at odds with the known temperature compensation of the circadian rhythm if LLPS indeed plays a key role in the oscillator?

      We believe that the dependence of CK1-mediated FRQ phosphorylation on temperature, as manifested by FRQ phase separation, is consistent with temperature compensation within the Neurospora circadian oscillator. The phenomenon of temperature compensation by circadian clocks involves the intransigence of the oscillator period to temperature change. Stability of period with temperature change would not necessarily be expected of a generic chemical oscillator, which would run faster (shorter period) at higher temperature owing to Arrhenius behavior of the underlying chemical reactions. Circadian phosphorylation of FRQ is one such chemical process that contributes to the oscillation of FRQ abundance on which the clock is based. Reduced CK1 phosphorylation of FRQ causes both longer periods [Mehra et al., 2009] and loss of temperature compensation (manifested as a reduction of period length at higher temperature) [Liu et al, Nat Comm, 10, 4352 (2019); Hu et al, mBio, 12, e01425 (2021)]. Thus, the ability of increased LLPS formation at elevated temperature to reduce FRQ phosphorylation by CK1 (but not intrinsic CK1 autophosphorylation) would be a means to counter a decreasing period length that would otherwise manifest in an under compensated system. As further negative feedback on the system, LLPS is also promoted by FRQ phosphorylation itself, which in turn will reduce phosphorylation by CK1. Thus, both increased FRQ phosphorylation and temperature will couple to increased LLPS and mitigate period shortening through reduction of CK1 activity.

      Mehra et al. A Role for Casein Kinase 2 in the Mechanism Underlying Circadian Temperature Compensation. Cell. May 15, 2009, 137, 749–760.

      Liu et al. FRQ-CK1 interaction determines the period of circadian rhythms in Neurospora. Nat Comm. 2019, 10, 4352.

      Hu et al. FRQ-CK1 Interaction Underlies Temperature Compensation of the Neurospora Circadian Clock. mBio. 2021, 12, e01425. WOS:000693451600006.
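
      To make the Arrhenius expectation mentioned above explicit, a generic rate constant and the corresponding temperature coefficient can be written as below. This is a textbook sketch rather than a model fitted to the data discussed here; the symbols $k$, $A$, $E_a$, $R$, $T$ and $Q_{10}$ are standard notation introduced for illustration and do not appear in the manuscript.

```latex
k(T) = A\,e^{-E_a/(RT)}, \qquad
Q_{10} = \left(\frac{k(T_2)}{k(T_1)}\right)^{10/(T_2 - T_1)}
```

      For any positive activation energy $E_a$, $k$ increases with $T$, so $Q_{10} > 1$ and an uncompensated oscillator whose period scales as $1/k$ would run with a shorter period on warming. Temperature compensation corresponds to an effective $Q_{10}$ for the period that stays close to 1.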

      We have added Figure 8 to clarify the interpretation of the temperature compensation implications of our work, the legend of which reads:

      “Figure 8: LLPS may play a role in temperature compensation of the clock through modulation of FRQ phosphorylation. Reduced CK1 phosphorylation of FRQ causes both longer periods (Mehra, et al., 2009) and loss of temperature compensation (manifested as a shortening of period at higher temperature) (Hu, et al., 2021; Liu, X., et al., 2019). Thus, the ability of increased LLPS at elevated temperature (larger grey circle) to reduce FRQ phosphorylation by CK1 will counter a shortening period that would otherwise manifest in an under compensated system. As further negative feedback, LLPS is also promoted by increased FRQ phosphorylation, which in turn will reduce phosphorylation by CK1. Thus, both increased FRQ phosphorylation and temperature favor LLPS and reduction of CK1 activity.”

      one minor comment: The chemical structures in Fig 3A have some issues where the "N" and "S" are flipped. Would be good to remake these figures to fix this problem.

      We apologize for the error; the figure has been replaced with a corrected version.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      The single-mutant and double-mutant crp/rpoB strains were made by co-transduction with a nearby gene deletion (kanR-marked). I couldn't tell from the methods section whether these mutants, e.g., crp-H22N delta-chiA, were compared to wild-type cells or deletion mutants, e.g., delta chiA, in the proteomics experiments. I encourage the authors to explain this more clearly in the methods section, and to briefly mention in the Results section and relevant figure legends that the crp/rpoB mutant strains (and possibly the "wild-type" strains) also have gene deletions. If the comparison "wild-type" strains are fully wild-type (i.e., not deleted for chiA/yjaH), it is especially important to mention this in the Results section and the figure legends, since the phenotypic changes could be due to the gene deletions rather than the mutations in crp/rpoB.

      We appreciate and agree with the editor's suggestion to clarify this point.

      Accordingly, we have made the following changes to the text:

      p11 L30-34 in the main text:

      "The second experiment similarly compared an engineered BW25113 (BW) strain, containing the two regulatory mutations from the compact set (i.e., crp H22N and rpoB A1245V) together with the deletions used to insert them (see methods and DataS1 file), to a “wild type” BW strain (a corresponding knockout strain without the mutations, see methods)."

      p28 under Chemostat proteomics experiment L13-16 in methods:

      "The starting volume of each bioreactor was 150 ml M9 media supplemented with either 30 mM and 10mM D-xylose for the evolved and ancestor samples or only 10mM D-xylose for BW including compact set mutations and/or the deletions used for their insertions (DataS1 file). The minimal media also included trace elements and vitamin B1 was omitted."

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Su et al propose the existence of two mechanisms repressing SBF activity during entry into meiosis in budding yeast. First, a decrease in Swi4 protein levels by a LUTI-dependent mechanism where Ime1 would act closing a negative feedback loop. Second, the sustained presence of Whi5 would contribute to maintaining SBF inhibited under sporulation conditions. The article is clearly written and the experimental approaches used are adequate to the aims of this work. The results obtained are in line with the conclusions reached by the authors but, in my view, they could also be explained by the existing literature and, hence, would not represent a major advance in the field of meiosis regulation.

      We respectfully disagree with the reviewer's comment that this work can be explained by the existing literature. First, while SWI4LUTI has been previously identified in meiotic cells along with ~380 LUTIs, the biological purpose of these alternative mRNA isoforms and their effect on cellular physiology still remain largely unknown. Our manuscript addresses this gap in understanding for SWI4LUTI. Loss of SWI4LUTI contributes to dysregulation of meiotic entry and does so by failing to properly repress the known inhibitors of meiotic entry, the CLNs. Furthermore, even though Cln1 and Cln2 have been previously shown to antagonize meiosis, the mechanisms that restrict their activity were unclear prior to our study.

      We recognize work done by others demonstrating Whi5-dependent repression of SBF during mitotic G1/S transition (De Bruin et al., 2004; Costanzo et al., 2004). We further examined Whi5’s involvement during meiotic entry and found that it acts in conjunction with the LUTI-based mechanism to restrict SBF activity. Combined loss of both mechanisms results in the increased expression of G1 cyclins, decreased expression of early meiotic genes, and a delay in meiotic entry (Figure 6). Neither mechanism was previously known to regulate meiotic entry. Our study not only adds to our broader understanding of gene regulation during meiosis but also raises additional questions regarding how LUTIs regulate gene expression and function.

      Regarding the first mechanism, Fig 1 shows that Swi4 decreases very little after 1-2h in sporulation medium, whereas G1-cyclin expression is strongly repressed very rapidly under these conditions (panel D and work by others). This fact dampens the functional relevance of Swi4 downregulation as a causal agent of G1 cyclin repression.

      Reviewer 1 expresses concern for the observation that by 2 h in sporulation media there is a 32% decrease in Swi4-3V5 protein abundance compared to 0 h in SPO. This is consistent with the range of protein level decrease typically accomplished by LUTI-based gene regulation (Chen et al., 2017; Chia et al., 2017; Tresenrider et al., 2021), and while it is a modest reduction, it is consistent across replicates. Furthermore, we don’t make the argument that reduction in Swi4 levels alone is the sole regulator of G1 cyclin levels. In fact, we report that in addition to Swi4 downregulation, Whi5 also functions to restrict SBF activity during meiotic entry, thereby ensuring G1 cyclin repression.

      In addition, the LUTI-deficient SWI4 mutant does not cause any noticeable relief in CLN2 repression, arguing against the relevance of this mechanism in the repression of G1-cyclin transcription during entry into meiosis. The authors propose a second mechanism where Whi5 would maintain SBF inactive under sporulation conditions. The role of Whi5 as a negative regulator of the SBF regulon is well known. On the other hand, the double WHI5-AA SWI4-dLUTI mutant does not upregulate CLN2, the G1 cyclin with the strongest negative effect on sporulation, raising serious doubts on the functional relevance of this backup mechanism during entry into meiosis.

      Due to replicate variance, CLN2 did not make the cut by our mRNA-seq data analysis as a significant hit. To address reviewer 1’s final point we opted for the “gold standard” of reverse transcription coupled with qPCR to measure CLN2 transcript levels in the double mutant ∆LUTI; WHI5-AA and the wild-type control. This revealed that CLN2 levels were significantly increased in the double mutant compared to wild type at 2 h in SPO (Author Response Image 1, *, p = 0.0288, two-tailed t-test).

      Author response image 1.

      Wild type (UB22199) and ∆LUTI;WHI5-AA (UB25428) cells were collected to perform RT-qPCR for CLN2 transcript abundance. Transcript abundance was quantified using primer sets specific for each respective gene from three technical replicates for each biological replicate. Quantification was performed in reference to PFY1 and then normalized to wild-type control. FC = fold change. Experiments were performed twice using biological replicates, mean value plotted with range. Differences in wild type versus ∆LUTI;WHI5-AA transcript levels were compared with a two-tailed t-test (*, p = 0.0288).
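
      A minimal sketch of how a reference-gene-normalized fold change of this kind can be computed is given below. It assumes the common 2^-ΔΔCt calculation, which the legend does not explicitly state was used; the function name and all Ct values are hypothetical placeholders, not data from this study.

```python
from statistics import mean

def fold_change(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^-ddCt method (assumed here).

    Each argument is a list of technical-replicate Ct values:
    target gene and reference gene in the sample, then in the control.
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)          # normalize to reference gene
    d_ct_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)  # same for the control strain
    return 2 ** -(d_ct_sample - d_ct_ctrl)                # normalize to control

# Hypothetical Ct values (three technical replicates each), not real data.
wt_cln2, wt_pfy1 = [26.0, 26.1, 25.9], [20.0, 20.1, 19.9]
mut_cln2, mut_pfy1 = [25.0, 25.1, 24.9], [20.0, 20.1, 19.9]

fc_wt = fold_change(wt_cln2, wt_pfy1, wt_cln2, wt_pfy1)     # control vs itself
fc_mut = fold_change(mut_cln2, mut_pfy1, wt_cln2, wt_pfy1)  # mutant vs control
print(f"wild type FC = {fc_wt:.2f}, mutant FC = {fc_mut:.2f}")
```

      With these placeholder values the target Ct in the mutant is one cycle lower than in the control at equal reference-gene Ct, which corresponds to a roughly two-fold increase.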

      Reviewer #2 (Public Review):

      Summary:

      The manuscript highlights a mechanistic insight into meiotic initiation in budding yeast. In this study, the authors addressed a genetic link between mitotic cell cycle regulator SBF (the Swi4-Swi6 complex) and a meiosis inducing regulator Ime1 in the context of meiotic initiation. The authors' comprehensive analyses with cytology, imaging, and RNA-seq using mutant strains lead the authors to conclude that Swi4 levels regulate Ime1-Ume6 interaction to activate expression of early meiosis genes for meiotic initiation. The major findings in this paper are that (1) the higher level of Swi4, a subunit of SBF transcription factor for mitotic cell cycle regulation, is the limiting factor for mitosis-to-meiosis transition; (2) G1 cyclins (Cln1, Cln2), which are expressed under SBF, inhibit Ime1-Ume6 interaction under overexpression of SWI4, which consequently leads to downregulation of early meiosis genes; (3) expression of SWI4 is regulated by LUTI-based transcription in the SWI4 locus that impedes expression of canonical SWI4 transcripts; (4) expression of SWI4 LUTI is likely negatively regulated by Ime1; (5) action of Swi4 is negatively regulated by Whi5 (homologous to Rb)-mediated inhibition of SBF, which is required for meiotic initiation. Thus, the authors proposed that meiotic initiation is regulated under the balance of mitotic cell cycle regulator SBF and meiosis-specific transcription factor Ime1.

      Strengths:

      The most significant implication in their paper is that meiotic initiation is regulated under the balance of mitotic cell cycle regulator and meiosis-specific transcription factor. This finding will provide a mechanistic insight in initiation of meiosis not only into the budding yeast also into mammals. The manuscript is overall well written, logically presented and raises several insights into meiotic initiation in budding yeast. Therefore, the manuscript should be open for the field. I would like to raise the following concerns, though they are not mandatory to address. However, it would strengthen their claims if the authors could technically address and revise the manuscript by putting more comprehensive discussion.

      Weaknesses:

      The authors showed that increased expression of the SBF targets, and reciprocal decrease in expression of meiotic genes upon SWI4 overexpression at 2 h in SPO (Figure 2F). However, IME1 was not found as a DEG in Supplemental Table 1. Meanwhile, IME1 transcript level was decreased at 2 h SPO condition in pATG8-CLN2 cells in Fig S4C.

      This reviewer still wonders whether expression of IME1 transcripts per se is directly or indirectly suppressed under the SBF-activated gene expression program at 2 h SPO in pATG8-SWI4 and pATG8-CLN2 cells. This reviewer wonders how the Fig S4C data reconcile with the model summarized in Fig 6F.

      One interpretation could be that persistent overexpression of G1 cyclin caused active mitotic cell cycle, and consequently delayed exit from mitotic cell cycle, which may have given rise to an apparent reduction of cell population that was expressing IME1. For readers to better understand, it would be better to explain comprehensively this issue in the main text.

      We believe there was an oversight here. In supplemental table 1, IME1 expression is reported as significantly decreased. The volcano plot shown below also highlights this change (Author response image 2).

      Author response image 2.

      Volcano plot of DE-Seq2 analysis for ∆LUTI;WHI5-AA versus wild type. Dashed line indicates padj (p value) = 0.05. Analysis was performed using mRNA-seq from two biological replicates. Wild type (UB22199) and ∆LUTI;WHI5-AA (UB25428) cells were collected at 2 h in SPO. SBF targets (pink) (Iyer et al., 2001) and early meiotic genes (blue) defined by (Brar et al., 2012). Darker pink or darker blue, labeled dots are well studied targets in either gene set list.
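
      As an illustration of how points on a volcano plot of this kind are typically called, the sketch below classifies genes by an adjusted p value cutoff of 0.05 (the dashed line) and by the sign of the log2 fold change. The function, gene names and values are hypothetical and are not taken from the dataset; the actual calls come from the differential expression analysis itself.

```python
import math

PADJ_CUTOFF = 0.05  # threshold drawn as the dashed line

def classify(log2_fc, padj):
    """Return (y, label): y is the plot height (-log10 padj), label the call."""
    y = -math.log10(padj)
    if padj >= PADJ_CUTOFF:
        return y, "not significant"
    return y, ("up" if log2_fc > 0 else "down")

# Hypothetical rows: (gene, log2 fold change mutant/wild type, adjusted p value)
rows = [("geneA", 1.5, 0.001), ("geneB", -2.0, 0.01), ("geneC", 0.3, 0.4)]
calls = {gene: classify(lfc, padj)[1] for gene, lfc, padj in rows}
print(calls)
```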

      The % of cells with nuclear Ime1 was much reduced in pATG8-CLN2 cells (Fig 2B) than in pATG8-SWI4 cells (Fig 4C). Is the Ime1 protein level comparable or different between pATG8-CLN2 strain and pATG8-SWI4 strain? Since it is difficult to compare the quantifications of Ime1 levels in Fig S1D and Fig S4B, it would be better to comparably show the Ime1 protein levels in pATG8-CLN2 and pATG8-SWI4 strains.

      Further, it is uncertain how pATG8-CLN2 cells mimics the phenotype of pATG8-SWI4 cells in terms of meiotic entry. It would be nice if the authors could show RNA-seq of pATG8-CLN2/WT and/or quantification of the % of cells that enter meiosis in pATG8-CLN2.

      Analyzing bulk Ime1 protein levels across a population of cells (Author response image 3) reveals that overexpression of CLN2 causes a more severe decrease in Ime1 levels than overexpression of SWI4. This is consistent with our observation that pATG8-CLN2 has a more severe impact on meiotic entry than pATG8-SWI4. The higher CLN2 levels (Author response image 4) likely account for the observed difference in severity of phenotype between the two mutants.

      Author response image 3.

      Samples from wild type (UB22199), pATG8-SWI4 (UB2226), and pATG8-CLN2 (UB25959) strains were collected between 0-4 hours (h) in sporulation medium (SPO) and immunoblots were performed using α-GFP. Hxk2 was used as a loading control.

      Author response image 4.

      Wild type (UB22199), pATG8-SWI4 (UB2226), pATG8-CLN2 (UB25959) cells were collected to perform RT-qPCR for CLN2 transcript abundance. Quantification was performed in reference to PFY1 and then normalized to wild-type control. FC = fold change.

      The authors stated that reduced Ime1-Ume6 interaction is a primary cause of meiotic entry defect by CLN2 overexpression (Line 320-322, Fig 4J-L). This data is convincing. However, the authors also showed that GFP-Ime1 protein level was decreased compared to WT in pATG8-CLN2 cells by WB (Fig S4A).

      Compared to wild type, pATG8-CLN2 cells have lower levels of Ime1. Consequently, reviewer 2 suggests that this reduction may be responsible for the observed meiotic defect. However, we tested this possibility and found it not to be the primary cause of the meiotic defect in pATG8-CLN2 cells. As shown in Figure S4A, when IME1 was overexpressed from the pCUP1 promoter, Ime1 protein levels were similar between wild-type and pATG8-CLN2 cells. Despite this similarity, we still observed a decrease in nuclear Ime1 (Figure 4F) and no rescue in sporulation (Figure 4A). Therefore, the reduction in Ime1 protein levels alone cannot explain the meiotic defect caused by CLN2 overexpression.

      Further, GFP-Ime1 signals were overall undetectable through nuclei and cytosol in pATG8-CLN2 cells (Fig 4B), and accordingly cells with nuclear Ime1 were reduced (Fig 4C). Although the authors raised a possibility that the meiotic entry defect in the pATG8-CLN2 mutant arises from downregulation of IME1 expression (Line 282-283), causal relationship between meiotic entry defect and CLN2 overexpression is still not clear.

      As reviewer 2 comments, we initially considered the possibility that the meiotic entry defect induced by CLN2 overexpression could be attributed to decreased IME1 expression. However, in the following paragraph of the manuscript, we demonstrate that equalizing IME1 transcript levels using the pCUP1-IME1 allele does not rescue the meiotic defect caused by CLN2 overexpression. Consequently, we conclude that the decrease in IME1 transcript levels alone cannot explain the meiotic defect caused by increased CLN2 levels.

      Is the Ime1 protein level reduced in the pATG8-CLN2;UME6-⍺GFP strain compared to WT? It would be better to comparably show the Ime1 protein levels in the pATG8-CLN2 strain and the pATG8-CLN2;UME6-⍺GFP strain by WB. Also, it would be nice if the authors could show quantification of the % of cells that enter meiosis in the pATG8-CLN2;UME6-⍺GFP strain to see how and whether artificial tethering of Ime1 to Ume6 rescued normal meiosis program rather than simply showing % sporulation in Fig4A.

      We do not agree with the suggestion to compare the pATG8-CLN2;UME6-⍺GFP with wild type as the kinetics of meiosis is rather different. The more appropriate comparison is UME6-⍺GFP and pATG8-CLN2;UME6-⍺GFP which shows GFP-Ime1 bulk protein levels are slightly lower (Author response image 5). However, when we use a more sensitive measurement of meiotic entry through the nuclear accumulation of Ime1 in single cells, as illustrated in Figure 4L, it becomes evident that the Ume6-Ime1 tether is capable of restoring nuclear Ime1 levels, even in the presence of CLN2 overexpression. Given that these cells exhibited wild type levels of nuclear Ime1 and underwent sporulation after 24 hours, we make the fair assumption that they have successfully initiated the meiotic program.

      Author response image 5.

      Wild type (UB22199), pATG8-SWI4 (UB35106), UME6-⍺GFP (UB35300), and UME6-⍺GFP; pATG8-CLN2 (UB35177) cells were collected between 0-3 hours (h) in sporulation medium (SPO) and immunoblots were performed using α-GFP. Hxk2 was used as a loading control.

      The authors showed Ume6 binding at the SWI4LUTI promoter (Figure 5K). However, since Ume6 forms a repressive form with Rpd3 and Sin3a and binds to target genes independently of Ime1, Ume6 binding at the SWI4LUTI promoter does not necessarily represent Ime1-Ume6 binding there. Instead, it would be better to show Ime1 ChIP-seq at the SWI4LUTI promoter.

      We agree with reviewer 2 that Ime1 ChIP would be the ideal measurement. Unfortunately, this has proved to be technically challenging. To address this limitation, we utilized a published Ume6 ChIP-seq dataset along with a published UME6-T99N RNA-seq dataset. Cells carrying the UME6-T99N allele are unable to induce the expression of early meiotic transcripts due to lack of Ime1 binding to Ume6 (Bowdish et al., 1995). Accordingly, RNA-seq analysis should reveal whether or not the LUTIs identified by Ume6 ChIP are indeed regulated by Ime1-Ume6 during meiosis. For SWI4LUTI, this is exactly what we observe. Not only is there Ume6 binding at the SWI4LUTI promoter (Figure 5K), but there is also a significant decrease in SWI4LUTI expression in UME6-T99N cells under meiotic conditions (Figure S5). Based on these data, we conclude that the Ime1-Ume6 complex is responsible for regulating SWI4LUTI expression during meiosis.

      The authors showed ∆LUTI mutant and WHI5-AA mutant did not significantly change the expression of SBF targets nor early meiotic genes relative to wildtype (Figure 6A, C). Accordingly, they concluded that LUTI- or Whi5-based repression of SBF alone was not sufficient to cause a delay in meiotic entry (Line451-452), and perturbation of both pathways led to a significant delay in meiotic entry (Figure 6E). This reviewer wonders whether Ime1 expression level and nuclear localization of Ime1 was normal in ∆LUTI mutant and WHI5-AA mutant.

      Based on our observations in Figure 4, Ime1 protein and expression levels were not reliable indicators of meiotic entry. Consequently, we opted for a more downstream and functionally relevant measure of meiotic entry, which involved time-lapse fluorescence imaging of Rec8, an Ime1 target.

      Reviewer #1 (Recommendations For The Authors):

      The authors would like to mention previous work showing that G1-cyclin overexpression decreases the expression and nuclear accumulation of Ime1 (Colomina et al 1999 EMBO J 18:320). In this work, the interaction between Ime1 and Ume6 had been found to be resistant to G1-cyclin expression, arguing against a direct effect on the recruitment of Ime1 at meiotic promoters. Alternatively, differences in the experimental approaches used could be discussed to explain this apparent discrepancy.

      To clarify, in the paper that reviewer 1 is referring to (Colomina et al., 1999), the authors determine that the interaction between Ime1 and Ume6 is regulated by the presence of a non-fermentable carbon source. Additional work by others reveals that Ime1 undergoes phosphorylation by the protein kinases Rim11 and Rim15, promoting its nuclear localization and enabling interaction with Ume6 (Vidan and Mitchell, 1997; Pnueli et al., 2004; Malathi et al., 1999, 1997). Furthermore, both Rim11 and Rim15 kinase activities are inhibited by the presence of glucose via the PKA pathway (Pedruzzi et al., 2003; Rubin-Bejerano et al., 2004; Vidan and Mitchell, 1997). Accordingly, the elimination of cyclins in the presence of a fermentable carbon source (glucose) in (Colomina et al., 1999) is unlikely to result in an interaction between Ime1 and Ume6, as Rim11 and Rim15 remain repressed. Removal of cyclins in acetate does not further increase Ime1-Ume6 interaction, leading the authors to conclude that G1 cyclins do not block Ime1 function through its interaction with Ume6. This work, however, uses loss of function (removal of G1 cyclins) to study the G1 cyclins’ effect on Ime1-Ume6 interaction while using timepoints that are well beyond meiotic entry. Additionally, Ime1-Ume6 interaction is tested using yeast two-hybrid analysis with just the proposed interaction domain of Ime1 (amino acids 270-360). Therefore, the interpretation that G1 cyclins are dispensable for regulating the interaction between Ime1 and Ume6 is unclear from this work alone.

      There are many differences that can explain the discrepancy between our work and (Colomina et al., 1999). Our work uses increased expression of cyclins during meiotic entry. Additionally, in our study, we collected timepoints to measure meiotic entry (2 h in SPO) and sporulation (gamete formation) efficiency (24 h in SPO). Finally, we are using the endogenous, full length Ime1. These differences could very well explain the discrepancy with previous work. Lastly, in our discussion we acknowledge the lack of CDK consensus phosphorylation sites on Ime1. Therefore, it is most likely that G1 cyclins are not directly phosphorylating Ime1 and that other factors like Rim11 and Rim15 could be direct targets of the G1 cyclins, considering their involvement in the phosphorylation of Ime1-Ume6, as well as their role in regulating Ime1 localization and its interaction with Ume6. We have included these points in the revised manuscript (lines 547-551).

      Reviewer #2 (Recommendations For The Authors):

      This reviewer thinks that the findings in this paper are of general interest to the meiosis field and help in understanding the mechanism of meiotic initiation in mammals. The current manuscript seems to be written for a limited audience of budding yeast scientists, and its interest should not be limited to that audience. Thus, it would be better to discuss more about what is known about the mechanism of initiation of meiosis not only in budding yeast but also in other species, to share these findings with a broader range of scientists using other organisms.

      We appreciate reviewer 2’s comment and have added more discussion about the parallels between yeast and mammalian systems in meiotic initiation (lines 613-624).

      Reviewer #3 (Recommendations For The Authors):

      The effect of overexpression of Swi4 is tested for MI and MII (Fig1F): this is a very indirect readout of meiotic entry. The authors could present Rec8 localization (Fig2I) at this stage. However, this is still a superficial description of the meiotic phenotype: is the phenotype only a delay or is the meiotic prophase altered. It is specifically important to analyse this in more detail to answer whether the overexpression of Swi4 leads to an identical phenotype to the one of CLN2. Also the comparison between overexpression of Swi4 and Cln2 is difficult to evaluate: what is the level of CLN2 when SwI4 is overexpressed compared to CLN2 overexpression. The percentage of nuclear Ime1 is 50% vs 5% when Swi4 or Cln2 are overexpressed. What is the interpretation? What are the levels of Ime1? (Y axis of quantifications not comparable, see also comment for Fig5F,H)

      CLN2 is expressed at a much higher level in pATG8-CLN2 cells relative to pATG8-SWI4 (Author Response Image 4). Therefore, we don’t expect identical phenotypes, but rather a more severe deficiency in meiotic entry upon CLN2 overexpression. The key experiment that establishes causality between SWI4 and CLNs is reported in Figure 3, where deletion of either CLN1 or CLN2 rescues the meiotic entry delay exerted by SWI4 overexpression.

      Fig3EF: What is the phenotype of Cln1 and Cln2 without overexpression of Swi4?

      Meiotic entry is not faster in cln1∆ or cln2∆ cells compared to wild-type. We included these data in Supplemental Figure 3 and made the relevant changes in the manuscript (lines 257-261).

      Fig4F: Need a control with CLN2 overexpression only.

      A control with only CLN2 overexpression (pATG8-CLN2) is not appropriate since these meiotic time course experiments are synchronized using the pCUP1-IME1 allele. It would be a misleading comparison since the two meioses would have different kinetics. Figure 4F reports that despite similar IME1 transcript levels and Ime1 protein levels, CLN2 overexpressing cells still have reduced nuclear Ime1. Since side-by-side comparison of pATG8-CLN2 and pCUP1-IME1 is not possible, we chose to measure sporulation efficiency at 24 h in Figure 4A. These data together suggest that elevated IME1 transcript and protein levels cannot rescue the defects associated with increased CLN2 expression.

      Fig5E: in wild type, by Northern blot, Swi4canon level is increasing during meiosis, not decreasing?, whereas protein level is decreasing, what is the interpretation?

      Northern blot data are less quantitative than smFISH, which shows that SWI4canon transcript levels are significantly lower in meiosis compared to vegetative cells (Figure 5D). We also note that the Northern blot data were acquired from unsynchronized meiotic cells and could have additional limitations based on the population-based nature of the assay. Finally, additional analysis of a transcript leader sequencing (TL-seq) dataset from synchronized cells (Tresenrider et al., 2021) further confirms the decrease in SWI4canon transcript levels upon meiotic entry (Author response image 6).

      Author response image 6.

      TL-seq data from (Tresenrider et al. 2021) visualized on IGV at the SWI4 locus. Two timepoints are plotted including premeiotic before IME1 induction (pink) and meiotic prophase or after IME1 induction (blue).

      Fig5F, H. This quantification needs duplicates for validation.

      Replicates for every blot in this paper have been submitted to eLife. They can be found in the Dropbox folder shared with the editors (named Raw-blots-for-eLIFE).

      Fig5F, H. Why are the wild type values so different?

      The immunoblots in Figure 5F and Figure 5H are separate blots and therefore should not be compared. Additionally, these values are not absolute measurements of wild type values of Swi4-3V5 and therefore we should not expect them to be the same. Any comparisons of relative amounts of Swi4-3V5 are always done on the same blot and normalized to a loading control, hexokinase.

      FigS5: What is the effect of the Ume6-T99N on Swi4 protein level and on meiotic entry? Is the backup mechanism proposed active?

      We haven’t measured Swi4 protein levels in the UME6-T99N background but given that this mutation is known to disrupt the interaction between Ime1 and Ume6, we expect a similar trend to that reported in Figure 5I (pCUP1-IME1 uninduced).

      What is the evidence that Swi4/6 is a E2F homolog? What is the homology at the protein level?

      While there is no sequence homology between SBF and E2F, there is remarkable similarity between metazoans and yeast in terms of the regulation of the G1/S transition (reviewed in Bertoli et al., 2013). E2F and SBF are both repressed before the G1/S transition by the inhibitors Rb and Whi5, respectively (Costanzo et al., 2004; De Bruin et al., 2004; Hasan et al., 2014). During G1/S transition, a cyclin dependent kinase phosphorylates and inactivates these inhibitors. We have carefully edited our language in the manuscript to “functional homology” instead of just “homology”.

      FigS3 is missing

      Each supplemental figure was matched to its corresponding main figure. In the original submission, we didn’t have Figure S3. However, the revised manuscript now contains FigS3.

      Bertoli, C., J.M. Skotheim, and R.A.M. De Bruin. 2013. Control of cell cycle transcription during G1 and S phases. Nat. Rev. Mol. Cell Biol. 14:518–528. doi:10.1038/nrm3629.

      Bowdish, K.S., H.E. Yuan, and A.P. Mitchell. 1995. Positive control of yeast meiotic genes by the negative regulator UME6. Mol. Cell. Biol. 15:2955–2961. doi:10.1128/mcb.15.6.2955.

      Brar, G.A., M. Yassour, N. Friedman, A. Regev, N.T. Ingolia, and J.S. Weissman. 2012. High-Resolution View of the Yeast Meiotic Program Revealed by Ribosome Profiling. Science. 335:552–558. doi:10.1126/science.1215110.

      De Bruin, R.A.M., W.H. McDonald, T.I. Kalashnikova, J. Yates, and C. Wittenberg. 2004. Cln3 activates G1-specific transcription via phosphorylation of the SBF bound repressor Whi5. Cell. 117:887–898. doi:10.1016/j.cell.2004.05.025.

      Chen, J., A. Tresenrider, M. Chia, D.T. McSwiggen, G. Spedale, V. Jorgensen, H. Liao, F.J. Van Werven, and E. Ünal. 2017. Kinetochore inactivation by expression of a repressive mRNA. Elife. 6:1–31. doi:10.7554/eLife.27417.

      Chia, M., A. Tresenrider, J. Chen, G. Spedale, V. Jorgensen, E. Ünal, and F.J. van Werven. 2017. Transcription of a 5’ extended mRNA isoform directs dynamic chromatin changes and interference of a downstream promoter. Elife. 6:1–23. doi:10.7554/eLife.27420.

      Colomina, N., E. Garí, C. Gallego, E. Herrero, and M. Aldea. 1999. G1cyclins block the Ime1 pathway to make mitosis and meiosis incompatible in budding yeast. EMBO J. 18:320–329. doi:10.1093/emboj/18.2.320.

      Costanzo, M., J.L. Nishikawa, X. Tang, J.S. Millman, O. Schub, K. Breitkreuz, D. Dewar, I. Rupes, B. Andrews, and M. Tyers. 2004. CDK activity antagonizes Whi5, an inhibitor of G1/S transcription in yeast. Cell. 117:899–913. doi:10.1016/j.cell.2004.05.024.

      Hasan, M., S. Brocca, E. Sacco, M. Spinelli, P. Elena, L. Matteo, A. Lilia, and M. Vanoni. 2014. A comparative study of Whi5 and retinoblastoma proteins : from sequence and structure analysis to intracellular networks. 4:1–24. doi:10.3389/fphys.2013.00315.

      Iyer, V.R., C.E. Horak, P.O. Brown, D. Botstein, V.R. Iyer, M. Snyder, and C.S. Scafe. 2001. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature. 409:533–538. doi:10.1038/35054095.

      Malathi, K., Y. Xiao, and A.P. Mitchell. 1997. Interaction of yeast repressor-activator protein Ume6p with glycogen synthase kinase 3 homolog Rim11p. Mol. Cell. Biol. 17:7230–7236. doi:10.1128/mcb.17.12.7230.

      Malathi, K., Y. Xiao, and A.P. Mitchell. 1999. Catalytic roles of yeast GSK3β/shaggy homolog Rim11p in meiotic activation. Genetics. 153:1145–1152. doi:10.1093/genetics/153.3.1145.

      Pedruzzi, I., F. Dubouloz, E. Cameroni, V. Wanke, J. Roosen, J. Winderickx, and C. De Virgilio. 2003. TOR and PKA Signaling Pathways Converge on the Protein Kinase Rim15 to Control Entry into G0. Mol. Cell. 12:1607–1613. doi:10.1016/S1097-2765(03)00485-4.

      Pnueli, L., I. Edry, M. Cohen, and Y. Kassir. 2004. Glucose and Nitrogen Regulate the Switch from Histone Deacetylation to Acetylation for Expression of Early Meiosis-Specific Genes in Budding Yeast. Mol. Cell. Biol. 24:5197–5208. doi:10.1128/mcb.24.12.5197-5208.2004.

      Rubin-Bejerano, I., S. Sagee, O. Friedman, L. Pnueli, and Y. Kassir. 2004. The In Vivo Activity of Ime1, the Key Transcriptional Activator of Meiosis-Specific Genes in Saccharomyces cerevisiae, Is Inhibited by the Cyclic AMP/Protein Kinase A Signal Pathway through the Glycogen Synthase Kinase 3- Homolog Rim11. Mol. Cell. Biol. 24:6967–6979. doi:10.1128/mcb.24.16.6967-6979.2004.

      Tresenrider, A., K. Morse, V. Jorgensen, M. Chia, H. Liao, F.J. van Werven, and E. Ünal. 2021. Integrated genomic analysis reveals key features of long undecoded transcript isoform-based gene repression. Mol. Cell. 81:2231-2245.e11. doi:10.1016/j.molcel.2021.03.013.

      Vidan, S., and A.P. Mitchell. 1997. Stimulation of yeast meiotic gene expression by the glucose-repressible protein kinase Rim15p. Mol. Cell. Biol. 17:2688–2697. doi:10.1128/mcb.17.5.2688.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Sender et al describe a model to estimate what fraction of DNA becomes cell-free DNA in plasma. This is of great interest to the community, as the amount of DNA from a certain tissue (for example, a tumor) that becomes available for detection in the blood has important implications for disease detection.

      However, the authors' methods do not consider important variables related to cell-free DNA shedding and storage, and their results may thus be inaccurate. At this stage of the paper, the methods section lacks important detail. Thus, it is difficult to fully assess the manuscript and its results.

      Strengths:

      The question asked by the authors has potentially important implications for disease diagnosis. Understanding how genomic DNA degrades in the human circulation can guide towards ways to enrich for DNA of interest or may lead to unexpected methods of conserving cell-free DNA. Thus, the question "how much genomic DNA becomes cfDNA" is of great interest to the scientific and medical community. Once the weaknesses of the manuscript are addressed, I believe this manuscript has the potential to be a widely used resource.

      Weaknesses:

      There are two major weaknesses in how the analysis is presented. First, the methods lack detail. Second, the analysis does not consider key variables in their model.

      Issues pertaining to the methods section.

      The current manuscript builds a flux model, mostly taking values and results from three previous studies: 1) The amount of cellular turnover by cell type, taken from Sender & Milo, 2021

      2) The fractions of various tissues that contribute DNA to the plasma, taken from Moss et al, 2018 and Loyfer et al, 2023

      My expertise lies in cell-free DNA, and so I will limit my comments to the manuscripts in (2). Paper by Loyfer et al (additional context):

      Loyfer et al is a recent landmark paper that presents a computational method for deconvoluting tissues of origin based on methylation profiles of flow-sorted cell types. Thus, the manuscript provides a well-curated methylation dataset of sorted cell-types. The majority of this manuscript describes the methylation patterns and features of the reference methylomes (bulk, sorted cell types), with a smaller portion devoted to cell-free DNA tissue of origin deconvolution.

      I believe the data the authors are retrieving from the Loyfer study are from the 23 healthy plasma cfDNA methylomes analyzed in the study, and not the re-analysis of the 52 COVID-19 samples from Cheng et al (MED 2021).

      Paper by Moss et al (additional context):

      Moss et al is another landmark paper that predates the Loyfer et al manuscript. The technology used in this study (methylation arrays) is outdated but is an incredible resource for the community. This paper evaluates cfDNA tissues of origin in health and different disease scenarios. Again, I assume the current manuscript only pulled data from healthy patients, although I cannot be sure as it is not described in the methods section.

      This manuscript:

      The current manuscript takes (I think) the total cfDNA concentration from males and females from the Moss et al manuscript (pooled cfDNA; 2 young male groups, 2 old male groups, 2 young female groups, 2 old female groups, Supplementary Dataset; "total_cfDNA_conc" tab). I believe this is the data used as total cfDNA concentration. It would be beneficial for all readers if the authors clarified this point.

      The tissue-of-origin data in the supplemental dataset ("fraction" tab) cover 8 cell types (erythrocytes, monocytes/macrophages, megakaryocytes, granulocytes, hepatocytes, endothelial cells, lymphocytes, other). The fractions in the spreadsheet do not match the Loyfer or Moss manuscripts for healthy individuals. Thus, I do not know what values the supplementary dataset represents. I also don't know which deconvolution values are used for the flux model.

      The integration of these two methods lacks detail. Are the authors here using yields (i.e., cfDNA concentrations) from Moss et al, and tissue fractions from Loyfer et al? If so, why? There are more samples in the Loyfer manuscript, so why are the samples from Moss et al. being used? The authors are also selectively ignoring cell types that are present in healthy individuals (Neurons from Moss et al, 2018). Why?

      Appraisal:

      At this stage of the manuscript, I think additional evidence and analysis is required to confirm the results in the manuscript.

      Impact:

      Once the authors present additional analysis to substantiate their results, this manuscript will be highly impactful on the community. The field of liquid biopsies (non-invasive diagnostics) has the potential to revolutionize the medical field (and has already in certain areas, such as prenatal diagnostics). Yet, there is a lack of basic science questions in the field. This manuscript is an important step forward in asking more "basic science" questions that seek to answer a fundamental biological question.

      We thank the reviewer for the valuable comments on our analysis. In response to the feedback, we have updated the analysis to address all critical points as described below and revised the text to enhance the clarity of our methodology. One notable improvement to our analysis involved ensuring better alignment between the cohort data for cfDNA plasma concentration and cell turnover estimates. To achieve this, we utilized the total plasma concentration of cfDNA from a study conducted by Meddeb et al. 2019, taking into account the influence of age and sex on these concentrations and specifically focusing on a cohort of relatively young and healthy individuals. Additionally, we considered expected variations related to sex, age, and other pertinent factors, as outlined in the studies by Meddeb et al. 2019 and Madsen et al. 2019.

      In addition, we have addressed concerns regarding the technical aspects of cfDNA analysis, providing detailed explanations of their limited impact on our analysis and the resulting conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Cell-free DNA (cfDNA) consists of short DNA fragments released into the circulation when cells die. Plasma cfDNA level is thought to reflect the degree of cell death or tissue injury. Indeed, plasma cfDNA is a reliable diagnostic biomarker for multiple diseases, providing insights into disease severity and outcomes. In this manuscript, Dr. Sender and colleagues address a fundamental question: What fraction of DNA released from cell death is detectable as plasma cfDNA? The authors use public data to estimate the amount of DNA produced from dying cells. They also utilize public data to estimate plasma cfDNA levels. Their calculations showed that <10% of DNA released is detectable as plasma cfDNA, the fraction of detectable cfDNA varying by tissue sources. The study demonstrates new and fundamental principles that could improve disease diagnosis and treatment via cfDNA.

      Strengths:

      1) The experimental approach is resource-mindful taking advantage of publicly available data to estimate the fraction of detectable cfDNA in physiological states. The authors did not assess if the fraction of detectable cfDNA changes in disease conditions. Nonetheless, their pioneering study lays the foundation and provides the methods needed for a similar assessment in disease states.

      2) The findings of this study potentially explain discrepancies in measured versus expected tissue-specific cfDNA from some tissues. For example, the gastrointestinal tract is subject to high cell turnover and release of DNA. Yet, only a small fraction of that DNA ends up in plasma as gastrointestinal cfDNA.

      3) The study proposes potential mechanisms that could account for the low fraction of detectable cfDNA in plasma relative to DNA released. This includes intracellular or tissue machinery that could "chew up" DNA released from dying cells, allowing only a small fraction to escape into plasma as cfDNA. Could this explain why the gastrointestinal tract, with an elaborate phagosome machinery, contributes a small fraction of plasma cfDNA? Given the role of cfDNA as a damage-associated molecular pattern in some diseases, targeting such a machinery may provide novel therapeutic opportunities.

      Weaknesses:

      In vitro and in vivo studies are needed to validate these findings and define the tissue machinery that contributes to cfDNA production. The validation studies should address the following limitations of the study design:

      1) Align the cohorts to estimate DNA production and plasma cfDNA levels. Cellular turnover rate and plasma cfDNA levels vary with age, sex, circadian clock, and other factors (Madsen AT et al, EBioMedicine, 2019). This study estimated DNA production using data abstracted from a homogenous group of healthy control males (Sender & Milo, Nat Med 2021). On the other hand, plasma cfDNA levels were obtained from datasets of a more diverse cohort of healthy males and females with a wide range of ages (Loyfer et al. Nature, 2023 and Moss et al., Nat Commun, 2018).

      2) "cfDNA fragments are not created equal". Recent studies demonstrate that cfDNA composition varies with disease state. For example, cfDNA GC content, fraction of short fragments, and composition of some genomic elements increase in heart transplant rejection compared to no-rejection state (Agbor-Enoh, Circulation, 2021). The genomic location and disease state may therefore be important factors to consider in these analyses.

      3) Alternative sources of DNA production should be considered. Aside from cell death, DNA can be released from cells via active secretion. This and other additional sources of DNA should be considered in future studies. The distinct characteristics of mitochondrial DNA to genomic DNA should also be considered.

      We appreciate the reviewer's comments on our analysis. In response to the feedback, we have updated the analysis to address key points and revised the text accordingly.

      1) We have incorporated several enhancements to improve the coherence of our analysis. In our revised examination, we drew upon the total plasma concentration of cfDNA, as documented by Meddeb et al. (2019), while considering the influence of age and sex on these concentrations. To ensure the cohorts' alignment, we focused on relatively young and healthy individuals, specifically those below the age of 47. This approach allowed for a more meaningful comparison with the estimated DNA flux from a reference male human aged between 20 and 30 years.

      There was no specific estimate for a cohort of young males in either Meddeb et al. or Loyfer et al.; however, we factored in the expected variations stemming from sex, age, and other relevant factors, as elucidated in the literature (Meddeb et al. 2019; Madsen et al. 2019). Thus, we demonstrate that sex and age have a small effect on the cfDNA concentrations and are unlikely to alter our conclusions substantially when considering a healthy population. We summarize the changes in the first paragraph, replacing the “Tissue-specific cfDNA concentration” subsection of the Methods, and the fourth paragraph added to the Discussion.

      2) In this study, we addressed the total amount of cfDNA in healthy individuals without regard to GC content, representation of different genomic regions, or fragment length, as the goal was to understand if cell death rates are fully accounted for by cfDNA concentration. We agree that it will be interesting to study the relative representation of the genome in cfDNA and the processes that determine cfDNA concentration in pathologies beyond the rate of cell death. These topics for future research fall beyond this study's scope.

      3) We know of only a few specific cases in which DNA is released from cells that are not dying. These include the release of DNA from erythroblasts and megakaryocytes to generate anucleated erythrocytes and platelets (Moss et al. 2022, cited in our paper) and the release of NETs from neutrophils.

      The presence of cfDNA fragments originating from megakaryocytes and erythroblasts indicates the elimination of megakaryocytes and erythroblasts and the birth of erythrocytes and platelets. However, the considerations in the rest of the paper still apply: the concentration of cfDNA from these sources is far lower than expected from the cell turnover rate.

      Concerning NETosis: the presence of cfDNA originating in neutrophils that have not died would reduce the concentration of cfDNA from dying neutrophils and thus further increase the discrepancy, which is the topic of our study (under-representation of DNA from dying cells in plasma).

      We neglected mitochondrial DNA, as it is not measured in methylation cell-of-origin analysis. Similarly to the argument above, if some of the total DNA measured in plasma is in fact, mitochondrial, this would mean that genomic cfDNA concentration is actually lower than the estimates, meaning that an even smaller fraction of DNA from dying cells is measured in plasma.

      Recommendations For The Authors

      Reviewer #1 (Recommendations For The Authors):

      I think readers would appreciate the authors commenting or addressing the following points, in addition to addressing the concerns I raised about the methods section in the public review:

      What variables and considerations did the authors omit in this study?

      1) Cell-free DNA is found in virtually every biofluid.

      Thus, the fact that cell-free DNA is not present in the plasma does not mean it cannot be detected elsewhere. This also implies that phagocytosis may not be the only factor related to cfDNA not being present in the blood. One example (of many, many others) is neutrophil-derived cell-free DNA, which is present in the urine.

      Indeed, dying cells and their DNA can be consumed locally, released into the blood, or shed outside the body. The latter is a function of tissue topology. For example, intestinal epithelial cell turnover releases material to the lumen of the gut (i.e., stool); kidney and bladder cell turnover releases material to urine; and lung epithelium releases material to the air spaces. In these cases, the absence of cfDNA in plasma is expected. However, in cases where tissue topology dictates release to blood, low representation in cfDNA indicates local consumption or a related mechanism. In Figure 1 of the manuscript, we distinguish between tissues according to their topology, denoting organs that shed material to the outside by open circles.

      Neutrophil-derived DNA in urine likely represents a local process in the kidney (neutrophils that penetrate the epithelium and fall into the urine). Neutrophils that die elsewhere in the body must release cfDNA to the blood before it can reach the urine. Hence, quantifying plasma cfDNA is a legitimate approach for assessing the relationship between cell death and cfDNA. The revised text clarifies this point. We made revisions to the initial paragraph in the results section and a paragraph within the discussion to provide clarity on this topic:

      “Based on atlases of human cell type-specific methylation signatures, Moss et al. and Loyfer et al. analyzed the main cell types contributing to plasma cfDNA. They found the primary sources of plasma cfDNA to be blood cells: granulocytes, megakaryocytes, macrophages, and/or monocytes (the signature could not differentiate between the last two), lymphocytes, and erythrocyte progenitors. Other cells that had detectable contributions are endothelial cells and hepatocytes. Qualitatively, these cells represent most of the leading cell types in cellular turnover, as shown in Sender & Milo 2021 (Sender and Milo 2021). Epithelial cells of the gastrointestinal tract, lung, kidney, bladder, and skin are other cell types that significantly contribute to cellular turnover. Dying cells in these tissues are shed into the gut lumen, the air spaces, the urine, or out of the skin (note that while DNA from gut, lung, and kidney epithelial cells can be found in stool, bronchoalveolar lavage, and urine, the fate of DNA from skin cells is not known). This arrangement may explain why DNA from these cell types is not represented in plasma cfDNA in healthy conditions. Therefore, it appears that cells with high cfDNA plasma levels are those with relatively high turnover that are not being shed out of the body.”

      “A comparison between the different types of cells shows a trend in which less DNA flux from cells with higher turnover gets to the bloodstream. In particular, a tiny fraction (1 in 3×10⁴) of DNA from erythroid progenitors arrives at the plasma, indicating an extreme efficiency of the DNA recovery mechanism. Erythroid progenitors are arranged in erythroblastic islands. Up to a few tens of erythroid progenitors surround a single macrophage that collects the nuclei extruded during the erythrocyte maturation process (pyrenocytes) (Chasis and Mohandas 2008). The amount of DNA discarded through the maturation of over 200 billion erythrocytes per day (Sender and Milo 2021) exceeds all other sources of homeostatic discarded DNA. Our findings indicate that the organization of dedicated erythroblastic islands functions highly efficiently regarding DNA utilization. Neutrophils are another high-turnover cell type with a low level of cfDNA. When contemplating the process of NETosis (Vorobjeva and Chernyak 2020), the existence of cfDNA originating from live neutrophils would potentially diminish the concentration of cfDNA released by dying neutrophils, thereby amplifying the observed ratio for this particular cell type. The overall trend of higher turnover resulting in a lower cfDNA to DNA flux ratio may indicate similar design principles, in which the utilization of DNA is better in tissues with higher turnover. However, our analysis is limited to only several cell types (due to cfDNA test and deconvolution sensitivities), and extrapolation to cells with lower cell turnover is problematic.”

      2) Effect of biofluid storage.

      Cell-free DNA continues to degrade after it is extracted via blood draw. This is not expected to change tissue of origin predictions (although that remains to be shown in the literature), but definitely affects extraction yield. This is not accounted for (or even discussed) in the manuscript. It would be important to understand how this was done for the data presented here.

      The paper integrates data from multiple recent studies that adhered to state-of-the-art procedures requiring rapid processing of blood samples. In fact, earlier studies that were not careful to isolate plasma quickly typically reported very high concentrations due to the lysis of leukocytes and artifactual release of genomic DNA. Rapid plasma isolation and DNA extraction typically yield 5ng/ml in healthy donors, as stated in the paper (last paragraph of Results).

      3) Batch effects

      Batch effects are not discussed here and can affect cfDNA yields.

      Our analysis relies on data reported by multiple studies from different groups, which independently arrived at similar key findings (total concentration of cfDNA and the relative contribution of different tissues). Thus, batch effects are unlikely to affect the calculations markedly.

      4) Cell-free DNA extraction kits

      Different kits and methods extract cell-free DNA at different quantities. Importantly, much recent research has shown that most kits are not sensitive to ultrashort cell-free DNA (of lengths ~50bp). This may represent most of the DNA present in plasma. This raises an important question: are the yields that are being used in Moss et al (where I presume the total concentration is taken from) accurate? Is there more cell-free DNA that was missed? While the importance of this ultrashort cfDNA has yet to be shown, it is in the blood. Thus, the authors' model may underestimate ratios by not accounting for this. This is mentioned in the discussion, but it is not evident why it was not added into the model.

      The Qiagen cfDNA extraction kit can detect 50bp fragments. As shown in the specification sheets of the kit (https://www.qiagen.com/us/products/diagnostics-and-clinical-research/solutions-for-laboratory-developed-tests/qiasymphony-dsp-circulating-dna-kit), urine DNA contains abundant DNA fragments that peak at 50bp. In contrast, plasma cfDNA does not contain such fragments at appreciable concentrations. This suggests that small fragments, 50-150bp long, are not a major component of cfDNA, and thus, our measurements of the total concentration of cfDNA are not dramatically underestimated.

      The convention regarding the size distribution of cfDNA fragments is based on extensive evidence using multiple approaches. For example, a study that profiled the DNA released by multiple cell lines in vitro (Aucamp et al. 2017) used another kit for DNA isolation – the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel, Düren, Germany). This kit does extract fragments that are 50bp long (nucleospin-gel-and-pcr-clean-up-mini). Indeed, the DNA released from cultured cells did contain a peak at 50bp, but it was minor compared with the nucleosome-size peak.

      More recently, several studies did suggest the presence of ultra-short cfDNA fragments, 50 bp long on average, and concluded that such fragments might be present at a molar concentration that is comparable to that of nucleosome-protected DNA (for example, (Hisano et al. 2021)).

      Thus, our model estimates can be off by up to 2-fold (that is, the cfDNA concentration measured in most studies overlooks the small fragments and thus may underestimate the actual concentration of cfDNA by up to 2-fold). This is incorporated into the revised manuscript.

      We note that we cannot exclude the presence of abundant ultra-short DNA fragments (e.g., 10bp long). However, such fragments are not measurable in cfDNA analysis. Thus, we can refine our conclusion and state that only a small fraction of DNA of dying cells appears as measured cfDNA. We included a section in the methods detailing the integration of a potential factor for the short fragments and revised the discussion:

      “The overall plasma cfDNA concentration was multiplied by a factor of 1.5 to accommodate for the presence of small fragments of approximately 50 base pairs of cfDNA in the plasma. These fragments are suggested to contribute comparable molar concentrations (Hisano, Ito, and Miura 2021). Despite having approximately one-third of the mass, it is reasonable to presume that these fragments represent a similar number of genomes. This assumption is based on the idea that their source is a broken nucleosome unit, and the fragments represent the portion that was not degraded. Given the restricted data and its interpretation, we consider factors spanning the range of 1 (negligible effect) and 2 (doubling of the amount). The chosen factor, 1.5, is selected as the midpoint within this range of uncertainty.”

      “In this study, we report a surprising, dramatic discrepancy between the measured levels of cfDNA in the plasma and the potential DNA flux from dying cells. One hypothetical explanation for that discrepancy is the limited sensitivity of typical cfDNA assays to short DNA fragments, which may contribute a significant fraction of the overall cfDNA mass. Regular cfDNA analysis shows a size distribution concentrated around a length of 165 base pairs (bp). The sizes in ctDNA vary more, but most are longer than 100 bp (Alcaide et al. 2020; Udomruk et al. 2021). Recent studies suggested a significant fraction of single-strand ultrashort fragments (length of 25-60 bp) (Cheng et al. 2022; Hisano, Ito, and Miura 2021). However, the total amount of DNA contained in these fragments is less than or comparable to that of the longer “regular” nucleosome-protected cfDNA fragments (Cheng et al. 2022; Hisano, Ito, and Miura 2021), arguing against ultrashort fragments as a dominant explanation for the “missing” cfDNA material. We integrated the estimate provided by Hisano et al. into our analysis as a modifying factor for both the total concentration and uncertainty of plasma cfDNA. Importantly, this incorporation did not alter the overall conclusions, as the discrepancy between the cfDNA plasma concentration and potential DNA flux remains on the same order of magnitude. We note that we cannot exclude the presence of abundant DNA fragments that are even shorter (e.g., 10bp long) and are not measurable in cfDNA analysis. Thus, our formal conclusion is that only a small fraction of the DNA of dying cells appears as measurable cfDNA.”
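
      The short-fragment correction described in the quoted Methods text can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and the 5 ng/ml input is simply the typical healthy-donor yield mentioned earlier in this response, used here as an illustrative value. The sketch checks that equimolar 50 bp fragments carry roughly one-third of the mass of 165 bp fragments and applies the midpoint of the 1 to 2 range as the correction factor.

```python
# Illustrative sketch only; names and structure are assumptions, not the authors' code.

NUCLEOSOMAL_BP = 165  # typical cfDNA fragment length cited in the response
ULTRASHORT_BP = 50    # approximate ultrashort fragment length cited in the response


def mass_ratio_at_equal_molarity(short_bp: float, long_bp: float) -> float:
    """Mass of short fragments relative to long ones when both are equimolar."""
    return short_bp / long_bp


def midpoint_factor(low: float = 1.0, high: float = 2.0) -> float:
    """Midpoint of the plausible range (1 = negligible effect, 2 = doubling)."""
    return (low + high) / 2.0


def corrected_concentration(measured_ng_per_ml: float, factor: float) -> float:
    """Scale a measured cfDNA concentration by the short-fragment factor."""
    return measured_ng_per_ml * factor


if __name__ == "__main__":
    ratio = mass_ratio_at_equal_molarity(ULTRASHORT_BP, NUCLEOSOMAL_BP)
    factor = midpoint_factor()
    measured = 5.0  # ng/ml, illustrative input only
    print(f"mass ratio (50 bp vs 165 bp): {ratio:.2f}")
    print(f"correction factor: {factor}")
    print(f"corrected concentration: {corrected_concentration(measured, factor)} ng/ml")
```

      Even at the upper bound of the range (a factor of 2), the corrected concentration changes by less than one order of magnitude, which is the basis for the statement that the overall conclusion is unaffected.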

      5) Health status of samples analyzed.

      Health, sex and physical activity affect cfDNA yields. This is not accounted for or discussed in the manuscript.

      We incorporated several enhancements to improve our analysis in response to the provided feedback. In our revised examination, we drew upon the total plasma concentration of cfDNA, as documented by Meddeb et al. (2019), while considering the influence of age and sex on these concentrations. To ensure the cohorts' alignment, we focused on relatively young and healthy individuals, specifically those below the age of 47. This approach allowed for a more meaningful comparison with the estimated DNA flux from a reference male human aged between 20 and 30 years.

      Furthermore, we factored in the expected variations stemming from sex, age, and other relevant factors, as elucidated in the works of Meddeb et al. (2019) and Madsen et al. (2019). Our intent in doing so was to demonstrate that these factors are unlikely to alter our conclusions substantially when considering a healthy population. We summarize the changes in the first paragraph, replacing the “Tissue-specific cfDNA concentration” subsection of the Methods, and the fourth paragraph added to the Discussion:

      “Our estimates for total plasma cfDNA concentration were derived from the median concentration observed in individuals below 47 years of age (n=52), as reported by (Meddeb et al. 2019). To complement this, we integrated our total concentration estimates with data on the proportion of cfDNA originating from specific cell types, leveraging a plasma methylome deconvolution method described by (Loyfer et al. 2023), which did not provide absolute quantities of cfDNA. To quantify the uncertainty associated with our cfDNA concentration estimates, we employed a methodology that considered several sources of variation. First, we incorporated the confidence interval of the median concentration reported by Meddeb et al. as a measure of uncertainty. Additionally, we accounted for individual-specific and analytic variations based on the study by (Madsen et al. 2019), encompassing factors such as the precise timing of measurements and assay precision. These sources of uncertainty were combined using the approach outlined below.”

      “Our current analysis focused on estimating plasma cfDNA concentration and cellular turnover in a cohort of healthy, relatively young individuals. The total plasma cfDNA concentrations were sourced from healthy individuals below 47 years, as reported by (Meddeb et al. 2019). We use data analyzed based on plasma samples from healthy individuals to estimate the proportion of cfDNA originating from specific cell types (Loyfer et al. 2023). These values were then compared to the potential DNA flux resulting from homeostatic cellular turnover, estimated for reference healthy males aged between 20 and 30 (Sender and Milo 2021). In our analysis, we considered various sources of uncertainty, including inter-individual variation, variability in the timing of sample collection, and analytical precision (Madsen et al. 2019; Meddeb et al. 2019). These factors collectively contributed to an uncertainty factor of less than 3. Importantly, this level of uncertainty does not alter our conclusion regarding the relatively small fraction of DNA present in plasma as cfDNA. Furthermore, we acknowledge that age and sex can impact total cfDNA concentration, as demonstrated by (Meddeb et al. 2019), with potential variations of up to 30%. However, as the results of our analysis present a much larger difference, these effects do not change the conclusions drawn from our analysis. Nevertheless, age and health status may influence the proportion of cfDNA originating from specific cell types and their corresponding cellular turnover rates. Consequently, the ratios themselves may vary in the elderly population or individuals with underlying health conditions.”
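
      The two steps quoted above (splitting a total concentration by cell-type fractions, and combining several sources of uncertainty into one factor) can be sketched as follows. This is a sketch under stated assumptions: the response does not spell out the combination rule, so the code uses one common choice, adding independent multiplicative factors in quadrature in log space, which may differ from the authors' approach. All numeric inputs in the example are hypothetical placeholders, not values from the manuscript.

```python
import math

# Illustrative sketch only; the combination rule and all inputs are assumptions.


def tissue_concentrations(total_ng_per_ml, fractions):
    """Split a total cfDNA concentration by cell-type fractions that sum to 1."""
    if not math.isclose(sum(fractions.values()), 1.0, rel_tol=1e-6):
        raise ValueError("cell-type fractions must sum to 1")
    return {cell: total_ng_per_ml * frac for cell, frac in fractions.items()}


def combine_uncertainty_factors(factors):
    """Combine independent multiplicative factors in quadrature in log space."""
    if any(f < 1.0 for f in factors):
        raise ValueError("multiplicative uncertainty factors must be >= 1")
    log_variance = sum(math.log(f) ** 2 for f in factors)
    return math.exp(math.sqrt(log_variance))


if __name__ == "__main__":
    # Hypothetical fractions for two placeholder cell types.
    example_fractions = {"cell_type_a": 0.25, "cell_type_b": 0.75}
    print(tissue_concentrations(10.0, example_fractions))

    # Hypothetical factors for inter-individual, timing, and analytic variation.
    print(combine_uncertainty_factors([1.3, 1.5, 2.0]))
```

      Working in log space keeps the combined value a multiplicative factor, so a result such as "less than 3" can be read directly as a fold-uncertainty on the concentration estimate.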

      Reviewer #2 (Recommendations For The Authors):

      1) Align the cohorts to estimate DNA production and plasma cfDNA levels. Cellular turnover rate and plasma cfDNA levels vary with age, sex, circadian clock, and other factors (Madsen AT et al, EBioMedicine, 2019). This study estimated DNA production using data abstracted from a homogenous group of healthy control males (Sender & Milo, Nat Med 2021). On the other hand, plasma cfDNA levels were obtained from datasets of more diverse cohort of healthy males and females with a wide range of ages (Loyfer et al. Nature, 2023 and Moss et al., Nat Commun, 2018).

      We have incorporated several enhancements to improve the coherence of our analysis. In our revised analysis, we drew upon the total plasma concentration of cfDNA reported by (Meddeb et al. 2019), while considering the influence of age and sex on these concentrations. To align the cohorts, we focused on relatively young and healthy individuals, specifically those below the age of 47. This approach allowed for a more meaningful comparison with the estimated DNA flux from a reference male human aged between 20 and 30 years.

      Neither Meddeb et al. nor Loyfer et al. provides a specific estimate for a cohort of young males; however, we factored in the expected variations stemming from sex, age, and other relevant factors, as reported in the literature (Meddeb et al. 2019; Madsen et al. 2019). Thus, we demonstrate that sex and age have a small effect on cfDNA concentrations and are unlikely to alter our conclusions substantially when considering a healthy population.

      We summarize the changes in the first paragraph below, which replaces the “Tissue-specific cfDNA concentration” subsection of the Methods, and in the second paragraph below, which was added as the fourth paragraph of the Discussion.

      “Our estimates for total plasma cfDNA concentration were derived from the median concentration observed in individuals below 47 years of age (n=52), as reported by (Meddeb et al. 2019). To complement this, we integrated our total concentration estimates with data on the proportion of cfDNA originating from specific cell types, leveraging a plasma methylome deconvolution method described by (Loyfer et al. 2023), which did not provide absolute quantities of cfDNA. To quantify the uncertainty associated with our cfDNA concentration estimates, we employed a methodology that considered several sources of variation. First, we incorporated the confidence interval of the median concentration reported by Meddeb et al. as a measure of uncertainty. Additionally, we accounted for individual-specific and analytic variations based on the study by (Madsen et al. 2019), encompassing factors such as the precise timing of measurements and assay precision. These sources of uncertainty were combined using the approach outlined below.”

      “Our current analysis focused on estimating plasma cfDNA concentration and cellular turnover in a cohort of healthy, relatively young individuals. The total plasma cfDNA concentrations were sourced from healthy individuals below 47 years, as reported by (Meddeb et al. 2019). We use data analyzed based on plasma samples from healthy individuals to estimate the proportion of cfDNA originating from specific cell types (Loyfer et al. 2023). These values were then compared to the potential DNA flux resulting from homeostatic cellular turnover, estimated for reference healthy males aged between 20 and 30 (Sender and Milo 2021). In our analysis, we considered various sources of uncertainty, including inter-individual variation, variability in the timing of sample collection, and analytical precision (Madsen et al. 2019; Meddeb et al. 2019). These factors collectively contributed to an uncertainty factor of less than 3. Importantly, this level of uncertainty does not alter our conclusion regarding the relatively small fraction of DNA present in plasma as cfDNA. Furthermore, we acknowledge that age and sex can impact total cfDNA concentration, as demonstrated by (Meddeb et al. 2019), with potential variations of up to 30%. However, as the results of our analysis present a much larger difference, these effects do not change the conclusions drawn from our analysis. Nevertheless, age and health status may influence the proportion of cfDNA originating from specific cell types and their corresponding cellular turnover rates. Consequently, the ratios themselves may vary in the elderly population or individuals with underlying health conditions.”
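
      The quoted passages state that several sources of uncertainty were combined into an overall factor of less than 3, but the combination rule itself is not reproduced here. One common convention for independent multiplicative uncertainties is to add their logarithms in quadrature; the sketch below illustrates that convention only. It is an assumption, not necessarily the authors' method, and the function name and the three example factors are hypothetical values chosen for illustration, not figures from Meddeb et al. or Madsen et al.

```python
import math


def combine_uncertainty_factors(factors):
    """Combine independent multiplicative uncertainty factors.

    A factor f >= 1 is read as "the true value plausibly lies between
    x / f and x * f". Assuming independent, roughly log-normal errors,
    the log-factors are added in quadrature. This rule is an assumed
    convention, not a method confirmed by the source text.
    """
    factors = list(factors)
    if any(f < 1 for f in factors):
        raise ValueError("uncertainty factors must be >= 1")
    log_sum_sq = sum(math.log(f) ** 2 for f in factors)
    return math.exp(math.sqrt(log_sum_sq))


# Hypothetical illustrative factors (NOT taken from the cited studies).
example_factors = {
    "inter_individual": 2.0,
    "sampling_time": 1.5,
    "assay_precision": 1.3,
}

combined = combine_uncertainty_factors(example_factors.values())
print(f"combined uncertainty factor: {combined:.2f}")
```

      Under this convention the combined factor is dominated by the largest single source, so modest additional sources (such as an age or sex effect of a few tens of percent) change the total only slightly.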

      2) "cfDNA fragments are not created equal". Recent studies demonstrate that cfDNA composition vary with disease state. For example, cfDNA GC content, fraction of short fragments, and composition of some genomic elements increase in heart transplant rejection compared to no-rejection state (Agbor-Enoh, Circulation, 2021). The genomic location and disease state may therefore be important factors to consider in these analyses.

      In this study, we addressed the total amount of cfDNA in healthy individuals without regard to GC content, representation of different genomic regions, or fragment length, as the goal was to understand if cell death rates are fully accounted for by cfDNA concentration. We agree that it will be interesting to study the relative representation of the genome in cfDNA and the processes that determine cfDNA concentration in pathologies beyond the rate of cell death. These topics for future research fall beyond this study's scope.

      3) Alternative sources of DNA production should be considered. Aside from cell death, DNA can be released from cells via active secretion. This and other additional sources of DNA should be considered in future studies. The distinct characteristics of mitochondrial DNA to genomic DNA should also be considered.

      We know of only a few specific cases in which DNA is released from cells that are not dying. These include the release of DNA from erythroblasts and megakaryocytes to generate anucleated erythrocytes and platelets (Moss et al. 2022, cited in our paper) and the release of NETs from neutrophils.

      The presence of cfDNA fragments originating from megakaryocytes and erythroblasts indicates the elimination of megakaryocytes and erythroblasts and the birth of erythrocytes and platelets. However, the considerations in the rest of the paper still apply: the concentration of cfDNA from these sources is far lower than expected from the cell turnover rate.

      Concerning NETosis: the presence of cfDNA originating in neutrophils that have not died would reduce the concentration of cfDNA from dying neutrophils and thus further increase the discrepancy, which is the topic of our study (under-representation of DNA from dying cells in plasma).

      We updated a paragraph in the discussion regarding this issue:

      “A comparison between the different types of cells shows a trend in which less DNA flux from cells with higher turnover gets to the bloodstream. In particular, a tiny fraction (1 in 3 × 10^4) of DNA from erythroid progenitors arrives at the plasma, indicating an extreme efficiency of the DNA recovery mechanism. Erythroid progenitors are arranged in erythroblastic islands. Up to a few tens of erythroid progenitors surround a single macrophage that collects the nuclei extruded during the erythrocyte maturation process (pyrenocytes) (Chasis and Mohandas 2008). The amount of DNA discarded through the maturation of over 200 billion erythrocytes per day (Sender and Milo 2021) exceeds all other sources of homeostatic discarded DNA. Our findings indicate that the organization of dedicated erythroblastic islands functions highly efficiently regarding DNA utilization. Neutrophils are another high-turnover cell type with a low level of cfDNA. When contemplating the process of NETosis (Vorobjeva and Chernyak 2020), the existence of cfDNA originating from live neutrophils would potentially diminish the concentration of cfDNA released by dying neutrophils, thereby amplifying the observed ratio for this particular cell type. The overall trend of higher turnover resulting in a lower cfDNA to DNA flux ratio may indicate similar design principles, in which the utilization of DNA is better in tissues with higher turnover. However, our analysis is limited to only several cell types (due to cfDNA test and deconvolution sensitivities), and extrapolation to cells with lower cell turnover is problematic.”

      We neglected mitochondrial DNA, as it is not measured in methylation cell-of-origin analysis. Similarly to the argument above, if some of the total DNA measured in plasma is in fact mitochondrial, this would mean that genomic cfDNA concentration is actually lower than the estimates, meaning that an even smaller fraction of DNA from dying cells is measured in plasma.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We would firstly like to thank all reviewers for their comments and support of this manuscript.

      Reviewer #1 (Recommendations For The Authors):

      No further recommendations.

      Reviewer #2 (Recommendations For The Authors):

      All of my comments have been sufficiently addressed.

      Reviewer #3 (Recommendations For The Authors):

      Thanks for responding to my former recommendations constructively. I believe these points have been fully addressed in this new version.

      However, I have not seen any comments on the points I raised in my former public review concerning the I-2 dependence of the FonSIX4 cell death. Do you know whether FonSIX4 would trigger cell death in tissues not expressing any I-2?

      We are a little confused by this comment. I-2 is a different class of resistance protein (NLR) that recognises Avr2, and this recognition is likely to be intracellular. From the previous public review, we believe reviewer 3 may have been asking us to clarify the dependence of FonSIX4 cell death on I (MM or M82). We have performed these controls by expressing FonSIX4 and associated FonSIX4/Avr1 chimeras in N. benthamiana (with the PR-1 signal peptide for efficient secretion of effectors), and FonSIX4 does not cause cell death in the absence of the I receptor – see S11F Fig. This was not explicitly conveyed in the text, so we have added the following: “Using the N. benthamiana assay we show FonSIX4 is recognised by I receptors from both cultivars (IM82 and iMoneymaker) and cell death is dependent on the presence of IM82 or iMoneymaker (Fig 5B, S11 Fig).”

      I still recommend discussing whether the Avr1 residues crucial for Avr activity are in the same structural regions of the C-terminal domain where previous work has identified residues under diversifying selection in symbiotic fungal FOLD proteins.

      The region important for recognition does encompass some residues within the structural region previously reported to be under diversifying selection in FOLD effectors from Rhizophagus irregularis (two residues within one beta-strand). However, we also see residues that do not overlap with this region. We also note that the mycFOLD proteins analysed in symbiotic fungi are heavily skewed towards strong structural similarity with FolSIX6 (similar cysteine spacing within both N- and C-domains and structural orientation of the N- and C-domains) rather than Avr1. We are under the impression that Avr1 was not included in the analysis of diversifying selection in symbiotic fungal FOLD proteins, and it is unclear to us whether close Avr1 homologues are present. With this in mind, and considering our already lengthy discussion (as previously highlighted during review), we have decided not to include further discussion concerning this point.


      The following is the authors’ response to the original reviews.

      We would like to thank the editor(s) and reviewers for their work concerning our manuscript. Most of the suggested changes were related to text changes which we have incorporated into the revised version. Please find our response to reviewers below.

      Reviewer #1 (Recommendations For The Authors):

      I only have very minor suggestions for the authors. The first one comes from reading the manuscript and finding it very dense with so many acronyms. This will limit the audience that will read the study and appreciate its impact. This is more noticeable in the Results, with many passages that I would suggest moving to Methodology.

      We thank reviewer 1 for their very positive review. We understand that due to the nature of this study, which includes many protein alleles/mutations that were expressed with different boundaries etc., it is difficult to achieve this. Reviewer 2 asked for more details to be provided. We hope we have achieved a nice balance in the revised manuscript.

      Something else that would facilitate the reading of the manuscript is consistent effector naming. The authors use the SIX name or the Avr name for some effectors, which makes it difficult to follow.

      We have tried to make this consistent for Avr1 (SIX4), Avr2 (SIX3) and Avr3 (SIX1). Other SIX effectors are not known Avrs so the SIX names were used.

      Reading the manuscript and seeing how in most of the sections the authors used a computational approach followed by an experimental approach, I wonder why Alphafold2-multimer was not used to investigate the interaction between the effector and the receptor?

      This is a great suggestion, and we have certainly investigated this. However, to date there is no experimental evidence to support a direct interaction between I and Avr1. Post review, we spent some time trying to capture an interaction using a co-immunoprecipitation approach, but so far we have not been able to obtain robust data that support this. We are currently looking to study this using protein biophysics/biochemistry, but this work will take some time.

      Reviewer #2 (Recommendations For The Authors):

      We thank reviewer 2 for the very thorough editing and recommendations. We have incorporated all minor text edits below into the manuscript.

      Line 43: perhaps "Effector recognition" instead of "Effector detection", to be consistent with line 51?

      Line 60: Change to "leads".

      Line 79: Italicise Avr2.

      Line 94: Add the acronym ETI in parentheses after "effector-triggered immunity".

      Line 106: "(Leptosphaeria Avirulence-Supressing)" should be "(Leptosphaeria Avirulence and Supressing)".

      Line 112: Change "defined" to "define".

      Line 119: Spell out the species name on first use.

      Line 205: Glomeromycota is a division rather than a genus. Consistent with Fig 2, it also does not need to be italicized.

      Line 207: Change "basidiomycete" to "Division Basidiomycota", consistent with Fig 2.

      Line 214: Change "alignment of Avr1, Avr3, SIX6 and SIX13" to "alignment of the mature Avr1, Avr3, SIX6 and SIX13 sequences".

      Line 324: Change "solved structures" to "solved protein structures".

      Line 335: Spell out acronyms like "MS" on first use in figure legends. Also dpi in other figure legends.

      Line 341: replace "effector-triggered immunity (ETI)" with "(ETI)" - see comment on Line 94.

      Line 370: Change "domains" to "domain".

      Line 374: In the title, change "C-terminus" to "C-domain", consistent with the rest of the figure legend.

      Line 404: Change "(basidiomycetes and ascomycetes)" to "(Basidiomycota and Ascomycota fungi)", consistent with Fig 2C.

      Line 416: Change "in" to "by".

      Line 427: un-italicize the parentheses.

      Line 519: First mention of NLR. Spell out the acronym on first use in main text. S5 and S11 figure titles should be bolded.

      Line 852: Replace "@" with "at".

      S4 Table: Gene names should be italicised.

      S5 Table: Needs to be indicated that the primer sequences are in the 5´-3´ orientation.

      With regards to the Agrobacterium tumefaciens-mediated transient expression assays involving co-expression of the Avr1 effector and I immune receptor, the authors need to make clear how many biological replicates were performed as this information is only provided for the ion leakage assay.

      We have added these data to the figure legend.

      Line 57: For me, the text "Fol secretes a limited number of structurally related effectors" reads as Fol secretes structurally related effectors, but very few of them are structurally related. Perhaps it would be better to say that the effector repertoire of Fol is made up of proteins that adopt a limited number of structural folds, or that the effector repertoire can be classified into a reduced set of structural families?

      This edit has been incorporated.

      Lines 66-67: Subtle re-wording required for "The best-characterized pathosystem is F. oxysporum f. sp. lycopersici (Fol)", as a pathosystem is made up of a pathogen and its host. Perhaps "The best-characterized pathosystem involves F. oxysporum f. sp. lycopersici (Fol) and tomato".

      Sentence has been reworded.

      Line 113 and throughout: Stick with one of "resistance protein", "receptor", "immune receptor" and "immunity receptor" throughout the manuscript.

      We have decided to use both receptor and immunity receptor as not all receptors investigated in the manuscript provide immunity.

      Lines 149-150: The title does not fully represent what is shown in the figure. The text "that is unique among fungal effectors" can be deleted as there is nothing in Fig 1 that shows that the fold is unique to fungal effectors.

      Figure title has been changed.

      Line 173: The RMSD of Avr3 is stated as being 3.7 Å, but in S3 Fig it is stated as being 3.6 Å.

      This was a mistake in the main text and has been corrected.

      Lines 202-204: This sentence needs to be reworded, as the way that it is written implies that the Diversispora and Rhizophagus genera are in the Ascomycota division. Also, "Ascomycetes" should be changed to "Ascomycota fungi", consistent with Fig 2.

      Sentence has been reworded.

      Line 233: "Scores above 8". What type of scores? Z-scores?

      These are Z-scores. This has been added in text.

      Lines 242-246: It is stated that SIX9 and SIX11 share structural similarity to various RNA-binding proteins, but the scores used to make these assessments are not given. The scores should be provided in the text.

      Z-scores have been added.

      Fig 4A: SIX3 should be Avr2, consistent with line 292. The gene names should be italicised in Fig 4A.

      SIX3 was changed to Avr2. Gene names have been italicised.

      Line 356: Subtle rewording required, as "co-infiltrated with both IM82 and iMoneymaker" implies that you infiltrated with protein rather than Agrobacterium strains.

      Sentence has been reworded.

      Fig 5A, Fig 5C and Line 380: Light blue is used, but this looks grey. Perhaps change colour, as grey is already used to show the pro-domain in Fig 5A (or simply change the colour used to highlight the pro-domain)?

      Colour depicting the C-domain was changed.

      Lines 530-531: This text is no longer correct. Rlm4 and Rlm3 are now known to be alleles of Rlm9. See: Haddadi, P., Larkan, N. J., Van deWouw, A., Zhang, Y., Neik, T. X., Beynon, E., ... & Borhan, M. H. (2022). Brassica napus genes Rlm4 and Rlm7, conferring resistance to Leptosphaeria maculans, are alleles of the Rlm9 wall‐associated kinase‐like resistance locus. Plant Biotechnology Journal, 20(7), 1229.

      We thank the reviewer for picking this up. This text has been updated.

      Line 553: Provide more information on what the PR1 signal peptide is.

      More information about the PR1 signal peptide has been added.

      Lines 767-781: Descriptions and naming conventions of proteins throughout the figure legend need to be consistent and better reflect their makeup. For example, I think it would be best to put the sequence range after each protein mentioned - e.g. Avr1^18-242 or Avr1^59-242 instead of Avr1, PSL1_C37S^18-111 instead of PSL1_C37S, etc. Furthermore, it is often stated that a protein is full-length when it lacks a signal peptide - my thought is that if a protein lacks its signal peptide, it is not full-length. The acronym "PD" also needs to be spelled out as "pro-domain (PD)" in the figure legend.

      We have incorporated sequence range for proteins that were produced upon first use. Sequence ranges that were modelled in AlphaFold2 were not added in text because they can be found in Supplementary Table 3.

      Lines 853-845: It is stated the sizes of proteins are indicated above the chromatogram in S10 Fig, but this is not the case. It is also not clear from S10B Fig that the faint peaks correspond to the peaks in the Fig 4B chromatogram. In S10D Fig, the stick of C58S is difficult to see. Perhaps change the colour or use an arrow/asterisk?

      Protein size estimates have been added above the chromatogram. Added text to indicate that the faint peaks correspond to peaks in Fig 4B. Added an asterisk in S10D Fig to identify the location of C58.

      S14 Fig is not mentioned/referenced in the main text of the manuscript.

      This was a mistake and has been added.

      The reference list needs to be updated to accommodate those referenced bioRxiv preprints that have now been published in peer-reviewed journals.

      The reference list has been updated.

      Reviewer #3 (Recommendations For The Authors):

      It would be good to discuss whether the pro-domains affect virulence or avirulence activity.

      Kex2, the protease that cleaves the pro-domain, functions in the Golgi. We therefore suspect that the pro-domain is removed prior to secretion. For recombinant protein production in E. coli, we find that these pro-domains are necessary to obtain soluble protein (doi: 10.1111/nph.17516). As we require the pro-domain for protein production and cannot completely remove it from our preparations, we cannot perform experiments to test this or comment further. In a paper that identified SIX effectors in tomato using a proteomics approach (https://bsppjournals.onlinelibrary.wiley.com/doi/10.1111/j.1364-3703.2007.00384.x), it appears that the pro-domains were not captured in the analysis. This supports the conclusion that they are not associated with the mature/secreted protein.

      The authors stated that the C-terminal domain of SIX6 has a single disulfide bond unique to SIX6. Please clarify in which context is it unique: in Fusarium or across all FOLD proteins?

      This is in direct comparison to Avr1 and Avr3. The disulfide in the C-domain of SIX6 is unique compared to Avr1 and Avr3. This has been made clear in text.

      The structural similarity of FOLD proteins to other known structures has been discussed (lines 460ff), but it is not clear whether all structures and models identified in this work would yield cysteine protease inhibitors and tumor necrosis factors as the best structural matches in the database, or whether this is specific to a single FOLD protein. Please consider discussing recently published findings by others (Teulet et al. 2023, New Phytologist) on this aspect.

      This analysis was performed for Avr1; we obtained relatively low similarity hits for Avr3/SIX6. We have updated this text accordingly: “Unfortunately, the FOLD effectors share little overall structural similarity with known structures in the PDB outside of the similarity with each other. At a domain level, the N-domain of the FOLD effector Avr1 has some structural similarities with cystatin cysteine protease inhibitors (PDB code: 4N6V, PDB code: 5ZC1) [60, 61], and the C-domain with tumour necrosis factors (PDB code: 6X83) [62] and carbohydrate-binding lectins (PDB code: 2WQ4) [63]. Relatively weak hits were observed for Avr3/SIX6.”

      It might be useful to clearly point out that the ToxA fold and the C-terminus of the FOLD fold are different.

      We have secondary structural topology maps of the FOLD and ToxA-like families in S8 Fig which highlight the differences in topology between these two families.

      Please add information to S8 Fig describing the approach used to generate the secondary structure topology maps.

      We have added this information in the figure caption.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors found that nifuroxazide has the potential to augment the efficacy of radiotherapy in HCC by reducing PD-L1 expression. This effect may be attributed to increased degradation of PD-L1 through the ubiquitination-proteasome pathway. The paper provides new ideas and insights to improve treatment effectiveness, however, there are additional points that could be addressed.

      • The paper highlights that the combination of nifuroxazide increases tumor cell apoptosis. A discussion regarding the potential crosstalk or regulatory mechanisms between apoptotic pathways and PD-L1 expression would be valuable.

      Response: Thank you very much for your suggestion. Research has shown that regulating the STAT3/PD-L1 pathway can effectively increase apoptosis in lung cancer cells (1). Our study confirmed that nifuroxazide can effectively inhibit the expression of p-STAT3 and PD-L1 in liver cancer cells, which may be the reason for the increased apoptosis of these cells. We have added relevant descriptions in the discussion.

      • The benefits and advantages of nifuroxazide combination could be compared to the current clinical treatment options.

      Response: Thank you very much for your insightful feedback. The primary objective of this study is to explore whether nifuroxazide can effectively enhance the degradation of PD-L1, thereby increasing the radiosensitivity of HCC. Our research reveals that, compared to radiation therapy alone, combination therapy involving nifuroxazide and radiation significantly inhibits tumor growth in mice and boosts the anti-tumor immune response. This finding could potentially provide a valuable strategy for patients who exhibit resistance to radiation therapy in clinical practice. Moreover, clinical trials have demonstrated that nivolumab, a PD-1 monoclonal antibody, exhibits promising safety and efficacy when combined with radiation therapy for HCC (2). This evidence supports the future application of nifuroxazide in the treatment of HCC. However, to reach this objective, we must continue to conduct extensive research, including comparing nifuroxazide with existing therapies in clinical practice. We believe that nifuroxazide not only significantly inhibits the expression of PD-L1 protein in HCC cells but also functions as a PD-L1 inhibitor. Furthermore, it effectively curbs the proliferation and migration of HCC cells, induces tumor cell apoptosis, and may exhibit enhanced anti-tumor effects, making it a promising candidate for clinical use. We have added relevant discussion to the article to address these points.

      Reviewer #2 (Public Review):

      Summary:

      Zhao et al. aimed to explore an important question - how to overcome the resistance of hepatocellular carcinoma cells to radiotherapy? Given that the immune-suppressive microenvironment is a major mechanism underlying resistance to radiotherapy, they reasoned that a drug that blocks the PD-1/PD-L1 pathway could improve the efficacy of radiation therapy and chose to investigate the effect of Nifuroxazide, an inhibitor of STAT3 activation, on radiotherapy efficacy in treating hepatocellular carcinoma cells. From in vitro experiments, they find combination treatment (Nifuroxazide + radiotherapy) increases apoptosis and reduces proliferation and migration, in comparison to radiotherapy alone. From in vivo experiments, they demonstrate that combined treatment reduces the size and weight of tumors in vivo and enhances mouse survival. These data indicate a better efficacy of combination therapy compared to radiotherapy alone. Moreover, they also determined the effect of combination therapy on the tumor microenvironment as well as the peripheral immune response. They find that combination therapy increases infiltration of CD4+ and CD8+ cells as well as M1 macrophages in the tumor microenvironment. Interestingly, they find that the ratio of Treg cells in the spleen is increased by radiotherapy but decreased by Nifuroxazide. Considering the immune-suppressive role of Treg cells, this finding is consistent with reduced tumor growth by combination therapy. However, it is unclear whether the combined therapy affects the ratio of Treg cells in the tumors or not. The most intriguing part of the study is the determination of the effect of Nifuroxazide on PD-L1 expression in the context of radiotherapy. Considering Nifuroxazide is a STAT3 activation inhibitor and STAT3 inhibition leads to reduced expression of PD-L1, one would expect Nifuroxazide to decrease PD-L1 expression through STAT3. However, they found that the effect of Nifuroxazide on PD-L1 is dependent on GSK3-mediated proteasome pathways and independent of STAT3, in the given experimental context. To determine the relevance to human hepatocellular carcinoma, they also measured the PD-L1 expression in human tumor tissues of HCC patients pre- and post-radiotherapy. The increased PD-L1 expression level in HCC after radiotherapy is impressive. However, it is unclear whether the patients being selected in the study had resistant disease to radiotherapy or not.

      Overall, the data are convincing and supportive of the conclusions.

      Strengths:

      1) Novel finding: Identified novel mechanism underlying the effect of Nifuroxazide on PD-L1 expression in hepatocellular carcinoma cells in the context of radiotherapy.

      2) Comprehensive experimental approaches: using different approaches to prove the same finding. For example, in Fig 4, both IHC and WB were used. In Fig 5, both IF and WB were used.

      3) Human disease relevance: Compared observations in mice with human tumor samples.

      The question in the summary, “However, it is unclear whether the combined therapy affects the ratio of Treg cells in the tumors or not”.

      Response: Thank you very much for your valuable feedback. We have included additional flow cytometry results regarding the expression of relevant Treg cells (CD4+CD25+Foxp3+ T lymphocytes) in tumor tissues (Supplementary Fig 2). Our findings indicate that the number of Treg cells in tumor tissues significantly decreased following combination therapy with nifuroxazide and radiotherapy.

      The question in the summary, “However, it is unclear whether the patients being selected in the study had resistant disease to radiotherapy or not”.

      Response: Thank you very much for your valuable feedback. All the HCC patients selected in this study experienced recurrence after radiation treatment.

      Weaknesses:

      1) It is hard to tell whether the observed phenotype and mechanism are generic or specific to the limited cell lines used in the study. The in vitro experiments were performed in one human cell line and the in vivo experiments were performed in one mouse cell line.

      Response: Thank you very much for your feedback. We have included additional experimental data from another human cell line, Huh7 (Supplementary Fig 3).

      2) The study did not distinguish the effect of increased radiosensitivity by nifuroxazide from combined anti-tumor effects by two different treatments.

      Response: Thank you very much for your insightful feedback. In this study, we primarily compared the antitumor effects of nifuroxazide combined with radiotherapy against those of nifuroxazide or radiotherapy alone, and confirmed that the combined treatment had a more potent anti-hepatocellular carcinoma effect than either single therapy. Furthermore, before nifuroxazide can be used to treat hepatocellular carcinoma in the clinic, additional research is necessary, including comparisons with other clinically established therapies. We have also added relevant discussion to the manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors embarked on an exploration of how nifuroxazide could enhance the responsiveness to radiotherapy by employing both an in vitro cell culture system and an in vivo mouse tumor model.

      Strengths:

      The researchers conducted an array of experiments aimed at revealing the function of nifuroxazide in aiding the radiotherapy-induced reduction of proliferation, migration, and invasion of HepG2 cells.

      Weaknesses:

      The authors did not provide the molecular mechanism through which nifuroxazide collaborates with radiotherapy to effectively curtail the proliferation, migration, and invasion of HCC cells. Moreover, the evidence supporting the assertion that nifuroxazide contributes to the degradation of radiotherapy-induced upregulation of PD-L1 via the ubiquitin-proteasome pathway appears to be insufficient. Importantly, further validation of this discovery should involve the utilization of an additional syngeneic mouse HCC tumor model or an orthotopic HCC tumor model.

      Response: Thank you very much for your insightful comments. Nifuroxazide has been demonstrated to inhibit the expression of p-STAT3, thereby suppressing tumor cell proliferation and migration (3, 4). In our study, we observed that after 48 hours of treatment with nifuroxazide, the expression of p-STAT3 in irradiated cells was significantly inhibited. Furthermore, compared to radiation alone, combined nifuroxazide and radiotherapy resulted in a more pronounced decrease in PCNA expression. We also measured the expression of the migration-related protein MMP2 (revised Fig 2B), confirming that combined nifuroxazide and radiotherapy inhibited MMP2 expression more strongly. These findings suggest that these changes may contribute to the synergistic suppression of HCC cell proliferation and migration by the combined treatment. We have included relevant discussions in our manuscript.

      Our initial results indicate that Nifuroxazide inhibits the expression of PD-L1 at the protein level, but does not affect its mRNA level. Interestingly, upon treatment with a proteasome inhibitor MG132, the inhibitory effect of Nifuroxazide on PD-L1 was eliminated, suggesting that Nifuroxazide may enhance the degradation of PD-L1 protein. Our experiments have demonstrated the inhibitory effect of Nifuroxazide on PD-L1 in both human and mouse cell lines. However, to translate these findings into clinical application for the treatment of hepatocellular carcinoma, additional research is necessary, including validation in genetically engineered mouse models of HCC. We have addressed these points in the discussion section of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1) Please improve the quality of Figure 3E. It is hard to figure out the bar and details.

      Response: Thank you for your valuable feedback. We have meticulously revised the figures to enhance their clarity and presentation (revised Fig 3E).

      2) In Figure 7E, please elucidate the method used for calculating the PD-L1 mRNA level. Please adjust the picture angle and label the marker size on the left as well.

      Response: Thank you for your feedback. We have incorporated a method for calculating PD-L1 mRNA levels and revised the corresponding figures accordingly (revised Fig 7E).

      Reviewer #2 (Recommendations For The Authors):

      Questions:

      1) What is the advantage of using a combination of nifuroxazide and radiotherapy in comparison to using a combination of anti-PD1/PDL1 and radiotherapy?

      Response: Thank you very much for your insightful comments. We believe that the advantage of nifuroxazide over PD-1 or PD-L1 antibodies lies in its ability not only to effectively inhibit PD-L1 expression but also to suppress tumor cell proliferation, migration, and promote cell apoptosis (Supplementary Fig 1). We have also expanded on these aspects in the discussion section of the manuscript.

      2) For the characterization of tumor microenvironment and immune cells in the spleen, were the same cell populations being investigated? What about NK and Treg cells in tumors? What about M1 macrophages in spleen?

      Response: Thank you very much for your insightful suggestion. We have measured the infiltration of NK and Treg cells in tumor tissues (Supplementary Fig 2), as well as the abundance of M1 macrophages (revised Fig 6) in the spleen, and provided additional relevant data to strengthen our study.

      Other comments:

      1) The data in Fig 1 is solid. However, it is hard to distinguish the effect of increased radiosensitivity by nifuroxazide from combined anti-tumor effects by two different treatments. The anti-tumor role of Nifuroxazide has been reported in melanoma, colorectal carcinoma, and hepatocellular carcinoma previously (PMID: 26830149; 28055016, 26154152). Therefore, the increased apoptosis and decreased proliferation and migration could be caused by nifuroxazide and not related to the sensitivity of cells to radiation therapy.

      Response: Thank you very much for your constructive feedback. As you suggested, the anti-tumor role of nifuroxazide has been reported. However, the innovation of our study does not lie in confirming its antitumor effects but rather in demonstrating how nifuroxazide can enhance radiotherapy's efficacy in treating hepatocellular carcinoma by inhibiting PD-L1 levels.

      We compared the efficacy of combined therapy versus radiotherapy and found that compared to radiation alone, combined therapy more significantly inhibited hepatocellular carcinoma cell proliferation and migration. In our animal model, we compared the therapeutic effects of combined therapy, nifuroxazide, and radiotherapy on hepatocellular carcinoma-bearing mice. We observed that compared to individual treatment groups, combined therapy more profoundly suppressed tumor growth and enhanced the antitumor effects in the mice.

      In response to your feedback, we have expanded the discussion on the impact of combined therapy versus nifuroxazide or radiotherapy on hepatocellular carcinoma cell proliferation, migration, and apoptosis (Supplementary Fig 1). The data show that compared to either individual therapy, combined therapy further inhibited cell proliferation and migration while promoting apoptosis.

      2) There is no direct evidence to show the improved efficacy of radiation therapy by nifuroxazide through the degradation of PD-L1.

      Response: Thank you very much for your valuable suggestions. In our cell experiments, we found that nifuroxazide inhibits the increased expression of PD-L1 in cells induced by radiation therapy, and this inhibitory effect is counteracted when using the proteasome inhibitor MG132. Therefore, we speculate that nifuroxazide may inhibit PD-L1 expression through a proteasome-dependent mechanism. To better reflect this, we have revised the title of our manuscript to "Nifuroxazide Suppresses PD-L1 Expression and Enhances the Efficacy of Radiotherapy in Hepatocellular Carcinoma."

      3) "The oncogene Stat3.....was effectively inhibited by radiotherapy in cells" - this sentence may be rephrased to make the point clear. The authors might mean to say "activation of the oncogene stat3...."

      "The results demonstrated that the combination therapy increased the expression of PARP," the authors might mean to say "expression of c-PARP"

      Response: Thank you very much for your feedback. We have revised the relevant sentence descriptions to improve clarity and accuracy.

      4) "histomorphology significantly improved after the treatment with nifuroxazide and radiation therapy (Fig 3E)." How to define "improved histomorphology"? The authors may want to provide more details to clarify "improved".

      Response: Thank you very much for your feedback. We have revised the relevant sentence descriptions to improve clarity and accuracy.

      5) In addition to normalizing protein expression by tubulin, the authors may consider normalizing p-stat3 expression level by stat3.

      Response: Thank you very much for your feedback. We have conducted a quantitative analysis of the expression levels of p-STAT3 and STAT3 (revised Fig 2A).

      6) Figure 3C and D, using a different color to represent each group might help the readers to better differentiate each group.

      Response: Thank you very much for your feedback. Following your suggestion, we have revised the figures accordingly (revised Fig 3C and 3D).

      Reviewer #3 (Recommendations For The Authors):

      In this study, the authors revealed the pivotal role of nifuroxazide in augmenting the efficacy of radiotherapy. This was evidenced by its synergistic effect in suppressing the proliferation and migratory capabilities of HCC cells, alongside its capacity to induce apoptosis in these cells. Furthermore, their findings underscored the substantial synergy between nifuroxazide and radiotherapy in retarding tumor growth, thereby extending survival rates in a tumor-bearing murine model. Moreover, the authors observed that nifuroxazide combined with radiotherapy significantly increases the tumor-infiltrating CD4+ T cells, CD8+ T cells, and M1 macrophages. Finally, the authors found that nifuroxazide countered the radiotherapy-induced upregulation of PD-L1 through the ubiquitin-proteasome pathway. However, the main claims are only partially supported by the evidence. The following are my concerns and suggestions.

      1) In Figures 1 and 2, the authors convincingly demonstrate the synergistic impact of nifuroxazide and radiotherapy on curtailing the proliferation, colony formation, and migratory capabilities of HCC cells, while also instigating apoptosis in these cells. However, the underlying molecular mechanism remains elusive. A recent study highlighted nifuroxazide's potential to impede the proliferation of glioblastoma cells and induce apoptosis via the MAP3K1/JAK2/STAT3 pathway (Wang X., et al., Int Immunopharmacol. 2023 May;118:109987. doi: 10.1016/j.intimp.2023.109987). It would be valuable for the authors to investigate whether nifuroxazide employs a similar molecular mechanism to regulate proliferation and apoptosis in the context of HCC. This could offer deeper insights into the mechanisms at play in their observed effects.

      Response: Thank you very much for your insightful comments. As you pointed out, previous studies have reported that nifuroxazide exerts antitumor effects by inhibiting the STAT3 pathway. However, in our experiments, we observed that radiation therapy significantly increased the expression of PD-L1, but showed a trend of decreased p-STAT3 expression. Therefore, we believe that nifuroxazide does not inhibit PD-L1 expression through the STAT3 pathway. Subsequently, our further research revealed that the inhibitory effect of nifuroxazide on PD-L1 can be counteracted by a proteasome inhibitor. Thus, we propose that nifuroxazide inhibits PD-L1 expression through a proteasome-dependent mechanism, thereby enhancing the efficacy of radiation therapy in hepatocellular carcinoma.

      2) Figures 1 and 2 solely rely on the HepG2 cell line to establish their conclusions. To validate these findings robustly, it is recommended that another HCC cell line be included in the study. This additional cell line will contribute to the generalizability and reliability of the results, enhancing the overall credibility of the study's conclusions.

      Response: Thank you very much for your suggestion. We have included additional experimental results from a second HCC cell line, Huh7 (Supplementary Fig 3).

      3) Figure 3 demonstrates the use of only one syngeneic mouse H22 tumor model. To ensure the robustness and validity of this finding, it would be advisable to incorporate at least one more syngeneic mouse HCC tumor model or even an orthotopic mouse tumor model. The inclusion of additional models would bolster the significance and reliability of the observed results, contributing to a more comprehensive understanding of the phenomenon under investigation.

      Response: Thank you for your valuable suggestion. In the H22 mouse tumor model, we assessed survival rate and tumor growth. The results confirm that the combination of nifuroxazide and radiation therapy exhibits a promising synergistic antitumor effect. However, before nifuroxazide combined with radiation therapy can be applied to the clinical treatment of hepatocellular carcinoma, extensive further research is needed, including validation in additional syngeneic mouse HCC tumor models. We have also added a relevant discussion to the manuscript.

      4) In Figure 5, employing an alternative method, such as the flow cytometry assay, to analyze and corroborate the tumor-infiltrating immune cell profiling following various treatments would enhance the rigor of the study. This additional approach would provide a complementary perspective and validate the findings, strengthening the overall reliability and impact of the results presented.

      Response: Thank you for your insightful suggestion. We have included additional experimental data to strengthen our study (Supplementary Fig 2).

      5) In Figure 7, the conclusion drawn regarding nifuroxazide's impact on PD-L1 expression through ubiquitination-proteasome mechanisms seems to lack the robust evidence needed to firmly establish nifuroxazide's role in regulating PD-L1 ubiquitination. To reinforce this aspect of the study, the authors may conduct comprehensive in vitro and in vivo ubiquitination assays. Performing these assays would offer direct insights into whether nifuroxazide genuinely influences PD-L1 ubiquitination, thus fortifying the credibility and importance of the reported findings.

      Response: Thank you for your valuable feedback. Our initial findings suggest that nifuroxazide reduces PD-L1 protein levels but does not affect PD-L1 mRNA levels. Moreover, treatment with the proteasome inhibitor MG132 abolished the inhibitory effect of nifuroxazide on PD-L1. We also observed that nifuroxazide significantly enhances GSK-3β expression in both cell and animal experiments. Consequently, we propose that nifuroxazide augments the degradation of PD-L1 protein.

      6) Statistical methods should be included in the captions of all the figures with statistical graphs. The size of the scale should be supplemented with a description in the captions.

      Response: Thank you for your valuable suggestion. We have made the appropriate modifications to our study based on your recommendations.

      7) Considering the outcomes presented in the study, it appears that the title "Nifuroxazide enhances radiotherapy efficacy against hepatocellular carcinoma by upregulating PD-L1 degradation via the ubiquitin-proteasome pathway" may not accurately reflect the findings.

      Response: Thank you for your insightful feedback. We have revised the title to read, "Inhibitory Effects of Nifuroxazide on PD-L1 Expression and Enhanced Radiotherapy Efficacy in Hepatocellular Carcinoma".

      References

      1) Xie C, Zhou X, Liang C, Li X, Ge M, Chen Y, et al. Apatinib triggers autophagic and apoptotic cell death via VEGFR2/STAT3/PD-L1 and ROS/Nrf2/p62 signaling in lung cancer. Journal of experimental & clinical cancer research : CR. 2021;40(1):266. doi: 10.1186/s13046-021-02069-4.

      2) de la Torre-Alaez M, Matilla A, Varela M, Inarrairaegui M, Reig M, Lledo JL, et al. Nivolumab after selective internal radiation therapy for the treatment of hepatocellular carcinoma: a phase 2, single-arm study. Journal for immunotherapy of cancer. 2022;10(11). doi: 10.1136/jitc-2022-005457.

      3) Yang F, Hu M, Lei Q, Xia Y, Zhu Y, Song X, et al. Nifuroxazide induces apoptosis and impairs pulmonary metastasis in breast cancer model. Cell Death Dis. 2015;6(3):e1701. doi: 10.1038/cddis.2015.63.

      4) Nelson EA, Walker SR, Kepich A, Gashin LB, Hideshima T, Ikeda H, et al. Nifuroxazide inhibits survival of multiple myeloma cells by directly inhibiting STAT3. Blood. 2008;112(13):5095-102. doi: 10.1182/blood-2007-12-129718.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents H3-OPT, a deep learning method that effectively combines existing techniques for the prediction of antibody structure. This work is important because the method can aid the design of antibodies, which are key tools in many research and industrial applications. The experiments for validation are solid.

      Comments to Author:

      Several points remain partially unclear, such as:

      1) Which examples constitute proper validation;

      Thank you for your kind reminder. We have revised the description of the validation experiments to identify which examples constitute proper validation. We changed the sentence “Finally, H3-OPT also shows lower Cα-RMSDs compared to AF2 or tFold-Ab for the majority of targets in an expanded benchmark dataset, including all antibody structures from CAMEO 2022” to “Finally, H3-OPT also shows lower Cα-RMSDs compared to AF2 or tFold-Ab for the majority (six of seven) of targets in an expanded benchmark dataset, including all antibody structures from CAMEO 2022”, and we added the following sentence to the experimental validation section of the revised manuscript: “AlphaFold2 outperformed IgFold on these targets”.

      2) What the relevance of the molecular dynamics calculations as performed is;

      Thank you for your comment, and we apologize for any confusion. The goal of our molecular dynamics calculations is to compare binding affinities, an important consideration in antibody engineering, between AlphaFold2-predicted complexes and H3-OPT-predicted complexes. Molecular dynamics simulations enable the investigation of the dynamic behaviors and interactions of these complexes over time. Unlike other tools for predicting binding free energy, MM/PBSA or MM/GBSA calculations capture dynamic properties of complexes by sampling conformational space, which helps in obtaining more accurate estimates of binding free energy. In summary, our molecular dynamics calculations demonstrated that the binding free energies of H3-OPT-predicted complexes are closer to those of native complexes. We have included the following sentence in our manuscript to explain the purpose of the molecular dynamics calculations: “Since affinity prediction plays a crucial role in antibody therapeutics engineering, we performed MD simulations to compare the differences in binding affinities between AF2-predicted complexes and H3-OPT-predicted complexes.”

      3) The statistics for some of the comparisons;

      Thank you for the comment. We have incorporated statistics for some of the comparisons in the revised version of our manuscript and added the following sentence in the Methods section: “We conducted two-sided t-test analyses to assess the statistical significance of differences between the various groups. Statistical significance was considered when the p-values were less than 0.05. These statistical analyses were carried out using Python 3.10 with the Scipy library (version 1.10.1).”.
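
      The statistical procedure described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the response does not state whether the tests were paired or independent, so an independent-samples test is assumed here, and the RMSD values are invented for demonstration.

```python
from scipy import stats

# Hypothetical per-target C-alpha RMSD values (in angstroms) for two methods.
# These numbers are invented for illustration; they are not study results.
rmsd_method_a = [2.9, 3.4, 2.1, 4.0, 3.3, 2.7, 3.8, 3.1]
rmsd_method_b = [2.2, 2.9, 1.8, 3.1, 2.6, 2.4, 3.0, 2.5]

# Two-sided independent-samples t-test (assumption: unpaired groups).
result = stats.ttest_ind(rmsd_method_a, rmsd_method_b, alternative="two-sided")
t_stat, p_value = result.statistic, result.pvalue

# Significance threshold quoted in the response.
ALPHA = 0.05
is_significant = p_value < ALPHA
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant = {is_significant}")
```

      If the same targets are scored by both methods, a paired test (`scipy.stats.ttest_rel`) may be the more appropriate choice.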

      4) The lack of comparison with other existing methods.

      We appreciate your valuable comments and suggestions. Conducting comparisons with a broader set of existing methods can further facilitate discussions on the strengths and weaknesses of each method, as well as the accuracy of our method. In our study, we conducted a comparison of H3-OPT with many existing methods, including AlphaFold2, HelixFold-Single, ESMFold, and IgFold. We demonstrated that several protein structure prediction methods, such as ESMFold and HelixFold-Single, do not match the accuracy of AlphaFold2 in CDR-H3 prediction. Additionally, we performed a detailed comparison between H3-OPT, AlphaFold2, and IgFold (the latest antibody structure prediction method) for each target.

      We sincerely thank you for the comment and have added a comparison with OmegaFold. The results have been incorporated into the relevant sections (Fig 4a-b) of the revised manuscript.

      Author response image 1.

      Public Reviews

      Comments to Author:

      Reviewer #1 (Public Review):

      Summary:

      The authors developed a deep learning method called H3-OPT, which combines the strength of AF2 and PLM to reach better prediction accuracy of antibody CDR-H3 loops than AF2 and IgFold. These improvements will have an impact on antibody structure prediction and design.

      Strengths:

      The training data are carefully selected and clustered, and the network design is simple and effective.

      The improvements include smaller average Cα RMSD, backbone RMSD, side chain RMSD, more accurate surface residues and/or SASA, and more accurate H3 loop-antigen contacts.

      The performance is validated from multiple angles.

      Weaknesses:

      1) There are very limited prediction-then-validation cases, basically just one case.

      Thanks for pointing out this issue. The number of prediction-then-validation cases is helpful to show the generalization ability of our model. However, obtaining experimental structures is both costly and labor-intensive. Furthermore, experimental validation cases only capture a limited portion of the sequence space in comparison to the broader diversity of antibody sequences.

      To address this challenge, we have collected different datasets to serve as benchmarks for evaluating the performance of H3-OPT, including our non-redundant test set and the CAMEO dataset. The introduction of these datasets allows for effective assessments of H3-OPT’s performance without biases and tackles the obstacle of limited prediction-then-validation cases.

      Reviewer #2 (Public Review):

      This work provides a new tool (H3-Opt) for the prediction of antibody and nanobody structures, based on the combination of AlphaFold2 and a pre-trained protein language model, with a focus on predicting the challenging CDR-H3 loops with higher accuracy than previously developed approaches. This task is of high value for the development of new therapeutic antibodies. The paper provides an external validation consisting of 131 sequences, with further analysis of the results by segregating the test sets into three subsets of varying difficulty and comparison with other available methods. Furthermore, the approach was validated by comparing three experimentally solved 3D structures of anti-VEGF nanobodies with the H3-Opt predictions.

      Strengths:

      The experimental design to train and validate the new approach has been clearly described, including the dataset compilation and its representative sampling into training, validation and test sets, and structure preparation. The results of the in-silico validation are quite convincing and support the authors' conclusions.

      The datasets used to train and validate the tool and the code are made available by the authors, which ensures transparency and reproducibility, and allows future benchmarking exercises with incoming new tools.

      Compared to AlphaFold2, the authors' optimization seems to produce better results for the most challenging subsets of the test set.

      Weaknesses:

      1) The scope of the binding affinity prediction using molecular dynamics is not that clearly justified in the paper.

      We sincerely appreciate your valuable comment. We have added the following sentence in our manuscript to justify the scope of the molecular dynamics calculations: “Since affinity prediction plays a crucial role in antibody therapeutics engineering, we performed MD simulations to compare the differences in binding affinities between AF2-predicted complexes and H3-OPT-predicted complexes.”.

      2) Some parts of the manuscript should be clarified, particularly the ones that relate to the experimental validation of the predictions made by the reported method. It is not absolutely clear whether the experimental validation is truly a prospective validation. Since the methodological aspects of the experimental determination are not provided here, it seems that this may not be the case. This is a key aspect of the manuscript that should be described more clearly.

      Thank you for the reminder about experimental validation of our predictions. The sequence identities of the wild-type nanobody VH domain and H3 loop, when compared with the best template, are 0.816 and 0.647, respectively. As a result, these mutants exhibited low sequence similarity to our dataset, indicating the absence of prediction bias for these targets. Thus, H3-OPT outperformed IgFold on these mutants, demonstrating our model's strong generalization ability. In summary, the experimental validation actually serves as a prospective validation.

      Thank you for your comments. We have added the following sentences to describe the methodological aspects of the experimental determination: “The protein expression, purification and crystallization experiments were described previously. The proteins used in the crystallization experiments were unlabeled. Upon thawing the frozen protein on ice, we performed a centrifugation step to eliminate any potential crystal nucleus and precipitants. Subsequently, we mixed the protein at a 1:1 ratio with commercial crystal condition kits using the sitting-drop vapor diffusion method facilitated by the Protein Crystallization Screening System (TTP LabTech, mosquito). After several days of optimization, single crystals were successfully cultivated at 21°C and promptly flash-frozen in liquid nitrogen. The diffraction data from various crystals were collected at the Shanghai Synchrotron Research Facility and subsequently processed using the aquarium pipeline.”

      3) Some Figures would benefit from a clearer presentation.

      We sincerely thank you for your careful reading. Following your comments, we have made extensive modifications to make our presentation clearer and more convincing (Fig 2c-f).

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript introduces a new computational framework for choosing 'the best method' according to the case for getting the best possible structural prediction for the CDR-H3 loop. The authors show their strategy improves on average the accuracy of the predictions on datasets of increasing difficulty in comparison to several state-of-the-art methods. They also show the benefits of improving the structural predictions of the CDR-H3 in the evaluation of different properties that may be relevant for drug discovery and therapeutic design.

      Strengths:

      The authors introduce a novel framework, which can be easily adapted and improved. The authors use a well-defined dataset to test their new method. A modest average accuracy gain is obtained in comparison to other state-of-the-art methods for the same task while avoiding testing different prediction approaches.

      Weaknesses:

      1) The accuracy gain is mainly ascribed to easy cases, while the accuracy and precision for moderate to challenging cases are comparable to other PLM methods (see Fig. 4b and Extended Data Fig. 2). That raises the question: how likely is it to be in a moderate or challenging scenario? For example, it is not clear whether the comparison to the solved X-ray structures of anti-VEGF nanobodies represents an easy or challenging case for H3-OPT. The mutant nanobodies seem not to provide any further validation as the single mutations are very far away from the CDR-H3 loop and they do not disrupt the structure in any way. Indeed, RMSD values follow the same trend in H3-OPT and IgFold predictions (Fig. 4c). A more challenging test and interesting application could be solving the structure of a designed or mutated CDR-H3 loop.

      Thank you for your rigorous consideration. When the experimental structure is unavailable, it is difficult to directly determine whether a target is easy to predict or challenging. In our non-redundant test set, the number of easy-to-predict targets is comparable to that of each of the other two groups. Due to the limited availability of experimental antibody structures, especially nanobody structures, accurately predicting CDR-H3 remains a challenge. In our manuscript, we discuss the strengths and weaknesses of AlphaFold2 and other PLM-based methods, and we introduce H3-OPT as a comprehensive solution for antibody CDR3 modeling.

      We also appreciate your comment on experimental structures. We fully agree with your opinion and attempted to solve the experimental structures of seven mutants, including two mutants (Y95F and Q118N) that are close to the CDR-H3 loop. Unfortunately, although we tried seven different reagent kits with a total of 672 crystallization conditions, we were unable to obtain crystals for these mutants. Although the mutants we successfully solved may not have significantly disrupted the structures of the CDR-H3 loops, they still provided valuable insights into the differences between MSA-based methods and MSA-free methods (such as IgFold) for antibody structure modeling.

      We have further conducted a benchmarking study using two examples, PDBID 5U15 and 5U0R, both consisting of 18 residues in CDR-H3, to evaluate H3-OPT's performance in predicting mutated H3 loops. In the first case (target 5U15), AlphaFold2 failed to provide an accurate prediction of the extended orientation of the H3 loop, resulting in a less accurate prediction (Cα-RMSD = 10.25 Å) compared to H3-OPT (Cα-RMSD = 5.56 Å). In the second case (target 5U0R, a mutant of 5U15 in CDR3 loop), AlphaFold2 and H3-OPT achieved Cα-RMSDs of 6.10 Å and 4.25 Å, respectively. Additionally, the Cα-RMSDs of OmegaFold predictions were 8.05 Å and 9.84 Å, respectively. These findings suggest that both AlphaFold2 and OmegaFold effectively captured the mutation effects on conformations but achieved lower accuracy in predicting long CDR3 loops when compared to H3-OPT.
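
      For readers unfamiliar with the metric, the Cα-RMSD values quoted above can be computed along the following lines. This is a minimal sketch using the Kabsch algorithm with NumPy. It assumes the two coordinate sets are already matched residue by residue and superimposes on the same atoms it scores; the response does not specify which residues were used for alignment, and the coordinates below are synthetic.

```python
import numpy as np

def kabsch_rmsd(pred: np.ndarray, native: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    if pred.shape != native.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    p = pred - pred.mean(axis=0)
    q = native - native.mean(axis=0)
    # SVD of the covariance matrix yields the optimal rotation (Kabsch).
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_aligned = p @ rot.T
    return float(np.sqrt(np.mean(np.sum((p_aligned - q) ** 2, axis=1))))

# Synthetic 18-residue "loop" (random coordinates, not a real structure).
rng = np.random.default_rng(0)
native_loop = rng.normal(size=(18, 3)) * 5.0
theta = np.deg2rad(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
# A pure rigid-body motion should give an RMSD close to zero.
moved = native_loop @ rot_z.T + np.array([10.0, -4.0, 2.5])
rmsd_rigid = kabsch_rmsd(moved, native_loop)
# Adding coordinate noise should give a clearly non-zero RMSD.
noisy = moved + rng.normal(scale=1.0, size=moved.shape)
rmsd_noisy = kabsch_rmsd(noisy, native_loop)
```

      In antibody benchmarks it is also common to superimpose on framework residues and report the RMSD over the loop only; that variant would pass different atom subsets to the fitting and scoring steps.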

      2) The proposed method lacks a confidence score or a warning to help guide the users in moderate to challenging cases.

      We appreciate your suggestions and we have trained a separate module to predict confidence scores. We used the MSE loss for confidence prediction, where the label error was calculated as the Cα deviation of each residue after alignment. The inputs of this module are the same as those used for H3-OPT, and it generates a confidence score ranging from 0 to 100.
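
      The training target described above can be illustrated with a short sketch. It shows only the label construction (per-residue Cα deviation after alignment) and the MSE loss. The function names and toy coordinates are hypothetical, and because the response does not say how predicted errors are mapped to the 0 to 100 score, that step is deliberately left out.

```python
import numpy as np

def per_residue_ca_error(pred_aligned: np.ndarray, native: np.ndarray) -> np.ndarray:
    """Per-residue C-alpha deviation between already-superimposed (N, 3) coordinates."""
    return np.linalg.norm(pred_aligned - native, axis=1)

def mse_loss(predicted_error: np.ndarray, label_error: np.ndarray) -> float:
    """Mean squared error between predicted and observed per-residue errors."""
    return float(np.mean((predicted_error - label_error) ** 2))

# Toy example with three residues; coordinates are invented for illustration.
native = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [3.8, 1.0, 0.0], [7.6, 0.0, 2.0]])

# Labels: observed deviation of each residue from the native position.
labels = per_residue_ca_error(pred, native)

# Hypothetical output of a confidence module for the same three residues.
module_output = np.array([0.5, 1.0, 1.5])
loss = mse_loss(module_output, labels)
```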

      3) The fact that AF2 outperforms H3-OPT in some particular cases (e.g. Fig. 2c and Extended Data Fig. 3) raises the question: is there still room for improvements? It is not clear how sensitive H3-OPT is to the defined parameters. In the same line, bench-marking against other available prediction algorithms, such as OmegaFold, could shed light on the actual accuracy limit.

      We totally understand your concern. Many papers have suggested that PLM-based models are computationally efficient but may have unsatisfactory accuracy when high-resolution templates and MSA are available (Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies, Ruffolo, J. A. et al, 2023). However, the accuracy of AF2 decreases substantially when the MSA information is limited. Therefore, we directly retained high-confidence structures of AF2 and introduced a PSPM to improve the accuracy for targets with long CDR-H3 loops and few sequence homologs. The improvement in mean Cα-RMSD shows that there was indeed room for more accurate prediction of CDR-H3 loops.

      We also appreciate your kind comment on defined parameters. In fact, once a benchmark dataset is established, determining an optimal cutoff value through parameter searching can indeed further improve the performance of H3-OPT in CDR3 structure prediction. However, it is important to note that this optimal cutoff value heavily depends on the testing dataset being used. Therefore, we provide a recommended cutoff value and offer a program interface for users who wish to manually define the cutoff value based on their specific requirements. Here, we showed the average Cα-RMSDs of our test set under different confidence cutoffs and the results have been added in the text accordingly.

      Author response table 1.
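
      The cutoff scan described above can be sketched as follows. This is only an illustration under stated assumptions: the confidence scale of 0 to 100, the dictionary keys, the function name `mean_rmsd_at_cutoff`, and all numbers are hypothetical, and the real H3-OPT interface may differ.

```python
def mean_rmsd_at_cutoff(targets, cutoff):
    """Average Cα-RMSD when the AF2 loop is kept if its confidence is at
    least `cutoff`, and the alternative (optimized) loop is used otherwise."""
    chosen = [
        t["af2_rmsd"] if t["af2_confidence"] >= cutoff else t["alt_rmsd"]
        for t in targets
    ]
    return sum(chosen) / len(chosen)


# Hypothetical benchmark targets (RMSD in Å, confidence on a 0-100 scale).
targets = [
    {"af2_confidence": 92.0, "af2_rmsd": 1.0, "alt_rmsd": 1.5},
    {"af2_confidence": 78.0, "af2_rmsd": 2.0, "alt_rmsd": 3.0},
    {"af2_confidence": 60.0, "af2_rmsd": 6.0, "alt_rmsd": 3.0},
]

# Evaluate several candidate cutoffs and pick the one with the lowest mean RMSD.
scan = {cutoff: mean_rmsd_at_cutoff(targets, cutoff) for cutoff in (50, 70, 80, 90)}
best_cutoff = min(scan, key=scan.get)
```

      As noted in the response, the cutoff selected this way depends on the benchmark set, so it should be re-evaluated on any new dataset.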

      We also appreciate your reminder, and we have conducted a benchmark against OmegaFold. The results have been included in the manuscript (Fig 4a-b).

      Author response image 3.

      Reviewer #1 (Recommendations For The Authors):

      1) In Fig 3a, please also compare IgFold and H3-OPT (merge Fig. S2 into Fig 3a)

      In Fig 3b, please separate Sub2 and Sub3, and add IgFold's performance.

      Thank you very much for your professional advice. We have made revisions to the figures based on your suggestions.

      Author response image 4.

      2) For the three experimentally solved structures of anti-VEGF nanobodies, what are the sequence identities of the VH domain and H3 loop, compared to the best available template? What is the length of the H3 loop? Which category (Sub1/2/3) do the targets belong to? What is the performance of AF2 or AF2-Multimer on the three targets?

      We apologize for the confusion. The sequence identities of the VH domain and H3 loop are 0.816 and 0.647, respectively, compared with the best template. The CDR-H3 lengths of these nanobodies are all 17. According to our classification strategy, these nanobodies belong to Sub1. The confidence scores of these AlphaFold2 predicted loops were all higher than 0.8, and these loops were accepted as the outputs of H3-OPT by CBM.

      3) Is AF2-Multimer better than AF2, when using the sequences of antibody VH and antigen as input?

      Thanks for your suggestions. Many papers have benchmarked AlphaFold2-Multimer for protein complex modeling and demonstrated that the accuracy of AlphaFold2-Multimer in predicting protein complexes is far from satisfactory (Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants, Rui Yin, et al., 2022). Additionally, there is no significant difference between AlphaFold2 and AlphaFold2-Multimer on antibody modeling (Structural Modeling of Nanobodies: A Benchmark of State-of-the-Art Artificial Intelligence Programs, Mario S. Valdés-Tresanco, et al., 2023).

      From the data perspective, we employed a non-redundant dataset for training and validation. Since these structures are valuable, considering the antigen sequence would reduce the size of our dataset, potentially leading to underfitting.

      4) For H3 loop grafting, I noticed that only identical target and template H3 sequences can trigger grafting (lines 348-349). How many such cases are in the test set?

      We appreciate your comment from this perspective. There are thirty targets in our database with identical CDR-H3 templates.

      Reviewer #2 (Recommendations For The Authors):

      • It is not clear to me whether the three structures apparently used as experimental confirmation of the predictions have been determined previously in this study or not. This is a key aspect, as a retrospective validation does not have the same conceptual value as a prospective, a posteriori validation. Please note that different parts of the text suggest different things in this regard "The model was validated by experimentally solving three structures of anti-VEGF nanobodies predicted by H3-OPT" is not exactly the same as "we then sought to validate H3-OPT using three experimentally determined structures of anti-VEGF nanobodies, including a wild-type (WT) and two mutant (Mut1 and Mut2) structures, that were recently deposited in protein data bank". The authors are kindly advised to make this point clear. By the way, "protein data bank" should be in upper case letters.

      We thank you for your feedback and fully understand your concerns. To validate the performance of H3-OPT, we initially solved the structures of both the wild-type and mutants of anti-VEGF nanobodies and submitted these structures to the Protein Data Bank. We have corrected “that were recently deposited in protein data bank” into “that were recently deposited in Protein Data Bank” in our revised manuscript.

      • It would be good to clarify the goal and importance of the binding affinity prediction, as it seems a bit disconnected from the rest of the paper. Also, it would be good to include the production MD runs as Sup. Mat.

      Thanks for your valuable comment. We have added the following sentence in our manuscript to clarify the goal and importance of the molecular dynamics calculations: “Since affinity prediction plays a crucial role in antibody therapeutics engineering, we performed MD simulations to compare the differences in binding affinities between AF2-predicted complexes and H3-OPT-predicted complexes.”. The details of the production runs are described in the Methods section.

      • Has any statistical test been performed to compare the mean Cα-RMSD values across the modeling approaches included in the benchmark exercise?

      Thanks for this kind recommendation. We conducted a statistical test to assess the performance of different modeling approaches and demonstrated significant improvements with H3-OPT compared to other methods (p<0.001). Additionally, we have trained H3-OPT with five random seeds and compared mean Cα-RMSD values with all five models of AF2. Here, we showed the average Cα-RMSDs of H3-OPT and AlphaFold2.

      Author response table 1.
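
      The response does not state which statistical test was used. As one possible illustration, a paired sign-flip permutation test on per-target Cα-RMSDs could look like the sketch below. The function name `paired_permutation_test` and all RMSD values are hypothetical, and this is not claimed to be the test applied in the manuscript.

```python
import random


def paired_permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided sign-flip permutation test on paired differences a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical per-target Cα-RMSDs (Å) for two methods on the same ten targets.
rmsd_method_a = [1.8, 2.6, 1.9, 3.9, 2.5, 2.4, 3.3, 1.5, 3.4, 2.3]
rmsd_method_b = [1.2, 2.0, 1.5, 3.1, 2.2, 1.8, 2.5, 1.1, 2.9, 1.6]

p_value = paired_permutation_test(rmsd_method_a, rmsd_method_b)
```

      A paired design is used because every method is evaluated on the same targets, so target-to-target difficulty does not inflate the variance of the comparison.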

      • In Fig. 2c-f, I think it would be adequate to make the ordering criterion of the data points explicit in the caption or the graph itself.

      We appreciate your comment and suggestion. We have revised the graph in the manuscript accordingly.

      Author response image 5.

      • Please revise Figure S2 caption and/or its content. It is not clear, in parts b and c, which is the performance of H3-OPT. Why weren't some other antibody-specific tools such as IgFold included in this comparison?

      Thanks for your comments. The performance of H3-OPT is not included in Figure S2. Prior to training H3-OPT, we conducted several preliminary studies, and the detailed results are available in the supplementary sections. We showed that AlphaFold2 outperformed other methods (including AI-based methods and TBM methods) and produced sub-angstrom predictions in framework regions. The comparison of IgFold with other methods was discussed in a previous work (Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies, Ruffolo, J. A. et al, 2023). That study found that IgFold largely yielded results comparable to AlphaFold2 but with lower prediction cost. Additionally, we have also conducted a detailed comparison of CDR-H3 loops with IgFold in our main text.

      • It is stated that "The relative binding affinities of the antigen-antibody complexes were evaluated using the Python script...". Which Python script?

      Thank you for your comments, and we apologize for the confusion. This Python script is a module of the AMBER software. We have corrected “The relative binding affinities of the antigen-antibody complexes were evaluated using the python script” into “The relative binding affinities of the antigen-antibody complexes were evaluated using the MMPBSA module of AMBER software”.

      Reviewer #3 (Recommendations For The Authors):

      Does H3-OPT improve the AF2 score on the CDR-H3? It would be interesting to see whether grafted and PSPM loops improve the pLDDT score by using for example AF2Rank [https://doi.org/10.1103/PhysRevLett.129.238101]. That could also be a way to include a confidence score into H3-OPT.

      We are grateful for your kind question. H3-OPT could not provide a confidence score for its output in the current version, so we did not know whether H3-OPT improves the AF2 score.

      We appreciate your kind recommendations and have calculated the pLDDT scores of all models predicted by H3-OPT and AF2 using AF2Rank. We showed that the average of pLDDT scores of different predicted models did not match the results of Cα-RMSD values.

      Author response table 3.

      Therefore, we have trained a separate module to predict the confidence score of the optimized CDR-H3 loops. We hope that this module can provide users with reliable guidance on whether to use predicted CDR-H3 loops.

      The test case of Nb PDB id. 8CWU is an interesting example where AF2 outperforms H3-OPT and PLMs. The top AF2 model according to ColabFold (using default options and no template [https://doi.org/10.1038/s41592-022-01488-1]) shows a remarkably good model of the CDR-H3, explaining the low Ca-RMSD in the Extended Data Fig. 3. However, the pLDDT score of the 4 tip residues (out of 12), forming the hairpin of the CDR-H3 loop, pushes down the average value below the CBM cut-off of 80. I wonder if there is a lesson to learn from that test case. How sensitive is H3-OPT to the CBM cut-off definition? Have the authors tried weighting the residue pLDDT score by some structural criteria before averaging? I guess AF2 may have less confidence in hydrophobic tip residues in exposed loops as the solvent context may not provide enough support for the pLDDT score.

      Thanks for your valuable feedback. We showed the average Cα-RMSDs of our test set under different confidence cutoffs and the results have been added in the text accordingly.

      Author response table 4.

      We greatly appreciate your comment on this perspective. Inspired by your kind suggestions, we will explore the relationship between cutoff values and structural information in related work. Your feedback is highly valuable as it will contribute to the development of our approach.

      A comparison against the new folding prediction method OmegaFold [https://doi.org/10.1101/2022.07.21.500999] is missing. OmegaFold seems to outperform AF2, ESM, and IgFold among others in predicting the CDR-H3 loop conformation (See [https://doi.org/10.3390/molecules28103991] and [https://doi.org/10.1101/2022.07.21.500999]). Indeed, prediction of anti-VEGF Nb structure (PDB WT_QF_0329, chain B in supplementary data) by OmegaFold as implemented in ColabFold [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/omegafold.ipynb] and setting 10 cycles, renders Ca-RMSD 1.472 Å for CDR-H3 (residues 98-115).

      We appreciate your valuable suggestion. We have added the comparison against OmegaFold in our manuscript. The results have been included in the manuscript (Fig 4a-b).

      Author response image 6.

      In our test set, OmegaFold outperformed ESMFold in predicting the CDR-H3 loop conformation. However, it failed to match the accuracy of AF2, IgFold, and H3-OPT. We discussed the difference between MSA-based methods (such as AlphaFold2) and MSA-free methods (such as IgFold) in predicting CDR-H3 loops. Similarly, OmegaFold provided results comparable to those of HelixFold-Single and other MSA-free methods but still failed to match the accuracy of AlphaFold2 and H3-OPT on Sub1.

      The time-consuming step in H3-OPT is the AF2 prediction. However, most of the time is spent in modeling the mAb and Nb scaffolds, which are already very well predicted by PLMs (See Fig. 4 in [https://doi.org/10.3390/molecules28103991]). Hence, why not use e.g. OmegaFold as the first step, whose score also correlates to the RMSD values [https://doi.org/10.3390/molecules28103991]? If that fails, then use AF2 or grafting. Alternatively, use a PLM model to generate a template, remove/mask the CDR loops (at least CDR-H3), and pass it as a template to AF2 to optimize the structure with or without MSA (e.g. using AF2Rank).

      Thanks for your professional feedback. It is true that the speed of MSA searching limits the application of high-throughput structure prediction. Previous studies have demonstrated that deep learning methods perform well on framework residues. We once tried to directly predict the conformations of CDR-H3 loops using PLM-based methods, but this initial version of H3-OPT lacking the CBM could not replicate the accuracy of AF2 in Sub1. Similarly, we showed that IgFold and OmegaFold also provide lower accuracy in Sub1 (average Cα-RMSDs of 1.71 Å and 1.83 Å, respectively, whereas AF2 predicted an average of 1.07 Å). Therefore, the predictions of AlphaFold2 not only produce scaffolds but also provide the highest quality of CDR-H3 loops when high-resolution templates and MSA are available.

      Thank you once again for your kind recommendation. In the current version of H3-OPT, we have highlighted the strengths of H3-OPT in combining the AF2 and PLM models in various scenarios. AF2 can provide accurate predictions for short loops with fewer than 10 amino acids, and PLM-based models show little or no improvement in such cases. In the next version of H3-OPT, as the first step, we plan to replace the AF2 models with other methods if any accurate MSA-free method becomes available in the future.

      Line 115: The statement "IgFold provided higher accuracy in Sub3" is not supported by Fig. 2a.

      We are sorry for our carelessness. We have corrected “IgFold provided higher accuracy in Sub3” into “IgFold provided higher accuracy in Sub3 (Fig. 3a)”.

      Lines 195-203: What is the statistical significance of results in Fig 5a and 5b?

      Thank you for your kind comments. The surface residues of AF2 models are significantly higher than those of H3-OPT models (p < 0.005). In Fig. 5b, H3-OPT models predicted lower values than AF2 models in terms of various surface properties, including polarity (p <0.05) and hydrophilicity (p < 0.001).

      Lines 212-213: It is not easy to compare and quantify the differences between electrostatic maps in Fig. 5d. Showing a Dmap (e.g. map_model - map_experiment) would be a better option. Additionally, there is no methodological description of how the maps were generated nor the scale of the represented potential.

      Thank you for pointing this out. We have modified the figure (Fig. 5d) according to your kind recommendation and added following sentences to clarify the methodological description on the surface electrostatic potential:

      “Analysis of surface electrostatic potential

      We generated two-dimensional projections of CDR-H3 loop’s surface electrostatic potential using SURFMAP v2.0.0 (based on GitHub from February 2023: commit: e0d51a10debc96775468912ccd8de01e239d1900) with default parameters. The 2D surface maps were calculated by subtracting the surface projection of H3-OPT or AF2 predicted H3 loops to their native structures.”

      Author response image 7.
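
      The arithmetic behind such a difference map can be sketched in a few lines. This is not the SURFMAP interface. It only assumes that each 2D projection is available as an equally sized grid of potential values, and it uses the model-minus-native sign convention suggested by the reviewer. The function names and grid values are hypothetical.

```python
def difference_map(model_map, native_map):
    """Cell-by-cell difference (model minus native) of two equally sized 2D grids."""
    same_rows = len(model_map) == len(native_map)
    if not same_rows or any(len(m) != len(n) for m, n in zip(model_map, native_map)):
        raise ValueError("maps must have the same shape")
    return [
        [m - n for m, n in zip(model_row, native_row)]
        for model_row, native_row in zip(model_map, native_map)
    ]


def mean_absolute_difference(diff):
    """Single-number summary of how far a difference map is from zero."""
    cells = [abs(value) for row in diff for value in row]
    return sum(cells) / len(cells)


# Hypothetical 2x2 projections of the surface electrostatic potential.
model_map = [[1.0, -2.0], [0.5, 0.0]]
native_map = [[0.5, -1.0], [0.5, 1.0]]

diff = difference_map(model_map, native_map)
summary = mean_absolute_difference(diff)
```

      A summary value such as the mean absolute difference makes two predicted loops directly comparable, which is harder to judge by eye from the raw maps.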

      Lines 237-240 and Table 2: What is the meaning of comparing the average free energy of the whole set? Why free energies should be comparable among test cases? I think the correct way is to compare the mean pair-to-pair difference to the experimental structure. Similarly, reporting a precision in the order of 0.01 kcal/mol seems too precise for the used methodology, what is the statistical significance of the results? Were sampling issues accounted for by performing replicates or longer MDs?

      Thanks for your rigorous advice and for pointing out these issues. We have modified the comparisons of free energies between the different prediction methods and corrected the precision of these results. The average binding free energy of H3-OPT complexes is lower than that of AF2 predicted complexes, but there is no significant difference between these energies (p > 0.05).

      Author response table 4.

      Comparison of binding affinities obtained from MD simulations using AF2 and H3-OPT.

      Thanks for your comments on this perspective. Longer MD simulations often achieve better convergence for the average behavior of the system, while replicates provide insights into the variability and robustness of the results. In our manuscript, each MD simulation had a length of 100 nanoseconds, with the initial 90 nanoseconds dedicated to achieving system equilibrium, which was verified by monitoring RMSD (Root Mean Square Deviation). The remaining 10 nanoseconds of each simulation were used for the calculation of free energy. This approach allowed us to balance the need for extensive sampling with the verification of system stability.

      Regarding MD simulations for CDR-H3 refinement, its successful application highly depends on the starting conformation, the force field, and the sampling strategy [https://doi.org/10.1021/acs.jctc.1c00341]. In particular, the applied plain MD seems a very limited strategy (there is not much information about the simulated times in the supplementary material). Similarly, local structure optimizations with QM methods are not expected to improve a starting conformation that is far from the experimental conformation.

      Thank you very much for your valuable feedback. We fully agree with your insights regarding the limitations of MD simulations. Before training H3-OPT, we showed the challenge of accurately predicting CDR-H3 structures. We then tried to optimize the CDR-H3 loops by computational tools, such as MD simulations and QM methods (detailed information of MD simulations is provided in the main text). Unfortunately, these methods were not expected to improve the accuracy of AF2 predicted CDR-H3 loops. These results showed that MD simulations and QM methods are not only time-consuming but also failed to optimize the CDR-H3 loops. Therefore, we developed H3-OPT to tackle these issues and improve the accuracy of CDR-H3 for the development of antibody therapeutics.

      Text improvements

      Relevant statistical and methodological parameters are presented in a dispersed manner throughout the text. For example, the number of structures in test, training, and validation datasets is first presented in the caption of Fig. 4. Similarly, the sequence identity % to define redundancy is defined in the caption of Fig. 1a instead of lines 87-88, where authors define "we constructed a non-redundant dataset with 1286 high-resolution (<2.5 Å)". Is the sequence redundancy for the CDR-H3 or the whole mAb/Nb?

      Thank you for pointing out these issues. We have added the number of structures in each subgroup in the caption of Fig. 1a: “Clustering of the filtered, high-resolution structures yielded three datasets for training (n = 1021), validation (n = 134), and testing (n = 131).” and corrected “As data quality has large effects on prediction accuracy, we constructed a non-redundant dataset with 1286 high-resolution (<2.5 Å) antibody structures from SAbDab” into “As data quality has large effects on prediction accuracy, we constructed a non-redundant dataset (sequence identity < 0.8) with 1286 high-resolution (<2.5 Å) antibody structures from SAbDab” in the revised manuscript. The sequence redundancy applies to the whole mAb/Nb.

      The description of ablation studies is not easy to follow. For example, what does removing TGM mean in practical terms (e.g. only AF2 is used, or PSPM is applied if AF2 score < 80)? Similarly, what does removing CBM mean in practical terms (e.g. all AF2 models are optimized by PSPM, and no grafting is done)?

      Thanks for your comments and suggestions. We have corrected “d, Differences in H3-OPT accuracy without the template module. e, Differences in H3-OPT accuracy without the CBM. f, Differences in H3-OPT accuracy without the TGM.” into “d, Differences in H3-OPT accuracy without the template module. This ablation study means only PSPM is used. e, Differences in H3-OPT accuracy without the CBM. This ablation study means input loop is optimized by TGM and PSPM. f, Differences in H3-OPT accuracy without the TGM. This ablation study means input loop is optimized by CBM and PSPM.”.

      Authors should report the values in the text using the same statistical descriptor that is used in the figures to help the analysis by the reader. For example, in lines 223-224 a precision score of 0.75 for H3-OPT is reported in the text (I assume this is the average value), while the median of ~0.85 is shown in Fig. 6a.

      Thank you for your careful checks. We have corrected “After identifying the contact residues of antigens by H3-OPT, we found that H3-OPT could substantially outperform AF2 (Fig. 6a), with a precision of 0.75 and accuracy of 0.94 compared to 0.66 precision and 0.92 accuracy of AF2.” into “After identifying the contact residues of antigens by H3-OPT, we found that H3-OPT could substantially outperform AF2 (Fig. 6a), with a median precision of 0.83 and accuracy of 0.97 compared to 0.64 precision and 0.95 accuracy of AF2.” in the appropriate place in the manuscript.
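
      For readers unfamiliar with these metrics, the sketch below shows how precision and accuracy of contact-residue identification are conventionally computed for a single complex. It assumes the standard definitions (precision = TP / (TP + FP), accuracy = (TP + TN) / all residues). The exact contact criterion used in the manuscript is not restated here, and the function name and residue numbers are hypothetical.

```python
def precision_and_accuracy(predicted_contacts, true_contacts, all_residues):
    """Precision and accuracy of predicted contact residues for one complex.

    Both contact sets are assumed to be subsets of `all_residues`.
    """
    predicted = set(predicted_contacts)
    actual = set(true_contacts)
    residues = set(all_residues)

    true_pos = len(predicted & actual)
    false_pos = len(predicted - actual)
    true_neg = len(residues - predicted - actual)

    # Precision is undefined when nothing is predicted; report 0.0 in that case.
    n_predicted = true_pos + false_pos
    precision = true_pos / n_predicted if n_predicted else 0.0
    accuracy = (true_pos + true_neg) / len(residues)
    return precision, accuracy


# Hypothetical antigen with 20 residues.
all_residues = range(1, 21)
true_contacts = {3, 4, 5, 10}       # contacts in the experimental complex
predicted_contacts = {3, 4, 5, 11}  # contacts in the modeled complex

precision, accuracy = precision_and_accuracy(
    predicted_contacts, true_contacts, all_residues
)
```

      Because most antigen residues are not in contact, accuracy is dominated by true negatives and tends to be high for any method, which is why precision is the more discriminating of the two values.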

      Minor corrections

      Lines 91-94: What do length values mean? e.g. is 0-2 Å the RMSD from the experimental structure?

      We appreciate your comment and apologize for any confusion. The RMSD values are indeed measured relative to the experimental structure. The RMSD value evaluates the deviation of the predicted CDR-H3 loop from the native structure and also represents the degree of prediction difficulty in AlphaFold2 predictions. We have added the following sentence in the appropriate place in the revised manuscript: “(RMSD, a measure of the difference between the predicted structure and an experimental or reference structure)”.
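
      For reference, the standard definition of the Cα-RMSD over the N residues of a loop, after superposition of the predicted and experimental structures, is given below. The symbols are introduced here for illustration: the two vectors are the Cα coordinates of residue i in the predicted and experimental structures, respectively.

```latex
\mathrm{RMSD}_{\mathrm{C}\alpha}
  = \sqrt{\frac{1}{N}\sum_{i=1}^{N}
      \left\lVert \mathbf{x}_i^{\mathrm{pred}} - \mathbf{x}_i^{\mathrm{exp}} \right\rVert^{2}}
```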

      Line 120: is the "AF2 confidence score" for the full-length or CDR-H3?

      We greatly appreciate your valuable comment and have corrected “Interestingly, we observed that AF2 confidence score shared a strong negative correlation with Cα-RMSDs (Pearson correlation coefficient =-0.67 (Fig. 2b)” into “Interestingly, we observed that AF2 confidence score of CDR-H3 shared a strong negative correlation with Cα-RMSDs (Pearson correlation coefficient =-0.67 (Fig. 2b)” in the revised manuscript.
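
      The correlation referred to above is the ordinary Pearson coefficient between per-target confidence and Cα-RMSD. A minimal sketch of that calculation follows. The function name `pearson_r` and the data are hypothetical and do not reproduce the values in Fig. 2b.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical CDR-H3 confidence scores and Cα-RMSDs (Å) for five targets.
confidence = [95.0, 90.0, 85.0, 70.0, 60.0]
ca_rmsd = [0.8, 1.2, 1.0, 3.5, 4.0]

r = pearson_r(confidence, ca_rmsd)  # negative: higher confidence, lower RMSD
```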

      Line 166: Do authors mean "Taken" instead of "Token"?

      We are really sorry for our careless mistakes. Thank you for your reminder.

      Line 258: Reference to Fig. 1 seems wrong, do authors mean Fig. 4?

      We sincerely thank the reviewer for careful reading. As suggested by the reviewer, we have corrected the “Fig. 1” into “Fig. 4”.

      Author response image 7.

      Point out which plot corresponds to AF2 and which one to H3-OPT

      Thanks for pointing out this issue. We have added the legends of this figure in the proper positions in our manuscript.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript aimed at elucidating the substrate specificity of two M23 endopeptidases, Lysostaphin (LSS) and LytM, in S. aureus. Endopeptidases are known to cleave the glycine-bridges of staphylococcal cell wall peptidoglycan (PG). To address this question, various glycine-bridge peptides were synthesized as substrates, the catalytic domains of LSS and LytM were recombinantly expressed and purified, and the reactions were analyzed using solution-state NMR. The major finding is that LytM is not only a Gly-Gly endopeptidase, but also cleaves D-Ala-Gly. Technically, the advantage of using real-time NMR was emphasized in the manuscript. The study explores an interesting aspect of cell wall hydrolases in terms of substrate-level regulation. It potentially identified new enzymatic activity of LytM. However, the biological significance and relevance of the conclusions remain unclear, as the results are mostly from synthetic substrates.

      Strengths:

      The study explores an interesting aspect of cell wall hydrolases in terms of substrate-level regulation. It potentially identified new enzymatic activity of LytM.

      Weaknesses:

      1) Significance: while the current study provided a detailed analysis of various substrates, the conclusions are mainly based on synthesized peptides. One experiment used purified muropeptides (Fig. 3H); however, the results were unclear from this figure.

      We acknowledge the Reviewer for comments and concerns regarding the potential weaknesses of this study.

      Because peptidoglycan is insoluble, it is not amenable to solution-state NMR studies. However, soluble peptidoglycan (PG) fragments for NMR analyses can be obtained by digesting bacterial sacculi or via chemical synthesis. Whereas digestion results in mixtures of products, synthesis yields pure molecules. Analysis of NMR spectra of muropeptide-mimicking synthetic peptides before and after enzyme addition provides tools to identify peaks in the much more complex spectra of mutanolysin-treated sacculus.

      We will improve data presentation in Figure 3H in the revised version of our manuscript and emphasize the similarity of product peaks in spectra acquired from experiments using either synthetic peptides or mutanolysin-digested sacculus.

      The results from synthesized peptides may not necessarily correlate with their biological functions in vivo.

      The Reviewer refers several times to the use of synthetic peptides in this study. While it is unclear to us whether the concern is about the synthetic nature of the molecules or because the peptides are devoid of PG disaccharide units, it is true that PG fragments lack the 3D architecture present in intact sacculus, and thus cannot perfectly mimic the in vivo milieu. The fragments, as well as purified sacculus, also lack all other components present in an intact bacterial cell wall. Our largest synthetic peptide (7), however, represents a crosslinked muropeptide (stem-pentaGly-stem) which according to the structural model recently presented by Razew et al. (2023) (Staphylococcus aureus sacculus mediates activities of M23 hydrolases. Nat Commun 14, 6706) is large enough to cover the peptidic interaction interface between substrate and enzyme.

      Secondly, the study used only the catalytic domain of both proteins. It is known that the substrate specificity of these enzymes is regulated by their substrate-binding domains. There is no mention of other domains in the manuscript and no justification of why only the catalytic domain was studied. In short, the relevance of the results from the current study to the enzymes' actual physiological functions remains to be addressed, which attenuated the significance of the study.

      Lysostaphin catalytic domain was used for experimental simplicity and to allow direct comparison with LytM catalytic domain. Because lysostaphin cell-wall targeting (SH3b) domain interacts with the substrate with variable affinities depending on the substrate structure (Tossavainen et al., Structural and functional insights into lysostaphin-substrate interaction, Front. Mol. Biosci. 5, 60 (2018) and Gonzalez-Delgado et al., Two-site recognition of Staphylococcus aureus peptidoglycan by lysostaphin SH3b, Nat. Chem. Biol. 16, 24-30 (2020)), we would have had skewed results on kinetics because of this interaction.

      Catalytic domains were used also in the article by Razew et al. (Staphylococcus aureus sacculus mediates activities of M23 hydrolases. Nat Commun 14, 6706 (2023)). They showed that mature lysostaphin and lysostaphin catalytic domain hydrolysed the same Gly-Gly bonds.

      Moreover, full-length LytM is catalytically inactive. This is because the linker between its N-terminal and catalytic domains occludes the catalytic site (Odintsov et al. Latent LytM at 1.3 Å resolution. J. Mol. Biol. 225, 775 (2004)). LytM catalytic domain without its N-terminal segment is active (Odintsov et al (2004) and Firczuk et al. Crystal structure of active LytM. J. Mol. Biol 354, 578 (2005)).

      2) Impact and novelty:

      (1) The current study provided evidence suggesting the novel function of LytM in cleaving D-Ala-Gly. The impact of this finding is unclear. The manuscript discussed Enterococcus faecalis EnpA. But how about other M23 endopeptidases? What is the biological relevance?

      EnpA was specifically mentioned because it has been reported to also cleave the D-Ala-Gly bond. Structural similarities between the enzymes could reveal the basis for this bond specificity. Moreover, the focus of the study was not to reveal the biological function of LytM but rather to understand which amino acid substitutions lead to differences in specificities in the two structurally very similar enzymes.

      (2) A very similar study published recently showed that the activity of LSS and LytM is regulated by PG cross-linking: LSS cleaves more cross-linked PG and LytM cleaves less cross-linked PG (Razew, A., Laguri, C., Vallet, A., et al. Staphylococcus aureus sacculus mediates activities of M23 hydrolases. Nat Commun 14, 6706 (2023)). The results of this paper are different from the current study whereby both LSS and LytM prefer cross-linked substrates (Fig. 2J-L). Moreover, no D-Ala-Gly cleavage was observed by LytM using purified PG substrate from Razew A et al. An explanation of inconsistent results is needed here. In my opinion, the knowledge generated from the current study has not been fully settled. If the results can be validated, the contribution to the field is incremental, but not substantial.

      Another point raised by the Reviewer concerned the inconsistent results between our study and the recent paper by Razew et al. (2023) regarding LytM D-Ala-Gly cleavage. The explanation might lie in the type of NMR data acquired and its interpretation. We identified all hydrolysis products using 1H, 13C multiple bond correlation NMR spectra acquired from samples dissolved in deuterated buffers. Use of C-H signals is advantageous in that they are not prone to chemical exchange phenomena and enable unambiguous chemical shift assignment. Based on the NMR spectra shown, Razew and co-workers identified cleaved muropeptide bonds by observing product glycine peaks in 1H, 15N correlation spectra, specifically amide peaks of product C-terminal glycines appearing in the 114-117 ppm 15N region of spectra of samples treated with LytM/LSS. D-Ala-Gly cleavage, however, produces an N-terminal glycine, whose signal, owing to chemical exchange, is not typically observed in regular N,H correlation spectra. Razew and co-workers validated their observations with UPLC-MS analysis. However, to our understanding, their data analysis was based on the assumption that LytM cleaves between Gly4-Gly5 (or Gly1-Gly2 using our numbering), and accordingly only masses corresponding to potential products containing 1 to 4 glycines anchored to the lysine side chain were considered.

      (3) The authors emphasized a few times in the text that it is superior to use NMR technology. In my opinion, NMR has certain advantages, such as measuring the efficacy of cleavage, but it is not that superior. It should be complementary to other methods such as mass spectrometry. In addition, more relevant solid-state NMR using intact PG or bacterial cells was not discussed in the study. I am of the opinion that the corresponding text should be revised.

      We value and agree with the Reviewer’s opinion that NMR spectroscopy is complementary to other methods, e.g. mass spectrometry. However, in this particular case, NMR simultaneously provided information on reaction kinetics and on the scissile bonds in the substrates, which allowed us to compare rates of hydrolysis in different PG fragments and to revise the substrate specificities of LytM/LSS. We also agree that solid-state NMR is a wonderful technique. In our revised manuscript, we will edit the text accordingly.

      3) The conclusions are not fully supported by the data

      As mentioned above, the conclusions from synthesized peptide substrates may not necessarily reveal physiological functions. The conclusions need to be validated by more physiological substrates.

      As pointed out above in our response to the potential weaknesses of this study, the aim of this work was not to reveal the physiological function of LytM but to glean information on its substrate specificity, which reflects its functional role at the substrate level. Hitherto, LytM has been shown to cleave amide bonds between glycines, but without detailed information about the specific scissile bonds in the established PG components of the S. aureus cell wall. The same holds true for lysostaphin. This study concomitantly provides information on the rates of hydrolysis and the scissile bonds of these two enzymes. We deduced that the substrate specificity of LytM, and especially of lysostaphin, is defined by D-Ala-Gly cross-linking, which is a structural property, whereas Razew et al. (2023) discuss “more cross-linked” and “less cross-linked PG”, which is a supramolecular property or density.

      4) There are some issues with the presentation of the figures, text, and formatting.

      We are grateful to the Reviewer for bringing up issues in figures and text. We will address these in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This work investigates the enzymatic properties of lysostaphin (LSS) and LytM, two enzymes produced by Staphylococcus aureus and previously described as glycyl-glycyl endopeptidases. The authors use synthetic peptide substrates mimicking peptidoglycan fragments to determine the substrate specificity of both enzymes and identify the bonds they cleave.

      Strengths:

      • This work is addressing a real gap in our knowledge since very little information is available about the substrate specificity of peptidoglycan hydrolases.

      • The experimental strategy and its implementation are robust and provide a thorough analysis of LSS and LytM enzymatic activities. The results are very convincing and demonstrate that the enzymatic properties of the model enzymes studied need to be revisited.

      Weaknesses:

      • The manuscript is difficult to read in places and some figures are not always presented in a way that is easy to follow. This being said, the authors have made a good effort to present their experiments in an engaging manner. Some recommendations have been made to improve the current manuscript but these remain minor issues.

      We thank the Reviewer for providing positive feedback on our manuscript and for appreciating the systematic work behind this study, which aims to unravel the substrate specificity of two S. aureus PG-hydrolyzing enzymes. We are grateful for the comments aiming to improve the presentation of the current version of the manuscript, and we will take these into account while preparing the revised version.

    1. Author Response

      We would like to thank the senior editor, the reviewing editor and all the reviewers for taking the time to review our manuscript and for appreciating our study. We are glad that you found strengths in our work and provided comments to strengthen it further. We sincerely appreciate these valuable comments and suggestions, which we believe will help us to further improve the quality of our work.

      Reviewer 1

      The manuscript by Dubey et al. examines the function of the acetyltransferase Tip60. The authors show that (auto)acetylation of a lysine residue in Tip60 is important for its nuclear localization and liquid-liquid-phase-separation (LLPS). The main observations are: (i) Tip60 is localized to the nucleus, where it typically forms punctate foci. (ii) An intrinsically disordered region (IDR) within Tip60 is critical for the normal distribution of Tip60. (iii) Within the IDR the authors show that a lysine residue (K187), that is auto-acetylated, is critical. Mutation of that lysine residue to a non-acetylable arginine abolishes the behavior. (iv) biochemical experiments show that the formation of the punctate foci may be consistent with LLPS.

      On balance, this is an interesting study that describes the role of acetylation of Tip60 in controlling its biochemical behavior as well as its localization and function in cells. The authors mention in their Discussion section other examples showing that acetylation can change the behavior of proteins with respect to LLPS; depending on the specific context, acetylation can promote (as here for Tip60) or impair LLPS.

      Strengths:

      The experiments are largely convincing and appear to be well executed.

      Weaknesses:

      The main concern I have is that all in vivo (i.e. in cells) experiments are done with overexpression in Cos-1 cells, in the presence of the endogenous protein. No attempt is made to use e.g. cells that would be KO for Tip60 in order to have a cleaner system or to look at the endogenous protein. It would be reassuring to know that what the authors observe with highly overexpressed proteins also takes place with endogenous proteins.

      Response: The main reason for performing these experiments with an overexpression system was to generate different point mutants and deletion mutants of TIP60 and analyse their effects on its properties and functions. To validate our observations from the overexpression system, we also examined the localization pattern of endogenous TIP60 by IFA, and the results show a foci pattern within the nucleus similar to that observed with overexpressed TIP60 protein (Figure 4A). However, we understand the reviewer’s concern and agree to repeat some of the overexpression experiments under endogenous TIP60 knockdown conditions using siRNA or shRNA against the 3’ UTR region.

      Also, it is not clear how often the experiments have been repeated and additional quantifications (e.g. of western blots) would be useful.

      Response: The experiments were performed as independent biological replicates (n=3), and this is mentioned in the figure legends. Regarding the suggestion to quantify the Western blots, we would like to point out that for blots requiring quantitative estimation (such as Figure 2F and 6H), graphs showing the quantified values with p-values have already been added. In addition, as suggested, quantitation for Figure 6D will be performed and added in the revised version.

      In addition, regarding the LLPS description (Figure 1), it would be important to show the wetting behaviour and the temperature-dependent reversibility of the droplet formation.

      Response: We appreciate the suggestion, and we will perform these assays and include the results in the revised version.

      In Fig 3C the mutant (K187R) Tip60 is cytoplasmic, but still appears to form foci. Is this still reflecting phase separation, or some form of aggregation?

      Response: The TIP60 (K187R) mutant remains cytosolic with a homogeneous distribution, as shown in Figure 2E. With TIP60 partners such as PXR or p53, this mutant protein also remains homogeneously distributed in the cytosol. However, when co-expressed with TIP60 (Wild-type) protein, the mutant protein still remains cytosolic, but some foci-like pattern is also observed at the nuclear periphery, which we believe could be accumulated aggregates.

      Reviewer 2

      The manuscript "Autoacetylation-mediated phase separation of TIP60 is critical for its functions" by Dubey S. et al reported that the acetyltransferase TIP60 undergoes phase separation in vitro and cell nuclei. The intrinsically disordered region (IDR) of TIP60, particularly K187 within the IDR, is critical for phase separation and nuclear import. The authors showed that K187 is autoacetylated, which is important for TIP60 nuclear localization and activity on histone H4. The authors did several experiments to examine the function of K187R mutants including chromatin binding, oligomerization, phase separation, and nuclear foci formation. However, the physiological relevance of these experiments is not clear since TIP60 K187R mutants do not get into nuclei. The authors also functionally tested the cancer-derived R188P mutant, which mimics K187R in nuclear localization, disruption of wound healing, and DNA damage repair. However, similar to K187R, the R188P mutant is also deficient in nuclear import, and therefore, its defects cannot be directly attributed to the disruption of the phase separation property of TIP60. The main deficiency of the manuscript is the lack of support for the conclusion that "autoacetylation-mediated phase separation of TIP60 is critical for its functions".

      This study offers some intriguing observations. However, the evidence supporting the primary conclusion, specifically regarding the necessity of the intrinsically disordered region (IDR) and K187ac of TIP60 for its phase separation and function in cells, lacks sufficient support and warrants more scrutiny. Additionally, certain aspects of the experimental design are perplexing and lack controls to exclude alternative interpretations. The manuscript can benefit from additional editing and proofreading to improve clarity.

      Response: We understand the point raised by the reviewer; however, we would like to draw the reviewer’s attention to the data where we clearly demonstrated that acetylation of lysine 187 within the IDR of TIP60 is required for its phase separation (Figure 2J). We would also like to draw the reviewer’s attention to other TIP60 mutants within the IDR (R177H, R188H, K189R), all of which enter the nucleus and form phase-separated foci. The cancer-associated mutation at R188 behaves similarly to K187R because it also hampers TIP60 acetylation at the adjacent K187 residue. Our in vitro and in cellulo results clearly demonstrate that autoacetylation of TIP60 at K187 within its IDR is critical for multiple functions, including its translocation into the nucleus, its protein-protein interactions and its oligomerization, which are prerequisites for phase separation of TIP60.

      There are two putative NLS sequences (NLS #1 from aa145; NLS #2 from aa184) in TIP60, both of which are within the IDR. Deletion of the whole IDR is therefore expected to abolish the nuclear localization of TIP60. Since K187 is within NLS #2, the cytoplasmic localization of the IDR and K187R mutants may not be related to the ability of TIP60 to phase separation.

      Response: We are not disputing the presence of putative NLSs within the IDR of TIP60; however, our results with different mutations within the IDR (K76, K80, K148, K150, R177, R178, R188, K189) clearly demonstrate that only acetylation of the K187 residue is critical for shuttling TIP60 into the nucleus, while none of the other mutations located within these putative NLS regions affected TIP60’s nuclear shuttling. We have mentioned in our discussion that autoacetylation of TIP60’s K187 may induce local structural modifications in its IDR that are critical for translocating TIP60 into the nucleus, where it undergoes the phase separation critical for its functions. In a previous example of a similar kind, acetylation of a lysine within the NLS region of TyrRS by PCAF promotes its nuclear localization (Cao X et al 2017, PNAS). The IDR (which also contains the K187 site) is important for phase separation once the protein has entered the nucleus. This could be the cell’s mechanism to prevent unwarranted action of TIP60 until it enters the nucleus and phase separates on chromatin at appropriate locations.

      The chromatin-binding activity of TIP60 depends on HAT activity, but not phase-separation (Fig 1I), (Fig 2B). How do the authors reconcile the fact that the K187R mutant is able to bind to chromatin with lower activity than the HAT mutant (Fig 2F, 2I)?

      Response: K187 acetylation is required for TIP60’s nuclear translocation but is not critical for chromatin binding. When the soluble fraction is prepared in the fractionation experiment, the nuclear membrane is disrupted, so the TIP60 (K187R) mutant is no longer hindered from accessing the chromatin and can load onto it (although not as efficiently as the Wild-type protein). Efficient chromatin binding requires auto-acetylation of other lysine residues in TIP60, which might be hampered by reduced catalytic activity or be insufficient to maintain equilibrium with HDAC activity inside the nucleus. In the case of K187R, the reduced auto-acetylation is captured while the protein is in the cytosol. During fractionation, once this mutant has access to chromatin, it might auto-acetylate other lysine residues critical for chromatin loading (note that the catalytic domain is intact in this mutant). This is evident from the hyper auto-acetylation of the Wild-type protein compared to the K187R or HAT mutant proteins. We would like to point out that phase separation occurs only after efficient chromatin loading of TIP60, which is why, under in cellulo conditions, both K187R (which cannot enter the nucleus) and the HAT mutant (which enters the nucleus but fails to bind efficiently to chromatin) fail to form phase-separated nuclear punctate foci.

      The DIC images of phase separation in Fig 2I need to be improved. The image for K187R showed the irregular shape of the condensates, which suggests particles in solution or on the slide. The authors may need to use fluorescent-tagged TIP60 in the in vitro LLPS experiments.

      Response: We believe this comment refers to Figure 2J. The irregularly shaped condensates observed for TIP60 K187R are unique to the mutant protein and are not caused by particles on the slide. We would like to draw the reviewer’s attention to supplementary Figure S2A, where the DIC images for TIP60 (Wild-type) protein, tested under different protein and PEG8000 conditions, are completely clear wherever the protein did not form phase-separated droplets, ruling out particles in the solution or on the slides.

      The authors mentioned that the HAT mutant of TIP60 does not phase separate, which needs to be included.

      Response: We have already added the image of RFP-TIP60 (HAT mutant) in supplementary Fig S4A (panel 2) in the manuscript.

      Related to Point 3, the HAT mutant that doesn't form punctate foci by itself, can incorporate into WT TIP60 (Fig 5A). In vitro LLPS assay for WT, HAT, and K187R mutants with or without acetylation should be included. WT and mutant TIP can be labelled with GFP and RFP, respectively.

      Response: We would like to draw the reviewer’s attention to the co-expression experiments in Figure 5, where Wild-type protein (both tagged and untagged) is able to phase separate and form punctate foci together with co-expressed HAT mutant protein (which has depleted autoacetylation capacity). We believe these in cellulo experiments already answer the questions that the reviewer suggests addressing with in vitro experiments.

      Fig 3A and 3B showed that neither K187 mutant nor HAT mutant could oligomerize. If both experiments were conducted in the absence of in vitro acetylation, how do the authors reconcile these results?

      Response: We thank the reviewer for highlighting our oversight in omitting the mention of acetyl coenzyme A here. To induce acetylation under in vitro conditions, we added 10 µM acetyl CoA to the reactions depicted in Figure 3A and 3B. The information on acetyl CoA for Figure 3B was already included in the description of the GST pull-down assay (Materials and Methods section). We will add the same to the description of the oligomerization assay in the Materials and Methods section of the revised manuscript.

      In Fig 4, the colocalization images showed little overlap between TIP60 and the nuclear speckle (NS) marker SC35, indicating that the majority of TIP60 localized in nuclear structures other than NS. Have the authors tried to perturb the NS by depleting the NS scaffold protein and examining TIP60 foci formation? Do PXR and TP53 localize to NS?

      Response: Under normal conditions, the majority of TIP60 is not localized in nuclear speckles (NS), so we believe that perturbing NS will not have a significant effect on TIP60 foci formation. Interestingly, a recent study by Shelley Berger’s group (Alexander KA et al Mol Cell. 2021 15;81(8):1666-1681) showed that p53 localizes to NS to regulate a subset of its target genes. We have mentioned this in our discussion section. No information is available about the localization of PXR in NS.

      Were TIP60 substrates, H4 (or NCP), PXR, TP53, present in TIP60 condensates in vitro? It's interesting to see both PXR and TP53 had homogenous nuclear signals when expressed together with K187R, R188P (Fig 6E, 6G), or HAT (Suppl Fig S4A) mutants. Are PXR or TP53 nuclear foci dependent on their acetylation by TIP60? This can and should be tested.

      Response: Both p53 and PXR are known to be acetylated by TIP60. In the case of PXR, TIP60 acetylates PXR at lysine 170, and this TIP60-mediated acetylation of PXR at K170 is important for the TIP60-PXR foci, which we now know are formed by phase separation (Bakshi K et al Sci Rep. 2017 Jun 16;7(1):3635).

      Since R188P mutant, like K187R, does not get into the nuclei, it is not suitable to use this mutant to examine the functional relevance of phase separation for TIP60. The authors need to find another mutant in IDR that retains nuclear localization and overall HAT activity but specifically disrupts phase separation. Otherwise, the conclusion needs to be restated. All cancer-derived mutants need to be tested for LLPS in vitro.

      Response: We appreciate the reviewer’s point, but it is important to note that the objective of these experiments is to understand the impact of K187R (critical for multiple aspects of TIP60, including phase separation) and R188P (a naturally occurring cancer-associated mutation that behaves similarly to K187R) on TIP60’s activities in order to determine their functional relevance. As suggested by the reviewer, an IDR mutant that fails to phase separate but retains nuclear localization and catalytic activity can be sought and examined in future studies.

      For all cellular experiments, it is not mentioned whether endogenous TIP60 was removed and absent in the cell lines used in this study. It's important to clarify this point because the localization and function of mutant TIP60 are affected by WT TIP60 (Fig 5).

      Response: Endogenous TIP60 was present in the in cellulo experiments; however, as suggested by reviewer 1, we will perform some of the in cellulo experiments under endogenous TIP60 knockdown conditions to validate our findings.

      It is troubling that H4 peptide is used for in vitro HAT assay since TIP60 has much higher activity on nucleosomes and its preferred substrates include H2A.

      Response: The purpose of using the H4 peptide in the HAT assay is to determine the impact of mutations on TIP60’s catalytic activity. As H4 is one of the major histone substrates of TIP60, we believe it satisfies the objective of the experiments.

      Reviewer 3

      This study presents results arguing that the mammalian acetyltransferase Tip60/KAT5 auto-acetylates itself on one specific lysine residue before the MYST domain, which in turn favors not only nuclear localization but also condensate formation on chromatin through LLPS. The authors further argue that this modification is responsible for the bulk of Tip60 autoacetylation and acetyltransferase activity towards histone H4. Finally, they suggest that it is required for association with txn factors and in vivo function in gene regulation and DNA damage response.

      These are very wide and important claims and, while some results are interesting and intriguing, there is not really close to enough work performed/data presented to support them. In addition, some results are redundant between them, lack consistency in the mutants analyzed, and show contradiction between them. The most important shortcoming of the study is the fact that every single experiment in cells was done in over-expressed conditions, from transiently transfected cells. It is well known that these conditions can lead to non-specific mass effects, cellular localization not reflecting native conditions, and disruption of native interactome. On that topic, it is quite striking that the authors completely ignore the fact that Tip60 is exclusively found as part of a stable large multi-subunit complex in vivo, with more than 15 different proteins. Thus, arguing for a single residue acetylation regulating condensate formation and most Tip60 functions while ignoring native conditions (and the fact that Tip60 cannot function outside its native complex) does not allow me to support this study.

      Response: We appreciate the reviewer’s point, but it is important to note that the main purpose of using the overexpression system in this study is to analyse the effects of the different point and deletion mutations generated in TIP60. We overexpressed proteins with different tags (GFP or RFP) or without tags (Figure 3C, Figure 5) to confirm that the behaviour of the protein is not perturbed by the presence of tags. For validation, we also examined the localization of endogenous TIP60 protein, which shows localization behaviour similar to that of the overexpressed protein. We would like to point out that there are several reports in the literature where similar overexpression systems were used to determine the functions of TIP60 and its mutants. The nuclear foci pattern observed for TIP60 in our studies has also been reported by several other groups.

      Sun, Y., et. al. (2005) A role for the Tip60 histone acetyltransferase in the acetylation and activation of ATM. Proc Natl Acad Sci U S A, 102(37):13182-7.

      Kim, C.-H. et al. (2015) ‘The chromodomain-containing histone acetyltransferase TIP60 acts as a code reader, recognizing the epigenetic codes for initiating transcription’, Bioscience, Biotechnology, and Biochemistry, 79(4), pp. 532–538.

      Wee, C. L. et al. (2014) ‘Nuclear Arc Interacts with the Histone Acetyltransferase Tip60 to Modify H4K12 Acetylation(1,2,3).’, eNeuro, 1(1). doi: 10.1523/ENEURO.0019-14.2014.

      However, as a precaution and as also suggested by the other reviewers, we will perform some of these overexpression experiments in the absence of endogenous TIP60 by using 3’ UTR-specific siRNA/shRNA.

      We thank the reviewer for the comment on the multi-subunit complex proteins, and we would like to expand our study by determining the interaction of some of the complex subunits with TIP60 (Wild-type), which forms nuclear condensates; TIP60 (HAT mutant), which enters the nucleus but does not form condensates; and TIP60 (K187R), which neither enters the nucleus nor forms condensates. We will include the results of these experiments in the revised manuscript.

      • It is known that over-expression after transient transfection can lead to non-specific acetylation of lysines on the proteins, likely in part to protect from proteasome-mediated degradation. It is not clear whether the Kac sites targeted in the experiments are based on published/public data. In that sense, it is surprising that the K327R mutant does not behave like a HAT-dead mutant (which is what exactly?) or the K187R mutant as this site needs to be auto-acetylated to free the catalytic pocket, so essential for acetyltransferase activity like in all MYST-family HATs. In addition, the effect of K187R on the total acetyl-lysine signal of Tip60 is very surprising as this site does not seem to be a dominant one in public databases.

      Response: We chose the autoacetylation sites based on previously published studies in which LC-MS/MS and in vitro acetylation assays were used to identify autoacetylation sites in TIP60, including K187. We have already mentioned this in the manuscript and cited the references (1. Yang, C., et al (2012). Function of the active site lysine autoacetylation in Tip60 catalysis. PloS one 7, e32886. 10.1371/journal.pone.0032886. 2. Yi, J., et al (2014). Regulation of histone acetyltransferase TIP60 function by histone deacetylase 3. The Journal of biological chemistry 289, 33878–33886. 10.1074/jbc.M114.575266.). We would like to emphasize that both of these studies identified K187 as an autoacetylation site in TIP60. Since the TIP60 HAT mutant (with significantly reduced catalytic activity) can also enter the nucleus, it is not surprising that the K327R mutant could also enter the nucleus.

      • As the physiological relevance of the results is not clear, the mutants need to be analyzed at the native level of expression to study real functional effects on transcription and localization (ChIP/IF). It is not clear what the claim that Tip60 forms nuclear foci/punctate signals at physiological levels is based on. This is certainly debated, in part because of the poor choice of antibodies available for IF analysis. In that sense, it is not clear which Ab is used in the Westerns. Endogenous Tip60 is known to be expressed in multiple isoforms from splice variants, the most dominant one being isoform 2 (PLIP) which lacks a big part (aa96-147) of the so-called IDR domain presented in the study. Does this major isoform behave the same?

      Response: The TIP60 antibody used in the study is from Santa Cruz (Cat. No. sc-166323). This antibody is widely used for TIP60 detection by several methods and has been cited in numerous publications. The catalogue number will be mentioned in the manuscript. Regarding isoforms, three isoforms are known for TIP60, among which isoform 2 is the most highly expressed and is the one used in our study. Isoforms 1 and 2 have the same IDR length (150 amino acids), while isoform 3 has an IDR of 97 amino acids. Interestingly, K187 is present in all the isoforms (already mentioned in the manuscript), and the region missing in isoform 3 (amino acids 96-147) has a lower propensity for disorder (marked with a blue circle). This clearly shows that all the isoforms of TIP60 have the tendency to phase separate.

      Author response image 1.

      • It is extremely strange to show that the K187R mutant fails to get in the nuclei by cell imaging but remains chromatin-bound by fractionation... If K187 is auto-acetylated and required to enter the nucleus, why would a HAT-dead mutant not behave the same?

      Response: We would like to point out that neither the HAT mutant nor the K187R mutant is completely catalytically dead. As our data show, both mutants have catalytic activity, although at significantly decreased levels. We believe that K187 acetylation is critical for TIP60 to enter the nucleus and that, once TIP60 has shuttled into the nucleus, autoacetylation of other sites is required for its efficient chromatin binding. In the fractionation assay, the nuclear membrane is dissolved while preparing the soluble fraction, so nothing hinders the K187R mutant from accessing the chromatin. The HAT mutant, by contrast, can acetylate the K187 site and is thus able to enter the nucleus; however, its residual catalytic activity is either unable to autoacetylate the other residues required for efficient chromatin binding or unable to counter the activity of HDACs deacetylating TIP60.

      • If K187 acetylation is key to Tip60 function, it would be most logical (and classical) to test a K187Q acetyl-mimic substitution. In that sense, what happens with the R188Q mutant? That all goes back to the fact that this cluster of basic residues looks quite like an NLS.

      Response: As suggested, we will generate an acetylation-mimicking mutant for the K187 site and examine it. The results will be added to the revised manuscript.

      • The effect of the mutant on the TIP60 complex itself needs to be analyzed, e.g. for associated subunits like p400, ING3, TRRAP, Brd8...

      Response: As suggested, we will examine the effect of the mutations on the TIP60 complex.

    1. Author Response

      Reviewer #1 (Public Review):

      “A sample size of 3 idiopathic seems underpowered relative to the many types of genetic changes that can occur in ASD. Since the authors carried out WGS, it would be useful to know what potential causative variants were found in these 3 individuals and even if not overlapping if they might expect to be in a similar biological pathway.

      If the authors randomly selected 3 more idiopathic cell lines from individuals with autism, would these cell lines also have altered mTOR signaling? And could a line have the same cell biology defects without a change in mTOR signaling? The authors argue that the sample size could be the reason for lack of overlap of the proteomic changes (unlike the phosphor-proteomic overlaps), which makes the overlapping cell biology findings even more remarkable. Or is the phenotyping simply too crude to know if the phenotypes truly are the same?”

      We appreciate these thoughtful comments and also agree that of several models, our studies indicate the possibility of mTOR alteration in multiple forms of ASD. As above, we are currently pursuing this hypothesis with newly acquired DOD support. With regard to the I-ASD population, we agree that there are a large variety of genetic changes that can occur in genetically undefined ASDs. Indeed, this is precisely why we expected to see “personalized” phenotypes in each I-ASD individual when we embarked on this study. At that time, several years ago, we had planned to expand the analyses to more I-ASD individuals to assess for additional personalized phenotypes. However, as our studies progressed, we were surprised to find convergence in our I-ASD population in terms of neurite outgrowth and migration and later proteomic results showing convergence in mTOR. We found it particularly remarkable that despite a sample size of 3 that this convergence was noted. When we had the opportunity to extend our studies to the 16p11.2 deletion population, we were thrilled to conduct the first comparison between I-ASD and a genetically defined ASD and, as such, the scope of the paper turned towards this comparison. We do agree that analyses of the other I-ASD individuals would be a beneficial endeavor, both to understand how pervasive NPC migration and neurite deficits are in autism and to assess the presence of mTOR dysregulation. Furthermore, it would be important to see whether alterations in other pathways could also lead to similar cell biological deficits, though we know that other studies of neurodevelopmental disorders have found such cellular dysregulations without reporting concurrent mTOR dysregulation. Given our current grant funding to extend these analyses, such experiments within this manuscript would not be feasible.

      Regarding the phenotyping methods used, we decided to assess neurite outgrowth and migration as they are both cytoskeleton dependent processes that are critical for neurodevelopment and are often regulated by the same genes. Furthermore, similar analyses have been applied to Fragile-X Syndrome, 22q11.2 deletion syndrome, and schizophrenia NPCs (Shcheglovitov A. et al., 2013; Mor-Shaked H. et al., 2016; Urbach A. et al., 2010; Kelley D. J. et al., 2008; Doers M. E. et al., 2014; Brennand K. et al., 2015; Lee I. S. et al., 2015; Marchetto M. C. et al., 2011). As such, it seems that multiple underlying etiologies can lead to similar dysregulated cellular phenotypes that can contribute to a variety of neurodevelopmental disorders. On a more global level, there are only a few different cellular functions a developing neuron can undergo, and these include processes such as proliferation, survival, migration, and differentiation. Thus, to understand neurodevelopmental disorders, it is important to study the more “crude” or “global” cellular functions occurring during neurodevelopment to determine whether they are disrupted in disorders such as ASD. In our studies we find that there are indeed dysregulations in many of these basic developmental processes, indicating that the typical steps that occur for normal brain cytoarchitecture may be disrupted in ASD. To understand why, we then further utilized molecular studies to “zoom” in on potential mechanisms which implicated common dysregulation in mTOR signaling as one driver for these common cellular phenotypes. As suggested, we did complete WGS on all the I-ASD individuals and did not see any overlapping genetic variants between the three I-ASD individuals as mentioned in our manuscript. The genetic data was published in a larger manuscript incorporating the data (Zhou A. et al., 2023). 
However, there were variants that were unique to each I-ASD individual which were not seen in their unaffected family members, and it is possible these variants could be contributing to the I-ASD phenotypes. We also utilized IPA to conduct pathway analysis on the WGS data utilizing the same approach we used in the analysis of the p-proteome and proteome data. From WGS data, we selected high read-quality variants that were found only in I-ASD individuals and had a functional impact on protein (i.e., excluding synonymous variants). The enriched pathways obtained from this data were strikingly different from the pathways we found in the p-proteome analysis and are now included in supplemental Figure 6 in the manuscript. Briefly, the top 5 enriched pathways were: O-linked glycosylation, MHC class 1 signaling, Interleukin signaling, Antigen presentation, and regulation of transcription.
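      As an illustration only, the variant selection described above can be sketched in Python. Everything in this sketch is an assumption made for the example: the `Variant` record, its field names, the `MIN_READ_QUALITY` threshold and the sample records are hypothetical, and "functional impact" is reduced here to simply excluding synonymous variants. It is not the pipeline used for the WGS analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; the field names are illustrative only."""
    variant_id: str
    read_quality: float
    consequence: str              # e.g. "missense", "synonymous", "stop_gained"
    in_affected: bool             # present in the I-ASD individual
    in_unaffected_relatives: bool  # present in any unaffected family member


# Hypothetical quality cut-off; the source does not state the threshold used.
MIN_READ_QUALITY = 30.0


def is_candidate(v: Variant) -> bool:
    """Keep high-quality, non-synonymous variants seen only in the affected individual."""
    return (
        v.read_quality >= MIN_READ_QUALITY
        and v.in_affected
        and not v.in_unaffected_relatives
        and v.consequence != "synonymous"
    )


# Made-up records used only to exercise the filter.
variants = [
    Variant("var_A", 45.0, "synonymous", True, False),   # dropped: synonymous
    Variant("var_B", 52.0, "missense", True, False),     # kept
    Variant("var_C", 12.0, "stop_gained", True, False),  # dropped: low quality
    Variant("var_D", 60.0, "missense", True, True),      # dropped: shared with relatives
]

candidates = [v.variant_id for v in variants if is_candidate(v)]
print(candidates)
```

      The resulting list of candidate variants would then be passed to a pathway analysis tool; that step is not shown because it depends on the tool's own interface.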

      Reviewer #2 (Public Review):

      1) I found that interpreting how differential EF sensitivity is connected to the rest of the story difficult at times. First, it is unclear why these extracellular factors were picked. These are seemingly different in nature (a neuropeptide, a growth factor and a neuromodulator) targeting largely different pathways. This limits the interpretation of the ASD subtype-specific rescue results. One way of reframing that could help is that these are pro-migratory factors instead of EFs broadly defined that fail to promote migration in I-ASD lines due to a shared malfunctioning of the intracellular migration machinery or cell-cell interactions (possibly through tight junction signaling, Fig S2A). Yet, this doesn't explain the migration/neurite phenotypes in 16p11 lines where EF sensitivity is not altered, overall implying that divergent EF sensitivity is independent of underlying mTOR state. What is the proposed model that connects all three findings (divergent EF sensitivity based on ASD subtypes, 2 mTOR classes, convergent cellular phenotypes)?

      We thank you for the kind assessment of our manuscript and for the thought-provoking questions posed. In terms of extracellular factors, for our study, we defined an extracellular factor as any growth factor, amino acid, neurotransmitter, or neuropeptide found in the extracellular environment of the developing cells. The EFs utilized were selected due to their well-established role in regulation of early neurodevelopmental phenotypes, their expression during the “critical window” of mid-fetal development (as determined by Allen Brain Atlas), and in the case of 5-HT, its association with ASD (Abdulamir H. A. et al., 2018; Adamsen D. et al., 2014; Bonnin A. et al., 2011; Bonnin A. et al., 2007; Chen X. et al., 2015; El Marroun H. et al., 2014; Hammock E. et al., 2012; Yang C. J. et al., 2014; Dicicco-Bloom E. et al., 1998; Lu N. et al., 1998; Suh J. et al., 2001; Watanabe J. et al., 2016; Gilmore J. H. et al., 2003; Maisonpierre P. C. et al., 1990; Dincel N. et al., 2013; Levi-Montalcini R., 1987). Lastly, prior experiments in our lab with a mouse model of neurodevelopmental disorders had shown atypical responses to EFs (IGF-1, FGF, PACAP). As such, when we first chose to use EFs in human NPCs we wanted to know 1) whether human NPCs even responded to these EFs, 2) whether EFs regulated neurite outgrowth and migration and 3) whether there would be a differential response in NPCs derived from those with ASD. Our studies were initiated on the I-ASD cohort and given the heterogeneity of ASD we had hypothesized we would get “personalized” neurite and migration phenotypes. For this reason, we also wanted to select multiple types of EFs that worked on different signaling pathways. Ultimately, instead of personalized phenotypes we found that none of the I-ASD NPCs responded to any of the EFs tested whereas the 16p11.2 deletion NPCs did – this was therefore the only difference we found between these two “forms” of ASD.
As noted, in I-ASD the lack of response to EFs can be ameliorated by modulating mTOR. However, in the 16p11.2 deletion, despite similar mTOR dysregulation as seen in I-ASD, there is no EF impairment. We do not have a cohesive model to explain why the 16pDel individuals differ from the I-ASD model other than to point to the p-proteomes which do show that the 16pDel NPCs are distinct from the I-ASD NPCs. It seems that mTOR alteration can contribute to impaired EF responsiveness in some NPCs but perhaps there is an additional defect that needs to be present in order for this defect to manifest, or that 16p11.2 deletion NPCs have specific compensatory features. For example, as noted in the thoughtful comment, the p-proteome canonical pathway analysis shows tight junction malfunction in I-ASD which is not present in the 16pDel NPCs and it could be the combination of mTOR dysregulation + dysregulated tight junction signaling that has led to lack of response to EFs in I-ASD. Regardless, we do not think the differences between two genetically distinct ASDs diminish the convergent mTOR results we have uncovered. That is, regardless of whatever defects are present in the ASD NPCs, we are able to rescue them with mTOR modulation which has fascinating implications for treatment and conceptualization for ASD. Lastly, we see our EF studies as an important inclusion as it shows that in some subtypes of ASD, lack of response to appropriate EFs could be contributing to neurodevelopmental abnormalities. Moreover, lack of response to these EFs could have implications for treatment of individuals with ASD (for example, SSRIs are commonly used to treat co-morbid conditions in ASD but if an individual is unresponsive to 5-HT, perhaps this treatment is less effective). We have edited the manuscript to include an additional discussion section to address the EFs more thoroughly and have included a few extra sentences in the introduction as well!

      2) A similar bidirectional migration phenotype has been described in hiPSC-derived human cortical interneurons generated from individuals with Timothy Syndrome (Birey et al 2022, Cell Stem Cell). Here, authors show that the intracellular calcium influx that is excessive in Timothy Syndrome or pharmacologically dampened in controls results in similar migration phenotypes. Authors can consider referring to this report in support of the idea that bimodal perturbations of cardinal signaling pathways can converge upon common cellular migration deficits.

      We thank you for pointing out the similar migration phenotype in the Timothy Syndrome paper and have now cited it in our manuscript. We have also expanded on the concept of “too much or too little” of a particular signaling mechanism leading to common outcomes.

      3) Given that authors have access to 8 I-ASD hiPSC lines, it'd be very informative to assay the mTOR state (e.g. pS6 westerns) in NPCs derived from all 8 lines instead of the 3 presented, even without assessing any additional cellular phenotypes, which authors have shown to be robust and consistent. This can help the readers better get a sense of the proportion of high-mTOR vs low-mTOR classes in a larger cohort.

      We have already addressed this in response to reviewer 1 and the essential revisions section, providing our reasoning for not expanding the study to all 8 I-ASD individuals.

      4) Does the mTOR modulation rescue EF-specific responses to migration as well (Figure 7)?

      We did not conduct sufficient replicates of the rescue of EF-specific responses to migration due to the time-consuming and resource-intensive nature of the neurosphere experiments. Unlike the neurite experiments, the neurosphere experiments require significantly more cells, more time, selection of neurospheres based on a size criterion, and then manual trace measurements. We did one experiment in Family-1 where we utilized MK-2206 to abolish the response of Sib NPCs to PACAP. Likewise, adding SC-79 to I-ASD-1 neurospheres allowed for response to PACAP.

      Author response image 1.

      Author response image 2.

      Reviewer #3: Public Review

      We appreciate the kind, detailed and very thorough review you provided for us!

      The results on the mTOR signaling pathway as a point of convergence in these particular ASD subtypes is interesting, but the discussion should address that this has been demonstrated for other autism syndromes, and in the present manuscript, there should be some recognition that other signaling pathways are also implicated as common factors between the ASD subtypes.

      With regard to the mTOR pathway, we had included the other ASD syndromes in which mTOR dysregulation has been seen including tuberous sclerosis, Cowden Syndrome, NF-1, as well as Fragile-X, Angelman, Rett and Phelan McDermid in the final paragraph of the discussion section “mTOR Signaling as a Point of Convergence in ASD”. We have now expanded our discussion to note that other signaling pathways, such as MAPK, cyclins, WNT, and reelin, have also been implicated as common factors between the ASD subtypes.

      The conclusions of this paper are mostly well supported by data, but for the cell migration assay, it is not clear if the authors control for initial differences in the inner cell mass area of the neurospheres in control vs ASD samples, which would affect the measurement of migration.

      Thank you for this thoughtful comment! When we first started collecting our migration data, inner cell mass size was indeed a major concern for which we controlled in our methods. First, when plating the neurospheres, we would only collect spheres when a majority of spheres were approximately 100 µm in diameter. Very large spheres often could not be imaged due to being out of focus and very small spheres would often disperse when plated. Thus, there were some constraints to the variability of inner cell mass size.

      Furthermore, when we initially collected data, we conducted a proof of principle test to see if initial inner cell mass area (henceforth referred to as initial sphere size or ISS) influenced migration data. To do so, we obtained migration and ISS data from each diagnosis (Sib, NIH, I-ASD, 16pASD). Then we utilized RStudio to see if there is a relationship between Migration and ISS in each diagnosis category using the equation lm(Migration ~ ISS, data = bydiagnosis). In this equation, lm indicates linear modeling, the tilde (~) is the term used to ascertain the relationship between Migration and ISS, and the term data = bydiagnosis allows the data to be organized by diagnosis.

      The results were expressed as R-squared values indicating the correlation between ISS and Migration for each diagnosis and the p-value showing statistical significance for each comparison. As shown in Author response table 1, there is minimal correlation between Migration and ISS in each data set. Moreover, there are no statistically significant relationships between Migration and ISS indicating that initial sphere size DOES NOT influence migration data in any of our data sets.
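      For readers who want to reproduce this kind of check, a minimal Python analogue of a per-diagnosis simple linear regression is sketched below. It is not the R code used for the analysis: the function name `fit_line`, the dictionary layout and all numbers are hypothetical placeholders, and the p-value is deliberately left out because computing it requires a t-distribution from a statistics library.

```python
from math import fsum


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical placeholder values (ISS, Migration), NOT measurements from the study.
measurements = {
    "Sib": ([7800, 8100, 7600, 8300, 7900], [41000, 39500, 40200, 40800, 39900]),
    "I-ASD": ([7700, 8200, 7900, 8000, 7500], [30500, 31200, 29800, 30900, 31100]),
}

# One regression of Migration on ISS per diagnosis group.
results = {dx: fit_line(iss, migration) for dx, (iss, migration) in measurements.items()}

for dx, (slope, intercept, r_squared) in results.items():
    print(f"{dx}: slope={slope:.3f}, intercept={intercept:.1f}, R^2={r_squared:.3f}")
```

      An R-squared close to zero in every group would correspond to the conclusion drawn above, namely that initial sphere size explains little of the variation in migration.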

      Author response table 1.

      Lastly, utilizing R, we modeled what predicted migration would be like for Sib, NIH, I-ASD, and 16pASD if we accounted for ISS in each group. Raw migration data was then plotted against the predicted data as in Author response image 3.

      Author response image 3.

      As shown in the graph, there are no statistical differences between the raw migration data (the data that we actually measured in the dish) and the modeled data in which ISS is accounted for as a variable. As such, we chose not to normalize to or account for ISS in our other experiments. We have now included the above RStudio analyses in our supplemental figures (Figure S1) as well.

      Also, in Fig 5 and 6, panels I and J omit the effects of drug on mTOR phosphorylation as shown for other conditions.

      Both SC-79 and MK-2206 were selected in our experiments after thorough analysis of their effects on human epithelial cells and other cultured cells (citations in manuscript). However, initially, we did not know whether either of these drugs would modulate the mTOR pathway in human NPCs; thus, in Figures 5A, 5D, 6A and 6D we chose to focus on two of our data sets to establish the effect of these drugs in human NPCs. Our experiments in Family-1 and Family-2 showed us that SC-79 increases PS6 in human NPCs while MK-2206 downregulates it. Once this was established, we expected the drugs to have similar effects in the NPCs from the other families. Thus, we only conducted a proof of principle test to confirm the drug does indeed have the intended effect in I-ASD-3 and 16pDel. We have included these proof of principle westerns in Figure 5I, 5K, 6I and 6K to show that the effects of these drugs are reproducible across all our NPC lines. We did not include quantification since the data is only from our single proof of principle western.

    1. Author response

      eLife assessment

      Using a genetically controlled experimental setting, the authors find that the lack of Polycomb-dependent epigenetic programming in the oocyte and early embryo influences the developmental trajectory through gestation in the mouse. By showing a two-phase outcome of early growth restriction followed by enhancement, the authors address previous inconsistencies in the field. However, the link with placenta function and gene misregulation is not yet fully supported.

      We thank the Reviewers for their constructive comments. In response we have added significantly more data to the study and substantially rewritten the manuscript. New data include analyses of glucose, amino acid and metabolite levels in fetal and maternal blood samples, more highly resolved fetal growth analyses, a more detailed study of the hyperplastic placenta including IF analyses of labyrinth area, labyrinth to placenta and capillary to labyrinth ratios. We have also added analyses of placental DNA methylation state in offspring from oocytes lacking EED, which reveals a range of DNA methylation changes at imprinted and non-imprinted genes in HET-hom offspring compared to HET-het or WT-wt controls.

      Reviewer #1 (Public Review):

      Oberin, Petautschnig et al. investigated the developmental phenotypes that resulted from oocyte-specific loss of the EED (Embryonic Ectoderm Development) gene - a core component of the Polycomb repressive complex 2 (PRC2), which possesses histone methyltransferase activity and catalyses trimethylation of histone H3 at lysine 27 (H3K27). The PRC2 complex plays essential roles in regulating chromatin structure, being an important regulator of cellular differentiation and development during embryogenesis. As novel findings, the authors find that PRC2-dependent programming in the oocyte, via loss of the core component EED, causes placental hyperplasia and propose that the increase of placental transplacental flux of nutrients leads to fetal and postnatal overgrowth. At the mechanistic level, they show altered expression of genes previously implicated in placental hyperplasia phenotypes. They also establish interesting parallelism with the placental hyperplasia phenotype that is frequently observed in cloned mice.

      Strengths:

      The mouse breeding experiments are very well designed and are powerful to exclude potential confounding genetic effects on the developmental phenotypes that resulted from the loss of EED in oocytes. Another major strength is the developmental profiling across gestation, from pre-implantation to late gestation.

      Weaknesses:

      The evidence for 'oocyte' programming is restricted to phenotypic and gene expression analysis, without measurements of epigenetic dysregulation. It would be an added value if the authors could show evidence for altered H3K27me3 or DNA methylation in the placenta, for example.

      In a previous study we identified a large number of developmentally important genes that accumulated H3K27me3 in primary-secondary stage growing oocytes and were repressed by EED (Jarred et al., 2022 Clinical Epigenetics). However, H3K27me3 was removed from all of these genes during preimplantation development, indicating that maternal inheritance of H3K27me3 at a wide range of genes is unlikely (Jarred et al., 2022 Clinical Epigenetics). Consistent with this, only a small number of genes, including Slc38a4 and C2MC, have been shown to be functionally important in H3K27me3-dependent imprinting (Matoba et al., 2022 Genes and Development). Moreover, a related study showed that deletion of Setd2 and consequent loss of H3K36me3 in oocytes led to spreading of H3K27me3 into regions that were otherwise marked by H3K36me3 and DNA methylation (Xu et al. 2019 Nature Genetics 51:844–56). Based on these studies, we proposed that loss of EED and H3K27me3 may result in the ectopic spreading of H3K36me3 and DNA methylation in oocytes and that altered DNA methylation may then be transmitted to offspring and affect developmental outcomes (Jarred et al., 2022 Clinical Epigenetics).

      Given this hypothesis we analysed DNA methylation rather than H3K27me3 in the placenta of WT-wt, HET-het and HET-hom offspring. This revealed differentially methylated regions (DMRs) in HET-hom placentas at two H3K27me3 imprinted genes Sfmbt2 (C2MC) and Mbnl2, five classically imprinted genes and at 74 DMRs not associated with imprinted loci. Together, our data support the hypothesis from Jarred et al., 2022 Clinical Epigenetics that loss of EED in oocytes results in altered DNA methylation patterning at both imprinted and non-imprinted genes in offspring and that this is likely to affect offspring growth and development. However, whether these changes result from direct alteration of DNA methylation in oocytes remains unclear.

      These new data are now included in results (Lines 387-409), Figure 6I, Supplementary File H-J and Discussion Lines 569-581.

      Reviewer Comment 1. The claim that placental hyperplasia drives offspring catch-up growth is not supported by current experimental data. The authors do not address if transplacental flux is increased in the hyperplastic placentae, measure amino acids and glucose in fetal/maternal plasma, or perform tetraploid rescue experiments to ascertain the contribution of the placenta to growth phenotypes. Furthermore, it is unclear, from the current data, if the surface area for nutrient transport is actually increased in the hyperplastic placenta and the extent to which other cell populations (i.e. spongiotrophoblasts) are affected in addition to glycogen cells. In addition, one of the supporting conclusions that the placenta is a key contributor to fetal overgrowth is based on a very crude measurement - placenta efficiency - which the authors claim is increased in the homozygous mutants compared to controls. After analysing the data carefully, I find evidence for decreased placental efficiency instead. I believe that the authors mistakenly present the data as placenta to fetal weight ratios, which led to the misinterpretation of the 'efficiency' concept.

      We thank the reviewer for pointing out our error in the placental efficiency data and we have now corrected the placental efficiency graphs (fetal/placental weight ratios) and updated the text throughout the manuscript as required (Figure 3I-K). As requested and described below, we have also added significantly more data, which support the conclusion that placental function is not enhanced in HET-hom mice and is unlikely to support fetal growth recovery.

      The new data and analyses we have added include:

      1. Further analyses of glycogen-enriched and non-glycogen-enriched cell counts in the decidua and junctional zones (Figure 4F-J)

      2. Total glycogen cell counts for male and female placentas (Figure 4 – figure supplement 1F)

      3. New analyses of fetal blood glucose levels at E17.5 and E18.5 and matching data from the mothers of each litter (Figure 4M)

      4. New analyses of the circulating amino acid levels and metabolites in fetal blood of E17.5 offspring and matching data from the mothers of each litter (Figure 8)

      5. New IF analyses of CD31 (PECAM-1), combined with machine learning assisted quantitative analyses of labyrinth and capillary areas using HALO (Figure 5)

      6. Separated male and female offspring and placental weights at E14.5 and E17.5 and total areas of the placenta, decidua, junctional zone and labyrinth (Figure 3 – figure supplement 1) which provide more insight into potential sex-specific differences in HET-hom offspring and placenta

      We have significantly re-written the results and discussion to reflect our new data and interpretation.

      While we did not assess transplacental flux, our new data revealed: 1. HET-hom fetuses had lower blood glucose levels at E18.5; 2. Circulating levels of amino acids and a wide range of metabolites did not differ between HET-hom and control offspring, or between the mothers of these offspring; 3. HET-hom placentas had lower total labyrinth area, labyrinth/placenta and capillary/labyrinth ratios based on analysis of total capillary and labyrinth areas, indicating that the surface area for nutrient transfer is not increased.

      Together these data strongly indicate that hyperplastic HET-hom placentas do not provide greater support to HET-hom fetuses than controls, and that increased placental function in HET-hom offspring is unlikely to explain the late gestation fetal growth recovery we observed in HET-hom offspring or how HET-hom offspring were able to attain normal weights by birth.

      While we have not directly counted the spongiotrophoblast populations, we have now included analyses of both the glycogen-enriched and non-glycogen cell populations in the junctional zone and the decidua (Figure 4H-K). This revealed an increased area of both glycogen-enriched and non-glycogen cells in the junctional zone and in the decidua of HET-hom placentas, consistent with the greater junctional zone/placenta ratio observed in HET-hom placentas (Figure 4D). Together with data in Figure 4C-F and Supp. Fig. 3, our observations demonstrate that the overall decidua and junctional zone areas were increased in HET-hom offspring, but there was a disproportionate expansion of the junctional zone that was caused by increased areas of both glycogen and non-glycogen-enriched cells.

      Tetraploid rescue experiments would require a very significant amount of time and investment and are technically very demanding. While creation of complementary tetraploid offspring would be informative, unfortunately these experiments are beyond the scope of this current study.

      Reviewer Comment 1 cont. The authors do not mention alternative explanations for the observed fetal catch-up and postnatal overgrowth. Why would oocyte epigenetic programming effects be restricted to the placenta, and not include fetal organs?

      Our intention was certainly not to convey a message that effects may be placenta specific. Indeed, our ongoing work beyond the scope of this study provides evidence for effects in other tissues (brain and bones) that will be published elsewhere. Our new data clearly show low placental efficiency, low fetal blood glucose, a low capillary/labyrinth ratio and no impact on circulating fetal amino acid or metabolite levels in HET-hom offspring. In light of these new data, we have reinterpreted the findings of this study and substantially updated the discussion.

      Given our observations that fetal growth rate markedly increased during late gestation, but placental efficiency was reduced, our data strongly indicate that the effects of altered epigenetic oocyte programming due to loss of Eed affect both the placenta and the fetus. While our findings are significant, the precise mechanism underlying this growth response in HET-hom fetuses remains unknown. Understanding this mechanism will require substantially more work that will be the subject of future studies.

      Reviewer #2 (Public Review):

      Consistent fetal growth trajectories are vital for survival and later life health. The authors utilise an elegant and novel animal model to tease apart the role of Eed protein in the female germline from the role of somatic Eed. The authors were able to experimentally attribute placental overgrowth - particularly of the endocrine region of the placenta - to the function of Eed protein in the oocyte. Loss of Eed protein in the oocyte was also associated with dynamic changes in fetal growth and prolonged gestation. It was not determined whether the reported catch-up growth apparent on the day of birth was due to enhanced fetal growth very late in gestation, a longer gestational time ie the P0 pups are effectively one day "older" compared to the controls, or the pups catching up after birth when consuming maternal milk.

      To understand if increased growth occurred in HET-hom fetuses prior to birth, we have now included analyses of offspring weight at E18.5 (Figure 2F), all pups collected with a verified E19.5 birth date (Figure 2J) and for pups from similar litter sizes (5-7 pups) at E19.5 (Figure 2K). Together with our existing data, these additional analyses provide average weights for fetuses at E14.5, E17.5, E18.5 and pups born on E19.5. This confirmed that HET-hom offspring undergo enhanced growth in the last few days of pregnancy, resulting in the progression of substantially growth- and developmentally restricted HET-hom fetuses at E14.5 to pups with normal weight at birth within the 40% of pregnancies that were born on E19.5 in a normal gestational time.

      However, in addition, gestational length was increased by one to two days in 60% of pregnancies from hom oocytes, but not in control pregnancies from het or wt oocytes. As average weights were significantly greater in all surviving HET-hom offspring at P0 (i.e. surviving pups born on E19.5-E21.5; Figure 2G), it appears that this additional gestational time contributed to the offspring overgrowth. This is logical; however, it does not explain how growth and developmentally delayed fetuses at E14.5 attained normal weight and developmental stage by E19.5 (Figure 2J-K).

      Together our data clearly show that HET-hom offspring undergo enhanced growth during the late stages of pregnancy, allowing them to resolve the developmental delay and growth insufficiency observed at E14.5 so that they were born at normal weight and stage at E19.5. In addition, increased gestational time contributes to weight of pups delivered on E20.5 or 21.5, partly explaining the overgrowth phenotype observed in this model.

      The idea that increased milk consumption may explain the overgrowth of HET-hom offspring is interesting. It is possible that the increased growth rate of HET-hom offspring continues after birth and contributes to overgrowth. However, examining this outcome in a tightly controlled manner is complicated given that we cannot predict the day of birth of HET-hom litters, and that these litters are generally small and would need to be fostered on the day of birth alongside control litters. Given these challenges and that our primary observation is that HET-hom offspring underwent fetal growth recovery during pregnancies of normal length and via extension of gestational length, we have not examined the possibility of increased milk consumption after birth.

      We have updated the results to reflect the new analyses and have provided relevant discussion to address these data. Our description of these data can be found in Results (lines 165-197) and in Figure 2.

      Reviewer #3 (Public Review):

      My understanding of the main claims of the paper, and how they are justified by the data are discussed below:

      Overall, loss of PRC2 function in the developing oocyte and early embryo causes:

      1) Growth restriction from at least the blastocyst stage with low cell counts and midgestational developmental delay.

      Strengths:

      • Live embryo imaging added an important dimension to this study. The authors were able to confirm an unquantified finding from a previous lab (reduced time to 2-cell stage in oocyte-deletion Eed offspring, Inoue 2018, PMID: 30463900) as well as identify developmental delay and mortality at the blastocyst-hatching transition.

      • For the weight and morphological analysis the authors are careful to provide isogenic controls for most of the experiments presented. This means that any phenotypes can be attributed to the oocyte genotype rather than any confounding effects of maternal or paternal genotype.

      • Overall, there is good evidence that oocyte deletion of Eed results in early embryonic growth restriction, consistent with previous observations (Inoue 2018, PMID: 30463900).

      Reviewer 3, Comment 1: Weaknesses: Gaps in the reporting of specific features of the methodology make it difficult to interpret/understand some of the results.

      While we are unsure exactly which methods Reviewer 3 would like expanded, we have updated parts that we thought required further detail to allow more informed interpretation of the results. These include methods for placental histology (Lines 650-669) and immunohistochemistry (Lines 671-690), and new methods for CD31 immunofluorescence (Lines 692-714), glucose and metabolomics (Lines 752-769) and DNA methylation (RRBS; Lines 734-750) analyses.

      To clarify the approach taken for histology, immunohistochemical and immunofluorescent staining, sections were cut in compound series from the centre of each placenta, ensuring that we collected representative data for each sample. QuPath was used to quantify the decidual and junctional zone areas in one complete, fully intact midline section for each placenta as close to the midline as possible. This provided data from 10 placentas for each genotype. In addition, glycogen-enriched and non-glycogen-enriched cells were identified and quantified using machine learning assisted QuPath analyses of the whole placenta, decidua and junctional zone regions. We have also added quantitative analyses of the labyrinth and labyrinth capillary network using immunofluorescent CD31 staining and machine learning assisted HALO software. This new analysis of placental morphology is included in the methods section.

      Moreover, as there were no sex-specific differences in placental morphology or weight, we combined the samples from both sexes to provide greater numbers for analysis in each genotype. For example, as described for the analyses of labyrinth and capillaries using CD31 IF, 4 placentas of each sex were used for data collection. This provided data from a total of 8 placentas (4 male and 4 female) for each genotype from a total of 17 WT-wt (9 male and 8 female), 21 HET-het (9 male and 12 female) and 24 HET-hom (16 male and 8 female) sections (2-3 sections/placenta).

      Reviewer 3, Comment 2: Placental hyperplasia with disproportionate overgrowth of the junctional trophoblast especially the glycogen trophoblast (GlyT) cells.

      Strengths: • The authors provide a comprehensive description of how placental and embryo weight is affected by the oocyte-Eed deletion through mid-to-late gestation development. The case for placentomegaly is clear.

      Weaknesses:

      • The placental efficiency data presented in Figure 3G-I is incorrect. Placental efficiency is calculated as embryo mass/placental mass, and it increases over the late gestation period. For e14.5 for example (Fig3G), WT-wt embryo mass = ~0.3g, placenta mass = 0.11g (from Fig 3D) = placental efficiency 2.7; HET-hom = 0.25/0.12 = 2.1. The paper gives values: WT-wt 0.5, HET-hom 0.7. Have the authors perhaps divided placenta weight by embryo mass? This would explain why the E17.5 efficiencies are so low (WT-wt 0.11 rather than a more usual figure of 8.88). If this is the case then the authors' conclusion that placental efficiency is improved by oocyte deletion of Eed is wrong - in fact, placental efficiency is severely compromised.

      The authors have performed cell type counting on histological sections obtained from placentas to discover which cells are contributing to the placentomegaly. This data is presented as %cell type area in the main figure, though the untransformed cross-sectional area for each cell type is shown in the supplementary data. This presentation of the data, as well as the description of it, is misleading because, while it emphasises the proportional increase in the endocrine compartment of the placenta it downplays the fact that the exchange area of the mutant placentas is vastly expanded. This is important for two reasons.

      Firstly, the whole placenta is increased in size suggesting that the mechanism is not placental lineage-specific and instead acting on the whole organ. Secondly in relation to embryonic growth, generally speaking, genetic manipulations that modify labyrinthine volume tend to have a positive correlation with fetal mass whereas the relationship between junctional zone volume and embryonic mass is more complex (discussed in Watson PMID: 15888575, for example). The authors should reconsider how they present this data in light of the previous point.

      We thank the reviewer for pointing out our error in the placental efficiency analysis and apologise for this error. We have corrected the presentation and interpretation of these data and have described this in detail in our response to Reviewer 1, Comment 1.
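      As a minimal illustration of the ratio at issue, the sketch below recomputes placental efficiency (embryo mass divided by placental mass) from the approximate E14.5 masses quoted in the reviewer's comment, alongside the inverted ratio the reviewer suspected had been used. The function names are hypothetical, and the masses are the reviewer's rounded estimates rather than values taken from the corrected figures.

```python
def placental_efficiency(embryo_mass_g, placenta_mass_g):
    """Grams of embryo supported per gram of placenta (embryo / placenta)."""
    if placenta_mass_g <= 0:
        raise ValueError("placenta mass must be positive")
    return embryo_mass_g / placenta_mass_g


def inverted_ratio(embryo_mass_g, placenta_mass_g):
    """Placenta / embryo: the inverse of placental efficiency, shown for contrast."""
    if embryo_mass_g <= 0:
        raise ValueError("embryo mass must be positive")
    return placenta_mass_g / embryo_mass_g


# Approximate E14.5 masses in grams as quoted by the reviewer (assumed, rounded):
# genotype -> (embryo mass, placenta mass)
E14_5_MASSES = {"WT-wt": (0.30, 0.11), "HET-hom": (0.25, 0.12)}

efficiency = {g: placental_efficiency(e, p) for g, (e, p) in E14_5_MASSES.items()}
inverse = {g: inverted_ratio(e, p) for g, (e, p) in E14_5_MASSES.items()}

for genotype in E14_5_MASSES:
    print(f"{genotype}: efficiency={efficiency[genotype]:.2f}, "
          f"inverse={inverse[genotype]:.2f}")
```

      With these inputs the efficiency is lower for HET-hom than for WT-wt, while the inverted ratio is higher, which is how inverting the ratio could make a compromised placenta look more efficient.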

      As discussed in our response to Reviewer 1, Comment 1, we have added a range of analyses to determine whether placental efficiency was enhanced in HET-hom offspring. These include measuring fetal and maternal circulating glucose levels (Figure 4K), individual amino acids and an extensive range of metabolites (Figure 8) and providing CD31 immunofluorescent analyses of labyrinth area, labyrinth/placental ratio and capillary/labyrinth ratio in HET-hom and control placentas (Figure 5).

      We also added analyses of glycogen-enriched and non-glycogen-enriched cell counts in the decidua and junctional zones. As suggested by Reviewer 3, both glycogen-enriched and non-enriched cell populations are significantly increased in HET-hom placentas.

      Combined, these new analyses significantly expand the study and support the conclusion that placental efficiency in HET-hom offspring was either compromised or not different from controls, depending on the analysis. We find no evidence that placental efficiency was increased in HET-hom offspring and have reworked our results and discussion sections to reflect these new data and interpretation.

      Reviewer 3, Comment 2 cont: Again, some of the methods are not clearly reported making interpretation difficult - especially how they have estimated their GlyT number.

      As outlined in our response to Reviewer 3 Comment 1, in the methods section we have added further detail of how we counted glycogen-enriched and non-enriched cells in the decidua and junctional zone regions of sections from the middle of WT-wt, WT-het, HET-het and HET-hom placentas (Lines 650-669).

      Reviewer 3, Comment 3: Perinatal embryonic/pup overgrowth.

      Strengths:

      • The overgrowth exhibited by the oocyte-Eed-deleted pups is striking and confirms the previous work by this group (Prokopuk, 2018). This is an important finding, especially in the context of understanding how PRC2-group gene mutations in humans cause overgrowth syndromes. It is also intriguing because it indicates that genetic/environmental insults in the mother that affect her gamete development can have long-term consequences on offspring physiology.

      Weaknesses:

      • Is the overgrowth intrauterine or is it caused by the increase in gestation length? The way the data is reported makes it impossible to work this out. The authors show that gestation time is consistently lengthened for mothers incubating oocyte-Eed-deleted pups by 1-2 days. In the supplementary material, the mutant embryos are not larger than WT at e19.5, the usual day of birth. Postnatal data is presented as day post-parturition. It would probably be clearer to present the embryonic and postnatal data as days post coitum. In this way, it will be obvious in which period the growth enhancement is taking place. This information is really important to determine whether the increased growth of the mutants is due to a direct effect of the intrauterine environment, or perhaps a more persistent hormonal change in the mother that can continue to promote growth beyond the gestation period.

      We have used embryonic day (E) to denote embryo and fetal age throughout the study – this is the same as using DPC (i.e. E19.5 is equivalent to 19.5 DPC). As described in the Methods “Collection of post-implantation embryos, placenta and postnatal offspring”, mice were time mated for two-four nights, with females plug checked daily. Positive plugs were noted as day E0.5.

      To make the data presentation clearer, we have shown the data for surviving HET-hom pups born on E19.5 (Figure 2J) separately from all HET-hom surviving pups born on E19.5-E21.5 (Figure 2G). As discussed in our response to Reviewer 2, we have also included growth data for pregnancies at E14.5, E17.5, E18.5 (Fig. 2C-F) and E19.5 (Figure 2J,K), as well as P0 (combined data for surviving pups born E19.5-E21.5), and P3 (combined data for surviving pups born E19.5-E21.5, Figure 2G,H).

      These data clearly show that HET-hom fetuses are substantially delayed in growth and development at E14.5 (Figure 2D), but HET-hom pups born on E19.5 are the same weight as WT-wt, WT-het and HET-het control pups (Figure 2J). This demonstrates that the weight of HET-hom fetuses is normalised in utero between E14.5 and day of birth on E19.5.

      Importantly, as requested by Reviewer 3, we have separated average weight for all surviving pups with a day of birth of E19.5-21.5 (Figure 2G) from average weight of pups born on E19.5 only (Figure 2J). These analyses revealed that the average weight of surviving pups born between E19.5-21.5 was significantly higher than for controls (Figure 2G), but the average weight of pups born on E19.5 only was not. It is therefore clear that extended gestation also contributed to increased HET-hom pup birth weight. We have updated these additional analyses in Results (Lines 165-197) and Figure 2.

      As revealed in Figure 2H, it is also possible/likely that growth of HET-hom pups during the three days post-partum may have contributed to the offspring overgrowth we observed in this and our previous study (Prokopuk et al., 2018 Clinical Epigenetics). However, we cannot determine whether there is a contribution from a persistent maternal hormonal change that promotes post-natal offspring growth or whether there is an innate growth benefit in HET-hom pups. As this is very difficult to dissect, separating these possibilities is beyond the scope of our study.

      Reviewer 3, Comment 4: "fetal growth restriction followed by placental hyperplasia, .. drives catch-up growth that ultimately results in perinatal offspring overgrowth".

      Here the authors try to link their observations, suggesting that i) the increased perinatal growth rate is a consequence of placentomegaly, and ii) the placentomegaly/increased fetal growth is an adaptive consequence of the early growth restriction. This is an interesting idea and suggests that there is a degree of developmental plasticity that is operating to repair the early consequences of transient loss of Eed function.

      Strengths:

      • Discrepancies between earlier studies are reconciled. Here the authors show that in oocyte-Eed-deleted embryos growth is initially restricted and then the growth rate increases in late gestation with increased perinatal mass.

      Weaknesses:

      • Regarding the dependence of fetal growth increase on placental size increase, this link is far from clear since placental efficiency is in fact decreased in the mutants (see above).

      • "Catch-up growth" suggests that a higher growth rate is driven by an earlier growth restriction in order to restore homeostasis. There is no direct evidence for such a mechanism here. The loss of Eed expression in the oocyte and early embryo could have an independent impact on more than one phase of development.

      Firstly, there is growth restriction in the early phase of cell divisions. Potentially this could be due to derepression of genes that restrain cell division on autosomes, or suppression of X-linked gene expression (as has been previously reported, Inoue, 2018 PMID: 30463900). The placentomegaly is explained by the misregulation of non-canonically imprinted genes, as the authors report (and in agreement with other studies, e.g. Inoue, 2020. PMID: 32358519).

      • Explaining the perinatal phase of growth enhancement is more difficult. I think it is unlikely to be due to placentomegaly. Multiple studies have shown that placentomegaly following somatic cell nuclear transfer (SCNT) is caused by non-canonically imprinted genes, and can be rescued by reducing their expression dosage. However, SCNT causes placentomegaly with normal or reduced embryonic mass (for example, Xie 2022, PMID: 35196486), not growth enhancement. Moreover, since (to my knowledge) single loss of imprinting models of non-canonically imprinted genes do not exist, it is not possible to understand if their increased expression dosage can drive perinatal overgrowth, and if this is preceded by growth restriction and thus constitutes 'catch up growth'.

      Reviewer 3 is correct in their assessment that placental efficiency was decreased in HET-hom offspring and we have corrected the placental efficiency analysis based on fetal/placental weight ratios (discussed in detail in our response to Reviewer 1 Comment 1). We have added substantially more data (glucose, amino acids, metabolites, labyrinth capillary area and density). These data support the conclusion that a placentally driven advantage for HET-hom fetal growth is unlikely, despite our observation that HET-hom fetuses are developmentally delayed and underweight at E14.5, but are born at normal weight after a normal gestational length (19.5 days) (discussed in our responses to Reviewer 3, Comment 3 and Reviewer 2).

      This demonstrates that HET-hom fetuses are able to attain normal birth weight despite being in a growth-restricted state at E14.5, and that this occurs despite low placental function. Moreover, as we compared isogenic offspring with heterozygous loss of Eed (HET-het compared to HET-hom offspring), the outcomes we observed in HET-hom offspring originate from loss of EED in the growing oocyte or loss of maternal EED in the zygote, strongly suggesting that a non-genetic mechanism is involved.

      As pointed out by Reviewer 3, the initial developmental delay in HET-hom offspring may be due to increased expression of genes that regulate cell proliferation – this could clearly explain the lower number of cells we observed in the ICM and the growth delay at later stages of embryonic and fetal development. Another possibility is that maternal PRC2 provided by the oocyte promotes cell divisions in preimplantation embryos. We have discussed these possibilities on Lines 467-476.

      In addition, Matoba et al 2022 demonstrated that deletion of maternal Xist together with Eed was able to rescue male-biased lethality in offspring from oocytes lacking Eed, revealing a clear role for X-linked genes in this phenotype (Matoba et al 2022, Genes and Development). However, deletion of maternal Xist did not properly normalise survival of offspring from Eed null oocytes (i.e. Eed/Xist double maternal null litters were smaller than litters derived from wild type oocytes), strongly suggesting other mechanisms provide the capacity for HET-hom offspring to attain normal weight at birth. We have added further discussion of the Matoba study in the context of our study in the Discussion (Lines 544-555).

      Finally, with respect to the outcomes for SCNT derived offspring, we extracted SCNT fetal growth and placental weight data from the supplementary data included in Matoba et al., 2018 Cell Stem Cell. 2018;23(3):343-54.e5 and compared it with data collected in our study (Figure 7). This analysis revealed that the weights of placentas and fetuses of offspring derived via SCNT were very similar to the HET-hom offspring in our study and we have discussed the similarities and potential differences between HET-hom and SCNT offspring in the Discussion (Lines 478-500).

      As pointed out by Reviewer 3, deletion of maternal non-canonically imprinted genes partially or fully rescued the placental hyperplasia phenotype in both SCNT-derived offspring and offspring from oocytes lacking EED. However, as we have discussed, the mechanisms underlying other aspects of the offspring phenotype, such as fetal growth recovery of HET-hom offspring observed in our study, remain unknown. Moreover, the comparison we provide in Figure 7 strongly indicates that HET-hom and SCNT fetuses are similarly delayed at E14.5 and undergo similar fetal growth recovery before birth, but the mechanism also remains unknown. Together, it appears that offspring derived from either Eed-null oocytes or by SCNT have an innate ability to remediate fetal growth restriction during the late stages of pregnancy without a requirement to correct maternally inherited impacts mediated by Xist or H3K27me3-dependent imprinting.

    1. Author response

      Reviewer #1 (Public Review):

      The main contribution appears to be related to functional specialization. I suggest clarifying the major novelty of the present report and focusing the introduction on it.

      We thank this reviewer for this suggestion. We have revised the introduction to emphasize the functional specialization question. The changes are extensive; we have included a tracked-changes version of the manuscript to make these edits easy to see.

      There is a growing literature on fluctuating neural firing patterns that is not considered in this report. The scholarship appears a bit impoverished with only 19 references, many of which point to work from this group of collaborators. I suggest that the authors consider the present work in the context of the wider literature more scholarly, even if not all the relations of these different lines of work can be conclusively connected at this point. For a few examples, there is work by Kienitz and colleagues on fluctuating neural patterns in V4 evoked by competing grating stimuli. Also, the work by Engel, Moore, and colleagues on 'on' and 'off' states in the context of selective attention seems relevant, or the work by Fiebelkorn and Kastner on rhythmic perception and attention.

      We agree completely with this suggestion! We have reworded the introduction to be more inclusive of other research in this area (especially Kienitz and colleagues – exciting work that we are pleased to have had brought to our attention) and we have added about 500 words in the Discussion to cover the work on on/off states (Engel et al.), rhythmic perception (Fiebelkorn & Kastner and others), and attention more generally (e.g., Treisman & Gelade’s work on serial sampling). We are particularly pleased to add these sections because these topics are very much on our minds – we have a commentary piece under review elsewhere in which we evaluate these synergistic lines of approach in a more complete fashion. In total, we’ve added about 15 additional references.

      Reviewer #2 (Public Review):

      The description of the results would benefit from a better explanation of how low spike counts may influence the outcome of the analysis. Due to a smoothing procedure used for visualization, the spike counts for the paired stimuli (AB, black lines) shown in Figure 3a-b and Figure 4a-d go below 0. However, the actual spike count on a trial can not go below 0. The symmetric smoothing procedure may hide an underlying skewed distribution of spike counts that can only be positive. The statistical analysis is not performed on the smoothed distribution but on the actual spike counts, and the validity of the result is therefore not in question. However, the paper would benefit from 1) visualization of the unsmoothed trial counts, and 2) an explanation of how assumptions of symmetric/skewed distributions may affect the outcome.

      We thank the reviewers for noting this and making these suggestions. We now include unsmoothed raw spike counts in all the example figures (Figure 3a-b and Figure 4a-d). With regard to the symmetric/skewed distributions and the analysis methods, a Poisson distribution will be skewed at low rates and become more symmetric at higher rates, so this is already incorporated into the analysis. Indeed, the utility of Poisson distributions for fitting non-negative data is one of the reasons these distributions are so commonly used in neuroscience. We now make this point explicitly at the beginning of Methods/Data analysis: “Our method centers on modeling spike counts based on Poisson distributions, a common technique for handling non-negative count data in neuroscience and other fields.” With this edit as well as the revised example figures now making clear that no spike counts are below zero, we are optimistic that readers will better understand the analysis method and how the shape of response distributions is incorporated into it.
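      The statement that a Poisson distribution is skewed at low rates and more symmetric at higher rates can be checked numerically. The sketch below is illustrative only and is not the analysis code used in the study: it computes the skewness of a Poisson distribution directly from its probability mass function, for two arbitrarily chosen mean counts. The function names and the truncation rule for the sum are assumptions made for this example; the values can be compared with the known closed form, in which the skewness equals one over the square root of the mean.

```python
import math


def poisson_pmf(k, rate):
    """P(K = k) for a Poisson distribution, evaluated in log space for stability."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def poisson_skewness(rate):
    """Skewness computed from the pmf, truncating the sum far into the upper tail."""
    k_max = int(rate + 20 * math.sqrt(rate) + 20)  # assumed truncation point
    probs = [poisson_pmf(k, rate) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    third_moment = sum((k - mean) ** 3 * p for k, p in enumerate(probs))
    return third_moment / variance ** 1.5


low_rate_skew = poisson_skewness(1.0)     # mean of 1 spike per trial
high_rate_skew = poisson_skewness(100.0)  # mean of 100 spikes per trial
print(f"skewness at rate 1: {low_rate_skew:.3f}")
print(f"skewness at rate 100: {high_rate_skew:.3f}")
```

      Because every term in the sum has k of zero or more, the distribution places no probability on negative counts, which is the property a symmetric smoothing kernel obscures.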

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank the editors and reviewers for their helpful comments, which have allowed us to improve the manuscript.

      Response to reviewer 2

      We thank the reviewer for this positive feedback, which requires no further revision.

      Response to reviewer 3

      We thank the reviewer for highlighting these additional points and provide further explanations on these below.

      Firstly, we started the analysis from a baseline of year 2000 because the largest international donor (the Global Fund) uses baseline malaria levels in the period 2000-2004 as the basis of their current allocation calculations (The Global Fund, Description of the 2020-2022 Allocation Methodology, December 2019). In the paper we compare our optimal strategy to a simplified version of this method, represented by our “proportional allocation” strategy.

      Even if our simulations started in the year 2015, a direct comparison with the Global Technical Strategy for Malaria 2016-2030 would not be possible due to the different approaches taken. The GTS was developed to progress towards malaria elimination globally and set ambitious targets of at least 90% reduction in malaria case incidence and mortality rates and malaria elimination in at least 35 countries by 2030 compared to 2015. Mathematical modelling at the time suggested that 90% coverage of WHO-recommended interventions (vector control, treatment and seasonal malaria chemoprevention) would be needed to approach this target (Griffin et al. 2016, Lancet Infectious Diseases). The global annual investment requirements to meet GTS targets were estimated at US$6.4 billion by 2020 and US$8.7 billion by 2030 (Patouillard et al. 2017, BMJ Global Health). This strategy therefore considers what resources would be required to achieve a specific global target, but not the optimized allocation of resources.

      Investments into malaria control have consistently been below the estimated requirements for the GTS milestones (World Health Organization 2022, World Malaria Report 2022). In our study, we therefore take a different perspective on how limited budgets can be optimally allocated to a single intervention (insecticide-treated nets) across countries/settings to achieve the best possible outcome for two objectives that are different to the GTS milestones (either minimizing the global case burden, or minimizing both the global case burden and the number of settings not having yet reached a pre-elimination phase). As stated in the discussion, our estimate of allocating 76% of very low budgets to high-transmission settings was similar to the global investment targets estimated for the GTS, where the 20 countries with the highest burden in 2015 were estimated to require 88% of total investments (Patouillard et al. 2017, BMJ Global Health). Nevertheless, we also show that if higher budgets were available, allocating the majority to low-transmission settings co-endemic for P. falciparum and P. vivax would achieve the largest reduction in global case burden. We acknowledge the modelling of a single intervention as one of the key limitations of this analysis, but this simplification was necessary in order to perform the complex optimisation problem. Computationally it would not have been feasible to optimize across a multitude of intervention and coverage combinations.

      A further limitation raised by the reviewer is the lack of cross-species immunity between P. falciparum and P. vivax in our model. While cross-reactivity between antibodies against these two species has been observed in previous studies and the potential implications of this would be important to explore in future work, we did not include it here as little is known to date about the epidemiological interactions between different malaria parasite species (Muh et al. 2020, PLoS Neglected Tropical Diseases).

      Lastly, we did not assume that transmission was homogeneous within the four transmission settings in our study (very low, low, moderate, high); transmission dynamics were simulated separately in each country, accounting for heterogeneous mosquito bite exposure. However, results were summarised for the broader transmission settings since many other country-specific factors were not accounted for (see discussion) and the findings should not be used to inform individual country allocation decisions.


      The following is the authors’ response to the original reviews.

      Author response to peer review

      We thank the reviewers for their insightful comments, which raise several important points regarding our study. As the reviewers have recognised, we introduced a number of simplifications in order to perform this complex optimisation problem, such as by restricting the analysis to a single intervention (insecticide-treated nets) and modelling countries at a national level. Despite their clear relevance to the study, computationally it would not have been feasible to run the multitude of scenarios suggested by reviewer 1, which we recognise as a limitation. As such we agree with the assessment that this study primarily represents a thought experiment, based on substantive modelling and aggregate scenario-based analysis, to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. The findings are relevant primarily to global funders and should not be used to inform individual country allocation decisions, and also point to avenues for further research. This perspective also underlies our decision to start the analysis from a baseline of year 2000 as opposed to modelling the current 2023 malaria situation: the largest international donor (the Global Fund) uses baseline malaria levels in the period 2000-2004 as the basis of their allocation calculations (The Global Fund, Description of the 2020-2022 Allocation Methodology, December 2019) (1). A simplified version of this method is represented by our “proportional allocation” strategy. We have made several revisions to the manuscript to address the points raised by the reviewers, as detailed below.

      Reviewer #1 (Public Review):

      1. The authors present a back-of-the-envelope exploration of various possible resource allocation strategies for ITNs. They identify two optimal strategies based on two slightly different objective functions and compare 3 simple strategies to the outcomes of the optimal strategies and to each other. The authors consider both P falciparum and P vivax and explore this question at the country level, using 2000 prevalence estimates to stratify countries into 4 burden categories. This is a relevant question from a global funder perspective, though somewhat less relevant for individual countries since countries are not making decisions at the global scale.

      Thank you for this summary of the paper. We agree that our analysis is of relevance to global funders, but is not meant to inform individual country allocation decisions. In the discussion, we now state:

      p. 12 L19: “Therefore, policy decisions should additionally be based on analysis of country-specific contexts, and our findings are not informative for individual country allocation decisions.”

      1. The authors have made various simplifications to enable the identification of optimal strategies, so much so that I question what exactly was learned. It is not surprising that strategies that prioritize high-burden settings would avert more cases.

      Thank you for raising this point. Indeed, several simplifying assumptions were necessary to ensure the computational feasibility of this complex optimization problem. As a result, our study primarily represents a thought experiment to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. As now further outlined in the introduction, approaches to this have differed over time and it remains a relevant debate for malaria policy.

      p. 2 L22: “However, there remains a lack of consensus on how best to achieve this longer-term aspiration. Historically, large progress was made in eliminating malaria mainly in lower-transmission countries in temperate regions during the Global Malaria Eradication Program in the 1950s, with the global population at risk of malaria reducing from around 70% of the world population in 1950 to 50% in 2000 (2). Renewed commitment to malaria control in the early 2000s with the Roll Back Malaria initiative subsequently extended the focus to the highly endemic areas in sub-Saharan Africa (3).”

      We believe our findings not only confirm an “expected” outcome – that prioritizing high-burden settings would avert more cases – but also clearly illustrate various consequences of different allocation strategies that are implemented or considered in reality, which may not be so obvious. For example, we found that initially allocating a larger share of the budget to high-transmission countries could be both almost optimal in terms of reducing clinical cases and maximising the number of countries reaching pre-elimination. We also observed a trade-off between reducing burden and reducing the global population at risk (“shrinking the map”) through a focus on near-elimination settings, and estimate the loss in burden reduction when following an elimination target.

      1. Generally, I found much of the text confusing and some concepts were barely explained, such that the logic was difficult to follow.

      Thank you for bringing this to our attention, and we regret to hear the manuscript was confusing to read. We believe that the revisions made as a result of the reviewer comments have now made the manuscript much easier to follow. We additionally passed the manuscript to a colleague to identify confusing passages, and have added a number of sentences to clarify key concepts and improve the structure.

      1. I am not sure why the authors chose to stratify countries by 2000 PfPR estimates and in essence explore a counterfactual set of resource allocation strategies rather than begin with the present and compare strategies moving forward. I would think that beginning in 2020 and modeling forward would be far more relevant, as we can't change the past. Furthermore, there was no comparison with allocations and funding decisions that were actually made between 2000 and 2020ish so the decision to begin at 2000 is rather confusing.

      Thank you for pointing this out. We have now made the rationale for this choice clearer in the manuscript. Our main reason for this was to allow comparison with the Global Fund funding allocation, which is largely based on malaria disease burden in 2000-2004. As stated in the paper, malaria prevalence estimates in the year 2000 are commonly considered to represent a “baseline” endemicity level, before large-scale implementation of interventions in the following decades. In the manuscript, the transmission-related element of the Global Fund allocation algorithm is represented in our “proportional allocation” strategy. Previously this was only mentioned in the methods, but we have now added the following in the results to address this comment of the reviewer:

      p. 6 L12: “Strategies prioritizing high- or low-transmission settings involved sequential allocation of funding to groups of countries based on their transmission intensity (from highest to lowest EIR or vice versa). The proportional allocation strategy mimics the current allocation algorithm employed by the Global Fund: budget shares are mainly distributed according to malaria disease burden in the 2000-2004 period. To allow comparison with this existing funding model, we also started allocation decisions from the year 2000.”
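      The two families of simple strategies described in this passage can be sketched in a few lines. This is a schematic illustration under stated assumptions, not the optimisation or transmission model used in the study: the setting names, burden figures, EIR values and per-setting funding requirements below are all hypothetical, and the proportional rule here uses burden alone, without the economic capacity component of the actual Global Fund framework.

```python
def proportional_allocation(budget, burden):
    """Split the budget across settings in proportion to their baseline burden."""
    total_burden = sum(burden.values())
    return {s: budget * b / total_burden for s, b in burden.items()}


def sequential_allocation(budget, eir, funding_need, highest_first=True):
    """Fund settings one at a time in order of transmission intensity (EIR).

    Each setting receives up to its (hypothetical) funding need before any
    money moves on to the next setting in the ordering.
    """
    order = sorted(eir, key=eir.get, reverse=highest_first)
    remaining = budget
    allocation = {}
    for setting in order:
        spend = min(remaining, funding_need[setting])
        allocation[setting] = spend
        remaining -= spend
    return allocation


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
burden = {"high": 60, "moderate": 30, "low": 10}        # baseline cases
eir = {"high": 100.0, "moderate": 10.0, "low": 1.0}     # infectious bites/year
funding_need = {"high": 70, "moderate": 50, "low": 20}  # cost of full coverage

proportional = proportional_allocation(100, burden)
high_first = sequential_allocation(100, eir, funding_need, highest_first=True)
low_first = sequential_allocation(100, eir, funding_need, highest_first=False)
print(proportional, high_first, low_first, sep="\n")
```

      With a budget of 100, prioritising high transmission leaves the low-transmission setting unfunded, whereas prioritising low transmission funds it fully and leaves the high-transmission setting with a shortfall; the proportional rule spreads the same budget across all three.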

      The Global Fund framework additionally considers economic capacity and other specific factors, and we have now also included a direct comparison with the 2020-2022 Global Fund allocation in Supplementary Figure S12 (see Author response image 1).

      We agree that looking at allocation decisions from 2020 onward would also constitute a very interesting question. However, the high dimensionality in scenarios to consider for this would currently make it computationally infeasible to run on the global level. Not only would it have to include all interventions currently implemented and available for malaria at different levels of coverage, but also the option of scaling down existing interventions. Instead, our priority in this paper was to conduct a thought experiment including both P. falciparum and P. vivax on a large geographical scale.

      Author response image 1.

      Impact of the proportional allocation strategy and the 2020-2022 Global Fund allocation on global malaria cases (panel A) and the total population at risk of malaria (panel B) at varying budgets. Both strategies use the same algorithm for budget share allocation based on malaria disease burden in 2000-2004, but the Global Fund allocation additionally involves an economic capacity component and specific strategic priorities.

      1. I realize this is a back-of-the-envelope assessment (although it is presented to be less approximate than it is, and the title does not reveal that the only intervention strategy considered is ITNs) but the number and scope of modeling assumptions made are simply enormous. First, that modeling is done at the national scale, when transmission within countries is incredibly heterogeneous. The authors note a differential impact of ITNs at various transmission levels and I wonder how the assumption of an intermediate average PfPR vs modeling higher and lower PfPR areas separately might impact the effect of the ITNs.

      Thank you for this comment. We agree the title could be more specific and have changed this to “Resource allocation strategies for insecticide-treated bednets to achieve malaria eradication”.

      Regarding the scale of ITN allocation, it is true that allocation at a sub-national scale could affect the results. However, considering this at a national scale is most relevant for our analysis because this is the scale at which global funding allocation decisions are made in practice. A sentence explaining this has been added in the methods.

      p. 15 L8: “The analysis was conducted on the national level, since this scale also applies to funding decisions made by international donors (1).”

      Further considering different geographical scales would also require introducing other assumptions, for example about how different countries would distribute funding sub-nationally, whether specific countries would take cooperative or competitive approaches to tackle malaria within a region or in border areas, and about delays in the allocation of bednets in specific regions. These interesting questions were outside of the scope of this work, but certainly require further investigation.

      1. Second, the effect of ITNs will differ across countries due to variations in vector and human behavior and variation in insecticide resistance and susceptibility to the ITNs. The authors note this as a limitation but it is a little mind-boggling that they chose not to account for either factor since estimates are available for the historical period over which they are modeling.

      Thank you for pointing this out. We did consider this and mentioned it as a limitation. Nevertheless, the complexity of accounting for this should also be recognised; for example, there is substantial uncertainty about the precise relationship between insecticide resistance and the population-level effect of ITNs (Sherrard-Smith et al., 2022, Lancet Planetary Health) (4). Additionally, our simulations extend beyond the 2000-2023 period so further assumptions about future changes to these factors would also be required. Simplifying assumptions are inherent to all mathematical modelling studies and we consider these particular simplifications acceptable given the high-level nature of the analysis.

      1. Third, the assumption that elimination is permanent and nothing is needed to prevent resurgence is, as the authors know, a vast oversimplification. Since resources will be needed to prevent resurgence, it appears this assumption may have a substantial impact on the authors' results.

      Thank you for this comment. In the discussion, we have now expanded on this:

      p. 13 L3: “While our analysis presents allocation strategies to progress towards eradication, the results do not provide insight into allocation of funding to maintain elimination. In practice, the threat of malaria resurgence has important implications for when to scale back interventions.”

      We believe that from a global perspective, the questions of funding allocation to achieve elimination vs to maintain it can currently still be considered separately given the large time-scales involved. The cost of preventing resurgence is not known, and one major problem in accounting for this would also be to identify relevant timescales to quantify this over.

      1. The decision to group all settings with EIR > 7 together as "high transmission" may perhaps be driven by WHO definitions but at a practical level this groups together countries with EIR 10 and EIR 500. Why not further subdivide this group, which makes sense from a technical perspective when thinking about optimal allocation strategies?

      Thank you for pointing this out. The WHO categories used are better interpreted in terms of the corresponding prevalence, which places countries with a prevalence of over 35% in the high transmission categories (WHO Guidelines for malaria, 31 March 2022) (5). We felt this is appropriate given that we are looking at theoretical global allocation patterns and do not aim to make recommendations for specific groups of countries or individual countries within sub-Saharan Africa that would be distinguished through the use of higher cut-offs. In our analysis, all 25 countries in the high transmission category were located in sub-Saharan Africa.

      1. The relevance of this analysis for elimination is a little questionable since no one eliminates with ITNs alone, to the best of my understanding.

      Thank you for this comment. We indeed state in the paper that ITNs alone are not sufficient to eliminate malaria. However, we still think that our analysis is relevant for elimination by taking a more theoretical perspective on reducing transmission using interventions. Starting from the 2000 baseline (or current levels) globally, large-scale transmission reductions such as those achieved by mass ITN distribution still represent the first key step on the path to malaria eradication, as shown in previous modelling work (Griffin et al., 2016, Lancet Infectious Diseases) (6). In the final phase of elimination, the WHO also recommends the addition of more targeted and reactive interventions (WHO Guidelines for malaria, 31 March 2022) (5). Our changes to the title of the article (“Resource allocation strategies for insecticide-treated bednets to achieve malaria eradication”) should now better reflect that we consider ITNs as just one necessary component to achieve malaria eradication.

      Reviewer #2 (Public Review):

      1. Schmit et al. analyze and compare different strategies for the allocation of funding for insecticide-treated nets (ITNs) to reduce the global burden of malaria. They use previously published models of Plasmodium falciparum and Plasmodium vivax malaria transmission to quantify the effect of ITN distribution on clinical malaria numbers and the population at risk. The impact of different resource allocation strategies on the reduction of malaria cases or a combination of malaria cases and achieving pre-elimination is considered to determine the optimal strategy to allocate global resources to achieve malaria eradication.

      Strengths:

      Schmit et al. use previously published models and optimization for rigorous analysis and comparison of the global impact of different funding allocation strategies for ITN distribution. This provides evidence of the effect of three different approaches: the prioritization of high-transmission settings to reduce the disease burden, the prioritization of low-transmission settings to "shrink the malaria map", and a resource allocation proportional to the disease burden.

      Thank you for providing this summary and outline of the strengths of the paper.

      1. Weaknesses:

      The analysis and optimization which provide the evidence for the conclusions and are thus the central part of this manuscript necessitate some simplifying assumptions which may have important practical implications for the allocation of resources to reduce the malaria burden. For example, seasonality, mosquito species-specific properties, stochasticity in low transmission settings, and changing population sizes were not included. Other challenges to the reduction or elimination of malaria such as resistance of parasites and mosquitoes or the spread of different mosquito species as well as other beneficial interventions such as indoor residual spraying, seasonal malaria chemoprevention, vaccinations, combinations of different interventions, or setting-specific interventions were also not included. Schmit et al. clearly state these limitations throughout their manuscript.

      The focus of this work is on ITN distribution strategies; other interventions are not considered. It also provides a global perspective, so analysis of the specific local setting (as also noted by Schmit et al.), of different interventions, and of combinations of interventions should also be taken into account for any decisions.

      Thank you for raising these points. As outlined at the beginning of our response, for computational reasons we indeed had to introduce several simplifying assumptions to solve this complex optimisation problem. Given the factors you highlighted, our study should primarily be interpreted as a thought experiment to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. The findings are relevant primarily to global funders and should not be used to inform individual country allocation decisions, which we have further clarified in the manuscript.

      1. Nonetheless, the rigorous analysis supports the authors' conclusions and provides evidence that supports the prioritization of funding of ITNs for settings with high Plasmodium falciparum transmission. Overall, this work may contribute to making evidence-based decisions regarding the optimal prioritization of funding and resources to achieve a reduction in the malaria burden.

      Thank you for this positive assessment of our work.

      Reviewer #1 (Recommendations For The Authors):

      1. L144: last paragraph, the focus on endemic equilibrium: I did not really understand this, when 39 years is mentioned later is that a different analysis? How are cases averted calculated in a time-agnostic endemic equilibrium analysis? Perhaps a little more detail here would be helpful.

      A further explanation of this has been added in the results and methods.

      p. 8 L 22: “To evaluate the robustness of the results, we conducted a sensitivity analysis on our assumption on ITN distribution efficiency. Results remained similar when assuming a linear relationship between ITN usage and distribution costs (Figure S10). While the main analysis involves a single allocation decision to minimise long-term case burden (leading to a constant ITN usage over time in each setting irrespective of subsequent changes in burden), we additionally explored an optimal strategy with dynamic re-allocation of funding every 3 years to minimise cases in the short term.”

      p. 17 L25: “To ensure computational feasibility, 39 years was used as it was the shortest time frame over which the effect of re-distribution of funding from countries having achieved elimination could be observed.”

      p. 18 L 9: “Global malaria case burden and the population at risk were compared between baseline levels in 2000 and after reaching an endemic equilibrium under each scenario for a given budget.”

      1. L148: what is proportional allocation by disease burden and how is that different from prioritizing high-transmission settings?

      Further details have been added in the text.

      p. 6 L12: “Strategies prioritizing high- or low-transmission settings involved sequential allocation of funding to groups of countries based on their transmission intensity (from highest to lowest EIR or vice versa). The proportional allocation strategy mimics the current allocation algorithm employed by the Global Fund: budget shares are mainly distributed according to malaria disease burden in the 2000-2004 period. To allow comparison with this existing funding model, we also started allocation decisions from the year 2000.”
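The difference between the sequential and proportional strategies described above can be sketched in a few lines. This is a minimal illustration, not the authors' optimisation code: the function names, the three example countries, and the `need` values (how much funding each country can usefully absorb) are all hypothetical, and countries are funded one at a time here rather than in groups.

```python
def proportional_allocation(budget, burden):
    """Split the budget in proportion to each country's disease burden."""
    total = sum(burden.values())
    return {c: budget * b / total for c, b in burden.items()}


def sequential_allocation(budget, eir, need, highest_first=True):
    """Fund countries in EIR order until the budget runs out.

    `need` is a hypothetical cap on what each country can usefully absorb.
    """
    order = sorted(eir, key=eir.get, reverse=highest_first)
    shares, remaining = {}, budget
    for country in order:
        shares[country] = min(need[country], remaining)
        remaining -= shares[country]
    return shares


# Illustrative numbers only; none of these come from the paper.
eir = {"A": 80.0, "B": 10.0, "C": 0.5}
burden = {"A": 600, "B": 300, "C": 100}
need = {"A": 40.0, "B": 50.0, "C": 30.0}

prop = proportional_allocation(100.0, burden)
high_first = sequential_allocation(100.0, eir, need, highest_first=True)
low_first = sequential_allocation(100.0, eir, need, highest_first=False)
```

In this toy setting the proportional rule assigns country A more than its assumed `need`, which is the kind of fixed-share excess that a sequential rule avoids by passing leftover budget to the next country.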

      1. L198-9: did low transmission settings get the majority of funding at intermediate and maximum budgets because they have the most population (I think so, based on Fig 1)?

      Yes, this is correct. We state in the results: “the optimized distribution of funding to minimize clinical burden depended on the available global budget and was driven by the setting-specific transmission intensity and the population at risk”.

      1. L206: what is ITN distribution efficiency? This is not explained. What is the 39-year period? Why this duration?

      Further explanations have been added in the results section, which were previously only detailed in the methods:

      p. 8 L 22: “To evaluate the robustness of the results, we conducted a sensitivity analysis on our assumption on ITN distribution efficiency. Results remained similar when assuming a linear relationship between ITN usage and distribution costs (Figure S10).”

      p. 17 L25: “To ensure computational feasibility, 39 years was used as it was the shortest time frame over which the effect of re-distribution of funding from countries having achieved elimination could be observed.”

      1. L218: what is "no intervention with a high budget"? is this a phrasing confusion?

      Yes, this has been changed.

      p. 9 L14: “We estimated that optimizing ITN allocation to minimize global clinical incidence could, at a high budget, avert 83% of clinical cases compared to no intervention.”

      1. L235-7: on comparing these results to previous work on the 20 highest-burden countries: is the definition of "high" similar enough across these studies that this is a relevant comparison?

      We believe this is reasonably comparable, as looking at the 20 highest-burden countries encompasses almost the entire high-transmission group in our work (25 countries in total), on which the comparison is made.

      1. L267-70: I didn't understand this sentence at all.

      Thanks for flagging this. The sentence referred to is: “Allocation proportional to disease burden did not achieve as great an impact as other strategies because the funding share assigned to settings was constant irrespective of the invested budget and its impact, and we did not reassign excess funding in high-transmission settings to other malaria interventions.”

      The previously mentioned added details on the proportional allocation strategy in the manuscript should now make this clearer, together with this clarification:

      p. 11 L17: “In modelling this strategy, we did not reassign excess funding in high-transmission settings to other malaria interventions, as would likely occur in practice.”

      For proportional allocation, a fixed proportion of the budget is calculated for each country based on disease burden, as described in the Global Fund allocation documentation (see Methods). However, since ITNs are the only intervention considered, this leads to a higher budget being allocated than is needed in some countries (i.e. where more funding doesn’t translate into further health gains).

      1. L339 EIR range: 80 is high at the country level but areas within countries probably went as high as 500 back in 2000. How does this affect the modeled estimates of ITN impact?

      The question of sub-national differences in transmission has been addressed in the public review comments. Briefly, we consider the national scale to be most relevant for our analysis because this is the scale at which global funding allocation decisions are made in practice. Although, as you correctly point out, the EIR affects ITN impact, it is not possible to conclude what the average effect of this would be on the country level without considering the following factors and introducing further assumptions on these: how would different countries distribute funding sub-nationally? Which countries would take cooperative or competitive approaches to tackle malaria within a region or in border areas? Would there be delays in the allocation of bednets in specific regions? These interesting questions were outside of the scope of this work, but certainly require further investigation.

      1. L347 population size constant: births and deaths are still present, is that right? Unclear from this sentence

      Yes, this is correct. Full details on the model can be found in the Supplementary Materials.

      1. L370 estimating ITN distribution required to achieve simulated population usage: is this a single relationship for all of Africa? Is it based on ITNs distributed 2:1 -> % access -> % usage? So it accounts for allocation inefficiency?

      Yes, this is represented by a single relationship for all of Africa to account for allocation inefficiency and is based on observed patterns across the continent and methodology developed in a previous publication (Bertozzi-Villa et al., 2021, Nature Communications) (7). Full details can be found in the Supplementary Materials (“Relationship between distribution and usage of insecticide-treated nets (ITNs)”, p. 21).

      1. L375: the ITN unit cost is assumed constant across countries and time (I think, it doesn't say explicitly), is this a good assumption?

      Yes, this is correct. We consider this a reasonable assumption within the scope of the paper. While delivery costs likely vary across countries, international funders usually have pooled procurement mechanisms for ITNs (The Global Fund, 2023, Pooled Procurement Mechanism Reference Pricing: Insecticide-Treated Nets).

      1. L399: "single allocation of a constant ITN usage" it is not explained what exactly this means

      Further explanations have been added in the manuscript.

      p. 8 L24: “While the main analysis involves a single allocation decision to minimise long-term case burden (leading to a constant ITN usage over time in each setting irrespective of subsequent changes in burden), we additionally explored an optimal strategy with dynamic re-allocation of funding every 3 years to minimise cases in the short term.”

      Reviewer #2 (Recommendations For The Authors):

      1. In addition to the public comments, the only major comment is that in this reviewer's opinion, the focus on ITNs as the only intervention should be made clearer at different places in the manuscript (e.g. in the discussion lines 303-304). Otherwise, there are only some minor comments (see below).

      We have now modified the following sentence and also included this suggestion in the title (“Resource allocation strategies for insecticide-treated bednets to achieve malaria eradication”).

      p. 13 L8: “Our analysis demonstrates the most impactful allocation of a global funding portfolio for ITNs to reduce global malaria cases.”

      1. Minor comments:
      2. It may be of interest to compare the maximum budget obtained from the optimization with other estimates of required funding and actual available funding.

      Thank you for this interesting suggestion. Our maximum budget estimates are similar to the required investments projected for the WHO Global Technical Strategy: US$3.7 billion for ITNs in our analysis compared to between US$6.8 and US$10.3 billion total annual resources between 2020 and 2030, of which an estimated 55% would be required for (all) vector control (US$3.7 - US$5.7 billion) (Patouillard et al., 2017, BMJ Global Health) (8). However, it is well known that current spending is far below these requirements: total investments in malaria were estimated to be about US$3.1 billion per year in the last 5 years (World Health Organization, 2022, World Malaria Report 2022) (9).
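The vector-control range quoted above follows directly from applying the 55% share to the two total-resource figures. A short check of that arithmetic, using only the numbers in the response (the variable names are illustrative):

```python
# Figures in US$ billions per year, as quoted in the response.
total_low, total_high = 6.8, 10.3
vector_control_share = 0.55

vc_low = vector_control_share * total_low
vc_high = vector_control_share * total_high

print(f"Vector control: US${vc_low:.1f} - US${vc_high:.1f} billion")
```

Rounded to one decimal place, this gives the stated range of US$3.7 to US$5.7 billion, whose lower end matches the US$3.7 billion maximum ITN budget from the analysis.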

      1. Line 177: should "Figure S7" be bold?

      Yes, this has been corrected.

      1. Line 218: what does "no intervention with high budget" mean? Should this simply be "no intervention"?

      This has been changed.

      p. 9 L14: “We estimated that optimizing ITN allocation to minimize global clinical incidence could, at a high budget, avert 83% of clinical cases compared to no intervention.”

      1. In this reviewer's opinion it would be easier for the reader if the weighting term in the objective function were added in the Materials and Methods section. The weighting could be added without extending the section substantially and the explanation in lines 390-393 may be easier to understand.

      Thank you for this suggestion. We agree and have added this in the main manuscript.

      References

      1. The Global Fund. Description of the 2020-2022 Allocation Methodology. 2019. Available from: https://www.theglobalfund.org/media/9224/fundingmodel_2020-2022allocations_methodology_en.pdf.

      2. Hay SI, Guerra CA, Tatem AJ, Noor AM, Snow RW. The global distribution and population at risk of malaria: past, present, and future. Lancet Infect Dis. 2004;4(6):327-36.

      3. Feachem RGA, Phillips AA, Hwang J, Cotter C, Wielgosz B, Greenwood BM, et al. Shrinking the malaria map: progress and prospects. The Lancet. 2010;376(9752):1566-78.

      4. Sherrard-Smith E, Winskill P, Hamlet A, Ngufor C, N'Guessan R, Guelbeogo MW, et al. Optimising the deployment of vector control tools against malaria: a data-informed modelling study. The Lancet Planetary Health. 2022;6(2):e100-e9.

      5. World Health Organization. WHO Guidelines for malaria, 31 March 2022. Geneva: World Health Organization; 2022. Contract No.: Geneva WHO/UCN/GMP/2022.01 Rev.1.

      6. Griffin JT, Bhatt S, Sinka ME, Gething PW, Lynch M, Patouillard E, et al. Potential for reduction of burden and local elimination of malaria by reducing Plasmodium falciparum malaria transmission: a mathematical modelling study. The Lancet Infectious Diseases. 2016;16(4):465-72.

      7. Bertozzi-Villa A, Bever CA, Koenker H, Weiss DJ, Vargas-Ruiz C, Nandi AK, et al. Maps and metrics of insecticide-treated net access, use, and nets-per-capita in Africa from 2000-2020. Nature Communications. 2021;12(1):3589.

      8. Patouillard E, Griffin J, Bhatt S, Ghani A, Cibulskis R. Global investment targets for malaria control and elimination between 2016 and 2030. BMJ Global Health. 2017;2(2):e000176.

      9. World Health Organization. World malaria report 2022. Geneva: World Health Organization; 2022. Report No.: 9240064893.

    1. Author Response:

      We thank all of you for your constructive and inspiring comments, which will help us substantially improve the final version of the paper. Ahead of our detailed final revision, we are writing this provisional letter as a quick response to the reviewers’ comments.

      We first give a short summary of the public reviews, then respond point by point.

      Editors:

      1. More discussion is needed.

      2. More discussion about eye fixation during adaptation. Discuss why increasing visual uncertainty by blurring the cursor in the present study produces the opposite findings of previous studies (Tsay et al., 2021; Makino et al., 2023).

      3. Discuss the broad impact of the current model.

      4. Share the codes and the metadata (instead of the current data format).

      Response: This is a concise summary of the major concerns listed in the public review. Given that these concerns are easy to address, we are giving a quick but point-by-point response for now. A more detailed version will be included in our formal revision.

      Reviewer 1:

      1) More credit should be given to the PReMo model: a) The PReMo model also proposes that perceptual error drives implicit adaptation, as described in a new publication (Tsay et al., 2023), which was not public at the time of the current writing; and b) The PReMo model can account for some datasets, e.g. Fig 4A.

      Response: We will add this new citation and point out that the new paper also uses the term perceptual error. We will also point out that the PReMo model has the potential to explain Fig 4A, though for now, it assumes an additional visual shift to explain the positive proprioceptive changes relative to the target. We will also expand the discussion comparing the two models.

      2) The present study produced an opposite finding of a previous finding, i.e., upregulating visual uncertainty (by cursor blurring here) decreases adaptation for large perturbations but less so for small perturbations, while previous studies have shown the opposite (by using a cursor cloud; Tsay et al., 2021; Makino et al., 2023). This needs explanation.

      Response: Using the cursor cloud (Tsay et al., 2021, Makino et al., 2023) to modulate visual uncertainty has inherent drawbacks that make it unsuitable for testing the sensory uncertainty effect for visuomotor rotation. For the error clamp paradigm, the error is defined as angular deviation. The cursor cloud consists of multiple cursors spanning over a range of angles, which affects both the sensory uncertainty (the intended outcome) AND the sensory estimate of angles (the error itself, the undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (sigma_v in our model), but it additionally affects the mean of the distribution (mu). This unnecessary confound is avoided by using cursor blurring, which is still a cursor with its center (mu) unchanged from an un-blurred cursor. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2021, the cursor cloud often overlaps with the visual target. This “target hit” would affect adaptation, possibly via a reward learning mechanism (See Kim et al., 2019 eLife). This is a second confound that accompanies the cursor cloud. We will expand our discussion to explain the discrepancy between our findings and previous findings.
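The distinction drawn above between changing sigma and changing mu can be made concrete with a generic precision-weighted combination of two Gaussian cues. This is a minimal sketch under simplifying assumptions, not the authors' model: it combines only a visual and a proprioceptive cue, and the function name and all numerical values (in degrees) are hypothetical.

```python
def combine_cues(mus, sigmas):
    """Precision-weighted combination of independent Gaussian cues.

    Each cue is weighted by 1 / sigma**2, so a noisier cue counts for less.
    """
    precisions = [1.0 / s**2 for s in sigmas]
    total = sum(precisions)
    mu = sum(p * m for p, m in zip(precisions, mus)) / total
    sigma = (1.0 / total) ** 0.5
    return mu, sigma


# Hypothetical cues: visual cursor at 30 degrees, proprioceptive cue at 0 degrees.
# Blurring is modelled as a larger visual sigma with the visual mean unchanged.
sharp_mu, sharp_sigma = combine_cues([30.0, 0.0], [2.0, 4.0])
blurred_mu, blurred_sigma = combine_cues([30.0, 0.0], [8.0, 4.0])
```

With the visual mean held at 30, raising the visual sigma pulls the combined estimate towards the proprioceptive cue. A manipulation that also moved the perceived visual mean would change the first entry of `mus` as well, so the two effects could not be separated.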

      3) The estimation of visual uncertainty (our exp1) required people to fixate on the target, while this might not reflect the actual scenario during adaptation where people are free to look wherever they want.

      Response: Our data show otherwise: in a typical error-clamp setting, people fixate on the target for the majority of the time. For our Exp1, the fixation on the straight line between the starting position and the target is 86%-95% (as shown in Figure S1). We also collected eye-tracking data in our Exp4, which is a typical error-clamp experiment. More than 95% of gaze falls within +/- 50 pixels of the center of the screen, a proportion slightly higher than in Exp1. We will provide this part of the data in the revision. In fact, we designed our Exp1, using carefully executed pilot experiments, to mimic the eye-tracking pattern seen in typical error-clamp learning.

      This high percentage of fixating on the target is not surprising: the error-clamp task requires participants to use their hands to move towards the target and to ignore the cursor. In fact, we would also like to point out that the high percentage of fixation on the aiming target is also true for conventional visuomotor rotation, which involves strategic re-aiming (shown in de Brouwer et al. 2018; Bromberg et al. 2019; we have an upcoming paper to show this). This is one reason that our new theory would also apply to other types of motor adaptation.

      4) More methodology details are needed. E.g., a figure showing the visual blurring, a figure showing individual data, a table showing data from individual sessions, code sharing, and a possible new correlational analysis.

      Response: All of this additional methodological and analysis information will be provided. We had limited ourselves to a short paper, but the revision will be extended to include these details.

      Reviewer 2:

      1) More discussions are needed since the focus of this study is narrowly confined to visuomotor rotation. “A general computational principle, and its contributions to other motor learning paradigms remain to be explored”.

      Response: This is a great suggestion, since we also think our original Discussion did not elaborate on the possible broad impact of our theory. Our model is not limited to the error-clamp adaptation, where the participants were explicitly told to ignore the rotated cursor. The error-clamp paradigm is a rare example in which implicit motor learning can be isolated in a nearly ideal way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is implicitly processed and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating multimodal feedback and motor-related cues (motor prediction cue in our model) when forming procedural knowledge for action control.

      We will propose that the same two principles should be applied to various kinds of motor adaptation and motor skill learning, which together constitute motor learning in general. Most of our knowledge about motor adaptation is from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three types all involve localizing one’s effector under the influence of perturbed sensory feedback, and they also have implicit learning. We believe they can be modeled by variants of our model, or at least that the two principles above should be used to think about their computational nature. For skill learning, especially for de novo learning, the area still lacks a fundamental computational model that accounts for the skill acquisition process on the level of relevant movement cues. Our model suggests a promising route, i.e., repetitive movements with a Bayesian cue combination of movement-related cues might underlie the implicit process of motor skills.

      We will add more discussion on the possible broad implications of our model in the revision.

      Reviewer 3:

      1) Similar to Reviewer 1, raised the concern about whether people’s fixation in typical motor adaptation settings is similar to the fixation that we instructed in our Exp1.

      Response: see above.

      2) Similar to Reviewer 2, the concern was raised about whether our new theory is applicable to a broad context. Especially, error clamp appears to be a strange experimental manipulation that has no real-life appeal, “(i)Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world”.

      Response: For the broad impact of our model, please see our responses to Reviewer 2 above. We agree that ignoring errors (and thus “trying” to suppress adaptation) should not be a movement strategy for real-world intentional tasks. However, even in real life, we constantly attend to one thing while doing another; that is when implicit motor processes are in charge. Furthermore, it is this exact “ignoring” instruction that elicits the implicit adaptation that we can work on. In this sense, the error-clamp paradigm is a great vehicle to isolate implicit adaptation and allows us to unpack its cognitive mechanism.

      3) In Exp1, the 1s delay between the movement end and the presentation of the reference cursor might inflate the actual visual uncertainty.

      Response: The 1s delay of the reference cursor would not inflate the estimate of visual uncertainty. Our Exp1 used a paradigm similar to those used in vision science (e.g., White, Levi, and Aitsebaomo, Vision Research, 1992), where delay has been shown not to produce an obvious increase in visual uncertainty over a broad range of values (from 0.2s to >1s, see their Figures 5-6). We will add more methodological justification in our revision.

      4) Our Fig4A used Tsay et al., 2021 data, which, in the reviewer’s view, is not an appropriate measure of proprioceptive bias. The reason is that in this dataset, “participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to.”

      Response: We agree that the Tsay et al., 2021 study used an unconventional way to measure the influence of implicit adaptation on proprioception. Moreover, their observed “proprioceptive changes” should not be called “proprioceptive bias”, a term conventionally reserved for the difference between the estimated and the actual hand location (ideally measured with a passively moved hand). However, we think their dataset is still subject to the same Bayesian cue combination principle and thus can be modeled. Our modeling of this dataset includes all relevant cues: the implicitly perceived hand position and the proprioceptive cue (given that the hand stays at the movement end). Both cues are expressed in extrinsic coordinates, which happen to take the target position as zero. But where the zero is set (whether at the target or at the actual hand location) does not matter for the model fitting. Note that our Exp4 is also based on PEA modeling of proprioceptive bias, and this time the data are presented relative to the actual hand location.

      In the revision, we will keep the current Fig4A and refer to the data as proprioceptive change rather than proprioceptive bias, to follow convention.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In no particular order:

      1. In Figs S3 and S4, can they also show gamma fit? (or rather corrected fit accounting for abundance conditioning?) The shapes look different, especially for the microbial mat.

      Author response: We have added gamma distribution fits to the rescaled AFD plots (Figs. S3, S4).

      1. Lines 170-176 seem like they should come before lines 164-166.

      Author response: In lines 166-170 we discuss empirical patterns in the data that motivate the introduction of the SLM as a model in lines 170-175. We have clarified these points in the revision.

      1. The wiggles in the gamma predictions in the occupancy-abundance plots are because occupancy depends not only on abundance but also on the shape parameter, right? Probably good to write a sentence or two explaining what's going on here.

      Author response: We agree with the reviewer that the variation in the prediction could be in-part driven by variation in the shape parameter across community members. We now include this observation in our revision (lines 209-211).

      1. In the predicted vs observed occupancy plots, it would be nice to add curves showing predicted standard deviation or similar to give a sense of how well the model is predicting the variability.

      Author response: In the revised manuscript we now include predictions for the variance of occupancy using the gamma distribution under both taxonomic and phylogenetic coarse-graining (Fig. S9; S10; lines 211-214).

      1. Covariance between sister groups: Figs S9 and S10 look very nice, but it's hard to see much because they're log-log plots over multiple decades, while even a several-fold difference from y = x would indicate a strong effect of correlations. It would be clearer if the y-axis showed the ratio of the coarse-grained variance to the sum of OTU variances and we were looking at how well it fit y = 1.

      Author response: We have included these plots in the revision (Fig. S14, S15).
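      The ratio the reviewer asks for can be sketched in a few lines. This is a minimal illustration, not the authors' plotting code: the function name and the table layout (one row per OTU, one column per sample) are assumptions. If member OTUs are uncorrelated, the variance of the coarse-grained abundance equals the sum of the OTU variances and the ratio is 1; positive covariance pushes it above 1 and negative covariance below 1.

```python
from statistics import pvariance


def variance_ratio(otu_abundances):
    """Variance of the coarse-grained abundance divided by the sum of OTU variances.

    otu_abundances: list of rows, one row per OTU in the group and one
    entry per sample (hypothetical layout chosen for this sketch).
    """
    n_samples = len(otu_abundances[0])
    # The coarse-grained abundance in each sample is the sum over member OTUs.
    coarse = [sum(row[s] for row in otu_abundances) for s in range(n_samples)]
    sum_of_variances = sum(pvariance(row) for row in otu_abundances)
    return pvariance(coarse) / sum_of_variances


# Toy groups of two OTUs observed in four samples.
uncorrelated = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 1.0, 3.0]]
positively_correlated = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
negatively_correlated = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]

for name, group in [("uncorrelated", uncorrelated),
                    ("positive", positively_correlated),
                    ("negative", negatively_correlated)]:
    print(name, variance_ratio(group))
```

      Plotting this ratio against y = 1 on a linear axis makes a several-fold departure visible, which a log-log plot spanning many decades tends to hide.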

      1. If the sum of gammas can be well-approximated by a gamma, does that mean that the gamma is just a fairly flexible distribution and we shouldn't take the quality of the gamma fits in general as a very specific indication of what's going on?

      Author response: While the sum of random variables that are drawn from gamma distributions with different parameters is often well-approximated by another gamma, this does not tell us why the gamma distribution holds for microbial communities at the finest-grain level (i.e., OTUs/ASVs). At present, the best explanation is that the gamma is a stationary distribution for certain stochastic differential equations which have ecological interpretations (Grilli, 2020; Shoemaker et al., 2023). Furthermore, alternative two-parameter distributions have been tested alongside the gamma and have done a comparatively poor job capturing observed macroecological patterns (Grilli, 2020). These results suggest that the utility of the gamma distribution is not simply an outcome of its flexible nature; it succeeds because it has captured core ecological properties of microbial communities. In the case of the SLM, gamma-like distributions arise when a community member is subject to self-limiting growth and environmental noise. On the other hand, the stability of the gamma distribution might explain why it can be detected as the shape of the AFD, as it does not fade out across coarse-graining levels.
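      To make the link between the SLM and the gamma distribution concrete, the sketch below integrates a single-taxon SLM with the Euler-Maruyama scheme and compares the long-run mean and shape with the gamma stationary values. It assumes the SLM in the form dx/dt = (x/τ)(1 − x/K) + sqrt(σ/τ)·x·η(t), interpreted in the Itô sense, for which the stationary distribution is a gamma with mean K(1 − σ/2) and shape 2/σ − 1. The parameter values, step size, and function name are illustrative choices, not those used in the manuscript.

```python
import math
import random


def simulate_slm(K=1.0, sigma=0.5, tau=1.0, dt=0.01,
                 n_steps=200_000, burn_in=20_000, seed=1):
    """Euler-Maruyama integration of a single-taxon stochastic logistic model.

    Returns the abundances recorded after the burn-in period.
    All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = K
    noise_scale = math.sqrt(sigma / tau * dt)
    samples = []
    for step in range(n_steps):
        drift = (x / tau) * (1.0 - x / K) * dt          # self-limiting growth
        noise = noise_scale * x * rng.gauss(0.0, 1.0)   # multiplicative environmental noise
        x += drift + noise
        if step >= burn_in:
            samples.append(x)
    return samples


K, sigma = 1.0, 0.5
samples = simulate_slm(K=K, sigma=sigma)
empirical_mean = sum(samples) / len(samples)
empirical_var = sum((s - empirical_mean) ** 2 for s in samples) / len(samples)
empirical_shape = empirical_mean ** 2 / empirical_var   # method-of-moments shape

predicted_mean = K * (1.0 - sigma / 2.0)
predicted_shape = 2.0 / sigma - 1.0

print("mean:  simulated", round(empirical_mean, 3), "gamma", predicted_mean)
print("shape: simulated", round(empirical_shape, 2), "gamma", predicted_shape)
```

      The simulated values are noisy estimates from one trajectory, so they should be read as approximately matching the gamma values rather than reproducing them exactly.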

      1. What's going on with the variance of diversity in Fig S12? Does this suggest that some of the problem in Figure 4 could be with the analytic approximation rather than the model? I had a hard time understanding the part of the Methods explaining the simulation details (lines 587-597). It would be worth expanding this. Is there some way to explain how the correlations were simulated in terms of the SLM, e.g., correlations in the noise term across OTUs?

      Author response: We believe that deviations in the variance of diversity in Fig. S16g,h are driven by small deviations in our predictions of the second moment $$\left\langle (x \ln x)^{2} \mid N_{m}, \bar{x}_{i}, \beta_{i}^{2} \right\rangle$$ (Eq. S16). Individually these deviations are slight, but their effects become noticeable when summed over hundreds or thousands of taxa. We have included this observation in the revised manuscript (lines 268-271). However, this deviation pales in comparison with the magnitude of covariance in the empirical data, suggesting that our inability to predict the variance of richness and diversity is primarily driven by our assumption of statistical independence.

      Regarding the source of the correlations, under the SLM correlations in abundances can be introduced either by adding deterministic interaction terms or through correlated environmental noise. Determining which of these two options drives empirical correlations is an active area of research (e.g., Camacho-Mateu et al., 2023). For the purpose of this study, we remain agnostic on the cause of the correlations, opting instead to emphasize that the inclusion of correlations is necessary to reproduce observed slopes of the fine vs. coarse-grained relationship for diversity.

      1. In Figure 5ab, is the idea that the correlation in richness is primarily driven by the number of samples from the environment? Line 390 seems to say so, but it would be good to make this explicit and put it right in that section of the Results.

      Author response: Our results suggest that sampling effort (# reads) plays a larger role in determining the correlations between fine and coarse-grained measures of richness. We now clarify this point in the revised manuscript (lines 429-435).

      1. I don't totally understand the contrast in lines 369-372. If fine-scale diversity within one group begets coarse-grained diversity in another group, couldn't that show up as correlations in the AFDs? Or is the argument that only including within-group correlations in AFDs is enough to reproduce the pattern? I'm not sure I see how that could be.

      Author response: The term “begets” implies both causation and direction. If we see a positive relationship between diversity estimates at two different scales of observation, the causal mechanism cannot be determined solely from correlations between samples obtained once from different sites. So, mechanisms consistent with niche construction/"DBD" can produce correlations, though the existence of correlations does not necessarily imply DBD.

      1. The discussion of niche construction on 429-431 doesn't match very well with 440-441. Basically, niche construction is a very broad concept, not a specific one, right?

      Author response: In lines 472-476 (formerly 429-431) we discuss how the existence of correlations between fine and coarse-grained scales does not point to a single ecological mechanism. Alternatively stated, observing a non-zero slope does not mean that niche construction is driving the relationship.

      In lines 476-487 (formerly 440-441) we discuss how the mechanism of cross-feeding has been shown to generate a positive relationship between fine and coarse-grained measures of diversity. This mechanism can be interpreted as a form of “niche construction”, so it is an instance of a tested ecological mechanism that aligns with the interpretation given in Madi et al. (2020).

      1. Isn't (8) just the negative binomial distribution?

      Author response: The convolution of the stationary solution of the SLM (i.e., a gamma distribution) and the Poisson limit of a multinomial sampling distribution returns a negative binomial distribution of read counts across hosts if samples have identical sampling depths. We now include this detail in the revision (lines 593-595). Note, however, that if different samples have different sampling depths, the distribution of reads across samples is not a negative binomial.
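      The gamma-Poisson result can be checked numerically. The sketch below assumes the gamma is parameterized by its mean relative abundance and a shape parameter (shape = mean²/variance), and that reads are Poisson with rate equal to depth × abundance. Integrating out the abundance then gives a negative binomial, evaluated here in log space. The function and argument names are hypothetical and not taken from the manuscript.

```python
import math


def gamma_poisson_pmf(n, depth, mean_abund, shape):
    """P(n reads) for abundance ~ Gamma(shape, scale=mean_abund/shape)
    and reads ~ Poisson(depth * abundance): a negative binomial."""
    rate = depth * mean_abund  # expected number of reads
    log_p = (math.lgamma(shape + n) - math.lgamma(n + 1) - math.lgamma(shape)
             + n * math.log(rate / (shape + rate))
             + shape * math.log(shape / (shape + rate)))
    return math.exp(log_p)


depth, mean_abund, shape = 10_000, 1e-3, 0.5
rate = depth * mean_abund

# Sum far enough into the tail that the truncated mass is negligible.
support = range(3000)
pmf = [gamma_poisson_pmf(n, depth, mean_abund, shape) for n in support]
total = sum(pmf)
mean_reads = sum(n * p for n, p in zip(support, pmf))
var_reads = sum((n - mean_reads) ** 2 * p for n, p in zip(support, pmf))

print("total probability:", total)
print("mean reads:", mean_reads, "expected:", rate)
print("variance:", var_reads, "expected:", rate + rate ** 2 / shape)
```

      The variance exceeds the Poisson value by rate²/shape, which is the overdispersion contributed by the gamma. When depths differ between samples, the pooled read counts follow a mixture of such distributions, one per depth, rather than a single negative binomial.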

      1. Missing 1/M in (9).

      Author response: We have fixed this omission in the revision.

      1. Schematic figures illustrating what the different statistics are intuitively capturing would really help this work be understandable to a broader audience, but they'd also be a ton of work.

      Author response: Richness and diversity are used in ecology to such an extent that we do not see the benefit of a conceptual diagram. Furthermore, we have included a conceptual diagram about our pipeline in our revision at the request of Reviewer 2 (Fig. S20).

      Reviewer #2:

      Major Recommendations

      If I were reviewing this manuscript for a regular journal, I believe the following issues would be important to address prior to publication.

      1. From my reading, the main points of this advance are that

      a. SLM models AFDs well at all levels of coarse-graining.

      b. This makes SLM a better null-model than UNTB for macroecological relationships.

      c. Using SLM on the EMP data, the richness slopes are well explained by SLM but not the diversity slopes. Therefore, any theory that hopes to explain the diversity slopes must include interactions. Argument B appears to be one of the key points yet is missing from the abstract, and should be made clearer. If these aren't the main points the authors intended, then other main points need to be highlighted more.

      Author response: In the revision we now explicitly mention argument b in the Abstract.

      1. The title should be more specific, so as to better reflect the content. (E.g. "UNTB is not a good null model for macroecological patterns" would seem more appropriate.)

      Author response: We would prefer to focus on the success of the SLM rather than the limitations of the UNTB in the title of this work. Therefore, we have modified our title as follows: “Investigating macroecological patterns in coarse-grained microbial communities using the stochastic logistic model of growth”.

      1. The manuscript would benefit from a clearer description of exactly what information the SLM retains about the data (perhaps even a cartoon panel in one of the figures). In particular, it is important to be explicit about the number of model parameters.

      Author response: The number of model parameters for the gamma AFD are now explicitly stated in the revision (Lines 579-580).

      1. The main point of Figures 2-4 seems to be that SLM is good at describing the data (and when it fails it is due to interactions) while UNTB fails to reproduce this behavior, in support of Argument B. This is not clear from the figure descriptions or titles, which focus on SLM's "predictive" power.

      Author response: Fig. 2a demonstrates that the gamma distribution predicted by the SLM explains the empirical distribution of abundances. This result provides motivation to predict the fraction of sites harboring a given community member (i.e., occupancy, Fig. 2c) as well as general measures of community composition including mean richness (Fig. 3a,c) and mean diversity (Fig. 3b,d) using parameters estimated from the data (not free parameters).

      This success led us to consider whether the gamma distribution could predict the variance of richness and diversity, which it could not because it does not capture covariance between community members (Fig. 4).

      In the revision we have identified opportunities to make these points clear throughout the Results. Furthermore, we have added additional detail to the legends of Figs. 2-4.

      1. The manuscript would benefit from clarifying the use of "prediction" related to the SLM. Since the gamma distributions predicted by SLM were fit to empirical data, it seems like the agreement between analytic means and empirical means (Fig. 3) is a statement on gamma distributions being a good fit for the AFD's more than SLM predicting richness and diversity. For example, from my reading, it seems like this analysis could be done numerically by shuffling species abundances across environments and seeing whether this changed the mean richness/diversity. I would not call this shuffling test a prediction, since it is more a statement on the relevance of interactions. SLM predicts gamma-distributed AFD's, but those distributions recovering the data they were trained on doesn't seem like a prediction.

      Author response: In this manuscript we identified the gamma distribution as an appropriate probability distribution to describe the distribution of relative abundances across samples over a range of coarse-grained scales. Motivated by this result, we performed a separate analysis where at each scale we estimated the mean and variance of relative abundance across sites for each community member. We then used these parameters to obtain the expected value of a community-level measure using an equation we derived by assuming that the gamma distribution was appropriate (e.g., richness, Eq. 13). We then compared the expected value of richness to the mean value from empirical data and assessed the similarity between the two values.

      The outcome of this procedure constitutes a prediction. While the mean and variance are parameters, estimating them from the empirical data has no connection with the operation of training a distribution on empirical data. We could have derived predictions such as Eq. 13 using any other probability distribution that can be parameterized using the mean and variance (e.g., Gaussian). Such a prediction would likely do a poor job even though it used the same means and variances used for our gamma predictions. This is because the choice of distribution would not have been a good descriptor of the distribution of abundances across hosts.

      To better explain this last -- perhaps the most significant -- issue, I'd like to ask the authors if the following recasting would be an accurate reflection of their conclusions, or if something is missing.

      1. "Focusing on the empirical relationship observed between diversity slopes by Madi 2020, we ask the question: does explaining these relationships require accounting for species-species correlations? Or could it be reproduced in a noninteracting model?" To address this question, one can perform a randomization test, shuffling abundances to preserve all single-OTU statistics but breaking any correlations. My reading of the authors' results is that (new result 1) the richness relationships would be preserved, while diversity relationships would not be preserved. [Note that this result 1 need not mention either SLM or UNTB.]

      Author response: The question of whether correlations between species are necessary to explain the observed slope of the fine vs. coarse-grained relationship was only one component of our research goals. Our first question was whether the SLM would prove to be a more appropriate null for evaluating the novelty of observed slopes. We believe that our results support the conclusion that the SLM is an appropriate null for this question, as it was able to capture observed slopes of the fine vs. coarse-grained relationship for estimates of richness, determining that correlations and the interactions that are ultimately responsible are not necessary to explain this result.

      We then find that the SLM as a null model fails to capture observed slopes of the fine vs. coarse-grained relationship for estimates of diversity and simulate the SLM with correlations to return reasonable estimates of the slope. However, here the question about correlations is a direct follow-up from our question about a null model that excludes interactions, so it is unclear how a randomization test would relate to this result.
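      For readers unfamiliar with the randomization test the reviewer describes, a minimal sketch follows. It is not part of the authors' analysis: the function name and the table layout (one row per OTU, one column per sample) are assumptions. Each OTU's abundances are permuted across samples independently, which preserves every single-OTU distribution while breaking correlations between OTUs.

```python
import random


def shuffle_within_otus(abundance_table, seed=0):
    """Return a copy of the table with each OTU's abundances permuted across samples.

    abundance_table: list of rows, one row per OTU, one entry per sample
    (hypothetical layout). The input is left unmodified.
    """
    rng = random.Random(seed)
    shuffled = []
    for row in abundance_table:
        permuted = list(row)      # copy so the original row is untouched
        rng.shuffle(permuted)     # independent permutation for this OTU
        shuffled.append(permuted)
    return shuffled


table = [
    [0.50, 0.10, 0.00, 0.20],
    [0.30, 0.05, 0.00, 0.10],
    [0.00, 0.40, 0.60, 0.10],
]
null_table = shuffle_within_otus(table, seed=42)
for original, permuted in zip(table, null_table):
    print(original, "->", permuted)
```

      Note that permuting raw relative abundances means the columns of the shuffled table no longer sum to the original per-sample totals, so any statistic that depends on sample totals or sampling depth would need separate handling.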

      1. Instead of doing a randomization test (resampling the empirical distribution), one might insist on instead fitting a model to the AFD distributions, and sampling from that distribution rather than the empirical one.

      a. If doing it this way, one should of course ensure that the distribution being fit is a good description of the data.

      b. UNTB is a bad fit. SLM is a better fit, and in fact (new result 2) continues to be a good empirical fit even at coarse-grained levels.

      c. Can make statements on using SLM as a null model for these types of cross-scale relationships. Could try arguing that fitting an SLM model per-OTU (instead of resampling the empirical distribution) could offer some advantage if certain properties could be computed analytically from the fit parameters, instead of averaging over multiple computational rounds of resampling.

      Do these two points accurately summarize the manuscript? If so, this presentation avoids the confusion with "prediction". If my summary is missing some important point, the presentation should be revised to clarify the points I appear to have missed.

      Author response: In our manuscript we derive predictions from the gamma distribution, the stationary distribution of the SLM, that require parameters estimated from the data (i.e., mean and variance of relative abundance). These parameters are estimated from the data using normal procedures and then plugged into our predictions that assume the appropriateness of the gamma, returning values that are then compared to estimates from empirical data. Our estimation of the mean and variance does not assume that the empirical distribution follows a gamma distribution, but the value returned by our function derived from the gamma distribution (e.g., Eq. 13) does make that assumption.

      To address the reviewer’s broader comment, we believe that the following points summarize our manuscript:

      1. The gamma distribution as a stationary solution of the SLM captures macroecological patterns and predicts typical community-level properties (i.e., mean richness and diversity) across phylogenetic and taxonomic scales.

      2. The gamma distribution fails to predict variation in community-level properties (i.e., variance of richness and diversity) across phylogenetic and taxonomic scales. This occurs because the SLM is a mean-field model that does not explicitly include interactions between community members.

      3. Despite the inability to capture interactions, the gamma distribution succeeds at predicting the fine vs. coarse-grain slope for richness, a pattern that had previously been attributed to community member interactions. This result demonstrates that the novelty of a macroecological pattern hinges on one’s choice of null model.

      4. However, the gamma cannot capture the same relationship for diversity. Simulations of the gamma distribution that incorporate correlations between community members are capable of generating reasonable estimates of the slope.

      To address the reviewer’s comments regarding the appropriateness of fitted gamma distributions, in our revision we have added fitted gamma distributions to plots of AFDs so that the reader can visually assess the ability of the gamma to describe empirical patterns (Fig. S3, S4).

      We have also obtained predictions for the slope of the fine vs. coarse-grained relationship for community richness using the same form of UNTB used by Madi et al (2020). In our revised manuscript we establish a procedure to infer the single parameter of this model, generate predictions of richness at fine and coarse-grained scales, and then evaluate whether the UNTB is capable of predicting the slope of the fine vs. coarse-grained relationship for richness (Supplementary Information; Figs. S18, 24-28; lines 277-278; 370-380).

      Other/minor comments

      1. The manuscript would be improved with more consistent terminology ("fine vs. coarse-grained relationship"/"the relationship" vs. "diversity slope"). Also, many readers may be used to OTUs referring to the rather fine level of description, as opposed to any chosen level; and could interpret indexing over groups as being in contrast with indexing over OTU's (coarse vs fine). The authors' use is perfectly correct, but keeping a consistent terminology would help.

      Author response: We have revised our manuscript to specify the “slope” as the “slope of the fine vs. coarse-grained relationship” (e.g., Line 318). We also specify in the Results and in the Methods that we use “fine” and “coarse” as relative terms, keeping with the sliding-scale approach used in Madi et al (2020).

      1. While I appreciate this "slope" is something borrowed from other work, the clarity of the paper might benefit from a cartoon of how one goes from the raw data to the slopes at a particular coarse-graining level. (Optional).

      Author response: We have added a conceptual diagram to the revision (Fig. S20).

      1. The text often colloquially references "the gamma," "predictions of the gamma," etc. This phrasing comes across as sloppy, and the manuscript would be improved by being more specific.

      Author response: We now specify “gamma” as the “gamma distribution” throughout the manuscript.

      1. Equation 6 appears to be missing some subscripts on the x terms (included on the left of the equation).

      Author response: We thank the reviewer for noticing this error and we have corrected it in the revision.

      1. In "Simulating communities of correlated...AFDs", the acronym SAD is not defined.

      Author response: We thank the reviewer for noticing this error and we have corrected it in the revision.

      1. In Figure 2:

      a. Invariant is probably the wrong word for the title, since all the AFD's were rescaled by mean and variance before being compared. Data does support that the gamma distributions are good at describing the AFD's, but as stated in the description it's the general shape that is preserved, not the distribution itself.

      Author response: When we mention the invariance of the AFD we now specify that we mean that the shape of the distribution remained qualitatively invariant.

      b. I'd recommend changing the color coding to something with more contrast, since currently it's impossible to assess the claim that the shape of the distribution collapses.

      Author response: Our coarse-graining procedure is a sequential operation that has no intuitive point that would suggest the use of a contrasting colormap (e.g., if our scale ranged from -1 to 1 then there would be a natural point of contrast at zero).

      c. The legend is missing relevant technical details: How many OTU's were used to make plot a? How many samples?

      Author response: The number of samples was listed in the Materials and Methods (line 523). In the revision we now include a table with the average and total number of OTUs as well as the average number of reads for each environment (Table S1, S2).

      d. In plot b, is the mean relative abundance referring to "mean abundance when observed" or "mean across all samples"?

      Author response: The mean relative abundance is the mean abundance across all sites, as stated in the text (line 204) and in the legend of Fig. 2.

      e. Since one argument here is that SLM fits these distributions better than UNTB, if possible it would be nice to see UNTB's failed fits here.

      Author response: A major feature of the UNTB is that the demographic parameters of community members are indistinguishable. Under the SLM, the variation in the mean relative abundance we observe suggests that the carrying capacities of community members vary over multiple orders of magnitude, a result that is incompatible with most forms of the UNTB (x-axis of Fig. 2b). We now mention this point in the revised manuscript (lines 110; 229; 455-471).

      1. In Figure 3:

      a. It is not clear how coarse-graining is included in model fitting. The "Deriving biodiversity measure predictions" section would benefit from including how coarse-graining is incorporated.

      Author response: We predict measures of biodiversity separately at each coarse-grained scale. We now clarify this detail in the revised manuscript (Lines 624-627).

      b. Reference Shannon Diversity in Methods.

      Author response: We now cite Shannon’s diversity.

      c. What is the blue/white color coding in plots a & c? It doesn't have any color key.

      Author response: Figs. 3-6 use a uniform light-to-dark scale for all environments, with each environment having its own color. For example, Fig. 3a contains data from the human gut microbiome. Human gut data were assigned the color aquamarine, so the shade of aquamarine for a given datapoint in Fig. 3a indicates the phylogenetic scale.

      In the revision we now clarify the colorscale in the legend of Fig. 3 and specify that the same scale is used in all subsequent figure legends.

      d. Re: earlier comments, why is richness considered a prediction? (Am I correct in my interpretation that panel b is almost a tautology - counting the number of zeros in the matrix either by rows or by columns - whereas panel d is nontrivial?)

      Author response: Mean richness as a measure of biodiversity depends on the fraction of sites where a given community member is present (i.e., occupancy). The mean relative abundance of a community member and its variation across sites (beta) is clearly related to occupancy, but those two statistics do not give you a prediction of occupancy. Obtaining a prediction of occupancy and, subsequently, richness, requires 1) a probability distribution of abundances (i.e., the gamma) and 2) a probability distribution of sampling (i.e., the Poisson). Using these two pieces of information, we derived a prediction for mean richness (Eq. 13). We then compare the value of richness obtained by plugging in the mean relative abundances, betas, and known number of reads to the observed mean richness obtained from the data.
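      The two ingredients named above, a gamma distribution of abundances and Poisson sampling, are enough to write down occupancy and mean richness in closed form. The sketch below assumes the gamma is parameterized by mean relative abundance and shape β, so that the probability of observing zero reads at depth N is (1 + x̄N/β)^(−β). It illustrates the logic only and is not claimed to reproduce Eq. 13 of the manuscript term for term; the function names and toy inputs are hypothetical.

```python
import math


def predicted_occupancy(depth, mean_abund, shape):
    """Probability that a community member has at least one read in a sample,
    under gamma-distributed abundance and Poisson sampling."""
    prob_absent = (1.0 + mean_abund * depth / shape) ** (-shape)
    return 1.0 - prob_absent


def predicted_mean_richness(depths, mean_abunds, shapes):
    """Expected richness averaged over samples: for each sample, sum the
    presence probabilities of all members, then average over the samples."""
    per_sample = [
        sum(predicted_occupancy(depth, m, b) for m, b in zip(mean_abunds, shapes))
        for depth in depths
    ]
    return sum(per_sample) / len(per_sample)


# Toy inputs: two samples with different read depths, two community members.
depths = [1_000, 10_000]
mean_abunds = [0.01, 0.001]
shapes = [1.0, 1.0]

richness = predicted_mean_richness(depths, mean_abunds, shapes)
print("predicted mean richness:", richness)

# With a very large shape the gamma collapses to a point mass at the mean,
# so occupancy approaches the pure Poisson value 1 - exp(-mean * depth).
poisson_limit = 1.0 - math.exp(-2.0)
print(predicted_occupancy(2_000, 0.001, 1e6), poisson_limit)
```

      Because the inputs are the mean, the shape (equivalently the variance), and the known read depths, the comparison with observed richness tests whether the gamma plus Poisson description is adequate, which is the sense in which the response calls it a prediction.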

      e. The lettering of subplots in Figure 3 is not consistent with Figure 4. Figure 3 subplots are also cited incorrectly in paragraph two on page six (lines 251-254).

      Author response: We thank the reviewer for noticing the error and we have corrected it in the revision.

      f. Again, if possible show UNTB predictions in plots a & c.

      Author response: In our revised manuscript we provide extensive descriptions and predictions of mean richness and the slope of the fine vs. coarse-grained relationship for richness using the form of the UNTB used in Madi et al. (2020; Figs. S18, S24 - S29; lines 277-282; 370-380). We then compare the error of these slope predictions to those obtained from the SLM, finding that the SLM generally outperforms UNTB (Figs. S27-S29).

      1. In Figure 4:

      a. What are the color codings in plots a & b?

      Author response: The color scale used in Fig. 4 is identical to the color scale used in Fig. 3. This detail is now specified in the legend of Fig. 4.

      b. What are the two lines of empirical data in plots a & b, and why is one of them dashed?

      Author response: We now specify what the two lines mean in the key within the figure.

      c. Same comment as earlier on predictions and richness.

      Author response: We now specify what the two lines mean in the key within the figure.

      1. In Figure 5:

      a. It wasn't clear to me in the manuscript how the authors generated these plots from the raw data. The manuscript would benefit from a clear cartoon/description of the data pipeline, from raw data to empirical (and analytic) slopes.

      Author response: We have added a conceptual diagram to the revised manuscript (Fig. S20).

      b. Make the figure title more descriptive to better connect it to the figure's objective (the richness slopes relationship is not novel, but the diversity slopes relationship is).

      Author response: We have revised the figure title.

      References

      Camacho-Mateu, J., Lampo, A., Sireci, M., Muñoz, M. Á., & Cuesta, J. A. (2023). Species interactions reproduce abundance correlations patterns in microbial communities (arXiv:2305.19154). arXiv. https://doi.org/10.48550/arXiv.2305.19154

      Grilli, J. (2020). Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1), 4743. https://doi.org/10.1038/s41467-020-18529-y

      Madi, N., Vos, M., Murall, C. L., Legendre, P., & Shapiro, B. J. (2020). Does diversity beget diversity in microbiomes? eLife, 9, e58999. https://doi.org/10.7554/eLife.58999

      Shoemaker, W. R., Sánchez, Á., & Grilli, J. (2023). Macroecological laws in experimental microbial systems (p. 2023.07.24.550281). bioRxiv. https://doi.org/10.1101/2023.07.24.550281

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thorough assessment of our study, their overall enthusiasm, and the helpful suggestions for clarifying the methods and results, additional analyses, and discussion points. We have made earnest efforts to address the weaknesses raised in the public review and other recommendations made by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Herein, Blaeser et al. explored the impact of migraine-related cortical spreading depression (CSD) on the calcium dynamics of meningeal afferents that are considered the putative source of migraine-related pain. Critically, previous studies have identified widespread activation of these meningeal afferents following CSD; however, most studies of this kind have been performed in anesthetized rodents. By conducting a series of technically challenging calcium imaging experiments in conscious head-fixed mice, they find in contrast that a much smaller proportion of meningeal afferents are persistently activated following CSD. Instead, they identify that post-CSD responses are differentially altered across a wide array of afferents, including increased and decreased responses to mechanical meningeal deformations and activation of previously non-responsive afferents following CSD. Given that migraine is characterized by worsening head pain in response to movement, the findings offer a potential mechanism that may explain this clinical phenomenon.

      Strengths:

      Using head fixed conscious mice overcomes the limitations of anesthetized preps and the potential impact of anaesthesia on meningeal afferent function which facilitated novel results when compared to previous anesthetized studies. Further, the authors used a closed cranial window preparation to maximize normal physiological states during recording, although the introduction of a needle prick to induce CSD will have generated a small opening in the cranial preparation, rendering it not fully closed as suggested.

      Weaknesses:

      Although this is a well-conducted, technically challenging study that has added valuable knowledge on the response of meningeal afferents, the study would have benefited from the inclusion of more female mice. Migraine is a female-dominant condition and an attempt to compare potential sex differences in afferent responses would undoubtedly have improved the outcome.

      Our study included only two females, largely reflecting the much higher success rate of AAV-mediated meningeal afferent GCaMP expression in males than in females. The reason for the lower yield in female mice is unclear to us at present but may involve, at least partly, sex-specific differences in the mechanisms responsible for efficient transduction with this AAV vector observed in peripheral tissues (Davidoff et al. 2003). While our study did not address sex differences, a recent study (Melo-Carrillo et al. 2017) reported CSD equally activating and sensitizing second-order dorsal horn neurons that receive input from meningeal afferents in male and female rats.

      The authors imply that the current method shows clear differences when compared to older anaesthetized studies; however, many of these were conducted in rats and relied on recording from the trigeminal ganglion. Inclusion of a subgroup of anesthetized mice in the current preparation may have helped to answer the outstanding question of whether the difference is species-dependent or a result of the different technical approaches.

      We have tried to address the anesthesia issue by conducting imaging sessions in several isoflurane-anesthetized mice. However, during these experiments, we observed a substantial decrease in the GCaMP fluorescence signal with a much lower signal-to-noise ratio that made the analyses of the afferents’ calcium signal unreliable. Reduced GCaMP signal in meningeal axons during anesthesia may be related to the development of respiratory acidosis, since lower pH leads to decreased GCaMP signal, as also mentioned by Reviewer #3. Of note, urethane anesthesia, which was used in all previous rat experiments, also produces respiratory acidosis.

      The authors discuss meningeal deformations as a result of locomotion; however, despite referring to their previous work (Blaeser et al., 2022), the exact method of how these deformations were measured could be clearer. It is challenging to imagine that simple locomotion would induce such deformations and the one reference in the introduction refers to straining, such as cough that may induce intracranial hypertension, which is likely a more powerful stimulus than locomotion.

      As part of the revision, we now provide a better description of the methodology (“Image processing and calcium signal extraction” section) used to determine meningeal deformations, including scaling, shearing, and Z-shift. In our previous paper (Blaeser et al. 2023), we provided an extensive description of the types of meningeal deformations occurring in locomoting mice. It should also be noted that locomotion drives cerebral vasodilation and intracranial pressure increases (Gao and Drew, 2016), which likely mediate, at least in part, the movement of the meninges towards the skull (positive Z-shift) and potentially other meningeal deformation parameters. We also agree with the reviewer that sudden maneuvers such as coughing and sneezing that lead to a larger increase in intracranial pressure are likely to be even more powerful drivers of endogenous intracranial mechanical stimulation than locomotion. Thus, our finding of increased responsiveness to locomotion-related meningeal deformation post-CSD may underestimate the increased afferent responsivity post-CSD during other behaviors such as coughing. We added this point to the discussion.

      More recently, several groups have used optogenetic triggering of CSD to avoid opening of the cranium for needle prick. Given the authors robustly highlight the benefit of the closed cranium approach, would such an approach not have been more appropriate.

      We agree with the reviewer that optogenetic methods used for CSD induction in non-craniotomized animals will further ensure accurate pressurization and, thus, will be an even better approach that avoids the burr hole used for pinprick. It should be noted, however, that the burr hole used for the pinprick likely had a minimal effect on intracranial pressure, as we minimized depressurization by plugging the burr hole throughout the experiments with a silicone elastomer. We have added this information to the revised Methods section.

      It is also worth noting that the optogenetic methodology used by others to provoke CSD was optimized only recently and relies on transgenic mice with a strong expression of YFP (Thy1.ChR2-YFP mice) within the superficial cortex that is not compatible with GCaMP imaging of meningeal afferents. Modifications using red-shifted opsins may allow the use of this strategy in the future.

      It was not clear how deformations predictors increased independent of locomotion (Figure 4D) as locomotion is essentially causing the deformations as noted in the study. This point was not so clear to this reviewer.

      As noted in our previous paper (Blaeser et al., 2023), deformation variables often exhibit different time courses than locomotion, even when a deformation is initially induced by the onset of locomotion. Most notably, the scaling-related deformation ramps up slowly and often persists for tens of seconds after the onset and termination of locomotion, which may be related to the recovery dynamics of the meningeal vascular response to locomotion. Overall, while locomotion serves as a predictor of meningeal deformation, we observed previously (Blaeser et al. 2023) many afferents whose responses were more closely associated with the moment-to-moment deformations than with the state of locomotion per se, suggesting that a unique set of stimuli is responsible for the activation of this deformation-sensitive afferent population. The increased sensitivity to deformation signals we observed following CSD suggests that the afferent population sensitive to deformation has unique properties that render it most susceptible to becoming sensitized following CSD. We now discuss this possibility.

      Reviewer #2 (Public Review):

      This is an interesting study examining the question of whether CSD sensitizes meningeal afferent sensory neurons leading to spontaneous activity or whether CSD sensitizes these neurons to mechanical stimulation related to locomotion. Using two-photon in vivo calcium imaging based on viral expression of GCaMP6 in the TG, awake mice on a running wheel were imaged following CSD induction by cortical pinprick. The CSD wave evoked a rise in intracellular calcium in many sensory neurons during the propagation of the wave but several patterns of afferent activity developed after the CSD. The minority of recorded neurons (10%) showed spontaneous activity while slightly larger numbers (20%) showed depression of activity, the latter pattern developed earlier than the former. The vast majority of neurons (70%) were unaffected by the CSD. CSD decreased the time spent running and the numbers of bouts per minute but each bout was unaffected by CSD. There also was no influence of CSD on the parameters referred to as meningeal deformation including scale, shear, and Z-shift. Using GLM, the authors then determine that there is an increase in locomotion/deformation-related afferent activity in 51% of neurons, a decrease in 12% of neurons, and no change in 37%. GLM coefficients were increased for deformation related activity but not locomotion related activity after CSD. There also was an increase in afferents responsive to locomotion/deformation following CSD that were previously silent. This study shows that unlike prior reports, CSD does not lead to spontaneous activity in the majority of sensory neurons but that it increases sensitivity to mechanical deformation of the meninges. This has important implications for headache disorders like migraine where CSD is thought to contribute to the pathology in unclear ways with this new study suggesting that it may lead to increased mechanical sensitivity characteristic of migraine attacks.

      1) It would be helpful to know what is meant by "post-CSD" in many of the figures where a time course is not shown. The methods indicate that 4, 30 min runs were collected after CSD but this would span 2 hours and the data do not indicate whether there are differences across time following CSD nor whether data from all 4 runs are averaged.

      While we monitored time course changes in ongoing activity (see Figure 2), it was challenging to evaluate post-CSD changes in locomotion-related deformation responses at a fine temporal scale, as running bouts resumed at different time points post-CSD and occurred intermittently throughout the post-CSD analysis period. Our experiments were also not sufficiently powered to break out analyses at multiple different epochs post-CSD, partly because locomotion was sparse. To allow comparisons using a sufficient number of bouts, we conducted our GLM analyses using all data collected during running bouts in the 2-hour post-CSD period (termed “post-CSD”) versus in the 1-hour pre-CSD period. We have now clarified this further in the main text and figure legends.

      2) Why is only the Z-shift data shown in Figures 4A-C? Each of the deformation values seems to contribute to the activity of neurons after CSD but only the Z-shift values are shown.

      In many afferents, only one deformation variable best predicted the activity at both the pre- and post-CSD epochs. However, at the population level, all deformation variables were equally predictive. In the examples provided, the afferent developed augmented sensitivity that could only be predicted by the Z-shift variable, and the other deformation variables were not included to keep the figure legible. This is now clarified in the figure legend.

      3) How much does the animal moving its skull against the head mount contribute to deformations of the meninges if the skull is potentially flexing during these movements? Even if mice are not locomoting, they can still attempt to move their heads thus creating pressure changes on the skull and underlying meninges. The authors mention in the methods that the strong cement used to bind the skull plates and headpost together minimize this, but how do they know it is minimized?

      We did not measure skull flexing during locomotion and its potential effect on meningeal deformation. However, we would like to point out several considerations. It is evident from numerous imaging studies across various brain regions in freely moving animals, utilizing brain motion registration, that brain motion of the same scale (a few microns) as that observed in our studies also occurs in the absence of head fixation (e.g., Glas et al., 2019; Zong et al., 2021). In our system, the head-fixed mouse is locomoting on a cantilevered (spring-like) running wheel (see also Ramesh et al., 2018), which dissipates most, albeit not all, upward and forward forces applied to the skull during locomotion. Furthermore, the position of the headpost, anterior to where the mouse's paws touch the wheel, makes it hard for the mouse to push straight up and apply forces to the skull. We have updated the text in the methods section (Running wheel habituation) to address this. In our previous work (see Figure 2B in Blaeser et al. 2023), we found a substantial subset of afferents showing an increase in calcium activity that began after each bout of locomotion had terminated, and that lasted for many seconds, suggesting that skull flexing during locomotion may not play a leading role. Finally, we proposed in that study that meningeal deformations play a major role in the afferent response, given our findings of (i) sigmoidal stimulus-response curves between afferent activity and meningeal deformation and (ii) different afferents that track scaling deformations along different axes. It is unlikely that all of these are related to any residual forces generated from skull deformations.

      4) What is the mechanism by which afferents initiate the calcium wave during the CSD itself? Is this mechanical pressure due to swelling of the cortex during the wave? If so, why does the CSD have no impact on the deformation parameters? It seems that this cortical swelling would have some influence on these values unless the measurements of these values are taken well after cortical swelling subsides. Related to point 1 above, it is not clear when these measurements are taken post-CSD.

      We provide, for the first time, evidence that CSD evokes local calcium elevation in meningeal afferent fibers in a manner that is incongruent with action potential propagation, as the activity gradually advances along individual afferents across many seconds during the wave. As indicated in Figure 1H, we measured these changes during the first 2 minutes post-CSD. Based on the reviewer’s question, we have now addressed whether mechanical changes occurring in the cortex in the wake of CSD might be responsible for the acute afferent activation we observed. We now include new data (Results, “Acute afferent activation is not related to CSD-evoked meningeal deformation” and Figure S2) showing an acute phase of meningeal deformation (as expected given the changes in extracellular fluid volume) lasting 40-80 seconds following the induction of CSD. Our data suggest, however, that these meningeal deformations are unlikely to be the main driver of the acute afferent calcium response. We propose that, based on the speed of the afferent calcium wave propagation and the distinct dynamics of calcium activity as compared to the dynamics of the deformations, the acute afferent response is more likely to be mediated by the spread of algesic mediators (e.g., glutamate, K+, ATP) and their diffusion into the overlying meninges.

      Because the peri-CSD meningeal deformations return to baseline soon after the cessation of the CSD wave, they are unlikely to affect our analyses of post-CSD changes in afferent sensitivity in the following 2 hours. This is also supported by our data (see Figure 3F-H) showing similar locomotion-related deformations pre- and post-CSD, which were measured after the deformations related to the CSD itself had subsided.

      5) How does CSD cause suppression of afferent activity? This is not discussed. It is probably a good idea in this discussion to reinforce that suppression in this case is suppression of the calcium response and not necessarily suppression of all neuronal activity.

      The mechanism underlying the suppression of afferent activity remains unclear. We now discuss the following points:

      First, the pattern of afferent responses resembles the rapid loss of cortical activity in the wake of a CSD, but its faster recovery points to a mechanism distinct from the pre- and post-synaptic changes responsible for the silencing of cortical activity (Sawant-Pokam et al., 2017; Kucharz and Lauritzen, 2018). Whether CSD drives the local release of mediators capable of reducing afferent excitability and spiking dynamics will require further studies.

      Second, the reviewer proposes that the suppressed calcium activity we observed in ~20% of the afferents immediately following CSD may reflect a decreased calcium response independent of afferent spiking activity. Such a process could theoretically involve factors influencing the GCaMP fluorescence (see also our response to Reviewer #3) and/or factors modifying the afferents’ spiking-to-calcium coupling. We note that if a CSD-related factor could modify the calcium response independent of afferent spiking, one would expect a more consistent effect across axons, reflected as a reduced signal in a larger proportion of the afferents, which we did not observe.

      6) How do the authors interpret the influence of CSD on locomotor activity? There was a decrease in bouts but the bouts themselves showed similar patterns after CSD. Is CSD merely inhibiting the initiation of bouts? Is this consistent with what CSD is known to do to motor activity? And again related to point 1, how long after CSD were these measurements taken? Were there changes in locomotor activity during the actual CSD compared to post-CSD?

      To the best of our knowledge, there is very little data on the effect of CSD on motor activity, making it challenging to engage in further speculation regarding the mechanisms underlying the preservation of running bout patterns post-CSD. Houben et al. (2017) described a similar reduction in locomotion in mice, corresponding to decreased motor cortex (M1) activity, and preservation of intermittent locomotion bouts. In the revised Results section, we now provide information about the cessation of locomotor activity during the CSD wave and have added information regarding the measurement of locomotion following CSD.

      7) The authors mention the caveats of prior work where the skull is open and is thus depressurized. Is this not also the case here given there is a hole in the skull needed to induce CSD?

      Unlike previous electrophysiological studies, which involved several large openings (~2x2 mm), including at the site of the afferents’ receptive field, our study involved only a small burr hole located remotely (1.5 mm) from the frontal edge of our imaging window. As noted in our response to Reviewer #1, this burr hole (~0.5 mm diameter) was unlikely to produce inflammation at the imaging site or cause depressurization as it was sealed with a silicone plug throughout the experiment.

      8) The authors should check the %'s and the numbers in the pie chart for Figure 4. Line 224 says 53 is 22% but it does not look this way from the chart.

      The 22% reported is the percentage of afferents that developed sensitivity post-CSD among all the non-sensitive ones pre-CSD. The pie chart illustrates only afferents that were deemed sensitive before and/or after the CSD. We removed the % to clarify.

      9) Line 319 mentions that CSD causes "powerful calcium transients" in sensory neurons but it is not clear what is meant by powerful if there are no downstream effects of these transients being measured. The speculation is that these calcium transients could cause transmitter release, which would be an important observation in the absence of AP firing, but there are no data evaluating whether this is the case.

      We changed the term to “robust”.

      Reviewer #3 (Public Review):

      Summary:

      Blaeser et al. set out to explore the link between CSD and headache pain. How does an electrochemical wave in the brain parenchyma, which lacks nociceptors, result in pain and allodynia in the V1-3 distribution? Prior work had established that CSD increased the firing rate of trigeminal neurons, measured electrophysiologically at the level of the peripheral ganglion. Here, Blaeser et al. focus on the fine afferent processes of the trigeminal neurons, resolving Ca2+ activity of individual fibers within the meninges. To accomplish these experiments, the authors injected AAV encoding the Ca2+ sensitive fluorophore GCamp6s into the trigeminal ganglion, and 8 weeks later imaged fluorescence signals from the afferent terminals within the meninges through a closed cranial window. They captured activity patterns at rest, with locomotion, and in response to CSD. They found that mechanical forces due to meningeal deformations during locomotion (shearing, scaling, and Z-shifts) drove non-spreading Ca2+ signals throughout the imaging field, whereas CSD caused propagating Ca2+ signals in the trigeminal afferent fibers, moving at the expected speed of CSD (3.8 mm/min). Following CSD, there were variable changes in basal GCamp6s signals: these signals decreased in the majority of fibers, signals increased (after a 25 min delay) in other fibers, and signals remained unchanged in the remainder of fibers. Bouts of locomotion were less frequent following CSD, but when they did occur, they elicited more robust GCamp6s signals than pre-CSD. These findings advance the field, suggesting that headache pain following CSD can be explained on the basis of peripheral cranial nerve activity, without invoking central sensitization at the brain stem/thalamic level. This insight could open new pathways for targeting the parenchymal-meningeal interface to develop novel abortive or preventive migraine treatments.

      Strengths:

      The manuscript is well-written. The studies are broadly relevant to neuroscientists and physiologists, as well as neurologists, pain clinicians, and patients with migraine with aura and acephalgic migraine. The studies are well-conceived and appear to be technically well-executed.

      Weaknesses:

      1) Lack of anatomic confirmation that the dura were intact in these studies: it is notoriously challenging to create a cranial window in mouse skull without disrupting or even removing the dura. It was unclear which meningeal layers were captured in the imaging plane. Did the visualized trigeminal afferents terminate in the dura, subarachnoid space, or pia (as suggested by Supplemental Fig 1, capturing a pial artery in the imaging plane)? Were z-stacks obtained, to maintain the imaging plane, or to follow visualized afferents when they migrated out of the imaging plane during meningeal deformations?

      We agree that avoiding disruption of the dura is challenging. Indeed, it took many months of practice before conducting the experiments in this manuscript to master methods for a craniotomy that spared the dura.

      We addressed the issue of meningeal irritation due to cranial window surgery in our previous work (Blaeser et al., 2023). In brief, we conducted vascular imaging using the same cranial window approach and showed no leakage of macromolecules from dural or pial vessels anywhere within the imaging window at 2-6 weeks after the surgery (Figure S1D in Blaeser et al. 2022). This data suggested no ongoing meningeal inflammation below the window. The very low level of ongoing activity we observed at baseline also suggests a lack of an inflammatory response that could lead to afferent sensitization before CSD. This is now mentioned in the Discussion.

      We conducted volumetric imaging for three main reasons: 1) to capture the activity of afferents throughout the meningeal volume. In our volumetric imaging approach, including in this work, we observed afferent calcium signals throughout the meningeal thickness (see Figure 5 in Blaeser et al. 2022). However, the majority of afferents were localized to the most superficial 20 microns (Figure S1E in Blaeser et al. 2022), suggesting that we mostly recorded the activity of dural afferents; 2) to enable simultaneous quantification of three-dimensional deformation and the activity of afferents throughout the thickness of the meninges. This allowed us to determine whether changes in mechanosensitivity could involve augmented activity to intracranial mechanical forces that produced meningeal deformation along the Z-axis of the meninges (e.g., increased intracranial pressure); 3) to provide a direct means to confirm that the afferent GCaMP fluorescence changes we observed were not due to artifacts related to meningeal motion along the Z-axis. We have now added this information to the “Two-photon imaging” section of the Methods.

      2) Findings here, from mice with chronic closed cranial windows, failed to fully replicate prior findings from rats with acute open cranial windows. While the species, differing levels of inflammation and intracranial pressure in these two preparations may contribute, as the authors suggested, the modality of measuring neuronal activity could also contribute to the discrepancy. In the present study, conclusions are based entirely on fluorescence signals from GCamp6s, whereas prior rat studies relied upon multiunit recordings/local field potentials from tungsten electrodes inserted in the trigeminal ganglion.

      As a family, GCamp6 fluorophores are strongly pH dependent, with decreased signal at acidic pH values (at matched Ca2+ concentration). CSD induces an impressive acidosis transient, at least in the brain parenchyma, so one wonders whether the suppression of activity reported in the wake of CSD (Figure 2) in fact reflects decreased sensitivity of the GCamp6 reporter, rather than decreased activity in the fibers. If intracellular pH in trigeminal afferent fibers acidifies in the wake of CSD, GCamp6s fluorescence may underestimate the actual neuronal activity.

      Previous in vivo rodent studies observed a tissue acidosis transient that peaks during the DC shift corresponding to the wavefront of the spreading depolarization and lasts for ~10 min (Mutch and Hansen, 1984). Since we observed a massive increase in afferent calcium activity with a propagation pattern resembling the cortical wave, it is unlikely that the cortical acidosis during the CSD wave strongly affected the GCaMP signal in the overlying meninges. Furthermore, if cortical acidosis indiscriminately affects the GCaMP signal, one would expect a more consistent effect across axons, reflected as a reduced calcium signal in a larger proportion of the afferents, which we did not observe. Finally, the finding that in affected afferents, decreased calcium activity lasted for > 20 min – a time point when cortical acidosis has fully recovered – points to a distinct underlying mechanism. We also note that any residual acidosis would not confound our main finding of increased calcium responses to meningeal deformation at later periods post-CSD, as acidosis should, if anything, decrease calcium-related fluorescence.

      The authors might consider injecting an AAV encoding a pHi sensor to the trigeminal ganglion, and evaluating pHi during and after CSD, to assess how much this might be an issue for the interpretation of GCamp6s signals. Alternatively, experiments assessing trigeminal fiber (or nerve/ganglion) activity by electrophysiology or some other orthologous method would strengthen the conclusions.

      Please see our comment above regarding the short duration of the pH changes post-CSD.

      N's are generally reported as # of afferents, obscuring the number of technical/biological replicates (# of imaging sessions, # of locomotion bouts, # of CSDs induced, # of animals).

      We now report the number of replicates (# of afferents, # of CSD events, and # of mice).

      Fig 1F trace over the heatmap is not explained in the figure legend. Is this the speed of the running wheel? Is it the apparent propagation rate of the GCamp6s transient through the imaging field?

      We have added to the legend of Figure 1 that the trace in panel F depicts locomotion speed.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This valuable paper examines gene expression differences between male and female individuals over the course of flower development in the dioecious angiosperm Trichosanthes pilosa. Male-biased genes evolve faster than female-biased and unbiased genes, which is frequently observed in animals, but this is the first report of such a pattern in plants. In spite of the limited sample size, the evidence is mostly solid and the methods appropriate for a non-model organism. The resources produced will be used by researchers working in the Cucurbitaceae, and the results obtained advance our understanding of the mechanisms of plant sexual reproduction and its evolutionary implications: as such they will broadly appeal to evolutionary biologists and plant biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      The evolution of dioecy in angiosperms has significant implications for plant reproductive efficiency, adaptation, evolutionary potential, and resilience to environmental changes. Dioecy allows for the specialization and division of labor between male and female plants, where each sex can focus on specific aspects of reproduction and allocate resources accordingly. This division of labor creates an opportunity for sexual selection to act and can drive the evolution of sexual dimorphism.

      In the present study, the authors investigate sex-biased gene expression patterns in juvenile and mature dioecious flowers to gain insights into the molecular basis of sexual dimorphism. They find that a large proportion of the plant transcriptome is differentially regulated between males and females with the number of sex-biased genes in floral buds being approximately 15 times higher than in mature flowers. The functional analysis of sex-biased genes reveals that chemical defense pathways against herbivores are up-regulated in the female buds along with genes involved in the acquisition of resources such as carbon for fruit and seed production, whereas male buds are enriched in genes related to signaling, inflorescence development and senescence of male flowers. Furthermore, the authors implement sophisticated maximum likelihood methods to understand the forces driving the evolution of sex-biased genes. They highlight the influence of positive and relaxed purifying selection on the evolution of male-biased genes, which show significantly higher rates of non-synonymous to synonymous substitutions than female or unbiased genes. This is the first report (to my knowledge) highlighting the occurrence of this pattern in plants. Overall, this study provides important insights into the genetic basis of sexual dimorphism and the evolution of reproductive genes in Cucurbitaceae.

      Reviewer #2 (Public Review):

      Summary:

      This study uses transcriptome sequence from a dioecious plant to compare evolutionary rates between genes with male- and female-biased expression and distinguish between relaxed selection and positive selection as causes for more rapid evolution. These questions have been explored in animals and algae, but few studies have investigated this in dioecious angiosperms, and none have so far identified faster rates of evolution in male-biased genes (though see Hough et al. 2014 https://doi.org/10.1073/pnas.1319227111).

      Strengths:

      The methods are appropriate to the questions asked. Both the sample size and the depth of sequencing are sufficient, and the methods used to estimate evolutionary rates and the strength of selection are appropriate. The data presented are consistent with faster evolution of genes with male-biased expression, due to both positive and relaxed selection.

      This is a useful contribution to understanding the effect of sex-biased expression in genetic evolution in plants. It demonstrates the range of variation in evolutionary rates and selective mechanisms, and provides further context to connect these patterns to potential explanatory factors in plant diversity such as the age of sex chromosomes and the developmental trajectories of male and female flowers.

      Weaknesses:

      The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

      Reviewer #3 (Public Review):

      The potential for sexual selection and the extent of sexual dimorphism in gene expression have been studied in great detail in animals, but hardly examined in plants so far. In this context, the study by Zhao, Zhou et al. represents a welcome addition to the literature.

      Relative to the previous studies in Angiosperms, the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers).

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      I have reviewed this new version and find that it now addresses some of the shortcomings of the previous manuscript. However, several important limitations still remain:

      1) The conclusion that sex-linked genes contribute relatively little to the patterns described is important and would be worth including in the manuscript briefly (not just the response letter), focusing for instance on the overall comparable proportions of sex-linked genes among male-biased (3/343=0.87%), female-biased (19/1145=1.66%) and unbiased genes (36/2378=1.51%).

      Authors’ response: Thank you for your advice. We have added these sentences in “Discussion” section (Lines 492-499).
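
      The proportions in point 1 can be recomputed directly from the counts quoted there. The snippet below is only an arithmetic check, not part of the manuscript's analysis; the variable and function names are illustrative assumptions.

```python
# Counts of sex-linked genes per expression class, taken from the
# reviewer's comment in point 1: (sex-linked genes, total genes in class).
counts = {
    "male-biased": (3, 343),
    "female-biased": (19, 1145),
    "unbiased": (36, 2378),
}


def percent(part, total):
    """Return part/total as a percentage, rounded to two decimals."""
    return round(100 * part / total, 2)


# Percentage of sex-linked genes within each expression class.
proportions = {label: percent(k, n) for label, (k, n) in counts.items()}

for label, value in proportions.items():
    print(f"{label}: {value}%")
```

      All three classes come out between roughly 0.9% and 1.7%, which is the sense in which the proportions are described as comparable.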

      2) The new sentence included in the results "we also found that most of them were members of different gene families generated by gene duplication" is too vague. The motivation of this analysis is not explained, leaving the intended message unclear.

      Authors’ response: In the previous revision, as stressed by reviewer #1 “(2) Paragraph (407-416) describes the analysis of duplicated genes under relaxed selection but there is no mention of this in the results”, we added the sentence “we also found that most of them were members of different gene families generated by gene duplication” in “Relaxed selection” paragraph of the results. Accordingly, in “Discussion” section, we discussed the associations between gene duplication and relaxed selection (Lines 461-473).

      Following your suggestion, we revised the results (Lines 304-307) to “Using the RELAX model, we detected that 18 out of 343 OGs (5.23%) showed significant evidence of relaxed selection (K = 0.0184–0.6497) (Table S9). Most of the 18 OGs are members of different gene families generated by gene duplication (Table S13)”. This makes it more coherent with the discussion.

      3) The sentences "given that dN/dS values of sex-biased genes were higher due to codon usage bias..." are very confusing. I do not understand the argument being made here. I do not see why "lower dS rates would be expected in sex-biased genes ..."

      Authors’ response: We respectfully argue that codon usage bias is positively related to synonymous substitution rates. That is, stronger codon usage bias may be related to higher synonymous substitution rates (Parvathy et al., 2022). Lower ENC values represent stronger codon usage bias. So, if ω (dN/dS) values of sex-biased genes are higher due to codon usage bias, we expect lower dS rates (that is, higher ENC values). Please refer to the relevant papers (e.g. Darolti et al., 2018; Catalan et al., 2018; Schrader et al., 2021, cited in the references of the paper).

      4) The manuscript now reports the proportion of unitigs annotated by similarity with a number of species. While this is an interesting observation, the reviewer was actually asking for a comparison between the number of unitigs (59,051) and the number of genes annotated in a typical cucurbitaceae genome. This would give an indication of the level of redundancy of the de novo assembled transcriptome.

      Authors’ response: We admit that the number of transcripts in the final assembly may be overestimated. We respectfully suggest that it may be inappropriate to assess the redundancy of the de novo assembled transcriptome by comparing the transcriptome sequences with genomic sequences. A more appropriate approach is to compare transcriptome assemblies among different species. For example, Hu et al., 2020 (reference cited in the paper) obtained 145,975 non-redundant unigenes from flower buds of female and male plants in Trichosanthes kirilowii. Mohanty et al. (2017) obtained 71,823 non-redundant unigenes from flower buds of female and male plants in Coccinia grandis.

      Reference:

      Mohanty JN, Nayak S, Jha S, Joshi RK. 2017. Transcriptome profiling of the floral buds and discovery of genes related to sex-differentiation in the dioecious cucurbit Coccinia grandis (L.) Voigt. Gene. 626: 395-406.

      5) From reading the text I could not understand the extent to which the permutation test actually agreed with the Wilcoxon rank sum test. The text says that the results were "almost consistent", which is too vague. This paragraph should be clarified.

      Authors’ response: We performed permutation tests for sex-biased genes in floral buds and flowers at anthesis. However, only in floral buds were the results of both tests (permutation test and Wilcoxon rank sum test) significant. Taking your suggestions into consideration, we have revised the text to “Additionally, we found that only in floral buds, there were significant differences in ω values in the results of ‘free-ratio’ model (female-biased versus male-biased genes, P = 0.04282 and male-biased versus unbiased genes, P = 0.01114) and ‘two-ratio’ model (female-biased versus male-biased genes, P = 0.01992 and male-biased versus unbiased genes, P = 0.02127, respectively) by permutation t-test, which is consistent with the results of Wilcoxon rank sum test” (Lines 273-280).
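      As a rough illustration of what a permutation test does in this setting, the sketch below shuffles group labels and asks how often the shuffled difference in mean ω is at least as large as the observed one. It uses the absolute difference in means rather than a t statistic, the ω values are invented for the example, and the function name `permutation_p_value` is hypothetical; it is not the authors' analysis code.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Invented omega values, for illustration only.
male_biased = [0.31, 0.42, 0.27, 0.55, 0.38, 0.47]
unbiased = [0.21, 0.25, 0.19, 0.30, 0.24, 0.22]
p_value = permutation_p_value(male_biased, unbiased, n_perm=5000)
print(f"permutation p-value: {p_value:.4f}")
```

      Unlike the Wilcoxon rank sum test, this statistic depends on the actual magnitudes rather than ranks, which is one reason the two tests can disagree on skewed ω distributions.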

      6) The paragraph on the link between codon usage and dN/dS is very unclear and quite unnecessary. I would suggest to simply remove lines 312-323.

      Authors’ response: We respectfully argue that codon usage bias is one of the most important factors for higher rates of sequence evolution. Please refer to Darolti et al. (2018), Catalan et al. (2018) and Schrader et al. (2021) (cited in the references of the paper). We retain these lines here.

      7) The discussion contains many unnecessary repeats from the introduction and results section. I suggest shortening drastically at several places, including:

      • remove lines 367-369

      Authors’ response: Thank you for your suggestion. We revised these lines to “In this study, we compared the expression profiles of sex-biased genes between sexes and two tissue types, investigated whether sex-biased genes exhibited evidence of rapid evolutionary rates of protein sequences and identified the evolutionary forces responsible for the observed patterns in the dioecious Trichosanthes pilosa (Lines 369-373)”.

      We removed the sentence “We compared the expression profiles of sex-biased genes between sexes and two tissue types and examined the signatures of rapid sequence evolution for sex-biased genes, as well as the contributions of potential evolutionary forces. (Lines 374-376)”.

      • remove lines 395-410

      Authors’ response: Here we mainly discussed the possible associations between sex-biased genes, adaptation and sexual dimorphic traits. We retain them here for clarity.

      • remove lines 449-483, as they are almost entirely repetitions of elements already made clear in the results section.

      Authors’ response: In these paragraphs, we discussed reasons that lead to relaxed purifying selection for sex-biased genes. They are coherent with the results section. We retain them to make it clearer.

      Minor comments:

      • line 146: remove "However"

      Authors’ response: We have revised it.

      • line 187: "female flower buds tend to masculinize": the meaning is obscure

      Authors’ response: We revised it to “Using hierarchical clustering analysis, we evaluated different levels of gene expression across sexes and tissues (Fig. 2C). Gene expression for female floral buds clustered most distantly from expression in female flowers at anthesis. However, expression in male floral buds clustered with expression in female flowers at anthesis, suggesting that male floral buds may tend toward feminization in the early stages of floral development.”.

      • line 226: "we sequenced transcriptomes of T. pilosa": rather say "we used the transcriptomes described above for T. pilosa"

      Authors’ response: We have revised it.

      • line 279: the meaning of "branch-site model A and branch site model null" is still not made clear.

      Authors’ response: We have revised it.

      • line 324: change to: "we also analysed whether female-biased and unbiased genes underwent... "

      Authors’ response: We have revised it.

    2. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This valuable paper examines gene expression differences between male and female individuals over the course of flower development in the dioecious angiosperm Trichosanthes pilosa. The authors show that male-biased genes evolve faster than female-biased and unbiased genes. This is frequently observed in animals, but this is the first report of such a pattern in plants. In spite of the limited sample size, the evidence is mostly solid and the methods appropriate for a non-model organism. The resources produced will be used by researchers working in the Cucurbitaceae, and the results obtained advance our understanding of the mechanisms of plant sexual reproduction and its evolutionary implications: as such they will broadly appeal to evolutionary biologists and plant biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      The evolution of dioecy in angiosperms has significant implications for plant reproductive efficiency, adaptation, evolutionary potential, and resilience to environmental changes. Dioecy allows for the specialization and division of labor between male and female plants, where each sex can focus on specific aspects of reproduction and allocate resources accordingly. This division of labor creates an opportunity for sexual selection to act and can drive the evolution of sexual dimorphism.

      In the present study, the authors investigate sex-biased gene expression patterns in juvenile and mature dioecious flowers to gain insights into the molecular basis of sexual dimorphism. They find that a large proportion of the plant transcriptome is differentially regulated between males and females with the number of sex-biased genes in floral buds being approximately 15 times higher than in mature flowers. The functional analysis of sex-biased genes reveals that chemical defense pathways against herbivores are up-regulated in the female buds along with genes involved in the acquisition of resources such as carbon for fruit and seed production, whereas male buds are enriched in genes related to signaling, inflorescence development and senescence of male flowers. Furthermore, the authors implement sophisticated maximum likelihood methods to understand the forces driving the evolution of sex-biased genes. They highlight the influence of positive and relaxed purifying selection on the evolution of male-biased genes, which show significantly higher rates of non-synonymous to synonymous substitutions than female or unbiased genes. This is the first report (to my knowledge) highlighting the occurrence of this pattern in plants. Overall, this study provides important insights into the genetic basis of sexual dimorphism and the evolution of reproductive genes in Cucurbitaceae.

      Reviewer #2 (Public Review):

      Summary:

      This study uses transcriptome sequence from a dioecious plant to compare evolutionary rates between genes with male- and female-biased expression and distinguish between relaxed selection and positive selection as causes for more rapid evolution. These questions have been explored in animals and algae, but few studies have investigated this in dioecious angiosperms, and none have so far identified faster rates of evolution in male-biased genes (though see Hough et al. 2014 https://doi.org/10.1073/pnas.1319227111).

      Strengths:

      The methods are appropriate to the questions asked. Both the sample size and the depth of sequencing are sufficient, and the methods used to estimate evolutionary rates and the strength of selection are appropriate. The data presented are consistent with faster evolution of genes with male-biased expression, due to both positive and relaxed selection.

      This is a useful contribution to understanding the effect of sex-biased expression in genetic evolution in plants. It demonstrates the range of variation in evolutionary rates and selective mechanisms, and provides further context to connect these patterns to potential explanatory factors in plant diversity such as the age of sex chromosomes and the developmental trajectories of male and female flowers.

      Weaknesses:

      The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

      Reviewer #3 (Public Review):

      The potential for sexual selection and the extent of sexual dimorphism in gene expression have been studied in great detail in animals, but hardly examined in plants so far. In this context, the study by Zhao, Zhou et al. represents a welcome addition to the literature.

      Relative to the previous studies in Angiosperms, the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers).

      Some aspects of the presentation have been improved in this new version of the manuscript.

      Specifically:

      • the link between sex-biased and tissue-biased genes is now slightly clearer,

      • the limitation related to the de novo assembled transcriptome is now formally acknowledged,

      • the interpretation of functional categories of the genes identified is more precise,

      • the legends of supplementary figures have been improved - a large number of typos have been fixed.

      These changes were made in response to the first round of reviews. However, as I detail below, many of the relevant and constructive suggestions by the previous reviewers were not taken into account in this revision.

      For instance:

      • Reviewer 2 made precise suggestions for trying to take into account the potential confounding factor of sex-chromosomes. This suggestion was not followed.

      For the question of reviewer 2:

      The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

      Empirically, the analyses could be expanded by an attempt to distinguish between genes on the autosomes and the sex chromosomes. Genotypic patterns can be used to provisionally assign transcripts to XY or XX-like behavior when all males are heterozygous and all females are homozygous (fixed X-Y SNPs) and when all females are heterozygous and males are homozygous (lost or silenced Y genes). Comparing such genes to autosomal genes with sex-biased expression would sharpen the results because there are different expectations for the efficacy of selection on sex chromosomes. See this paper (Hough et al. 2014; https://www.pnas.org/doi/abs/10.1073/pnas.1319227111), which should be cited and does in fact identify faster substitution rates in Y-linked genes.

      Authors’ response: We have cited Hough et al. (2014) and Sandler et al. (2018) in the revised manuscript. We agree that the presence of sex chromosomes is potentially a confounding factor. By adopting the methods of Hough et al. (2014) and Sandler et al. (2018), we tried to distinguish transcripts on sex chromosomes from those on autosomes. Of a total of 2,378 unbiased genes, we found that 36 were putatively sex-chromosomal, 20 of which were exclusively heterozygous in males and exclusively homozygous in females, while the other 16 showed the opposite genotyping pattern between males and females. Of the 343 male-biased genes, only three exhibited a potentially sex-linked pattern. Of the 1,145 female-biased genes, we identified 19 that might be located on the sex chromosomes. Among these 19 genes, five were exclusively heterozygous in males and exclusively homozygous in females, while the reversed genotyping pattern was present in the other 14. So, sex-linked genes may contribute relatively little to the rapid evolution of male-biased genes. An alternative explanation is that the results could be unreliable due to small sample sizes. Thus, we did not describe them in the Results section. We will investigate the issue when whole genome sequences and population datasets become available in the near future.
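      The genotype-based assignment described by Reviewer 2 can be sketched as a simple rule over per-individual genotype calls. The example below is a simplified illustration under stated assumptions: each individual's call at a transcript is reduced to "het" or "hom", and the function name and category labels are hypothetical rather than taken from Hough et al. (2014) or the manuscript.

```python
def classify_transcript(male_calls, female_calls):
    """Provisionally assign a transcript from per-individual genotype calls.

    Each call is "het" (heterozygous) or "hom" (homozygous).
    """
    if not male_calls or not female_calls:
        return "unassigned"  # no evidence from one of the sexes

    males_het = all(call == "het" for call in male_calls)
    males_hom = all(call == "hom" for call in male_calls)
    females_het = all(call == "het" for call in female_calls)
    females_hom = all(call == "hom" for call in female_calls)

    if males_het and females_hom:
        # Pattern expected for fixed X-Y differences.
        return "XY-like"
    if females_het and males_hom:
        # Pattern expected when the Y copy is lost or silenced.
        return "Y-lost-or-silenced-like"
    return "unassigned"


# Three males and three females, matching the sampling in the study.
print(classify_transcript(["het"] * 3, ["hom"] * 3))
print(classify_transcript(["hom"] * 3, ["het"] * 3))
print(classify_transcript(["het", "hom", "het"], ["hom"] * 3))
```

      With only three individuals per sex, an autosomal SNP can match either pattern by chance, which is consistent with the authors' caution about small sample sizes.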

      • Reviewer 1 & 3 indicated that results were mentioned in the discussion section without having been described before. This was not fixed in this new version.

      For the question of reviewer 1:

      2) Paragraph (407-416) describes the analysis of duplicated genes under relaxed selection but there is no mention of this in the results.

      Authors’ response: Following this suggestion, in the Results section, we have added a sentence, “We also found that most of them were members of different gene families generated by gene duplication (Table S13)” on line 310-311 in the revised manuscript (Rapid_evolution_of_malebiased_genes_Trichosanthes_pilosa_Tracked_change_2023_11_06.docx).

      For the question of reviewer 1:

      38- line 417-424. The discussion should not contain new results.

      Authors’ response: Thank you for pointing this out. In the Results section, we have added a few sentences as follows: “Similarly, given that dN/dS values of sex-biased genes were higher due to codon usage bias, lower dS rates would be expected in sex-biased genes relative to unbiased genes (Ellegren & Parsch, 2007; Parvathy et al., 2022). However, in our results, the median dS values in male-biased genes were much higher than those in female-biased and unbiased genes in the results of ‘free-ratio’ (Fig. S4A, female-biased versus male-biased genes, P = 6.444e-12 and male-biased versus unbiased genes, P = 4.564e-13) and ‘two-ratio’ branch model (Fig. S4B, female-biased versus male-biased genes, P = 2.2e-16 and male-biased versus unbiased genes, P = 9.421e-08, respectively).” on lines 323-331, and consequently removed the following phrases, “female-biased vs male-biased genes, P = 6.444e-12 and male-biased vs unbiased genes, P = 4.564e-13” and “female-biased versus male-biased genes, P = 2.2e-16 and male-biased versus unbiased genes, P = 9.421e-08, respectively”, from the Discussion section.

      • Reviewer 1 asked for a comparison between the number of de novo assembled unigenes in this transcriptome and the number of genes in other Cucurbitaceae species. I could not see this comparison reported.

      Authors’ response: In the first revision, we described only percentages. We have now added the number of genes. We modify this part as follows: “The majority of unigenes were annotated by homologs in species of Cucurbitaceae (61.6%, 36,375), including Momordica charantia (16.3%, 9,625), Cucumis melo (11.9%, 7,027), Cucurbita pepo (11.9%, 7,027), Cucurbita moschata (11.5%, 6,791), Cucurbita maxima (10.1%, 5,964) and other species (38.4%, 22,676) (Fig. S1C).”.

      • Reviewer 1 pointed out that permutation tests were more appropriate, but no change was made to the manuscript.

      Authors’ response: Thank you for your suggestion. In the first revision, we responded to this issue only indirectly. The Wilcoxon rank sum test is more commonly used for comparisons between sex-biased and unbiased genes in many papers. Additionally, we tested the datasets using permutation t-tests, the results of which are consistent with those of the Wilcoxon rank sum test. For example, we found that only in floral buds, there are significant differences in ω values in the results of ‘free-ratio’ (female-biased versus male-biased genes, P = 0.04282 and male-biased versus unbiased genes, P = 0.01114) and ‘two-ratio’ model (female-biased versus male-biased genes, P = 0.01992 and male-biased versus unbiased genes, P = 0.02127, respectively). We also described these results in the Results section accordingly (lines 278-284).

      • Reviewer 3 pointed out the small sample size (both for the RNA-seq and the phylogenetic analysis), but again this limitation is not acknowledged very clearly.

      Authors’ response: We acknowledge that our sample size was relatively small. In the revised version, we have added a sentence as follows, “Additionally, our sample size is relatively small, and may provide low power to detect differential expression.” in the Discussion section.

      • Reviewer 1 & 3 pointed out that Fig 3 was hard to understand and asked for clarifications that I did not see in the text and the figure is unchanged.

      Authors’ response: Thank you for your suggestions. We have revised the manuscript to clarify the meaning of the acronyms (F1TGs, F2TGs, M1TGs, M2TGs, F1BGs, F2BGs, M1BGs and M2BGs) and presented the number of genes. We have added two labels, indicating that panels A and B correspond to males and C and D to females in Fig. 3.

      • Reviewer 3 suggested to combine all genes with sex-bias expression when evaluating the evolutionary rate, in addition to the analyses already done. This suggestion was not followed.

      For the question of reviewer 3: line 196 and following: In these analyses, I could not understand the rationale for keeping buds vs mature flowers as separate analyses throughout. Why not combine both and use the full set of genes showing sex-bias in any tissue? This would increase the power and make the presentation of the results a lot more straightforward.

      Authors’ response: Thank you for your suggestions. In the first revision, we tried to respond to these issues. First, we observed strong sexual dimorphism in floral buds, such as racemose versus solitary, early-flowering versus late-flowering. Second, as you pointed out earlier, “the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers)”, we totally agree with you on this point. Third, according to your suggestions, we combined all genes with sex-biased expression to evaluate the evolutionary rates. We found significant differences (please see the figure below) in ω values in the results of ‘free-ratio’ (female-biased versus male-biased genes, P = 0.005622 and male-biased versus unbiased genes, P = 0.001961) and ‘two-ratio’ model (female-biased versus male-biased genes, P = 0.008546 and male-biased versus unbiased genes, P = 0.009831, respectively) using Wilcoxon rank sum test. However, the significance is lower than in the previous results for floral buds because sex-biased genes of mature flowers were added, especially in the results of the ‘free-ratio’ model. Additionally, we also tested all combined genes with sex-biased expression using permutation t-test. Unfortunately, there are no significant differences in ω values except for male-biased versus unbiased genes in the results of ‘free-ratio’ model (P = 0.03034) and ‘two-ratio’ model (P = 0.0376), respectively. To a certain extent, combining all genes with sex-biased expression may mask the signals of rapid evolution of sex-biased genes in floral buds. Therefore, these results are not described in our manuscript. In the near future, we would like to make further investigations through more developmental stages of flowers and new technologies (e.g. single-cell methods; see Murat et al., 2023) in each sex to consolidate the conclusion, and we hope to find more meaningful results.

      Author response image 1.

      • Reviewer 3 pointed out that hand-picking specific categories of genes was not statistically valid, and in fact not necessary in the present context. This was not changed.

      For the question of reviewer 3: removing genes on a post-hoc basis seems statistically suspicious to me. I don't think your analysis has enough power to hand-pick specific categories of genes, and it is not clear what this brings here. I suggest simply removing these analyses and paragraphs.

      Authors’ response: Thank you for your suggestions. We have changed them accordingly. We removed a part of the following paragraph, “To confirm the contributions of positive selection and relaxed selection to rapid rates of male-biased genes in floral buds, we generated three datasets of OGs by excluding different sets of genes. Specifically, we excluded 18 relaxed selective male-biased genes (5.23%), 98 positively selected male-biased genes (28.57%), and 112 male-biased genes (32.65%) under positive and relaxed selection from 343 OGs (Fig. S4). We observed that after excluding male-biased genes under relaxed purifying selection, the median (0.264) decreased by 0.34% compared to the median (0.265) of all OGs (Fig. S4A-B). However, after excluding positively selected male-biased genes, the median (0.236) was reduced by 11% (Fig. S4A, C) in the results of ‘free-ratio’ branch model. This pattern was consistent with the results of ‘two-ratio’ branch model as well (Fig. S4E-G).” on line 290 to 300.

      However, we kept the following paragraph, “We also analyzed female-biased and unbiased genes that underwent positive and relaxed selection in floral buds (Tables S6-S10). We identified 216 (18.86%) positively selected, and 69 (6.03%) relaxed selective female-biased genes from 1,145 OGs, respectively. Similarly, we found 436 (18.33%) positively selected, and 43 (1.81%) unbiased genes under relaxed selection from 2,378 OGs, respectively. Notably, male-biased genes have a higher proportion (10%) of positively selected genes compared to female-biased and unbiased genes. However, relaxed selective male-biased genes have a higher proportion (3.24%) than unbiased genes, but about 0.8% lower than that of female-biased genes.”. In this way, we can compare, in the Discussion section, the proportions of genes that have undergone positive selection and relaxed selection among female-biased, unbiased and male-biased genes in floral buds.

      • Reviewer 1 asked for all data to be public, but I could not find in the manuscript where the link to the data on ResearchGate was provided.

      Authors’ response: We have added a link in the Data Availability section.

      • Reviewers 1 & 3 pointed out that since only two tissues were compared, the claims on pleiotropy should have been toned down, but no change was made to the text.

      Authors’ response: Thank you for your suggestions. We revised “due to low pleiotropic constraints” to “due to low evolutionary constraints” and revised “low pleiotropy” to “low constraints”.

      • Reviewer 1 asked for a clarification on which genes are plotted on the heatmap of Fig3C and an explanation of the color scale. No change was made.

      Authors’ response: Sorry for the confusion. Actually, Reviewer 1 asked, “Fig. 2C, which genes are plotted on the heatmap and what is the color scale corresponding to?” We addressed this in the previous revision (see Fig. 2 Sex-biased gene expression for floral buds and flowers at anthesis in males and females of Trichosanthes pilosa). Sex-biased genes (the union of sex-biased genes in F1, M1, F2 and M2) are plotted on the heatmap. The color gradient represents gene expression from high (red) to low (green).

      • Reviewer 1 asked for panel B in Fig S5 and S6 to be removed. They are still there. They asked for abbreviations to be explained in the legend of Fig S8. This was not done. They asked for details about column headers. Such details were not added. They asked for more recent references on line 53-56: this was not done.

      Authors’ response: We have removed panel B in Fig. S5 and S6. We explained abbreviations in text and Fig. S8. We added more details about the column headers in Supplementary Table S4, S5, S6, S7, S8, S9 and S10. We also added more recent references on line 53-56.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Authors’ response: Thank you for your suggestions. We have revised/fixed these issues following your concerns and suggestions.

      Line 46-48 would be clearer as « Sexual dimorphism is the condition where sexes of the same species exhibit different morphological, ecological and physiological traits in gonochoristic animals and dioecious plants, despite male and female individuals sharing the same genome except for sex chromosomes or sex-determining loci »

      Authors’ response: Thanks. We have revised it accordingly.

      Line 50: replace «in both » by «between the two »

      Authors’ response: We have revised it.

      Line 51: « genes exclusively » -> « genes expressed exclusively »

      Authors’ response: We have revised it.

      Line 58: « in many animals » -> « in several animal species »

      Authors’ response: We have revised it to “in some animal species”.

      Line 58: « to which » -> « of this bias »

      Authors’ response: We have revised it.

      Line 64: « Most dioecious plants possess homomorphic sex-chromosomes that are roughly similar in size when viewed by light microscopy. » : a reference is missing

      Authors’ response: We have added the reference.

      Line 67: remove « that »

      Authors’ response: We have revised it.

      line 96: change to: « only the five above-mentioned studies »

      Authors’ response: We have revised it.

      Line 97: remove « the »

      Authors’ response: We have revised it.

      Line 111: « Drosophia » -> Drosophila

      Authors’ response: We have revised it.

      Line 114: exhibiting -> « exhibited »

      Authors’ response: We have revised it.

      Line 115: suggest -> « suggesting »

      Authors’ response: We have revised it.

      Line 117: « studies in plants have rarely reported elevated rates of sex-biased genes » : is it « rarely » or « never » ?

      Authors’ response: We have revised to “never”.

      Line 143: « It’s » -> « Its »

      Authors’ response: We have revised it.

      Line 143-146: say whether the male parts (e.g. anthers) are still present in females flowers, and the female parts (pistil+ ovaries) in the male flowers, or whether these respective organs are fully aborted.

      Authors’ response: We have added the following sentence, “The male parts (e.g., anthers) of female flowers, and the female parts (e.g., pistil and ovaries) of male flowers are fully aborted” in lines 148-150 of the Introduction section.

      Line 158: this is now clearer, but please specify whether you are talking about 12 floral buds in total, or 12 per individual (i.e. 72 buds in total).

      Authors’ response: We have revised it to “Using whole transcriptome shotgun sequencing, we sequenced floral buds and flowers at anthesis from females and males of dioecious T. pilosa. We set up three biological replicates from three female and three male plants, including 12 samples in total (six floral buds and six flowers at anthesis)”.

      Line 194-198: These sentences are unclear and hard to link to the figure. Consider changing for « In male plants, the number of tissue-biased genes in flowers at anthesis (M2TGs: n = 2795) was higher than that in floral buds (M1TGs: n = 1755, Fig. 3A and 3B) ». Figure 3 is also very hard to read. Adding a label on the side to indicate that panels A and B correspond to male-biased genes and C and D to female-biased genes could be useful.

      Authors’ response: Thank you for your suggestions. We have revised the text to clarify the meaning of the acronyms (F1TGs, F2TGs, M1TGs, M2TGs, F1BGs, F2BGs, M1BGs and M2BGs) and presented the number of genes. We have added two labels, indicating that panels A and B correspond to males and C and D to females in Figure 3.

      Line 208: explain the approach: e.g. « We then compared rates of protein evolution among male-biased, female-biased and unbiased genes. To do this, we sequenced floral bud transcriptomes from the closely related T. anguina, as well as two more distant outgroups, T. kirilowii and Luffa cylindrica. T. kirilowii is a dioecious species like T. pilosa, and the other two are monoecious. We identified one-to-one orthologous groups (OGs) for 1,145 female-biased, 343 male-biased, and 2,378 unbiased genes. »

      Authors’ response: We have revised this paragraph to the following, “We compared rates of protein evolution among male-biased, female-biased and unbiased genes in four species with phylogenetic relationships (((T. anguina, T. pilosa), T. kirilowii), Luffa cylindrica), including dioecious T. pilosa, dioecious T. kirilowii, monoecious T. anguina in Trichosanthes, together with monoecious Luffa cylindrica. To do this, we sequenced transcriptomes of T. pilosa. We also collected transcriptomes of T. kirilowii, as well as genomes of T. anguina and Luffa cylindrica.”

      Line 220: « the same ω value was in all branches » -> « all branches are constrained to have the same ω value ».

      Authors’ response: We have revised it.

      Line 221: « results of the 'two-ratio' branch model ... »

      Authors’ response: We have revised it.

      Line 235: add a few words to explain why the effect size is bigger than for buds, but still is not significant: e.g. «possibly because of limited statistical power due to the low number of sex-biased genes in flowers at anthesis »

      Authors’ response: We have revised this to “However, there is no statistically significant difference in the distribution of ω values using Wilcoxon rank sum tests for female-biased versus male-biased genes (P = 0.0556), female-biased versus unbiased genes (P = 0.0796), and male-biased versus unbiased genes (P = 0.3296) possibly because of limited statistical power due to the low number of sex-biased genes in flowers at anthesis.” in line 260-261.

      Line 255: explain in plain English what the « A model » is. This was already requested in the previous version.

      Authors’ response: We have revised “A model” to “classical branch-site model A”.

      Line 258: explain in plain English what the « foreground 2b ω value » corresponds to

      Authors’ response: We have revised “foreground 2b ω value” to “foreground ω >1”. Additionally, we added the sentence “The classical branch-site model assumes four site classes (0, 1, 2a, 2b), with different ω values for the foreground and background branches. In site classes 2a and 2b, the foreground branch undergoes positive selection when there is ω > 1.” in lines 624-627.
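
      The four site classes mentioned in this response can be laid out explicitly. The sketch below follows the parameterisation of branch-site model A as it is commonly described for codeml (free parameters p0, p1, ω0 and ω2); this layout is an assumption for illustration and is not taken from the authors' manuscript, and the function names are hypothetical.

```python
def site_class_proportions(p0, p1):
    """Proportions of site classes 0, 1, 2a and 2b given p0 and p1."""
    if p0 <= 0 or p1 <= 0 or p0 + p1 > 1:
        raise ValueError("need p0 > 0, p1 > 0 and p0 + p1 <= 1")
    rest = 1 - p0 - p1  # total share of classes 2a and 2b
    return {
        "0": p0,
        "1": p1,
        "2a": rest * p0 / (p0 + p1),
        "2b": rest * p1 / (p0 + p1),
    }


def site_class_omegas(omega0, omega2):
    """(background, foreground) omega for each site class.

    omega0 is constrained to 0 < omega0 < 1 (purifying selection) and
    omega2 to omega2 >= 1 (positive selection on the foreground branch).
    """
    if not 0 < omega0 < 1 or omega2 < 1:
        raise ValueError("need 0 < omega0 < 1 and omega2 >= 1")
    return {
        "0": (omega0, omega0),   # purifying selection on all branches
        "1": (1.0, 1.0),         # neutral on all branches
        "2a": (omega0, omega2),  # purifying in background, omega2 in foreground
        "2b": (1.0, omega2),     # neutral in background, omega2 in foreground
    }


# Hypothetical parameter values (illustration only).
props = site_class_proportions(p0=0.70, p1=0.20)
omegas = site_class_omegas(omega0=0.1, omega2=2.5)
for name in ("0", "1", "2a", "2b"):
    print(name, round(props[name], 4), omegas[name])
```

      In this layout, classes 2a and 2b differ only in their background ω, so a "foreground 2b ω value" and a "foreground 2a ω value" are the same parameter.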

      Line 259: explain how these different approaches complement each other rather than being redundant. This was also already requested in the previous version.

      Authors’ response: Sorry. We have now revised it as follows, “As a complementary approach, we utilized the aBSREL and BUSTED methods that are implemented in HyPhy v.2.5 software, which avoids false positive results by classical branch-site models due to the presence of rate variation in background branches, and detected significant evidence of positive selection.” in line 292-295.

      Line 270: remove « dramatically », and also remove « or eliminated at both gene-wide and genome-wide levels », as well as « relative to positive selection »

      Authors’ response: Thank you for your suggestions. We have revised it.

      Line 290-309: remove this section - this was already pointed out in the previous reviews as a « ad hoc » procedure, and this point has already been made clear with the RELAX analysis.

      Authors’ response: Thank you for your suggestions. We revised this section accordingly. We removed the following paragraph, “To confirm the contributions of positive selection and relaxed selection to rapid rates of male-biased genes in floral buds, we generated three datasets of OGs by excluding different sets of genes. Specifically, we excluded 18 relaxed selective male-biased genes (5.23%), 98 positively selected male-biased genes (28.57%), and 112 male-biased genes (32.65%) under positive and relaxed selection from 343 OGs (Fig. S4). We observed that after excluding male-biased genes under relaxed purifying selection, the median (0.264) decreased by 0.34% compared to the median (0.265) of all OGs (Fig. S4A-B). However, after excluding positively selected male-biased genes, the median (0.236) was reduced by 11% (Fig. S4A, C) in the results of ‘free-ratio’ branch model. This pattern was consistent with the results of ‘two-ratio’ branch model as well (Fig. S4E-G).” on lines 334-344.

      However, we kept the other parts “We also analyzed female-biased and unbiased genes that underwent positive and relaxed selection in floral buds (Tables S6-S10). We identified 216 (18.86%) positively selected, and 69 (6.03%) relaxed selective female-biased genes from 1,145 OGs, respectively. Similarly, we found 436 (18.33%) positively selected, and 43 (1.81%) unbiased genes under relaxed selection from 2,378 OGs, respectively. Notably, male-biased genes have a higher proportion (10%) of positively selected genes compared to female-biased and unbiased genes. However, relaxed selective male-biased genes have a higher proportion (3.24%) than unbiased genes, but about 0.8% lower than that of female-biased genes.”. In this way, we can compare the proportion of sex-biased genes that have undergone positive selection and relaxed selection among female-biased genes, unbiased genes and male-biased genes in floral buds in the Discussion section.
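
      The percentages quoted for female-biased and unbiased genes follow directly from the counts in the retained paragraph. A minimal sketch of that arithmetic is given below; the dictionary layout and function name are hypothetical, and only counts stated in the paragraph are used.

```python
# Counts of orthologous groups (OGs) as stated in the retained paragraph.
counts = {
    "female-biased": {"total": 1145, "positive": 216, "relaxed": 69},
    "unbiased": {"total": 2378, "positive": 436, "relaxed": 43},
}


def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to 2 decimals."""
    return round(100 * part / total, 2)


for category, c in counts.items():
    pos = percent(c["positive"], c["total"])
    rel = percent(c["relaxed"], c["total"])
    print(f"{category}: {pos}% positively selected, {rel}% under relaxed selection")
```

      The same helper can be applied to the male-biased counts to recompute the differences in proportion cited in the paragraph.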

      Line 348: Here you talk about « Numerous studies », but then only report three studies. Please clarify.

      Authors’ response: Thank you for your suggestions. We have revised it to “Several studies”.

      Line 352: Cut the sentence: « In contrast, the wind-pollinated dioecious plant Populus balsamifera ... »

      Authors’ response: Thank you for your suggestions. We have revised it.

      Line 357: « In contrast to the above studies... »: If I understand correctly, this is not in contrast to the observation in Populus balsamifera. Please clarify.

      Authors’ response: Thank you for your suggestions. We have revised it to “Similar to the above study of Populus balsamifera.”.

      Line 420: « our results » -> « we »; « that underwent » -> « undergoing »

      Authors’ response: Thank you for your suggestions. We have revised it.

      Figure 3 is very hard to read and poorly labeled (see my comments on line 194 above). It is also hard to link to the text, since the numbers reported in the text are actually not present in the figure unless readers make some calculations themselves. This should be improved. Also, the use of acronyms (e.g. M1BG, F2TG etc.) contributes to making the text very difficult to read. The acronyms should at least be explained very clearly in the text when they are used.

      Authors’ response: Thank you for your suggestions. We have revised the text to clarify the meaning of the acronym (F1TGs, F2TGs, M1TGs, M2TGs, F1BGs, F2BGs, M1BGs and M2BGs) and give the number of genes. We have added two labels, indicating that panels A and B correspond to males and C and D to females in Figure 3.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The apicoplast, a non-photosynthetic vestigial chloroplast, is a key metabolic organelle for the synthesis of certain lipids in apicomplexan parasites. Although it is clear metabolite exchange between the parasite cytosol and the apicoplast must occur, very few transporters associated with the apicoplast have been identified. The current study combines data from previous studies with new data from biotin proximity labeling to identify new apicoplast resident proteins including two putative monocarboxylate transporters termed MCT1 and MCT2. The authors conduct a thorough molecular phylogenetic analysis of the newly identified apicoplast proteins and they provide compelling evidence that MCT1 and MCT2 are necessary for normal growth and plaque formation in vitro along with maintenance of the apicoplast itself. They also provide indirect evidence for a possible need for these transporters in isoprenoid biosynthesis and fatty acid biosynthesis within the apicoplast. Finally, mouse infection experiments suggest that MCT1 and MCT2 are required for normal virulence, with MCT2 completely lacking at the administered dose. Overall, this study is generally of high quality, includes extensive quantitative data, and significantly advances the field by identifying several novel apicoplast proteins together with establishing a critical role for two putative transporters in the parasite. The study, however, could be further strengthened by addressing the following aspects:

      Response: We thank the reviewer very much for his/her positive evaluation of our work. To address the detailed function of the transporters, in the past three months we have re-constructed plasmids (with codon-optimized DNA sequences of the genes) for expression of the transporters in a regular expression E. coli strain, BL21(DE3), and in a pyruvate import knockout E. coli strain (a gift from Prof. Kirsten Jung), to examine the transport capability in vitro. We have also constructed a new plasmid containing a new leading peptide for targeting the pyruvate sensor PyronicSF to the apicoplast in the parasite, to probe the possible substrate pyruvate. However, we did not observe expression of the transporters in the above E. coli strains, and we were unable to target the sensor to the correct localization (the apicoplast) in the parasite. As a result, the functional identification of the transporters remains as presented in the current version of the manuscript. We will keep working on this aspect, attempting to dissect the exact transport function of the transporters in the future. In the current manuscript, we have discussed the limitations of our study in the last part of the manuscript.

      Main comments

      1) The conclusion that conditional depletion of AMT1 and/or AMT2 affects apicoplast synthesis of IPP is only supported by indirect measurements (effects on host GFP uptake or trafficking, possibly due to effects on IPP dependent proteins such as rabs, and mitochondrial membrane potential, possibly due to effects on IPP dependent ubiquinone). This conclusion would be more strongly supported by directly measuring levels of IPP. If there are technical limitations that prevent direct measurement of IPP then the authors should note such limitations and acknowledge in the discussion that the conclusion is based on indirect evidence.

      Response: We thank the reviewer very much for the suggestions. We have tried to establish the measurement of IPP using a commercial company in recent months, yet we have not been successful in making the assay work. Considering the problem of indirect evidence, we have discussed this limitation in the discussion.

      2) The conclusion that conditional depletion of AMT1 and/or AMT2 affects apicoplast synthesis of fatty acids is also poorly supported by the data. The authors do not distinguish between the lower fatty acid levels being due to reduced synthesis of fatty acids, reduced salvage of host fatty acids, or both. Indeed, the authors provide evidence that parasite endocytosis of GFP is dependent on AMT1 and AMT2. Host GFP likely enters the parasite within a membrane bound vesicle derived from the PVM. The PVM is known to harbor host-derived lipids. Hence, it is possible that some of the decrease in fatty acid levels could be due to reduced lipid salvage from the host. Experiments should be conducted to measure the synthesis and salvage of fatty acids (e.g., by metabolic flux analysis), or the authors should acknowledge that both could be affected.

      Response: We thank the reviewer very much for the comments and suggestions. We partially agree with the comment that the depletion of transporters could affect lipids scavenged from the host cells, as endocytic vesicles are indeed derived from the parasite plasma membrane at the micropore and potentially from the host cell endo-membrane system, as demonstrated with the micropore endocytosis in our previous study (pmid: 36813769). Our latest study has addressed this by showing that the endocytic trafficking of GFP vesicles is regulated by prenylation of proteins (e.g. Rab1B and YKT6.1), depletion of which resulted in diffusion of GFP vesicles, but not disappearance of GFP vesicles in the parasites (pmid: 37548452), indicating that the vesicles (containing lipids) enter the parasites. In the current manuscript, the percentage of parasites containing GFP foci was significantly reduced in AMT1/AMT2-depleted parasites, and instead, parasites containing diffuse GFP appeared, at a percentage almost equal to the reduction in parasites with GFP foci. These results suggested that endocytic vesicles (e.g. GFP vesicles) were continuously generated by the micropore in the parasites depleted of AMT1/AMT2, and that the vesicle trafficking was regulated by proteins modified by IPP derivatives that were derived from the apicoplast. Based on these observations, we considered that lipids in endocytic vesicles should not contribute to the reduced level of fatty acids and other lipids in parasites depleted of AMT1/AMT2. We have added a short discussion concerning the fatty acids and lipids reduced in the parasites.

      Reviewer #2 (Public Review):

      In this study, Hui Dong et al. identified and characterized two transporters of the monocarboxylate family, which they called Apicomplexan monocarboxylate 1 and 2 (AMC1/2), that the authors suggest are involved in the trafficking of metabolites in the non-photosynthetic plastid (apicoplast) of Toxoplasma gondii (the parasitic agent of human toxoplasmosis) to maintain parasite survival. To do so, they first identified novel apicoplast transporters by conducting proximity-dependent protein labeling (TurboID), using the sole known apicoplast transporter (TgAPT) as a bait. They chose two out of the three MFS transporters identified by their screen based on protein sequence similarity and confirmed apicoplast localisation. They generated inducible knock down parasite strains for both AMC1 and AMC2, and confirmed that both transporters are essential for parasite intracellular survival, replication, and for the proper activity of key apicoplast pathways requiring pyruvate as carbon sources (FASII and MEP/DOXP). Then they show that deletion of each protein induces a loss of the apicoplast, more marked for AMC2, and affects its morphology, both at the level of its four surrounding membranes and through accumulation of material in the apicoplast stroma. This study is very timely, as the apicoplast holds several important metabolic functions (FASII, IPP, LPA, Heme, Fe-S clusters...), which have been revealed and studied in depth, but no further respective transporters have been identified thus far. Hence, new studies that could reveal how the apicoplast can acquire and deliver all the key metabolites it deals with will have strong impact for the parasitology community as well as for the plastid evolution communities. The current study is well initiated with appropriate approaches to identify two new putatively important apicoplast transporters, and showing how essential those are for parasite intracellular development and survival. However, in its current state, this is all the study provides (i.e. essential apicoplast transporters disrupting apicoplast integrity, and indirectly its major functions, FASII and IPP, as any essential apicoplast protein disruption does). The study fails to deliver a further message or function regarding AMC1 and 2, and thus to validate their study. Currently, the manuscript just describes how AMC1/2 deletion impacts parasite survival without answering the key question about them: what do they transport? The authors have yet to perform key experiments that would reveal their metabolic function. I would thus recommend the authors work further and determine the function of AMC1 and 2.

      Response: We thank the reviewer very much for his/her positive evaluation of our work. To address the detailed function of the transporters, in the past three months we have re-constructed plasmids (with codon-optimized DNA sequences of the genes) for expression of the transporters in a regular expression E. coli strain, BL21(DE3), and in a pyruvate import knockout E. coli strain (a gift from Prof. Kirsten Jung), to examine the transport capability in vitro. We have also constructed a new plasmid containing a new leading peptide for targeting the pyruvate sensor PyronicSF to the apicoplast in the parasite, to probe the possible substrate pyruvate. However, we were unable to observe expression of the transporters in the above E. coli strains, and we were unable to target the sensor to the correct localization (the apicoplast) in the parasite. As a result, the functional identification of the transporters remains as presented in the current version of the manuscript. We will keep working on this aspect, attempting to dissect the exact transport function of the transporters in the near future. In the current manuscript, we have discussed the limitations of our study in the last part of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      Line 35: ...appears to have evolved...

      Line 67: remove first comma

      Line 105: thereafter or therefore?

      Line 130: define ACP

      Line 131: define TMD

      Response: We thank the reviewer very much for the suggestions, and we have revised these points in the current manuscript.

      Figure 1: more information on APT1 would be helpful for readers to interpret the results from turboID e.g., consider showing an illustration showing, according to Karnataki et al 2007 that APT1 likely occupies all 4 membranes of the apicoplast. Also, according to DeRocher et al 2012, APT1 N-term and C-term are both cytosolically exposed, at least in the outermost membrane. The orientation in the other membranes is not known.

      Response: We thank the reviewer very much for the suggestions. We analyzed the localization information of APT1 in T. gondii, based on the studies that the reviewer mentioned (Karnataki et al., 2007; DeRocher et al., 2012). The HA tag at the C-terminus of APT1 was distributed at the four membranes of the apicoplast, indicating that the topology of APT1 at the membranes might be difficult to define. Considering this information, we were hesitant to depict the topology of APT1 in a schematic diagram. Nevertheless, the TurboID tagging at the C-terminus of APT1 was an excellent model for identification of potential transporters localized at membranes of the apicoplast. We have put more information about the topology of APT1 in the manuscript, thus providing a better understanding of the proteomic results.

      Figure 2: add a space between "T." and "gondii"

      Figure 2: remove period between "Fitness" and "scores"

      Figure 2: different fonts are used within the figure. Consider using only one font such as arial. Same for Figure 4.

      Figure 2: "Fitness scores" is not bold in panel A but is bold in panel B.

      Response: We thank the reviewer very much for the suggestions. We have revised these points in the current version of the manuscript.

      Line 187: superscript -7

      Line 249: Caution should be used in interpreting two bands as being a precursor and mature product without additional experiments to establish such a relationship. Consider using the term "might" rather than "appear to". The presence of multiple bands could be due to phenomena other than proteolytic processing e.g., alternative splicing, alternative initiator codons, etc.

      Response: We thank the reviewer very much for the suggestions. We have revised the sentences in the current version of the manuscript.

      Line 291: define IPP

      Figure 3E. The data points for KD strains appear to be positioned above the zero value on the y-axis. Is this correct?

      Response: We thank the reviewer very much for the suggestions. We have rechecked the figure and replaced it with the correct one.

      Figure 3 G/H legend. Please describe what a single data point represents e.g., the average of one field of view, the average of a certain number of fields of view, or something else? Are the data combined from three experiments or from a representative experiment?

      Response: We thank the reviewer very much for the suggestions. Three independent experiments were performed, each with at least three replicates. At least 150 vacuoles were scored in each replicate, resulting in at least 9 data points in total. Each data point represents the result from one replicate.

      Line 325: define MEP and explain how it is connected to IPP

      Response: We thank the reviewer very much for the suggestions. We have provided the information in the current version of the manuscript.

      Lines 351-355: The authors refer to Figure 4D to support this statement, but presumably they mean 4E. Also, the authors use the terms C14, C16, and C18. They should more precisely use the terms myristic acid, palmitoleic acid, and trans_oleic acid if this is what they are referring to. Finally, the authors should determine if there is a statistically significant difference between levels of these fatty acids between AMT1 KD and AMT2 KD. If not, they should suggest there is an overall trend toward lower levels of these fatty acids in AMT2 KD parasites compared to AMT1 KD parasites.

      Response: We thank the reviewer very much for the suggestions. We have revised the information in the current version of the manuscript.

      Lines 363-364: The basis of this comment is unclear. Please clarify.

      Lines 369-370: the authors have not shown that the observed lower levels of fatty acids are due to synthesis, as noted above

      Response: We thank the reviewer very much for the suggestions. We have accordingly revised the information in the current version of the manuscript.

      Line 383: Should be Figure S6D

      Line 386: An entire section of the results is used to describe data that are entirely in a supplemental figure. Consider moving this data to a main figure.

      Response: We thank the reviewer very much for the suggestions. We have transferred the data to the main figure in the current version of the manuscript.

      Line 391: Consider using the term virulence instead of growth since no experiments were performed to specifically assess parasite growth in the infected mice.

      Response: We thank the reviewer very much for the suggestions. We have revised the terms in the Results section.

      Line 427: Perhaps the authors mean "...strong growth defect..." or "...strong growth impairment..."

      Line 460-461: This statement is unclear. Please explain how strong backgrounds in proteomics have made it difficult to identify apicoplast transporters. Because they are low abundance? Because they are membrane proteins?

      Response: We thank the reviewer very much for the suggestions. We have revised the corresponding sentences in the current version. The strong backgrounds in the proteomics resulted from the high activity and nonspecific labeling of biotin ligase fused with the apicoplast proteins.

      518-521: It would be helpful for non-specialists if the authors explained how pyruvate is connected to IPP biosynthesis.

      523: delete period after "Escherichia"

      548-549: "We observed similar decreases in level of the MEP biosynthesis activity upon depletion of AMT1 and AMT2..." Reword this since no experiments were done to measure MEP biosynthesis activity.

      Response: We thank the reviewer very much for the suggestions. We have accordingly revised the relevant sentences in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      • The metabolomic data on fatty acid synthesis and isoprenoid levels is relevant but cannot inform about the function of the transporter, since any protein causing loss of the apicoplast would behave in such a manner, i.e. block the apicoplast pathways.

      Response: We thank the reviewer very much for the comment, with which we agree. We have thus discussed these points in a subsection in the Discussion, pointing out some of the limitations in the study.

      • Currently, the manuscript fails to directly prove what AMC1 and AMC2 transport, potentially pyruvate as suggested to putatively fuel FASII and MEP/DOXP. Further experimental approaches using exogenous complementation and/or metabolomic analyses using stable isotope labelling (for example) should potentially bring light to the putative functions of AMC1/2.

      Response: We thank the reviewer very much for the comments. As described above, we attempted several approaches to find out the substrates that AMT1 and AMT2 transport. However, we could not successfully express the proteins in E. coli strains, and we did not generate a T. gondii strain in which a pyruvate sensor was properly targeted to the apicoplast. At the end of the Discussion, we have a subsection that discusses the limitations of this study. We hope that our future approaches will be able to tackle these difficulties in substrate identification.

      Furthermore, the authors have not considered other pathways of interest, like heme or lysophosphatidic acid (LPA) synthesis, which are two other key pathways that may be related to AMC1/2 function. Those proposed experiments represent an important body of work, required to bring light to their metabolic functions.

      Response: We thank the reviewer very much for the comments. We considered these pathways, but we finally decided to mainly discuss two of the pathways that the transporters might participate in, since the transporters contain specific domains in their protein sequences that are potentially associated with pyruvate.

      Further, the authors might have partially missed some referencing and data about the apicoplast in their introduction (and potentially to address other facets of the apicoplast metabolic functions/capacities in regards to AMC1/2 function): the introduction referencing and explanations are somehow not fully exact/precise for the part of the apicoplast and its pathway: references about the apicoplast, discovery and origin are not citing the original work (that should be Wilson et al. 1996, McFadden et al. 1996, Kohler et al. 1997), same for the discovery of FASII and MEP/DOXP (Waller 1998, Jomaa et al...). The introduction (and the study?) lacks information about other key functions of the apicoplast: heme synthesis, lysophosphatidic acid synthesis (using FASII products). The explanations about the roles of FASII/DOXP are partial and not fully citing important references: Krishnan et al. 2020, and Amiar et al. 2020 are also key to understanding how the role of FASII is metabolically flexible depending on nutrient content. A whole part on the fact that FASII is not only dispensable but can also become essential under metabolic adaptation conditions is missing (Botté et al. 2013, Amiar et al. 2020, Primo et al. 2021). These novel important facets of parasite biology should be mentioned as well as directly linked to the authors' topic. This is more minor but could bring new ideas to the authors.

      Response: We thank the reviewer very much for the suggestions. We have revised the relevant part in the introduction.

      We are grateful for the suggestions to improve the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents a valuable conceptual advance of how Vitamin A and its derivatives contribute to atherosclerosis. There is solid evidence invoking the contributions of specialized populations of T cells in atherosclerosis resolution, including use of multiple in vivo models to validate the functional effect. The significance of the study would be strengthened with more detailed interrogation of lesions composition and consolidation with previous work on the topic from human studies.

      Answer: We thank the reviewers and editorial office for their comments and constructive criticism. Below we provide point by point responses to the comments and concerns, which include the issues of lesion composition and consolidation with human studies. We also proofread the manuscript and included information about the immunostaining procedures that were previously missing (Lines 199 – 206).

      Public Reviews

      REVIEWER #1:

      This is an interesting study by Pinos and colleagues that examines the effect of beta carotene on atherosclerosis regression. The authors have previously shown that beta carotene reduces atherosclerosis progression and hepatic lipid metabolism, and now they seek to extend these findings by feeding mice a diet with excess beta carotene in a model of atherosclerosis regression (LDLR antisense oligo plus Western diet followed by LDLR sense oligo and chow diet). They show some metrics of lesion regression are increased upon beta carotene feeding (collagen content) while others remain equal to normal chow diet (macrophage content and lesion size). These effects are lost when beta carotene oxidase (BCO) is deleted. The study adds to the existing literature that beta carotene protects from atherosclerosis in general, and adds new information regarding regulatory T-cells. However, the study does not present significant evidence about how beta-carotene is affecting T-cells in atherosclerosis. For the most part, the conclusions are supported by the data presented, and the work is completed in multiple models, supporting its robustness. However, there are a few areas that require additional information or evidence to support their conclusions and/or to align with the previously published work.

      Specific additional areas of focus for the authors:

      1. The premise of the story is that b-carotene is converted into retinoic acid, which acts as a ligand of the RAR transcription factor in T-regs. The authors measure hepatic markers of retinoic acid signaling (retinyl esters, Cyp26a1 expression) but none of these are measured in the lesion, which calls into question the conclusion that Tregs in the lesion are responsible for the regression observed with b-carotene supplementation.

      Answer: We agree with the Reviewer’s comment, which prompted us to quantify the expression of the retinoic acid-sensitive marker Cyp26b1 in the atherosclerotic lesions. Cyp26b1, together with Cyp26a1 and c1, contain retinoic acid response elements (RAREs) in their promoters, and therefore, are highly sensitive to retinoic acid. Indeed, the mRNA/protein expression of Cyp26s is widely considered a surrogate marker for retinoic acid levels in cells or tissues.

      We typically use Cyp26a1 as a surrogate marker for retinoic acid signaling in the adipose tissue and the liver, as we did in this study. However, our RNA seq data in murine bone-marrow derived macrophages (mBMDMs) exposed to retinoic acid revealed that Cyp26b1 is the only Cyp26 family member responsive to retinoic acid (PMID: 36754230). Actually, Cyp26a1 or c1 were not expressed in our mBMDMs (data not shown). Unlike the M2 marker arginase 1, Cyp26b1 did not respond to IL-4 (Figure iA). Hence, Cyp26b1 is an adequate marker to evaluate retinoic acid signaling in the lesion of mice, rich in macrophages.

      Before staining the lesions, we validated the Cyp26b1 antibody by staining mBMDMs exposed to retinoic acid (Figure iB).

      Author response image 1.

      (A) mBMDMs were divided in M0 or M2 (exposed to IL-4 for 24 h), and then treated with either DMSO or retinoic acid for 6 h before harvesting for RNA seq analysis. Exploring the RNA seq dataset, we identified Cyp26b1 as a RA-sensitive gene in mBMDMs (PMID: 36754230). (B) Validation of Cyp26b1 antibody in mBMDMs exposed to retinoic acid confirms the suitability of this antibody for measuring retinoic acid signaling in our experimental settings.

      In the current version of the manuscript, we include the results of Cyp26b1 quantifications (Figure 5H, I), (Lines: 362 - 366). To put these findings in perspective with human studies, we discuss these results in light of the role human CYP26B1 plays in the atherosclerotic lesion (Lines: 450 - 464).

      2. There does not appear to be a strong effect of Tregs on the b-carotene induced pro-regression phenotype presented in Figure 5. The only major CD25+ cell dependent b-carotene effect is on collagen content, which matches with the findings in Figures 1 and 2. This mechanistically might be very interesting and novel, yet the authors do not investigate this further or add any additional detail regarding this observation. This would greatly strengthen the study and the novelty of the findings overall as it relates to b-carotene and atherosclerosis.

      Answer: As the Reviewer points out, the effects of β-carotene on collagen content are more pronounced than those on CD68 content in the lesion. Indeed, we have observed this in the majority of the experiments in this manuscript.

      Collagen accumulation in the lesion is a complex process, where smooth muscle cells secrete collagen and plaque macrophages (typically) degrade it. Matrix metalloproteases produced by macrophages contribute to the degradation of collagen, and studies show that retinoic acid regulates the expression of metalloproteinases in various cell types (PMID: 2324527, 24008270). We explored the expression of metalloproteases in macrophages exposed to retinoic acid in our mBMDM RNA seq, but we did not observe any significant result (data not shown).

      Interestingly, M2 macrophages can secrete collagen by upregulating arginase 1 expression. In the current version of the manuscript, we acknowledge this in the results (Lines: 358-359) and in the discussion section (Lines: 443-449).

      1. The title indicates that beta-carotene induces Treg 'expansion' in the lesion, but this is not measured in the study.

      Answer: Following the suggestion by the Reviewer, we have re-worded the title to “β-carotene accelerates the resolution of atherosclerosis in mice”

      REVIEWER #2:

      Pinos et al present five atherosclerosis studies in mice to investigate the impact of dietary supplementation with b-carotene on plaque remodeling during resolution. The authors use either LDLR-ko mice or WT mice injected with ASO-LDLR to establish diet-induced hyperlipidemia and promote atherogenesis during 16 weeks, and then they promote resolution by switching the mice for 3 weeks to a regular chow, either deficient or supplemented with b-carotene. Supplementation was successful, as measured by hepatic accumulation of retinyl esters. As expected, chow diet led to reduced hyperlipidemia, and plaque remodeling (both reduced CD68+ macs and increased collagen contents) without actual changes in plaque size. But, b-carotene supplementation resulted in further increased collagen contents and, importantly, a large increase in plaque regulatory T-cells (TREG). This accumulation of TREG is specific to the plaque, as it was not observed in blood or spleen. The authors propose that the anti-inflammatory properties of these TREG explain the atheroprotective effect of b-carotene, and found that treatment with anti-CD25 antibodies (to induce systemic depletion of TREG) prevents b-carotene-stimulated increase in plaque collagen and TREG.

      1. An obvious strength is the use of two different mouse models of atherogenesis, as well as genetic and interventional approaches. The analyses of aortic root plaque size and contents are rigorous and included both male and female mice (although the data was not segregated by sex). Unfortunately, the authors did not provide data on lesions in en face preparations of the whole aorta.

      Answer: We appreciate the positive comments on rigor. We considered displaying our data segregated by sex, although for some experiments, we did not have matching numbers of male and female mice, which could be distracting for the reader. The goal of our study was to analyze changes in plaque composition. Therefore, our experimental approach was designed to study atherosclerosis resolution (plaque composition changes, but not plaque size) instead of atherosclerosis regression (both plaque composition and size change). As expected, we did not observe differences in plaque size at the level of the aortic root for any of our experiments, which deterred us from quantifying plaque content by en-face in the aorta.

      2. Overall, the conclusion that dietary supplementation with b-carotene may be atheroprotective via induction of TREG is reasonably supported by the evidence presented. Other conclusions put forth by the authors (e.g., that vitamin A production favors TREG production or that BCO1 deficiency reduces plasma cholesterol), however, will need further experimental evidence to be substantiated.

      Answer: We apologize for the lack of clarity in the presentation of our results and overstating our conclusions. We have rephrased some of these conclusions in the results and discussion sections.

      3. The authors claim that b-carotene reduces blood cholesterol, but data shown herein show no differences in plasma lipids between mice fed b-carotene-deficient and -supplemented diets (Figs. 1B, 2A, and S3A).

      Answer: As Reviewer 2 points out, we did not observe changes in plasma cholesterol between mice undergoing Resolution in response to β-carotene. For clarity, we rephrased our plasma lipids results for each of our experimental designs (Lines: 230 – 236, 270 – 272, and 288-290). We also include a clarification in the discussion section about the differential effects of β-carotene on plasma lipids when mice undergo atherosclerosis progression and resolution. (Lines: 419 - 430).

      4. Also, the authors present no experimental data to support the idea that BCO1 activity favors plaque TREG expansion (e.g., no TREG data in Fig 3 using Bco1-ko mice).

      Answer: We appreciate the suggestion by Reviewer 2. In the current version of the manuscript, we stained the aortic roots from Bco1-/- mice for FoxP3. We did not observe differences between Control and β-carotene resolution groups, in agreement with the results in plaque composition (CD68 and collagen contents). These new data strengthen our manuscript, and we now include these results as Supplementary Figure 3D, E (Lines: 465 - 471).

      5. As the authors show, the treatment with anti-CD25 resulted in only partial suppression of TREG levels. Because CD25 is also expressed in some subpopulation of effector T-cells, this could potentially cloud the interpretation of the results. Data in Fig 4H showing loss of b-carotene-stimulated increase in numbers of FoxP3+GFP+ cells in the plaque should be taken cautiously, as they come from a small number of mice. Perhaps an orthogonal approach using FoxP3-DTR mice could have produced a more robust loss of TREG and further confirmation that the loss of plaque remodeling is indeed due to loss of TREG.

      Answer: We agree with the reviewer, and we rephrased the results and discussion to avoid overstating our findings. We now acknowledge that a second experimental approach would help confirm the findings we obtained with a blocking antibody targeting CD25. We favored the use of anti-CD25 infusions over other depletion methods based on the experimental protocol carried out by our collaborators, in which they examined the effect of Tregs on atherosclerosis regression (PMID: 32336197). The utilization of FoxP3-DTR mice would nicely complement our findings. In the current version of the manuscript, we discuss this alternative approach (Lines: 491 - 501).

      Recommendations for the Authors

      All reviewers agreed that despite the claims of the title, there is no direct interrogation of Tregs or vitamin A signaling in lesions.

      The work does not consolidate well with the role of B-carotene in human heart disease. Additional discussion and synthesis are required to elaborate on the significance of the findings. For example, the idea of beta carotene supplementation for cardiovascular prevention has attracted attention for years but recent meta-analysis showed no benefit, and, if anything, an increase in cardiovascular events. The U.S. Preventive Services Task Force (USPSTF) went as far as to recommend AGAINST the use of beta-carotene for the prevention of cardiovascular disease.

      In light of the above point and elife editorial policies, please revise the title to include species.

      Answer: Thanks for your feedback. Carotenoid metabolism in mammals is complex, and establishing direct parallelisms between humans and rodents must be done with caution. For example, β-carotene supplementation in humans inevitably results in the accumulation of this compound in plasma, while in rodents, β-carotene is quickly metabolized to vitamin A. Our findings over the years reveal that the effects of β-carotene in mice derive exclusively from its role as vitamin A precursor.

      In the current study, we confirm our previous work utilizing Bco1-/- mice, which are unable to produce vitamin A when fed β-carotene. Then, we observe that vitamin A promotes atherosclerosis resolution in mice independently of alterations in plasma cholesterol in two independent mouse models. Lastly, we utilized anti-CD25 blocking antibodies to deplete Tregs to establish a direct connection between dietary β-carotene/vitamin A and Tregs in the lesion. While this experimental approach failed to completely deplete Tregs, our morphometric assays indicate that these infusions were sufficient to partially mitigate the effect of β-carotene on atherosclerosis resolution.

      Regardless, in the discussion section of our manuscript, we attempt to consolidate our preclinical studies with clinical data (Lines: 374 – 376, and 461 – 464).

      We have also revised the title, as suggested by Reviewer 1. We also included “mice” in the title to align with the editorial policies of eLife.

      Reviewer #1:

      1.1. The authors need to measure retinoic acid signaling directly in the lesion and in Tregs to be able to draw the conclusion that b-carotene is directly activating Tregs to promote regression.

      Answer: Please see comments above.

      1.2. The authors should investigate the role of beta carotene on collagen production by T-regs.

      Answer: Please see comments above.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      2.1. If the authors still have frozen sections of the aortas from their Bco1-ko experiment, it should be trivial to look at plaque TREG contents to confirm that vitamin A production is indeed needed for the effect of b-carotene on plaque remodeling.

      Answer: Please see comments above.

      Minor:

      2.2. This reviewer wonders if the axis for lesion size in all figures is off by an order of magnitude. Most studies show aortic root lesions in the 10^5 um2 range, not in the 10^6 um2.

      Answer: We apologize for this error. We have corrected the units in all our quantifications.

      2.3. FPLC lipoprotein profiles would enhance the manuscript.

      Answer: We have run FPLCs for the plasmas and included them in the results (Lines: 233 – 236). Data are presented in Figure 1C, D.

      2.4. This reviewer could not cope with the thought that mice that are fed 16+ weeks a diet that is vitamin A-deficient did not become vit A-deficient (e.g., Fig. 1E). Perhaps the authors could elaborate a little on this in their discussion.

      Answer: Mice are extremely resistant to vitamin A deficiency. A common protocol to achieve deficiency in mice requires feeding a vitamin A deficient diet to dams during their pregnancy and lactation to deplete new-born pups of vitamin A stores. Even in that situation, pups retain enough vitamin A stores to sustain circulating vitamin A at levels comparable to those observed in wild-type mice. In the current version of the manuscript, we have included a paragraph in the discussion to cover this “interesting” aspect (Lines: 476 – 483).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The evolution of transporter specificity is currently unclear. Did solute carrier systems evolve independently in response to a cellular need to transport a specific metabolite in combination with a specific ion or counter metabolite, or did they evolve specificity from an ancestral protein that could transport and counter-transport most metabolites? The present study addresses this question by applying selective pressure to Saccharomyces cerevisiae and studying the mutational landscape of two well-characterised amino acid transporters. The data suggest that AA transporters likely evolved from an ancestral transporter and then specific sub-families evolved specificity depending on specific evolutionary pressure.

      Strengths:

      The work is based on sound logic and the experimental methodology is well thought through. The data appear accurate, and where ambiguity is observed (as in the case of citrulline uptake by AGP1), in vitro transport assays are carried out to verify transport function.

      Weaknesses:

      Although the data and findings are well described, the study lacked additional contextual information that would support a clear take-home message.

      We appreciate the reviewer’s positive assessment of the work, and the helpful comment to summarize the findings into a short take-home message. We chose not to discuss protein evolution theories in detail to keep the text as concise as possible. However, we do acknowledge the fact that the reader might want to see our results embedded in more context. In a revised version, we will integrate our findings more with the pertinent literature, which will show how our results align with theoretical models for protein evolution towards novel functions. We will also discuss in more detail how our laboratory results could be translated into a “natural” setting of evolution.

      Reviewer #2 (Public Review):

      Summary:

      This paper describes evolution experiments performed on yeast amino acid transporters aiming at the enlargement of the substrate range of these proteins. Yeast cells lacking 10 endogenous amino acid transporters and thus being strongly impaired to feed on amino acids were again complemented with amino acid transporters from yeast and grown on media with amino acids as the sole nitrogen source.

      In the first set of experiments, complementation was done with seven different yeast amino acid transporters, followed by measuring growth rates. Although most of them have been described before in other experimental contexts, the authors could show that many of them have a broader substrate range than initially thought.

      Moving to the evolution experiments, the authors used the OrthoRep system to perform random mutagenesis of the transporter gene while it is actively expressed in yeast. The evolution experiments were conducted such that the medium would allow for poor/slow growth of cells expressing the wt transporters, but much better/faster growth if the amino acid transporter would mutate to efficiently take up a poorly transported (as in the case of citrulline and AGP1) or non-transported (as in case of Asp/Glu and PUT4) amino acid.

      This way and using Sanger sequencing of plasmids isolated from faster-growing clones, the authors identified a number of mutations that were repeatedly present in biological replicates. When these mutations were re-introduced into the transporter using site-directed mutagenesis, faster growth on the said amino acids was confirmed. Growth phenotype data were attempted to be confirmed by uptake experiments using radioactive amino acids; however, the radioactive uptake data and growth-dependent analyses do not fully match, hinting at the existence of further parameters than only amino acid uptake alone to impact the growth rates.

      When mapped to Alphafold prediction models on the transporters, the mutations mapped to the substrate permeation site, which suggests that the changes allow for more favourable molecular interactions with the newly transported amino acids.

      Finally, the authors compared the growth rates of the evolved transporter variants with those of the wt transporter and found that some variants exhibit a somewhat diminished capacity to transport its original range of amino acids, while other variants were as fit as the wt transporter in terms of uptake of its original range of amino acids.

      Based on these findings, the authors conclude that transporters can evolve novel substrates through generalist intermediates, either by increasing a weak activity or by establishing a new one.

      Strengths:

      The study provides evidence in favour of an evolutionary model, wherein a transporter can "learn" to translocate novel substrates without "forgetting" what it used to transport before. This evolutionary concept has been proposed for enzymes before, and this study shows that it also can be applied to transporters. The concept behind the study is easy to understand, i.e. improving growth by uptake of more amino acids as nitrogen source. In addition, the study contains a large and extensive characterization of the transporter variants, including growth assays and radioactive uptake measurements.

      Weaknesses:

      The authors took a genetic gain-of-function approach based on random mutagenesis of the transporter. While this has worked out for two transporters/substrate combinations, I wonder how comprehensive and general the insights are. In such approaches, it is difficult to know which mutation space is finally covered/tested. And information that can be gained from loss-of-function analyses is missed. The entire conclusions are grounded on a handful of variants analyzed. Accordingly, the outcome is somewhat anecdotal; in some cases, the fitness of the variants was changed and in others not. Highlighting the amino acid changes in the context of the structural models is interesting, but does not fully explain why the variants exhibit changed substrate ranges. Two important technical elements have not been studied in detail by the authors, but may well play a certain role in the interpretation of the results. Firstly, the authors did not quantify the amount of transporter being present on the cell surface; altered surface expression can impact uptake rates and thus growth rates. Secondly, the authors have not assessed whether overexpressing wt versus variant transporters has an impact on the growth rate per se. Overexpressing transporters from plasmids is quite a burden for the cells and often impacts growth rates. Variants may be more or less of a burden, an effect that may (or may also not) go hand in hand with increased/decreased surface production levels.

      And finally, I was somewhat missing an evolutionary analysis of these transporters to gain insights into whether the identified substitutions also occurred during natural evolution under real-life conditions.

      First of all, we thank the reviewer for the attention to detail with which they have read the manuscript, and the very helpful comments on how to improve it. We will indeed take on some of the suggestions in a revised version of the text:

      Regarding the match of growth rate and uptake rate measurements, we plan to plot their correlation in a graph.

      Regarding the amount of transporter on the plasma membrane, we acknowledge that the visual representation of the fluorescence micrographs already in the text might not be enough. We therefore will quantify expression levels from said micrographs and include the information in the manuscript.

      On a similar note, we had already measured the growth rates of all transporter variant cultures in the absence of selection for amino acid uptake (i.e., in medium with ammonium as the nitrogen source; Figure 4 - Supplement figure 1). We will include the measured growth rates in the text to give an indication of what the impact of transporter overexpression is on the growth rate per se.

      Regarding the proposed analysis of natural transporter sequences, we do see the possible value in such an analysis. However, it is currently out of scope for the present study. The reasons are 1) that preliminary analyses show that the sequence similarity of functionally verified/annotated transporters is too low to reliably pinpoint a phenotype to a single residue, and 2) that we do not envision that the variants that we discovered are necessarily beneficial in a natural setting, where fine-grained regulation of amino acid transport may be more important than a broad substrate range.

      Regarding the generality of the insights, we do agree with the reviewer’s comment that we “only” analyzed a relatively small number of variants. However, the target of the study was not to generate high-throughput data on a large set of variants (e.g., by NGS of the whole culture) but to provide in-depth data for characterized and verified variants in a clean genetic background (i.e., verified phenotype and fitness measurements on all native and novel substrates).

      As to the mutation space, we will include an estimate in a revised version of the text. We estimate that a majority of all possible single mutants is covered in the first and second passages of the selection experiment, which is corroborated by the fact that we repeatedly find the same mutants in biological replicates.
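      As a rough illustration of how such a coverage estimate can be approached, the sketch below uses a Poisson approximation to compute the probability that a given single-nucleotide substitution has arisen at least once in a culture. All numerical inputs (per-base substitution rate, number of cells, number of generations) are hypothetical placeholders and are not values from this study. The model further assumes that the three possible substitutions at a position are equally likely, approximates the number of replication events as cells times generations, and ignores selection and loss of mutants.

```python
import math


def single_mutant_coverage(mu_per_base, n_cells, generations):
    """Probability that one specific single-nucleotide substitution arose at least once.

    Poisson approximation: every replication event yields the substitution
    with probability mu_per_base / 3 (three alternative bases, assumed
    equally likely). Replication events are approximated as
    n_cells * generations; selection and drift are ignored.
    """
    expected_events = n_cells * generations * mu_per_base / 3.0
    return 1.0 - math.exp(-expected_events)


# Hypothetical placeholder values, NOT taken from the study.
coverage = single_mutant_coverage(mu_per_base=1e-5, n_cells=1e7, generations=10)
print(f"Estimated coverage per single-nucleotide variant: {coverage:.3f}")
```

      Because the same probability applies to every position under these assumptions, it can also be read as the expected fraction of all single-nucleotide variants present in the culture. Amino acid changes that require two or three nucleotide substitutions in one codon are not captured by this simple estimate.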

      Regarding the mentioned loss-of-function analyses, we are unsure about what the reviewer intends with this statement at this point. To briefly summarize, we feel that our results are a good indication that transporters can evolve new functions analogously to enzymes. We explicitly do not imply that this is the only way to evolve novelty.

      Reviewer #3 (Public Review):

      The goal of the current manuscript is to investigate how changes in transporter substrate specificity emerge through experimental evolution. The authors investigate the APC family of amino acid transporters, a large family with many related transporters that together cover the spectrum of amino acid uptake in yeast.

      The authors use a clever approach for their experimental evolutions. By deleting 10 amino acid uptake transporters in yeast, they develop a strain that relies on amino acid import by introducing APC transporters under nitrogen-limiting conditions. They can thus evolve transporters towards the transport of new substrates if no other nitrogen source is available. The main takeaway from the paper is that it is relatively easy for the spectrum of substrates in a particular transporter of this family to shift, as a number of single mutants are identified that modulate substrate specificity. In general, transporters evolved towards gain-of-function mutations (better or new activities) and also confer transport promiscuity, expanding the range of amino acids transported.

      The data in the paper support the conclusions, in general, and the outcomes (evolution towards promiscuity) agree with the literature available for soluble enzymes. However, it is also a possibility that the design of these experiments selects for promiscuity among amino acids. The selections were designed such that yeast had access to amino acids that were already transported, with a greater abundance of the amino acid that was the target of selection. Under these conditions, it seems probable that the fittest variants will provide the yeast access to all amino acid substrates in the media, and unlikely that a specificity swap would occur, limiting the yeast to only the new amino acid.

      The authors also examine the fitness costs of mutants, but only in the narrow context of growth on a single (original) amino acid under conditions of nitrogen limitation. Amino acid uptake is typically tightly controlled because some amino acids (or their carbon degradation products) are toxic in excess. This paper does not address or discuss whether there might be a fitness cost to promiscuous mutants in conditions where nitrogen is not limiting.

      We are grateful for the reviewer’s insightful comments on the paper.

      Regarding the design of our experiments, we followed the concept of directed evolution as described by pioneers of the field, in which the starting point for evolving a protein is to have a basic level of that activity. In the case of AGP1, the promiscuous activity is Cit uptake. We recognize that elimination of all the already transported amino acids from the evolution media could also yield very insightful results. However, we aimed to simulate the effect of the evolutionary pressure acting in a “natural” environment, where the uptake of the specific amino acid is not initially crucial for its survival. In the case of PUT4, the experimental design was chosen to ensure the initial survival of the culture (since neither Glu nor Asp support the growth of the strain) by providing a low level of already transported amino acids. In the revised manuscript, we will state this more clearly.

      Regarding the second point, we agree that a short discussion about the potentially detrimental effects of promiscuous transporters would be beneficial for the reader. We will touch on this aspect in the revised version of the text. Indeed, our system is intentionally simplified, as we try to take regulation of transport out of the equation (e.g., by using the constitutive ADH1 promoter as opposed to a nitrogen-regulated one). In a natural setting, microorganisms encounter fluctuations of nutrient availability, necessitating tight control of nutrient transport. This is probably a major reason why microorganisms typically encode transporters with redundant specificities (i.e., promiscuous and specific ones). Otherwise, one very broad-range nutrient transporter would suffice. In our system, we artificially select for broad-range transport, which is reflected in the observed phenotypes of the evolved transporters. We expect that in a natural setting, a broad-range transporter would be a stepping stone to evolve a narrow-range transporter with a new specificity (which is actually what we see in the double-mutant AGP1-NV, with lowered fitness in original substrates and increased fitness in Cit).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study advances our understanding of the ways in which different types of communication signals differentially affect mouse behaviors and amygdala cholinergic/dopaminergic neuromodulation. Researchers interested in the complex interaction between prior experience, sex, behavior, hormonal status, and neuromodulation should benefit from this study. Nevertheless, the data analysis is incomplete at this stage, requiring additional analysis and description, justification, and - potentially - power to support the conclusions fully. With the analytical part strengthened, this paper will be of interest to neuroscientists and ethologists.

      GENERAL COMMENTS ON REVIEWS AND REVISIONS

      Experimental design

      Here we address questions from several reviewers regarding our periods of neuromodulator and behavioral analysis. First, we recognize that the text would benefit from an overview of the experimental structure different from the narrative we provide in the first paragraphs of the Results. We now include this near the beginning of the Materials and Methods (page 17). We further articulate that the 10-minute time periods were dictated by the sampling duration required to perform accurate neurochemical analyses (and to reserve half of the sample in the event of a catastrophic failure of batch-processing samples). Since neurochemical release may display multiple temporal components (e.g., ACh: Aitta-aho et al., 2018) during playback stimulation, and since these could differ across neurochemicals of interest, we decided to collect, analyze, and report in two stimulus periods as well as one Pre-Stim control. We now clarify this in additional text in the Materials and Methods (p. 24, lines 20-22; p. 26, lines 17-19). We decided not to include analyses of the post-stimulus period because this is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question: the change in neuromodulator release DURING vocal playback.

      We also sought to clarify the meaning of the periods “Stim 1” and “Stim 2”; they are two data collection periods, using the same exemplar sequences in the same order. We have added statements in the Materials and Methods (p. 18, lines 4-7; Fig. caption, p. 39, lines 11-13) to clarify these periods.

      For behavioral analyses, observation periods were much shorter than 10 mins, but the main purpose of behavioral analyses in this report is to relate to the neurochemical data. As a result, we matched the temporal features of the behavioral and neurochemical analyses (p. 22, lines 17-22). We plan a separate report, focused exclusively on a broader set of behavioral responses to playback, that may examine behaviors at a more granular level.

      Data and statistical analyses

      Reviewers 1 and 3 expressed concerns about our normalization of neurochemical data, suggesting that it diminishes statistical power or is not transparent. We note that normalization is a very common form of data transformation that does not diminish statistical power. It is particularly useful for data forms in which the absolute value of the measurement across experiments may be uninformative. Normalization is routine in microdialysis studies, because data can be affected by probe placement and factors affecting neurochemical recovery and processing. Recent examples include:

      Li, Chaoqun, Tianping Sun, Yimu Zhang, Yan Gao, Zhou Sun, Wei Li, Heping Cheng, Yu Gu, and Nashat Abumaria. "A neural circuit for regulating a behavioral switch in response to prolonged uncontrollability in mice." Neuron (2023).

      Gálvez-Márquez, Donovan K., Mildred Salgado-Ménez, Perla Moreno-Castilla, Luis Rodríguez-Durán, Martha L. Escobar, Fatuel Tecuapetla, and Federico Bermudez-Rattoni. "Spatial contextual recognition memory updating is modulated by dopamine release in the dorsal hippocampus from the locus coeruleus." Proceedings of the National Academy of Sciences 119, no. 49 (2022): e2208254119.

      Holly, Elizabeth N., Christopher O. Boyson, Sandra Montagud-Romero, Dirson J. Stein, Kyle L. Gobrogge, Joseph F. DeBold, and Klaus A. Miczek. "Episodic social stress-escalated cocaine self-administration: role of phasic and tonic corticotropin releasing factor in the anterior and posterior ventral tegmental area." Journal of Neuroscience 36, no. 14 (2016): 4093-4105.

      Bagley, Elena E., Jennifer Hacker, Vladimir I. Chefer, Christophe Mallet, Gavan P. McNally, Billy CH Chieng, Julie Perroud, Toni S. Shippenberg, and MacDonald J. Christie. "Drug-induced GABA transporter currents enhance GABA release to induce opioid withdrawal behaviors." Nature neuroscience 14, no. 12 (2011): 1548-1554.

      However, since all reviewers requested raw values of neurochemicals, we provide these in supplementary tables 1-3. The manuscript references these tables early in the Results (p. 6, lines 18-19) and in the Materials and Methods (p. 27, lines 3-4).
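      To make the normalization argument concrete, the sketch below shows one common form of baseline normalization, percent change from the Pre-Stim sample. This is an assumption about the general approach, not a reproduction of the manuscript's exact formula, and the concentrations and animal labels are hypothetical. It illustrates how two animals with different absolute recovery (for example, because of probe placement) can show the same relative change.

```python
def percent_change_from_baseline(baseline, samples):
    """Express each sample as percent change relative to the baseline value."""
    if baseline <= 0:
        raise ValueError("baseline concentration must be positive")
    return [100.0 * (s - baseline) / baseline for s in samples]


# Hypothetical concentrations (arbitrary units), NOT data from the study.
# Animal B has twice the absolute recovery of animal A.
animals = {
    "animal_A": {"pre_stim": 2.0, "stim": [3.0, 2.5]},  # Stim 1, Stim 2
    "animal_B": {"pre_stim": 4.0, "stim": [6.0, 5.0]},
}

normalized = {
    name: percent_change_from_baseline(values["pre_stim"], values["stim"])
    for name, values in animals.items()
}
print(normalized)
```

      In this toy example both animals yield identical normalized values even though their raw concentrations differ twofold, which is why raw values and normalized values are best reported together.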

      All reviewers commented on correlation analyses that we presented, with different perspectives. Reviewer 2 questioned the validity of such analyses, performed across experimental groups, while Reviewer 1 pointed out that the analyses were redundant with the GLM. We agree with these criticisms, and note the challenges associated with correlations involving behaviors for which there is a “floor” in the number of observations. As a result, we have removed most correlation analyses from the manuscript. The text and figures have been modified accordingly. Due to these changes, we have to decline the requests of Reviewer 3 to include many more such analyses. While correlation analyses could still be performed between neurochemicals and behaviors for each group, the statistical power would be very low given the relatively small size of each experimental group, the large number of groups, and the even larger number of pairings between neurochemicals and behaviors. The only correlations we utilize in the manuscript concern the interpretation of our increased acetylcholine levels.

      As part of this revision, we re-ran our statistical analyses on neuromodulators because of a calculation error in 3 animals (regarding baseline values). In a few instances, a significance level changed, but none of these changed a conclusion regarding neuromodulator changes under our experimental conditions.

      Other revisions

      INTRODUCTION: We modified the Introduction to provide both a more general framework and specific gaps in our understanding relating neuromodulators with vocal communication.

      DISCUSSION: We have added material in the first two pages of the Discussion to provide more framework to our conclusions, to address the issues of the temporal aspects of neurochemical release and behavioral observations, and to identify limitations that should be addressed in future studies.

      FIGURES: All figures are now in the main part of the manuscript. We modified most figures in response to reviewer comments. We removed neuromodulator – behavior correlations from several figures. We modified all box plots to ensure that all data points are visible. The visible data points match the numbers reported in figure captions. We brought 5-HIAA data into the main figures reporting on neuromodulator results.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript addresses a fundamental question about how different types of communication signals differentially affect brain states and neurochemistry. In addition, the manuscript highlights the various processes that modulate brain responses to communication signals, including prior experience, sex, and hormonal status. Overall, the manuscript is well-written and the research is appropriately contextualized. The authors are thoughtful about their quantitative approaches and interpretations of the data.

      That being said, the authors need to work on justifying some of their analytical approaches (e.g., normalization of neurochemical data, dividing the experimental period into two periods (as opposed to just analyzing the entire experimental period as a whole)) and should provide a greater discussion of how their data also demonstrate dissociations between neurochemical release in the basolateral amygdala and behavior (e.g., neurochemical differences during both of the experimental periods but behavioral differences only during the first half of the experimental period). The normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis and could be problematic; by normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) that could inflate statistical power.

      Please see our general responses to structure of observation periods and normalization of neuromodulator data. Normalization is a common and appropriate procedure in microdialysis studies that does not alter statistical power.

      We have included a section in the Discussion concerning the temporal relationship between behavioral responses and neurochemical changes in response to vocal playback (p. 12, lines 3-17). We note where the linkage is particularly strong (e.g., ACh release and flinching). This points to a need to examine these phenomena with finer temporal resolution, but also with the recognition that the brain circuits driving a behavioral response may extend beyond the BLA.

      The Introduction could benefit from a priori predictions about the differential release of specific neuromodulators based on previous literature.

      We added some material to the Introduction to provide additional rationale for the study. However, we did not attempt to develop predictions for the range of neuromodulators that we sought to test. The literature can lead to opposite predictions for a given neuromodulator. For example, acetylcholine could be associated with both positive and negative valence. Instead, we note in the Introduction the association of both DA and ACh with vocalizations.

      The manuscript would also benefit from a description of space use and locomotion in response to different valence vocalizations.

      We have provided additional descriptions of space use and video tracking data in Materials and Methods (p. 23, lines 1-6). We now report a few correlations based on these data in the Results to demonstrate that increased ACh in Restraint males and Mating estrus females was not related to the amount of locomotion (p. 9, lines 8-14).

      Nevertheless, the current manuscript seems to provide some compelling support for how positive and negative valence vocalizations differentially affect behavior and the release of acetylcholine and dopamine in the basolateral amygdala. The research is relevant to broad fields of neuroscience and has implications for the neural circuits underlying social behavior.

      Reviewer #2 (Public Review):

      Ghasemahmad et al. report findings on the influence of salient vocalization playback, sex, and previous experience, on mice behaviors, and on cholinergic and dopaminergic neuromodulation within the basolateral amygdala (BLA). Specifically, the authors played back mice vocalizations recorded during two behaviors of opposite valence (mating and restraint) and measured the behaviors and release of acetylcholine (ACh), dopamine (DA), and serotonin in the BLA triggered in response to those sounds.

      Strength: The authors identified that mating and restraint sounds have a differential impact on cholinergic and dopaminergic release. In male mice, these two distinct vocalizations exert an opposite effect on the release of ACh and DA. Mating sounds elicited a decrease of ACh release and an increase of DA release. Conversely, restraint sounds induced an increase in ACh release and a trend to decrease in DA. These neurotransmission changes were different in estrus females for whom the mating vocalization resulted in an increase of both DA and ACh release.

      Weaknesses: The behavioral analysis and results remain elusive, and although addressing interesting questions, the study contains major flaws, and the interpretations are overstating the findings.

      Although Reviewer 2 raises several valid issues that we have addressed in our response and revision, we believe that none represent “major flaws” in the study that challenge the validity of our central conclusions. In brief, we will:

      --provide enhanced description of behaviors (pp. 22-23 and Table 1)

      --clarify / modify box-plot representations of data (p. 28, lines 3-9)

      --point to our methods that describe corrections for multiple comparisons (p. 27, lines 15-16)

      --revise figures to clarify sample size (Figs. 3-6)

      Reviewer #3 (Public Review):

      Ghasemahmad et al. examined behavioral and neurochemical responses of male and female mice to vocalizations associated with mating and restraint. The authors made two significant and exciting discoveries. They revealed that the affective content of vocalizations modulated both behavioral responses and the release of acetylcholine (ACh) and dopamine (DA) but not serotonin (5-HIAA) in the basolateral amygdala (BLA) of male and female mice. Moreover, the results show sex-based differences in behavioral responses to vocalizations associated with mating. The authors conclude that behavior and neurochemical responses in male and female mice are experience-dependent and are altered by vocalizations associated with restraint and mating. The findings suggest that ACh and DA release may shape behavioral responses to context-dependent vocalizations. The study has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the BLA while an animal listens to social vocalizations; however, multiple concerns must be addressed to substantiate their conclusions.

      Major concerns:

      1) The authors normalized all neurochemical data to the background level obtained from a single pre-stimulus sample immediately preceding playback. The percentage change from the background level was calculated based on a formula, and the underlying concentrations were not reported. The authors should report the sample and background concentrations to make the results and analyses more transparent. The authors stated that NE and 5-HT had low recovery from the mouse brain and hence could not be tracked in the experiment. The authors could be more specific here by relating the concentrations to ACh, DA, and 5-HIAA included in the analyses.

      Please see our general statement regarding normalization of neurochemical data. We have added supplemental tables that show concentrations of dopamine, acetylcholine, and 5-HIAA. We do not report serotonin or noradrenaline since these were below the detection threshold.
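
      As an illustration of the kind of baseline normalization discussed in this comment, the following is a minimal sketch of a percent-change-from-baseline calculation. The exact formula is the one given in the manuscript's Methods; the function name and the concentrations used here are hypothetical and are not values from the study.

```python
def percent_change_from_baseline(baseline, samples):
    """Express each sample as a percent change relative to one baseline value.

    `baseline` stands for the single pre-stimulus concentration and `samples`
    for the concentrations in later collection periods (hypothetical names).
    """
    if baseline <= 0:
        raise ValueError("baseline concentration must be positive")
    return [100.0 * (s - baseline) / baseline for s in samples]


# Hypothetical concentrations: baseline, then two playback periods.
changes = percent_change_from_baseline(2.0, [2.0, 2.5, 1.5])
print(changes)
```

      By construction, the baseline period maps to 0% for every animal, which is the property the reviewers' comments refer to.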

      2) For the EXP group, the authors stated that each animal underwent 90-min sessions on two consecutive days that provided mating and restraint experiences. Did the authors record mating or copulation during these experiments? If yes, what was the frequency of copulation? What other behaviors were recorded during these experiences? Did the experiment encompass other courtship behaviors along with mating experiences? Was the female mouse in estrus during the experience sessions?

      In the mating experience, mounting or attempted mounting was required for the animal to be included in subsequent testing. Since the session lasted 90 minutes, more general courtship behavior was likely. However, we did not record detailed behaviors or track estrous stage for the mating experience. See p. 21, lines 20-22.

      3) For the mating playback, the authors stated that the mating stimulus blocks contained five exemplars of vocal sequences emitted during mating interactions. The authors should clarify whether the vocal sequences were emitted while animals were mating/copulating or when the male and female mice were inside the test box. If the latter was the case, it might be better to call the playback "courtship playback" instead of "mating playback".

      We have modified the Results (p. 5, lines 18-20) and Materials and Methods (p. 21, lines 8-15) to clarify our meaning. We continue to use the term “mating” because this refers to a specific set of behaviors associated with mounting and copulation, rather than the more general term “courtship”. We also indicate that we based these behaviors on previous work (e.g., Gaub et al., 2016).

      4) Since most differences that the authors reported in Figure 3 were observed in Stim 1 and not in Stim 2, it might be better to perform a temporal analysis - looking at behaviors and neurochemicals over time instead of dividing them into two 10-minute bins. The temporal analysis will provide a more accurate representation of changes in behavior and neurochemicals over time.

      Please see our general response to the structuring of experimental periods. The 10-min periods are the minimum for the neurochemical analyses, and we adopted the same periods for behavioral analyses to match the two types of observations. Our repeated measures analysis is a form of temporal analysis, since it compares values in three observation periods.

      5) In Figures 2 and 3, the authors show the correlation between Flinching behavior and ACh concentration. The authors should report correlations between concentrations of all neurochemicals (not just ACh) and all behaviors recorded (not just Flinching), even if they are insignificant. The analyses performed for the stim 1 data should also be performed on the stim 2 data. Reporting these findings would benefit the field.

      Please see general comments regarding correlation analyses. We removed almost all such analyses and references to them from the manuscript based on concerns of the other reviewers.

      6) The mice used in the study were between p90 - p180. The mice were old, and the range of ages was considerable. Are the findings correlated with age? The authors should also discuss how age might affect the experiment's results.

      Our p90-p180 mice are not “old”. CBA/CaJ mice display normal hearing for at least 1 year (Ohlemiller, Dahl, and Gagnon, JARO 11: 605-623, 2010) and adult sexual and social behavior throughout our observation period. They are sexually mature adults, appropriate for this study. We decline to perform correlation analyses with age, both because this was not a question for this study and because the very large number of correlations, for each experimental group (as requested by reviewer #2), renders this approach statistically problematic.

      7) The authors reported neurochemical levels estimated as the animals listened to the sounds played back. What about the sustained effects of changes in neurochemicals? Are there any potential long-term effects of social vocalizations on behavior and neurochemical levels? The authors might consider discussing long-term effects.

      We have not included discussion of long-term effects of neuromodulatory release, both because our data analysis doesn’t address it (see response to Comment #10) and because we desired to keep the Discussion focused on topics more closely related to the results.

      8) Histology from a single recording was shown in supplementary figure 1. It would benefit the readers if additional histology was shown for all the animals, not just the colored schematics summarizing the recording probe locations. Further explanation of the track location is also needed to help the readers. Make it clear for the readers which dextran-fluorescein labeling image is associated with which track in the schematic.

      Based on the recent publications cited in our overall response to reviewer comments about statistical methods, our reporting of histological location of microdialysis exceeds the standard. We believe that the inclusion of all histology is unnecessary and not particularly helpful. Raw photomicrographs do not always illustrate boundaries, so interpretation is required. However, we added a second photomicrograph example and we identified which tracks correspond to these photomicrographs (see Figure 2; now in main body of manuscript).

      9) The authors did not control for the sounds being played back with a speaker. This control may be necessary since the effects are more pronounced in Stim 1 than in Stim 2. Playing white noise rather than restraint or courtship vocalizations would be an excellent control. However, the authors could perform a permutation analysis and computationally break the relationship between what sound is playing and the neurochemical data. This control would allow the authors to show that the actual neurochemical levels are above or below chance.

      We considered a potential “control” stimulus in our experimental design. We concluded, based on our previous work (e.g., Grimsley et al., 2013; Gadziola et al., 2016), that white noise is not necessarily a neutral stimulus and therefore the results would not clarify the responses to the two vocal stimuli. Instead, we opted to use experience as a type of control. This control shows very clearly that temporal patterns and across-group differences in neurochemical response to playback disappear in the absence of experience with the associated behavior.
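
      For reference, the label-shuffling permutation analysis suggested in the comment could be sketched as below. This analysis was not performed in the study; the function name, the choice of a difference in group means as the test statistic, and the example values are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Shuffling the pooled values breaks any link between the stimulus label
    and the measured response, giving a chance distribution for the statistic.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical percent-change values for two playback groups.
p = permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4])
print(p)
```

      With small groups the number of distinct relabelings is limited, which bounds how small such a p-value can be.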

      10) The authors indicated that each animal's post-vocalization session was also recorded. No data in the manuscript related to the post-vocalization playback period was included. This omission was a missed opportunity to show that the neurochemical levels returned to baseline, and the results were not dependent on the normalization process described in major concern #1. The data should be included in the manuscript and analyzed. It would add further support for the model described in Figure 6.

      We decided not to include analyses of the post-stimulus period because this period is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question—the change in neuromodulator release DURING vocal playback. We agree that the general question is of interest to the field, but we don’t think our study is best designed to answer that question.

      11) The authors could use a predictive model, such as a binary classifier trained on the CSF sampling data, to predict the type of vocalizations played back. The predictive model could support the conclusions and provide additional support for the model in Figure 6.

      We recognize that a binary classifier could provide an interesting approach to support conclusions. However, we do not believe that the sample size per group is sufficient to both create and test the classifier.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      • Introduction: It would be useful to set up an experimental framework before delving into the results. What are the predictions about specific neuromodulators based on previous literature?

      Because this narrative is laid out in the first two paragraphs of the Results, which immediately follow the Introduction, we believe that additional text in the Introduction on the experimental framework is redundant. As stated above, detailing predictions for a range of neuromodulators would make for a long and not particularly illuminating Introduction. We instead have related our findings to more general understanding of DA and ACh in the Discussion.

      • There really isn't a major difference in stimuli during the "Stim 1" and "Stim 2" phases, and it's not clear why the authors divided the experimental period into two phases. Therefore, the authors need to justify their experimental approach. For example, the authors could first anecdotally mention that behavioral responses to playbacks seem to be larger in the first half of the playbacks than during the second half, therefore they individually analyzed each half of the experimental period. Or adopt a different approach to justify their design. Overall, the analytical approach is reasonable but it is currently not justified.

      See general comment for analysis periods. As noted, we clarified these issues in several locations within Materials and Methods (p. 24, lines 20-22; p. 26, lines 17-19). We also sought to clarify the meaning of the periods “Stim 1” and “Stim 2”; they are two data collection periods, using the same exemplar sequences in the same order. We have added statements in the Materials and Methods (p. 18, lines 4-7; Fig. caption, p. 39, lines 11-13).

      • The normalization of neurochemical data seems problematic and unnecessary. By normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) and this has implications for statistical power. Because the analysis is a within-subjects analysis, this normalization is not necessary for the analysis itself. It can be useful to normalize data for visualization purposes, but raw data should be analyzed. Indeed, behavioral data are qualitatively similar to the neurochemical data, and those data are not normalized to baseline values.

      Please see our general comment on this issue. We believe normalization does not affect statistical power and is both the standard way and an appropriate way to analyze microdialysis results. We include concentrations of ACh, DA, and 5-HIAA in supplementary tables.

      • The authors should include a discussion (in the Discussion section) of how behavior and neurochemical release are associated during the first half of the experimental session but not in the second half (e.g., differences in ACh and DA release between mating and restraint groups during stim 1 and 2, but behavioral differences only during stim 1).

      We have included a section in the Discussion concerning the temporal relationship between behavioral responses and neurochemical changes in response to vocal playback. We note that the linkage is particularly strong in some cases (e.g., ACh release and flinching). This points to a need to examine these phenomena with finer temporal resolution, but also with the recognition that the brain circuits driving a behavioral response may extend beyond the BLA.

      Minor comments:

      • Keywords: add "serotonin" (even though there are no significant differences on 5-HIAA, people interested in serotonin would find this interesting).

      Added to keywords list.

      • Do the authors collect data on the vocalizations of mice in response to these playbacks?

      We monitored vocalizations during playback, noting that vocalizations, especially “Noisy” vocalizations, were common. However, we did not record vocalizations and are therefore unable to quantify our observations.

      • First line of page 7: readers do not know about "stim 1" and "stim 2". Therefore, the authors need to describe their approach to analyzing behavior and neurochemical release.

      We first introduce these terms earlier, citing Figure 1D,E. We have added some additional wording for further clarification (p. 7, lines 4-5).

      • Make sure citations are uniformly formatted (e.g., Inconsistencies in: "As male and female mice emit different vocalizations during mating (Finton et al., 2017; J. M. S. Grimsley et al., 2013; Neunuebel et al., 2015; Sales (née Sewell), 1972)").

      We have reviewed and corrected citations throughout the manuscript.

      • Last paragraph of page 7: "attending behavior" has not been defined yet.

      Table 1 contains our description of the behaviors analyzed in this study. We have now inserted a reference to Table 1 earlier in the Results (p. 6, line 12).

      • Figure 2E and 3G: I find these correlations to be redundant with the GLMs. This is because the significant relationship is likely to be driven by group differences in behavior and in neurochemical release.

      Please see general comments regarding correlation analyses. We removed such analyses and references to them from the manuscript.

      • Page 2, 2nd paragraph, 2nd sentence: this paragraph seems to be rooted in comparing and contrasting experienced and inexperienced mice, so there should be explicit comparisons in each sentence. For example, the 2nd sentence should read: "Whereas EXP estrus females demonstrated increased flinching behaviors in response to mating vocalizations, INEXP ....". This paragraph overall could use some refining.

      We believe this refers to page 9. We have revised the paragraph to clarify our findings (Beginning p. 9, line 23).

      • Page 9: "Further, there were no significant differences across groups during Stim 1 or Stim 2 periods. These results contrast sharply with those from all EXP groups, in which both ACh and DA release changed significantly during playback (Figs. 2C, 2D, 3E, 3F)." While I understand their perspective, this is misleading because changes were only observed during the Stim 1 period.

      We have slightly revised the wording in this paragraph, because the restraint males did not show significant ACh decreases. However, we do not believe our statements mislead readers just because some changes are observed in only one of the stimulation periods (p. 10, lines 13-16).

      • Last paragraph of page 14: it would be useful to mention the increase in flinching in experienced females in response to mating vocalizations.

      We have added a sentence in this paragraph relating flinching in estrus females to increased ACh (p. 15, lines 18-20).

      • Was there a full analysis of locomotion in response to playbacks? I see that locomotion was correlated with neurochemical release but was it different in response to different stimuli? Were there changes to the part of the arena that mice occupied in response to restraint vs. mating vocalizations? Given their methods section, it would be useful for the authors to mention the results of the analyses of these aspects of movement.

      We have provided additional descriptions of space use and video tracking data in Materials and Methods (p. 23, lines 1-6). We now report additional results associated with these analyses (p. 8, lines 13-15; p. 9, lines 8-14).

      • I believe that each experimental mouse only heard one of the stimuli (given the analytical approach). Because it is plausible to measure neurochemical release in response to both types of stimuli, I encourage the authors to be more explicit about this aspect of the experimental design (e.g., mention in Results section).

      Sentence modified to read: “Each mouse received playback of either the mating or restraint stimuli, but not both: same-day presentation of both stimuli would require excessively long playback sessions, the condition of the same probe would likely change on subsequent days, and quality of a second implanted probe on a subsequent day was uncertain.” (p. 7, lines 5-9).

      • Figure 1A and 1B: add labels to the panels so readers don't have to read the legend to know what spectrogram is associated with what context.

      We added these labels to Figure 1.

      • Table 1: in the definition of "still and alert", should this mention "abrupt attending" instead of "abrupt freezing"? The latter isn't described.

      Yes, we intended “abrupt attending”, and have now indicated that in Table 1.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      • The authors report they performed manual behavioral analysis, and provide a table defining the different behaviors. However, it remains unclear how some of these behaviors were detected (such as still-and-alert events). A thorough description of the criteria used to define these events needs to be provided.

      We have modified some descriptions of manually analyzed behaviors in Table 1, and have added additional description of how we developed this set of behaviors for analysis in the study (pp. 22-23).

      • The box plots do not appear to represent the "minimum, first quartile, median, third quartile, and maximum values." as specified on page 24 (Methods). Indeed, the individual data points sometimes do not reach the max or min of the bar plot, and sometimes are way beyond them.

      We used the “inclusive median” function in Excel to generate final boxplots. These boxplots will sometimes result in a data point being placed outside of the whiskers. SPSS considers these to be “outliers”, but our GLM analysis includes these values. We describe this in the Data Analysis section of Materials and Methods (p. 28, lines 3-9).
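
      To show how a data point can fall outside the whiskers of such a boxplot, here is a minimal sketch using inclusive-method quartiles from Python's standard library. It assumes the whiskers follow the common 1.5 × IQR rule; that rule, the function name, and the example values are assumptions for illustration and are not taken from the manuscript.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Inclusive quartiles plus 1.5 * IQR fences (assumed whisker rule)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points beyond the fences are drawn separately but remain in the data.
    outside = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "fences": (lower_fence, upper_fence),
        "outside_whiskers": outside,
    }


# Hypothetical values with one large observation.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

      In this hypothetical example the value 30 lies beyond the upper fence and would be plotted outside the whisker, yet it is still part of the data set passed to any downstream analysis.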

      • Some of the data are replicated in different Figures: Figure 2A and Figure 3C. While this is acceptable, the authors did not correct for multiple comparisons (dividing the p value by the number of comparisons).

      Our analysis included corrections for multiple comparisons, as we have indicated on p. 27, lines 15-16.
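
      The specific correction applied is the one described in the manuscript's Methods and is not restated here. Purely as an example of the general idea raised in the comment, a Bonferroni-style adjustment (shown only because it is a common choice, not because it is confirmed to be the method used) compares each p-value against the significance level divided by the number of comparisons. The function name and p-values below are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values pass a Bonferroni-adjusted threshold.

    The adjusted threshold is alpha divided by the number of comparisons.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values from three comparisons; threshold is 0.05 / 3.
print(bonferroni_significant([0.01, 0.02, 0.04]))
```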

      • Overall, the sample sizes are too small (for example in Figure 3, non-estrus females are at n=3), and are different in experiments where they should be equal (Figure 2B: mating stim 1 is at n=5 and mating stim 2 is at n=3).

      We apologize that sample sizes were not properly displayed in figures. Please note that sample sizes are identified in the figure captions. For neuromodulator data, all sample sizes are at least 7. For behavioral data, the minimum sample size is 5. We have revised Figures 3-6 to ensure that all data points are visible.

      • It remains unclear why the impact of mating vocalizations has been tested only in males.

      We assume the reviewer meant that only males were tested in restraint. We now indicate that our preliminary evidence indicated no difference in behavioral responses to restraint vocalization between males and females, so we opted to perform the neurochemical analysis for restraint only in males (p. 22, lines 4-5). If there were no limitations to time and cost, we would have preferred to test responses to restraint in females as well. We note that such inclusion would have added up to 4 experimental groups (estrus and non-estrus groups in both EXP and INEXP groups).

      • The correlation between the number of flinching and ACh release changes (Figure 2E) visually appears to be opposite between mating and restraint playbacks. The authors should perform independent correlations for these 2 playbacks.

      Please see general comments regarding correlation analyses. We removed such analyses and references to them from the manuscript.

      • The authors state that their findings "indicate that behavioral responses to salient vocalizations result from interactions between sex of the listener or context of vocal stimuli with the previous behavioral experience associated with these vocalizations.". However, in male mice, they do not report any difference in previous experience on flinching for both restraint and mating sounds, as well as no difference in rearing for the restraint sounds (Figure 4A-B). Thus, the discussion of these results should be completely revisited.

      We revised the paragraph in question (p. 9, line 22 through p. 10, line 9). For instance, we note that significant differences between EXP male-mating and male-restraint flinching do not exist between the INEXP groups. We believe that the last sentence correctly summarizes findings described in this paragraph.

      • For serotonin experiments in Figure S2 there are strong outliers (150% increase in 5HIAA release). Did the authors correlate these levels with the behavior of the animals?

      Outliers are identified by the Excel function that generated the boxplots, but we have no reason to consider these as outliers and exclude them. As noted above, we have clarified that these “outliers” are the result of the Excel function in the Materials and Methods (p. 28, lines 3-9) and we have revised the plotting of data points.

      Minor comments:

      • Mating vocalization playback is mainly emitted by males, thus, instead of a positive valence signal, this could also be interpreted as a competitive signal to other males.

      There is support in the literature for viewing our mating stimulus as having positive valence. Gaub et al., 2016 describe the emission of stepped calls, lower frequency harmonics, and increased sound level as indicators of “positive emotion”. We have shown (Grimsley et al., 2013) that the female LFH vocalization can be highly attractive to male mice, under the right conditions, indicating something like “sex is happening”. The inclusion of both the male and female vocalizations in our stimuli was a key piece of our experimental design, based on our understanding of the contributions of both vocalizations to the meaning of the overall acoustic experience.

      • Figure 1 should include panel titles.

      No change. This information is available in the Figure caption.

      • n=31 should be indicated in the EXP group.

      We are not sure where the reviewer intends this value to be indicated.

      • The color legend of Figure 1E is absent, making the Figure not understandable.

      We added text in the Figure 1 caption to indicate that each color represents a different exemplar. We don’t think a legend provides additional useful information.

      • The point of making two blocks (stim 1 and stim2) should be stated more clearly.

      Please see general statement regarding experimental blocks. We have modified our description of these in an Experimental overview section in the Materials and Methods.

      • Including raw data of micro-dialysis in the supplementary figures would allow assessment of the variability and quality of the measurements.

      We have added concentrations of neurochemicals in supplemental tables 1-3.

      • Baseline (prestimulus) number of flinch and rearing should systematically be indicated (missing in Figure 4).

      The focus in this figure is on the differences that occur in Stim 1 values. There are no differences between EXP and INEXP animals of any group during the Pre-Stim period. We now state that in the Figure 4 caption.

      • Discussion: "increase in AMPA/NMDA currents". We believe the authors are referring to the ratio of AMPA to NMDA currents. This sentence should be reformulated.

      These are modified to refer to “… the AMPA/NMDA current ratio…” in two locations in the Discussion (p. 14, lines 8-9; p. 15, line 4).

      • Overall the discussion is very speculative and should rely more on the data.

      We believe that the Discussion provides appropriate speculation that is based on our experimental data and previous literature. We have added a paragraph to identify limitations of our findings and recommendations of future experiments to resolve some issues (p. 12, lines 3-17).

      Reviewer #3 (Recommendations For The Authors):

      Minor concerns:

      1) The authors stated that USVs are most likely to be emitted by males, and LFH are likely to be emitted by females. However, Oliveira-Stahl et al. 2023, Matsumoto et al. 2022, Warren et al. 2018, Heckman et al. 2017, Neunuebel et al., 2015 showed that females also emit USVs. The authors should mention that USVs are emitted by both males and females and discuss how the sex of the vocalizing animal (both males and females) can influence neuromodulator release.

      The reviewer slightly mis-stated the wording of our text, changing the meaning significantly. Our wording is “These sequences included ultrasonic vocalizations (USVs) with harmonics, steps, and complex structure, mostly emitted by males, and low frequency harmonic calls (LFHs) emitted by females (Fig. 1A,C)…” This phrasing is correct and carefully chosen. The Discussion in Oliveira-Stahl et al 2023 (p. 10-11) supports our statement: “The exact fraction of USVs emitted by females as concluded in all previous studies on dyadic courtship has varied, ranging from 18%, 17.5%, and 16% to 10.5% in the present study…”.

      2) The authors should explain why ECF from BLA was collected unilaterally from the left hemisphere.

      p. 23, lines 9-11: We inserted a sentence to explain why we targeted the BLA unilaterally. “Since both left and right amygdala are responsive to vocal stimuli in human and experimental animal studies (Wenstrup et al., 2020), we implanted microdialysis probes into the left amygdala to maintain consistency with other studies in our laboratory.” Beyond that, the choice was arbitrary.

      3) The authors said each animal recovered in its home cage for four days before the playback experiment. A 4-day period may not be sufficient for every animal to recover from surgery, so the authors should describe how a mouse's recovery was assessed.

      p. 23, lines 20-23: We provide more description about the recovery and how it was assessed. Except for a few animals that were not included in the experiments, all animals recovered within 4 days.

      4) The authors stated that each animal was exposed to 90-min sessions with mating and restraint behaviors in a counterbalanced design. This description for Figure 1D should also include the duration of the mating and restraint experience.

      The Results that immediately precede citation to this figure include this information.

      5) The authors stated, "Data are reported only from mice with more than 75% of the microdialysis probe implanted within the BLA". What are the implications of having 25% of the probe outside the BLA? The authors should shed more light on this by discussing this issue as it relates to the findings and commenting on where the other 25% of the probe was located.

      We inserted a sentence to explain the rationale for this inclusion criterion. “We verified placement of microdialysis probes to minimize variability that could arise because regions surrounding BLA receive neurochemical inputs from different sources (e.g., cholinergic inputs to putamen and central amygdala).” (p. 25, lines 21-23).

      All brain regions that surround BLA, dorsal, medial, ventral, or lateral, could have been sampled by the “other” 25%. Some of these, e.g., the central amygdala or caudate-putamen, have different sources of cholinergic input that may not have the same release pattern. We do not think it is worthy of further speculation in the Discussion. Due to the high cost of the neurochemical analysis, we often did not process the neurochemistry data if histology indicated that a probe missed the BLA target.

      6) The authors confirmed that the estrus stage did not change during the experiment day by evaluating and comparing estrus prior to and after data collection. This strategy was a fantastic experimental approach, but the authors should have discussed the results. How did the results the authors included change when the females were in estrus before but not after data collection? What percentage of females started in estrus but ended in metestrus? Assuming that some females changed estrus state, were these animals excluded from the analyses?

      All animals were in the same estrus state at the beginning and end of the playback session.

      7) Authors cite Neunuebel et al., 2015 for the sentence "As male and female mice emit different vocalizations during mating". However, Neunuebel et al., 2015 showed vocalizations emitted during chasing--not mating. If mating is a general term for courtship, then this reference is appropriate, but see major concern #3.

      In the Results (p. 8, line 5), we changed the phrasing to “courtship and mating” to include the Neunuebel et al. study.

      As we indicate in our response to Public Comment #3, we have modified the Results (p. 5, lines 18-20) and Materials and Methods (p. 21, lines 8-15) to clarify our meaning. We continue to use the term “mating” because this refers to a specific set of behaviors associated with mounting and copulation, rather than the more general term “courtship”. We also indicate that we based these behaviors on previous work (e.g., Gaub et al., 2016).

      8) Authors interpret Figure 3F as DA release showed a "consistent" increase during mating playback across all three experimental groups. However, the increase in the estrus female group is inconsistent, as seen in the graph. This verbiage should be reworded to describe the data more accurately.

      p. 8, line 23 “consistent” was deleted.

      9) In all the box plots, multiple data points overlay each other. A more transparent way of showing the data would be adding some jitter to the x value to make each data point visible. The mean (X's) in Figure 3D (pre-stim mating and mating estrus) are difficult to see, as are all the data points in mating non-estrus. Adding all the symbols to the figure legend or a key in the figure instead of the method section would aid the reader and make the plots easier to interpret.

      We have revised the boxplots to ensure that all data points are visible.

      10) Some verbiage used in the discussion should be toned down. For example, "intense" experiences and "emotionally charged" vocalizations should be removed.

      We have not changed these terms, which we believe are appropriate to describe these experiences and vocalizations.

      11) The authors include "Emotional Vocalizations" in the title. It would be beneficial if the authors included more detail and references in the introduction to help set up the emotional content of vocalizations. It may benefit a broader readership as typically targeted by eLife.

      We now cite Darwin and some more recent publications that articulate the general understanding that social vocalizations carry emotional content.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents potentially valuable results on glutamine-rich motifs in relation to protein expression and alternative genetic codes. The author's interpretation of the results is so far only supported by incomplete evidence, due to a lack of acknowledgment of alternative explanations, missing controls and statistical analysis, and writing that is unclear to non-experts in the field. These shortcomings could be at least partially overcome by additional experiments, thorough rewriting, or both.

      We thank both the Reviewing Editor and Senior Editor for handling this manuscript.

      Based on your suggestions, we have provided controls, performed statistical analysis, and rewritten our manuscript. The revised manuscript is significantly improved and more accessible to non-experts in the field.

      Reviewer #1 (Public Review):

      Summary

      This work contains 3 sections. The first section describes how protein domains with SQ motifs can increase the abundance of a lacZ reporter in yeast. The authors call this phenomenon autonomous protein expression-enhancing activity, and this finding is well supported. The authors show evidence that this increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance, and that this phenomenon is not affected by mutants in translational quality control. It was not completely clear whether the increased protein abundance is due to increased translation or to increased protein stability.

      In section 2, the authors performed mutagenesis of three N-terminal domains to study how protein sequence changes protein stability and enzymatic activity of the fusions. These data are very interesting, but this section needs more interpretation. It is not clear if the effect is due to the number of S/T/Q/N amino acids or due to the number of phosphorylation sites.

      In section 3, the authors undertake an extensive computational analysis of amino acid runs in 27 species. Many aspects of this section are fascinating to an expert reader. They identify regions with poly-X tracks. These data were not normalized correctly: I think that a null expectation for how often poly-X track occur should be built for each species based on the underlying prevalence of amino acids in that species. As a result, I believe that the claim is not well supported by the data.

      Strengths

      This work is about an interesting topic and contains stimulating bioinformatics analysis. The first two sections, where the authors investigate how S/T/Q/N abundance modulates protein expression level, is well supported by the data. The bioinformatics analysis of Q abundance in ciliate proteomes is fascinating. There are some ciliates that have repurposed stop codons to code for Q. The authors find that in these proteomes, Q-runs are greatly expanded. They offer interesting speculations on how this expansion might impact protein function.

      Weakness

      At this time, the manuscript is disorganized and difficult to read. An expert in the field, who will not be distracted by the disorganization, will find some very interesting results included. In particular, the order of the introduction does not match the rest of the paper.

      In the first and second sections, where the authors investigate how S/T/Q/N abundance modulates protein expression levels, it is unclear if the effect is due to the number of phosphorylation sites or the number of S/T/Q/N residues.

      There are three reasons why the number of phosphorylation sites in the Q-rich motifs is not relevant to their autonomous protein expression-enhancing (PEE) activities:

      First, we have reported previously that phosphorylation-defective Rad51-NTD (Rad51-3SA) and wild-type Rad51-NTD exhibit similar autonomous PEE activity. Mec1/Tel1-dependent phosphorylation of Rad51-NTD antagonizes the proteasomal degradation pathway, increasing the half-life of Rad51 from ∼30 min to ≥180 min (1). (page 1, lines 11-14)

      Second, in our preprint manuscript, we have already shown that phosphorylation-defective Rad53-SCD1 (Rad53-SCD1-5STA) also exhibits autonomous PEE activity similar to that of wild-type Rad53-SCD (Figure 2D, Figure 4A and Figure 4C). We have highlighted this point in our revised manuscript (page 9, lines 19-21).

      Third, as revealed by the results of Figure 4, it is the percentages, and not the numbers, of S/T/Q/N residues that are correlated with the PEE activities of Q-rich motifs.

      The authors also do not discuss if the N-end rule for protein stability applies to the lacZ reporter or the fusion proteins.

      The autonomous PEE function of S/T/Q-rich NTDs is unlikely to be relevant to the N-end rule. The N-end rule links the in vivo half-life of a protein to the identity of its N-terminal residues. In S. cerevisiae, the N-end rule operates as part of the ubiquitin system and comprises two pathways. First, the Arg/N-end rule pathway, involving a single N-terminal amidohydrolase Nta1, mediates deamidation of N-terminal asparagine (N) and glutamine (Q) into aspartate (D) and glutamate (E), which in turn are arginylated by a single Ate1 R-transferase, generating the Arg/N degron. N-terminal R and other primary degrons are recognized by a single N-recognin Ubr1 in concert with ubiquitin-conjugating Ubc2/Rad6. Ubr1 can also recognize several other N-terminal residues, including lysine (K), histidine (H), phenylalanine (F), tryptophan (W), leucine (L) and isoleucine (I) (68-70). Second, the Ac/N-end rule pathway targets proteins containing N-terminally acetylated (Ac) residues. Prior to acetylation, the first amino acid methionine (M) is catalytically removed by Met-aminopeptidases (MetAPs), unless a residue at position 2 is non-permissive (too large) for MetAPs. If a retained N-terminal M or otherwise a valine (V), cysteine (C), alanine (A), serine (S) or threonine (T) residue is followed by residues that allow N-terminal acetylation, the proteins containing these AcN degrons are targeted for ubiquitylation and proteasome-mediated degradation by the Doa10 E3 ligase (71).

      The PEE activities of these S/T/Q-rich domains are unlikely to arise from counteracting the N-end rule for two reasons. First, the first two amino acid residues of Rad51-NTD, Hop1-SCD, Rad53-SCD1, Sup35-PND, Rad51-ΔN, and LacZ-NVH are MS, ME, ME, MS, ME, and MI, respectively, where M is methionine, S is serine, E is glutamic acid and I is isoleucine. Second, Sml1-NTD behaves similarly to these N-terminal fusion tags, despite its methionine and glutamine (MQ) amino acid signature at the N-terminus. (Page 12, line 3 to page 13, line 2)

      The most interesting part of the paper is an exploration of S/T/Q/N-rich regions and other repetitive AA runs in 27 proteomes, particularly ciliates. However, this analysis is missing a critical control that makes it nearly impossible to evaluate the importance of the findings. The authors find the abundance of different amino acid runs in various proteomes. They also report the background abundance of each amino acid. They do not use this background abundance to normalize the runs of amino acids to create a null expectation from each proteome. For example, it has been clear for some time (Ruff, 2017; Ruff et al., 2016) that Drosophila contains a very high background of Q's in the proteome and it is necessary to control for this background abundance when finding runs of Q's.

      We apologize for not explaining sufficiently well the topic eliciting this reviewer’s concern in our preprint manuscript. In the second paragraph of page 14, we cite six references to highlight that SCDs are overrepresented in yeast and human proteins involved in several biological processes (5, 43) and that polyX prevalence differs among species (79-82).

      We will cite a reference by Kiersten M. Ruff in our revised manuscript (38).

      K. M. Ruff, J. B. Warner, A. Posey and P. S. Tan (2017) Polyglutamine length dependent structural properties and phase behavior of huntingtin exon1. Biophysical Journal 112, 511a.

      The authors could easily address this problem with the data and analysis they have already collected. However, at this time, without this normalization, I am hesitant to trust the lists of proteins with long runs of amino acid and the ensuing GO enrichment analysis. Ruff KM. 2017. Washington University in St.

      Ruff KM, Holehouse AS, Richardson MGO, Pappu RV. 2016. Proteomic and Biophysical Analysis of Polar Tracts. Biophys J 110:556a.

      We thank Reviewer #1 for this helpful suggestion and now address this issue by means of a different approach described below.

      Based on a previous study (43), we applied seven different thresholds to seek both short and long, as well as pure and impure, polyX strings in 20 different representative near-complete proteomes, including 4X (4/4), 5X (4/5-5/5), 6X (4/6-6/6), 7X (4/7-7/7), 8-10X (≥50%X), 11-20X (≥50%X) and ≥21X (≥50%X).
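      The short-window thresholds listed above (for example, 4/4 or 4/5) can be illustrated with a sliding-window scan. The following is only a minimal sketch under stated assumptions: the function name `windows_with_min_x` and the toy sequence are hypothetical, and the procedure used in the cited study (43) may differ in how overlapping windows are merged or counted.

```python
def windows_with_min_x(seq, aa, window, min_count):
    """Return the start index of every window of length `window`
    that contains at least `min_count` copies of residue `aa`.

    Hypothetical helper for illustration; overlapping hits are
    reported separately and are not merged into motifs.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        if seq[start:start + window].count(aa) >= min_count:
            hits.append(start)
    return hits


# Toy example: a pure 4Q run embedded in a short peptide.
pure_4q = windows_with_min_x("AQQQQA", "Q", window=4, min_count=4)
impure_5q = windows_with_min_x("AQQQQA", "Q", window=5, min_count=4)
```

      In this sketch a "4/5" hit is any five-residue window holding at least four Q residues, so the same run can satisfy several thresholds at once.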

      To normalize the runs of amino acids and create a null expectation from each proteome, we determined the ratios of the overall number of X residues for each of the seven polyX motifs relative to those in the entire proteome of each species, respectively. The results of four different polyX motifs are shown in our revised manuscript, i.e., polyQ (Figure 7), polyN (Figure 8), polyS (Figure 9) and polyT (Figure 10). Thus, polyX prevalence differs among species and the overall X contents of polyX motifs often but not always correlate with the X usage frequency in entire proteomes (43).
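      The normalization described above, i.e., the share of all X residues in a proteome that fall inside polyX motifs, might be computed along the following lines. This is a simplified sketch, not the authors' pipeline: it assumes pure runs only (no impure motifs), and the names `polyx_fraction` and `toy_proteome` are hypothetical.

```python
import re


def polyx_fraction(proteome, aa, min_len=4):
    """Fraction of all `aa` residues in `proteome` that lie inside
    pure runs of `aa` of length >= `min_len`.

    Assumption: only uninterrupted runs are counted; the impure
    thresholds (e.g. >=50% X) are not modeled here.
    """
    run = re.compile(f"{re.escape(aa)}{{{min_len},}}")
    in_runs = 0
    total = 0
    for seq in proteome:
        total += seq.count(aa)
        in_runs += sum(len(m.group()) for m in run.finditer(seq))
    return in_runs / total if total else 0.0


# Toy proteome: 6 Q residues in total, 4 of them inside one 4Q run.
toy_proteome = ["MQQQQA", "AQAQ"]
q_share = polyx_fraction(toy_proteome, "Q")
```

      Dividing by the proteome-wide count of X makes species with different overall X usage frequencies comparable on the same scale.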

      Most importantly, our results reveal that, compared to Stentor coeruleus or several non-ciliate eukaryotic organisms (e.g., Plasmodium falciparum, Caenorhabditis elegans, Danio rerio, Mus musculus and Homo sapiens), the five ciliates with reassigned TAAQ and TAGQ codons not only have higher Q usage frequencies, but also more polyQ motifs in their proteomes (Figure 7). In contrast, polyQ motifs prevail in Candida albicans, Candida tropicalis, Dictyostelium discoideum, Chlamydomonas reinhardtii, Drosophila melanogaster and Aedes aegypti, though the Q usage frequencies in their entire proteomes are not significantly higher than those of other eukaryotes (Figure 1). Due to their higher N usage frequencies, Dictyostelium discoideum, Plasmodium falciparum and Pseudocohnilembus persalinus have more polyN motifs than the other 23 eukaryotes we examined here (Figure 8). Generally speaking, all 26 eukaryotes we assessed have similar S usage frequencies and percentages of S contents in polyS motifs (Figure 9). Among these 26 eukaryotes, Dictyostelium discoideum possesses many more polyT motifs, though its T usage frequency is similar to that of the other 25 eukaryotes (Figure 10).

      In conclusion, these new normalized results confirm that the reassignment of stop codons to Q indeed results in both higher Q usage frequencies and more polyQ motifs in ciliates.  

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to understand the connection between protein sequence and function in disordered regions enriched in polar amino acids (specifically Q, N, S and T). While the authors suggest that specific motifs facilitate protein-enhancing activities, their findings are correlative, and the evidence is incomplete. Similarly, the authors propose that the re-assignment of stop codons to glutamine-encoding codons underlies the greater use of glutamine in a subset of ciliates, but again, the conclusions here are, at best, correlative. The authors perform extensive bioinformatic analysis, with detailed (albeit somewhat ad hoc) discussion on a number of proteins. Overall, the results presented here are interesting, but are unable to exclude competing hypotheses.

      Strengths:

      Following up on previous work, the authors wish to uncover a mechanism associated with poly-Q and SCD motifs explaining proposed protein expression-enhancing activities. They note that these motifs often occur in IDRs and hypothesize that structural plasticity could be capitalized upon as a mechanism of diversification in evolution. To investigate this further, they employ bioinformatics to investigate the sequence features of proteomes of 27 eukaryotes. They deepen their sequence space exploration uncovering sub-phylum-specific features associated with species in which a stop-codon substitution has occurred. The authors propose this stop-codon substitution underlies an expansion of poly-Q repeats and increased glutamine distribution.

      Weaknesses:

      The preprint provides extensive, detailed, and entirely unnecessary background information throughout, hampering reading and making it difficult to understand the ideas being proposed.

      The introduction provides a large amount of detailed background that appears entirely irrelevant for the paper. In many places, detailed discussions of specific proteins that are likely of interest to the authors occur, yet without context, this does not enhance the paper for the reader.

      The paper uses many unnecessary, new, or redefined acronyms which makes reading difficult. As examples:

      1) Prion forming domains (PFDs). Do the authors mean prion-like domains (PLDs), an established term with an empirical definition from the PLAAC algorithm? If yes, they should say this. If not, they must define what a prion-forming domain is formally.

      The N-terminal domain (1-123 amino acids) of S. cerevisiae Sup35 was already referred to as a “prion forming domain (PFD)” in 2006 (48). Since then, PFD has also been employed as an acronym in other yeast prion papers (Cox, B.S. et al. 2007; Toombs, T. et al. 2011).

      B. S. Cox, L. Byrne, M. F. Tuite, Protein Stability. Prion 1, 170-178 (2007).

      J. A. Toombs, N. M. Liss, K. R. Cobble, Z. Ben-Musa, E. D. Ross, [PSI+] maintenance is dependent on the composition, not primary sequence, of the oligopeptide repeat domain. PLoS One 6, e21953 (2011).

      2) SCD is already an acronym in the IDP field (meaning sequence charge decoration) - the authors should avoid this as their chosen acronym for Serine(S) / threonine (T)-glutamine (Q) cluster domains. Moreover, do we really need another acronym here (we do not).

      SCD was first used in 2005 as an acronym for the Serine (S)/threonine (T)-glutamine (Q) cluster domain in the DNA damage checkpoint field (4). Almost a decade later, SCD became an acronym for “sequence charge decoration” (Sawle, L. et al. 2015; Firman, T. et al. 2018).

      L. Sawle and K. Ghosh, A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J. Chem Phys. 143, 085101 (2015).

      T. Firman and K. Ghosh, Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins. J. Chem Phys. 148, 123305 (2018).

      3) Protein expression-enhancing (PEE) - just say expression-enhancing, there is no need for an acronym here.

      Thank you. Since we have shown that the addition of Q-rich motifs to LacZ affects protein expression rather than transcription, we think it is better to use the “PEE” acronym.

      The results suggest autonomous protein expression-enhancing activities of regions of multiple proteins containing Q-rich and SCD motifs. Their definition of expression-enhancing activities is vague and the evidence they provide to support the claim is weak. While their previous work may support their claim with more evidence, it should be explained in more detail. The assay they choose is a fusion reporter measuring beta-galactosidase activity and tracking expression levels. Given the presented data they have shown that they can drive the expression of their reporters and that beta gal remains active, in addition to the increase in expression of fusion reporter during the stress response. They have not detailed what their control and mock treatment is, which makes complete understanding of their experimental approach difficult. Furthermore, their nuclear localization signal on the tag could be influencing the degradation kinetics or sequestering the reporter, leading to its accumulation and the appearance of enhanced expression. Their evidence refuting ubiquitin-mediated degradation does not have a convincing control.

      Although this reviewer’s concern regarding our use of a nuclear localization signal on the tag is understandable, we are confident that this signal does not bias our findings for two reasons. First, the negative control LacZ-NV also possesses the same nuclear localization signal (Figure 1A, lane 2). Second, another fusion target, Rad51-ΔN, does not harbor the NVH tag (Figure 1D, lanes 3-4). Compared to wild-type Rad51, Rad51-ΔN is highly labile. In our previous study, removal of the NTD from Rad51 reduced by ~97% the protein levels of corresponding Rad51-ΔN proteins relative to wild-type (1).

      Based on the experimental results, the authors then go on to perform bioinformatic analysis of SCD proteins and polyX proteins. Unfortunately, there is no clear hypothesis for what is being tested; there is a vague sense of investigating polyX/SCD regions, but I did not find the connection between the first and second sections compelling (especially given polar-rich regions have been shown to engage in many different functions). As such, this bioinformatic analysis largely presents as many lists of percentages without any meaningful interpretation. The bioinformatics analysis lacks any kind of rigorous statistical tests, making it difficult to evaluate the conclusions drawn. The methods section is severely lacking. Specifically, many of the methods require the reader to read many other papers. While referencing prior work is, of course, important, the authors should ensure the methods in this paper provide the details needed to allow a reader to evaluate the work being presented. As it stands, this is not the case.

      Thank you. As described in detail below, we have now performed rigorous statistical testing using the GOfuncR package (Figure 11, Figure 12 and DS7-DS32).

      Overall, my major concern with this work is that the authors make two central claims in this paper (as per the Discussion). The authors claim that Q-rich motifs enhance protein expression. The implication here is that Q-rich motif IDRs are special, but this is not tested. As such, they cannot exclude the competing hypothesis ("N-terminal disordered regions enhance expression").

      In fact, “N-terminal disordered regions enhance expression” exactly summarizes our hypothesis.

      On pages 12-13 and Figure 4 of our preprint manuscript, we explained our hypothesis in the paragraph entitled “The relationship between PEE function, amino acid contents, and structural flexibility”.

      The authors also do not explore the possibility that this effect is in part/entirely driven by mRNA-level effects (see Verma Nat Comms 2019).

      As pointed out by the first reviewer, we present evidence that the increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance (Figure 2), and that this phenomenon is not affected in translational quality control mutants (Figure 3).

      As such, while these observations are interesting, they feel preliminary and, in my opinion, cannot be used to draw hard conclusions on how N-terminal IDR sequence features influence protein expression. This does not mean the authors are necessarily wrong, but from the data presented here, I do not believe strong conclusions can be drawn. That re-assignment of stop codons to Q increases proteome-wide Q usage. I was unable to understand what result led the authors to this conclusion.

      My reading of the results is that a subset of ciliates has re-assigned UAA and UAG from the stop codon to Q. Those ciliates have more polyQ-containing proteins. However, they also have more polyN-containing proteins and proteins enriched in S/T-Q clusters. Surely if this were a stop-codon-dependent effect, we'd ONLY see an enhancement in Q-richness, not a corresponding enhancement in all polar-rich IDR frequencies? It seems the better working hypothesis is that free-floating ciliate proteomes are enriched in polar amino acids compared to sessile ciliates.

      We thank this reviewer for raising this point; however, her/his comments are not supported by the results in Figure 7.

      Regardless, the absence of any kind of statistical analysis makes it hard to draw strong conclusions here.

      We apologize for not explaining more clearly the results of Tables 5-7 in our preprint manuscript.

      To address the concerns about our GO enrichment analysis by both reviewers, we have now performed rigorous statistical testing for SCD and polyQ protein overrepresentation using the GOfuncR package (https://bioconductor.org/packages/release/bioc/html/GOfuncR.html). GOfuncR is an R package that conducts standard candidate vs. background enrichment analysis by means of the hypergeometric test. We then adjusted the raw p-values according to the family-wise error rate (FWER). The same method had been applied to GO enrichment analysis of human genomes (89).
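      For readers unfamiliar with candidate vs. background enrichment, the hypergeometric test can be sketched as below. This is an illustration only and does not reproduce GOfuncR: the function names and the toy numbers are hypothetical, and the Bonferroni correction is used here merely as one simple way to control the FWER, which may differ from the procedure GOfuncR applies.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, hits):
    """Upper-tail probability P(X >= hits) for a hypergeometric draw.

    population: number of background proteins
    annotated:  background proteins carrying the GO term
    drawn:      number of candidate proteins (e.g. polyQ proteins)
    hits:       candidate proteins carrying the GO term
    """
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, drawn)


def bonferroni(p_values):
    """Bonferroni adjustment, a simple FWER-controlling stand-in."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy example: 10 proteins, 5 annotated, 5 candidates, all 5 annotated.
p_raw = hypergeom_enrichment_p(10, 5, 5, 5)
p_adj = bonferroni([p_raw, 0.5])
```

      A small adjusted p-value indicates that the GO term occurs among the candidate proteins more often than expected from random sampling of the background.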

      The results presented in Figure 11 and Figure 12 (DS7-DS32) support our hypothesis that Q-rich motifs prevail in proteins involved in specialized biological processes, including Saccharomyces cerevisiae RNA-mediated transposition, Candida albicans filamentous growth, peptidyl-glutamic acid modification in ciliates with reassigned stop codons (TAAQ and TAGQ), Tetrahymena thermophila xylan catabolism, Dictyostelium discoideum sexual reproduction, Plasmodium falciparum infection, as well as the nervous systems of Drosophila melanogaster, Mus musculus, and Homo sapiens (78). In contrast, peptidyl-glutamic acid modification and microtubule-based movement are not overrepresented with Q-rich proteins in Stentor coeruleus, a ciliate with standard stop codons.

      Recommendations for the authors:

      Please note that you control which revisions to undertake from the public reviews and recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors):

      The order of paragraphs in the introduction was very difficult to follow. Each paragraph was clear and easy to understand, but the order of paragraphs did not make sense to this reader. The order of events in the abstract matches the order of events in the results section. However, the order of paragraphs in the introduction is completely different and this was very confusing. This disordered list of facts might make sense to an expert reader but makes it hard for a non-expert reader to understand.

      Apologies. We endeavored to improve the flow of our revised manuscript to make it more readable.

      The section beginning on pg 12 focused on figures 4 and 5 was very interesting and highly promising. However, it was initially hard for me to tell from the main text what the experiment was. Please add to the text an explanation of the experiment, because it is hard to figure out what was going on from the figures alone. Figure 4 is fantastic, but would be improved by adding error bars and scaling the x-axis to be the same in panels B,C,D.

      Thank you for this recommendation. We have now scaled both the x-axis and y-axis equivalently in panels B, C and D of Figure 4. Error bars are too small to be included.

      It is hard to tell if the key variable is the number of S/T/Q/N residues or the number of phosphosites. I think a good control would be to add a regression against the number of putative phosphosites. The sequences are well designed. I loved this part but as a reader, I need more interpretation about why it matters and how it explains the PEE.

      As described above, we have shown that the number of phosphorylation sites in the Q-rich motifs is not relevant to their autonomous protein expression-enhancing (PEE) activities.

      I believe that the prevalence of polyX runs is not meaningful without normalizing for the background abundance of each amino acid. The proteome-wide abundance and the assumption that amino acids occur independently can be used to form a baseline expectation for which runs are longer than expected by chance. I think Figures 6 and 7 should go into the supplement and be replaced in the main text with a figure where Figure 6 is normalized by Figure 7. For example in P. falciparum, there are many N-runs (Figure 6), but the proteome has the highest fraction of N’s (Figure 7).
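      The baseline expectation suggested here can be made concrete with a short calculation. Under the stated assumption that residues occur independently with frequency p, the expected number of length-k windows consisting entirely of X in a sequence of length L is (L - k + 1) * p^k. The sketch below compares that expectation with the observed count; the function name is hypothetical and the example is a toy sequence, not data from the manuscript.

```python
def observed_vs_expected_runs(seq, aa, k):
    """Return (observed, expected) counts of length-k windows made
    up entirely of residue `aa`.

    Expected count assumes independent residues with frequency
    p = count(aa) / len(seq), giving (L - k + 1) * p**k.
    """
    if not seq:
        return 0, 0.0
    p = seq.count(aa) / len(seq)
    n_windows = max(0, len(seq) - k + 1)
    expected = n_windows * p ** k
    observed = sum(
        1 for i in range(n_windows) if seq[i:i + k] == aa * k
    )
    return observed, expected


# Toy example: half of the residues are Q and they are clustered.
obs, exp = observed_vs_expected_runs("QQQQAAAA", "Q", 4)
enrichment = obs / exp if exp else float("inf")
```

      An observed-to-expected ratio well above 1 would suggest that runs are longer or more frequent than amino acid composition alone predicts.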

      Thank you for these suggestions. The three figures in our preprint manuscript (Figures 6-8) have been moved into the supplementary information (Figures S1-S3). For normalization, we have provided four new figures (Figures 7-10) in our revised manuscript.

      The analysis of ciliate proteomes was fascinating. I am particularly interested in the GO enrichment for “peptidyl-glutamic acid modification” (pg 20) because these enzymes might be modifying some of Q’s in the Q-runs. I might be wrong about this idea or confused about the chemistry. Do these ciliates live in Q-rich environments? Or nitrogen rich environments?

      Polymeric modifications (polymodifications) are a hallmark of C-terminal tubulin tails, whereas secondary peptide chains of glutamic acids (polyglutamylation) and glycines (polyglycylation) are catalyzed from the γ-carboxyl group of primary chain glutamic acids. It is not clear if these enzymes can modify some of the Q’s in the Q-runs.

      To our knowledge, ciliates are abundant in almost every liquid water environment, i.e., oceans/seas, marine sediments, lakes, ponds, and rivers, and even soils.

      I think you should include more discussion about how the codons that code for Q’s are prone to slippage during DNA replication, and thus many Q-runs are unstable and expand (e.g. Huntington’s Disease). The end of pg 24 or pg 25 would be good places.

      We thank the reviewer for these comments.

      PolyQ motifs have a particular length-dependent codon usage that relates to strand slippage in CAG/CTG trinucleotide repeat regions during DNA replication. In most organisms with the standard genetic code, Q is encoded by CAGQ and CAAQ. Here, we have determined and compared proteome-wide Q contents, as well as the CAGQ usage frequencies (i.e., the ratio between CAGQ and the sum of CAGQ, CAAQ, TAAQ, and TAGQ).

      Our results reveal that the likelihood of forming long CAG/CTG trinucleotide repeats is higher in five eukaryotes due to their higher CAGQ usage frequencies, including Drosophila melanogaster (86.6% Q), Danio rerio (74.0% Q), Mus musculus (74.0% Q), Homo sapiens (73.5% Q), and Chlamydomonas reinhardtii (87.3% Q) (orange background, Table 2). In contrast, another five eukaryotes that possess high numbers of polyQ motifs (i.e., Dictyostelium discoideum, Candida albicans, Candida tropicalis, Plasmodium falciparum and Stentor coeruleus) (Figure 1) utilize more CAAQ (96.2%, 84.6%, 84.5%, 86.7% and 75.7%) than CAGQ (3.8%, 15.4%, 15.5%, 13.3% and 24.3%), respectively, to avoid the formation of long CAG/CTG trinucleotide repeats (green background, Table 2). Similarly, all five ciliates with reassigned stop codons (TAAQ and TAGQ) have low CAGQ usage frequencies (i.e., from 3.8% Q in Pseudocohnilembus persalinus to 12.6% Q in Oxytricha trifallax) (red font, Table 2). Accordingly, the CAG-slippage mechanism might operate more frequently in Chlamydomonas reinhardtii, Drosophila melanogaster, Danio rerio, Mus musculus and Homo sapiens than in Dictyostelium discoideum, Candida albicans, Candida tropicalis, Plasmodium falciparum, Stentor coeruleus and the five ciliates with reassigned stop codons (TAAQ and TAGQ).
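
      For concreteness, the CAGQ usage frequency defined above can be sketched in Python as follows. This is an illustrative sketch, not one of the programs used in the study. The function names, the in-frame coding-sequence input, and the requirement that the terminal stop codon has already been removed are assumptions made for illustration.

```python
from collections import Counter


def count_codons(cds):
    """Count in-frame codons in a coding sequence (DNA alphabet).
    Assumes the sequence starts in frame and the terminal stop codon
    has been removed; trailing bases that do not fill a codon are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def cag_usage(counts, reassigned_stops=False):
    """Fraction of glutamine codons that are CAG.
    With reassigned_stops=True, in-frame TAA and TAG are also treated as
    glutamine codons, as in ciliates that reassign these stop codons."""
    q_codons = ["CAG", "CAA"]
    if reassigned_stops:
        q_codons += ["TAA", "TAG"]
    total = sum(counts[c] for c in q_codons)
    if total == 0:
        return float("nan")  # no glutamine codons in the input
    return counts["CAG"] / total
```

      Summing the codon counts over all coding sequences of a proteome before calling `cag_usage` would give a proteome-wide frequency of the kind reported in Table 2.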

      Author response table 1.

      Usage frequencies of TAA, TAG, TAAQ, TAGQ, CAAQ and CAGQ codons in the entire proteomes of 20 different organisms.

      Pg 7, paragraph 2 has no direction. Please add the conclusion of the paragraph to the first sentence.

      This paragraph has been moved to the “Introduction” section of the revised manuscript.

      Pg 8, I suggest only mentioning the PFDs used in the experiments. The rest are distracting.

      We have addressed this concern above.

      Pg 12. Please revise the "The relationship...." text to explain the experiment.

      We apologize for not explaining this topic sufficiently well in our preprint manuscript.

      SCDs are often structurally flexible sequences (4) or even IDRs. Using IUPred2A (https://iupred2a.elte.hu/plot_new), a web-server for identifying disordered protein regions (88), we found that Rad51-NTD (1-66 a.a.) (1), Rad53-SCD1 (1-29 a.a.) and Sup35-NPD (1-39 a.a.) are highly structurally flexible. Since a high content of serine (S), threonine (T), glutamine (Q), and asparagine (N) is a common feature of IDRs (17-20), we applied an alanine scanning mutagenesis approach to reduce the percentages of S, T, Q or N in Rad51-NTD, Rad53-SCD1 or Sup35-NPD, respectively. As shown in Figure 4 and Figure 5, there is a very strong positive relationship between STQ and STQN amino acid percentages and β-galactosidase activities. (Page 13, lines 5-10)

      Pg 13, first full paragraph, "Futionally, IDRs..." I think this paragraph belongs in the Discussion.

      This paragraph is now in the “Introduction” section (Page 5, Lines 11-15).

      Pg. 15, I think the order of paragraphs should be swapped.

      These paragraphs have been removed or rewritten in the “Introduction” section of our revised manuscript.

      Pg 17 (and other parts) I found the lists of numbers and percentages hard to read and I think you should refer readers to the tables.

      Thank you. In the revised manuscript, we have avoided using lists of numbers and percentages, unless we feel they are absolutely essential.

      Pg. 19 please add more interpretation to the last paragraph. It is very cool but I need help understanding the result. Are these proteins diverging rapidly? Perhaps this is a place to include the idea of codon slippage during DNA replication.

      Thank you. The new results in Table 2 indicate that the CAG-slippage mechanism is unlikely to operate in ciliates with reassigned stop codons (TAAQ and TAGQ).

      Pg 24. "Based on our findings from this study, we suggest that Q-rich motifs are useful toolkits for generating novel diversity during protein evolution, including by enabling greater protein expression, protein-protein interactions, posttranslational modifications, increased solubility, and tunable stability, among other important traits." This idea needs to be cited. Keith Dunker has written extensively about this idea as have others. Perhaps also discuss why Poly Q rich regions are different from other IDRs and different from other IDRs that phase-separate.

      Agreed, we have cited two of Keith Dunker’s papers in our revised manuscript (73, 74).

      Minor notes:

      Please define Borg genomes (pg 25).

      Borgs are long extrachromosomal DNA sequences in methane-oxidizing Methanoperedens archaea, which display the potential to augment methane oxidation (101). They are now described in our revised manuscript. (Page 15, lines 12-14)

      Reviewer #2 (Recommendations For The Authors):

      The authors dance around disorder but never really quantify or show data. This seems like a strange blindspot.

      We apologize for not explaining this topic sufficiently well in our preprint manuscript. We have endeavored to do so in our revised manuscript.

      The authors claim the expression enhancement is "autonomous," but they have not ruled things out that would make it not autonomous.

      Evidence of the “autonomous” nature of expression enhancement is presented in Figure 1, Figure 4, and Figure 5 of the preprint manuscript.

      Recommendations for improving the writing and presentation.

      The title does not recapitulate the entire body of work. The first 5 figures are not represented by the title in any way, and indeed, I have serious misgivings as to whether the conclusion stated in the title is supported by the work. I would strongly suggest the authors change the title.

      Figure 2 could be supplemental.

      Thank you. We think it is important to keep Figure 2 in the text.

      Figures 4 and 5 are not discussed much or particularly well.

      This reviewer’s opinion of Figure 4 and Figure 5 is in stark contrast to those of the first reviewer.

      The introduction, while very thorough, takes away from the main findings of the paper. It is more suited to a review and not a tailored set of minimal information necessary to set up the question and findings of the paper. The question that the authors are after is also not very clear.

      Thank you. The entire “Introduction” section has been extensively rewritten in the revised manuscript.

      Schematics of their fusion constructs and changes to the sequence would be nice, even if supplemental.

      Schematics of the fusion constructs are provided in Figure 1A.

      The methods section should be substantially expanded.

      The Methods section in the revised manuscript has been rewritten and expanded. The six JavaScript programs used in this work are listed in Table S4.

      The text is not always suited to the general audience and readership of eLife.

      We have now rewritten parts of our manuscript to make it more accessible to the broad readership of eLife.

      In some cases, section headers really don't match what is presented, or there is no evidence to back the claim.

      The section headers in the revised manuscript have been corrected.

      A lot of the listed results in the back half of the paper could be a supplemental table, listing %s in a paragraph (several of them in a row) is never nice

      Acknowledged. In the revised manuscript, we have removed almost all sentences listing %s.

      Minor corrections to the text and figures.

      There is a reference to table 1 multiple times, and it seems that there is a missing table. The current table 1 does not seem to be the same table referred to in some places throughout the text.

      Apologies for this mistake, which we have now corrected in our revised manuscript.

      In some places it's not clear where new work is and where previous work is mentioned. It would help if the authors clearly stated "In previous work...."

      Acknowledged. We have corrected this oversight in our revised manuscript.

      Not all strains are listed in the strain table (KO's in figure 3 are not included)

      Apologies, we have now corrected Table S2, as suggested by this reviewer.

      Author response table 2.

      S. cerevisiae strains used in this study

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Recommendations For The Authors):

      While the details are mostly well-explained, I think that the authors could better bring forth the goals and potential usages of hippocampome.org overall.

      I think that this is a great and helpful tool that can leverage various and detailed cellular experimental studies that are out there in the literature to garner potential insights, direct future experimental studies, observe/classify experimental 'differences' (e.g., the deep and superficial pyramidal studies they mention) and so on. Say that one gets some mechanistic insight from more abstract theoretical models, hippocampome can be used to determine whether the experimental data where available is supportive of the theory. They also describe CA3 model and grid cells. While I am not suggesting that the authors completely re-organize the manuscript, I did feel that the last section 'potential applications...' could have perhaps been brought forth earlier (in a summarized form) for the reader/user to better appreciate hippocampome - indeed it is line 288 that should be near the beginning of the paper I thought.

      We thank the Reviewer for the suggestion. We have now included a summary of the simulation readiness of Hippocampome.org in the Introduction.

      I thought the 'application' paragraph (starting line 288) needed expansion to appreciate - I did not have a chance to look at the cited papers in that section - but maybe 2 paragraphs, one on CA3 and the other on grid cells, with a few more sentences of goal/context and tool usage details could be provided?

      We thank the Reviewer for the suggestion. We have added expanded paragraphs describing the simulation work on CA3 and grid cells.

      The authors start their Discussion by mentioning other resources (e.g. blue brain) in comparison. I thought that this was not too helpful without a bit more expansion about these other resources and what in particular is comparable. For example, the blue brain project is different in that it does not mine the literature per se (I think)? But then I am not sure of the extent of the comparison that the authors intend with blue brain and the other mentioned resources.

      Thank you for the helpful suggestion. We have now expanded upon the paragraph to draw more explicit parallels and contrasts among the various projects, in particular between the Blue Brain Project and Hippocampome.org.

      Minor comments

      • Fig 3D caption missing

      Thank you for pointing this out. We have now amended the figure caption.

      • Fig 5A line 211-12 refers to v2.0 but Fig 5 caption says v1.0?

      We apologize for the confusion. We have now added text clarifying the V1.X relevant descriptions around Figure 5.

      • Fig 6A confusing with thin and thick arrows and direction?

      We apologize for the confusion. We have re-colored the thick arrows orange to emphasize the fact that they are feeding directly into the spiking neural simulations.

      • Line 260 - not sure what this means - how is importance defined?

      We apologize for the confusion. We have now added text clarifying that “importance” refers to the role the neuron type plays in the functioning circuitry of the hippocampal formation.

      • CARLsim vs Brian/NEST in choosing - maybe a sentence or two for rationale

      Thank you for the suggestion. We have now added a sentence explaining the selection of CARLsim. CARLsim was selected due to its ability to run on collections of GPUs. CARLsim was the only simulator with this capability at the time the simulation work was being planned, and the power of a GPU supercomputer was needed to simulate the millions of neurons that comprise a full simulation of the complete hippocampal formation.

      • Fig 9 mv should be mV, and the voltage values specified there refer to which dash?

      Thank you for pointing these situations out. We have amended the millivolts label and have made changes to the figure to help clarify which specific tick marks are being labeled.

      Reviewer #3 (Recommendations For The Authors):

      Compliments to the authors on this nicely organized and structured presentation of V 2.0 of hippocampome.org. The paper is well prepared giving a useful short summary of the history of hippocampome for the newcomers and refreshing the memory of users, switching to highlighting the new data additions, why these are relevant and how these complement the existing database, and opening up to new applications. The added potential is well illustrated and in addition, the authors provide numerical information on the usage of this amazing resource. I enjoyed roaming around in the new version, which was made available for reviewers, and although it has been a while since I worked with the system, the new version is easy to work with. I have not had the time to use it extensively so cannot comment in detail but based on the long experience of the authors and their support team, I trust that version 2 will be almost, if not completely, flawless; however that will for sure become clear when it is released.

      One could always wish for more, disagree, or even criticize choices made to cluster neurons, divide areas, and so forth, though in my view that does not contribute to what the resource has to offer. Having said this, the authors might consider addressing briefly issues about differences in the nomenclature used in original descriptions and how they handled the translation into their nomenclature. To mention one that is constantly being debated: how does one define the border between SMo and SMi.

      Thank you for the suggestion. We have added text to the Introduction that addresses the nomenclature issue, as presented in Hamilton et al. (2017), and provide a definition for SMo and SMi.

      Another confusing issue is presented by layers in the entorhinal cortex or its subdivisions (how many and how are these defined). So, some remarks for newcomers in the field who might use the database without spending too much energy to read the original data, might be useful.

      Thank you for the suggestion to clarify this situation pertaining to the entorhinal cortex. Often, we have assumed the authors’ own definitions of the layers and subdivisions (medial and lateral), when naming neuron types. When our name is a hybrid of two published names that include both medial and lateral neurons, our name is prefixed by a simple EC, rather than by MEC or LEC.

      As noted, the authors present version 2 nicely and comprehensibly and I have only a few additional comments, meant to further improve the already high quality of the paper.

      1) The figures, nice as they are, are incredibly information-dense, so they require serious study to get the details; the legends do help, but the many abbreviations coming from totally different fields make it challenging to keep track of them while reading. This is a pity since there is a lot of new information in this version of the dataset, compared to previous versions and the authors overall succeed in emphasizing what is new and why this might be of use/importance.

      So a few suggestions: i) add relevant/most important abbreviations to the legends of the individual figures; ii) introduce all abbreviations upon first use and do not simply refer to the table in the methods. Interestingly, even the authors lose track in the introduction where they use BICCN in line 43 and refer to the abbreviation list, though the full name is given two lines below.

      We apologize for the confusion. We have amended the main text to clarify abbreviations. We have added the abbreviation definitions to the captions of the figures, and in some instances, removed the abbreviations from the figures altogether where space allowed.

      2) Figure 3 and even more so figure 5 depend strongly on the color differences red/green; please change since generally red/green is no longer used for obvious reasons.

      Thank you for pointing this out. We have switched the fonts in Figure 3 to black (excitatory) and gray (inhibitory) to match our previous publication. We have also changed the color schemes in Figure 5 to avoid red and green.

      Reviewer #3 commented on the complexity of our figures and how the figures are information dense. To partially address this, we have decided to remove panel A2 of Figure 3. It was originally meant to emphasize where the information came from to add new axonal projections to two v1.0 neuron types; however, it is not necessary to make the point in the illustration. Thus, we have removed the panel and amended the caption for Figure 3A to include the cited reference.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for The Authors):

      1) While the specificity of the observed muscle phenotypes seems clear, the subsequent molecular analysis of Numb protein interactors does not seem to consider the potential involvement of Numb-like. The authors should demonstrate the relative expression levels of Numb and Numb-like in the models used, and establish the specificity of the antibodies used in IP, western and staining experiments.

      Response: Perhaps the most convincing evidence that the anti-Numb antibody did not pull down Numb-like is that this protein was not detected among immunoprecipitated protein complexes pulled down by the anti-Numb antibody used. The antibody used in the immunoprecipitation was validated by the supplier and was previously reported to immunoprecipitate Numb [1, 2]. We previously demonstrated that a morpholino against Numb mRNA almost completely eliminated the band detected by this antibody and that this band was at the expected molecular weight [ref]. In our hands, mRNA levels for Numb-like in skeletal muscle are 5-10-fold lower than those for Numb [3]. We have been unable to detect Numb-like protein in healthy adult skeletal muscle by immunoblotting or immunofluorescence staining. Taking all of these findings together, it seems unlikely that the antibodies used for immunoprecipitating Numb-protein complexes pull down Numb-like.

      2) The authors use PCR to investigate Numb isoform expression and conclude that p65 is likely the dominant protein isoform expressed. While this agrees with the single band observed in Supp Figure 4A, a positive control for exon 9 excluded and included isoforms in the PCR reactions would strengthen this conclusion.

      Response: The amplicons shown in Supplemental Figure 4 were sequenced. The clones corresponded to the isoforms with exon 3 present or removed. No amplicons containing exon 9 were detected. The following sentence was added to the Analysis of Splice Variants section of Methods to address this point: “PCR products were cloned using the TOPO TA cloning system (ThermoFisher) and multiple resulting clones were sequenced to confirm that the expected products were generated.”

      3) PCR analysis of total Numb and Numb-like expression levels is not shown. This is important given that the specificity of the Numb antibodies used for AP-MS experiments is not described and some Numb antibodies are well known to also recognize Numb-like. Two different Numb antibodies were used for Western and immunoprecipitation but the specificity for Numb and Numb-like is not described. In particular, does the antibody used in the AP-MS experiment recognize both Numb and Numb-like? Supplementary Table 1 does not list Numb or Numb-like, but presumably peptides were identified?

      Response: As noted above, the specificity of anti-Numb antibodies was confirmed in previous studies [3]. Importantly, Numb-like mRNA levels are 5-10-fold lower than Numb mRNA, and NumbL protein is undetectable in healthy adult skeletal muscle by Western. The physiology data reported in this manuscript support the conclusion that a single KO of Numb is sufficient to recapitulate the physiological phenotype of Numb/Numb-like KO. We therefore reason that the majority, if not all, of the physiological contribution of these proteins to muscle contractility is due to Numb (Fig. 1).

      4) The validation experiment used the same Numb antibody for immunoprecipitation, immunoblotted with Septin 7. A reciprocal IP of Septin 7 and blotted with Numb should be performed. In addition, a Numb-like IP or immunoblot would also be useful to demonstrate the specificity of the interaction. Efforts to map the interaction between Numb and Septin 7 would be useful to demonstrate specificity of the interaction and strategies to establish the biological relevance of the interaction.

      Response: We agree with the reviewer and attempted several IPs with anti-Septin7 antibodies. These were unsuccessful. In a new collaboration, Dr. Italo Cavini (University of Sao Paulo) has used machine-learning-based approaches to model binding between Numb and several septins, including Septin 7. The analysis suggests that binding of Numb with septins involves a domain of Numb that has not yet been ascribed a function in protein-protein interactions. These computational predictions require experimental validation but provide a rational starting point for experiments to define the domains responsible for these interactions. Such experiments were included in our recent NIH R01 renewal application. We hope to be able to report on results of confirmatory experiments of these computational models in the future.

      5) Other septins were identified in the AP-MS experiment and might have been anticipated to also be disrupted by Numb/Numb-like deletion. Are these septins known to interact in a complex?

      Response: This is an excellent question. Septins have conserved motifs providing a clear reason to imagine that many different mammalian septins could directly interact with Numb. Septins form heterooligomers consisting of complexes formed by 3, 6 or 8 septins [4]. It is likely that when Numb binds to one septin, antibodies against Numb pull down other septins present in the septin oligomer to which Numb is bound. The following paragraph was added to the discussion: “Our findings suggest that Numb may also interact with other septins such as septins 2, 9 and 10, which were also identified with a high level of confidence as Numb interacting proteins by our LC/MS/MS analysis. Our data do not allow us to determine if Numb binds directly to these septins. Septins contain highly conserved regions, and, consequently, if one such region of septin 7 interacts with Numb, then many septins would be expected to directly bind Numb through the same domain. However, because septins self-oligomerize, it is possible that when Numb binds to one septin, antibodies against Numb could also pull down other septins present in the septin oligomer to which Numb is bound regardless of whether or not they are also bound by Numb.”

      6) The text for Figure 5 describes analysis of Septin localization in inducible Numb/Numb-like cKO muscle, but the figure indicates only Numb is knocked out. Please clarify.

      Response: We apologize for this oversight on our part. The Legend to Figure 5 has been corrected.

      7) Supplementary Figure 2 seems to show that TAM treatment increases Numb expression. Please clarify. Also, please correct reference 9.

      Response: The figure was incorrectly labeled. We apologize for this oversight and have corrected the figure in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      Overall, the manuscript is well written. I do have a few minor issues/concerns, which are detailed below.

      Abstract: Please be a little more specific regarding where the tissue came from (i.e. humans, mice, cell) when referring to your previous studies.

      Response: The abstract has been revised as requested.

      Introduction: Please be more specific regarding the technique used for detecting ultrastructural changes. I assume it was done with TEM, but the reference is listed as an "invalid citation" in your reference list.

      Response: The introduction was revised as requested and the invalid citation was replaced with a valid reference.

      Methods / Numb Co-Immunoprecipitation: Please indicate the level of confluency of the C2C12 cells as this will alter gene expression.

      Response: As indicated in the updated Methods section, confluent C2C12 cells were switched to differentiation media (low serum) for seven days. When harvested, the cells had differentiated and fused into myotubes.

      Methods / Immunohistochemical Staining: The first sentence needs to be edited regarding plurality and grammar.

      Response: Thank you for this comment. The text was revised accordingly.

      Results / GWAS and WGS Identify...: Please spell out phosphodiesterase (I assume) for PDE4D

      Response: This change was incorporated in the text.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reports jAspSnFR3, a biosensor that enables high spatiotemporal resolution of aspartate levels in living cells. To develop this sensor, the authors used a structurally guided amino acid substitution in a glutamate/aspartate periplasmic binding protein to switch its specificity towards aspartate. The in vitro and in cellulo functional characterization of the biosensor is convincing, but evidence of the sensor's effectiveness in detecting small perturbations of aspartate levels and information on its behavior in response to acute aspartate elevations in the cytosol are still lacking.

      We thank the reviewers and editors for the detailed assessment of our work and for their constructive feedback. Most comments have now been experimentally addressed in the revised manuscript, which we feel is substantially improved from the initial draft.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Davidsen and coworkers describe the development of a novel aspartate biosensor jAspSnFR3. This collaborative work supports and complements what was reported in a recent preprint by Hellweg et al., (bioRxiv; doi: 10.1101/2023.05.04.537313). In both studies, the newly engineered aspartate sensor was developed from the same glutamate biosensor previously developed by the authors of this manuscript. This coincidence is not casual but is the result of the need to find tools capable of measuring aspartate levels in vivo. Therefore, it is undoubtedly a relevant and timely work carried out by groups experienced in aspartate metabolism and in the generation of metabolite biosensors.

      Reviewer #2 (Public Review):

      In this work the iGluSnFR3 sensor, recently developed by Marvin et al (2023), is mutated at position S72, which was previously reported to switch the specificity from Glu to Asp. They made 3 mutations at this position, selected a S72P mutant, then made a second mutation at S27 to generate an Asp-specific version of the sensor. This was then characterized thoroughly and used on some test experiments, where it was shown to detect and allow visualization of aspartate concentration changes over time. It is an incremental advance on the iGluSnFR3 study, where 2 predictable mutations are used to generate a sensor that works on a close analog of Glu, Asp. It is shown to have utility and will be useful in the field of Asp-mediated biological effects.

      Reviewer #3 (Public Review):

      In this manuscript, Davidsen and collaborators introduce jAspSnFR3, a new version of aspartate biosensor derived from iGluSnFR3, that allows monitoring in real-time aspartate levels in cultured cells. A selective amino acid substitution was applied in a key region of the template to switch its specificity from glutamate to aspartate. The jAspSnFR3 does not respond to other tested metabolites and performs well, is not toxic for cultured cells, and is not affected by temperature, ensuring the possibility of using this tool in tissues physiologically more relevant. The high affinity for aspartate (KD=50 uM) allowed the authors to measure fluctuations of this amino acid in the physiological range. Different strategies were used to bring aspartate to the minimal level. Finally, the authors used jAspSnFR3 to estimate the intracellular aspartate concentration. One of the highlights of the manuscript was a treatment with asparagine during glutamine starvation. Although it didn't corroborate the essentiality of asparagine in glutamine depletion, the measurement of aspartate during this supplementation is a glimpse of how useful this sensor can be.

      Reviewer #1 (Recommendations For The Authors):

      The authors should evaluate the effectiveness of the sensor in detecting small perturbations of aspartate levels and its behavior in response to acute aspartate elevations in the cytosol. In vivo aspartate determinations were performed exclusively in conditions that cause aspartate depletion. Using mitochondrial respiratory inhibitors or aspartate withdrawal, the reliability of the sensor was determined over relatively long periods, until a steady state of aspartate depletion was reached 12-60 hours later. Although Hellweg and coworkers demonstrated that a related aspartate sensor could detect increases in aspartate in cells overexpressing the aspartate-glutamate GLAST transporter, the differences reported here between the two sensors make it advisable to test whether this aspect is also improved, or not, using jAspSnFR3.

      Similarly, Davidsen et al. did not test whether the sensor is able to detect transient variations in cytosolic aspartate levels. In proliferative cells aspartate synthesis is linked to NAD+ regeneration by the ETC (Sullivan et al., 2015, Cell); indeed, the authors deplete aspartate using CI or CIII inhibitors but do not analyze whether aspartate levels recover, and increase, after inhibitor removal. Furthermore, the sequential addition of oligomycin and uncouplers could generate measurable fluctuations of aspartate in the cytosol.

      We agree with the reviewer that only including situations of aspartate depletion in our cell culture experiments provided an incomplete evaluation of the utility of this biosensor. In the revised manuscript we provide three additional experiments using secondary treatments that restore aspartate synthesis to conditions that initially caused aspartate depletion. First, we conducted experiments where cells expressing jAspSnFR3/NucRFP were changed into media without glutamine, inducing aspartate depletion, with glutamine being replenished at various time points to observe if GFP/RFP measurements recover. As expected, glutamine withdrawal caused a decay in the GFP/RFP signal and we found that restoring glutamine caused a subsequent restoration of the GFP/RFP signal at all time points, with each fully recovering the GFP/RFP signal over time (Revised Manuscript Figure 2E). Next, we conducted the experiment suggested by the reviewer, testing whether the published finding, that oligomycin-induced aspartate limitation can be remedied by co-treatment with electron transport chain uncouplers, could be visualized using jAspSnFR3 measurements of GFP/RFP. Indeed, after 24 hours of oligomycin-induced aspartate depletion, treatment with the ETC uncoupler BAM15 dose-dependently restored GFP/RFP signal (Revised Manuscript Figure 2G). Finally, we tested whether the ability of pyruvate to mitigate the decrease in aspartate upon co-treatment with rotenone (Figure 2B) could also be detected in a sequential treatment protocol after aspartate depletion. Indeed, after 24 hours of aspartate depletion by rotenone treatment, the GFP/RFP signal was rapidly restored by additional treatment with pyruvate (Revised Manuscript Figure 2, figure supplement 1C). Collectively, these results provide support for the utility of jAspSnFR3 to measure transient changes in aspartate levels in diverse metabolic situations, including conditions that restore aspartate to cells that had been experiencing aspartate depletion.

      Reviewer #2 (Recommendations For The Authors):

      Weaknesses: Sensor basically identical to iGluSnFR3, but nevertheless useful and specific. The results support the conclusions, and the paper is very straightforward. I think the work will be useful to people working on the effects of free aspartate in biology and given it is basically iGluSnFR3, which is widely used, should be very reproducible and reliable.

      We appreciate the reviewer’s comment that the sensor is useful for specific detection of aspartate. We agree that the advance of the paper is primarily in demonstrating its utility to measure aspartate, rather than any fundamental innovation on the biosensor approach. We hope the fact that jAspSnFR3 derives from a well-validated biosensor (iGluSnFR3) will support its adoption.

      Reviewer #3 (Recommendations For The Authors):

      Although this is a well-performed study, I have some comments for the authors to address:

      1) A red tag version of the sensor (jAspSnFR3-mRuby3) was generated for normalization purposes; with this, the authors plan to correct the GFP signal for expression and movement artifacts. I naturally interpret "movement artifacts" as those generated by variations in cell volume and focal plane during time-lapse experiments. However, it was mentioned that jAspSnFR3-mRuby3 included a histidine tag that may induce a non-specific effect (responses to the treatment with some amino acids). This suggests that a version without the tag needs to be generated and that an alternative design needs to be set for normalization purposes. A nuclear-localized RFP was expressed in a second attempt to incorporate RFP as a normalization signal. Here the cell lines that express both signals (sensor and RFP) were generated by independent lentiviral transductions (insertions). Unless the number of insertions for each construct is known, this approach will not ensure an equimolar expression of both proteins (sensor and RFP). In this scenario it is not clear how the nuclear expression of RFP will help the correction for expression or monitor changes in cell volume. The authors may be interested in attempting a bicistronic system to express both the sensor and RFP.

      The reviewer noted several potential issues concerning the use of RFP for normalization, which will be separated into sections below:

      Movement artifacts:

      We are glad the reviewer raised this issue since we see how it was confusingly worded. We have deleted the text “and movement artefacts” from the sentence.

      His-tag and non-specific responses to some amino acids:

      We also found it concerning that non-specific responses to amino acids could potentially contribute to our RFP normalization signal, and so we conducted additional experiments to address whether this was likely to be an issue in intracellular measurements. We first tested whether the non-specific signal was related to the histidine tag, or was intrinsic to the mRuby3 protein itself, by comparing the fluorescence response to a titration of histidine (which showed the largest effect on red fluorescence), aspartate, and GABA (structurally related to glutamate and aspartate, but lacking a carboxylate group) across a group of mRuby-containing variants, with or without histidine tags. We replicated the non-specific signal originally observed in jAspSnFR3-mRuby3-His and found that another biosensor with a histidine tag on the C terminus of mRuby3 had a similar response (iGlucoSnFR2.mRuby3-His), as did mRuby3-His alone, indicating that fusion with jAspSnFR3 or another binding protein was not required for this effect. Additionally, we compared the fluorescence response of lysates expressing mRuby2 and mRuby3 without histidine tags and found that the non-specific signal was essentially absent (Revised Manuscript Figure 1, figure supplement 4B-D). Collectively, these data support our original hypothesis that the histidine tag was responsible for the non-specific signal, alleviating concerns about more substantial protein design issues or about using nuc-RFP for normalization. Since we also found that measuring aspartate signal using GFP/RFP ratios from cells with the linked jAspSnFR3-mRuby3-His agreed with measurements from cells separately expressing jAspSnFR3 and nucRFP (without a His tag), and the amino acid concentrations needed to significantly alter His-tagged mRuby3 signal are above those typically found in cells, we conclude that this is unlikely to be a significant factor in cells. Nonetheless, we have added all the relevant data to the manuscript to allow readers to make their own decision about which construct would be best for their purposes.

      Original text:

      "Surprisingly, the mRuby3 component responds to some amino acids at high millimolar concentrations, indicating a non-specific effect, potentially interactions with the C-terminal histidine tag (Figure 1—figure Supplement 2, panel B). Notably, this increase in fluorescence is still an order of magnitude lower than the green fluorescence response and it occurs at amino acid concentrations that are unlikely to be achieved in most cell types."

      Revised text:

      "Surprisingly, the mRuby3 fluorescence of affinity-purified jAspSnFR3.mRuby3 responds to some amino acids at high millimolar concentrations, indicating a non-specific effect (Figure 1—figure Supplement 4, panel A). This was determined to be due to an unexpected interaction with the C-terminal histidine tag and could be reproduced with other proteins containing mRuby3 and purified via the same C-terminal histidine tag (Figure 1—figure Supplement 4, panel B and C). Interestingly, a structurally related, non-amino acid compound, GABA, does not elicit a change in red fluorescence; indicating, that only amino acids are interacting with the histidine tag (Figure 1—figure Supplement 4, panel D). Nevertheless, most of our cell culture experiments were performed with nuclear localized mRuby2, which lacks a C-terminal histidine tag, and these measurements correlated with those using the histidine tagged jAspSnFR3-mRuby3 construct (Figure 1—figure Supplement 1 panel D)."

      Lentiviral transductions

      We agree that splitting the two fluorescent proteins across two expression constructs and infections effectively guarantees that there will not be equimolar expression of jAspSnFR3 and RFP; however, we do not think equimolar expression is necessary in this context. The primary goal of RFP measurements in these experiments (and in experiments using the jAspSnFR3-mRuby3 fused construct) is to control for global alterations in protein expression that might confound the interpretation that a change in GFP fluorescence corresponds to a change in aspartate levels. While a bicistronic system is arguably a better approach to improve the similarity of expression of jAspSnFR3 and nuc-RFP in a cell, we only require that the cells have consistent expression of both proteins across all cells in the population, not that the expression of one necessarily be a similar molarity to the other. We accomplish consistent expression of proteins by single cell cloning after expression of jAspSnFR3 and nucRFP (or jAspSnFR3-mRuby3), and screening for clones that have high enough expression of both proteins such that they are well detected by standard Incucyte conditions. Given that our data do not identify an obvious downside to separate expression of jAspSnFR3 and nuc-RFP compared to the fused jAspSnFR3-mRuby3 construct (where the fluorescent proteins are truly equimolar) (Figure 2, Figure Supplement 1C), we elected to prioritize the separate jAspSnFR3 and nuc-RFP combination, which provides additional opportunities to measure cell number in the same experiment (see below).
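
      The reasoning about ratiometric normalization can be illustrated with a minimal sketch. It assumes hypothetical intensity values and function names (`normalized_ratio`, `relative_to_baseline`) that do not come from the manuscript, and it is not the authors' analysis pipeline. It only shows why relative GFP/RFP changes within a clone do not require equimolar expression, and why a global change in protein expression cancels in the ratio.

```python
def normalized_ratio(gfp, rfp, gfp_bg=0.0, rfp_bg=0.0):
    """Background-subtracted GFP/RFP ratio for each time point."""
    if len(gfp) != len(rfp):
        raise ValueError("gfp and rfp must have the same length")
    ratios = []
    for g, r in zip(gfp, rfp):
        r_corr = r - rfp_bg
        if r_corr <= 0:
            raise ValueError("RFP signal must exceed background")
        ratios.append((g - gfp_bg) / r_corr)
    return ratios


def relative_to_baseline(ratios):
    """Express each ratio relative to the first (pre-treatment) time point."""
    baseline = ratios[0]
    return [x / baseline for x in ratios]


# Hypothetical clones with very different sensor:RFP stoichiometry
# but the same fractional loss of GFP signal over time.
clone_a = relative_to_baseline(normalized_ratio([100, 50, 25], [200, 200, 200]))
clone_b = relative_to_baseline(normalized_ratio([400, 200, 100], [100, 100, 100]))

# A global two-fold change in expression scales GFP and RFP together,
# so the ratio is unchanged.
global_shift = normalized_ratio([100, 200], [200, 400])
```

      In this toy example both clones report the same relative trajectory even though their absolute GFP/RFP ratios differ eight-fold, which is the sense in which consistent, rather than equimolar, expression is sufficient for within-clone comparisons.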

      2) The authors were interested in establishing the temporal dynamics of aspartate depletion by genetic and pharmacological means. For the inhibition of mitochondrial complex I, rotenone and metformin were used. Although the assays clearly show aspartate depletion, a report of cell viability is missing. Considering that glutamine deprivation induces arrest in cell proliferation, I think it will be important to know the condition of the cell cultures after 60 hours of treatment with such inhibitors.

      We agree that ensuring that cells are still viable in conditions where aspartate is depleted, as determined by GFP/RFP in jAspSnFR3-expressing cells, is an important goal. To this end, we added a new experiment investigating the effect of glutamine restoration on the GFP/RFP signal at different time points after glutamine depletion (Revised Manuscript Figure 2E, see response to reviewer 1). One advantage of using the nuclear RFP as a normalization marker is that it also enables measurements of nuclei counts, a surrogate measurement for cell number. In the same glutamine depletion experiment we therefore measured cell counts using nuclear RFP incidences and confluency as measurements of cell proliferation/growth. In both cases, the arrest in cell proliferation upon glutamine withdrawal was obvious, as was the restoration of cell proliferation following glutamine replenishment, with the amount of growth delay corresponding to the length of glutamine withdrawal (Revised Manuscript Figure 2, Figure Supplement 2A-B). Nonetheless, there were no obvious lasting defects in restarting cell proliferation even after 12 hours of glutamine withdrawal, indicating that cell viability is preserved. In the case of mitochondrial inhibitors, we also observe that even after 24 hours of treatment with oligomycin or rotenone, restoration of aspartate synthesis by BAM15 or pyruvate, respectively, can also restore GFP/RFP signal, supporting the conclusion that cellular metabolism is still active in these conditions (Revised Manuscript Figure 2G; Revised Manuscript Figure 2, figure supplement 1C).

      3) The pH sensitivity was checked in vitro with jAspSnFR3-mRuby3 and the sensor reported suitable for measurements at physiological pH. It would be an opportunity to revisit the analysis for pH sensitivity in cultured cells using an untagged version of jAspSnFR3 coupled, for example, to a sensor for pH.

      We thank the reviewer for the suggestion and agree that pH effects on sensor signal could be a confounding factor in some conditions. Unfortunately, measuring intracellular pH is not trivial and using multiple fluorescent sensors that change simultaneously would be complex to interpret, particularly in the absence of controls to unambiguously control intracellular pH and aspartate concentrations. Thus, we believe that proper investigation of the variable of pH is beyond the scope of this study. Nonetheless, we agree that measuring the contribution of pH to sensor signal is an important goal for future work, particularly if deploying it in conditions likely to cause substantial pH differences, such as comparing compartmentalized signal of jAspSnFR3 in the cytosol and mitochondria. We have added the following italicized text to the conclusions section to underscore this point:

      “Another potential use for this sensor would be to dissect compartmentalized metabolism, with mitochondria being a critical target, although incorporating the influence of pH on sensor fluorescence will be an important consideration in this context.”

      4) While the authors take an interesting approach to measuring intracellular aspartate concentration, it will be highly desirable if a calibration protocol can be designed for this sensor. Clearly, glutamine depletion grants a minimal ("zero") aspartate concentration. However, having a more dynamic way for calibration will facilitate the introduction of this tool for metabolism studies. This may be achieved by incorporating a cultured cell that already expresses the transporter or by ectopic expression in the cells that have already been used.

      We appreciate the suggestion and would similarly desire a calibration protocol to serve as a quantitative readout of aspartate levels from fluorescence signal, if possible. While we do calibrate jAspSnFR3 fluorescence in purified settings, conducting an analogous experiment intracellularly is currently difficult, if not impossible. While we have several methods to constrain the production rate of aspartate (glutamine withdrawal, mitochondrial inhibitors, and genetic knockouts of GOT1 and GOT2), we cannot prevent cells from decreasing aspartate consumption and so cannot get a true intracellular zero to aid in calibration. Additionally, the impermeability of aspartate to cell membranes makes it challenging to specifically control intracellular concentrations using environmental aspartate, and the best-known aspartate transporter (SLC1A3) is concentrative and so has the reciprocal problem. Considering these issues, we are wary of implying to readers that any specific fluorescence measurement can be used to directly interpret aspartate concentration given the many variables that can impact its signal, both related to the biosensor system itself (expression of jAspSnFR3, expression of Nuc-RFP, sensitivity and settings of the fluorescence detector) and based on cell intrinsic variability (differences in basal ASP levels, different sensitivity to treatments, influence of pH, etc.). We maintain that jAspSnFR3 has utility to measure relative changes in aspartate within a cell line across treatment conditions and over time, but absolute quantitation of aspartate still will require complementary approaches, like mass spectrometry, enzymatic assays, or NMR.

      5) jAspSnFR3 seems to have the potential to be incorporated easily for several research groups as a main tool. In general, a minor correction to replace F/F with ΔF/F in the text.

      Thank you for catching this error; the text has been edited accordingly.
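
      For readers less familiar with the distinction raised in this comment, the following minimal sketch contrasts ΔF/F with the plain fold change F/F0. The function name and the example intensities are hypothetical and serve only to illustrate the standard definition ΔF/F = (F − F0)/F0.

```python
def delta_f_over_f(f, f0):
    """Fractional fluorescence change relative to baseline f0."""
    if f0 <= 0:
        raise ValueError("baseline fluorescence must be positive")
    return (f - f0) / f0


# Hypothetical baseline and ligand-bound intensities (arbitrary units).
f0, f = 100.0, 250.0
fold_change = f / f0              # F/F0
dff = delta_f_over_f(f, f0)       # ΔF/F, always F/F0 minus one
```

      The two quantities differ by exactly one, so mislabeling ΔF/F as F/F would overstate or understate the reported response depending on which was actually computed.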

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors provide evidence to show that an increase in Kv7 channels in hilar mossy cells of Fmr1 knock out mice results in a marked decrease in their excitability. The reduction in excitatory drive onto local hilar interneurons produces an increased excitation/inhibition ratio in granule cells. Inhibiting Kv7 channels can help normalize the excitatory drive in this circuit, suggesting that they may represent a viable target for targeted therapeutics for fragile-x syndrome.

      Strengths:

      The work is supported by a compelling and thorough set of electrophysiological studies. The authors do an excellent job of analysing their data and present a very complete data set.

      We thank the Reviewer for the positive comments.

      Weaknesses:

      There are no significant weaknesses in the experimental work, however the complexity of the data presentation and the lack of a schematic showing the organizational framework of this circuit make the data less accessible to non-experts in the field. I highly encourage a graphical abstract and network diagram to help individuals understand the implications of this work.

      We thank the Reviewer for the suggestion, and added a schematic of the dentate network organization (Figure 1A).

      The work is important as it identifies a unique regional and cell-specific abnormality in Fmr1 KO mice, showing how the loss of one gene can result in region-specific changes in brain circuits.

      Reviewer #2 (Public Review):

      Summary:

      Deng et al. investigate, for the first time to my knowledge, the role that hippocampal dentate gyrus mossy cells play in Fragile X Syndrome. They provide strong evidence that, in slice preparations from Fmr1 knockout mice, mossy cells are hypoactive due to increased Kv7 function whereas granule cells are hyperactive compared to slices from wild-type mice. They provide indirect evidence that the weakness of mossy cell-interneuron connections contributes to granule cell hyperexcitability, despite converse adaptations to mossy cell inputs. The authors show that application of the Kv7 inhibitor XE991 is able to rescue granule cell hyperexcitability back to wild-type baseline, supporting the overall conclusion that inhibition of Kv7 in the dentate may be a potential therapeutic approach for Fragile X Syndrome. However, any claims regarding specific circuit-based intervention or analysis are limited by the exclusively pharmacological approach of the manipulations.

      Strengths:

      Thorough electrophysiological characterization of mossy cells in Fmr1 knockout mice, a novel finding.

      Their electrophysiological approach is quite rigorous: patched different neuron types (GC, MC, INs) one at a time within the dentate gyrus in FMR1 KO and WT, with and without 'circuit blockade' by pharmacologically inhibiting neurotransmission. This allows the most detailed characterization possible of passive membrane/intrinsic cell differences in the dentate gyrus of Fmr1 knockout mice.

      Provide several examples showing the use of Kv7 inhibitor XE991 is able to rescue excitability of granule cell circuit in Fmr1 knockout mice (AP firing in the intact circuit, postsynaptic current recordings, theta-gamma coupling stimulation).

      We thank the Reviewer for the positive comments.

      Weaknesses:

      The implications for these findings and the applicability of the potential treatment for the disorder in a whole animal are limited due to the fact that all experiments were done in slices.

      We appreciate the Reviewer’s point and agree. To address this concern, we have revised the Discussion to state that “the applicability of a circuit-wide approach as a potential treatment in vivo will require extensive future behavioral analyses, which are beyond the scope of the current study”. We also now emphasize in Discussion that “these findings provide a proof-of-principle demonstration that a circuit-based intervention can normalize dynamic E/I balance and restore dentate circuit output in vitro”.

      The authors' interpretation of the word 'circuit-based' is problematic - there are no truly circuit-specific manipulations in this study due to the reliance on pharmacology for their manipulations. While the application of the Kv7 inhibitor may have a predominant effect on the circuit through changes to mossy cell excitability, this manipulation would affect many other cells within the dentate and adjacent brain regions that connect to the dentate that express Kv7 as well.

      We appreciate the reviewer’s point but would like to clarify that by using the term ‘circuit-based’ we did not intend to imply that it is a ‘circuit-specific’ intervention. Our intended interpretation of the term ‘circuit-based’ stems from the following reasoning: the dentate circuit has two types of excitatory neurons which show opposite excitability defects in FXS mice, thus presenting an irreconcilable conflict to correct pharmacologically for each cell type individually. Instead, we sought an approach to correct the overall dentate circuit output, rather than to restore excitability defects of individual cell types. Notably, when we pharmacologically isolated granule cells from the circuit, inhibition of Kv7 failed to restore their excitability, suggesting that normalization of the dentate output depends on the circuit activity. Since we focused on correcting dentate output using such a circuit-dependent approach, we used the term ‘circuit-based intervention’ to emphasize this notion.

      Reviewer #3 (Public Review):

      The paper by Deng, Kumar, Cavalli, Klyachko describes that, unlike in other cell types, loss of Fmr1 decreases the excitability of hippocampal mossy cells due to up-regulation of Kv7 currents. They also show evidence that while muting mossy cells appears to be a compensatory mechanism, it contributes to the higher activity of the dentate gyrus, because the removal of mossy cell output alleviates the inhibition of dentate principal cells. This may be important for the patho-mechanism in Fragile X syndrome caused by the loss of Fmr1.

      These experiments were carefully designed, and the results are presented in a very logical, insightful, and self-explanatory way. Therefore, this paper represents strong evidence for the claims of the authors. In the current state of the manuscript, there are only a few points that need additional explanation.

      We thank the Reviewer for the positive comments.

      One of the results, which is shown in the supplementary dataset, does not fit the main conclusions. Changes in the mEPSC frequency suggest that in addition to the proposed network effects, there are additional changes in the synaptic machinery or synapse number that are independent of the actual activity of the neurons. Since the differences of the mEPSC and sEPSC frequencies are similar and because only the latter can signal network effects, while the former is typically interpreted as a presynaptic change, it cannot be claimed that sEPSC frequency changes are due to the hypo-excitability of mossy cells.

      We thank the Reviewer for this important point and agree. To address this concern, we now state in Results that “We note that changes in the excitatory drive onto interneurons include both mEPSC and sEPSC frequencies, which reflect not only potential deficits in excitability of their input cells, such as MCs, but also changes in synaptic connectivity/function, that may arise from homeostatic circuit reorganization/compensation (see Discussion)”.

      We also now emphasize this point in Discussion by stating that “alterations in excitatory drives, including both mEPSC and sEPSC frequencies onto interneurons, suggest changes in the excitatory synapse number and/or function. Together with alterations in inhibitory drives these changes may reflect compensatory circuit reorganization of both excitatory and inhibitory connections, including mossy cell synapses”.

      We also note in Discussion that “Such circuit reorganization can explain the balanced E/I drive onto granule cells in Fmr1 KO mice we observed in the basal state, which can result from reorganization of excitatory and inhibitory axonal terminals”.

      Notably, our finding that a Kv7 blocker, acting by increasing MC excitability, is sufficient to correct dentate output supports the notion that hypo-excitability of mossy cells is a major factor contributing to dentate circuit E/I imbalance. This does not exclude the presence of additional mechanisms contributing to E/I imbalance, such as changes in synaptic connectivity or release machinery. To reflect this point, we revised the Results to temper the initial claim that “this analysis supports the notion that the hypo-excitability of MCs in Fmr1 KO mice caused (now replaced with “is a major factor contributing to”) the reduction of excitatory drive onto hilar interneurons, which ultimately results in reduced local inhibition”.

      An apparent technical issue may imply a second weak point in the interpretation of the results. Because the IPSCs in the PP stimulation experiments (Fig 8) start within a few milliseconds, it is unlikely that its first components originate from the PP-GC-MC-IN feedforward inhibitory circuit. The involvement of this circuit and MCs in the Kv7-dependent excitability changes is the main implication of the results of this paper. But this feedforward inhibition requires three consecutive synaptic steps and EPSP-AP couplings, each of them lasting for at least 1 ms + 2-5 ms. Therefore, the inhibition via the PP-GC-MC-IN circuit can be only seen from 10-20 ms after PP stimulation. The earlier components of the cPSCs should originate from other circuit elements that are not related to the rest of the paper. Therefore, more isolated measurements on the cPSC recordings are needed which consider only the later phase of the IPSCs. This can be either a measurement of the decay phase or a pharmacological manipulation that selectively enhances/inhibits a specific component of the proposed circuit.

      We appreciate the Reviewer’s point. As we mentioned in Results: “The EPSP measured in granule cells in response to the PP stimulation integrates both excitatory and inhibitory synaptic inputs onto granule cells, including the direct synaptic input from the PP and all the PP stimulation-associated feedforward and feedback synaptic inputs. In other words, the EPSP in granule cells integrates all dentate circuit ‘operations’.” As the Reviewer pointed out, this is also the case in the measurements of cPSCs, which comprise all of the PP stimulation-associated feedforward and feedback inhibition. We thank the Reviewer for the suggestion to isolate specific components of the IPSC. However, we did not attempt to do so in this study for three reasons. First, the activity of all of these circuit components likely overlaps extensively in time, and it is difficult to identify the specific time point that can separate contributions from earlier canonical feed-forward and feed-back components from the contribution of the later MC-dependent PP-GC-MC-IN feed-forward component. Notably, the tri-synaptic PP-GC-MC-IN component differs temporally from the canonical di-synaptic (PP-GC-IN) feed-back inhibition only by a single synaptic activation step, resulting in only a few milliseconds difference. Moreover, the temporal differences in the contributions of these components vary widely among different recordings, making a uniform analysis very difficult. Second, we used three different metrics to assess E/I changes in cPSC measurements, which capture a wide range of temporal processes and their integration, including peak-to-peak measurements, the charge transfer, and the excitation window metrics. Third, the principal readout in our study was the overall dentate output (i.e., granule cell firing), which reflects the integration of all dentate circuit ‘operations’, thus making the overall cPSC measurements appropriate, in our view, for this readout.
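
      The timing argument in this exchange can be made concrete with a small arithmetic sketch. It takes the Reviewer's per-step estimates at face value (about 1 ms of synaptic delay plus 2-5 ms of EPSP-AP coupling per synaptic step); these numbers are the Reviewer's assumptions rather than measured values, and the constant and function names are hypothetical.

```python
SYNAPTIC_DELAY_MS = 1.0            # Reviewer's estimate per synapse
EPSP_AP_COUPLING_MS = (2.0, 5.0)   # Reviewer's estimated range per step


def pathway_latency_ms(n_steps):
    """Earliest and latest expected onset (ms) for an n-step pathway."""
    earliest = n_steps * (SYNAPTIC_DELAY_MS + EPSP_AP_COUPLING_MS[0])
    latest = n_steps * (SYNAPTIC_DELAY_MS + EPSP_AP_COUPLING_MS[1])
    return earliest, latest


tri_synaptic = pathway_latency_ms(3)   # PP-GC-MC-IN
di_synaptic = pathway_latency_ms(2)    # PP-GC-IN
extra_step = (tri_synaptic[0] - di_synaptic[0],
              tri_synaptic[1] - di_synaptic[1])
```

      Under these assumptions the tri-synaptic pathway would begin contributing roughly 9-18 ms after stimulation, close to the 10-20 ms quoted by the Reviewer, while the di-synaptic pathway spans roughly 6-12 ms. The two windows overlap and differ by only 3-6 ms, which is consistent with the authors' point that a single cut-off time separating the components is hard to define.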

      I suggest refraining from the conclusions saying "MCs provide at least ~51% of the excitatory drive onto interneurons in WT and ~41% in KO mice", because too many factors (e.g., IN cell types, slice condition, synaptic reliability) are not accounted for in these actual numbers, and these values are not necessary for the general observation of the paper.

      We thank the reviewer for this suggestion, and have revised the manuscript accordingly.

      There are additional minor issues about the presentation of the results.

      We have carefully checked and corrected the minor errors that the reviewer pointed out.

      Recommendations for the authors:

      Revisions that are considered essential for improved assessment regarding the strengths of support of the claims:

      • Temper claims regarding circuit-based effects

      • Temper claims regarding very specific quantitative assessments of synaptic drives

      • Differentiate between monosynaptic inputs and inputs arriving through multiple synaptic contacts with proper analytical techniques.

      We appreciate these suggestions and have revised the manuscript to address the concerns raised by the reviewers.

      Reviewer #1 (Recommendations For The Authors):

      The authors do an outstanding job of reviewing and presenting all of their data. This is a paper I will recommend all of my trainees read, as it is an excellent example of a complete research project. While I am impressed with the effort involved, I also wondered if the complexity and thoroughness of their presentations could make the story less accessible to non-expert readers. My comments are simply intended to help them present a more coherent and succinct story to a wider audience, though I am not sure I really provide any meaningful changes. This is simply a very thorough and complete body of work that the authors should be commended for. After reading it I felt they had gone above and beyond what most authors would provide in terms of data to support their story, and thus I had no doubt that a change in Kv7 plays a role in changing the excitability of the network.

      We thank the Reviewer for the positive comments and great suggestions. We have made numerous changes to present our work in a more coherent and succinct way, in part by re-plotting some of the figures, as well as by adding a schematic of the dentate circuit in Figure 1.

      Figure 1. A visual of mossy cells and the local circuit they are studying would be a useful addition to Figure. 1. I also feel this is important for conveying the story of how hypo-excitability can impact the E/I of the network. I think it has to be more of a cell structure/circuit-based figure than is presented in Supplementary Figure 8.

      We thank the reviewer for this suggestion. We have added a schematic of the dentate circuit with all major cell types involved in Figure 1A.

      Figure 1. A, B, and C tell a coherent story and are easy to understand. The interpretation of the phase plot in D is harder to access. Perhaps having this as a separate figure and providing a clearer presentation of the way the phaseplot was created (see Figure 3 Bove et al., 2019, Neuroscience 418; DOI: 10.1016/j.neuroscience.2019.08.048)

      We appreciate the Reviewer’s point and agree. In order to keep Figure 1 more concise and readable, we removed the phase plot in the revised version. This change did not negatively impact the result presentation because the primary aim of this plot was to visualize changes in voltage threshold in an alternative way, but it was already clearly shown by the ramp-evoked AP traces (revised Figure 1D, insert), and thus was not essential to show.

      Figure 1 E-N might be better situated in a supplementary graph as the characteristics of the AP aren't changing.

      We understand the Reviewer’s point, but we feel it would be better to keep all action potential metrics together in one figure, to show that only a specific subset of parameters was affected in Fmr1 KO mice.

      Figure 2: (A-D) I am not sure having so many figures is required given the focus is on having a small change in IR at one membrane potential. I do worry that the significance appears to be due to 2 cells with an IR of over 100 in the WT group and 2 with an IR of around 62 in the KO group. All other cells are between 75-100 in both groups. I also worry a bit because in the literature IRs between 55 and 125 seem to be commonly reported by groups that do this work normally (Buzsáki, Westbrook, etc.). I would be cautious about making too much out of this result.

      We thank the Reviewer for these comments. We have performed additional analyses of these data, as also suggested by Reviewer 3 (Point #1), and improved presentation of the data in Figure 2D-F by showing the effect of XE991 on increasing input resistance in WT vs KO. We also plotted other panels in a similar way to show the comparisons between WT and KO, as well as comparisons within genotype +/- XE991, which makes the results easy to follow. For more details, please also see the response to Reviewer 3, Point 1.

      Figure 2D-E: As in the text, this result is really pointing towards there being a Kv7 issue. Worries about the data in D aside, I think these two figures alone tell a clearer story. Figure 3 on the other hand tells a story of the effects of blocking Kv7 on membrane potential. Is this central to the story the authors are trying to tell?

      We thank the reviewer for this point. We believe that Figure 2, Figure 3 and Figure 4—figure supplement 1 together provide strong and multifaceted evidence to support changes in Kv7 function in Fmr1 KO mossy cells.

      Figure 3. This is an interesting finding that shows how detailed their analysis was. Showing that the change in holding current in KO animals is greater than in WT is the first solid piece of evidence that there is a change in Kv7 in these cells that affects their excitability.

      We appreciate the reviewer’s comment. As mentioned above, we believe that Figure 2, Figure 3 and Figure 4—figure supplement 1 together provide strong and multifaceted evidence to support changes in Kv7 function in Fmr1 KO mossy cells.

      Figures 4 and 5 provide additional detail to support the idea that Kv7 changes by showing how the E/I ratio and spontaneous minis are shifted in KO animals.

      We thank the Reviewer for the comments.

      Figures 6-8 build a compelling story for the reduction in excitatory drive in mossy cells affecting the network dynamics in excitatory/inhibitory interactions in DG cells.

      We appreciate the Reviewer’s comment.

      Reviewer #2 (Recommendations For The Authors):

      1) Other than location and characteristic morphology, the other parameters that were used to identify mossy cells and granule cells were also parameters used to find differences in cellular properties between wild-type and Fmr1 KO mice (RMP, sEPSC frequency, etc.), which would confound the results shown. The use of available transgenic mouse lines would provide for a more unbiased screen of these cells. Afterhyperpolarization was also used as a parameter while screening cells, yet none of the data on this measurement is shown.

      We thank the reviewer for this point and agree that transgenic mouse lines provide a more unbiased way to identify various types of neurons. However, since the present study involves analyses of at least three different types of neurons, establishing multiple transgenic lines labeling different types of dentate neurons in the Fmr1 KO mouse model would be very time consuming and beyond the current resources of the lab. We would also like to clarify that the three types of dentate neurons are easily distinguished according to the large differences in location, morphology and basal electrophysiological properties, none of which were essential in defining differences between genotypes. Specifically, granule cells are located in the granule cell layer, have a small cell body (<10 µm), RMP around -80 mV, capacitance ~20 pF, and infrequent sEPSCs (<20 events/min); mossy cells are located in the hilus, have a large cell body (>15 µm), RMP around -65 mV, capacitance >100 pF, and fast afterhyperpolarization less than -10 mV (WT -5.1 ± 0.7 mV, KO -5.8 ± 0.5 mV); interneurons are located in the hilus or border of granule cell layer, have a relatively smaller cell body (10-15 µm), RMP around -55 mV, capacitance <60 pF, and afterhyperpolarization larger than -15 mV (WT -20.4 ± 1.3 mV, KO -19.8 ± 1.4 mV). We note that the cells that could not be definitively classified into the three categories were not included in analyses, and we have now clarified this further in the Methods. To address the reviewer’s second concern regarding AHP, we have now provided the corresponding values in the Methods.

      2) A definitive way to test the cell-autonomous nature of the Kv7 changes would be to use female mice, who will have a mosaic of cells affected by the fragile X chromosome, and the Fmr1 KO cells could be engineered to express GFP to help identify them from wild-type cells.

      We agree and appreciate this suggestion. This could be an interesting follow up study to further verify the cell-autonomous nature of Kv7 changes.

      3) The authors heavily rely on XE991 as a selective Kv7 blocker. Is it blocking all Kv7 channels at the concentration used? If so, given the significant expression of Kv7 in the dentate as shown by Western blot, is it surprising that there is no effect of this inhibitor on wild-type slices in most cases?

      We thank the reviewer for this important point. We used XE991 at 10x its IC50 in the present study, suggesting that more than 80% of Kv7 channels should be blocked. Notably, we observed several effects of XE991 in WT mice: it significantly increased input resistance (new Figure 2D-F), and strongly enhanced AP firing evoked by step depolarization (Figure 7E-H), although we did not observe an effect of XE991 in WT mice in the analyses of spiking evoked by theta-gamma stimulation in Figure 8. However, this is not surprising. If a parameter we measured is predominantly cell-autonomous (for example, input resistance), the effects of XE991 are easy to observe. However, if a parameter reflects integration of all dentate circuit operations (for example, AP probability in response to theta-gamma stimulation), it is difficult to detect the effect of XE991 in WT mice because the dentate circuit of WT mice has a greater capacity to maintain E/I balance in response to XE991.
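
      The "more than 80%" estimate can be checked with a short calculation. The sketch below assumes a simple Hill-type concentration-response relation with a Hill coefficient of 1; that coefficient and the function name are illustrative assumptions, not values reported in the response.

```python
def fraction_blocked(conc_over_ic50: float, hill: float = 1.0) -> float:
    """Fraction of channels blocked at a concentration given in multiples of IC50.

    Assumes a Hill-type relation: block = x^n / (1 + x^n), where x = [C] / IC50.
    The default Hill coefficient of 1 is an assumption made for illustration.
    """
    x = conc_over_ic50 ** hill
    return x / (1.0 + x)


# At 10x IC50 with a Hill coefficient of 1, the predicted block is 10/11.
print(round(fraction_blocked(10.0), 3))  # 0.909
```

      Under this assumption the predicted block is about 91%, which is consistent with the stated lower bound of 80%.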

      4) E/I ratio is a helpful concept, and it is heavily relied upon in the results text, but statistically shaky, especially for sEPSC:sIPSCs since you are combining uncertainty in the sEPSC and sIPSC to make one very uncertain ratio that doesn't undergo any subsequent statistical confirmation (such as in Fig 4I).

      We appreciate the reviewer’s point and apologize for the confusion in presentation of Fig 4I (and 5I), due to lack of detailed explanation. The E/I ratio shown in Figs. 4I (and 5I) is a single data-point estimate calculated from the mean values of independent sEPSC and sIPSC measurements (Figs. 4G-H and 5G-H, respectively). This ratio was used only as an estimate/illustration of the changes, rather than a precise determination of the shift in E/I balance. Because there is only one data-point for this ratio, statistical analysis is not possible. For this reason we performed extensive additional analyses in Figures 7 and 8, in which the EPSC and IPSC were measured from the same cells and at the same time to define the actual E/I ratio with the corresponding statistical analyses (i.e., a real matched and dynamic E/I ratio).
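
      The distinction between the two kinds of E/I ratio can be illustrated with a minimal sketch. All numbers and function names below are hypothetical and serve only to show why a ratio built from the means of independent recordings yields a single value, whereas matched recordings yield one ratio per cell that can be tested statistically.

```python
from statistics import mean


def ratio_of_means(epsc_values, ipsc_values):
    """Single E/I estimate from independent (unmatched) EPSC and IPSC samples."""
    return mean(epsc_values) / mean(ipsc_values)


def per_cell_ratios(paired_values):
    """One E/I ratio per cell when EPSC and IPSC come from the same cell."""
    return [epsc / ipsc for epsc, ipsc in paired_values]


# Hypothetical unmatched measurements: only one ratio can be formed.
print(ratio_of_means([2.0, 3.0, 4.0], [4.0, 6.0, 8.0]))  # 0.5

# Hypothetical matched measurements: a distribution of ratios is available.
print(per_cell_ratios([(2.0, 4.0), (3.0, 6.0), (4.0, 10.0)]))
```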

      5) Is this mGlur2/CB1 specificity to PP/granule and MC axons, respectively, true in the Fmr1 KO mice? It is possible that mGluR2 and CB1 expression patterns are altered in FMR1 KO, thus the assumption used to isolate these distinct inputs may not hold true.

      This is a very good point. We do assume that the specificity of Group II mGluR and CB1 is similar between Fmr1 KO and WT mice, but this is an assumption that we have not directly verified. However, our results in Figures 7 and 8 strongly support this assumption, because if it were not true, then our intervention would be unlikely to correct the excessive dentate output.

      6) XE991 only normalized GC firing when other cells were not pharmacologically blocked. The authors suggest this means blockage of MC Kv7 reduces GC excitability back to normal...presumably by increasing MC --> IN --> GC firing. This is a conclusion from many indirect comparisons (comparing XE991 effect on GC with/without GABA and glutamate blockers; comparing MC firing rates with/without XE991, and using CB1 agonist versus mGluR2 agonist to say it is mossy cells that are mostly controlling INs) - a clincher experiment would be to acutely knockdown Kv7 in mossy cells specifically and measure GC and IN firing.

      Thank you, this is a great suggestion. Indeed, as an expansion of this project, in future studies we are planning to manipulate excitability of mossy cells through manipulating Kv7, or using chemogenetic or optogenetic approaches.

      7) The reasoning behind the FMRP-Kv7 connection is quite weak, citing the paper Darnell 2011 as "translational target", but FMRP has myriad translational targets.

      We agree, and attempted to define the mechanism of increased Kv7 function using a co-immunoprecipitation approach, as well as immunostaining to look at cell-type specific expression changes. However, the results of both approaches were difficult to interpret due to technical limitations of the available antibodies. We also note that “We did not further investigate the precise mechanisms underlying enhancement of Kv7 function in the absence of FMRP, since the present study primarily focuses on the functional consequences of abnormal cellular and circuit excitability”. To address this concern, we extensively discussed the potential mechanisms of the FMRP-Kv7 connection, acknowledged in the Discussion that “further studies will be needed to elucidate the precise mechanism responsible for the increased Kv7 function in Fmr1 KO mice”, and will continue to investigate it in future studies.

      8) The authors attempt to look for changes in Kv7 expression with Western blot, but since they hypothesize that Kv7 changes are mainly in the mossy cells, it is perhaps not surprising that they would not be able to see any changes when they look at dentate as a whole. Staining for Kv7 subunits to look at expression on a cellular level would be beneficial.

      We appreciate the reviewer’s suggestion. We attempted to perform the suggested experiments using immunostaining for KCNQ2, KCNQ3 and KCNQ5 in different subtypes of dentate neurons. However, these experiments failed to produce interpretable results due to technical limitations of the available antibodies.

      9) Is Kv7 localization or splice/composition different in FMR1 KO mice?

      This is a very good point. As we mentioned in Point 8 above, we were not able to perform these experiments and do not have the answer at this point.

      10) Regarding the 3 subtypes of interneurons in the dentate, the authors are pooling data based on similar intrinsic properties, but this conclusion may be affected by the low number of recorded neurons for the regular-spiking type. In addition, it is unclear whether these different interneuron types have differential circuit connectivity (most likely) which would make it imperative to keep circuit analysis for interneurons segregated into these cell types.

      We appreciate the reviewer’s point. Indeed, these different interneuron types may have distinct circuit connectivity and contributions to circuit activity. However, identification of these 3 types of interneurons and determination of their respective functions is in itself a very extensive set of experiments which is beyond the scope of the current manuscript. We also note that the functional readout of circuit activity in our measurements was the AP firing and EPSPs evoked in granule cells by PP stimulation, which integrate all dentate circuit operations, including all of the feedforward and feedback loops which are mediated by all of these different types of interneurons. For simplicity, we thus pooled all interneuron data for the purposes of this study. But we fully agree that extensive future work is required to elucidate interneuron-type specific changes in Fmr1 KO mice and their contributions to the dentate circuit dysfunction.

      11) To do statistics treating each cell individually, and therefore assuming each cell is independent of one another, is not correct. Two cells from the same mouse will be more similar than two cells from different mice, therefore they are not independent data points. Nested statistical methods (n cells from o slices from p mice) will be important in future work, as discussed by (Aarts et al., Nat. Neurosci. 2014).

      We agree with the Reviewer’s point and appreciate this suggestion. In the present study, the cells tested in electrophysiological experiments were from at least 3 different mice for each condition, which helps minimize this kind of error.
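
      One simple way to respect the nesting the Reviewer describes is to average cells within each mouse before comparing groups. The sketch below is only an illustration of that idea with hypothetical identifiers and values; it is not the analysis used in the present study, and it is a simplification of the multilevel methods discussed by Aarts et al.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(records):
    """Collapse (mouse_id, value) cell-level records to one mean per mouse.

    Treating each mouse, rather than each cell, as the unit of analysis
    avoids counting correlated cells from one animal as independent points.
    """
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in by_mouse.items()}


# Hypothetical input resistance values (MOhm) from three cells in two mice.
cells = [("mouse_1", 80.0), ("mouse_1", 90.0), ("mouse_2", 100.0)]
print(per_mouse_means(cells))  # {'mouse_1': 85.0, 'mouse_2': 100.0}
```

      Averaging per animal discards within-mouse variability, so a mixed-effects model would usually be preferred when enough animals are available.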

      Reviewer #3 (Recommendations For The Authors):

      Is there a difference in the Rin at -45mV of the control cell after the application of XE991? This is important to appreciate whether the XE991-sensitive conductances contribute to the basal excitability of MCs. Furthermore, the statistical comparison of the Rin at -45mV of the FXS animals in the control solution and in the presence of XE991 would be also important. Actually, the most accurate measurement would be to show a difference in the acute Kv7-blockade between control and FXS animals, if that is possible with this blocker. Additionally, it would be also informative if the bar graphs in Fig.2 D & E were merged for this purpose, similarly as in the later figures.

      We thank the Reviewer for this suggestion and agree. Following this suggestion, we have re-plotted the data in Figure 2 accordingly. Specifically, we now show that XE991 significantly increased input resistance in both WT and KO mossy cells, and the effect of XE991 on increasing input resistance was markedly larger in KO than WT mossy cells. For other figures, we have plotted data in a similar way to show the comparisons between WT and KO, as well as comparisons within genotype +/- XE991.

      Because of the cell-to-cell variability of the voltage responses, it would be more informative and representative if the average of traces from all cells were shown in Fig.2 D & E.

      We agree with the Reviewer’s point. For clarity of presentation, we presented the cell-to-cell variability of the data as scatter points of input resistance values in the bar graph (Figure 2E), together with the representative traces (Figure 2D). Plotting the average traces from all cells would result in a total of 30 traces for all the WT and KO mice, which is difficult to visually assess clearly.

      On page 7, please clarify the recorded cell type in this sentence: "In contrast, WIN markedly reduced the number of sEPSCs in both WT and KO mice...".

      We thank the Reviewer for pointing out this omission and have clarified it in the revised version.

      In Figures 6 C, F, and I, the title of the Y-axis should be normalized frequency. Please also correct the figure legend accordingly because the current sentence can be also interpreted as the absolute or total number of events that were compared, irrespective of the duration of the recordings.

      We thank the Reviewer for this point and have corrected the revised version accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I highly appreciate this study and found the paper to be very well-written and easy to follow. However, a more extensive discussion of what I summarized under "weakness" would strengthen the paper. This may include a broader discussion of the canopy effect itself and the most relevant literature on its extent in rainforest settings in general and primate foods in particular, as well as more details on the dietary behavior of modern orangutans (stratigraphy of orangutan foods) and how seasonal their diet is. The extreme seasonality in orangutan plant food availability should be discussed. Now there are only 2 sentences in the discussion (lines 304-312) and I find the word "plant' only twice overall, though variation in plant food d18O is what drives variation in orangutan dental d18O values.

      We very much appreciate the support of this reviewer, and their feedback about the clarity of the paper. As noted in the provisional reply to reviewers, we are happy to add additional context about the issue of isotopic enrichment within forest canopies, and have expanded the original paragraph in the discussion devoted to this subject. We made reference to the fact that orangutan diets vary by season and site in the original submission, and have now acknowledged that seasonal diet variation may also contribute to variation in enamel isotope values.

      Also, I'd like to note that there has been only one recent study so far that made some level of an attempt to find a breastfeeding effect in orangutans using fecal isotope data. Tsutaya et al. 2022 (AJBA) report some seasonality in adult orangutan fecal isotope values, which could be relevant here as well. But also they reported some data from 2 to 7-year-old orangutan offspring and did not see any breastfeeding pattern in isotope values here either. Probably not too surprising at this older age, but still worth noting in the context of this study.

      There is a 2019 study that sampled fecal isotopes in 43 mother-infant orangutan pairs and found a different pattern than Tsutaya et al. (2022), although these data have not been published in full (Knott et al. (2019) AJBA 168, S68, 128-129). Given these contradictions, the fact that neither study serially sampled the first two years of life, and caveats to fecal isotope sampling of wild primates reviewed in Bădescu et al. (2023: American Journal of Primatology 2023;e235), introducing these nitrogen isotope studies does not aid in the interpretation of oxygen isotope data during intensive nursing, and thus is beyond the scope of this paper. The seasonality Tsutaya et al. (2022) reported in adult fecal samples was for carbon isotopes rather than nitrogen isotopes, and its relevance to the current study is unclear given that the orangutan plant foods measured did not show seasonal variation in carbon isotopes. As requested above, we have noted orangutans’ dietary seasonality might influence the variation of oxygen isotope values.

      Reviewer #2 (Recommendations For The Authors):

      First, the manuscript offers upfront flashy numbers with respect to the number of samples, but what the reader really needs to know upfront is the number of individuals and the number of teeth per individual. These facts are buried and make the reader work too hard to keep track. While the specimen ID numbers are valuable in the table, perhaps a different ID could be used in the text, such as individuals modern Borneo A and B, fossil Sumatra A and B, etc.? Similarly, it would be helpful to remind readers of each locality - Borneo or Sumatra, modern or fossil.

      Tables 1 and 2 and the first sentence of the results and the materials and methods stated that we measured 18 teeth in this study. It is likely that the placement of the tables at the very end of the manuscript in the submitted version made the sample sizes and specimen information less evident to the reviewer. In response to this critique we have now added the number of teeth to the abstract, and trust that when the tables are placed within the text as indicated it will be easier to follow textual references to particular individuals. Museum identification codes have been provided in two previous publications of these teeth, and we retain them here for consistency.

      Second, the manuscript mentions some climate change in Sumatra, but what about Borneo?

      The results on the Bornean fossil teeth stated: “The range of values from these two fossil molars (14.2–24.8 ‰) markedly exceeds the range of modern Bornean orangutans (12.7–20.0 ‰) (Figure 4), with the mean δ18O value at least 2‰ heavier, suggesting possibly drier conditions with greater seasonality during their formation.” In the final section of the discussion, we devoted two paragraphs to discussing evidence for climate change at Niah Cave in Borneo - more than we devote to discussing such data from Sumatra.

      The most valuable figure in the manuscript is Figure 3 showing the serial sampling of modern teeth. It would be incredibly useful to see a similar graph for the fossils and a graph of the modern and fossils together for each island. The violin plots demonstrate a range of values but fail to provide the important seasonality signals. The manuscript is promising but as written is difficult to follow, and the results and conclusions with regard to climate change need more demonstration. On a minor note, I found myself wanting to know about the dates of fossils before knowing the isotopic values. You might wish to move the dating section to precede the isotopes.

      As requested, we have added an additional Supplemental figure making the comparisons of seasonality between fossil and modern individuals more evident.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addressed an alternative hypothesis to temporal binding phenomena. In temporal binding, two events that are separated in time are "pulled" towards one another, such that they appear more coincidental. Previous research has shown evidence of temporal binding events in the context of actions and multisensory events. In this context, the author revisits the well-known Libet clock paradigm, in which subjects view a moving clock face, press a button at a time of their choosing to stop the clock, a tone is played (after some delay), and then subjects move the clock dial to the point where the tone occurred (or when the action occurred). Classically, the reported clock time is a combination of the action and sound times. The author here suggests that attention can explain this by a mechanism in which the clock dial leads to a roving window of spatiotemporal attention (that is, it extends in both space and time around the dial). To test this, the author conducted a number of experiments where subjects performed the Libet clock experiment, but with a variety of different stimulus combinations. Crucially, a visual detection task was introduced by flashing a disc at different positions along the clock face. The results showed that detection performance was also "pulled" towards the action event or sensory event, depending on the condition. A model of roving spatiotemporal attention replicated these effects, providing further evidence of the attentional window.

      Strengths:

      The study provides a novel explanation for temporal binding phenomena, with clear and cleverly designed experiments. The results provide a nice fit to the proposed model, and the model itself is able to recapitulate the observed effects.

      Weaknesses:

      Despite the above, the paper could be clearer on why these effects are occurring. In particular, the control experiment introduced in Experiment 3 is not well justified. Why should a tactile stimulus not lead to a similar effect? There are possibilities here, but the author could do well to lay them out. Further, from a perspective related to the attentional explanation, other alternatives are not explored. The author cites and considers work suggesting that temporal binding relies on a Bayesian cue combination mechanism, in which the estimate is pulled towards the stimulus with the lowest variance, but this is not discussed. None of this necessarily detracts from the findings, but otherwise makes the case for attention less clear.

      I would like to thank the reviewer for the helpful comments and recommendations. Regarding Experiment 3, the rationale is this. We showed in Experiments 1 and 2 that, for outcome binding, there were two types of difference between the Action Sound condition and the Sound Only condition: the reported time of sound onset (i.e. the reported clock hand location at the sound onset) and the attention distribution. To experimentally test the relevance of the attention difference to the difference of reported time, we created a situation where the attention difference could be minimised and then checked the difference of reported time. We found that when the attention difference was controlled for between the two conditions, the difference of reported time was also gone, thus providing further evidence for a close link between attention and time report in the current testing paradigm. Therefore, Experiment 3 was primarily targeting the experimental evidence for the claim of the current study. What we needed in Experiment 3 was a condition that could have a smaller attention difference with the Action Sound condition than the attention difference between Sound Only and Action Sound conditions in Experiments 1 and 2. We expected that a tactile stimulus before the sound onset could work, without a clear prediction of the strength of the tactile stimulus in shifting attention, which was also not necessary. This experimental manipulation was a nice fit for the purpose of Experiment 3, as we could empirically measure the effectiveness of the tactile stimulus on attention shift and then relate it to the changes in outcome binding.

      As the reviewer correctly suggested, the Bayesian framework has been applied in several studies to explain the time judgement distortion in sensorimotor situations (e.g. the temporal binding effect studied here). However, the current study asked what temporal binding is really about when it is measured with the Libet clock method. Is it really about a distortion in time perception (which the Bayesian account tries to explain)? Or is it also about attention? The results showed that the spatiotemporal attention distribution is at least a confound in measuring the perceived time of an event using the Libet clock method. Therefore, the Bayesian account raised in previous studies is relevant when explaining the distortion in time perception, given that it really exists. We here asked if the distortion really exists, and to what extent.

      Reviewer #2 (Public Review):

      Summary:

      Temporal binding, generally considered a timing illusion, results from actions triggering outcomes after a brief delay, distorting perceived timing. The present study investigates the relationship between attention and the perception of timing by employing a series of tasks involving auditory and visual stimuli. The results highlight the role of attention in event timing and the functional relevance of attention in outcome binding.

      Strengths:

      • Experimental Design: The manuscript details a well-structured sequence of experiments investigating the attention effect in outcome binding. Thoughtful variations in manipulation conditions and stimuli contribute to a thorough and meaningful investigation of the phenomenon.

      • Statistical Analysis: The manuscript employs a diverse set of statistical tests, demonstrating careful selection and execution. This statistical approach enhances the reliability of the reported findings.

      • Narrative Clarity: Both in-text descriptions and figures provide clear insights into the experiments and their results, facilitating readers in following the logic of the study.

      Weaknesses:

      • Conceptual Clarity: The manuscript aims to integrate key concepts in human cognitive functions, including attention, timing perception, and sensorimotor processes. However, before introducing experiments, there's a need for clearer definitions and explanations of these concepts and their known and unknown interrelationships. Given the complexity of attention, a more detailed discussion, including specific types and properties, would enhance reader comprehension.

      • Computational Modeling: The manuscript lacks clarity in explaining the model architecture and setup, and it's unclear if control comparisons were conducted. These details are critical for readers to properly interpret attention-related findings in the modeling section. Providing a clearer overview of these aspects will improve the overall understanding of the computational models used.

      I would like to thank the reviewer for the helpful comments and recommendations. The term attention in the current study, which has been made clearer in the revised manuscript, refers specifically to visuospatial attention. It is presented as a key factor shaping the timing reports obtained with the clock method, thereby contributing to the explanation of temporal binding. Indeed, attention has been mentioned previously in a similar context, but was treated vaguely as a kind of general cognitive resource. The current study specifically tested and verified that the visuospatial attention paid to the clock face influenced the timing reports. This point has been discussed in a dedicated paragraph in the discussion section of the revised manuscript.

      The modelling of the timing report using the attention data was based on a very simple idea: The clock hand location receiving more attention should be given more weight when participants made the timing report (i.e. reporting the clock hand position). The weight for each location was calculated using the detection rate at each location. The relevant methods section has been extensively revised to provide a step-by-step implementation of the modelling, with rationales and pitfalls in the interpretation of the modelling results given (also in the discussion section).
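
      The weighting idea described above can be sketched in a few lines. The response states only that each location's weight was calculated from the detection rate at that location; the normalisation of weights to sum to one, the use of a linear rather than circular position axis, and all names and numbers below are assumptions made for illustration.

```python
def weighted_report(positions, detection_rates):
    """Predict a timing report as a detection-rate-weighted mean position.

    positions: candidate clock hand positions, e.g. in ms relative to the
        true event onset (a linear axis is assumed here).
    detection_rates: detection rate measured at each position, used as a
        proxy for the attention paid to that position.
    """
    total = sum(detection_rates)
    if total <= 0:
        raise ValueError("detection rates must sum to a positive value")
    weights = [rate / total for rate in detection_rates]
    return sum(pos * w for pos, w in zip(positions, weights))


# Hypothetical example: attention skewed towards positions before the event
# pulls the predicted report earlier than the true onset (0 ms).
print(weighted_report([-100.0, 0.0, 100.0], [0.5, 0.25, 0.25]))  # -25.0
```

      With a symmetric attention distribution the predicted report coincides with the true onset, so any shift in the prediction reflects the asymmetry in detection rates alone.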

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the immunophenotypes of cancer treatment-related pneumonitis. The evidence supporting the claims of the authors is solid, although the inclusion of controls, as suggested by one of the reviewers, strengthened the study. The work will be of interest to cancer immunologists.

      Response: We are thankful for the editor's recognition of the contribution our study makes to understanding the immunophenotypes associated with cancer treatment-related pneumonitis. We agree that the inclusion of control data is pivotal for benchmarking biomarkers. While our initial study design was constrained by the availability of BALF from healthy individuals within clinical settings, we addressed this limitation by incorporating scRNA-seq data from healthy control and COVID-19 BALF cells sourced from the GSE145926 dataset. This additional analysis has provided a baseline for comparison, revealing that CD16 is expressed in a minority of T cells in healthy BALF, specifically 1.0% of CD4+ T cells and 1.6% of CD8+ T cells. The inclusion of this data as Figures 6H and 6I in our manuscript offers a robust context for the significant increase in CD16-expressing T cells observed in patients with PCP, thus enhancing the robustness of our study's conclusions.

      Author response image 1.

      Reviewer #1 (Recommendations For The Authors):

      Many thanks for giving me the opportunity to review your paper. I really enjoyed the way you carried out this work - for example, your use of a wide panel of markers and the use of two analytical methods - you have clearly given great thought to bias avoidance. I also greatly appreciated your paragraph on the limitations, as there are several, but you do not 'over-sell' your conclusions so there is no issue here for me.

      To improve the piece, there are a few typos (eg 318 - specific to alpha-myosin) and I was briefly confused about the highlighted clusters in Figure 4. Perhaps mention why they are highlighted when they first appear in 4D instead of E?

      Response: We have corrected the typos, and we have rearranged the sequence of Figures 3E and 3F, as well as 4D and 4E, to ensure a logical flow. Citrus-generated violin plots are now presented prior to the heatmap of the clusters, which better illustrates the progression of our analysis and the derivation of the clusters.

      In terms of improvements to the data, obviously it would have been ideal if you had had some sort of healthy control as a point of reference for all cohorts, but working in the field I understand the difficulties in getting healthy BAL. It would be worth your while however trying to find more supportive data in the literature in general. There are studies which assess various immune markers in healthy BAL eg https://journal-inflammation.biomedcentral.com/articles/10.1186/1476-9255-11-9. and so I think it is worth looking wrt the main findings. For example, are CD16+ T cells seen in healthy BAL or any other conditions (at present the COVID study is being over-relied on)? Could these cells be gamma deltas? (gamma deltas frequently express CD8 and CD16, and can switch to APC like phenotypes).

      Response: We are grateful for the reviewer's consideration of the practical challenges associated with collecting BALF from healthy individuals. As an alternative, we have supplemented our analysis with single-cell RNA sequencing data from BALF cells of healthy controls, as found in existing literature (Nature Medicine 2020; 26: 842-844). We accessed GSE145926 and downloaded data on BALF cells from healthy controls (n=3) and patients with severe COVID-19 (n=6). The filtered gene-barcode matrix was first normalized using the ‘NormalizeData’ method in Seurat v.4 with default parameters. The top 2,000 variable genes were then identified using the ‘vst’ method in the Seurat FindVariableFeatures function. PCA and UMAP were then performed. T cells were identified as CD2 >1 and CD3E >1, and FCGR3A expression was explored using an expression threshold of 0.5. Violin plots and bar plots were generated with the ggplot function.
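
      The gating step described above reduces to two threshold filters. The sketch below restates that logic in plain Python rather than Seurat, purely as an illustration: the per-cell dictionaries, the function name, and the use of a strict "greater than" comparison for the FCGR3A threshold are assumptions, and the values are hypothetical normalized expression levels.

```python
def fraction_cd16_positive_t_cells(cells, t_cell_cutoff=1.0, cd16_cutoff=0.5):
    """Fraction of T cells (CD2 > 1 and CD3E > 1) with FCGR3A above threshold.

    cells: list of dicts mapping gene symbols to normalized expression.
    Returns NaN when no cell passes the T cell gate.
    """
    t_cells = [
        cell for cell in cells
        if cell["CD2"] > t_cell_cutoff and cell["CD3E"] > t_cell_cutoff
    ]
    if not t_cells:
        return float("nan")
    positive = sum(1 for cell in t_cells if cell["FCGR3A"] > cd16_cutoff)
    return positive / len(t_cells)


# Hypothetical cells: four pass the T cell gate, one of them expresses FCGR3A.
example_cells = [
    {"CD2": 2.0, "CD3E": 1.5, "FCGR3A": 0.0},
    {"CD2": 1.2, "CD3E": 2.1, "FCGR3A": 0.9},
    {"CD2": 1.8, "CD3E": 1.1, "FCGR3A": 0.2},
    {"CD2": 2.5, "CD3E": 3.0, "FCGR3A": 0.0},
    {"CD2": 0.0, "CD3E": 0.0, "FCGR3A": 2.0},  # not a T cell, excluded
]
print(fraction_cd16_positive_t_cells(example_cells))  # 0.25
```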

      Regarding the pivotal finding of increased CD16-expressing T cells in patients with PCP, the scRNA-seq data mining indicates that CD16 is expressed by a minority of T cells in healthy BALF: 1.0% of CD4+ T cells and 1.6% of CD8+ T cells. These values, now incorporated into our revised manuscript as Figures 6H and 6I, substantiate our findings. These cells could be gamma delta T cells, but we could not confirm this with the limited data. We will investigate this in a future study. The main text has been updated to reflect these findings.

      Author response image 2.

      I would agree with your approach of not going down the transcript route, so just focus on protein expression.

      I think you need to mention more about the impact of ICI on PD1 expression - in the methods you lose one approach owing to low T cell expression (132) but in the discussion you mention ICI induced high expression (311) as previously reported. This apparent contradiction needs an explanation.

      Response: We acknowledge the need for clarification regarding the impact of ICIs on PD-1 expression. In the methods section, the low detection of PD-1 expression on T cells in patients treated with nivolumab was indeed noted; this was due to competition between the PD-1 detection antibody EH12.2 and nivolumab. As reported by Suzuki et al. (International Immunology 2020; 32: 547-557), T cells from patients with ICI-induced ILD, including those treated with nivolumab, exhibit upregulated PD-1 expression when PD-1 is detected with the MIH4 antibody clone. Conversely, as outlined by Yanagihara et al. (BBRC 2020; 527: 213-217), the PD-1 detection antibody clone EH12.2 conjugated with 155Gd (#3155009B) used in our study is unable to detect PD-1 when patients are under nivolumab treatment due to competitive inhibition. The absence of a metal-conjugated PD-1 antibody with the MIH4 clone presented a limitation in our study. Ideally, we would have conjugated the MIH4 antibody with 155Gd for our analysis, which is a refinement we aim to incorporate in future research. We have now included this discussion in our manuscript to clarify the contradiction between the methodological limitations and the high PD-1 expression induced by ICIs, as reported in the literature. This addition will guide readers through the nuances of antibody selection and its implications for detecting PD-1 expression in the context of ICI treatment.

      Finally, since you have the severity data, it would be good to assess all the significantly different clusters against this metric, as you have done for CD16+ T cells. Not only may this reveal more wrt the impact of other immune populations, but it'll also give a point of reference for the CD16+ T cell data.

      Response: Thank you for the suggestion to assess all significantly different clusters against the disease severity metric. We have expanded our analysis to include a thorough correlation study between the disease severity and intensity of various T-cell markers. Notably, we observed that intensity of CCR7 expression correlates with the disease severity. Although the precise biological significance of this correlation remains to be elucidated, it may suggest a role for CCR7+ T cells in the pathogenesis or progression of the disease. We have considered the potential implications of this finding and included it as Supplementary Figure 5. We have also discussed this observation in the discussion section.
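
A correlation between an ordinal severity grade and marker intensity, as described in this response, can be sketched as below. The response does not state which statistic was used, so the choice of Spearman's rank correlation here is an assumption; the severity grades and intensities are made-up numbers for illustration, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical severity grades and median marker intensities for seven samples.
severity = [1, 2, 2, 3, 4, 5, 5]
intensity = [0.2, 0.5, 0.6, 0.9, 1.4, 2.0, 2.3]
rho = spearman(severity, intensity)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based statistic is a natural fit when the severity scale is ordinal and the sample is small, but it does not by itself give a p-value; with seven samples, any significance claim would need an exact or permutation test.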

      Author response image 3.

      Overall though I think this is a really nice study, with a potentially very significant finding in linking CD16+ T cells with severity. Congratulations.

      Response: We would like to thank the reviewer for the heartfelt comments on our manuscript.

      Reviewer #2 (Recommendations For The Authors):

      General:

      1) The fact that this is a retrospective study should be indicated earlier in the paper.

      Response: We now state the retrospective nature of the study in the Methods section as follows: In this retrospective study, patients who were newly diagnosed with PCP, DI-ILD, and ICI-ILD and had undergone BALF collection at Kyushu University Hospital from January 2017 to April 2022 were included. The retrospective study was approved by the Ethics Committee of Kyushu University Hospital (reference number 22117-00).

      2) tSNE and UMAP are dimensionality reduction techniques that don't cluster the cells, the authors should specify what clustering algorithm was used subsequently (e.g FlowSOM)

      Response: The clusters were determined manually based on their marker expression patterns.

      3) With regards to the role of CD16 in a potential exacerbated cytotoxicity in the fatal PCP case, the authors could measure the levels of C3a related proteins in patient serum to link to a common immunopathogenic pathway with COVID.

      Response: We did not collect serum from the patients in this study as our research protocol was approved by the Ethics committee for the use of BALF only. However, we agree with your assessment that the measurement of serum C3a levels would be informative. In future studies, we will incorporate the measurement of serum C3a levels to provide more comprehensive insights into the impact of C3a on immune function. Thank you for your valuable feedback and for helping us to improve the quality of our research.

      Line-specific:

      101 The authors should provide some information on how the cryopreservation of the BALF was carried out.

      Response: Upon collection, BALF samples were immediately centrifuged at 300 g for 5 minutes to pellet the cells. The resultant cell pellets were then resuspended in Cellbanker 1 cryopreservation solution (Takara, catalog #210409). This suspension was aliquoted into cryovials and gradually frozen to –80°C using a controlled rate freezing method to ensure cell viability. The samples were stored at –80°C until required for experimental analysis. We have added this information to the Methods section.

      Fig 3B: It would be very helpful if the authors could add a supplementary figure with marker expression on the UMAP projection.

      Response: We have added Supplementary Figure 4 with marker expression on the UMAP projection in Figure 3B.

      Fig 4A: Same as Fig 3B

      Response: We have added Supplementary Figure 5 with marker expression on the UMAP projection in Figure 4A.

      Fig 5B: Same as Fig 3B

      Response: We have added Supplementary Figure 6 with marker expression on the tSNE projection in Figure 5B.

      266 Authors should state if the data is not shown with regards to differences in myeloid cell fractions

      430 Marker intensity is not shown in panel D

      Response: Corrected as follows: “Citrus network tree visualizing the hierarchical relationship of each marker between identified T cell ~”

      446 The legend says patients have IPF, CTD-ILD, sarcoidosis but the figure shows PCP, DI-ILD, ICI-ILD.

      Response: Corrected.

      451 What do the authors mean in "Graphical plots represent individual samples"? Panel B is a dot plot of all samples.

      Response: Corrected as “Dot plots represent ~”.

      472 What do the authors mean in "Graphical plots represent individual samples"? Panel C is a dot plot of all samples.

      Response: Corrected as “Dot plots represent ~”.

      Reviewer #3 (Recommendations For The Authors):

      An important thing is to add comparisons against healthy donors, at least. A common baseline is needed to firmly establish any biomarkers.

      Response: We acknowledge the reviewer's concern regarding the comparison with healthy donors. Although our study did not initially include BALF collection from healthy controls due to the constraints of clinical practice, we recognize the importance of a control baseline to validate biomarkers. To address this, we have integrated scRNA-seq data from healthy control BALF cells available in public datasets (Nature Medicine 2020; 26: 842-844), accessed from GSE145926. This dataset includes BALF cells from healthy controls (n=3) alongside severe COVID-19 patients (n=6). Data mining confirmed that CD16 is expressed by a minority of T cells in healthy BALF: 1.0% of CD4+ T cells and 1.6% of CD8+ T cells. We have included these comparative data in our manuscript as Figures 6H and 6I to provide context for the observed increase in CD16-expressing T cells in PCP patients, which substantiates our findings.

      Author response image 4.

      Data analysis needs to go deeper. There are several other tools on Cytobank alone that would allow a more quantitative analysis of the data. Fold changes in marker expressions would be very important as measurements of phenotypic changes.

      Response: We thank the reviewer for their constructive feedback on the depth of our data analysis. We acknowledge the value of a more quantitative approach, including the use of fold change measurements to assess phenotypic alterations, and recognize the potential insights such tools on Cytobank could provide. Due to the scope and limited space of the current study, we have focused our analysis on the most pertinent findings relevant to our research questions. We believe the present analysis serves the immediate objectives of this study. However, we agree that further quantitative analysis would enhance the understanding of the data. We have expanded our analysis to include a thorough correlation study between the disease severity of PCP and intensity of various T-cell markers. Notably, we observed that intensity of CCR7 expression correlates with the disease severity of PCP. Although the precise biological significance of this correlation remains to be elucidated, it may suggest a role for CCR7+ T cells in the pathogenesis or progression of the disease. We have considered the potential implications of this finding and included it as Supplementary Figure 5. We have also discussed this observation in the discussion section. We aim to consider these approaches in future work to build upon the foundation laid by this study. Your suggestions are invaluable and will be kept at the forefront as we plan subsequent research phases.

      Author response image 5.

      Reviewer #1 (Public Review):

      Cytotoxic agents and immune checkpoint inhibitors are the most commonly used and efficacious treatments for lung cancers. However, their use brings two significant pulmonary side-effects; namely Pneumocystis jirovecii infection and resultant pneumonia (PCP), and interstitial lung disease (ILD). To observe the potential immunological drivers of these adverse events, Yanagihara et al. analysed and compared cells present in the bronchoalveolar lavage of three patient groups (PCP, cytotoxic drug-induced ILD [DI-ILD], and ICI-associated ILD [ICI-ILD]) using mass cytometry (64 markers). In PCP, they observed an expansion of the CD16+ T cell population, with the highest CD16+ T cell proportion (97.5%) in a fatal case, whilst in ICI-ILD, they found an increase in CD57+ CD8+ T cells expressing immune checkpoints (TIGIT+ LAG3+ TIM-3+ PD-1+), FCRL5+ B cells, and CCR2+ CCR5+ CD14+ monocytes. Given the fatal case, the authors also assessed for, and found, a correlation between CD16+ T cells and disease severity in PCP, postulating that this may be owing to endothelial destruction. Although n numbers are relatively small (n=7-9 in each cohort; common numbers for CyTOF papers), the authors use a wide panel (n=65) and two clustering methodologies giving greater strength to the conclusions. The differential populations discovered using one or two of the analytical methods are robust: whole population shifts with clear and significant clustering. These data are an excellent resource for clinical disease specialists and pan-disease immunologists, with a broad and engaging contextual discussion about what they could mean.

      Strengths:

      • The differences in immune cells in BAL in these specific patient subgroups is relatively unexplored.

      • This is an observational study, with no starting hypothesis being tested.

      • Two analytical methods are used to cluster the data.

      • A relatively wide panel was used (64 markers), with particular strength in the alpha beta T cells and B cells.

      • Relevant biomarkers, beta-D-glucan and KL-6 were also analysed

      • Appropriate statistics were used throughout.

      • Numbers are low (7 cases of PCP, 9 of DI-ILD, and 9 of ICI-ILD) but these are difficult samples to collect and so in relative terms, and considering the use of CyTOF, these are good numbers.

      • Beta-D-glucan shows potential as a biomarker for PCP (as previously reported) whilst KL-6 shows potential as a biomarker for ICI-ILD (not reported before). Interestingly, KL-6 was not seen to be increased in DI-ILD patients.

      • Despite the relatively low n numbers and lack of matching there are some clear differentials. The CD4/CD8+CD16+HLA-DR+CXCR3+CD14- T cell result is striking - up in PCP (with EM CD4s significantly down) - whilst the CD8 EMRA population is clear in ICI-ILD and 'non-exhausted' CD4s, with lower numbers of EMRA CD8s in DI-ILD.

      • The authors identify 17/31 significantly differentiated clusters of myeloid cells, eg CD11bhi CD11chi CD64+ CD206+ alveolar macrophages with HLA-DRhi in PCP.

      • With respect to B cells, the authors found that FCRL5+ B cells were more abundant in patients with ICI-ILD compared to those with PCP and DI-ILD, suggesting these FCRL5+ B cells may have a role in irAE.

      • One patient's extreme CD16+ T cell proportion (97.5% positive) and death led the authors to consider CD16+ T cells as an indicator of disease severity in PCP. This was then tested and found to be correct.

      • Authors discuss results in context of literature leading them to suggest that CD16+ T cells may target endothelial cells and wonder if anti-complement therapy may be efficacious in PCP.

      • Great discussion on auto-reactive T cell clones where the authors suggest that in ICI-ILD CD8s may react against healthy lung, driving ILD.

      • An observation of CXCR3 in different CD8 populations in ICI-ILD and PCP led the authors to hypothesise on the chemoattractants in the microenvironment.

      • Excellent point suggesting CD57 may not always be a marker of senescence on T cells - reflective of growing change within the community.

      • Well considered suggestion that FCRL5+ B cells may be involved in ICI-ILD driven autoimmunity.

      • The authors discuss the main weaknesses in the discussion and stress that the findings detailed in the paper "demonstrate a correlation rather than proof of causation".

      • Figures and legends are clear and pleasing to the eye.

      Weaknesses:

      • This is an observational study, with no starting hypothesis being tested.

      • Only patients who were able to have a lavage taken have been recruited.

      • One set of analysis wasn't carried out for one subgroup (ICI-ILD) as PD1 expression was negative owing to the use of nivolumab.

      • Some immune cell subsets wouldn't be picked up with the markers and gating strategies used; e.g. NK cells.

      • Some immune cells would be disproportionately damaged by the storage, thawing and preparation of the samples; e.g. granulocytes.

      • Numbers are low (7 cases of PCP, 9 of DI-ILD, and 9 of ICI-ILD), sex, age and adverse event matching wasn't performed, and treatment regimens are varied and 'suspected' (suggesting incomplete clinical data) - but these are difficult samples to collect. These numbers drop further for some analyses e.g. T cell clustering owing to factors such as low cell number.

      • The disease comparisons are with each other, there is no healthy control.

      • Samples are taken at one time point.

      • The discussion on probably the stand out result - the CD16+ T cells in PCP - relies on two papers - leading to a slightly skewed emphasis on one paper on CD16+ cells in COVID. There are other papers out there that have observed CD16+ T cells in other conditions. It is also worth bearing in mind that given the markers used, these CD16+ T cells may be gamma deltas.

      • The discussion on ICI patients consistently showing increased PD1 could have been greater, as given the ICI is targeting PD1, one would expect the opposite, as commented on, and observed, in the methods section.

      Reviewer #2 (Public Review):

      Yanagihara and colleagues investigated the immune cell composition of bronchoalveolar lavage fluid (BALF) samples in a cohort of patients with malignancy undergoing chemotherapy and with lung adverse reactions including Pneumocystis jirovecii pneumonia (PCP) and immune-checkpoint inhibitors (ICIs) or cytotoxic drug induced interstitial lung diseases (ILDs). Using mass cytometry, their aim was to characterize the cellular and molecular changes in BAL to improve our understanding of their pathogenesis and identify potential biomarkers and therapeutic targets. In this regard, the authors identify a correlation between CD16 expression in T cells and the severity of PCP and an increased infiltration of CD57+ CD8+ T cells expressing immune checkpoints and FCRL5+ B cells in ICI-ILD patients.

      The conclusions of this paper are mostly well supported by data, but some aspects of the data analysis need to be clarified and extended.

      1) The authors should elaborate on why different sets of markers were selected for each analysis step. E.g., different sets of markers were used for UMAP, CITRUS and viSNE in the T cell and myeloid analysis.

      2) The authors should state if a normality test for the distribution of the data was performed. If not, non-parametric tests should be used.

      3) The authors should explore the correlation between CD16 intensity and the CTCAE grade in T cell subsets such as EMRA CD8 T cells, effector memory CD4, etc as identified in Figure 1B.

      4) The authors could use CITRUS to better assess the B cell compartment.

      Reviewer #3 (Public Review):

      The authors collected BALF samples from lung cancer patients newly diagnosed with PCP, DI-ILD or ICI-ILD. CyTOF was performed on these samples, using two different panels (T-cell and B-cell/myeloid cell panels). Results were collected, cleaned-up, manually gated and pre-processed prior to visualisation with manifold learning approaches t-SNE (in the form of viSNE) or UMAP, and analysed by CITRUS (hierarchical clustering followed by feature selection and regression) for population identification - all using Cytobank implementation - in an attempt to identify possible biomarkers for these disease states. By comparing cell abundances from CITRUS results and qualitative inspection of a small number of marker expressions, the authors claimed to have identified an expansion of CD16+ T-cell population in PCP cases and an increase in CD57+ CD8+ T-cells, FCRL5+ B-cells and CCR2+ CCR5+ CD14+ monocytes in ICI-ILD cases.

      By the authors' own admission, there is an absence of healthy donor samples and, perhaps as a result of retrospective experimental design, also an absence of pre-treatment samples. The entire analysis effectively compares three yet-established disease states with no common baseline - what really constitutes a "biomarker" in such cases? The introduction asserts that "By characterizing the cellular and molecular changes in BAL from patients with these complications, we aim to improve our understanding of their pathogenesis and identify potential therapeutic targets" (lines 82-84). Given these obvious omissions, no real "changes" have been studied in the paper. These are very limited comparisons among three, and only these three, states.

      Even assuming more thorough experimental design, the data analysis is unfortunately too shallow and has not managed to explore the wealth of information that could potentially be extracted from the results. CITRUS is accessible and convenient, but also makes a couple of big assumptions which could affect data analysis - 1) Is it justified to concatenate all FCS files to analyse the data in one batch / small batches? Could there be batch effects or otherwise other biological events that could confuse the algorithm? 2) With a relatively small number of samples, and after internal feature selection of CITRUS, is the regression model suitable for population identification or would it be too crude and miss out rare populations? There are plenty of other established methods that could be used instead. Have those methods been considered?

      Colouring t-SNE or UMAP (e.g. Figure 6C) plots by marker expression is useful for quick identification of cell populations but it is not a quantitative analysis. In a CyTOF analysis like this, it is common to work out fold changes of marker expressions between conditions. It is inadequate to judge expression levels and infer differences simply by looking at colours.
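
The kind of quantitative comparison suggested here can be sketched as follows. This is an illustrative Python example under stated assumptions, not an analysis from the paper: it takes hypothetical per-sample median intensities for one marker in two groups, computes a log2 fold change with a pseudocount of 1, and also shows the arcsinh transform with a cofactor of 5 that is commonly applied to mass cytometry intensities. The group labels, numbers, pseudocount, and function names are all assumptions.

```python
import math
from statistics import median

def arcsinh_transform(value, cofactor=5.0):
    """Arcsinh transform often used for CyTOF intensities (cofactor is an assumption)."""
    return math.asinh(value / cofactor)

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of the median intensity in group_a to that in group_b.

    The pseudocount avoids division by zero when a marker is absent.
    """
    return math.log2((median(group_a) + pseudocount) / (median(group_b) + pseudocount))

# Hypothetical per-sample median intensities of one marker (not real data).
group_a = [30.0, 62.0, 31.0]  # e.g. samples from one disease group
group_b = [7.0, 3.0, 9.0]     # e.g. samples from a comparison group

lfc = log2_fold_change(group_a, group_b)  # log2((31 + 1) / (7 + 1)) = 2.0
delta_arcsinh = (
    median(arcsinh_transform(v) for v in group_a)
    - median(arcsinh_transform(v) for v in group_b)
)
print(f"log2 fold change = {lfc:.2f}, difference in arcsinh medians = {delta_arcsinh:.2f}")
```

Reporting a number of this kind per marker and per cluster, alongside a test statistic, would let readers compare effect sizes directly instead of reading them off a colour scale.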

      The relatively small number of samples also means that most results presented in the paper are not statistically significant. Whilst it is understandable that it is not always possible to collect a large number of patient samples for studies like this, having several entire major figures showing "n.s." (e.g. Figures 3A, 4B and 5C), together with limitations in the comparisons themselves and inadequate analysis, makes the observations unconvincing, and even less so for the single fatal PCP case where N = 1.

      It would also be good scientific practice to show evidence of sample data quality control. Were individual FCS files examined? Did the staining work? Some indication of QC would also be great.

      The dataset generated and studied by the authors has the potential to address the question they set out to answer and thus potentially be useful for the field. However, in the current state of presentation, more evidence and more thorough data analysis are needed to draw any conclusions, or correlations, as the authors would like to frame them.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      This paper performs fine-mapping of the silkworm mutant bd and its fertile allelic version, bdf, narrowing down the causal intervals to a small interval of a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. All these experiments convincingly show that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument. The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths:

      The genotype-to-phenotype workflow, combining forward (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) to link mamo to pigmentation, is extremely convincing.

      Response: Thank you very much for your affirmation of our work. The reviewer discussed the parts of our manuscript that involve evolution sentence by sentence. We have further refined the description in this regard and improved the logical flow. Thank you again for your help.

      Weaknesses:

      1) The last section of the results, entitled "Downstream target gene analysis" is primarily based on in silico genome-wide binding motif predictions.

      While the authors identify a potential binding site using EMSA, it is unclear how much this general approach over-predicted potential targets. While I think this work is interesting, its potential caveats are not mentioned. In fact the Discussion section seems to trust the high number of target genes as a reliable result. Specifically, the authors correctly say: "even if there are some transcription factor-binding sites in a gene, the gene is not necessarily regulated by these factors in a specific tissue and period", but then propose a biological explanation that not all binding sites are relevant to expression control. This makes a radical short-cut that predicted binding sites are actual in vivo binding sites. This may not be true, as I'd expect that only a subset of binding motifs predicted by Positional Weight Matrices (PWM) are real in vivo binding sites with a ChIP-seq or Cut-and-Run signal. This is particularly problematic for PWM that feature only 5-nt signature motifs, as inferred here for mamo-S and mamo-L, simply because we can expect many predicted sites by chance.
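
The concern about chance matches for a 5-nt motif can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only and rests on simplifying assumptions: bases are treated as independent and equally frequent, the motif `ACGTA` is a hypothetical placeholder and not the mamo motif, and the 2,000 bp window is an arbitrary example of an upstream region.

```python
import random

def expected_chance_hits(sequence_length, motif_length, both_strands=True):
    """Expected number of exact motif matches in a random sequence.

    Assumes independent, equiprobable bases, so each start position
    matches with probability (1/4) ** motif_length.
    """
    positions = max(sequence_length - motif_length + 1, 0)
    per_position = 0.25 ** motif_length
    strands = 2 if both_strands else 1
    return positions * per_position * strands

def count_motif(sequence, motif):
    """Count overlapping occurrences of motif on the forward strand."""
    width = len(motif)
    return sum(1 for i in range(len(sequence) - width + 1) if sequence[i:i + width] == motif)

# Expected matches per hypothetical 2,000 bp upstream window, both strands.
hits_per_window = expected_chance_hits(2_000, 5)

# Quick simulation on a random 200 kb sequence, forward strand only.
random.seed(0)
random_sequence = "".join(random.choice("ACGT") for _ in range(200_000))
observed = count_motif(random_sequence, "ACGTA")
expected = expected_chance_hits(200_000, 5, both_strands=False)
print(f"per 2 kb window: {hits_per_window:.2f}; simulated {observed} vs expected {expected:.1f}")
```

Under these assumptions a fixed 5-nt sequence occurs roughly once per 1,024 positions on each strand, so almost every 2 kb window would contain several matches by chance alone; a degenerate PWM would match even more often. This is why motif presence on its own is weak evidence of in vivo binding.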

      Response: Thank you very much for your careful work. The analysis and identification of transcription factor-binding sites is an important issue in gene regulation research. Techniques such as ChIP-seq can be used to experimentally identify the binding sites of transcription factors (TFs). However, reports using these techniques often only detect specific cell types and developmental stages, resulting in a limited number of downstream target genes for some TFs. Interestingly, TFs may regulate different downstream target genes in different cell types and developmental stages.

      Previous research has suggested that the ZF-DNA binding interface can be understood as a “canonical binding model”, in which each finger contacts DNA in an antiparallel manner. The binding sequence of the C2H2-ZF motif is determined by the amino acid residue sequence of its α-helical component. Considering the first amino acid residue in the α-helical region of the C2H2-ZF domain as position 1, positions -1, 2, 3, and 6 are key amino acids for recognizing and binding DNA. The residues at positions -1, 3, and 6 specifically interact with base 3, base 2, and base 1 of the DNA sense sequence, respectively, while the residue at position 2 interacts with the complementary DNA strand (Wolfe SA et al., 2000; Pabo CO et al., 2001). Based on this principle, binding sites predicted for C2H2-ZF proteins provide a useful reference. For the 5-nt PWM sequence, we referred to a study in D. melanogaster in which the motif was identified by EMSA (Shoichi Nakamura et al., 2019). In the new version, we have rewritten this section.

      Pabo CO, Peisach E, Grant RA. Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem. 2001;70:313-340.

      Wolfe SA, Nekludova L, Pabo CO. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct. 2000;29:183-212.

      Nakamura S, Hira S, Fujiwara M, et al. A truncated form of a transcription factor Mamo activates vasa in Drosophila embryos. Commun Biol. 2019;2:422. Published 2019 Nov 20.

      2) The last part of the current discussion ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program") is flawed with important logical shortcuts that assign "agency" to the evolutionary process. For instance, this section conveys the idea that phenotypically relevant mutations may not be random. I believe some of this is due to translation issues in English, as I understand that the authors want to express the idea that some parts of the genome are paths of least resistance for evolutionary change (e.g. the regulatory regions of developmental regulators are likely to articulate morphological change). But the language and tone are made worse by the mention that in another system, a mechanism involving photoreception drives adaptive plasticity, making it sound like the authors want to make a Lamarckian argument here (inheritance of acquired characteristics), or a point about orthogenesis (e.g. the idea that the environment may guide non-random mutations).

      Because this last part of the current discussion suffers from confused statements on modes and tempo of regulatory evolution and is rather out of topic, I would suggest removing it.

      In any case, it is important to highlight here that while this manuscript is an excellent genotype-to-phenotype study, it has very few comparative insights on the evolutionary process. The finding that mamo is a pattern or pigment regulatory factor is interesting and will deserve many more studies to decipher the full evolutionary study behind this Gene Regulatory Network.

      Response: Thank you very much for your careful work. In this part of the manuscript, we introduced some assumptions that make the statement slightly unconventional. The color pattern of insects is an adaptive trait. The bd and bdf mutants used in the study are formed spontaneously. As a frequent variation and readily observable phenotype, color patterns have been used as models for evolutionary research (Wittkopp PJ et al., 2011). Darwin's theory of natural selection has epoch-making significance. I deeply believe in the theory that species strive to evolve through natural selection. However, with the development of molecular genetics, Darwinism’s theory of undirected random mutations and slow accumulation of micromutations resulting in phenotype evolution has been increasingly challenged.

      The prerequisite for undirected random mutations and micromutations is excessive reproduction to generate a sufficiently large population. A sufficiently large population can contain sufficient genotypes to face various survival challenges. However, it is difficult to explain how some small groups and species with relatively low fertility rates have survived thus far. More importantly, the theory cannot explain the currently observed genomic mutation bias. In scientific research, every theory is constantly being modified to adapt to current discoveries. The most famous example is the debate over whether light is a particle or a wave, which has lasted for hundreds of years. However, in the 20th century, both sides seemed to compromise with each other, believing that light has a wave‒particle duality.

      In summary, we have rewritten this section to reduce unnecessary assumptions.

      Wittkopp PJ, Kalay G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat Rev Genet. 2011;13(1):59-69.

      Minor Comment:

      The gene models presented in Figure 1 are obsolete, as there are more recent annotations of the Bm-mamo gene that feature more complete intron-exon structures, including for the neighboring genes in the bd/bdf intervals. It remains true that the mamo locus encodes two protein isoforms.

      An example of the Bm-mamo locus annotation can be found at: https://www.ncbi.nlm.nih.gov/gene/101738295. RNAseq expression tracks (including from larval epidermis) can be displayed in the embedded genome browser from the link above using the "Configure Tracks" tool.

      Based on these more recent annotations, I would say that most of the work on the two isoforms remains valid, but FigS2, and particularly Fig.S2C, need to be revised.

      Response: Thank you very much for your careful work. In this study, we referred to the predicted genes of SilkDB, NCBI and Silkbase. In different databases, there are varying degrees of differences in the number of predicted genes and the length of gene mRNA. Because the SilkDB database is based on the first silkworm genome, it has been used for the longest time and has a relatively large number of users. In the revised manuscript, we have added the predicted genes of NCBI and Silkbase in Figure S1.

      Author response image 1.

      The predicted genes and qPCR analysis of candidate genes in the responsible genomic region for the bd mutant. (A) The predicted genes in SilkDB; (B) the predicted genes in GenBank; (C) the predicted genes in Silkbase; (D) analysis of nucleotide differences in the responsible region of bd; (E) investigation of the expression level of candidate genes.

      Reviewer #2 (Public Review):

      Summary:

      The authors tried to identify new genes involved in melanin metabolism and its spatial distribution in the silkworm Bombyx mori. They identified the gene Bm-mamo as playing a role in caterpillar pigmentation. By functional genetic and in silico approaches, they identified putative target genes of the Bm-mamo protein. They showed that numerous cuticular proteins are regulated by Bm-mamo during larval development.

      Strengths:

      • preliminary data about the role of cuticular proteins to pattern the localization of pigments

      • timely question

      • challenging question because it requires the development of future genetic and cell biology tools at the nanoscale

      Response: Thank you very much for your affirmation of our work. The reviewer's familiarity with the color patterns of Lepidoptera is helpful, and the recommendation raised has provided us with very important assistance. This has allowed us to make significant progress with our manuscript.

      Weaknesses:

      • statistical sampling limited

      • the discussion would gain in being shorter and refocused on a few points, especially the link between cuticular proteins and pigmentation. The article would be better if the last evolutionary-themed section of the discussion is removed.

      A recent paper has been published on the same gene in Bombyx mori (https://www.sciencedirect.com/science/article/abs/pii/S0965174823000760) in August 2023. The authors must discuss and refer to this published paper through the present manuscript.

      Response: Thank you very much for your careful work. First, we believe that competitive research is sometimes coincidental and sometimes intentional. Our research began in 2009, when we started to construct the recombinant population. In 2016, we published an article on comparative transcriptomics (Wu et al. 2016). We believe that the authors of the article mentioned above took a strong interest in our research and built on our transcriptome analysis with the aim of publishing preemptively. To discourage such behavior, we cannot cite it and do not want to discuss it in our paper.

      Songyuan Wu et al. Comparative analysis of the integument transcriptomes of the black dilute mutant and the wild-type silkworm Bombyx mori. Sci Rep. 2016 May 19:6:26114. doi: 10.1038/srep26114.

      Reviewer #1 (Recommendations For The Authors):

      1) please consider using a more recent annotation model of the B. mori genome to revise your Result Section 1, Fig.1, and Fig. S2. https://www.ncbi.nlm.nih.gov/gene/101738295

      Specifically, you used BGIM_ gene models, while the current annotation, such as the one above featured in the NCBI database, provides more accurate intron-exon structures without splitting mamo into two genes. I believe this can be done with minor revisions of the figures, and you could keep the BGIM_ gene names for the text.

      Response: Thank you very much for your careful work. The GenBank database of NCBI (National Center for Biotechnology Information) is a very good resource that we often used and referred to during this research. Our research started in 2009, so we mainly referred to the SilkDB database (Jun Duan et al., 2010), although we also consulted other databases, such as NCBI and Silkbase (https://silkbase.ab.a.u-tokyo.ac.jp/cgi-bin/index.cgi). Because the SilkDB database was constructed based on the first published silkworm genome data, it has been used for the longest time and has a relatively large number of users. Researchers have continued to use these data recently (Kejie Li et al., 2023).

      The problem with predicting the mamo gene as two genes (BGIBMGA012517 and BGIBMGA012518) in SilkDB is mainly due to the presence of alternative splicing of the mamo gene. BGIBMGA012517 corresponds to the shorter transcript (mamo-s) of the mamo gene. Due to the differences in sequencing individuals, sequencing methods, and methods of gene prediction, there are differences in the number and sequence of predicted genes in different databases. We added the pattern diagram of predicted genes from NCBI and Silkbase, and the expression levels of new predicted genes are shown in Supplemental Figure S1.

      Jun Duan et al., SilkDB v2.0: a platform for silkworm (Bombyx mori) genome biology. Nucleic Acids Res. 2010 Jan;38(Database issue): D453-6. doi: 10.1093/nar/gkp801. Kejie Li et al., Transcriptome analysis reveals that knocking out BmNPV iap2 induces apoptosis by inhibiting the oxidative phosphorylation pathway. Int J Biol Macromol. 2023 Apr 1;233:123482. doi: 10.1016/j.ijbiomac.2023.123482. Epub 2023 Jan 31.

      Author response image 2.

      The predicted genes and qPCR analysis of candidate genes in the genomic region responsible for the bd mutant. (A) The predicted genes in SilkDB; (B) the predicted genes in GenBank; (C) the predicted genes in Silkbase; (D) analysis of nucleotide differences in the region responsible for bd; (E) investigation of the expression level of candidate genes.

      2) As I mentioned in my public review, I strongly believe the interpretation of the PWM binding analyses requires much more conservative statements, taking into account the idea that short 5-nt motifs are expected by chance. The work in this section is interesting, but the manuscript would benefit from a quite significant rewrite of the corresponding Discussion section, making it clear that the in silico approach is prone to the identification of many sites in the genome, and that very few of those sites are probably relevant for probabilistic reasons. I would recommend statements such as "Future experiments assessing the in vivo binding profile of Bm-mamo (e.g., ChIP-seq or Cut&Run) will be required to further understand the GRNs controlled by mamo in various tissues".

      Response: Thank you very much for your careful work. Previous research has suggested that the ZF-DNA binding interface can be understood as a “canonical binding model”, in which each finger contacts DNA in an antiparallel manner. The binding sequence of the C2H2-ZF motif is determined by the amino acid residue sequence of its α-helical component. Considering the first amino acid residue in the α-helical region of the C2H2-ZF domain as position 1, positions -1, 2, 3, and 6 are key amino acids for recognizing and binding DNA. The residues at positions -1, 3, and 6 specifically interact with base 3, base 2, and base 1 of the DNA sense sequence, respectively, while the residue at position 2 interacts with the complementary DNA strand (Wolfe SA et al., 2000; Pabo CO et al., 2001). Based on this principle, the prediction of DNA recognition motifs of C2H2-type zinc finger proteins currently has good accuracy.
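The contact pattern described above can be written down as a small lookup table. The following is only a minimal sketch of that bookkeeping, not a predictor of binding specificity: the names `CANONICAL_CONTACTS`, `HELIX_POSITIONS` and `helix_contacts` are hypothetical, and the seven-residue string in the usage note is an arbitrary illustration, not the mamo sequence.

```python
# Helix positions that contact DNA in the canonical model, and their targets,
# exactly as stated in the text (positions -1, 3, 6 -> bases 3, 2, 1).
CANONICAL_CONTACTS = {
    -1: "base 3 (sense strand)",
    2: "complementary strand",
    3: "base 2 (sense strand)",
    6: "base 1 (sense strand)",
}

# Order of helix positions in the input string; there is no position 0.
HELIX_POSITIONS = (-1, 1, 2, 3, 4, 5, 6)


def helix_contacts(helix: str) -> dict[str, str]:
    """Map each DNA target to the helix residue that contacts it.

    `helix` must list the residues at helix positions -1, 1, 2, 3, 4, 5, 6.
    """
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError("expected 7 residues covering helix positions -1 to 6")
    residue_at = dict(zip(HELIX_POSITIONS, helix))
    return {target: residue_at[pos] for pos, target in CANONICAL_CONTACTS.items()}
```

For example, `helix_contacts("RSDHLTT")` pairs the residue at position -1 with base 3 and the residue at position 6 with base 1 of a three-base subsite. Turning those residues into a predicted base preference requires a trained recognition model, which this sketch deliberately leaves out.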

      The predicted DNA binding sequence (GTGCGTGGC) of the mamo protein in Drosophila melanogaster was highly consistent with that of silkworms. In addition, in D. melanogaster, the predicted DNA binding sequence of mamo, the bases at positions 1 to 7 (GTGCGTG), was highly similar to the DNA binding sequence obtained from EMSA experiments (Seiji Hira et al., 2013). Furthermore, in another study on the mamo protein of Drosophila melanogaster, five bases (TGCGT) were used as the DNA recognition core sequence of the mamo protein (Shoichi Nakamura et al., 2019). In the JASPAR database (https://jaspar.genereg.net), there are also some shorter (4-6 nt) DNA recognition sequences; for example, the DNA binding sequence of Ubx is TAAT (ID MA0094.1) in Drosophila melanogaster. However, we used longer DNA binding motifs (9 nt and 15 nt) of mamo to study the 2 kb genomic regions near the predicted gene. Over 70% of predicted genes were found to have these feature sequences near them. This analysis method is carried out with common software and processes. Due to sufficient target proteins, the accessibility of DNA, the absence of suppressors, the suitability of ion environments, etc., zinc finger protein transcription factors are more likely to bind to specific DNA sequences in vitro than in vivo. Using ChIP-seq or Cut&Run techniques to analyze various tissues and developmental stages in silkworms can yield one comprehensive DNA-binding map of mamo, and some false positives generated by predictions can be excluded. Thank you for your suggestion. We will conduct this work in the next research step. In addition, for brevity, we deleted the predicted data (Supplemental Tables S7 and S8) that used shorter motifs.
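The effect of motif length on chance matches can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a deliberately naive null model that is not part of the original analysis: bases are independent and equally frequent, only exact matches to a single fixed motif are counted, and both strands of a 2 kb window are scanned. The function name `expected_chance_hits` is hypothetical.

```python
def expected_chance_hits(motif_len: int, window_len: int = 2000,
                         both_strands: bool = True) -> float:
    """Expected number of exact matches to one fixed motif in a random window.

    Assumes independent, equiprobable bases (an idealization; real genomes
    have skewed base composition and repeats).
    """
    if motif_len > window_len:
        return 0.0
    start_positions = window_len - motif_len + 1
    strands = 2 if both_strands else 1
    # Each start position matches with probability (1/4) ** motif_len.
    return strands * start_positions / 4 ** motif_len


# Motif lengths discussed in the text: a 5-nt core, and the 9-nt and 15-nt motifs.
for k in (5, 9, 15):
    print(k, expected_chance_hits(k))
```

Under these assumptions a 5-nt motif is expected about 3.9 times per 2 kb window by chance, whereas a 9-nt motif is expected about 0.015 times. A PWM scan with a score threshold tolerates mismatches, so it will return more hits than this exact-match estimate; the calculation indicates only the order of magnitude.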

      Pabo CO, Peisach E, Grant RA. Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem. 2001;70:313-340.

      Wolfe SA, Nekludova L, Pabo CO. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct. 2000;29:183-212.

      Anton V Persikov et al., De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res. 2014 Jan;42(1):97-108. doi: 10.1093/nar/gkt890. Epub 2013 Oct 3.

      Seiji Hira et al., Binding of Drosophila maternal Mamo protein to chromatin and specific DNA sequences. Biochem Biophys Res Commun. 2013 Aug 16;438(1):156-60. doi: 10.1016/j.bbrc.2013.07.045. Epub 2013 Jul 20.

      Shoichi Nakamura et al., A truncated form of a transcription factor Mamo activates vasa in Drosophila embryos. Commun Biol. 2019 Nov 20;2: 422. doi: 10.1038/s42003-019-0663-4. eCollection 2019.

      3) In my opinion, the last section of the Discussion needs to be completely removed ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program"), as it is over-extending the data into evolutionary interpretations without any support. I would suggest instead writing a short paragraph asking whether the pigmentary role of mamo is a Lepidoptera novelty, or if it could have been lost in the fly lineage.

      Below, I tried to comment point-by-point on the main issues I had.

      Wu et al: Notably, the industrial melanism event, in a short period of several decades, resulted in significant changes in the body color of multiple Lepidoptera species(46). Industrial melanism events, such as changes in the body color of pepper moths, are heritable and caused by genomic mutations(47).

      Yes, but the selective episode was brief, and the relevant "carbonaria" mutations may have existed for a long time at low-frequency in the population.

      Response: Thank you very much for your careful work. Moth species often have melanic variants at low frequencies outside industrial regions. Recent molecular genetic work has revealed that the melanic (carbonaria) allele of the peppered moth had a single origin in Britain. Further research indicated that the mutation event causing industrial melanism of the peppered moth (Biston betularia) in the UK is the insertion of a transposable element into the first intron of the cortex gene. Interestingly, statistical inference based on the distribution of recombined carbonaria haplotypes indicates that this transposition event occurred in approximately 1819, a date highly consistent with a detectable frequency being achieved in the mid-1840s (Arjen E Van't Hof, et al., 2016). This molecular evidence suggests that the single-origin melanic mutant (carbonaria) arose in the UK close to the period of industrial development, rather than being an ancient genotype. We have rewritten this part of the manuscript.

      Arjen E Van't Hof, et al., The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016 Jun 2;534(7605):102-5. doi: 10.1038/nature17951.

      Wu et al: If relying solely on random mutations in the genome, which have a time unit of millions of years, to explain the evolution of the phenotype is not enough.

      What you imply here is problematic for several reasons.

      First, as you point out later, some large-effect mutations (e.g. transpositions) can happen quickly.

      Second, it's unclear what "the time units of million of years" means here... mutations occur, segregate in populations, and are selected. The speed of this process depends on the context and genetic architectures.

      Third, I think I understand what you mean with "to explain the evolution of the phenotype is not enough", but this would probably need a reformulation and I don't think it's relevant to bring it here. After all, you used loss-of-function mutants to explain the evolution of artificially selected mutants. The evolutionary insights from these mutants are limited. Random mutations at the mamo locus are perfectly sufficient here to explain the bd and bdf phenotypes and larval traits.

      Response: Thank you very much for your careful work. Charles Darwin himself argued that “natural selection can act only by taking advantage of slight successive variations; she can never take a leap, but must advance by the shortest and slowest steps” (Darwin, C. R. 1859). This ‘micromutational’ view of adaptation proved extraordinarily influential. However, the accumulation of micromutations is a lengthy process, which requires a very long time to evolve a significant phenotype, and it may account for only a proportion of cases. Interestingly, recent molecular biology studies have shown that the evolution of some morphological traits involves a modest number of genetic changes (H Allen Orr. 2005).

      One example is the genetic basis analysis of armor-plate reduction and pelvic reduction of the three-spined stickleback (Gasterosteus aculeatus) in postglacial lakes. Although the marine form of this species has thick armor, the lake population (which was recently derived from the marine form) does not. The repeated independent evolution of lake morphology has resulted in reduced armor plate and pelvic structures, and there is no doubt that these morphological changes are adaptive. Research has shown that pelvic loss in different natural populations of three-spined stickleback fish occurs by regulatory mutations deleting a tissue-specific enhancer (Pel) of the pituitary homeobox transcription factor 1 (Pitx1) gene. The researchers genotyped 13 pelvic-reduced populations of three-spined stickleback from disparate geographic locations. Nine of the 13 pelvic-reduced stickleback populations had sequence deletions of varying lengths, all of which were located at the Pel enhancer. Relying solely on random mutations in the genome cannot lead to such similar mutation forms among different populations. The author suggested that the Pitx1 locus of the stickleback genome may be prone to double-stranded DNA breaks that are subsequently repaired by NHEJ (Yingguang Frank Chan et al., 2010).

      The bd and bdf mutants used in the study are formed spontaneously. Natural mutation is one of the driving forces of evolution. Nevertheless, we have rewritten the content of this section.

      Darwin, C. R. The Origin of Species (J. Murray, London, 1859).

      H Allen Orr. The genetic theory of adaptation: a brief history. Nat Rev Genet. 2005 Feb;6(2):119-27. doi: 10.1038/nrg1523.

      Yingguang Frank Chan et al., Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science. 2010 Jan 15;327(5963):302-5. doi: 10.1126/science.1182213. Epub 2009 Dec 10.

      Wu et al: Interestingly, the larva of peppered moths has multiple visual factors encoded by visual genes, which are conserved in multiple Lepidoptera, in the skin. Even when its compound eyes are covered, it can rely on the skin to feel the color of the environment to change its body color and adapt to the environment(48). Therefore, caterpillars/insects can distinguish the light wave frequency of the background. We suppose that perceptual signals can stimulate the GRN, the GRN guides the expression of some transcription factors and epigenetic factors, and the interaction of epigenetic factors and transcription factors can open or close the chromatin of corresponding downstream genes, which can guide downstream target gene expression.

      This is extremely confusing because you are bringing in a plastic trait here. It's possible there is a connection between the sensory stimulus and the regulation of mamo in peppered moths, but this is a mere hypothesis. Here, by mentioning a plastic trait, this paragraph sounds as if it was making a statement about directed evolution, especially after implying in the previous sentence that (paraphrasing) "random mutations are not enough". To be perfectly honest, the current writing could be misinterpreted and co-opted by defenders of the Intelligent Design doctrine. I believe and trust this is not your intention.

      Response: Thank you very much for your careful work. The plasticity of the body color of peppered moth larvae is very interesting, but we mainly wanted to emphasize that their skin shows the products of visual genes that can sense the color of the environment by perceiving light. Moreover, these genes are conserved in many insects. Human skin can also perceive light by opsins, suggesting that they might initiate light–induced signaling pathways (Haltaufderhyde K et al., 2015). This indicates that the perception of environmental light by the skin of animals and the induction of feedback through signaling pathways is a common phenomenon. For clarity, we have rewritten this section of the manuscript.

      Haltaufderhyde K, Ozdeslik RN, Wicks NL, Najera JA, Oancea E. Opsin expression in human epidermal skin. Photochem Photobiol. 2015;91(1):117-123.

      Wu et al: In addition, during the opening of chromatin, the probability of mutation of exposed genomic DNA sequences will increase (49).

      Here again, this is veering towards a strongly Lamarckian view with the environment guiding specific mutation. I simply cannot see how this would apply to mamo, nothing in the current article indicates this could be the case here. Among many issues with this, it's unclear how chromatin opening in the larval integument may result in heritable mutations in the germline.

      Response: Thank you very much for your careful work. Previous studies have shown that there is a mutation bias in the genome; compared with the intergenic region, the mutation frequency is reduced by half inside gene bodies and by two-thirds in essential genes. In addition, they compared the mutation rates of genes with different functions. The mutation rate in the coding region of essential genes (such as translation) is the lowest, and the mutation rates in the coding region of specialized functional genes (such as environmental response) are the highest. These patterns are mainly affected by the traits of the epigenome (J Grey Monroe et al., 2022).

      In eukaryotes, chromatin is organized as repeating units of nucleosomes, each consisting of a histone octamer and the surrounding DNA. This structure can protect DNA. When a gene is activated, the chromatin region of this gene is locally opened, becoming an accessible region. Research has found that DNA accessibility can lead to a higher mutation rate in the region (Radhakrishnan Sabarinathan et al., 2016; Schuster-Böckler B et al., 2012; Lawrence MS et al., 2013; Polak P et al., 2015). In addition, mamo belongs to the BTB-ZF protein family, members of which can recruit histone modification factors such as DNA methyltransferase 1 (DNMT1), cullin3 (CUL3), histone deacetylase 1 (HDAC1), and histone acetyltransferase 1 (HAT1) to perform chromatin remodeling at specific genomic sites. Although mutation rates can be predicted from epigenomic features of chromatin, the forms of mutations are diverse and random. Therefore, this does not violate randomness. For clarity, we have rewritten this section of the manuscript.

      J Grey Monroe et al., Mutation bias reflects natural selection in Arabidopsis thaliana. Nature. 2022 Feb;602(7895):101-105.

      Sabarinathan R, Mularoni L, Deu-Pons J, Gonzalez-Perez A, López-Bigas N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature. 2016;532(7598):264-267.

      Schuster-Böckler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 2012;488(7412):504-507.

      Lawrence MS, Stojanov P, Polak P, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214-218.

      Polak P, Karlić R, Koren A, et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature. 2015;518(7539):360-364.

      Mathew R, Seiler MP, Scanlon ST, et al. BTB-ZF factors recruit the E3 ligase cullin 3 to regulate lymphoid effector programs. Nature. 2012;491(7425):618-621.

      Wu et al: Transposon insertion occurs in a timely manner upstream of the cortex gene in melanic pepper moths (47), which may be caused by the similar binding of transcription factors and opening of chromatin.

      No, we do not think that the peppered moth mutation is Lamarckian at all, as seems to be inferred here (notice that by mentioning the peppered moth twice, you are juxtaposing a larval plastic trait and then a purely genetic wing trait, making it even more confusing). Also, the "in a timely manner" is superfluous, because all the data are consistent with a chance mutation being eventually picked up by strong directional selection. The mutation and selection did NOT occur at the same time.

      Response: Thank you very much for your careful work. The transposon insertion into the first intron of the cortex gene that underlies industrial melanism in the peppered moth occurred in approximately 1819, close to the period of industrial development in the UK (Arjen E Van't Hof, et al., 2016). In multiple species of Heliconius, the cortex gene is the shared genetic basis for the regulation of wing coloring patterns. Interestingly, the cortex SNPs associated with the wing color pattern do not overlap among different Heliconius species, such as H. erato demophoon and H. erato favorinus, which suggests that the mutations of this cortex gene have different origins (Nadeau NJ et al., 2016). In addition, in Junonia coenia (van der Burg KRL et al., 2020) and Bombyx mori (Ito K et al., 2016), the cortex gene is a candidate for regulating changes in wing coloring patterns. Overall, the cortex gene is an evolutionary hotspot for the variation of multiple butterfly and moth wing coloring patterns. In addition, it was observed that the variations in the cortex gene are diverse in these species, including SNPs, indels, transposon insertions, inversions, etc. This indicates that although there are evolutionary hotspots in the insect genome, the variation itself is random. Therefore, this is not completely detached from randomness.

      Arjen E Van't Hof, et al., The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016 Jun 2;534(7605):102-5. doi: 10.1038/nature17951.

      Nadeau NJ, Pardo-Diaz C, Whibley A, et al. The gene cortex controls mimicry and crypsis in butterflies and moths. Nature. 2016;534(7605):106-110.

      van der Burg KRL, Lewis JJ, Brack BJ, Fandino RA, Mazo-Vargas A, Reed RD. Genomic architecture of a genetically assimilated seasonal color pattern. Science. 2020;370(6517):721-725.

      Ito K, Katsuma S, Kuwazaki S, et al. Mapping and recombination analysis of two moth colour mutations, Black moth and Wild wing spot, in the silkworm Bombyx mori. Heredity (Edinb). 2016;116(1):52-59.

      Wu et al: Therefore, we proposed that the genetic basis of color pattern evolution may mainly be system-guided programmed events that induce mutations in specific genomic regions of key genes rather than just random mutations of the genome.

      While the mutational target of pigment evolution may involve a handful of developmental regulator genes, you do not have the data to infer such a strong conclusion at the moment.

      The current formulation is also quite strong and teleological: "system-guided programmed events" imply intentionality or agency, an idea generally assigned to the anti-scientific Intelligent Design movement. There are a few examples of guided mutations, such as the adaptation phase of gRNA motifs in bacterial CRISPR assays, where I could see the term "system-guided programmed events" to be applicable. But it is irrelevant here.

      Response: Thank you very much for your careful work. The CRISPR-Cas9 system is indeed very well known. In addition, recent studies have found Cas9-like gene-editing systems in eukaryotes, such as Fanzor. Fanzor (Fz) was reported in 2013 as a eukaryotic TnpB-IS200/IS605 protein of transposon origin, and it was initially thought that the Fz protein (and prokaryotic TnpBs) might regulate transposon activity through methyltransferase activity. Fz has recently been shown to be a eukaryotic programmable RNA-guided endonuclease, analogous to CRISPR‒Cas systems (Saito M et al., 2023). Although this system is found in fungi and mollusks, it raises hopes that similar systems will be found in other higher animals. However, before these gene-editing systems became popular, zinc finger nucleases (ZFNs) were already being studied as a gene-editing system in many species. The mechanism by which a ZFN recognizes DNA depends on its zinc finger motif (Urnov FD et al., 2005). This is consistent with the mechanism by which transcription factors recognize DNA-binding sites.

      Furthermore, a very important evolutionary event in sexual reproduction is chromosome recombination during meiosis, which helps to produce more abundant alleles. Current research has found that this recombination event is not random. In mice and humans, the PRDM9 transcription factors are able to plan the sites of double-stranded breaks (DSBs) in meiosis recombination. PRDM9 is a histone methyltransferase consisting of three main regions: an amino-terminal region resembling the family of synovial sarcoma X (SSX) breakpoint proteins, which contains a Krüppel-associated box (KRAB) domain and an SSX repression domain (SSXRD); a PR/SET domain (a subclass of SET domains), surrounded by a pre-SET zinc knuckle and a post-SET zinc finger; and a long carboxy-terminal C2H2 zinc finger array. In most mammalian species, during early meiotic prophase, PRDM9 can determine recombination hotspots by H3K4 and H3K36 trimethylation (H3K4me3 and H3K36me3) of nucleosomes near its DNA-binding site. Subsequently, meiotic DNA DSBs are formed at hotspots through the combined action of SPO11 and TOPOVIBL. In addition, some proteins (such as RAD51) are involved in repairing the break point. In summary, programmed events of induced and repaired DSBs are widely present in organisms (Bhattacharyya T et al., 2019).

      These studies indicate that on the basis of randomness, the genome also exhibits programmability.

      Saito M, Xu P, Faure G, et al. Fanzor is a eukaryotic programmable RNA-guided endonuclease. Nature. 2023;620(7974):660-668.

      Urnov FD, Miller JC, Lee YL, et al. Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature. 2005;435(7042):646-651.

      Bhattacharyya T, Walker M, Powers NR, et al. Prdm9 and Meiotic Cohesin Proteins Cooperatively Promote DNA Double-Strand Break Formation in Mammalian Spermatocytes [published correction appears in Curr Biol. 2021 Mar 22;31(6):1351]. Curr Biol. 2019;29(6):1002-1018.e7.

      Wu et al: Based on this assumption, animals can undergo phenotypic changes more quickly and more accurately to cope with environmental changes. Thus, seemingly complex phenotypes such as cryptic coloring and mimicry that are highly similar to the background may have formed in a short period. However, the binding sites of some transcription factors widely distributed in the genome may be reserved regulatory interfaces to cope with potential environmental changes. In summary, the regulation of genes is smarter than imagined, and they resemble a more advanced self-regulation program.

      Here again, I can agree with the idea that certain genetic architectures can evolve quickly, but I cannot support the concept that the genetic changes are guided or accelerated by the environment. And again, none of this is relevant to the current findings about Bm-mamo.

      Response: Thank you very much for your careful work. Darwin's theory of natural selection has epoch-making significance. I deeply believe in the theory that species strive to evolve through natural selection. However, with the development of molecular genetics, Darwinism’s theory of undirected random mutations and slow accumulation of micromutations resulting in phenotype evolution has been increasingly challenged.

      The prerequisite for undirected random mutations and micromutations is excessive reproduction to generate a sufficiently large population. A sufficiently large population can contain sufficient genotypes to face various survival challenges. However, it is difficult to explain how some small groups and species with relatively low fertility rates have survived thus far. More importantly, the theory cannot explain the currently observed genomic mutation bias. In scientific research, every theory is constantly being modified to adapt to current discoveries. The most famous example is the debate over whether light is a particle or a wave, which has lasted for hundreds of years. However, in the 20th century, both sides seemed to compromise with each other, believing that light has a wave‒particle duality.

      Epigenetics has developed rapidly since 1987. Epigenetics has been widely accepted, defined as stable inheritance caused by chromosomal conformational changes without altering the DNA sequence, which differs from genetic research on variations in gene sequences. However, an increasing number of studies have found that histone modifications can affect gene sequence variation. In addition, both histones and epigenetic factors are essentially encoded by genes in the genome. Therefore, genetics and epigenetics should be interactive rather than parallel. However, some transcription factors play an important role in epigenetic modifications. Meiotic recombination is a key process that ensures the correct separation of homologous chromosomes through DNA double-stranded break repair mechanisms. The transcription factor PRDM9 can determine recombination hotspots by H3K4 and H3K36 trimethylation (H3K4me3 and H3K36me3) of nucleosomes near its DNA-binding site (Bhattacharyya T et al., 2019). Interestingly, mamo has been identified as an important candidate factor for meiosis hotspot setting in Drosophila (Winbush A et al., 2021).

      Bhattacharyya T, Walker M, Powers NR, et al. Prdm9 and Meiotic Cohesin Proteins Cooperatively Promote DNA Double-Strand Break Formation in Mammalian Spermatocytes [published correction appears in Curr Biol. 2021 Mar 22;31(6):1351]. Curr Biol. 2019;29(6):1002-1018.e7.

      Winbush A, Singh ND. Genomics of Recombination Rate Variation in Temperature-Evolved Drosophila melanogaster Populations. Genome Biol Evol. 2021;13(1): evaa252.

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      Response: Thank you very much for your careful work. First, we believe that competitive research is sometimes coincidental and sometimes intentional. Our research began in 2009, when we started to construct the recombinant population. In 2016, we published an article on comparative transcriptomics (Wu et al. 2016). We believe that the authors of the article mentioned above took a strong interest in our research and built on our transcriptome analysis with the aim of publishing preemptively.

      To discourage such behavior, we cannot cite it and do not want to discuss it in our paper.

      Songyuan Wu et al. Comparative analysis of the integument transcriptomes of the black dilute mutant and the wild-type silkworm Bombyx mori. Sci Rep. 2016 May 19:6:26114. doi: 10.1038/srep26114.

      • line 52-54. The numerous biological functions of insect coloration have been thoroughly investigated. It is reasonable to expect more references for each function.

      Response: Thank you very much for your careful work. We have made the appropriate modifications.

      Sword GA, Simpson SJ, El Hadi OT, Wilps H. Density-dependent aposematism in the desert locust. Proc Biol Sci. 2000;267(1438):63-68. … Behavior.

      Barnes AI, Siva-Jothy MT. Density-dependent prophylaxis in the mealworm beetle Tenebrio molitor L. (Coleoptera: Tenebrionidae): cuticular melanization is an indicator of investment in immunity. Proc Biol Sci. 2000;267(1439):177-182. … Immunity.

      N. F. Hadley, A. Savill, T. D. Schultz, Coloration and Its Thermal Consequences in the New-Zealand Tiger Beetle Neocicindela-Perhispida. J Therm Biol. 1992;17, 55-61…. Thermoregulation.

      Y. G. Hu, Y. H. Shen, Z. Zhang, G. Q. Shi, Melanin and urate act to prevent ultraviolet damage in the integument of the silkworm, Bombyx mori. Arch Insect Biochem. 2013; 83, 41-55…. UV protection.

      M. Stevens, G. D. Ruxton, Linking the evolution and form of warning coloration in nature. P Roy Soc B-Biol Sci. 2012; 279, 417-426…. Aposematism.

      K. K. Dasmahapatra et al., Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 2012; 487, 94-98…. Mimicry.

      Gaitonde N, Joshi J, Kunte K. Evolution of ontogenic change in color defenses of swallowtail butterflies. Ecol Evol. 2018;8(19):9751-9763. Published 2018 Sep 3. … Crypsis.

      B. S. Tullberg, S. Merilaita, C. Wiklund, Aposematism and crypsis combined as a result of distance dependence: functional versatility of the colour pattern in the swallowtail butterfly larva. P Roy Soc B-Biol Sci. 2005; 272, 1315-1321…. Aposematism and crypsis combined.

      • line 59-60. This general statement needs to be rephrased. I suggest keeping it simple by indicating that insect coloration can be pigmentary, structural, or bioluminescent. Regarding structural coloration and the associated nanostructures, the authors could cite recent reviews, such as: Seago et al., Interface 2009 + Lloyd and Nadeau, Current Opinion in Genetics & Development 2021 + "Light as matter: natural structural colour in art" by Finet C. 2023. I suggest doing the same for recent reviews that cover pigmentary and bioluminescent coloration in insects. The very recent paper by Nishida et al. in Cell Reports 2023 on butterfly wing color made of pigmented liquid is also unique and worth considering.

      Response: Thank you very much for your careful work. We have made the appropriate modifications.

      Insect coloration can be pigmentary, structural, or bioluminescent. Pigments are mainly synthesized by the insects themselves and form solid particles that are deposited in the cuticle of the body surface and the scales of the wings (10, 11). Interestingly, recent studies have found that biosynthesized bile pigments and carotenoid pigments are incorporated into body fluids and passed through the wing membranes of two butterflies (Siproeta stelenes and Philaethria diatonica) via hemolymph circulation, providing color in the form of liquid pigments (12). Pigments produce color by selective absorption and/or scattering of light, depending on their physical properties (13). In contrast, structural color refers to colors, such as metallic colors and iridescence, generated by optical interference and grating diffraction from the microstructure/nanostructure of the body surface or appendages (such as scales) (14, 15). Pigment color and structural color are widely distributed in insects and can be observed with the naked eye only in illuminated environments. However, some insects, such as fireflies, exhibit colors (green to orange) in the dark due to bioluminescence (16). Bioluminescence occurs when luciferase catalyzes the oxidation of the small-molecule substrate luciferin (17). In conclusion, the color patterns of insects have evolved to be highly sophisticated and are closely related to their living environments. For example, cryptic coloration can deceive other animals through its close resemblance to the surrounding environment. However, the molecular mechanism by which insects form precise color patterns to match their living environment is still unknown.

      • RNAi approach. I have no doubt that obtaining phenocopies by electroporation might be difficult. However, I find the final sampling a bit limited to draw conclusions from the RT-PCR (n=5 and n=3 for phenocopies and controls). Three control individuals is a very low number. Moreover, it would be nice to see the variability on the plot, using for example violin plots.

      Response: Thank you very much for your careful work. In the RNAi experiment, we injected more than 20 individuals in the experimental group and control group. We have added the RNAi data in Figure 4.

      Author response table 1.

      • Figure 6. Higher magnification images of Dazao and Bm-mamo knockout are needed, as shown in Figure 5 on RNAi.

      Response: Thank you very much for your careful work. We have added enlarged images.

      Author response image 3.

      • Phylogenetic analysis/Figure S6. I am not sure to what extent the sampling is biased or not, but if not, it is noteworthy that mamo does not show duplicated copies (negative selection?). It might be interesting to discuss this point in the manuscript.

      Response: Thank you very much for your careful work. mamo belongs to the BTB/POZ zinc finger family. The members of this family exhibit significant expansion in vertebrates. For example, there are 3 members in C. elegans, 13 in D. melanogaster, 16 in Bombyx mori, 58 in M. musculus and 63 in H. sapiens (Wu et al, 2019). These members contain conserved BTB/POZ domains but vary in the number and amino acid composition of their zinc finger motifs. Because different zinc finger motifs bind to different DNA recognition sequences, their downstream target genes may differ. Therefore, when searching for orthologous genes from different species, we required high conservation of their zinc finger motif sequences. Due to these strict conditions, only one orthologous gene was found in each of these species.

      • Differentially-expressed genes and CP candidate genes (line 189-191). The manuscript would gain in clarity if the authors explained their procedure in more detail. For instance, they moved from a list of 191 genes to CP genes only. Can they say a little bit more about the non-CP genes that are differentially expressed? Maybe quantify the number of CPs among the total number of differentially-expressed genes to show that CPs are the main class?

      Response: Thank you very much for your careful work. We have added the nr (Nonredundant Protein Sequence Database) annotations for the 191 differentially expressed genes to Supplemental Table S3. Among them, there were 19 cuticular protein genes, 17 antibacterial peptide genes, 6 transporter genes, 5 transcription factor genes, 5 cytochrome genes, 53 enzyme-encoding genes and others. Because CP genes were significantly enriched among the differentially expressed genes (DEGs), and because previous studies have found that BmorCPH24 can affect pigmentation, we first investigated the CP genes.
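
      As a rough illustration of what "significantly enriched" means here, the following sketch computes a one-sided hypergeometric p-value for observing 19 CP genes among 191 DEGs. The background values (14,623 predicted genes and about 200 CP genes) are assumptions taken from figures mentioned elsewhere in this response, not the values used in the actual analysis, and the test itself is a generic choice rather than the method used in the manuscript.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` belong to the category."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Counts reported in the response.
N_DEGS = 191      # differentially expressed genes
N_CP_DEGS = 19    # cuticular protein genes among the DEGs

# Assumed background values, for illustration only.
N_GENES = 14623   # predicted genes in the genome (assumption)
N_CP_GENES = 200  # annotated CP genes in the genome (assumption)

p_value = hypergeom_upper_tail(N_GENES, N_CP_GENES, N_DEGS, N_CP_DEGS)
fold = (N_CP_DEGS / N_DEGS) / (N_CP_GENES / N_GENES)
print(f"fold enrichment = {fold:.1f}, p = {p_value:.2e}")
```

      Under these assumed background numbers, CP genes are several-fold over-represented among the DEGs; the exact p-value depends on the background actually used.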

      • Interaction between Bm-mamo. It is not clear why the authors chose to investigate the physical interaction of Bm-mamo protein with the putative binding site of yellow, and not with the sites upstream of tan and DDC. Do the authors test one interaction and assume the conclusion stands for the y, tan and DDC?

      Response: Thank you very much for your careful work. In D. melanogaster, the yellow gene is the most studied pigment gene. The upstream and intron sequences of the yellow gene have been identified as containing multiple cis-regulatory elements. Due to the important pigmentation role of the yellow gene and its variable cis-regulatory sequence among different species, it has been considered a research model for cis-regulatory elements (Laurent Arnoult et al. 2013, Gizem Kalay et al. 2019, Yaqun Xin et al. 2020, Yann Le Poul et al. 2020). We use yellow as an example to illustrate the regulation of the mamo gene. We added this description to the discussion.

      Laurent Arnoult et al. Emergence and diversification of fly pigmentation through evolution of a gene regulatory module. Science. 2013 Mar 22;339(6126):1423-6. doi: 10.1126/science.1233749.

      Gizem Kalay et al. Redundant and Cryptic Enhancer Activities of the Drosophila yellow Gene. Genetics. 2019 May;212(1):343-360. doi: 10.1534/genetics.119.301985. Epub 2019 Mar 6.

      Yaqun Xin et al. Enhancer evolutionary co-option through shared chromatin accessibility input. Proc Natl Acad Sci U S A. 2020 Aug 25;117(34):20636-20644. doi: 10.1073/pnas.2004003117. Epub 2020 Aug 10.

      Yann Le Poul et al. Regulatory encoding of quantitative variation in spatial activity of a Drosophila enhancer. Sci Adv. 2020 Dec 2;6(49):eabe2955. doi: 10.1126/sciadv.abe2955. Print 2020 Dec.

      • Please note that some controls are missing for the EMSA experiments. For instance, the putative binding-sites should be mutated and it should be shown that the interaction is lost.

      Response: Thank you very much for your careful work. In this study, we found that the DNA recognition sequence of mamo is highly conserved across multiple species. In D. melanogaster, studies have found that mamo can directly bind to the intron of the vasa gene to activate its expression. The DNA recognition sequence used in that work was TGCGT (Shoichi Nakamura et al. 2019). We chose a longer sequence, GTGCGTGGC, to detect the binding of mamo. This binding mechanism is consistent across species.
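
      To make the relationship between the two sequences concrete, the sketch below scans a DNA string for a motif on both strands. The toy sequence and the function names are hypothetical and serve only to show that the longer site GTGCGTGGC contains the TGCGT core; this is not the motif-search procedure used in the manuscript.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (0-based start, strand) hits of `motif` in `seq`,
    searching the forward strand and the reverse-complement strand."""
    seq = seq.upper()
    motif = motif.upper()
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid reporting palindromic motifs twice
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical toy sequence: one forward site followed by one reverse-strand site.
toy_region = "AAAGTGCGTGGCTTT" + reverse_complement("GTGCGTGGC")

long_hits = find_motif(toy_region, "GTGCGTGGC")
core_hits = find_motif(toy_region, "TGCGT")
print(long_hits)
print(core_hits)
```

      Every hit of the longer site implies a hit of the core one base downstream on the same strand, whereas the reverse does not hold, so the longer sequence is the more stringent probe.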

      • Figure 7 and supplementary data. How were the names of the CPs attributed? According to automatic genome annotation of Bm genes and proteins? Based on the Drosophila genome and associated gene names? Did the authors perform phylogenetic analyses to name the different CP genes?

      Response: Thank you very much for your careful work. The naming of CPs is based on their conserved motif and their arrangement order on the chromosome. In previous reports, sequence identification and phylogenetic analysis of CPs have been carried out in silkworms (Zhengwen Yan et al. 2022, Ryo Futahashi et al. 2008). The members of the same family have sequence similarity between different species, and their functions may be similar. We have completed the names of these genes in the text, for example, changing CPR2 to BmorCPR2.

      Zhengwen Yan et al. A Blueprint of Microstructures and Stage-Specific Transcriptome Dynamics of Cuticle Formation in Bombyx mori. Int J Mol Sci. 2022 May 5;23(9):5155.

      Ningjia He et al. Proteomic analysis of cast cuticles from Anopheles gambiae by tandem mass spectrometry. Insect Biochem Mol Biol. 2007 Feb;37(2):135-46.

      Maria V Karouzou et al. Drosophila cuticular proteins with the R&R Consensus: annotation and classification with a new tool for discriminating RR-1 and RR-2 sequences. Insect Biochem Mol Biol. 2007 Aug;37(8):754-60.

      Ryo Futahashi et al. Genome-wide identification of cuticular protein genes in the silkworm, Bombyx mori. Insect Biochem Mol Biol. 2008 Dec;38(12):1138-46.

      • Discussion. I think the discussion would gain in being shorter and refocused on the understudied role of CPs. Another non-canonical aspect of the discussion is the reference to additional experiments (e.g., parthenogenesis line 290-302, figure S14). This is not the place to introduce more results, and it breaks the flow of the discussion. I encourage the authors to reshuffle the discussion: 1) summary of their findings on mamo and CPs, 2) link between pigmentation mutant phenotypes, pigmentation pattern and CPs, 3) general discussion about the (evo-)devo importance of CPs and link between pigment deposition and coloration. Three important papers should be mentioned here:

      1) Matsuoka Y and A Monteiro (2018) Melanin pathway genes regulate color and morphology of butterfly wing scales. Cell Reports 24: 56-65... Yellow has a pleiotropic role in cuticle deposition and pigmentation.

      2) https://arxiv.org/abs/2305.16628... Link between nanoscale cuticle density and pigmentation

      3) https://www.cell.com/cell-reports/pdf/S2211-1247(23)00831-8.pdf... Variation in pigmentation and implication of endosomal maturation (gene red).

      Response: Thank you very much for your careful work. We have rewritten the discussion section.

      1) We have summarized our findings.

      Bm-mamo may affect the synthesis of melanin in epidermis cells by regulating yellow, DDC, and tan; regulate the maturation of melanin granules in epidermis cells through BmMFS; and affect the deposition of melanin granules in the cuticle by regulating CP genes, thereby comprehensively regulating the color pattern in caterpillars.

      2) We describe the relationship among the pigmentation mutation phenotype, pigmentation pattern, and CP.

      Previous studies have shown that the lack of expression of BmorCPH24, which encodes important components of the endocuticle, can lead to dramatic changes in body shape and a significant reduction in the pigmentation of caterpillars (53). We crossed Bo (BmorCPH24 null mutation) and bd to obtain F1(Bo/+Bo, bd/+), then self-crossed F1 and observed the phenotype of F2. The lunar spots and star spots decreased, and light-colored stripes appeared on the body segments, but the other areas still had significant melanin pigmentation in double mutation (Bo, bd) individuals (Fig. S13). However, in previous studies, introduction of Bo into L (ectopic expression of wnt1 results in lunar stripes generated on each body segment) (24) and U (overexpression of SoxD results in excessive melanin pigmentation of the epidermis) (58) strains by genetic crosses can remarkably reduce the pigmentation of L and U (53). Interestingly, there was a more significant decrease in pigmentation in the double mutants (Bo, L) and (Bo, U) than in (Bo, bd). This suggests that Bm-mamo has a stronger ability than wnt1 and SoxD to regulate pigmentation. On the one hand, mamo may be a stronger regulator of the melanin metabolic pathway, and on the other hand, mamo may regulate other CP genes to reduce the impact of BmorCPH24 deficiency.

      3) We discussed the importance of (evo-) devo in CPs and the relationship between pigment deposition and coloring.

      CP genes usually account for over 1% of the total genes in an insect genome and can be categorized into several families, including CPR, CPG, CPH, CPAP1, CPAP3, CPT, CPF and CPFL (68). The CPR family is the largest group of CPs and contains a chitin-binding domain called the Rebers and Riddiford motif (R&R) (69). Variation in the R&R consensus sequence allows subdivision into three subfamilies (RR-1, RR-2, and RR-3) (70). Among the 28 CPs, 11 RR-1 genes, 6 RR-2 genes, 4 hypothetical cuticular protein (CPH) genes, 3 glycine-rich cuticular protein (CPG) genes, 3 cuticular protein Tweedle motif (CPT) genes, and 1 CPFL (like the CPFs in a conserved C-terminal region) gene were identified. The RR-1 consensus is usually more variable among species than RR-2, which suggests that RR-1 may have species-specific functions. RR-2 genes often cluster into several branches, which may be due to gene duplication events in co-orthologous groups and may result in conserved functions between species (71). CPH genes are classified as such because they lack known motifs. In the epidermis of Lepidoptera, the CPH genes often have high expression levels. For example, BmorCPH24 had the highest expression level in the silkworm larval epidermis (72). The CPG proteins are rich in glycine. The CPH and CPG genes are less commonly found in insects outside the order Lepidoptera (73). This suggests that they may provide species-specific functions for the Lepidoptera. CPT contains a Tweedle motif, and the TweedleD1 mutation has a dramatic effect on body shape in D. melanogaster (74). The CPFL members are relatively conserved across species and may be involved in the synthesis of larval cuticles (75). CPT and CPFL may have relatively conserved functions among insects. The CP genes are a group of rapidly evolving genes, and their copy numbers may vary substantially between species.

      In addition, RNAi experiments on 135 CP genes in the brown planthopper (Nilaparvata lugens) showed that deficiency of 32 CP genes leads to significant defective phenotypes, such as lethality and developmental retardation. This suggests that these 32 CP genes are indispensable and that other CP genes may have redundant and complementary functions (76). In previous studies, it was found that the construction of the larval cuticle of silkworms requires the precise expression of over two hundred CP genes (22). The production, interaction, and deposition of CPs and pigments are complex and precise processes, and our research shows that Bm-mamo plays an important regulatory role in this process in silkworm caterpillars. To further understand the role of CPs, future work should aim to identify the functions of important cuticular protein genes and their deposition mechanism in the cuticle.

      Minor comments

      • Title. At this stage, there is no evidence that Bm-mamo regulates caterpillar pigmentation outside of Bombyx mori. I suggest specifying 'silkworm caterpillars' in the title.

      Response: Thank you very much for your careful work. We have modified the title.

      • Abstract, line 29. Because the knowledge on pigmentation pathway(s) is advanced, I would suggest writing 'color pattern is not fully understood' instead of 'color pattern is not clear'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 29. I suggest 'the transcription factor' rather than 'a transcription factor'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 30. If you want to mention the protein, the name 'Bm-mamo' should not be italicized.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 30. 'in the silkworm'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 31. 'mamo' should not be italicized.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 31. 'in Drosophila' rather than 'of Drosophila'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 32. Please specify whether the gamete function is conserved in insects. In all animals?

      Response: Thank you very much for your careful work. The sentence was changed to “This gene has a conserved function in gamete production in Drosophila and silkworms and evolved a pleiotropic function in the regulation of color patterns in caterpillars.”

      • Introduction, line 51. I am not sure what the authors mean by 'under natural light'. Please rephrase.

      Response: Thank you very much for your careful work. We have deleted “under natural light”.

      • line 43. I find that the sentence 'In some studies, it has been proven that epidermal proteins can affect the body shape and appendage development of insects' is not necessary here. Furthermore, this sentence breaks the flow of the teaser.

      Response: Thank you very much for your careful work. We have deleted this sentence.

      • line 51-52. 'Greatly benefit them' should be rephrased in a more neutral way. For example, 'colour patterns have been shown to be involved in...'.

      Response: Thank you very much for your careful work. We have modified this to “and the color patterns have been shown to be involved in…”

      • line 62. CPs are secreted by the epidermis, but I would say that CPs play their structural role in the cuticle, not directly in the epidermis. I suggest rephrasing this sentence and adding references.

      Response: Thank you very much for your careful work. We have modified “epidermis” to “cuticle”.

      • line 67. Please indicate that pathways have been identified/reported in Lepidoptera (11). Otherwise, the reader cannot tell whether you are referring to previous biochemical work in Drosophila, for example.

      Response: Thank you very much for your careful work. We have modified this sentence. “Moreover, the biochemical metabolic pathways of pigments used for color patterning in Lepidoptera…have been reported.”

      • line 69. Missing examples of pleiotropic factors and associated references. For example, I suggest adding: engrailed (Dufour, Koshikawa and Finet, PNAS 2020) + antennapedia (Prakash et al., Cell Reports 2022) + optix (Reed et al., Science 2011), etc. Need to add references for clawless, abdominal-A.

      Response: Thank you very much for your careful work. We have made modifications.

      • line 76. The simpler term moth might be enough (instead of Lepidoptera).

      Response: Thank you very much for your careful work. We have modified this to “insect”.

      • line 96. I would simplify the text by writing "Then, quantitative RT-PCR was performed..."

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 112. 'Predict' instead of 'estimate'?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 113. I would rather indicate the full name first, then indicate mamo between brackets.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 144. The Perl script needs to be made accessible in a public repository.

      Response: Thank you very much for your careful work.

      • line 147-150. Too many technical details here. The details are already indicated in the material and methods section. Furthermore, the details break the flow of the paragraph.

      Response: Thank you very much for your careful work. We have modified this section.

      • line 152. Needs to make the link with the observed phenotypes in Figure 1. Just needs to state that RNAi phenocopies mimic the mutant alleles.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 153-157. Too many technical details here. The details are already indicated in the material and methods section. Furthermore, the details break the flow of the paragraph.

      Response: Thank you very much for your careful work. We have simplified this paragraph.

      • line 170. Please rephrase 'conserved in 30 species' because it might be understood as conserved in 30 species only, and not in other species.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 182. Maybe explain the rationale behind restricting the analysis to +/- 2kb. Can you cite a paper that shows that most binding sites are within 2kb of the start codon?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 182. '14,623 predicted genes'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 183. '10,622 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 183. Redundancy. Please remove 'silkworm' or 'B. mori'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 187. '10,072 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 188. '9,853 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 200. "Therefore, the differential...in caterpillars" is a strong statement.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 204. Remove "The" in front of eight key genes. Also, needs a reference... maybe a recent review on the biochemical pathway of melanin in insects.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 220. This sentence is too general and vague. Please make explicit what you mean by "in terms of evolution". Number of insect species? Diversity of niche occupancy? Morphological, physiological diversity?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 285. The verb "believe" should be replaced by a more neutral one.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 354-355. This sentence needs to be rephrased in a more objective way.

      Response: Thank you very much for your careful work. We have rewritten this sentence.

      • line 378. Missing reference for MUSCLE.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 379. Pearson model?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 408. "The CRISPRdirect online software was used...".

      Response: Thank you very much for your careful work. We have modified this sentence.

      • Figure 1. In the title, I suggest indicating Dazao, bd, bdf as they appear in the figure. Please specify 'silkworm larval development'.

      Response: Thank you very much for your careful work. We have modified this figure title.

      • Figure 3. In the title, is the word 'pattern' really necessary? In the legend, please indicate the meaning of the acronyms AMSG and PSG.

      Response: Thank you very much for your careful work. We have modified this figure legend.

      • Figure S7A. Typo 'Znic finger 1', 'Znic finger 2', 'Znic finger 3',

      Response: Thank you very much for your careful work. We have fixed these typos.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In 2019, Wilkinson and colleagues (PMID: 31142833) managed to lift the veil on a 20-year-old open question: how to properly culture and expand Hematopoietic Stem Cells (HSCs). Although this study is revolutionizing the HSC biology field, several questions regarding the mechanisms of expansion remain open. Addressing this gap, Zhang et al. embarked on a much-needed investigation of HSC self-renewal in this particular culture setting.

      The authors first tackled the known caveat that some HSC membrane markers are altered during in vitro culture by functionally establishing EPCR (CD201) as a reliable and stable HSC marker (Figure 1), demonstrating that this compartment is also responsible for long-term hematopoietic reconstitution (Figure 3). Next, in Figure 2, the authors performed single-cell omics to shed light on the potential mechanisms involved in HSC maintenance, and interestingly it was shown that several hematopoietic populations such as monocytes and neutrophils are also present under these culture conditions, which had not been reported previously. The study goes on to functionally characterize these cultured HSCs (cHSC). The authors elegantly demonstrate, using state-of-the-art barcoding strategies, that these culture conditions provoke heterogeneity in the expanding HSC pool (Figure 4). In the last experiment (Figure 5), it was demonstrated that cHSCs not only retain their high EPCR expression levels but also, upon transplantation, remain more quiescent than freshly isolated controls.

      Taken together, this study independently validates that the proposed culturing system works and provides new insights into the mechanisms whereby HSC expansion takes place.

      Most of the conclusions of this study are well supported by the present manuscript, but some aspects regarding experimental design, and especially the data analysis, should be clarified and possibly extended.

      1) The first major point regards the single-cell (sc) omics performed on whole cultured cells (Figure 2):

      a. The authors claim that both RNA and ATAC were performed and indeed some ATAC-seq data is shown in Figure 2B, but this collected data seems to be highly underused.

      We appreciate the opportunity to clarify our analytical approach and the rationale behind it. In our study, we employed a novel deep learning framework, SAILERX, for our analysis. This framework is specifically designed to integrate multimodal data, such as RNA-seq and ATAC-seq. The advantage of SAILERX lies in its ability to correct for technical noise inherent in sequencing processes and to align information from different modalities. Unlike methods that force a hard alignment of modalities into a shared latent space, SAILERX allows for a more refined integration. It achieves this by encouraging the local structures of the two modalities, as measured by pairwise similarities, to be similar.

      To put it more simply, SAILERX combines RNAseq and ATACseq data, ensuring that the unique characteristics of each data type are respected and used to enhance the overall biological picture, rather than forcing them into a uniform framework.

      While it is indeed possible to analyze the ATAC-seq and RNA-seq modalities separately, and we acknowledge the potential value in such an approach, our primary objective in this study was to highlight the relatively low content of HSCs in cultures. This finding is a key point of our work, and the multiome data support this from a molecular point of view.

      The Seurat object we provide was created to facilitate further analysis by interested researchers. This object simplifies the exploration of both the ATAC-seq and RNA-seq data, allowing for additional investigations that may be of interest to the scientific community. We hope this explanation clarifies our methodology and its implications.

      b. It's not entirely clear to this reviewer the nature of the so-called "HSC signatures"(SF2C) and why exactly these genes were selected. There are genes such as Mpl and Angpt1 which are used for Mk-biased HSCs. Maybe relying on other HSC molecular signatures (PMID: 12228721, for example) would not only bring this study more into the current field context but would also have a more favorable analysis outcome. Moreover reclustering based on a different signature can also clarify the emergence of relevant HSC clusters.

      In our study, the HSC signature was selected based on well-referenced datasets of well-defined HSPCs, as detailed in the "v. HSC signature" section of our methods. This signature was also projected onto another single-cell RNA sequencing dataset generated from ex vivo expanded HSC culture (PMID: 35971894, see Author response image 1 below), again demonstrating an association primarily with the most primitive cells (at least based on gene expression).

      Author response image 1.

      Projection of "our" HSC signature on scRNAseq data from independent work.

      In further response to this suggestion, we have also examined the molecular signature of HSCs referenced in PMID: 12228721, as well as another HSC signature from PMID: 26004780, in our data (Author response image 2). While these signatures do indeed enrich for cells that fall in the cluster of molecularly defined HSCs, our analysis indicates that neither of them significantly improves the identification of HSCs in our dataset compared to the signature we originally used. This finding reinforces our confidence in the appropriateness of our chosen HSC signature for this study.

      Author response image 2.

      Projection of alternative HSC signatures onto the SAILERX UMAP.

      Regarding the specific genes Mpl and Angpt1, we respectfully disagree with the view that these genes are exclusively associated with MK-biased HSCs. There is substantial evidence supporting a broader role of Mpl in regulating HSCs, regardless of any particular "lineage bias". Similarly, while Angpt1 has been less extensively studied, its role in HSCs, as examined in PMID: 25821987, suggests a more general association with HSCs rather than a specific impact on MKs. Therefore, we maintain that it is more accurate to consider these genes as HSC-associated rather than restricted to MK-biased HSCs.

      Finally, addressing the comment on reclustering based on different signatures, we would like to clarify that the clustering process is independent of the projection of signatures. The clustering aims to identify cell populations based on their overall molecular profiles, and while signatures can aid in characterizing these populations, they do not influence the clustering process itself.
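
      The distinction between clustering and signature projection can be sketched as follows. This is a deliberately simplified, hypothetical scoring scheme (mean expression of the signature genes minus the mean over all genes in a cell), with toy expression values, toy cluster labels, and a toy gene list; it is not the signature or the scoring method used in our analysis. The point is only that the score is computed per cell after the fact and never modifies the cluster assignments.

```python
def signature_score(expression, signature):
    """Per-cell score: mean expression of signature genes minus the mean
    expression of all genes measured in that cell (toy scheme)."""
    scores = {}
    for cell, genes in expression.items():
        present = [genes[g] for g in signature if g in genes]
        background = sum(genes.values()) / len(genes)
        signature_mean = sum(present) / len(present) if present else 0.0
        scores[cell] = signature_mean - background
    return scores


# Hypothetical normalized expression values for two cells.
expression = {
    "cell_1": {"Mpl": 3.0, "Angpt1": 2.0, "Fgd5": 1.0, "Elane": 0.0},
    "cell_2": {"Mpl": 0.0, "Angpt1": 0.0, "Fgd5": 0.0, "Elane": 4.0},
}

# Cluster labels assumed to come from an earlier, independent clustering step.
clusters = {"cell_1": "HSC-like", "cell_2": "neutrophil-like"}
clusters_before = dict(clusters)

# Illustrative gene list only, not a published HSC signature.
toy_signature = ["Mpl", "Angpt1", "Fgd5"]

scores = signature_score(expression, toy_signature)
for cell, score in scores.items():
    print(cell, clusters[cell], round(score, 2))
```

      Swapping in a different gene list changes the scores overlaid on the embedding but leaves `clusters` untouched, which is why reclustering is not expected to follow from a change of signature.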

      c. The authors took the hard road to perform experiments with the elegant HSC-specific Fgd5-reporter, and they claim in lines 170-171 that it "failed to clearly demarcate in our single-cell multimodal data". This seems like a rather vague statement and leads to the idea that the scRNA-seq experiment is not reliable. It would be interesting to show a UMAP with this gene expression regardless and also potentially some other HSC markers.

      We understand the concerns raised about our statement on the performance of the Fgd5-reporter in our multimodal data analysis. Our aim was not to suggest that single-cell molecular data are unreliable. Instead, we intended to point out specific challenges associated with scRNA sequencing, notably the high rates of dropout. Regarding the specific example of Fgd5, it appears this transcript is not efficiently captured by 10x technology. Our previous 10x scRNA-seq experiments on cells from the Fgd5 reporter strain (Säwén et al., eLife 2018; Konturek-Ciesla et al., Cell Rep. 2023) support this observation. Despite cells being sorted as Fgd5-reporter positive, many showed no detectable transcripts.
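      The dropout argument can be made concrete with a simple model. The following is a minimal sketch, assuming that each transcript is captured independently with a fixed probability, so that the detected count per cell is approximately Poisson distributed. The function name and the example numbers are illustrative assumptions, not measurements from our data.

```python
import math


def expected_dropout_fraction(mean_transcripts: float, capture_efficiency: float) -> float:
    """Expected fraction of expressing cells with zero detected molecules.

    Assumes detected counts are Poisson distributed with mean
    mean_transcripts * capture_efficiency (an illustrative simplification).
    """
    if mean_transcripts < 0 or not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("mean_transcripts must be >= 0 and capture_efficiency in [0, 1]")
    # For a Poisson variable with mean m, P(count == 0) = exp(-m).
    return math.exp(-mean_transcripts * capture_efficiency)


# Hypothetical values: 10 transcripts per cell, 5% capture efficiency.
print(round(expected_dropout_fraction(10, 0.05), 3))
```

      Under these assumed values, well over half of the cells that genuinely express a low-abundance transcript would show no detectable molecule, which is in line with reporter-positive cells lacking detectable Fgd5 transcripts.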

      We consider it pertinent to note that our study integrates ATAC-seq data in conjunction with single-cell molecular data. We believe that this integration, coupled with the analytical methods we have employed, potentially offers a way to address some of the limitations typically associated with scRNA sequencing. However, in assessing frequencies, we observe that the number of candidate HSCs identified via single-cell molecular data is substantially higher than the number identified through flow cytometry, the latter of which we demonstrate correlates functionally with genuine long-term repopulating activity.

      With respect to Fgd5, as depicted in our analysis below, there appears to be an enrichment of cells in the cluster identified as HSCs, as well as a significant representation in the cycling cell cluster (Author response image 3). Regarding the projection of other individual genes, the Seurat object we have provided allows for such projections to be readily performed. This offers an opportunity for further exploration and validation of our findings by interested researchers.

      Author response image 3.

      Feature plot depicting Fgd5 expression in the SAILERX UMAP.

      2) During the discussion and in Figure 4, the authors ponder and demonstrate that this culturing system can provoke divert HSC close expansion, having also functional consequences. This is a known caveat of the original system, but in more recent publications from the original group (PMID: 36809781 and PMID: 37385251) small alterations to the protocol seem to alleviate clone selection. It's intriguing why the authors have not included these parameters at least in some experiments to show reproducibility or why these studies are not mentioned during the discussion section.

      Thank you for pointing out the recent publications (PMID: 36809781 and PMID: 37385251) that discuss modifications to the HSC culturing system. We appreciate the opportunity to address why these were not included in our discussion or experiments.

      Firstly, it is important to note that these papers were published after the submission of our manuscript. In fact, one of the studies (PMID: 36809781) references the preprint version of our work on bioRxiv. This timing meant that we were unable to consider these studies in our initial manuscript or incorporate any of their findings into our experimental designs.

      Furthermore, as strong advocates for the peer-review system, we prioritize references that have undergone this rigorous process. Preprints, while valuable for early dissemination of research findings, do not offer the same level of scrutiny and validation as peer-reviewed publications. Our approach was to rely on the most relevant and rigorously reviewed literature available to us at the time of submission. This included, most notably, the original and ground-breaking work by Wilkinson et al., which provided a foundational basis for our research.

      We acknowledge that the field of HSC research is rapidly evolving, and new findings, such as those mentioned, are continually emerging. These new studies undoubtedly contribute valuable insights into HSC culturing systems and their optimization. However, given the timing of their publication relative to our study, we were not able to include them in our analysis or discussion.

      3) In this reviewer's opinion, the finding that transplanted cHSC are more quiescent than freshly isolated controls is the most remarkable aspect of this manuscript. There is a point of concern and an intriguing thought that sprouts from this experiment. It is imperative that for this experiment the same HSC dose is transplanted between both groups. This however is technically difficult since the membrane markers from both groups are different. Although after 8 weeks chimerism levels seem to be the same (SF5D) for both groups, it would strengthen the evidence if the author could demonstrate that the same number of HSCs were transplanted in both groups, likely by limiting dose experiments. Finally, it's interesting that even though EE100 cells underwent multiple replication rounds (adding to their replicative aging), these cells remained more quiescent once they were in an in vivo setting. Since the last author of this manuscript has also expertise in HSC aging, it would be interesting to explore whether these cells have "aged" during the expansion process by assessing whether they display an aged phenotype (myeloid-skewed output in serial transplantations and/or assessing their transcriptional age).

      We thank the reviewer for the insightful observations regarding the quiescence of transplanted cultured HSCs. We appreciate the opportunity to clarify the experimental design and its implications, particularly in the context of HSC aging.

      The primary aim of comparing cKit-enriched BM cells with cultured cells was to investigate whether ex vivo activated HSCs exhibit a proliferation pattern similar to that of in vivo quiescent HSCs post-transplantation. This comparison was crucial for evaluating the similarity between in vitro cultured and "unmanipulated" HSC behavior. While we acknowledge the technical challenge of transplanting equivalent HSC doses between groups due to differing membrane markers, our study design focused on assessing stem cell activity post-culture. This was quantitatively evaluated by calculating the repopulating units (detailed in Table 1 and Fig S4G), rather than through a limiting dilution assay. A large body of literature demonstrates the correlation between these assays, although the limiting dilution assay is of course designed to provide a more exact output.
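      For readers less familiar with repopulating units (RUs), the calculation can be sketched as follows. This is a minimal illustration that assumes the widely used competitive repopulation formulation, RU = (% donor x C) / (100 - % donor), where C is the number of competitor RUs and one RU is assigned per 10^5 competitor BM cells. The function name and example values are hypothetical; the values actually obtained are those reported in Table 1 and Fig S4G.

```python
def repopulating_units(donor_chimerism_pct: float, competitor_cells: float) -> float:
    """Estimate donor repopulating units (RU) from competitive chimerism.

    Assumes 1 RU per 1e5 competitor BM cells (a common convention,
    stated here as an assumption rather than taken from the manuscript).
    """
    if not 0.0 <= donor_chimerism_pct < 100.0:
        raise ValueError("donor_chimerism_pct must be in [0, 100)")
    if competitor_cells <= 0:
        raise ValueError("competitor_cells must be positive")
    competitor_ru = competitor_cells / 1e5
    return donor_chimerism_pct * competitor_ru / (100.0 - donor_chimerism_pct)


# Hypothetical example: 80% donor chimerism against 5 million competitor cells.
print(repopulating_units(80.0, 5e6))
```

      Because the estimate rises steeply as donor chimerism approaches 100%, it is most informative at intermediate chimerism levels, which is one reason a limiting dilution assay gives a more exact output.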

      Regarding the intriguing aspect of HSC aging in the context of ex vivo expansion, our observations indicate that both the subfraction of ex vivo expanded cells (Fig 3 and Fig S3) and the entire cultured population (Fig 4B, Fig 5B, Fig S4A, and Fig S5B) maintain long-term multilineage reconstitution capacity post-transplantation. This suggests that the PVA-culture system does not lead to apparent signs of "HSC aging," despite the cells undergoing active self-renewal in vitro. This is further supported by our serial transplantation experiments, where cultured cells continued to demonstrate multilineage capacity rather than any evident myeloid-biased reconstitution 16 weeks post-second transplantation (see Author response image 4 below).

      Author response image 4.

      Serial transplantation behavior of ex vivo expanded HSCs. 5 million whole BM cells from primary transplantation were transplanted together with 5 million competitor whole BM cells. The control group was transplanted with 100 cHSCs freshly isolated from BM for the primary transplantation. A Mann-Whitney test was applied, and asterisks indicate significant differences: *, p < 0.05; **, p < 0.01; ***, p < 0.0001. Error bars denote SEM.

      However, we recognize the complexity of defining HSC aging and the potential for the culture system to influence certain aspects of this process. The association of aging signature genes with HSC primitiveness and young signature genes with differentiation presents an interesting dichotomy. Our analysis of a native dataset on young mice and the projection of aged signatures onto our multiome data (shown below for a set of genes known to be induced at higher levels in aged HSCs, e.g. Wahlestedt et al., Nature Comm 2017; aging scRNAseq data from PMID: 36581635) do not directly indicate that the culture system promotes HSC aging compared to aged Lin-Sca+Kit+ cells. Yet, we do not rule out the possibility that culturing may influence other facets of the HSC aging process.

      In conclusion, while our current data do not provide direct evidence of induced HSC aging through the culture system, this remains a compelling area for future research. The potential impact of ex vivo culture on aspects of the HSC aging process warrants further exploration, and we appreciate your suggestion in this regard.

      Author response image 5.

      No evident signs of "molecular aging" following ex vivo expansion of HSCs. Young and aged scRNAseq data from PMID: 36581635 were integrated and explored from the perspective of known genes associated with HSC aging. The top row depicts the contribution of young and aged cells to the UMAPs (two left plots), cell cycle scores of the cells, and the expression of EPCR and CD48 as example markers for primitive and more differentiated cells, respectively. The expression of the HSC aging-associated genes Wwtr1, Cavin2, Ghr, Clu and Aldh1a1 was then assessed in these data as well as in the SAILERX UMAP of cultured HSCs (bottom row).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Zhang and colleagues characterise the behaviour of mouse hematopoietic stem cells when cultured in PVA conditions, a recently published method for HSC expansion (Wilkinson et al., Nature, 2019), using multiome analysis (scRNA-seq and scATACseq in the same single cell) and extensive transplantation experiments. The latter are performed in several settings including barcoding and avoiding recipient conditioning. Collectively the authors identify several interesting properties of these cultures namely: 1) only very few cells within these cultures have long-term repopulation capacity, many others, however, have progenitor properties that can rescue mice from lethal myeloablation; 2) single-cell characterisation by combined scRNAseq and scATACseq is not sufficient to identify cells with repopulation capacity; 3) expanded HSCs can be engrafted in unconditioned host and return to quiescence.

      The authors also confirm previous studies that EPCRhigh HSCs have better reconstitution capability than EPCRlow HSCs when transplanted.

      Strengths:

      The major strength of this manuscript is that it describes how functional HSCs are expanded in PVA cultures to a deeper extent than what has been done in the original publication. The authors are also mindful of considering the complexities of interpreting transplantation data. As these PVA cultures become more widely used by the HSC community, this manuscript is valuable as it provides a better understanding of the model and its limitations.

      Novelty aspects include:

      • The authors determined that small numbers of expanded HSCs enable transplantation into non-conditioned syngeneic recipients.

      • This is to my knowledge the first report characterising the output of PVA cultures by multiome. This could be a very useful resource for the field.

      • They are also the first to my knowledge to use barcoding to quantify HSC repopulation capacity at the clonal level after PVA culture.

      • It is also useful to report that HSCs isolated from fetal livers do expand less than their adult counterparts in these PVA cultures.

      Weaknesses:

      • The analysis of the multiome experiment is limited. The authors do not discuss what cell types, other than functional or phenotypic HSCs are present in these cultures (are they mostly progenitors or bona fide mature cells?) and no quantifications are provided.

      The primary objective of our manuscript was to characterize the features of HSCs expanded from ex vivo culture. In this context, our analysis of the single cell multiome sequencing data was predominantly centered on elucidating the heterogeneity of cultures, along with subsequent in vivo functional analysis. This focus is reflected in our comparisons between the molecular features of ex vivo cultured candidate HSCs (cHSCs) and "fresh/unmanipulated" HSCs, as illustrated in Figures 2D-E of our manuscript.

      Our findings provide substantial evidence that ex vivo expanded cells share significant similarities with HSCs isolated from the BM in terms of molecular features, differentiation potential, heterogeneity, and in vivo stem cell activity/function. This suggests that the ex vivo culture system closely mimics several aspects of the in vivo environment, thereby broadening the potential applications of this system for HSC research.

      Regarding the presence of other cell types in the cultures, it is important to note that most cells did not express mature lineage markers, suggesting their immature status. However, we acknowledge the presence of some mature lineage marker-positive cells within the cultures. These cells are represented by the endpoints in our SAILERX UMAP, indicating a progression from immature to more differentiated states within the culture system.

      While the main emphasis of our study was on HSCs, we understand the importance of acknowledging and briefly discussing the presence and characteristics of other cell types in the cultures. This aspect provides a more comprehensive understanding of the culture system and its impact on cellular heterogeneity, although it was for the most part beyond the scope of our studies.

      • Barcoding experiments are technically elegant but do not bring particularly novel insights.

      We respectfully disagree with the view that our barcoding experiments do not offer novel insights. We believe that the application of barcoding technology in our study represents a significant advancement over previous methods, both in terms of quantitative rigor and ethical considerations.

      In the foundational work by Wilkinson et al., clonal assessments were indeed performed, but these were limited in scope and largely served as proof of concept. Our use of barcoding technology, on the other hand, allowed for a comprehensive quantitative assessment of the expansion potential of HSC clones. This technology enabled us to rigorously quantify the number of HSC clones capable of undergoing at least three self-renewing divisions (e.g. those clones present in 5 separate animals), while also revealing the heterogeneity in their expansion potential.
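      The reasoning behind the "at least three self-renewing divisions" estimate can be sketched in a few lines: a clone recovered in five separate recipients must have contributed at least five HSCs, and one cell needs at least three symmetric divisions to yield five or more daughters (2^2 = 4 is too few, 2^3 = 8 suffices). The code below is an illustration under that assumption; the animal identifiers and barcode names are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Iterable, Mapping, Set


def min_self_renewing_divisions(n_recipients: int) -> int:
    """Smallest k such that 2**k >= n_recipients (symmetric divisions only)."""
    if n_recipients < 1:
        raise ValueError("n_recipients must be >= 1")
    return (n_recipients - 1).bit_length()


def barcodes_shared_by_all(barcodes_per_animal: Mapping[str, Iterable[str]]) -> Set[str]:
    """Return the barcodes detected in every recipient animal."""
    # Convert to sets so a barcode is counted once per animal.
    counts = Counter(bc for bcs in barcodes_per_animal.values() for bc in set(bcs))
    n_animals = len(barcodes_per_animal)
    return {bc for bc, seen_in in counts.items() if seen_in == n_animals}


# Hypothetical barcode calls for five recipients.
calls = {
    "mouse1": ["bcA", "bcB", "bcC"],
    "mouse2": ["bcA", "bcC"],
    "mouse3": ["bcA", "bcB"],
    "mouse4": ["bcA", "bcD"],
    "mouse5": ["bcA", "bcC"],
}
print(barcodes_shared_by_all(calls), min_self_renewing_divisions(len(calls)))
```

      In this toy example only one barcode is present in all five animals, so only that clone would be counted as having undergone at least three self-renewing divisions.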

      One alternative approach could have been to culture single HSCs and distribute the progeny among multiple mice for analysis. However, when considering the sheer number of mice that would be required for such an experiment for quantitative assessments, it becomes evident that viral barcoding is a far superior method. Not only does it provide a more efficient and scalable approach to assessing clonal expansion, but it also significantly reduces the number of animals required for the study, aligning with the principles of ethical research and animal welfare.

      In conclusion, we assert that the barcoding experiments conducted in our study are not only technically robust but also yield novel quantitative insights into the dynamics of HSC clones within expansion cultures. These insights have value not only for current research but also hold potential implications for future applications.

      • The number of mice analysed in certain experiments is fairly low (Figures 1 and 5).

      We would like to clarify our approach in the context of the 3R (replacement, refinement, and reduction) policy, which guides ethical considerations in animal research.

      In alignment with the 3R principles, our study was designed to minimize the use of experimental animals wherever possible. For most experiments, including those presented in Figures 1 and 5, we adopted a standard of using five mice per group. Based on the effect sizes we observed, we concluded that this sample size was appropriate for most parts of our study.

      Specifically for Figure 5, we used two animals per time point, totaling seven animals per treatment group. It is important to note that we did not monitor the same animals over time but used different animals at each time point, as mice had to be sacrificed for the type of analyses conducted. Despite the seemingly small sample size, the results we obtained were remarkably consistent across groups. This consistency provided strong evidence that ex vivo activated HSCs return to a more quiescent state after being transplanted into unconditioned recipients. Given the clear and consistent nature of these results, we determined that including more animals for the purpose of additional statistical analysis was not necessary.

      Our approach reflects a balance between adhering to ethical standards in animal research and ensuring the scientific validity and reliability of our findings. We believe that the sample sizes chosen for our experiments are justified by the consistent and significant results we obtained, which contribute meaningfully to our understanding of HSC behavior post-transplantation.

      • The manuscript remains largely descriptive. While the data can be used to make useful recommendations to future users working with PVA cultures and in general with HSCs, those recommendations could be more clearly spelled out in the discussion.

      We fully agree that many aspects of our study are indeed descriptive, which is reflective of the exploratory and foundational nature of this type of research.

      We have strived to provide clear and direct recommendations for researchers interested in utilizing the PVA culture system, which we believe are evident throughout our manuscript:

      1) Utility of Viral Delivery in HSC Research: Our research, particularly through the use of barcoding experiments, underscores the effectiveness of viral delivery methods in HSC studies. While barcoding itself is a significant tool, it is the underlying process of viral delivery that truly exemplifies the potential of this approach. Our work shows that the culture system is highly conducive to maintaining HSC activity, which is critical for genetic manipulation. This is evident not only in our current study but also in our previous work that included transient delivery methods (Eldeeb et al., Cell Reports 2023).

      2) Non-conditioned transplantation: Our findings suggest that non-conditioned transplantation can be a valuable method in studying both normal and malignant hematopoiesis. This approach can complement genetic lineage tracing models, providing a more native and physiological context for hematopoietic research. We state this explicitly in our discussion.

      3) Integration with recent technical advances: The combination of the PVA culture system with recent developments in transplantation biology, genome engineering, and single-cell technologies holds significant promise. This integration is likely to yield exciting discoveries with relevance to both basic and clinically oriented hematopoietic research. This is the end statement of our discussion.

      While our manuscript is in a way tailored to those with experience in HSC research, we have made a concerted effort to ensure that the content is accessible and informative to a broader audience, including those less familiar with this area of study. Our intention is to provide a resource that is both informative for experts in the field and approachable for newcomers.

      • The authors should also provide a discussion of the other publications that have used these methods to date.

      We would like to clarify that the scope of literature on the specific methods we employed, particularly in the context of our research objectives, is not extensive. Most of the existing references on these methods come from a relatively narrow range of research groups. In preparing our manuscript, we tried to be comprehensive yet selective in our citations to maintain focus and relevance. Our referencing strategy was guided by the aim to include literature that was most directly pertinent to our study's methodologies and findings.

      Overall, the authors succeeded in providing a useful set of experiments to better interpret what type of HSCs are expanded in PVA cultures. More in-depth mining of their bioinformatic data (by the authors or other groups) is likely to highlight other interesting/relevant aspects of HSC biology in relation to this expansion methodology.

      We are grateful for the overall positive assessment of our work and the recognition of its contributions to understanding HSC expansion in PVA cultures.

      We agree that every study, including ours, has its limitations, particularly regarding the scope and depth of exploration. It is challenging to cover every aspect comprehensively in a single study. Our research aimed to provide a foundational understanding of HSCs in PVA cultures, and we are pleased that this goal appears to have been met.

      We also concur with your point on the potential for further in-depth mining of our bioinformatic data. Our hope is that this data can serve as a resource (or at least a starting point) for other investigators.

      In conclusion, we hope that our responses have adequately addressed your queries and clarified any concerns. We are committed to contributing to the growth of knowledge in HSC research and look forward to the advancements that our study might enable, both within our team and the wider scientific community.

      Reviewer #1 (Recommendations For The Authors):

      1) In Line 150, the R packages can/should be mentioned just in the method section;

      We have moved this text to the methods section.

      2) In Figure F3C adding a legend next to the plot would assist the reader in identifying which populations are referred to, as the same color palette is used for other panels;

      We have now adjusted the position of the figure legend to make it clearer for the reader.

      3) In Figure 4D, for the pre-culture experiments 1000 cHSCs were used and then in the post-culture 1200 cHSCs were used. Can the authors justify the different numbers?

      The decision to use 1000 cHSCs in the pre-culture experiments and 1200 cHSCs in the post-culture experiments was not based on a specific rationale favoring one cell number over the other. In our Methods section, we have detailed our experimental design, which was structured to provide robust and reliable readouts of HSC behavior and characteristics in different conditions.

      We consider the two cell numbers – 1000 and 1200 – to be quite similar in the context of our experimental aims. Since the readouts here are based on clonal assessments, this slight difference in cell numbers is unlikely to significantly impact the overall conclusions drawn from these experiments. The primary focus of our study was on qualitative aspects of HSC behavior and function, rather than on quantitative differences that might arise from small variations in initial cell numbers.

      4) In SF5F it would help readers if a line plot (per group) was also shown together with the dot plots. Moreover, applying statistics to the trend lines (Wilcoxon, for example) would strengthen the argument that cHSCs divide less than control cells.

      We would like to clarify that the data presented in SF5F were derived from different animals at each respective time point. As such, the data points at each time point represent independent measurements from separate animals, rather than a continuous measurement from the same set of animals over time. Therefore, creating a line plot that connects each time point within a group would inadvertently convey a misleading impression of a longitudinal study on the same animals, which is not reflective of the actual experimental design. Instead, the dot plot format was chosen as it more accurately depicts the independent and discrete nature of the measurements at each time point. Our current data presentation method was selected to provide the most accurate and transparent representation of our findings.

      Reviewer #2 (Recommendations For The Authors):

      Listed below are recommendations to further improve this manuscript:

      Major Comments

      1) Fig 1: the authors showed that EPCRhigh HSCs have better reconstitution capability than EPCRlow HSCs via bone marrow transplantation. Additionally, mice receiving cultured EPCRhigh SLAM LSK cells were more efficiently radioprotected than those receiving PVA expanded EPCRlow SLAM LSK.

      a. In addition to Fig.1F, authors should show the lineage distributions and chimerism of mice receiving cultured EPCRhigh and EPCRlow SLAM LSK respectively.

      We have indeed analyzed the lineage distribution in these experiments, and our findings indicate no statistically significant differences between the groups (see graph in Author response image 6). This suggests that the cultured EPCRhigh and EPCRlow SLAM LSK cells do not preferentially differentiate into specific lineages in a way that would impact the overall interpretation of our results.

      Author response image 6.

      Regarding the chimerism in peripheral blood (PB) lineages, Fig. 1F in our manuscript currently shows the PB myeloid chimerism. We chose to focus on this parameter as it most directly relates to our study's objectives. In these experiments we did not co-transplant competitor cells, and in most cases the chimerism levels reached 100% for lineages other than T cells (T cells being more radioresistant). Based on our analysis, including data on chimerism in other PB lineages would not significantly enhance the understanding of the functional capacity of the transplanted cells, as the myeloid chimerism data already provide a robust indicator of their engraftment and functional potential.

      We believe that our current presentation of data in Fig. 1F, along with the additional analyses provided in the results section, offers a comprehensive understanding of the behavior and potential of the cultured EPCRhigh and EPCRlow SLAM LSK cells.

      b. Fig1F: only 5 mice were used in each group. Could this result occur by chance? Testing with Fisher's exact test with the data provided results in p=0.16. The authors should consider adding more animals or adding the p-value above (or from another relevant test) for readers' consideration.

      We acknowledge the point that only five mice were used in each group and understand the concern regarding the robustness of our findings.

      As correctly noted, applying Fisher's exact test to the data in Fig. 1F results in a p-value that does not reach the conventional threshold for statistical significance. However, one might also consider the analysis of the Kaplan-Meier (KM) survival curve, which was associated with a p-value of 0.0528 (Fig. 1F, left graph below; Gehan-Breslow-Wilcoxon test). A similar test on the single-cell culture transplantation experiment (Fig. 1E, right graph below) demonstrated statistical significance (p-value = 0.0485).

      While these p-values meet (or are very close to) the conventional criteria for statistical significance (p<0.05), we have chosen to place greater emphasis on effect sizes rather than strictly on p-values. This decision is based on our belief that effect sizes provide a more direct and meaningful measure of the biological impact observed in our experiments. We find that the effect sizes observed are compelling and consistent with the overall narrative of our study.
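      To illustrate why a two-by-two comparison with five animals per group has limited power, Fisher's exact test can be computed directly from the hypergeometric distribution. The following is a minimal sketch using only the Python standard library. The counts in the example are hypothetical and chosen purely for illustration; they are not asserted to be the counts underlying Fig. 1F.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical outcome: 5 of 5 animals survive in one group, 2 of 5 in the other.
print(round(fisher_exact_two_sided(5, 0, 2, 3), 3))
```

      Even a marked difference between groups, as in this hypothetical table, does not reach p < 0.05 with five animals per group, which is why we consider effect sizes and the survival analysis alongside this test.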

      Author response image 5.

      2) The characterisation of the multiome experiment is highly underdeveloped.

      a. From an experimental point of view, it is not clear how the PVA culture for this experiment was started. Are there technical/biological replicates? Have several PVA cultures been pooled together?

      We have included these details in the revised text to ensure a comprehensive understanding of our experimental setup.

      b. Fig2B: The authors should present more data as to how each of the clusters was annotated (bubble plot of marker genes used for annotation?) and importantly the percentage of cells in each of the clusters. It is particularly relevant to note what % is the cluster annotated as HSCs and compare that to the % of phenotypic HSCs and the % repopulating HSCs calculated in the transplantation experiments.

      In our study, the annotation of clusters was primarily based on reference genes for cell types from prior works in the field, such as from our recent work (Konturek-Ciesla et al., Cell Reports 2023). Additionally, we employed transcription factor (TF) motifs to assign identities to these clusters. This approach is relatively standard in the field, and we believe it provides a robust framework for our analysis. We included information on some of the key TF motifs used to guide our annotations.

      Regarding the assignment of a percentage to cells within the HSC cluster, we initially had reservations about the utility of this measure. This is because the transcriptional identity of HSCs might not align precisely with their identity based on candidate HSC protein markers. There are complexities related to transcriptional continuums that could influence the interpretation of such data. However, acknowledging your request for this information, we have now included the percentage of cells in the HSC cluster in Fig. 2B for reference.

      We also wish to highlight that when isolating EPCR+ cells, which encompass a range of CD48 expression, clustering becomes much less distinct, as shown in Fig. 2E. Most of these cells do not demonstrate long-term functional HSC activity in a transplantation setting (as presented in Figure 3). This observation underscores the challenges in deducing HSC identity based solely on molecular data and reinforces the importance of functional validation.

      c. Are there any mature cells in these PVA cultures? The annotations presented in the table under the UMAP are vague: Are cluster 4 monocytes or monocytes progenitors? Same for clusters 0,1 and 7 - are these progenitors or more mature cells? How were HPCs (cluster 3) distinguished from cHSCs (cluster 5)?

      We agree with your observation that the annotations for certain clusters, such as clusters 4, 0, 1, and 7, as well as the distinction between HPCs (cluster 3) and cHSCs (cluster 5), appear vague. This vagueness to some extent stems from the challenges inherent in comparing cultured cells to their counterparts isolated directly from animals. Most reference data defining cell types are derived from cells in their native state, and less is known about how these definitions translate to the progeny of HSPCs cultured in vitro.

      In our study, we used the expression of reference genes and enriched transcription factor motifs to annotate clusters. This method, while useful, has its limitations in precisely defining the maturation stage of cells in culture. The enrichment of lineage-defining factors at the ends of the UMAP suggests the presence of more mature cells, whereas the lack of lineage marker expression in the majority of cells implies a general lack of terminal differentiation.

      This issue is not necessarily unique to the culture situation, as similar challenges in cell type annotation are encountered in other contexts, such as the analysis of granulocyte-macrophage progenitors in bone marrow, where a vast range of cell types and clusters are identified (e.g., PMID: 26627738). To try to address these challenges, we employed an approach detailed in the methods section under the header "iv. ATAC processing and cluster annotation." We assessed marker genes for clusters using Enrichr for cell types, relying on databases designed to provide gene expression identities to defined cell types. This methodology informed our references to the clusters.

      In summary, while our annotations provide a general overview of the cell types present in the cultures, we acknowledge the complexities and limitations in precisely defining these types, particularly in distinguishing between progenitors and more mature cells. We hope this explanation clarifies our approach and the considerations behind our cluster annotations, but at the same time feel that the alternative approaches have their own drawbacks.

      d. What is the meaning of the trajectories presented in Figure 2C? In the absence of a comparison to i) what is observed either when HSCs are cultured in control/non-expanding conditions ii) an in vivo landscape of differentiation in mouse bone marrow; this analysis does not bring any relevant piece of information.

      We understand the perspective on comparisons to control conditions and in vivo differentiation landscapes. However, we respectfully disagree with the viewpoint that the analysis that we have performed does not bring relevant information.

      The trajectory analysis in Figure 2C is intended to provide insights into the cell types generated in our PVA cultures and the potential differentiation pathways they may follow. This kind of analysis is particularly valuable in the context of understanding how in vitro cultures can support HSC maintenance and differentiation, which is a topic of significant interest in the field. For instance, studies like PMID: 31974159 have highlighted the importance of combining in vitro HSC cultures with molecular investigations.

      While we acknowledge that our analysis would benefit from a direct comparison to control or non-expanding conditions, as well as to an in vivo differentiation landscape, we believe that the information provided by our current analysis still holds substantial value. It offers a glimpse into the possible cellular dynamics and differentiation routes within our culture system, which can be a valuable reference point for other investigators working with similar systems.

      Regarding the confidence in computed differentiation trajectories, we recognize that this is an area where caution is warranted. Computational approaches to define cell differentiation pathways have inherent limitations and should be interpreted within the context of their assumptions and the data available. This challenge is not unique to our work but is a broader issue in the field of computational biology.

      In conclusion, while we agree that additional comparative analyses could further enrich our findings, we maintain that the trajectory analysis presented in Figure 2C contributes meaningful insights into cell differentiation in our PVA culture system. We believe these insights are of interest and value to researchers exploring the complex interplay of HSC maintenance and differentiation in vitro.

      3) The addition of barcoding experiments is appreciated. However, it is already known that upon transplantation clonal output is highly heterogeneous, with a small number of clones predominating over others. This is particularly the case after myeloablation conditioning.

      a. The "pre-culture" experimental design makes sense. The "post-culture" one is however ambiguous in terms of result interpretation. The authors observe fewer clones contributing to a large proportion of the graft (>5%) than in the "pre-culture" setting. Their interpretation is that expanded HSCs are functionally more homogeneous than the input HSCs. However, in the pre-culture experiment, there are 19 days of expansion during which there will be selection pressures over culture plus ongoing differentiation. In the post-culture experiment, there is no time for such pressures to be exerted. Therefore the conclusion drawn by the authors is not the only conclusion. I would encourage the authors to compare the "pre-culture" experiment to an experiment in which cHSCs are in culture for 48h, then barcoded, and then transplanted. This would be much more informative and would allow a proper comparison of expanded HSCs vs input HSCs.

      We understand the perspective that a shorter culture period would reduce the influence of selection pressures and differentiation, potentially allowing for a more direct comparison between expanded HSCs and input HSCs. However, we would like to point out that similar experiments have been conducted in the past, as referenced in our work (PMID: 28224997) and others (PMID: 21964413). These studies have demonstrated a significant heterogeneity in the reconstituting clones when barcoding is done early and cells are transplanted directly.

      In light of previous research, we are confident that our methodology — tracking the fates of candidate HSC clones throughout the culture period and assessing the outcomes of individual cells from these expanding clones — yields significant and pertinent insights. We want to highlight the significance of barcoding cells late in the culture, a strategy that allows us to barcode cells that have already been subjected to potential selection pressures within the culture environment. Our primary objective is to investigate the effects of these selection pressures on the subsequent in vivo behavior of the cells that emerge from this process. By focusing on this aspect, we aim to deepen the understanding of how in vitro culture conditions influence the functional characteristics and heterogeneity of HSCs after expansion. We believe this approach provides a unique perspective on the adaptive changes HSCs undergo during culture and their implications for transplantation efficacy and HSC biology. Our study thus addresses a critical question in the field: how do the conditions and selection pressures inherent to in vitro culture impact the quality and behavior of HSCs upon their return to an in vivo environment?

      b. Another experiment the authors may consider is barcoding in unconditioned recipients as there the bottleneck of selecting specific clones should be lower. In addition, this could nicely complement the return to quiescence observed in Figure 5 (see point below)

      We agree that this experiment could provide valuable insights, particularly in understanding how different selection pressures might affect HSC clones in various transplantation contexts. It would indeed be a worthwhile complement to our observations in Figure 5 regarding the return to quiescence of HSCs post-transplantation.

      However, we would like to point out that our study already includes a substantial amount of data and analyses aimed at addressing specific research questions within this defined scope. The addition of an experiment with barcoding in unconditioned recipients, while undoubtedly relevant and interesting, would extend beyond the boundaries we set for this particular study.

      4) Figure 5D-F, only 2 animals per condition were tested, so the experiment is underpowered for any statistics. How about cell viability of cHSC after in vitro culture? The authors have also not tested whether there is a difference in cell viability post-transplant between EE100 and control. In addition, comparing cell cycle profiles of donor EPCR+ HSCs in these transplanted mice would provide additional evidence to support the conclusion.

      Regarding the sample size, we acknowledge that only two animals per condition were used in these experiments, which limits the statistical power for robust quantitative analysis. This decision was guided by ethical considerations to minimize animal use, in line with the 3Rs principle (Replacement, Reduction, Refinement). Despite the small sample size, we believe that the strong trends observed in these experiments are indicative and consistent with our broader findings, although we recognize the limitations in terms of statistical generalization. At the same time, as we have written in the public response: "Specifically for Figure 5, we used two animals per time point, totaling seven animals per treatment group. It is important to note that we did not monitor the same animals over time but used different animals at each time point, as mice had to be sacrificed for the type of analyses conducted."

      In the context of post-transplant analysis, conducting separate viability assessments on transplanted cells is not typically informative. This is because non-viable cells would naturally be eliminated through biological processes such as phagocytosis soon after transplantation. Therefore, any post-transplant viability analysis would not provide meaningful insights into the engraftment potential or behavior of the transplanted cells.

      However, it is important to note that in all our cell isolation and analysis protocols, we routinely include viability markers. This practice ensures that the cell populations we study and report on are indeed viable. Including these markers is a standard part of our methodology and contributes to the accuracy and reliability of our data.

      Regarding the comparison of cell cycle profiles, we chose to focus on the cell trace assay as a means to monitor and track cell division history, which directly addresses the central theme here - informing on the proliferation and quiescence dynamics of transplanted HSCs. While comparing cell cycle profiles could perhaps offer an additional layer of information, we did not deem it essential for our core objectives.

      5) Several publications have used these PVA cultures and made comments on their strengths and limitations. They do not overlap with this study but should be discussed here for completeness (for example Che et al, Cell Reports, 2022; Becker et al., Cell Stem Cell, 2023; Igarashi, Blood Advances, 2023).

      See comments to reviewer 1.

      Minor Comments

      Figure 1C: should add in the legend that this is in peripheral blood.

      Figure 2C: typo in the title.

      Figure 3A: typo in "equivalent".

      We thank the reviewer for catching these errors, which we have now corrected.

      Figure 3B and 3C: symbol colours of EPCRhighCD48+ and EPCR- are too similar to distinguish the 2 groups easily. We highly recommend using contrasting colours.

      For easier visualization, we have changed the symbol types and colors in our revised version.

      Fig3B and S3A-B: authors should show statistical significance in comparing the 4 fractions.

      We have now added this information.

      In the discussion, the authors rightly point out a paper that described EPCR+ HSCs. There are other papers that also looked at EPCR intensity (high vs low), for example, Umemoto et al., EMBO J, 2022.

      While we acknowledge the relevance of the paper you mentioned, we faced constraints in the number of references we could include. Therefore, we prioritized citing the original demonstration of EPCR as an HSC marker, particularly focusing on the work by the Mulligan laboratory, which established that cells expressing the highest levels of EPCR exhibit the most potent HSC activity. We believe this reference most directly supports the core focus of our study and provides the necessary context for our findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Xia et al. investigated the mechanisms underlying Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). The authors observed that abnormal osteogenesis and adipogenesis are associated with decreased β-catenin in the necrotic femoral head of GONFH patients, and that the inhibition of β-catenin signalling leads to abnormal osteogenesis and adipogenesis in GONFH rats. Of interest, the deletion of β-catenin in Col2-expressing cells rather than in Osx-expressing cells leads to a GONFH-like phenotype in the femoral head of mice.

      Strengths:

      A strength of the study is that it sets up a Col2-expressing cell-specific β-catenin knockout mouse model that mimics the full spectrum of osteonecrosis phenotype of GONFH. This is interesting and provides new insights into the understanding of GONFH. Overall, the data are solid and support their conclusions.

      Reviewer #1 (Recommendations For The Authors):

      1) Fig. 1I should be quantified and presented as bar graphs to make it consistent with other data, and the significance should be shown.

      Reply: Thanks for your comments. We have provided the quantitative bar graph in the new version.

      2) Fig. 2H, beta-catenin, ALP and FABP4 should be labeled below the X axis. Moreover, the pattern of Fig. 2H is different from other bar graphs and the dots for individual samples are missing, so I could not judge the N values for the experiments. N values should also be provided for Fig. 3.

      Reply: Thanks for your comments. We have added the labels for beta-catenin, ALP and FABP4 below the X axis in Fig. 2H. The style of the quantitative bar graphs was changed to show the N values in each experiment.

      3) Fig. 4 shows the fate mapping of Col2+ cells and Osx+ cells in the femoral head. In this regard, the authors presented images for Col2-expressing cells at all the indicated time points, i.e. 1, 3, 6, and 9 months, but only presented images for Osx-expressing cells for 1 month while those for 3, 6, and 9 months are missing.

      Reply: Thanks for your comments. Here, we show that the expression pattern of Osx+ cells in the femoral head was entirely different from that of Col2+ cells at 3 and 6 months of age, further indicating that they are two different progenitor lineages.

      Author response image 1.

      4) Some experiments may need to be described in more detail, e.g., ABH/Orange G staining, biomechanical testing, μCT analysis, etc.

      Reply: Thanks for your comments. We have provided more information on the experimental procedures.

      5) This study proposed that Col2-expressing cells play a key role in the progression of GONFH, did the authors use Col2+ cells for the in vitro experiments?

      Reply: Because in vitro experiments cannot reflect the location of Col2-expressing cells in the femoral head, we applied an in vivo lineage tracing approach instead. With lineage tracing for as long as 9 months, we showed the self-renewal ability and osteogenic commitment of Col2+ cells, as well as the change in their spatial distribution in the femoral head with age. Conditional knockout of β-catenin caused Col2+ cells to trans-differentiate into adipogenic cells instead of osteogenic cells, which directly clarifies the mechanism by which Col2+ cells lead to a GONFH-like phenotype in mice.

      6) A few typo errors, such as Line 13, "contribute" should be "contributes"; Line 118, "reveled" should be "revealed".

      Reply: We have corrected these errors in the new manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors reported a study to uncover that β-catenin inhibition disrupting the homeostasis of osteogenic/adipogenic differentiation contributes to the development of Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). In this study, they first observed abnormal osteogenesis and adipogenesis associated with decreased β-catenin in the necrotic femoral head of GONFH patients, but the exact pathological mechanisms of GONFH remain unknown. They then performed in vivo and in vitro studies to further reveal that glucocorticoid exposure disrupted osteogenic/adipogenic differentiation of bone marrow stromal cells (BMSCs) by inhibiting β-catenin signaling in glucocorticoid-induced GONFH rats, and specific deletion of β-catenin in Col2+ cells shifted BMSCs commitment from osteoblasts to adipocytes, leading to a full spectrum of disease phenotype of GONFH in adult mice.

      Strengths:

      This innovative study provides strong evidence supporting that β-catenin inhibition disrupts the homeostasis of osteogenic/adipogenic differentiation that contributes to the development of GONFH. This study also identifies an ideal genetically modified mouse model of GONFH. Overall, the experiment is logically designed, the figures are clear, and the data generated from humans and animals is abundant supporting their conclusions.

      Weaknesses:

      There is a lack of discussion to explain how the Wnt agonist 1 works. There are several types of Wnt ligands. It is not clear if this agonist only targets Wnt1 or other Wnts as well. Also, why Wnt agonist 1 couldn't rescue the GONFH-like phenotype in β-cateninCol2ER mice needs to be discussed.

      Reply: Thanks for your constructive comments. Wnt agonist 1 is a cell-permeable activator of the Wnt signaling pathway that induces transcriptional activity dependent on β-catenin (PMID: 25514428, 18624906). In the present study, we aimed to demonstrate that activation of β-catenin signaling could alleviate the phenotype of rat GONFH; thus, only the expression of β-catenin and its downstream targets (RUNX2, ALP, PPAR-γ, FABP4) was examined after Wnt agonist 1 intervention. Conditional knockout of β-catenin in Col2+ cells led to a GONFH-like phenotype in mice. Wnt agonist 1 could not rescue this GONFH-like phenotype, as it did not activate β-catenin signaling. We have discussed these points in the new version.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors are trying to delineate the mechanism underlying the osteonecrosis of the femoral head.

      Strengths:

      The authors provided compelling in vivo and in vitro data to demonstrate Col2+ cells and Osx+ cells were differentially expressed in the femoral head. Moreover, inducible knockout of β-catenin in Col2+ cells but not Osx+ cells led to a GONFH-like phenotype including fat accumulation, subchondral bone destruction, and femoral head collapse, indicating that imbalance of osteogenic/adipogenic differentiation of Col2+ cells plays an important role in GONFH pathogenesis. Therefore, this manuscript provided mechanistic insights into osteonecrosis as well as potential therapeutic targets for disease treatment.

      Weaknesses:

      However, additional in-depth discussion regarding the phenotype observed in mice is highly encouraged.

      Reply: Thanks for your comments. Inducible knockout of β-catenin in Col2+ cells but not Osx+ cells led to a GONFH-like phenotype. Lineage tracing data showed that Col2+ cells and Osx+ cells are different cell populations, and we have discussed the potential mechanism underlying the different phenotypes of β-cateninCol2ER mice and β-cateninOsxER mice.

      1) Why did the authors use dexamethasone in the cellular experiments but methylprednisolone to induce the GONFH rat model?

      Reply: Thanks for the comments. Here, we applied a dexamethasone (DEX)-treated BMSC model in vitro and a methylprednisolone (MPS)-induced rat model in vivo for the GONFH study, based on the published literature (PMID: 37317020, 29662787, 29512684, 35126710, 32835568).

      2) Both bone damage and fat accumulation were observed in 3-month-old and 6-month-old β-cateninCol2ER mice, but the femoral head collapse (the feature of GONFH at the late stage) only occurred in the older β-cateninCol2ER mice. This interesting observation needs to be discussed.

      Reply: Thanks for the comments. Poor mechanical support caused by bone damage is the key to femoral head collapse. Despite similar trabecular bone loss and fat accumulation in the 3-month-old and 6-month-old β-cateninCol2ER mice, the older mice also presented extensive subchondral bone destruction. Intact subchondral bone provides good mechanical support for femoral head morphology; therefore, femoral head collapse occurred only in the older β-cateninCol2ER mice.

      3) In the Materials and Methods, detailed information on the reagents should be provided.

      Reply: We have provided detailed information on the important reagents.

      4) As shown in Figure 4, β-cateninOsxER mice at 3 months of age did not show differences in lipid droplet area and empty lacunae rate, but there was a decrease in bone area. The authors should at least provide some necessary discussion of this phenomenon.

      Reply: Thanks for your comments. In the present study, we found few lipid droplets and empty lacunae but a significant decrease in bone mass in the femoral heads of β-cateninOsxER mice. Previous studies showed that specific knockout of β-catenin in Osx-expressing cells promoted osteoclast formation and activity, leading to bone mass loss (PMID: 29124436, 34973494). We have discussed this phenomenon in the new version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editors for their constructive and critical comments/suggestions regarding our paper. We have since extensively revised the manuscript accordingly, including the addition of new experimental data. We hope the readers, reviewers, and editors are now satisfied with the quality and significance of the revised paper.

      Our responses to the eLife assessment and the reviewers’ comments, as well as the details of the revisions, are described below.

      Wang et al present a useful manuscript that builds modestly on the group's previous publication on KLF1 (EKLF) K74R mice focused on understanding how Eklf mutation confers anticancer and longevity advantages in vivo (Shyu et al., Adv Sci (Weinh). 2022). The data demonstrates that Eklf (K74R) imparts these advantages in a background, age, and gender independent manner, not the consequence of the specific amino acid substitution, and transferable by BMT. However, the authors overstate the meaning of these results and the strength of evidence is incomplete, since only a melanoma model of cancer is used, it is unclear why only homozygous mutation is needed when only a small fraction of cells during BMT confer benefit, they do not show EKLF expression in any cells analyzed, and the PD-1 and PDL-1 experiments are not conclusive. The definitive mechanism relative to the prior publication from this group on this topic remains unclear.

      The issues raised in the editor's assessment of our paper were also brought up by the reviewers. We have addressed them by carrying out new experiments and by rewriting the paper to highlight the rationale and novel aspects of the current study, as described below in our responses to the three reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors Wang et al. present a study of a mouse model K74R that they claim can extend the life span of mice, and also has some anti-cancer properties. Importantly, this mechanism seems to be mediated by the hematopoietic system, and protective effects can be transferred with bone marrow transplantation.

      The authors need to be more specific in the title and abstract as to what is actually novel in this manuscript (a single tumor model), and what relies on previously published data (lifespan). Because many of these claims derive from previously published data, and the current manuscript is an extension of previously published work. The authors need to be more specific as to the actual data they present (they only use the B16 melanoma model) and the actual novelty of this manuscript.

      Especially experiments on life span are published and not sufficiently addressed in this actual paper, as the title would suggest.

      It is indeed important to point out the novelty of this paper in comparison to the previous paper. First, we have modified the title, the abstract, and the text to emphasize that the extended lifespan as well as the tumor resistance could be transferred from Eklf(K74R) mice to WT mice by a single transplantation of Eklf(K74R) bone marrow mononuclear cells (BMT) into the WT mice at a young age (2 months).

      We now also provide several new experimental data, including data demonstrating that Eklf(K74R) mice are resistant to tumorigenesis of hepatocellular carcinoma as well (new Fig. 1E). These points are elaborated in more detail below in our responses to the reviewers’ comments/suggestions.

      Reviewer #2 (Public Review):

      The manuscript by Wang et al. follows up on the group's previous publication on KLF1 (EKLF) K74R mice and reduced susceptibility to tumorigenesis and increased life span (Shyu et al., Adv Sci (Weinh). Sep 2022;9(25):e2201409. doi:10.1002/advs.202201409). In the current manuscript, the authors have described the dependence of these phenotypes on age, gender, genetic background, and hematopoietic translation of bone marrow mononuclear cells. Considering the current study is centered on the phenotypes described in the previous study, the novelty is diminished. Further, there are significant conceptual concerns in the study that make the inferences in the manuscript far less convincing. Major concerns are listed below:

      1) The authors mention more than once in the manuscript that KLF1 is expressed in a range of blood cells including hematopoietic stem cells, megakaryocytes, T cells and NK cells. In the case of megakaryocytes, studies from multiple labs have shown that while EKLF is expressed in megakaryocyte-erythroid progenitors, EKLF is important for the bipotential lineage decision of these progenitors, and its high expression promotes erythropoiesis, while its expression is antagonized during megakaryopoiesis. In the case of HSCs, the authors refer to their previous publication for KLF1's expression in these cells; however, neither in that study nor in the current study is there a western blot documented to convincingly show that KLF1 protein is expressed at detectable levels in these cells. For T cells, the authors have referenced a study which is based on ectopic expression of KLF1. For NK cells, the authors reference bioGPS: however, upon inspection, this is also questionable.

      2) The current study rests on the premise that KLF1 is expressed in HSCs, NK cells and leukocytes, and the references cited are not sufficient to make this assumption, for the reasons mentioned in the first point. Therefore, the authors will have to show both KLF1 mRNA and protein levels in these cells, and also compare them to the expression levels seen in KLF1 wild type erythroid cells along with knockout erythroid cells as controls, for context and specificity.

      Regarding the novelty of the current study: besides demonstrating that the healthy longevity characteristics, as exemplified by the tumor resistance, are independent of age, gender, and genetic background, another novel finding is that these characteristics, in particular the tumor resistance and extended lifespan, could be transferred by a one-time, long-term transplantation of Eklf(K74R) bone marrow mononuclear cells from young Eklf(K74R) mice to young WT mice. Also, since submission of the last version of the paper, we have carried out new experiments, including the characterization of the anti-cancer capability of NK cells (new Fig. 6) and an assay of the resistance of Eklf(K74R) mice to hepatocellular carcinoma (new Fig. 1E).

      We have also modified the title, Abstract, and different parts of the text to highlight the novelties of the current study.

      As to the expression of EKLF in different hematopoietic blood cell types, we have now added a paragraph in the Results (p.6 and p.7) describing what is known in the literature in relation to the data presented in the paper. Importantly, following the reviewer’s comments, we have since carried out Western blot analysis of EKLF expression in NK, T, and B cells (p.6, p.7 and new Fig. S4B). Note also that the level of EKLF in B cells is very low and could only be detected by RT-qPCR (Fig. S4C) and RNA-Seq (Bio-GPS database).

      3) To get to the mechanism driving the reduced susceptibility to tumorigenesis and increased life span phenotypes in EKLF K74R mice, the authors report some observations- However, how these observations are connected to the phenotypes is unclear.

      a. For example, in Figure S3, they report that the frequency of NK1.1+ cells is higher in the mutant mice. The significance of this in relation to EKLF expression in these cells and the tumorigenesis and life span related phenotypes are not described. Again, as mentioned in the second point, KLF1 protein levels are not shown in these cells.

      b. In Figure 4, the authors show mRNA levels of immune check point genes, PD-1 and PD-l1 are lower in EKLF K74R mice in PB, CD3+ T cells and B220+ B cells. Again, the questions remain on how these genes are regulated by EKLF, and whether and at what levels EKLF protein is expressed in T cells and B cells relative to erythroid cells. Further, while the study they reference for EKLF's role in T cells is based on ectopic expression of EKLF in CD4+ T cells, in the current study, CD3+ T cells are used. Also, there are no references for the status of EKLF in B cells. These details are not discussed in the manuscript.

      Regarding this part of the reviewer's questions and comments:

      First, we have since assayed the effect of the K74R substitution of EKLF on the in vitro cancer cell-killing ability of NK cells (termed NK1.1 cells in the previous version). The data showed that NK(K74R) cells have a higher cancer cell-killing ability than WT NK cells (new Fig. 6). This property, together with the higher level of NK(K74R) cells in 24-month-old Eklf(K74R) mice than of NK cells in 24-month-old WT mice, would contribute to the higher tumor resistance of the Eklf(K74R) mice. This point is also addressed on p.8 and p.9.

      Second, as stated in previous sections, we have since carried out comparative Western blot analysis of the expression of EKLF protein in NK, CD3 T, and B cells of the WT and Eklf(K74R) mice, respectively (please see the new Fig. S4B). Also, a description of what is known in the literature in relation to our data on the expression of EKLF protein/Eklf mRNA in different types of hematopoietic blood cells is now included in the Results (please see p.6 and p.7). Notably, though, the level of EKLF protein in B cells was too low to be detected by WB (Fig. S4B).

      4) The authors perform comparative proteomics in the leukocytes of EKLF K74R and WT mice as shown in Figure S5. What is the status of EKLF levels in the mutant lysate vs wild type lysates based on this analysis? More clarity needs to be provided on what cells were used for this analysis and how they were isolated since leukocytes is a very broad term.

      The leukocytes we used were isolated from the peripheral blood after removal of red blood cells, as described in the Materials and Methods.

      Also, the Western blot analysis of EKLF expression in the lysates of leukocytes/white blood cells (WBC) was shown previously and is now presented in the new Figure S4A.

      5) In the discussion the authors make broad inferences that go beyond the data shown in the manuscript. They mention that the tumorigenesis resistance and long lifespan is most likely due to changes in transcription regulatory properties and changes in global gene expression profile of the mutant protein relative to WT leukocytes. And based on reduced mRNA levels of Pd-1 Pd-l1 genes in the CD3+ T cells and B220+ B cells from mutant mice, they "assert" that EKLF is an upstream regulator of these genes and regulates the transcriptomes of a diverse range of hematopoietic cells. The lack of a ChIP assay to show binding of WT EKLF on genes in these cells and whether this binding is reduced or abolished in the mutant cells, make the above statements unsubstantiated.

      We have since carried out ChIP-PCR analysis of EKLF binding to the Pd-1 promoter (new Fig. S5). The data showed that EKLF was bound to the CACCC box at -103 of the promoter in WT CD3+ T cells as well as in CD3+ T(K74R) cells. This result is discussed on p.7.

      6) Where westerns are shown, the authors need to show the molecular weight ladder, and where qPCR data are shown for EKLF, it will be helpful to show the absolute levels and compare these levels to those in erythroid cells, along the corresponding EKLF knock out cells as controls.

      We have since included the molecular weight markers by the side of Western blots in Fig. S4. Also, we have added a new figure (Fig. S4C) comparing the expression levels of Eklf mRNA in B cells and CD3+ T cells with those in mouse erythroleukemia (MEL) cells, as analyzed by RT-qPCR.

      Also, as now indicated in the Materials and Methods section, the specificity of the primers used for RT-qPCR quantitation of mouse Eklf mRNA has been validated before by comparative analysis of wild type and EKLF-knockout mouse erythroid cells (Hung et al., IJMS, 2020).

      7) Figure S1D does not have a figure legend. Therefore, it is unclear what the blot in this figure is showing. In the text of the manuscript where they reference this figure, they mention that the levels of the mutant EKLF vs WT EKLF do not change in peripheral blood, while in the figure they have labeled WBCs for the blot, and the mRNA levels shown do seem to decrease in the mutant compared to WT peripheral blood.

      We apologize for this oversight on our side. The data shown in the original Fig. S1D (new Fig. S4A) are from Western blot analysis of EKLF protein and RT-qPCR analysis of Eklf mRNA in leukocytes/white blood cells (WBC) isolated from the peripheral blood samples. We have now added back the figure legend and also rewritten the corresponding description in the text on p.6.

      Reviewer #3 (Public Review):

      Hung et al provide a well-written manuscript focused on understanding how Eklf mutation confers anticancer and longevity advantages in vivo. The work is fundamental and the data is convincing although several details remain incompletely elucidated. The major strengths of the manuscript include the clarity of the effect and the appropriate controls. For instance, the authors query whether Eklf (K74R) imparts these advantages in a background, age, and gender dependent manner, demonstrating that the findings are independent. In addition, the authors demonstrate that the effect is not the consequence of the specific amino acid substitution, with a similar effect on anticancer activity. Furthermore, the authors provide some evidence that PD-1 and PDL-1 are altered in Eklf (K74R) mice.

      We thank the reviewer for these encouraging comments.

      Finally, they demonstrate that the effects are transferrable with BMT. Several weaknesses are also evident. For instance, only melanoma is tested as a model of cancer such that a broad claim of "anti-cancer activity" may be somewhat of an overreach.

      We have now included new data showing that the Eklf(K74R) mice also show a higher anti-cancer capability against hepatocellular carcinoma than the WT mice (new Fig. 1E).

      It is also unclear why a homozygous mutation is needed when only a small fraction of cells during BMT can confer benefit. It is also difficult to explain how transplanted donor Eklf (K74R) HSCs confer anti-melanoma effect 7 and 14 days after BMT.

      First, these two observations do not necessarily conflict with each other. It is likely that homozygosity, but not heterozygosity, of the K74R substitution in EKLF allows one or more types of hematopoietic blood cells to gain new functions, e.g. the higher cancer cell-killing capability of NK(K74R) cells (new Fig. 6), that help the mice to live long and healthy. Also, the data in Fig. 2D indicated that as few as 20% of the blood cells carrying homozygous Eklf(K74R) alleles in the recipient mice upon BMT could be sufficient to confer on the mice a higher anti-cancer capability, likely in part due to cells such as NK(K74R). These points are now clarified in Discussion (p.9 and p.10).

      Second, we think the NK(K74R) cells contributed a significant part to the anti-cancer capability of the transplanted Eklf(K74R) blood in the recipient WT mice. As documented in some literature, e.g. Ferreira et al., Journal of Molecular Medicine (2019), the hematopoietic lineage of the NK cells would be fully reconstituted as early as 2 weeks after BMT. Of course, there could be other still unknown factors/cells that also contribute to the tumor-resistance of the recipient mice at 7 days following BMT. This point is now touched upon on p.8 and p.9.

      Furthermore, it would be useful to see whether there are virulence marker alterations in the melanoma loci in WT vs Eklf (K74R) mice.

      As stated in our response to the Public Reviews, we will analyze this in the future together with other types of tumors in a separate study.

      Finally, the data in Fig 4c is difficult to interpret as decreased PD-1 and PDL-1 after knockdown of EKLF in vitro is not a useful experiment to corroborate how mutation without changing EKLF expression impacts immune cells. The work is impactful as it provides evidence that healthspan and lifespan may be modulated by specific hematological mutation but the mechanism by which this occurs is not completely elucidated by this work.

      As described in a previous section, we have since also carried out ChIP-qPCR analysis of the binding of WT EKLF and EKLF (K74R) on the Pd-1 promoter (new Fig. S5).

      Reviewer #1 (Recommendations For The Authors):

      The authors present interesting melanoma model data but need to tone down their claim of multiple effects of their model system. It needs to be clear what is new and what is previously known.

      As stated in our response to the Public Reviews, we have since added new data on the tumor resistance of the Eklf(K74R) mice to hepatocellular carcinoma (new Fig. 1E). We have also modified the title as well as highlighted the novel points in the Abstract and text of the revised draft.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the major concerns listed in the public review, the minor concerns that the authors could address are listed below:

      1) It would be helpful to describe why the pulmonary melanoma focus assay was chosen as the metastasis assay.

      We now describe on p. 4 the rationale behind the initial choice of this assay for analysis of the anti-cancer capability of the Eklf(K74R) mice. Also, we have since included data from an experiment using the subcutaneous cancer cell inoculation assay for comparative analysis of the anti-hepatocellular carcinoma capability of Eklf(K74R) and WT mice (Fig. 1E and p.5).

      2) Reference #61 for B16-F10-luc cells cited in the methods does not have details on the generation of these cells. What these cells are and why this model was chosen needs to be described.

      We apologize for not providing this information before. We now describe the generation of B16-F10-luc cells in the Materials and Methods section (p.13). The rationale for choosing the B16-F10 cells for the pulmonary foci assay is also added on p.4.

      3) The DNA binding consensus site for EKLF needs to be expanded in the introduction.

      This part has been taken care of now on p.13.

      Reviewer #3 (Recommendations For The Authors):

      Hung et al provide a well-written manuscript focused on understanding how Eklf mutation confers anticancer and longevity advantages in vivo. The work is fundamental and the data is convincing although several details remain incompletely elucidated.

      1) Only melanoma is tested as a model of cancer such that a broad claim of "anti-cancer activity" may be somewhat of an overreach. The authors, therefore, need to provide evidence of a second type of malignancy to which Eklf mutation confers anticancer and longevity advantages or temper the claims in the discussion that the effect still needs to be tested in non-melanoma cancer models to determine the broad anti-cancer effect.

      As stated in our response to the Public Reviews, we have since shown that Eklf(K74R) mice also exhibited a higher resistance to the carcinogenesis of hepatocellular carcinoma (new Fig. 1E).

      2) Why is a homozygous mutation needed when only a small fraction of cells during BMT can confer benefit of Eklf mutation? Is there evidence that the cellular effect is binary but only a few such cells are needed? This is confusing and requires further clarification.

      As stated in our response to the Public Reviews, these two observations do not necessarily conflict with each other. It is likely that homozygosity, but not heterozygosity, of the K74R substitution in EKLF allows one or more types of hematopoietic blood cells to gain new functions, e.g. the higher cancer cell-killing capability of NK(K74R) cells (new Fig. 6), that help the mice to live long and healthy. Also, the data in Fig. 2D indicated that as few as 20% of the blood cells carrying homozygous Eklf(K74R) alleles in the recipient mice upon BMT could be sufficient to confer on the mice a higher anti-cancer capability, likely in part due to cells such as NK(K74R). This point is now clarified in Discussion (p.9).

      3) BMT typically requires at least 3-4 weeks to reconstitute the marrow compartment but the authors are able to see effects of Eklf mutation as early as 7 days following BMT. This is surprising and brings into question the mechanism of effect.

      As stated in our response to the Public Reviews, we think the NK(K74R) cells contributed a significant part to the anti-cancer capability of the transplanted Eklf(K74R) blood in the recipient WT mice. As documented in some literature, e.g. Ferreira et al., Journal of Molecular Medicine (2019), the hematopoietic lineage of the NK cells would be fully reconstituted as early as 2 weeks after BMT. Of course, there could be other still unknown factors/cells that also contribute to the tumor-resistance of the recipient mice at 7 days following BMT (please see discussion of this point on p. 9).

      4) It would be useful to see whether there are virulence marker alterations in the melanoma loci in WT vs Eklf (K74R) mice.

      As stated in our response to the Public Reviews, we will analyze this in the future together with other types of tumors in a separate study.

      5) The data in Fig 4c is difficult to interpret as decreased PD-1 and PDL-1 after knockdown of EKLF in vitro is not a useful experiment to corroborate how mutation WITHOUT changing EKLF expression impacts immune cells.

      Indeed, the RNAi knockdown experiment only demonstrated a positive regulatory role of EKLF in Pd-1/Pd-l1 gene expression. We have followed the reviewer’s suggestion and carried out ChIP-qPCR analysis, which showed that the factor is bound on the Pd-1 promoter in both WT CD3+T cells and CD3+T(K74R) cells (new Fig. S5). We briefly discuss these data on p.7 in relation to the possible effect of the K74R substitution of EKLF on Pd-1 expression.

      We have now further clarified this point on p. 7.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on the very nice structure! In my opinion, which you can feel free to take or leave, this would work better as a short report focused on the improvement of the structure relative to the current published model. To my mind, while the functional and dimerization studies are supportive of the cryo-EM studies (specifically, the purified protein is functional, and does tend to dimerize in various membrane mimetics), these experiments don't provide a lot of new mechanistic insight on their own. The dimerization, in particular, could be developed further.

      Response: Thank you for the comments. We have chosen to stick with the current article format. That the protein is dimeric is exciting in our view and we are working to further define the functional significance of this formation.

      Reviewer #2 (Recommendations For The Authors):

      Ln 48. Abstract. "highlighting feature of the complex interface" sounds a bit vague. I was wondering if the authors considered including more specific findings here.

      Response: This sentence has been removed.

      Ln 149 and elsewhere. The authors refer to the previously published structure of HiSiaQM as "low resolution". It may just be me and likely not the intention of the authors, but this comes across as an attempt to diminish the validity of this previous work from another group, which is not necessary. I would recommend rewording these parts slightly, even if it is just to say "lower resolution" instead of "low resolution".

      Response: It was not our intention to diminish the excellent work published by another group; we have changed “low resolution” to “lower resolution” throughout.

      Ln 160. The authors state that the inward-open conformation is likely "the resting state of the transporter". I think this statement should be modified slightly to acknowledge that this is only true under these conditions, i.e. in the absence of the bilayer, membrane potential and chemical gradients.

      Response: We have edited this as follows “That we observe the inward-open conformation without either a bound P-subunit or fiducial marker, suggests that this is the resting state of the transporter under experimental conditions (in the absence of a membrane bilayer, membrane potential and chemical gradients).”

      Ln 202. I'm not convinced that the use of the word "probable" is appropriate here; "possible" would likely fit better in the absence of compelling evidence that this dimer forms in a bacterial cell membrane with physiological levels of HiSiaQM expression.

      Response: We have changed “probable” to “possible”.

      The authors show an SEC trace for DDM solubilised protein, which is a single peak, whereas the LMNG extracted protein has 2 distinctly different elution profiles depending on the LMNG concentration. Was the same phenomenon observed when varying the DDM concentration?

      Response: We observed significantly more aggregation with DDM than L-MNG, so it was infrequently used and considerably less well characterised. In one purification, moderately higher DDM shifted the elution peak to be slightly later but retained a similar profile. Overall, we did not observe the same phenomenon of distinctly different elution profiles with DDM, but we have limited data.

      Ln 245. The two positions cited as important for the elevator-type mechanism are the fusion helix and the dimer interface. However, there is no evidence that the dimer interface observed in this work has any relevance to the transport mechanism. To make this statement, the interface would need to be disrupted and the effects on transport evaluated.

      Response: This has been edited as follows. “Evident in our cryo-EM maps are well-defined phospholipid densities associated with areas of HiSiaQM that may be important for the function of an elevator-type mechanism (Figure 4), but require further testing.”

      Ln 257. The authors state that the lipids form "specific and strong interactions" with the protein, but without knowing the identity of the lipids present, it is difficult to say anything about the specificity of this interaction. I think the authors could consider rewording this.

      Response: We have edited this by removing the term “specific” and describing the lipid interactions only as strong interactions.

      Ln 270. The authors identify a lipid-binding site and residues that likely interact with the headgroup. It would be interesting if the authors could speculate on the purpose of this lipid binding site and how it could affect transport. The residues are not conserved, which the authors suggest reflects the variety of lipid compositions in different bacteria. Are the authors suggesting that this lipid binding site is a general feature for all fused TRAP transporters and that the identity of the lipid changes depending on the species?

      Response: Yes, we speculate that the lipid binding site may be a general feature for fused TRAP transporters. We have added speculation about this binding site, specifically that “the fusion helix and concomitant lipid molecule may provide a more structurally rigid scaffold than a Q-M heterodimer, i.e., PpSiaQM, although how this impacts the elevator transition requires further testing” at Line 283.

      Though we believe that a binding pocket is likely found in a number of fused TRAPs (based on sequence and Alphafold predictions, e.g., FnSiaQM and AaSiaQM), we have now acknowledged that some fusions may not necessarily bind a lipid molecule here, by stating “While this binding pocket is likely found in a number of fused TRAPs (based on sequence predictions, e.g., FnSiaQM and AaSiaQM in Supplementary Figure 8), it is not clear whether they also bind lipids here without experimental data” at Line 290.

      Ln 306. The authors state that the HiSiaPQM has a 10-fold higher transport activity than PpSiaPQM. Unless the transport assays were performed in parallel (to mitigate small changes in experimental set-up) and the reconstitution efficiency for each proteoliposome preparation was carefully analysed, it is very difficult for this to be a meaningful comparison. Even if the amount of protein incorporated into the proteoliposomes is quantified (e.g. by evaluating protein band intensity when the proteoliposomes are analysed using SDS-PAGE), this does not account for an inactive protein that was incorporated, nor the proportion of the protein that was incorporated in the inside-out orientation, which would be functionally silent in these assays. I'm not suggesting these assays actually need to be performed, but I think the text should be modified to reflect what can actually be compared.

      Response: We agree with the reviewer that a meaningful comparison is difficult to make without a careful analysis of the reconstitution efficiency and have modified the text to reflect this. We have altered the paragraph beginning at Line 319 to the following: “The fused HiSiaPQM system appears to have a higher transport activity than the non-fused PpSiaPQM system. With the same experimental setup used for PpSiaPQM (5 µM Neu5Ac, 50 µM SiaP) (33), the accumulation of [3H]-Neu5Ac by the fused HiSiaPQM is ~10-fold greater. Although this difference may reflect the reconstitution efficiency of each proteoliposome preparation, it is possible that it has evolved as a result of the origins of each transporter system—P. profundum is a deep-sea bacterium and as such the transporter is required to be functional at low temperatures and high pressures… ”

      Ln 335. "S298A did not show an effect on growth when mutated to alanine previously." Suggest changing "S298A" here to "S298".

      Response: This has been changed.

      Ln 340. In addition to PpSiaQM, the large cavity was also presumably observed in the lower resolution structure of HiSiaQM?

      Response: The cavity is detectable in the lower resolution structure (7qe5), though very poorly defined by the density. Furthermore, the AlphaFold model fitted to this density has positioned sidechains inside the cavity, which we consider very likely to be an error (in comparison to our structures, VcINDY and our estimates of the volume required to house sialic acid). The cavity is generally much better defined by the structures we have referenced.

      Ln 345. Reference missing after "previously reported"?

      Response: This has been added.

      Measuring the affinity for the P-to-QM interaction is very useful, but it would have enhanced the study if some of the residues identified as important for this interaction (detailed on p.13) had been tested for their contributions to binding using this approach.

      Response: We do aim to perform this assay with these mutants in the future, but are also developing parallel assays to further test this interaction in different membrane mimetics.

      Ln 436. As stated previously, it is more accurate to say that "this is the most stable conformation" under these conditions.

      Response: We have edited this to say “The ‘elevator down’ (inward-facing) conformation is preferred in experimental conditions”. We have also changed the last sentence of this paragraph to say “However, the dimeric structures we have presented have no other proteins bound, yet exist stably in the elevator down state, suggesting this is the most stable conformation in experimental conditions, where there is no membrane bilayer, membrane potential, or chemical gradient present.”

      Ln 438. "Lipids associated with HiSiaQM are structurally and mechanistically important." This conclusion is not supported by the data presented; there is no evidence that the bound lipids influence the mechanism at all. The lipids observed are certainly interestingly placed and one could speculate about their relevance, but this statement of fact is not supported. Therefore, their importance to the mechanism needs to be tested or this conclusion needs to be substantially softened.

      Response: We have softened this statement by changing it to “Lipids have strong interactions with HiSiaQM and are likely to be important for the transport mechanism.”

      Reviewer #3 (Recommendations For The Authors):

      The fact that HiSiaQM samples consist of a mixture of compact monomer and dimer is clear, from Fig. S5 and S6. However, the analysis displayed in Fig 3 and Fig S4 would require more explanation. To my understanding, it requires the values of the sedimentation and diffusion coefficients. It could be good to provide the experimental values of D, and explain a little more about the method in the material and method section.

      Response: Yes, the analysis requires the experimental diffusion coefficients. These have been added to the Figure 3 and S4 legends and more detail has been added to the method section.

      In addition, I am puzzled when reading, in the legend of Fig 3, considerations that peak 2 could not correspond to a monomer or trimer: do these sentences correspond to other mathematical solutions, or is a given frictional ratio considered, or do they refer to Fig. S5 analysis?

      Response: We can see where this confusion could arise. These sentences do not correspond to a given frictional ratio or to the Fig. S5 analysis (this is a separate, complementary analysis). The exclusion of a monomer for peak 2 rests on a strictly physical justification: with pure protein and an observed peak smaller than peak 2, peak 2 cannot be a monomer. The exclusion of a trimer for peak 2 is a mathematical solution using the s and D coefficients. The solutions show that a trimer could exist at those s and D values only with an unreasonably low amount of bound detergent (32 molecules for L-MNG or 0 for DDM), so we have ruled the trimer out. Reassuringly, the complementary analysis in Fig. S5/S6 agrees with the monomer-dimer outputs from the s and D analysis. We have adjusted the text in the legends of Fig. 3 and S4 to better convey these points.
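
      As an illustration of the kind of s and D calculation described above, the following is a minimal sketch, not the authors' analysis. It assumes the Svedberg relation for the buoyant molar mass of a protein-detergent complex, M_b = sRT/D, and splits M_b into a protein term and a detergent term to estimate how many detergent molecules each candidate oligomeric state would require. The function names, the protomer mass, the partial specific volumes and the s and D inputs are all hypothetical placeholders rather than values from the manuscript.

```python
R = 8.314462618  # gas constant, J/(mol*K)


def buoyant_molar_mass(s_svedberg, d_cm2_per_s, temp_k=293.15):
    """Svedberg relation M_b = s*R*T/D, returned in g/mol.

    s is given in Svedberg units (1 S = 1e-13 s) and D in cm^2/s.
    """
    s = s_svedberg * 1e-13      # seconds
    d = d_cm2_per_s * 1e-4      # m^2/s
    return s * R * temp_k / d * 1000.0  # kg/mol -> g/mol


def bound_detergent(mb, n_protomers, protomer_mass, vbar_protein,
                    det_mass, vbar_det, rho=1.0):
    """Detergent molecules needed to account for the buoyant mass mb.

    Assumes mb = n*M_p*(1 - vbar_p*rho) + n_det*M_det*(1 - vbar_det*rho),
    with masses in g/mol, partial specific volumes in mL/g and the solvent
    density rho in g/mL. A negative or near-zero result means the assumed
    oligomeric state is too heavy to be consistent with s and D.
    """
    protein_term = n_protomers * protomer_mass * (1.0 - vbar_protein * rho)
    return (mb - protein_term) / (det_mass * (1.0 - vbar_det * rho))


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    mb = buoyant_molar_mass(s_svedberg=7.0, d_cm2_per_s=4.0e-7)
    for n in (1, 2, 3):
        n_det = bound_detergent(mb, n, protomer_mass=70000.0,
                                vbar_protein=0.73,
                                det_mass=1005.0, vbar_det=0.80)
        print(f"{n}-mer: about {n_det:.0f} detergent molecules")
```

      In a sketch like this, an oligomeric state is rejected when the implied detergent count is implausibly small for a membrane protein in a micelle, which is the style of argument used above to exclude the trimer.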

    1. Author Response

      eLife assessment

      This useful study uses a mouse model of pancreatic cancer to examine mitochondrial mass and structure in atrophying muscle along with aspects of mitochondrial metabolism in the same tissue. Most relevant are the solid transcriptomics and proteomics approaches to map out related changes in gene expression networks in muscle during cancer cachexia.

      Response: We very much appreciate the positive feedback from the editors on our article and are delighted to have it published in eLife. Our sincere thanks to the Reviewers for their positive feedback on our work, and for their insightful and constructive comments.

      Reviewer #1 (Public Review):

      Summary:

      This important study provides a comprehensive evaluation of skeletal muscle mitochondrial function and remodeling in a genetically engineered mouse model of pancreatic cancer cachexia. The study builds upon and extends previous findings that implicate mitochondrial defects in the pathophysiology of cancer cachexia. The authors demonstrate that while the total quantity of mitochondria from skeletal muscles of mice with pancreatic cancer cachexia is similar to controls, mitochondria were elongated with disorganized cristae, and had reduced oxidative capacity. The mitochondrial dysfunction was not associated with exercise-induced metabolic stress (insufficient ATP production), suggesting compensation by glycolysis or other metabolic pathways. However, mitochondrial dysfunction can lead to increased production of ROS/oxidative stress and would be expected to interfere with carbohydrate and lipid metabolism, events that are linked to cancer-induced muscle loss. The data are convincing and were collected and analyzed using state-of-the-art techniques, with unbiased proteomics and transcriptomics analyses supporting most of their conclusions.

      Additional Strengths:

      The authors utilize a genetically engineered mouse model of pancreatic cancer which recapitulates key aspects of human PDAC including the development of cachexia, making the model highly appropriate and translational.

      The authors perform transcriptomic and proteomics analyses on the same tissue, providing a comprehensive analysis of the transcriptional networks and protein networks changed in the context of PDAC cachexia.

      Weaknesses:

      The authors refer to skeletal muscle wasting induced by PDAC as sarcopenia. However, the term sarcopenia is typically reserved for the loss of skeletal muscle mass associated with aging.

      Response: We agree that the term sarcopenia initially refers to aged muscle, but its use has spread to other fields, including oncology (for example, in this article, which we quote: Mintziras I et al. Sarcopenia and sarcopenic obesity are significantly associated with poorer overall survival in patients with pancreatic cancer: Systematic review and meta-analysis. Int J Surg 2018;59:19-26). Actually, the term sarcopenia is now widely used in the literature and in the clinic to describe the loss of muscle mass and strength in cancer patients (see for example, this recent review: Papadopetraki A. et al. The Role of Exercise in Cancer-Related Sarcopenia and Sarcopenic Obesity. Cancers 2023;15;5856).

      In Figure 2, the MuRF1 IHC staining appears localized to the extracellular space surrounding blood vessels and myofibers-which causes concern as to the specificity of the antibody staining. MuRF1, as a muscle-specific E3 ubiquitin ligase that degrades myofibrillar proteins, would be expected to be expressed in the cytosol of muscle fibers.

      Response: We agree that MuRF1 IHC staining was also observed in the extracellular space, which was a surprise and for which we have no explanation to date.

      Disruptions to skeletal muscle metabolism in PDAC mice are predicted based on mitochondrial dysfunction and the transcriptomic and proteomics data. The manuscript could therefore be strengthened by additional measures looking at skeletal muscle metabolites, or linking the findings to previous work that has looked at the skeletal muscle metabolome in related models of PDAC cachexia (Neyroud et al., 2023).

      Response: We agree that our omics data could be strengthened by additional measures looking at skeletal muscle metabolites. It is an excellent suggestion to compare the transcriptomic and proteomic data we obtained on the gastrocnemius muscle with the metabolomic data obtained by Neyroud et al. on the same muscle. These authors used a different mouse model of PDAC from our KIC GEMM model, namely the allograft model in which KPC cells (derived from the pancreatic tumor of KPC mice, another PDAC GEMM model) are implanted into syngeneic recipient mice. They carried out a proteomic study on the tibialis anterior muscle and a metabolomic study on the gastrocnemius muscle. Proteomics data identified in particular a KPC-induced reduction in the relative abundance of proteins annotating to oxidative phosphorylation, consistent with our data showing reduced mitochondrial activity pathways. Metabolomic data showed reduced abundance of many amino acids, as expected, and of intermediates of the mitochondrial TCA cycle (malate and fumarate) in KPC-atrophied muscle, consistent with the reduced mitochondrial metabolic pathways that we illustrated. In contrast, metabolites that were increased in abundance included those related to oxidative stress and redox homeostasis, which is not surprising given the profound oxidative stress affecting atrophied muscle. Finally, we noted in Neyroud's metabolomic data the dysregulation of certain lipids and nucleotides in atrophied muscle, which is very interesting to relate to our study describing alterations in lipid and nucleotide metabolic pathways.

      Reviewer #2 (Public Review):

      The present work analyzed the mitochondrial function and bioenergetics in the context of cancer cachexia induced by pancreatic cancer (PDAC). The authors used the KIC transgenic mice that spontaneously develop PDAC within 9-11 weeks of age. They deeply characterize bioenergetics in living mice by magnetic resonance (MR) and mitochondrial function/morphology mainly by oxygraphy and imaging on ex vivo muscles. By MR they found that phosphocreatine resynthesis and maximal oxidative capacity were reduced in the gastrocnemius muscle of tumor-bearing mice during the recovery phase after 6 minutes of 1 Hz electrical stimulation while pH was reduced in muscle during the stimulation time. By oxygraphy, the authors showed a decrease in basal respiration, proton leak, and maximal respiration in tumor-bearing mice that was associated with the decrease of complex I, II, and IV activity, a reduction of OXPHOS proteins, mitochondrial mass, mtDNA, and to several morphological alterations of mitochondrial shape. The authors performed transcriptomic and proteomic analyses to get insights into mitochondrial defects in the muscles of PDAC mice. By IPA analyses on transcriptomics, they found an increase in the signature of protein degradation, atrophy, and glycolysis and a downregulation of muscle function. Focusing on mitochondria they showed a downregulation mainly in OXPHOS, TCA cycle, and mitochondrial dynamics genes and upregulation of glycolysis, ROS defense, mitophagy, and amino acid metabolism. IPA analysis on proteomics revealed major changes in muscle contraction and metabolic pathways related to lipids, protein, nucleotide, and DNA metabolism. Focusing on mitochondria, the protein changes mainly were related to OXPHOS, TCA cycle, translation, and amino acid metabolism.

      The major strength of the paper is the bioenergetics and mitochondrial characterization associated with the transcriptomic and proteomic analyses in PDAC mice that confirmed some published data of mitochondrial dysfunction but underlined some novel metabolic insights such as nucleotide metabolism.

      There are minor weaknesses related to some analyses on mitochondrial proteins and to the fact that proteomic and transcriptomic comparison may be problematic in catabolic conditions because some gene expression is required to maintain or re-establish enzymes/proteins that are destroyed by the proteolytic systems (including the autophagy proteins and ubiquitin ligases). The authors should consider the following points.

      Point 1. The authors used the name sarcopenia as synonymous with muscle atrophy. However, sarcopenia clearly defines the disease state (disease code: ICD-10-CM (M62.84)) of excessive muscle loss and force drop during ageing (Ref: Anker SD et al. J Cachexia Sarcopenia Muscle 2016 Dec;7(5):512-514.). Therefore, the word sarcopenia must be used only when pathological age-related muscle loss is the subject of study. Sarcopenia can be present in cancer patients who also experience cachexia, however since the age of tumor-bearing mice in this study is 7-9 weeks old, the authors should refrain from using sarcopenia and instead replace it with the words muscle atrophy/ muscle wasting/muscle loss.

      Response: This issue has also been raised by Reviewer #1. We agree that the term sarcopenia historically refers to aged muscle, but it is also used in oncology (for example, in this article, which we quote: Mintziras I et al. Sarcopenia and sarcopenic obesity are significantly associated with poorer overall survival in patients with pancreatic cancer: Systematic review and meta-analysis. Int J Surg 2018;59:19-26). Actually, the term sarcopenia is now widely used in the literature and in the clinic to describe the loss of muscle mass and strength in cancer patients (see for example, this recent review: Papadopetraki A. et al. The Role of Exercise in Cancer-Related Sarcopenia and Sarcopenic Obesity. Cancers 2023;15;5856).

      Point 2. Most of the analyses of mitochondrial function are appropriate. However, the methodological approach to determining mitochondrial fusion and fission machinery shown in Fig. 5F is wrong. The correct way is to normalize the OPA1, MFn1/2 on mitochondrial proteins such as VDAC/porin. In fact, by loading the same amount of total protein (see actin in panel 5F) the difference between a normal and a muscle with enhanced protein breakdown is lost. In fact, we should expect a decrease in actin level in tumor-bearing mice with muscle atrophy while the blots clearly show the same level due to the normalization of protein content. Moreover, by loading the same amount of proteins in the gel, the atrophying muscle lysates become enriched in the proteins/organelles that are less affected by the proteolysis resulting in an artefactual increase. The correct way should be to lyse the whole muscle of control and tumor-bearing mice in an identical volume and to load in western blot the same volume between control cachectic muscles. Alternatively, the relative abundance of mitochondrial shaping proteins related to mitochondrial transmembrane or matrix proteins (mito mass) should compensate for the loading normalization. Because the authors showed elongated mitochondria despite mitophagy genes being up, fragmentation may be altered. Moreover, DNM1l gene is suppressed and therefore DRP1 protein must be analyzed. Finally, OPA 1 protein has different isoforms due to the action of proteases like OMA1, and YME1L that elicit different functions being the long one pro-fusion while the short ones do not. The authors must quantify the long and short isoforms of OPA1.

      Response: We acknowledge that our analysis of a minor set of proteins involved in mitochondrial dynamics by Western blotting (Figure 5F) is basic and could have been improved. We thank the Reviewer for all the suggestions, which will be very useful in future projects studying the subject in greater depth and according to the molecular characteristics of each player in mitochondrial fusion, fission, mitophagy and biogenesis.
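
      The loading argument raised in Point 2 can be illustrated with a short arithmetic sketch. All quantities below are hypothetical (arbitrary units chosen for illustration, not values from the study), and the sketch assumes that atrophy removes only non-mitochondrial protein while OPA1 and VDAC contents stay constant.

```python
LOADED_UG = 20.0  # hypothetical: same total protein loaded in every lane


def lane_signal(protein_amount, total_protein):
    """Signal of one protein when a fixed amount of total protein is loaded."""
    return protein_amount / total_protein * LOADED_UG


# Hypothetical whole-muscle contents (arbitrary units). Atrophy removes
# non-mitochondrial protein only; OPA1 and VDAC are left unchanged.
control = {"total": 100.0, "OPA1": 2.0, "VDAC": 10.0}
atrophic = {"total": 70.0, "OPA1": 2.0, "VDAC": 10.0}


def opa1_fold_change(normalize_to_vdac):
    """Atrophic/control OPA1 signal, optionally normalized to VDAC."""
    values = []
    for muscle in (control, atrophic):
        opa1 = lane_signal(muscle["OPA1"], muscle["total"])
        if normalize_to_vdac:
            opa1 /= lane_signal(muscle["VDAC"], muscle["total"])
        values.append(opa1)
    return values[1] / values[0]


apparent_fold = opa1_fold_change(normalize_to_vdac=False)  # equal-load artefact
vdac_fold = opa1_fold_change(normalize_to_vdac=True)       # mito-mass corrected
```

      Under these assumptions, equal loading alone produces an apparent rise in OPA1 although its content per muscle is unchanged, whereas the ratio to VDAC stays flat.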

      Point 3. The comparison of proteomic and transcriptomic profiles to identify concordance or not is problematic when atrophy programs are induced. In fact, most of the transcriptional-dependent upregulation is to preserve/maintain/reestablish enzymes that are consumed during enhanced protein breakdown. For instance, the ubiquitin ligases when activated undergo autoubiquitination and proteasome degradation. The same happens for several autophagy-related genes belonging to the conjugation system (LC3, Gabarap), the cargo recognition pathways (e.g. Ubiquitin, p62/SQSTM1) and the selective autophagy system (e.g. BNIP3, PINK/PARKIN) and metabolic enzymes (e.g. GAPDH, lipin). Finally, in case identical amounts of proteins have been loaded in mass spec, the issues of selective enrichment raised in point 2 should be considered. Therefore, when comparing proteomic and transcriptomic data, these issues should be considered in the discussion.

      Response: We fully agree with the Reviewer that seeking concordance between transcriptomic and proteomic data in the case of an organ affected by a high level of proteolysis is a difficult business. Another major difficulty we discussed in the Discussion section of the article is the fact that there is no concordance between RNA and protein level for a good proportion of proteins, for multiple reasons, so each level of omics has to be interpreted independently to give information on the pathophysiology of the organ studied.

    1. Author Response

      We thank the editors and reviewers for taking the time to provide a critical assessment of our manuscript. We are delighted our work was found to have merit, and will revise the manuscript based on their valuable input.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for The Authors):

      Major comments:

      1) The immunolabeling data in Figure S4 shows no change in puncta number but reduced puncta size in Kit KO. sIPSC data show reduced frequency but little change in amplitude. These data would seem contradictory in that one suggests reduced synaptic strength, but not number, and the other suggests reduced synapse number, but not strength. How do the authors reconcile these results?

      Regarding the synaptic puncta, in Kit KO (or KL KO), we have not detected an overt reduction in the average VGAT/Gephyrin/Calbindin positive puncta density or puncta size per animal. With respect to puncta size, only in the Kit KO condition, and only when individual puncta are assessed, does this modest (~10%) difference in size become statistically significant. In the revision, we eliminate this figure and focus on the per animal averages.

      Our interpretation is that the reduction in sIPSC and mIPSC frequency likely stems from a decreased proportion of functional synapse sites. The number of MLIs, their action potential generation, the density of synaptic puncta, and the ability of direct stimulation to evoke release and equivalent postsynaptic currents, are all similar in Control vs Kit KO. It is therefore feasible that a reduced frequency of postsynaptic inhibitory events is due to a reduced ability of MLI action potentials to invade the axon terminal, and/or an impaired ability for depolarization to drive transmitter release (e.g. via coordinated calcium flux). That is, while the number of MLIs and their synapses appear similar, the reduced mIPSC frequency suggests that a reduced proportion of Kit KO synapse sites function properly, or that each site has a lower probability of doing so.

      2) Related to point 1, it would be helpful to see immunolabeling data from Kit ligand KO mice? Do these show the same pattern of reduced puncta size but no change in number?

      Although we have not added a figure, we have now added experiments and a corresponding analysis in the manuscript. As we had done previously for Kit KO, we have now conducted IHC for VGAT, Gephyrin, and Calbindin for KL KO, and we analyzed triple-positive synaptic puncta in the molecular layer of Pcp2 Cre KL KO mice and Control (Pcp2 Cre negative, KL floxed homozygous) mice. We did not find a gross reduction in the average synaptic puncta size or density, or in the PSD-95 pinceau size. From this initial analysis, it appears that the presynaptic hypotrophy is more notable in the receptor than in the ligand knockout. We speculate that this is perhaps because the Kit receptor may have basal activity in the absence of Kit ligand, that Kit may serve a presynaptic scaffolding role that is lost in the receptor (but not the ligand) knockout, or simply that the embryonic timing of the Pax2 Cre vs Pcp2 Cre recombination events is more relevant to pinceaux development, especially as basket cells are born primarily prenatally.

      3) The data using KL overexpression in PC (figure 4E,F) are intriguing, but puzzling. The reduction in sIPSC frequency and amplitude in the control PC is much greater than seen in the Kit or KL KO. The interpretation of these data, "Thus, KL-Kit levels may not set the number of MLI:PC release sites, but may instead influence the proportion of synapses that are functional for neurotransmission (Figure 4G)" is not clear and the reasoning here should be explained in more detail, perhaps in the discussion.

      We have attempted to clarify this portion of the manuscript by eliminating the cartoon of the proposed model, and by revising and adding to the discussion. Either MLI Kit KO or PC KL KO seems to preserve the absolute number of MLI:PC anatomical synapse sites (IHC) but to reduce the proportion of those synapses that are contributing to neurotransmission (mIPSC). We speculate that sparse PC KL overexpression (OX) may either 1) weaken inhibition to surrounding control PCs by diminishing KL OX PC to KL Control PC inhibition, and/or 2) act retrogradely through MLI Kit to potentiate MLI:MLI inhibition, reducing the MLI:PC inhibition at neighboring Control PCs.

      Minor comments:

      1) In the first sentence of the results, should "Figure 1A, B" be "Figure C, D"?

      Yes, corrected.

      2) The top of page 6 states "the mean mIPSC amplitude was ~10% greater in PC KL KO than in control", this does not appear to be the case in Figure 3E. control and KL KO look very similar here.

      In this portion of the text citing the modest 10% increase in mIPSC amplitude, we are referring to the average amplitude of all individual mIPSC events in the PC KL KO condition; in the figure referred to by the reviewer (3E), we are instead referring to the average of all mIPSC event amplitudes per KL KO PC. Because of the dramatic difference in sample size for individual events vs cells, this modest difference rises to statistical, if not biological, significance. We include this individual event analysis only to suggest that, since we in fact saw a slightly higher event amplitude in the KL KO condition, it is unlikely that a reduced amplitude would have been a technical reason that we detected a lower event frequency.
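
      The effect of sample size described here can be sketched numerically. The amplitudes below are hypothetical values constructed for illustration (they are not data from the study), and Welch's t statistic is used only as a convenient yardstick; the source does not state which test was applied to the individual events.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (b minus a)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


def make_cell(mean_amp, n_events=1100):
    # Hypothetical cell: deterministic scatter of -5..+5 pA around its mean.
    return [mean_amp + (k % 11) - 5 for k in range(n_events)]


# Five hypothetical cells per group; KO cell means are 10% larger than control.
control_cells = [make_cell(m) for m in (30, 40, 50, 60, 70)]
ko_cells = [make_cell(m) for m in (33, 44, 55, 66, 77)]

# Per-cell analysis: one mean amplitude per cell, n = 5 per group.
t_cells = welch_t([statistics.mean(c) for c in control_cells],
                  [statistics.mean(c) for c in ko_cells])

# Per-event analysis: every event pooled, n = 5500 per group.
t_events = welch_t([x for c in control_cells for x in c],
                   [x for c in ko_cells for x in c])
```

      In this sketch the same 10% difference gives a small t statistic when cells are the unit of analysis and a very large one when all events are pooled, because pooling treats events from the same cell as independent observations.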

      3) Figure 3 D, duration, y-axis should be labelled "ms"

      Event duration is no longer graphed or referenced. This has been replaced with total inhibitory charge.

      Reviewer #2 (Recommendations For The Authors):

      Methods:

      • Pax2-Cre line: embryonal Cre lines sometimes suffer from germline recombination. Was this evaluated, and if yes, how?

      The global loss of Kit signaling is incompatible with life, as seen from perinatal lethality in other Kit Ligand or Kit mutant mouse lines or other conditional approaches. Furthermore, a loss of Kit signaling in germ cells impedes fertility. Thus, while not explicitly ruled out, since conditional Pax2 Cre mediated Kit KO animals were born, survived, and produced offspring in normal ratios, we do not suspect that germline recombination was a major issue in this specific study.

      • Include rationale for using different virus types in different studies (AAV vs. Lenti).

      This rationale is now included and reflects the intention to achieve infection sparsity in the smaller and less dense tissue of perinatal mouse brains.

      • How, if at all, was blinding performed for histological and electrophysiological experiments?

      It was not possible for electrophysiology to be conducted blinded for the Kit KO experiments, owing to the subjects’ hypopigmentation. However, whenever feasible, resultant microscopy images or electrophysiological data sets were analyzed under their Transnetyx Animal ID, and the genotypes were unmasked after analysis.

      • Provide justification for limiting electrophysiology recordings to lobule IV/V and why MLIs in the middle third of the molecular layer were prioritized when inhibition of PCs is dominated by large IPSCs from basket cells. Why were 2 different internals used for recording IPSCs and EPSCs in PCs and MLIs? While that choice is justified for action potential recordings, it provides poor voltage control in PC voltage clamp. Both IPSCs and EPSCs could have been isolated pharmacologically using a CsCl internal.

      The rationale for regional focus has been added to the text. For MLI action potential recordings, we opted to sample the middle third of the molecular layer so that we would not be completely biased to either classic distal stellate vs proximal basket subtypes. It is our hope, in future optogenetic interrogations, to simultaneously record the dynamics of all MLI subtypes in a more unbiased way. With respect to internal solutions, we initially utilized a cesium chloride internal to maximize our ability to resolve differences in GABAA mediated currents, which was the hypothesis-driven focus of our study. While we agree that utilizing a single internal and changing the voltage clamp to arrive at per-cell analysis of Excitatory/Inhibitory input would have been most informative, our decision to utilize pharmacological methods was driven by our experience that achieving adequate voltage clamp across large Purkinje cells was often problematic, particularly in adult animals.

      Introduction:

      In the introduction, the authors state that inactivating Kit contributes to neurological dysfunction - their examples highlight neurological, psychiatric, and neurodevelopmental conditions.

      The language has been changed.

      General:

      Using violin plots illustrates the data distribution better than bar graphs/SEM.

      We have included violin plots throughout, and we have changed p values to numeric values, both in the interest of presenting the totality of the data more clearly.

      Synapses 'onto' PCs sounds more common than 'upon' PCs.

      We have changed the wording throughout.

      Figure 1:

      1F - there seems to be an antero-posterior gradient of Kit expression.

      Though not explicitly pursued in the manuscript, it is possible that such a gradient may reflect differences in the timing of the genesis and maturation of the cerebellum along the AP axis. Regional variability is however now briefly addressed as a motivator for focused studies within lobules IV/V.

      E doesn't show male/female ratios but only hypopigmentation.

      This language has been corrected.

      Figure 2 and associated supplementary figures:

      2A/B: The frequency of sIPSCs is very high in PCs, making the detection of single events challenging. How was this accomplished? Please add strategy to the methods.

      We have added methodological detail for electrophysiology analysis.

      How were multi-peak events detected and analyzed? 'Duration' is not specified - do the authors refer to kinetics? If so, report rise and decay. It is likely impossible to show individual aligned sIPSCs with averages superimposed, given that sIPSCs strongly overlap. Alternatively, since no clear baseline can be determined in between events, and therefore frequency, amplitude, and kinetics quantification is near-impossible, consider plotting inhibitory charge.

      Given the heterogeneity of events, we now do not refer to individual event kinetics. As suggested, we have now included an analysis of the total inhibitory charge transferred by all events during the recording epoch.
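
      Total charge transfer can be computed without detecting individual events, which is why it suits recordings with overlapping sIPSCs. The following is a minimal sketch, assuming a current trace sampled at a fixed interval and a known holding baseline; the function name, the trapezoidal integration, and the example trace are illustrative assumptions rather than the analysis actually used.

```python
def total_charge_pC(current_pA, dt_s, baseline_pA=0.0):
    """Integrate a baseline-subtracted current trace with the trapezoidal rule.

    pA x s = pC, so the result is in picocoulombs.
    """
    charge = 0.0
    for i0, i1 in zip(current_pA, current_pA[1:]):
        charge += ((i0 - baseline_pA) + (i1 - baseline_pA)) / 2.0 * dt_s
    return charge


# Hypothetical trace sampled at 10 kHz: a 10 ms, -100 pA current on a flat baseline.
trace = [0.0] * 10 + [-100.0] * 101 + [0.0] * 10
charge = total_charge_pC(trace, dt_s=1e-4)
```

      For this rectangular test pulse the integral is close to -1 pC (-100 pA x 10 ms), with a small extra contribution from the two sloped edges.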

      S2: Specify how density, distribution, and ML thickness were determined in methods. How many animals/cells/lobules?

      For consistency with viral injections and electrophysiology, the immunohistochemical analysis was restricted to lobule IV/V. This is clearer in the revision and detail is added in the methods.

      S3:

      S3B: the labels of Capacitance and Input resistance are switched.

      This has been corrected.

      How were these parameters determined? Add to methods.

      Added

      In the previous figure the authors refer to 'frequency', in this figure to 'rate' - make consistent

      This has been corrected.

      D: example does not seem representative. Add amplitude of current pulse underneath traces.

      We added new traces from nearer the group means and we now include the current trace.

      F/G example traces (aligned individual events + average) are necessary.

      We added example traces near the relevant group means for each condition.

      Statement based on evoked IPSCs that 'synapses function normally' is a bit sweeping and can only be fully justified with paired recordings. Closer to the data would be that the release probability of individual synapses is similar between control and Kit KO.

      Paired recordings in both Kit Ligand and Kit receptor conditional knockout conditions are indeed an informative aim for future studies, should support permit. For now, we have clarified the language to be more in line with the reviewer’s welcome suggestion.

      S4:

      Histological strategy cannot unambiguously distinguish MLI-PC and PC-PC synapses. Consider adding this confound to the text.

      We have added this confound to the discussion.

      The observation that the pinceau is decreased in size could have important implications for ephaptic coupling of MLI and PC and could be mentioned.

      We agree and have added this notion to the discussion.

      Y-label is missing in B.

      Corrected.

      Figure 3 and associated supplementary figures:

      In the text, change PC-Cre to L7-Cre or Pcp2-Cre.

      Changed

      How do the authors explain a reduction in frequency, amplitude, and duration of sIPSCs in the KL KO but not in the Kit KO? Add to the discussion

      We now address this apparent discordance in the discussion. Pax2 Cre mediates recombination weeks ahead of Pcp2 Cre. We therefore suspect that postnatal PC KL KO may be more phenotypic than embryonic MLI Kit KO because there is less time for developmental compensation. A future evaluation of the impact of postnatal Kit KO would be informative to this end.

      As in Figure 2, plotting the charge might be more accurate.

      We now plot total charge transfer.

      Are the intrinsic properties in KL KO PCs altered? (Spontaneous firing, capacitance, input resistance).

      We have added to the text that we found no difference in capacitance or input resistance between Purkinje cells from KL floxed homozygous Control animals and those from KL floxed homozygous, PCP2 Cre positive KL KO animals. We plan to characterize both basal and MLI-modulated PC firing in a future manuscript. Since Pcp2 Cre mediated KL KO seems more phenotypic than Pax2 Cre mediated Kit KO, we agree that it seems a better testbed for investigating differences in both basal PC firing and its MLI-mediated modulation.

      3D-F - Example traces would be desirable (see above, analogous to Fig. 2).

      More example traces have been added.

      Figure 4: 'In vivo mixtures' sounds unusual. Consider revision (e.g., 'to sparsely delete KL').

      Changed

      The observation that control PC sIPSC frequency is lower in KL OX PCs than in sham is interesting. This observation would be consistent with overall inhibitory synapse density being preserved. This could be evaluated with immunohistochemistry. For how far away from the injection area does this observation hold true?

      Because we have now analyzed and failed to find an overt (per animal average) change in synaptic puncta size or density in the whole animal Control vs PCP2 Cre mediated KL KO conditions, we do not have confidence that it is feasible to pursue this IHC strategy in the sparse viral-mediated KL KO or OX conditions. To the reviewer’s valid point however, we intend to probe the spatial extent/specificity of the sparse phenomenon when we are resourced to complement the KL/Kit manipulations with transgenic methods for evaluating MLI-PC synapses specifically, potentially by GRASP or related methods that would not be confounded by PC-PC synapses. Transgenic MLI access would also facilitate determining the spatial extent to which opto-genetically activated MLIs evoke equivalent responses in Control vs KL manipulated PCs.

      Y-legend in D clipped.

      Corrected

      Existing literature suggests that MLI inhibition regulates the regularity of PC firing - this could be tested in Kit and KL mutants.

      For now, based upon transgenic animal availability, we have included an evaluation of PC firing in the (Pax2 Cre mediated) Kit KO condition. PC average firing frequency, mean ISI, and ISI CV2 were not significantly different across genotypes. A KS test of individual ISI durations for Control vs Kit KO did reveal a difference (p<0.0001). We have added a supplementary figure (S6) with these data. It is possible that we may find a difference in these PC spiking patterns in the more phenotypic PC KL KO condition; however, we are also eager to test in future studies whether postnatal KL or Kit KO impairs the ability of MLI activation to produce pauses or other alterations in PC firing or in PF-PC mediated plasticity.
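
      The per-cell firing measures named above can be sketched as follows. This assumes spike times in milliseconds, takes the firing rate as the reciprocal of the mean ISI, and uses the common definition of CV2 as 2|ISI(i+1) - ISI(i)| / (ISI(i+1) + ISI(i)) averaged over adjacent ISI pairs; the function name and example spike trains are hypothetical, and the authors' exact definitions may differ.

```python
import statistics


def isi_stats(spike_times_ms):
    """Return (firing rate in Hz, mean ISI in ms, mean CV2) for one cell."""
    isis = [t1 - t0 for t0, t1 in zip(spike_times_ms, spike_times_ms[1:])]
    mean_isi = statistics.mean(isis)
    # CV2 compares each ISI with its neighbour, so it reflects local
    # irregularity and is less affected by slow drifts in rate.
    cv2_values = [2.0 * abs(b - a) / (a + b) for a, b in zip(isis, isis[1:])]
    return 1000.0 / mean_isi, mean_isi, statistics.mean(cv2_values)


# Hypothetical spike trains: perfectly regular vs alternating short/long ISIs.
regular = isi_stats([0, 100, 200, 300, 400])
irregular = isi_stats([0, 100, 400, 500, 800])
```

      A perfectly regular train gives a CV2 of 0, while the alternating train gives a CV2 of 1; pooling the individual ISIs across cells, as in the KS test mentioned above, asks a different question than comparing these per-cell summaries.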

      Reviewer #3 (Recommendations For The Authors):

      Reference to Figure 1A in the Results section is slightly inaccurate. Kit gene modifications are illustrated in Figures 1A, B. Where Figure 1A shows Kit distribution. Please rephrase. Relatedly, the reference to Figs 1B - D are shifted in the results section, and 1E is skipped.

      We have changed the text.

      Please show cumulative histograms for frequency too for consistency with amplitude (e.g. Fig 2).

      We have instead, for reasons outlined by other reviewers, documented total charge transfer for both Kit KO and KL KO experiments where sIPSC events were analyzed.

      Fig S3: include example traces of PPR.

      This is now included.

      Include quantifications of GABAergic synapse density in Fig S4.

      This is now included.

      Include inset examples of KO in Fig S4A.

      This is now included.

      Add average puncta size graphs along Figure S4B. The effect apparent in the histogram of S4B is small and statistics using individual puncta as n values (in the 20,000s) therefore misleading.

      Per animal analysis is now instead included in the figure and text.

      Figure S4B y axis label blocked.

      Corrected

      Include quantification referenced in "As PSD95 immunoreactivity faithfully follows multiple markers of pinceaux size 40, we quantified PSD95 immunoreactive pinceau area and determined that pinceaux area was decreased by ~50% in Kit KO (n 26 Control vs 43 Kit KO, p<0.0001, two-tailed t-test)."

      We added a graph of per animal averages, instead of in text individual pinceau areas.

      Include antibody dilutions in the methods.

      Added.

      It's unclear from the text where the Mirow lab code comes from.

      Detail has now been added in text.

      Typo in methods "The Kit tm1c alle was bred...".

      Corrected

      Typo in Figure S4 legend "POSD-95 immuno-reactivity".

      Corrected

    1. Author Response

      The following is the authors’ response to the original reviews.

      First of all, we would like to thank the three reviewers for their meticulous work, which has enabled us to present an improved manuscript. Substantial changes were made to the article following the reviewers' and editors' recommendations. We read all their comments and suggestions very carefully. Apart from a few misunderstandings, all comments were very pertinent. We responded positively to almost all the comments and suggestions, and as a result, we have made extensive changes to the document and the figures. This manuscript now contains 16 principal figures and 15 figure supplements.

      The number of principal figures is now 16 (1 new figure), and additional panels have been added to certain figures. On the other hand, we have added 7 additional figures (supplement figures) to answer the reviewers' questions and/or comments.

      Main figures

      ▪ Figures 1, 4, 5, 10, 11, 12, 13, 14: unchanged

      ▪ Figures 7 and 8 were switched.

      ▪ Figure 2: we added panel F in response to reviewer 3's request for sperm defect statistics

      ▪ Figure 3: the contrast in panel B has been adjusted to homogenize colors

      ▪ Figure 6: This figure was recomposed. The WB on testicular extract was removed and we present a new WB that allows the presence of CCDC146 in the flagella fraction to be compared. Using an anti-HA Ab, we demonstrate that the protein is localized in the flagella in epididymal sperm. Requested by the 3 reviewers.

      ▪ Figure 7 (old 8): to avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum. Moreover, the WB was removed and is now presented in figure 6 (improved as requested).

      ▪ Figure 8: formerly figure 7

      ▪ Figure 9: figure 9 was recomposed and improved for increased clarity, as suggested by reviewers 2 and 3.

      ▪ Figure 16 was previously appendix 11

      Figure supplements and supplementary files

      ▪ Figure 1-Figure supplement 1 New. Sperm parameters of the 2 patients. Requested by the editor (remark #1) and by reviewer 1 (Note #3)

      ▪ Figure 2-Figure supplement 1 new. Sperm parameters of the line 2 (KO animals) requested by the reviewer 1 (Note #5)

      ▪ Figure 4-Figure supplement 1 New. Experiment to evaluate the specificity of the human CCDC146 antibody. Minimal revision request and reviewer 1 note #8

      ▪ Figure 6-Figure supplement 1 New. Figure recomposed; Asked by reviewer 2 note #4 and reviewer 3

      ▪ Figure 8-Figure supplement 1 New. We now provide new images to show the non-specific staining of the midpiece of human sperm by secondary Abs in ExM experiments; Asked by reviewer 2

      ▪ Figure 10-Figure supplement 1 New. We added new images to show the non-specific staining of the midpiece of mouse sperm by secondary Abs in IF (panel B). Reviewer 1 note #9 and reviewer 2 note #5

      ▪ Figure 12-Figure supplement 1 New. Control requested by reviewer 3 Note #23

      ▪ Figure 13-Figure supplement 1 New. We provide a graph and a statistical analysis demonstrating the increase of the length of the manchette in the Ccdc146 KO. Requested by editor and reviewer 3 Note 24

      ▪ Figure 15-Figure supplement 1 New. Control requested by reviewer 2. Minor comments

      ▪ Figure supplementary 1 New. Answer to question requested by reviewer 2 note #1

      All the reviewers' and editors’ comments have been answered (see our point-by-point response) and we resubmit what we believe to be a significantly improved manuscript. We strongly hope that we meet all your expectations and that our manuscript will be suitable for publication in "eLife". We look forward to your feedback.

      Point by point answer

      Please note that there has been active discussion of the manuscript, and the summarized points below are the minimal revision requests that the reviewers think the authors should address even under this new review model system. It was the reviewers' consensus that the manuscript was prepared with a lot of oversights - please see all the minor points to improve your manuscript.

      All minimal revision requests have been addressed

      Minimal revision request

      1) Clinical report/evaluation of the two patients should be given as it was not described even in their previous study as well as full description of CCDC146.

      We now provide a new Figure 1-figure supplement 1 describing the patients' sperm parameters.

      2) Antibody specificity should be provided, especially given two of the reviewers were not convinced that the mid piece signal is non-specific as the authors claim. As the authors have both KO and KI models in hand, this should be straightforward.

      To validate the specificity of the antibody, we transfected HEK cells with a human DDK-tagged CCDC146 plasmid and performed a double immunostaining with a DDK antibody and the CCDC146 antibody. We show that both stainings are superimposable, strongly suggesting that the CCDC146 Ab specifically targets CCDC146. This experiment is now presented in Figure 4-Figure supplement 1. Next, to avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum.

      3) The authors should improve statistical analysis to support their experimental results so that the reader can make a fair assessment. Combined with the need for a clear demonstration of ab specificity, this lack of statistical analysis with very small sample numbers is a major driver of dampening enthusiasm towards the current study.

      Several statistical analyses were carried out and are now included:

      1) distribution of the HA signal in mouse sperm cells (see point 2 Figure 7 panel B)

      2) quantification and statistical analyses of the defect observed in Ccdc146 KO sperm (figure 2 panel E)

      3) Quantification and statistical analyses of the length of the manchette in spermatids 13-15 steps (Figure 13-Figure supplement 1 new)

      4) The authors need to clarify (peri-centriolar vs. centriole)

      In figure 4A, we have clearly shown that the protein colocalizes with centrin, a centriolar core protein in somatic cells. This colocalization strongly suggests that CCDC146 is a centriolar protein, and this is now clearly indicated in lines 211-212. However, its localization is not restricted to the centrioles and a clear staining was also observed in the pericentriolar material (PCM). The presence of a protein in both the PCM and the centriole has already been described, and perhaps the best example is gamma-tubulin (PMID: 8749391).

      or tone down (CCDC146 to be a MIP) of their claim/description.

      Concerning its localization in sperm, we agree with the reviewer that our demonstration that CCDC146 is a MIP would require more results. We have therefore toned down the MIP hypothesis throughout the manuscript. See lines 491-495.

      Testis-specific expression of CCDC146 as it is not consistent with their data.

      We have also modified our claim concerning the testis expression of CCDC146. See line 176.

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      1) As described in general comments, this study limits how the CCDC146 deficiency impairs abnormal centriole and manchette formation. The authors should explain their relationship in developing germ cells.

      In fact, there is limited information about the relationship between the manchette and the centriole. However, a few articles have highlighted that both organelles share molecular components. For instance, WDR62 is required for centriole duplication in spermatogenesis and manchette removal in spermiogenesis (Commun Biol. 2021; 4: 645. doi: 10.1038/s42003-021-02171-5). Another study demonstrates that CCDC42 localizes to the manchette, the connecting piece and the tail (Front. Cell Dev. Biol. 2019 https://doi.org/10.3389/fcell.2019.00151). These articles underline that centrosomal proteins are involved in manchette formation and removal during spermiogenesis and support our results showing the impact of the lack of CCDC146 on centriole and manchette biogenesis. This information is now discussed. See lines 596-603.

      2) The authors generated knock-in mouse model. If then, are the transgene can rescue the MMAF phenotype in CCDC146-null mice? This reviewer strongly suggest to test this part to clearly support the pathogenicity by CCDC146.

      We indeed wrote that we created “transgenic mice”, which was misleading. We actually created a CCDC146 knock-in expressing a tagged protein. The strain was made by CRISPR-Cas9, and a sequence coding for the HA-tag was inserted just before the first amino acid in exon 2, leading to the translation of an endogenous HA-tagged CCDC146 protein. We have removed the word “transgenic” from the text and made changes accordingly (see lines 250-253). We therefore cannot use this strain to rescue the MMAF phenotype as suggested by the reviewer.

      3) Although the authors cite the previous study (Coutton et al., 2019), the study does not describe any information for CCDC146 and clinical information for the patients. The authors must show the results for clinical analysis to clarify the attended patients are MMAF patients without other phenotypic defects.

      We have now inserted a table indicating all sperm parameters for the patients harboring a mutation in the CCDC146 gene (Figure 1-Figure supplement 1). This is now indicated in lines 159-160.

      4) The authors describe CCDC146 expression is dominant in testes, However, the level in testis is only moderate in human (Supp Figure 1). Thus, this description is not suitable.

      In Figure 1-figure supplement 2 (old FigS1), the median expression in testis is around 12 in human, a value considered as high expression by the Genevestigator analysis software. However, for mouse, it is true that the level of expression is medium. We assumed that the reviewer’s comment concerned testis expression in mouse. To take this remark into account, we changed the text accordingly. See line 176.

      5) Although the authors mentioned that two mice lines are generated, only one line information is provided. Authors must include information for another line and provide basic characterization results to support the shared phenotype within the lines.

      We now provide a revised Figure 2-figure supplement 1CD presenting the second line; the corresponding text is found in lines 178-183 of the main text.

      6) In somatic cells, the CCDC146 localizes at both peri-centriole and microtubule but its intracellular localization in sperm is distinguished. The authors should explain this discrepancy.

      The multi-localization of a centriolar protein is already discussed in detail in discussion lines 520-526. We have written:

      “Despite its broad cellular distribution, the association of CCDC146 with tubulin-dependent structures is remarkable. However, centrosomal and axonemal localizations in somatic and germ cells, respectively, have also been reported for CFAP58 [37, 55], thus the re-use of centrosomal proteins in the sperm flagellar axoneme is not unheard of. In addition, 80% of all proteins identified as centrosomal are found in multiple localizations (https://www.proteinatlas.org/humanproteome/subcellular/centrosome). The ability of a protein to home to several locations depending on its cellular environment has been widely described, in particular for MAP. The different localizations are linked to the presence of distinct binding sites on the protein…. “

      7) Authors mention CCDC146 is a centriolar protein in the title and results subtitle. However, the description in results part depicts CCDC146 is a peri-centriolar protein, which makes confusion. Do the authors claim CCDC146 is centrosomal protein?

      In figure 4A, we have clearly shown that the protein colocalizes with centrin, a centriolar core protein. This colocalization strongly suggests that CCDC146 is a centriolar protein in somatic cells, and this is now clearly indicated in lines 211-212. However, its localization is not restricted to the centrioles, and clear staining was also observed in the pericentriolar material (PCM). The presence of a protein in both the PCM and the centriole has already been described; perhaps the best example is gamma-tubulin (PMID: 8749391).

      8) Verification of the antibody against CCDC146 must be performed and shown to support the observed signal are correct. 2nd antibody only signal is not proper negative control.

      This is a very important remark. The commercial antibody raised against human CCDC146 was validated in HEK293 cells expressing a DDK-tagged CCDC146 protein. Cells were co-labelled with anti-DDK and anti-CCDC146 antibodies, and the two stainings colocalized perfectly. This experiment is now presented in Figure 4-figure supplement 1 and described in the text (lines 206-208).

      9) In human sperm, conventional immunostaining reveals CCDC146 is detected from acrosome head and midpiece. However, in ExM, the signal at acrosome is not detected. How is this discrepancy explained? The major concern for the ExM could be physical (dimension) and biochemical (properties) distortion of the sample. Without clear positive and negative control, current conclusion is not clearly understood. Furthermore, it is unclear why the authors conclude the midpiece signal is non-specific. The authors must provide experimental evidence.

      Acrosomal staining in sperm should always be interpreted with caution. Indeed, numerous glycosylated proteins are present at the surface of the plasma membrane facing the outer acrosomal membrane for sperm attachment, and they are responsible for numerous nonspecific stainings. Moreover, this acrosomal staining was not observed in mouse sperm, strongly suggesting that it is not specific.

      Concerning the staining in the midpiece observed in both conventional and Expansion microscopy, it also seems to be nonspecific and associated with secondary Abs.

      For IF, we now provide new images showing clearly the nonspecific staining of the midpiece when secondary Ab were used alone (see Figure 10-figure supplement 1B).

      For ExM, we provide new images in Figure 8-figure supplement 1B (POC5 staining) showing a staining of the midpiece (likely mitochondria), although POC5 was never described to be present in the midpiece. Both experiments (CCDC146 and POC5 staining by ExM) shared the same secondary Ab and the midpiece signal was likely due to it.

      Moreover, we now provide new ExM images of mouse sperm (figure 7C) showing no staining in the midpiece and demonstrating that the punctate signal is present all along the flagellum. Finally, we would like to underline that we now provide new IF results, obtained with an anti-HA antibody conjugated with Alexa Fluor 488, which confirm the ExM results.

      These points are now discussed lines 498-502 for acrosome and lines 503-511 for midpiece staining.

      10) For intracellular localization of the CCDC146 in mouse sperm, the authors should provide clear negative control using WT sperm which do not carry the transgene.

      This experiment was performed.

      To avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 HA-CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value < 0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum.
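
      The response does not name the statistical test used to compare dot densities between WT and HA-CCDC146 sperm. As a minimal sketch of how such a two-group comparison could be run, the Python snippet below applies a two-sided permutation test on the difference of group means. The choice of test, the function name `permutation_test`, and the density values are all assumptions made for illustration: the values are synthetic placeholders, not data from the study. Only the group sizes (58 and 65) come from the text.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of group means.

    Illustrative only: the authors' actual statistical test is not stated.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Synthetic placeholder dot densities; NOT measurements from the study.
data_rng = random.Random(1)
wt_density = [data_rng.gauss(0.2, 0.1) for _ in range(58)]
ki_density = [data_rng.gauss(1.0, 0.3) for _ in range(65)]

p_value = permutation_test(wt_density, ki_density)
print(f"permutation p-value: {p_value:.5f}")
```

      A rank-based test would be an equally reasonable choice if the densities are not normally distributed; the permutation approach is used here only because it needs no external library.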

      11) Current imaging data do not clearly support the intracellular localization of the CCDC146. Although western blot imaging reveal that CCDC146 is detected from sperm flagella, this is crude approach. Thus, this reviewer highly recommends the authors provide more clear experimental evidence, such as immuno EM.

      We now provide a WB comparing the presence of the protein in the flagellum and head fractions; see new figure 6. We show that CCDC146 is present only in the flagellum fraction. The band was detected very quickly at visualization and became very strong after a few minutes, demonstrating that the protein is abundant in the flagella. It is important to note that epididymal sperm do not have centrioles and therefore this signal is not a centriolar signal. We also now provide new statistical analyses showing that the immunostaining observed in the principal piece is very specific (Figure 7B). Altogether, these results demonstrate unequivocally the intracellular localization of CCDC146 in the flagellum. This point is now discussed in lines 480-489.

      12) Although sarkosyl is known to dissociate tubulin, it is not well understood and accepted that the enhanced detection of CCDC146 by the detergent indicates its microtubule inner space. Sperm axoneme to carry microtubule is also wrapped peri-axonemal components with structural proteins, which are even not well solubilized by high concentration of the ionic detergent like SDS.

      We agree with the reviewer that the solubilization of the protein by sarkosyl is not proof of the presence of the protein inside microtubules. Taking this point into account, we have toned down the MIP hypothesis and now discuss alternative hypotheses concerning these results. See discussion lines 490-497.

      13) SEM image is not suitable to explain internal structure (line 317-323).

      We agree with the reviewer, and changes were made accordingly. See lines 354-357.

      Minor comments

      1) In main text, supplementary figures are cited "Supp Figure". And the corresponding legends are written in "Appendix - Figure". Please unify them.

      Done. They are now labelled “Figure X-figure supplement Y”.

      2) Line 159, "exon 9/19" is not clear.

      We now write “exon 9” and indicate earlier in the text that the gene contains 19 exons.

      3) Line 188, "positive cells" are vague.

      “Positive” was replaced with “fluorescent”.

      4) Representative TUNEL assay image for knockout testes were not shown in Supp Figure 3B.

      This was a mistake; the image is now shown in Figure 2-figure supplement 2C.

      5) Please provide full description for "IF" and "AB" when described first.

      Done

      6) Line 262, It is unclear what is "main piece".

      Changed to principal piece

      7) Line 340, Although the "stage" information might be applicable, this is information for "seminiferous tubule" rather than "spermatid". This reviewer suggests to provide step information rather than stage information.

      We agree with the reviewer that there was confusion between “stage” and “step”. We have changed the text to “step” spermatids.

      8) Line 342, Step 1 is not correct in here.

      Corrected. The text now reads “steps 13-15 spermatids”.

      9) Line 803, "C." is duplicated.

      Removed

      10) Figure 3A, it will be good to mark the defective nuclei which are described in figure legends.

      These cells are now indicated by white arrow heads

      11) Figure 5, Please provide what MT stands for.

      Now explained in the legend of figure 5

      12) Figure 6. Author requires clear blot images for C. In addition, Panel B information is not correct. If the blot was performed using HA antibody, then how "WT" lane shows bands rather than "HA" bands?

      The reviewer is correct; this was a mistake. The figure was recomposed and improved.

      Reviewer #2 (Recommendations For The Authors):

      Overall, editing oversights are present throughout the manuscript, which has made the review process quite difficult. Some repetitive figures can be removed to streamline to grasp the overall story easier. Some claims are not fully supported by evidence that need to tone down. Some figures not referenced in the main text need to be mentioned at least once.

      All figures are now referenced in the text

      Major comments:

      1) 163-164 - Please clarify the claim that there is going to be an absence of the protein or nonfunctional protein, especially for the patient with a deletion that could generate a truncated protein at two third size of the full-length protein. Similarly, 35% of the protein level is present for the patient with a nonsense mutation. Some in silico structural analysis or analysis of conserved domains would be beneficial to support these claims.

      Both mutations are predicted to produce a premature stop codon, p.Arg362Ter and p.Arg704SerfsTer7, leading either to the complete absence of the protein in the case of nonsense-mediated mRNA decay or to the production of a truncated protein missing almost two thirds or one fourth of the protein, respectively. CCDC146 is very well conserved throughout evolution (Figure supplementary 1), including the C-terminal end of the protein, which contains a large coiled-coil domain (Figure 1B). In view of the very high degree of conservation, it is most likely that the C-terminal end of the protein, absent in both subjects, is critical for CCDC146 function and hence that both mutations are deleterious. This explanation is now added to the discussion. See lines 439-448.
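
      The truncation fractions quoted above can be checked with simple arithmetic. The sketch below assumes a full-length human CCDC146 of 955 amino acids, a figure that is not stated in this response and should be verified against the reference sequence; the helper name `native_fraction_missing` is hypothetical. It counts as lost every native residue from the first altered position onward (residue 362 for the nonsense variant, residue 704 for the frameshift variant).

```python
# Assumed full-length protein size in amino acids; NOT given in the response.
ASSUMED_FULL_LENGTH_AA = 955


def native_fraction_missing(first_altered_residue, full_length=ASSUMED_FULL_LENGTH_AA):
    """Fraction of the native sequence lost when every residue from
    `first_altered_residue` onward is absent or out of frame."""
    native_retained = first_altered_residue - 1
    return (full_length - native_retained) / full_length


# p.Arg362Ter: residue 362 becomes a stop codon.
nonsense_missing = native_fraction_missing(362)
# p.Arg704SerfsTer7: the native sequence ends after residue 703.
frameshift_missing = native_fraction_missing(704)

print(f"p.Arg362Ter:       {nonsense_missing:.1%} of native sequence missing")
print(f"p.Arg704SerfsTer7: {frameshift_missing:.1%} of native sequence missing")
```

      Under that length assumption, the two variants remove roughly 62% and 26% of the native sequence, in line with the “almost two thirds” and “one fourth” given in the response.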

      2) 173, 423 - Please clearly state a rationale of your mouse model design (i.e., why a mouse model that recapitulate human mutation is not generated) as the truncations identified in human patients are located further towards the C-terminus, and it is not clear whether truncated proteins are present, and if so, they could still be functional. Basically, the current mouse model supports the causality of the human mutations.

      This is an important question, which goes beyond the scope of this article and raises the question of how to confirm the pathogenicity of mutations identified by high-throughput sequencing. The production of KO or KI animals is an important tool to help confirm one’s suspicions, but the first element to take into consideration is the nature of the genetic data.

      Here we had two patients with homozygous truncating variants. In humans, it is well established that premature stop codons usually induce nonsense-mediated mRNA decay (NMD), leading to the complete absence of the protein or a strong reduction in protein production. In the unlikely absence of NMD in our two patients, the identified variants would lead to the production of proteins missing 60% and 30% of their C-terminal part. Often (and this is particularly true for structural proteins) the production of abnormal proteins is more deleterious than the complete absence of the protein (and the purpose of NMD is most likely to limit the production of abnormal “toxic” proteins). For these reasons, to recapitulate the most likely consequences of the human variants without risking an even more severe effect, we decided to introduce a stop codon in the first exon in order to remove the entire protein in the KO mice.

      The second element is to interpret the phenotype of the KO animals. Here, the human sperm phenotype is perfectly recapitulated in the KO mice.

      Overall, we have strong genetic arguments in humans, and the reproduction of the phenotype in KO mice confirms the pathogenicity of the variants identified in men.

      This point is now discussed; see lines 433-438.

      3) Figure 6A - the labelling is misleading as it seems to suggest that the specific cells were isolated from the testes for RT-PCR.

      We have modified the labelling to avoid any confusion.

      Figure 6B -Signal of HA-tag is shown in WT, not in transgenic. Please check the order of the labels. Figure 6C - This blot is NOT a publication-quality figure. The bands are very difficult to observe, especially in lane D18. Because it is one of the important data of this study, replacing this figure is a must.

      The figure has been completely remade, including new results. See new figure 6. Figure 6C was removed.

      4) Supplementary fig 6 is also not a publication-level figure, and the top part seems largely unnecessary (already in the figure legend).

      The figure has been completely remade as well (now Figure 6-Figure Supplement 1).

      5) 261/267- The conclusion that mitochondrial staining in the flagellum (in both mice and humans) is non-specific is not convincing. Supplementary fig 8 shows that the signal from secondary only IF possibly extends beyond the midpiece - but it is hard to determine as no mitochondrial-specific staining is present. Either need to tone down the conclusion or provide supporting experimental evidence.

      First, to avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 HA-CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value < 0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum. These experiments are now described in lines 271-279.

      Second, we provide new images of the signal obtained with secondary Abs only, which show more clearly that the secondary Ab gave a non-specific staining (Figure 10-Figure supplement 1B). This point is discussed in lines 503-511.

      6) Figure 9 A - Please relate the white line to Fig. 9B label in X-axis. The information from Fig 9A+D and 9E+F are redundant. The main text nor the figure legends indicate why these specific two sperm were chosen for quantification and demonstrating the outcomes. One of them could be moved to supplementary information or removed, or the two could be combined.

      As suggested by the reviewer, we have combined the two sperm to demonstrate that CCDC146 staining is mostly located on microtubule doublets. Moreover, the figure was recomposed to make it clearer.

      Minor comments:

      All of the supplementary figures are referred to as Supp Fig X in the text, however, they are actually titled Appendix - Figure X. This needs to be consistent.

      The figures are now referred to as “figure supplement x” in both text and figures.

      Line 125 - edit spacing.

      We think this issue (long internet link) will be curated later and more efficiently by the journal, during the step of formatting necessary for publication.

      144 - With which to study → with which we studied?

      We made the change as suggested.

      151 - Supp Fig 1 - the text says that the gene is highly transcribed in human and mouse testes, but the information in the figure states that the level in mouse tissues is "medium"

      We have corrected this mistake in the text. See line 176.

      165 - The two mutations are most likely deleterious. Please specifically mention what analyses done to predict the deleterious nature to support these claims.

      Both variants, c.1084C>T and c.2112del, are extremely rare in the general population, with reported allele frequencies of 6.5 × 10⁻⁵ and 6.5 × 10⁻⁶ respectively in gnomAD v3. Moreover, these variants are annotated as having a high impact on the protein structure (MoBiDiC prioritization algorithm (MPA) score = 10, DOI: 10.1016/j.jmoldx.2018.03.009) and are each predicted to introduce a premature termination codon, p.(Arg362Ter) and p.(Arg704SerfsTer7) respectively, leading to the production of a truncated protein. This information is now given in lines 164-169.

      196-200/Figure 4 - As serum starved cells/basal body (B) are not mentioned in the main text, as is, Fig 4A would be sufficient/is relevant to the text. Please make the text reflect the contents of the whole figure, or re/move to supplement.

      We agree with the reviewer that the full description of the figure should be in the text. We added two sentences to describe figure 4B; see lines 217-218.

      224 - spermatozoa (plural) fits better here, not spermatozoon

      OK changed accordingly

      236 - According to the figure legend, 6B is only showing data from the epididymal sperm, not postnatal time points; should be referencing 6C. Alignment of Marker label

      As indicated above, the figure has been completely remade, including new results. See new figure 6. Figure 6C was removed. The corresponding text was changed accordingly; see lines 249-266.

      255-256 - Referenced figure 7B3, however, 7B3 only shows tubulin staining, so no CCDC146 can be observed. Did authors mean to reference fig 7B as a whole?

      Sorry for this mistake. We agree, and the text now refers to figure 8B6 (figures 7 and 8 were switched).

      305 - "of tubules" - I presume it is meant to be microtubules?

      Yes. The text was changed as suggested.

      317-321 - a diagram of HTCA would be useful here

      We have added a reference in which a diagram of the HTCA is available; see line 363. Moreover, a TEM view of the HTCA is presented in figure 12A.

      322/Fig 11A - an arrow denoting the damage might be useful, as A1 and A3 look similar. The size of the marker bar is missing. Please update the information on figure legend.

      Concerning the comparison between A1 and A3, the take-home message is that there is great variability in the morphological damage. This point is now underlined in the corresponding text. We updated the size of the marker bar as suggested (200 nm). See lines 365-367.

      323 - Please mark where capitulum is in the figure

      “Capitulum” was replaced with “nucleus”.

      Since Fig 11B2 is not referenced in the main text, it does not seem to add anything to the data, and could be removed/moved to supplement.

      We added a sentence to describe figure 11B2; see line 370.

      342-343 - manchette in step I is not seen clearly - the figure needs to be annotated better. However, DPY19L2 is absent in step I in the KO, but the main text does not reflect that - why is that?

      We do not understand the remark of the reviewer “manchette in step I is not seen clearly”. The figure shows clearly the manchette (red signal) in both WT and KO (Figure 13 D1/D2).

      In steps 13-15 WT spermatids, the size of the manchette decreases and it becomes undetectable. In KO spermatids, the shrinkage of the manchette is hampered and the manchette instead continues to expand (Figure 13D2). We also provide a new Figure 13-figure supplement 1 with further illustrations of very long manchettes and a statistical analysis. In the meantime, the acrosome is strongly remodeled, as shown in figure 16-new, with a detached acrosome (panel H). This morphological defect may induce a loss of the DPY19L2 staining (Figure 13 D2 stage I-III). This explanation is now inserted in the text, lines 396-399.

      Figure 15B and 15C only show KO, corresponding images from the WT should be present for comparison.

      WT images are now provided in Figure 1-figure supplement 1 new

      Figure 12 - Figure 12 - JM?.

      JM was removed. It does not mean anything

      Figure 12C and Supplementary Fig 10 - structures need to be labelled, as it is unclear what is where

      Done

      338 - text mentions step III, but only sperm from step VII are shown in Figure 13

      As suggested by reviewer 3, we replaced “stage” with “step”. The text was modified to take this remark into account; see lines 388-396.

      360 - This is likely supposed to say Supp Figure 11E-G, not 13??

      Yes, it is a mistake. Corrected

      388 Typo "in a in a".

      Yes, it is a mistake. Corrected

      820 - Fig 3 legend - in KO spermatid nuclei were elongated - could this be labelled by arrows? I am not convinced this phenotype is that different from the WT.

      In fact, the nuclei of elongating KO spermatids are elongated and also very thin, a shape not observed in the WT. We have added arrowheads and modified the text to indicate this point (line 200).

      836 - Figure 5 legend says that in yellow is centrin, but that is not true for 5A, where the figure shows labelling for y-tubulin (presumably, according to the figure itself).

      We have modified the text of the legend to take into account the remark

      837- 5A supposedly corresponds to synchronized HEK293T cells, but the reasoning behind using synchronized cells is not mentioned at all in the main text; furthermore, how this synchronization is achieved is not explained in materials and methods (serum starvation? Thymidine block?).

      Yes, figure 5A was obtained with synchronized cells. We have added one paragraph in the MM section. For cell synchronization experiments, cells underwent S-phase blockade with thymidine (5 mM, Sigma-Aldrich) for 17 h followed by incubation in a control culture medium for 5 h, then a second blockade at the G2-M transition with nocodazole (200 nM, Sigma-Aldrich) for 12 h. Cells were then fixed with cold methanol at different times for IF labelling. See line 224 for changes made in the results section and lines 700-704 for changes made in the MM section.

      845- figure legend says that the RT-PCR was done on CCDC146-HA tagged mice, but the main text does not reflect that.

      We made changes and the description of the KI is now presented before (line 240) the RT-PCR experiment (line 257).

      949 - it is likely supposed to say A2, not B1 (B1 does not exist in Fig 15)

      Yes, it is a mistake. Corrected

      971 - Appendix Fig 3 legend - I believe that the description for B and C are swapped.

      Yes, it is a mistake. Corrected

      Furthermore, some questions to address in A would be: Which cross sections were from which animal/points? How many per animal? Were they always in the same location?

      Yes, we have a protocol for arranging and orienting all testes in the same way during the paraffin embedding phase. The cross-sections are therefore not taken at random, and we can compare sections from the same part of the testis. The number of animals was already indicated in the figure legend (see line 1128)

      Reviewer #3 (Recommendations For The Authors):

      1) There are a number of grammatical and orthographical errors in the text. Careful proofreading should be performed.

      We have sent the manuscript to a professional proofreader

      2) The author should also check for redundancies between the introduction and the discussion.

      The discussion has been modified to take the reviewers’ remarks into account. We also did our best to avoid redundancies between the introduction and the discussion.

      3) Can the authors provide a rationale why they have chosen to tag their gene with an HA tag for localisation? One would rather think of fluorescent proteins or a Halo tag.

      Because the functional domains of the protein are unknown, adding a fluorescent protein of 24 kDa may interfere with both the localization and the function of CCDC146. For this reason, we chose a small tag of only 1.1 kDa, to limit as much as possible the risk of interfering with the structure of the protein. This rationale is now indicated in the manuscript, lines 251-254. It is worth noting that the tagged strain shows no sperm defect, demonstrating that the HA-tag does not interfere with CCDC146 function.

      4) In the abstract, line 53, "provide evidence" is not the right term for something that is just suggestive. The term "suggests" would be more appropriate.

      The text was modified to take into account this remark

      5) Line 74: "genetic deficiency" sounds strange here, do the authors mean simply "mutation"?

      Infertility may be due to several genetic deficiencies, such as chromosomal defects (e.g. XXY, Klinefelter syndrome), microdeletions of the Y chromosome or mutations in a single gene. “Mutation” is therefore too restrictive. Nevertheless, we modified the sentence, which now reads “…or a genetic disorder including chromosomal or single gene deficiencies”.

      6) Lines 163-164: the authors describe the mutations (premature stop mutations) and say that they could either lead to complete absence of the gene product, or the expression of a truncated protein. Did they test this, for example, with some immuno blot analyses?

      As stated above, unfortunately, we were unable to verify the presence of RNA-decay in these patients for lack of biological material.

      7) Line 184 and Fig 2E: the sperm head morphologies should be quantitatively assessed.

      We provide now a full statistical analysis of the observed defects: see new panel in Figure 2 F

      8) Fig 3: The annotation should be more precise - KO certainly means CDCC146-KO. The colours of the IH panels is different, which attracts attention but is clearly a colour-adjustment artefact. Colours should be adjusted for the panels to look comparable. It would be also helpful to add arrowheads into the figure to point at the phenotypes that are highlighted in the text.

      We have added Ccdc146 KO in all figures. We have added arrow heads to point out the spermatids showing a thin and elongated nucleus. Concerning adjustment of colors, we attempted to make images of panel B comparable. See new figure 3.

      9) Fig 6A: the authors use RT PCR to determine expression dynamics of their gene of interested, and use actin (apparently) as control. However, actin and CDCC146 expression levels follow the same trend. How is the interpreted?

      The reviewer did not understand the figure. The orange and grey bars do not represent actin and Ccdc146 expression, respectively; both represent the mRNA expression levels of Ccdc146, relative to Actb (orange) or Hprt (grey) expression, in CCDC146-HA mouse pups’ testes. We tested two housekeeping genes as references to be sure that our results were not distorted by the unstable expression of a housekeeping gene. We did not see a significant difference between the two housekeeping genes. Actin was not used.
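
      The response does not state how relative expression was computed. A common convention for RT-qPCR, assumed here, is the 2^-ΔCt method, with ΔCt = Ct(target) - Ct(reference) and roughly 100% amplification efficiency. The sketch below normalizes a single Ccdc146 time course to each of two reference genes; all Ct values and time-point labels are invented placeholders, and the function name `relative_expression` is hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Relative expression by the 2^-dCt method (assumes ~100% PCR efficiency)."""
    return 2.0 ** -(ct_target - ct_reference)


# Placeholder Ct values per time point; NOT measurements from the study.
ct_values = {
    "D9": {"Ccdc146": 28.0, "Actb": 18.0, "Hprt": 24.0},
    "D18": {"Ccdc146": 26.0, "Actb": 18.0, "Hprt": 24.0},
    "D26": {"Ccdc146": 25.0, "Actb": 18.0, "Hprt": 24.0},
}

for day, ct in ct_values.items():
    # The same target Ct is normalized to each reference gene in turn.
    vs_actb = relative_expression(ct["Ccdc146"], ct["Actb"])
    vs_hprt = relative_expression(ct["Ccdc146"], ct["Hprt"])
    print(f"{day}: Ccdc146/Actb = {vs_actb:.5f}, Ccdc146/Hprt = {vs_hprt:.5f}")
```

      If both reference genes are stably expressed, the two normalized series differ only by a constant factor and show the same trend over time, which is the consistency check the response describes.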

      10) In line 235, the authors suggest posttranslational modifications of their protein as potential cause for a slightly different migration in SDS PAGE as predicted from the theoretical molecular weight. This is not necessarily the case, some proteins do migrate just differently as predicted.

      We have changed the text accordingly and now provide an alternative explanation for the slightly different migration. See lines 258-259.

      11) The annotation of Fig 6 panels is problematic. First, why do the authors write "Laemmli" as description of the gel? It would be more helpful to write what is loaded on the gel, such as "sperm". Second, in panels B and C it would be helpful to add the antibodies used. It is not clear why there is a signal in the WT lane of panel B, but not in the HA lane (supposing an anti-HA antibody is used: why has WT a specific HA band?). In panel C, it is not clear why the blot that has so beautifully shown a single band in panel B suddenly gives such a bad labelling. Can the authors explain this? Also, they cut off the blot, likely because to too much background, but this is bad practice as full blots should be shown. In the current state, the panel C does not allow any clear conclusion. To make it conclusive, it must be repeated.

      Several mistakes were present in this figure, which has been recomposed. The WB on testicular extract was removed, and we now present a new WB that compares the presence of CCDC146 in the flagella and head fractions from WT and HA-CCDC146 sperm. Using an anti-HA Ab, we demonstrate that in epididymal sperm the protein is localized in the flagella only. See new figure 6. The corresponding text was changed accordingly.

      12) The authors have raised an HA-knockin mouse for CCDC146, which they explained by the unavailability of specific antibodies. However, in Fig 7, they use a CCDC146 antibody. Can they clarify?

      The commercial Ab works for human CCDC146 but not for mouse CCDC146. To make the situation clearer, we have added the following information: “the commercial Ab works for human CCDC146 only”. See line 240.

      13) In Fig 7A (line 258), the authors hypothesise that they stain mitochondria - why not test this directly by co-staining with mitochondria markers?

      We chose another solution to resolve this question:

      To avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 HA-CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the Materials and Methods section to describe the process of image analysis. Finally, we present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the whole flagellum.

      14) It seems that in both Fig 7 and 8, the authors use expansion microscopy to localise CCDC146 in sperm tails. However, the staining differs substantially between the two figures. How is this explained?

      In figure 8 we used the commercial Ab in human sperm, whereas in figure 7 we used the anti-HA Abs in mouse sperm. Because the antibodies do not target the same part of the CCDC146 protein (the tag is placed at the N-terminus of the protein, and the HPA020082 Ab targets the last 130 amino acids of the C-terminus), their accessibility to the antigenic site could be different. However, it is important to note that both antibodies target the flagellum. This explanation has now been inserted; see lines 304-312.

      15) Fig 8D and line 274: the authors do a fractionation, but only show the flagella fraction. Why?

      Showing all fractions of their experiment would have underpinned the specific enrichment of CCDC146 in the flagella fraction, which is what they aim to show. Actually, given the absence of control proteins and the fact that the band in the flagellar fraction appears to be weaker than in total sperm, one could even conclude that there is more CCDC146 in another (not analysed) fraction of this experiment. Thus, the experiment as it stands is incomplete and does not, as the authors claim, confirm the flagellar localisation of the protein.

      We agree with the reviewer’s remark. We now provide new results showing both flagella and nuclei fractions in new figure 6A. This experiment is presented in lines 253-256.

      16) Line 283, Fig 9D,F: The description of the microtubules in this experiment is not easy to understand. Do the authors mean to say that the labelling shows that the protein is associated with doublet microtubules, but not with the two central microtubules? They should try to find a clearer way to explain their result.

      As suggested by reviewer 2, we have changed the figure to make it clearer. The text was changed accordingly. See new figure 9 and new corresponding legend lines 1006.

      17) Fig 9G - how often could the authors observe this? Why is the axoneme frayed? Does this happen randomly, or did the authors apply a specific treatment?

      Yes, it happens randomly during the fixation process.

      18) Line 300 and Fig 10A - the authors talk about the 90-kDa band, but do not say anything about what they think this band represents.

      We have now added the following sentence lines 340-342: “This band may correspond to proteolytic fragment of CCDC146, the solubilization of microtubules by sarkosyl may have made CCDC146 more accessible to endogenous proteases.”

      19) Fig 11A, lines 321-322: the authors write that the connecting piece is severely damaged. This is not obvious for somebody who does not work in sperm. Perhaps the authors could add some arrow heads to point out the defects, and briefly describe them in the text.

      We realized from your remark that our message was not clear. In fact, there is great variability in the morphological damage to the HTCA. For instance, the HTCA of Ccdc146 KO sperm presented in figure 10A2 is quite normal, whereas that in figure 10A4 is completely distorted. This point is now underlined in the corresponding text. See lines 367-369.

      We also added the size of the marker bar (200 nm), which was missing from the figure’s legend.

      20) Line 323: it will be important to name which tubulin antibody has been used to identify centrioles, as they are heavily posttranslationally modified.

      The different types of anti-tubulin Abs are described in the corresponding figure’s legend

      21) Fig 11B - phenotypes must be quantified to make these observations meaningful.

      We agree that a quantification would improve the message. However, testicular sperm are obtained by enzymatic separation of spermatogenic cells and the number of testicular sperm is very low. Moreover, not all sperm are stained. Taking these two points into account, it seems to us that a quantification would be difficult to interpret. For this reason, the quantification was not done; however, it is important to note that these defects were not observed in WT sperm, demonstrating that they are caused by the lack of CCDC146. We have added a sentence to underline this point. See lines 374-375.

      22) Line 329: Figure 12AB - is this a typo - should it read Figure 12B?

      We have split the panel A in A1 and A2 and changed the text accordingly. See line 378

      23) Why are there not wildtype controls in Fig 12B, C?

      We now provide a control image for Fig 12B as Figure 12-figure supplement 1. For Fig 12C, the emergence of the flagellum from the distal centriole in WT is already shown in Fig 12A1.

      24) Fig 13: the authors write that the manchette is "clearly longer and wider than in WT cells" (lines 342-343). How can they claim this without quantitative data?

      We now provide a statistical analysis of the length of the manchette. See Figure 13-figure supplement 1A. We also provide a new image illustrating the length of the manchette in Ccdc146 KO spermatids. See Figure 13-figure supplement 1B.

    1. Author Response

      We appreciate the insightful and constructive feedback from the reviewers regarding our manuscript, "Gain neuromodulation mediates perceptual switches: evidence from pupillometry, fMRI, and RNN Modelling." The comments have provided us with a number of valuable perspectives that will undoubtedly strengthen the impact and clarity of our work.

      We recognize the need for a more detailed and comparative analysis of the perceptual tasks used in our pupil and fMRI experiments. To address these points directly: the jittered intertrial intervals (ITIs) in the fMRI work were deemed necessary to effectively deconvolve the BOLD response (see Stottinger et al., 2018). In our fMRI work, each image was randomly preceded and followed by varying ITIs (2, 4, 6, and 8 seconds), ensuring an equitable distribution across sets and subjects. Importantly, our analysis of both fMRI and behavioral studies, including eye tracking data, indicates that perceptual switch behavior – the point at which switches occur – is consistent across modalities. If more predictive or preparatory activity were present in the fMRI version of the task, we would expect earlier switches or choices and altered reaction time distributions – neither of these signatures was observed in the original study (Stottinger et al., 2018). Importantly, this suggests that the additional time available in the fMRI experiments did not significantly alter behavioral outcomes. Thus, our findings suggest that despite the differences in timing and task structure, the behavioural responses remain consistent across both experimental setups. We will clarify this in the revised manuscript.

      In response to the reviewer's comments on our computational model, particularly regarding the modelling of noradrenaline (NA) effects in the RNN, we agree that modelling gain as stationary is a substantial approximation. However, given the slow ramping of pupil diameter, which served as our proxy for gain, it is an approximation that we believe is justified: in the revised manuscript, we will run additional simulations to ensure the validity of this approximation. In addition, whilst we agree that the model is more complicated than is needed for the task, we opted for RNN modelling, in lieu of a simpler modelling approach, because we wanted to use RNN modelling as a method for both hypothesis testing and generation. To build the RNN, the only key elements of model structure we had to specify in advance were the inputs and the target outputs of the network. The solution the RNN arrived at, although involving many more parameters than a simpler model, was entirely determined by optimisation (i.e., not our a priori hypotheses). We feel that this strengthens the result considerably. Importantly, this approach also allowed us to be surprised by the results of the model: for instance, we did not anticipate that the effect of gain on the energy landscape would be primarily mediated by inhibitory gain. In the revised manuscript, we will integrate this line of thinking into the paper. We are also sensitive to the fact that this result is both counterintuitive and difficult to study in high-dimensional dynamical systems like RNNs. In revisions, we will provide further analysis of the RNN and build a 2D approximation to the RNN that can be studied on the phase plane to better conceptually illuminate the mechanisms at play.

      Furthermore, we agree with the suggestion to consider alternative mechanisms that might contribute to perceptual switches, such as attention and top-down processing. While our study primarily focuses on LC-mediated gain modulation, we acknowledge the complexity of neural processes involved in perception and will expand our discussion to include these potential mechanisms. We also note the importance of moderating the causal language used in our manuscript. We will revise our wording to more accurately reflect the correlational nature of our findings and ensure that our conclusions are firmly grounded in the data presented.

      In conclusion, we are enthusiastic about the opportunity to refine our manuscript based on these valuable comments. In an updated version, we will address the overall points by providing clearer explanations of our methods, refining our figures for better readability, and ensuring that our conclusions are supported by robust analysis. We believe that these revisions will not only address the concerns raised but also significantly enhance the overall quality of our research. We thank the reviewers for their thorough and thoughtful critiques and look forward to submitting our revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors explore the effects of DNA methylation on the strength of regulatory activity using massively parallel reporter assays in cell lines on a genome-wide level. This is a follow-up of their first paper from 2018 that describes this method for the first time. In addition to adding more in-depth information on sequences that are explored by many researchers using two main methods, reduced representation bisulfite sequencing and sites represented on the Illumina EPIC array, they now show also that DNA methylation can influence changes in regulatory activity following a specific stimulation, even in absence of baseline effects of DNA methylation on activity. In this manuscript, the authors explore the effects of DNA methylation on the response to interferon alpha (IFNA) and a glucocorticoid receptor agonist (dexamethasone). The authors validate their baseline findings using additional datasets, including RNAseq data, and show convergences across two cell lines. The authors then map the methylation x environmental challenge (IFNA and dex) sequences identified in vitro to explore whether their methylation status is also predictive of regulatory activity in vivo. This is very convincingly shown for IFNA response sequences, where baseline methylation is predictive of the transcriptional response to flu infection in human macrophages, an infection that triggers the IFN pathways.

      Thank you for your strong assessment of our work!

      The extension of the functional validity of the dex-response altering sequences is less convincing.

      We agree. We note that genes close to dex-specific mSTARR-seq enhancers tend to be more strongly upregulated after dex stimulation than those near shared enhancers, which parallels our results for IFNA (lines 341-344). However, there is unfortunately no comparable data set to the human flu data set (i.e., with population-based whole genome-bisulfite sequencing data before and after dex challenge), so we could not perform a parallel in vivo validation step. We have added this caveat to the revised manuscript (lines 555-557).

      Sequences altering the response to glucocorticoids, however, were not enriched in DNA methylation sites associated with exposure to early adversity. The authors interpret that "they are not links on the causal pathway between early life disadvantage and later life health outcomes, but rather passive biomarkers". However, this approach does not seem an optimal model to explore this relationship in vivo. This is because exposure to early adversity and its consequences is not directly correlated with glucocorticoid release and changes in DNA methylation levels following early adversity could be related to many physiological mechanisms, and overall, large datasets and meta-analyses do not show robust associations of exposure to early adversity and DNA methylation changes. Here, other datasets, such as from Cushing patients may be of more interest.

      Thank you for making these important points. We have expanded the set of caveats regarding the lack of enrichment of early adversity-reported sites in the mSTARR-data set (lines 527-533). Specifically, we note that the relationship between early adversity and glucocorticoid physiology is complex (e.g., Eisenberger and Cole, 2012; Koss and Gunnar, 2018) and that dex challenge models one aspect of glucocorticoid signaling but not others (e.g., glucocorticoid resistance). Nevertheless, we also see little evidence for enrichment of early adversity-associated sites in the mSTARR data set at baseline, independently of the dex challenge experiment (lines 483-485; Figure 4).

      We also agree that large data sets (e.g., Houtepen et al., 2018; Marzi et al., 2018) and reviews (e.g., Cecil et al., 2020) of early adversity and DNA methylation in humans show limited evidence of associations between early adversity and DNA methylation levels. However, the idea that early adversity impacts downstream outcomes remains pervasive in the literature and popular science (see Dubois et al., 2019), which we believe makes tests like ours important to pursue. We also hope that our data set (and others generated through these methods) will be useful in interpreting other settings in which differential methylation is of interest as well—in line with your comment below. We have clarified both of these points in the revised manuscript (lines 520-522; 536-539).

      Overall, the authors provide a great resource of DNA methylation-sensitive enhancers that can now be used for functional interpretation of large-scale datasets (that are widely generated in the research community), given the focus on sites included in RRBS and the Illumina EPIC array. In addition, their data support the idea that differences in DNA methylation can alter responses to environmental stimuli, and thus the possibility that environmental exposures that alter DNA methylation can also alter the subsequent response to this exposure, in line with the theory of epigenetic embedding of prior stimuli/experiences. The conclusions related to the early adversity data should be reconsidered in light of the comments above.

      Thank you! And yes, we have revised our discussion of early life adversity effects as discussed above.

      Reviewer #1 (Recommendations For The Authors):

      While the paper has a lot of strengths and provides new insight into the epigenomic regulation of enhancers as well as being a great resource, there are some aspects that would benefit from clarification.

      a. It would be great to have a clearer description of how many sequences are actually passing QC in the different datasets and what the respective overlaps are in bps or 600bp windows. Now often only % are given. Maybe a table/Venn diagram for overview of the experiments and assessed sequences would help here. This concerns the different experiments in the K562, A549, and HepG2 cell lines, including stimulations.

      We now provide a supplementary figure and supplementary table providing, for each dataset, the number of 600 bp windows passing each filter (Figure 2-figure supplement 1; Supplementary File 9), as well as a supplementary figure providing an upset plot to show the number of assessed sequences shared across the experiments (Figure 2-figure supplement 2).

      b. It would also be helpful to have a brief description of the main differences in assessed sequences and their coverage of the old (2018) and new libraries in the main text to be able to better interpret the validation experiments.

      We now provide information on the following characteristics for the 2018 data set versus the data set presented for the first time here: mean (± SD) number of CpGs per fragment; mean (± SD) DNA sequencing depth; and mean (± SD) RNA sequencing depth (lines 169-170 provide values for the new data set; in line 194, we reference Supplementary File 5, which provides the same values for the old data set). Notably, the coverage characteristics of analyzed windows in both data sets are quite high (mean DNA-seq read coverage = 94x and mean RNA-seq read coverage = 165x in the new data set at baseline; mean DNA-seq read coverage = 22x and mean RNA-seq read coverage = 54x in Lea et al. 2018).

      c. Statements of genome-wide analyses in the abstract and discussion should be a bit tempered, as quite a number of tested sites do not pass QC and do not enter the analysis. From the results it seems like from over 4.5 million sequences, only 200,000 are entering the analysis.

      The reason why many of the windows are not taken forward into our formal modeling analysis is that they fail our filter for RNA reads because they are never (or almost never) transcribed, not because there was no opportunity for transcription (i.e., the region was indeed assessed in our DNA library, and did not show output transcription, as now shown in Figure 2-figure supplement 1). We have added a rarefaction analysis (lines 715-722 in Materials and Methods) of the DNA fragment reads to the revised manuscript which supports this point. Specifically, it shows that we are saturated for representation of unique genomic windows (i.e., we are above the stage in the curve where the proportion of active windows would increase with more sequencing: Figure 1-figure supplement 4). Similarly, a parallel rarefaction curve for the mSTARR-seq RNA-seq data (Figure 1-figure supplement 4) shows that we would gain minimal additional evidence for regulatory activity with more sequencing depth. We now reference these analyses in revised lines 179-184 and point to the supporting figure in line 182.

      In other words, our analysis is truly genome-wide, based on the input sequences we tested. Most of the genome just doesn’t have regulatory activity in this assay, despite the potential for it to be detected given that the relevant sequences were successfully transfected into the cells.
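
      As a rough illustration of the rarefaction logic described above, the following Python sketch subsamples a list of reads at increasing fractions and counts how many unique windows are represented at each depth. It is a minimal sketch under stated assumptions, not the pipeline used in the manuscript: the function name `rarefaction_curve`, the input format (one window ID per read), and the toy data are all hypothetical.

```python
import random


def rarefaction_curve(read_windows, fractions, seed=0):
    """Count unique windows seen when subsampling reads at each fraction.

    read_windows: list with one (hypothetical) window ID per sequenced read.
    fractions:    iterable of subsampling fractions in (0, 1].
    Returns a list of (fraction, n_unique_windows) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = []
    for frac in fractions:
        n_reads = int(round(frac * len(read_windows)))
        # Subsample reads without replacement, then count distinct windows.
        subsample = rng.sample(read_windows, n_reads)
        curve.append((frac, len(set(subsample))))
    return curve


if __name__ == "__main__":
    # Toy data (assumption): 100 reads spread unevenly over 3 windows.
    reads = ["w1"] * 50 + ["w2"] * 30 + ["w3"] * 20
    for frac, n_unique in rarefaction_curve(reads, [0.1, 0.25, 0.5, 1.0]):
        print(frac, n_unique)
```

      A curve that stops rising well before the full read depth is reached is the pattern that a saturation argument of this kind relies on.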

      d. Could the authors comment on the validity of the analysis if only one copy is present (cut-off for QC)?

      We think this question reflects a misunderstanding of our filtering criteria due to lack of clarity on our part, which we have modified in the revision. We now specify that the mean DNA-seq sequencing depth per sample for the windows we subjected to formal modeling was quite high: 93.91 ± 10.09 SD (range = 74.5 – 113.5x) (see revised lines 169-170). In other words, we never analyze windows in which there is scant evidence that plasmids containing the relevant sequence were successfully transfected (lines 170-172).

      Our minimal RNA-seq criteria require non-zero counts in at least 3 replicate samples within either the methylated condition or the unmethylated condition, or both (lines 166-168). Because we know that multiple plasmids containing the corresponding sequence are present for all of these windows (even those that just cross the minimal RNA-seq filtering threshold), we believe our results provide valid evidence that all analyzed windows present the opportunity to detect enhancer activity, but many do not act as enhancers (i.e., do not result in transcribed RNA). Notably, we observe a negligible correlation between DNA sequencing depth for a fragment, among analyzed windows, and mSTARR-seq enhancer activity (R² = 0.029; now reported in lines 183-184). We also now report reproducibility between replicates, in which all replicate pairs have r > 0.89, on par with previously published STARR-seq datasets (e.g., Klein et al., 2020; Figure 1-figure supplement 6, pointed to in line 193).
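
      The minimal RNA-seq criterion stated above can be written as a short predicate. The Python sketch below is illustrative only: the function name `passes_rna_filter`, the per-condition count lists, and the number of replicates in the example are assumptions, and only the rule itself (non-zero counts in at least 3 replicates of the methylated condition, the unmethylated condition, or both) comes from the text.

```python
def passes_rna_filter(meth_counts, unmeth_counts, min_nonzero=3):
    """Return True if a window meets the minimal RNA-seq criterion.

    meth_counts / unmeth_counts: RNA read counts per replicate for the
    methylated and unmethylated conditions (hypothetical input format).
    """
    def n_nonzero(counts):
        # Number of replicates with at least one RNA read.
        return sum(1 for c in counts if c > 0)

    return (n_nonzero(meth_counts) >= min_nonzero
            or n_nonzero(unmeth_counts) >= min_nonzero)


if __name__ == "__main__":
    # Toy counts (assumption): six replicates per condition.
    print(passes_rna_filter([0, 0, 0, 0, 0, 0], [1, 2, 0, 5, 0, 0]))
    print(passes_rna_filter([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0]))
```

      Note that a window can pass on the strength of one condition alone, so windows that are active only when unmethylated are retained.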

      e. While the authors state that almost all of the control sequences contain CpGs sites, could the authors also give information on the total number of CpG sites in the different subsets? Was the number of CpGs in a 600 bp window related to the effects of DNA methylation on enhancer activity?

      We now provide the number of CpG sites per window in the different subsets in lines 282-284. As expected, they are higher for EPIC array sites and for RRBS sites because the EPIC array is biased towards CpG-rich promoter regions, and the enzyme typically used in the starting step of RRBS digests DNA at CpG motifs (but control sequences still contain an average of ~13 CpG sites per fragment). We also now model the magnitude of the effects of DNA methylation on regulatory activity as a function of number of CpG sites within the 600 bp windows. Consistent with our previous work in Lea et al., 2018, we find that mSTARR-seq enhancers with more CpGs tend to be repressed by DNA methylation (now reported in lines 216-219 and Figure 1-figure supplement 11).
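
      For readers who want to reproduce the per-window CpG tally used as a covariate here, counting CpG sites reduces to counting "CG" dinucleotides on one strand of the window sequence. The Python sketch below assumes a plain DNA string as input; the function name `count_cpgs` and the example sequence are hypothetical, and the sketch is not the code used in the manuscript.

```python
def count_cpgs(sequence):
    """Count CpG dinucleotides ("CG") in a DNA sequence, case-insensitively.

    "CG" cannot overlap itself, so a non-overlapping count is exact.
    """
    return sequence.upper().count("CG")


if __name__ == "__main__":
    window = "ACGTCGCGTTAACCGG"  # toy sequence (assumption), not a real window
    print(count_cpgs(window))
```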

      f. In the discussion, a statement on the underrepresented regions, likely regulatory elements with lower CG content, that nonetheless can be highly relevant for gene regulation would be important to put the data in perspective.

      Thanks for this suggestion. We agree that regulatory regions, independent of CpG methylation, can be highly relevant, and now clarify in the main text that the “unmethylated” condition of mSTARR-seq is essentially akin to a conventional STARR-seq experiment, in that it assesses regulatory activity regardless of CpG content or methylation status (lines 128-130).

      Consequently, our study is well-designed to detect enhancer-like activity, even in windows with low GC content. We now show with additional analyses that we generated adequate DNA-seq coverage on the transfected plasmids to analyze 90.2% of the human genome, including target regions with no or low CpG content (lines 148-149; 153-156; Supplementary file 2). As noted above, we also now clarify that regions dropped out of our formal analysis because we had little to no evidence that any transcription was occurring at those loci, not because sequences for those regions were not successfully transfected into cells (see responses above and new Figure 1-figure supplement 4 and Figure 2-figure supplement 1).

      g. To control for differences in methylation of the two libraries, the authors sequence a single CpGs in the vector. Could the authors look at DNA methylation of the 600 bp windows at the end of the experiment, could DNA methylation of these windows be differently affected according to sequence? 48 hours could be enough for de-methylation or re-methylation.

      We agree that variation in demethylation or remethylation depending on fragment sequence is possible. We now state this caveat in the main text (lines 158-159), and specify that genomic coverage of our bisulfite sequencing data across replicates are (unfortunately) too variable to perform reliable site-by-site analysis of DNA methylation levels before and after the 48 hour experiment (lines 1182-1185). Instead, we focus on a CpG site contained in the adapter sequence (and thus included in all plasmids) to generate a global estimate of per replicate methylation levels. We also now note that any de-methylation or re-methylation would reduce our power to detect methylation-dependent activity, rather than leading to false positives (lines 163-165).

      h. The section on the method for correction for multiple testing should be more detailed as it is very difficult to follow. Why were only 100 permutations used, the empirical p-value could then only be <0.01? The description of a subsample of the N windows with positive Betas is unclear, should the permutation not include the actual values and thus all windows - or were the no negative Betas? Was FDR accounting for all elements and pairs?

      We have now expanded the text in the Materials and Methods section to clarify the FDR calculation (lines 691, 695-699, 702, 706). We clarify that the 100 permutations were used to generate a null distribution of p-values for the data set (e.g., 100 x 17,461 p-values for the baseline data set), which we used to derive a false discovery rate. Because we base our evidence on FDRs, we therefore compare the distribution of observed p-values to the distribution of p-values obtained via permutation; we do not calculate individual p-values by comparing an observed test statistic against the test statistics for permuted data for that individual window.

      We compare the data to permutations with only positive betas because in the observed data, we observe many negative betas. These correspond to windows which have no regulatory activity (i.e., they have many more input DNA reads than RNA-seq reads) and thus have very small p-values in a model testing for DNA-RNA abundance differences. However, we are interested in controlling the false discovery rate of windows that do have regulatory activity (positive betas). In the permuted data, by contrast and because of the randomization we impose, test statistics are centered around 0 and essentially symmetrical (approximately equally likely to be positive or negative). Retaining all p-values to construct the null therefore leads to highly miscalibrated false discovery rates because the distribution of observed values is skewed towards smaller values (because of windows with “significantly” no regulatory activity) compared to the permuted data. We address that problem by using only positive betas from the permutations.
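
      The general idea of a permutation-based FDR restricted to positive betas can be sketched as follows. This is a generic estimator written for illustration, not the authors' exact procedure (which is described in their Materials and Methods): the function name `permutation_fdr`, the `(beta, p_value)` input format, and the toy numbers are all assumptions. At a given p-value threshold, the sketch estimates the expected number of false positives from the pooled positive-beta permuted p-values and divides by the number of observed positive-beta windows called significant.

```python
def permutation_fdr(observed, permuted, threshold):
    """Estimate the FDR at a p-value threshold using permuted data.

    observed: list of (beta, p_value) tuples from the real data.
    permuted: list of permutations, each a list of (beta, p_value) tuples.
    Only positive-beta entries are used on both sides (an assumption of
    this sketch, following the rationale given in the text).
    """
    obs_p = [p for beta, p in observed if beta > 0]
    n_called = sum(1 for p in obs_p if p <= threshold)
    if n_called == 0:
        return 0.0  # nothing called significant, so no false discoveries

    null_p = [p for perm in permuted for beta, p in perm if beta > 0]
    if not null_p:
        raise ValueError("permutations contain no positive-beta tests")

    # Fraction of null p-values at or below the threshold, scaled to the
    # number of observed positive-beta tests.
    null_fraction = sum(1 for p in null_p if p <= threshold) / len(null_p)
    expected_false = null_fraction * len(obs_p)
    return min(1.0, expected_false / n_called)


if __name__ == "__main__":
    # Toy values (assumption): four windows and two permutations.
    observed = [(1.0, 0.001), (2.0, 0.002), (0.5, 0.5), (-3.0, 1e-10)]
    permuted = [
        [(0.1, 0.2), (-0.2, 0.01), (0.3, 0.9), (0.4, 0.004)],
        [(0.2, 0.6), (0.1, 0.7), (-0.1, 0.3), (0.5, 0.8)],
    ]
    print(round(permutation_fdr(observed, permuted, threshold=0.01), 3))
```

      In the toy data, the window with a strongly negative beta has the smallest p-value of all, yet it contributes to neither the numerator nor the denominator, which is the behaviour the positive-beta restriction is meant to achieve.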

      i. The interpretation of the overlap of Dex-response windows with CpGs sites associated with early adversity should be revisited according to the points also mentioned in the public review and the authors may want to consider exploring additional datasets with other challenges.

      Thank you, see our responses to the public review above and our revisions in lines 555-559. We agree that comparisons with more data sets and generation of more mSTARR-seq data in other challenge conditions would be of interest. While beyond the scope of this manuscript, we hope the resource we have developed and our methods set the stage for just such analyses.

      Reviewer #2 (Public Review):

      This work presents a remarkably extensive set of experiments, assaying the interaction between methylation and expression across most CpG positions in the genome in two cell types. To this end, the authors use mSTARR-seq, a high-throughput method, which they have previously developed, where sequences are tested for their regulatory activity in two conditions (methylated and unmethylated) using a reporter gene. The authors use these data to study two aspects of DNA methylation: 1) its effect on expression, and 2) its interaction with the environment. Overall, they identify a small number of 600 bp windows that show regulatory potential, and a relatively large fraction of these show an effect of methylation on expression. In addition, the authors find regions exhibiting methylation-dependent responses to two environmental stimuli (interferon alpha and glucocorticoid dexamethasone).

      The questions the authors address represent some of the most central in functional genomics, and the method utilized is currently the best method to do so. The scope of this study is very impressive and I am certain that these data will become an important resource for the community. The authors are also able to report several important findings, including that pre-existing DNA methylation patterns can influence the response to subsequent environmental exposures.

      Thank you for this generous summary!

      The main weaknesses of the study are: 1) The large number of regions tested seems to have come at the expense of the depth of coverage per region (1 DNA read per region per replicate). I have not been convinced that the study has sufficient statistical power to detect regulatory activity, and differential regulatory activity to the extent needed. This is likely reflected in the extremely low number of regions showing significant activity.

      We apologize for our lack of clarity in the previous version of the manuscript. Nonzero coverage for half the plasmid-derived DNA-seq replicates is a minimum criterion, but for the baseline dataset, the mean depth of DNA coverage per replicate for windows passing the DNA filter is quite high: 12.723 ± 41.696 s.d. overall, and 93.907 ± 10.091 s.d. in the windows we subjected to full analysis (i.e., windows that also passed the RNA read filter). We now provide these summary statistics in lines 148-149 and 169-170 and Supplementary file 5 (see also our responses to Reviewer 1 above). We also now show, using a rarefaction analysis, that our data set saturates the ability to detect regulatory windows based on DNA and RNA sequencing depth (new Figure 1-figure supplement 4; lines 179-184; 715-722).

      2) Due to the position of the tested sequence at the 3' end of the construct, the mSTARR-seq approach cannot detect the effect of methylation on promoter activity, which is perhaps the most central role of methylation in gene regulation, and where the link between methylation and expression is the strongest. This limitation is evident in Fig. 1C and Figure 1-figure supplement 5C, where even active promoters have activity lower than 1. Considering these two points, I suspect that most effects of methylation on expression have been missed.

      Thank you for pointing this out. We agree that we have not exhaustively detected methylation-dependent activity in all promoter regions, given that not all promoter regions are active in STARR-seq. However, there is good evidence that some promoter regions can function like enhancers and thus be detected in STARR-seq-type assays (Klein et al., 2020). This important point is now noted in lines 187-189; an example promoter showing methylation-dependent regulatory activity in our dataset is shown in Figure 3E.

      We also now clarify that Figure 1C shows significant enrichment of regulatory activity in windows that overlap promoter sequence (line 239). The y-axis is not a measure of activity, but rather the log-transformed odds ratio, with positive values corresponding to overrepresentation of promoter sequences in regions of mSTARR-seq regulatory activity. Active promoters are 1.640 times more likely to be detected with regulatory activity than expected by chance (p = 1.560 x 10^-18), which we now report in a table that presents enrichment statistics for all ENCODE elements shown in Figure 1C for clarity (Supplementary file 4). Moreover, 74.1% of active promoters that show regulatory activity have methylation-dependent activity, also now reported in Supplementary file 4.
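
      To make the y-axis quantity concrete, the following is a minimal sketch of a log-transformed odds ratio computed from a 2x2 table. It is not the authors' analysis code: the function name `odds_ratio`, the counts, and the choice of the natural logarithm are assumptions made for illustration (the response does not state the base of the log).

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 contingency table.

    a: windows with regulatory activity that overlap the annotation
    b: windows with regulatory activity that do not overlap it
    c: windows without regulatory activity that overlap it
    d: windows without regulatory activity that do not overlap it
    """
    return (a * d) / (b * c)

# Hypothetical counts, chosen only for illustration.
a, b, c, d = 30, 70, 200, 800
or_value = odds_ratio(a, b, c, d)
# Natural log is an assumption; the sign is the same in any base.
log_or = math.log(or_value)
print(f"odds ratio = {or_value:.3f}, log odds ratio = {log_or:.3f}")
```

      A positive log odds ratio means the annotation is overrepresented among active windows, a value of zero means no enrichment, and a negative value means depletion.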

      Overall, the combination of an extensive resource addressing key questions in functional genomics, together with the findings regarding the relationship between methylation and environmental stimuli makes this a key study in the field of DNA methylation.

      Thank you again for the positive assessment!

      Reviewer #2 (Recommendations For The Authors):

      I suggest the authors conduct several tests to estimate and/or increase the power of the study:

      1) To estimate the potential contribution of additional sequencing depth, I suggest the authors conduct a downsampling analysis. If the results are not saturated (e.g., the number of active windows is not saturated or the number of differentially active windows is not saturated), then additional sequencing is called for.

      We appreciate the suggestion. We have now performed a downsampling/rarefaction curve analysis in which we downsampled the number of DNA reads, and separately, the number of RNA reads. We show that for both DNA-seq depth and RNA-seq depth, we are within the range of sequencing depth in which additional sequencing would add minimal new analysis windows in the dataset (Figure 1-figure supplement 4; lines 179-184; 715-722).
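
      The general logic of a downsampling/rarefaction analysis can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline used in the manuscript: the function name `rarefaction_curve`, the toy read counts, and the `min_reads` detection threshold are all hypothetical.

```python
import random

def rarefaction_curve(window_counts, fractions, min_reads=1, seed=0):
    """Return (fraction, n_windows_detected) pairs.

    window_counts: dict mapping window id -> read count at full depth.
    A window counts as detected if it retains at least min_reads reads
    after the reads are subsampled without replacement.
    """
    rng = random.Random(seed)
    # One list entry per read, so that individual reads can be sampled.
    reads = [w for w, n in window_counts.items() for _ in range(n)]
    curve = []
    for frac in fractions:
        k = round(frac * len(reads))
        kept = {}
        for w in rng.sample(reads, k):
            kept[w] = kept.get(w, 0) + 1
        detected = sum(1 for n in kept.values() if n >= min_reads)
        curve.append((frac, detected))
    return curve

# Hypothetical toy data: 50 deeply covered windows and 50 shallow ones.
toy_counts = {f"win{i}": (40 if i < 50 else 2) for i in range(100)}
curve = rarefaction_curve(toy_counts, [0.1, 0.25, 0.5, 0.75, 1.0])
for frac, detected in curve:
    print(f"{frac:.2f} of reads -> {detected} windows detected")
```

      If the number of detected windows plateaus well before the full read depth, additional sequencing would be expected to add few new windows.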

      2) Correlation between replicates should be reported and displayed in a figure because low correlations might also point to too few reads. The authors mention: "This difference likely stems from lower variance between replicates in the present study, which increases power", but I couldn't find the data.

      We now report the correlations between RNA and DNA replicates within the current dataset and within the Lea et al., 2018 dataset (Figure 1-figure supplement 6). The between-replicate correlations in both our RNA libraries and DNA libraries are consistently high (r ≥ 0.89).
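
      As an illustration of how between-replicate correlations of this kind can be computed, the sketch below calculates Pearson's r for every pair of replicates. The replicate names and counts are hypothetical, and the helper names `pearson_r` and `pairwise_replicate_r` are introduced here for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def pairwise_replicate_r(replicates):
    """replicates: dict name -> per-window counts, all in the same window order."""
    names = sorted(replicates)
    return {
        (p, q): pearson_r(replicates[p], replicates[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }

# Hypothetical per-window read counts for three replicates.
toy = {
    "rep1": [10, 52, 33, 8, 71],
    "rep2": [12, 49, 30, 9, 68],
    "rep3": [9, 55, 35, 7, 75],
}
correlations = pairwise_replicate_r(toy)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```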

      3) The correlation between the previous and current K562 datasets is surprisingly low. Given that these datasets were generated in the same cell type, in the same lab, and using the same protocol, I expected a higher correlation, as seen in other massively parallel reporter assays. The fact that the correlations are almost identical for a comparison of the same cell and a comparison of very different cell types is also suspicious.

      Thanks for raising this point. We think it is in reference to our original Figure 1-Figure supplement 6, for which we now provide Pearson correlations in addition to R2 values (now Figure 1-Figure supplement 8). We note that this is not a correlation in raw data, but rather the correlation in estimated effect sizes from a statistical model for methylation-dependent activity. We now provide Pearson correlations for the raw data between replicates within each dataset (Figure 1-Figure supplement 6), which for the baseline dataset are all r > 0.89 for RNA replicates and r > 0.98 for DNA replicates, showing that replicate reproducibility in this study is on par with other published studies (e.g., Klein et al., 2020 report r > 0.89 for RNA replicates and r > 0.91 for DNA replicates).

      We do not know of any comparable reports in other MPRAs for effect size correlations between two separately constructed libraries, so it’s unclear to us what the expectation should be. However, we note that all effect sizes are estimated with uncertainty, so it would be surprising to us to observe a very high correlation for effect sizes in two experiments, with two independently constructed libraries (i.e., with different DNA fragments), run several years apart, especially given the importance of winner’s curse effects and other phenomena that affect point estimates of effect sizes. Nevertheless, we find that regions we identify as regulatory elements in this study are 74-fold more likely to have been identified as regulatory elements in Lea et al., 2018 (p < 1 x 10^-300).

      4) The authors cite Johnson et al. 2018 to support their finding that merely 0.073% of the human genome shows activity (1.7% of 4.3%), but:

      a. the percent cited is incorrect: this study found that 27,498 out of 560 million regions (0.005%) were active, and not 0.165% as the authors report.

      We have modified the text to clarify the numerator and denominator used for the 0.165% estimate from Johnson et al 2018 (lines 175-176). The numerator is their union set of all basepairs showing regulatory activity in unstimulated cells, which is 5,547,090 basepairs. The denominator is the total length of the hg38 human genome, which is 3,298,912,062 basepairs.

      Notably, the denominator (the total human genome) is not 560 million—while Johnson et al (2018) tested 560 million unique ~400 basepair fragments, these fragments were overlapping, such that the 560 million fragments covered the human genome 59 times (i.e., 59x coverage).

      b. other studies that used massively parallel reporter assays report substantially higher percentages, suggesting that the current study is possibly underpowered. Indeed, the previous mSTARR-seq found a substantially larger percentage of regions showing regulatory activity (8%). The current study should be compared against other studies (preferably those that did not filter for putatively active sequences, or at least to the random genomic sequences used in these studies).

      We appreciate this point and have double-checked comparisons to Johnson et al., 2018 and Lea et al., 2018. Our numbers are not unusual relative to Johnson et al., 2018 (0.165%), which surveyed the whole genome. Also, in comparing to the data from Lea et al., 2018, when processed in an identical manner (our criteria are more stringent here), our values of the percent of the tested genome showing significant regulatory activity are also similar: 0.108% in the Lea et al., 2018 dataset versus 0.082% in the baseline dataset. Finally, our rarefaction analyses (see our responses above) indicate that we are not underpowered based on sequencing depth for RNA or DNA samples. We also note that there are several differences in our analysis pipeline from other studies: we use more technical replicates than is typical (compare to 2-5 replicates in Arnold et al., 2013; Johnson et al., 2018; Muerdter et al., 2018), we measure DNA library composition based on DNA extracted from each replicate post-transfection (as opposed to basing it on the pre-transfection library; Johnson et al., 2018), and we use linear mixed models to identify regulatory activity as opposed to binomial tests (Johnson et al., 2018; Arnold et al., 2013; Muerdter et al., 2018).

      I find it confusing that the four sets of CpG positions used (EPIC, RRBS, NR3C1, and random control loci) add up together to 27.3M CpG positions. Are the 600 bp windows around each of these positions sufficient to result in whole-genome coverage? If so, a clear explanation of how this is achieved should be added.

      Thanks for this comment. Although our sequencing data are enriched for reads that cover these targeted sites, the original capture to create the input library included some off-target reads (as is typical of most capture experiments, which are rarely 100% efficient). We then sequenced at such high depth that we ultimately obtained sequencing coverage that encompassed nearly the whole genome. We now clarify in the main text that our protocol assesses 27.3 million CpG sites by assessing 600 bp windows encompassing 93.5% of all genomic CpG sites (line 89), which includes off-target sites (line 149).

      A scatter plot showing the RNA to DNA ratios of the methylated (x-axis) vs unmethylated (y-axis) library would be informative. I expect to see a shift up from the x=y diagonal in the unmethylated values.

      We have added a supplementary figure showing this information, which shows the expected shift upwards (Figure 1-figure supplement 9).

      Another important figure missing is a histogram showing the ratios between the unmethylated and methylated libraries for all active windows, with the significantly differentially active windows marked.

      We have added a supplementary figure showing this information (Figure 1-figure supplement 10).

      Perhaps I missed it, but what is the distribution of effect sizes (differential activity) following the various stimuli?

      This information is provided in table form in Supplementary Files 3, 10, and 11, which we now reference in the Figure 2 legend (lines 365-366).

      Minor changes

      It is unclear what the lines connecting the two groups in Fig.3C represent, as these are two separate groups of regions.

      We now clarify in the figure legend that values connected by a line are the same regions, not two different sets of regions. They show the correlation between DNA methylation and gene expression at mSTARR-seq-identified enhancers in individuals before and after IAV stimulation, separately for enhancers that are shared between conditions (left) versus those that are IFNA-specific (right). The two plots therefore do show two different sets of regions, which we have depicted to visualize the contrast in the effect of stimulation on the correlation on IFNA-specific enhancers versus shared enhancers. We have revised the figure legend to clarify these points (lines 458-460).

      L235-242 are unclear. Specifically - isn't the same filter mentioned in L241-242 applied to all regions?

      Yes, the same filter for minimal RNA transcription was applied to all regions. We have modified the text (lines 264-265, 271, 275-277) to clarify that the enrichment analyses were performed twice, to test whether the target types were: 1) enriched in the dataset passing the RNA filter (i.e., the dataset showing plasmid-derived RNA reads in at least half the sham or methylated replicates; n = 216,091 windows) and 2) enriched in the set of windows showing significant regulatory activity (at FDR < 1%; n = 3,721 windows).
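
      The RNA filter described above (plasmid-derived RNA reads in at least half of the sham or methylated replicates) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `passes_rna_filter`, the data layout, and the toy counts are assumptions.

```python
def passes_rna_filter(sham_counts, meth_counts):
    """True if a window has nonzero RNA reads in at least half of the sham
    replicates or in at least half of the methylated replicates."""
    def at_least_half_nonzero(counts):
        nonzero = sum(1 for c in counts if c > 0)
        # An empty replicate list never passes.
        return bool(counts) and 2 * nonzero >= len(counts)
    return at_least_half_nonzero(sham_counts) or at_least_half_nonzero(meth_counts)

# Hypothetical windows: (RNA counts in sham replicates, RNA counts in methylated replicates).
windows = {
    "winA": ([5, 0, 3, 2], [0, 0, 0, 0]),  # passes through the sham replicates
    "winB": ([0, 0, 0, 1], [0, 0, 0, 0]),  # fails in both conditions
    "winC": ([0, 0, 0, 0], [4, 7, 0, 0]),  # passes through the methylated replicates
}
passing = sorted(w for w, (sham, meth) in windows.items() if passes_rna_filter(sham, meth))
print(passing)
```

      Under this reading of the criterion, a window is retained if either condition alone meets the threshold; enrichment tests would then be run once on the retained set and once on the subset with significant regulatory activity.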

      To improve cohesiveness, the section about most CpG sites associated with early life adversity not showing regulatory activity in K562s can be moved to the supplementary in my opinion.

      Thank you for this suggestion. Because ELA and the biological embedding hypothesis (via DNA methylation) were major motivations for our analysis (see Introduction lines 42-48; 75-79), and we also discuss these results in the Discussion (lines 518-520), we have respectfully elected to retain this section in the main manuscript. We have added text in the Discussion explaining why we think experimental tests of methylation effects on regulation are relevant to the literature on early life adversity (lines 520-522), and have added discussion on limits to these analyses (lines 527-533).

      References:

      Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A (2013) Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science, 339, 1074-1077.

      Cecil CA, Zhang Y, Nolte T (2020) Childhood maltreatment and DNA methylation: A systematic review. Neuroscience & Biobehavioral Reviews, 112, 392-409.

      Dubois M, Louvel S, Le Goff A, Guaspare C, Allard P (2019) Epigenetics in the public sphere: interdisciplinary perspectives. Environmental Epigenetics, 5, dvz019.

      Eisenberger NI, Cole SW (2012) Social neuroscience and health: neurophysiological mechanisms linking social ties with physical health. Nature Neuroscience, 15, 669-674.

      Houtepen L, Hardy R, Maddock J, Kuh D, Anderson E, Relton C, Suderman M, Howe L (2018) Childhood adversity and DNA methylation in two population-based cohorts. Translational Psychiatry, 8, 1-12.

      Johnson GD, Barrera A, McDowell IC, D’Ippolito AM, Majoros WH, Vockley CM, Wang X, Allen AS, Reddy TE (2018) Human genome-wide measurement of drug-responsive regulatory activity. Nature Communications, 9, 1-9.

      Klein JC, Agarwal V, Inoue F, Keith A, Martin B, Kircher M, Ahituv N, Shendure J (2020) A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nature Methods, 17, 1083-1091.

      Koss KJ, Gunnar MR (2018) Annual research review: Early adversity, the hypothalamic–pituitary–adrenocortical axis, and child psychopathology. Journal of Child Psychology and Psychiatry, 59, 327-346.

      Marzi SJ, Sugden K, Arseneault L, Belsky DW, Burrage J, Corcoran DL, Danese A, Fisher HL, Hannon E, Moffitt TE (2018) Analysis of DNA methylation in young people: limited evidence for an association between victimization stress and epigenetic variation in blood. American Journal of Psychiatry, 175, 517-529.

      Muerdter F, Boryń ŁM, Woodfin AR, Neumayr C, Rath M, Zabidi MA, Pagani M, Haberle V, Kazmar T, Catarino RR (2018) Resolving systematic errors in widely used enhancer activity assays in human cells. Nature Methods, 15, 141-149.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1) Can the authors statistically define the egg-laying classes? In some parts of the manuscript, the division between the different classes could be more ambiguous. I understand that the class III strains are divided by the kcnl-1 genotype, but given the different results for diverse traits, it could be more clear to keep them as one class. Also, overall, the authors choose a collection of 15 strains across the different classes to phenotype for many traits and perform genome edits. It is understandable that they cannot test all strains, but given the variation across traits and classes, it might be good to add a few more caveats about how these strains might not be representative of all strains across the species.

      Response: The egg-laying classes were defined as in Figure 1A by arbitrarily chosen cut-offs (at 10, 10-25, and 25 eggs in utero) to simplify subsequent analyses. We added this explanation to the first paragraph of the results section. However, average egg retention differs significantly between the four defined classes using the 15 selected strains (Fig. 2A).

      We think that the distinction between Class IIIA and IIIB strains is important and justified because the two Classes differ significantly in mean egg retention (Fig. 2A) and because Class IIIB strains harbour the large-effect variant KCNL-1 V530L whereas Class IIIA strains do not.
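
      The class definitions described in this response can be written as a simple rule, sketched below. This is only an illustration: the function name `egg_laying_class` is hypothetical, and the handling of values falling exactly on a cut-off (10 and 25 are assigned to Class II here) is an assumption, since the response does not specify it.

```python
def egg_laying_class(mean_eggs_in_utero, has_kcnl1_v530l=False):
    """Assign a phenotypic class from mean egg retention and KCNL-1 genotype.

    Cut-offs follow the response (below 10, 10-25, above 25 eggs in utero).
    Class III is split by presence of the KCNL-1 V530L variant.
    """
    if mean_eggs_in_utero < 10:
        return "I"
    if mean_eggs_in_utero <= 25:  # boundary handling is an assumption
        return "II"
    return "IIIB" if has_kcnl1_v530l else "IIIA"

# Hypothetical strains with made-up retention values.
examples = {
    "strain_low": egg_laying_class(7.5),
    "strain_mid": egg_laying_class(15.0),
    "strain_high": egg_laying_class(32.0),
    "strain_high_v530l": egg_laying_class(45.0, has_kcnl1_v530l=True),
}
print(examples)
```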

      We agree that the 15 selected strains are not necessarily representative of all strains across the species. We have added a note of caution regarding this point to the first paragraph of the section “Temporal progression of egg retention and internal hatching”: “Note that this strain selection, especially concerning the largest Class II, is unlikely to reflect the overall strain diversity observed across the species.” In addition, we have reworded the first sentence of this paragraph as follows: “To better characterize natural variation in C. elegans egg retention, we focused on a subset of 15 strains from divergent phenotypic Classes I-III, with an emphasis on Class III strains exhibiting strong egg retention (at mid-L4 + 30h) (Fig. 2A and 2B).”

      2) For the GWAS experiments, the authors should describe if any of the QTL overlap with hyper-divergent regions in the strain set. The QTL could be driven by these less well defined regions.

      Response: We have added the following sentence: “The three QTLs do not align with any of the recently identified hyper-divergent regions of the genome (Lee et al., 2021).”

      3) The authors should look at correlations between the mod-5(n822) edit phenotypes and the exogenous 5-HT and SSRI phenotypes to demonstrate how the traits can differ. Some correlation plots might help that point as well.

      Response: We examined all possible correlations as suggested: none are significant and strain effects on trait differences are idiosyncratic, as written in our results section. The correlational analyses remain of limited value due to small samples: N=10 for mean strain values for measured phenotypes. We therefore feel that these analyses do not provide any additional insights beyond our figures (4C, 4D, 5C, 5D, S5A-C ) and our statement on page 15: “As in previous experiments (Fig. 4C and 5C), we find again that strains sharing the same egg retention phenotype may differ strongly in egg-laying behaviour in response to modulation of both exo- and endogenous serotonin levels (Class IIIA: ED3005 and JU2829) (Fig. 5D and S5C).”

      4) Figure 6D, was there any censoring of the data? Normally, these types of studies are plagued by an increase in censored animals that can decrease significance. The effects among the classes seem large, but statistical comparisons might help as well.

      Response: There was no censoring of animals (censoring of animals in lifespan studies is usually done by removing “bags of worms”, which here was our study phenotype). We now mention this in the corresponding figure legend. We also added a statistical analysis showing that mean survival was significantly different between all Classes.

      5) Many of the traits, edits, and deeper analyses are performed on the JU751 genetic background. This choice is sensible, otherwise, the work can increase exponentially. However, the authors should add a caveat about how these results might be limited to JU751 and other strains might respond differently.

      Response: For certain experiments, it was not feasible to include multiple strains from all phenotypic classes, so we selected JU751 (Class IIIB) and JU1200 (Class II), for which we had established CRISPR-engineered lines to modulate the egg retention phenotype by a single amino acid change in KCNL-1. To emphasize that these experimental observations cannot be generalized, we added the following statement in the relevant results section: “These experimental results offer preliminary evidence (bearing in mind that our analysis was primarily centered on a single genetic background) that laying of advanced-stage embryos may enhance intraspecific competitive ability, particularly in scenarios where multiple genotypes compete for colonization and exploitation of limited, patchily distributed resources.”

      6) The authors argue that evolution could be acting on specific parts of the egg-laying machinery (e.g., muscle-directed signaling components). It might be useful to look at levels of standing variation and selection at groups of loci compared to genomic controls to see if this conclusion can be strengthened.

      Response: This is a good idea but how to select pertinent candidate loci is unclear (there are over 300 genes with effects on egg laying, www.wormbase.org). In addition, the genetics of muscle-directed signalling components in egg laying is only starting to be explored, with no specific candidate genes having been identified (Medrano & Collins, 2023, Curr Biol). We therefore think that such an analysis is currently not possible.

      7) Completely optional: The authors present a compelling and interesting case for transitions and trade-offs between oviparity and viviparity. The C. vivipara species has a different egg-laying mode than other Caenorhabditis species. The authors could add a short section describing their expectations about the neuronal morphology, 5-HT circuits, and muscle function in this species given their results. What genes or circuits should be the focus of future studies to address this question in Caenorhabditis. Also, Loer and Rivard present some similar ideas based on the differences in 5-HT staining neurons across diverse nematodes. Those results can be incorporated and discussed as well.

      Response: Our current research focuses on the evolution of egg laying in different Caenorhabditis species. So far, however, it remains difficult to provide specific hypotheses on how the egg-laying circuit has changed in C. vivipara. We rephrased the final paragraph of the discussion to incorporate some of the reviewer’s suggestions: “Nematodes display frequent transitions from oviparity to obligate viviparity in many distinct genera (Sudhaus, 1976; Ostrovsky et al., 2015), including in the genus Caenorhabditis, with at least one viviparous species, C. vivipara (Stevens et al., 2019). Although evidence exists for the evolution of egg-laying circuitry across oviparous Caenorhabditis species (Loer and Rivard, 2007), the specific cellular and genetic changes responsible for the transition to obligate viviparity in C. vivipara have yet to be examined. Resolving the genetic basis of intraspecific variation in C. elegans egg retention, including partial or facultative viviparity, may thus shed light on the molecular changes underlying the initial steps of evolutionary transitions from oviparity to obligate viviparity in invertebrates.”

      Specific edits:

      1) Perhaps a silly point, but "parity" (to my knowledge) does not have a biological meaning on its own. I suggest "egg-laying mode" or "birth mode".

      Response: This term has been used previously in the literature (e.g., https://onlinelibrary.wiley.com/doi/10.1111/jeb.13886 or https://doi.org/10.1101/2023.10.22.563505). However, as the referee rightly points out, this is not a standard term. We therefore replaced “parity mode” with “egg-laying mode”.

      2) "Against fluctuating environmental fluctuations" is a bit strange

      Response: Corrected.

      3) The first publications of Egl mutants were by the Horvitz lab so some citations are not in all of the first descriptions of the trait (early in Results)

      Response: We have added the relevant work (Trent 1982, Trent 1983, Desai & Horvitz 1989) to this paragraph in the early results section.

      4) "Strong egg retention usually strongly..." is a bit strange

      Response: Corrected.

      5) Figure 8G font looks smaller than the others.

      Response: Corrected.

      Reviewer #2:

      1) In Figure 1A, I infer that in the graph class I measurements are represented by dark blue dots and class II by purple dots. I am having a really hard time distinguishing between these two colors in the graph. In the pie chart I have no problem, but in the graph the black lines around the colored dots seem to obscure the colors. Not sure how to fix this graphical problem, but it is preventing the graph from communicating the results effectively.

      Response: We have changed the colours, spacing and format of this figure to resolve this problem.

      2) The behavioral analysis of Figure 3B-3F is problematic. The experimental methods used and the interpretation of the results each have issues. This is cause for concern since this is the most direct analysis of the actual variations in egg-laying behavior across strains presented in this paper.

      This experiment is modeled after the work of Waggoner et al. 1998, who recorded egg laying events of individual worms on video over several hours and noted the exact time of individual egg laying events. Waggoner et al. found in the reference C. elegans strain N2 that egg-laying events occurred in ~2 minute clusters ("active phases") separated by ~20 minute silent periods ("inactive phases"). Mignerot et al. did not take continuous videos of animals, but rather examined plates bearing a single worm only every 5 minutes and noted the number of new eggs that appeared on the plate in each 5-minute interval. From these data, the authors claim they have measured the intervals between "egg-laying phases" (the term used in the Figure 3 legend). In the Results, the authors explicitly claim they are measuring the timing and frequency of actual active and inactive egg-laying phases. Apparently, all the eggs laid within one 5-minute interval are considered to have been laid in a single active phase, and the time between 5-minute intervals containing egg laying events is considered an "inactive phase" and is measured only with a resolution of 5 minutes. It is not explained anywhere how the authors handle the situation of seeing eggs laid in two consecutive 5-minute intervals. Is that one active phase that is 10 minutes long, or is that two separate active phases with a 5-minute active phase in between? Because of this ambiguity in how they define active and inactive phases, I find it impossible to understand and judge the data presented in Fig. 3D-3F. The authors in the results state that "Class I and Class IIIB displayed significantly accelerated and reduced egg laying activity respectively (Fig. 3C to 3E)". I assume they are referring to the statistical analysis described in the figure legend, which is quite difficult to understand. Frankly, just looking at the graphs in Fig. 3D-3F, it is hard for the reader to identify which specific features shown in the graphs can explain why, for example, Class I strains have fewer retained eggs than Class III strains. So, I found this analysis very unsatisfying.

      I also feel the authors are making an unwarranted assumption that their non-N2 strains will have distinguishable active and inactive phases of egg-laying behavior analogous to those seen in the N2 strain. Given the possibly large variations in egg-laying behavior in the various strains examined, that assumption should be questioned. Thus, framing the entire analysis of behavior patterns in terms of the length of active and inactive phases might not be appropriate.

      Response: This comment validly highlights important problems and limitations of our scan-sampling method to quantify strain differences in egg-laying behaviour. We acknowledge that we failed to present the data with due diligence, and clarity regarding terminology and interpretation. However, we think that some of these results are still of value after revised presentation. Our biggest mistake was to use the terms “active and inactive phase”, as coined by Waggoner et al. 1998. We are aware that our measures are not equivalent to these previously defined measures but have been sloppy with terminology. We therefore carefully reworded this entire results section, using clear definitions to indicate differences between the Waggoner assay and our assay (including a graphical representation of our assay design in the revised Fig. 3B). In brief, our simplified assay is useful to estimate the frequency and approximate duration of prolonged inactive periods of egg laying because we can unambiguously determine intervals in which eggs were laid or not. In contrast, as pointed out by the reviewer, we cannot determine if multiple active phases occurred within a 5-min interval, nor can we estimate the duration of an active “phase”. We now state this limitation explicitly in the manuscript. What our results do show is that the number of intervals during which egg laying occurred is significantly different between strains and Classes: Class I (low retention) have a higher number of intervals with egg-laying events, whereas Class IIIB showed a reduced number of such events (Fig. 3D). We can therefore also roughly estimate the mean time (per individual) between two egg-laying intervals, giving us a proxy for prolonged periods when egg-laying is inactive (Fig. 3E); we note that our estimate for N2 is very close to what has been previously measured (~20 min). 
Therefore, we can confidently conclude that there are natural strains which have both shorter (Class I) and longer (Class IIIB) inactive periods of egg laying. These results partly align with observed variation in egg retention. However, we agree with the reviewer – as we had stated both in results and discussion sections – that these behavioural differences act together with differences in the sensing of egg accumulation in utero (as suggested by results shown in Fig. 3G and 3H). We also agree that it seems very plausible that the observed behavioural differences, as revealed by scan-sampling, may only have a secondary role in accounting for natural variation in egg retention. We will be testing these hypotheses specifically in our future research.
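
      The two scan-sampling measures discussed in this response (the number of 5-minute intervals with egg laying, and the mean time between such intervals) can be sketched as follows. This is an illustrative sketch, not the authors' analysis: the function name `scan_sampling_summary` and the toy record are hypothetical, and measuring the gap from the start of one egg-laying interval to the start of the next is an assumption.

```python
def scan_sampling_summary(eggs_per_interval, interval_min=5):
    """Summarise one animal's scan-sampling record.

    eggs_per_interval: number of new eggs counted at each scan.
    Returns the number of intervals with at least one egg and the mean
    time in minutes between consecutive egg-laying intervals.
    """
    active = [i for i, n in enumerate(eggs_per_interval) if n > 0]
    gaps = [(b - a) * interval_min for a, b in zip(active, active[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return {"n_intervals_with_eggs": len(active), "mean_gap_min": mean_gap}

# Hypothetical one-hour record for a single animal (12 scans, 5 min apart).
record = [0, 3, 0, 0, 0, 2, 0, 0, 0, 0, 4, 1]
summary = scan_sampling_summary(record)
print(summary)
```

      Note that under this definition two consecutive intervals with eggs contribute a 5-minute gap, which illustrates why the assay cannot resolve whether they reflect one or several bouts of egg laying.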

      Note: The statistical analyses are nested ANOVAs to ask (a) does the value differ between strains within a given class and (b) does the value differ between Classes? Classes labelled with different letters in the figures therefore significantly differ in their mean values, demonstrating that measured behavioural phenotypes consistently differ between some (but not all) phenotypic classes, yet largely in line with their egg retention phenotypes (Fig. 3D and 3E).

      3) Figure 4A is a schematic diagram of how the egg-laying circuit works based on previous literature, and the authors cite Collins et al. 2015 and Kopchock et al. 2021 as their sources. One feature of this figure seems unwarranted, namely the part indicating that egg accumulation acts on the UM muscles, and the statement in the legend that "mechanical excitation of uterine muscles (UM) in response to egg accumulation favours exit from the inactive state (Collins et al., 2016)". I believe Collins et al. 2016 showed that egg accumulation favors egg laying and may have speculated that it does so by stretching the UM muscles, but this idea remains speculative and has not been established by any experimental data. I point out this issue, in particular, because it may bear on the nice data the authors of this manuscript show in Figure 3G and 3H, which show that some strains accumulate many eggs in the uterus before they initiate egg laying.

      Also, in Figure 4A and 4B, the legend does not explain the logic of the green areas labeled "egg-laying active phase" and the yellow area labeled "egg-laying inactive state". I was not sure how to interpret these features of the graphics.

      Response: The input from uterine muscles remains indeed hypothetical, and we have corrected the figure accordingly, now simply referring to the feedback of egg accumulation on egg laying activity, as recently characterized in more detail by Medrano & Collins (2023, Curr Biol).

      The green/yellow backgrounds shown in figures 4A (and 4B) are not useful and we have removed them.

      4) Results, page 11: "We used standard assays, in which animals are reared in liquid M9 buffer without bacterial food." In the standard assays, animals are reared on NGM agar plates with bacterial food, and then at the start of the egg-laying assay, are transferred to liquid M9 buffer without bacterial food. I assume that is what these authors did, and they should correct the language of the text to make it more accurate.

      Response: The reviewer is correct. We have incorporated this change to improve accuracy.

      5) The authors note that "serotonin induced a much stronger egg-laying response in the Class IIIA strain ED3005 than in other strains (Fig. 4C)". I would like to point out to the authors that strains such as ED3005 that have a very large number of unlaid eggs in their uterus are prone to lay a very large number of eggs when treated with exogenous serotonin, simply for the trivial reason that they have more eggs to release. This was previously seen, for example, in Desai and Horvitz (1989) in certain egg-laying defective mutants.

      Response: This is an important point and our comparison of ED3005 to ALL other strains is problematic. We changed this result description by stating that ED3005 shows possible serotonin hypersensitivity compared to strains with similar levels of egg retention (Class IIIA): “In addition, serotonin induced a much stronger egg-laying response in the strain ED3005 than in other Class IIIA strains with similar levels of egg retention (Fig. 4B). ED3005 may thus exhibit serotonin hypersensitivity, which has been observed in certain egg-laying mutants where perturbed synaptic transmission impacts serotonin signalling (Schafer and Kenyon, 1995; Schafer et al., 1996).”

      6) In Figure 4 the authors show that all strains lay eggs in response to fluoxetine and imipramine, but some strains (Class IIIB) do not lay eggs in response to serotonin. They then cite a series of papers, starting with Trent et al. 1983, that they claim show that this specific phenotype demonstrates that the HSN neurons are functionally releasing serotonin (bottom of page 11). This statement needs to be removed - it is incorrect. It is true that egg laying in response to fluoxetine and/or imipramine AS WELL AS egg laying in response to serotonin has been interpreted as indicating the presence of HSN neurons that functionally release serotonin to stimulate egg laying (these were referred to as Category C by Trent et al., 1983). However, the mutants that Mignerot et al. are talking about (those that don't respond to serotonin but do respond to imipramine/fluoxetine) were called Category D by Trent et al., 1983, and to my knowledge these have never been interpreted as necessarily having functionally intact HSN neurons. Mutants such as these that can lay eggs in some circumstances but cannot lay eggs in response to exogenous serotonin have usually been interpreted as having egg-laying muscles that are defective in responding to serotonin.

      How can we interpret strains that respond to imipramine/fluoxetine and not serotonin? Mignerot et al. cite some of the papers (Kullyev et al. 2010; Weinshenker et al., 1999; Yue et al., 2018) showing that imipramine and fluoxetine have off-target effects and can stimulate egg laying by acting through proteins other than the serotonin-reuptake inhibitor. The authors later in their discussion at the top of Page 24 also cite Dempsey et al 2005, a paper that also argues that imipramine and fluoxetine act via off target effects. However, currently in Figure 4B Mignerot et al. emphasize that the serotonin reuptake inhibitor is the target of these drugs. Since the results presented for Class IIIB strains are not in accord with this interpretation, this seems misleading to me. The bottom line for me is that class IIIB strains cannot respond to exogenous serotonin, but can lay eggs in other conditions, so perhaps there is something specifically wrong with their ability to respond to serotonin.

      Response: We thank the reviewer for this important comment – we misinterpreted some of these past findings and our statements were either inexact or incorrect. We have revised this section accordingly: “Both drugs also stimulated egg laying in the Class IIIB strains and the Class IIIA strain JU2829 for which exogenous serotonin either inhibited egg laying or had no effect on it (Fig. 4B). In the past, mutants unresponsive to serotonin yet responsive to other drugs, including fluoxetine and imipramine, have been interpreted as being defective in the serotonin response of vulval muscles (Trent et al., 1983; Reiner et al., 1995; Weinshenker et al., 1995). This is indeed the likely case of Class IIIB strains carrying the KCNL-1 V530L variant thought to specifically reduce excitability of vulval muscles (Vigne et al., 2021). Our results therefore suggest that JU2829 (Class IIIA) may exhibit a similar defect in vulval muscle activation via serotonin caused by an alternative genetic change. Overall, these pharmacological assays do not allow us to conclude if and how HSN function has diverged among strains because the mode of action and targets of tested drugs has not been fully resolved. Nevertheless, our results are consistent with previous models proposing that these drugs do not simply block serotonin reuptake but can stimulate egg laying, to some extent, through mechanisms independent of serotonergic signaling (Trent et al., 1983; Desai and Horvitz, 1989; Reiner et al., 1995; Weinshenker et al., 1995, 1999; Dempsey et al., 2005; Kullyev et al., 2010; Branicky et al., 2014; Yue et al., 2018).”

      We removed the oversimplified Fig. 4B to avoid any misinterpretation.

      8) In Figure 7B and 7C, the authors should add some type of error bars to the graphs to give the readers an idea of whether the differences between strains that they write about are statistically significant or not.

      Response: These are frequency data to describe temporal dynamics of hatching (N=45-72 eggs per strain) (Fig. 7B) and development in single cohorts (N=48-177 eggs per strain) (Fig. 7C), hence, the absence of error bars.

      We agree that this representation of the data is not very telling. We therefore changed the data representation in these two figures to show that there are clear, statistically significant, negative correlations between egg retention and time to hatching / egg-to-adult developmental time.

      9) When the authors reference a list of papers in a single list, e.g. "(Burton et al., 2021; Fausett et al., 2021; Garsin et al., 2001; Padilla et al., 2002; Van Voorhies and Ward, 2000)" they seem to do so in alphabetical order by the first author's last name. I believe the usual practice is to list references by year of publication, with the earliest first.

      Response: We corrected the citation style according to the eLife format.

      10) At the top of page 24, the authors write "It seems unlikely, however, that any of these variants strongly alter central function of HSN and HSN-mediated signalling because fluoxetine and imipramine, known to act via HSN (Dempsey et al., 2005; Trent et al., 1983; Weinshenker et al., 1995), triggered a robust stimulatory effect on egg laying in all examined strains (Fig. 4C)." I believe that the Weinshenker paper in fact showed that imipramine does not act via the HSN, and the Dempsey paper suggested that both drugs can act at least in part independently of the HSN. Therefore, the authors should revise their statement.

      Response: We have removed the sentence.

      Reviewing Editor:

      Minor suggestions:

      1) p. 2, fifth line from bottom: "lead" instead of "leads";

      2) p. 2, last line: "muscle" instead of "muscles";

      3) p. 3, first full paragraph, 17th line: "populations" instead of "population";

      4) p. 5, fourth line from bottom: Delete first comma;

      5) p. 6, Figure 1D: "of" instead of "off";

      6) p. 7, fifth line: "KCNL-1";

      7) p. 9, third paragraph, second line: please clarify "late mid-L4";

      8) p. 16, first line: "exogenous";

      9) p. 20, first paragraph, beginning of second sentence: "Whether" instead of "If";

      10) p. 22, ninth line from bottom: delete "shaped by";

      11) p. 23, last paragraph, third and eighth lines from bottom: change "between" to "among"

      Response: Thank you. All corrected.

      Additional changes:

      Figure 5A: We removed figure 5A showing a cartoon of mod-5/SERT and its effects on serotonin signalling. This figure was incorrectly showing that MOD-5 is expressed in HSN (Jafari et al 2011 J. Neuroscience, Hammarlund et al 2018 Neuron).

      Abstract: We reworded the abstract to reduce its length.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Koumoundourou et al. identify a pathway downstream of Bcl11b that controls synapse morphology and plasticity of hippocampal mossy fiber synapses. Using an elegant combination of in vivo, ex vivo, and in vitro approaches, the authors build on their previous work that indicated C1ql2 as a functional target of Bcl11b (De Bruyckere et al., 2018). Here, they examine the functional implications of C1ql2 at MF synapses in Bcl11b cKO mice and following C1ql2 shRNA. The authors find that Bcl11b KO and shRNA against C1ql2 significantly reduce the recruitment of synaptic vesicles and impair LTP at MF synapses. Importantly, the authors test a role for the previously identified C1ql2 binding partner, exon 25b-containing Nrxn3 (Matsuda et al., 2016), as relevant at MF synapses to maintain synaptic vesicle recruitment. To test this, the authors developed a K262E C1ql2 mutant that disrupts binding to Nrxn3. Curiously, while Bcl11b KO and C1ql2 KD largely phenocopy (reduced vesicle recruitment and impaired LTP), only vesicle recruitment is dependent on C1ql2-Nrxn3 interactions. These findings provide new insight into the functional role of C1ql2 at MF synapses. While the authors convincingly demonstrate a role for C1ql2-Nrxn3(25b+) interaction for vesicle recruitment and a Nrxn3(25b+)-independent role for C1ql2 in LTP, the underlying mechanisms remain inconclusive. Additionally, a discussion of how these findings relate to previous work on C1ql2 at mossy fiber synapses and how the findings contribute to the biology of Nrxn3 would increase the interpretability of this work.

      As suggested by reviewer #1, we extended our discussion of previous work on C1ql2 and additionally discussed the biology of Nrxn3 and how our work relates to it. Moreover, we extended our mechanistic analysis of how Bcl11b/C1ql2/Nrxn3 pathway controls synaptic vesicle recruitment as well as LTP (please see also response to reviewer #2 points 5 and 8 and reviewer #3 point 4 of public reviews below for detailed discussion).

      Reviewer #2 (Public Review):

      This manuscript describes experiments that further investigate the actions of the transcription factor Bcl11b in regulating mossy fiber (MF) synapses in the hippocampus. Prior work from the same group had demonstrated that loss of Bcl11b results in loss of MF synapses as well as a decrease in LTP. Here the authors focus on a target of Bcl11b, the secreted synaptic organizer C1ql2, which is almost completely lost in Bcl11b KO. Viral reintroduction of C1ql2 rescues the synaptic phenotypes, whereas direct KD of C1ql2 recapitulates the Bcl11b phenotype. C1ql2 itself interacts directly with Nrxn3, and replacement with a binding-deficient mutant C1ql2 was not able to rescue the Bcl11b KO phenotype. Overall, there are some interesting observations in the study; however, there are also some concerns about the measures and interpretation of data.

      The authors state that they used a differential transcriptomic analysis to screen for candidate targets of Bcl11b, yet they do not present any details of this screen. This should be included and at the very least a table of all DE genes included. It is likely that many other genes are also regulated by Bcl11b so it would be important to the reader to see the rationale for focusing attention on C1ql2 in this study.

      The transcriptome analysis mentioned in our manuscript was published in detail in our previous study (De Bruyckere et al., 2018), including chromatin-immunoprecipitation that revealed C1ql2 as a direct transcriptional target of Bcl11b. Upon revision of the manuscript, we made sure that this was clearly stated within the main text module to avoid future confusion. In the same publication (De Bruyckere et al., 2018), we discuss in detail several identified candidate genes such as Sema5b, Ptgs2, Pdyn and Penk as putative effectors of Bcl11b in the structural and functional integrity of MFS. C1ql2 has been previously demonstrated to be almost exclusively expressed in DG neurons and localized to the MFS.

      There it bridges the pre- and post-synaptic sides through interaction with Nrxn3 and KAR subunits, respectively, and regulates synaptic function (Matsuda et al., 2016). Taken together, C1ql2 was a very good candidate to study as a potential effector downstream of Bcl11b in the maintenance of MFS structure and function. However, as our data reveal, not all Bcl11b mutant phenotypes were rescued by C1ql2 (see supplementary figures 2d-f of revised manuscript). We expect additional candidate genes, identified in our transcriptomic screen, to act downstream of Bcl11b in the control of MFS.

      All viral-mediated expression uses AAVs which are known to ablate neurogenesis in the DG (Johnston DOI: 10.7554/eLife.59291) through the ITR regions and leads to hyperexcitability of the dentate. While it is not clear how this would impact the measurements the authors make in MF-CA3 synapses, this should be acknowledged as a potential caveat in this study.

      We agree with reviewer #2 and are aware that it has been demonstrated that AAV-mediated gene expression ablates neurogenesis in the DG. To avoid potential interference of the AAVs with the interpretability of our phenotypes, we made sure during the design of the study that all of our control groups were treated in the same way as our groups of interest, and were, thus, injected with control AAVs. Moreover, the observed phenotypes were first described in Bcl11b mutants that were not injected with AAVs (De Bruyckere et al., 2018). Finally, we thoroughly examined the individual components of the proposed mechanism (rescue of C1ql2 expression, over-expression of C1ql3 and introduction of mutant C1ql2 in Bcl11b cKOs, KD of C1ql2 in WT mice, and Nrxn123 cKO) and reached similar conclusions. Together, this strongly supports that the observed phenotypes occur as a result of the physiological function of the proteins involved in the described mechanism and not due to interference of the AAVs with these biological processes. We have now addressed this point in the main text module of the revised ms.

      The authors claim that the viral re-introduction "restored C1ql2 protein expression to control levels". This is misleading given that the mean of the data is 2.5x the control (Figure 1d and also see Figure 6c). The low n and large variance are a problem for these data. Moreover, they are marked ns but the authors should report p values for these. At the least, this likely large overexpression and variability should be acknowledged. In addition, the use of clipped bands on Western blots should be avoided. Please show the complete protein gel in primary figures or supplemental information.

      We agree with reviewer #2 that C1ql2 expression after its re-introduction in Bcl11b cKO mice was higher compared to controls and that this should be taken into consideration for proper interpretation of the data. To address this, based also on the suggestion of reviewer #3 point 1 below, we overexpressed C1ql2 in DG neurons of control animals. We found no changes in synaptic vesicle organization upon C1ql2 over-expression compared to controls. This further supports that the observed effect upon rescue of C1ql2 expression in Bcl11b cKOs is due to the physiological function of C1ql2 and not a result of the overexpression. These data are included in supplementary figure 2g-j and are described in detail in the results part of the revised manuscript.

      Additionally, we looked at the effects of C1ql2 overexpression in Bcl11b cKO DGN on basal synaptic transmission. We plotted fEPSP slopes versus fiber volley amplitudes, measured in slices from rescue animals, as we had previously done for the control and Bcl11b cKO (Author response image 1a). Although regression analysis revealed a trend towards steeper slopes in the rescue mice (Author response image 1a and b), the observation did not prove to be statistically significant, indicating that C1ql2 overexpression in Bcl11b cKO animals does not strongly alter basal synaptic transmission at MFS. Overall, our previous and new findings support that the observed effects of the C1ql2 rescue are not caused by the artificially elevated levels of C1ql2, as compared to controls, but are rather a result of the physiological function of C1ql2.

      Following the suggestion of reviewer #2 all western blot clipped bands were exchanged for images of the full blot. This includes figures 1c, 4c, 6b and supplementary figure 2g of the revised manuscript. P-value for Figure 1d has now been included.

      Author response image 1.

      C1ql2 reintroduction in Bcl11b cKO DGN does not significantly alter basal synaptic transmission at mossy fiber-CA3 synapses. a Input-output curves generated by plotting fEPSP slope against fiber volley amplitude at increasing stimulation intensities. b Quantification of regression line slopes for input-output curves for all three conditions. Control+EGFP, 35 slices from 16 mice; Bcl11b cKO+EGFP, 32 slices from 14 mice; Bcl11b cKO+EGFP-2A-C1ql2, 22 slices from 11 mice. The data are presented as means, error bars represent SEM. Kruskal-Wallis test (non-parametric ANOVA) followed by Dunn’s post hoc pairwise comparisons. p=0.106; ns, not significant.
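
The comparison described in this legend (one regression slope per slice, then a Kruskal-Wallis test across the three conditions) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names `io_slope` and `kruskal_wallis_3` and all numbers in the usage lines are hypothetical, the H statistic is computed without a tie correction, the p-value relies on the large-sample chi-square approximation with 2 degrees of freedom, and Dunn's post hoc comparisons are omitted.

```python
from math import exp


def io_slope(fiber_volley, fepsp_slope):
    """Least-squares slope of fEPSP slope versus fiber volley amplitude (one slice)."""
    n = len(fiber_volley)
    mean_x = sum(fiber_volley) / n
    mean_y = sum(fepsp_slope) / n
    sxx = sum((x - mean_x) ** 2 for x in fiber_volley)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(fiber_volley, fepsp_slope))
    return sxy / sxx


def kruskal_wallis_3(groups):
    """Kruskal-Wallis H and approximate p-value for exactly three groups.

    Tied values receive average ranks, but H is not tie-corrected.
    With three groups the chi-square reference distribution has 2 degrees
    of freedom, whose survival function is exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted(v for g in groups for v in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n_total = len(pooled)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    return h, exp(-h / 2)


# Hypothetical per-slice slopes for the three conditions (made-up values).
control = [1.9, 2.1, 2.4, 2.0]
cko = [2.0, 1.8, 2.2, 2.3]
rescue = [2.5, 2.2, 2.6, 2.1]
h_stat, p_value = kruskal_wallis_3([control, cko, rescue])
```

Taking one slope per slice keeps the slice as the unit of observation, which matches the "slices from mice" counts reported in the legend.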

      Measurement of EM micrographs: As prior work suggested that MF synapse structure is disrupted the authors should report active zone length as this may itself affect "synapse score" defined by the number of vesicles docked. More concerning is that the example KO micrographs seem to have lost all the densely clustered synaptic vesicles that are away from the AZ in normal MF synapses e.g. compare control and KO terminals in Fig 2a or 6f or 7f. These terminals look aberrant and suggest that the important measure is not what is docked but what is present in the terminal cytoplasm that normally makes up the reserve pool. This needs to be addressed with further analysis and modifications to the manuscript.

      As requested by reviewer #2 we analyzed and reported in the revised manuscript the active zone length. We found that the active zone length remained unchanged in all conditions (control/Bcl11b cKO/C1ql2 rescue, WT/C1ql2 KD, control/K262E and control/Nrxn123 cKO), strengthening our results that the described Bcl11b/C1ql2/Nrxn3 mechanism is involved in the recruitment of synaptic vesicles. These data have been included in supplementary figures 2c, 4h, 5f and 6g and are described in the results part of the revised manuscript.

      We want to clarify that the synapse score is not defined by the number of docked vesicles to the plasma membrane. The synapse score, which is described in great detail in our materials and methods part and has been previously published (De Bruyckere et al., 2018), rates MFS based on the number of synaptic vesicles and their distance from the active zone and was designed according to previously described properties of the vesicle pools at the MFS. The EM micrographs refer to the general misdistribution of SV in the proximity of MFS. Upon revision of the manuscript, we made sure that this was clearly stated in the main text module to avoid further confusion.

      The study also presents correlated changes in MF LTP in Bcl11b KO which are rescued by C1ql2 expression. It is not clear whether the structural and functional deficits are causally linked and this should be made clearer in the manuscript. It is also not apparent why this functional measure was chosen as it is unlikely that C1ql2 plays a direct role in presynaptic plasticity mechanisms that are through a cAMP/ PKA pathway and likely disrupted LTP is due to dysfunctional synapses rather than a specific LTP effect.

      The inclusion of functional experiments in this and our previous study (de Bruyckere et al., 2018) was first and foremost intended to determine whether the structural alterations observed at MFB disrupt MFS signaling. From the signaling properties we tested, basal synaptic transmission (this study) and short-term potentiation (de Bruyckere et al., 2018) were unaltered by Bcl11b KO, whereas MF LTP was found to be abolished (de Bruyckere et al., 2018). Indeed, because MF LTP largely depends on presynaptic mechanisms, including the redistribution of the readily releasable pool and recruitment of new active zones (Orlando et al., 2021; Vandael et al., 2020), it appears to be particularly sensitive to the specific structural changes we observed. We therefore believe that it is valuable information that MF LTP is affected in Bcl11b cKO animals - it conveys a direct proof for the functional importance of the observed morphological alterations, while basic transmission remains largely normal. Furthermore, it subsequently provided a functional marker for testing whether the reintroduction of C1ql2 in Bcl11b cKO animals or the KD of C1ql2 in WT animals can functionally recapitulate the control or the Bcl11b KO phenotype, respectively.

      We fully agree with the reviewer that C1ql2 is unlikely to directly participate in the cAMP/PKA pathway and that the ablation of C1ql2 likely disrupts MF LTP through an alternative mode of action. Our original wording in the paragraph describing the results of the forskolin-induced LTP experiment might have overstressed the importance of the cAMP pathway. We have now rephrased that paragraph to better describe the main idea behind the forskolin experiment, namely to circumvent the initial Ca2+ influx in order to test whether deficient presynaptic Ca2+ channel/KAR signaling might be responsible for the loss of LTP in Bcl11b cKO. The results are strongly indicative of a downstream mechanism and further investigation is needed to determine the specific mechanisms by which C1ql2 regulates MF LTP, especially in light of the result that C1ql2.K262E rescued LTP, while it was unable to rescue the SV recruitment at the MF presynapse. This raises the possibility that C1ql2 can influence MF LTP through additional, yet uncharacterized mechanisms, independent of SV recruitment. As such, a causal link between the structural and functional deficits remains tentative and we have now emphasized that point by adding a respective sentence to the discussion of our revised manuscript. Nevertheless, we again want to stress that the main rationale behind the LTP experiments was to assess the functional significance of structural changes at MFS and not to elucidate the mechanisms by which MF LTP is established.

      The authors should consider measures that might support the role of Bcl11b targets in SV recruitment during the depletion of synapses or measurements of the readily releasable pool size that would complement their findings in structural studies.

      We fully agree that functional measurements of the readily releasable pool (RRP) size would be a valuable addition to the reported redistribution of SV in structural studies. We have, in fact, attempted to use high-frequency stimulus trains in both field and single-cell recordings (details on single-cell experiments are described in the response to point 8) to evaluate potential differences in RRP size between the control and Bcl11b KO (Author response image 2a and b). Under both recording conditions we see a trend towards lower values of the intersection between a regression line of late responses and the y-axis. This could be taken as an indication of slightly smaller RRP size in Bcl11b mutant animals compared to controls. However, due to several technical reasons we are extremely cautious about drawing such far-reaching conclusions based on these data. At most, they suffice to conclude that the availability of release-ready vesicles in the KO is likely not dramatically smaller than in the control.

      The primary issue with using high-frequency stimulus trains for RRP measurements at MFS is the particularly low initial release probability (Pr) at these synapses. This means that a large number of stimulations is required to deplete the RRP. As the RRP is constantly replenished, it remains unclear when steady state responses are reached (reviewed by Kaeser and Regehr, 2017). This is clearly visible in our single-cell recordings (Author response image 2b), which were additionally complicated by prominent asynchronous release at later stages of the stimulus train and by a large variability in the shapes of cumulative amplitude curves between cells. In contrast, while the cumulative amplitude curves for field potential recordings do reach a steady state (Author response image 2a), field potential recordings in this context are not a reliable substitute for single cell or, in the case of MFB, single-bouton recordings. Postsynaptic cells in field potential recordings are not clamped, meaning that the massive release of glutamate due to continuous stimulation depolarizes the postsynaptic cells and reduces the driving force for Na+, irrespective of depletion of the RRP. This is supported by the fact that we consistently observed a recovery of fEPSP amplitudes later in the trains where RRP had presumably been maximally depleted. In summary, high-frequency stimulus trains at the field potential level are not a valid and established technique for estimating RRP size at MFS.

      Specialized laboratories have used highly advanced techniques, such as paired recordings between individual MFB and postsynaptic CA3 pyramidal cells, to estimate the RRP size of MFB (Vandael et al., 2020). These approaches are outside the scope of our present study which, while elucidating functional changes following Bcl11b depletion and C1ql2 rescue, does not aim to provide a high-end biophysical analysis of the presynaptic mechanisms involved.

      Author response image 2.

      Estimation of RRP size using high-frequency stimulus trains at mossy fiber-CA3 synapses. a Results from field potential recordings. Cumulative fEPSP amplitude in response to a train of 40 stimuli at 100 Hz. All subsequent peak amplitudes were normalized to the amplitude of the first peak. Data points corresponding to putative steady state responses were fit with linear regression (RRP size is indirectly reflected by the intersection of the regression line with the y-axis). Control+EGFP, 6 slices from 5 mice; Bcl11b cKO+EGFP, 6 slices from 3 mice. b Results from single-cell recordings. Cumulative EPSC amplitude in response to a train of 15 stimuli at 50 Hz. The last four stimuli were fit with linear regression. Control, 5 cells from 4 mice; Bcl11b cKO, 3 cells from 3 mice. Note the shallow onset of response amplitudes and the subsequent frequency potentiation. Due to the resulting increase in slope at higher stimulus numbers, intersection with the y-axis occurs at negative values. The differences shown were not found to be statistically significant; unpaired t-test or Mann-Whitney U-test.
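
The back-extrapolation described in this legend can be sketched as below: amplitudes are normalized to the first response, summed cumulatively, and a line is fit to the late (putative steady state) points, whose y-intercept serves as the indirect RRP estimate. This is a minimal sketch under stated assumptions, not the authors' code. The function name `back_extrapolate`, the choice of `n_late`, and the example amplitudes are all hypothetical.

```python
def back_extrapolate(amplitudes, n_late):
    """Fit a line to the last n_late points of the normalized cumulative curve.

    Returns (slope, intercept). The intercept with the y-axis is the
    indirect RRP estimate, in units of the first response amplitude.
    """
    first = amplitudes[0]
    cumulative = []
    total = 0.0
    for a in amplitudes:
        total += a / first          # normalize to the first peak
        cumulative.append(total)

    # Stimulus numbers are 1-based; keep only the late responses.
    n = len(cumulative)
    xs = list(range(n - n_late + 1, n + 1))
    ys = cumulative[-n_late:]

    mean_x = sum(xs) / n_late
    mean_y = sum(ys) / n_late
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical depressing train: responses decline, then reach a plateau.
example_train = [10.0, 8.0, 6.0, 4.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
example_slope, example_intercept = back_extrapolate(example_train, n_late=4)
```

When responses facilitate instead of depress, the late slope exceeds the early slope and the intercept can become negative, which is the situation the legend notes for the single-cell recordings.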

      Bcl11b KO reduces the number of synapses, yet the I-O curve reported in Supp Fig 2 is not changed. How is that possible? This should be explained.

      We agree with reviewer #2: this apparent discrepancy has indeed struck us as a counterintuitive result. It might be that synapses that are preferentially eliminated in Bcl11b cKO are predominantly silent or have weak coupling strength, such that their loss has only a minimal effect on basal synaptic transmission. Although perplexing, the result is fully supported by our single-cell data, which show no significant differences in MF EPSC amplitudes recorded from CA3 pyramidal cells between controls and Bcl11b mutants (Author response image 3; please see the response below for details and also our response to Reviewer #1 question 2).

      Matsuda et al DOI: 10.1016/j.neuron.2016.04.001 previously reported that C1ql2 organizes MF synapses by aligning postsynaptic kainate receptors with presynaptic elements. As this may have consequences for the functional properties of MF synapses including their plasticity, the authors should report whether they see deficient postsynaptic glutamate receptor signaling in the Bcl11b KO and rescue in the C1ql2 re-expression.

      We agree that the study by Matsuda et al. is of key importance for our present work. Although MF LTP is governed by presynaptic mechanisms and we previously did not see differences in short-term plasticity between the control and Bcl11b cKO (De Bruyckere et al., 2018), the clustering of postsynaptic kainate receptors by C1ql2 is indeed an important detail that could potentially alter synaptic signaling at MFS in Bcl11b KO. We, therefore, re-analyzed previously recorded single-cell data by performing a kinetic analysis on MF EPSCs recorded from CA3 pyramidal cells in control and Bcl11b cKO mice (Author response image 3a) to evaluate postsynaptic AMPA and kainate receptor responses in both conditions. We took advantage of the fact that AMPA receptors deactivate roughly 10 times faster than kainate receptors, allowing the contributions of the two receptors to mossy fiber EPSCs to be separated (Castillo et al., 1997 and reviewed by Lerma, 2003). We fit the decay phase of the second (larger) EPSC evoked by paired-pulse stimulation with a double exponential function, yielding a fast and a slow component, which roughly correspond to the fractional currents evoked by AMPA and kainate receptors, respectively. Analysis of both fast and slow time constants and the corresponding fractional amplitudes revealed no significant differences between controls and Bcl11b mutants (Author response image 3e-h), indicating that both AMPA and kainate receptor signaling is unaffected by the ablation of C1ql2 following Bcl11b KO.

      Importantly, MF EPSC amplitudes evoked by the first and the second pulse (Author response image 3b), paired-pulse facilitation (Author response image 3c) and failure rates (Author response image 3d) were all comparable between controls and Bcl11b mutants. These results further corroborate our observations from field recordings that basal synaptic transmission at MFS is unaltered by Bcl11b KO.

      We note that the results from single cell recordings regarding basal synaptic transmission merely confirm the observations from field potential recordings, and that the attempted measurement of RRP size at the single cell level was not successful. Thus, our single-cell data do not add new information about the mechanisms underlying the effects of Bcl11b-deficiency and we therefore decided not to report these data in the manuscript.

      Author response image 3.

      Basal synaptic transmission at mossy fiber-CA3 synapses is unaltered in Bcl11b cKO mice. a Representative average trace (20 sweeps) recorded from CA3 pyramidal cells in control and Bcl11b cKO mice at minimal stimulation conditions, showing EPSCs in response to paired-pulse stimulation (PPS) at an interstimulus interval of 40 ms. The signal is almost entirely blocked by the application of 2 μM DCG-IV (red). b Quantification of MF EPSC amplitudes in response to PPS for both the first and the second pulse. c Ratio between the amplitude of the second over the first EPSC. d Percentage of stimulation events resulting in no detectable EPSCs for the first pulse. Events <5 pA were considered as noise. e Fast decay time constant obtained by fitting the average second EPSC with the following double exponential function: $I(t) = A_{\mathrm{fast}}\, e^{-t/\tau_{\mathrm{fast}}} + A_{\mathrm{slow}}\, e^{-t/\tau_{\mathrm{slow}}} + C$, where $I$ is the recorded current amplitude after time $t$, $A_{\mathrm{fast}}$ and $A_{\mathrm{slow}}$ represent fractional current amplitudes decaying with the fast ($\tau_{\mathrm{fast}}$) and slow ($\tau_{\mathrm{slow}}$) time constant, respectively, and $C$ is the offset. Starting from the peak of the EPSC, the first 200 ms of the decaying trace were used for fitting. f Fractional current amplitude decaying with the fast time constant. g-h Slow decay time constant and fractional current amplitude decaying with the slow time constant. For all figures: Control, 8 cells from 4 mice; Bcl11b cKO, 8 cells from 6 mice. All data are presented as means, error bars indicate SEM. None of the differences shown were found to be statistically significant; Mann-Whitney U-test for non-normally and unpaired t-test for normally distributed data.
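
The double exponential fit named in panel e can be illustrated with a small, dependency-free sketch. The fitting procedure here is an assumption chosen for simplicity and is not stated in the source: for each candidate pair of time constants on a user-supplied grid, the two amplitudes and the offset are solved by linear least squares, and the pair with the smallest squared error is kept. The names `fit_biexp` and `solve3`, the grids, and the synthetic trace are hypothetical.

```python
from math import exp


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (a[r][3] - tail) / a[r][r]
    return x


def fit_biexp(t, current, tau_fast_grid, tau_slow_grid):
    """Grid search over (tau_fast, tau_slow); amplitudes and offset by least squares.

    Model: I(t) = A_fast*exp(-t/tau_fast) + A_slow*exp(-t/tau_slow) + C.
    Returns (tau_fast, tau_slow, A_fast, A_slow, C) for the best grid point.
    """
    best = None
    for tf in tau_fast_grid:
        for ts in tau_slow_grid:
            if ts <= tf:
                continue  # keep the two components distinguishable
            basis = [(exp(-ti / tf), exp(-ti / ts), 1.0) for ti in t]
            # Normal equations for the three linear parameters.
            m = [[sum(b[i] * b[j] for b in basis) for j in range(3)]
                 for i in range(3)]
            v = [sum(b[i] * y for b, y in zip(basis, current)) for i in range(3)]
            a_fast, a_slow, c = solve3(m, v)
            sse = sum((a_fast * b[0] + a_slow * b[1] + c - y) ** 2
                      for b, y in zip(basis, current))
            if best is None or sse < best[0]:
                best = (sse, tf, ts, a_fast, a_slow, c)
    return best[1:]


# Synthetic, noise-free decay over 200 ms (time in ms, arbitrary current units).
t_ms = [float(i) for i in range(200)]
trace = [80.0 * exp(-ti / 5.0) + 20.0 * exp(-ti / 50.0) for ti in t_ms]
tau_f, tau_s, amp_f, amp_s, offset = fit_biexp(
    t_ms, trace, [2.0, 5.0, 10.0], [20.0, 50.0, 100.0])
fast_fraction = amp_f / (amp_f + amp_s)  # share of the fast component
```

A grid search trades resolution for robustness: it cannot return time constants between grid points, so a real analysis would more likely use a nonlinear optimizer seeded from such a coarse search.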

      Reviewer #3 (Public Review):

      Overall, this is a strong manuscript that uses multiple current techniques to provide specific mechanistic insight into prior discoveries of the contributions of the Bcl11b transcription factor to mossy fiber synapses of dentate gyrus granule cells. The authors employ an adult deletion of Bcl11b via Tamoxifen-inducible Cre and use immunohistochemical, electron microscopy, and electrophysiological studies of synaptic plasticity, together with viral rescue of C1ql2, a direct transcriptional target of Bcl11b or Nrxn3, to construct a molecular cascade downstream of Bcl11b for DG mossy fiber synapse development. They find that C1ql2 re-expression in Bcl11b cKOs can rescue the synaptic vesicle docking phenotype and the impairments in MF-LTP of these mutants. They also show that C1ql2 knockdown in DG neurons can phenocopy the vesicle docking and plasticity phenotypes of the Bcl11b cKO. They also use artificial synapse formation assays to suggest that C1ql2 functions together with a specific Nrxn3 splice isoform in mediating MF axon development, extending these data with a C1ql2-K262E mutant that purports to specifically disrupt interactions with Nrxn3. All of the molecules involved in this cascade are disease-associated and this study provides an excellent blueprint for uncovering downstream mediators of transcription factor disruption. Together this makes this work of great interest to the field. Strengths are the sophisticated use of viral replacement and multi-level phenotypic analysis while weaknesses include the linkage of C1ql2 with a specific Nrxn3 splice variant in mediating these effects.

      Here is an appraisal of the main claims and conclusions:

      1) C1ql2 is a downstream target of Bcl11b which mediates the synaptic vesicle recruitment and synaptic plasticity phenotypes seen in these cKOs. This is supported by the clear rescue phenotypes of synapse anatomy (Fig.2) and MF synaptic plasticity (Fig.3). One weakness here is the absence of a control assessing over-expression phenotypes of C1ql2. It's clear from Fig.1D that viral rescue is often greater than WT expression (totally expected). In the case where you are trying to suppress a LoF phenotype, it is important to make sure that enhanced expression of C1ql2 in a WT background does not cause your rescue phenotype. A strong overexpression phenotype in WT would weaken the claim that C1ql2 is the main mediator of the Bcl11b phenotype for MF synapse phenotypes.

      As suggested by reviewer #3, we carried out C1ql2 over-expression experiments in control animals. We show that the over-expression of C1ql2 in the DG of control animals had no effect on the synaptic vesicle organization in the proximity of MFS. This further supports that the observed effect upon rescue of C1ql2 expression in Bcl11b cKOs is due to the physiological function of C1ql2 and not a result of the artificial overexpression. These data are now included in supplementary figure 2g-j and are described in detail in the results part of the revised manuscript. Please also see response to point 3 of reviewer #2.

      2) Knockdown of C1ql2 via 4 shRNAs is sufficient to produce the synaptic vesicle recruitment and MF-LTP phenotypes. This is supported by clear effects in the shRNA-C1ql2 groups as compared to nonsense-EGFP controls. One concern (particularly given the use of 4 distinct shRNAs) is the potential for off-target effects, which is best controlled for by a rescue experiment with RNA insensitive C1ql2 cDNA as opposed to nonsense sequences, which may not elicit the same off-target effects.

      We agree with reviewer #3 that the use of shRNAs could potentially create unexpected off-target effects and that the introduction of a shRNA-insensitive C1ql2 in parallel to the expression of the shRNA cassette would be a very effective control experiment. However, the suggested experiment would require an additional 6 months (2 months for AAV production, 2-3 months from animal injection to sacrifice and 1-2 months for EM imaging/analysis and LTP measurements) and a high number of additional animals (minimum 8 for EM and 8 for LTP measurements). We note here that, before the production of the shRNA-C1ql2 and the shRNA-NS, the individual sequences were systematically checked for off-target binding on the murine exome with up to two mismatches and showed no target other than the intended one (C1ql2 for shRNA-C1ql2 and no target for shRNA-NS). Taking into consideration our in-silico analysis, we feel that the interpretation of our findings is valid without this (very reasonable) additional control experiment.
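      For readers unfamiliar with this kind of in-silico check, the idea of screening a sequence against transcripts with a mismatch tolerance can be sketched as below. This is a minimal illustration and not the tool or pipeline used by the authors: the function names and the toy sequences in the usage note are hypothetical, and a real screen would also need to handle strand orientation and run against an indexed exome rather than a naive scan.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(query, transcript, max_mismatches=2):
    """Return (start, mismatches) for every window within max_mismatches of the query."""
    hits = []
    window = len(query)
    for start in range(len(transcript) - window + 1):
        distance = hamming_distance(query, transcript[start:start + window])
        if distance <= max_mismatches:
            hits.append((start, distance))
    return hits


def off_target_transcripts(query, transcripts, intended, max_mismatches=2):
    """Names of transcripts, other than the intended one, that contain a near match."""
    return sorted(
        name
        for name, sequence in transcripts.items()
        if name != intended and find_near_matches(query, sequence, max_mismatches)
    )
```

      In this sketch, a query such as "ACGTACGT" screened against a hypothetical dictionary of transcripts would flag any non-intended entry containing a window that differs from the query at two or fewer positions. An empty result corresponds to the "no other target" outcome described above.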

      3) C1ql2 interacts with Nrxn3(25b+) to facilitate MF terminal SV clustering. This claim is theoretically supported by the HEK cell artificial synapse formation assay (Fig.5), the inability of the K262-C1ql2 mutation to rescue the Bcl11b phenotype (Fig.6), and the altered localization of C1ql2 in the Nrxn1-3 deletion mice (Fig.7). Each of these lines of experimental evidence has caveats that should be acknowledged and addressed. Given the hypothesis that C1ql2 and Nrxn3b(25b) are expressed in DG neurons and work together, the heterologous co-culture experiment seems strange. Up till now, the authors are looking at pre-synaptic function of C1ql2 since they are re-expressing it in DGNs. The phenotypes they are seeing are also pre-synaptic and/or consistent with pre-synaptic dysfunction. In Fig.5, they are testing whether C1ql2 can induce pre-synaptic differentiation in trans, i.e. theoretically being released from the 293 cells "post-synaptically". But the post-synaptic ligands (Nlgn1 and GluKs) are not present in the 293 cells, so a heterologous synapse assay doesn't really make sense here. The effect that the authors are seeing likely reflects the fact that C1ql2 and Nrxn3 do bind to each other, so C1ql2 is acting as an artificial post-synaptic ligand, in that it can cluster Nrxn3 which in turn clusters synaptic vesicles. But this does not test the model that the authors propose (i.e. C1ql2 and Nrxn3 are both expressed in MF terminals). Perhaps a heterologous assay where GluK2 is put into HEK cells and the C1ql2 and Nrxn3 are simultaneously or individually manipulated in DG neurons?

      C1ql2 is expressed by DG neurons and is then secreted into the MFS synaptic cleft, while Nrxn3, which is also expressed by DG neurons, is anchored at the presynaptic side. In our work we used the well-established co-culture assay and cultured HEK293 cells secreting C1ql2 (an IgK secretion sequence was inserted at the N-terminus of C1ql2) together with hippocampal neurons expressing Nrxn3(25b+). We used the HEK293 cells as a delivery system of secreted C1ql2 to the neurons to create regions of high concentration of C1ql2. By interfering with the C1ql2-Nrxn3 interaction in this system either by expression of the non-binding mutant C1ql2 variant in the HEK cells or by manipulating Nrxn expression in the neurons, we could show that C1ql2 binding to Nrxn3(25b+) is necessary for the accumulation of vGlut1. However, we did not examine and do not claim within our manuscript that the interaction between C1ql2 and Nrxn3(25b+) induces presynaptic differentiation. Our experiment only aimed to analyze the ability of C1ql2 to cluster SV through interaction with Nrxn3. Moreover, by not expressing potential postsynaptic interaction partners of C1ql2 in our system, we could show that C1ql2 controls SV recruitment through a purely presynaptic mechanism. Co-culturing GluK2-expressing HEK cells with simultaneous manipulation of C1ql2 and/or Nrxn3 in neurons would not allow us to appropriately answer our scientific question, but would rather focus on the potential synaptogenic function of the Nrxn3/C1ql2/GluK2 complex and the role of the postsynaptic ligand in it. Thus, we feel that the proposed experiment, while very interesting for characterizing additional putative functions of C1ql2, may not provide additional information for the point we were addressing. In the revised manuscript we have tried to make the aim and methodological approach of this set of experiments clearer.

      4) K262-C1ql2 mutation blocks the normal rescue through a Nrxn3(25b) mechanism (Fig.6). The strength of this experiment rests upon the specificity of this mutation for disrupting Nrxn3b binding (presynaptic) as opposed to any of the known postsynaptic C1ql2 ligands such as GluK2. While this is not relevant for interpreting the heterologous assay (Fig.5), it is relevant for the in vivo phenotypes in Fig.6. Similar approaches as employed in this paper can test whether binding to other known postsynaptic targets is altered by this point mutation.

      It has been previously shown that C1ql2 together with C1ql3 recruit postsynaptic GluK2 at the MFS. However, loss of just C1ql2 did not affect the recruitment of GluK2, which was disrupted only upon loss of both C1ql2 and C1ql3 (Matsuda et al., 2018). In our study we demonstrate a purely presynaptic function of C1ql2 through Nrxn3 in the synaptic vesicle recruitment. This function is independent of C1ql3, as C1ql3 expression is unchanged in all of our models and its over-expression did not compensate for C1ql2 functions (Fig. 2, 3a-c). Our in vitro experiments also reveal that C1ql2 can recruit both Nrxn3 and vGlut1 in the absence of any known postsynaptic C1ql2 partner (KARs and BAI3; Fig.5; please also see response above). Furthermore, we have now performed a kinetic analysis on single-cell data which we had previously collected to evaluate postsynaptic AMPA and kainate receptor responses in both the control and Bcl11b cKO. Our analysis reveals no significant differences in postsynaptic current kinetics, making it unlikely that AMPA and kainate receptor signaling is altered upon the loss of C1ql2 following Bcl11b cKO (Author response image 3e-h; please also see our response to reviewer #2 point 8). Thus, we have no experimental evidence supporting the idea that a loss of interaction between C1ql2.K262E and GluK2 would interfere with the examined phenotype. However, to exclude that the K262E mutation disrupts interaction between C1ql2 and GluK2, we performed co-immunoprecipitation from protein lysate of HEK293 cells expressing GluK2-myc-flag and GFP-C1ql2 or GluK2-myc-flag and GFP-K262E and could show that both C1ql2 and K262E had GluK2 bound when precipitated. These data are included in supplementary figure 5k of the revised manuscript.

      5) Altered localization of C1ql2 in Nrxn1-3 cKOs. These data are presented to suggest that Nrxn3(25b) is important for localizing C1ql2 to the SL of CA3. Weaknesses of this data include both the lack of Nrxn specificity in the triple a/b KOs as well as the profound effects of Nrxn LoF on the total levels of C1ql2 protein. Some measure that isn't biased by this large difference in C1ql2 levels should be attempted (something like in Fig.1F).

      We acknowledge that the lack of specificity in the Nrxn123 model makes it difficult to interpret our data. We have now examined the mRNA levels of Nrxn1 and Nrxn2 upon stereotaxic injection of Cre in the DG of Nrxn123flox/flox animals and found that Nrxn1 was only mildly reduced. At the same time Nrxn2 showed a tendency for reduction that was not significant (data included in supplementary figure 6a of revised manuscript). Only Nrxn3 expression was strongly suppressed. Of course, this does not exclude that the mild reduction of Nrxn1 and Nrxn2 interferes with the C1ql2 localization at the MFS. We further examined the mRNA levels of C1ql2 in control and Nrxn123 mutants to ensure that the observed changes in C1ql2 protein levels at the MFS are not due to reduced mRNA expression and found no changes (data are included in supplementary figure 6b of the revised manuscript), suggesting that overall C1ql2 gene expression is normal.

      The reduced C1ql2 fluorescence intensity at the MFS was first observed when non-binding C1ql2 variant K262E was introduced to Bcl11b cKO mice that lack endogenous C1ql2 (Fig.6). In these experiments, we found that despite the overall high protein levels of C1ql2.K262E in the hippocampus (Fig. 6c), its fluorescence intensity at the SL was significantly reduced compared to WT C1ql2 (Fig. 6d-e). The remaining signal of the C1ql2.K262E at the SL was equally distributed and in a punctate form, similar to WT C1ql2. Together, this suggests that loss of C1ql2-Nrxn3 interaction interferes with the localization of C1ql2 at the MFS, but not with the expression of C1ql2. Of course, this does not exclude that other mechanisms are involved in the synaptic localization of C1ql2, beyond the interaction with Nrxn3, as both the mutant C1ql2 in Bcl11b cKO and the endogenous C1ql2 in Nrxn123 cKOs show residual immunofluorescence at the SL. Further studies are required to determine how C1ql2-Nrxn3 interaction regulates C1ql2 localization at the MFS.

      Reviewer #1 (Recommendations For The Authors):

      In addition to addressing the comments below, this study would benefit significantly from providing insight and discussion into the relevant potential postsynaptic signaling components controlled exclusively by C1ql2 (postsynaptic kainate receptors and the BAI family of proteins).

      We have now performed a kinetic analysis on single-cell data that we had previously collected to evaluate postsynaptic AMPA and kainate receptor responses in both the control and Bcl11b cKO. Our analysis reveals no significant differences in postsynaptic current kinetics, making it unlikely that AMPA and kainate receptor signaling differ between controls and upon the loss of C1ql2 following Bcl11b cKO (Author response image 3e-h; please also see our response to Reviewer #2 point 8). This agrees with previous findings that C1ql2 regulates postsynaptic GluK2 recruitment together with C1ql3 and only loss of both C1ql2 and C1ql3 results in a disruption of KAR signaling (Matsuda et al., 2018). In our study we demonstrate a purely presynaptic function of C1ql2 through Nrxn3 in the synaptic vesicle recruitment. This function is independent of C1ql3, as C1ql3 expression is unchanged in all of our models and its over-expression did not compensate for C1ql2 functions (Fig. 2, 3a-c). Our in vitro experiments also reveal that C1ql2 can recruit both Nrxn3 and vGlut1 in the absence of any known postsynaptic C1ql2 partner (KARs and BAI3; Fig.5; please also see our response to reviewer #3 point 4 above). We believe that further studies are needed to fully understand both the pre- and the postsynaptic functions of C1ql2. Because the focus of this manuscript was on the role of the C1ql2-Nrxn3 interaction and our investigation on postsynaptic functions of C1ql2 was incomplete, we did not include our findings on postsynaptic current kinetics in our revised manuscript. However, we expanded the discussion of the known postsynaptic partners of C1ql2 in the revised manuscript to improve the interpretability of our results.

      Major Comments:

      The authors demonstrate that the ultrastructural properties of presynaptic boutons are altered after Bcl11b KO and C1ql2 KD. However, whether C1ql2 functions as part of a tripartite complex and the identity of the postsynaptic receptor (BAI, KAR) should be examined.

      Matsuda and colleagues have nicely demonstrated in their 2016 (Neuron) study that C1ql2 is part of a tripartite complex with presynaptic Nrxn3 and postsynaptic KARs. Moreover, they demonstrated that C1ql2, together with C1ql3, recruit postsynaptic KARs at the MFS, while the KO of just C1ql2 did not affect the KAR localization. In our study we demonstrate a purely presynaptic function of C1ql2 through Nrxn3 in the synaptic vesicle recruitment. This function is independent of C1ql3, as C1ql3 expression is unchanged in all of our models and its over-expression did not compensate for C1ql2 functions (Fig. 2, 3a-c). Our in vitro experiments also reveal that C1ql2 is able to recruit both Nrxn3 and vGlut1 in the absence of any known postsynaptic C1ql2 partner (Fig. 5; please also see our response to reviewer #3 point 4 above). Moreover, we were able to show that the SV recruitment depends on C1ql2 interaction with Nrxn3 through the expression of a non-binding C1ql2 (Fig. 6) that retains the ability to interact with GluK2 (supplementary figure 5k of revised manuscript) or by KO of Nrxns (Fig. 7). Furthermore, we have now performed a kinetic analysis on single-cell data which we had previously collected to evaluate postsynaptic AMPA and kainate receptor responses in both the control and Bcl11b cKO. Our analysis reveals no significant differences in postsynaptic current kinetics, making it unlikely that AMPA and kainate receptor signaling differ between controls and Bcl11b mutants (Author response image 3e-h; please also see our response to Reviewer #2 question 8). Together, we have no experimental evidence so far that would support that the postsynaptic partners of C1ql2 are involved in the observed phenotype. While it would be very interesting to characterize the postsynaptic partners of C1ql2 in depth, we feel this would be beyond the scope of the present study.

      Figure 1f: For a more comprehensive understanding of the Bcl11b KO phenotype and the potential role for C1ql2 on MF synapse number, a complete quantification of vGlut1 and Homer1 for all conditions (Supplement Figure 2e) should be included in the main text.

      In our study we focused on the role of C1ql2 in the structural and functional integrity of the MFS downstream of Bcl11b. Bcl11b ablation leads to several phenotypes in the MFS that have been thoroughly described in our previous study (De Bruyckere et al., 2018). As expected, re-expression of C1ql2 only partially rescued these phenotypes, with full recovery of the SV recruitment (Fig. 2) and of the LTP (Fig. 3), but had no effect on the reduced numbers of MFS nor the structural complexity of the MFB created by the Bcl11b KO (supplementary figure 2d-f of revised manuscript). We understand that including the quantification of vGlut1 and Homer1 co-localization in the main figures would help with a better understanding of the Bcl11b mutant phenotype. However, in our manuscript we investigate C1ql2 as an effector of Bcl11b and thus we focus on its functions in SV recruitment and LTP. As we did not find a link between C1ql2 and the number of MFS/MFB upon re-expression of C1ql2 in Bcl11b cKO or now also in C1ql2 KD (see response to comment #4 below), we believe it is more suitable to present these data in the supplement.

      Figure 3/4: Given the striking reduction in the numbers of synapses (Supplement Figure 2e) and docked vesicles (Figure 2d) in the Bcl11b KO and C1ql2 KD (Figure 4e-f), it is extremely surprising that basal synaptic transmission is unaffected (Supplement Figure 2g). The authors should determine the EPSP input-output relationship following C1ql2 KD and measure EPSPs following trains of stimuli at various high frequencies.

      We fully acknowledge that this is an unexpected result. It is, however, quite plausible that the modest displacement of SV fails to noticeably influence basal synaptic transmission. This would be the case, for example, if only a low number of vesicles are released by single stimuli, in line with the very low initial Pr at MFS. In contrast, the reduction in synapse numbers in the Bcl11b mutant might indeed be expected to be reflected in the input-output relationship. It is possible, however, that synapses that are preferentially eliminated in Bcl11b cKO are predominantly silent or have weak coupling strength, such that their loss has only a minimal effect on basal synaptic transmission. Finally, we cannot exclude compensatory mechanisms (homeostatic plasticity) at the remaining synapses. A detailed analysis of these potential mechanisms would be a whole project in its own right.

      As additional information, we can say that the largely unchanged input-output relationship in Bcl11b cKO is also present in the single-cell level data (Author response image 3; details on single-cell experiments are described in the response to Reviewer #2 point 8).

      As suggested by the reviewer, we have now additionally analyzed the input-output relationship following C1ql2 KD and again did not observe any significant difference between control and KD animals. We have incorporated the respective input-output curves into the revised manuscript under Supplementary figure 3c-d.

      Figure 4: Does C1ql2 shRNA also reduce the number of MFBs? This should be tested to further identify C1ql2-dependent and independent functions.

      As requested by reviewer #1, we quantified the number of MFBs upon C1ql2 KD. We show that C1ql2 KD in WT animals does not alter the number of MFBs. The data are presented in supplementary figure 4d of the revised manuscript. Re-expression of C1ql2 in Bcl11b cKO did not rescue the loss of MFS created by the Bcl11b mutation. Moreover, C1ql2 re-expression did not rescue the complexity of the MFB ultrastructure perturbed by the Bcl11b ablation. Together, this suggests that Bcl11b regulates MFS maintenance through additional C1ql2-independent pathways. In our previously published work (De Bruyckere et al., 2018) we identified and discussed in detail several candidate genes such as Sema5b, Ptgs2, Pdyn and Penk as putative effectors of Bcl11b in the structural and functional integrity of MFS (please also see response to reviewer #2- point 1 of public reviews).

      Figure 5: Clarification is required regarding the experimental design of the HEK/Neuron co-culture: 1. C1ql2 is a secreted soluble protein - how is the protein anchored to the HEK cell membrane to recruit Nrxn3(25b+) binding and, subsequently, vGlut1?

      C1ql2 was secreted by the HEK293 cells through an IgK signal peptide at the N-terminus of C1ql2. The high concentration of C1ql2 close to the secretion site, together with the sparse co-culturing of the HEK293 cells on the neurons, allows for the quantification of accumulation of neuronal proteins. We have now described the experimental conditions in greater detail in the main text module of the revised manuscript.

      2) Why are the neurons transfected and not infected? Transfection efficiency of neurons with lipofectamine is usually poor (1-5%; Karra et al., 2010), while infection of neurons with lentiviruses or AAVs encoding cDNAs routinely are >90% efficient. Thus, interpretation of the recruitment assays may be influenced by the density of neurons transfected near a HEK cell.

      We agree with reviewer #1 that viral infection of the neurons would have been a more effective way of expressing our constructs. However, due to safety allowances in the used facility and time limitation at the time of conception of this set of experiments, a lipofectamine transfection was chosen.

      Nevertheless, as all of our examined groups were handled in the same way and multiple cells from three independent experiments were examined for each experimental set, we believe that possible biases introduced by the transfection efficiency have been eliminated, and we are therefore confident in our interpretation of these results.

      3) Surface labeling of HEK cells for wild-type C1ql2 and K262 C1ql2 would be helpful to assess the trafficking of the mutant.

      We recognize that potential changes to the trafficking of C1ql2 caused by the K262E mutation would be important to characterize, in light of the reduced localization of the mutant protein at the SL in the in vivo experiments (Fig. 6e). In our culture system, C1ql2 and K262E were secreted by the HEK cells through insertion of an IgK signal peptide at the N-terminus of the myc-tagged C1ql2/K262E. Thus, trafficking analysis on this system would not be informative, as the system is highly artificial compared to the in vivo model. Further studies are needed to characterize C1ql2 trafficking in neurons to understand how C1ql2-Nrxn3 interaction regulates the localization of C1ql2. However, labeling of the myc-tag in C1ql2 or K262E expressing HEK cells of the co-culture model reveals a similar signal for the two proteins (Fig. 5a,c). Nrxn-null mutation in neurons co-cultured with C1ql2-expressing HEK cells disrupted C1ql2-mediated vGlut1 accumulation in the neurons. Selective expression of Nrxn3(25b) in the Nrxn-null neurons restored vGlut1 clustering (Fig. 5e-f). Together, these data suggest that it is the interaction between C1ql2 and Nrxn3 that drives the accumulation of vGlut1.

      Figure 6: Bcl11b KO should also be included in 6f-h.

      As suggested by reviewer #1, we included the Bcl11b cKO in figures 6f-h and in corresponding supplementary figures 5c-j.

      Figure 7b: What is the abundance of mRNA for Nrxn1 and Nrxn2 as well as the abundance of Nrxns after EGFP-Cre injection into DG?

      We addressed this point raised by reviewer #1 by quantifying the relative mRNA levels of Nrxn1 and Nrxn2 via qPCR after stereotaxic injection of EGFP-Cre into the DG of Nrxn123flox/flox animals. We found that Nrxn1 was only mildly reduced, while Nrxn2 showed a tendency for reduction that was not significant. The data are presented in supplementary figure 6a of the revised manuscript.

      Minor Comments for readability:

      Synapse score is referred to frequently in the text and should be defined within the text for clarification.

      'n' numbers should be better defined in the figure legends. For example, for protein expression analysis in 1c, n=3. Is this a biological or technical triplicate? For electrophysiology (e.g. 3c), does "n=7" reflect the number of animals or the number of slices? n/N (slices/animals) should be presented.

      Figure 7a: Should the diagrams of the cre viruses be EGFP-Inactive or active Cre and not CRE-EGFP as shown in the diagram?

      Figure 7b: the region used for the inset should be identified in the larger image.

      All minor points have been fixed in the revised manuscript according to the suggestions.

      Reviewer #3 (Recommendations For The Authors):

      -Please describe the 'synapse score' somewhere in the text - it is too prominently featured to not have a clear description of what it is.

      The description of the synapse score has been included in the main text module of the revised manuscript.

      -The claim that Bcl11b controls SV recruitment "specifically" through C1ql2 is a bit stronger than is warranted by the data. Particularly given that C1ql2 is expressed at 2.5X control levels in their rescue experiments. See pt.2

      Please see response to reviewer #3 point 1 of public reviews. To address this, we over-expressed C1ql2 in control animals and found no changes in the synaptic vesicle distribution (supplementary figure 2g-j of revised manuscript). This supports that the observed rescue of synaptic vesicle recruitment by re-expression of C1ql2 is due to its physiological function and not due to the artificially elevated protein levels. Of course, we cannot exclude the possibility that other, C1ql2-independent, mechanisms also contribute to the SV recruitment downstream of Bcl11b. Our data from the C1ql2 rescue, C1ql2 KD, the in vitro experiments and the interruption of C1ql2-Nrxn3 in vivo, strongly suggest C1ql2 to be an important regulator of SV recruitment.

      -Does Bcl11b regulate Nrxn3 expression? Considering the apparent loss of C1ql2 expression in the Nrxn KO mice, this is an important detail.

      We agree with reviewer #3 that this is an important point. We have previously done differential transcriptomics from DG neurons of Bcl11b cKOs compared to controls and did not find Nrxn3 among the differentially expressed genes. To further validate this, we now quantified the Nrxn3 mRNA levels via qPCR in Bcl11b cKOs compared to controls and found no differences. These data are included in supplementary figure 5a of the revised manuscript.

      -It appears that C1ql2 expression is much lower in the Nrxn123 KO mice. Since the authors are trying to test whether Nrxn3 is required for the correct targeting of C1ql2, this is a confounding factor. We can't really tell if what we are seeing is a "mistargeting" of C1ql2, loss of expression, or both. If the authors did a similar analysis to what they did in Figure 1 where they looked at the synaptic localization of C1ql2 (and quantified it) that could provide more evidence to support or refute the "mistargeting" claim.

      Please also see response to reviewer #3 point 5 of public reviews. To exclude that reduction of fluorescence intensity of C1ql2 at the SL in Nrxn123 KO mice is due to loss of C1ql2 expression, we examined the mRNA levels of C1ql2 in control and Nrxn123 mutants and found no changes (data are included in supplementary figure 6b of the revised manuscript), suggesting that C1ql2 gene expression is normal. The reduced C1ql2 fluorescence intensity at the MFS was first observed when non-binding C1ql2 variant K262E was introduced to Bcl11b cKO mice that lack endogenous C1ql2 (Fig.6). In these experiments, we found that despite the overall high protein levels of C1ql2.K262E in the hippocampus (Fig. 6c), its fluorescence intensity at the SL was significantly reduced compared to WT C1ql2 (Fig. 6d-e). The remaining C1ql2.K262E signal in the SL was equally distributed and in a punctate form, similar to WT C1ql2. Together, this indicates that the loss of C1ql2-Nrxn3 interaction interferes with the localization of C1ql2 along the MFS, but not with expression of C1ql2. Of course, this does not exclude that additional mechanisms regulate C1ql2 localization at the synapse, as both the mutant C1ql2 in Bcl11b cKO and the endogenous C1ql2 in Nrxn123 cKO show residual immunofluorescence at the SL.

      We note here that we have not previously quantified the co-localization of C1ql2 with individual synapses. C1ql2 is a secreted molecule that localizes at the MFS synaptic cleft. However, not much is known about the number of MFS that are positive for C1ql2 nor about the mechanisms regulating C1ql2 targeting, transport, and secretion to the MFS. Whether C1ql2 interaction with Nrxn3 is necessary for the protection of C1ql2 from degradation, its surface presentation and transport or stabilization to the synapse is currently unclear. Upon revision of our manuscript, we realized that we might have overstated this particular finding and have now rephrased the specific parts within the results to appropriately describe the observation and have also included a sentence in the discussion referring to the lack of understanding of the mechanism behind this observation.

      -Title of Figure S5 is "Nrxn KO perturbs C1ql2 localization and SV recruitment at the MFS", but there is no data on C1ql2 localization.

      This issue has been fixed in the revised manuscript.

      -S5 should be labeled more clearly than just Cre+/-

      This issue has been fixed in the revised manuscript.

      References

      Castillo, P.E., Malenka, R.C., Nicoll, R.A., 1997. Kainate receptors mediate a slow postsynaptic current in hippocampal CA3 neurons. Nature 388, 182–186. https://doi.org/10.1038/40645

      De Bruyckere, E., Simon, R., Nestel, S., Heimrich, B., Kätzel, D., Egorov, A.V., Liu, P., Jenkins, N.A., Copeland, N.G., Schwegler, H., Draguhn, A., Britsch, S., 2018. Stability and Function of Hippocampal Mossy Fiber Synapses Depend on Bcl11b/Ctip2. Front. Mol. Neurosci. 11. https://doi.org/10.3389/fnmol.2018.00103

      Kaeser, P.S., Regehr, W.G., 2017. The readily releasable pool of synaptic vesicles. Curr. Opin. Neurobiol. 43, 63–70. https://doi.org/10.1016/j.conb.2016.12.012

      Lerma, J., 2003. Roles and rules of kainate receptors in synaptic transmission. Nat. Rev. Neurosci. 4, 481–495. https://doi.org/10.1038/nrn1118

      Orlando, M., Dvorzhak, A., Bruentgens, F., Maglione, M., Rost, B.R., Sigrist, S.J., Breustedt, J., Schmitz, D., 2021. Recruitment of release sites underlies chemical presynaptic potentiation at hippocampal mossy fiber boutons. PLoS Biol. 19, e3001149. https://doi.org/10.1371/journal.pbio.3001149

      Vandael, D., Borges-Merjane, C., Zhang, X., Jonas, P., 2020. Short-Term Plasticity at Hippocampal Mossy Fiber Synapses Is Induced by Natural Activity Patterns and Associated with Vesicle Pool Engram Formation. Neuron 107, 509-521.e7. https://doi.org/10.1016/j.neuron.2020.05.013

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work describes new validated conditional double KO (cDKO) mice for LRRK1 and LRRK2 that will be useful for the field, given that LRRK2 is widely expressed in the brain and periphery, and many divergent phenotypes have been attributed previously to LRRK2 expression. The manuscript presents solid data demonstrating that it is the loss of LRRK1 and LRRK2 expression within the SNpc DA cells that is not well tolerated, as it was previously unclear from past work whether neurodegeneration in the LRRK double Knock Out (DKO) was cell autonomous or the result of loss of LRRK1/LRRK2 expression in other types of cells. Future studies may pursue the biochemical mechanisms underlying the reason for the apoptotic cells noted in this study, as here, the LRRK1/LRRK2 KO mice did not replicate the dramatic increase in the number of autophagic vacuoles previously noted in germline global LRRK1/LRRK2 KO mice.

      We thank the editors for handling our manuscript and for the succinct summary that recognizes the significance of our findings and points out interesting directions for future studies. We also thank the reviewers for their helpful comments and positive evaluation of our work. Below, we have provided point-by-point responses to the reviewers’ comments.

      Reviewer #1 (Public Review):

      Summary:

      This is an important work showing that loss of LRRK function causes late-onset dopaminergic neurodegeneration in a cell-autonomous manner. One of the LRRK members, LRRK2, is of significant translational importance as mutations in LRRK2 cause late-onset autosomal dominant Parkinson's disease (PD). While many in the field assume that LRRK2 mutant causes PD via increased LRRK2 activity (i.e., kinase activity), it is not a settled issue as not all disease-causing mutant LRRK2 exhibit increased activity. Further, while LRRK2 inhibitors are under clinical trials for PD, the consequence of chronic, long-term LRRK2 inhibition is unknown. Thus, studies evaluating the long-term impact of LRRK deficit have important translational implications. Moreover, because LRRK proteins, particularly LRRK2, are known to modulate immune response and intracellular membrane trafficking, the study's results and the reagents will be valuable for others interested in LRRK function.

      Strengths:

      This report describes a mouse model where the LRRK1 and LRRK2 genes are conditionally deleted in dopaminergic neurons. Previously, this group showed that while loss of LRRK2 expression does not cause a brain phenotype, loss of both LRRK1 and LRRK2 causes a later onset, progressive degeneration of catecholaminergic neurons and dopaminergic (DAergic) neurons in the substantia nigra (SN), and noradrenergic neurons in the locus coeruleus (LC). However, because LRRK genes are widely expressed with some peripheral phenotypes, it was unknown if the neurodegeneration in the LRRK double knockout (DKO) was cell autonomous. To rigorously test this question, the authors have generated a double conditional (cDKO) allele where both LRRK1 and LRRK2 genes were targeted to contain loxP sites. In my view, this was beyond what is usually required, as most investigators might combine one KO allele with another floxed allele. The authors provide a rigorous validation showing that the Driver (DAT-Cre) is expressed in most DAergic neurons in the SN and that LRRK levels are decreased selectively in the ventral midbrain. Using these mice, the authors show that the number of DAergic neurons is normal at 15 months but significantly decreased at 20 months of age. Moreover, the authors show that the number of apoptotic neurons is increased by ~2X in aged SN, demonstrating increased ongoing cell death, as well as an increase in activated microglia. The degeneration is limited to DAergic neurons as LC neurons are not lost as this population does not express DAT. Overall, the mouse genetics and experimental analysis were performed rigorously, and the results were statistically sound and compelling.

      Weaknesses:

      I only have a few minor comments. First is that in PD and other degenerative conditions, loss of axons and terminals occurs prior to cell bodies. It might be beneficial to show the status of DAergic markers in the striatum. Second, previous studies indicate that very little, if any, LRRK1 is expressed in SN DAergic neurons. This is also the case with the Allen Brain Atlas profile. Thus, the authors should discuss the discrepancy, as they seem to imply significant LRRK1 expression in DA neurons.

      We appreciate the reviewer’s recognition of the importance of the study as well as our rigorous experimental approaches and compelling results. Our responses to the reviewer's two minor comments are below.

      1) DAergic markers in the striatum: We performed TH immunostaining in the striatum and quantified TH+ DA terminals in the striatum of DA neuron-specific LRRK cDKO and littermate control mice at the ages of 15 and 24 months. We found similar levels of TH immunoreactivity in the striatum of LRRK cDKO and littermate control mice at the age of 15 months (p = 0.6565, unpaired Student’s t-test) and significantly reduced levels of TH immunoreactivity in the striatum of LRRK cDKO, compared to control mice at the age of 24 months (~19%, p = 0.0215), suggesting an age-dependent loss of dopaminergic terminals in the striatum of DA neuron-specific LRRK cDKO mice. These results are now included as Figure 5 of the revised manuscript.

      2) LRRK1 expression in the SNpc: It is shown in the Mouse brain RNA-seq dataset and the Allen Mouse brain ISH dataset (https://www.proteinatlas.org/ENSG00000154237-LRRK1/brain) that LRRK1 is broadly expressed in the mouse brain and is expressed at modest levels in the midbrain, comparable to the cerebral cortex. Indeed, our Western analysis also showed that levels of LRRK1 detected in the dissected ventral midbrain and the cerebral cortex of control mice are similar (40µg total protein loaded per lane; Figure 2E). Furthermore, we previously demonstrated that deletion of LRRK2 (or LRRK1) alone does not cause age-dependent loss of DA neurons in the SNpc, but deletions of both LRRK1 and LRRK2 result in age-dependent loss of DA neurons in LRRK DKO mice, indicating the functional importance of LRRK1 in the protection of DA neuron survival in the aging mouse brain (Tong et al., PNAS 2010, 107: 9879-9884, Giaime et al., Neuron 2017, 96: 796-807).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shen and collaborators described the generation of cDKO mice lacking LRRK1 and LRRK2 selectively in DAT-positive DAergic neurons. The Authors asked whether selective deletion of both LRRK isoforms could lead to a Parkinsonian phenotype, as previously reported by the same group in germline double LRRK1 and LRRK2 knockout mice (PMID: 29056298). Indeed, cDKO mice developed a late reduction of TH+ neurons in SNpc that partially correlated with the reduction of NeuN+ cells. This was associated with increased apoptotic cell and microglial cell numbers in SNpc.

      Unlike the constitutive DKO mice described earlier, however, cDKO mice did not replicate the dramatic increase in the number of autophagic vacuoles. The study supports the authors' hypothesis that loss of function rather than gain of function of LRRK2 leads to PD.

      Strengths:

      The study described for the first time a model where both the PD-associated gene LRRK2 and its homolog LRRK1 are deleted selectively in DAergic neurons, offering a new tool to understand the physiopathological role of LRRK2 and the compensating role of LRRK1 in modulating DAergic cell function.

      Weaknesses:

      The model has no construct validity since loss of function mutations of LRRK2 are well-tolerated in humans and do not lead to PD. The evidence of a Parkinsonian phenotype in these cDKO mice is limited and should be considered preliminary.

      We thank the reviewer for commenting on the usefulness of this new PD mouse model.

      The reviewer did not include a reference citation for the statement "loss of function mutations of LRRK2 are well-tolerated in humans and do not lead to PD." It is possible that the reviewer was referring to a human population study (Whiffin et al., Nat Med 2020, 26: 869-877), entitled "The effect of LRRK2 loss-of-function variants in humans." In this study, the authors analyzed 141,456 individuals sequenced in the Genome Aggregation Database, 49,960 exome-sequenced individuals from the UK Biobank, and more than 4 million participants in the 23andMe genotyped dataset, and they looked for human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants). The reported findings were interesting, and the authors were careful in stating their conclusions. However, this is not a linkage study of large pedigrees carrying a single, clear-cut loss-of-function mutation (e.g. large deletions of most exons and coding sequences). Therefore, the experimental evidence is not compelling enough to conclude whether loss-of-function mutations in LRRK2 cause PD or do not cause PD.

      The current report is an unbiased genetic study in an effort to reveal the normal physiological role of LRRK in dopaminergic neurons. It was not intended to produce Parkinsonian phenotypes in LRRK cDKO mice, which would be a biased effort. However, the unequivocal discovery of the cell intrinsic role of LRRK in the protection of DA neurons from age-dependent degeneration and apoptotic cell death should be considered seriously, while we contemplate the disease mechanism and how LRRK2 mutations may cause DA neuron loss and PD.

      Reviewer #3 (Public Review):

      Kang, Huang, and colleagues investigated the impact of LRRK1 and LRRK2 deletion, specifically in dopaminergic neurons, using a novel cDKO mouse model. They observed a significant reduction in DAergic neurons in the substantia nigra in their conditional LRRK1 and LRRK2 KO mice and a corresponding increase in markers of apoptosis and gliosis. This work set out to address a longstanding question within the field around the role and importance of LRRK1 and LRRK2 in DAergic neurons and suggests that the loss of both proteins triggers some neurodegeneration and glial activation.

      The studies included in this work are carefully performed and clearly communicated, but additional studies are needed to strengthen further the authors' claims around the consequences of LRRK2 deletion in DAergic neurons.

      1) In Figures 2E and F, the authors assess the protein levels of LRRK1 and LRRK2 in their cDKO mouse model to confirm the deletion of both proteins. They observe a mild loss of LRRK1 and LRRK2 signals in the ventral midbrain compared to wild-type animals. While this is not surprising given other cell types that still express LRRK1 and LRRK2 would be present in their dissected ventral midbrain samples, it does not sufficiently confirm that LRRK1 and LRRK2 are not expressed in DAergic neurons. Additional data is needed to more directly demonstrate that LRRK1 and LRRK2 protein levels are reduced in DAergic neurons, including analysis of LRRK1 and LRRK2 protein levels via immunohistochemistry or FACS-based analysis of TH+ neurons.

      We thank the reviewer for highlighting this incredibly important but often overlooked issue. We agree that the data in Figure 2E, F alone would be inadequate to validate DA neuron-specific LRRK cDKO mice.

      Cell type-specific conditional knockouts are a mosaic with KO cells mixed with other cell types expressing the gene normally. DA neuron-specific cDKO is particularly challenging, as DA neurons are a subset of cells embedded in the ventral midbrain. Rather than using immunostaining, which relies upon specific, good LRRK1 and LRRK2 antibodies for IHC, or FACS sorting of TH+ neurons followed by Western blotting (few cells, mixed cell populations, etc.), we chose a clean genetic approach by generating germline mutant mice carrying the deleted LRRK1 and LRRK2 alleles in all cells from the floxed LRRK1 and LRRK2 alleles. This approach permits characterization of these deletion mutations in germline mutant mice using molecular approaches that yield unambiguous results.

      We crossed CMV-Cre deleter mice with floxed LRRK1 and LRRK2 mice to generate respective germline LRRK1 KO and LRRK2 KO mice, in which all cells carry the LRRK1 or LRRK2 deleted alleles that are identical to those in DA neurons of cDKO mice. We then performed Northern, extensive RT-PCR followed by sequencing, and Western analyses to show the absence of the full length LRRK1 and LRRK2 mRNA (Figure 1G, H, Figure 1-figure supplement 8 and 10), the expected truncation of LRRK1 and LRRK2 mRNA (Figure 1-figure supplement 9 and 11), and the absence of LRRK1 and LRRK2 proteins (Figure 1I). These analyses together demonstrate that in the presence of Cre, either CMV-Cre expressed in all cells or DAT-Cre expressed selectively in DA neurons, the floxed LRRK1 and LRRK2 exons are deleted, resulting in null alleles. We further demonstrated the specificity of DAT-Cre-mediated recombination (deletion) by crossing DAT-Cre mice with a GFP reporter, showing that 99% of TH+ DA neurons in the SNpc are also GFP+ (Figure 2A, B), indicating that DAT-Cre-mediated recombination of the floxed alleles occurs in essentially all TH+ DA neurons in the SNpc.

      2) The authors observed a significant but modest effect of LRRK1 and LRRK2 deletion on the number of TH+ neurons in the substantia nigra (12-15% loss at 20-24 months of age). It is unclear whether this extent of neuron loss is functionally relevant. To strengthen the impact of these data, additional studies are warranted to determine whether this translates into any PD-relevant deficits in the mice, including motor deficits or alterations in alpha-synuclein accumulation/aggregation.

      Yes, the reduction of DA neurons in the SNpc of cDKO mice at the age of 20-24 months is modest. At 15 months of age, the number of TH+ DA neurons in the SNpc is similar between LRRK cDKO mice (10,000 ± 141) and littermate controls (10,077 ± 310, p > 0.9999). At 20 months of age, the number of DA neurons in the SNpc of LRRK cDKO mice (8,948 ± 273) is significantly reduced (-12.7%), compared to control mice (10,244 ± 220, F1,46 = 16.59, p = 0.0002, two-way ANOVA with Bonferroni’s post hoc multiple comparisons, p = 0.0041). By 24 months of age, the number of DA neurons in the SNpc of LRRK cDKO mice (8,188 ± 452) relative to controls (9,675 ± 232, p = 0.0010) is further reduced (-15.4%).
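
      The percentage reductions quoted above follow directly from the group means. The short Python sketch below is only an illustration of that arithmetic (the function and variable names are ours, not part of the study's analysis); it uses the mean TH+ counts given in the response.

```python
def percent_reduction(control_mean: float, cdko_mean: float) -> float:
    """Percent reduction of the cDKO mean relative to the control mean."""
    return (control_mean - cdko_mean) / control_mean * 100.0


# Mean TH+ neuron counts in the SNpc quoted in the response,
# keyed by age in months: (littermate control, LRRK cDKO).
mean_counts = {
    15: (10077, 10000),
    20: (10244, 8948),
    24: (9675, 8188),
}

reductions = {
    age: round(percent_reduction(control, cdko), 1)
    for age, (control, cdko) in mean_counts.items()
}

for age, value in reductions.items():
    print(f"{age} months: {value}% fewer TH+ neurons in cDKO than in controls")
```

      With these inputs the 20-month and 24-month values round to 12.7% and 15.4%, matching the figures in the text.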

      Similar results were obtained by an independent quantification by another investigator, also conducted in a genotype-blind manner, using the fractionator and optical dissector method, by which TH+ cells were quantified in 25% of the areas. These results are included as Figure 3-figure supplement 1 in the revised manuscript. Because of the more limited sampling, the quantification data are more variable, compared to quantification of TH+ cells in all areas of the SNpc, shown in Figure 3. With both methods, we quantified TH+ cells in every 10th section encompassing the entire SNpc (3D structure), as sampling using every 5th or every 10th section yielded similar results.
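
      For readers unfamiliar with fractionator sampling, the sketch below shows the general scaling logic: the raw count is divided by the product of the sampling fractions. It is a generic illustration, not the authors' analysis code. The section fraction (1/10) and area fraction (25%) come from the text; the thickness fraction of 1.0 and the raw count of 250 cells are hypothetical values chosen only to make the example concrete.

```python
def fractionator_estimate(
    counted_cells: float,
    section_fraction: float,
    area_fraction: float,
    thickness_fraction: float = 1.0,  # assumption: full section thickness sampled
) -> float:
    """Scale a raw cell count by the inverse of each sampling fraction."""
    for name, value in (
        ("section_fraction", section_fraction),
        ("area_fraction", area_fraction),
        ("thickness_fraction", thickness_fraction),
    ):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    return counted_cells / (section_fraction * area_fraction * thickness_fraction)


# Hypothetical raw count; only the sampling fractions are taken from the text.
hypothetical_count = 250
estimate_sampled = fractionator_estimate(
    hypothetical_count, section_fraction=1 / 10, area_fraction=0.25
)
print(f"Estimated total from 25% area sampling: {estimate_sampled:.0f}")

# Counting all areas of every 10th section corresponds to area_fraction = 1.0.
estimate_full_area = fractionator_estimate(
    1000, section_fraction=1 / 10, area_fraction=1.0
)
print(f"Estimated total from full-area counting: {estimate_full_area:.0f}")
```

      Because the 25% sampling multiplies each counted cell by a larger factor, small differences in raw counts translate into larger differences in the estimate, which is consistent with the greater variability noted above.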

      We also performed behavioral analysis of LRRK cDKO mice and littermate controls at the ages of 10 and 25 months using the beam walk test (10 mm and 20 mm beam) and the pole test, which are sensitive to impairment of motor coordination. We found that LRRK cDKO mice at 10 months of age showed significantly more hindlimb errors (p = 0.0005, unpaired two-tailed Student’s t-test) and longer traversal time (p = 0.0075) in the 10mm beam walk test, compared to control mice, though their performance is similar in the 20 mm beam walk (hindlimb slips: p = 0.0733, traversal time: p = 0.9796) and in the pole test. At 22 months of age, the performance of LRRK cDKO mice and littermate controls is more variable and worse, compared to the younger mice, and is not significantly different between the genotypic groups. These results are now included as Figure 9 of the revised manuscript.

      3) The authors demonstrate that, unlike in the germline LRRK DKO mice, they do not observe any alterations in electron-dense vacuoles via EM. Given their data showing increased apoptosis and gliosis, it remains unclear how the loss of LRRK proteins leads to DAergic neuronal cell loss. Mechanistic studies would be insightful to understand better potential explanations for how the loss of LRRK1 and LRRK2 may impair cellular survival, and additional text should be added to the discussion to discuss potential hypotheses for how this might occur.

      We agree that this phenotypic difference between germline DKO and DA neuron-specific cDKO mice is intriguing, suggesting a non-cell autonomous contribution of LRRK in age-dependent accumulation of autophagic and lysosomal vacuoles in SNpc neurons of germline LRRK DKO mice. We will discuss the phenotypic difference further in the revised manuscript. We are generating microglia-specific LRRK cDKO mice to investigate the role of LRRK in microglia and whether microglia contribute in a cell extrinsic manner to the regulation of the autophagy-lysosomal pathway in DA neurons.

      4) The authors discuss the potential implications of the neuronal cell loss observed in cDKO mice for LRRK1 and LRRK2 for therapeutic approaches targeting LRRK2 and suggest this argues that LRRK2 variants may exert their effects through a loss-of-protein function. However, all of the data generated in this work focus on a mouse in which both LRRK1 and LRRK2 have been deleted, and it is therefore difficult to make any definitive conclusions about the consequences of specifically targeting LRRK2. The authors note potential redundancy between the two LRRK proteins, and they should soften some of their conclusions in the discussion section around implications for the effects of LRRK2 variants. Human subjects that carry LRRK2 loss-of-function alleles do not have an increased risk for developing PD, which argues against the authors' conclusions that LRRK2 variants associated with PD are loss-of-function. Additional text should be included in their discussion to better address these nuances and caution should be used in terms of extrapolating their data to effects observed with PD-linked variants in LRRK2.

      We will modify the discussion accordingly in the revised manuscript.

    1. Author Response

      eLife assessment

      This valuable paper presents a thoroughly detailed methodology for mesoscale-imaging of extensive areas of the cortex, either from a top or lateral perspective, in behaving mice. While the examples of scientific results to be derived with this method are in the preliminary stages, they offer promising and stimulating insights. Overall, the method and results presented are convincing and will be of interest to neuroscientists focused on cortical processing in rodents.

      Authors’ Response: We thank the reviewers for the helpful and constructive comments. They have helped us plan for significant improvements to our manuscript. Our preliminary response and plans for revision are indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors introduce two preparations for observing large-scale cortical activity in mice during behavior. Alongside this, they present intriguing preliminary findings utilizing these methods. This paper is poised to be an invaluable resource for researchers engaged in extensive cortical recording in behaving mice.

      Strengths:

      -Comprehensive methodological detailing:

      The paper excels in providing an exceptionally detailed description of the methods used. This meticulous documentation includes a step-by-step workflow, complemented by thorough protocols and a list of materials in the supplementary materials.

      -Minimal movement artifacts:

      A notable strength of this study is the remarkably low movement artifacts. To further underscore this achievement, a more robust quantification across all subjects, coupled with benchmarking against established tools (such as those from suite2p), would be beneficial.

      Authors’ Response: This is a good suggestion. Since we used suite2p for our data analysis, and have records of the fast-z correction applied by the microscope, we can supply these as quantifications of movement corrections that were applied across our sample of mice. We hope to supply this information as a supplement in the revised manuscript.

      Currently, we have chosen to show that the corrected, post-suite2p registration movement artifacts are very close to zero. We will revise the manuscript with clear descriptions of methods that we have found important, such as fully tightening all mounting devices, utilizing the air table properly, implanting the cranial window with proper, even pressure across its entire extent, and mounting the mouse so that it is not too close or far from the surface of the running wheel.

      -Insightful preliminary data and analysis:

      The preliminary data unveiled in the study reveal interesting heterogeneity in the relationships between neural activity and detailed behavioral features, particularly notable in the lateral cortex. This aspect of the findings is intriguing and suggests avenues for further exploration.

      Weaknesses:

      -Clarification about the extent of the method in the title and text:

      The title of the paper, using the term "pan-cortical," along with certain phrases in the text, may inadvertently suggest that both the top and lateral view preparations are utilized in the same set of mice. To avoid confusion, it should be explicitly stated that the authors employ either the dorsal view (which offers limited access to the lateral ventral regions) or the lateral view (which restricts access to the opposite side of the cortex). For instance, in line 545, the phrase "lateral cortex with our dorsal and side mount preparations" should be revised to "lateral cortex with our dorsal or side mount preparations" for greater clarity.

      Authors’ Response: We will revise the manuscript so that it is clear that we made use of two imaging configurations for the 2-photon mesoscope data and the benefits and limitations of these two preparations. The dorsal mount and the side mount each have their advantages and disadvantages, but together form a powerful tool for imaging much of the dorsal and lateral cortex in awake, behaving mice.

      -Comparison with existing methods:

      A more detailed contrast between this method and other published techniques would add value to the paper. Specifically, the lateral view appears somewhat narrower than that described in Esmaeili et al., 2021; a discussion of this comparison would be useful.

      Authors’ Response: We will modify the manuscript so that a more detailed comparison with other published techniques is included. The preparation by Esmaeili et al. 2021 has some similarities, but also differences, from our preparation. Our preliminary reading is that their through-the-skull field of view is approximately the same as our through-the-skull field of view that exists between our first (headpost implantation) and second (window implantation) surgeries, although our preparation appears to include more anterior areas both near to and on the contralateral side of the midline. We will compare these preparations more accurately in the revised manuscript.

      If you compare the imageable extent of our cranial window for mesoscale 2-photon imaging to that of their through-the-skull widefield preparation, which is a bit of an “apples to oranges” comparison, then you are likely correct that their field of view is larger than ours, if you are referring to our 10 mm radius-bend glass. However, use of our 9 mm radius bend glass (i.e. a tighter bend) allows us to image additional ventral auditory areas. We could show an example of this, perhaps, although we did not make as much use of this alternative window in the large FOV experiments, because the increased curvature of the glass relative to the 10 mm radius bend window prevents imaging of the entire preparation in a single 2-photon z-plane. With the 9 mm radius bend glass we mostly imaged in the multiple, small FOV configuration (see Fig. S2).

      Furthermore, the number of neurons analyzed seems modest compared to recent papers (50k) - elaborating on this aspect could provide important context for the readers.

      Authors’ Response: With respect to the “modest” number of neurons analyzed (between 2000 and 8000 neurons per session for our dorsal and side mount preparations with medians near 4500; See Fig. S2e) we would like to point out that factors such as use of dual-plane imaging or multiple imaging planes, different mouse lines, use of different duration recording sessions (see our Fig S2c), use of different imaging speeds and resolutions (see our Fig S2d), use of different Suite2p run-time parameters, and inclusion of areas with blood vessels and different neuron cell densities, may all impact the count of total analyzed neurons. We could provide additional documentation of these issues, but we would like to point out that, in our case, we were not trying to maximize neuron count at the expense of other factors such as imaging speed and total spatial FOV extent.

      -Discussion of methodological limitations:

      The limitations inherent to the method, such as the potential behavioral effects of tilting the mouse's head, are not thoroughly examined. A more comprehensive discussion of these limitations would enhance the paper's balance and depth.

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this situation (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, will be discussed more thoroughly in the revised manuscript.

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparison across mice indicates that side and dorsal mount mice show similar behavioral variability.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it would in principle be nearly possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective to 20 degrees to the right, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult to impossible under these conditions because the camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      -Preliminary nature of results:

      The results are at a preliminary stage; for example, the B-SOiD analysis is based on a single mouse, and the validation data are derived from the training data set. The discrepancy between the maps in Figures 5e and 6e might indicate that a significant portion of the map represents noise. An analysis of variability across mice and a method to assign significance to these maps would be beneficial.

      Authors’ Response: In this methods paper, we have chosen to supply proof of principle examples, without a complete analysis of animal-to-animal variance. The dataset for this paper contains both neural and behavioral data for 91 sessions across 18 mice from both dorsal and side mount preparations. The complete analysis of this dataset exceeds the capacity of the present study. We will include more individual examples in the revised version, along with data showing the amount of between session and across mouse variance. We will include in the revised manuscript a comparison of the stability of B-SOiD measures across sessions, as a demonstration of what may be expected with this method.

      -Analysis details:

      More comprehensive details on the analysis would be beneficial for replicability and deeper understanding. For instance, the statement "Rigid and non-rigid motion correction were performed in Suite2p" could be expanded with a brief explanation of the underlying principles, such as phase correlation, to provide readers with a better grasp of the methodologies employed.

      Authors’ Response: We are revising the manuscript to give more detail without reducing readability, so as to increase clarity of presentation. Since this is a methods paper, we are modifying the manuscript to include more details and clear explanations so that the reader may replicate our methods and results.
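
      As a brief illustration of the phase-correlation principle the reviewer mentions, the NumPy sketch below estimates a whole-pixel rigid shift between a reference image and a displaced frame. It is a minimal teaching example under simplifying assumptions (integer, circular shifts; no smoothing, subpixel refinement, or block-wise non-rigid step) and is not Suite2p's implementation; the function name `estimate_rigid_shift` is ours.

```python
import numpy as np


def estimate_rigid_shift(reference: np.ndarray, frame: np.ndarray) -> tuple:
    """Return the integer (dy, dx) roll that re-aligns `frame` to `reference`."""
    ref_fft = np.fft.fft2(reference)
    frame_fft = np.fft.fft2(frame)

    # Normalized cross-power spectrum: keep only the phase difference.
    cross_power = ref_fft * np.conj(frame_fft)
    cross_power /= np.abs(cross_power) + 1e-12

    # The inverse transform peaks at the displacement between the two images.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Map peak indices above the midpoint to negative shifts (FFT wrap-around).
    return tuple(
        int(p) if p <= n // 2 else int(p) - n
        for p, n in zip(peak, correlation.shape)
    )


# Synthetic test image and a copy displaced by a known circular shift.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
frame = np.roll(reference, shift=(3, -5), axis=(0, 1))

correction = estimate_rigid_shift(reference, frame)
registered = np.roll(frame, shift=correction, axis=(0, 1))
print("Correction to apply (dy, dx):", correction)
print("Registered frame matches reference:", np.allclose(registered, reference))
```

      Non-rigid correction generally applies the same idea to sub-blocks of the field of view and interpolates the resulting shifts; that step is omitted here.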

      Reviewer #2 (Public Review):

      Summary:

      The authors present a comprehensive technical overview of the challenging acquisition of large-scale cortical activity, including surgical procedures and custom 3D-printed headbar designs to obtain neural activity from large parts of the dorsal or lateral neocortex. They then describe technical adjustments for stable head fixation, light shielding, and noise insulation in a 2-photon mesoscope and provide a workflow for multisensory mapping and alignment of the obtained large-scale neural data sets in the Allen CCF framework. Lastly, they show different analytical approaches to relate single-cell activity from various cortical areas to spontaneous activity by using visualization and clustering tools, such as Rastermap, PCA-based cell sorting, and B-SOiD behavioral motif detection.

      Authors’ Response: Thank you for this excellent summary of the scope of our paper.

      The study contains a lot of useful technical information that should be of interest to the field. It tackles a timely problem that an increasing number of labs will be facing as recent technical advances allow the activity measurement of an increasing number of neurons across multiple areas in awake mice. Since the acquisition of cortical data with a large field of view in awake animals poses unique experimental challenges, the provided information could be very helpful to promote standard workflows for data acquisition and analysis and push the field forward.

      Authors’ Response: We very much support the idea that our work here will contribute to the development of standard workflows across the field including multiple approaches to large-scale neural recordings.

      Strengths:

      The proposed methodology is technically sound and the authors provide convincing data to suggest that they successfully solved various problems, such as motion artifacts or high-frequency noise emissions, during 2-photon imaging. Overall, the authors achieved their goal of demonstrating a comprehensive approach for the imaging of neural data across many cortical areas and providing several examples that demonstrate the validity of their methods and recapitulate and further extend some recent findings in the field.

      Weaknesses:

      Most of the descriptions are quite focused on a specific acquisition system, the Thorlabs Mesoscope, and the manuscript is in part highly technical making it harder to understand the motivation and reasoning behind some of the proposed implementations. A revised version would benefit from a more general description of common problems and the thought process behind the proposed solutions to broaden the impact of the work and make it more accessible for labs that do not have access to a Thorlabs mesoscope. A better introduction of some of the specific issues would also promote the development of other solutions in labs that are just starting to use similar tools.

      Authors’ Response: We will re-write the motivation behind the study to clarify the general problems that are being addressed. As the 2-photon imaging component of these experiments was performed on a Thorlabs mesoscope, the imaging details will necessarily deal specifically with this system. We will briefly compare the methods and results from our Thorlabs system to those of other systems, based on what we are able to glean from the literature on their strengths and weaknesses.

      Reviewer #3 (Public Review):

      Summary

      In their manuscript, Vickers and McCormick have demonstrated the potential of leveraging mesoscale two-photon calcium imaging data to unravel complex behavioural motifs in mice. Particularly commendable is their dedication to providing detailed surgical preparations and corresponding design files, a contribution that will greatly benefit the broader neuroscience community. The quality of the data is high, but it is not clear whether the data are available to the community; some datasets should be deposited. More importantly, the authors have acquired activity-clustered neural ensembles at an unprecedented spatial scale to further correlate with high-level behaviour motifs identified by B-SOiD. Such an advancement marks a significant contribution to the field. While the manuscript is comprehensive and the analytical strategy proposed is promising, some technical aspects warrant further clarification. Overall, the authors have presented an invaluable and innovative approach, effectively laying a solid foundation for future research in correlating large-scale neural ensembles with behaviour. The implementation of a custom sound insulator for the scanner is a great idea and should be something implemented by others.

      Authors’ Response: Thank you for the kind words.

      We intend to make the data set used in making our main figures available to the public, perhaps using FigShare, so that they may check the validity of the methods and analysis. We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with a second in-depth analysis paper that is currently in preparation.

      This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other. This is described in the methods, but a visual representation would greatly benefit the readers looking to implement something similar.

      Authors’ Response: This is an excellent suggestion. We will include a workflow diagram in the revised manuscript for the methods, data collection, and analysis.

      The authors should cite sources for the claims stated in lines 449-453 and cite the claim of the mouse's hearing threshold mentioned in line 463.

      Authors’ Response: For the claim stated in lines 449-453, “The unattenuated or native high-frequency background noise generated by the resonant scanner causes stress to both mice and experimenters, and can prevent mice from achieving maximum performance in auditory mapping, spontaneous activity sessions, auditory stimulus detection, and auditory discrimination sessions/tasks,” we can provide the following references: (i) for mice: Sadananda et al, 2008 (“Playback of 22-kHz and 50-kHz ultrasonic vocalizations induces differential c-fos expression in rat brain”, Neuroscience Letters, Vol 435, Issue 1, p 17-23), and (ii) for humans: Fletcher et al, 2018 (“Effects of very high-frequency sound and ultrasound on humans. Part I: Adverse symptoms after exposure to audible very-high frequency sound”, J Acoust Soc Am, 144, 2511-2520). We will include these references in the revised paper.

      For line 463, “i.e. below the mouse hearing threshold at 12.5 kHz of roughly 15 dB”, we can provide the following reference: Zheng et al, 1999 (“Assessment of hearing in 80 inbred strains of mice by ABR threshold analyses”, Hearing Research, Vol 130, Issues 1-2, p 94-107). We will also include this reference in the paper. Thank you for identifying these citation omissions.

      No stats are provided for the results shown in Figure 6e; it would be useful to know which of these neural densities for all areas show a clear statistical significance across all the behaviors.

      Authors’ Response: There are two statistical comparisons that we feel may be useful to add to the single session data displayed in this figure, in order to address the point that you raise. The first would allow us to assess whether for each Rastermap group, the distribution of neuron densities across CCF areas differs from a null, uniform distribution. The second would allow us to examine differences between Rastermap groups associated with different qualitative behaviors in order to know with which patterns of neural activity they are reliably associated.

      For the first comparison, we could provide a statistic similar to what we provide for Fig. S6c and f, in which for each CCF area we compare the observed mean correlation values to a null of 0, or, in this case, the population densities of each Rastermap group for each CCF area to a null value equal to the total number of recorded neurons for that group divided by the total number of CCF areas (i.e. a Rastermap group with 500 neurons evenly distributed across ~30 CCF areas would contain ~17 neurons (or ~3% density) per CCF area). Our current figure legend states that the maximum of the scale bar look-up value (reds) for each group ranges from ~8% to 32%. So indeed, adding these significances would be informative in this case.

      For the second comparison, we could compare the density of neurons for each CCF area across Rastermap groups for this session. For example, it may be the case that the density of neurons in primary and secondary visual areas belonging to Rastermap groups that predominate during the “walk” behavior is higher than in the Rastermap group that predominates during the “whisk” behavior, or that the density of neurons in the “whisk” and “twitch” Rastermap groups in primary and secondary motor areas is higher than in the Rastermap groups that are active during the “walk” and “oscillate” behaviors.

      Such a comparison should in fact be robust to Rastermap group variability across sessions and mice, as long as the same qualitative behaviors recur. However, our current qualitative methods for discretization of the Rastermap groups likely limit our ability to extend such an analysis accurately across our entire dataset. We are pursuing more rigorous analysis methods in this vein for our second, results-oriented paper.
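
      As a rough sketch of the first comparison described above (observed per-area neuron counts for one Rastermap group against a uniform null), one could compute a chi-square goodness-of-fit statistic and estimate its p-value by simulation. The function name, the simulated counts, and the choice of a Monte Carlo test are assumptions for illustration, not the analysis used in the paper.

```python
import numpy as np


def uniform_null_test(counts, n_sim=10000, seed=0):
    """Compare per-area neuron counts for one group with a uniform null.

    Hypothetical helper: `counts` holds the number of neurons of one
    Rastermap group found in each CCF area.
    Returns (expected count per area, chi-square statistic, Monte Carlo p-value).
    """
    counts = np.asarray(counts, dtype=float)
    n_total = counts.sum()
    n_areas = counts.size
    # Uniform null: neurons in the group divided by the number of areas.
    expected = n_total / n_areas
    chi2_obs = ((counts - expected) ** 2 / expected).sum()
    # Simulate the null by scattering the same number of neurons uniformly.
    rng = np.random.default_rng(seed)
    sims = rng.multinomial(int(n_total), np.full(n_areas, 1.0 / n_areas), size=n_sim)
    chi2_sim = ((sims - expected) ** 2 / expected).sum(axis=1)
    p_value = (1 + np.sum(chi2_sim >= chi2_obs)) / (n_sim + 1)
    return float(expected), float(chi2_obs), float(p_value)


# Example with simulated counts: 500 neurons scattered over 30 areas.
example_counts = np.random.default_rng(1).multinomial(500, np.full(30, 1.0 / 30))
expected, chi2_obs, p_value = uniform_null_test(example_counts)
```

      A per-area test (for example, a binomial test of each area's count against the same expected fraction, with correction for multiple comparisons) would be needed to say which individual areas deviate from the null.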

      While I understand that this is a methods paper, it seems like the authors are aware of the literature surrounding large neuronal recordings during mouse behavior. Indeed, in lines 178-179, the authors mention how a significant portion of the variance in neural activity can be attributed to changes in "arousal or self-directed movement even during spontaneous behavior." Why then did the authors not make an attempt at a simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc.)? These models are straightforward to implement, and indeed it would benefit this work if the model extracts information on par with what is known from the literature.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the current methods paper. We are following up this methods paper with an in-depth analysis of neural activity and corresponding behavior across the cortex during spontaneous and trained behaviors, but this analysis goes well beyond the scope of the present manuscript. Here, we prefer to present examples of the types of results that can be expected to be obtained using our methods, and how these results compare with those obtained by others in the field.

      Specific strengths and weaknesses with areas to improve:

      The paper should include an overall cartoon diagram that indicates how the various modules are linked together for the sampling of both behaviour and mesoscale GCAMP. This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other.

      Authors’ Response: This is an excellent suggestion and will be included in the revised manuscript, so that readers can more readily follow our workflow, data collection, and analysis.

      The paper contains many important results regarding correlations between behaviour and activity motifs on both the cellular and regional scales. There is a lot of data and it is difficult to draw out new concepts. It might be useful for readers to have an overall figure discussing various results and how they are linked to pupil movement and brain activity. A simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc) may help in this regard.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the present methods paper. Such an analysis is a significant undertaking with such large and heterogeneous datasets, and we provide proof-of-principle data here so that the reader can understand the type of data to be expected using our methods. We hope to provide a more complete analysis of data obtained using our methodology in the near future in a second manuscript.

      However, we may be amenable to including preliminary linear model fit results, as supplementary material, for the two example sessions highlighted in this paper (i.e. the one dorsal mount session in Fig. 4, and the one side mount session shown in Figs. 5 and 6).
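
      For readers unfamiliar with the kind of model under discussion, a linear encoding model of this sort can be sketched as below: ridge regression from behavioral regressors to each neuron's activity, scored by held-out explained variance. All names, the synthetic data, and the regularization strength are assumptions for illustration; this is not an analysis from the manuscript.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Fit ridge regression from regressors X (time x regressors)
    to neural activity Y (time x neurons). Hypothetical helper.

    Returns the weights (including an intercept row) and the
    standardization parameters needed to apply them to new data.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0) + 1e-12
    Xd = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    penalty = alpha * np.eye(Xd.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    W = np.linalg.solve(Xd.T @ Xd + penalty, Xd.T @ Y)
    return W, mu, sd


def explained_variance(X, Y, W, mu, sd):
    """Fraction of each neuron's variance explained on the given data."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xd = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    residual = Y - Xd @ W
    return 1.0 - residual.var(axis=0) / Y.var(axis=0)


# Synthetic example: 3 regressors (stand-ins for pupil, running, whisking)
# driving 5 neurons, fit on the first 300 time points and scored on the rest.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
W_true = np.arange(1, 16).reshape(3, 5) / 5.0
Y = X @ W_true + 0.1 * rng.normal(size=(400, 5))
W, mu, sd = fit_ridge(X[:300], Y[:300], alpha=1e-3)
ev = explained_variance(X[300:], Y[300:], W, mu, sd)
```

      Scoring on held-out time points rather than on the training data guards against overfitting when the number of regressors is large; with real recordings, contiguous blocks of time would usually be held out because neighboring samples are correlated.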

      Previously, widefield imaging methods have been employed to describe regional activity motifs that correlate with known intracortical projections. Within the authors' data it would be interesting to perhaps describe how these two different methods are interrelated, since they do collect both datasets. Surprisingly, such macroscale patterns are not immediately obvious from the authors' data. Some of this may be related to the scaling of correlation patterns or other factors. Perhaps there still isn't enough data to readily see these and it is too sparse.

      Authors’ Response: Unfortunately, we are unable to directly compare widefield GCaMP6s activity with mesoscope 2-photon GCaMP6s activity. During widefield data acquisition, animals were stimulated with visual, auditory, or somatosensory stimuli, while 2-photon mesoscope data collection occurred during spontaneous changes in behavioral state, without sensory stimulation. The suggested comparison is, indeed, an interesting project for the future.

      In lines 71-71, the authors described some disadvantages of one-photon widefield imaging including the inability to achieve single-cell resolution. However, this is not true. In recent years, the combination of better surgical preparations, camera sensors, and genetically encoded calcium indicators has enabled the acquisition of single-cell data even using one-photon widefield imaging methods. These methods include miniscopes (Cai et al., 2016), multi-camera arrays (Hope et al., 2023), and spinning disks (Xie et al., 2023).

      Cai, Denise J., et al. "A shared neural ensemble links distinct contextual memories encoded close in time." Nature 534.7605 (2016): 115-118.

      Hope, James, et al. "Brain-wide neural recordings in mice navigating physical spaces enabled by a cranial exoskeleton." bioRxiv (2023).

      Xie, Hao, et al. "Multifocal fluorescence video-rate imaging of centimetre-wide arbitrarily shaped brain surfaces at micrometric resolution." Nature Biomedical Engineering (2023): 1-14.

      Authors’ Response: We will correct these statements and incorporate these, and other relevant, references. There are advantages and disadvantages to each chosen technique, such as ease of use, field of view, accuracy, speed, etc., and we will highlight a few of these without an extensive literature review.

      Even the best one-photon imaging techniques typically have ~10-20 micrometer resolution in xy (we image at 5 micrometer resolution for our large FOV configuration, but the xy point-spread function for the Thorlabs mesoscope is 0.61 x 0.61 micrometers in xy with 970 nm excitation) and undefined z-resolution (4.25 micrometers for Thorlabs mesoscope). A coarser resolution increases the likelihood that activity data from neighboring cells may contaminate the fluorescence observed from imaged neurons. Reducing the FOV and using sparse expression of the indicator lessens this overlap problem.

      We do appreciate these recent advances, however, particularly for use in cases where more rapid imaging is desired over a large field of view (CCD acquisition can be much faster than that of standard 2-photon galvo-galvo or even galvo-resonant scanning, as the Thorlabs mesoscope uses). This being said, there are few currently available genetically encoded Ca2+ sensors that are able to measure fluctuations faster than ~10 Hz, which is a speed achievable on the Thorlabs 2-photon mesoscope with our techniques using the “small, multiple FOV” method (Fig. S2d, e).

      The authors' claim of achieving optical clarity for up to 150 days post-surgery with their modified crystal skull approach is significantly longer than the 8 weeks (approximately 56 days) reported in the original study by Kim et al. (2016). Since surgical preparations are an integral part of the manuscript, it may be helpful to provide more details to address the feasibility and reliability of the preparation in chronic studies. A series of images documenting the progression of the optical quality of the window would offer valuable insight.

      Authors’ Response: As you suggest, we will include images and data demonstrating the average changes in the window preparation, as well as the degree of variability and a range of outcome scenarios that we observed over the prolonged time periods of our study. We will also include methodological details that we found were useful for facilitating long term use of these preparations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper performed a functional analysis of the poorly characterized pseudo-phosphatase Styxl2, one of the targets of the Jak/Stat pathway in muscle cells. The authors propose that Styxl2 is essential for de novo sarcomere assembly by regulating autophagic degradation of non-muscle myosin IIs (NM IIs). Although a previous study by Fero et al. (2014) has already reported that Styxl2 is essential for the integrity of sarcomeres, this study provides new mechanistic insights into the phenomenon. In vivo studies in this manuscript are compelling; however, I feel the contribution of autophagy in the degradation of NM IIs is still unclear.

      Major concerns:

      1) The contribution of autophagy in the degradation of Myh9 is still unclear to this reviewer.

      It has been reported that autophagy is dispensable for sarcomere assembly in mice (Cell Metab, 2009, PMID: 19945408). In Fig. 7A, the authors showed that overexpressed Styxl2 downregulated the amount of ectopically expressed Myh9 in an ATG5-dependent manner in C2C12 cells; however, the experiment is far from a physiological condition. Therefore, the authors should test ATG5 knockdown and the genetic interaction between Styxl2 and ATG5 in vivo. That is, 1) loss of ATG5 on sarcomere assembly in zebrafish, and 2) the genetic interaction between Styxl2 and ATG5; co-injection of Styxl2 mRNA and ATG5-MO into the zebrafish embryos.

      Our response: In fact, the reference cited by the reviewer (Cell Metab, 2009, PMID: 19945408) clearly indicated that autophagy is required for sarcomere assembly. Moreover, another paper using the fish extraocular muscle regeneration model (Autophagy, 2014, PMID: 27467399) also showed that the sarcomere structure was disrupted in the regenerated muscles when autophagy was inhibited by chloroquine. In addition, other references (Nature medicine, 2007, PMID: 17450150; Autophagy, 2010, PMID: 20431347) also showed that loss of Atg5 in mouse cardiac muscles led to disorganized sarcomere structure. We also performed the Atg5 knockdown experiments as suggested by the reviewer. However, the sarcomere structure defects were not as obvious as those caused by Styxl2 knockdown (see Author response image 1 below). In fact, it was reported that Atg5 knockdown may not be a desirable strategy to disrupt autophagy, as it was found that “... only a small amount of Atg5 is needed for autophagy, knockdown of Atg5 to levels low enough to block autophagy might be difficult to achieve, ...” (Nature medicine, 2007, PMID: 17450150). Due to the ineffectiveness of the Atg5 MO in our assays, we did not perform the second experiment suggested by the reviewer. Moreover, as Styxl2 is not a key component of the autophagy machinery, it is less likely that overexpression of Styxl2 alone can rescue the autophagy defects caused by Atg5 knockdown.

      Author response image 1.

      Fish zygotes were injected with Atg5 or Ctrl MO. At 48 hpf, the fish were stained with an anti-Actinin antibody. Some fast muscle fibers were disrupted when Atg5 was knocked down. The numerator at the bottom of each image represents the number of fish embryos showing a normal Actinin staining pattern, while the denominator represents the total number of embryos examined. Scale bar, 10 µm.

      2) As referenced, Yamamoto et al. reported that Myh9 is degraded by autophagy. Mechanistically, Nek9 acts as an autophagic adaptor that bridges Atg8 and Myh9 through interactions with both. Inconsistent with the model, the authors mentioned on page 12, lines 365-367, "A recent report showed that Myh9 could also undergo Nek9-mediated selective autophagy (Yamamoto et al., 2021), suggesting that Myh9 is ubiquitinated". I think it is not yet explored whether autophagic degradation of Myh9 requires its ubiquitination. Moreover, I cannot judge whether Myh9 is ubiquitinated in a Styxl2-dependent manner from the data in Fig. 7C. The authors should test whether Nek9 is required for Myh9 degradation in muscles. If Nek9 plays a role in the Myh9 degradation, it would be better to remove Fig. 7C.

      Our response: Indeed, as pointed out by the reviewer, it has not been explored whether Myh9 is ubiquitinated or not. However, it has been well-established that some proteins undergoing autophagic degradation are ubiquitinated and are linked to Atg8/LC3 via p62 and NBR1 (Mol Cell, 2009, PMID: 19250911; J Biol Chem, 2007, PMID: 17580304). To improve the data quality, we repeated the Myh9 ubiquitination experiment in cells with or without Styxl2 by using a slightly different strategy: as shown in the revised Figure 7C, we first co-transfected HEK 293T cells with HA-Myh9, Myc-ubiquitin, and Flag-Styxl2. We then immunoprecipitated Myc-tagged Ubiquitin from the whole cell lysates, and then blotted for HA-Myh9. We detected an obvious increase in Ubiquitin-conjugated HA-Myh9 (revised Figure 7C). As suggested by the reviewer, we also tested whether knockdown of Nek9 affects the degradation of Myh9. We failed to detect an obvious effect (see Author response image 2 below) caused by Nek9 knockdown. One possible explanation for this negative result is that Nek9 itself is a negative regulator of selective autophagy (J Biol Chem, 2020, PMID: 31857374). By knocking it down, the functions of the autophagy machinery are expected to be enhanced instead of being impaired. This may explain why we failed to detect an effect on Myh9 degradation simply by knocking down Nek9. To further elucidate whether Nek9 is involved in Myh9 degradation in myoblasts, we may need to use a dominant-negative mutant of Nek9 missing the LC3-binding motif as shown by Yamamoto (Nat Commun, 2021, PMID: 34078910). This will be addressed in our future study.

      Author response image 2.

      C2C12 cells were transfected with negative control siRNA (NC), siNek9#2 or siNek9#3. 18 h later, the cells were transfected with plasmids HA-Myh9 and Flag-Styxl2 or Flag-Stk24. After another 24 h, the cells were harvested for RT-qPCR (left panel) or western blot (right panel).

      3) In Fig. 5F, the protein level of Styxl2 and Myh10 should be checked because the efficiency of Myh10-MO was not shown anywhere in this manuscript.

      Our response: As suggested by the reviewer, a Western blot showing the protein levels of Myh10 is now shown in Figure 5-figure supplement 1B.

      Reviewer #2 (Public Review):

      The authors investigated the role of the Jak1-Stat1 signaling pathway in myogenic differentiation by screening the transcriptional targets of Jak1-Stat1 and identified Styxl2, a pseudophosphatase, as one of them. Styxl2 expression was induced in differentiating muscles. The authors used a zebrafish knockdown model and conditional knockout mouse models to show that Styxl2 is required for de novo sarcomere assembly but is dispensable for the maintenance of existing sarcomeres. Styxl2 interacts with the non-muscle myosin IIs, Myh9 and Myh10, and promotes the replacement of these non-muscle myosin IIs by muscle myosin IIs through inducing autophagic degradation of Myh9 and Myh10. This function is independent of its phosphatase domain.

      A previous study using zebrafish found that Styxl2 (previously known as DUSP27) is expressed during embryonic muscle development and is crucial for sarcomere assembly, but its mechanism remains unknown. This paper provides important information on how Styxl2 mediates the replacement of non-muscle myosin with muscle myosin during differentiation. This study may also explain why autophagy deficiency in muscles and the heart causes sarcomere assembly defects in previous mouse models.

      Reviewer #3 (Public Review):

      Wu and colleagues are characterising the function of Styxl2 during muscle development, a pseudo-phosphatase that was already described to have some function in sarcomere morphogenesis or maintenance (Fero et al. 2014). The authors verify a role for Styxl2 in sarcomere assembly/maintenance using zebrafish embryonic muscles by morpholino knockdown and by a conditional Styxl2 allele in mice (knocked-out in satellite cells with Pax7 Cre).

      Experiments using a tamoxifen inducible Cre suggest that Styxl2 is dispensable for sarcomere maintenance and only needed for sarcomere assembly.

      BioID experiments with Styxl2 in C2C12 myoblasts suggest binding of nonmuscle myosins (NMs) to Styxl2. Interestingly, both NMs are downregulated when muscles differentiate after birth or during regeneration in mice. This down-regulation is reduced in the Styxl2 mutant mice, suggesting that Styxl2 is required for the degradation of these NMs.

      Impressively, reducing one NM (zMyh10) by double morpholino injection in a Styxl2 morphant zebrafish does improve zebrafish mobility and sarcomere structure. Degradation of Myh9 is also stimulated in cell culture if Styxl2 is co-expressed. Surprisingly, the phosphatase domain is not needed for these degradation and sarcomere structure rescue effects. Inhibitor experiments suggest that Styxl2 does promote the degradation of NMs by promoting the selective autophagy pathway.

      Strengths:

      A major strength of the paper is the combination of various systems, mouse and fish muscles in vivo to test Styxl2 function, and cell culture including a C2C12 muscle cell line to assay protein binding or protein degradation as well as inhibitor studies that can suggest biochemical pathways.

      Weakness:

      The weakness of this manuscript is that the sarcomere phenotypes and also the western blots are not quantified. Hence, we rely on judging the results from a single image or blot. Also, the role of Styxl2 in sarcomere biology was not entirely novel.

      Few high-resolution sarcomere images are shown, and myosins have not been stained.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      4) The position of molecular weight markers should be shown in all Western blot data.

      Our response: As suggested by the reviewer, the molecular weight markers have been added in the Western blot data.

      5) Schematic models of Styxl2deltaN509 and N513 construct would be helpful for the readers.

      Our response: A schematic has been added in Figure 6B (upper panel) to show Styxl2deltaN509 and Styxl2N513.

      6) Several data were described but not shown (data not shown). I think the data need to be included in the main or supplemental figures.

      Our response: As suggested by the reviewer, the raw data are now included in Figure 6-figure supplement 1A and Figure 7-figure supplement 1.

      Reviewer #2 (Recommendations For The Authors):

      1) In Fig. 5E, the authors suggest that the needle touch response was improved by additional knockdown of Myh10. This is a bit confusing because the germline knockout of Myh10 is lethal (line 445). The authors should provide more explanation on this point. Additionally, it would be better to include Myh10-MO in Fig. 5E.

      Our response: In line 445 of our original manuscript, we stated that germline knockout of the mouse Myh10 gene is lethal based on a published report (Proc Natl Acad Sci USA, 1997, PMID: 9356462). Here, in zebrafish zygotes, we only knocked down zMyh10; thus, we do not expect to get a lethal phenotype. In addition, other groups who knocked down Myh10 in fish also did not get a lethal phenotype (Dev Biol, 2015, PMID: 25446029). As to the control involving Myh10-MO in the experiment in Fig. 5E, we did include it in our experiments. As we did not observe any obvious effects on either motility or sarcomere structures, we did not include the data set in the figure.

      2) It was suggested that Myh9 and Myh10 form a complex (Rao et al. PLoS One 9, e114087, 2014). Thus, the IP experiments do not rule out the possibility that Styxl2 directly interacts with either Myh9 or Myh10 and indirectly with the other.

      Our response: In known myosin-II complexes, different myosin molecules can associate with each other through their tail domains (Bioarchitecture, 2013, PMID: 24002531). Thus, if we use full-length myosin molecules in our co-immunoprecipitation assays, it will be difficult to exclude the possibility raised by the reviewer. However, by using truncated myosin proteins, we showed that the head domain of either Myh9 or Myh10 could interact with Styxl2 in the absence of the tail domain (Figure 4E, F). This result strongly suggests that both Myh9 and Myh10 can independently interact with Styxl2.

      Reviewer #3 (Recommendations For The Authors):

      1) The western blot shown in Figure 3B supporting the induced deletion of Styxl2 should be quantified. Ideally, some other blots, e.g., in Figure 5, too. Please add the age of the mice in Figure 5B to the figure legend.

      Our response: As suggested by the reviewer, we quantified the data in Figures 3B, 3F, 5B, 5D, and 7A and the data were included in the revised figures. In Fig. 5B, we already indicated the age of the mice (i.e., P1) in the legend.

      2) A quantification of the sarcomere phenotypes in the double knock-down of zMyh10 and Styxl2 compared to Styxl2 single would make the paper significantly stronger. Furthermore, a double morpholino control should be included to rule out any RNAi machinery 'dilution effect'.

      Our response: As suggested by the reviewer, we quantified the sarcomere structures using the line scan analysis in ImageJ and the scan images were placed as insets in the upper corner of the immunofluorescent images (revised Figures 5F and 6C). To avoid potential “dilution effects”, in all the experiments involving the use of two different MOs, the total amount of MO was kept the same in all control samples by including a control MO (e.g., in samples treated with one specific MO, an equal amount of a control MO was also included, while in samples without any specific MO, twice as much control MO was used).

      3) The sarcomere phenotypes in figure 6 should also be better quantified, for example using simple line scans of the alpha-actinin stains and assay periodicity or calculating the autocorrelation coefficients. How about myosin stains?

      Our response: We quantified Figure 6C as suggested by the reviewer. We also performed myosin staining. The results were similar to those shown by the α-actinin antibody (see revised Figure 6-figure supplement 1B).
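
      The autocorrelation approach suggested by the reviewer can be sketched as follows: the intensity profile along a line scan is correlated with itself, and the lag of the first recurring peak gives the repeat length. This is an illustrative sketch on a synthetic profile, not the ImageJ analysis used for the figures; the function name, pixel size, and profile are assumptions.

```python
import numpy as np


def line_scan_period(profile, pixel_size_um):
    """Estimate the repeat length of a striated intensity profile.

    Hypothetical helper: `profile` is a 1D array of intensities sampled
    along a myofibril. Returns (period in micrometers, autocorrelation at
    that lag), or (None, 0.0) if no periodicity can be detected.
    """
    x = np.asarray(profile, dtype=float)
    x = x - x.mean()
    # One-sided autocorrelation, normalized so that lag 0 equals 1.
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    if ac[0] <= 0:
        return None, 0.0
    ac = ac / ac[0]
    # Skip the central peak: start searching after the first negative lag.
    negative = np.flatnonzero(ac < 0)
    if negative.size == 0:
        return None, 0.0
    start = int(negative[0])
    lag = start + int(np.argmax(ac[start:]))
    return float(lag * pixel_size_um), float(ac[lag])


# Synthetic striated profile: one repeat every 20 pixels at 0.1 um per pixel.
profile = np.cos(2 * np.pi * np.arange(200) / 20.0)
period_um, strength = line_scan_period(profile, pixel_size_um=0.1)
```

      The autocorrelation value at the detected lag serves as a simple regularity score: well-ordered sarcomeres give a value near 1, whereas disrupted fibers give a low value or no detectable peak.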

      4) Do the authors see periodic NM patterns in developing mouse muscle fibers as indicated by the model in Figure 7D? It is unclear if nonmuscle myosin is present in a PERIODIC pattern in early myofibrils. NM myosin periodic patterns that have been observed have a periodicity of only about 1 µm fitting the shorter length of the NM bipolar filaments (about 300 nm only, PMID 28114270).

      Our response: The reviewer raised a good point here. Ideally, we should examine developing mouse muscle fibers to prove that NM shows periodic patterns. However, due to the difficulty in catching myocytes undergoing sarcomere assembly, the majority of the studies involving NM in sarcomeres use cultured cardiomyocytes. Using TA muscles from P1 new-born mice, we failed to detect the presence of NM in sarcomeres (see Author response image 3 below). Actually, nearly all the myofibers showed mature sarcomere pattern without the NM signal. More work is needed in the future to examine developing mouse fibers at different embryonic stages to look for the presence of NM in developing sarcomeres.

      Author response image 3.

      The TA muscles were collected from male and female P1 mice. The muscles were sectioned and co-stained for α-actinin (Actn) and Myh9. The majority of myofibrils are mature without the NM II signal. Scale bar, 10 µm.

      5) Recent work suggested that mechanical tension is key to assemble the first long periodic myofibril containing immature sarcomeres. Tension is likely produced by a combination of NM and Mhc in the assembling sarcomeres themselves. This could be included in the introduction or discussion (PMIDs 24631244, 29316444, 29702642, 35920628).

      Our response: We thank the reviewer for pointing us to additional relevant references. We have added them in the Introduction.

      6) I suggest replacing "sarcomeric muscles" with "striated muscles".

      Our response: We revised the term in the manuscript as suggested by the reviewer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study addresses how protein synthesis in activated lymphocytes keeps up with their rapid division, with important findings that are of significance to cell biologists and immunologists endeavouring to understand the 'economy' of the immune system. The work is supported by solid data but because it proposes non-conventional mechanisms, it requires additional explanation and justification to align with the current understanding in the field.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors examine the fascinating question of how T lymphocytes regulate proteome expression during the dramatic cell state change that accompanies the transition from the resting quiescent state to the activated, dividing state. Orthogonal, complementary assays for translation (RPM/RTA, metabolic labeling) are combined with polyribosome profiling and quantitative, biochemical determinations of protein and ribosome content to explore this question, primarily in the OT-I T lymphocyte model system. The authors conclude that the ratio of protein levels to ribosomes/protein synthesis capacity is insufficient to support activation-coupled T cell division and cell size expansion. The authors hint at cellular mechanisms to explain this apparent paradox, focusing on protein acquisition strategies, including emperipolesis and entosis, though these remain topic areas for future study.

      The strengths of the paper include the focus on a fundamental biological question - the transcriptional/translational control mechanisms that support the rapid, dramatic cell state change that accompanies lymphocyte activation from the quiescent to activated state, the use of orthogonal approaches to validate the primary findings, and the creative proposal for how this state change is achieved.

      The weakness of the work is that several cellular regulatory processes that could explain the apparent paradox are not explored, though they are accessible for experimental analysis. In the accounting narrative that the authors highlight, a thorough accounting of the cellular process inventory that could support the cell state change should be further explored before committing to the proposal, provocative as it is, that protein acquisition provides a principal mechanism for supporting lymphocyte activation cell state change.

      Appraisal and Discussion:

      1) relating to the points raised above, two recent review articles explore this topic area and highlight important areas of study in RNA biology and translational control that likely contribute to the paradox noted by the authors: Choi et al. 2022, doi.org/10.4110/in.2022.22.e39 ("RNA metabolism in T lymphocytes") and Turner 2023, DOI: 10.1002/bies.202200236 ("Regulation and function of poised mRNAs in lymphocytes"). These should be cited, and the broader areas of RNA biology discussed by these authors integrated into the current manuscript.

      Good suggestion. We have added these references with a short discussion.

      2) The authors cite the Wolf et al. study from the Geiger lab (doi.org/10.1038/s41590-020-07145, ref. 41) though largely to compare determined values for ribosome number. Many other elements of the Wolf paper seem quite relevant, for example, the very high abundance of glycolytic enzymes (and whose mRNAs are quite abundant as well), where (and as others have reported) there is a dramatic activation of glycolytic flux upon T cell activation that is largely independent of transcription and translation, the evidence for "pre-existing, idle ribosomes", the changes in mRNA copy number and protein synthesis rate Spearman correlation that accompanies activation, and that the efficiencies of mRNA translation are heterogeneous. These data suggest that more accounting needs to be done to establish that there is a paradox.

      As one example, what if glycolytic enzyme protein levels in the resting cell are in substantial excess of what's needed to support glycolysis (likely true) and so translational upregulation can be directed to other mRNAs whose products are necessary for function of the activated cell? In this scenario, the dilution of glycolytic enzyme concentration that would come with cell division would not necessarily have a functional consequence. And the idle ribosomes could be recruited to key subsets of mRNAs (transcriptionally or post-transcriptionally upregulated) and with that a substantial remodeling of the proteome (authors ref. 44). The study of Ricciardi et al. 2018 (The translational machinery of human CD4+ T cells is poised for activation and controls the switch from quiescence to metabolic remodeling (doi.org/10.1016/j.cmet.2018.08.009) is consistent with this possibility. That study, and the short reviews noted above, are useful in highlighting the contributions of selective translational remodeling and the signaling pathways that contribute to the cell state change of T cell activation.

      Our study focuses on the central issue of whether measured ribosome translation rates support rapid division. The abundance of glycolytic enzymes, mRNA copy numbers etc., are clearly interesting and critical to cell metabolism, but are irrelevant to measuring the overall translation rate and capacity of T cells.

      From this perspective, an alternative view can be posited, where the quiescent state is biologically poised to support activation, and where subsets of proteins and mRNAs are present at far higher levels than those necessary to support basal function of the quiescent lymphocyte. In such a model, the early stages of lymphocyte activation and cell division are supported by this surplus inventory, with transcriptional activation, including ribosomal genes, primarily contributing at later stages of the activation process. An obvious analogy is the developing Drosophila embryo, where maternal inheritance supports early-stage development and zygotic transcriptional contributions subsequently assume primary control (e.g. DOI: 10.1002/1873-3468.13183, DOI: 10.1126/science.abq4835). To pursue that biological logic would require quantifying individual mRNAs and their ribosome loading states, mRNA-specific elongation rates, existing individual protein levels, turnover rates of both mRNAs and proteins, ribosome levels, mean ribosome occupancy state, and how each of these parameters is altered in response to activation. Such accounting could go far to unveil the paradox. This is a considerable undertaking, though, and outside the scope of the current paper.

      The reviewer is essentially proposing RiboSeq analysis of pre- and post-activation T cells, whereby individual mRNAs can be queried for ribosome occupancy, and where translation inhibitors could be used to quantify mRNA-specific transit rates. This is important information but would not provide a more accurate accounting of protein synthesis rates than our much more direct measurement. We note that other labs have begun to work on this exact topic, however – see both PMID: 36002234 and PMID: 32330465.

      Reviewer #2 (Public Review):

      This paper takes a novel look at the protein economy of primary human and mouse T-cells - in both resting and activated state. Their findings in primary human T-cells are that:

      1) A large fraction of ribosomes are stalled in resting cultured primary human lymphocytes, and these stalled ribosomes are likely to be monosomes.

      2) Elongation occurs at similar rates for HeLa cells and lymphocytes, with the active ribosomes in resting lymphocytes translating at a similar rate as fully activated lymphocytes.

      They then turn their attention to mouse OT-1 lymphocytes, looking at translation rates both in vitro and in vivo. Day 1 resting T-cells also show stalling - which curiously wasn't seen on freshly purified cells - I didn't understand these differences.

      This is clarified and discussed starting in the third paragraph of “Protein synthesis in mouse lymphocytes ex vivo” section. Cells cultured ex vivo for 1 day with no activation show signs of stalling, as we observed in isolated human cells. But cells immediately out of an animal show a measurable decay rate since they are obviously synthesizing proteins in vivo and are processed rapidly.

      In vivo, they show that it is possible to monitor accurate translation and measure rates. Perhaps most interestingly they note a paradoxically high ratio of cellular protein to ribosomes insufficient to support their rapid in vivo division, suggesting that the activated lymphocyte proteome in vivo may be generated in an unusual manner.

      This was an interesting and provocative paper. Lots of interesting techniques and throwing down challenges to the community - it manages to address a number of important issues without necessarily providing answers.

      Reviewer #3 (Public Review):

      This manuscript provides a more or less quantitative analysis of protein synthesis in lymphocytes. I have no issue with the data as presented, as I'm sure all measurements have been expertly done. I see no need for additional experimental work, although it would be helpful if the authors could comment on the possibility of measuring the rate of synthesis of a defined protein, say a histone, in cells prior to and after activation. The conclusion the authors leave us with is the idea that the rates of protein synthesis recorded here are incompatible with observed rates of T cell division in vivo. Indeed, in the final paragraph of the discussion, the authors note the mismatch between what they consider a requirement for cell division, and the observed rates of protein synthesis. They then invoke unconventional mechanisms to make up for the shortfall, without -in this reviewer's opinion- discussing in adequate detail the technical limitations of the methodology used.

      Points #1-3 in the Discussion relate to potential pitfalls of our analyses; in point #3 we now add further limitations of RTA based on non-random detection of nascent chains due to bias in either puromycylation or antibody detection of puromycylated nascent chains.

      A key question is the broad interest, novelty, and extension of current knowledge, in comparison with Argüello's (reference 27) 'SunRise' method. It would be helpful for the authors to stake out a clear position as to the similarities and differences with reference 27: what have we learned that is new? The authors could cite reference 27 in the introduction of their manuscript, given the similarity in approach. That said, the findings reported here will generate further discussion.

      We did cite this reference (27) in the section “Flow RPM measures ribosome elongation rate in live cells” giving credit where credit is due. We independently devised the method in 2014, and uniquely, to our knowledge, have applied it in vivo. We now further discuss the importance of our CHX modification to limit dissociation and increase the accuracy of RTA (second and third paragraphs of “Protein synthesis in mouse lymphocytes and innate immune cells in vivo”).

      The manuscript would increase in impact if the authors were to clearly define why a particular measurement is important and then show the actual experiment/result. As an example, it would be helpful to explain to the non-expert why the distinction between monosomes, polysomes, and stalled versions of the same is important, and then explain the rationale of the actual experiment: how can these distinctions be made with confidence, and what are confounding variables?

      We believe this is addressed in the section “Resting human lymphocytes have a dominant monosome population”.

      The initial use of human cells, later abandoned in favor of the OT-1 in vitro and in vivo models, requires contextualization. If the goal is to address the relationship between rates of translation and cell division of antigen-activated T cells in vivo, then a lot of the work on the human model and the in vitro experiments becomes more of a distraction, unless properly contextualized. Is there any reason to assume that antigen-specific activation in vivo will impact translation differently than the use of the PMA/ionomycin/IL2 cocktail? The way the work is presented leaves me with the impression that everything that was done is included, regardless of whether it goes to the core of the question(s) of interest.

      Donor PBMCs are clearly the more relevant model for understanding human T cell biology, which is why we started our studies with this model. Had the manuscript strictly described mouse studies, it is likely that we would have been criticized for not studying human cells: Catch 22! However, as we state in the manuscript, the human cell model has a variety of technical downsides, including donor heterogeneity. PMA/ionomycin activation is also physiologically questionable, and while we could deliver a defined TCR to redirect their specificity, this is typically done after cells have been activated, since lentiviral delivery is poor in resting lymphocytes. A main point we try to make from this work is that cells derived from human blood donors show signs of ribosomal stalling by the time they are isolated and put into culture. This may limit the usefulness of studying them pre-activation, although based on our mouse data, some level of stalled ribosomes may be a feature as well, priming T cells to be ready for their massive expansion. The move to the OT-I system gave us complete control over the system, including in vivo delivery of translation inhibitors.

      It would be helpful if the authors made explicit some of the assumptions that underlie their quantitative comparisons. Likewise, the authors should discuss the limitations of their methods and provide alternative interpretations where possible, even if they consider them less/not plausible, with justification. As they themselves note, improvements in the RPM protocols raised the increase in translating ribosomes upon activation from 10-fold to 15-fold. Who's to say that is the best achievable result? What about the reliability/optimization of the other measurements?

      We expanded the discussion of potential pitfalls of the RPM technique and others in the Discussion section. Regarding RPM per se, we use it as a readout of ribosome time decay, so even if further optimizations can be made, the decay rates we have measured should still be accurate. In addition, for our cell accounting measurements in Figure 6, we do not use RPM data and instead calculate based on the assumption that every ribosome is used for protein synthesis at a “maximal” rate of mRNA transit.
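
      As a rough illustration of this kind of accounting, the sketch below estimates the minimum time needed for a cell's ribosomes to resynthesize its proteome once, assuming every ribosome translates continuously and ignoring protein turnover. The function name and all input values are hypothetical placeholders and are not measurements from this study; the elongation rate of 6 amino acids per second is the approximate figure quoted by the reviewer.

```python
def proteome_duplication_hours(proteins_per_ribosome,
                               mean_protein_length_aa,
                               elongation_rate_aa_per_s):
    """Lower bound on the time for the ribosome pool to copy the proteome once.

    Assumes every ribosome is translating continuously at the given rate and
    that no protein is degraded (both are simplifying assumptions).
    """
    seconds_per_protein = mean_protein_length_aa / elongation_rate_aa_per_s
    return proteins_per_ribosome * seconds_per_protein / 3600.0


# Placeholder inputs chosen only to show the arithmetic, NOT data from the paper.
example_hours = proteome_duplication_hours(
    proteins_per_ribosome=1000,
    mean_protein_length_aa=400,
    elongation_rate_aa_per_s=6,
)
print(f"{example_hours:.1f} h")
```

      Under these assumptions the result scales linearly with the protein-to-ribosome ratio, so comparing it with an observed division time shows directly whether a given ratio could sustain that division rate.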

      The composition of the set of proteins produced upon activation will differ from cell to cell (CD4, CD8, B, resting vs. dividing). Even if analyses are performed on fixed cells, the ability of the monoclonal anti-puromycin antibody to penetrate the matrix of the various fixed cell types may not be equal for all of them, depending on protein composition, susceptibility to fixation etc. Is it possible for puromycin to occupy the ribosome's A site and terminate translation without forming a covalent bond with the nascent chain? This could affect the staining with anti-puromycin antibodies and also underestimate the number of nascent chains.

      Yes, the method (like every other one) is imperfect. Harringtonine run-off experiments show that RPM staining only detects nascent chains. Note that reference 47 reports that 75% of translation in activated T cells is devoted to synthesizing ~250 housekeeping proteins, which are likely to be highly similar between lymphocyte subsets.

      I believe that the concept of FACS-based quantitation also requires an explanation for the nonexpert. For the FACS plots shown, the differences between the highest and lowest RPM scores for cells that divided and that have a similar CFSE score is at least 10-fold. Does that mean that divided cells can differ by that margin in terms of the number of nascent chains present? If I make the assumption that cells stimulated with PMA/ionomycin/IL2 respond more or less synchronously, why would there be a 10-fold difference in absolute fluorescence intensity (anti-puromycin) for randomly chosen cells with similar CFSE values? While the use of MFI values is standard practice in cytofluorimetry, the authors should devote some comments to such variation at the population level.

      We believe that the referee is referring to Sup Fig. 1B. In this experiment the T cells are polyclonal and represent the full range of naïve to potentially exhausted differentiation states. Looking at our initial in vivo RPM study (reference 22) and comparing Figure 2 (OT-I cells) to Figure 3 (endogenous CD4s or CD8s) reveals more spread in the RPM values of polyclonal vs. monoclonal T cells (now clarified in the third paragraph of “Protein synthesis in mouse lymphocytes and innate immune cells in vivo”). Flow cytometry is by far the most accurate method for measuring fluorescence in individual cells. It is likely to be an accurate measure of the variation of nascent chains in cells in the same division cohort but likely represents the diversity of T cell activation profiles in blood of healthy donors.

      It is assumed that for cells to complete division, they must have produced a full and complete copy of their proteome and only then divide. What if cells can proceed to divide even when expressing a subset of the proteome of departure (=the threshold set required for initiation of division), only to complete synthesis of the 'missing ' portion once cell division is complete? Would this obviate the requirement for an unusual mechanism of protein acquisition (trogocytosis; other)?

      There must be a steady state level of translation and proteome replenishment, though. If a cell can divide when it affords daughter cells with 90% of its G0 proteome (as an example), that daughter cell would either 1) be 10% smaller, or 2) require extra translation to make up for the missing proteome during its own division cycle. Though T cells do typically shrink slightly after an initial activation, cell size stabilizes over time. Requiring each daughter cell to make more and more missing proteome could be plausible, considering that initial bursts of division do take longer over time, but still, even in vitro activated T cells divide rapidly for weeks without large decreases in their division rates.
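
      The dilution argument can be made concrete with a small sketch. It assumes, purely for illustration, that each daughter cell receives a fixed fraction of the G0 proteome and that nothing compensates for the shortfall; the 90% figure is the example used in the response, and the function name is hypothetical.

```python
def relative_proteome(fraction_inherited, n_divisions):
    """Proteome per cell relative to the starting cell after n divisions,
    if each daughter keeps a fixed fraction and never makes up the deficit."""
    return fraction_inherited ** n_divisions


# With 90% inherited per division and no compensation, the deficit compounds.
after_ten = relative_proteome(0.9, 10)
print(f"{after_ten:.2f} of the original proteome after 10 divisions")
```

      Because the deficit compounds geometrically, a stable cell size over many divisions implies that the missing fraction is replaced in each cycle, whether by translation or by some other route.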

      Translation is estimated to proceed at a rate of ~6 amino acids per second, but surely there is variability in this number attributable to inaccuracies of the methods used, in addition to biological variability. Were these so-called standard values determined for a range of different tissues? It stands to reason that there might be variation depending on the availability of initiation/elongation factors, NTPs, aminoacyl tRNAs etc. What is the margin of error in calculating chain elongation rates based on the results shown here?

      We refer to all relevant studies we know of, including new in vivo estimates of elongation rates (reference 40).

      Reviewer #1 (Recommendations For The Authors):

      A "limitations of study" section would be a helpful way to detail potential contributing mechanisms that were not explored in the current study.

      We have expanded the methodological limitations in the Discussion section.

      Major:

      1) Broaden the scope of biological models that could explain the paradox.

      In the Discussion, we suggest that T cells acquire some fraction of their proteome through external sources and highlight some examples of this occurring.

      Minor:

      1) Include Mr markers for Fig. 2C.

      Done.

      2) Though commonly used interchangeably, historically the term protein synthesis was the consequence of mRNA translation. In other words, proteins are not translated.

      Good point! We have changed the text accordingly.

      3) Include more meaningful X-axis legend in polysome gradient panels i.e., Fig. S2, e.g., fraction number.

      In most experiments, fractions were not collected. Rather, the x-axis refers to time that the sample took to be queried by the detector.

      4) Figure 3A does not report polysome profiles as described in the text, pg. 5, though this is reported in Fig S2D.

      The figure callouts were correct but confusing. We now refer to each result separately to clarify.

      5) In Fig 5A, SDS-PAGE/anti-Puro blots would be more convincing and contain more information. The dot-blot is difficult to interpret.

      Disagree. To quantitate total anti-puromycin signal a dot blot is far better than immunoblotting, which is compromised by unequal transfer of different protein species.

      6) It's not clear why a degree of monosome translation is necessarily surprising (pg. 7).

      It’s surprising since for many decades it was believed that translation by monosomes is a tiny fraction of translation. But separately, with this particular mode of activation, activated T cells displayed a preponderance of monosomes during their burst of division. When the activation method was improved, polysomes dominated. But monosome translation clearly supported T cell division during activation without cognate peptide, which was interesting.

      Reviewer #2 (Recommendations For The Authors):

      1) One concern is the dose of puromycin used. My understanding is that puromycin acts as a chain termination inhibitor - but is being used here predominantly as a label for nascent polypeptide chains. My concern, therefore, is the dose being used - here at 50 µg/mL - which seems high and I would be concerned that at this dose it would act as a translational inhibitor rather than just labelling nascent chains, and is therefore resulting in a lower signal/background ratio than expected. In human cell lines 0.1 µg/mL is optimal and doses published (in cell lines) range between 1 and 10 µg/mL so it will be interesting to understand why this high dose was used.

      Do they have a dose-response curve? Is this high dose necessary because these are primary T-cells? Can the authors show that 50 µg/mL of puromycin is optimal for studying protein translation in primary human T cells? A titration curve will help answer this question and could be included in Suppl Figure 1. This experiment is critical as the authors use a higher dose than previous studies (commonly between 1 and 10 µg/mL).

      The reviewer is referencing puromycin concentrations typically used in the selection of cells – for the RPM assay, puromycin is used at saturating doses to label the maximal number of nascent chains stalled by CHX or EME pretreatment.

      2) None of the figures show statistical significance.

      Statistics on relevant comparisons are now indicated on figures and in legends.

      3) The authors mention: "We performed RPM on cells labelled with CFSE to track cell division by dye dilution (Supplemental Figure 1B). On day 2, activated cells exhibited multiple populations, with nearly all divided cells showing a high RPM signal.". However, on day 2 it is hard to see any dividing cells in the dot plot included in the supplemental figure. Dividing cells only appear on day 5? Their statements make the subsequent paragraphs also difficult to follow.

      We modified the text to clarify this data – there is likely activation-induced cell death occurring which is why there are relatively few CFSE-low cells at this timepoint, and they do exhibit a fairly wide range of RPM staining. The main point is that by day 5, nearly all divided cells exhibit high RPM.

      4) "Many divided cells exhibited near baseline RPM signals, however, consistent with their return to the resting state. Interestingly, although non-activated cells did not divide, ~50% demonstrated increased RPM staining.". Again, it is hard to see the ~50% of cells with increased RPM the authors refer to in the provided supplemental figure.

      This is from quantification of the flow data and is described more fully later when we discuss ribosome stalling.

      5) The authors say "Thus, we cannot attribute the persistence of flow RPM staining in translation initiation inhibitor-treated cells to incomplete inhibition of protein synthesis." - but it's unclear what this refers to as in the previous paragraph they also say: "Initiation inhibitors, however, clearly discriminated between day 1 resting and activated cells. RPM signal was diminished by up to 80-90% on day 5 post-activation." - this is all somewhat confusing. It would be helpful to have this clarified and to refer to specific figures more liberally in the text.

      Figure 1B shows that RPM is maintained at fairly high levels during treatment with EME or CHX (in contrast to the initiation inhibitors HAR/PA). To rule out that the drugs were simply not active, tritiated leucine labeling was conducted to confirm that incorporation of the radiolabeled amino acid dropped to near-baseline (Figure 1C). Therefore, we can conclude that the drugs are indeed working as intended, but EME/CHX does not decrease RPM signal to the same extent that they prevent leucine incorporation.

      6) Page 5 Fig 3A - I don't understand the difference between freshly isolated OT-1 cells - which don't stall and day 1 OT-1 cells which do. Why are freshly isolated cells not behaving like the naïve cells- isn't this what they would predict? Also - I accept that there is a move from monosome to polysome population between day 1 and 2 - the effect isn't huge - it would be helpful/interesting to know what has happened by day 5 - is the effect much more significant?

      Freshly isolated cells are harvested from animals and immediately queried, whereas day 1 cells are cultured for 24 h in the absence of any activation. Presumably, the ex vivo culture without any activation causes the mouse T cell ribosomes to stall, just as we observed in cells obtained from human donors that took hours to collect and bring to the bench. The appearance of polysomes is really related to how the activation of the cells is done… refer to Figure 5B to see how significant the polysome buildup can be!

      7) Fig S3C - I don't understand how they reach the conclusion from this figure that: '~15-fold increase in translating ribosomes in activated OT-I T cells in vivo (Supplemental Figure 3C) as compared to the 10-fold increase we previously reported using the original protocol.' It would very much help the reader if these calculations could be better explained.

      These are simply quantifications of the RPM staining done in Supplemental Figure 3C compared to experiments done in the absence of the CHX-modified method.

      8) Page 7 - They conclude that the Tan paper has superior lymphocyte activation - but presumably this depends on the signal as to whether there is more activation and how this affects the shift from monosome to polysome, i.e., maybe a stronger activation signal affects the distribution more - perhaps their method is the more physiological? Is their conclusion fair - that 'These findings indicate that monosomes make a major contribution to translation in resting T cells but are likely to make a minor contribution in fully activated cells.'?

      Yes, we believe that their published method would be more physiological with the use of the natural OT-I peptide. We conclude that although monosome translation is present (as others have published), there are relatively few monosomes in fully activated T cells. Therefore, the monosome contribution to overall translation in activated T cells appears to be minor.

      9) Contrary to observations in vitro, ribosomes are not stalled in naïve mouse T cells in vivo, as we show via RTA analysis of non-activated T cells. - yes - this seems somewhat surprising - what is the explanation?

      We presume this is due to the stress/non-native environment that ex vivo cultured cells are subjected to.

      10) Whilst I understand the point that the authors are trying to make in Figure 1D about resting T cells having high background RPM staining due to stalled ribosomes, it is intriguing that there is almost no difference (no statistical significance provided) after 2 or 5 days of activation. Isn't this finding contrary to the one provided in Figure 1A and Suppl Figure 1B?

      Figure 1A shows the difference between no activation and activation conditions. Figure 1D is predominantly meant to show that the increases in RPM in activated cells at day 1 and day 5 are not as different as one might predict. The reason, as we describe in further experiments, is likely that cells exhibiting ribosomal stalling can incorporate puromycin, damping the “fold change” we calculate (unlike what we observe in metabolic labeling experiments in the same figure panel). Statistics are now displayed on the graphs in Figure 1D for further clarification.

      11) "Including EME with HAR prevented decay of the RPM signal, as predicted, since EME blocks elongation while enabling (even enhancing) puromycylation21,26." I find this very confusing. I understand that emetine blocks protein elongation whilst enabling puromycylation, but why does it block the effect of the protein initiation inhibitor harringtonine? Do they compete with each other?

      When ribosomes are frozen with emetine, they can neither transit the mRNA nor “fall off”. Therefore, the inclusion of EME in these experiments is a control to ensure that we are looking at true transit and runoff of ribosomes with harringtonine treatment (explanation in the second paragraph of the “Flow RPM measures ribosome elongation rates in live cells” section).
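
      One simple way to summarize a run-off time course of this kind is to fit a decay constant to the RPM signal measured at increasing times after harringtonine addition. The sketch below is only an illustration: it assumes an approximately exponential decay of background-subtracted signal, which is a modeling assumption and not necessarily the analysis used in the manuscript, and it uses synthetic values rather than any data from the study.

```python
import math


def fit_half_life(times_s, signals):
    """Least-squares fit of ln(signal) = a - k * t; returns ln(2) / k in seconds.

    Signals must be positive (e.g. background-subtracted fluorescence).
    """
    logs = [math.log(s) for s in signals]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    slope = sxy / sxx  # equals -k for a decaying signal
    return -math.log(2) / slope


# Synthetic time course generated with a known half-life, for checking the fit.
times = [0, 30, 60, 120, 240]
true_half_life = 90.0
signals = [1000.0 * 0.5 ** (t / true_half_life) for t in times]
estimated = fit_half_life(times, signals)
print(f"estimated half-life: {estimated:.1f} s")
```

      In a control with emetine added, the signal would not be expected to decay, so a fit of this type would return a very long or undefined half-life; the contrast between the two conditions is what indicates genuine ribosome run-off.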

      12) Can the authors explain why the RPM signal of activated OT-I cells (PMA/Iono) increases 20-fold compared to resting cells, but there is only a ~2-fold increase in signal in human cells? The authors previously mentioned: "We noted that the RPM signal in activated cells was only 2- to 5-fold higher than in non-activated cells. This increase is modest compared to the ~15-fold activation-induced increase in protein synthesis in original studies 10,11. To examine this discrepancy, we incubated cells for 15 min with harringtonine (HAR) or pactamycin (PA) to block translation initiation or emetine (EME) or cycloheximide (CHX) to block elongation." Would the authors have followed the same path if they had started the paper with OT-I cells?

      Human cells are not as well activated as OT-I in our study. The last question is beyond the scope of our reasoning as empirical evidence-based scientists, but we have applied for funding from the HG Wells Foundation for a time machine to answer this question.

      13) The authors should include representative raw flow cytometry data from the "Ribosome Transit Assay" (RTA) experiments in Figures 2 and 3 as supplemental data.

      Done; now included in Supplemental Figures 1 and 3.

      14) It would be interesting to compare RPM in T cells activated with a more physiological stimulus, such as anti-CD3/anti-CD28 beads, vs PMA/Iono, particularly after showing that peptide-specific stimulation (with SIINFEKL) is more effective than PMA/Iono in activating OT-I cells and inducing polysome formation (Figures 5B and Suppl Figure 4A).

      We tried plate-bound anti-CD3 and anti-CD28 early in these studies, and they didn’t induce as much early activation.

      15) Can the authors include the gating strategy to call "activated OT-I cells" to the cells shown in Suppl Figure 3c?

      A new Supplemental Figure 3D has been added showing the exact gating strategy for the OT-I cell RTA assays described in Supplemental Figure 3C and elsewhere.

      16) In Figure 6B, the authors mention an increase in the volume of the cells based on the assumption of spherical morphology but then show an increase in diameter. It would be more consistent to show both parameters in the same graph.

      The graph was changed to show volume calculations instead of diameter for clarity. The two are linked, since the volume of a sphere scales with the cube of its radius.
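
      The diameter-to-volume relationship mentioned here can be illustrated with a minimal sketch, assuming (as the reviewer notes) spherical cell morphology. The function name and the example diameters are hypothetical and are not taken from the manuscript.

```python
import math


def sphere_volume_from_diameter(diameter: float) -> float:
    """Volume of a sphere, V = (4/3) * pi * r**3 = pi * d**3 / 6.

    Assumes the cell is a perfect sphere; `diameter` and the returned
    volume share the same length unit (e.g. um and um**3).
    """
    if diameter < 0:
        raise ValueError("diameter must be non-negative")
    return math.pi * diameter ** 3 / 6.0


def volume_fold_change(diameter_before: float, diameter_after: float) -> float:
    """Fold change in volume implied by a change in diameter (cubic scaling)."""
    return (diameter_after / diameter_before) ** 3


if __name__ == "__main__":
    # Hypothetical diameters, for illustration only.
    print(sphere_volume_from_diameter(7.0))
    print(volume_fold_change(7.0, 10.0))
```

      Because of the cubic scaling, a modest change in diameter corresponds to a much larger change in volume, which is why plotting both parameters on one axis can be misleading.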

      17) The paper's main conclusion (i.e., that the ratio of proteins to ribosomes in T cells activated in vivo does not support their doubling time) is exciting. They conclude this after measuring cell volume, protein abundance, and ribosomes per cell. As no changes in cell volume and protein abundance between T cells activated in vitro vs in vivo were observed (Figures 6B and 6C), the difference is exclusively attributable to a reduced number of ribosomes per cell in T cells activated in vivo (Figure 6F). Critically, the measurement of ribosomes per cell in T cells activated in vivo (Figure 6F, "ex vivo day 2") includes only two data points. It is hard to understand how the authors calculated this figure's means and standard deviations as it is not described in the figure legend. From the dispersion observed for "day 1" and "day 2" in vitro-activated T cells, it seems that the variability of the assay to measure ribosome content could explain part of the phenotype. Additionally, there are several missing data points in Figure 6H. As this figure is just a transformation of Figures 6D and 6G, it isn't easy to understand why. Can I suggest that they include more data points for Figures 6F, G, and H in the "ex vivo day 2" category, as the two data points shown with little variability are out of keeping with the rest of the data and may be skewing their data?

      Figure 6F does not have the same number of data points as other panels because it required measurement of both protein content and ribosome number. Since the ribosome quantification method described here was developed later than some of our earlier protein measurements, not all experiments had both sets of data to properly calculate the proteins per ribosome. All data that had both values are included, though.

      Reviewer #3 (Recommendations For The Authors):

      Minor points:

      If an increase in cell diameter is recorded upon activation, why not also provide the value for the increase in volume?

      Done

      Regarding the writing, the erratic punctuation/hyphenation - or lack thereof - doesn't improve readability. One example: "....consistent with the idea that the flow RPM signal in day 1 resting lymphocytes...." Perhaps better: "... consistent with the idea that the RPM signal, obtained by flow cytometry for lymphocytes analyzed on day 1 and maintained in the absence of any activating agent,..." I understand that this can make for longer sentences, but I object to the use of 'flow' as shorthand for 'flow cytometry', and to the use of day 1 as an adverb or adjective. That works as lab jargon, it's less effective in a written text. The abbreviation 'DRiPs' is not defined. Words like 'notably', and 'surprisingly' can be eliminated.

      This work would benefit from the inclusion of a section describing 'Limitations of the study'.

      This is now expanded in the Discussion, as described above.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The association of vitamin D supplementation in reducing Asthma risk is well studied, although the mechanistic basis for this remains unanswered. In the presented study, Kilic and co-authors aim to dissect the pathway of Vitamin D-mediated amelioration of allergic airway inflammation. They use initial leads from bioinformatic approaches, which they then associate with results from a clinical trial (VDAART) and then validate them using experimental approaches in murine models. The authors identify a role of VDR in inducing the expression of the key regulator Ikzf3, which possibly suppresses the IL-2/STAT5 axis, consequently blunting the Th2 response and mitigating allergic airway inflammation.

      The major strength of the paper lies in its interdisciplinary approach, right from hypothesis generation, and linkage with clinical data, as well as in the use of extensive ex vivo experiments and in vivo approaches using knock-out mice. The study presents some interesting findings including an inducible baseline absence/minimal expression of VDR in lymphocytes, which could have physiological implications and needs to be explored in future studies. However, the study presents a potential for further dissection of relevant pathophysiological parameters using additional techniques, to explain certain seemingly associative results, and allow for a more effective translation.

      Several results in the study suggest multiple factors and pathways influencing the phenotype seen, which remain unexplored. The inferences of this study also need to be read in the context of the different sub-phenotypes and endotypes of Asthma, where the Th2 response may not be predominant. While this does not undermine the importance of this elegant study, it is essential to emphasize a holistic picture while interpreting the results.

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to advance our knowledge of how vitamin D may be protective in allergic airway disease in both adult and neonatal mouse models. The rationale and starting point are important human clinical, genetic/bioinformatic data, with a proposed role for vitamin D regulation of 2 human chromosomal loci (Chr17q12-21.1 and Chr17q21.2) linked to the risk of immune-mediated/inflammatory disease. The authors have made significant contributions to this work specifically in airway disease/asthma. They link these data to propose a role for vitamin D in regulating IL-2 in Th2 cells implicating genes associated with these loci in this process.

      Strengths:

      Here the authors draw together evidence from multiple lines of investigation to propose that amongst murine CD4+ T cell populations, Th2 cells express high levels of VDR, and that vitamin D regulates many of the genes on the chromosomal loci identified to be of interest, in these cells. The bottom line is the proposal that vitamin D, via Ikzf3/Aiolos, suppresses IL-2 signalling and reduces IL-2 signalling in Th2 cells. This is a novel concept and whilst the availability of IL-2 and the control of IL-2 signalling is generally thought to play a role in the capacity of vitamin D to modulate both effector and especially regulatory T cell populations, this study provides new data.

      Weaknesses:

      Overall, this is a highly complicated paper with numerous strands of investigation, methodologies etc. It is not "easy" reading to follow the logic between each series of experiments and also frequently fine detail of many of the experimental systems used (too numerous to list), which will likely frustrate immunologists interested in this. There is already extensive scientific literature on many aspects of the work presented, much of which is not acknowledged and largely ignored. For example, reports on the effects of vitamin D on Th2 cells are highly contradictory, especially in vitro, even though most studies agree that in vivo effects are largely protective. Similarly, other reports on adult and neonatal models of vitamin D and modulation of allergic airway disease are not referenced. In summary, the data presentation is unwieldy, with numerous supplementary additions, which makes the data difficult to evaluate and the central message lost. Whilst there are novel data of interest to the vitamin D and wider community, this manuscript would benefit from editing to make it much more readily accessible to the reader.

      Wider impact: Strategies to target the IL-2 pathway have long been considered and there is a wealth of knowledge here in autoimmune disease, transplantation, GvHD etc - with some great messages pertinent to the current study. This includes the use of IL-2, including low dose IL-2 to boost Treg but not effector T cell populations, to engineered molecules to target IL-2/IL-2R.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In the revised manuscript, the authors have addressed a significant number of concerns raised. The restructuring and incorporation of a number of discussion points have improved the readability. Moreover, the authors have also incorporated some more figures to address certain questions raised.

      However, the authors could reconsider a few more points which would improve the readability of the manuscript.

      For example:

      1) While it is appreciated that the authors have provided the schematic of the study design for the VDAART trial, the visualization for the RNA-seq analysis may be helpful.

      We have created a visualization of the workflow for the RNA-seq analysis as part of Figure 1 – figure supplement 1C.

      2) Quantification of images would not require any additional experiments, yet can reinforce the results with objectivity.

      We appreciate this comment. We chose to display histology images to allow a glimpse at the inflammatory condition in the lung tissue. For histological quantification, lung tissue should have been harvested and analyzed in a systematic and randomized way as well as in sufficient animal numbers to allow statistical analyses. This has not been done for these mouse models since the focus was on analyzing cytokine production by lung tissue CD4+ T cells as the driver of inflammation.

      3) The authors have not addressed the discrepancy of the sample sizes in the experiments. Some dot plots still don't match the legends, and there is a wide variation in the numbers chosen for different experiments and different groups in the same experiments.

      We appreciate the thorough screening of our manuscript and apologize for this oversight. We corrected the errors in the respective figure legends.

      The in vivo experiments comprise studies performed in (A) VDR-KO mice and (B) WT mice fed with vit-D supplemented chow.

      Sample size calculations for the mouse models of allergic airway inflammation based on BAL cell numbers revealed a minimum of n=8 per group for correct statistical analysis. In both experimental settings, the respective mouse lines were bred in the mouse facilities of MGH (A) and BWH (B). Depending on the litter sizes, additional mice were added in the HDM group, since greater variability was expected in this group than in the saline group.
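
      The response does not state the formula, effect size, or variance behind the n=8 figure, so the following is only a generic sketch of how a per-group sample size for comparing two group means can be estimated with a normal approximation. The function name, the standardized effect sizes, the 5% two-sided significance level, and the 80% power are illustrative assumptions, not values from the study.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d is the standardized effect size (difference in means / common SD).
    This slightly underestimates the n required by an exact t-test calculation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)              # quantile matching the desired power
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, for illustration only.
    for d in (1.0, 1.5, 2.0):
        print(d, n_per_group(d))
```

      Larger expected variability in one group (as the authors anticipate for HDM-exposed mice) lowers the standardized effect size, which is one reason to enlarge that group.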

      Intracellular CD4+ cytokine staining was performed for all mice, however some stainings failed and could not be reliably interpreted and were therefore excluded.

      Reviewer #2 (Recommendations For The Authors):

      The authors have largely replied to the reviewer comments, amended some noted typos & figure legend issues, as well as discussed the reviewers' concerns in text and in their rebuttal.

      The data presented are novel and of significant interest, conceptually moving this field forward, but in this reviewer's opinion reflect one pathway, of likely several, linked to protective effects of vitamin D on airway disease. This reviewer recommends a further slight editing of the text to present this broader scenario.

      i) Treg cells are highly dependent on IL-2 (both Foxp3+ and IL-10+ cells, not always the same population), constitutively express the IL-2R, and there is already a significant literature regarding vitamin D and IL-10/Treg in control of immune-mediated conditions. A simple statement acknowledging this and that there is likely more than one mechanism by which vitamin D may regulate allergic airway disease (directly or indirectly) would be appreciated - this in no way detracts from the novelty and contribution of the current findings.

      We thank the reviewer for this suggestion. We have added the following statement to the manuscript (lines 623-625):

      “Additional pathways, including the induction of IL-10 production by CD4+ T cells as well as a direct induction of Foxp3+ T reg cells could have further contributed to the observed protective effect of vitamin D supplementation (PMID: 21047796; 22529297).”

      ii) More comprehensive referencing of earlier papers proposing effects of vitamin D in controlling Treg/IL-10 and dampening Th2 responses in mouse (and human) models

      (e.g. Taher, Y. A., van Esch, B. C. A. M., Hofman, G. A., Henricks, P. A. J. & van Oosterhout, A. J. M. 1alpha,25-dihydroxyvitamin D3 potentiates the beneficial effects of allergen immunotherapy in a mouse model of allergic asthma: role for IL-10 and TGF-beta. J. Immunol. 180, 5211-21 (2008). Vassiliou JE et al, 2014. Vitamin D deficiency induces Th2 skewing and eosinophilia in neonatal allergic airways disease. Allergy DOI: 10.1111/all.12465).

      We have included the reference in the discussion section of our manuscript in lines 617-619:

      “Similar findings regarding the effects of vitamin D in controlling Treg/IL-10 and dampening Th2 responses have been reported, e.g., in (PMID: 18390702) and in offspring of mice that had been subjected to vitamin D deficiency in the third trimester of their pregnancy (PMID: 24943330).”

    2. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The association of vitamin D supplementation in reducing Asthma risk is well studied, although the mechanistic basis for this remains unanswered. In the presented study, Kilic and co-authors aim to dissect the pathway of Vitamin D-mediated amelioration of allergic airway inflammation. They use initial leads from bioinformatic approaches, which they then associate with results from a clinical trial (VDAART) and then validate them using experimental approaches in murine models. The authors identify a role of VDR in inducing the expression of the key regulator Ikzf3, which possibly suppresses the IL-2/STAT5 axis, consequently blunting the Th2 response and mitigating allergic airway inflammation.

      Strengths:

      The major strength of the paper lies in its interdisciplinary approach, right from hypothesis generation, and linkage with clinical data, as well as in the use of extensive ex vivo experiments and in vivo approaches using knock-out mice.

      The study presents some interesting findings including an inducible baseline absence/minimal expression of VDR in lymphocytes, which could have physiological implications and needs to be explored in future studies.

      Weaknesses:

      The core message of the study relies on the role of vitamin D and its receptor in suppressing the Th2 response. However, there is scope for further dissection of relevant pathophysiological parameters in the in vivo experiments, which would enable stronger translation to allergic airway diseases like Asthma.

      To a large extent, the authors have been successful in validating their results, although a few inferences could be reinforced with additional techniques, or emphasised in the discussion section (possibly utilising the ideas and speculative section offered by the journal).

      The study inferences also need to be read in the context of the different sub-phenotypes and endotypes of Asthma, where the Th2 response may not be predominant. Moreover, the authors have referenced vitamin D doses for the murine models from the VDAART trials and performed the experiments in the second generation of animals. While this is appreciated, the risk of hypervitaminosis-D cannot be ignored, in view of its lipid solubility. Possibly comparison and justification of the doses used in murine experiments from previous literature, as well as the incorporation of an emphasised discussion about the side effects and toxicity of Vitamin D, is an important aspect to consider.

      In no way do the above considerations undermine the importance of this elegant study which justifies trials for vitamin D supplementation and its effects on Asthma. The work possesses tremendous potential.

      We thank the reviewer for their careful assessment of our paper and helpful suggestions. Please find the point-by-point responses to the reviewer recommendations below.

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to advance our knowledge of how vitamin D may be protective in allergic airway disease in both adult and neonatal mouse models. The rationale and starting point are important human clinical, genetic/bioinformatic data, with a proposed role for vitamin D regulation of 2 human chromosomal loci (Chr17q12-21.1 and Chr17q21.2) linked to the risk of immune-mediated/inflammatory disease. The authors have made significant contributions to this work specifically in airway disease/asthma. They link these data to propose a role for vitamin D in regulating IL-2 in Th2 cells implicating genes associated with these loci in this process.

      Strengths:

      Here the authors draw together evidence from multiple lines of investigation to propose that amongst murine CD4+ T cell populations, Th2 cells express high levels of VDR, and that vitamin D regulates many of the genes on the chromosomal loci identified to be of interest, in these cells. The bottom line is the proposal that vitamin D, via Ikzf3/Aiolos, suppresses IL-2 signalling and reduces IL-2 signalling in Th2 cells. This is a novel concept and whilst the availability of IL-2 and the control of IL-2 signalling is generally thought to play a role in the capacity of vitamin D to modulate both effector and especially regulatory T cell populations, this study provides new data.

      Weaknesses:

      Overall, this is a highly complicated paper with numerous strands of investigation, methodologies etc. It is not "easy" reading to follow the logic between each series of experiments and also frequently fine detail of many of the experimental systems used (too numerous to list), which will likely frustrate immunologists interested in this. There is already extensive scientific literature on many aspects of the work presented, much of which is not acknowledged and largely ignored. For example, reports on the effects of vitamin D on Th2 cells are highly contradictory, especially in vitro, even though most studies agree that in vivo effects are largely protective. Similarly, other reports on adult and neonatal models of vitamin D and modulation of allergic airway disease are not referenced. In summary, the data presentation is unwieldy, with numerous supplementary additions, which makes the data difficult to evaluate and the central message lost. Whilst there are novel data of interest to the vitamin D and wider community, this manuscript would benefit from editing to make it much more readily accessible to the reader.

      Wider impact: Strategies to target the IL-2 pathway have long been considered and there is a wealth of knowledge here in autoimmune disease, transplantation, GvHD etc - with some great messages pertinent to the current study. This includes the use of IL-2, including low dose IL-2 to boost Treg but not effector T cell populations, to engineered molecules to target IL-2/IL-2R.

      We thank the reviewer for their careful assessment of our paper and helpful suggestions. Please find the point-by-point responses to the reviewer recommendations below. In addition, we have revisited the Introduction and Discussion, added additional subsection headings, and provided additional schematics to make the general flow of the paper more accessible to a wider audience.

      Reviewer #1 (Recommendations For The Authors):

      There are certain aspects of the manuscript which could be revisited in order to provide more clarity to the reader. Some of these are:

      1. In vivo experiments: The major inference and its impact are derived from the effect of VDR on Ikzf3 expression, and consequently on the Th2 response. While the study employs both in vivo and ex vivo approaches to validate this claim, pathophysiological aspects could have been explored in more detail, by using cytokine panels, possibly techniques to measure airway resistance, as well as by reducing the variations in the sample sizes used in different groups. Similarly, certain inferences from ex vivo studies may be important to demonstrate in the in vivo setting as well. A justification for the incorporation of both Balb/c and C57BL/6 mice for the experiments could also be incorporated in the manuscript.

      2. Certain sections, especially those connecting VDR, Ikzf1/3 and IL2/STAT axis seem associative. This is indicated by Figure 5 H as well, where the effects of calcitriol administration in KO cells indicate additional pathways at play, possibly through indirect effects. The use of additional techniques like ChIP, co-IP and establishing STAT induction/activation would probably strengthen the findings; alternatively, a clear distinction between the speculative and the definitive results could be made in the discussion section, as the journal encourages. Similar considerations could be made for VDR and Ikzf3.

      3. Role of other cells:

      a. While the investigators have explored the phenotype on other cell types like Th1 and Treg, at places there remains a lacuna. For instance, the absence of neutrophil fractions from the DLC-BAL, as well as inconsistencies in the groups selected for comparison. For example, in Figure 3 Supplementary Figure 2, the figure suggests IL13 expression in CD4+ cells, yet the text reads incubated Th2 cells. This could be made more lucid.

      b. In Figure 3 Supplementary Figure 1 there is a trend towards an increase in IL-10 levels, whereas in Supplementary Figure 2 there is a drop in the IL13 level in the VDR KO group, which has not been explained.

      c. While 17q loci form the predominant loci associated with Asthma, other loci important in Asthma on chromosomes 2,6,9, 22 could be discussed in the manuscript as well, even if they can't be explored in depth.

      4. Quantification of histology and confocal images could provide an objective assessment to the readers. Possibly incorporation of co-localisation panels for the IF images showing membrane/cytoplasmic/nuclear localisation of the VDR under various conditions.

      5. Structure of the manuscript: At places the manuscript has a disrupted flow, as well as mislabelled figures (Figure 2SF1B is 1C, Fig 2c is 2b in the results). Flow gates can be arranged sequentially and consistent labelling of the gates and axis would ease interpretation. In some places sample sizes mentioned do not match the dot graphs in the figures (figure 3K-L). In the same figure and others (Figure 5 Supplementary Figure 2), a comparison of all groups would be beneficial. A restructuring of the results and corrections could assist the reader. Also, a visualization of the VDAART analysis in the main figures, corroborating with the results sections would do justice to the interesting approach and findings. The clearances and approvals for the study also need to be incorporated into the manuscript. If possible, the incorporation of a schematic showing the proposed pathway for VDR-induced Ikzf3 and subsequent suppression of the genes present on Chr 17 loci to mitigate allergic airway inflammation would help.

      Reviewer #2 (Recommendations For The Authors):

      A few specific points: A number of immune concepts are studied without reference to the broader literature and the data presented on occasion counter these earlier findings. Examples of this include:

      • Vitamin D can both enhance and inhibit IL-13 synthesis, demonstrated both in vitro and ex vivo, and these effects are clearly context-specific. I am not questioning the validity of the present experimental findings (in this specific experimental model), but the experimental context - the problem is that this is not discussed.

      • Short-term bulk Th2 cultures are used with no indication of their enrichment for lineage-specific markers or cytokine - their conclusions might be enhanced by this. Data on genes/markers of interest could be further enhanced by showing FACS plots of co-expression e.g. Th2 genes e.g. IL-13/GATA3 with these other markers.

      • Are human Th2 enriched for VDR, since the backdrop to this study is human clinical and genetic data? For a study that has based its rationale on human clinical/genetic studies it would be great to confirm these findings in human Th2 cells.

      • The Discussion might comment on some of these wider issues.

      • Minor typos throughout, including in figure legends

      Reviewer #1

      1. The study inferences also need to be read in the context of the different sub-phenotypes and endotypes of Asthma, where the Th2 response may not be predominant.

      We agree that asthma has many sub-phenotypes and endotypes and that the Th2 response may not be predominant in all of them, but we focus here on the origins of the disease in the first few years of life and the genetic and molecular mechanisms associated with disease onset, where the Th2 response is important.

      1. Moreover, the authors have referenced vitamin D doses for the murine models from the VDAART trials and performed the experiments in the second generation of animals. While this is appreciated, the risk of hypervitaminosis-D cannot be ignored, in view of its lipid solubility. Possibly comparison and justification of the doses used in murine experiments from previous literature, as well as the incorporation of an emphasized discussion about the side effects and toxicity of Vitamin D, is an important aspect to consider.

      We appreciate this comment from the reviewers allowing us to review vitamin D toxicity in more detail. Given the length of this review we did not include this in the manuscript discussion but provide it here.

      Vitamin D supplementation in humans is debated due to the possibility of intoxication from overdose. Vitamin D intoxication is a rare medical condition associated with hypercalcemia, hyperphosphatemia, and suppressed parathyroid hormone level and is typically seen in patients who are receiving very high doses of vitamin D, ranging from 50,000 to 1 million IU/d for several months to years [1,2]. Intoxication observed at lower doses might be attributable to rare genetic disorders [1]. By far the bigger problem in humans is vitamin D deficiency; this is especially true in pregnant women, where dosage requirements are high due to the needs of the fetus. It is estimated that virtually all pregnant women are vitamin D insufficient or deficient [3]. VDAART has shown that vitamin D in a dose of 4400 IU given to pregnant women can prevent asthma in their offspring. There were no adverse side effects in the mother or the infant from this dose [4].

      In rodents, a few studies have reported vitamin D intoxication with very high vitamin D doses [5] (PMID: 23405058: 50,000 IU/kg for 120 d led to toxicity in females). In contrast, there are several studies using 2-2.5 times higher doses of vitamin D than we use here that do not report adverse events in mouse models of disease [6,7]. Our doses of vitamin D are identical to those used in VDAART and are lower than those used in any of these other rodent studies. In addition, while we did not specifically assess signs of vitamin D intoxication, we can exclude any impact on animal well-being, health, reproduction, and behavior throughout the study.

      1. The major inference and its impact are derived from the effect of VDR on Ikzf3 expression, and consequently on the Th2 response. While the study employs both in vivo and ex vivo approaches to validate this claim, pathophysiological aspects could have been explored in more detail, by using cytokine panels, possibly techniques to measure airway resistance, as well as by reducing the variations in the sample sizes used in different groups.

      We have added the following sentence to the discussion: “Additional cytokine measurements in the mice as well as measurement of airway resistance would have added to the pathophysiological data linking IKZF3 expression to the Th2 response.”

      1. Similarly, certain inferences from ex vivo studies may be important to demonstrate in the in vivo setting as well. A justification for the incorporation of both Balb/c and C57BL/6 mice for the experiments could also be incorporated in the manuscript.

      We agree with the reviewers that ex vivo results may require in vivo confirmation. We have added a sentence explaining the rationale for use of both Balb/c and C57BL/6 mice in the results section “Vitamin D suppresses the activation of the IL-2/Stat5 pathway and cytokine production in Th2 cells”: “To ensure that the above findings were not restricted to the C57BL/6 mouse strain, the inverse experiment was performed in Balb/c mice. This mouse strain is commonly used for type 2 driven inflammation.”

      1. Certain sections, especially those connecting VDR, Ikzf1/3 and IL2/STAT axis seem associative. This is indicated by Figure 5 H as well, where the effects of calcitriol administration in KO cells indicate additional pathways at play, possibly through indirect effects.

      We appreciate this comment. The RNA-Seq results showed an overrepresentation of the IL-2/STAT5 pathway in Vit-D deficient Th2 cells compared to those under Vitamin D supplementation. We further show the induction of IKZF3 expression with calcitriol stimulation. High IKZF3 expression is known to suppress IL-2 expression. Lack of IKZF3 diminishes the suppressive activity of calcitriol on IL-2 expression. However, as pointed out by the reviewer, Figure 5 H implicates additional pathways regulated by calcitriol for the suppression of IL-2 and we note that in the text.

      1. The use of additional techniques like ChIP, co-IP and establishing STAT induction/activation would probably strengthen the findings; alternatively, a clear distinction between the speculative and the definitive results could be made in the discussion section, as the journal encourages. Similar considerations could be made for VDR and Ikzf3.

      We have added the following sentences to the discussion: “We have focused here on establishing the relationship between VDR binding and IKZF3 activation or repression and subsequent ORMDL3 and Il2 activation. Additional use of ChIP or co-IP to establish STAT induction and activation would have been of potential value.”

      1. Role of other cells: a. While the investigators have explored the phenotype on other cell types like Th1 and Treg, at places there remains a lacuna. For instance, the absence of neutrophil fractions from the DLC BAL, as well as inconsistencies in the groups selected for comparison. For example, in Figure 3 Supplementary Figure 2, the figure suggests IL13 expression in CD4+ cells, yet the text reads incubated Th2 cells. This could be made more lucid.

      We appreciate this comment and would like to clarify. Neutrophil numbers were assessed in the presented in vivo models and showed no differences due to genotype or vitamin D diet. We added the graphs to the supplement in Figure 3 - figure supplement 1A and Figure 5 - figure supplement 1B and refer to the figures in the main text. All in vivo data were analyzed by mixed-effect ANOVA or two-way ANOVA with Holm-Šidák’s post-hoc analysis (factors: genotype & exposure). To keep the plots clear, we show the statistics only for the groups of interest.
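
      For readers unfamiliar with the Holm-Šidák correction named above, the following is a minimal sketch of the standard step-down adjustment applied to a set of post-hoc p-values. It is not the authors' analysis code: the function name and the example p-values are hypothetical, and the ANOVA itself is not reproduced here.

```python
from typing import List, Sequence


def holm_sidak_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Holm-Sidak step-down adjusted p-values, in the input order.

    The k-th smallest of m raw p-values (k = 1..m) is adjusted to
    1 - (1 - p)**(m - k + 1), and a running maximum is then taken so that
    adjusted values never decrease as the raw p-values increase.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank  # number of hypotheses not yet stepped past
        candidate = 1.0 - (1.0 - p_values[idx]) ** remaining
        running_max = max(running_max, candidate)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from three pairwise comparisons.
    raw = [0.01, 0.04, 0.03]
    print(holm_sidak_adjust(raw))
```

      A comparison is then called significant when its adjusted p-value falls below the chosen family-wise error rate, which is why some visually apparent trends can remain non-significant after correction.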

      1. b) In Figure 3 Supplementary Figure 1 there is a trend towards an increase in IL-10 levels, whereas in Supplementary Figure 2 there is a drop in the IL13 level in the VDR KO group, which has not been explained.

      We apologize for any confusion. Figure 3 supplementary Figure 1 shows cytokine-positive CD4+ T cells isolated from saline and HDM exposed mouse lungs. These data were analyzed with a mixed-effect ANOVA or two-way ANOVA with Holm-Šidák’s post-hoc analysis (factors: genotype & exposure) and the differences were not found to be significant. Figure 3 supplementary Figure 2 shows IL-13 levels in the system of in vitro polarization of naïve CD4+ T cells into Th2 cells. The difference between this result and the findings in Figure 3H is the in vivo setting in which additional factors such as IL-4 can aggravate the immune response.

      1. c) While 17q loci form the predominant loci associated with Asthma, other loci important in Asthma on chromosomes 2,6,9, 22 could be discussed in the manuscript as well, even if they can't be explored in depth.

      This is an excellent comment. Our preliminary results confirm that three asthma susceptibility loci: 2q12.1 (IL1RL1), 6p21.32 (HLA-DQA1/B1/A2/B2) and 22q12.3 (IL2RB) each have VDR and IKZF3 binding sites either in enhancers predicted by GeneHancer to target these genes or within these genes themselves. In particular, we found (i) VDR binding sites within IL18RAP and in the enhancer region GH02J102301 targeting IL1RL1, and IKZF3 binding sites within IL1RL1; (ii) VDR binding sites in the enhancer regions GH06J032940 and GH06J031813 targeting HLA-DQA2, and IKZF3 binding sites within HLA-DQA1; (iii) VDR and IKZF3 binding sites within IL2RB. In contrast, the region 9p24.1 (IL33) has no documented VDR or IKZF3 binding sites within IL33 or in the promoter regions targeting IL33. Investigating these additional genetic loci further, using the integrative approach taken here with 17q12-21, is beyond the scope of this current manuscript but based on these preliminary results, would be a worthwhile scientific endeavor.

      1. Quantification of histology and confocal images could provide an objective assessment to the readers. Possibly incorporation of co-localisation panels for the IF images showing membrane/cytoplasmic/nuclear localisation of the VDR under various conditions.

      We agree that quantification of histology and confocal images could provide an overview of VDR expression in the lungs. Given the knowledge on VDR expression in a variety of cell types, including structural cells in the lungs and the focus of this manuscript on CD4+ T cells, we focused on determining VDR expression in CD4+ T cells isolated from saline and HDM exposed lungs in the mouse models studied (Figure 2 C; Fig. 2- figure supplement 1 B & C, Figure 3 C; Figure 5 - figure supplement 1) as well as in vitro (Figure 2 - figure supplement 2; Figure 5 - figure supplement 2).

      1. Structure of the manuscript: At places the manuscript has a disrupted flow, as well as mislabeled figures (Figure 2SF1B is 1C, Fig 2c is 2b in the results). Flow gates can be arranged sequentially, and consistent labelling of the gates and axes would ease interpretation.

      We appreciate this comment and have corrected the mislabeled figures and tried to improve the flow.

      1. In some places sample sizes mentioned do not match the dot graphs in the figures (figure 3K-L). In the same figure and others (Figure 5 Supplementary Figure 2), a comparison of all groups would be beneficial.

      We appreciate this comment and have checked the sample sizes. Each of these experiments compared two groups and these two groups were compared statistically. We corrected the sample size for Figure 5 Supplementary Figure 2 C in the manuscript.

      1. A restructuring of the results and corrections, could assist the reader.

      We have restructured both the results and the discussion, incorporating the changes noted here in the response to the reviewers, to make the flow of the manuscript easier to read.

      1. Also, a visualization of the VDAART analysis in the main figures, corroborating with the results sections would do justice to the interesting approach and findings.

      We have now added the below schematic to Figure 1-figure supplement 1C to summarize the analyses conducted on the VDAART data.

      Author response image 1.

      1. The clearances and approvals for the study also need to be incorporated into the manuscript.

      These were in the checklist and have been moved to the main text of the manuscript.

      1. If possible, the incorporation of a schematic showing the proposed pathway for VDR induced Ikzf3 and subsequent suppression of the genes present on Chr 17 loci to mitigate allergic airway inflammation would help.

      We have a figure for this (below) that we have incorporated into the manuscript as Figure 5 - figure supplement 3:

      Author response image 2.

      Cartoon Summarizing Vitamin D molecular genetics at 17q12-21

      Reviewer #2

      1. A few specific points: A number of immune concepts are studied without reference to the broader literature, and the data presented on occasion counter these earlier findings. Examples of this include:

      a. Vitamin D can both enhance and inhibit IL-13 synthesis, demonstrated both in vitro and ex vivo, and these effects are clearly context-specific. I am not questioning the validity of the present experimental findings (in this specific experimental model); the problem is that the experimental context is not discussed.

      We thank the reviewer for this comment. We have now included a sentence in the discussion section mentioning the contradictory results. It reads as follows:

      “We acknowledge that the impact of vitamin D on Th2 biology is conflicting in the literature. While several groups report Th2 promoting activity, we, and others, show inhibition of type 2 cytokine production 8–11. These discrepancies could be due to the model system studied, e.g., PBMC and purified CD4+ T cells, or the dose of vitamin D or the mouse strain.”

      b. Short-term bulk Th2 cultures are used with no indication of their enrichment for lineage specific markers or cytokine – their conclusions might be enhanced by this. Data on genes/markers of interest could be further enhanced by showing FACS plots of co-expression e.g., Th2 genes e.g., IL-13/GATA3 with these other markers.

      We appreciate this comment. The in vitro culture system used for Th2 cell differentiation has been well described in the literature. As shown in Figure 3 - figure supplement 2; Figure 4 E and Figure 5 - figure supplement 2 D & E the lineage specific IL-13 cytokine levels are detectable at high levels.

      c. Are human Th2 cells enriched for VDR, since the backdrop to this study is human clinical and genetic data? For a study that has based its rationale on human clinical/genetic studies it would be great to confirm these findings in human Th2 cells.

      We appreciate this comment and are curious to explore this in future research. The VDAART trial is a double-blinded multicenter trial in which an immediate processing of the blood samples and an enrichment of different immune cell populations was not feasible. Other publicly available data sets report gene expression derived from mixed and peripheral (blood) cells and not local (lung) tissues. Published in vitro studies on human Th2 cells do not report VDR expression in comparison to other Th subsets, which would allow the assessment of enrichment.

      1. The Discussion might comment on some of these wider issues.

      We have rewritten the discussion to incorporate many of the issues raised in this review.

      1. Minor typos throughout, including in figure legends.

      We have edited all of the figure legends.

      References

      1. Holick, M. F. Vitamin D Is Not as Toxic as Was Once Thought: A Historical and an Up-to-Date Perspective. Mayo Clinic proceedings 90, 561–564; 10.1016/j.mayocp.2015.03.015 (2015).

      2. Hossein-nezhad, A. & Holick, M. F. Vitamin D for health: a global perspective. Mayo Clinic proceedings 88, 720–755; 10.1016/j.mayocp.2013.05.011 (2013).

      3. Hollis, B. W. & Wagner, C. L. New insights into the vitamin D requirements during pregnancy. Bone research 5, 17030; 10.1038/boneres.2017.30 (2017).

      4. Litonjua, A. A. et al. Effect of Prenatal Supplementation With Vitamin D on Asthma or Recurrent Wheezing in Offspring by Age 3 Years: The VDAART Randomized Clinical Trial. JAMA 315, 362–370; 10.1001/jama.2015.18589 (2016).

      5. Gianforcaro, A., Solomon, J. A. & Hamadeh, M. J. Vitamin D(3) at 50x AI attenuates the decline in paw grip endurance, but not disease outcomes, in the G93A mouse model of ALS, and is toxic in females. PloS one 8, e30243; 10.1371/journal.pone.0030243 (2013).

      6. Landel, V., Millet, P., Baranger, K., Loriod, B. & Féron, F. Vitamin D interacts with Esr1 and Igf1 to regulate molecular pathways relevant to Alzheimer's disease. Molecular neurodegeneration 11, 22; 10.1186/s13024-016-0087-2 (2016).

      7. Agrawal, T., Gupta, G. K. & Agrawal, D. K. Vitamin D supplementation reduces airway hyperresponsiveness and allergic airway inflammation in a murine model. Clinical and experimental allergy : journal of the British Society for Allergy and Clinical Immunology 43, 672–683; 10.1111/cea.12102 (2013).

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The authors of the manuscript "High-resolution kinetics of herbivore-induced plant volatile transfer reveal tightly clocked responses in neighboring plants" assessed the effects of herbivory induced maize volatiles on receiver plants over a period of time in order to assess the dynamics of the responses of receiver plants. Different volatile compound classes were measured over a period of time using PTR-ToF-MS and GC-MS, under both natural light:dark conditions, and continuous light. They also measured gene expression of related genes as well as defense related phytohormones. The effects of a secondary exposure to GLVs on primed receiver plants was also measured.

      The paper addresses some interesting points, however some questions arise regarding some of the methods employed. Firstly, I am wondering why VOCs (as measured by GC-MS) were not quantified. While I understand that quantification is time consuming and requires more work, it allows for comparisons to be made between lines of the same species, as well as across other literature on the subject. Simply relying on the area under the curve and presenting results using arbitrary units is not enough for analyses like these. AU values do not allow for conclusions regarding total quantities, and while I understand that this is not the main focus of this paper, it raises a lot of uncertainty for readers (for example, the references cited show that TMTT has been found to accumulate at similar levels of caryophyllene, however the AU values reported are an order of magnitude higher for TMTT. Again, without actual quantification this is meaningless, but for readers it is confusing).

      With regards to the correlation analyses shown in figure 6, the results presented in many of the correlation plots are not actually informative. While there is a trend, I do not think that this is an appropriate way to show the data, as there are clearly other relationships at play. The comparison between plants under continuous light and normal light:dark conditions is interesting.

      This paper addresses a very interesting idea and I look forward to seeing further work that builds on these ideas.

      As mentioned in our previous response, we have added the quantification of GLVs in order to increase the comparability of our work to other studies.

      Regarding the comment about TMTT (only measured as internal pools), the purpose of including these internal pool data was simply to determine whether terpenes were accumulating in leaf tissue during the night, when emissions are hindered (likely due to closed stomata). The data clearly show that internal terpene pools do not accumulate above daytime levels during darkness; this is further supported by gene expression data that show downregulation of terpene synthase genes during darkness. While quantification would certainly increase the ability to compare internal pools, it would not change the interpretation of our results. Also note that absolute quantification is challenging for compounds such as TMTT, which are not readily available.

      Regarding the comment on Figure 6, while we agree there may be interesting patterns beyond linear relationships, as stated in our previous response, the purpose of our analysis was to determine if the higher terpene burst in receiver plants on the second day may be explained by sender plants emitting more GLVs on the second day. Figure 6 shows that this is not the case. Further analyses would not provide additional significant insights into the hypothesis that we tested here.

      We thank the reviewer for their overall positive outlook on our paper and for the constructive comments.

      Reviewer #2 (Public Review):

      The exact dynamics of responses to volatiles from herbivore-attacked neighbouring plants have been little studied so far. Also, we still lack evidence whether herbivore-induced plant volatiles (HIPVs) induce or prime plant defences of neighbours. The authors investigated the volatile emission patterns of receiver plants that respond to the volatile emission of neighbouring sender plants which are fed upon by herbivorous caterpillars. They applied a very elegant approach (more rigorous than the current state-of-the-art) to monitor temporal response patterns of neighbouring plants to HIPVs by measuring volatile emissions of senders and receivers, senders only and receivers only. Different terpenoids were produced within 2 h of such exposure in receiver plants, but not during the dark phase. Once the light turned on again, large amounts of terpenoids were released from the receiver plants. This may indicate a delayed terpene burst, but terpenoids may also be induced by the sudden change in light. As one contrasting control, the authors also studied the time-delay in volatile emission when plants were just kept under continuous light. Here they also found a delayed terpenoid production, but this seemed to be lower compared to the plants exposed to the day-night-cycle. Another helpful control was now performed for the revision in which the herbivory treatment was started in the evening hours and lights were left on. This experiment revealed that the burst of terpenoid emission indeed shifted somewhat. Circadian and diurnal processes must thus interact.

      Interestingly, internal terpene pools of one of the leaves tested here remained more comparable between night and day, indicating that their pools stay higher in plants exposed to HIPVs. In contrast, terpene synthases were only induced during the light-phase, not in the dark-phase. Moreover, jasmonates were only significantly induced 22 h after onset of the volatile exposure and thus parallel with the burst of terpene release.

      An additional experiment exposing plants to the green leaf volatile (glv) (Z)-3-hexenyl acetate revealed that plants can be primed by this glv, leading to a stronger terpene burst. The results are discussed with nice logic and considering potential ecological consequences. All data are now well discussed.

      Overall, this study provides intriguing insights in the potential interplay between priming and induction, which may co-occur, enhancing (indirect and direct) plant defence. Follow-up studies are suggested that may provide additional evidence.

      We thank the reviewer for their positive outlook on our paper and for their constructive comments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The authors did a great job with the revision. The additional experiments strengthened their conclusions. Thanks also for performing the suggested test for potential differences in induction capacity at different times of day, the new data are very interesting.

      Thank you very much.

      Line 49-52: The newly added sentence could be clarified in wording.

      We will clarify the sentence.

      Line 254-255: The newly added sentence needs to be corrected. This is no full sentence and it is not clear what the authors wanted to say here.

      We will clarify this sentence.

      Figure 6: In those instances, in which the correlation is not significant, the line should not be shown.

      We will remove the lines when correlations are not significant.

      The names of chemical compounds and terpene synthases should be written in lower case letters (see legend Fig 6, e.g. hexenal, not Hexenal; legend fig. 2: terpene synthase, not Terpene synthase)

      In the last round of revisions, I commented on Line 23: consequences on community dynamics are not investigated here, so this is a bit misleading. ... Your response was "We have deleted the sentence about community dynamics ..." which, however, in fact was not done! Please change!

      Apologies for that, we will delete mention of community dynamics in that sentence (for real).


      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study examines the effects of herbivory-induced maize volatiles on neighboring plants and their responses over time. Measurements of volatile compound classes and gene expression in receiver plants exposed to these volatiles led to the conclusion that the delayed emission of certain terpenes in receiver plants after the onset of light may be a result of stress memory, highlighting the role of priming and induction in plant defenses triggered by herbivore-induced plant volatiles (HIPVs). Most experimental data are compelling but additional experiments and accurate quantifications of the compounds would be required to confirm some of the main claims.

      Response: We thank the editors for their overall positive feedback on our MS. We have added additional experiments to quantify green leaf volatile emissions in both sender plants and synthetic dispensers (Reviewer 1) and address the importance of the precise time of day plants are induced (Reviewer 2). These additions strengthen the main conclusions of our study.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors of the manuscript "High-resolution kinetics of herbivore-induced plant volatile transfer reveal tightly clocked responses in neighboring plants" assessed the effects of herbivory-induced maize volatiles on receiver plants over a period of time in order to assess the dynamics of the responses of receiver plants. Different volatile compound classes were measured over a period of time using PTR-ToF-MS and GC-MS, under both natural light:dark conditions, and continuous light. They also measured gene expression of related genes as well as defence-related phytohormones. The effects of a secondary exposure to GLVs on primed receiver plants were also measured.

      The paper addresses some interesting points, however, some questions arise regarding some of the methods employed. Firstly, I am wondering why VOCs (as measured by GC-MS) were not quantified. While I understand that quantification is time-consuming and requires more work, it allows for comparisons to be made between lines of the same species, as well as across other literature on the subject. As experiments with VOC dispensers were also used in this experiment, I find it even more baffling that the authors didn't confirm the concentration of the emission from the plants they used to make sure they matched. The references cited justifying the concentration used (saying it was within the range of GLVs emitted by their plants) to prepare the dispenser were for either a different variety of maize (delprim versus B73) or arabidopsis. Simply relying on the area under the curve and presenting results using arbitrary units is not enough for analyses like these.

      Response: We thank the reviewer for their comment. We have now quantified the emissions of both the dispensers and maize seedlings infested with three fourth-instar Spodoptera exigua larvae. Averaged across 1 h, HAC dispensers emitted roughly 2x higher molar concentrations than the total GLV molar concentrations emitted by plants infested by three caterpillars. Of note, GLV emissions induced by caterpillars vary over time, and can be more than 2-fold higher than the average during times of strong active feeding (Supplemental Fig 4). Thus, the release rate of the dispensers is well within the plant’s physiological range.

      Note that the references cited were included to support the claim of the biological activity of all three GLVs rather than to justify concentration of our dispensers. We have rephrased this sentence to reflect this (see L330-333).

      With regards to the correlation analyses shown in Figure 6, the results presented in many of the correlation plots are not actually informative. By blindly reporting the correlation coefficient important trends are being ignored, as there are clearly either bimodal relationships (e.g. upper left panel, HAC/TMTT, HAC/MNT) or even stranger relationships (e.g. upper left panel, IND/SQT, IND/MNT) that are not being well explained by a correlation plot. It is not appropriate to discuss the correlation factors presented here and to draw such strong conclusions on emission kinetics. The comparison between plants under continuous light and normal light:dark conditions is interesting, but I think there are better ways to examine these relationships, for example, multivariate analysis might reveal some patterns.

      Response: We thank the reviewer for their comment. With our analysis we aimed at testing specifically whether the high release of known bioactive volatiles (GLVs and indole) by sender plants on the second day can explain the higher terpene emissions in the receiver plants. We explicitly mention this in the text (L176-L186). Indeed, under normal light conditions (light and dark phase), there are clear positive correlations between the GLV release of sender plants and the terpene release of receiver plants over time (see also Fig 1 and Fig 5). However, under continuous light conditions, GLV emissions in sender plants no longer correlate with terpene emissions in receiver plants (also apparent by comparison of Fig 4 and Fig 5). This shows that temporal variation in GLV emissions are insufficient to explain the delayed terpene burst. This is the relevant conclusion we draw from this analysis. As presented, we find the data to provide strong evidence that the delayed burst in receiver plant terpene emissions cannot be solely explained by higher availability of active signals on the second day. The priming experiment in Figure 7 then provides a direct additional test for this concept. While more complex analyses could indeed reveal additional patterns, these would not be particularly informative for the question at hand.
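      To make the type of test described above concrete, the sketch below computes a Pearson correlation coefficient between a sender-plant emission series and a receiver-plant emission series sampled at the same time points. It is offered only as an illustration under stated assumptions: the function name `pearson_r` and the numbers in the usage example are hypothetical and are not data from the study, and the manuscript's own correlation analysis may have been carried out differently.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series.

    Illustrative sketch: x could hold sender GLV emission values and y the
    receiver terpene emission values measured at the same time points.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical emission values in arbitrary units (not study data).
    sender_glv = [1.0, 2.0, 3.0, 4.0]
    receiver_terpenes = [2.1, 3.9, 6.2, 7.8]
    print(pearson_r(sender_glv, receiver_terpenes))
```

      A coefficient of this kind captures only linear association, which is why it suits the narrow question of whether sender emissions track receiver emissions over time but would not describe bimodal or otherwise non-linear relationships.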

      In Figure 2, the elevated concentrations of beta-caryophyllene found in the control plants at 8h and 16.75h measurement timepoints are curious. Is this something that is commonly seen in B73?

      Response: We thank the reviewer for this comment. A small number of untreated plants indeed accumulated β -caryophyllene at night, which is likely the result of biological variability between samples. Our plants were soil-grown, and it is for instance possible that variation in soil biota may account for this variability. Alternatively, some plants may have been slightly stressed during handling. Note that this variability does not affect any of the conclusions in our manuscript.

      While there can be discrepancies between emissions and compounds actually present within leaf tissue, it is a little bit odd that such high levels of b-caryophyllene were found at these timepoints, however, this is not reflected in the PTR-ToF-MS measurements of sesquiterpenes. It would be beneficial to include an overview of the mechanism of production and storage of sesquiterpenes in maize leaves, which would clarify why high amounts were found only in the GC-MS analysis and not the PTR-ToF-MS analysis, which is a more sensitive analytical tool. It is possible that the amounts of b-caryophyllene present in the leaf are actually extremely low, however as the values are not given as a concentration but rather arbitrary units, it is not possible to tell. I would include a line explaining what is seen with b-caryophyllene.

      Response: Thank you for this comment. It is important to note that accumulation in maize leaves can differ substantially from emission, especially at night when stomata are closed. This has been observed before in maize leaves (Seidl-Adams et al., 2015). As the reviewer suspects, earlier work indeed found that β-caryophyllene is a minor sesquiterpene compared to β-farnesene and α-bergamotene in B73 (Block et al., 2018). The PTR-ToF-MS does not discriminate between terpenes with the same m/z and thus measures total sesquiterpene emissions. Given that sesquiterpene emissions are strongly regulated by stomatal aperture and that overall sesquiterpene accumulation in control plants is low, it is not surprising that we measure only minor amounts of sesquiterpene emissions in general, and in control plants in particular. We have now added text to the manuscript to explain these aspects (L116-L122).

      Block, A.K., Hunter, C.T., Rering, C. et al. Contrasting insect attraction and herbivore-induced plant volatile production in maize. Planta 248, 105–116 (2018).

      Seidl-Adams I, Richter A, Boomer KB, Yoshinaga N, Degenhardt J, Tumlinson JH. Emission of herbivore elicitor-induced sesquiterpenes is regulated by stomatal aperture in maize (Zea mays) seedlings. Plant Cell Environ. 38, 23-34 (2015).

      Additionally, it seems like the amounts of TMTT within the leaf are extraordinarily high (judging only by the au values given for scale), far higher than one would expect from maize.

      Response: We are unsure about the reviewer’s interpretation here. The AU values do not allow for conclusions regarding total quantities. An earlier study found that TMTT in induced B73 plants accumulates to similar amounts as β-caryophyllene (Block et al., 2018), thus it is not surprising to detect significant TMTT pools in induced maize leaves. It is important to note that the aim of the experiment here was to test the hypothesis that plants may be hyperaccumulating volatiles when the stomata are closed at night, which could potentially explain the delayed terpene burst on the second day. We do not observe such a hyperaccumulation, thus ruling out this as the primary factor responsible for the observed phenomenon. This is further supported by the continuous light experiments, where the delayed burst in terpene emission is not hindered by the lack of a dark phase.

      Block, A.K., Hunter, C.T., Rering, C. et al. Contrasting insect attraction and herbivore-induced plant volatile production in maize. Planta 248, 105–116 (2018).

      Reviewer #2 (Public Review):

      The exact dynamics of responses to volatiles from herbivore-attacked neighbouring plants have been little studied so far. Also, we still lack evidence of whether herbivore-induced plant volatiles (HIPVs) induce or prime plant defences of neighbours. The authors investigated the volatile emission patterns of receiver plants that respond to the volatile emission of neighbouring sender plants which are fed upon by herbivorous caterpillars. They applied a very elegant approach (more rigorous than the current state-of-the-art) to monitor temporal response patterns of neighbouring plants to HIPVs by measuring volatile emissions of senders and receivers, senders only and receivers only. Different terpenoids were produced within 2 h of such exposure in receiver plants, but not during the dark phase. Once the light turned on again, large amounts of terpenoids were released from the receiver plants. This may indicate a delayed terpene burst, but terpenoids may also be induced by the sudden change in light. A potential caveat exists with respect to the exact timing and the day-night cycle. The timing may be critical, i.e. at which time-point after onset of light herbivores were placed on the plants and how long the terpene emission lasted before the light was turned off. If the rhythm or a potential internal clock matters, then this information should also be highly relevant. Moreover, light on/off is a rather arbitrary treatment that is practical for experiments in the laboratory but which is not a very realistic setting. Particularly with regard to terpene emission, the sudden turning on of light instead of a smooth and continuous change to lighter conditions may trigger emission responses that are not found in nature.

      Response: We thank the reviewer for their comment. Although we did not explicitly mention it in the initial draft of the MS, we employed 15 min transition periods for light and dark phase transitions with a light intensity of 60 µmol m-2 s-1 (compared to 300 µmol m-2 s-1 at full light) to achieve a more gradual transition. We have now included this information in the manuscript (L291-L292).

      As one contrasting control, the authors also studied the time-delay in volatile emission when plants were just kept under continuous light (just for the experiment or continuously?). Here they also found a delayed terpenoid production, but this seemed to be lower compared to the plants exposed to the day-night-cycle. Another helpful control would be to start the herbivory treatment in the evening hours and leave the light on. If then again plants only release volatiles after a 17 h delay, the response is indeed independent of the diurnal clock of the plant.

      Response: This is a very interesting point raised by the reviewer. We have now conducted an additional experiment under continuous light where we started the herbivory treatment just before the start of the dark phase (ca. 20:00). We found a similar pattern: a distinct delay in the highest burst. Interestingly, however, the burst was shifted from 12-18 hr to 10-12 hr (Supplemental Fig 1). This burst aligned reasonably well with the point at which lights would normally be turned on again. In light of this, and as the herbivore additions typically started ca. 5 hr after the onset of light following a dark period (Figures 1-7), we wanted to rule out the possibility that the lack of a burst on the first day was simply due to a difference in induction capacity depending on how shortly after the onset of light plants became exposed to GLVs. We therefore designed an additional experiment to examine whether exposure to GLVs immediately after the lights come on induces higher terpene emissions than exposure to GLVs ca. 5 hr after the lights come on (Supplemental Fig 2). Interestingly, emissions across the terpenes were similar, regardless of how long after the onset of light plants were exposed to GLVs. This suggests that the delayed burst is not due to the fact that, on the second day, plants are exposed to GLVs immediately after the lights come on, whereas on the first day they are only exposed 5 hr after the lights come on. Both continuous light experiments (normal timing and shifted timing) show bursts that occur slightly earlier than we observe under normal day:night light conditions (L159-L166 and L207-L211), suggesting an interaction between circadian and diurnal processes. For instance, it is possible that plants would start producing volatiles slightly earlier than the onset of the day; however, light and stomatal opening limit the exact timing of the burst under normal light:dark transitions. The additional data provide further evidence for the delayed burst as a timed response in maize plants.

      Additionally, we have added an explanation to the continuous light figure legends that plants were grown under normal conditions and lights were only left on following treatment.

      Interestingly, internal terpene pools of one of the leaves tested here remained more comparable between night and day, indicating that their pools stay higher in plants exposed to HIPVs. In contrast, terpene synthases were only induced during the light-phase, not in the dark-phase. Moreover, jasmonates were only significantly induced 22 h after the onset of the volatile exposure and thus parallel with the burst of terpene release. An additional experiment exposing plants to the green leaf volatile (glv) (Z)-3-hexenyl acetate revealed that plants can be primed by this glv, leading to a stronger terpene burst. The results are discussed with nice logic and considering potential ecological consequences. Some data are not discussed, e.g. the jasmonate and gene induction pattern.

      Response: Thanks for this comment. We have added a sentence on the jasmonate data, which, in addition to providing a further layer of evidence for the observed delay, suggest that other JA-dependent defenses in maize may follow similar temporal patterns (L254-L257).

      Overall, this study provides intriguing insights into the potential interplay between priming and induction, which may co-occur, enhancing (indirect and direct) plant defence. Follow-up studies are suggested that may provide additional evidence.

      Reviewer #1 (Recommendations For The Authors):

      Could the authors please explain why they chose not to calculate concentrations for VOCs? Perhaps it is that B73 is a very unique variety in that it contains very high levels of TMTT, even in control plants? This should be clarified by the authors.

      Response: We address this comment in the public review portion

      For the legend within Figure 2, I would move it to be in the upper left or right corners of the figure. It is not easy to see in its current position.

      Response: We have moved the figure legend based on the reviewer's recommendation.

      Figures depicting PTR-ToF-MS data: add m/z values to either the figures themselves and/or the legends.

      Response: We have added m/z values to the legends and added molecular formulas of protonated compounds to each panel.

      Overall, here are some other suggestions: I am slightly wary of the term "clocked response". I'm not sure this is the correct fit for what you are trying to convey. I think "regulated" is a better term than "clocked". I understand that it is likely a stylistic choice to use this word, however, I advise reconsidering for the sake of clarity of the results.

      Response: Thank you. We find clocked to be an appropriate term, as it highlights the temporal aspect of the burst, and have thus left the title as is.

      Have another look at the references as some are not in the correct format (i.e., species not in italics).

      Response: We have checked and corrected the references

      Reviewer #2 (Recommendations For The Authors):

      Line 23: consequences on community dynamics are not investigated here, so this is a bit misleading.

      Last sentence of the abstract: It would be nice to read the answer to this long-standing question here.

      Response: We have deleted the sentence about community dynamics and provided a more concrete final sentence (L38-L40).

      Lines 48-50: The example does not fit so well with the first sentence and is not entirely clear (relation to temporal dynamics; similar to what?).

      Response: We have reworded the sentence for clarity (L49-L52)

      Line 56: "volatiles" should be plural.

      Response: Changed (L58)

      Line 58: "to be produced" rather than "to produce"

      Response: This seems to be a stylistic choice, and we have left it as is.

      End of abstract: Did you have any hypotheses? These should be stated here.

      Response: The listing of hypotheses is also a stylistic choice, which is in some cases required by journals, but not eLife. As such we have not included a discrete list of hypotheses and instead describe what we aimed to investigate and what we found.

      Line 93: "This response disappeared at night." Does this mean: "No volatiles were emitted during night"? Or was this a gradual disappearance? How many hours after the onset of light did the herbivore treatment start and how many hours after the first emission of volatiles was the light turned off?

      Response: We have added when herbivory began (L92-L93) and changed the text to ‘as soon as light was restored’ (L97-L98).

      Line 93: "as soon as the night was over" means practically rather "as soon as the light was switched on".

      Response: See above

      Line 91: "small induction" - do you mean "low amounts of xxx"?

      Response: We mean a small induction. Terpene emission is relatively low (hence small), but still induced relative to controls.

      Line 91: which mono- and sesquiterpenes were monitored?

      Response: As these data come from PTR-ToF-MS, we cannot identify individual sesquiterpenes and monoterpenes (within each class they all have the same mass), and we therefore group them generally.

      Figure 1: What exactly is the "control"? And what does the vertical hatched line in the beginning represent?

      Response: We have defined the control and added a sentence describing the vertical hatched line

      "Black points represent the same but with undamaged sender plants" - what is "the same" here? I find that a bit confusing!

      Response: We have rephrased

      Line 104: how do you define an "overaccumulation"?

      Response: We have added ‘above daytime levels’ to clarify that we mean over daytime levels (L106)

      Why was the oldest developing leaf chosen? Is this the largest one when plants are two weeks old? How many leaves do they have then? Is this the leaf with the highest biomass?

      Response: We chose this leaf as it is the largest and also highly responsive to HIPVs. We have added this sentence (with a reference) in the methods section (L369-L370)

      Line 107: "started increasing after 3 hours" - they may already have started before. The following description also sounds like the dynamics were investigated here. However, instead the authors measured samples at four distinct time-points and cannot say whether something "began" or "remained" etc. The wording should be changed to a more appropriate description, describing the differences at a given time-point.

      Response: We changed the wording to ‘were marginally induced after 3 hr’ (see L110).

      Line 113: What do you mean by "delete BELOW NIGHTTIME levels"?

      Response: The word we used was ‘deplete’; we have changed it to ‘drop’ (L116).

      Line 114: "the expression of terpene synthases" add "in the receiver plants exposed to HIPVs."

      Response: Added

      Figure 2ff: The situation of receiver plants exposed to control plant volatiles is not explained in the method section and also not depicted in the Suppl. Fig. 1. Here, the sender plants seem to always have been induced (if the red star-like structure should resemble an induction - a legend may be helpful here).

      Response: We have changed this to ‘connected to undamaged sender plants’. We additionally added a sentence to the methods section describing the controls (L300).

      Line 140: This treatment is not described in the methods section. Were the plants only kept under constant conditions for the 2 experimental days? Compared to the induction shown in Fig. 1, the amount of released volatiles seems less here.

      Response: We have added explanation of this to the figure legends, explaining that plants were grown under normal conditions and lights were only left on following treatment

      Another helpful control would be to start the herbivory treatment in the evening hours and leave the light on. If then again plants only release volatiles after a 17 h delay, the response is indeed independent of the diurnal clock of the plant.

      Response: See public review comment. We have added this experiment and discuss it accordingly in the MS (L159-L166 and L207-L211)

      Line 157: Check sentence/grammar

      Response: Checked and modified

      Figure 5: I suggest using a different colour for volatiles released from the sender plants, not again the green also used in the other figures for the receiver plants. This would help the reader to quickly see which plants are in focus in each figure.

      Response: We have changed the color of the figures for clarity

      Figure 6 legend: check grammar in several sentences (use of singular vs. plural)

      Response: We have made the tense uniform

      The diurnal rhythm of jasmonates (and potentially also terpene synthases?) is not considered in the discussion.

      Response: See above, and we have added a sentence to the discussion mentioning the jasmonates (L254-L257)

      Line 230-231: check grammar. Given the complexity, the response pattern may not be so predictable.

      Response: We do not understand this comment, but have checked the grammar throughout the manuscript.

      Line 235: I like the discussion on potential ecological consequences.

      While some interpretation for each experiment is already given in the results section, not all results are discussed in the discussion section. For example, the jasmonate data are not discussed. This should be added.

      Response: See above

      Line 266: To get an idea about the plant size: How many leaves do the plants have in that stage?

      Response: Added a sentence describing the size (L287-L288)

      Line 321: change to "as in the greenhouse"

      Response: Changed

      Line 334: How were the terpenoids identified and, in particular, quantified?

      Response: Added (L379-L380)

      Line 354: Maybe rather change to: "Plant treatments and tissue collection for phytohormone sampling were identical as described above for terpene and gene expression analysis."

      Response: Changed

      Line 357: add "material" or "leaf tissue" after "flash frozen"

      Response: Added

      Line 359: What was the source of the isotopically labelled phytohormones?

      Response: Added (L400-L403)

      Line 360: The phytohormones are "analyzed" using UPLC. The "quantification" is then done afterward. Please correct.

      Response: Corrected (L404)

      Overall: a great approach and a wonderful idea!

      Thanks

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript investigates the role of membrane contact sites (MCSs) and sphingolipid metabolism in regulating vacuolar morphology in the yeast Saccharomyces cerevisiae. The authors show that tricalbin (1-3) deletion leads to vacuolar fragmentation and the accumulation of the sphingolipid phytosphingosine (PHS). They propose that PHS triggers vacuole division through MCSs and the nuclear-vacuolar junction (NVJ). The study presents some solid data and proposes potential mechanisms underlying vacuolar fragmentation driven by this pathway. However, there are some concerns regarding the strength and interpretation of their lipid data, and the robustness of some conclusions. The manuscript would benefit from addressing these concerns and providing more conclusive evidence to support the proposed conclusions. Overall, the study provides valuable insights into the connection between MCSs, lipid metabolism, and vacuole dynamics, but further clarification will be highly valuable to strengthen the conclusions.

      We thank Reviewer #1 for the thoughtful and positive feedback. We note the concerns raised regarding the strength and interpretation of the lipid data, as well as the robustness of specific conclusions. We acknowledge the importance of addressing these concerns and of providing more conclusive evidence to support our proposed conclusions. We have responded in the "Recommendations to Authors" section and hope that our research has been further strengthened.

      Reviewer #2 (Public Review):

      This manuscript investigates the mechanism behind the accumulation of phytosphingosine (PHS) and its role in triggering vacuole fission. The study proposes that membrane contact sites (MCSs) are involved in two steps of this process. First, tricalbin-tethered MCSs between the endoplasmic reticulum (ER) and the plasma membrane (PM) or Golgi modulate the intracellular amount of PHS. Second, the accumulated PHS induces vacuole fission, most likely via the nuclear-vacuolar junction (NVJ). The authors suggest that MCSs regulate vacuole morphology through sphingolipid metabolism.

      While some of the results in the manuscript are interesting the overall logic is hard to follow. In my assessment of the manuscript, my primary concern lies in its broad conclusions which, in my opinion, exceed the available data and raise doubts. Here are some instances where this comes into play for this manuscript:

      We greatly appreciate Reviewer #2's careful insights into our research. We have addressed the points one by one below.

      Major points for revision

      1) The rationale to start investigating a vacuolar fission phenotype in the beginning is very weak. It is basically based on a negative genetic interaction with NVJ1. Based on this vacuolar fragmentation is quantified. The binning for the quantifications is already problematic as, in my experience, WT cells often harbor one to three vacuoles. How are quantifications looking when 1-3 vacuoles are counted as "normal" and more than 3 vacuoles as "fragmented"? The observed changes seem to be relatively small and the various combinations of TCB mutants do not yield a clear picture.

      The number of vacuoles at steady state could be influenced by various environmental factors, including the composition of the medium (the manufacturer supplying the reagents and local water hardness) and the background of the strain. Possibly for these reasons, our observations differ from the experience of Reviewer #2. Indeed, we observed that WT cells always have one vacuole in YPD medium, whereas in SD medium (Fig S3B only), WT cells have mainly one or two vacuoles per cell. In both cases, we observed that some of the mutants showed a different phenotype from the WT and that those differences are supported by Student’s t-test and two-way ANOVA analysis.

      2) The analysis of the structural requirements of the Tcb3 protein is interesting but does not seem to add any additional value to this study. While it was used to quantify the mild vacuolar fragmentation phenotype it does not reoccur in any following analysis. Is the tcb3Δ sufficient to yield the lipid phenotype that is later proposed to cause the vacuolar fragmentation phenotype?

      We do not know whether tcb3Δ alone is sufficient to increase PHS, as we have not examined it. Nevertheless, as another approach, we analyzed the difference in IPC level between the tcb1Δ2Δ3Δ triple deletion and the tcb3Δ single deletion in a sec18 mutant background and showed that the reduction of IPC synthesis is similar between tcb1Δ2Δ3Δ and tcb3Δ alone (unpublished). This result suggests that, of all the tricalbins (Tcb1, Tcb2 and Tcb3), Tcb3 plays a central role. In addition, the IPC synthesis reduction phenotype was small in tcb1Δ alone and tcb2Δ alone, but a strong phenotype appeared in the tcb1Δtcb2Δ combined deletion (as strong as in tcb3Δ alone). The relationship between Tcb1, Tcb2 and Tcb3 indicated by these results is also consistent with the results of the structural analysis in this study. We have shown that Tcb3 physically interacts with Tcb1 and Tcb2 by immunoprecipitation analysis (unpublished). In the future, we plan to investigate the relationship between Tcb proteins in more detail, along with the details of the interactions between Tcb1, Tcb2, and Tcb3.

      3) The quantified lipid data also has several problems. i) The quantified effects are very small. The relative change in lipid levels does not allow any conclusion regarding the phenotypes. What is the change in absolute PHS in the cell. This would be important to know for judging the proposed effects. ii) It seems as if the lipid data is contradictory to the previous study from the lab regarding the role of tricalbins in ceramide transfer. Previously it was shown that ceramides remain unchanged and IPC levels were reduced. This was the rationale for proposing the tricalbins as ceramide transfer proteins between the ER and the mid-Golgi. What could be an explanation for this discrepancy? Does the measurement of PHS after labelling the cells with DHS just reflect differences in the activity of the Sur2 hydroxylase or does it reflect different steady state levels.

      i) As Reviewer #2 pointed out, it is a slight change, but we cannot say that it is not sufficient. We have shown that PHS increases in the range of 10~30% depending on the concentration of NaCl that induces vacuole division (This result is related to the answers to the following questions by Reviewer #3 and to the additional data in the new version). This observation supports the possibility that a small increase in PHS levels may have an effect on vacuole fragmentation. We did not analyze total PHS level by using methods such as liquid chromatography-mass spectrometry or ninhydrin staining of TLC-separated total lipids. The reason for this is that radiolabeling of sphingolipids using the precursor [3H]DHS provides higher sensitivity and makes it easier to detect differences. Moreover, using [3H]DHS labeling, we only measure PHS that is synthesized in the ER and that doesn’t originate from degradation of complex sphingolipids or dephosphorylation of PHS-1P in other organelles.

      ii) In our previous study (Ikeda et al. iScience. 2020), we separated the lipids labeled with [3H]DHS into ceramides and acylceramides. There was no significant change in ceramide levels, but acylceramides increased in tcb1Δ2Δ3Δ. Since we did not separate these lipids in the present study, the data show the total amount of both ceramide and acylceramide. We apologize that the term in Figure 3A was wrong. We have corrected it. Also, we have used [3H]DHS to detect IPC levels, which differs from the previous analysis, which used [3H]inositol. This means the lipid amounts detected are completely different. Since the amount of inositol incorporated into cells varies from cell to cell, the amount loaded on the TLC plate is adjusted so that the total amount (signal intensity) of radioactively labeled lipids is almost the same. In contrast, for DHS labeling, the amount of DHS attached to the cell membrane is almost the same between cells, so we load the total amount onto the TLC plate without adjustment. In addition, the reduction in IPC levels due to Tcb depletion that we previously reported was seen only in sec12 or sec18 mutant backgrounds, and no reduction in IPC levels was observed in tcb1Δ2Δ3Δ by [3H]inositol labeling (Ikeda et al. iScience. 2020). Therefore, we cannot simply compare the current results with the previous report due to the difference in experimental methods.

      The labeling time for [3H]DHS is 3 hours, and we are not measuring steady-state amounts, but rather analyzing metabolic reactions. Since [3H]DHS is converted to PHS by Sur2 hydroxylase in the cell, the possibility that differences in PHS amounts reflect differences in Sur2 hydroxylase activity cannot be ruled out. However, this possibility is highly unlikely since we have previously observed that the distribution of ceramide subclasses is hardly affected by tcb1Δtcb2Δtcb3Δ (Ikeda et al. iScience 2020). We have added to the discussion that the possibility of differences in Sur2 hydroxylase activity cannot be excluded.

      4) Determining the vacuole fragmentation phenotype of a lag1Δlac1Δ double mutant does not allow the conclusion that elevated PHS levels are responsible for the observed phenotype. This just shows that lag1Δlac1Δ cells have fragmented vacuoles. Can the observed phenotype be rescued by treating the cells with myriocin? What is the growth rate of a LAG1 LAC1 double deletion as this strain has been previously reported to be very sick. Similarly, what is the growth phenotype of the various LCB3 LCB4 and LCB5 deletions and its combinations.

      As Reviewer #2 pointed out, the vacuolar fragmentation in lag1Δlac1Δ does not by itself support the conclusion that increased PHS levels are the cause. Since this mutant strain has decreased levels of ceramide and its subsequent products IPC/MIPC in addition to increased levels of the ceramide precursors LCB or LCB-1P, we have changed the manuscript as follows. As noted in the following comment by Reviewer #2, myriocin treatment has been reported to induce vacuolar fragmentation, so we do not believe that experiments on recovery by myriocin treatment will lead to the expected results.

      ・ Previous Version: We first tested whether increased levels of PHS cause vacuolar fragmentation. Loss of ceramide synthases could cause an increase in PHS levels. Our analysis showed that vacuoles are fragmented in lag1Δlac1Δ cells, which lack both enzymes for LCBs (DHS and PHS) conversion into ceramides (Fig 3B). This suggests that ceramide precursors, LCBs or LCB-1P, can induce vacuolar fragmentation.

      ・Current Version: We first evaluated whether the increases in certain lipids are the cause of vacuolar fragmentation in tcb1Δ2Δ3Δ. Our analysis showed that vacuoles are fragmented in lag1Δlac1Δ cells, which lack both enzymes for LCBs (DHS and PHS) conversion into ceramides (Fig 3B). This suggests that the increases in ceramide and subsequent products IPC/MIPC are not the cause of vacuolar fragmentation, but rather its precursors LCBs or LCB-1P.

      As Reviewer #2 pointed out, the lag1Δlac1Δ double mutant grows very slowly, as shown below (Author response image 1). We also examined the growth phenotype of LCB3, LCB4, and LCB5 deletion strains, and found that these strains grew the same as the wild-type strain, with no significant differences in growth (Author response image 1).

      Author response image 1.

      Cells (FKY5687, FKY5688, FKY36, FKY37, FKY33, FKY38) were adjusted to OD 600 = 1.0 and fivefold serial dilutions were then spotted on YPD plates, then incubated at 25℃ for 3 days.
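
      As a rough illustration of the spotting scheme described in this legend, the nominal OD600 of each spot in a fivefold dilution series starting at OD600 = 1.0 can be sketched as follows. The number of spots (five) and the function name are assumptions made for illustration only; the legend does not state how many dilutions were spotted.

```python
def serial_dilution_od(start_od: float, fold: float, spots: int) -> list[float]:
    """Return the nominal OD600 of each spot in a serial dilution series.

    The first spot is the undiluted starting culture; every following
    spot is diluted `fold` times relative to the previous one.
    """
    if fold <= 1 or spots < 1:
        raise ValueError("fold must be > 1 and spots must be >= 1")
    return [start_od / fold**i for i in range(spots)]


# Assumed example: five spots, fivefold steps, starting at OD600 = 1.0.
series = serial_dilution_od(start_od=1.0, fold=5.0, spots=5)
for index, od in enumerate(series):
    print(f"spot {index}: nominal OD600 = {od:.4f}")
```

      Under these assumptions, each spot carries one fifth of the cell density of the spot before it, so a growth defect shows up as a loss of colonies at the more dilute spots.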

      5) The model in Figure 3 E proposes that treatment with PHS accumulates PHS in the endoplasmic reticulum. How do the authors know where exogenously added PHS ends up in the cell? It would also be important to determine the steady state levels of sphingolipids after treatment with PHS. Or in other words, how much PHS is taken up by the cells when 40 µM PHS is added?

      It has been found that the addition of PHS effectively suppresses the Gas1 trafficking (Gaigg et al. J Biol Chem. 2006) and endocytosis (Zanolari et al. EMBO J. 2000) phenotypes of lcb1-100 mutants. This suppression depends on Lcb3, which is localized to the ER. Thus, we know that PHS added from outside the cell reaches the ER and is functional.

      We also agree that it is important to measure the amount of PHS taken up into the cells. However, this is extremely difficult to do for the following reasons. The majority of PHS added to the medium remains attached to the surface layer of the cells. If we measured the lipids in the cells by MS, we would detect lipids present on both the outside and the inside of the plasma membrane. This means we would need to separate the outside from the inside of the cell's membrane to determine the exact amount of LCB that has been taken up by the cells. Regrettably, this separation is currently technically difficult.

      6) Previous studies have observed that myriocin treatment itself results in vacuolar fragmentation (e.g. Hepowit et al. bioRxiv 2022, Fröhlich et al. eLife 2015). Why do both depletion and accumulation of PHS lead to vacuolar fragmentation?

      It is exactly as Reviewer #2 said. Consistent with previous results with myriocin treatment, we also observed vacuolar fragmentation in the lcb1-100 mutant strain. We have therefore added these papers to the references for further discussion. Our discussion is as follows.

      "Previous studies have observed that myriocin treatment results in vacuolar fragmentation (Hepowit et al. bioRxiv 2022; Now published in J Cell Sci. 2023, Fröhlich et al. eLife 2015). Myriocin treatment itself causes not only the depletion of PHS but also of complex sphingolipids such as IPC. This suggests that normal sphingolipid metabolism is important for vacuolar morphology. The reason for this is unclear, but perhaps there is some mechanism by which sphingolipid depletion affects, for example, the recruitment of proteins required for vacuolar membrane fusion. In contrast, our new findings show that both PHS increase and depletion cause vacuole fragmentation. Taken together, there may be multiple mechanisms controlling vacuole morphology and lipid homeostasis by responding to both increasing and decreasing level of PHS."

      7) The experiments regarding the NVJ genes are not conclusive. While the authors mention that a NVJ1/2/3 MDM1 mutant was shown to result in a complete loss of the NVJ the observed effects cannot be simply correlated. It is also not clear why PHS would be transported towards the vacuole. In the cited study (Girik et al.) the authors show PHS transport from the vacuole towards the ER. Here the authors claim that PHS is transported via the NVJ towards the vacuole. Also, the origin of the rationale of this study is the negative genetic interaction of tcb1/2/3Δ with nvj1Δ. This interaction appears to result in a strong growth defect according to the Developmental Cell paper. What are the phenotypes of the mutants used here? Does the additional deletion of NVJ genes or MDM1 results in stronger growth phenotypes?

      We sincerely appreciate the concerns raised about our research. As Reviewer #2 pointed out, we have not shown evidence in this study that PHS is transported directly from the ER to the vacuole, so it remains unclear whether PHS is transported to the vacuole and what the physiological relevance of such transport would be. Girik et al. showed that the NVJ resident protein Mdm1 is important for PHS transport between the vacuole and the ER. Given that the experimental method applied tracks PHS released in the vacuole, only transport of PHS from the vacuole to the ER was indeed verified. However, assuming that Mdm1 transports PHS along its concentration gradient, we consider that under normal conditions PHS is transported from the ER (as the organelle of PHS synthesis) to the vacuole. We clarified this interpretation by adding the following sentences to the manuscript at line 313:

      “The study applied an experimental method that tracks LCBs released in the vacuole and showed that Mdm1p is necessary for LCBs leakage into the ER. However, assuming that Mdm1p transports LCBs along its concentration gradient we consider that under normal conditions, LCBs is transported from the ER (as the organelle of PHS synthesis) to the vacuole.”

      The negative genetic interaction between tcb1/2/3Δ and nvj1Δ is consistent with this model, but under our culture conditions we did not observe a negative interaction between TCB3 and the genes encoding NVJ proteins (Author response image 2). We do not know if this is due to strain background, culture conditions, or whether the deletions of TCB1 and TCB2 are also required for the negative interaction. We would like to analyze this in detail in the future.

      Author response image 2.

      Cells (FKY3868, FKY5560, FKY6187, FKY6189, FKY6190, FKY6188, FKY6409) were adjusted to OD 600 = 1.0 and fivefold serial dilutions were then spotted on YPD plates, then incubated at 25℃ for 3 days.

      Our results in this study show that deletion of NVJ component genes partially suppresses vacuolar fission upon the addition of PHS. To clarify these facts, we have changed the sentences in the Results and Discussion of our manuscript as follows. We hope that this change will avoid over-interpretation.

      ・ Previous: To test the role of NVJ-mediated “transport” for PHS-induced vacuolar fragmentation,

      ・Current: To test the role of NVJ-mediated “membrane contact” for PHS-induced vacuolar fragmentation,

      ・Previous: Taken together, we conclude from these findings that accumulated PHS in tricalbin deleted cells triggers vacuole fission via “non-vesicular transport of PHS” at the NVJ.

      ・Current: Taken together, we conclude from these findings that accumulated PHS in tricalbin deleted cells triggers vacuole fission via “contact between ER and vacuole” at the NVJ.

      ・Previous: Because both PHS- and tricalbin deletion-induced vacuolar fragmentations were partially suppressed by the lack of NVJ (Fig 4B, 4C), it is suggested that transport of PHS into vacuoles via the NVJ is involved in triggering vacuolar fragmentation.

      ・Current: Based on the fact that both PHS- and tricalbin deletion-induced vacuolar fragmentations were partially suppressed by the lack of NVJ (Fig 4B, 4C), it is possible that the trigger for vacuolar fragmentation is NVJ-mediated transport of PHS into the vacuole.

      8) As a consequence of the above points, several results are over-interpreted in the discussion. Most important, it is not clear that indeed the accumulation of PHS causes the observed phenotypes.

      We thank Reviewer #2 for the suggestion. In particular, the question of whether PHS accumulation really causes vacuolar fragmentation could only be verified by an in vitro assay system. This is an important issue to be resolved in the future.

      Reviewer #3 (Public Review):

      In this manuscript, the authors investigated the effects of deletion of the ER-plasma membrane/Golgi tethering proteins tricalbins (Tcb1-3) on vacuolar morphology to demonstrate the role of membrane contact sites (MCSs) in regulating vacuolar morphology in Saccharomyces cerevisiae. Their data show that tricalbin deletion causes vacuolar fragmentation possibly in parallel with TORC1 pathway. In addition, their data reveal that levels of various lipids including ceramides, long-chain base (LCB)-1P and phytosphingosine (PHS) are increased in tricalbin-deleted cells. The authors find that exogenously added PHS can induce vacuole fragmentation and by performing analyses of genes involved in sphingolipid metabolism, they conclude that vacuolar fragmentation in tricalbin-deleted cells is due to the accumulated PHS in these cells. Importantly, exogenous PHS- or tricalbin deletion-induced vacuole fragmentation was suppressed by loss of the nucleus vacuole junction (NVJ), suggesting the possibility that PHS transported from the ER to vacuoles via the NVJ triggers vacuole fission.

      This work provides valuable insights into the relationship between MCS-mediated sphingolipid metabolism and vacuole morphology. The conclusions of this paper are mostly supported by their results, but there is concern about physiological roles of tricalbins and PHS in regulating vacuole morphology under known vacuole fission-inducing conditions. That is, in this paper it is not addressed whether the functions of tricalbins and PHS levels are controlled in response to osmotic shock, nutrient status, or ER stress.

      We appreciate the comment, and we consider it an important point. To answer this, we have performed additional experiments. Please refer to the following section, "Recommendations For The Authors", for more details. These results and discussions have also been added to the revised manuscript. We believe this update makes our findings more comprehensive.

      There is another weakness in their claim that the transmembrane domain of Tcb3 contributes to the formation of the tricalbin complex which is sufficient for tethering ER to the plasma membrane and the Golgi complex. Their claim is based only on the structural simulation, but not on biochemical experiments such as co-immunoprecipitation and pull-down.

      We appreciate your valuable suggestion and would like to attempt to improve upon it in the future.

      Author response to Recommendations:

      The following is the authors' response to the Recommendations For The Authors. We have now incorporated the changes recommended by Reviewers to improve the interpretations and clarity of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      I would recommend the authors provide additional experimental data to fully support their claims or revise the writing of their manuscript to be more precise in their conclusions. In particular, I have suggestions/questions:

      Fig. 1A: display the results as in 1B (that is, different colors for different number of vacuoles, and the x axes showing the different conditions, in this case WT vs tcb1∆2∆3∆).

      In response to the suggestion of Reviewer #1, we have changed the display of results.

      Fig. S1B: the FM4-64 pattern looks different in the KO strain as compared to those shown in Fig. 1A. Is there a reason for that? Also, no positive control of cps1p not in the vacuole lumen is shown.

      Our apologies, this was probably due to the poor resolution of the images. We have made other observations and changed the Figure along with the positive control.

      Line 172: the last condition in Fig. 2B (vi), should be compared to the tcb1∆tcb2∆ condition (shown in fig 1).

      In response to the suggestion of Reviewer #1, we have changed the manuscript as follows: We found that cells expressing Tcb3(TM)-GBP and lacking Tcb1p and Tcb2p (Fig 2B (vi)) are even more fragmented than tcb1Δ2Δ in Fig 1B and are fragmented to a similar degree as tcb3Δ (Fig 1B and Fig 2B (ii)).

      Fig 2E: the model shown here can be tested, is there binding (similar to kin recognition mechanism of some Golgi proteins) between the different Tcb TMDs?

      As Reviewer #1 mentioned, we have confirmed by co-immunoprecipitation that Tcb3 binds to both Tcb1 and Tcb2 (unpublished). Furthermore, we will test if the binding can be observed with TMD alone in the future.

      Fig 3A: you measured an increase in PHS that is metabolized from DHS (which is what you label). Are there other routes to produce PHS independently of DHS? I mean, how is the increase reporting on the total levels of this lipid?

      PHS synthesized by Sur2 is converted to PHS-1P and phytoceramide. Conversely, PHS is regenerated by degradation of PHS-1P via Lcb3 and Ysr3, and by degradation of phytoceramides via Ypc1 (Vilaça, Rita et al. Biochim Biophys Acta Mol Basis Dis. 2017. Fig1). Our analysis shows that these degradation substrates are not decreasing but rather accumulating in the tcb1Δ2Δ3Δ strain, suggesting that the degradation system is not what raises the PHS level. Therefore, the increase in detected PHS is most likely due to congestion in metabolic processes downstream of PHS. Possible causes of the disrupted lipid metabolism in Tcb-deletion cells are discussed in the Discussion. In brief: (1) reduced activity of the PtdIns4P phosphatase Sac1, due to MCS deficiency between the ER and the PM; (2) impaired nonvesicular ceramide transport from the ER to the Golgi; (3) low efficiency of PHS export by Rsb1, due to insufficient PHS diffusion between the ER and the PM.

      Line 248: did the authors test if the NVJ MCS is unperturbed in the triple Tcb KO?

      This is an exciting question. We are very interested in considering whether Tcb deficiency affects NVJ formation in terms of lipid transport. We would like to conduct further analysis in this regard in our future studies.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest carefully evaluating the findings in this manuscript. Right now the connection between elevated PHS levels and vacuolar fragmentation are not really supported by the data. One of the major issues in the field of yeast sphingolipid biology is that quantification of the lipid levels is difficult and labor- and cost-intensive. But I think that it is very important to directly connect phenotypes with the lipid levels.

      Minor points:

      • In figure 1 c and d WT controls of the different treatments are lacking.

      As reviewer #2 had pointed out, we have added data for the WT controls.

      • The tcb1Δmutant appears to be sensitive in pH 5.0 media while the triple tricalbins mutant grows fine. Is that a known phenotype?

      We have performed this assay on SD plates. Then, to check whether this phenotype of tcb1Δ was specific or general, we re-analyzed the same strain in YPD medium. In YPD medium, tcb1Δ strain grew normally, while the control, vma3Δ, was still pH sensitive. Therefore, the growth of this tcb1Δ strain is dependent on the nutrient conditions of the medium but does not appear to be pH sensitive. This new data was inserted as part of Supplementary Figure 1.

      • Line 305. There is an "of" in the sentence that needs to be deleted.

      As pointed out by Reviewer #2, we have corrected the sentence.

      Reviewer #3 (Recommendations For The Authors):

      In supplementary Fig 2, the authors show the involvement of the NVJ in hyperosmotic shock-induced vacuole fission, but the involvement of tricalbins and PHS in this process is not tested. Does osmotic shock affect the level or distribution of tricalbins and PHS? They will be able to test whether overexpression of tricalbins inhibits hyperosmotic shock-induced vacuole fission or not. Also, they will be able to perform the similar experiments upon ER stress-induced vacuole fission.

      We thank Reviewer #3 for suggesting that it is important to test the involvement of PHS in hyperosmotic shock- or ER stress-induced vacuole fission. We have shown in a previous report that treatment with tunicamycin, an ER stress inducer, increased the PHS level by about 20% (Yabuki et al. Genetics. 2019. Fig4). In addition, we have now tested the effect of hyperosmolarity on PHS levels. Analysis of PHS under hyperosmotic shock conditions (0.2 M NaCl), in which vacuolar fragments were observed, showed an increase in PHS of about 10%. Furthermore, when the NaCl concentration was increased to 0.8 M, PHS levels increased by up to 30%. In other words, we have shown that PHS increases in the range of tens of percent depending on the concentration of NaCl that induces vacuole division. This observation supports the possibility that a small increase in PHS levels may have an effect on vacuole fragmentation. Moreover, NaCl-induced vacuolar fragmentation, like that caused by PHS treatment, was also suppressed by PHS export from the cell by Rsb1 overexpression.

      These new data are now inserted, commented and discussed in the manuscript as Figure 5. We hope that these results will provide further insight into the more general aspects of PHS involvement in the vacuole fission process.

      Minor points:

      1) It is unclear for me whether endogenous Tcb3 is deleted in cells expressing Tcb3-GBP (FKY3903-3905 and FKY4754). They should clearly mention that these cells do not express endogenous Tcb3 in the manuscript.

      We apologize that our description was not clear. In this strain, the endogenous TCB3 gene is tagged with GBP, so the original Tcb3 has been replaced by the tagged version. We have changed the description in our manuscript.

      2) The strength of the effect of PHS on vacuole morphology looks different in respective WT cells in Fig 3C, 4B, and S2B. Is this due to the different yeast strains they used?

      Yes, we used BY4742 background for the strain in Figure 3C, SEY6210 background in Figure 4B, and HR background in Figure S2B. As a matter of fact, we observed that the strength of the PHS effect varies depending on their background. Strain numbers are now given in the legend so that the cells used for each data can be referenced in the strain list.

      3) p.3, line 44: the "SNARE" complex (instead of "protease")?

      Thank you for pointing out the incorrect wording. We have corrected this sentence.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Strengths:

      The major strength of this paper is the series of laser cutting experiments supporting that asters position via pushing forces acting both on the boundary (see below for a relevant comment) and between asters. The combination of imaging, data analysis and mathematical modeling is also powerful.

      Author Response: We thank the Reviewer for the positive comments, especially in recognising the power of our quantitative approaches.

      Weaknesses:

      This paper has weaknesses, mainly in the presentation but also in the quality of the data which do not always support the conclusions satisfactorily (this might in part be a presentation issue).

      Author Response: We address these concerns below.

      My overall suggestion for the authors is to explain better the motivation and interpretation of their experiments and also to remove some of the observations which seem to be there because they could be done rather than because they add to the main message of the paper, which I find straightforward, valuable and supported by the data in Figure 4.

      Author Response: We have extended the motivation of the study in the Introduction, and at the beginning of appropriate Results sections. We better motivate the force potential and especially the key results from Figure 4. We outline specific changes below.

      In Figure 2, it is difficult for me to understand what is being tracked. I believe that the authors track the yolk granules (visible as large green blobs) and not lipid droplets. There is some confusion between the text, legends and methods so I could not tell. If the authors are tracking yolk granules as a proxy for hydrodynamics flows it seems appropriate to cite previous papers that have used and verified these methods. More notably, this figure is somewhat disconnected with the rest of the paper. I find the analysis interesting in principle but would urge the authors to propose some interpretation of the experiments in the context of their big-picture message. At this point, I cannot understand what the Figure adds.

      Author Response: Indeed, we track the yolk droplets that move around the aster. In the extraction protocol, we likely get a mixture of lipid droplets and yolk granules; this is due to the extraction procedure involving shear forces within the pipette. We are not certain about the exact nature of these droplets, but they are likely to a large extent yolk. We have clarified the terminology in the text, the legend and methods section. In this figure, we now show that the droplets do not move towards the aster center as the hydrodynamic pulling model would suggest. Instead, they appear to passively respond to a repulsive force that results in them streaming around the aster. We have added additional panels to the figure that illustrate the directionality of yolk granule movements (lines 159-164). We agree with the Reviewer that the context could have been clarified. The role of fluid flows in biological systems is, as the Reviewer highlights, well studied. We have added additional contextualisation in the text (lines 140-146). We also motivate the figure more clearly, as it provides evidence that the asters generate forces over a 20 µm scale (lines 159-164). This is highly relevant for one of the paper’s main conclusions: that the Drosophila blastocyst asters generate pushing forces that enable regular packing.

      In Figure 3, it is not surprising that the aster-aster interactions are different from interactions with the boundary which is likely more rigid. It is also hard to understand why the force and thus velocity should scale as microtubule length. This Figure should be better conceptualized. I think that it becomes clear at the end of the paper that the authors are trying to derive an effective potential to use in a mathematical model in Figure 5 to test their hypotheses. I think that should be told from the start, so a reader understands why these experiments are being shown.

      Author Response: We don’t claim that the force scales with microtubule length on a single microtubule. However, at larger distances from the aster, the microtubule density decreases, and hence the effective force decreases.

      The Reviewer is correct that we use these results to motivate our effective potential. We have brought this motivation forward in the manuscript to guide the reader (lines 169-171) and included a further note at the end of the section (lines 216-218).

      The experiments in Figure 4 are very nice in supporting a pushing model. However, it would help if the authors could speculate what the single aster is pushing against in this experiment. The experiments reported in Figure 1 seemed to suggest that the aster mainly pushed against the boundary. In the experiments in Figure 4 do the individual asters touch the boundary on both sides? I think that readers need more information on what the extract looks like for those experiments.

      Author Response: We now include an additional panel B in Figure 4 that shows an example of an explant during aster ablation. The distance between asters is typically less than the distance to the explant boundary. Boundary effects likely play a small role in the aster-aster separation, in terms of potentially determining the axis of separation. However, the separation of asters occurs along a straight line for a substantial period (>1 min) of separation; if boundary effects were more dominant, we would expect to see curving of the aster-aster separation trajectories as they also receive feedback from the boundary.

      Figure 4F could use some statistics. I doubt that the acceleration in the pink curves would be significant. I believe that the deceleration is and that is probably the most crucial result. Since the authors present only 3 aster pairs it is important to be sure that these conclusions are solid.

      Author Response: We agree with the Reviewer. These experiments are challenging to do, as they require carefully controlled conditions. In two out of three experiments we see a significant increase in acceleration in the pink curves. Of course, the interpretation of this must be caveated as our experimental number is low. These details are now provided in the revision (lines 263-267).

      Reviewer 2

      Strengths:

      This study reveals a unique aster positioning mechanics in the syncytial embryo explant, which leads to an understanding of the mechanism underlying the positioning of multiple asters associated with nuclei in the embryo. The use of explants enabled accurate measurement of aster motility and, therefore, the construction of a quantitative model. This is a notable achievement.

      Author Response: We thank the Reviewer for their review, and for highlighting how our quantitative model is a clear step forward in our understanding of aster dynamics.

      Weaknesses:

      The main conclusion that aster repulsion predominates in this system has already been drawn by the same authors in their recent study (de-Carvalho et al., Development, 2022). As the present work provides additional support to the previous study using a different experimental system, the authors should emphasize that the present manuscript adds to it (but the conceptual novelty is limited).

      Author Response: While this study is related to the previous work, there are major differences. First, here we quantitatively assess aster dynamics within a “clean” system. Such accurate measurements are not possible in vivo currently. Further, experiments like laser ablation are much better defined within the explant system. We now acknowledge the previous work more clearly in the Introduction and in lines 291-293 and 299-300. Combined with the different perspectives these papers provide on the problem of aster positioning in syncytia, we believe they offer new and well-supported insights.

      The molecular mechanisms underlying aster repulsion remain unexplored since the authors were unable to identify specific factor(s) responsible for aster repulsion in the explant.

      Author Response: Given that the nature of the aster dynamics was not previously characterised, our work presents a major step forward. We show compelling evidence that an effective pushing force potential plays a role in aster interactions. With this critical knowledge, we can now explore the potential molecular mechanisms, but such information lies beyond the scope of the current manuscript. This is particularly challenging due to the lack of specific microtubule drug inhibitors in Drosophila. We highlight related issues in the Discussion: paragraph starting on line 340 and lines 367-370.

      Specific suggestions:

      Microtubules should be visualized more clearly (either in live or fixed samples). This is particularly important in Figure 4E and Video 4 (laser ablation experiment to create asymmetric asters).

      Author Response: This is similar to Reviewer 1's final comment above. These experiments are very challenging and being able to see the microtubules with sufficient clarity is not straightforward. Given our controls and previous experience, we are confident we are ablating the microtubules.

      Minor points:

      1) The authors explain the roles of microtubule asters in several model systems in the first paragraph of the introduction part. Please specify the species and/or cell types in each description.

      Author Response: We have provided this information as suggested.

      2) In lines 164 and 172, the citing figure numbers should be modified to Supplementary Fig. 1A and 1B, respectively.

      Author Response: We thank the Reviewer for spotting this error. It has now been corrected.

      3) The authors showed in the previous study that the boundary in the explant does not have an intact cell cortex and f-actin compartments (de-Carvalho et al., Development, 2022). This important information should also be described in the current manuscript. It is also valuable to mention whether the pulling force mechanism operates in embryos where the intact cell cortex is present.

      Author Response: This is an interesting point. We have added text with this information to the Discussion (lines 324-327).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      It is somewhat speculative that the structure represents the EIIa-bound regulatory state. There's a strong enough case that it should be analyzed in the discussion, but I don't think it is firmly established. Therefore, the title of the paper should be changed.

      Our answer: Thank you for the comment. We have changed the title to “Mobile barrier mechanisms for Na+-coupled symport in an MFS sugar transporter”

      Reading through the manuscript, it was challenging to distinguish what is new in the current manuscript and what has been done previously. There were a lot of parts where it was hard for me to identify the main point of the current study among all the details of previous studies. It would also benefit from shortening. For example:

      -Page 6: Nb725 binding has already been characterized extensively in the very nice JBC paper earlier this year. It's important to test 725-4 for binding, but since it doesn't change the binding interaction, and probably wouldn't be expected to, the entire section could be written more succinctly. The main point, which is that 725-4 behaves like 725, is lost among all the details

      Our answer: Thanks for this instructive suggestion. We have shortened the description in this section.

      -Page 9-10. I don't understand what summarizing all of the results from the previous D59C studies adds to the current story. It's important because it provides an indication of the substrate binding site, but its mechanism of action does not seem relevant to the current work.

      Our answer: We have shortened the description of the sugar-binding site and moved the previous Fig. 3b to supplementary figure sFig. 11. According to your comment about showing the location of the binding sites, which is also suggested by Reviewer #2, we modified Fig. 3 and added two panels to map the location of the bound Na+ in the inward-facing structure and the bound sugar in the outward-facing structure.

      The sugar-binding site identified in the published structure is critical to construct the mobile barrier mechanism. The sugar-binding residues identified in the published structure provided essential data to support the conclusion that the sugar-binding pocket is broken in the inward-facing structure. Thus, this published structure is mechanistically relevant to the current study.

      -Page 12. Too much summary of the previous outward structure. Since this is already part of the literature, it would be more efficient to reference the previous data when it is important to interpret the new data (or show as a figure).

      Our answer: The introduction of the previous sugar-binding site is important for the detailed comparison between the two states, as discussed above, but we agree with the reviewer and have significantly shortened the paragraph by moving the detailed description into the legend of sFig. 11.

      -Instead of providing the PDB ID in figures of the current structure, just say "current work" or similar. Then it is obvious you are not citing a previous structure.

      Our answer: To clearly distinguish the new data from published results, the citation of the cryoEM structure [PDB ID 8T60] has been completely removed from the main text but kept in sTable 1.

      -An entire panel of Figure 3 is dedicated to ligand binding in a previous outward-facing structure. Showing it in the overlay would be sufficient.

      Our answer: This is the first time we show a structure with a bound Na+. Fig. 3 also illustrates the spatial relationship between the sugar-binding pocket and the cation-binding pocket, since both binding sites have now been determined. As stated above, following the two reviewers’ comments, we have modified the figures, and Fig. 3d now shows the overlay.

      Please increase the size of the font in all figures. It should be 6-8 point when printed on a standard sheet of paper. Labels in Figure 3, distances in Figure 4, and everything in Figure 5 is hard to see.

      Our answer: Thank you for the comments. We have enlarged the figures and the label fonts in all figures.

      Figure 2: would be helpful to show Figure S8 in the main text, orienting the reader to the approximate location of substrate binding. What is known about the EIIA-Glc binding interface? Has anyone probed this by mutagenesis? Where are these residues on the overall structure, and are they somewhere other than the nanobody interface?

      Our answer: Thank you for this comment. We have added a panel in Figure 3c to orient readers to the substrate location in MelB. sFig. 8 focuses on the details of the Nb interactions with MelB. Our current data strongly support the notion that the Nb-bound MelBSt structure mimics the EIIAGlc-bound MelB, but this state has not been structurally resolved, so we have toned down our statement on EIIAGlc. There is one study suggesting that the C-terminal tail helix may be involved in EIIAGlc binding, which has been added to the discussion.

      Can Figure 5 be split into 2 figures and simplified?

      Our answer: Thanks for the suggestion. We have split it into Figs. 5b and 6 and also moved the peptide mapping to Fig. 5a.

      What is the difference between cartoon and ribbon rendering?

      Our answer: Ribbon: illustrating the structure; cartoon: highlighting the positions with statistically significant protection or deprotection. The statistically significant changes are implied by the ribbon representation; Sphere: not covered by labeled peptides.

      Can the panels showing the kinetic data be enlarged? I don't think they need to surround the molecule. An array underneath would be fine.

      Our answer: We have enlarged all figures and labels. The placement of selected plots around the model could clearly show the difference in deuterium uptake rates between the transmembrane domain and extra-membrane regions. We will maintain this arrangement.

      Do colors in panel A correspond with colors in panel B?

      Our answer: The two panels use different color schemes. The two panels have now been separated.

      Do I understand correctly that in the HDX experiments, negative values indicate positions that exchange more quickly in the nanobody-free protein relative to the nanobody-bound protein?

      Our answer: Your understanding is correct.

      I assume some of this is due to the protein changing conformation, but some of it might be due to burial at the nanobody-binding interface. Can those peptides be indicated?

      Our answer: Thank you for this comment. We have marked the peptides carrying the Nb-binding residues on the uptake plots in Fig. 6 and Extended Fig. 1. Only three Nb-binding residues are covered by many overlapping peptides. Most are not covered, either not carried by the labeled peptides (Tyr205, Ser206, and Ser207) or showing insignificant changes (Pro132 and Thr133), except for Asp137, Lys138, and Arg141, which are present in 8 labeled peptides.

      A few positions that are buried in the outward-facing state are expected to be solvent-exposed in the inward-facing state; unfortunately, in the inward-facing state they are buried by Nb binding.

      Make figure legends easier to interpret by removing non-essential methods details (like buffer conditions).

      Our answer: We removed the detailed method descriptions in most figure legends. Thank you.

      Check throughout for typos.

      ie page 9 Lue Leu

      Page 9 like likely

      Our answer: We have corrected them. Thank you!

      Reviewer #2 (Recommendations For The Authors):

      I have mostly minor questions/remarks.

      • Why not do the hdx-ms experiments in the presence of sugar? That would give a proper distinction between two conformational states, instead of an ensemble of states vs one state.

      Our answer: The sugar-induced MelB conformation also comprises multiple states, most likely outward-facing and occluded intermediate states. This is also supported by the new finding of an inward-facing state with low sugar affinity. The ideal design would compare one inward-facing and one outward-facing state to understand the inward-outward transition. We have not identified an outward-facing mutant, whereas we can obtain the inward-facing state using the Nb. WT MelBSt with bound Na+ favors the outward-facing state. Although our design is not ideal, we do compare one state against a predominantly outward-facing WT with bound Na+.

      Minor comments:

      • Fig 5 is misleading as the peptide number does not match with the amino acid sequence. I would suggest putting a heat map with coverage on top. Or showing deuterium uptake per peptide. See examples below.

      Our answer: The peptide numbers are not expected to match the sequence numbers. We have 155 overlapping peptides that cover the entire amino acid sequence including the 10-His tag, and there are 60 residues with no data because they are not covered by a labeled peptide. The residue positions that are covered by peptides are indicated approximately by the bars on the top. The cylinder length does not correspond to the length of the transmembrane helix; it is for mapping purposes only.

      • Can the authors explain how they found that the Nbs bind to the cytoplasmic side (before obtaining the structure)?

      Our answer: Our in vivo two-hybrid assay between the Nb and MelBSt indicated their interaction on the cytoplasmic surface of MelBSt, which is further confirmed by the melibiose fermentation and transport assay, where the transport activities were completely inhibited by intracellularly coexpressed Nb and MelBSt. Thanks for raising this question.

      • The authors use the word "substrate" indifferently for sugar and Na+ binding, which is a bit confusing. Technically, only sugar is the substrate and Na+ is a ligand, or cotransported-ion, that powers the reaction of transport. This might sound like nit-picking but it can lead to misunderstandings (at some point I thought two sugars were transported, and then I was looking for the second Na+ binding site).

      Our answer: We used to refer to the sugar and Na+ as co-substrates, but we agree with this comment. We now use "substrate" for the cargo sugar and "coupling cation" for the driving cation.

      • Abstract "only the inner barrier" - the is missing.

      Thanks. We have corrected this.

      • p.3 intro "and identified that the positive cooperativity of cation and melibiose, " something is missing.

      Thanks again. We missed the “as the core symport mechanism”.

      • P.6 Nb275_4 instead of Nb725_4

      Thank you very much for your careful reading.

      • P.7. Also, affinity affinities

      Thank you very much. We changed to “; and also, the α-NPG affinity decreased by 21~32-fold for both Nbs”

      • P.8 "contains 417 MelBSt residues (positions 2-210, 219-355, and 364-432)". This does not sum up to 417 residues.

      Thanks for your critical reading. We changed 364-432 to 262-432.

      • p.9 Lue 54

      We have corrected it to Leu54.

      • I find fig.3 hard to read. Can the authors show the Na+ binding pockets and sugar binding pockets within the structure? Especially figure 3b. why are the residues in different colors?

      Our answer: We have moved Fig 3b into sFig. 11. We colored the residues in the previous Fig 3B to match the hosting helices. We have added two panels to show the location of both the sugar and Na+ in the molecule. Thank you for your comments.

      • Fig4 bcef. Colored circles at the end of the helices. What are they for?

      Our answer: We revised the legend. “The paired helices involved in either barrier formation were highlighted in the same colored circles.”

      • 86% coverage includes the his-tag - it would be good to clarify that.

      Our answer: Yes, it includes the 10-His tag.

      • Fig.7 - anti clockwise cycle of transport is counter-intuitive.

      Our answer: We have rearranged the cycle. Our model was originally constructed to explain efflux because of the limited information available at an earlier stage. More data are now available, allowing us to explain inflow and active transport.

      • Where are all the uptake plots per peptide for the HDX-MS data?

      Our answer: We have added the course raw data and prepared all uptake plots for all 71 peptides with statistically significant changes as an Extended Fig. 1.

      • P.22 protein was concentrated to 50 mg/mL. Really? That is a lot.

      This is correct. We can even concentrate MelBSt protein to greater than 50 mg/ml.

      • Have the authors looked into the potential role of lipids in regulating the conformational transition? Since the structure was obtained in nanodiscs, have they observed some unexplained densities? The role of lipid-protein interactions in regulating such transitions was observed for several transporters including MFS (Gupta K, et al. The role of interfacial lipids in stabilizing membrane protein oligomers. Nature. 2017 10.1038/nature20820. Martens C, et al. Direct protein-lipid interactions shape the conformational landscape of secondary transporters. Nat Commun. 2018 10.1038/s41467-018-06704-1.). Furthermore, I see the authors have already observed lipid specific functional regulation of MelB (ref: Hariharan, P., et al BMC Biol 16, 85 (2018). https://doi.org/10.1186/s12915-018-0553-0). A few words about this previous work, and even commenting on the absence of lipid-protein interactions in this current work is worthwhile.

      Our answer: Thanks for this very relevant comment. We paid attention to the unmodelled densities. One density is a potential candidate, but it is challenging to model. We have added the sentence “There is no unexplained density that can be clearly modeled by lipids.” to the Methods to address this concern.

      Reviewer #3 (Recommendations For The Authors):

      1) In the following sentence, the authors report high errors for the Kd value. The anti-Fab Nb binding to NabFab was two-fold poorer than Nb725_4 at a Kd value of 0.11 ± 0.16 μM. The figure however indicates that the error value is 0.016 µM. Please correct.

      Our answer: Thank you. You are correct. The value has been corrected to 0.16 ± 0.02 µM. In the revised manuscript, we present the data in nM units.

      2) Is the stoichiometry of the MelB:Na+ symport clearly known in this transporter? It can be mentioned in the discussion with appropriate references.

      Our answer: Yes, the stoichiometry of unity has been clearly determined, which was included in the second paragraph of the previous version.

      3) In the last section of results, the authors seem to suggest a greater movement within their C-terminal helical bundle compared to N-terminal helices. Is there evidence to suggest an asymmetry in the rocker switch between the two states of the transporter?

      Our answer: Our structural data revealed that the C-terminal bundle is more dynamic than the N-terminal bundle, which hosts the residues for specific binding of galactoside and Na+. The HDX data showed that the most dynamic regions are the C-terminal tail, which is structurally unresolved by either method, the conserved tail helix, and the middle-loop helix. The transmembrane helices are relatively less dynamic, with similar distributions across both transmembrane bundles. Since the most dynamic regions are peripheral elements associated with the C-terminal domain, this might give a wrong impression. With regard to symmetric or asymmetric movement, which will certainly affect the dynamic interactions between the transporter and the lipids, we favor the notion that MelBSt performs a symmetric movement during the rocker switch between inward and outward states, at the least cost for the protein-lipid interactions.

      4) Figure 1. Are the thermograms exothermic or endothermic? clarify

      Our answer: In our thermograms, all positive peaks are exothermic due to the direct detection of the heat release by the TA instrument. We clarified this in the Methods, and we now stress it in the figure legends to avoid confusion.

      5) Figure 4a,d. Please put in a membrane bilayer and depict cytosolic and extracellular compartments for clarity.

      Thank you. We have added a bilayer and labeled the sidedness in this figure and other related figures.

      6) Fig 7. Melibiose symport cannot be referred to as Melibiose efflux transport in the legend as the latter refers to antiport. Pls rectify.

      Our answer: Influx and efflux are conventionally used to describe the direction of movement of a substrate. The use of symport and antiport indicates the directions of the coupling reaction for the cargo and cation. For the symporter MelB, melibiose efflux means that sugar with the coupled cation moves out, which is driven by the melibiose concentration. During the steady state of melibiose active transport, efflux rate = influx rate.

      7) Page 11 "A common feature of carrier transporters". The authors can use either carriers or transporters. Need not use both simultaneously.

      Sorry for overlooking this. We have deleted carriers. Thank you very much for your time.

      8) Several typos were noticed in this manuscript. some are listed below. pls correct.

      Page 4- last paragraph "Furthermore"

      We have corrected it. Thank you again!

      Page 7 - second para one rephrase "affinity reduced by 21~32 fold/units.." pls clarify

      Added 21~32 fold.

      Page 9 - "so it is highly likely that inward-open conformation" pls correct.

      We have corrected to “likely”.

      Fig. S9c - correct the spelling "Distance".

      We have corrected to “Distance”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Major comments:

      1) The authors conclude that the bone growth defects are chondrocyte-specific, highlighting no changes in the IGF pathway. However, other bone cells such as mesenchymal progenitors, osteoblasts, osteocytes, and marrow stromal cells are also lateral plate mesoderm derived and likely have roles in the bone growth phenotypes (a). Additionally, while the size decrease of the proliferative zone was stated, no actual proliferation assays such as BrdU were conducted (b). With the elements being of such small size in the mutants, the defects are likely to be found at the earliest stages of limb development at E11.5-E13.5 and may be due to mesenchymal to chondrocyte transitions or defects in osteoblast lineage development (c). Overall, the skeletal characterization is not rigorous and does not identify even a likely cellular mechanism. Further, a molecular mechanism by which SMN functions in mesenchymal progenitors, chondrocytes, or osteoblast lineage cells has not been assessed (d).

      (a, c) As the reviewer commented, it is very important to evaluate whether there is any problem in embryonic development from the time of mesenchymal cell condensation in the limb bud to the formation of the primary ossification center. However, when Hensel et al. evaluated bone growth at P3 in severe SMA mice, the growth defect was not very large, with a control femur length of 3.5 mm and a mutant femur length of 3.2 mm. It seems that even when SMN defects occur, there is no major problem with endochondral bone formation in the embryonic period (Hensel et al., 2020).

      In this study, experiments quantifying SMN protein showed that the SMN2 1-copy mutant with the bone growth defect has a reduction in SMN protein similar to that of the severe SMA mouse model. When Hensel et al. performed an in vitro ossification test on primary osteoblasts from another severe SMA mouse model (Taiwanese severe SMA), they found no significant difference compared to controls. In femurs at P3 from severe SMA mice, they found no difference in bone voxel density or bone thickness (Hensel et al., 2020). In our data, bone thickness was not different in Figure 1 and Figure 1 – figure supplement 2, and BMD was actually greater. Thus, we believe that osteoblast and osteocyte function is not impaired by the absence of SMN. When we looked at cortical osteoblasts in our new Figure 1-figure supplement 2, there did not appear to be a significant difference in density.

      Furthermore, it is unlikely that BMSCs contributed to the bone growth we observed up to 2 weeks of age. The Lepr+Cxcl12+ BMSC population, which constitutes 94% ± 4% of CFU-F colonies formed by bone marrow cells (Zhou et al., 2014), is Prrx1-positive, and is known to be capable of osteogenesis in vivo, was only shown to differentiate into osteoblasts and form new bone in adults over 8 weeks of age. In the Lepr-cre; tdTomato; Col2.3-GFP mouse model, few cells expressing the osteoblast marker Col2.3-GFP are found before 2 months, and only about 3% of femur trabecular and cortical osteocytes express tdTomato at 2 months (Zhou et al., 2014). In the Cxcl12-CreER; tdTomato; Col2.3-GFP mouse model, the researchers did not find tomato positivity in osteoblasts and osteocytes even after administration of tamoxifen at P3 and analysis 1 year later (Matsushita et al., 2020).

      We, therefore, concluded that the bone growth abnormalities observed in SMN2 1-copy mutants are due to problems in endochondral ossification caused by chondrocyte defects and not due to other Prrx1-lineage skeletal cells.

      (b) According to the reviewer's suggestion, we evaluated cell proliferation in the new Figure 1J-L by performing immunostaining for the Ki67 proliferation marker in growth plates.

      (d) As the reviewer pointed out, we enhanced the mechanism study and found the reduction of chondrocyte-derived IGF signaling and hypertrophic marker in new Figure 2. We evaluated the density of osteoblasts and osteoclasts, which can affect bone mineralization. We highlighted the limited impact of BMSCs on bone growth in the first two weeks of life. In a previous study, SMN-deleted osteoblasts did not show any issues with ossification (Hensel et al., 2020). In fact, osteoblast density in the SMN2 1-copy mutant was not different from the control, indicating that the skeletal abnormalities can largely be attributed to deficiencies in endochondral ossification caused by chondrocytes. Since chondrocytes are the local source of IGF and our mutants exhibit phenotypes similar to mouse models with reduced IGF, such as downregulated expression of Igf1 and Igfbp3, downregulated IGF-induced hypertrophic gene expression, reduced AKT phosphorylation, proliferation, and growth plate zone length, SMN-deleted chondrocytes probably showed these phenotypes due to decreased IGF secretion. Now, we added new Figure 2A-C, and E.

      2) Is the liver the only organ/tissue that supplied IGF to the chondrocytes or are other lateral plate mesoderm-derived cells potential suppliers? It's not possible to pin SMN deletion in chondrocytes as intrinsic ignoring the other bone cell types that it is depleted from in the Prrx1Cre genetic model.

      Recently, Oichi et al. reported that the local IGF source in the growth plate is chondrocytes, based on in situ hybridization and p-AKT staining (Oichi et al., 2023). When we measured IGF in chondrocytes isolated from articular cartilage, the expression of Igf1 and Igfbp3 was markedly reduced in chondrocytes with SMN deletion compared to controls (New Figure 2E), suggesting that intrinsic SMN expression in chondrocytes plays an important role in the growth plate.

      3) Why is SMN protein being isolated from FAPs to assess levels in the null/SMN2 single copy/double copy mutants when the bone defects are supposed to be a chondrocyte-specific phenotype? This protein expression needs to be confirmed in chondrocytes themselves, and or other Prrx1Cre lineaged skeletal cells.

      According to the reviewer’s suggestion, we attempted to evaluate the protein levels in chondrocytes of the SMN2 1-copy mutant. However, we were unable to obtain sufficient numbers of chondrocytes because mutant chondrocytes proliferated poorly compared to controls in culture. We could obtain only ~10^4 viable cells from one SMN2 1-copy mutant mouse. Therefore, our only options for confirming SMN deletion in chondrocytes were DNA and RNA analyses. Because the amount of SMN protein in Prrx1-lineage FAPs correlates with the expression level of full-length SMN mRNA (Figure 2H-J), we expect that SMN protein in chondrocytes is likewise fully depleted, given the poor full-length SMN mRNA expression (Figure 2H).

      4) Figure 2E should have example images of each type of NMJ characterization.

      We revised our figure by adding the example images in new Figure 3E.

      5) What are the overall NMJ numbers in the normal formation period? Are these constant into the juvenile period when the authors say the deterioration occurs?

      We appreciate the reviewer's constructive comments, and it would be interesting to see if we could see a difference in the total number of NMJs. However, there is one NMJ in every myofiber, and each muscle has hundreds to thousands of myofibers. The technical difficulty of confocal imaging an entire muscle, which can be several millimeters across, precludes experiments that count every NMJ and show a difference. It may be possible to do so by combining clearing and confocal line scanning techniques. In our analysis of the NMJ, the formation of the NMJ in the mutant appears to be normal. Additionally, the number of myofibers seems to be the same, and there may be no difference in the total NMJ number.

      6) For transplantation experiments the authors sorted YFP or TOMATO+ cells from the Prrx1Cre mice muscles, but refer to them as FAPs. It is known that other cells including tenocyte-like cells, pericytes, and vascular smooth muscle cells are identified by this reporter line. Staining for TOMATO colocalization with PDGFRA would help to clarify this.

      In the ‘Hindlimb fibro-adipogenic progenitors isolation’ section of the Methods, the sorted 7AAD–Lin–Vcam–Sca1+ population is referred to as FAPs. For FAP transplantation, we also used YFP or TOMATO+ FAPs (7AAD–Lin–Vcam–Sca1+). The ‘FAPs transplantation’ method section did not specify the FAP population in detail; this has been fixed in the revised Methods. Sca1 (Ly6a), like Pdgfra, is an effective marker for identifying FAPs within Prrx1-lineage cells (Leinroth et al., 2022).

      7) The authors only compare the SMN2 single copy mutant transplantation to contralateral to show rescue, but how does this compare to overall wt morphology?

      According to the reviewer’s constructive comment, we compared them with wild-type morphology (new Figure 7A-D).

      8) The asterisks of TOMATO+ in Figure 6A are confusing. FAPs do not usually clump together to form such large plaques and are normally much thinner tendrils. What is the reason for this?

      As the reviewer states, FAPs have a fibroblast-like morphology with elongated, thin tendrils. The image in Figure 6A shows a Z-slice through the cell body of a FAP, where the nucleus is located, so it appears blunt. We have attached an image of Tomato+ FAPs in which the cell bodies appear plaque-like.

      Author response image 1.

      Tomato+ FAPs in muscle

      9) Would transplantation of healthy FAPs after NMJ maturation in SMN mutants still rescue the phenotype? Assessment of this is key for therapy intervention timelines moving forward.

      It would be very interesting to see whether transplantation of healthy FAPs after NMJ maturation improves the phenotype, but this is a technically difficult experiment because we found that FAPs do not engraft effectively when injected into naive adult muscle. Transplantation into adult muscle is feasible if accompanied by an injury, but injury eventually leads to new NMJ formation. Thus, it seems impossible to perform a transplantation experiment after NMJ maturation using general methods. If we discover a method to efficiently rescue SMNs from FAPs or identify a factor that affects FAPs' influence on NMJ, then we may be able to conduct this experiment.

      Reference

      Hensel, N., Brickwedde, H., Tsaknakis, K., Grages, A., Braunschweig, L., Lüders, K. A., Lorenz, H. M., Lippross, S., Walter, L. M., Tavassol, F., Lienenklaus, S., Neunaber, C., Claus, P., & Hell, A. K. (2020). Altered bone development with impaired cartilage formation precedes neuromuscular symptoms in spinal muscular atrophy. Human Molecular Genetics, 29(16), 2662–2673. https://doi.org/10.1093/hmg/ddaa145

      Leinroth, A. P., Mirando, A. J., Rouse, D., Kobayahsi, Y., Tata, P. R., Rueckert, H. E., Liao, Y., Long, J. T., Chakkalakal, J. V., & Hilton, M. J. (2022). Identification of distinct non-myogenic skeletal-muscle-resident mesenchymal cell populations. Cell Reports, 39(6), 110785. https://doi.org/10.1016/j.celrep.2022.110785

      Matsushita, Y., Nagata, M., Kozloff, K. M., Welch, J. D., Mizuhashi, K., Tokavanich, N., Hallett, S. A., Link, D. C., Nagasawa, T., Ono, W., & Ono, N. (2020). A Wnt-mediated transformation of the bone marrow stromal cell identity orchestrates skeletal regeneration. Nature Communications, 11(1). https://doi.org/10.1038/s41467-019-14029-w

      Oichi, T., Kodama, J., Wilson, K., Tian, H., Imamura Kawasawa, Y., Usami, Y., Oshima, Y., Saito, T., Tanaka, S., Iwamoto, M., Otsuru, S., & Enomoto-Iwamoto, M. (2023). Nutrient-regulated dynamics of chondroprogenitors in the postnatal murine growth plate. Bone Research, 11(1). https://doi.org/10.1038/s41413-023-00258-9

      Zhou, B. O., Yue, R., Murphy, M. M., Peyer, J. G., & Morrison, S. J. (2014). Leptin-receptor-expressing mesenchymal stromal cells represent the main source of bone formed by adult bone marrow. Cell Stem Cell, 15(2), 154–168. https://doi.org/10.1016/j.stem.2014.06.008

      Reviewer #2

      Major comments:

      1) Regarding bone deficits - CT analysis of bones should be more comprehensive than Figure 1A shows. How about cross-sections? (a) Are bone phenotypes also age-dependent? (b) PCR was done only for SMA and related proteins (such as IGF). IGF protein in the blood and relevant organs should be studied. Why not include biomarkers of osteoblasts or/and osteoclasts and their regulators? (c)

      (a) We appreciate the reviewer’s constructive comment. We added longitudinal section views in new Figure 1A and a description of trabecular bone volume and the secondary ossification center in the main text.

      (b) Age-dependent evaluation is an important point. By adulthood, the difference between the SMN2 1-copy mutant and the control is much larger, and even at birth there is a slight difference, although not as large as at 2 weeks of age. We focused our phenotyping on bone growth at 2 weeks of age, a time when new bone formation by BMSCs is less influential, when bone growth is primarily driven by endochondral ossification of chondrocytes, and before the defect in the NMJ is primarily manifested.

      (c) As the reviewer comments, it is important that IGF is evaluated in tissues other than the liver. However, the liver is most likely the source of systemic IGF, as shown by the liver-specific deletion of Igf1 and knockout of Igfals, a protein that forms the IGF ternary complex, which is predominantly expressed in the liver. This resulted in a 90% drop in serum IGF levels and a phenotype of shortened femur length and growth plates in the double KO mice (Yakar et al., 2002).

      The local IGF source in the growth plate is chondrocytes, as confirmed by Igf1 in situ hybridization and p-AKT staining (Oichi et al., 2023). From the in situ hybridization data, we can observe that bone marrow and bone do not express Igf1 at all, and that only the perichondrium and chondrocytes in the resting zone express Igf1 mRNA. Therefore, we conclude that the only supplier of IGF among LPM-derived cells is chondrocytes, and in the new Figure 2 we measured IGF pathway expression and AKT phosphorylation in chondrocytes. We have confirmed that the expression of Igf1/Igfbp3 is reduced in chondrocytes with SMN deletion.

      We could not measure serum IGF levels during the revision period because of the administrative procedures required to purchase new apparatus and the limits of our research funds. However, as previously stated, there is no difference in the expression of Igf1 and Igfals in the liver, which accounts for 90% of serum IGF levels. Therefore, we do not anticipate significant variations in serum IGF levels.

      Evaluation of osteoblasts and osteoclasts was done by section staining due to sampling difficulties for PCR. We assessed the state of osteoblasts and osteoclasts in new Figure 1-figure supplement 2.

      2) What is the relationship between bone deficits and muscle deficits or even NMJ deficits? Are they inter-related? Is skeletal muscle development also defective in Smn∆MPC mice? Can NMJ deficits result from bone deficits? Or vice versa?

      Unfortunately, the reviewer's questions are very difficult to address in our study using the Prrx1-cre model. Regarding skeletal muscle development, the myofiber number was not significantly different in our mouse models. A study has shown that inactivating noggin, a BMP antagonist expressed in condensed cartilage and immature chondrocytes, results in severe skeletal defects without affecting the early stages of muscle differentiation (Tylzanowski et al., 2006). Therefore, bone may not have a significant impact on the early development of muscle, but later in postnatal development it may contribute to motor performance issues. The relationship between bone and the NMJ has not been studied. The impact of bone defects on motor skills may result in muscle weakness and NMJ problems. In our study, we showed that the NMJ deficit is rescued by transplantation of FAPs and that IGF is decreased in chondrocytes, a key source of local IGF. This suggests that the functions of FAPs in the NMJ and of chondrocytes in the bone deficit are crucial, rather than the influence of one tissue on the other.

      3) Regarding the rescue experiment, the interpretation of the data should be careful. Evidently, healthy FAPs (td-Tomato positive) were transplanted into TA muscles of 10 days-old SMN2 1-copy SmnΔMPC mice, and NMJs were looked at P56. The control was contralateral TA that was injected with the vehicle. As described above, the data had huge SEM and were difficult to interpret or believe. The control perhaps was wrong if FAPs act by releasing "chemicals" because FAPs from one leg may go to other muscles via blood. Second, if FAPs act via contact, the data shown did not support this. Two red FAPs were shown in Figure 6, one of which was superimposed with a nerve track to one of the three NMJs. This NMJ however did not show any difference to the other two, which did not support a contact mechanism. These rescue data were not convincing.

      We appreciate the reviewer’s critical comment, but the reviewer appears to have confused the minimum and maximum range bars in the box-and-whisker plot with the SEM error bar in the bar graph. We apologize for the insufficient description in the figure legends, which we have revised. New Figure 7C, which is a bar graph, has a sufficiently short SEM error bar. In contrast, box-and-whisker plots B and D depict the minimum and maximum range instead of the SEM, and the groups are significantly different with a p-value of less than 0.001. If FAPs affect the NMJ via a paracrine factor or ECM with a short range of action, they may rescue the NMJ defect in a non-contact-dependent manner, without affecting the contralateral muscle. Also, FAPs are heterogeneous, so if only a certain subpopulation rescues, the tomato+ FAP in the figure may not be one of the rescuing cells.
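
      The distinction between the two kinds of bars can be illustrated with a short calculation. The following is a minimal sketch that assumes hypothetical measurements (not data from this study) and computes both the SEM, as drawn on a bar graph, and the minimum-to-maximum span, as drawn by the whiskers of a box-and-whisker plot. The function names `sem` and `whisker_range` are introduced here for illustration only.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def whisker_range(values):
    """Span covered by min-to-max whiskers in a box-and-whisker plot."""
    return min(values), max(values)


# Hypothetical measurements in arbitrary units (an assumption for this sketch).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

low, high = whisker_range(values)
print("mean:", statistics.mean(values))
print("SEM:", round(sem(values), 3))
print("min-max whiskers:", low, "to", high)
```

      Because the SEM shrinks with the square root of the sample size while the min-max range does not, whiskers drawn for the same data will generally look much longer than an SEM error bar.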

      4) For most experiments, the "n" numbers were too small. 3-5 mice were used for bone characterization. For the NMJ, most experiments were done with 3 mice. It was unclear how many NMJs were looked at. Perhaps due to small n numbers, the SEM values were enormous (for example, in Figure 6).

      As with the response to the previous comment, this is due to confusion between box-and-whisker plots and bar graphs, and our data were determined to be significant using the appropriate statistical method.

      5) Also for experimental design, some experiments included four genotypes of mice (Fig. 1 J,K) whereas some had only three (Fig.1 A, B, C, D and Fig.3) and others had two (many other figures).

      In the first experiments to confirm the phenotypes, we tested the 2-copy mutant, but it was not significantly different from the wild type, so in subsequent experiments we mainly tested only the 1-copy mutant.

      6) What was the reason why mixed muscles were used for NMJ characterization (TA versus EDL)? Why not pick a type I-fiber muscle and a type II-fiber muscle?

      We appreciate the constructive comment from the reviewer. First, we conducted a phenotype analysis on the TA muscle. For electrophysiological recording, the EDL muscle was used because it is technically suited to an intact nerve-muscle preparation. Additionally, for TEM imaging, the EDL was a suitable muscle in which to locate NMJ positions before TEM processing. The TA and EDL muscles are adjacent and have similar fiber-type compositions. It would be important to examine muscles of different fiber types, but when we first identified the phenotype, various limb muscles showed similar defects, so we focused on specific muscles.

      7) The description of mouse strains was confusing. SMN2 transgenic mice (with different copies) were not described in the methods.

      We apologize for the insufficient description in the Methods section. By crossing mice carrying the homozygous SMN2 allele (SMN2+/+), we obtained SMN2 heterozygous mice with only one SMN2 allele, which are the SMN2 1-copy mice (SMN2+/0), and SMN2 homozygous mice, which are the SMN2 2-copy mice (SMN2+/+). We have revised the ‘Animals’ section of the Methods.

      Reference

      Oichi, T., Kodama, J., Wilson, K., Tian, H., Imamura Kawasawa, Y., Usami, Y., Oshima, Y., Saito, T., Tanaka, S., Iwamoto, M., Otsuru, S., & Enomoto-Iwamoto, M. (2023). Nutrient-regulated dynamics of chondroprogenitors in the postnatal murine growth plate. Bone Research, 11(1). https://doi.org/10.1038/s41413-023-00258-9

      Tylzanowski, P., Mebis, L., and Luyten, F. P. (2006). The noggin null mouse phenotype is strain dependent and haploinsufficiency leads to skeletal defects. Dev. Dyn. 235, 1599–1607. doi: 10.1002/dvdy.20782

      Yakar, S., Rosen, C. J., Beamer, W. G., Ackert-Bicknell, C. L., Wu, Y., Liu, J. L., Ooi, G. T., Setser, J., Frystyk, J., Boisclair, Y. R., & LeRoith, D. (2002). Circulating levels of IGF-1 directly regulate bone growth and density. Journal of Clinical Investigation, 110(6), 771–781. https://doi.org/10.1172/JCI0215463

      Reviewer #3

      1) The authors used Prrx1Cre mouse with floxed Smn exon7(Smnf7) mouse carrying multiple (one or two) copies of the human SMN2 gene. Is it expressed both in chondrocytes and mesenchymal progenitors in the limb?

      We appreciate the reviewer's comment. We analyzed the deletion of Smn in chondrocytes and FAPs via Cre using genomic PCR and qRT-PCR, as depicted in new Figure 2. The SMN2 allele, which is expressed throughout the body, can rescue Smn knockout mouse lethality (Monani et al., 2000). Indeed, the short limb length and lethality observed in SMN2 0-copy mutants were mitigated by the presence of multiple copies of SMN2. Therefore, both chondrocytes and FAPs may express SMN2 transcripts from the transgenic SMN2 allele.

      2) Page 10 regarding Fig.2E, please show pretzel-like structure. In Figure 2E, plaque, perforated, open, and branched are shown; however, the pretzel is not shown. The same issue is for the Fig. 3D explanation in the text on page 12.

      We appreciate the reviewer's constructive feedback. We included illustrative figures of all types of NMJ characterization, and the branched type is identical to the pretzel type. Therefore, we have replaced ‘branched’ with ‘pretzel’ in our text and revised Figure 3E by incorporating the example images.

      3) The explanation of the electrophysiology for Fig.4 in the text on pages 12 and 15 (RRP) is not so convincing for the readers. It is advisable to add TEM data for transplantation if it is not technically difficult.

      We appreciate the reviewer's critical feedback. Because we did not measure RRP directly, we removed the speculation about a possible RRP difference. Although observing active zones and docked synaptic vesicles by TEM would help quantify the RRP, it is technically difficult to obtain images of sufficient quality to distinguish the active zones with our current TEM imaging technique.

      4) The authors used the word FAP for 7AAD(-)Lin(-)Vcam(-)Sca1(+). It is recommended to show the expression of PDGFR alpha. Furthermore, as the authors stated in the text, mesenchymal progenitors (FAPs) are heterogeneous. Please discuss this point further. Other reports show at least 6 subpopulations using single-cell analyses (Cell Rep. 2022).

      In that report, Ly6a (Sca1), like Pdgfra, is a good marker for FAPs (Leinroth et al., 2022). All 6 subpopulations expressed Ly6a. One of the subpopulations was found to be associated with the NMJ. This population expresses Hsd11b1, Gfra1, and Ret, is located adjacent to the NMJ, and responds to denervation, indicating an increased possibility of interaction with NMJ organization. In future studies, we aim to determine which subpopulations are crucial for NMJ maturation by transplanting them into mutants for rescue.

      5) How do authors determine the number of FAP cells for transplantation?

      The FAP transplantation was performed according to our previously reported study (Kim et al., 2022).

      Reference

      Kim, J. H., Kang, J. S., Yoo, K., Jeong, J., Park, I., Park, J. H., Rhee, J., Jeon, S., Jo, Y. W., Hann, S. H., Seo, M., Moon, S., Um, S. J., Seong, R. H., & Kong, Y. Y. (2022). Bap1/SMN axis in Dpp4+ skeletal muscle mesenchymal cells regulates the neuromuscular system. JCI Insight, 7(10). https://doi.org/10.1172/jci.insight.158380

      Leinroth, A. P., Mirando, A. J., Rouse, D., Kobayahsi, Y., Tata, P. R., Rueckert, H. E., Liao, Y., Long, J. T., Chakkalakal, J. V., & Hilton, M. J. (2022). Identification of distinct non-myogenic skeletal-muscle-resident mesenchymal cell populations. Cell Reports, 39(6), 110785. https://doi.org/10.1016/j.celrep.2022.110785

      Monani, U. R., Sendtner, M., Coovert, D. D., Parsons, D. W., Andreassi, C., Le, T. T., Jablonka, S., Schrank, B., Rossol, W., Prior, T. W., Morris, G. E., & Burghes, A. H. M. (2000). The human centromeric survival motor neuron gene (SMN2) rescues embryonic lethality in Smn(-/-) mice and results in a mouse with spinal muscular atrophy. Human Molecular Genetics, 9(3), 333–339. https://doi.org/10.1093/hmg/9.3.333

    1. Author Response

      eLife assessment

      In this valuable study, the authors investigate the transcriptional landscape of tuberculous meningitis, revealing key molecular differences contributed by HIV co-infection. Whilst some of the evidence presented is compelling, the bioinformatics analysis is limited to a descriptive narrative of gene-level functional annotations, which are somewhat basic and fail to define aspects of biology very precisely. Whilst the work will be of broad interest to the infectious disease community, validation of the data is critical for future utility.

      Response: We appreciate eLife’s positive assessment, although we challenge the conclusion that we ‘fail to define aspects of biology very precisely’. Our stated objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis and the eLife assessment affirms we have investigated ‘the transcriptional landscape of tuberculous meningitis’. To more precisely define aspects of the biology will require another study with different design and methods. Therefore the criticism seems unnecessarily harsh given the limitations of our stated objective.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Tuberculous meningitis (TBM) is one of the most severe forms of extrapulmonary TB. TBM is especially prevalent in people who are immunocompromised (e.g. HIV-positive). Delays in diagnosis and treatment could lead to severe disease or mortality. In this study, the authors performed the largest-ever host whole blood transcriptomics analysis on a cohort of 606 Vietnamese participants. The results indicated that TBM mortality is associated with increased neutrophil activation and decreased T and B cell activation pathways. Furthermore, increased angiogenesis was also observed in HIV-positive patients who died from TBM, whereas activated TNF signaling and down-regulated extracellular matrix organisation were seen in the HIV-negative group. Despite similarities in transcriptional profiles between PTB and TBM compared to healthy controls, inflammatory genes were more active in HIV-positive TBM. Finally, 4 hub genes (MCEMP1, NELL2, ZNF354C, and CD4) were identified as strong predictors of death from TBM.

      Strengths:

      This is a really impressive piece of work, both in terms of the size of the cohort which took years of effort to recruit, sample, and analyse, and also the meticulous bioinformatics performed. The biggest advantage of obtaining a whole blood signature is that it allows an easier translational development into a test that can be used in the clinic with a minimally invasive sample. Furthermore, the data from this study has also revealed important insights into the mechanisms associated with mortality and the differences in pathogenesis between HIV-positive and HIV-negative patients, which would have diagnostic and therapeutic implications.

      Weaknesses:

      The data on blood neutrophil count is really intriguing and seems to provide a very powerful yet easy-to-measure method to differentiate survival vs. death in TBM patients. It would be quite useful in this case to perform predictive analysis to see if neutrophil count alone, or in combination with gene signature, can predict (or better predict) mortality, as it would be far easier for clinical implementation than the RNA-based method. Moreover, genes associated with increased neutrophil activation and decreased T cell activation both have significantly higher enrichment scores in TBM (Figure 9) and in mortality (Figure 8). While I understand the basis of selecting hub genes in the significant modules, they often do not represent these biological pathways (at least not directly associated in most cases). If genes were selected based on these biologically relevant pathways, would they have better predictive values?

      Response: Blood neutrophil count was not found to be a predictor for TBM mortality in our previous studies. We agree it could be useful to perform predictive analysis with neutrophil count, as suggested by the reviewer. Regarding hub genes versus genes representative of the biological pathways, we cannot know which have better predictive values without performing variable selection on the set of all genes, including both hub genes and pathway-representative genes. This is an additional analysis that we will undertake.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the analysis of blood transcriptomic data from patients with TB meningitis, with and without HIV infection, with some comparison to those of patients with pulmonary tuberculosis and healthy volunteers. The objectives were to describe the comparative biological differences represented by the blood transcriptome in TBM associated with HIV co-infection or survival/mortality outcomes and to identify a blood transcriptional signature to predict these outcomes. The authors report an association between mortality and increased levels of acute inflammation and neutrophil activation, but decreased levels of adaptive immunity and T/B cell activation. They propose a 4-gene prognostic signature to predict mortality.

      Strengths:

      -Biological evaluations of blood transcriptomes in TB meningitis and their relationship to outcomes have not been extensively reported previously.

      -The size of the data set is a major strength and is likely to be used extensively for secondary analyses in this field of research.

      Weaknesses:

      The bioinformatic analysis is limited to a descriptive narrative of gene-level functional annotations curated in GO and KEGG databases. This analysis can not be used to make causal inferences. In addition, the functional annotations are limited to 'high-level' terms that fail to define biology very precisely. At best, they require independent validation for a given context. As a result, the conclusions are not adequately substantiated. The identification of a prognostic blood transcriptomic signature uses an unusual discovery approach that leverages weighted gene network analysis that underpins the bioinformatic analyses. However, the main problem is that authors seem to use all the data for discovery and do not undertake any true external validation of their gene signature. As a result, the proposed gene signature is likely to be overfitted to these data and not generalisable. Even this does not achieve significantly better prognostic discrimination than the existing clinical scoring.

      Response: As explained in response to the eLife assessment, our objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis. We agree that ‘This analysis can not be used to make causal inferences’: that would require different study design and approaches. The proposed gene signature has higher AUC values than the existing clinical model. We agree that validation of the gene signature in an independent sample set will be a crucial next step.
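
      To make the AUC comparison and the validation point above concrete, the following is a minimal sketch of how discrimination could be compared on a held-out cohort. All data are invented for illustration; the names `signature_score` and `clinical_score` are hypothetical and do not correspond to the authors' actual models or results.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: higher values should indicate higher predicted risk.
    labels: 1 for the event (e.g. death), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count pairs where the positive case is ranked above the negative one;
    # ties contribute one half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out cohort (NOT data from the study): outcomes and two
# competing risk scores evaluated on participants not used for discovery.
outcome = [1, 1, 1, 0, 0, 0, 0, 0]
signature_score = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1]
clinical_score = [0.8, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1, 0.1]

auc_signature = auc(signature_score, outcome)
auc_clinical = auc(clinical_score, outcome)
print(f"signature AUC = {auc_signature:.3f}, clinical AUC = {auc_clinical:.3f}")
```

      The relevant design choice is that both scores are evaluated on samples that played no part in selecting the genes; an AUC computed on the discovery data would tend to be optimistic, which is the overfitting concern raised in the review.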

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      Concerns Public Review:

      1)The framing of 'infinite possible types of conflict' feels like a strawman. While they might be true across stimuli (which may motivate a feature-based account of control), the authors explore the interpolation between two stimuli. Instead, this work provides confirmatory evidence that task difficulty is represented parametrically (e.g., consistent with literatures like n-back, multiple object tracking, and random dot motion). This parametric encoding is standard in feature-based attention, and it's not clear what the cognitive map framing is contributing.

      Suggestion:

      1) 'infinite combinations'. I'm frankly confused by the authors' response. I don't feel like the framing has changed very much, besides a few minor replacements. Previous work in MSIT (e.g., by the author Zhongzheng Fu) has looked at whether conflict levels are represented similarly across conflict types using multivariate analyses. In the paper mentioned by Ritz & Shenhav (2023), the authors looked at whether conflict levels are represented similarly across conflict types using multivariate analyses. It's not clear what this paper contributes theoretically beyond the connections to cognitive maps, which feel like an interpretative framework rather than a testable hypothesis (i.e., these previous papers could have framed their work as cognitive maps).

      Response: We acknowledge the limitations inherent in our experimental design, which prevents us from conducting a strict test of the cognitive space view. In our previous revision, we took steps to soften our conclusions and emphasize these limitations. However, we still believe that our study offers valuable and novel insights into the cognitive space, and the tests we conducted are not merely strawman arguments.

      Specifically, our study aimed to investigate the fundamental principles of the cognitive space view, as we stated in our manuscript that “the representations of different abstract information are organized continuously and the representational geometry in the cognitive space is determined by the similarity among the represented information (Bellmund et al., 2018)”. While previous research has applied multivariate analyses to understand cognitive control representation, no prior studies had directly tested the two key hypotheses associated with cognitive space: (1) that cognitive control representation across conflict types is continuous, and (2) that the similarity among representations of different conflict types is determined by their external similarity.

      Our study makes a unique contribution by directly testing these properties through a parametric manipulation of different conflict types. This approach differs significantly from previous studies in two ways. First, our parametric manipulation involves more than two levels of conflict similarity, enabling us to directly test the two critical hypotheses mentioned above. Unlike studies such as Fu et al. (2022) and others that have treated different conflict types categorically, we introduced a gradient change in conflict similarity. This differentiation allowed us to employ representational similarity analysis (RSA) over the conflict similarity, which goes beyond mere decoding as utilized in prior work (see more explanation below for the difference between Fu et al., 2022 and our study [1]).

      Second, our parametric manipulation of conflict types differs from previous studies that have manipulated task difficulty, and the modulation of multivariate pattern similarity observed in our study could not be attributed to task difficulty. Previous research, including Ritz & Shenhav (2023) (see explanation [2] below), has primarily shown that task difficulty modulates univoxel brain activation. Recent work by Wen & Egner (2023) reported a gradual change in the multivariate pattern of brain activations across a wide range of frontoparietal areas, supporting the reviewer’s idea that “task difficulty is represented parametrically”. However, we do not believe that our results reflect the task difficulty representation. For instance, in our study, the spatial Stroop-only and Simon-only conditions exhibited similar levels of difficulty, as indicated by their relatively comparable congruency effects (Fig. S1). Despite this similarity in difficulty, we found that the representational similarity between these two conditions was the lowest (see revised Fig. S4, the most off-diagonal value). This observation aligns more closely with our hypothesis that these two conditions are most dissimilar in terms of their conflict types.

      [1] Fu et al. (2022) offers important insights into the geometry of cognitive space for conflict processing. They demonstrated that Simon and flanker conflicts could be distinguished by a decoder that leverages the representational geometry within a multidimensional space. However, their model of cognitive space primarily relies on categorical definitions of conflict types (i.e., Simon versus flanker), rather than exploring a parametric manipulation of these conflict types. The categorical manipulations make it difficult to quantify conceptual similarity between conflict types and hence limit the ability to test whether neural representations of conflict capture conceptual similarity. To the best of our knowledge, no previous studies have manipulated the conflict types parametrically. This gap highlights a broader challenge within cognitive science: effectively manipulating and measuring similarity levels for conflicts, as well as other high-level cognitive processes, which are inherently abstract. We therefore believe our parametric manipulation of conflict types, despite its inevitable limitations, is an important contribution to the literature.

      We have incorporated the above statements into our revised manuscript: Methodological implications. Previous studies with mixed conflicts have applied mainly categorical manipulations of conflict types, such as the multi-source interference task (Fu et al., 2022) and color Stroop-Simon task (Liu et al., 2010). The categorical manipulations make it difficult to quantify conceptual similarity between conflict types and hence limit the ability to test whether neural representations of conflict capture conceptual similarity. To the best of our knowledge, no previous studies have manipulated the conflict types parametrically. This gap highlights a broader challenge within cognitive science: effectively manipulating and measuring similarity levels for conflicts, as well as other high-level cognitive processes, which are inherently abstract. The use of an experimental paradigm that permits parametric manipulation of conflict similarity provides a way to systematically investigate the organization of cognitive control, as well as its influence on adaptive behaviors.

      [2] The work by Ritz & Shenhav (2023) indeed applied multivariate analyses, but they did not test the representational similarity across different levels of task difficulty in the way that we investigated different levels of conflict types, nor did they manipulate conflict types as in our study. They first estimated univariate brain activations that were parametrically scaled by task difficulty (e.g., target coherence), yielding one map of parameter estimates (i.e., encoding subspace) for each of the target coherence and distractor congruence. The multivoxel patterns from the above maps were correlated to test whether the target coherence and distractor congruence share similar neural encoding. It is noteworthy that the encoding of task difficulty in their study is estimated at the univariate level, like the univariate parametric modulation analysis in our study. The representational similarity across target coherence and distractor congruence was a second-order test and did not reflect the similarity across different difficulty levels. However, we have found another study (Wen & Egner, 2023) that has directly tested the representational similarity across different levels of task difficulty, and they observed a higher representational similarity between conditions with similar difficulty levels within a wide range of brain regions.

      Reference:

      Wen, T., & Egner, T. (2023). Context-independent scaling of neural responses to task difficulty in the multiple-demand network. Cerebral Cortex, 33(10), 6013-6027. https://doi.org/10.1093/cercor/bhac479

      Fu, Z., Beam, D., Chung, J. M., Reed, C. M., Mamelak, A. N., Adolphs, R., & Rutishauser, U. (2022). The geometry of domain-general performance monitoring in the human medial frontal cortex. Science (New York, N.Y.), 376(6593), eabm9922. https://doi.org/10.1126/science.abm9922

      Ritz, H., & Shenhav, A. (2023). Orthogonal neural encoding of targets and distractors supports multivariate cognitive control. https://doi.org/10.1101/2022.12.01.518771

      Another issue is suggesting mixtures between two types of conflict may be many independent sources of conflict. Again, this feels like the strawman. There's a difference between infinite combinations of stimuli on the one hand, and levels of feature on the other hand. The issue of infinite stimuli is why people have proposed feature-based accounts, which are often parametric, eg color, size, orientation, spatial frequency. Mixing two forms of conflict is interesting, but the task limitations (i.e., highly correlated features) prevent an analysis of whether these are truly mixed (or eg reflect variations on just one of the conflict types). Without being able to compare a mixture between types vs levels of only one type, it's not clear what you can draw from these results re: how these are combined (and not clear how it reconciles the debate between general and specific).

      Response: As the reviewer pointed out, a feature (or a parameterization) is an efficient way to encode potentially infinite stimuli. This is the same idea as our hypothesis: different conflict types are represented in a cognitive space akin to concrete features such as a color spectrum. This concept can be illustrated in the figure below.

      Author response image 1.

      We would like to clarify that in our study we have manipulated five levels of conflict types, but they all originated from two fundamental sources: vertically spatial Stroop and horizontally Simon conflicts. We agree that the mixture of these two sources does not inherently generate additional conflict sources. However, this mixture does influence the similarity among different conflict conditions, which provides essential variability that is crucial for testing the core hypotheses (i.e., continuity and similarity modulation, see the response above) of the cognitive space view. This clarification is crucial as the reviewer’s impression might have been influenced by our introduction, where we repeatedly emphasized multiple sources of conflicts. Our aim in the introduction was to outline a broader conceptual framework, which might not directly reflect the specific design of our current study. Recognizing the possibility of misinterpretation, we have adjusted our introduction and discussion to place less emphasis on the variety of possible conflict sources. For example, we have removed the expression “The large variety of conflict sources implies that there may be innumerable number of conflict conditions” from the introduction. As we have addressed in the previous response, the observed conflict similarity effect could not be attributed merely to task difficulty. Similarly, the mixture of spatial Stroop and Simon conflicts should not be attributed to one conflict source only; doing so would oversimplify it to an issue of task difficulty, as it would imply that our manipulation of conflict types merely represented varying levels of a single conflict, akin to manipulating task difficulty when everything else is equal. 
Importantly, the mixed conditions differ from variations along a single conflict source in that they also incorporate components of the other conflict source, thereby introducing differences beyond those that would be found among variations of a single conflict source. Additional evidence also challenges the single-dimension assumption. In our previous revisions, we compared model fits between the Cognitive-Space model and the Stroop-/Simon-only models, and the results showed that the Cognitive-Space model (BIC = 5377093) outperformed the Stroop-Only (BIC = 5377122) and Simon-Only (BIC = 5377096) models. This suggests that mixed conflicts might not be solely reflective of either Stroop or Simon sources, although we did not include these results due to concerns raised by reviewers about the validity of such comparisons, given the high anticorrelation between the two dimensions. Furthermore, Fu et al. (2022) demonstrated that the mixture of Simon and Flanker conflicts (the sf condition) is represented as the vector sum of the Flanker and Simon dimensions within their space model, indicating a compositional nature. Similarly, our mixed conditions are combinations of Stroop and Simon conflicts, and it is plausible that these mixtures represent a fusion of both Stroop and Simon components, rather than just one. Thus, we disagree that the mixture of conflicts is a strawman. In response to this concern, we have included a statement in our limitation section: “Another limitation is that in our design, the spatial Stroop and Simon effects are highly anticorrelated. This constraint may cause the five conflict types to be represented in a unidimensional space (e.g., a circle) embedded in a 2D space. This limitation also means we cannot conclusively rule out the possibility of a real unidimensional space driven solely by spatial Stroop or Simon conflicts. 
However, this appears unlikely, as it would imply that our manipulation of conflict types merely represented varying levels of a single conflict, akin to manipulating task difficulty when everything else is equal. If task difficulty were the primary variable, we would expect to see greater representational similarity between task conditions of similar difficulty, such as the Stroop and Simon conditions, which demonstrate comparable congruency effects (see Fig. S1). Contrary to this, our findings reveal that the Stroop-only and Simon-only conditions exhibit the lowest representational similarity (Fig. S4). Furthermore, Fu et al. (2022) have shown that the representation of mixtures of Simon and Flanker conflicts was compositional, rather than reflecting a single dimension, which also applies to our case.”
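
      The BIC comparison quoted in this response can be restated as differences from the best-fitting model. The sketch below only does that arithmetic on the three reported values; the dictionary keys are labels chosen here for readability, not identifiers from the authors' code.

```python
# BIC values as reported in the response above (lower indicates better fit
# after penalising model complexity).
bic = {
    "Cognitive-Space": 5377093,
    "Stroop-Only": 5377122,
    "Simon-Only": 5377096,
}

best = min(bic, key=bic.get)
delta = {name: value - bic[best] for name, value in bic.items()}

for name, d in delta.items():
    print(f"{name}: delta BIC = {d}")
```

      Expressed this way, the margin over the Simon-Only model (3 units) is much smaller than the margin over the Stroop-Only model (29 units), which is in keeping with the caution the authors express about this comparison.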

      My recommendation would be to dramatically rewrite to reduce the framing of this providing critical evidence in favor of cognitive maps, and being more overt about the limitations of this task. However, the authors are not required to make further revisions in eLife's new model, and it's not clear how my scores would change if they made those revisions (ie the conceptual limitations would remain, the claims would just now match the more limited scope).

      Response: With the above rationales and the adjustments we have made in the manuscripts, we believe that we have thoroughly acknowledged and articulated the limitations of our study. Therefore, we have decided against a complete rewrite of the manuscript.

      Public Review:

      2) The representations within DLPFC appear to treat 100% Stroop and (to a lesser extent) 100% Simon differently than mixed trials. Within mixed trials, the RDM within this region doesn't strongly match the predictions of the conflict similarity model. It appears that there may be a more complex relationship encoded in this region.

      Suggestion:

      2) RSMs in the key region of interest. I don't really understand the authors' response here either, e.g., 'It is essential to clarify that our conclusions were based on the significant similarity modulation effect identified in our statistical analysis using the cosine similarity model, where we did not distinguish between the within-Stroop condition and the other four within-conflict conditions (Fig. 7A, now Fig. 8A). This means that the representation of conflict type was not biased by the seemingly disparities in the values shown here'. In Figure 1C, it does look like they are testing this model.

      It seems like a stronger validation would test just the mixture trials (i.e., ignoring Simon-only and Stroop-only). However, Simon/Stroop-only conditions being qualitatively different does beg the question of whether these are being represented parametrically vs categorically.

      Response: We apologize for the confusion caused by our previous response. To clarify, our conclusions have been drawn based on the robust conflict similarity effect.

      The conflict similarity regressor is defined by higher values in the diagonal cells (representing within-conflict similarity) than the off-diagonal cells (indicating between-conflict similarity), as illustrated in Fig. 1C and Fig. 8A (now Fig. 4B). It is important to note that this regressor may not be particularly sensitive to the variations within the diagonal cells. Our previous response aimed to emphasize that the inconsistencies observed along the diagonal do not contradict our core hypothesis regarding the conflict similarity effect.

      We recognized that the visualization in Fig. S4, based on the raw RSM (i.e., Pearson correlation), may have been influenced by regressors in our model other than the conflict similarity effect. To reflect pattern similarity with confounding factors controlled for, we have visualized the RSM by including only the fixed effect of the conflict similarity and the residual while excluding all other factors. As shown in the revised Figure S4, the difference between the within-Stroop and other diagonal cells was greatly reduced. Instead, it revealed a clear pattern where the diagonal values were higher than the off-diagonal values in the incongruent condition, aligning with our hypothesis regarding the conflict similarity modulator. Although some visual distinctions persist within the five diagonal cells (e.g., in the incongruent condition, the Stroop, Simon, and StMSmM conditions appear slightly lower than StHSmL and StLSmM conditions), follow-up one-way ANOVAs among these five diagonal conditions showed no significant differences. This held true for both incongruent and congruent conditions, with Fs < 1. Thus, we conclude that there is no strong evidence supporting the notion that Simon- and spatial Stroop-only conditions are systematically different from other conflict types. As a result, we decided not to exclude these two conflict types from analysis.

      Author response image 2.

      The stronger conflict type similarity effect in incongruent versus congruent conditions. Shown are the summary representational similarity matrices for the right 8C region in incongruent (left) and congruent (right) conditions, respectively. Each cell represents the averaged Pearson correlation (after regressing out all factors except the conflict similarity) of cells with the same conflict type and congruency in the 1400×1400 matrix. Note that the seeming disparities in the values of within-conflict cells (i.e., the diagonal) did not reach significance for either incongruent or congruent trials, Fs < 1.

      Public Review:

      3) To orthogonalize their variables, the authors need to employ a complex linear mixed effects analysis, with a potential influence of implementation details (e.g., high-level interactions and inflated degrees of freedom).

      Suggestion:

      3) The DF for a mixed model should not be the number of observations minus the number of fixed effects. The gold standard is to use Satterthwaite correction (e.g. in Matlab, fixedEffects(lme,'DFMethod','satterthwaite')), or number of subjects - number of fixed effects (i.e. you want to generalize to new subjects, not just new samples from the same subjects). Honestly, running a 4-way interaction is probably using more degrees of freedom than are appropriate given the number of subjects.

      Response: We concur with the reviewer’s comment that our previous estimation of degrees of freedom (DFs) was inaccurate. Following your suggestion, we have now applied the “Satterthwaite” approach to approximate the DFs for all our linear mixed effect model analyses. This adjustment has led to the correction of both DFs and p values. In the Methods section, we have mentioned this revision.

      “We adjusted the t and p values with the degrees of freedom calculated through the Satterthwaite approximation method (Satterthwaite, 1946). Of note, this approach was applied to all the mixed-effect model analyses in this study.”
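
      For readers unfamiliar with the idea behind the Satterthwaite approximation, the sketch below shows its classic two-sample form (often called the Welch-Satterthwaite equation). This is only an intuition aid: the version applied to linear mixed-effects models, as in the MATLAB call quoted by the reviewer, is more involved, and the function name and example numbers here are hypothetical.

```python
def welch_satterthwaite_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for the difference of two means
    when the two groups have unequal variances (two-sample case only)."""
    a = var1 / n1  # squared standard error contributed by group 1
    b = var2 / n2  # squared standard error contributed by group 2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Equal variances and equal group sizes recover the pooled value n1 + n2 - 2.
df_equal = welch_satterthwaite_df(4.0, 10, 4.0, 10)

# Strongly unequal variances pull the estimate towards n - 1 of the noisier
# group, so the effective degrees of freedom shrink.
df_unequal = welch_satterthwaite_df(1.0, 10, 100.0, 10)

print(round(df_equal, 2), round(df_unequal, 2))
```

      The general point carries over to mixed models: the effective degrees of freedom depend on the estimated variance components rather than on the raw number of observations, which is why the correction lowers the degrees of freedom relative to "observations minus fixed effects".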

      The application of this method has indeed resulted in a reduction of our statistical significance. However, our overall conclusions remained robust. Instead of the highly stringent threshold used in our previous version (Bonferroni corrected p < .0001), we have now adopted a relatively more lenient threshold of Bonferroni correction at p < 0.05, which is commonly employed in the literature. Furthermore, it is worth noting that the follow-up criteria 2 and 3 are inherently second-order analyses. Criterion 2 involves examining the interaction effect (conflict similarity effect difference between incongruent and congruent conditions), and criterion 3 involves individual correlation analyses. Due to their second-order nature, these criteria inherently have lower statistical power compared to criterion 1 (Blake & Gangestad, 2020). We thus have applied a more lenient but still typically acceptable false discovery rate (FDR) correction to criteria 2 and 3. This adjustment helps maintain the rigor of our analysis while considering the inherent differences in statistical power across the various criteria. We have mentioned this revision in our manuscript:

      “We next tested whether these regions were related to cognitive control by comparing the strength of conflict similarity effect between incongruent and congruent conditions (criterion 2) and correlating the strength to behavioral similarity modulation effect (criterion 3). Given these two criteria pertain to second-order analyses (interaction or individual analyses) and thus might have lower statistical power (Blake & Gangestad, 2020), we applied a more lenient threshold using false discovery rate (FDR) correction (Benjamini & Hochberg, 1995) on the above-mentioned regions.”
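
      The difference between the two corrections mentioned above can be illustrated with a short sketch. The p values are invented for illustration and are not results from the study; the function names are likewise hypothetical.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject H0 where p < alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]


def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= k / m * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            reject[idx] = True
    return reject


# Hypothetical p values for five candidate regions (not from the study).
p_values = [0.001, 0.012, 0.021, 0.035, 0.30]
bonf = bonferroni_reject(p_values)
fdr = benjamini_hochberg_reject(p_values)
print("Bonferroni:", bonf)
print("FDR (BH):  ", fdr)
```

      With these made-up values, Bonferroni retains only the smallest p value while the Benjamini-Hochberg procedure retains four of the five, which illustrates why FDR correction is described as the more lenient choice for the lower-powered second-order tests.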

      With these adjustments, we consistently identified similar brain regions as observed in our previous version. Specifically, we found that only the right 8C region met the three criteria in the conflict similarity analysis. In addition, the regions meeting the criteria for the orientation effect included the FEF and IP2 in the left hemisphere, and V1, V2, POS1, and PF in the right hemisphere. We have thoroughly revised the description of our results, and updated the figures and tables in both the revised manuscript and supplementary material to accurately reflect these outcomes.

      Reference:

      Blake, K. R., & Gangestad, S. (2020). On Attenuated Interactions, Measurement Error, and Statistical Power: Guidelines for Social and Personality Psychologists. Pers Soc Psychol Bull, 46(12), 1702-1711. https://doi.org/10.1177/0146167220913363

      Minor:

      1. Figure 8 should come much earlier (e.g, incorporated into Figure 1), and there should be consistent terms for 'cognitive map' and 'conflict similarity'.

      Response: We appreciate this suggestion. Considering that Figure 7 (“The cross-subject RSA model and the rationale”) also describes the models, we have merged Figures 7 and 8 and moved the new figure ahead, before we report the RSA results. You can now find it as the new Figure 4, see below. We did not incorporate them into Figure 1 since Figure 1 is already too crowded.

      Author response image 3.

      Fig. 4. Rationale of the cross-subject RSA model and the schematic of key RSMs. (A) The RSM is calculated as the Pearson’s correlation between each pair of conditions across the 35 subjects. For 17 subjects, the stimuli were displayed on the top-left and bottom-right quadrants, and they were asked to respond with left hand to the upward arrow and right hand to the downward arrow. For the other 18 subjects, the stimuli were displayed on the top-right and bottom-left quadrants, and they were asked to respond with left hand to the downward arrow and right hand to the upward arrow. Within each subject, the conflict type and orientation regressors were perfectly covaried. For instance, the same conflict type will always be on the same orientation. To de-correlate conflict type and orientation effects, we conducted the RSA across subjects from different groups. For example, the bottom-right panel highlights the example conditions that are orthogonal to each other on the orientation, response, and Simon distractor, whereas their conflict type, target and spatial Stroop distractor are the same. The dashed boxes show the possible target locations for different conditions. (B) and (C) show the orthogonality between conflict similarity and orientation RSMs. The within-subject RSMs (e.g., Group1-Group1) for conflict similarity and orientation are all the same, but the cross-group correlations (e.g., Group2-Group1) are different. Therefore, we can separate the contribution of these two effects when including them as different regressors in the same linear regression model. (D) and (E) show the two alternative models. Like the cosine model (B), within-group trial pairs resemble between-group trial pairs in these two models. The domain-specific model is an identity matrix. The domain-general model is estimated from the absolute difference of behavioral congruency effect, but scaled to 0 (lowest similarity) – 1 (highest similarity) to aid comparison. 
The plotted matrices in B-E include only one subject each from Group 1 and Group 2. Numbers 1-5 indicate the conflict type conditions, for spatial Stroop, StHSmL, StMSmM, StLSmH, and Simon, respectively. The thin lines separate four different sub-conditions, i.e., target arrow (up, down) × congruency (incongruent, congruent), within each conflict type.

      In our manuscript, the term “cognitive map/space” was used when explaining the results from a theoretical perspective, whereas “conflict similarity” was used to describe the regressor within the RSA. These terms serve distinct purposes in our study and cannot be used interchangeably. Therefore, we have retained them in their current format. However, we recognize that the initial introduction of the “Cognitive-Space model” may have appeared somewhat abrupt. To address this, we have included a brief explanatory note: “The model described above employs the cosine similarity measure to define conflict similarity and will be referred to as the Cognitive-Space model.”
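
      As an illustration of what a cosine-based conflict similarity regressor could look like next to the identity-matrix (domain-specific) alternative described in the Fig. 4 legend, here is a minimal sketch. The equal angular spacing of the five conflict types between 0° and 90°, and the assignment of the spatial Stroop component to the first axis, are assumptions made for this example and are not taken from the response.

```python
import math

# Assumed coding: each conflict type is a direction in a 2-D space whose axes
# are the spatial Stroop and Simon components. Equal spacing is hypothetical.
conflict_types = ["Stroop", "StHSmL", "StMSmM", "StLSmH", "Simon"]
angles_deg = [0.0, 22.5, 45.0, 67.5, 90.0]


def unit_vector(angle_deg):
    """Return (Stroop component, Simon component) for a given angle."""
    theta = math.radians(angle_deg)
    return (math.cos(theta), math.sin(theta))


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vectors = [unit_vector(a) for a in angles_deg]

# Cosine model: similarity falls off gradually with angular distance.
cognitive_space = [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Domain-specific model: an identity matrix (each type resembles only itself).
n = len(conflict_types)
domain_specific = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

for name, row in zip(conflict_types, cognitive_space):
    print(name.ljust(7), " ".join(f"{value:5.2f}" for value in row))
```

      Under these assumptions the cosine matrix is highest on the diagonal and lowest for the Stroop-only versus Simon-only pair, whereas the identity matrix treats all between-type pairs as equally dissimilar. That graded versus all-or-none contrast is what allows the two models to be distinguished as regressors.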

    1. Author Response

      Author responses to the original review:

      The data we produce are not criticized as such and thus, do not require revision; the criticisms concern our interpretation of them. General themes of the reviews are that i) genetic signatures do not matter for defining neuronal types (here sympathetic versus parasympathetic); ii) that a cholinergic postganglionic autonomic neuron must be parasympathetic; and iii) that some physiology of the pelvic region would deserve the label “parasympathetic”. We answered the latter argument in (Espinosa-Medina et al., 2018) to which we refer the interested reader; and we fully disagree with the first two. Of note, part of the last sentence of the eLife assessment is misleading and does not reflect the referees’ comments. Our paper analyses genetic differences between the cranial and sacral outflow and uses them to argue that they cannot be both parasympathetic. The eLife assessment acknowledges the “genetic differences” but concludes that, somehow, they don’t detract from a common parasympathetic identity. We take issue with this paradox, of course, but it is coherent with the referee’s comments. On the other hand, the eLife assessment alone pushes the paradox one step further by stating that “functional differences” between the cranial and sacral outflows can’t either prevent them from being both parasympathetic. We would also object to this, but the only “functional differences” used by the referees to dismiss our diagnostic of a sympathetic-like character (rather than parasympathetic) for the sacral outflow are between noradrenergic and cholinergic, and between sympathetic and parasympathetic (and we also disagree with those, see above, and below) —not between cranial and sacral.

      We will thus use the opportunity offered by eLife to keep the paper as it is (with a few minor stylistic changes). We respond below to the referees’ detailed remarks and hope that the publication, as per eLife’s new model, of the paper, the referees’ comments and our response will help move the field forward.

      Public review by Referee #1

      “Consistently, the P3 cluster of neurons is located close to sympathetic neuron clusters on the map, echoing the conventional understanding that the pelvic ganglia are mixed, containing both sympathetic and parasympathetic neurons”.

      The greater closeness of P3 than of P1/2/4 to the sympathetic cluster can be used to judge P1/2/4 less sympathetic than P3 (and more… something else), but not more parasympathetic. There is no echo of the “conventional understanding” here.

      “A closer look at the expression showed that some genes are expressed at higher levels in sympathetic neurons and in P2 cluster neurons [we assume that the referee means “in sympathetic neurons and in P3 cluster neurons”] but much weaker in P1, P2, and P4 neurons such as Islet1 and GATA2, and the opposite is true for SST. Another set of genes is expressed weakly across clusters, like HoxC6, HoxD4, GM30648, SHISA9, and TBX20.”

      These statements are inaccurate. First, the classification is not based on a visual impression of the heatmap but on calculations using thresholds. Admittedly, the thresholds have an arbitrary aspect, but the referee can verify (by visual inspection of the heatmap) that genes which we calculate as being at “higher levels in sympathetic neurons and in P3 cluster neurons, but much weaker in P1, P2, and P4 neurons” or vice versa, i.e. noradrenergic or cholinergic neurons (genes from groups V and VI, respectively), show a much bigger difference than those cited by the referee, and indeed are quasi-absent from the weaker clusters or ganglia. In addition, even by subjective visual inspection:

      Islet is equally expressed in P4 and sympathetics.

      SST is equally expressed in P1 and sympathetics.

      Tbx20 is equally expressed in P2 and sympathetics.

      HoxC6, HoxD4, GM30648, SHISA9 are equally expressed in all clusters and all sympathetic ganglia.

      “Since the pelvic ganglia are in a caudal body part, it is not surprising to have genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa (to have genes expressed in sphenopalatine ganglia, but not in pelvic ganglia), according to well recognized rostro-caudal body patterning, such as nested expression of hox genes.”

      We do not simply show “genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa”, i.e. a genetic distance between pelvic and sphenopalatine, but many genes expressed in all pelvic cells and sympathetic ones, i.e. a genetic proximity between pelvic and sympathetic. This situation can be deemed “unsurprising”, but it can only be used to question the parasympathetic nature of pelvic cells (as we do), or considered irrelevant (as the referee does, because genes would not define cell types, see our response to an equivalent stance by Referee#2). Concerning Hox genes, we do take them into account, and speculate in the discussion that their nested expression is key to the structure of the autonomic nervous system, including its division into sympathetic and parasympathetic outflows.

      It is much simpler and easier to divide the autonomic nervous system into sympathetic neurons that release noradrenaline versus parasympathetic neurons that release acetylcholine, and these two systems often act in antagonistic manners, though in some cases, these two systems can work synergistically. It also does not matter whether or not pelvic cholinergic neurons could receive inputs from thoracic-lumbar preganglionic neurons (PGNs), not just sacral PGNs; such occurrence only represents a minor revision of the anatomy. In fact, it makes much more sense to call those cholinergic neurons located in the sympathetic chain ganglia parasympathetic.

      This “minor revision of the anatomy” would make spinal preganglionic neurons that are universally considered sympathetic (in the thoraco-lumbar cord) synapse onto large numbers of parasympathetic neurons (in the paravertebral chains for sweat glands and periosteum, and in the pelvic ganglion), robbing these terms of any meaning.

      Thus, from the functionality point of view, it is not justified to claim that "pelvic organs receive no parasympathetic innervation".

      There never was any general or rigorous functional definition of the sympathetic and parasympathetic nervous systems — it is striking, almost ironic, that Langley, creator of the term parasympathetic and the ultimate physiologist, provides an exclusively anatomic definition in his Autonomic Nervous System, Part I. Hence, our definition cannot clash with any “functionality point of view”. In fact, as we briefly say in the discussion and explore in (Espinosa-Medina et al., 2018), it is the “sacral parasympathetic” paradigm which is unjustified from a functionality point of view, for implying a functional antagonism across the lumbo-sacral gap, which has been disproven repeatedly. It remains to be determined which neurons are antagonistic to which on the blood vessels of the external genitals; antagonism within one division of the autonomic nervous system would not be without precedent (e.g. there exist both vasoconstrictor and vasodilator sympathetic neurons, and both, inhibitor and activator enteric motoneurons). The way to this question is finally open to research, and as referee#2 says “it is early days”.

      Public review by Referee #2

      This work further documents differences between the cranial and sacral parasympathetic outflows that have been known since the time of Langley - 100 years ago.

      We assume that the referee means that it is the “cranial and sacral parasympathetic outflows” which “have been known since the time of Langley”, not their differences (that we would “further document”): the differences were explicitly negated by Langley. As a matter of fact, the sacral and cranial outflows were first likened to each other by Gaskell, 140 years ago (Gaskell, 1886). This anatomic parallel (which is deeply flawed (Espinosa-Medina et al., 2018)) was inherited wholesale by Langley, who added one physiological argument (Langley and Anderson, 1895) (which has been contested many times (Espinosa-Medina et al., 2018) and references within).

      In addition, the sphenopalatine and other cranial ganglia develop from placodes and the neural crest, while sympathetic and sacral ganglia develop from the neural crest alone.

      Contrary to what the referee says, the sphenopalatine has no placodal contribution. There is no placodal contribution to any autonomic ganglion, sympathetic or parasympathetic (except an isolated claim concerning the ciliary ganglion (Lee et al., 2003)). All autonomic ganglia derive from the neural crest as determined a long time ago in chicken. For the sphenopalatine in mouse, see our own work (Espinosa-Medina et al., 2016).

      One feature that seems to set the pelvic ganglion apart is […] the convergence of preganglionic sympathetic and parasympathetic synapses on individual ganglion cells (Figure 3). This unusual organization has been reported before using microelectrode recordings (see Crowcroft and Szurszewski, J Physiol (1971) and Janig and McLachlan, Physiol Rev (1987)). Anatomical evidence of convergence in the pelvic ganglion has been reported by Keast, Neuroscience (1995).

      Contrary to what the referee says, we do not provide in Figure 3 any evidence for anatomic convergence, i.e. for individual pelvic ganglion cells receiving dual lumbar and sacral inputs. We simply show that cholinergic neurons figure prominently among targets of the lumbar pathway. This said, the convergence of both pathways on the same pelvic neurons, described in the references cited by the referee, is another major problem in the theory of the “sacral parasympathetic” (as we discussed previously (Espinosa-Medina et al., 2018)).

      It should also be noted that the anatomy of the pelvic ganglion in male rodents is unique. Unlike other species where the ganglion forms a distributed plexus of mini-ganglia, in male rodents the ganglion coalesces into one structure that is easier to find and study. Interestingly the image in Figure 3A appears to show a clustering of Chat-positive and Th-positive neurons. Does this result from the developmental fusion of mini ganglia having distinct sympathetic and parasympathetic origins?

      The clustering of Chat-positive and Th-positive cells could arise from a number of developmental mechanisms, about which we know nothing at the moment. This has no bearing on the sympathetic versus parasympathetic question.

      In addition, Brunet et al. dismiss the cholinergic and noradrenergic phenotypes as a basis for defining sympathetic and parasympathetic neurons. However, see the bottom of Figure S4 and further counterarguments in Horn (Clin Auton Res (2018)).

      The bottom of Figure S4 simply indicates which cells are cholinergic and adrenergic. We have already expounded many times that noradrenergic and cholinergic do not coincide with sympathetic and parasympathetic. Henry Dale (Nobel Prize 1936) demonstrated this. Langley himself devoted several pages of his final treatise to this exception to his “Theory on the relation of drugs to nerve system” (Langley, 1921) (p43) (which was actually a bigger problem for him than it is for us, for reasons that are too long to recount here; it is as if the theoretical difficulties experienced by Langley had been internalized to this day in the form of a dismissal of the cholinergic sympathetic neurons as a slightly scandalous but altogether forgettable oddity). Horn (2018) reviews the evidence that the thoracic cholinergic sympathetic phenotype is brought about by a secondary switch upon interaction with the target and argues that this would be a fundamental difference with the sacral “parasympathetic”. But in fact the secondary switch is preceded by co-expression of ChAT and VAChT with Th in most sympathetic neurons (reviewed in Ernsberger and Rohrer, 2018), and we have no idea of the dynamic in the pelvic ganglion. It may also be mentioned in this context that target-dependent specification of neuronal identity has also been demonstrated for other types of sympathetic neurons (Furlan et al., 2016).

      What then about neuropeptides, whose expression pattern is incompatible with the revised nomenclature proposed by Brunet et al.?

      There was never any neuropeptide-inspired criterion for a nomenclature of the autonomic nervous system.

      Figure 1B indicates that VIP is expressed by sacral and cranial ganglion cells, but not thoracolumbar ganglion cells.

      Contrary to what the referee says, there are VIP-positive cells in our sympathetic data set and even strongly positive ones, except they are scattered and few (red bars on the UMAP). They correspond to cholinergic sympathetics, likely sudomotor, which are known to contain VIP (e.g. Anderson et al., 2006; Stanke et al., 2006). In other words, VIP is probably part of what we call the cholinergic synexpression group (but was not placed in it by our calculations, probably because of a low expression level in sympathetic noradrenergic cells).

      The authors do not mention neuropeptide Y (NPY). The immunocytochemistry literature indicates that NPY is expressed by a large subpopulation of sympathetic neurons but never by sacral or cranial parasympathetic neurons.

      Contrary to what the referee says, Keast (Keast, 1995) finds 3.7% of pelvic neurons double stained for NPY and VIP in male rats, and says (Keast, 2006) that in females “co-expression of NPY and VIP is common” (thus in cholinergic neurons that the referee calls “parasympathetic”). Single cell transcriptomics is probably more sensitive than immunochemistry, and in our dichotomized data set (table S1), NPY is expressed in all pelvic clusters and all sympathetic ganglia. In other words, it is one more argument for their kinship. It does not appear in the heatmap because it ranks below the 100 top genes.

      Answer to the original recommendations by Referee #2

      Introduction - the use of the words 'consensual' and 'promiscuity' are not clear and rather loaded in the context of the pelvic ganglia. Pick alternative words.

      There is no sexual innuendo inherent in “promiscuity”: “condition of elements of different kinds grouped or massed together without order” (Oxford English Dictionary). We replaced “never consensual” by “never generally accepted”.

      Results - Page 2 - what sex were the mice? Previous works indicate significant sexual dimorphism in the pelvic ganglion.

      The mice included both males and females, and male and female cells are represented in all ganglia and clusters. This is now mentioned in the Material and Methods. Thus, however unsuited to analyze sexual dimorphism, our data set ensures that all the cell types we describe are qualitatively present in both sexes.

      Results line 3 - the celiac and mesenteric ganglia are prevertebral ganglia and not part of the sympathetic chain. The chain refers to the paravertebral ganglia.

      We replaced “part of the prevertebral chain” by “belonging to prevertebral ganglia”. This said, there are precedents for “prevertebral chain ganglia” to designate the rostro-caudal series of prevertebral ganglia. Rita Levi-Montalcini, for example, who devoted her glorious career to sympathetic ganglia, writes in 1972 “The nerve cell population of para- and prevertebral chain ganglia is reduced to 3–5% of that of controls”. (10.1016/0006-8993(72)90405-2).

      Page 3 - "as the current dogma implies". Dogma often refers to opinion or church doctrine. The current nomenclature is neither. Pick another word.

      There is little in science that is proven to the point of eliminating any element of opinion. “Dogma” refers to “that which is held as a principle or tenet […], especially a tenet authoritatively laid down by […] a school of thought” (OED). And “dogma” is used in science to designate tenets better experimentally supported than the “sacral parasympathetic”, such as the “central dogma of molecular biology”.

      Page 3 - "To give justice" implies the classical notion is unjust. How about, 'to further explore previous evidence indicating that ....'

      The term is indeed not proper English for the meaning intended, and the right expression is “to do justice”, to mean: “to treat [a subject or thing] in a manner showing due appreciation, to deal with [it] as is right or fitting” (OED). We have corrected the paper accordingly.

      Page 4 top - the convergence indicated by Figure 3 does not justify excluding cholinergic and noradrenergic genes from the analysis.

      Contrary to what the referee says, Figure 3 does not show any “convergence”, see our answer to Referee#1. What Figure 3 shows is that cells that are targeted by the lumbar pathway (a pathway universally deemed “sympathetic”) are cholinergic in massive proportion. Therefore, by an uncontroversial criterion, the pelvic ganglion contains lots of sympathetic cholinergic neurons. The only other option is to declare that sympathetic preganglionic neurons synapse onto parasympathetic postganglionic ones (which is what Referee#1 proposes, and considers “much simpler”. We beg to differ).

      Our justification for excluding cholinergic and noradrenergic genes from the definition of “sympathetic” and “parasympathetic” is simply that sympathetic neurons can be cholinergic (those innervating sweat glands and periosteum and, as we show in Figure 3, many targets of the lumbar pathway). One can also note that anywhere else in the nervous system, classifying cell types as a function of neurotransmitter phenotype would lead to nonsensical descriptions, such as putting together pyramidal cells and cerebellar granules, or motor neurons and basal forebrain cholinergic neurons. Indeed Referee#1 proposes such a revolutionary revision, by calling all cholinergic autonomic neurons “parasympathetic” (see our answer above).

      Keast (1995) did similar experiments and used presynaptic lesions to draw a different conclusion indicating preferential innervation pelvic subpopulations.

      Keast found “preferential” innervation of pelvic subpopulations based on lesion experiments. Nevertheless, she concluded (at the time) that “the correct definition of these two components of the nervous system is based on neuroanatomy rather than chemistry” (Keast, 2006).

      Page 4 - "In the aggregate, the pelvic ganglion is best described as a divergent sympathetic ganglion devoid of parasympathetic neurons" The notion of a divergent ganglion is completely unclear!

      We take “divergent” in a developmental or evolutionary meaning: related to sympathetic ganglia, yet somewhat differing from them. Elsewhere we use the word “modified”. Importantly (and as cited in the paper), a similar situation emerges from the single cell transcriptomic analysis of the lumbar and sacral preganglionics (by other research groups).

      Granted, it is devoid of neurons having the signature of cranial parasympathetics, but that is insufficient to conclude that they are not parasympathetics.

      If a genetic signature which is not only un-parasympathetic, but sympathetic-like remains compatible with some version of the label “parasympathetic”, we get dangerously close to dismissing the molecular make-up of a neuron as a definition of its type. This goes against any contemporary understanding of neuron types (take (Zeisel et al., 2018) among hundreds of other examples).

      Page 4 - "the entire taxonomy of autonomic ganglia could be a developmental readout of Hox genes." This reader completely agrees! We appreciate this would be difficult to test but it helps to explain possible differences along the rostro-caudal axis. Consider making this a key implication of the study!

      If the reader agrees, then his/her previous points become mysterious: we speculate that the Hox code determines the structure of the autonomic nervous system, i.e. the array, along the rostrocaudal axis, of a bulbar parasympathetic, a thoracolumbar sympathetic and lumbo-sacral “pelvo-sympathetic”. The existence of caudal parasympathetic neurons, on the contrary, would subvert any role for Hox genes: similar neurons (similar enough to be called by the same name) would arise at completely different rostro-caudal levels, i.e. with a different Hox code.

      Page 5 - "It is thus remarkable ...that we uncover in no way contradicts the physiology." Not really. The 'classical' sympathetic system innervates the limbs, and the skin and it participates in thermoregulation and in cardiovascular adjustments to exercise. The parasympathetic system does none of these things. Reclassing the pelvic outflow as pseudo-sympathetic contradicts this physiology.

      We do not say that the sacral outflow is classically sympathetic; we go all the way to proposing the special name “pelvo-sympathetic”; and we insist that these special sympathetic-like neurons have special targets (detrusor muscle, helicine arteries…): there is no contradiction. Not only is there no contradiction, but we remove the mind-twister of an anatomical/genetic/cell type-based “sacral parasympathetic” combined with a lack of physiological lumbosacral antagonism (we provide a short history of this dissonance in (Espinosa-Medina et al., 2018)), which led Wilfrid Jänig to write (Jänig, 2006)(p. 357): “Thus, functions assumed to be primarily associated with sacral (parasympathetic) are well duplicated by thoracolumbar (sympathetic) pathways. This shows that the division of the spinal autonomic systems into sympathetic and parasympathetic with respect to sexual functions is questionable”. We could not agree more: this division is questionable in terms of physiology and nonexistent in terms of cell types. In other words, we reconcile cell types with physiology (but “it is early days”).

      Answer to the novel recommendations by Referee #2

      In addition to my original comments, important anatomical and functional distinctions are not explained by the data in this paper. ANATOMY- Sympathetic ganglia are located in close proximity to major branches of the aorta. Cranial and sacral parasympathetic ganglia are located next to or within the structures they innervate (e.g. eye, lung, heart, bladder).

      The pelvic ganglion, including some of its cholinergic neurons that the referee insists are parasympathetic, is further removed from one of its major targets (the helicine arteries of the external genitals) than the sympathetic prevertebral ganglia are from some of theirs (like the gut or kidney). We discussed this issue in (Espinosa-Medina et al., 2018).

      FUNCTION- The sympathetic system controls state variables (e.g. body temperature, blood pressure, serum electrolytes and fluid balance), parasympathetic neurons do not.

      Even in the classical view, the sympathetic system controls the blood vessels of the external genitals or the size of the pupil, for example, which are not state variables.

      […] The data in the paper are a useful next step in defining the genetic diversity of autonomic neurons but do not justify or improve upon existing nomenclature. The future challenge is to understand distinctions between subsets of autonomic ganglion cells that innervate different targets and the principles that govern the integrative function of the autonomic motor system that controls behavior.

      We thank the referee for finding our data useful; and we fully agree with the latter statement. However, neurons, like many other cell types, are hierarchically organized (Zeng and Sanes, 2017), i.e. subsets of neurons belong to sets, with defining traits. Our data argue that there is no parasympathetic neuronal set that includes any pelvic ganglionic neuron. In contrast, there is a ganglionic sympathetic set (defined by our analysis of gene expression) which includes all of them, just as there is a preganglionic sympathetic set that includes sacral preganglionics (Alkaslasi et al., 2021; Blum et al., 2021) (although the direct comparison with cranial preganglionics is yet to be made).

      References

      Anderson, C. R., Bergner, A. and Murphy, S. M. (2006). How many types of cholinergic sympathetic neuron are there in the rat stellate ganglion? Neuroscience 140, 567–576.

      Alkaslasi, M. R., Piccus, Z. E., Hareendran, S., Silberberg, H., Chen, L., Zhang, Y., Petros, T. J. and Le Pichon, C. E. (2021). Single nucleus RNA-sequencing defines unexpected diversity of cholinergic neuron types in the adult mouse spinal cord. Nat Commun 12, 2471.

      Blum, J. A., Klemm, S., Shadrach, J. L., Guttenplan, K. A., Nakayama, L., Kathiria, A., Hoang, P. T., Gautier, O., Kaltschmidt, J. A., Greenleaf, W. J., et al. (2021). Single-cell transcriptomic analysis of the adult mouse spinal cord reveals molecular diversity of autonomic and skeletal motor neurons. Nat Neurosci 24, 572–583.

      Ernsberger, U. and Rohrer, H. (2018). Sympathetic tales: subdivisons of the autonomic nervous system and the impact of developmental studies. Neural Dev 13, 20.

      Espinosa-Medina, I., Saha, O., Boismoreau, F., Chettouh, Z., Rossi, F., Richardson, W. D. and Brunet, J.-F. (2016). The sacral autonomic outflow is sympathetic. Science 354, 893–897.

      Espinosa-Medina, I., Saha, O., Boismoreau, F. and Brunet, J.-F. (2018). The “sacral parasympathetic”: ontogeny and anatomy of a myth. Clin Auton Res 28, 13–21.

      Furlan, A., La Manno, G., Lübke, M., Häring, M., Abdo, H., Hochgerner, H., Kupari, J., Usoskin, D., Airaksinen, M. S., Oliver, G., et al. (2016). Visceral motor neuron diversity delineates a cellular basis for nipple- and pilo-erection muscle control. Nat Neurosci 19, 1331–1340.

      Gaskell, W. H. (1886). On the Structure, Distribution and Function of the Nerves which innervate the Visceral and Vascular Systems. J Physiol 7, 1-80.9.

      Horn, J. P. (2018). The sacral autonomic outflow is parasympathetic: Langley got it right. Clin Auton Res 28, 181–185.

      Jänig, W. (2006). The Integrative Action of the Autonomic Nervous System: Neurobiology of Homeostasis. Cambridge: Cambridge University Press.

      Keast, J. R. (1995). Visualization and immunohistochemical characterization of sympathetic and parasympathetic neurons in the male rat major pelvic ganglion. Neuroscience 66, 655–662.

      Keast, J. R. (2006). Plasticity of pelvic autonomic ganglia and urogenital innervation. International Review of Cytology - a Survey of Cell Biology, Vol 248 248, 141-+.

      Langley, J. N. (1921). The Autonomic Nervous System (Pt. I). Cambridge: Heffer & Sons Ltd.

      Langley, J. N. and Anderson, H. K. (1895). The Innervation of the Pelvic and adjoining Viscera: Part II. The Bladder. Part III. The External Generative Organs. Part IV. The Internal Generative Organs. Part V. Position of the Nerve Cells on the Course of the Efferent Nerve Fibres. J Physiol 19, 71–139.

      Lee, V. M., Sechrist, J. W., Luetolf, S. and Bronner-Fraser, M. (2003). Both neural crest and placode contribute to the ciliary ganglion and oculomotor nerve. Dev Biol 263, 176–190.

      Stanke, M., Duong, C. V., Pape, M., Geissen, M., Burbach, G., Deller, T., Gascan, H., Parlato, R., Schütz, G. and Rohrer, H. (2006). Target-dependent specification of the neurotransmitter phenotype: cholinergic differentiation of sympathetic neurons is mediated in vivo by gp130 signaling. Development 133, 141–150.

      Zeisel, A., Hochgerner, H., Lönnerberg, P., Johnsson, A., Memic, F., van der Zwan, J., Häring, M., Braun, E., Borm, L. E., La Manno, G., et al. (2018). Molecular Architecture of the Mouse Nervous System. Cell 174, 999-1014.e22.

      Zeng, H. and Sanes, J. R. (2017). Neuronal cell-type classification: challenges, opportunities and the path forward. Nat Rev Neurosci 18, 530–546.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the overwhelmingly positive summaries from all three reviewers and the eLife editorial team. All reviewers provided extremely detailed feedback on the initially submitted manuscript, and we appreciate their efforts in helping us improve it. Below, we list each of the specific comments made by the reviewers, together with our responses, in a point-by-point format.

      The only notable change made to the manuscript that was not in response to comments from a reviewer concerns the nomenclature of the structure that we had previously called the nuclear microtubule organising centre (MTOC). We had used the term MTOC to describe the entire structure, which spans the nuclear envelope and comprises an intranuclear portion and cytoplasmic extensions. Given recent evidence, including findings from this study, it is possible that the intranuclear region and the cytoplasmic extensions both have microtubule nucleating capacity, and therefore both meet the definition of an MTOC. To disambiguate this, we now refer to the overall structure as the centriolar plaque (CP), consistent with previous literature. The intranuclear portion of the CP will be referred to as the inner CP, while the cytoplasmic portion will be referred to as the outer CP.

      Reviewer #1 (Recommendations For The Authors):

      1) In the first part of the result section, a paragraph on sample processing for U-ExM could be added, with reference to Fig 1b.

      The following section has been added to the first paragraph of the results “…In this study all parasites were fixed in 4% paraformaldehyde (PFA), unless otherwise stated, and anchored overnight at 37 °C before gelation, denaturation at 95 °C and expansion. Expanded gels were measured, before shrinking in PBS, antibody staining, washing, re-expansion, and imaging (Figure 1b). Parasites were harvested at multiple time points during the intraerythrocytic asexual stage and imaged using Airyscan2 super-resolution microscopy, providing high-resolution three-dimensional imaging data (Figure 1c). A full summary of all target-specific stains used in this study can be found in Figure 1d.”

      2) The order of the figures could be changed for more consistency. For example, fig 2b is cited before 2a.

      An earlier reference to figure 2a was added to rectify this discrepancy.

      3) In Fig 2b it is difficult to distinguish the blue (nuclear) and green (plasma membrane) lines.

      The thickness of these lines has been doubled.

      4) It is unclear what the authors want to show in Fig 2a.

      The intention of this figure, as with panel a of the majority of the organelle-specific figures in this manuscript, is simply to show what the target protein/structure looks like across intraerythrocytic development.

      5) Lines 154-155, the numbers of MTOC observed do not match those in Supplt Fig2c.

      This discrepancy has been addressed. The numbers in Supplementary Figure 2c were accurate, so the text has been changed to match them.

      6) Line 188: the authors should explain the principle of C1 treatment.

      The following explanation of C1 treatment has been provided:

      “To ensure imaged parasites were fully segmented, we arrested parasite development by adding the reversible protein kinase G inhibitor Compound 1 (C1). This inhibitor arrests parasite maturation after the completion of segmentation but before egress. When C1 is washed out, parasites egress and invade normally, ensuring that observations made in C1-arrested parasites are physiologically relevant and not a developmental artefact due to arrest.”

      7) Lines 195-204: this part is rather difficult to follow as analysis of the basal complex is detailed later in the manuscript. The authors refer to Fig4 before describing Fig3.

      This has been clarified in the text.

      8) Lines 225 and 227, the authors cite Supplt Fig 2b about the Golgi, but probably meant Supplt Fig 4? In Supplt Fig 4, the authors could provide magnification in insets to better illustrate the Golgi-MTOC association.

      This should have been a reference to Supplementary Figure 2e instead of 2b, which has now been changed. In Supplementary Figure 4, zooms into a single region of Golgi have been provided to more clearly show its MTOC association.

      9) Supplt Fig8 is wrong (duplication of Supplt Fig6).

      We apologise for this mistake, the correct figure is now present in Supplementary Figure 8.

      10) Line 346: smV5 should be defined, and generation of the parasites should be described in the methods.

      This has now been defined, but we have not described the generation of the parasites, as this was performed in a previous study that we have referenced.

      11) Lines 361-362: "By the time the basal complex reaches its maximum diameter..." This sentence is not very clear, the authors could explain more precisely the sequence of events, indicating that the basal complex starts moving in the basal direction, as clearly illustrated in Fig 4a.

      This has been prefaced with the following sentence “…As the parasite undergoes segmentation, the basal complex expands and starts moving in the basal direction.”

      12) Supplt Fig6 comes after Supplt Fig9 in the narrative, and therefore could be placed after.

      Supplementary Figure 6 and 9 follow the order in which they are referred to in the text.

      13) Line 538: Supplt Fig9e instead of 9d.

      This has been fixed.

      14) Line 581: does the PFA-glutaraldehyde fixation allow visualizing other structures in addition to cytostome bulbs?

      While PFA-glutaraldehyde fixation allows visualisation of cytostome bulbs, to date we have not observed any other structure that stains/preserves better using NHS Ester or BODIPY Ceramide in PFA-glutaraldehyde fixed parasites. As a general trend, all structures other than cytostomes become somewhat more difficult to identify using NHS Ester or BODIPY Ceramide in PFA-glutaraldehyde fixed samples due to the local contrast with the red blood cell cytoplasm. It seems likely that this is just due to the preservation of RBC cytoplasm, and would be expected from any fixation method that doesn’t result in RBC lysis, rather than anything unique to glutaraldehyde.

      15) Line 652-653: It is unclear how the authors can hypothesize that rhoptries form de novo rather than splitting based on their observations.

      This is not something we can say with certainty. We have, however, introduced the following paragraph to qualify our claims: “Overall, we present three main observations suggesting that rhoptry pairs undergo sequential de novo biogenesis rather than dividing from a single precursor rhoptry. First, the tight correlation between rhoptry and MTOC cytoplasmic extension number suggests that either rhoptry division happens so fast that transition states are not observable with these methods or that each rhoptry forms de novo and such transition states do not exist. Second, the heterogeneity in rhoptry size throughout schizogony favors a model of de novo biogenesis given that it would be unusual for a single rhoptry to divide into two rhoptries of different sizes. Lastly, well-documented heterogeneity in rhoptry density suggests that, at least during early segmentation, rhoptries have different compositions. Heterogeneity in rhoptry contents would be difficult to achieve so quickly after biogenesis if they formed through fission of a precursor rhoptry.”

      16) Line 769: is expansion microscopy sample preparation compatible with FISH?

      Yes, there are publications of expansion being done with both MERFISH and FISH, though it has not yet been applied to Plasmodium. See, for example: Wang, Guiping, Jeffrey R. Moffitt, and Xiaowei Zhuang. "Multiplexed imaging of high-density libraries of RNAs with MERFISH and expansion microscopy." Scientific reports 8.1 (2018): 4847; and Chen, Fei, et al. "Nanoscale imaging of RNA with expansion microscopy." Nature methods 13.8 (2016): 679-684.

      17) In the methods, the authors could provide details on the gel mounting step for imaging. This is particularly important since this paper will likely serve as a reference standard for expansion microscopy in the field. Also, illustration that cryopreservation of gels does not modify the quality of the images would be useful.

      The following section has been added to our “image acquisition” paragraph: “Immediately before imaging, a small slice of gel ~10mm x ~10mm was cut and mounted on an imaging dish (35mm Cellvis coverslip bottomed dishes NC0409658 - FisherScientific) coated with Poly-D lysine. The side of the gel containing sample is placed face down on the coverslip and a few drops of ddH2O are added after mounting to prevent gel shrinkage due to dehydration during imaging.”

      We have decided not to illustrate that cryopreservation does not alter gel quality, as this is something that is already covered in the study that first cryopreserved gels, which is referenced in our methods section.

      Reviewer #2 (Recommendations For The Authors):

      1) Advantages and limitations of the expansion method are generally well discussed. The only matter in that respect that I was wondering is if expansion can always be assumed to be linear for all components of a cell. The hemozoin crystal does not expand (maybe not surprisingly), but could there also be other cellular structures that on a smaller scale separate or expand at a different rate than others? Is there any data on this from other organisms? I am raising this here not as a criticism of this work but if known to occur, it might need mentioning somewhere to alert the reader to it, particularly in regards to the many measurements in the paper (see also point 4). This might be a further factor contributing to the finding that the IMC and PPM could not be resolved.

      This is an excellent point and, to our knowledge, one that is currently still under investigation in the field. It is well-documented that expansion protocols need to be customized to each cell type and tissue they are applied to. Each solution used for fixation and anchoring as well as timing and temperature of denaturation can affect the expansion factor achieved as well as how isotropic/anisotropic the expanded structures turn out. However, we do not know of any examples where isotropic expansion was achieved for everything but an organelle or component of the cell. It is our impression that if the cell seems to have attained isotropic expansion, this is assumed to also be the case for the subcellular structures within it. Nonetheless, we think it remains a possibility to be considered, especially as more structures are characterized using these methods. In the case of our IMC/PPM findings, when we performed calculations taking into account our experimental expansion factor as well as antibody effects, it was clear that the resolution of our microscope was not enough to resolve the two structures using our current labelling methods. So, we suspect most of the effect is driven by that. However, this still needs to be validated by attempting to resolve the two structures through alternative labelling and imaging methods.

      2) I understand that many things described in the results part are interconnected but still the level of hopping around between different figures/supp figures is considerable (see also point 6 on synchronicity of Figure parts). I do not have a simple fix, but maybe the authors could check if they could come up with a way to streamline parts of their results into a somewhat more reader friendly order.

      This has been a problem we encountered from the beginning and, after trying multiple presentations of the results and discussion, we realized they all have drawbacks. We eventually settled on this presentation as the “least confusing”. We agree, however, that the figure references and order could be better streamlined and have addressed this to the best of our ability.

      3) Are the authors sure the ER expands well and the BIP signal (Fig. S5) gives a signal reflecting the true shape of the ER? The signal in younger parasites seems rather extensive compared to what the ER (in my experience) typically looks like in these stages in live parasites.

      While there may be a discrepancy between how the presumably dynamic ER appears in live cells, and how it appears using BiP staining, we think it is unlikely this is a product of expansion. Additionally, if there were to be an artefactual change in the ER, it would be likely under-expansion rather than over-expansion, which to our knowledge has not been reported. In our opinion, the BiP staining we observe is comparable between unexpanded and expanded samples. We have included comparative images in Author response image 1 with DNA in cyan and BiP in yellow, unexpanded (left) and expanded (right) using the same microscope and BiP antibody.

      Author response image 1.

      4) It is nice to have measurements of the apicoplast and mitochondria, but given their size, this could also have been done in unexpanded, ideally live parasites, avoiding expansion and fixing artifacts. While the expansion has many nice features, measuring area of large structures may not be one where it is strictly needed. I am not saying this is not useful information, but maybe a note could be added to the manuscript that the conclusions on mitochondria and apicoplast area and division might be worth confirming in live parasites. A brief mention on similarities and differences to previous work analysing the shape and multiplication of these organelles through blood stage development (van Dooren et al MolMicrobiol2005) might also be useful.

      We agree with the reviewer that previous studies such as van Dooren et al. (2005) demonstrate that it is possible to track apicoplast and mitochondrial growth without expansion and share the opinion that live parasites are better for these measurements. Expansion only provides an advantage when more organelle-level resolution is needed, for example in studying the association between these organelles and the MTOC or visualizing other branch-specific interactions.

      5) I could not find the Supp Fig. 8 on the IMC, the current Supp Fig. 8 is a duplication of Supp Fig. 6

      This has been addressed; Supplementary Figure 8 now refers to the IMC.

      6) Figure order is not very synchronous with the text: Fig. 2a is mentioned after Fig. 2b, Fig. 4b is mentioned first for Fig. 4 (Fig. 4a is not by itself mentioned) and before Fig. 3 is mentioned; Fig. 3b is before Fig. 3a.

      We have done our best to fix these discrepancies, but concede that we have not found a way to order these sections that doesn’t lead to some confusion.

      7) Fig. S2a, The label "Centrin" on left image is difficult to read

      We have increased the font size and changed the colour slightly in the hope that it is now legible.

      8) In Fig. 2a, the centrin foci are very focal and difficult to see in these images, particularly when printed out but also on screen. To a lesser extent this is also the case for CINCH in Fig. 4a (particularly when printed; when zoomed-in on screen, the signal is well visible). This issue of difficulties in seeing the fluorescence signal of some markers, particularly when printed out, applies also to other images of the paper.

      In the images of full size parasites, this is an issue that we cannot easily overcome as the fluorescent channels are already at maximum brightness without overexposure. To try and address this, we have provided zooms that we hope will more clearly show the fluorescence in these panels.

      9) Expand "C1" in line 188 (first use).

      This has been addressed in response to a previous comment.

      10) Line 227; does Supp Fig. 2b really show Golgi- cytoplasmic MTOC association?

      We have rephrased the wording of this section to clarify that we are observing proximity and not necessarily a physical tethering. However, it is worth noting that this was an accidental reference to Supplementary Figure 2b, and should’ve been Supplementary Figure 2e.

      11) Line 230, in segmented schizonts the Golgi was considered to be at the apical end. It might be more precise to call its location to be close to the nucleus on the side facing the apical end of the parasite. It seems to me it often tends to be closer to the nucleus (in line with its proximity to the ER, see also point 13).

      We have added more detail to this description clarifying that despite being at the apical end, the Golgi is closer to the nucleus.

      12) Supp Fig. S5: Is the top cell indeed a ring? In the second cell there seem to be two nuclei, I assume this is a double infection (please indicate this in the legend or use images of a single infection).

      In our opinion, the top cell in Supplementary Figure 5 is a ring. This is based on its size and its lack of an observable food vacuole (an area that lacks NHS ester staining). We typically showed images of amoeboid rings to avoid this ambiguity, but we think this parasite is a ring nonetheless. For the second image, this parasite is not doubly infected, as both DNA masses are actually contained within the same dumbbell shaped nuclear envelope. This parasite is likely undergoing its first anaphase (or the Plasmodium equivalent of anaphase) and will likely soon undergo its first nuclear division to separate these two DNA masses into individual nuclei.

      13) Line 244: I would not call the Golgi a part of the apical cluster of organelles. All secretory cargo originates from the ER-Golgi-transGolgi axis in a directional manner and this axis is connected to the nucleus by the perinuclear ER. If seen from a secretory pathway centred view, it is the other way around and you could call the apical organelles part of the nuclear periphery which would be equally non-ideal.

      Everything is close together in such a small cell. The secretory pathway likely is arranged in a serial manner starting from the perinuclear region to the transGolgi where cargo is sorted into vesicles for different destinations of which one is for the delivery of material to the apical organelles. The proposition that the Golgi is part of the apical cluster therefore somehow feels wrong, as the Golgi can still be considered to be upstream of the transGolgi before apical cargo branches off from other cargo destined for other destinations.

      We agree with the reviewer that claiming a functional association between the Golgi and the apical organelles would be odd and we by no means meant to imply such functional grouping. Our intent was to confirm observations previously made about Golgi positioning by electron microscopy studies such as Bannister et al. (2000) at a larger spatial and temporal scale. These studies make the observation that the Golgi is spatially associated with the rhoptries at the apical end of the parasites. Logically, the Golgi is tied to the apical organelles through the secretory pathway as the reviewer suggests, but we claim no further relationship beyond that of organelle biogenesis. We have made modifications to the text to clarify these points.

      14) Lines 300 - 308 (and thereafter): I assume these were also expanded parasites and the microtubule length is given after correction for expansion. I would recommend to indicate in line 274 (when first explaining the expansion factor) that all following measurements in the text represent corrected measures or, if this is not always the case, indicate on each occasion. Is the expansion factor accurate and homogenous enough to draw firm conclusions (see also point 1)? Could it be a reason for the variation seen with SPMTs? Could a cellular reference be used as a surrogate to account for cell specific expansion or would you assume that cellular substructure specific expansion differences exist and prevent this?

      This is correct: the reported number is corrected for the expansion factor, and the corresponding graphs with uncorrected data are present in the Supplementary Figures. We have clarified this in the text. Uneven expansion can be caused when certain organelles/structures do not properly denature. Given that our protocol denatures using highly concentrated SDS at 95 °C for 90 minutes, we do not anticipate that any subcellular compartments would expand significantly differently. In this study our expansion factors varied from ~4.1-4.7 across all gels, and for our corrected values we used the median expansion factor of 4.25. If, for example, we measure the length of an interpolar spindle as 20 µm, the corrected value would be 4.7 µm when divided by the median expansion factor, 4.9 µm when divided by the lowest, and 4.3 µm when divided by the highest. These values fall well within the measurement error, and so we expect that these small deviations in expansion factor between gels have a fairly minimal influence on variation in microtubule lengths.
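
      As a hedged illustration of the arithmetic in this response, the short sketch below divides the example measured length by each of the quoted expansion factors. The function and variable names are hypothetical and chosen only for illustration; the numbers are those given in the response, and the sketch is not the analysis code used in the study.

```python
# Illustrative sketch only: correct a length measured in an expanded gel
# by dividing it by the gel's expansion factor.

def correct_length(measured_um, expansion_factor):
    """Return the pre-expansion length (um) for a post-expansion measurement."""
    if expansion_factor <= 0:
        raise ValueError("expansion factor must be positive")
    return measured_um / expansion_factor

measured_um = 20.0  # example measured interpolar spindle length from the response
factors = {"median": 4.25, "lowest": 4.1, "highest": 4.7}  # factors quoted in the response

corrected = {name: correct_length(measured_um, f) for name, f in factors.items()}
for name, value in corrected.items():
    print(f"{name} expansion factor {factors[name]}: {value:.2f} um")

# Difference between the corrected values at the two extreme expansion factors.
spread_um = corrected["lowest"] - corrected["highest"]
print(f"spread between extremes: {spread_um:.2f} um")
```

      Under these assumptions, the corrected values for the lowest and highest expansion factors differ by roughly 0.6 µm for a 20 µm measurement, which is the gel-to-gel deviation the response compares against measurement error.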

      15) Line 353: this is non-essential, but a 3D view of the broken basal ring might better illustrate the 2 semicircles

      We have added the following panel to Supplementary Figure 3 to illustrate this more clearly:

      Author response image 2.

      16) The way the figure legends are shaped, it often seems only panel (a) is from expansion microscopy while the microscopy images in the other parts of the figures have no information on the method used. I assume all images are from expansion microscopy, maybe this could be clarified by placing this statement in a position of the legend that makes it clear it is for all images in a figure.

      This has been clarified in the figure legends.

      17) Fig. 8b, is it clear that internal RON4 is not below or above? Consider showing a 3D representation or side view of these max projections.

      If, in these images, we imagine we are looking at the ‘top’ of the rhoptries, our feeling is that the RON4 signal is on the ‘bottom’, at the part closest to the apical polar ring. We tried projecting this, but the images were not particularly informative due to spherical aberrations. Because of this, we have refrained from commenting on the RON4 location relative to the rhoptry bulb prior to elongation.

      18) Line 684 "...distribution or RON4": replace or with of. The information of the next sentence is partly redundant, consider adding it in brackets.

      This has been addressed.

      19) Fig. 9a the EBA175 signal is not very prominent and a bit noisy, are the authors confident this is indeed showing only EBA175 or is there also some background?

      We agree with the reviewer that the EBA175 antibody shows a significant amount of background fluorescence, especially in the food vacuole area. However, we think the puncta corresponding to micronemal EBA175 can be clearly distinguished from background.

      20) Fig. 9b, the long appearance of the micronemes in the z-dimension likely is due to axial stretch (due to point spread function in z and refractive index mismatch), in reality they probably are more spherical. It might be worth mentioning somewhere that this likely is not how these organelles are really shaped in that dimension (spherical fluorescent beads could give an estimation of that effect in the microscopy setup used).

      After recently acquiring a water-immersion objective lens for comparison, it is clear that the transition from oil to hydrogel causes a degree of spherical aberration in the Z-plane, which in this instance causes the micronemes to appear more oblong. As we make no conclusions based on the shape of the micronemes, however, we don’t think this is a significant consideration. This is an assumption that should be made when looking at any image whose resolution is not equal in all three dimensions. We also note that the more spherical shape of micronemes can be inferred from the max intensity projections in Figure 9c.

      21) Fig. 9b, the authors mention in the text that there is NHS ester signal that overlaps with the fluorescence signal, can occasions of this be indicated in the figure?

      Figure 9b was already quite busy, so we instead added the following extra panel to this figure that more clearly shows the NHS punctae we thought may have been micronemes:

      Author response image 3.

      22) Fig. 9, line 695, the authors write that the EBA puncta were the same size as AMA1 puncta. To me it seems the AMA1 areas are larger than the EBA foci, is their size indeed similar? Was this measured?

      Since we did not conduct any measurements and doing so robustly would be difficult given the density of the puncta, we have decided to remove our comment on the relative size of the puncta.

      23) Materials and methods: Remove "to" in line 871; explain bicarb and incomplete medium in line 885 (non-malaria researchers will not understand what is meant here); line 911 and start of 912 seem somewhat redundant

      This has been addressed.

      24) Is there more information on what the Airyscan processing at moderate filter level does? The background of the images seems to have an intensity of 0 which in standard microscopy images should be avoided (see for instance doi:10.1242/jcs.03433) similar to the general standard of avoiding entirely white backgrounds on Western blots. I understand that some background subtraction processes will legitimately result in this but then it would be nice to know a bit better what happened to the original image.

      We have taken the following excerpt from a publication on Airyscan to help clarify:

      "Airyscan processing consists of deconvolution and pixel reassignment, which yield an image with higher resolution and reduced noise. This can be a contributor to the low background in some channels. The level of filtering is the processing strength, with higher filtering giving higher resolution but increased chances of artefacts. More information about the principles behind Airyscan processing can be found in the following two publications, though details on the algorithm itself seem to be proprietary: Huff, Joseph. "The Airyscan detector from ZEISS: confocal imaging with improved signal-to-noise ratio and super-resolution." (2015): i-ii. AND Wu, Xufeng, and John A. Hammer. "Zeiss airyscan: Optimizing usage for fast, gentle, super-resolution imaging." Confocal Microscopy: Methods and Protocols. New York, NY: Springer US, 2021. 111-130."

      We cannot find any further information about the specifics of Airyscan filtering, however, the moderate filter that we used is the default setting. This information was included just for clarity, rather than something we determined by comparison to other filtering settings.

      In regards to the background, the majority of some images having an intensity value of 0 is partially out of our control. For all NHS Ester images, the black point of the images was 0 so areas that lack signal (white in the case of NHS Ester) truly had no signal detected for those pixels. While we appreciate that never altering the black point of images displays 100% of the data in the image, images with any significant background can become impossibly difficult to interpret. We have done our best to try and present images where the black point is modified to remove background for ease of interpretation by the readers only.
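
      To illustrate in general terms what raising the black point of a displayed image does, the minimal sketch below linearly rescales pixel intensities so that everything at or below a chosen black point becomes 0. The function name, the 8-bit intensity range, and the example pixel values are assumptions made purely for illustration; this is not the processing applied to the images in the manuscript.

```python
# Illustrative sketch only: linear display rescaling with an adjusted black point.
# Assumes 8-bit intensities (0-255); all values below are made up for the example.

def apply_black_point(pixels, black_point, white_point=255):
    """Map black_point to 0 and white_point to 255, clipping values outside that range."""
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    span = white_point - black_point
    adjusted_rows = []
    for row in pixels:
        new_row = []
        for value in row:
            scaled = round((value - black_point) * 255 / span)
            new_row.append(min(255, max(0, scaled)))  # clip to the displayable range
        adjusted_rows.append(new_row)
    return adjusted_rows

image = [[0, 5, 12], [30, 128, 255]]  # hypothetical pixel intensities
adjusted = apply_black_point(image, black_point=12)
print(adjusted)
```

      In this sketch, the three low-intensity pixels in the first row all become 0 after adjustment, which is how a background region can end up with an intensity of exactly 0 even though some signal was originally recorded there.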

      Reviewer #3 (Public Review):

      1) Most importantly, in order to justify the authors claim to provide an "Atlas", I want to strongly suggest they share their raw 3D-imaging data (at least of the main figures) in a data repository. This would allow the readers to browse their structure of interest in 3D and significantly improve the impact of their study in the malaria cell biology field.

      We agree completely that the potential impact of this study is magnified by public sharing of the data. The reason that this was not done at the time of submission is that most public repositories do not allow continued deposition of data, and so new images included in response to reviewers’ comments would’ve been separated from the initial submission, which we saw as needlessly complicated. All 647 images that underpin the results discussed in this manuscript are now publicly available in Dryad (https://doi.org/10.5061/dryad.9s4mw6mp4).

      2) The organization of the manuscript can be improved. Aside some obvious modifications as citing the figures in the correct order (see also further comments and recommendations), I would maybe suggest one subsection and one figure per analyzed cellular structure/organelle (i.e. 13 sections). This would in my opinion improve readability and facilitate "browsing the atlas".

      This is actually how we had originally formatted this manuscript, but this structure made discussing inter-connected organelles, such as the IMC and basal complex, impossibly difficult to navigate. We have done our best to make the manuscript flow better, but have not come up with any way to greatly restructure the manuscript so as to increase its readability.

      3) Considering the importance of reliability of the U-ExM protocol for this study the authors should provide some validation for the isotropic expansion of the sample e.g. by measuring one well defined cellular structure.

      The protocol we used comes from the Bertiaux et al., 2021 PLoS Biology study. In this study they show isotropic expansion of blood-stage parasites.

      4) In the absence of time-resolved data and more in-depth mechanistic analysis the authors must down tone some of their conclusions specifically around mitochondrial membrane potential, subpellicular microtubule depolymerization, and kinetics of the basal complex.

      Our conclusions regarding mitochondrial membrane potential and basal complex kinetics have been dampened. We have not, however, changed our wording around microtubule depolymerisation. Partial depolymerisation of microtubules during fixation is a known phenomenon in Plasmodium, and in our opinion, our explanation of this offers a hypothesis that is balanced with respect to the evidence: “we hypothesise that most SPMTs measured in our C1-treated schizonts had partially depolymerised. P. falciparum microtubules are known to rapidly depolymerise during fixation10,29. It is unclear, however, why this depolymerization was observed most often in C1-arrested parasites. Thus, we cannot determine whether these shorter microtubules are a by-product of drug-induced arrest or a biologically relevant native state that occurs at the end of segmentation.”

      5) The observation that the centriolar plaque extensions remain consistently tethered to the plasma membrane is of high significance. To more convincingly demonstrate this point, it would be very helpful to show one zoomed-in side view of a nucleus with a mitotic spindle where both centriolar plaques are in contact with the plasma membrane.

      We of course agree that this is one of our most important observations, but in our opinion this is already demonstrated in Figure 2b. The third panel from the right shows a mitotic spindle and has the location of the cytoplasmic extensions, nuclear envelope and parasite plasma membranes annotated.

      6) Please verify the consistent use of the terms trophozoite and schizont. In Fig. 1c a parasite with two nuclei, likely in the process of karyofission, is designated as trophozoite, which contrasts with the mononucleated trophozoite shown in Fig. 1a. The reviewer is aware of the more "classical" description of the schizont as parasite with more than 2 nuclei, but based on the authors’ advanced knowledge of cell cycle progression and mitosis I would encourage them to make a clear distinction between parasites that have entered mitotic stages and pre-mitotic parasites (e.g. by applying the term schizont, and trophozoite, respectively).

      For this study, we have interpreted any parasite having three or more nuclei as being a schizont. We are aware this morphological interpretation is not universally held and indeed suboptimal for studying some aspects of parasite development, but all definitions of a schizont have some drawbacks. Whether a parasite has entered mitosis or not is obviously a hugely significant event in the context of cell biology, but in a mononucleated parasite this could only be determined using immunofluorescence microscopy with cell cycle or DNA replication markers.

      7) Aldolase does not localize diffusely in the cytoplasm in schizont stages, in contrast to earlier stages. The authors should comment on that.

      We are unclear if this is an interpretation of the images in Supplementary Figure 1, or inferred from other studies. If this is an interpretation of the images in Supplementary Figure 1, we do not agree that the images show a significant change in the localisation of aldolase. It is possible that this difference in interpretation comes from the strong punctate signal observed more readily in the schizont images. This is the strong background signal in or around the food vacuole we mention in the text. These punctae are significantly brighter than the cytosolic aldolase signal, making the cytosolic signal difficult to see in the aldolase-only channel, but aldolase signal can clearly be seen in the cytoplasm on the merge images.

      8) Line 79. Uranyl acetate is just one of the contrasting agents used in electron microscopy. The authors might reformulate this statement. Possibly this would also be a good opportunity to briefly mention that electron density measured in EM and protein-density labeled by NHS-Ester can be similar but are not equivalent.

      We have expanded on this in the text.

      9) The authors claim that they investigate the association between the MTOC and the APR (line 194), but strictly speaking only look at subpellicular microtubules and an associated protein density. The argument that there is a "NHS ester-dense focus" (line 210) without actual APR marker is not quite convincing enough to definitively designate this as the APR.

      While an APR marker would of course be very useful, there are currently no published examples of APR markers in blood-stage parasites. We therefore think that the timing of appearance, location, and staining density are sufficient for identifying this structure as the APR, as it has previously been designated through EM studies. We have nonetheless softened our language around APR-related observations.

      10) Line 226: The authors should also discuss the organization of the Golgi in early schizonts (Fig. S4). (not only 2 nuclei and segmenter stages).

      We did not mean to imply that all 22 parasites had only 2 nuclei, but instead that they had 2 or more nuclei. Therefore, early schizonts are included in this analysis, with Golgi closely associated with all their MTOCs.

      11) Line 242: To the knowledge of the reviewer the nuclear pore complexes, although clustered in merozoites and ring stages, don't particularly "define the apical end of the parasite".

      The MTOC is surrounded by NPCs, which because of the location of the MTOC end up being near the forming apical end of the merozoite, but we have removed this as it was needlessly confusing.

      12) Supplementary Figure 8 is missing (it's a repetition of Fig. S6).

      This has been addressed.

      13) Line 253: asexual blood stage parasites have two classes of MTs. Other stages can have more.

      This has been clarified.

      14) Fig. 3f: Please comment how much of these observations of "only one" SPMT could result from suboptimal resolution (e.g. in z-direction) or labeling. Otherwise use line profiles to argue that you can always safely distinguish SPMT pairs.

      In the small number of electron tomograms of merozoites where the subpellicular microtubules have been rendered, they have been seen to have 2 or 3 SPMTs. Despite this, we don’t think it is likely that the single SPMT merozoites observed in this study are caused by a resolution limitation. SPMTs were measured in 3D, rather than from projections, and any schizont where the SPMTs were pointing towards the objective lens, elongating the parasite in Z, was not imaged. Additionally, our number of merozoites with a single SPMT corresponds with the data collected in the Bertiaux et al., 2021 PLoS Biology study. We cannot rule this out as a possibility, as sometimes SPMTs cross over each other in three dimensions, and at these intersection points they cannot be individually resolved. We, however, think it is very unlikely that two SPMTs would be so close that they can never be resolved across any part of their length.

      15) Lines 302ff: the claim that variability in SPMT size must be a consequence of depolymerization is unfounded. The dynamics of SPMT are unknown at this point. Similarly unfounded is the definitive claim that it is known that P.f. MTs depolymerize upon fixation. Other possibilities should be considered. SPMT could also simply shorten in C1-arrested parasites.

      While we agree with the reviewer that much about SPMT dynamics in schizonts remains unknown, we disagree with the claim that our consideration of SPMT depolymerization as a possible explanation for our observations is unfounded. Microtubule depolymerization is a well-known fixation and sample preparation artefact in mammalian cells and a well-documented phenomenon in Plasmodium when parasites are washed with PBS prior to fixation. We convey in the text our belief that it is possible that SPMTs shorten in C1-arrested parasites as a result of drug treatment. However, it is our opinion that there simply is not enough evidence at this moment to conclusively pinpoint the cause of our observed depolymerization. As we mention in the text, further experiments are needed in order to determine with confidence whether depolymerization is a consequence of our fixation protocol, a consequence of C1 treatment (or the length of that treatment), or a biological phenomenon resulting from parasite maturation.

      16) Line 324: "up to 30 daughter merozoites"

      Schizonts can have more than 30 daughter merozoites, so we have not altered this statement.

      17) Figure 4b. Line 354 The postulated breaking in two is not well visible and here the authors should attempt a more conservative interpretation of the data (especially with respect to those early basal complex dynamics).

      We think that the basal complex dividing or breaking in two is the more conservative interpretation of our data. There is no evidence to suggest that a second basal complex is formed de novo and, while never before described using a basal complex protein, the cramp-like structure and dynamics we observe are consistent with that observed in early IMC proteins. We have updated the text to provide additional context and make the reasoning behind our hypothesis clearer.

      18) Line 365: Commenting on their relative size would require a quantification of APR and basal complex size (can be provided in the text).

      We are unsure what this is in reference to, as there is no mention of the APR in the basal complex section.

      19) Lines 375ff: The claim that NHS Ester is a basal complex marker should be mitigated or more convincing images without the context of anti-CINCH staining being sufficient to identify the ring structure should be presented.

      We have provided high quality, zoomed-in images without anti-CINCH staining in Fig. 5D&E, 6C, 7b, and Supplementary Fig. 8 that show that even in the absence of a basal complex antibody, the basal complex still stains densely by NHS ester.

      20) Line 407: The claim that there are differences in membrane potential along the mitochondria needs to be significantly mitigated. There are several alternative explanations of this staining pattern (some of which the authors name themselves). Differences in local compartment volume, differences in membrane surface, diffusibility/leakage of the dye can definitively play a role in addition to fixation and staining artefacts (also brought forward recently for U-ExM by Laporte et al. 2022 Nat Meth). Confirming the hypothesis of the authors would need significantly more experimental evidence that is outside the scope of this study.

      We have significantly dampened and qualified the wording in this section. It now reads: “These clustered areas of Mitotracker staining were highly heterogeneous in size and pattern. Small staining discontinuities like these are commonly observed in mammalian cells when using Mitotracker dyes due to the heterogeneity of membrane potential from cristae to cristae as well as due to fixation artifacts. At this point, we cannot determine whether the staining we observed represents a true biological phenomenon or an artefact of this sample preparation approach. Our observed Mitotracker-enriched pockets could be an artifact of PFA fixation, a product of local membrane depolarization, a consequence of heterogeneous dye retention, or a product of irregular compartments of high membrane potential within the mitochondrion, to mention a few possibilities. Further research is needed to conclusively pinpoint an explanation.”

      21) Fig. 7e: The differences in morphology using different fixation methods are interesting. Can the authors provide a co-staining of K13-GFP together with the better-preserved structures in the GA-containing fixation protocol to demonstrate that these are indeed cytostome bulbs?

      Figure 7 has been changed substantially to show more clearly the preservation of the red blood cell membrane following PFA-GA fixation, followed by direct comparison of K13-GFP stained parasites fixed in either PFA only or PFA-GA. The cytostome section of the results has also changed to reflect this, the changed section now reads:

      “PFA-glutaraldehyde fixation allows visualization of cytostome bulb. The cytostome can be divided into two main components: the collar, a protein dense ring at the parasite plasma membrane where K13 is located, and the bulb, a membrane invagination containing red blood cell cytoplasm (Milani, 2015; Xie, 2020). While we could identify the cytostomal collar by K13 staining, these cytostomal collars were not attached to a membranous invagination. Fixation using 4% v/v paraformaldehyde (PFA) is known to result in the permeabilization of the RBC membrane and loss of its cytoplasmic contents [65]. Topologically, the cytostome is contiguous with the RBC cytoplasm and so we hypothesised that PFA fixation was resulting in the loss of cytostomal contents and obscuring of the bulb. PFA-glutaraldehyde fixation has been shown to better preserve the RBC cytoplasm [65]. Comparing PFA only with PFA-glutaraldehyde fixed parasites, we could clearly observe that the addition of glutaraldehyde preserves both the RBC membrane and RBC cytoplasmic contents (Figure 7c). Further, while only cytostomal collars could be observed with PFA only fixation, large membrane invaginations (cytostomal bulbs) were observed with PFA-glutaraldehyde fixation (Figure 7d). Cytostomal bulbs were often much longer and more elaborate spreading through much of the parasite (Supplementary Video 1), but these images are visually complex and difficult to project so images displayed in Figure 7 show relatively smaller cytostomal bulbs. Collectively, this data supports the hypothesis that these NHS-ester-dense rings are indeed cytostomes and that endocytosis can be studied using U-ExM, but PFA-glutaraldehyde fixation is required to maintain cytostome bulb integrity.”

      22) It would be helpful to the readers to indicate in the schematic in Fig. 1b at which point NHS-Ester staining is implemented.

      Figure 1b is slightly simplified in the sense that it doesn’t differentiate primary and secondary antibody staining, but we have updated it to reflect that antibody and dye staining are concurrent, rather than separate.

      23) In Fig. 2B, in the second panel from the right, the nuclear envelope boundary does not seem to be accurately drawn as it includes the centrin signal of the centriolar plaque.

      Thank you for pointing this out, it has now been redrawn.

      24) Line 44-45: should read "up to 30 new daughter merozoites" (include citations).

      We have included a citation here, but left it as approximately 30 daughter merozoites as the study found multiple cells with >30 daughter merozoites.

      25) Line 49: considering its discovery in 2015 the statement that it has gained popularity in the last decade can probably be omitted.

      This has been removed.

      26) Fig S1 should probably read "2N" (instead of "2n"). Or alternatively "2C" could be fine.

      27) Line 154: To help comprehension please define the term "branch number" in this context when it comes up.

      A definition for branch has now been provided.

      28) Fig. S5: To my estimation it is not an "early trophozoite", which is depicted.

      While this parasite technically fits our definition of trophozoite, as it has not yet undergone nuclear division, we have swapped it for a visibly earlier parasite for clarity. This is the new parasite depicted:

      Author response image 4.

      29) Fig. 2a is not referenced before Fig. 2b in the text.

      This has been addressed.

      30) I could not find the reference to Fig. S2e and its discussion.

      It was wrongly labelled as Supplementary Figure 2b in the text, this has now been addressed.

      31) The next Figure referenced in the text after Fig. 2b is Fig. 4b. Fig.3 is only referenced and discussed later, which was quite confusing.

      The numbering discrepancies have been addressed.

      32) Line 196: Figure reference is missing.

      This data did not have a figure reference, but the numbers have now been provided in-text.

      33) Fig. 3c: Is "Branches per MTOC" not just total branches divided by two? If so it can be omitted. If not so please explain the difference.

      Yes, it was total branches divided by two; this has been removed from Figure 3c.

      34) Figure 5c and 6d: The authors should show examples of the image segmentation used to calculate the surface area.

      Surface area calculation was done in an essentially one-step process. From maximum intensity projections, free-hand regions of interest were drawn, from which ZEN automatically calculates their area. An example is shown as Author response image 5:

      Author response image 5.
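
      As a rough illustration of what an area measurement from a free-hand region of interest involves, the sketch below treats the ROI outline as a closed polygon and applies the shoelace formula. This is only an assumption for illustration: how ZEN computes ROI areas internally is not stated here, and the function name `polygon_area` and the example vertices are hypothetical.

```python
def polygon_area(vertices):
    """Area enclosed by a closed polygon, via the shoelace formula.

    `vertices` is an ordered list of (x, y) points tracing the ROI outline;
    the last point is implicitly joined back to the first.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical ROI outline (arbitrary length units), not data from the study.
roi = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
area = polygon_area(roi)
print(area)
```

      Because the outline is drawn on a maximum intensity projection, a value obtained this way is a projected area rather than a true three-dimensional surface area.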

      35) Figure 7b should also show the NHS Ester staining alone for the zoom in.

      We have included the NHS ester staining alone in the zoom-in, but we have slightly changed the presentation of these two panels to show both the basal complex and cytostomes as follows:

      Author response image 6.

      36) To which degree are Rhoptry necks associated with MTOC extensions?

      This cannot easily be determined with the images we have so far. Before elongated necks are visible, the RON4 signal does appear pointed towards the MTOC extensions. Rhoptry necks don’t seem to elongate until segmentation, when the MTOC starts to move away from the apical end of the parasite. So it is possible there is a transient association, but we cannot easily discern this from our data.

    1. Author Response:

      Reviewer #1 (Public Review):

      Despite numerous studies on quinidine therapies for epilepsies associated with GOF mutant variants of Slack, there is no consensus on its utility due to contradictory results. In this study Yuan et al. investigated the role of different sodium selective ion channels on the sensitization of Slack to quinidine block. The study employed electrophysiological approaches, FRET studies, genetically modified proteins and biochemistry to demonstrate that Nav1.6 N- and C-tail interacts with Slack's C-terminus and significantly increases Slack sensitivity to quinidine blockade in vitro and in vivo. This finding inspired the authors to investigate whether they could rescue Slack GOF mutant variants by simply disrupting the interaction between Slack and Nav1.6. They find that the isolated C-terminus of Slack can reduce the current amplitude of Slack GOF mutant variants co-expressed with Nav1.6 in HEK cells and prevent Slack induced seizures in mouse models of epilepsy. This study adds to the growing list of channels that are modulated by protein-protein interactions, and is of great value for future therapeutic strategies.

      I have a few comments with regard to how Nav1.6 sensitize Slack to block by quinidine.

      (1) It is not clear to me if the Slack induced current amplitude varies depending on the specific Nav subtype. To this end, it would be valuable to test if Slack open probability is affected by the presence of specific Nav subtypes. Nav induced differences in Slack current amplitude and open probability could explain why individual Nav subtypes show varied ability to sensitize Slack to quinidine blockade.

      We thank the reviewer for raising this point. In order to address whether the whole-cell current amplitudes of Slack vary depending on the specific NaV subtype, we examined Slack current amplitudes upon co-expression of Slack with specific NaV subtypes in HEK293 cells. The results show that there are no significant differences in Slack current amplitudes upon co-expression of Slack with different NaV channel subtypes (Author response image 1), suggesting that whole-cell Slack current amplitudes cannot explain the varied ability of NaV subtypes to sensitize Slack to quinidine blockade. To investigate the effect of different NaV channel subtypes on Slack open probability, we will perform single-channel recordings in future studies.

      Author response image 1.

      The amplitudes of Slack currents upon co-expression of Slack with specific NaV subtypes in HEK293 cells. ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      (2) It has previously been shown that INaP (persistent sodium current) is important for inducing Slack currents. Here the authors show that INaT (transient sodium current) of Nav1.6 is necessary for the sensitization of Slack to quinidine block whereas INaP surprisingly has no effect. The authors then show that the N-tail together with C-tail of Nav1.6 can induce the same effect on Slack as full-length Nav1.6 in the presence of high intracellular concentrations of sodium. However, it is not clear to me how the isolated N- and C-tail of Nav1.6 can induce sensitization of Slack to quinidine by interacting with the C-terminus of Slack, while sensitization is also dependent on INaT. The authors speculate on different Slack open conformations, but one could speculate whether there is a missing link, such as an unidentified additional interacting protein that causes the coupling.

      We fully agree on the importance of investigating the detailed mechanism underlying the sensitization of Slack to quinidine blockade mediated by the N- and C-termini of NaV1.6. Regarding the possibility of additional interacting proteins (“missing link”) that mediate the coupling between Slack and NaV1.6, our GST-pull down assays involving Slack and the N- and C-termini of NaV1.6 (Fig. S7) suggest a direct interaction between Slack and NaV1.6 channels. This finding leads us to consider that a requirement for additional interacting proteins might be excluded. In order to further address these questions, we plan to employ structural biological methods, such as cryo-electron microscopy (cryo-EM).

      Reviewer #2 (Public Review):

      This is a very interesting paper about the coupling of Slack and Nav1.6 and the insight this brings to the effects of quinidine to treat some epilepsy syndromes.

      Slack is a sodium-activated potassium channel that is important to hyperpolarization of neurons after an action potential. Slack is encoded by KCNT1, which has mutations in some epilepsy syndromes. These types of epilepsy are treated with quinidine but this is an atypical antiseizure drug, not used for other types of epilepsy. For sufficient sodium to activate Slack, Slack needs to be close to a channel that allows robust sodium entry, like Na channels or AMPA receptors, but more mechanistic information is not available. Of particular interest to the authors is what allows quinidine to be effective in reducing Slack.

      In the manuscript, the authors show that Nav, not AMPA receptors, are responsible for Slack activation, at least in cultured neurons (HEK293, primary cortical neurons). Most of the paper focuses on the evidence that Nav1.6 promotes Slack sensitivity to quinidine.

      (1) The paper is very well written although there are reservations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system.

      We appreciate the reviewer's positive evaluation of our work. We acknowledge that utilizing a more intact system would provide valuable insights into the inhibitory effect of quinidine on Slack-NaV1.6. However, there are certain challenges associated with studying Slack currents in their entirety.

      First, in our experiments, isolating Slack currents from Na+-activated K+ currents in an intact system is challenging as selective inhibitors for Slick are currently unavailable. To address this, we propose using Slick gene knockout mice to specifically measure Slack currents under physiological conditions in future investigations. Second, we have observed that the interaction between Slack and NaV1.6 primarily occurs at the axon initial segment of neurons. This poses a difficulty when using brain slices for measurements, as employing the whole-cell voltage-clamp technique to assess Slack at the axon initial segment may introduce systematic errors.

      We believe that testing the pharmacological effects of quinidine on Slack-NaV1.6 in primary neurons remains the optimal approach. Although non-neuronal cells or cultured primary neurons may not fully replicate the complexity of an intact system, they still provide valuable insights into the interactions between Slack and NaV1.6, and the effects of quinidine.

      (2) I also have questions about the figures.

      We will make the necessary modifications and clarifications based on the reviewer's comments.

      (3) Finally, riluzole is not a selective drug, so the limitations of this drug should be discussed.

      We thank the reviewer for raising this point. We will discuss the limitations of riluzole in our revised version of the manuscript.

      (4) On a minor point, the authors use the term in vivo but there are no in vivo experiments.

      We thank the reviewer for raising this point. Although we did not conduct experiments directly in living organisms, our results demonstrated the co-immunoprecipitation of NaV1.6 with Slack in homogenates from mouse cortical and hippocampal tissues (Fig. 3C). This result may support that the interaction between Slack and NaV1.6 occurs in vivo.

      Reviewer #3 (Public Review):

      Yuan et al., set out to examine the role of functional and structural interaction between Slack and NaVs on the Slack sensitivity to quinidine. Through pharmacological and genetic means they identify NaV1.6 as the privileged NaV isoform in sensitizing Slack to quinidine. Through biochemical assays, they then determine that the C-terminus of Slack physically interacts with the N- and C-termini of NaV1.6. Using the information gleaned from the in vitro experiments the authors then show that virally-mediated transduction of Slack's C-terminus lessens the extent of SlackG269S-induced seizures. These data uncover a previously unrecognized interaction between a sodium and a potassium channel, which contributes to the latter's sensitivity to quinidine.

      The conclusions of this paper are mostly well supported by data, but some aspects of the functional and structural studies in vivo, as well as the physical interaction, need to be clarified and extended.

      (1) Immunolabeling of the hippocampus CA1 suggests sodium channels as well as Slack colocalization with AnkG (Fig 3A). Proximity ligation assay for NaV1.6 and Slack or a super-resolution microscopy approach would be needed to increase confidence in the presented colocalization results. Furthermore, coimmunoprecipitation studies on the membrane fraction would bolster the functional relevance of NaV1.6-Slack interaction on the cell surface.

      We thank the reviewer for these good suggestions. We acknowledge that employing proximity ligation assay and high-resolution techniques would significantly enhance our understanding of the localization of the Slack-NaV1.6 coupling.

      At present, the technical capabilities available in our laboratory and institution do not support high-resolution testing. However, we are enthusiastic about exploring potential collaborations to address these questions in the future. Furthermore, we fully recognize the importance of conducting co-immunoprecipitation (Co-IP) assays from membrane fractions. While we have already completed Co-IP assays for total protein and quantified the FRET efficiency values between Slack and NaV1.6 in the membrane region, the Co-IP assays on membrane fractions will be conducted in our future investigations.

      (2) Although hippocampal slices from Scn8a+/- were used for studies in Fig. S8, it is not clear whether Scn8a-/- or Scn8a+/- tissue was used in other studies (Fig 1J & 1K). It will be important to clarify whether genetic manipulation of NaV1.6 expression (Fig. 1K) has an impact on sodium-activated potassium current, level of surface Slack expression, or that of NaV1.6 near Slack.

      We thank the reviewer for pointing this out. In Fig. 1G,J,K, primary cortical neurons from homozygous NaV1.6 knockout (Scn8a-/-) mice were used. We will clarify this information in the revised manuscript. In terms of the effects of genetic manipulation of NaV1.6 expression on IKNa and surface Slack expression, we compared the amplitudes of IKNa measured from homozygous NaV1.6 knockout (NaV1.6-KO) neurons and wild-type (WT) neurons. The results showed that homozygous knockout of NaV1.6 does not alter the amplitudes of IKNa (Author response image 2). The level of surface Slack expression will be tested further.

      Author response image 2.

      The amplitudes of IKNa in WT and NaV1.6-KO neurons (data from manuscript Fig. 1K). ns, p > 0.05, unpaired two-tailed Student’s t test.

      (3) Did the epilepsy-related Slack mutations have an impact on NaV1.6-mediated sodium current?

      We thank the reviewer for this question. We examined the amplitudes of NaV1.6 sodium current upon expression alone or co-expression of NaV1.6 with epilepsy-related Slack mutations (K629N, R950Q, K985N). The results showed that the tested epilepsy-related Slack mutations do not alter the amplitudes of NaV1.6 sodium current (Author response image 3).

      Author response image 3.

      The amplitudes of NaV1.6 sodium currents upon co-expression of NaV1.6 with epilepsy-related Slack mutant variants (SlackK629N, SlackR950Q, and SlackK985N). ns, p>0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      (4) Showing the impact of quinidine on persistent sodium current in neurons and on NaV1.6-expressing cells would further increase confidence in the role of persistent sodium current on sensitivity of Slack to quinidine.

      We appreciate the reviewer’s question. Previous studies have shown that quinidine can inhibit persistent sodium currents at low concentrations [1]. In our experiments, blocking persistent sodium currents by application of riluzole in the bath solution showed no significant effects on the sensitivity of Slack to quinidine blockade upon co-expression of Slack with NaV1.6 (Fig. 2F,H). This result suggests that persistent sodium currents are not involved in the sensitization of Slack to quinidine blockade.

      1. Ju YK, Saint DA, Gage PW. Effects of lignocaine and quinidine on the persistent sodium current in rat ventricular myocytes. Br J Pharmacol. Oct 1992; 107(2):311-6. doi:10.1111/j.1476-5381.1992.tb12743.x
    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Chan et al. tried identifying the binding sites or pockets for the KCNQ1-KCNE1 activator mefenamic acid. Because the KCNQ1-KCNE1 channel is responsible for cardiac repolarization, genetic impairment of either the KCNQ1 or KCNE1 gene can cause cardiac arrhythmia. Therefore, the development of activators without side effects is highly demanded. Because the binding of mefenamic acid requires both KCNQ1 and KCNE1 subunits, the authors performed drug docking simulation by using KCNQ1-KCNE3 structural model (because this is the only available KCNQ1-KCNE structure) with substitution of the extracellular five amino acids (R53-Y58) into D39-A44 of KCNE1. That could be a limitation of the work because the binding mode of KCNE1 might differ from that of KCNE3. Still, they successfully identified some critical amino acid residues, including W323 of KCNQ1 and K41 and A44 of KCNE1. They subsequently tested these identified amino acid residues by analyzing the point mutants and confirmed that they attenuated the effects of the activator. They also examined another activator, yet structurally different DIDS, and reported that DIDS and mefenamic acid share the binding pocket, and they concluded that the extracellular region composed of S1, S6, and KCNE1 is a generic binding pocket for the IKS activators.

      The data are solid and well support their conclusions, although there are a few concerns regarding the choice of mutants for analysis and data presentation.

      Other comments:

      1. One of the limitations of this work is that they used psKCNE1 (mostly KCNE3), not real KCNE1, as written above. It is also noted that KCNQ1-KCNE3 is in the open state. Unbinding may be facilitated in the closed state, although evaluating that in the current work is difficult.

      We agree that it is difficult to evaluate the role of unbinding from our model. Our data showing that longer interpulse intervals have a normalizing effect on the GV curve (Figure 3-figure supplement 2) could be interpreted to suggest that unbinding occurs in the closed state. Alternatively, the slowing of deactivation caused by S1-S6 interactions and facilitated by the activators may effectively be exceeded at the longer interpulse intervals.

      2. According to Figure 2-figure supplement 2, some amino acid residues (S298 and A300) of the turret might be involved in the binding of mefenamic acid. On the other hand, Q147 showing a comparable delta G value to S298 and A300 was picked for mutant analysis. What are the criteria for the following electrophysiological study?

      EP experiments interrogated selected residues with significant contributions to mefenamic acid and DIDS coordination as revealed by the MM/GBSA and MM/PBSA methods. A300 was identified as potentially important. We did attempt A300C but were never able to get adequate expression for analysis.

      3. It is an interesting speculation that K41C and W323A stabilize the extracellular region of KCNE1 and might increase the binding efficacy of mefenamic acid. Is it also the case for DIDS? K41 may not be critical for DIDS, however.

      Yes, we found K41 was not critical to the binding/action of DIDS compared to Mef. In electrophysiological experiments with the K41C mutation, DIDS induced a leftward GV shift (~ -25 mV) whereas the normalized response was statistically non-significant. In MD simulation studies, we observed detachment of DIDS from K41C-IKs only in 3 runs out of 8 simulations. This is in contrast to Mef, where the drug left the binding site of the K41C-IKs complex in all simulations.

      4. Same as #2, why was the pore turret (S298-A300) not examined in Figure 7?

      Again, we attempted A300C but could not get high enough expression.

      Reviewer #3 (Public Review):

      Weaknesses:

      1. The computational aspect of the work is rather under-sampled - Figure 2 and Figure 4. The lack of quantitative analysis on the molecular dynamic simulation studies is striking, as only a video of a single representative replica is being shown per mutant/drug. Given that the simulations shown in the video are extremely short; some video only lasts up to 80 ns. Could the author provide longer simulations in each simulation condition (at least to 500 ns or until a stable binding pose is obtained in case the ligand does not leave the binding site), at least with three replicates per each condition? If not able to extend the length of the simulations due to resources issue, then further quantitative analysis should be conducted to prove that all simulations are converged and are sufficient. Please see the rest of the quantitative analysis in other comments.

      We provide more quantitative analysis for the existing MD simulations and ran five additional simulations with 500 ns duration by embedding the channel in a POPC lipid membrane. For the new MD simulations, we used a different force field in order to minimize ambiguity related to force fields as well. Analysis of these data has led to new data and supplemental figures regarding RMSD of ligands during the simulations (Figure 4-figure supplement 1 and Figure 6-figure supplement 3), clustering of MD trajectories based on Mef conformation (Figure 2-figure supplement 3 and Figure 6 -figure supplement 2), H-bond formation over the simulations (Figure 2-figure supplement 4 and Figure 6-figure supplement 1). We have edited the manuscript to include this new information where appropriate.

      2. Given that the protein is a tetramer, at least 12 datasets could have been curated to improve the statistics. It was also unclear how frequently the frames from the simulations were taken in order to calculate the PBSA/GBSA.

      By using one ligand for each ps-IKs channel complex we tried to keep the molecular system and corresponding analysis as simple as possible. Our initial results showed that 4D docking and subsequent MD simulations with only one ligand bound to ps-IKs were complicated enough. Our attempts to dock 4 ligands simultaneously and analyze the properties of such a system were ineffective due to difficulties in: i) obtaining stable complexes during conformational sampling and 4D docking procedures, since the ligand interaction covers a region including three protein chains with dynamic properties; ii) possible changes of receptor conformation properties at the three other subunits when one ligand is already occupying its site; iii) marked diversity of the binding poses of the ligand, as cluster analysis of the ligand-channel complex shows (Figure 2-figure supplement 3).

      We have added a line in the methods to clarify the use of only one ligand per channel complex in simulations.

      In order to calculate MMPBSA/MMGBSA we used a frame every 0.3 ns throughout the 300 ns simulation (1000 frames/simulation) or during the time the ligand remained bound. We have clarified this in the Methods.
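
      The sampling scheme described above can be sketched as a short calculation. This is a minimal illustration, assuming the first sampled frame sits one stride after the start of the simulation and that sampling stops when the ligand leaves the site; the function name `frame_times` and its parameters are hypothetical and are not part of any MM/PBSA package.

```python
import math


def frame_times(duration_ns, stride_ns, bound_until_ns=None):
    """Return the time points (ns) at which frames are taken for analysis.

    Frames are taken every `stride_ns` up to `duration_ns`, or only up to
    `bound_until_ns` if the ligand detaches before the simulation ends.
    """
    end = duration_ns if bound_until_ns is None else min(duration_ns, bound_until_ns)
    # Small epsilon guards against floating-point error in the division.
    n_frames = math.floor(end / stride_ns + 1e-9)
    return [round((i + 1) * stride_ns, 6) for i in range(n_frames)]


full_run = frame_times(300, 0.3)                       # ligand bound throughout
early_exit = frame_times(300, 0.3, bound_until_ns=25)  # ligand leaves at 25 ns
print(len(full_run), len(early_exit))
```

      Under these assumptions, a 300 ns run sampled every 0.3 ns gives the 1000 frames quoted above, whereas a run in which the ligand leaves after 25 ns contributes far fewer frames to the energy average.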

      3. The lack of labels on several structures is rather unhelpful (Figure 2B, 2C, 4B). The lack of clarity of the interaction map in Figures 2D and 6A.

      We updated figures considering the reviewer's comments and added labels. For 2D interaction maps, we provided additional information in figure legends to improve clarity.

      4. The RMSF analysis is rather unclear and not thoroughly labelled. In fact, I still don't quite understand why n = 3, given that the protein is a tetramer. If only one out of four were docked and studied, this rationale needs to be explained and accounted for in the manuscript.

      The rationale of conducting MD simulations with one ligand bound to IKs is explained in response to point 2 of the reviewer’s comments.

      RMSF analysis in Figure 4C-E was calculated using the chain to which Mef was docked but after Mef had left the binding site. Details were added to the methods.

      5. For the conditions in which the ligands are supposed to leave the site (K42C for Mef and Y46A for DIDS), can you please provide simulations of sufficient length to show that the ligand left the site over three replicates? Given that the protein is a tetramer, I would be expecting three replicates of data to have four data points from each subunit. I would be expecting distance calculation or RMSD of the ligand position in the binding site to be calculated either as a time series or as a distribution plot to show the difference between each mutant in the ligand stability within the binding pocket. I would expect all the videos to be translatable to certain quantitative measures.

      We have shown in the manuscript that the Mef molecule detaches from the K41C/IKs channel complex in all three simulations (at 25 ns, 70 ns and 20 ns, Table 4). Similarly, the ligand left the site in all five new 500 ns duration simulations. We did not provide simulations for Y46A, but with Y46C the ligand left the binding site in 4 of 5 500 ns simulations and changed binding pose in the other.

      Difficulties encountered upon extending the docking and MD simulations to 4 receptor sites of the channel complex are discussed in our response to point 2 of the reviewer.
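
      The kind of quantitative measure requested here, ligand RMSD as a time series and the time at which the ligand leaves the site, can be sketched as follows. This is a simplified illustration rather than the analysis pipeline used for the manuscript: it assumes the frames have already been aligned on the protein, uses the first frame as the reference pose, and relies on a hypothetical RMSD cutoff and toy coordinates.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized sets of 3D points."""
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def detachment_time(trajectory, stride_ns, cutoff):
    """Time (ns) of the first frame whose ligand RMSD from frame 0 exceeds `cutoff`.

    Returns None if the ligand stays within the cutoff for the whole trajectory.
    Assumes frames are pre-aligned on the protein, so no fitting is done here.
    """
    reference = trajectory[0]
    for index, frame in enumerate(trajectory):
        if rmsd(frame, reference) > cutoff:
            return index * stride_ns
    return None


# Toy two-atom "ligand" over three frames (angstroms); not simulation data.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
]
rmsd_series = [rmsd(frame, toy_trajectory[0]) for frame in toy_trajectory]
left_at = detachment_time(toy_trajectory, stride_ns=1.0, cutoff=5.0)
print(rmsd_series, left_at)
```

      Applying the same cutoff to every replicate would give one detachment time per run, which is the kind of per-simulation number reported for K41C.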

      6. Given that K41 (Mef) and Y46 are very important in the coordination, could you calculate the frequency at which such residues form hydrogen bonds with the drug in the binding site? Can you also calculate the occupancy or the frequency of contact that the residues are making to the ligand (close 4-angstrom proximity etc.) and show whether those agree with the ligand interaction map obtained from ICM pro in Figure 2D?

      We thank the reviewer for the suggestion to analyze the H-bond contribution to ligand dynamics in the binding site. In the plots shown in Figure 2-figure supplement 4 and Figure 6-figure supplement 1, we now provide detailed information about the dynamics of the H-bond formation between the ligand and the channel-complex throughout simulations. In addition, we have quantified this and have added these numbers to a table (Table 2) and in the text of the results.
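
      For readers unfamiliar with how such H-bond frequencies are typically derived, the sketch below counts the fraction of frames in which a donor-acceptor pair satisfies simple geometric criteria. The distance and angle cutoffs are commonly used defaults chosen here as assumptions, not the criteria used in the manuscript, and the function name `hbond_occupancy` and the per-frame values are hypothetical.

```python
def hbond_occupancy(distances, angles, max_distance=3.5, min_angle=120.0):
    """Fraction of frames in which a hydrogen bond is counted as formed.

    `distances` holds the donor-acceptor distance per frame (angstroms) and
    `angles` the donor-hydrogen-acceptor angle per frame (degrees). A frame
    counts only if both the distance and the angle criteria are met.
    """
    frames = list(zip(distances, angles))
    formed = sum(1 for d, a in frames if d <= max_distance and a >= min_angle)
    return formed / len(frames)


# Hypothetical per-frame geometry for one residue-ligand pair; not real data.
example_distances = [2.9, 3.1, 4.2, 3.4]
example_angles = [160.0, 150.0, 170.0, 100.0]
occupancy = hbond_occupancy(example_distances, example_angles)
print(occupancy)
```

      In this toy case the third frame fails the distance criterion and the fourth fails the angle criterion, so the bond is counted in half of the frames.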

      1. Given that the author claims that both molecules share the same binding site and the mode of ligand binding seems to be very dynamic, I would expect the authors to show the distribution of the position of ligand, or space, or volume occupied by the ligand throughout multiple repeats of simulations, over sufficient sampling time that both ligands sample the same conformational space in the binding pocket. This will prove the point in the discussion - Line 463-464. "We can imagine a dynamic complex... bind/unbind from IKs at a high frequency".

      To support our statement regarding a dynamic complex, we analyzed longer MD simulations and clustered the trajectories. An average conformation was extracted from each cluster and is provided as supplementary information showing the different binding modes for Mef (Figure 2-figure supplement 3). DIDS was more stable in MD simulations: although there were also several clusters, they were similar enough that, with the same cut-off distance as for mefenamic acid, they could be grouped into one cluster. (Note the difference in dendrogram scale between Figure 2-figure supplement 3 and Figure 6-figure supplement 2.)

      1. I would expect the authors to explain the significance and the importance of the PBSA/GBSA analysis as they are not reporting the same energy in several cases, especially K41 in Figure 2 - figure supplement 2. It was also questionable that Y46, which seems to have high binding energy, show no difference in the EPhys works in figure 3. These need to be commented on.

      Several studies indicate that G values calculated using MM/PBSA and MM/GBSA methods may vary. Some studies report marked differences and the reasons for such a discrepancy is thoroughly discussed in a review by Genheden and Ryde (PMID: 25835573). Therefore, we used both methods to be sure that key residues contributing to ligand binding identified with one method appear in the list of residues for which the calculations are done with the other method.

      Y46C which showed only a slightly less favorable binding energy and did not unbind during 300 ns simulations, unbound, or changed pose in 4 out of 5 of the longer simulations in the presence of a lipid membrane (Figure 4-figure supplement 1). The discrepancy between electrophysiological and MD data is commented in the manuscript (pages 12-13).

      1. Can the author prove that the PBSA/GBSA analysis yielded the same average free energy throughout the MD simulation? This should be the case when the simulations are converged. The author may takes the snapshots from the first ten ns, conduct the analysis and take the average, then 50, then 100, then 250 and 500 ns. The author then hopefully expects that as the simulations get longer, the system has reached equilibrium, and the free energy obtained per residue corresponds to the ensemble average.

      As we mention in the manuscript, MEF- channel interactions are quite dynamic and vary even from simulation to simulation. The frequent change of the binding pose of the ligands observed during simulations (represented in Figure 2 - figure supplement 3 as clusters) is a clear reflection of such a dynamic process. Therefore, we do not expect the same average energy throughout the simulation but we do expect that G values stands above the background for key residues, which was generally the case (Figure 2 - figure supplement 2 and Figure 6.)
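
      For readers who want to see what the windowed check described by the reviewer amounts to, the sketch below averages a per-frame energy series over the first 10, 50, 100, 250 and 500 ns. It is only a sketch under stated assumptions: the function name `cumulative_averages`, the one-frame-per-ns spacing and the toy energies are hypothetical, and no such analysis is reported in the response.

```python
from statistics import mean


def cumulative_averages(energies, interval_ns, windows_ns):
    """Mean energy over the first `w` ns of a run, for each w in windows_ns."""
    averages = {}
    for window in windows_ns:
        n_frames = round(window / interval_ns)
        if not 1 <= n_frames <= len(energies):
            raise ValueError(f"window of {window} ns does not fit the trajectory")
        averages[window] = mean(energies[:n_frames])
    return averages


# Toy per-frame energies in kcal/mol (hypothetical), one frame per ns for 500 ns.
energies = [-10.0] * 10 + [-20.0] * 490
running = cumulative_averages(energies, 1.0, [10, 50, 100, 250, 500])
```

      In a converged run the later entries of `running` would approach a common value; for a ligand that keeps changing pose, as described above, they need not.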

      1. The phrase "Lowest interaction free energy for residues in ps-KCNE1 and selected KCNQ1 domains are shown as enlarged panels (n=3 for each point)" needs further explanation. Is this from different frames? I would rather see this PBSA and GBSA calculated on every frame of the simulations, maybe at the one ns increment across 500 ns simulations, in 4 binding sites, in 3 replicas, and these are being plotted as the distribution instead of plotting the smallest number. Can you show each data point corresponding to n = 3?

      MM/PBSA and MM/GBSA energies were calculated for 1000 frames from each of the 3 x 300 ns simulations (0.3 ns sampling interval), 3000 frames in total. The results are shown in Figure 2-figure supplement 2, which includes error bars to show the differences across runs. We have updated the legend for greater clarity.
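
      The frame arithmetic, and one way error bars across three runs could be formed, can be sketched as follows. The per-run energies are invented placeholders, and treating the error bar as the standard error of the three per-run means is an assumption rather than a statement of how the figure was built.

```python
from math import sqrt
from statistics import mean, stdev


def frames_per_run(length_ns, interval_ns):
    """Number of frames when one frame is taken every `interval_ns`."""
    return round(length_ns / interval_ns)


def mean_and_sem(run_means):
    """Mean and standard error of the mean across independent runs."""
    return mean(run_means), stdev(run_means) / sqrt(len(run_means))


n_frames = frames_per_run(300, 0.3)  # frames in one 300 ns run
total_frames = 3 * n_frames          # three replicate runs

# Hypothetical per-run mean energies (kcal/mol) for a single residue.
per_run = [-2.0, -3.0, -4.0]
avg, sem = mean_and_sem(per_run)
```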

      1. I cannot wrap my head around what you are trying to show in Figure 2B. This could be genuinely improved with better labelling. Can you explain whether this predicted binding pose for Mef in the figure is taken from the docking or from the last frame of the simulation? Given that the binding mode seems to be quite dynamic, a single snapshot might not be very helpful. I suggest a figure describing different modes of binding. Figure 2B should be combined with figure 2C as both are not very informative.

      We have updated Figure 2B with better labelling and added a new figure showing the different modes of binding (Figure 2-figure supplement 3).

      1. Similar to the comment above, but for Figure 4B. I do not understand the argument. If the author is trying to say that the pocket is closed after Mef is removed - then can you show, using MD simulation, that the pocket is openable in an apo to the state where Mef can bind? I am aware that the open pocket is generated through batches of structures through conformational sampling - but as the region is supposed to be disordered, can you show that there is a possibility of the allosteric or cryptic pocket being opened in the simulations? If not, can you show that the structure with the open pocket, when the ligand is removed, is capable of collapsing down to the structure similar to the cryo-EM structure? If none of the above work, the author might consider using PocketMiner tools to find an allosteric pocket (https://doi.org/10.1038/s41467-023-36699-3) and see a possibility that the pocket exists.

      Please see the attached screenshot which depicts the binding pocket from the longest run we performed (1250 ns) before drug detachment (grey superimposed structures) and after (red superimposed structures). Mefenamic acid is represented as licorice and colored green. Snapshots for superimposition were collected every 10 ns. As can be seen in the figure, when the drug leaves the binding site (after 500 ns, structures colored red), the N-terminal residue of psKCNE1, W323, and other residues that form the pocket shift toward the binding site, overlapping with where Mefenamic acid once resided. The surface structure in Figure 4B shows this collapse.

      Author response image 1.

      In the manuscript, we propose that drug binding occurs by a mechanism best described by induced-fit models, in which the formation of a firm complex (the channel-Mef complex) results from multi-state conformational adjustments of the bimolecular interaction. These interactions do not necessarily need large interfaces in the initial phase. This seems to be the case for Mef-IKs interactions, since we could not identify a pocket of appropriate size using either the PocketMiner software suggested by the reviewer or the PocketFinder tool of the ICM-pro software.

      1. Figure 4C - again, can you show the RMSF analysis of all four subunits leading to 12 data points? If it is too messy to plot, can you plot a mean with a standard deviation? I would say that a 1-1.5 angstroms increase in the RMSF is not a "markedly increased", as stated on line 280. I would also encourage the authors to label whether the RMSF is calculated from the backbone, side-chain or C-alpha atoms and, ideally, compare them to see where the dynamical properties are coming from.

      Please see the answer to comment #4. We agree that the changes are not so dramatic and have modified the text accordingly. RMSF was calculated for backbone atoms so that residues with different side chains can be compared; this is now noted in the methods, and the statistical significance of ps-IKs vs K41C, W323A and Y46C is indicated in Figures 4C-4E.

      1. In the discussion - Lines 464-467. "Slowed deactivation of the S1/KCNE1/Pore domain/drug complex... By stabilising the activated complex. MD simulation suggests the latter is most likely the case." Can you point out explicitly where this has been proven? If the drug really stabilised the activated complex, can you show which intermolecular interaction within E1/S1/Pore has the drug broken and re-form to strengthen the complex formation? The authors have not disproven the point on steric hindrance either. Can this be disproved by further quantitative analysis of existing unbiased equilibrium simulations?

      The stabilization of S1/KCNE1/Pore by drugs does not necessarily have to involve a creation of new contacts between protein parts or breakage of interfaces between them. The stabilization of activated complexes by drugs may occur when the drug simultaneously binds to both moveable parts of the channel, such as voltage sensor(s) or upper KCNE1 region, and static region(s) of the channel, such as the pore domain. We have changed the corresponding text for better clarity.

      1. Figure 4D - Can you show this RMSF analysis for all mutants you conducted in this study, such as Y46C? Can you explain the difference in F dynamics in the KCNE3 for both Figure 4C and 4D?

      We now show the RMSF for K41C, W323A and Y46C in Figure 4C-E. We speculate that K41 (magenta) and W323 (yellow), given their location at the lipid interface (see Author response image 1), may be important stabilizing residues for the KCNE N-terminus, whereas Y46 (green) which is further down the TMD has less of an impact.

      Author response image 2.

      1. Line 477: the author suggested that K41 and Mef may stabilise the protein-protein interface at the external region of the channel complex. Can you prove that through the change in protein-protein interaction, contact is made over time on the existing MD trajectories, whether they are broken or formed? The interface from which residues help to form and stabilise the contact? If this is just a hypothesis for future study, then this has to be stated clearly.

      It is known that crosslinking several residues of external E1 with external pore residues dramatically stabilizes the voltage sensors of the KCNQ1/KCNE1 complex in the up-state conformation. This prevents the movable protein regions in the voltage sensors from returning to their initial positions upon depolarization, locking the channel in an open state. We suggest that MEF may restrain the backward movement of the voltage sensors in a similar way, stabilizing the open conformation of the channel. This stabilization of the voltage sensor domain by MEF arises from contacts of the drug with both static (pore domain) and dynamic protein parts (voltage sensors and external KCNE1 regions). We have changed the corresponding part of the text.

      1. The author stated on lines 305-307 that "DIDS is stabilised by its hydrophobic and vdW contacts with KCNQ1 and KCNE1 subunits as well as by two hydrogen bonds formed between the drug and ps-KCNE1 residue L42 and KCNQ1 residue Q147" Can you show, using H-bond analysis that these two hydrogen bonds really exist stably in the simulations? Can you show, using minimum distance analysis, that L42 are in the vdW radii stably and are making close contact throughout the simulations?

      We performed a detailed H-bond analysis (Figure 6-figure supplement 1), which shows that DIDS forms multiple H-bonds over the simulations, though only some of them (GLU43, TYR46, ILE47, SER298, TYR299, TRP323) are stable. Thus, the H-bonds that we observed in the DIDS docking experiments were unstable in MD simulations. As in the case of the IKs-MEF complex, the prevailing H-bonds exhibit marked quantitative variability from simulation to simulation. We have added a table detailing the most frequent H-bonds during MD simulations (Table 2).

      1. Discussion - In line 417, the author stated that the "S1 appears to pull away from the pore" and supplemented the claim with the movie. This is insufficient. The author should demonstrate distance calculation between the S1 helix and the pore, in WT and mutants, with and without the drug. This could be shown as a time series or distribution of centre-of-mass distance over time.

      We tried to analyze the distance changes between the upper S1 and the pore domain but failed to see a strong correlation. We have removed this statement from the discussion.

      1. Given that all the work were done in the open state channel with PIP2 bound (PDB entry: 6v01), could the author demonstrate, either using docking, or simulations, or alignment, or space-filling models - that the ligand, both DIDS and Mef, would not be able to fit in the binding site of a closed state channel (PDB entry: 6v00). This would help illustrate the point denoted Lines 464-467. "Slowed deactivation of the S1/KCNE1/Pore domain/drug complex... By stabilising the activated complex. MD simulation suggests the latter is most likely the case."

      As of now, a structure representing the closed state of the channel does not exist. 6V00 is the closed inactivated state of the channel pore with voltage-sensors in the activated conformation. In order to create simulation conditions that reliably describe the electrophysiological experiments, at least a good model for closed channels with resting state voltage sensors is necessary.

      1. The author stated that the binding pose changed in one run (lines 317 to 318). Can you comment on those changes? If the pose has changed - what has it changed to? Can you run longer simulations to see if it can reverse back to the initial conformation? Or will it leave the site completely?

      Longer simulations and trajectory clustering revealed several binding modes, with one pose dominating in approximately 50% of all simulations (encircled with a blue frame in Figure 2-figure supplement 3).

      1. Binding free energy of -32 kcal/mol = -134 kJ/mol. If you try to do dG = -RTlnKd, your lnKd is -52. Your Kd is e^-52, which means it will never unbind if it exists. I am aware that this is the caveat with the methodologies. But maybe these should be highlighted throughout the manuscript.

      We thank the reviewer for this comment. The ∆G values, and the corresponding Kd values, calculated from simulations of the Mef-ps-IKs complex do not reflect the apparent Kd values determined in electrophysiological experiments, nor do they reflect Kd values of drug binding that could be determined in biochemical assays. The important measures are the changes observed in simulations of mutant channel complexes relative to wild type. We now briefly mention this issue in the manuscript.
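
      The reviewer's back-of-the-envelope numbers can be reproduced with the standard relation ∆G = RT ln Kd (Kd taken relative to a 1 M standard state). The sketch below is illustrative only: the temperature of 310 K is an assumption, chosen because it gives the ln Kd of about -52 quoted by the reviewer, and the function name `ln_kd` is hypothetical.

```python
from math import exp

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)
KCAL_TO_KJ = 4.184     # thermochemical calorie conversion


def ln_kd(delta_g_kcal, temperature_k=310.0):
    """ln(Kd / 1 M) for a binding free energy, using dG = RT ln Kd."""
    return delta_g_kcal * KCAL_TO_KJ / (R_KJ * temperature_k)


delta_g_kj = -32.0 * KCAL_TO_KJ  # about -134 kJ/mol
ln_kd_310 = ln_kd(-32.0)         # about -52 at the assumed 310 K
kd_molar = exp(ln_kd_310)        # vanishingly small Kd, i.e. effectively no unbinding
```

      The absurdly small Kd is the reason absolute MM/PBSA and MM/GBSA energies are read here only as relative measures between wild type and mutants.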

      Reviewer #1 (Recommendations For The Authors):

      1) It would be nice to have labels of amino acid residues in Figure 2B.

      We updated Figure 2B and added some residue labels.

      2) Fig. 3A and 7A. In what order the current traces are presented? I don't see the rule.

      We have now arranged the current traces in a more orderly manner, listing them first by ascending KCNE1 residue number and then by ascending KCNQ1 residue number. This order is now consistent with the normalized response and ∆V1/2 panels of Figs. 3 and 7.

      3) Line 312 "A44 and Y46 were more so." A44 may be more critical, but I can't see Y46 is more, according to Figure 2-figure supplement2 and Figure 6.

      Indeed, comparison of the energy decomposition data indicates approximately the same ∆G values for Y46. We have revised this in the text correspondingly.

      4) Line 267 "Mefenamic acid..." I would like to see the movie.

      We no longer have access to this original movie.

      5) In supplemental movies 5-7, the side chains of some critical amino acid residues (W323, K41) would be better presented as in movies 1-4.

      We have retained the original presentations of these movies as the original files are no longer available.

      Reviewer #2 (Recommendations For The Authors):

      General comments:

      1) To determine the effect of mefenamic acid and DIDS on channel closing kinetics, a protocol in which they step from an activating test pulse to a repolarizing tail pulse to -40 mV for 1 s is used. If I understand it right, the drug response is assessed as the difference in instantaneous tail current amplitude and the amplitude after 1 s (row 599-603). The drug response of each mutant is then normalized to the response of the WT channel. However, for several mutants there is barely any sign of current decay during this relatively brief pulse (1 s) at this specific voltage. To determine drug effects more reliably on channel closing kinetics/the extent of channel closing, I wonder if these protocols could be refined? For instance, to cover a larger set of voltages and consider longer timescales?

      To clarify, the drug response of each mutant is not normalized to the response of the WT channel. Our analysis is not meant to compare mutant and WT tail current decay, but rather to show how isochronal tail current decay changes in response to drug treatment in each channel construct. As acknowledged by the reviewer, the peak-to-end difference currents were calculated by subtracting the minimum amplitude of the deactivating current from its peak amplitude. The difference current in mefenamic acid or DIDS was then normalized to the maximum control (drug-free) difference current and subtracted from 1.0 to obtain the normalized response. Thus, the difference in tail current decay in the absence and presence of drug is measured on the same time scale, which allows a direct comparison before and after drug treatment. As shown in Fig 3D and 7C, a large drug response such as the one measured in WT channels is reflected by a value close to 1, and a smaller drug response by a low value. We recognize that some mutations resulted in an intrinsic inhibition of tail current decay in the absence of drug, which potentially leads to underestimation of the normalized response. Our goal was not to study in detail the effects of the drug on channel closing kinetics, but only to determine the impact of each mutation on drug binding, using tail current decay as a readout. Consequently, we believe that the duration of the deactivating tail current used in this experiment was sufficient to detect drug-induced inhibition of tail current decay.
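
      The normalized response described above reduces to a short calculation, sketched here on invented tail-current samples. The function names and the toy traces are assumptions for illustration; only the formula (one minus the ratio of the drug difference current to the control difference current) comes from the text.

```python
def difference_current(tail_trace):
    """Peak minus minimum amplitude of a deactivating tail current."""
    return max(tail_trace) - min(tail_trace)


def normalized_response(control_tail, drug_tail):
    """1 - (drug difference current / control difference current)."""
    control_diff = difference_current(control_tail)
    if control_diff == 0:
        raise ValueError("control tail current shows no decay")
    return 1.0 - difference_current(drug_tail) / control_diff


# Hypothetical tail currents sampled over a 1 s step to -40 mV (arbitrary units).
control = [1.00, 0.60, 0.30, 0.20]  # clear decay without drug
with_drug = [1.00, 0.98, 0.96, 0.92]  # decay largely inhibited by drug
response = normalized_response(control, with_drug)  # close to 1: large drug effect
```

      The guard against a zero control difference current mirrors the caveat in the text: a mutant whose tail current barely decays without drug leaves little dynamic range for the measure.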

      2) The effect of mefenamic acid seems to be highly dependent on the pulse-to-pulse interval in the experiments. For instance, for WT in Figure 3 - Figure supplement 1, a 15 s pulse-to-pulse interval provides a -100 mV shift in V1/2 induced by mefenamic acid, whereas there is no shift induced when using a 30 s pulse-to-pulse interval. Can the authors explain why they generally consider a 15 s pulse-to-pulse interval more suitable (physiologically relevant?) in their experiments to assess drug effects?

      In our previous experiments, we determined that a 15 s inter-pulse interval is generally adequate for WT IKs channels to fully deactivate before the onset of the next pulse. Consistent with our previous work (Wang et al. 2019), we observed that in wild-type EQ channels there is no current summation from one pulse to the next (see Fig 1A, bottom panel). This is important because the IKs channel complex is known to be frequency dependent, i.e. current amplitude increases as the inter-pulse interval gets shorter. Such current summation results in a leftward shift of the conductance-voltage (G-V) relationship. This is also important with regard to drug effects. As indicated by the reviewer, mefenamic acid effects are prominent with a 15 s inter-pulse interval but less so with a 30 s inter-pulse interval, when enough time is given for channels to deactivate more completely. The full effects of mefenamic acid would therefore have been concealed with a 30 s inter-pulse interval.

      Moreover, our patch-clamp recordings aim to explore the distinct responses of mutant channels to mefenamic acid and DIDS in comparison to the wild-type channel. It is important to note that the inter-pulse interval's physiological relevance is not necessarily crucial in this context.

      3) Related to comment 1 and 2, there is a large diversity in the intrinsic properties of tested mutants. For instance, V1/2 ranges from 4 to 70 mV. Also, there is large variability in the slope of the G-V curves. Whether channel closing kinetics, or the impact of pulse-to-pulse interval, vary among mutants is not clear. Could the authors please discuss whether the intrinsic properties of mutants may affect their ability to respond to mefenamic acid and DIDS? Also, please provide representative current families and G-V curves for all assessed mutants in supplementary figures.

      The intrinsic properties of some mutants differ from those of the WT channels and influence their responsiveness to mefenamic acid and DIDS. The impact of the mutations on the IKs channel complex is reflected by changes in V1/2 (Tables 1 and 4) and tail current decay (Figs. 3, 7). However, it is the examination of the drug effects on these intrinsic properties (i.e. G-V curve and tail current decay) that constitutes the primary endpoint of our study. We consider that the degree to which Mef and DIDS modify these intrinsic properties reflects their ability to bind to the mutated channel. In our analysis, we compared each mutant's response to mefenamic acid and DIDS with its respective control. Consequently, the intrinsic properties of the mutant channels have already been taken into account in our evaluation. As requested, we have provided representative current families and G-V curves for all assessed mutants in Figure 3-figure supplement 1 and Figure 7-figure supplement 1.

      4) The A44C and Y148C mutants give strikingly different currents in the examples shown in Figure 3 and Figure 7. What is the reason for this? In the examples in figure 7, it almost looks like KCNE1 is absent. Although linked constructs are used, is there any indication that KCNE1 is not co-assembled properly with KCNQ1 in those examples?

      The size of the current is critical to determining its shape, as during the test pulse there is some endogenous current mixed in which impacts shape. A44C and Y148C currents shown in Figure 7 are smaller with a larger contribution of the endogenous current, mostly at the foot of the current trace. In our experience there is little endogenous current in the tail current at -40 mV and for this reason we focus our measurements there.

      Although constructs with tethered KCNQ1 and KCNE1 were used, we cannot rule out the possibility that Q1 and E1 interaction was altered by some of the mutations. Several KCNE1 and KCNQ1 residues have been identified as points of contact between the two subunits. For instance, the KCNE1 loop (position 36-47) has been shown to interact with the KCNQ1 S1-S2 linker (position 140-148) (Wang et al, 2011). Thus, it is conceivable that mutation of one or several of those residues may alter KCNQ1/KCNE1 interaction and modify the activation/deactivation kinetics of the IKs channel complex.

      5) I had a hard time following the details of the simulation approaches used. If not already stated (I could not find it), please provide: i) details on whether the whole channel protein was considered for 4D docking or a docking box was specified, ii) information on how simulations with mutant ps-IKs were prepared (for instance with the K41C mutant), especially whether the in silico mutated channel was allowed to relax before evaluation (and for how long). Also, please make sure that information on simulation time and number of repeats are provided in the Methods section.

      For 4D docking, only residues within 0.8 nm of psKCNE1 residues D39-A44 were selected. Complexes with mutated residues were relaxed using the same protocol as the WT channel (equilibration with gradual release of restraints, followed by a final 10 ns equilibration in which only the backbone was constrained with 50 kcal/mol/nm²). We have updated the methods accordingly.

      Specific comments:

      In figure legends, please provide information on whether data represents mean +/- SD or SEM. Also, please provide information on which statistical test was used in each figure.

      We revised the figure legend to add the nature of the statistical test used.

      G-V curves are normalized between 0 and 1. However, for many mutants the G-V relationship does not reach saturation at depolarized voltages. Does this affect the estimated V1/2? I could not really tell as I was not sure how V1/2 was determined for different mutants (could the explanation on row 595-598 be clarified)?

      The primary focus here is on the shift between the control response and the drug response for each mutant, rather than on the absolute V1/2 values. The isochronal G-V curves generated for each construct (WT and mutant) use an identical voltage protocol, which ensures a uniform comparison among all mutants. By observing the shifts in these curves, we can gain insight into the response of mutant channels to the drug. This information ultimately helps elucidate the inherent properties of the mutant channels and contributes to our understanding of the drug's binding mechanism to the channel.

      As requested by the reviewer, we also clarified the way V1/2 was generated: When the G-V curve did not reach zero, the V1/2 value was directly read from the plot at the voltage point where the curve crossed the 0.5 value on the y coordinate.
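
      Reading V1/2 off a curve at the 0.5 level can be approximated numerically by linear interpolation between the two points that bracket 0.5. This is a minimal sketch with made-up G-V points; the function name `v_half` and the choice of linear interpolation are assumptions, since the response only states that the value was read directly from the plot.

```python
def v_half(voltages, g_norm, level=0.5):
    """Voltage at which a normalized G-V curve first crosses `level`,
    found by linear interpolation between neighbouring points."""
    points = list(zip(voltages, g_norm))
    for (v0, g0), (v1, g1) in zip(points, points[1:]):
        if g0 != g1 and (g0 - level) * (g1 - level) <= 0:
            return v0 + (level - g0) * (v1 - v0) / (g1 - g0)
    return None  # the curve never crosses the requested level


# Hypothetical normalized conductances at five test voltages (mV).
voltages = [-40.0, -20.0, 0.0, 20.0, 40.0]
g_norm = [0.10, 0.30, 0.45, 0.65, 0.90]
half = v_half(voltages, g_norm)  # crossing lies between 0 and 20 mV
```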

      A general comment is that the Discussion is fairly long and some sections are quite redundant to the Results section. The authors could consider focusing the text in the Discussion.

      We changed the discussion correspondingly wherever it was appropriate.

      I found it a bit hard to follow the authors interpretation on whether their drug molecules remain bound throughout the experiments, or whether there is fast binding/unbinding. Please clarify if possible.

      In the 300 ns MD simulations, mefenamic acid and DIDS remained stably bound to WT-ps-IKs; binding of the drugs to mutant complexes is described in Table 3 and Table 5. In longer simulations with the channel embedded in a lipid environment, mefenamic acid unbinds in two out of five runs for WT-ps-IKs (Figure 4-figure supplement 1), and DIDS shows a few events where it briefly unbinds (Figure 6-figure supplement 3). Based on the electrophysiological data, we speculate that the drugs might bind to and unbind from WT-ps-IKs during the gating process. We do not see repeated binding and unbinding in MD simulations, since the model we used reflects only the open conformation of the channel complex with an activated-state voltage sensor; a resting-state voltage sensor was not considered.

      The authors have previously shown that channels with no, one or two KCNE1 subunits are not, or only to a small extent, affected by mefenamic acid (Wang et al., 2020). Could the details of the binding site and proposed mechanisms of action provide clues as to why all binding sites need to be occupied to give prominent drug effects?

      In the manuscript, we propose that the binding of drugs induces conformational changes in the pocket region that stabilize the S1/KCNE1/Pore complex. In the tetrameric channel with 4:4 alpha to beta stoichiometry, the drugs are likely to occupy all four sites, with complete stabilization of S1/KCNE1/Pore. When one or more KCNE1 subunits are absent, as in the case of the EQQ or EQQQQ constructs, drugs will bind only to the site(s) where KCNE1 is available. This will stabilize only certain parts of the S1/KCNE1/Pore complex. We believe that the drug will then be only partially effective.

      There is a bit of jumping in the order of when some figures are introduced (e.g. row 178 and 239). The authors could consider changing the order to make the figures easier to follow.

      We have changed the corresponding section appropriately to improve the reading flow.

      Row 237: "Data not shown", please show data.

      The G-V curve of the KCNE1 Y46C mutant displays a complex, double Boltzmann relationship which does not allow for the calculation of a meaningful V1/2 nor would it allow for an accurate determination of drug effects. Consequently, we have excluded it from the manuscript.

      In the Discussion, the author use the term "KCNE1/3". Does this correspond to the previous mention of "ps-KCNE1"?

      Yes, this refers to ps-KCNE1. We have changed it correspondingly.

      Row 576: When was HMR 1556 used?

      While HMR 1556 was used in preliminary experiments to confirm that the recorded current was indeed IKs, it does not provide substantial value to the data presented in our study or our experiments. As a result, we have excluded HMR 1556 experiments from the final results and have revised the Methods section accordingly.

      Reviewer #3 (Recommendations For The Authors):

      1) Figures 2D and 6A are very unclear. Can the authors provide labels as text rather than coloured circles, whether the residue is on Q1 or E1? There is also a distance label in the figure in the small font with the faintest shade of grey, which I believe is supposed to be hydrogen bonds. Can this be improved for clarity?

      We feel that additional labels on the ligand diagrams would be more confusing. Instead, we updated the description in the legend and added labels to Figure 2B and Figure 6B to improve the clarity of residue positions. In addition, we have added 2 new figures with more detailed information about H-bonds (Figure 2-figure supplement 4, Figure 6-figure supplement 1).

      2) Figure 2B - all side chains need labelling in different binding modes. The green ligand on blue protein is very difficult to see. Suddenly, the ligand turns light blue in panel 2C. Can this be consistent throughout the manuscript?

      Figure 2B is updated according to this comment.

      3) Figure 2 - figure supplement 2, and figure 6B. Can the author show the residue number on the x-axis instead of just the one-letter abbreviation? This requires the reader to count and is not helpful when we try to figure out where the residue is at a glance. I would suggest a structure label adjacent to the plot to show whether they are located with respect to the drug molecule.

      Since the numbers for residues on either end of the cluster are indicated at the bottom of each boxed section, we feel that adding residue numbers would just further clutter the figure.

      4) Figure 2 - figure supplement 2, and Figure 6B. Can you explain what is being shown in the error bar? I assume standard deviation?

      Error bars on Figure 2-figure supplement 2 represent SEM. We added corresponding text in the figure legend.

      5) Figure 2 - figure supplement 2, and figure 6B. Can you explain how many frames are being accounted for in this PBSA calculation?

      For Figure 2-figure supplement 2 and Figure 6B, a frame was taken every 0.3 ns over the 3 x 300 ns simulations: 1000 frames for each simulation, 3000 frames overall.

      6) Figure 3D/E and 7C/D, it would be helpful to show which mutant show agreeable results with the simulations, PBSA/GBSA and contact analyses as suggested above.

      The inconsistencies and discrepancies between the results of MD simulations and electrophysiological experiments are discussed throughout the manuscript.

      7) Figure legend, figure 3E - I assume that there is a type that is different mutants with respect to those without the drug. Otherwise, how could WT, with respect to WT, has -105 mV dV1/2?

      The reviewer is correct in that the bars indicate the difference in V1/2 between control and drug treatment. Thus, the difference in V1/2 (∆V1/2) between the V1/2 calculated for WT control and the V1/2 for mefenamic acid is indeed -105 mV. We have now revised Figure 3E's legend to accurately reflect this and ensure a clear understanding of the data presented.

      8) Figure 3 - figure supplement 1B is very messy, and I could not extract the key point from it. Can this be plotted on a separate trace? At least 1 WT trace and one mutant trace, 1 with WT+drug and one mut+drug as four separate plots for clarity?

      The key message of this figure is to illustrate the similarities of EQ WT + Mef and EQ L142C data. Thus, after thorough consideration, we have concluded that maintaining the current figure, which displays the progressive G-V curve shift in EQ WT and L142C in a superimposed manner, best illustrates the gradual shift in the G-V curves. This presentation allows for a clearer and more immediate comparison of the curve shifts, which may be more challenging to discern if the G-V curves were separated into individual figures. We believe that the existing format effectively communicates the relevant information in a comprehensive and accessible manner.

      9) Figure 4B - the label Voltage is blended into the orange helix. Can the label be placed more neatly?

      We altered the labels for this figure and added that information in the figure description.

      10) Can you show the numerical label of the residue, at least only to the KCNE1 portion in Figures 4C and 4D?

      We updated these figures and added residue numbering for clarity.

      11) Can you hide all non-polar hydrogen atoms in figure 8 and colour each subunit so that it agrees with the rest of the manuscripts? Can you adjust the position of the side chain so that it is interpretable? Can you summarise this as a cartoon? For example, Q147 and Y148 are in grey and are very far hidden away. So as S298. Can you colour-code your label? The methionine (I assume M45) next to T327 is shown as the stick and is unlabelled. Maybe set the orthoscopic view, increase the lighting and rotate the figures in a more interpretable fashion?

      We agree that Fig.8 is rather small as originally presented. We have tried to emphasize those residues we feel most critical to the study and inevitably that leads to de-emphasis of other, less important residues. As long as the figure is reproduced at sufficient size we feel that it has sufficient clarity for the purposes of the Discussion.

      12) Line 538-539. Can you provide more detail on how the extracellular residues of KCNE3 are substituted? Did you use Modeller, SwissModel, or AlphaFold to substitute this region of the KCNEs?

      We used ICM-pro to substitute extracellular residues of KCNE3 and create mutant variants of the IKs channel. This information is now provided in the methods section.

      13) Line 551: The PIP2 density was solved using cryo-EM, not X-ray crystallography.

      We corrected this.

      14) Line 555: The system was equilibrated for ten ns. In which ensemble? Was there any restraint applied during the equilibration run? If yes, at what force constant?

      The system was equilibrated in NVT and NPT ensembles with restraints. These details are added to methods. In the new simulations, we performed the equilibrations while gradually releasing spatial restraints from the backbone, sidechains, lipids, and ligands. A final 30 ns equilibration in the NPT ensemble was performed with restraints only on backbone atoms, with a force constant of 50 kJ/mol/nm². Methods were edited accordingly.

      15) Line 557: Kelvin is a unit without a degree.

      Corrected

      16) Line 559: PME is an electrostatic algorithm, not a method.

      Corrected

      17) Line 566: Collecting 1000 snapshots at which intervals. Given your run are not equal in length, how can you ensure that these are representative snapshots?

      Please see comment #5.

      18) Table 3 - Why SD for computational data and SEM for experimental data?

      There was no particular reason for using SD in some graphs. We used appropriate statistical tests to compare the groups where the difference was not obvious.
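
      Because SD and SEM answer different questions (spread of the observations versus uncertainty of the mean), the two can differ substantially for the same data. The sketch below shows the relationship for a sample of n values; the input list is an arbitrary illustration and is not data from the manuscript.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a sequence of at least two numbers."""
    sd = statistics.stdev(values)        # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(len(values))    # SEM shrinks with the square root of n
    return sd, sem


# Arbitrary illustrative numbers, not experimental or simulation results.
example = [1.0, 2.0, 3.0, 4.0, 5.0]
sd, sem = sd_and_sem(example)
print(round(sd, 3), round(sem, 3))
```

      For large n, such as thousands of simulation frames, the SEM becomes very small even when the SD is large, which is one reason the choice between the two should be stated in each legend.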

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      1. Evidence for a disulfide bridge contained in membrane-associated FGF2 dimers

      This aspect was brought up in detail by both Reviewer #1 and Reviewer #3. It has been addressed in the revised manuscript by (i) new experimental and computational analyses, (ii) a more detailed discussion of previous work from our lab in which the experiments requested by the reviewers were carried out and (iii) a more general discussion of known examples of disulfide formation in protein complexes with a particular focus on membrane surfaces facing the cytoplasm, the inner plasma membrane leaflet being a prominent example. Please find our detailed comments in our direct response to Reviewers #1 and #3, see below.

      2. Affinity towards PI(4,5)P2 comparing FGF2 dimers versus monomers

      This is an aspect that has been raised by Reviewer #3 along with additional comments on the interaction of FGF2 with PI(4,5)P2. Please find our detailed response below. With regard to the PI(4,5)P2 affinity of FGF2 dimers versus FGF2 monomers, we think that the increased avidity of FGF2 dimers with two high affinity binding pockets for PI(4,5)P2 is a good explanation for the different values of free energies of binding that were calculated from the atomistic molecular dynamics simulations shown in Fig. 9. This phenomenon is well known for many biomolecular interactions and is also consistent with the cryoEM data contained in our manuscript, showing an FGF2 dimer with two PI(4,5)P2 binding sites facing the membrane surface.

      3. C95-C95 FGF2 dimers as signaling units

      We have put forward this hypothesis since in structural studies analyzing the FGF ternary signaling complex consisting of FGF2, FGF receptor and heparin, FGF2 mutants were used that lack C95. Nevertheless, two FGF2 molecules are contained in FGF signaling complexes. In addition to the papers on the structure of the FGF signaling complex, we have cited work that showed that C95-C95 crosslinked FGF2 dimers are efficient FGF signaling modules (Decker et al, 2016; Nawrocka et al, 2020). Therefore, we think it is an interesting idea that the FGF2 secretion pathway, being based on an assembly/disassembly mechanism with the transient formation of pore-forming FGF2 oligomers, produces C95-C95 disulfide-linked FGF2 dimers at the outer plasma membrane leaflet that can engage in FGF2 ternary signaling complexes. While this is a possibility we put forward to stimulate the field, it of course remains a hypothesis which has been clearly indicated as such in the revised manuscript.

      Reviewer #1:

      1. Evidence for disulfide-bridged FGF2 dimers and higher oligomers on non-reducing versus reducing SDS gels

      The experiment suggested by Reviewer #1 is an important one that has been published by our group in previous work. In these studies, we found FGF2 oligomers analyzed on non-reducing SDS gels to be sensitive to DTT, turning the vast majority of oligomeric FGF2 species into monomers [(Müller et al, 2015); Fig. 3, compare panel D with panel H]. This phenomenon could be observed most clearly after short periods of incubations (0.5 hours) of FGF2 with PI(4,5)P2-containing liposomes. These findings constituted the original evidence for PI(4,5)P2-induced FGF2 oligomerization to depend on the formation of intermolecular disulfide bridges.

      In the current manuscript, we established the structural principles underlying this process and identified C95 to be the only cysteine residue involved in disulfide formation. Based on biochemical cross-linking experiments in cells, cryo-electron tomography, predictions from AlphaFold-2 Multimer and molecular dynamics simulations, we demonstrated a strong FGF2 dimerization interface in which C95 residues are brought into close proximity when FGF2 is bound to membranes in a PI(4,5)P2-dependent manner. These findings provide the structural basis by which disulfide bridges can be formed from the thiols contained in the side chains of two C95 residues directly facing each other in the dimerization interface. In the revised manuscript, we included additional data that further strengthen this analysis. In the experiments shown in the new Fig. 10, we combined chemical cross-linking with mass spectrometry, further validating the reported FGF2 dimerization interface. In addition, illustrated in the new Fig. 8, we employed a new computational analysis combining 360 individual atomistic molecular dynamics simulations, each spanning 0.5 microseconds, with advanced machine learning techniques. This new data set corroborates our findings, demonstrating that the C95-C95 interface self-assembles independently of C95-C95 disulfide formation, based on electrostatic interactions. Intriguingly, it is consistent with our experimental findings based on cross-linking mass spectrometry (new Fig. 10) where cross-linked peptides could also be observed with the C77/95A variant form of FGF2, suggesting a protein-protein interface whose formation does not depend on disulfide formation. Therefore, we propose that disulfide formation occurs in a subsequent step, representing the committed step of FGF2 membrane translocation with the formation of disulfide-bridged FGF2 dimers being the building blocks for pore-forming FGF2 oligomers.

      As a more general remark on the mechanistic principles of disulfide formation in different cellular environments, we would like to emphasize that it is a common misconception that the reducing environment of the cytoplasm generally makes the formation of disulfide bridges unlikely or even impossible. From a biochemical point of view, the formation of disulfide bridges is not limited by a reducing cellular environment but is rather controlled by kinetic parameters when two thiols are brought into proximity. Indeed, it has become well established that disulfide bridges can also be formed in compartments other than the lumen of the ER/Golgi system, including the cytoplasm. For example, viruses maturing in the cytoplasm can form stable structural disulfide bonds in their coat proteins (Locker & Griffiths, 1999; Hakim & Fass, 2010). Moreover, many cytosolic proteins, including phosphatases, kinases and transcription factors, are now recognized to be regulated by thiol oxidation and disulfide bond formation, occurring as a post-translational modification (Lennicke & Cocheme, 2021). In numerous cases with direct relevance for our studies on FGF2, disulfide bond formation and other forms of thiol oxidation occur in association with membrane surfaces. In fact, many of these processes are linked to the inner plasma membrane leaflet (Nordzieke & Medrano-Fernandez, 2018). Growth factors, hormones and antigen receptors are observed to activate transmembrane NADPH oxidases generating O2·-/H2O2 (Brown & Griendling, 2009). For example, the local and transient oxidative inactivation of membrane-associated phosphatases (e.g., PTEN) serves to enhance receptor-associated kinase signaling (Netto & Machado, 2022). It is therefore conceivable that similar processes introduce disulfide bridges into FGF2 while it assembles into oligomers at the inner plasma membrane leaflet. In the revised version of our manuscript, we have discussed the above-mentioned aspects in more detail, with the known role of NADPH oxidases in disulfide formation at the inner plasma membrane leaflet being highlighted.

      Reviewer #2:

      1. Potential effects of a C95A substitution on protein folding and comparison with a C95S substitution with regard to phenotypes observed in FGF2 secretion

      A valid point that we indeed addressed at the beginning of this project. Most importantly, we tested whether both FGF2 C95A and FGF2 C95S are characterized by severe phenotypes in FGF2 secretion efficiency. As shown in the revised Fig. 1, cysteine substitutions by serine showed very similar FGF2 secretion phenotypes compared to cysteine to alanine substitutions (Fig. 1C and 1D). In addition, in the pilot phase of this project, we also compared recombinant forms of FGF2 C95A and FGF2 C95S in various in vitro assays. For example, we tested the full set of FGF2 variants in membrane integrity assays as the ones contained in Fig. 4. As shown in Author response image 1, FGF2 variant forms carrying a serine in position 95 behaved in a very similar manner as compared to FGF2 C95A variant forms. Relative to FGF2 wild-type, membrane pore formation was strongly reduced for both types of C95 substitutions. By contrast, both FGF2 C77S and C77A did show activities that were similar to FGF2 wild-type.

      Author response image 1.

      From these experiments, we conclude that changes in protein structure are not the basis for the phenotypes we report on the C95A substitution in FGF2.

      2. Effects of a C77A substitution on FGF2 membrane recruitment in cells

      The effect of a C77A substitution on FGF2 recruitment to the inner plasma membrane leaflet is indeed a moderate one. This is likely to be the case because C77 is only one residue of a more complex surface that contacts the α1 subunit of the Na,K-ATPase. Stronger effects can be observed when K54 and K60 are changed, residues that are positioned in close proximity to C77 (Legrand et al, 2020). Nevertheless, as shown in the revised Fig. 1, we consistently observed a reduction in membrane recruitment when comparing FGF2 C77A with FGF2 wild-type. When analyzing the raw data without GFP background subtraction, a significant reduction in membrane recruitment of FGF2 C77A was observed compared to FGF2 wild-type (Fig. 1A and 1B). We therefore conclude that C77 not only plays a role in FGF2/α1 interactions in biochemical assays using purified components (Fig. 7) but is also relevant for FGF2/α1 interactions in a cellular context (Fig. 1A and 1B).

      3. Identity of the protein band in Fig. 3 labeled with an empty diamond

      This is a misunderstanding as we did not assign this band to a FGF2-GFP dimer. When we produced the corresponding cell lines, we used constructs that link FGF2 with GFP via a ‘self-cleaving’ P2A sequence. During translation, even though arranged on one mRNA, this causes the production of FGF2 and GFP as separate proteins in stoichiometric amounts, the latter being used to monitor transfection efficiency. However, a small fraction is always expressed as a complete FGF2-P2A-GFP fusion protein (a monomer). This band can be detected with the FGF2 antibodies used and was labeled in Fig. 3 by an empty diamond.

      4. Labeling of subpanels in Fig. 5A

      We have revised Fig. 5 according to the suggestion of Reviewer #2.

      5. FGF2 membrane binding efficiencies shown in Fig. 5C

      It is true that FGF2 variant forms defective in PI(4,5)P2-dependent oligomerization (C95A and C77/95A) bind to membranes with somewhat reduced efficiencies. This is also evident from the intensity profiles shown in Fig. 5A and was observed in biochemical in vitro experiments as well. A plausible explanation for this phenomenon would be the increased avidity when FGF2 oligomerizes, stabilizing membrane interactions (see also Fig. 9B).

      6. Residual activities of FGF2 C95A and C77/95A in membrane pore formation?

      We do not interpret the phenomenon in Fig. 5 that Reviewer #2 is referring to as controlled activity of FGF2 C95A and C77/95A in membrane pore formation. Rather, GUVs containing PI(4,5)P2 are relatively labile structures, and a certain level of integrity issues upon protein binding and extended incubation times is conceivable. It is basically a technical limitation of this assay, in which GUVs are incubated with proteins for 2 hours. Even after substitution of PI(4,5)P2 with a Ni-NTA membrane lipid, background levels of loss of membrane integrity can be observed (Fig. 6). Therefore, the critical point here is that, as compared to FGF2 C95A and C77/95A, FGF2 wt and FGF2 C77A do display significantly higher levels of a loss of membrane integrity in PI(4,5)P2-containing GUVs, a phenomenon that we interpret as controlled membrane pore formation. By contrast, all variant forms of FGF2 show only background levels for loss of membrane integrity in GUVs containing the Ni-NTA lipid.

      7. Why does PI(4,5)P2 induce FGF2 dimerization?

      This has been studied extensively in previous work (Steringer et al, 2017). As also discussed in the current manuscript, the interaction of FGF2 with membranes through its high affinity PI(4,5)P2 binding pocket orients FGF2 molecules on a 2D surface, which increases the likelihood of the formation of the C95-containing FGF2 dimerization interface. Moreover, in the presence of cholesterol at levels typical for plasma membranes, PI(4,5)P2 forms clusters containing up to 4 PI(4,5)P2 molecules (Lolicato et al, 2022), a process that may further facilitate FGF2 dimerization.

      8. Is it possible to pinpoint the number of FGF2 subunits in oligomers observed in cryo-electron tomography?

      We indeed took advantage of the Halo tags that appear as dark globular structures in cryo-electron tomography. For most FGF2 oligomers with FGF2 subunits on both sides of the membrane, we could observe 4 to 6 Halo tags which is consistent with the functional subunit number that has been analyzed for membrane pore formation (Steringer et al., 2017; Sachl et al, 2020; Singh et al, 2023). However, since the number of higher FGF2 oligomers we observed in cryo-electron tomography was relatively small and the nature of these oligomers appears to be highly dynamic, caution should be taken to avoid overinterpretation of the available data.

      Reviewer #3:

      1. Conclusive demonstration of disulfide-linked FGF2 dimers

      A similar point was raised by Reviewer #1, so that we would like to refer to our response on page 2, see above.

      2. Identity of FGF2-P2A-GFP observed in Fig. 3

      Again, a similar point has been made, in this case by Reviewer #2 (Point 3). The observed band is not a FGF2-P2A-GFP dimer but rather the complete FGF2-P2A-GFP fusion protein (a monomer) that corresponds to a small population produced during mRNA translation where the P2A sequence did not cause the production of FGF2 and GFP as separate proteins in stoichiometric amounts.

      3. Quantification of GFP signals in Fig. 6

      Fig. 6 has been revised according to the suggestion of Reviewer #3. A comprehensive comparison of PI(4,5)P2 and the Ni-NTA membrane lipid in FGF2 membrane translocation assays is also contained in previous work that introduced the GUV-based FGF2 membrane translocation assay (Steringer et al., 2017).

      4. Experimental evidence for various aspects of FGF2 interactions with PI(4,5)P2

      Most of the points raised by Reviewer #3 have been addressed in previous work. For example, FGF2 has been demonstrated to dimerize only on membrane surfaces containing PI(4,5)P2 (Müller et al., 2015). In solution, FGF2 remained a monomer even after hours of incubation as analyzed by native gel electrophoresis and reducing vs. non-reducing SDS gels (see Fig. 3 in Müller et al, 2015). In the same paper, the first evidence for a potential role of C95 in FGF2 oligomerization has been reported, however, at the time, our studies were limited to FGF2 C77/95A. In the current manuscript, the in vitro experiments shown in Figs. 2 to 6 establish the unique role of C95 in PI(4,5)P2-dependent FGF2 oligomerization. As discussed above, FGF2 oligomers have been shown to contain disulfide bridges based on analyses on non-reducing gels in the absence and presence of DTT (Müller et al., 2015).

      References

      Brown DI, Griendling KK (2009) Nox proteins in signal transduction. Free Radic Biol Med 47: 1239-1253

      Decker CG, Wang Y, Paluck SJ, Shen L, Loo JA, Levine AJ, Miller LS, Maynard HD (2016) Fibroblast growth factor 2 dimer with superagonist in vitro activity improves granulation tissue formation during wound healing. Biomaterials 81: 157-168

      Hakim M, Fass D (2010) Cytosolic disulfide bond formation in cells infected with large nucleocytoplasmic DNA viruses. Antioxid Redox Signal 13: 1261-1271

      Legrand C, Saleppico R, Sticht J, Lolicato F, Muller HM, Wegehingel S, Dimou E, Steringer JP, Ewers H, Vattulainen I et al (2020) The Na,K-ATPase acts upstream of phosphoinositide PI(4,5)P2 facilitating unconventional secretion of Fibroblast Growth Factor 2. Commun Biol 3: 141

      Lennicke C, Cocheme HM (2021) Redox metabolism: ROS as specific molecular regulators of cell signaling and function. Mol Cell 81: 3691-3707

      Locker JK, Griffiths G (1999) An unconventional role for cytoplasmic disulfide bonds in vaccinia virus proteins. J Cell Biol 144: 267-279

      Lolicato F, Saleppico R, Griffo A, Meyer A, Scollo F, Pokrandt B, Muller HM, Ewers H, Hahl H, Fleury JB et al (2022) Cholesterol promotes clustering of PI(4,5)P2 driving unconventional secretion of FGF2. J Cell Biol 221

      Müller HM, Steringer JP, Wegehingel S, Bleicken S, Munster M, Dimou E, Unger S, Weidmann G, Andreas H, Garcia-Saez AJ et al (2015) Formation of Disulfide Bridges Drives Oligomerization, Membrane Pore Formation and Translocation of Fibroblast Growth Factor 2 to Cell Surfaces. J Biol Chem 290: 8925-8937

      Nawrocka D, Krzyscik MA, Opalinski L, Zakrzewska M, Otlewski J (2020) Stable Fibroblast Growth Factor 2 Dimers with High Pro-Survival and Mitogenic Potential. Int J Mol Sci 21

      Netto LES, Machado L (2022) Preferential redox regulation of cysteine-based protein tyrosine phosphatases: structural and biochemical diversity. FEBS J 289: 5480-5504

      Nordzieke DE, Medrano-Fernandez I (2018) The Plasma Membrane: A Platform for Intra- and Intercellular Redox Signaling. Antioxidants (Basel) 7

      Sachl R, Cujova S, Singh V, Riegerova P, Kapusta P, Muller HM, Steringer JP, Hof M, Nickel W (2020) Functional Assay to Correlate Protein Oligomerization States with Membrane Pore Formation. Anal Chem 92: 14861-14866

      Singh V, Macharova S, Riegerova P, Steringer JP, Muller HM, Lolicato F, Nickel W, Hof M, Sachl R (2023) Determining the Functional Oligomeric State of Membrane-Associated Protein Oligomers Forming Membrane Pores on Giant Lipid Vesicles. Anal Chem 95: 8807-8815

      Steringer JP, Lange S, Cujova S, Sachl R, Poojari C, Lolicato F, Beutel O, Muller HM, Unger S, Coskun U et al (2017) Key steps in unconventional secretion of fibroblast growth factor 2 reconstituted with purified components. eLife 6: e28985

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The paper from Hsu and co-workers describes a new automated method for analyzing the cell wall peptidoglycan composition of bacteria using liquid chromatography and mass spectrometry (LC/MS) combined with newly developed analysis software. The work has great potential for determining the composition of bacterial cell walls from diverse bacteria in high-throughput, allowing new connections between cell wall structure and other important biological functions like cell morphology or host-microbe interactions to be discovered. In general, I find the paper to be well written and the methodology described to be useful for the field. However, there are areas where the details of the workflow could be clarified. I also think the claims connecting cell wall structure and stiffness of the cell surface are relatively weak. The text for this topic would benefit from a more thorough discussion of the weak points of the argument and a toning down of the conclusions drawn to make them more realistic.

      Thank you for your thorough and insightful review of our manuscript. We greatly appreciate your positive and constructive feedback on our methodology. We have carefully reviewed your comments and have responded to each point as follows:

      Specific points:

      1) It was unclear to me from reading the paper whether or not prior knowledge of the peptidoglycan structure of an organism is required to build the "DBuilder" database for muropeptides. Based on the text as written, I was left wondering whether bacterial samples of unknown cell wall composition could be analyzed with the methods described, or whether some preliminary characterization of the composition is needed before the high-throughput analysis can be performed. The paper would be significantly improved if this point were explicitly addressed in the main text.

      We apologize for not making it clearer. Prior knowledge of the peptidoglycan structure of an organism is indeed required to build the “DBuilder” database and accurately identify muropeptides; otherwise, the false discovery rate might increase. While the peptidoglycan structures of certain organisms might not have been extensively studied, users retain the flexibility to adapt the muropeptide compositions to their study, referencing closely related species for database construction. We have addressed this aspect in the main text to ensure a clearer understanding.

      “(Section HAMA platform: a High-throughput Automated Muropeptide Analysis for Identification of PGN Fragments) …(i) DBuilder... Based on their known (or putative) PGN structures, all possible combinations of GlcNAc, MurNAc and peptide were input into DBuilder to generate a comprehensive database that contains monomeric, dimeric, and trimeric muropeptides (Figure 1b).”
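
      The combinatorial idea behind such a database can be sketched in a few lines. The example below is only an illustration of enumerating monomers, dimers and trimers from a user-supplied set of building blocks; it is not the DBuilder implementation, and the glycan and stem-peptide labels are hypothetical placeholders that a user would replace with the known or putative composition of the organism under study. Treating multimers as unordered sets of monomers, with no distinction between donor and acceptor, is a simplifying assumption.

```python
from itertools import combinations_with_replacement, product

# Hypothetical building blocks supplied by the user (placeholders only).
GLYCANS = ["GlcNAc-MurNAc", "GlcNAc-anhMurNAc"]
STEM_PEPTIDES = ["tripeptide", "tetrapeptide", "pentapeptide"]


def build_database(glycans, stems, max_units=3):
    """Enumerate candidate muropeptides with 1..max_units monomer units."""
    monomers = [f"{g}-{s}" for g, s in product(glycans, stems)]
    database = {}
    for n_units in range(1, max_units + 1):
        # Unordered multisets of monomers; cross-link type is not modelled.
        database[n_units] = [
            " + ".join(combo)
            for combo in combinations_with_replacement(monomers, n_units)
        ]
    return database


db = build_database(GLYCANS, STEM_PEPTIDES)
for n_units, entries in db.items():
    print(n_units, len(entries))
```

      The number of candidates grows quickly with the number of building blocks, which is consistent with the authors' remark that restricting the input to known or putative structures helps keep the false discovery rate down.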

      2) The potential connection between the structure of different cell walls from bifidobacteria and cell stiffness is pretty weak. The cells analyzed are from different strains such that there are many possible reasons for the change in physical measurements made by AFM. I think this point needs to be explicitly addressed in the main text. Given the many possible explanations for the observed measurement differences (lines 445-448, for example), the authors could remove this portion of the paper entirely. Conclusions relating cell wall composition to stiffness would be best drawn from a single strain of bacteria genetically modified to have an altered content of 3-3 crosslinks.

      We understand your concern regarding the weak connection between cell wall structure and cell stiffness. We will make a clear and explicit statement in the main text to acknowledge that the cells analyzed are derived from different strains, so that various factors may have influenced the observed changes in the physical measurements made by AFM. Furthermore, we greatly appreciate your suggestion to consider genetically modified strains to investigate the role of cross-bridge length in determining cell envelope stiffness. In this regard, we are in the process of developing a CRISPR/Cas genome editing toolbox for Bifidobacterium longum, and we plan to pursue this avenue of investigation in future work.

      Reviewer #2 (Public Review):

      The authors introduce "HAMA", a new automated pipeline for architectural analysis of the bacterial cell wall. Using MS/MS fragmentation and a computational pipeline, they validate the approach using well-characterized model organisms and then apply the platform to elucidate the PG architecture of several members of the human gut microbiota. They discover differences in the length of peptide crossbridges between two species of the genus Bifidobacterium and then show that these species also differ in cell envelope stiffness, resulting in the conclusion that crossbridge length determines stiffness.

      We appreciate your thoughtful review of our manuscript and your recognition of the potential significance of our work in elucidating the poorly characterized peptidoglycan (PGN) architecture of the human gut microbiota.

      The pipeline is solid and revealing the poorly characterized PG architecture of the human gut microbiota is worthwhile and significant. However, it is unclear if or how their pipeline is superior to other existing techniques - PG architecture analysis is routinely done by many other labs; the only difference here seems to be that the authors chose gut microbes to interrogate.

      We apologize if this could have been clearer. The HAMA platform stands apart from other pipelines by automatically analyzing LC-MS/MS data to identify muropeptides. In contrast, most routine PGN architecture analyses use LC-UV/Vis or LC-MS platforms, for which the only automated analysis software available is PGFinder. To the best of our knowledge, a comparable pipeline for automatically analyzing LC-MS/MS data was reported by Bern et al., who used the commercial Byonic software with an in-house FASTA database and specific glycan modifications. They achieved accurate and sensitive identification of monomeric muropeptides, but struggled with cross-linked muropeptides due to the limitations of the Byonic software. We believe that our pipeline, which introduces automatic and comprehensive muropeptide identification (particularly for Gram-positive bacterial peptidoglycans), would be a valuable addition to the field. To enhance clarity, we have adjusted the text as follows:

      “(Introduction) … Although they both demonstrated great success in identifying muropeptide monomers, the accurate identification of muropeptide multimers and other various bacterial PGN structures still remains unresolved. This is because deciphering the compositions requires MS/MS fragmentation, but it is still challenging to automatically annotate MS/MS spectra from these complex muropeptide structures.”

      I do not agree with their conclusions about the correlation between crossbridge length and cell envelope stiffness. These experiments are done on two different species of bacteria and their experimental setup therefore does not allow them to isolate crossbridge length as the only differential property that can influence stiffness. These two species likely also differ in other ways that could modulate stiffness, e.g. turgor pressure, overall PG architecture (not just crossbridge length), membrane properties, teichoic acid composition etc.

      Regarding the conclusions drawn about the correlation between cross-bridge length and cell envelope stiffness, we understand your point and appreciate your feedback. We have revisited this section of our manuscript and toned down the conclusions drawn from this aspect of the study. We also recognize the importance of considering other potential factors that could influence stiffness, as you mentioned above. In light of this, we have noted in the main text the need for further investigations, potentially involving genetically modified strains, to isolate and accurately determine the impact of bridge length on cell envelope stiffness.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      1) One thing to consider would be testing the robustness of the analysis pipeline with one the well-characterized bacteria studied, but genetically modifying them to change the cell wall composition in predictable ways. Does the analysis pipeline detect the expected changes?

      We appreciate the reviewer's suggestion and would like to provide a clear response. Regarding testing the pipeline with genetically modified strains, our lab previously worked on genetically modified S. maltophilia (KJΔmrdA).¹ Inactivation of mrdA increased the level of N-acetylglucosaminyl-1,6-anhydro-N-acetylmuramyl-L-alanyl-D-glutamyl-meso-diaminopimelic acid-D-alanine (GlcNAc-anhMurNAc tetrapeptide) in the muropeptide profile; this muropeptide is the critical activator ligand for β-lactamase expression in the ΔmrdA mutant strain. In this case, our platform could provide rapid PGN analysis to verify the expected change in muropeptide profiles (see Author response image 1). In addition, if the predictable changes involve genetic modifications of interpeptide bridges within the PGN structure, for example in the femA/B genes of S. aureus, which encode the synthesis of interpeptide bridges,² our current HAMA pipeline is capable of detecting these anticipated changes. However, if the genetic modifications introduce novel components into PGN structures, a dedicated database specific to the genetically modified strain would need to be created.

      Author response image 1.

      2) Line 368: products catalyzed > products formed

      The sentence has been revised.

      “(Section Inferring PGN Cross-linking Types Based on Identified PGN Fragments) …Based on the muropeptide compositional analysis mentioned above, we found high abundances of M3/M3b monomer and D34 dimer in the PGNs of E. faecalis, E. faecium, L. acidophilus, B. breve, B. longum, and A. muciniphila, which may be the PGN products formed by Ldts.”

      3) Lines 400-402: Is it possible the effect is related to porosity, not "hardness".

      Thank you for the suggestion. The possibility of the slower hydrolysis rate of purified PGN in B. breve being related to porosity is indeed noteworthy. While this could be a potential factor, we would like to acknowledge the limited existing literature that directly addresses the relation between PGN architecture and porosity. It is plausible that current methods available for assessing cell wall porosity may have certain limitations, contributing to the scarcity of relevant studies. In light of this, we would like to propose a speculative explanation for the observed effect. It is plausible that the tighter PGN architecture resulting from shorter interpeptide bridges in B. breve could contribute to its harder texture. This speculation is grounded in the concept that a more compact PGN structure might lead to increased stiffness, aligning with our observations of higher cell stiffness in B. breve.

      4) Lines 403-408: See point #2 above.

      Thank you for the suggestion. We have explicitly addressed this point in the main text:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) … Taken all together, we speculate that a tight peptidoglycan network woven by shorter interpeptide bridges or 3-3 cross-linkages could give bacteria stiffer cell walls. However, it is important to note that cell stiffness is a mechanical property that also depends on PGN thickness, overall architecture, and turgor pressure. These parameters may vary among different bacterial strains. Hence, carefully controlled, genetically engineered strains with similar characteristics will be needed to dissect the role of cross-bridge length in cell envelope stiffness.”

      5) Lines 428-429: It is not clear to me how mapping the cell wall architecture provides structural information about the synthetic system. It is also not clear how antibiotic resistance can be inferred. More detail is needed here to flesh out these points.

      Thank you for the suggestion. To provide further clarity on these important aspects, the context in the manuscript has been revised.

      “(Discussion) …Importantly, our HAMA platform provides a powerful tool for mapping peptidoglycan architecture, giving structural information on the PGN biosynthesis system. This involves the ability to infer possible PGN cross-linkages based on the type of PGN fragments obtained from hydrolysis. For instance, the identification of 3-3 cross-linkage formed by L,D-transpeptidases (Ldts) is of particular significance. Unlike 4-3 cross-linkages, the 3-3 cross-linkage is resistant to inhibition by β-Lactam antibiotics, a class of antibiotics that commonly targets bacterial cell wall synthesis through interference with 4-3 cross-linkages. Therefore, by elucidating the specific cross-linkage types within the peptidoglycan architecture, our approach offers insights into antibiotic resistance mechanisms.”

      6) Line 478: "maneuvers are proposed for" > "work is needed to generate". Also, delete "innovative". Also "in silico" > "in silico-based".

      The sentence has been revised.

      “(Discussion) …To achieve a more comprehensive identification of muropeptides, future work is needed to generate an expanded database, in silico-based fragmentation patterns, and improved MS/MS spectra acquisition.”

      7) Line 485: "Its" > "It has potential"

      The sentence has been revised.

      “(Discussion) …It has potential applications in identifying activation ligands for antimicrobial resistance studies, characterizing key motifs recognized by pattern recognition receptors for host-microbiota immuno-interaction research, and mapping peptidoglycan in cell wall architecture studies.”

      8) Figure 1 legend: Define Gb and Pb.

      Gb and Pb are the abbreviations for glycosidic bonds and peptide bonds. We have revised Figure legend 1 as follows:

      “(Figure legend 1) …(b) DBuilder constructs a muropeptide database containing monomers, dimers, and trimers with two types of linkage: glycosidic bonds (Gb) and peptide bonds (Pb).”

      9) Figure 2: It is hard to see what is going on in panel a and c with all the labels. Consider removing them and showing a zoomed inset with labels in addition to an unlabeled full chromatogram.

      We apologize for not making this clearer. Panels a and c in Figure 2 were generated directly by the Analyzer as software screenshots of the peak annotations on the chromatograms. Our intention was to present a comprehensive PGN mapping (approximately 70% of the peak area was assigned to muropeptide signals) using this platform. We understand the label density might affect clarity, so we have added the output tables of the complete muropeptide identifications as source data (Table 1–Source Data 1&2). Additionally, we have uploaded the Analyzer output files (see Additional Files), which can be better visualized in the Viewer program; the Viewer also allows users to zoom in for detailed labeling information.

      10) Figure 3: It is worth pointing out what features of the MS/MS fingerprints are helping to discriminate between species.

      Thank you for the suggestion. We have revised Figure 3 and the legend as follows:

      “(Figure legend 3) …The sequence of each isomer was determined using in silico MS/MS fragmentation matching, with the identified sequence having the highest matching score. The key MS/MS fragments that discriminate between two isomers are labeled in bold brown.”

      Author response image 2.

      11) Figure 4 and 5 legend: Can you condense the long descriptions of the abbreviations - or at least only refer to them once?

      Certainly. To enhance clarity and conciseness in the figure legends, we have revised Figure legend 5 as follows:

      “(Figure legend 5) …(b) Heatmap displaying …. Symbols: M, monomer; D, dimer; T, trimer (numbers indicate amino acids in stem peptides). Description of symbol abbreviations as in Figure legend 4, with the addition of "Glycan-T" representing trimers linked by glycosidic bonds.”

      Reviewer #2 (Recommendations For The Authors):

      1. Please read the manuscript carefully for spelling errors.

      We appreciate your careful review of our manuscript. We have thoroughly rechecked the entire manuscript for spelling errors and have made the necessary corrections to ensure the accuracy and quality of the text.

      2. Line 46 - "multilayered" is likely only true for Gram-positive bacteria.

      We thank reviewer #2 for bringing up this concern. Indeed, Gram-negative bacteria mostly possess a single layer of peptidoglycan, although up to three layers may be present in some parts of the cell surface.3, 4 To reduce confusion, we have rewritten the text as follows: “(Introduction) …PGN is a net-like polymeric structure composed of various muropeptide molecules, with their glycans linearly conjugated and short peptide chains cross-linked through transpeptidation.”

      3. Methods section: It seems like pellets from a 10 mL bacterial culture were ultimately suspended in 1.5 L (750 mL water + 750 mL tris) - why such a large volume? And how were PG fragments subsequently washed (centrifugation? There is no information on this in the Methods).

      We apologize for the mislabeling of the units. The correct volume is “1.5 mL (750 µL water + 750 µL tris)”. We have corrected the volume in the Methods section (lines 99-100). For the washing process of purified PGN, we added 1 mL water, centrifuged at 10,000 rpm for 5 minutes, and removed the supernatant. This information has been added to the Methods section (lines 95-98).

      4. Line 183 - why were 6 modifications chosen as the cutoff? Please make rationale more clear.

      We thank reviewer #2 for the comments. We set the maximum number of modifications to 6 on the assumption of one modification on each sugar of a trimeric muropeptide. A lower cutoff could effectively limit the identification of muropeptides with unlikely numbers of modifications, whereas a higher cutoff could allow for multiple modifications on a muropeptide. In our hands, muropeptide modifications of E. coli are mostly N-deacetyl-MurNAc and anhydro-MurNAc, and modifications of the gut microbes used here are mostly N-deacetyl-GlcNAc, anhydro-MurNAc, O-acetyl-MurNAc, loss of GlcNAc, and amidated iso-Glu. While we recommend starting data analysis with the cutoff of 6 modifications, users are free to adjust this based on their studies.
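
The rationale for the cutoff can be illustrated with a minimal Python sketch. It is not part of the HAMA software: the names `default_cutoff`, `Candidate`, and `within_cutoff` are hypothetical, and the only facts taken from the response are that a trimer is assumed to carry one modification per sugar (giving 6) and that users may adjust the cutoff.

```python
from dataclasses import dataclass, field

# Assumption for illustration: each muropeptide unit carries two sugars
# (GlcNAc and MurNAc), so a trimer has 3 x 2 = 6 sugars.
SUGARS_PER_UNIT = 2


def default_cutoff(n_units=3):
    """One modification per sugar; a trimer (3 units) gives a cutoff of 6."""
    return SUGARS_PER_UNIT * n_units


@dataclass
class Candidate:
    """Hypothetical record for one candidate muropeptide assignment."""
    name: str
    modifications: list = field(default_factory=list)


def within_cutoff(candidates, max_mods=None):
    """Keep candidates whose number of modifications does not exceed the cutoff."""
    if max_mods is None:
        max_mods = default_cutoff()
    return [c for c in candidates if len(c.modifications) <= max_mods]


candidates = [
    Candidate("trimer_a", ["anhydro-MurNAc", "N-deacetyl-GlcNAc"]),
    Candidate("trimer_b", ["O-acetyl-MurNAc"] * 7),  # implausibly many modifications
]
kept = within_cutoff(candidates)
print([c.name for c in kept])
```

Passing a different `max_mods` mirrors the note that users are free to adjust the cutoff for their own studies.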

      5. Line 339 - define donor vs. acceptor here (can be added in parentheses after explaining the relevant chemical reactions further above in the text)

      Thank you for the suggestion. To provide greater clarity regarding the roles of the donor and acceptor substrates in the transpeptidation process, we have revised the content in the manuscript as follows:

      “(Section Inferring PGN Cross-linking Types Based on Identified PGN Fragments) …In general, there are two types of PGN cross-linkage…. Transpeptidation involves two stem peptides which function as acyl donor and acceptor substrates, respectively. As the enzyme names imply, the donor substrates that Ddts and Ldts bind to are terminated as D,D-stereocenters and L,D-stereocenters, which structurally means pentapeptides and tetrapeptides. During D,D-transpeptidation, Ddts recognize D-Ala4-D-Ala5 of the donor stem (pentapeptide) and remove the terminal D-Ala5 residue, forming an intermediate. The intermediate then cross-links the NH2 group in the third position of the neighboring acceptor stem, forming a 4-3 cross-link.”

      6. Line 366 following - can you calculate % crosslinks based on these numbers? What does "high abundance" of 3,3 crosslinks mean in this context? Is this the majority of PG?

      Thank you for your questions. Calculating the percentage of crosslinks based on the muropeptide compositional numbers is a valid consideration. However, it's important to note that the muropeptides we analyzed were hydrolyzed by mutanolysin, and as such, deriving an accurate % crosslink value from these data might not provide a true representation of the crosslinking percentage within the PGN network. For a more precise determination of % crosslinks, methods such as solid-state NMR on purified peptidoglycan would be required. Our research provides insights into the characterization of PGN fragments and allows us to infer potential PGN cross-linkage types and the enzymes involved based on the dominant muropeptide fragments. Regarding the phrase "high abundance" in this context, it indicates that the M3b/M4b monomer and D34 dimer muropeptides represent a significant portion of the hydrolysis products. These muropeptides are major constituents within the PGN fragments obtained from the enzymatic hydrolysis.

      7. Line 375 - I am not sure PG is a meaningful diffusion barrier for drugs and signaling molecules, given that even larger proteins can apparently diffuse through the pores.

      Thank you for raising this point. Peptidoglycan indeed possesses relatively wide pores that allow for the diffusion of larger molecules, including proteins.5 Research has provided a rough estimate of the porosity of the PGN meshwork, suggesting that it allows for the diffusion of proteins with a maximum molecular mass of around 50 kDa.6 Considering this, we acknowledge that PGN may not serve as a significant diffusion barrier for drugs and signaling molecules. The porosity of the PGN scaffold, which is defined by the degree of cross-linking, plays a role in influencing the transport of molecules to the cell membrane. Thus, while PGN may not serve as a strict diffusion barrier, its structural characteristics still impact bacterial cell mechanics and interactions. We have revised the manuscript to reflect this understanding:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) …The porosity of the PGN scaffold, defined by the degree of cross-linking, influences the transport of larger molecules such as proteins. Therefore, modifications to PGN structure are anticipated to significantly affect bacterial cell mechanics and interactions.”

      8. Line 400 - what does "slower hydrolysis rate" refer to, is this chemical hydrolysis or enzymatic (autolysins?). Also, I am not sure hydrolysis rate of either modality allows for solid conclusions about how hard (line 402) the PG is.

      Thank you for your comments. The hydrolysis rate here refers to enzymatic hydrolysis, specifically mutanolysin cleaving the β-N-acetylmuramyl-(1,4)-N-acetylglucosamine linkage. Indeed, there is no direct correlation between the hydrolysis rate and the hardness of the PGN architecture, although structural rigidity is a key determinant in protein digestion.7 Considering that the enzymatic hydrolysis rate depends on the accessibility of the substrate to the enzyme, we proposed that the tighter PGN architecture could also lead to a slower hydrolysis rate. This speculation aligns with our observations of higher cell stiffness, or a more compact PGN structure, in B. breve and its slower hydrolysis rate. We understand this is indirect evidence, so the revised sentence now reads:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) …Furthermore, B. breve also showed a slower enzymatic hydrolysis rate in purified PGNs, implying that the cell wall structure of B. breve is characterized by a compact PGN architecture.”

      9. Line 424 - I am not convinced this pipeline can detect PG architectures that other pipelines cannot; likely, the difference between previous analyses and theirs is due to different growth conditions (3,3 crosslink formation is often modulated by environmental factors/growth stage). In the next sentence, it sounds like mutanolysin treatment is a novelty in PG analysis (which it is not).

      We apologize if this could have been clearer and we have revised the paragraph to describe our study more accurately. We agree that different growth conditions could influence PGN architecture and other pipelines could manually identify the PGN architectures or automatically identify them if they are not too complex. Our original intention was to highlight the ability of the HAMA program to automatically identify unreported PGN structure. Here are the revised sentences:

      “(Discussion) …We speculate that this finding may be influenced by the comprehensive mass spectrometric approaches we employed or by variations in growth conditions. Moreover, we utilized the well-established enzymatic method involving mutanolysin to cleave the β-N-acetylmuramyl-(1,4)-N-acetylglucosamine linkage, which preserves the original peptide linkage in intact PGN subunits.”

      10. Line 440-442: As outlined in more detail above: I don't think you can conclude something about the relationship between bridge length and envelope stiffness based on these data.

      Thank you for your valuable feedback. We agree that our data may not definitively support the direct conclusion about the relationship between bridge length and envelope stiffness in Bifidobacterium species. Instead, we will rephrase this section to accurately present the observed correlations without overgeneralizing:

      “(Discussion) … Notably, our study suggested a potential correlation between the cell stiffness and the compactness of bacterial cell walls in Bifidobacterium species (Figure 5). B. longum, which predominantly harbors tetrapeptide bridges (Ser-Ala-Thr-Ala), exhibits a trend towards lower stiffness, whereas B. breve, characterized by PGN cross-linked with monopeptide bridges (Gly), demonstrates a trend towards higher stiffness. These findings suggest that increased rigidity may be correlated with the more compact PGN architecture built by shorter cross-linked bridges.”

      References:

      1. Huang, Y.-W.; Wang, Y.; Lin, Y.; Lin, C.; Lin, Y.-T.; Hsu, C.-C.; Yang, T.-C., Impacts of Penicillin Binding Protein 2 Inactivation on β-Lactamase Expression and Muropeptide Profile in Stenotrophomonas maltophilia. mSystems 2017, 2 (4), 00077-00017.

      2. Jarick, M.; Bertsche, U.; Stahl, M.; Schultz, D.; Methling, K.; Lalk, M.; Stigloher, C.; Steger, M.; Schlosser, A.; Ohlsen, K., The serine/threonine kinase Stk and the phosphatase Stp regulate cell wall synthesis in Staphylococcus aureus. Sci. Rep. 2018, 8 (1), 13693.

      3. Labischinski, H.; Goodell, E. W.; Goodell, A.; Hochberg, M. L., Direct proof of a "more-than-single-layered" peptidoglycan architecture of Escherichia coli W7: a neutron small-angle scattering study. J. Bacteriol. 1991, 173 (2), 751-756.

      4. Rohde, M., The Gram-Positive Bacterial Cell Wall. Microbiol. Spectr. 2019, 7 (3), gpp3-0044-2018.

      5. Vollmer, W.; Höltje, J. V., The architecture of the murein (peptidoglycan) in gram-negative bacteria: vertical scaffold or horizontal layer(s)? J. Bacteriol. 2004, 186 (18), 5978-5987.

      6. Vollmer, W.; Blanot, D.; De Pedro, M. A., Peptidoglycan structure and architecture. FEMS Microbiol. Rev. 2008, 32 (2), 149-167.

      7. Li, Q.; Zhao, D.; Liu, H.; Zhang, M.; Jiang, S.; Xu, X.; Zhou, G.; Li, C., "Rigid" structure is a key determinant for the low digestibility of myoglobin. Food Chem.: X 2020, 7, 100094.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to the Referee Comments

      We would like to express our appreciation to the editor and the reviewers for their thoughtful comments and constructive suggestions on the manuscript. We agree with most of the comments and have carefully revised the manuscript accordingly. The revisions are highlighted in red font in the revised manuscript. Below are point-by-point responses to the referees’ comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Microglia are increasingly recognized as playing an important role in shaping the synaptic circuit and regulating neural dynamics in response to changes in their surrounding environment and in brain states. While numerous studies have suggested that microglia contribute to sleep regulation and are modulated by sleep, there has been little direct evidence that the morphological dynamics of microglia are modulated by the sleep/wake cycle. In this work, Gu et al. applied a recently developed miniature two-photon microscope in conjunction with EEG and EMG recording to monitor microglia surveillance in freely-moving mice over extended periods of time. They found that microglia surveillance depends on the brain state in the sleep/wake cycle (wake, non-REM, or REM sleep). Furthermore, they subjected the mouse to acute sleep deprivation, and found that microglia gradually assume an active state in response. Finally, they showed that the state-dependent morphological changes depend on norepinephrine (NE), as chemically ablating noradrenergic inputs from locus coeruleus abolished such changes; this is in agreement with previous publications. The authors also showed that the effect of NE is partially mediated by β2-adrenergic receptors, as shown with β2-adrenergic receptor knock-out mice. Overall, this study is a technical tour de force, and its data add valuable direct evidence to the ongoing investigations of microglial morphological dynamics and its relationship with sleep. However, there are a number of details that need to be clarified, and some conclusions need to be corroborated by more control experiments or more rigorous statistical analysis. Specifically:

      1. The number of branch points per microglia shown here (e.g., Fig. 2g) is much lower than the values of branch points in the literature, e.g., Liu T et al., Neurobiol. Stress 15: 100342, 2021 (mouse dmPFC, IHC); Liu YU et al., Nat. Neurosci. 22: 1771-81, 2019 (mouse S1, in vivo 2P imaging). The authors need to discuss the possible source of such discrepancy.

      Thank you for raising this important point. Two reasons may account for this difference. First, the software packages define branch points differently. Liu YU et al. used the Sholl analysis in ImageJ to count microglial branch points; Sholl analysis defines the number of branch points as the number of crossings between branches and concentric circles of increasing radii. We reconstructed microglia morphology using Imaris, which defines branch points as bifurcation points, so the number of bifurcations represents the number of microglia branch points. Second, this and previous studies found that more branch points are present under anesthesia. The morphological characteristics of microglia in head-fixed mice under anesthesia were reported by Liu T et al., and the microglia reconstructions presented by those authors are indeed more complex than ours. In short, this is an aspect that we have been paying attention to, and the main reasons for this difference may lie in the definition of branch points, the analysis methods, and the related choice of thresholds. True differences in brain states and the heterogeneity of microglia in different brain regions may also contribute to the apparent discrepancy.
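
To make the difference between the two definitions concrete, the following Python sketch counts both quantities on the same toy skeleton. It is an illustration under stated assumptions, not the algorithm used by ImageJ or Imaris: the tree coordinates are invented, the function names `count_bifurcations` and `sholl_crossings` are hypothetical, and a segment is counted as crossing a circle only when its two endpoints lie on opposite sides of it.

```python
import math

# Toy 2D skeleton: node id -> (x, y, parent id). Node 0 is the soma.
# All coordinates are invented for illustration.
TREE = {
    0: (0.0, 0.0, None),
    1: (10.0, 0.0, 0),
    2: (20.0, 5.0, 1),
    3: (20.0, -5.0, 1),
    4: (30.0, 10.0, 2),
    5: (30.0, 2.0, 2),
    6: (0.0, 10.0, 0),
    7: (-5.0, 20.0, 6),
    8: (5.0, 20.0, 6),
}


def count_bifurcations(tree, soma=0):
    """Count non-soma nodes that have two or more child nodes."""
    n_children = {}
    for _, (_, _, parent) in tree.items():
        if parent is not None:
            n_children[parent] = n_children.get(parent, 0) + 1
    return sum(1 for node, n in n_children.items() if node != soma and n >= 2)


def sholl_crossings(tree, radii, soma=0):
    """For each radius, count segments whose endpoints straddle the circle."""
    sx, sy, _ = tree[soma]

    def dist(node):
        x, y, _ = tree[node]
        return math.hypot(x - sx, y - sy)

    counts = []
    for r in radii:
        n = 0
        for node, (_, _, parent) in tree.items():
            if parent is None:
                continue
            # Opposite signs mean one endpoint is inside and one is outside.
            if (dist(node) - r) * (dist(parent) - r) < 0:
                n += 1
        counts.append(n)
    return counts


bifurcations = count_bifurcations(TREE)
crossings = sholl_crossings(TREE, radii=[5.0, 15.0, 25.0])
print(bifurcations, crossings, sum(crossings))
```

On this toy cell the bifurcation count is 3, whereas the Sholl crossings are 2, 4, and 2 at the three radii (8 in total), so the two definitions give different "branch point" numbers for the same morphology.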

      2. Microglia process end-point speed (Fig. 2h, o): here the authors show that the speed is highest in the wake state and lowest in NREM, which agrees with the measurement on microglia motility during wakefulness vs NREM in a recent publication (Hristovska I et al., Nat. Commun. 13: 6273, 2022). However, Hristovska et al. also reported lower microglia complexity in NREM vs wake state, which seems to be the opposite of the finding in this paper. The authors need to discuss the possible source of such differences.

      This is also an important point. Hristovska et al. reported the morphodynamic characteristics of microglia during wakefulness and NREM sleep. It is worth noting that the sleep state of the mice in their experiments was unnatural because of head fixation and body restraint, and the duration of NREM sleep (sleep stability) was quite different from that of NREM sleep analyzed under natural sleep. The limitations of this approach are also discussed by Hristovska et al.: “Even though sleep episodes were, as anticipated, shorter than those observed in freely moving animals, changes in neuronal activity characteristic of NREM sleep were monitored by EEG recordings, and changes in morphodynamics were observed during single episodes. Several episodes of REM sleep were detected, but they were too short and rare to be analyzed reliably.” The unnatural sleep state would lead to an increase in microarousals, and ultimately a change in sleep architecture, which may be the main reason for the difference in microglia behavior from that observed during natural sleep in our study. We have discussed this in the revised manuscript. Please see lines 292-298.

      3. Fig. 3: the authors used single-plane images to analyze the morphological changes over 3 or 6 hours of SD, which raises the concern that the processes imaged at the baseline may drift out of focus, leading to the dramatic reduction in process lengths, surveillance area, and number of branch points. In fact, a previous study (Bellesi M et al., J. Neurosci. 37(21): 5263-73, 2017) shows that after 8 h SD, the number of microglia process endpoints per cell and the summed process length per cell do not change significantly (although there is a trend to decline). The authors may confirm their findings by either 3D imaging in vivo, or 3D imaging in fixed tissue.

      Three lines of evidence indicate that the microglia morphology changes in Fig. 3 are due to SD, rather than variations in the focal plane. First, our single-plane images were quite stable over 3 or 6 hours of SD, though occasional reversible drifts might happen due to sudden motions. Second, per your suggestion, further experiments and analysis of 3D imaging were performed to monitor microglia dynamics during sleep deprivation. The new result is shown in revised Fig. S3 C-D: the length of microglia branches and the number of branch points were significantly reduced after SD, in agreement with the results of single-plane imaging. Third, we detected no significant difference in microglia branching characteristics during 6 h sleep deprivation in β2AR KO mice (Fig. S4), which indirectly confirms that single-plane imaging is stable enough to detect true changes in branching during SD.

      4. Fig. 4b: the EEG and EMG signals look significantly different from the example given in Fig. 2a. In particular, the EMG signal appears completely flat except for the first segment of wake state; the EEG power spectrum for REM appears dark; and the wake state corresponds to stronger low frequency components (below ~ 4 Hz) compared to NREM, which is the opposite of Fig. 2a. This raises the concern whether the classification of sleep stage is correct here.

      Thank you for the insightful comments. When we carefully examined the behavioral video for Figure 4b, we found occasional microarousal events, indicated by slow head rotation, during NREM sleep, while the accompanying EMG signals were completely flat, which is atypical during the sleep-wake cycle. These microarousal events were not excluded from sleep, which made this data set unrepresentative and inconsistent with Fig. 2a. In the revised manuscript, we have replaced it with more representative data in which EMG and EEG clearly and consistently distinguish between different brain states in mice. Please see revised Fig. 2a, page 34; revised Fig. 4b, page 37.

      5. Fig. 4 NE dynamics.

      • How long is a single continuous imaging session for NE?

      • When monitoring microglia surveillance, the authors were able to identify wake or NREM states longer than 15 min, and REM states longer than 5 min. Here the authors selected wake/NREM states longer than 1 min and REM states longer than 30 s. What makes such a big difference in the time duration selected for analysis?

      • Also, the definition of F0 is a bit unclear. Is the same F0 used throughout the entire imaging session, or is it defined with a moving window?

      A single continuous session of NE imaging usually took about 1 hour. Subsequent analysis was performed on imaging data from each recording that included wake, NREM sleep, and REM sleep. Because microglia morphological dynamics (relatively slow) and NE signals (fast) operate on different time scales, we used different time windows for their analysis in the previous version of the manuscript.

      Per your suggestion, we have now set the same time window selection criteria for both microglia morphological and NE dynamic analysis: wake and NREM sleep episodes longer than 1 minute, and REM sleep episodes longer than 30 seconds. We updated the Methods and all statistics in the related figures; please see lines 151-154, 481-485, 490-492; Fig. 2e-g and 2l-n, page 34. The definition of F0 is now explained in the Methods section. Please see lines 521-522.
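
The unified selection criteria can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the hypnogram format (one state label per scoring epoch), the `epoch_s` parameter, and the function name `select_episodes` are assumptions; only the duration thresholds (longer than 1 minute for wake and NREM, longer than 30 seconds for REM) come from the response above.

```python
from itertools import groupby

# Minimum episode durations in seconds, as stated in the response.
MIN_DURATION_S = {"wake": 60.0, "NREM": 60.0, "REM": 30.0}


def select_episodes(hypnogram, epoch_s=1.0, min_duration_s=MIN_DURATION_S):
    """Return (state, start_s, duration_s) for each run longer than its threshold.

    hypnogram: sequence of state labels, one per scoring epoch (assumed format).
    epoch_s:   epoch length in seconds (assumed; depends on the scoring software).
    """
    episodes = []
    t = 0.0
    for state, run in groupby(hypnogram):
        duration = sum(1 for _ in run) * epoch_s
        if duration > min_duration_s[state]:
            episodes.append((state, t, duration))
        t += duration
    return episodes


# Toy recording: 90 s wake, 45 s NREM, 40 s REM, 30 s wake.
hypnogram = ["wake"] * 90 + ["NREM"] * 45 + ["REM"] * 40 + ["wake"] * 30
selected = select_episodes(hypnogram)
print(selected)
```

In this toy recording only the 90 s wake bout and the 40 s REM bout pass the thresholds; the 45 s NREM bout and the final 30 s wake bout are excluded.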

      6. Fig. 5b: how does the microglia morphology in LC axon ablation mice compare with wild type mice under the wake state? The text mentioned "more contracted" morphology but didn't give any quantification. Also, the morphology of microglia in the wake state (Fig. 5b) appears very different from that shown in Fig. S3C1 (baseline). What is the reason?

      The morphology of microglia is indeed heterogeneous and variable, affected by factors including brain state, brain region, and microenvironmental changes, along with animal-to-animal differences. We did not compare microglia morphology between the LC axon ablation mice and wild type mice and, in view of this, we removed the description of “more contracted morphology” from the main text. It should also be noted that, because we primarily focused on changes in individual microglia across different states over time by self-comparison, we minimized possible effects of heterogeneity in microglia morphology on our conclusions.

      7. The relationship between NE level and microglia dynamics. Fig. 4C shows that the extracellular NE level is the highest in the wake state and the lowest in REM. Previous studies (Liu YU et al., Nat. Neurosci. 22(11):1771-1781, 2019; Stowell RD et al., Nat. Neurosci. 22(11): 1782-1792, 2019) suggest that high NE tone corresponds to reduced microglia complexity and surveillance. Hence, it would be expected that microglia process length, branch point number, and area/volume are higher in REM than in NREM. However, Fig. 2l-n show the opposite. How should we understand this?

      Your point is well-taken. On the one hand, our data clearly showed that NE is critically involved in the brain state-dependent microglia dynamic surveillance, with evidence from the ablation of the LC-NE projection and from the β2AR knockout animal model.

      On the other hand, we also understand that NE is not the sole determinant, so the relationship between the NE level and the complexity and surveillance may not be unique.

      In this regard, other potential modulators also fluctuate across the sleep-wake cycle and may partake in the regulation of microglia dynamic surveillance. Previous studies (Liu YU et al., 2019; Stowell RD et al., 2019) have shown that microglia can be jointly affected by surrounding neuronal activity and NE level during wake. It has been reported that LC firing stops (Aston-Jones et al., 1981; Rasmussen et al., 1986), while inhibitory neurons, such as PV neurons and VIP neurons, become relatively active during REM sleep (Brécier et al., 2022). The ATP level in the basal forebrain has been shown to be higher in REM than in NREM (Peng et al., 2023). In addition, our own preliminary result (Author response image 1) also showed a higher adenosine level in REM than in NREM in the somatosensory cortex. Last but not least, we found that β2AR knockout failed to abolish microglial responses to sleep state switch and SD stress altogether.

      In brief, microglia are highly sensitive to varied changes in the surrounding environment, and many modulators may participate in microglia dynamics across sleep states. This may underlie the difference in microglia complexity between REM and NREM. Future investigations are warranted to delineate the signal-integrative role of microglia in physiology and under stress. We have discussed the pertinent points in the revised manuscript. Please see lines 343-354.

      Author response image 1.

      Extracellular adenosine levels in somatosensory cortex in different brain states. AAV2/9-hSyn-GRABAdo1.0 (Peng W. et al., Science. 2020) was injected into the somatosensory cortex (A/P, -1 mm; M/L, +2 mm; D/V, -0.3 mm). Data from the same recording are connected by lines. n = 9 from 3 mice.

      Reviewer #2 (Public Review):

      The manuscript describes an approach to monitor microglial structural dynamics and correlate it to ongoing changes in brain state during sleep-wake cycles. The main novelty here is the use of miniaturized 2p microscopy, which allows tracking microglia surveillance over long periods of hours, while the mice are allowed to freely behave. Accordingly, this experimental setup would permit exploring long-lasting changes in microglia in a more naturalistic environment, which were not previously possible to identify. The findings could provide key advances to the research of microglia during natural sleep and wakefulness, as opposed to anesthesia. The main findings of the paper are that microglia increase their process motility and surveillance during REM and NREM sleep as compared to the awake state. The authors further show that sleep deprivation induces opposite changes in microglia dynamics, limiting their surveillance and size. The authors then demonstrate a potential causal role for norepinephrine secretion from the locus coeruleus (LC) which is driven by beta 2 adrenergic receptors (b2AR) on microglia. However, there are several methodological and experimental concerns which should be addressed.

      The major comments are summarized below:

      1. The main technological advantage of the 2p miniaturized microscope is the ability to track single cells over sleep cycles. A main question that is unclear from the analysis and the way the data is presented is: are the structural changes in microglia reversible? Meaning, could the authors provide evidence that the same cell can dynamically change in sleep state and then return to similar size in wakefulness? The same question arises again with the data which is presented for anesthesia, is this change reversible?

      As revealed by long-term mTPM imaging in freely behaving mice, the brain-state-dependent morphological changes in microglia were reproducible and reversible. Author response image 2 shows that microglia displayed reversible dynamic changes during multiple rounds of sleep-wake transition. Author response image 3 shows that microglia dynamics induced by anesthesia also exhibited reversibility.

      Author response image 2.

      Long-term tracking of microglia process area in different brain states. Eight cells were analyzed. A total of 31 time points were selected from the in vivo imaging data to characterize the morphological changes of microglia over a continuous 7-hour period.

      Author response image 3.

      Reversible changes in microglial process length, area, and number of branch points under anesthesia. Wake group: 30-minute accommodation to a new environment; Isoflurane group: 1.5% in air applied at a flow rate of 0.4 L/min for 30 minutes; Recovery group: 30 minutes after recovery from anesthesia. n = 9 cells from 3 mice for each group.

      1. The binary comparison between brain states is misleading; shouldn't the changes in structural dynamics be compared to the baseline at state onset? The authors' method describes analysis of the last 5 minutes in each sleep/wake state. However, these transitions are directional: for instance, REM usually follows NREM, so the description of a decrease in length during REM sleep could be inaccurate.

      Because microglial morphological dynamics are relatively slow, we analyzed the last segment of each state (30 s in the revised manuscript) rather than the state onset, allowing time for the microglial response to the state transition to stabilize.

      Further, we compared microglial dynamics between two NREM groups transitioning to different subsequent states: group 1 (NREM to REM) vs group 2 (NREM to Wake). This precaution was taken to exclude a directional effect of state transitions. Our results showed no difference in microglial length, area, or number of branch points between the two NREM groups (Author response image 4), indicating that the last 30 s of each NREM episode was not affected by the following state and that it is reasonable to perform a binary comparison.

      Author response image 4.

      Microglial morphological length, area change, and number of branch points of the last 30s of NREM sleep followed by REM or Wake. n = 9 cells from 3 mice for each group.

      1. Sleep deprivation- again, it is unclear whether these structural changes are reversible. This point is straightforward to address using this methodology by measuring sleep following SD. In addition, the authors chose a method to induce sleep deprivation that is rather harsh. It is unclear if the effect shown is the result of stress or perhaps an excess of motor activity.

      We adopted the forced-exercise method because it is commonly used for sleep deprivation (Pandi-Perumal et al., 2007; Nollet M et al., 2020), though it has the potential limitation of excess motor activity.

      In light of your comments and suggestion, we present new data demonstrating that sleep duration, mostly NREM sleep, showed a compensatory increase (ZT9-10) after the 6-hour sleep deprivation (ZT2-8) (revised Fig. S3B). This result shows that sleep deprivation indeed increased sleep pressure in the mice. As the sleep pressure eased during recovery sleep, the morphological changes of microglia were reversed over a timescale of several hours (revised Fig. S3 E-J).

      1. The authors perform measurements of norepinephrine with a recently developed GRAB sensor. These experiments are performed to causally link microglia surveillance during sleep to norepinephrine secretion. They perform 2p imaging and collect data points which are single neurons, and it is unclear why the normalization and analysis is performed for bulk fluorescence similar to data obtained with photometry.

      We did not perform single-neuron analysis for two reasons. First, our experimental conditions, e.g., the expression of the NE indicator and the control of imaging laser intensity, did not yield sufficient signal-to-noise to clearly discriminate individual neurons with two-photon imaging. Second, NE signal may play a modulatory role, and fluorescence changes appeared to be global, rather than local or cell-specific. Therefore, we analyzed fluorescence changes in different brain states over the whole field-of-view in Fig. 4, rather than at the subregional or single-cell level.
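      As an illustration only, a whole field-of-view analysis of this kind can be sketched as follows. This is not the analysis code used in the study: the function names, the choice of the median as the baseline F0, and the toy pixel values are all assumptions made for the example.

```python
from statistics import mean, median

def field_mean_trace(frames):
    """Average all pixels of each frame into one bulk fluorescence value per frame."""
    return [mean(pixel for row in frame for pixel in row) for frame in frames]

def delta_f_over_f(trace, baseline=None):
    """Return (F - F0) / F0 for each time point.

    F0 defaults to the median of the trace, which is an assumption of this
    sketch and not a detail taken from the manuscript.
    """
    f0 = median(trace) if baseline is None else baseline
    return [(f - f0) / f0 for f in trace]

# Toy 2x2-pixel frames (hypothetical values, arbitrary units).
frames = [
    [[100, 100], [100, 100]],
    [[110, 110], [110, 110]],
    [[90, 90], [90, 90]],
]
trace = field_mean_trace(frames)   # one value per frame
dff = delta_f_over_f(trace)        # normalized change relative to F0
```

      Averaging over the whole field trades spatial resolution for signal-to-noise, which matches the rationale given above for a global rather than cell-specific readout.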

      1. The experiments involving b2AR KO mice are difficult to interpret and do not provide substantial mechanistic insight. Since b2AR are expressed throughout numerous cell types in the brain and in the periphery, it is entirely not clear whether the effects on microglia dynamics are direct. The conclusion and the statement regarding the expression of b2AR in microglia is not supported by the references the authors present, which simply demonstrate the existence and function of b2AR in microglia. In addition, these mice show significant changes in sleep pattern and increased REM sleep. This could account for reasons for the changes in microglia structure rather than the interpretation that these are direct effects.

      To summarize, the main conclusions of the paper require further support with analysis of existing data and experimental validation.

      Previous studies have revealed that norepinephrine (NE) modulates microglial dynamics through the β2AR pathway (Stowell RD et al., 2019; Liu YU et al., 2019). Stowell et al. and Liu et al. used in vivo two-photon imaging to demonstrate that microglia dynamics differ between awake and anesthetized mice and to highlight the roles of NE and β2AR in these states (Gyoneva S et al., 2013; Stowell RD et al., 2019; Liu YU et al., 2019). To evaluate the direct effect of β2AR on microglial dynamics, Stowell et al. administered the β2AR agonist clenbuterol to anesthetized mice and found that this decreased the motility, arbor complexity, and process coverage of microglia in the parenchyma (Stowell RD et al., 2019). Inhibition of β2AR by the antagonist ICI-118,551 in awake mice recapitulated the effects of anesthesia by enhancing microglial arborization and surveillance (Stowell RD et al., 2019). In addition, it has been shown that microglia express higher levels of β2AR than any other cell type in the brain (Zhang et al., 2014).

      To this end, our current work provides new evidence supporting the involvement of the LC-NE-β2AR axis in modulating microglial dynamics both during the natural sleep-wake cycle and under SD stress. While we are aware that the pan-tissue β2AR knockout model precludes us from pinpointing the role of microglial β2AR, it is safe to state, based on the present and previous data, that β2-adrenergic receptor signaling plays a significant role in sleep-state-dependent microglial dynamic surveillance.

      We have discussed this in the revised manuscript. Please see lines 324-354. As you suggested, we added references to support the statement regarding the expression of β2AR in microglia (please see line 333).

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      Reviewer #1 (Recommendations For The Authors):

      Some technical details need to be clarified. Also, please double-check for typos.

      1. In vivo imaging preparation: how long is the recovery time between window/EEG implantation surgery and imaging/recording?

      Imaging data were collected one month after the surgery. We have added descriptions to the methods section of the revised manuscript. Please see line 419.

      1. Statistical analysis: the authors used t-test or ANOVA without first checking whether the data pass the normality test. If the data does not follow a normal distribution, nonparametric tests would be more appropriate.

      Per your suggestion, we tested statistical significance using a parametric test (ANOVA) when the data passed the normality test, or a non-parametric test (Friedman) for non-normal data. Please see lines 533-535.
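      For readers unfamiliar with the non-parametric branch, the Friedman test can be sketched as below. This is a minimal illustration and not the authors' analysis code: the function names and the data (six hypothetical cells, each measured in Wake, NREM and REM) are invented, ties are assumed absent, and the p-value uses the asymptotic chi-square approximation, which is only rough for small samples. The normality check itself (for example, a Shapiro-Wilk test) is omitted here.

```python
import math

def rank_within_block(values):
    """Rank one block's values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square statistic for n blocks x k conditions (no tie correction)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0] * k
    for block in blocks:
        for j, r in enumerate(rank_within_block(block)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

def chi2_sf_df2(q):
    """Survival function of the chi-square distribution with 2 degrees of freedom.

    With k = 3 conditions the Friedman statistic has k - 1 = 2 degrees of
    freedom, for which the survival function is exactly exp(-q / 2).
    """
    return math.exp(-q / 2.0)

# Hypothetical process lengths (arbitrary units): one row per cell,
# columns are Wake, NREM, REM.
cells = [
    [10.1, 12.3, 14.0],
    [9.8, 11.9, 13.2],
    [10.5, 12.0, 13.9],
    [9.9, 12.8, 14.4],
    [10.2, 11.5, 13.1],
    [10.0, 12.2, 13.7],
]
q = friedman_statistic(cells)
p = chi2_sf_df2(q)
```

      Because every hypothetical cell shows the same ordering across states, the statistic reaches its maximum for this design, n(k - 1) = 12.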

      1. Fig. 1b needs a minor change. In the figure, the EMG electrodes appear to be connected to the brain as well.

      We have corrected this oversight. Thank you.

      1. Fig. 1c: it would be helpful to give examples of raw EEG and EMG traces for REM and NREM separately.

      Raw traces are now shown as suggested. Please see Fig. 1c, page 32.

      1. Fig. 1h: is each data point one microglia or one end-point?

      In Fig. 1h, each data point represents the average speed of all branches of one microglial cell, not one end-point.

      1. Sleep deprivation starts at 9 am. What time corresponds to Zeitgeber Time 0 (ZT0, the beginning of the light phase)?

      We have now clarified that 9 am corresponds to Zeitgeber time 2. Please see line 196.

      1. Line 61: the authors referred to Ramon y Cajal's original suggestion that microglia dynamics are coupled to the sleep-wake cycle. However, the cited paper only indicates that Cajal suggested a role of astrocytes in the sleep-wake cycle, not microglia. In addition, there is a typo in the line: there should be a space between "Ramon" and "y" in Cajal's name.

      We have updated the statement and the cited literature to point out microglial involvement in the sleep-wake cycle. The typo has been corrected. Please see lines 64-65.

      1. Fig. S3B: As each group has only 3 mice, it is unclear how t-test can yield p < 0.01 or even 0.001.

      We checked the original data again and confirmed that they are correct. These small p-values may be due to the small within-group variability of the control group.
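      The arithmetic behind this point can be illustrated with a short sketch. The numbers below are hypothetical and are not the data in Fig. S3B; the function names are likewise invented for the example. With three animals per group, a pooled-variance t-test has 4 degrees of freedom, and the two-sided p-value then has a closed form.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

def two_sided_p_df4(t):
    """Exact two-sided p-value for Student's t with 4 degrees of freedom only."""
    u = t * t / (1 + t * t / 4)
    cdf = 0.5 + 0.375 * (abs(t) / math.sqrt(1 + t * t / 4)) * (1 - u / 12)
    return 2 * (1 - cdf)

# Hypothetical values: a tight control group and a more variable test group.
control = [10.0, 10.2, 9.8]
treated = [13.0, 14.0, 12.0]

t, df = pooled_t(control, treated)
p = two_sided_p_df4(t)  # valid here because df == 4
```

      In this toy case p falls below 0.01 even with n = 3 per group, because the pooled variance is small relative to the difference in means.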

      1. Line 251-253, "Figure 4h-n" should be "Figure 5h-n"?

      We have revised it. Please see line 265-266.

      1. Fig. 5h: the receptor should be "adrenergic receptor", not "adrenal receptor".

      We changed the term to “adrenergic receptor”. Please see Fig 5h.

      1. Fig. 5g, n: the number of data points is apparently less than the sample size given in the figure legend. Perhaps some data points have exactly the same value so they overlap? The authors may consider plotting identical values with a slight shift so that the number of data points shown matches the actual sample size, to avoid confusion.

      Yes, we have added a small jitter so that overlapping data points can be distinguished. Please see Fig. 5n.
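      One simple way to separate overlapping points is sketched below. It is a deterministic variant of jitter, not necessarily the method used for the revised figure; the function name, the step size and the example values are assumptions.

```python
from collections import defaultdict

def spread_duplicates(values, center=0.0, step=0.05):
    """Return one x position per value.

    Points that share the same y value are spread symmetrically around
    `center`, so that every point stays visible in a scatter plot.
    """
    groups = defaultdict(list)
    for i, v in enumerate(values):
        groups[v].append(i)
    xs = [center] * len(values)
    for idxs in groups.values():
        m = len(idxs)
        for pos, i in enumerate(idxs):
            xs[i] = center + (pos - (m - 1) / 2) * step
    return xs

# Hypothetical y values for one group; 1.0 occurs three times.
values = [1.0, 1.0, 2.0, 1.0, 3.0]
xs = spread_duplicates(values)
```

      Plotting `xs` against `values` then shows as many markers as there are samples, which is the property the reviewer asked for.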

      1. There are some typos (e.g., Line 217, "he" should be "the") and some incomplete references (e.g., [13], [22], [34], [35] lack volume and page number, [15] and [39] lack publisher information). Some references have inconsistent formats (e.g., "Journal of Neuroscience" is sometimes abbreviated and sometimes not). Please correct these.

      We have corrected these oversights. Please see references, page 27.

      Reviewer #2 (Recommendations For The Authors):

      Major issues:

      1. Re-analyze the data in a manner that allows to follow and compare the same cells over different state transitions. This is necessary to evaluate the reversibility of microglia structure. In addition, consider analysis of the change from the beginning to the end of each state.

      As shown in Author response image 2, microglial dynamics were reversible during multiple rounds of sleep-wake transition.

      1. It would be nice to see the raw data obtained over time, at least for Figure 1, before offline correction of movement to evaluate the imaging quality and level of drift during imaging.

      We agree with this suggestion. Please see the supporting material video.

      1. It would be helpful to add an analysis of the percent time spent in each state for the 10 hour recordings.

      We have adopted this suggestion. Please see revised Fig. S4C.
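      A calculation of this kind can be sketched as follows, assuming a hypnogram scored in equal-length epochs. The state labels, the function name and the toy hypnogram are hypothetical and are not taken from the recordings.

```python
from collections import Counter

def percent_time_per_state(hypnogram):
    """Percent of scored epochs spent in each state; assumes equal-length epochs."""
    counts = Counter(hypnogram)
    total = len(hypnogram)
    return {state: 100.0 * n / total for state, n in counts.items()}

# Toy hypnogram of ten epochs (hypothetical scoring).
hypnogram = ["Wake"] * 5 + ["NREM"] * 4 + ["REM"] * 1
pct = percent_time_per_state(hypnogram)
```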

      1. In Figure 2 the results are from 15 cells from several animals. How much do the results vary between mice? It will be helpful to show if this varies between different mice by labeling cells from each mouse differently.

      In Author response image 5, we have labeled the data points from the seven mice. Data from different animals were intermixed at each brain state, with no clear animal-to-animal difference.

      Author response image 5.

      Quantitative analysis of microglial length based on multi-plane microglial imaging. n = 17 cells from 7 mice for each group. In right panel, each color codes data from the same animal.

      1. SD- please add some quantification for sleep and EEG to show that the manipulation really caused sleep deprivation. To address the confound of forced movement and stress, it might be helpful to add quantification of movement compared to an undisturbed wakefulness.

      We have added related data (revised Fig. S3B), as suggested. Please see line 196-197.

      1. The DSP4 application should also be performed with NE measurements to verify the specificity of the measured NE signal as well as of the DSP4 toxin.

      Following your suggestion, we have added DSP4 data in revised Fig. S4B.

      1. Some suggested refined experiments for the b2AR KO are: a-A conditional b2AR KO in microglia, as cited in the work. b- Local application of a b2 blocker during SD. c- Imaging of NE dynamics in the b2 animals. If NE dynamics during natural sleep cycle are perturbed, then this suggests upstream mechanisms rather than direct microglia effects as suggested by the authors.

      We agree that the current study cannot pinpoint a direct effect of β2AR expressed by microglia. We have discussed this limitation in the revised manuscript. Please see lines 324-354.

      Minor:

      1. Typo on page 4 (microcopy instead of microscopy).

      It was corrected. Please see line 87.

      1. Typo page 11- 'and he largest changes in NE' - supposed to be 'the'.

      We have corrected this mistake. Please see line 228.

      1. Fig. 4- there are several units missing in the figure in panel b: the top is Hz, but what does the color bar indicate exactly? 2 what? both for theta/delta and for NE.

      We have modified this figure and legend for clarity. Please see Fig. 4, page 37.

      2. Bottom of page 12- referring to figure 4 but talking about figure 5.

      The typo was corrected. Please see line 265-266.

      References

      1. Aston-Jones G, Bloom FE. Activity of norepinephrine-containing locus coeruleus neurons in behaving rats anticipates fluctuations in the sleep-waking cycle. J Neurosci. 1, 876–886 (1981).

      2. Bellesi M, de Vivo L, Chini M, Gilli F, Tononi G, Cirelli C. Sleep loss promotes astrocytic phagocytosis and microglial activation in mouse cerebral cortex. J Neurosci. 37, 5263–5273 (2017).

      3. Brécier A, Borel M, Urbain N, Gentet LJ. Vigilance and behavioral state-dependent modulation of cortical neuronal activity throughout the sleep/wake cycle. J Neurosci. 42, 4852–66 (2022).

      4. Dworak M, McCarley RW, Kim T, Kalinchuk AV, Basheer R. Sleep and brain energy levels: ATP changes during sleep. J Neurosci. 30, 9007-16 (2010).

      5. Gyoneva S, Traynelis SF. Norepinephrine modulates the motility of resting and activated microglia via different adrenergic receptors. J Biol Chem. 288, 15291–15302 (2013).

      6. Kjaerby C, Andersen M, Hauglund N, Untiet V, Dall C, Sigurdsson B, Ding F, Feng J, Li Y, Weikop P, Hirase H, Nedergaard M. Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat Neurosci. 25, 1059–1070 (2022).

      7. Liu T, Lu J, Lukasiewicz K, Pan B, Zuo Y. Stress induces microglia-associated synaptic circuit alterations in the dorsomedial prefrontal cortex. Neurobiology of Stress. 15, 100342 (2021).

      8. Liu YU, Ying Y, Li Y, Eyo UB, Chen T, Zheng J, Umpierre AD, Zhu J, Bosco DB, Dong H, Wu LJ. Neuronal network activity controls microglial process surveillance in awake mice via norepinephrine signaling. Nat Neurosci. 22, 1771–1781 (2019).

      9. Nollet M, Wisden W, Franks NP. Sleep deprivation and stress: a reciprocal relationship. Interface Focus. 10, 20190092 (2020).

      10. Pandi-Perumal SR, Cardinali DP, Chrousos GP. 2007. Neuroimmunology of sleep. New York, NY: Springer.

      11. Peng W, Liu X, Ma G, Wu Z, Wang Z, Fei X, Qin M, Wang L, Li Y, Zhang S, Xu M. Adenosine-independent regulation of the sleep-wake cycle by astrocyte activity. Cell Discov. 9, 16 (2023).

      12. Peng W, Wu Z, Song K, Zhang S, Li Y, Xu M. Regulation of sleep homeostasis mediator adenosine by basal forebrain glutamatergic neurons. Science. 369, 6508 (2020).

      13. Rasmussen K, Morilak DA, Jacobs BL. Single unit activity of locus coeruleus neurons in the freely moving cat: I. During naturalistic behaviors and in response to simple and complex stimuli. Brain Research. 371, 324–334 (1986).

      14. Stowell RD, Sipe GO, Dawes RP, Batchelor HN, Lordy KA, Whitelaw BS, Stoessel MB, Bidlack JM, Brown E, Sur M, Majewska AK. Noradrenergic signaling in the wakeful state inhibits microglial surveillance and synaptic plasticity in the mouse visual cortex. Nat Neurosci. 22, 1782-1792 (2019).

      15. Umpierre AD, Bystrom LL, Ying Y, Liu YU, Worrell G, Wu LJ. Microglial calcium signaling is attuned to neuronal activity in awake mice. eLife. 9, e56502 (2020).

      16. Wang Z, Fei X, Liu X, Wang Y, Hu Y, Peng W, Wang YW, Zhang S, Xu M. REM sleep is associated with distinct global cortical dynamics and controlled by occipital cortex. Nat Commun. 13, 6896 (2022).

      17. Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O’Keeffe S, Phatnani HP, Guarnieri P, Caneda C, Ruderisch N, Deng S, Liddelow SA, Zhang C, Daneman R, Maniatis T, Barres BA, Wu JQ. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci. 34, 11929–11947 (2014).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The Hedgehog (HH) protein family is important for embryonic development and adult tissue maintenance. Deregulation or even temporal imbalances in the activity of one of the main players in the HH field, sonic hedgehog (SHH), can lead to a variety of human diseases, ranging from congenital brain disorders to diverse forms of cancers. SHH activates the GLI family of transcription factors, yet the mechanisms underlying GLI activation remain poorly understood. Modification and activation of one of the main SHH signalling mediators, GLI2, depends on its localization to the tip of the primary cilium. In a previous study the lab had provided evidence that SHH activates GLI2 by stimulating its phosphorylation on conserved sites through Unc-51-like kinase 3 (ULK3) and another ULK family member, STK36 (Han et al., 2019). Recently, another ULK family member, ULK4, was identified as a modulator of the SHH pathway (Mecklenburg et al. 2021). However, the underlying mechanisms by which ULK4 enhances SHH signalling remained unknown. To address this question, the authors employed complex biochemistry-based approaches and localization studies in cell culture to examine the mode of ULK4 activity in the primary cilium in response to SHH. The study by Zhou et al. demonstrates that ULK4, in conjunction with STK36, promotes GLI2 phosphorylation and thereby SHH pathway activation. Further experiments were conducted to investigate how ULK4 interacts with SHH pathway components in the primary cilium. The authors show that ULK4 interacts with a complex formed between STK36 and GLI2 and hypothesize that ULK4 functions as a scaffold to facilitate STK36 and GLI2 interaction and thereby GLI2 phosphorylation by STK36. Furthermore, the authors provide evidence that ULK4 and STK36 co-localize with GLI2 at the ciliary tip of NIH 3T3 cells, and that ULK4 and STK36 depend on each other for their ciliary tip accumulation. 
Overall, the described ULK4-mediated mechanism of SHH pathway modulation is based on detailed and rigorous Co-IP experiments and kinase assays as well as confocal imaging localization studies. The authors used various mutated and wild-type constructs of STK36 and ULK4 to decipher the mechanisms underlying GLI2 phosphorylation at the tip of the primary cilium. These novel results on SHH pathway activation add valuable insight into the complexity of SHH pathway regulation. The data also provide possible new strategies for interfering with SHH signalling which has implications in drug development (e.g., cancer drugs).

      However, it will be necessary to explore additional model systems, besides NIH3T3, HEK293 and MEF cell cultures, to conclude on the universality of the mechanisms described in this study. Ultimately, it needs to be addressed whether ULK4 modulates SHH pathway activity in vivo. Is there evidence that genetic ablation of ULK4 in animal models leads to less efficient SHH pathway induction? It also remains to be resolved how ULK3 and ULK4 act in distinct or common manners to promote SHH signalling. Another remaining question is, whether cell type- and tissue-specific features exist, that play a role in ULK3- versus ULK4-dependent SHH pathway modulation. In particular for the studies on ciliary tip localization of factors, relevant for SHH pathway transduction, a higher temporal resolution will be needed in the future as well as a deeper insight into tissue/ cell type-specific mechanisms. These caveats, mentioned here, don't have to be addressed in new experiments for the revision of this manuscript but could be discussed.

      We agree with the reviewer that it would be important to investigate in the future the in vivo function of Ulk4 in Shh signaling, the relationship between Ulk3 and Ulk4/Stk36, and possible cell type/tissue specificity of these two kinase systems. This will require generating single and double knockout mice and examining Hh-related phenotypes in different tissues and developmental stages. The precise mechanism by which Ulk4 and Stk36 are translocated to the ciliary tip is also an important and unresolved issue. We have included several paragraphs in the “discussion” section to address these outstanding questions for future study.

      Reviewer #2 (Public Review):

      The authors provide solid molecular and cellular evidence that ULK4 and STK36 not only interact, but that STK36 is targeted (transported?) to the cilium by ULK4. Their data help generate a model for ULK4 acting as a scaffold for both STK36 and its substrate, Gli2, which appear to co-localise through mutual binding to ULK4. This makes sense, given the proposed role of most pseudokinases as non-catalytic signaling hubs. There is also an important mechanistic analysis performed, in which ULK4 phosphorylation in an acidic consensus by STK36 is demonstrated using IP'd STK36 or an inactive 'AA' mutant, which suggests this phosphorylation is direct.

      The major strength of the study is the well-executed combination of logical approaches taken, including expression of various deletion and mutation constructs and the careful (but not always quantified in immunoblot) effects of depleting and adding back various components in the context of both STK36 and ULK3, which broadens the potential impact of the work. The biochemical analysis of ULK4 phosphorylation appears to be solid, and the mutational study at a particular pair of phosphorylation sites upstream of an acidic residue (notably T2023) is further strong evidence of a functional interaction between ULK4/STK36. The possibility that ULK4 requires ATP binding for these mechanisms is not approached, though would provide significant insight: for example it would be useful to ask if Lys39 in ULK4 is involved in any of these processes, because this residue is likely important for shaping the ULK4 substrate-binding site as a consequence of ATP binding; this was originally shown in PMID 24107129 and discussed more recently in PMID: 33147475 in the context of the large amount of ULK4 proteomics data released.

      The reviewer raised an interesting question of whether ATP binding to the pseudokinase domain of Ulk4 might be required for its function, i.e., by regulating the interaction with its binding partner. In a recent study (Preuss et al., 2020; PMID: 33147475), the critical Lys39 for ATP binding was converted to Arg (KR mutation); however, unlike in most kinases, where the KR mutation affects ATP binding, the K39R mutation in the Ulk4 pseudokinase did not affect ATP binding, although it slightly increased ADP binding (PMID: 33147475). Another mutation made by Preuss et al. (PMID: 33147475), N239L, affected protein stability, making it impossible to determine whether this mutation affects ATP binding. Therefore, in the absence of a clear approach to perturb ATP binding without affecting the overall structure of Ulk4, it would be challenging to address whether ATP binding regulates the ability of Ulk4 to bind its substrates. Nevertheless, we discuss the possibility that ATP binding might regulate Ulk4/Stk36 interaction and Shh signaling.

      The discussion is excellent, and raises numerous important questions for future work in terms of potential transportation mechanisms of this complex. It also explains why the ULK4 pseudokinase domain is linked to an extended C-terminal region. Does AF2 predict any structural motifs in this region that might support binding to Gli2?

      The extended C-terminal domain of Ulk4 contains Arm/HEAT repeats (a protein-protein interaction domain), which are predicted by AF2 to form alpha helices.

      A weakness in the study, which is most evident in Figure 1, where Ulk4 siRNA is performed in the NIH3T3 model (and effects on Shh targets and Gli2 phosphorylation assessed), is that we do not know if ULK4 protein is originally present in these cells in order to actually be depleted. Also, we are not informed if the ULK4 siRNA has an effect on the 'rescue' by HA-ULK4; perhaps the HA-ULK4 plasmid is RNAi resistant, or if not, this explains why phosphorylation of Gli2 never reaches zero? Given the important findings of this study, it would be useful for the authors to comment on this, and perhaps discuss if they have tried to evaluate endogenous levels of ULK4 (and Stk36) in these cells using antibody-based approaches, ideally in the presence and absence of Shh. The authors note early on the large number of binding partners identified for ULK4, and siRNA may unwittingly deplete some other proteins that could also be involved in ULK4 transport/stability in their cellular model.

      Due to the lack of reliable Ulk4 and Stk36 antibodies, we were unable to confirm knockdown efficiency by western blot analysis. Therefore, we relied on measuring Ulk4 and Stk36 mRNA expression by RT-qPCR to estimate the knockdown efficiency (Fig 1- figure supplement 1). We used mouse Ulk4 shRNA to carry out the knockdown experiments in NIH3T3 and MEF cells, while the human version of Ulk4 (hUlk4) was used for the rescue experiments (Fig 1- figure supplement 2; Fig. 8). We have confirmed that the mUlk4 shRNA targeting sequence is not conserved in hUlk4; therefore, the hULK4 construct is RNAi resistant. The rescue experiments strongly argue that the effect of Ulk4 RNAi on Shh signaling is due to loss of endogenous Ulk4. This argument is further strengthened by the observations that mutations that affected Ulk4 and Stk36 ciliary tip localization also affected Shh signaling, such as Gli2 phosphorylation and Ptch1/Gli expression (Fig. 8).

      The sequence of ULK4 siRNAs is not included in the materials and methods as far as I can see.

      We have added the mouse Ulk4 RNAi target sequence in the revised version.

      Reviewer #3 (Public Review):

      In this manuscript, Zhou et al. demonstrate that the pseudokinase ULK4 has an important role in Hedgehog signaling by scaffolding the active kinase Stk36 and the transcription factor Gli2, enabling Gli2 to be phosphorylated and activated.

      Through nice biochemistry experiments, they show convincingly that the N-terminal pseudokinase domain of ULK4 binds Stk36 and the C-terminal Heat repeats bind Gli2.

      Lastly, they show that upon Sonic Hedgehog signaling, ULK4 localizes to the cilia and is needed to localize Stk36 and Gli2 for proper activation.

      This manuscript is very solid and methodically shows the role of ULK4 and STK36 throughout the whole paper, with well controlled experiments. The phosphomimetic and incapable mutations are very convincing as well. I think this manuscript is strong and stands as is, and there is no need for additional experiments.

      Overall, the strengths are the rigor of the methods, and the convincing case they bring for the formation of the ULK4-Gli2-Stk36 complex. There are no weaknesses noted. I think a little additional context for what is being observed in the immunofluorescence might benefit readers who are not familiar with these cell types and structures.

      We thank this reviewer for the positive comments.

      Recommendations For the Authors

      Reviewer #1 (Recommendations For The Authors):

      This elegant study has been thoroughly and thoughtfully designed and the dataset is solid. The biochemistry results are overall very convincing. Some data lack quantification and there needs to be more information on data analyses and statistics. The following suggestions and comments aim at strengthening the manuscript.

      1. Please provide quantification normalized to input for IP experiments (Figures 1 E - F; Figure 8 C). More information on data analyses and statistics should be provided and included as information in the figure legends.

      Thank you for the suggestions. We have performed the quantification and statistical analyses for Figures 1E-G and Figure 8C as requested.
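      For readers unfamiliar with this type of quantification, normalizing an IP signal to its input can be sketched as below. This is an illustration under stated assumptions, not the procedure used for the figures: the function name and the band intensities are hypothetical, and lane 0 is assumed to be the control lane.

```python
def normalized_ip(ip_intensities, input_intensities, control_index=0):
    """Divide each IP band by its matching input band, then scale to the control lane.

    Returns one relative value per lane, with the control lane equal to 1.0.
    """
    ratios = [ip / inp for ip, inp in zip(ip_intensities, input_intensities)]
    reference = ratios[control_index]
    return [r / reference for r in ratios]

# Hypothetical densitometry readings (arbitrary units) for two lanes:
# lane 0 = control, lane 1 = knockdown.
ip_bands = [200.0, 100.0]
input_bands = [100.0, 100.0]
relative = normalized_ip(ip_bands, input_bands)
```

      Dividing by the input first corrects for differences in expression or loading between lanes before the lanes are compared with each other.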

      1. Did the authors investigate whether overexpressing hULK4 in the control NIH3T3 cells leads to an increase in pS230/232 (related to Figure 1E)? This would nicely support the notion of a promoting effect of ULK4 on GLI2 phosphorylation.

      We did not. We speculated that overexpressing hULK4 may not significantly promote GLI2 phosphorylation because Ulk4 is a pseudokinase and endogenous Stk36 (the kinase partner of Ulk4) is limited.

      1. The CO-IP experiments to show GLI2 activation were performed in NIH3T3 cells, whereas HEK293 cells were used for the experiments shown in Figure 2. Is there a specific reason for switching between cell lines also for experiments shown in Figures 3 C- I? Did the authors repeat some of the key experiments in both cell lines?

      In mammalian cells, Shh-induced activation of GLI2 depends on primary cilia (Han et al., 2019). NIH3T3 cells form primary cilia but HEK293T cells do not. Therefore, we used NIH3T3 cells to examine processes that are regulated by Shh treatment (e.g., the Shh-induced phosphorylation of GLI2 and STK36). The HEK293 cells were used to map the binding domains between ULK4 and STK36/GLI2/SUFU because of their high transfection efficiency.

      1. In Figure 2 D-E the authors nicely showed that hUlk4N-HA interacted with CFP-Stk36 but not with Myc-Gli2/Fg-Sufu whereas hUlk4C-HA formed a complex with Myc-Gli2/Fg-Sufu but not with CFP-Stk36. In Figure 4E the authors showed in their Co-IP experiments that Fg-Stk36 and Myc-Gli2 form a complex independent of SHH treatment. Did the authors see some pull down of Stk36, still in complex with Gli2, using hUlk4C IP and pull down of Gli2, still in complex with Stk36, using hUlk4N IP?

      We did not test that. As we have shown in Figures 4A and 4E, knockdown of endogenous ULK4 nearly abolished the interaction between Myc-Gli2 and Fg-Stk36, suggesting that Ulk4 is the major scaffold that brings Stk36 and Gli2 together, and that there is little if any direct interaction between Gli2 and Stk36.

      1. Another method to verify hULK4-Stk36-Gli2 complex formation (Figure 4) would be helpful. For example, proximity ligation assays, tripartite split GFP assays, or colocalization based on expansion STED immunofluorescence microscopy could be performed to temporally and spatially resolve localization of Ulk4, Stk36 and Gli2 upon SHH stimulation in the primary cilium

      Thanks for the suggestions. We think that our current study, using biochemical and cell biology approaches, has provided sufficient evidence that Ulk4, Stk36 and Gli2 form complexes. We will keep those more sophisticated methods in mind for our future endeavors.

      1. Please provide more representative images of Ulk4, Stk36 and Gli2 localization in NIH3T3 cells or lower magnification overview images showing more than one cell (Figure 5).

      We have provided more representative images in Figure 5- figure supplement 1A-F of the revised manuscript.

      1. Confirmation of the results shown in Figure 5 in a second cell line would strengthen the data.

      We have confirmed the results in MEFs (see Figure 5- figure supplement 1G-J).

      1. Did the authors add immunofluorescence for tubulin as a ciliary base marker to ensure correct assignment of ciliary tip versus ciliary base localization for quantification experiments (Figures 5 - 8)?

      It has been well documented that Gli2 accumulates at the ciliary tip in response to Shh treatment; therefore, we used Gli2 as a marker for the ciliary tip, where both Ulk4 and Stk36 also accumulated. γ tubulin staining could be another marker to distinguish the ciliary tip from the base; however, the antibody combination we have did not allow us to simultaneously stain γ tubulin and acetylated tubulin (Ac-Tub).

      1. SMO localization as a further readout of SHH pathway activation might be considered to be added for some of the key results (e.g., Figure 6). Is SMO trafficking affected after depletion or overexpression of ULK4?

      Due to the lack of a workable antibody to detect endogenous Smo in our hands, we did not determine whether the trafficking of SMO is affected after depletion or overexpression of ULK4. However, we noticed that a recent study reported that SHH-induced ciliary SMO accumulation was impaired in Ulk4 siRNA-treated cells (Mecklenburg et al. 2021). We include this information and its implications in the discussion section.

      1. Do the authors see ULK4 only at the ciliary tip after SHH stimulation or is there also a dynamic time-dependent localization along the ciliary shaft? The image in Figure 6E (dKO + Stk36 WT) seems to show ULK4 also in the shaft.

      Unlike Smo, which is evenly distributed along the axoneme of primary cilia, ULK4 mainly accumulates at ciliary tips upon Shh stimulation. Ulk4 is also located at low levels outside the cilia and sometimes in the ciliary shaft during its transit to the ciliary tip (e.g., see Figure 5- figure supplement 1F1-2; J1-2).

      1. Is the immunofluorescence signal for Ulk4 significantly reduced after shRNA treatment to deplete Ulk4 (Figure 6A)?

      We constructed a cell line that stably expressed ULK4 shRNA. The knockdown efficiency was determined by measuring Ulk4 mRNA expression (Fig 1_figure supplement 1). Because we were unable to obtain a reliable ULK4 antibody for immunostaining, we did not examine whether the ULK4 signal was depleted by Ulk4 shRNA.

      1. The labelled ciliary tip resembles in some cases images seen for ciliary abscission. The authors could use membrane/ciliary membrane markers to ensure "intraciliary" localization of the investigated factors.

      Thanks for the suggestion. We will try that in our future experiments.

      1. How many replicates were used in the three independent quantitative RT-PCR experiments (Figure 1 A-D)?

      We used 3 replicates in each independent quantitative RT-PCR assay.

      1. Please provide p values or statement on no significance for the comparison between Ulk3 single and Ulk3/Ulk4 double knockdown (Figure 1C) and between Stk36 single and Stk36/Ulk4 double knockdown (Figure 1D; Fig1_Figure Supplement 2).

      Thanks for the suggestion, we have added the p value or “ns” as asked.

      1. Figure legends in general are a bit short could have some more detailed information.

      Thank you for the suggestion, we have revised the Figure legends as asked.

      1. What do the asterisks present in Figure 4 C-D?

      Thanks for the suggestion. The asterisks in Figure 4C-D indicate the full-length STK36 and the truncated STK36N and STK36C fragments. We have included this information in the figure legend.

      1. The authors state that a previous study described ULK4 as a genetic modifier for holoprosencephaly and that this raised the possibility that ULK4 may participate in HH signal transduction. Primary ciliary localization of ULK4 in mouse neuronal tissue and SHH pathway modulation by ULK4 in cell culture have been shown by Mecklenburg et al. 2021 before. Maybe the authors could rephrase their introduction and discussion accordingly.

      Thanks for the suggestion, we have changed the introduction and discussion accordingly.

      1. Overexpression studies in heterologous systems using tagged proteins can potentially have an influence on their subcellular localization and function. Please discuss this caveat.

      We have mentioned this caveat in the “discussion” section of the revised manuscript. However, we have tried to express the transgene at low levels using the lentiviral vector containing a weak promoter to ensure that the exogenously expressed proteins are still regulated by Hh signaling. We have also confirmed that the tagged Ulk4 and Stk36 can rescue the loss of endogenous genes.

      1. More details in the Methods section should be provided on the SHH induction in NIH3T3 cells, HEK293 cells and MEFs.

      We have revised the methods section on Shh induction.

      1. ULK4 is known to have at least three isoforms that exhibit varying abundance across developmental stages in mice and humans (Lang et al., 2014) (DOI:10.1242/jcs.137604). Can the authors speculate on potential common and distinct functions of the different ULK4 isoforms on SHH pathway modulation based on their present results?

      It is interesting that Ulk4 has multiple isoforms in both mouse and human. Several short isoforms in both mouse and human lack the pseudokinase domain while one short isoform in mouse lacks the C-terminal region essential for Ulk4 ciliary tip localization. We speculate that the C-terminally deleted isoform may not have a function in the Shh pathway based on our results shown in Fig. 7 and 8 but might still have functions in other cellular processes.

      Reviewer #2 (Recommendations For The Authors):

      The paper is well written, and clear throughout, with excellent (up-to-date) citations to the field.

      We thank reviewer #2 for the positive comments.

      Reviewer #3 (Recommendations For The Authors):

      My only quibble is that the immunofluorescence images are a little confusing, especially to people outside of the field. Please include an image of the whole field and improve the captions. Is that a single cell for each cilia? Why are there so few cilia? The DAPI makes it seem like What are we looking at? Are those multiple nuclei in Figure 6? They seem a little small if that's the 5 uM scale bar

      We provide uncropped images of Figure 5E to show the entire cells (below). We have added some context to improve the captions. Most mammalian cells, such as MEF and NIH3T3 cells, contain a single primary cilium; however, multiciliated cells do exist. The DAPI staining indicates the nuclei. The cells shown in Figure 6 each have a single nucleus (the scale bar should be 2 µm). Due to the unevenness of DAPI signals in the nuclei, only the strong signals (puncta) are shown for individual nuclei.

      Author response image 1.

      One small typo: GLL2 instead of GLI2 on line 363

      Thanks, we have corrected this spelling mistake.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely thank and express our appreciation to each of the reviewers for their thorough critique of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1. The analysis of whole study comes from only 4 cells from L2/3 of ferret visual cortex; however, it is well known that there is a high level of functional heterogeneity within the cortical neurons. Do those four neurons have similar or different response properties? If the four neurons are functionally different, the weak or no correlation may result from heterogenous distribution pattern of mitochondria associated with heterogenous functionality.

      This is an important consideration and often a limitation of CLEM studies. While cortical neurons do exhibit a high degree of functional heterogeneity (similar to spine activity), the 4 cells examined all had selective (OSI > 0.4) somatic responses to oriented gratings, although they differed in their exact orientation preference. Due to experimental limitations of recording from a large population of dendritic spines, we did not characterize other response properties for which their sensitivity might differ. We did not consider orientation preference a metric of study, but instead characterized the difference in preference from the somatic output, allowing comparisons across spines. In addition, our measurements were limited to proximal, basal dendrites rather than any location in the dendritic tree. Nonetheless, we attempted to address this concern by examining spines with functionally heterogenous visual responses within single cells, as reported in our manuscript: mitochondrial volume within a 1 µm radius was correlated with difference in orientation preference relative to the soma (across all 4 cells, mean r = 0.49 +/- 0.22 s.d.), suggesting that cell-to-cell variability has a minimal impact on our main conclusions.

      Even with our limited measurements, there is a large amount of functional heterogeneity in dendritic spine responses (Extended Data Figure 2, Scholl et al. 2021), far greater than the small differences in somatic responses of these 4 cells (Figure 3, Scholl et al. 2021). Moreover, the individual dendrites from these 4 cells exhibited substantial heterogeneity in the distribution of mitochondria. We cannot rule out whether heterogeneity at various scales may obscure certain relationships or result in the weak correlations we observed. We also note that future advancements in volume electron microscopy should allow for greater sample sizes that can better address the role of functional (and structural) heterogeneity. We have added text in the Discussion section about the potential structure-function relationships that might be obscured or revealed by neuron heterogeneity.

      1. The authors argued that "mitochondria are not necessarily positioned to support the energy needs of strong spines." However, the overall energy needs of a spine depend not only on the synaptic strength but also on the frequency of synaptic activity. Is there a correlation between the mitochondria volume around a spine and the overall activity of the spine? This data needs to be analyzed to confirm the distribution of mitochondria is independent of local energy needs.

      The reviewer is correct, but our experimental paradigm was not optimally designed to measure the ‘frequency’ of synaptic activity in vivo. This could have been accomplished with flashed gratings or, perhaps, presenting drifting gratings at different temporal frequencies. For spines tuned to higher temporal frequencies in V1, we may expect greater energy needs, although as the reviewer suggests, energy needs will depend on synapse (and bouton) size. Because we are not able to directly measure activity frequency as carefully or beautifully as can be done ex vivo or in nerve fibers, we do not feel confident in attempting such analysis in this study. Instead, based on previous studies, we assumed that larger synapses might be able to transmit higher frequencies, and thus have higher energy demands. We believe future in vivo studies will more directly measure synaptic frequency for comparison with mitochondria.

      We have added text in the Discussion about this caveat and potential future experiments.

      1. In the results section, the authors briefly mentioned that "We also considered other spine response properties related to tuning preference; specifically, orientation selectivity and response amplitude at the preferred orientation. For either metric, we observed no relationship to mitochondria within 1 μm radius (selectivity: 1 μm: r = -0.081, p = 0.269, n = 60; max response amplitude: 1 μm: r = -0.179, p = 0.078, n = 64) but did see a weak, significant relationship to both at a 5 μm radius (selectivity: r = 0.175, p = 0.027, n = 121; max response amplitude: r =-0.166, p = 0.030, n = 129)." Here only statistic results were given while the data were not presented in the figure illustration. Based on the methods and Figure 3B, it seems that the preferred orientations were calculated based on the vector summation. How did the authors calculate the "response amplitude at the preferred orientation"? This needs to be clarified. In addition, given the huge variety of orientation selectivity, using the response amplitude at the preferred orientation may not be the best parameter to correlate with the mitochondria volume which is indicative of energy needs. It might be necessary to include the baseline activity without visual stimulation and the average response for visual stimuli of different orientations in the analysis.

      We apologize for this oversight, as the details are present in our previous study (Scholl et al., 2021). Response amplitude and preferred orientation were calculated from a Gaussian curve fitting procedure with specific parameters describing those exact values (see Scholl et al. 2021 or Scholl et al. 2013). Only spines with selective responses (vector strength index > 0.1) and passing our SNR criterion were used for these analyses. We have now added this information to the Methods section and referred to it in the Results. With respect to the reviewer’s other concern, we also examined the average response amplitude (across all visual stimuli). There we found no relationship with the volume of mitochondria within 1 or 5 microns of a spine; however, because spines differ greatly in their selectivity (range = 0 – 0.8), the average response may not be an appropriate metric to compare across spines.

      1. A continuation from the former point, do the spines with similar preferred orientation to the somatic Ca signal have similar Ca signal strength, orientation selectivity index and other characteristics to the spines with different preferred orientation? As shown in the examples (Figure 3B), the spine on the right with different orientation preference compared with its soma has considerably larger response in non-preferred orientation compared to the spine on the left. Thus, the overall activity of the spine on the right may be higher than the spine which has similar preferred orientation to the soma. The authors showed that a positive correlation between difference in orientation preference and mitochondria volume (Figure 3C). Could this be simply due to higher spine activity for non-preferred orientation or spontaneous activity? Thus, the mitochondria might be positioned to support spines with higher overall activity rather than diverse response property.

      The reviewer brings up an interesting consideration. We examined the response properties of spines co-tuned (∆θpref < 22.5 deg) and differentially-tuned (∆θpref > 67.5 deg) to the soma. The orientation selectivity was not different between the two groups (p = 0.12, Wilcoxon ranksum test), although there was a small trend towards co-tuned inputs being more selective. We found that calcium response amplitudes for the preferred stimulus were also not different (p = 0.58, Wilcoxon ranksum test). These analyses are now included as a sentence in the Results.

      We agree with the reviewer that higher spontaneous activity in non-preferred spines may help explain the mitochondrial relationship we observe. However, our current dataset does not have sufficiently long recordings to measure spontaneous synaptic activity. Further, when considering a non-anesthetized preparation, spontaneous activity is highly dependent on brain state and an animal’s self-driven brain activity, which all must be experimentally controlled or measured to accurately address this.

      1. In addition, the information about the orientation selectivity of the soma is also missing. Do the four cells shown here all have similar level of orientation selectivity? Or some have relative weak orientation selectivity in the soma?

      Yes, all 4 cells have a similar OSI (range = 0.4 – 0.57, mean = 0.46 +/- 0.08 s.d.). This has been added to the Results section.

      1. This study focused on only a fraction of spines that are (1) responsive (2) osi > 0.1. However, in theory energy consumption is also related to non-responsive spines and spines with weak orientation tuning. What is the percentage of tuned and untuned spines? What's the correlation of mitochondria volume and spine activity level for untuned spines? I also recommend including the non-responsive spines into the analysis. For example, for each mitochondrion calculate the averaged overall activity of spines within certain distance from the mitochondrion, including the non-responsive spines. I would predict there may be more active spines and higher overall spine activity of dentritic segments near a mitochondrion than segments far from a mitochondrion.

      A majority of spines were tuned for orientation (~91%), although we specifically chose to only analyze data from spines with verifiable, independent calcium events. All analyses except those involving measurements of orientation preference use all dendritic spines (i.e. tuned and untuned). We have clarified this in the Results.

      These other ‘silent’ (i.e. without resolvable visual activity) spines may significantly contribute to energy demands of a dendrite too, as our methods (GCaMP6s expression) likely only capture synaptic events driving Ca2+ influx through NMDA receptors or VGCCs. We expect that glutamate imaging (e.g. iGluSnFR) may open the door to additional analyses to fully characterize the functional relationship between spines and mitochondria.

      1. The correlation coefficient for mitochondria volume and difference in orientation preference is relatively low (r=0.3150). With such weak correlation, the explanatory power of this data is limited.

      We agree that while the correlation is significant, it is not particularly strong. To better represent the noise surrounding this measurement, we performed a bootstrap correlation analysis, sampling with replacement (1 micron: mean r = 0.31 +/- 0.11 s.e., 5 micron: mean r = 0.02 +/- 0.10 s.e.). We now include this in the Results.
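
      The bootstrap procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the response does not say which correlation coefficient was resampled, so the sketch assumes Spearman's rank correlation (used elsewhere in this response), and the function names, the number of resamples, and the seed are all hypothetical. The spread of the bootstrap estimates is reported as the standard error.

```python
import random
import statistics


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def bootstrap_correlation(x, y, n_boot=2000, seed=0):
    """Resample (x, y) pairs with replacement; return mean r and its s.e."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = spearman([x[i] for i in idx], [y[i] for i in idx])
        if r == r:  # skip degenerate resamples that produced NaN
            estimates.append(r)
    return statistics.fmean(estimates), statistics.stdev(estimates)
```

      Here `x` would hold the mitochondrial volume near each spine and `y` the spine-soma difference in orientation preference; calling `bootstrap_correlation(x, y)` returns a (mean r, s.e.) pair in the same form as the values quoted above.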

      1. Why do the numbers of spines in different figures vary? For example, n=60 for 1micron in Figure 3, 54 in Figure 3c, 31 in Figure 4b, 51 in Figure 4e and so on.

      We apologize for the lack of clarity. Each analysis placed different requirements on the data. For example, orientation preference was computed only for selective (OSI > 0.1) spines (Fig. 3c), but this requirement did not apply to comparisons with selectivity or response amplitude (Fig. 3d). Similarly, as stated in the Results and Methods, measurements of local heterogeneity require a minimum number of neighboring spines (n > 2), limiting the number of usable spines for analysis (Fig. 4). We have clarified this in the text.

      1. In Figure 6a, the sample sizes of mito+ spines and mito- spines are extremely unbalanced, which affects the stat power of the analysis. I recommend performing a randomization test.

      We thank the reviewer for this suggestion. We ran permutation tests to compare the similarity in mean values between equally sampled values from each distribution. These tests supported our original analysis and conclusions. We have added these tests to the Results.
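
      For readers unfamiliar with randomization tests, the sketch below shows one standard variant: a label-shuffling permutation test on the difference in group means. It is an assumption-laden illustration rather than the procedure used in the manuscript, which is described above as comparing means between equally sampled values from each distribution. The function name, the number of permutations, and the seed are hypothetical.

```python
import random
import statistics


def permutation_test_means(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided p-value for a difference in means via label shuffling."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled values reassigns group labels at random
        # while preserving the (possibly unbalanced) group sizes.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The add-one correction avoids reporting an impossible p = 0.
    return (extreme + 1) / (n_perm + 1)
```

      Because the shuffle keeps the original group sizes, the test remains valid when one group (for example, spines containing mitochondria) is much smaller than the other.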

      1. Ca signals are approximations of electrical signals. How well are spinal calcium signals correlated to synaptic strength and local depolarization? This should be put into discussion.

      There is unlikely to be a simple, direct relationship between spine calcium signal and synaptic strength or membrane depolarization, and this has never been addressed in vivo. Koester and Johnston (2005) performed paired recordings in slice and showed that single presynaptic action potentials producing successful transmission generate widely different calcium amplitudes (Fig. 3). Another study from Sobczyk, Scheuss, and Svoboda (2005) used two-photon glutamate uncaging on single spines and showed that the evoked micro-EPSCs are uncorrelated with the spine calcium signal amplitude. We have added a note about this in the discussion.

      1. In Figure 4i, the negative correlation may depend on the 4 data points on the right side. How influential are those data points?

      Spearman’s correlation coefficient analysis is robust to outliers and it is highly unlikely these datapoints are critical with our sample size (n > 100 spines).

      1. Raw data of Ca responses were missing.

      Some data has been published with the parent publication (Scholl et al., 2021). As spine imaging data is difficult to obtain and highly unique, we prefer to provide raw data directly upon reasonable request of the corresponding author.

      1. What is the temporal frequency of the drifting grating? Was it fixed or the speed of the grating was fixed?

      This was fixed to 4 Hz and this is now included in the Methods.

      Reviewer #2 (Recommendations For The Authors):

      1. Most of the measurements were based on the distance from the base of the spine neck, and "only on spines with measurable mitochondrial volume at each radius" were analyzed. To better understand the causality, it may also be interesting to have an analysis based on the distance from mitochondria. Would the result be different if the measurements are not 1µm / 5µm from spine but 1µm / 5µm from mitochondria? (e.g. total spine volume in 1µm / 5µm from mitochondria).

      In fact, our first iteration of this study focused on exactly this metric: measuring the distance to the nearest mitochondrion. However, after lengthy discussions between the authors, we ultimately decided this metric was inferior to a volumetric one. Our decision was based on several factors: (1) distance to a mitochondrion is ill defined (e.g. distance to the mitochondrion's center or to its nearest membrane edge?), (2) the total amount of mitochondrial volume within a dendritic shaft should allow the greatest amount of energetic support (e.g. more cristae for ATP production, greater capacity for calcium buffering), and (3) we would not account for the geometry of individual mitochondria or their placement near a spine (e.g. when 2 different mitochondria are next to the same spine). We have added further clarification of our reasoning to the Results.

      Nonetheless, we present the reviewer some of our original analyses correlating distance to mitochondria (from the base of the spine and including the spine neck length):

      Author response image 1.

      Here, we examined the relationship to spine head volume, spine-soma orientation preference difference, and the local orientation preference heterogeneity. No relationship showed any significant correlation. Again, this may not be surprising given the drawbacks of measuring ‘distance to mitochondria’.

      1. Is there a selection criterion for the spine for the analysis? Are filopodia spines excluded in the analysis?

      Spines were analyzed regardless of structural classification; however, they were only analyzed if they had a synaptic density with synaptic vesicle accumulation. In our dataset (including those visualized in vivo and reconstructed from the EM volume) we observed no filopodia.

      1. The result states that "56.8% of spines had no mitochondria volume within 1 μm and 12.1% of spines had none within 5 μm.". In other words, around 43% of spines had mitochondria within 1 μm. It would be interesting to show whether there is a correlation between mitochondria size and spine density.

      We agree that this is an interesting measurement. It has been reported that mitochondrial unit length along the dendrite co-varies with linear synapse density in the neocortical distal dendrites of mice (Turner et al., 2022). This was specifically true in distal portions of dendrites more than 60 µm from the soma, because mitochondria volume increases as a function of distance roughly up to this point, then remains relatively constant beyond this distance.

      To investigate this possibility, we calculated the local spine density around an individual spine and compared it to the mitochondria volume within 1 or 5 µm. We found no evidence of a correlation between local spine density and the volume of mitochondria (1 µm: Spearman r = -0.07447, p = 0.2859; 5 µm: r = -0.04447, p = 0.3141). However, the majority of our measurements are more proximal than 60 µm (our median distance of all spines = 49.4 µm, max = 114 µm) and this may be one reason why we observe no correlation.

      1. In Figure 3B, the drifting grating directions are examined from 0 to 315 degrees in the experiment. However, in Figure 3C and 3D, the spine-soma difference of orientation preference was limited to 0 to 90 degree in the graph. Is the graph trimmed, or is there a cause that limits the spine-soma difference of orientation preference to 90?

      Ferret visual cortical neurons are highly sensitive to grating direction and the responses are fit by a double Gaussian curve which estimates the ‘orientation preference’ (0-180 deg). We then calculated the absolute difference in orientation preference and wrapped that value in circular space so the maximum difference possible is 90 deg (e.g. 135 deg -> 45 deg).
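
      The wrapping step can be written in a few lines. The sketch below is an illustration of the arithmetic only; the function name is hypothetical, and it assumes both preferences are expressed in degrees on the 0-180 orientation axis.

```python
def orientation_difference(pref_a_deg, pref_b_deg):
    """Absolute difference between two orientation preferences, in [0, 90] deg."""
    # Orientation is periodic over 180 deg, so reduce the raw difference first.
    diff = abs(pref_a_deg - pref_b_deg) % 180.0
    # Differences beyond 90 deg wrap around: 135 deg apart equals 45 deg apart.
    return min(diff, 180.0 - diff)
```

      For example, a spine preferring 135 deg and a soma preferring 0 deg differ by 45 deg under this definition, matching the example given above.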

      1. In Figure 4D-F, how is the temporal correlation of calcium activity determined? Is it based on stimulated activity or basal activity? A brief explanation may be helpful to the readers. Also, scale bars could be added to Fig 4D.

      Temporal correlation is computed as the signal correlation between 2 spines over the entire imaging session at that field of view. Specifically, we measured the Pearson correlation between each spine’s ∆F/F trace. To measure the local spatiotemporal correlation, we computed correlations between all neighboring spines within 5 microns and took the average of those values. We have clarified this in the Results section.
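
      A minimal sketch of this computation is given below, assuming each spine's ∆F/F trace is an equal-length list of samples and that spine location is summarized by a single coordinate along the dendrite. That one-dimensional distance is a simplifying assumption made here for illustration; the names `local_temporal_correlation`, `traces`, and `positions_um` are hypothetical and do not come from the manuscript.

```python
import statistics


def pearson(x, y):
    """Pearson correlation; returns NaN if either trace has zero variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def local_temporal_correlation(traces, positions_um, radius_um=5.0):
    """For each spine, average its Pearson r with all neighbors within radius_um.

    Returns None for spines that have no neighbor inside the radius.
    """
    result = []
    for i, (trace_i, pos_i) in enumerate(zip(traces, positions_um)):
        rs = [
            pearson(trace_i, trace_j)
            for j, (trace_j, pos_j) in enumerate(zip(traces, positions_um))
            if j != i and abs(pos_i - pos_j) <= radius_um
        ]
        result.append(statistics.fmean(rs) if rs else None)
    return result
```

      Spines without any neighbor inside the radius are returned as `None`, so they can be excluded from downstream comparisons rather than being counted as zero correlation.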

      1. Figure 3C and Figure 4D displayed a significant correlation in 1µm range and such correlation drastically diminished once the criterion changed to 5µm range. It would be interesting to include the criterion of intermediate ranges. It would be interesting to see if there is a trend or tendency or if there is a "cut-off" limit.

      We agree with the reviewer that the drastic change in the correlations between the 1 and 5 µm ranges was surprising to see. While these volumetric measurements are time consuming, we returned to our data and measured an intermediate point of 3 µm. Investigating relationships reported in our study, we found no significant trends for spine-soma similarity (Spearman’s r = -0.011, p = 0.54) or local heterogeneity (Spearman’s r = 0.11, p = 0.23). This suggests that a potential ‘critical distance’ might be less than 3 µm; however, many more measurements and analyses would be needed to attempt to identify exactly what this distance is.

      1. In Figure 5, it is shown that spines having mitochondrion in the head or neck are larger. However, only 10 spines are found with mitochondria inside. In the current dataset, are mitochondria abundantly found in large spines? Further analysis or justification would be informative to address this.

      In our dataset, mitochondria were found in ~5% of all spines. Spines with mitochondria have a median volume of approximately 0.6 µm3, roughly twice as large as those without mitochondria, as the reviewer suggests. In the entire population of spines without mitochondria, a volume of 0.6 µm3 represents roughly the 82nd percentile. In other words, of the total population of 157 spines without mitochondria, only 29 had equal or greater volume than the median spine with a mitochondrion. We believe this trend is clearly shown in Figure 5A and is supported by our analysis, including new permutation tests suggested by Reviewer 1.

      Reviewer #3 (Recommendations For The Authors):

      1. The authors state that their unsupervised method "quickly and accurately identified mitochondria," but the methods section only says that segmentations were proofread. Was every segmentation examined and judged to be accurate, or was only a subset of the 324 mitochondria checked?

      After deep learning-based extraction, each mitochondrion segmentation was manually proofread. For each dendrite segment, this was ~10-20 mitochondria, so it did not take long to manually inspect and edit each mitochondrion segmentation.

      1. The EM image of the mitochondrion in the spine head in Figure 2C is low resolution and does not apply to the bulk of the data. Images more representative of the analyzed data should be added to supplement the cartoons.

      Our primary rationale for including this specific image was to show that the mitochondria located within spines are small and round, and to include a view of the synapse as well as the mitochondrion. We now include enlarged and additional EM images in Figure 1C.

      1. The majority of spines did not have any mitochondria within a 1 micron radius and were excluded from the correlation analyses, so most of the conclusions are based on a minority of spines. It would be interesting to see comparisons between spines with and without nearby mitochondria. Correlations between the absolute distance to any mitochondrion, synapse size, and mismatch to soma orientation would be especially interesting.

      The reviewer brings up a good point. It is true that many spines were excluded from our analysis based on the fact that they did not have nearby mitochondria within 1 or 5 µm (56.8% of spines had no mitochondria volume within 1 μm and 12.1% of spines had none within 5 μm). We compared the distributions of synapse size, mismatch to soma, and orientation selectivity of two groups of spines – those with at least some mitochondria within 1 µm (n = 65) versus spines without any mitochondria within 5 µm (n = 19).

      We found no difference between the two groups in the distributions of spine volume (1 µm: median = 0.29 µm3, IQR = 0.41 µm3; no mitochondria within 5 µm: median = 0.40 µm3, IQR = 0.37 µm3; p = 0.67) or PSD area (1 µm: median = 0.26 µm2, IQR = 0.33 µm2; no mitochondria within 5 µm: median = 0.31 µm2, IQR = 0.36 µm2; p = 0.49). For functional measures, we also saw no difference in orientation selectivity (1 µm: median = 0.29, IQR = 0.28; no mitochondria within 5 µm: median = 0.28, IQR = 0.15; p = 0.74) or mismatch to soma orientation (1 µm: median = 0.54 deg, IQR = 0.86 deg; no mitochondria within 5 µm: median = 0.46 deg, IQR = 0.47 deg; p = 0.75). We now include these analyses in the Results.

      We also looked at the absolute distances to mitochondria and did not find any significant relationships to spine head volume, spine-soma orientation preference difference, or the local orientation preference heterogeneity (see our response to reviewer #2 for more information).

      1. In Figure 1A the mitochondria appear to be taking up a substantial fraction of the dendritic shaft diameter, even for distal dendrites. It would be useful to know the absolute diameter of the dendrites and mitochondria, given that this is not rodent data and there is no reference for either in the ferret.

      We agree with the reviewer’s point, although we would like to remind the reviewer that these are basal dendrites of layer 2/3 cells. Basal dendrites tend to be thinner than apical branches. Interestingly, in some cases, the dendrite even swells to accommodate a mitochondrion. We did not incorporate this measurement in our study because it is not trivial; dendrite diameter is variable and dendrites are not perfect cylinders. Although we did not make precise measurements across our dendrites, the diameter is comparable to what has been seen in mouse cortex (Turner et al., 2022), roughly 500-1000 nm, but as small as 100 nm at some pinch points. In terms of mitochondria, many were roughly spherical or oblong, therefore the maximum diameters we report are roughly similar to, if not a bit larger than, those of the cross-sectional diameter.

      1. As a rule, PSD area is correlated with spine volume, which makes the observation that spines with mitochondria have larger volume but not PSD area surprising. With n=10 it is difficult to draw conclusions, but it would be interesting to know the PSD area-to-volume ratio of other spines of the same volume and synapse size.

      We were also somewhat surprised to see this, but exactly as the reviewer mentioned, we believe it to be a limitation of the sample size. The difference in volume was large enough to be detected despite the small sample size. We saw a trend towards larger synapses in spines with mitochondria (the median was approximately 60% larger), and we would expect that, with a larger sample, PSD area would be significantly greater in spines with mitochondria.

      We calculated the PSD area-to-spine head volume ratio for spines with or without mitochondria. Spines with mitochondria had a significantly lower ratio compared to those without (Mann-Whitney test, p = 0.0056, mito - = 0.78, n = 10; mito + = 0.53, n = 157). As the reviewer mentions, it is somewhat difficult to draw a conclusion from this, but it appears that the PSD does not scale with the increased spine head size.

      Author response image 2.

      The only way to definitively address this is to increase the sample size, which is becoming easier to achieve with the progression of volume EM imaging and analysis techniques in recent times. We look forward to addressing this in the future.
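
      For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a two-sided Mann-Whitney U test applied to two groups of PSD area-to-volume ratios. It is an illustration only and not the analysis code used for this response: the function name `mann_whitney_u` and the example values are hypothetical, and the p-value relies on the normal approximation with a tie correction (no continuity correction), whereas statistical packages may use exact methods for small samples.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate two-sided p-value).
    Hypothetical helper for illustration; not the authors' analysis code.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # Pool the observations, remembering which group each one came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2

    # Assign midranks so that tied values share the same average rank.
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # U for the first sample is its rank sum minus the minimum possible rank sum.
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0

    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference

    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided


if __name__ == "__main__":
    # Toy ratios invented for this example; these are NOT measurements from the study.
    ratios_group_a = [0.41, 0.48, 0.52, 0.55, 0.60]
    ratios_group_b = [0.58, 0.66, 0.71, 0.79, 0.83, 0.90]
    print(mann_whitney_u(ratios_group_a, ratios_group_b))
```

      With one group as small as n = 10, the normal approximation is only a rough guide, which is one more reason why a larger sample would be needed for a firm conclusion.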

      1. Nothing is made of the significant fact that these data come from the visual system of a carnivore, not a mouse. Consideration of differences in visual physiology between rodents and carnivores would be worthwhile to put the function of these dendrites in context.

      We thank the reviewer for this consideration and have added text to the Discussion.

    1. Author Response

      Reviewer #2 (Public Review):

      Manassaro et al. present an extensive three-session study in which they aimed to change defensive responses (skin conductance; SCR) to an aversively conditioned stimulus by targeting medial prefrontal cortex (their words) using repetitive TMS prior to retrieval. They report that stimulating mPFC using TMS abolishes SCR responses to the conditioned stimulus, and that this effect is specific for the stimulated region and the specific CS-US association, given that SCR responses to a different modality US are not changed.

      I like how the authors have clearly attempted to control for several potential confounds by including multiple stimulation sites, measured SCR responses to several unconditioned stimuli, and applied the experiment in multiple contexts. However, several conceptual and practical issues remain that I think limit the value of potential conclusions drawn from this work.

      The first issue that I have with this study concerns the relationship between the TMS manipulation and the theoretical background the authors present in their rationale. In the introduction the authors sketch that what they call 'mPFC' is involved in regulation of threat responses. They make a convincing case; however, almost all of the evidence they present concerns the ventromedial part of the prefrontal cortex (refs 18-25). The authors then mention that no one has ever studied the effects of 'mPFC'-TMS on threat memories. That is not surprising given that stimulating vmPFC with TMS is very difficult, if not impossible. Simulation of the electrical field that develops as a consequence of the authors' manipulation (using the same TMS coil and positioning the authors use) shows that vmPFC (or mPFC for that matter) is not stimulated. The authors then continue in the methods section stating that the region they aimed for was BA10. This region they presumably do stimulate; however, that does not follow logically from their argument. BA10 is anatomically, cytoarchitectonically and functionally a wholly different area than vmPFC and I wonder if their rationale would hold given that they stimulate BA10.

      We would like to thank the Reviewer for highlighting this very important point. The Reviewer is right in stating that Brodmann area 10 (BA 10) is anatomically, cytoarchitectonically, and functionally distinct from the ventromedial PFC. As we reported in the Methods section, the coil placement over the frontopolar midline electrode (Fpz) according to the international 10‒20 EEG coordinate system directly focused the stimulation over the medial portion of the BA 10. In the literature, the aPFC is also known as the “frontopolar cortex” or the “rostral frontal cortex” and encompasses the most anterior portion of the prefrontal cortex, which corresponds to the BA 10. In line with this observation, we have replaced “medial prefrontal cortex” (mPFC) with “medial anterior prefrontal cortex” (aPFC) throughout the manuscript. We also have corrected the theoretical background and the rationale in the Introduction section by mentioning several studies that: i) Reported the involvement of the aPFC in emotional down-regulation (Volman et al., 2013; Koch et al., 2018; Bramson et al., 2020). ii) Traced anatomical connections between the medial/lateral aPFC and the amygdala (Peng et al., 2018; Folloni et al., 2019; Bramson et al., 2020). iii) Detected functional connections between the aPFC and the vmPFC during fear down-regulation (Klumpers et al., 2010). iv) Found hypoactivation, reduced connectivity, and altered thickness of aPFC in PTSD patients (Lanius et al., 2005; Morey et al., 2008; Sadeh et al., 2015; Sadeh et al., 2016). v) Revealed that strong activation of the aPFC may promote a higher resilience against PTSD onset (Kaldewaij et al., 2021) and that enhanced aPFC activity and potentiated aPFC-vmPFC connectivity is detectable after effective therapy in PTSD patients (Fonzo et al., 2017). Furthermore, we discussed our results in light of this evidence in the Discussion section. We sincerely thank the Reviewer for this key improvement to our study.

      The second concern I have is that although I think the authors should be praised for including both sham and active control regions, the controls might not be optimally chosen to control for the potential confounds of their condition of interest (mPFC-TMS). Namely, TMS on the forehead can be unpleasant, if not painful, whereas sham-TMS or TMS applied to the back of the head or even over dlPFC is not (or less so at the very least). Given that the SCR results after mPFC TMS show exactly the same temporal pattern as the sham-TMS but with a lower starting point, one could wonder whether a painful stimulation prior to the retrieval might have already caused habituation to painful stimulation observed in SCR in consequent CS presentations. A control region that would have been more obvious to take is the lateral part of BA10, by moving the TMS coil several centimeters to the left or right, circumventing all things potentially called medial but giving similar unpleasant sensations (pain etc).

      We would also like to thank the Reviewer for bringing to light this issue and allowing us to strengthen our results. The Reviewer is right in pointing out that rTMS application over the forehead can be subjectively perceived as unpleasant, relative to other head coordinates or sham stimulation. The question of whether an unpleasant stimulation prior to the retrieval might provoke habituation to discomfort sensations and lead to weaker SCRs in the subsequent CS presentations is valid and reasonable. We also thank the Reviewer for advising us to stimulate the lateral part of BA 10 as an active control site. However, given the potential involvement of the lateral BA 10 in the fear network (see previous point) and the potential risks due to the anatomical proximity of lateral BA 10 with the temporal lobe, we chose to adopt an alternative approach to investigate whether “a painful stimulation prior to the retrieval might have already caused habituation to painful stimulation observed in SCR in consequent CS presentations”. We repeated the entire experiment in one further group (ctrl discomfort, n = 10) by replacing the rTMS procedure with a 10-min discomfort-inducing procedure over the same site of the forehead (Fpz) to mimic the rTMS-evoked unpleasant sensations in the absence of neural stimulation effects (see the new version of the Methods section). The electrical stimulation intensity was individually calibrated through a staircase procedure (0 = no discomfort; 10 = high discomfort). The shock amplitude was set at the current level corresponding to the mean rating of ‘4’ on the subjective scale because, in the new experiments that we performed targeting the aPFC with rTMS (n = 9), we collected participants’ rTMS-induced discomfort ratings, obtaining a mean rating of 3.833 ± 0.589 SEM on the same scale.
      We found CS-evoked SCR levels not significantly different from those of the sham group during the test session as well as during the follow-up session, suggesting that the discomfort experienced during the rTMS procedure did not contribute to the reduction of electrodermal responses observed in the aPFC group. We reported the results of this experiment in the Results section and Figure 2-figure supplement 2.

      My final concern is that the main analyses are performed on single trials of SCR responses, which is a relatively noisy measure to use on single trials. This is also done in relatively small groups (n=21). I would have liked to see both the raw or at least averaged timeseries SCR data plotted, and a rationale explaining how the authors decided on the current sample sizes; if that was based on a power analysis, one must have expected quite strong effects.

      Following the Reviewer’s suggestion, we decided to remove the analysis on single trials, and we apologize for not including SCR timeseries. To quantify the size of the effect induced by the rTMS protocol, we have now added within-group comparisons (through 2 × 2 mixed ANOVAs) that show, for each group, the amount of change in CS-evoked SCRs from the conditioning phase to the test phase, as well as from the conditioning phase to the follow-up phase. Furthermore, to directly and simply depict these changes, in addition to dot plots, we have also represented them with line charts (Figs. 2C, 2H, 4C, 4H, 5C, 5H). To estimate the sample size, we had previously performed a power analysis through G*Power 3.1.9.2 and it had resulted in n = 21 per experimental group. However, by correcting data pre-processing procedures (in accordance with Reviewer 1), we obtained data that were not normally distributed. Thus, we decided to enlarge our sample size by re-performing a power analysis (with the new suggested statistical analyses) and then repeating the experiments. For the main statistics, i.e. mixed ANOVA (within-between interaction) with two groups and two measurements, with the following input parameters: α equal to 0.05, power (1-β) equal to 0.95, and a hypothesized effect size (f) equal to 0.25, the new estimated sample size resulted in n = 30 per experimental group.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the Authors implement a delayed feedback control method and use it for the first time in biological neuronal networks. They extend a well-established computational theory and expand it into the biological realm. With this, they obtain novel evidence, never considered before, that showcases the difference between simulated neuronal networks and biological ones. Furthermore, they optimize the DFC method to achieve optimal results in the control of cell excitability in the context of biological neuronal networks, taking advantage of a closed-loop stimulation setup that, by itself, is not trivial to build and operate and that will certainly have a positive impact on the fields of cellular and network electrophysiology.

      Regarding the results, it would be very constructive if the Authors could share the code for the quasi-real-time interface with the Multichannel Systems software (current and older hardware versions), as this likely represents a bottleneck preventing more researchers from implementing such an experimental paradigm.

      On the data focusing on the effects of the DFC algorithms on neuronal behavior, the evidence is very compelling, although more care should be devoted to the statistical analyses, since some of the applied statistical tests are not appropriate. In a more biological sense, further discussion and clarification of the experimental details would improve this manuscript, making it more accessible and clearer for researchers across disciplines (i.e., ranging from computational to experimental Neuroscience) and increasing the impact of this research.

      In summary, this work represents a necessary bridge between recent advances in computational neuroscience and the biological implementation of neuronal control mechanisms.

      Regarding sharing the control code, our application for closed-loop stimulation using aDFC, DFC and Poisson is now available on GitHub (https://github.com/NCN-Lab/aDFC). This was, in fact, our initial intention following the reviewing process. With this application, the user can run the developed algorithms with the MEA2100-256 System from Multi Channel Systems MCS GmbH.

      The same applies to the data: the dataset with the spike data from all experiments is now publicly available on Zenodo at https://doi.org/10.5281/zenodo.10138446.

      Regarding the improvements in the statistical analysis, the tests are now performed following Reviewer #1's suggestions. It is important to emphasize that this did not change the results or conclusions of the work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We want to thank the Editor and Reviewers for their thorough assessment of the manuscript as well as their constructive critiques. We have collated below the public review and recommendations from each Reviewer as well as our responses to them.

      eLife assessment

      This study by Verdikt et al. provided solid evidence demonstrating the potential impacts of Δ9-tetrahydrocannabinol (Δ9-THC) on early embryonic development using mouse embryonic stem cells (mESCs) and in vitro differentiation. Their results revealed that Δ9-THC enhanced mESCs proliferation and metabolic adaptation, possibly persisting through differentiation to Primordial Germ Cell-Like Cells (PGCLCs), though the evidence supporting this persistence was incomplete. Although the study is important, it was limited by being conducted solely in vitro and lacking parallel human model experiments.

      Reviewer #1 (Public Review):

      The authors investigated the metabolic effects of ∆9-THC, the main psychoactive component of cannabis, on early mouse embryonic cell types. They found that ∆9-THC increases proliferation in female mouse embryonic stem cells (mESCs) and upregulates glycolysis. Additionally, primordial germ cell-like cells (PGCLCs) differentiated from ∆9-THC-exposed cells also show alterations to their metabolism. The study is valuable because it shows that physiologically relevant ∆9-THC concentrations have metabolic effects on cell types from the early embryo, which may cause developmental effects. However, the claim of "metabolic memory" is not justified by the current data, since the effects on PGCLCs could potentially be due to ∆9-THC persisting in the cultured cells over the course of the experiment, even after the growth medium without ∆9-THC was added.

      The study shows that ∆9-THC increases the proliferation rate of mESCs but not mEpiLCs, without substantially affecting cell viability, except at the highest dose of 100 µM which shows toxicity (Figure 1). Treatment of mESCs with rimonabant (a CB1 receptor antagonist) blocks the effect of 100 nM ∆9-THC on cell proliferation, showing that the proliferative effect is mediated by CB1 receptor signaling. Similarly, treatment with 2-deoxyglucose, a glycolysis inhibitor, also blocks this proliferative effect (Figure 4G-H). Therefore, the effect of ∆9-THC depends on both CB1 signaling and glycolysis. This set of experiments strengthens the conclusions of the study by helping to elucidate the mechanism of the effects of ∆9-THC.

      Although several experiments independently showed a metabolic effect of ∆9-THC treatment, this effect was not dose-dependent over the range of concentrations tested (10 nM and above). Given that metabolic effects were observed even at 10 nM ∆9-THC (see for example Figure 1C and 3B), the authors should test lower concentrations to determine the dose-dependence and EC50 of this effect. The authors should also compare their observed EC50 with the binding affinity of ∆9-THC to cellular receptors such as CB1, CB2, and GPR55 (reported by other studies).

      The study also profiles the transcriptome and metabolome of cells exposed to 100 nM ∆9-THC. Although the transcriptomic changes are modest overall, there is upregulation of anabolic genes, consistent with the increased proliferation rate in mESCs. Metabolomic profiling revealed a broad upregulation of metabolites in mESCs treated with 100 nM ∆9-THC.

      Additionally, the study shows that ∆9-THC can influence germ cell specification. mESCs were differentiated to mEpiLCs in the presence or absence of ∆9-THC, and the mEpiLCs were subsequently differentiated to mPGCLCs. mPGCLC induction efficiency was tracked using a BV:SC dual fluorescent reporter. ∆9-THC treated cells had a moderate increase in the double positive mPGCLC population and a decrease in the double negative population. A cell tracking dye showed that mPGCLCs differentiated from ∆9-THC treated cells had undergone more divisions on average. As with the mESCs, these mPGCLCs also had altered gene expression and metabolism, consistent with an increased proliferation rate.

      My main criticism is that the current experimental setup does not distinguish between "metabolic memory" vs. carryover of THC (or its metabolites) causing metabolic effects. The authors assume that their PGCLC induction was performed "in the absence of continuous exposure" but this assumption may not be justified. ∆9-THC might persist in the cells since it is highly hydrophobic. In order to rule out the persistence of ∆9-THC as an explanation of the effects seen in PGCLCs, the authors should measure concentrations of ∆9-THC and THC metabolites over time during the course of their PGCLC induction experiment. This could be done by mass spectrometry. This is particularly important because 10 nM of ∆9-THC was shown to have metabolic effects (Figure 1C, 3B, etc.). Since the EpiLCs were treated with 100 nM, if even 10% of the ∆9-THC remained, this could account for the metabolic effects. If the authors want to prove "metabolic memory", they need to show that the concentration of ∆9-THC is below the minimum dose required for metabolic effects.

      Overall, this study is promising but needs some additional work in order to justify its conclusions. The developmental effects of ∆9-THC exposure are important for society to understand, and the results of this study are significant for public health.

      Reviewer #1 (Recommendations For The Authors):

      This has the potential to be a good study, but it's currently missing two key experiments:

      What is the minimum dose of ∆9-THC required to see metabolic effects?

      We would like to thank Reviewer 1 for their insightful comments. We have included exposures to lower doses of ∆9-THC in Supplementary Figure 1. Our data show that ∆9-THC induces mESC proliferation from 1nM onwards. However, when ESCs and EpiLCs were exposed to 1nM of ∆9-THC, no significant change in mPGCLC induction was observed (updated Figure 6B). Of note, in their public review, Reviewer 1 mentioned that “The authors should also compare their observed EC50 with the binding affinity of ∆9-THC to cellular receptors such as CB1, CB2, and GPR55 (reported by other studies).” According to the literature, stimulation of non-cannabinoid receptors and ion channels (including GPR18, GPR55, TRPVs, etc.) occurs at 40nM-10µM of ∆9-THC (Banister et al., 2019). We therefore expect that at the lower nanomolar range tested, CB1 is the main receptor stimulated by ∆9-THC, as we showed for the 100nM dose in our rimonabant experiments (Fig. 2).

      Is the residual THC concentration during the PGCLC induction below this minimum dose? Even if the effects are due to residual ∆9-THC, this would not undermine the overall study. There would simply be a different interpretation of the results.

      This experiment was particularly important to distinguish between a “true” ∆9-THC metabolic memory or residual ∆9-THC leftover during PGCLCs differentiation. Our mass spectrometry quantification revealed that no significant ∆9-THC could be detected in day 5 embryoid bodies compared to treated EpiLCs prior to differentiation (Supplementary Figure 13). These results support the existence of ∆9-THC metabolic memory across differentiation.

      You also do not mention whether you tested your cells for mycoplasma. This is important since mycoplasma contamination is a common problem that can cause artifactual results. Please test your cells and report the results.

      All cells tested negative for mycoplasma by a PCR test (ATCC® ISO 9001:2008 and ISO/IEC 17025:2005 quality standards). This information has been added in the Material and Methods section.

      Minor points:

      1. I don't think it's correct to say that cannabis is the most commonly used psychoactive drug. Alcohol and nicotine are more commonly used. See: https://nida.nih.gov/research-topics/alcohol and https://www.cancer.gov/publications/dictionaries/cancer-terms/def/psychoactive-substance I looked at the UN drugs report [ref 1] and alcohol or nicotine were not included on that list of drugs, so the UN may use a different definition. This doesn't affect the importance or conclusions of this study, but the wording should be changed.

      We agree and are now following the WHO description of cannabis (https://www.who.int/teams/mental-health-and-substance-use/alcohol-drugs-and-addictive-behaviours/drugs-psychoactive/cannabis) by referring to it as the “most widely used illicit drug in the world” (line 44).

      1. It would be informative to use your RNA-seq data to examine the expression of receptors for ∆9-THC such as CB1, CB2, and GPR55. CB1 might be the main one, but I am curious to see if others are present.

      We have explored the protein expression of several cannabinoid receptors, including CB2, GPR18, GPR55 and TRPV1 (Banister et al., 2019). These proteins, except TRPV1, were expressed at low levels in mouse embryonic stem cells compared to the positive control (mouse brain extract, see Author response image 1). Furthermore, our experiment with Rimonabant showed that the proliferative effects of ∆9-THC are mediated through CB1.

      Author response image 1.

      Cannabinoid receptors and non-cannabinoid receptors protein expression in mouse embryonic stem cells.

      1. Make sure to report exact p-values. You usually do this, but there are a few places where it says p<0.0001. Also, report whether T-tests assumed equal variance (Student's) or unequal variance (Welch's). [In general, it's better to use unequal variance, unless there is good reason to assume equal variance.]

      Prism, which was used for statistical analyses, only reports p-values to four decimal places. For all p-values that were p<0.0001, the exact decimals were calculated in Excel using the “=T.DIST.2T(t, df)” function, where the Student’s distribution and the number of degrees of freedom computed by Prism were inputted. Homoscedasticity was confirmed for all statistical analyses in Prism.
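
      The same two-tailed p-value can be computed outside a spreadsheet. The sketch below is an illustration under stated assumptions rather than the procedure actually used: it evaluates the two-tailed Student's t probability from a t statistic and its degrees of freedom through the regularized incomplete beta function, using only the Python standard library. The function names (`two_tailed_t_pvalue`, `_reg_inc_beta`, `_betacf`) are hypothetical, and the t statistic and degrees of freedom are assumed to be taken from the statistics software output.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=1e-15):
    """Continued-fraction part of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_tailed_t_pvalue(t, df):
    """Two-tailed p-value P(|T| > |t|) for Student's t with df degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    x = df / (df + t * t)
    return _reg_inc_beta(df / 2.0, 0.5, x)


if __name__ == "__main__":
    # Illustrative inputs only; not values from the manuscript.
    print(two_tailed_t_pvalue(2.0, 2))
```

      The sign of t is ignored here because the probability is two-tailed. Very small p-values computed this way remain subject to floating-point limits, so extreme values are best reported with an appropriate number of significant figures.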

      1. Figure 2A: An uncropped gel image should be provided as supplementary data. Additionally, show positive and negative controls (from cells known to either express CB1 or not express CB1)

      The uncropped gel image is presented in Author response image 2. The antibody was validated on mouse brain extracts as a positive control as shown in Figure 1.

      Author response image 2.

      Uncropped gel corresponding to Fig. 2A where an anti-CB1 antibody was used.

      1. Figure 6B: Please show a representative gating scheme for flow cytometry (including controls) as supplementary data. Also, was a live/dead stain used? What controls were used for compensation? These details should be reported.

      The gating strategy is presented in Supplementary Figure 11. The Material and Methods section has also been expanded.

      1. As far as I can tell, you only used female mESCs. It would be good to test the effects on male mESCs as well since these have some differences due to differences in X-linked gene expression (female mESCs have two active X chromosomes). I understand that you might not have a male BV:SC reporter line, so it would be acceptable to omit the mPGCLC experiments on male cells.

      We have tested the 10nM-100µM dose range in the male R8 mESCs (Supplementary Figure 3). Results similar to those with the female H18 cells were observed. Accordingly, PGCLC induction was increased when R8 ESCs + EpiLCs were exposed to 100nM of ∆9-THC (Supplementary Figure 12). This is in line with the impact of ∆9-THC on fundamentally conserved metabolic pathways across species and sex, although it should be noted that one representative model of each sex is not sufficient to exclude sex-specific effects.

      Reviewer #2 (Public Review):

      In the study conducted by Verdikt et al, the authors employed mouse Embryonic Stem Cells (ESCs) and in vitro differentiation techniques to demonstrate that exposure to cannabis, specifically Δ9-tetrahydrocannabinol (Δ9-THC), could potentially influence early embryonic development. Δ9-THC was found to augment the proliferation of naïve mouse ESCs, but not formative Epiblast-like Cells (EpiLCs). This enhanced proliferation relies on binding to the CB1 receptor. Moreover, Δ9-THC exposure was noted to boost glycolytic rates and anabolic capabilities in mESCs. The metabolic adaptations brought on by Δ9-THC exposure persisted during differentiation into Primordial Germ Cell-Like Cells (PGCLCs), even when direct exposure ceased, and correlated with a shift in their transcriptional profile. This study provides the first comprehensive molecular assessment of the effects of Δ9-THC exposure on mouse ESCs and their early derivatives. The manuscript underscores the potential ramifications of cannabis exposure on early embryonic development and pluripotent stem cells. However, it is important to note the limitations of this study: firstly, all experiments were conducted in vitro, and secondly, the study lacks analogous experiments in human models.

      Reviewer #2 (Recommendations For The Authors):

      1. EpiLCs, characterized as formative pluripotent stem cells rather than primed ones, are a transient population during ESC differentiation. The authors should consider using EpiSCs and/or formative-like PSCs (Yu et al., Cell Stem Cell, 2021; Kinoshita et al., Cell Stem Cell, 2021), and amend their references to EpiLCs as "formative".

      Indeed, EpiLCs are a transient pluripotent stem cell population that is “functionally distinct from both naïve ESCs and EpiSCs” and “enriched in formative phase cells related to pre-streak epiblast” (Kinoshita et al., Cell Stem Cell, 2021). Here, we used the differentiation system developed by M. Saitou and colleagues to derive PGCLCs (Hayashi et al, 2011). Since EpiSCs are refractory to PGCLCs induction (Hayashi et al, 2011), we used the germline-competent EpiLCs and took advantage of a well-established differentiation system to derive mouse PGCLCs. Most authors, however, agree that in terms of epigenetic and metabolic profiles, mouse EpiLCs represent a primed pluripotent state. We have added that PGCs arise in vivo “from formative pluripotent cells in the epiblast” on lines 85-86.

      1. Does the administration of Δ9-THC, at concentrations from 10nM to 1uM, alter the cell cycle profiles of ESCs?

      The proliferation of ESCs was associated with changes in the cell cycle, as presented in the new Supplementary Figure 2, which we discuss in lines 118-123.

      1. Could Δ9-THC treatment influence the differentiation dynamics from ESCs to EpiLCs?

      No significant changes were observed in the pluripotency markers associated with ESCs and EpiLCs (Supplementary Figure 9). We have added this information in lines 277-279.

      1. The authors should consider developing knockout models of cannabinoid receptors in ESCs and EpiLCs (or EpiSCs and formative-like PSCs) for control purposes.

      This is an excellent suggestion. Due to time and resource constraints, however, we focused our mechanistic investigation of the role of CB1 on the use of rimonabant which revealed a reversal of Δ9-THC-induced proliferation at 100nM.

      1. Lines 134-136: "Importantly, SR141716 pre-treatment, while not affecting cell viability, led to a reduced cell count compared to the control, indicating a fundamental role for CB1 in promoting proliferation." Regarding Figure 2D, does the Rimonabant "+" in the "mock" group represent treatment with Rimonabant only? If that's the case, there appears to be no difference from the Rimonabant "-" mock. The authors should present results for Rimonabant-only treatment.

      To be able to compare the effects +/- Rimonabant and as stated in the figure legend, each condition was normalized to its own control (mock with, or without Rimonabant). Author response image 3 is the unnormalized data showing the same effects of Δ9-THC and Rimonabant on cell number.

      Author response image 3.

      Unnormalized data corresponding to Figure 2D.
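
      The difference between the normalized and unnormalized views can be illustrated with a short sketch. It assumes cell counts stored per condition and uses invented toy numbers, not data from the study; the function name `normalize_to_own_control` and the dictionary keys are hypothetical. Dividing each series by its own mock sets both mock values to 1, so any difference between the two mocks is visible only in the unnormalized counts.

```python
def normalize_to_own_control(counts, control_key="mock"):
    """Divide every condition in one series by that series' own control value."""
    control = counts[control_key]
    if control == 0:
        raise ValueError("control value must be non-zero")
    return {condition: value / control for condition, value in counts.items()}


# Toy cell counts invented for this example (arbitrary units), one series per
# Rimonabant status. They are NOT measurements from the manuscript.
without_rimonabant = {"mock": 100.0, "thc_100nM": 150.0}
with_rimonabant = {"mock": 80.0, "thc_100nM": 80.0}

normalized_without = normalize_to_own_control(without_rimonabant)
normalized_with = normalize_to_own_control(with_rimonabant)

if __name__ == "__main__":
    print(normalized_without)
    print(normalized_with)
```

      In this toy case both normalized mocks equal 1 even though the raw mock counts differ, which is why showing the unnormalized data alongside the normalized figure answers the question about the Rimonabant-only condition.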

      1. In Figure 3, both ESCs and EpiLCs show a significant decrease in oxygen consumption and glycolysis at a 10uM concentration. Do these conditions slow cell growth? BrdU incorporation experiments (Figure 1) seem to contradict this. With compromised bioenergetics at this concentration, the authors should discuss why cell growth appears unaffected.

      Indeed, we believe that cell growth is progressively restricted with increasing doses of ∆9-THC (see Supplementary Figure 2). In addition, oxygen consumption and glycolysis can be decoupled from cellular proliferation, especially considering the short time frame we are working with (44-48h).

      1. Beyond Δ9-THC exposure prior to PGCLCs induction, it would be also interesting to explore the effects of Δ9-THC on PGCLCs during their differentiation.

      We agree with the Reviewer. Our aim was to study whether exposure prior to differentiation could have an impact, and if so, what are the mediators of this impact. Full exposure during differentiation is another exposure paradigm that is relevant but would not have allowed us to show the metabolic memory of ∆9-THC exposure. Future work, however, will be dedicated to analyzing the effect of continuous exposure through differentiation.

      1. As PGC differentiation involves global epigenetic changes, it would be interesting to investigate how Δ9-THC treatment at the ESCs/EpiLCs stage may influence PGCLCs' transcriptomes.

      We also agree with the Reviewer. While this paper was not primarily focused on Δ9-THC’s epigenetic effects, we have explored the impact of Δ9-THC on more than 100 epigenetic modifiers in our RNA-seq datasets. These results are shown in Supplementary Table 1 and Supplementary Figure 10 and discussed in lines 301-316.

      1. Lines 407-408: The authors should exercise caution when suggesting "potentially adverse consequences" based solely on moderate changes in PGCLCs transcriptomes.

      We agree and have modified the sentence as follows: “Our results thus show that exposure to Δ9-THC prior to specification affects embryonic germ cells’ transcriptome and metabolome. This in turn could have adverse consequences on cell-cell adhesion with an impact on PGC normal development in vivo.“

      1. Investigating the possible impacts of Δ9-THC exposure on cultured mouse blastocysts, implantation, post-implantation development, and fertility could yield intriguing findings.

      We thank the Reviewer for this comment. We have amended our discussion to include these points in the last paragraph.

      1. Given that naïve human PSCs and human PGCLCs differentiation protocols have been established, the authors should consider carrying out parallel experiments in human models.

      We have performed Δ9-THC exposures in hESCs (Supplementary Figure 4 and Supplementary Figure 5), showing that Δ9-THC alters the cell number and general metabolism of these cells. We present these results in light of the differences in metabolism between mouse and human embryonic stem cells on lines 135-141 and 185-188. Implications of these results are discussed in lines 474-486.

      Reviewer #3 (Public Review):

      Verdikt et al. focused on the influence of Δ9-THC, the most abundant phytocannabinoid, on early embryonic processes. The authors chose an in vitro differentiation system as a model and compared the proliferation rate, metabolic status, and transcriptional level in ESCs, exposure to Δ9-THC. They also evaluated the change of metabolism and transcriptome in PGCLCs derived from Δ9-THC-exposed cells. All the methods in this paper do not involve the differentiation of ESCs to lineage-specific cells. So the results cannot demonstrate the impact of Δ9-THC on preimplantation developmental stages. In brief, the authors want to explore the impact of Δ9-THC on preimplantation developmental stages, but they only detected the change in ESCs and PGCLCs derived from ESCs, exposure to Δ9-THC, which showed the molecular characterization of the impact of Δ9-THC exposure on ESCs and PGCLCs.

      Reviewer #3 (Recommendations For The Authors):

      1. To demonstrate the impact of Δ9-THC on preimplantation developmental stages, ESCs are an appropriate system. They have the ability to differentiate three lineage-specific cells. The authors should perform differentiation experiments under Δ9-THC-exposure, and detect the influence of Δ9-THC on the differentiation capacity of ESCs, more than just differentiate to PGCLCs.

      We apologize for the lack of clarity in our introduction. We specifically looked at the developmental trajectory of PGCs because of the sensitivity of these cells to environmental insults and their potential contribution to transgenerational inheritance. We have expanded on these points in our introduction and discussion sections (lines 89-91 and 474-486). Because our data shows the relevance of Δ9-THC-mediated metabolic rewiring in ESCs subsisting across differentiation, we agree that differentiation towards other systems (neuroprogenitors, for instance) would yield interesting data, albeit beyond the scope of the present study.

      1. Epigenetics are important to mammalian development. The authors only detect the change after Δ9-THC-exposure on the transcriptome level. How about methylation landscape changes in the Δ9-THC-exposure ESCs?

      We have explored the impact of Δ9-THC on more than 100 epigenetic modifiers in our RNA-seq datasets. These results are shown in Supplementary Table 1 and Supplementary Figure 10, discussed in lines 301-316. While indeed the changes in DNA methylation profiles appear relevant in the context of Δ9-THC exposure (because of Tet2 increased expression in EpiLCs), we highlight that other epigenetic marks (histone acetylation, methylation or ubiquitination) might be relevant for future studies.

      1. In the abstract, the authors claimed that "the results represent the first in-depth molecular characterization of the impact of Δ9-THC exposure on preimplantation developmental stages." But they do not show whether the Δ9-THC affects the fetus through the maternal-fetal interface.

      We have addressed the need for increased clarity and have modified the sentence as follows: “These results represent the first in-depth molecular characterization of the impact of Δ9-THC exposure on early stages of the germline development.”

      1. To explore the impact of cannabis on pregnant women, the human ESCs may be a more proper system, due to the different pluripotency between human ESCs and mouse ESCs.

      We have performed Δ9-THC exposures in hESCs (Supplementary Figure 4 and Supplementary Figure 5). These preliminary results show that Δ9-THC exposure negatively impacts the cell number and general metabolism of hESCs. With the existence of differentiation systems for hPGCLCs, future studies will need to assess whether Δ9-THC-mediated metabolic remodelling is also carried through differentiation in human systems. We discuss these points in the last paragraph of our discussion section.

      1. All the experiments are performed in vitro, and the authors should validate their results in vivo, at least a Δ9-THC-exposure pregnant mouse model.

      Our work is the first of its kind to show that exposure to a drug of abuse can alter the normal development of the embryonic germline. We agree with the Reviewer that to demonstrate transgenerational inheritance of the effects reported here, future experiments in an in vivo mouse model should be conducted. The metabolic remodeling observed upon cannabis exposure could also be directly studied in a human context, although these experiments would be beyond the scope of the present study. For instance, changes in glycolysis may be detected in pregnant women using cannabis, or directly measured in follicular fluid in a similar manner as done by Fuchs-Weizman and colleagues (Fuchs-Weizman et al., 2021). We hope that our work can provide the foundation to inform such in vivo studies.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Grove and colleagues analyzes the role of TEAD1 transcription factors in all events regulating PNS myelin formation and maintenance and regeneration. Throughout the manuscript, the authors compare the results obtained to those they previously described in YAP/TAZ double knockout mice. Strengths of the manuscript are combined in vivo analyses by generating mutants constitutively lacking TEAD1 expression in myelinating Schwann cells (P0Cre//TEAD1f/f mice: cKO) and mutants in which TEAD1 expression can be ablated after tamoxifen-mediated recombination in myelinating Schwann cells (PlpCreER//TEAD1f/f mice: iKO). Using this approach the authors were able to assess the role of TEAD1 in all aspects related to PNS myelin: formation as well as maintenance and remyelination after injury. By exploiting these models, they were able to define the role of TEAD1 in regulating Schwann cell proliferation as well as in the cholesterol biosynthetic pathway. Collectively, their data indicate that TEAD1 has a composite role in PNS myelination being required for developmental myelination, but dispensable for myelin maintenance. Further, they also describe a role for TEAD1 in promoting PNS remyelination after an injury event.

      Despite these strengths, there are some weaknesses that should be addressed by the authors:

      1) The manuscript would benefit from better and more detailed analysis of the role of the other TEAD transcription factors, as they are likely redundant in function to TEAD1. For example, since in cKO mice some fibers can escape the sorting defect and eventually myelinate, albeit at a lower level, could they determine whether TEAD2-4 transcription factors might compensate for TEAD1 absence in this setting?

      We speculate that other TEADs, most likely both TEAD2 and TEAD3, compensate for TEAD1 in myelinating some developing axons. We also speculate that TEAD4 counteracts TEAD1, resulting in excessive proliferation of Schwann cells in Tead1 cKO. Unfortunately, because, unlike TEAD1, floxed/congenic alleles and IHC-compatible antibodies are not yet available for TEAD2-4, it is difficult to determine their roles. We attempted to knock down TEAD2-4 by injecting AAV-shRNAs into the sciatic nerves of WT and Tead1 iKO, but this intervention was not successful. Our future studies will determine compensatory and/or opposing roles of other TEADs during development and homeostasis and after nerve injury.

      2) A striking result of the study is the morphological defects observed in the process of axonal sorting and in the Remak fibers formation of TEAD1 cKO mice. To explain the sorting defect, the authors correctly analyze Schwann cell proliferation. However, since axonal sorting is mediated by the interaction between the extracellular matrix and intracellular cytoskeleton rearrangement, they should address also these two aspects. As per the Remak bundles and the poly-axonal myelination they observe, it is difficult to reconcile this "abnormal" myelination with the fact that TEAD1 cKO mice have a very severe myelinating phenotype, which is persistent in adulthood.

      It is noteworthy that we found radial sorting to be delayed, but not blocked, in Tead1 cKO, as we had previously reported for Yap/Taz cDKO mice in our earlier publication (Grove et al., eLife 2017). The primary reason that myelin development fails in Schwann cells lacking YAP/TAZ (or TEAD1 in the present report) is because they do not initiate myelination of sorted axons, not because of defective radial sorting. We showed that radial sorting was delayed in Schwann cells lacking YAP/TAZ because of their late S phase entry (Figure 4 in Grove et al., eLife 2017). In addition, our earlier report demonstrated that the key laminin receptor, integrin α6, is strongly downregulated but axons are nevertheless sorted out by Schwann cells in Yap/Taz cDKO (Figure 4-figure supplement 2 in Grove et al., eLife 2017). Our current view, therefore, is that extracellular matrix may contribute to reducing Schwann cell proliferation (Berti et al., 2011; Pellegatta et al., 2013; Yu, Feltri, Wrabetz, Strickland, & Chen, 2005), which helps to delay radial sorting, but that it is not required for Schwann cells lacking YAP/TAZ (or TEAD1) to sort axons (see the author response #2 in Grove et al., eLife 2017). Based on this information, we disagree with the reviewer that it is essential for us to address the role of extracellular matrix in delaying radial sorting in Tead1 cKO.

      Regarding Remak bundles, ‘thinly’ myelinated Remak bundles are only ‘occasionally’ observed in Tead1 cKO mice. Given that some large axons are still myelinated in Tead1 cKO mice, likely due to compensation by other TEADs, we speculate that Remak bundles are occasionally myelinated by other TEADs in Tead1 cKO. We have clarified our description and expanded our discussion of TEAD1 regulation of Remak bundles, including abnormal polyaxonal myelination.

      3) In the analyses of the cholesterol biosynthetic pathway, TEAD1 seems to be only partly involved. Again, which is the role of any of the other TEADs?

      Examining cholesterol biosynthesis pathways (SREBP1 and 2) and their target enzymes (SCD1, HMGCR, FDPS, IDI1) in Tead1 cKO and Yap/Taz cDKO, we showed that TEAD1 is required for upregulating FDPS and IDI1. These data suggest that TEAD1 plays a major role in mediating YAP/TAZ-driven cholesterol synthesis by upregulating FDPS and IDI1. It is also important to note that FDPS and IDI1 levels are reduced in Tead1 cKO as ‘greatly’ as those in Yap/Taz cDKO (Figure 5). We therefore speculate that other TEADs compensate for TEAD1 modestly, if at all, in upregulating FDPS and IDI1. We do not rule out the possibility, however, that other TEADs fully compensate for TEAD1 in ‘maintaining’ cholesterol synthesis in adult Schwann cells. We will address these important questions in the future when the key resources mentioned above become available to study TEAD2-4.

      4) Why do cKO mice die before P60?

      In accordance with IACUC guidelines, we humanely euthanized Tead1 cKO mice before P60 because, like Yap/Taz cKO mice, they develop severe peripheral neuropathy.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the reviewer for the positive evaluation of our manuscript. We have closely examined the issues raised, and below we offer a point-by-point response to each comment. In the revised manuscript below, all the introduced changes are marked with red font.

      1. There may be a general typo concerning micromolar and millimolar…

      Response 1: The reviewer is correct, and during the reformatting of the manuscript, in some portions of the manuscript, the units used to indicate TPEN concentrations, always µM, were switched to mM. We have corrected those mistakes.

      1. In Figure 1C/Lines 150-152, the authors use DTPA and EDTA as extracellular chelators for zinc… Was the amount of zinc in the media measured and determined to be below the amount of chelator used? Additionally, these chelators are not specific for zinc, but can bind other divalent cations including calcium. Even though zinc binds more tightly than calcium to these chelators, by mass action calcium and magnesium ions may outcompete DTPA and EDTA, leaving zinc availability unperturbed. How do the authors take these interactions into account to determine that chelation of extracellular zinc has no effect on intracellular calcium oscillations? The best way to test this is to use zinc responsive fluorescent probes in a sample of the calcium- and magnesium-replete medium and see if the addition of the DTPA or EDTA alters zinc fluorescence in the cuvette.

      Response 2: We tested several conditions to determine the effect of chelators on the zinc concentration of the monitoring media using commercially available Zn2+ probes. The fluorescent zinc probe FluoZin-3 added extracellularly shows high fluorescence, consistent with trace amounts of zinc and possibly non-specific binding of other cations.

      Further, the media tested was replete with the concentrations of Ca2+ and Mg2+ in TL-HEPES. To establish if the non-permeable external chelators we used could bind external Zn2+ despite the high concentrations of Ca2+ and Mg2+, we followed the reviewer’s suggestion of adding the chelators to the complete media in the presence of FluoZin-3. The addition of EDTA caused a protracted, ~5 min, but significant decrease in FluoZin-3’s fluorescence, suggesting it is effective at removing external Zn2+ despite the presence of other divalent cations (Author response image 1A). We used a second approach where we added the chelator in the presence of nominal concentrations of Ca2+ and Mg2+ to increase the chelators’ chances to find and chelate Zn2+ (Author response image 1B). Then, we injected mPlcζ mRNA, which initiated persistent but low-frequency oscillations, as expected due to the lack of external Ca2+. Remarkably, upon restoring it, the responses became of high frequency, and upon increasing Mg2+, they acquired the regular pattern, consistent with Mg2+’s inhibition of channels that mediate Ca2+ influx. These results show that the chelation of extracellular zinc does not replicate TPEN’s effect, which suggests that TPEN’s abrupt inhibition of Ca2+ oscillations is most likely due to the chelation of internal Zn2+.

      Author response image 1.

      Cell-impermeable chelators effectively reduce Zn2+ levels in external media but do not prevent initiation or continuation of Ca2+ oscillations. (A) A representative trace of FluoZin-3 fluorescence in replete monitoring media (TL-HEPES). The media was supplemented with cell-impermeable FluoZin-3, and after initiation of monitoring, the addition of EDTA (100 μM) occurred at the designated point (triangle). (B) The left black trace represents Ca2+ oscillations initiation by injection of mPlcζ mRNA (0.01 μg/μl). The oscillations were monitored in Ca2+ and Mg2+-free media and in the presence of EDTA (110 μM) to chelate residual divalent cations derived from the water source or reagents used to make the media. The right red trace represents the initiation of oscillations as above, but after a period indicated by the black and green bars, Ca2+ and Mg2+ were sequentially added back.

      Notably, low EDTA concentrations, 10 µM, have been used to enhance in vitro culture conditions of mammalian embryos. In fact, it is the key ingredient to overcome the two-cell block that initially prevented the in vitro development of zygotes from inbred strains. It is unknown how EDTA mediates this effect, which is detectable in Ca2+ and Mg2+ replete media and is only effective when placed extracellularly, but it has been attributed to its ability to chelate toxic metals introduced as impurities by other media components; one study demonstrated that the Zn2+ present in the oil used to overlay the culture medium micro drops was the target (Erbach et al., Human Reproduction, 1995, 10, 3248-54). We included some of these points in the revised version of the manuscript and added this figure as Supplementary Figure 1.
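      The mass-action concern raised by the reviewer can be framed with a simple competition estimate. The sketch below is not part of the authors' analysis. It assumes that Zn2+ is present only in trace amounts, that Ca2+ and Mg2+ are in large excess over the chelator, and that approximate textbook stability constants for EDTA apply (log K of about 16.5 for Zn2+, 10.7 for Ca2+, and 8.8 for Mg2+). The Ca2+ and Mg2+ concentrations are illustrative placeholders, not the measured composition of TL-HEPES; only the 100 µM EDTA value comes from the figure legend above.

```python
def bound_to_free_zinc_ratio(chelator_total, ca_free, mg_free,
                             log_k_zn=16.5, log_k_ca=10.7, log_k_mg=8.8):
    """Estimate [Zn-chelator]/[free Zn] with Ca2+ and Mg2+ as competitors.

    Assumptions (all simplifications):
      * Zn2+ is a trace species, so it does not deplete the chelator.
      * Ca2+ and Mg2+ are in excess, so free ~= total for both.
      * Concentrations are in mol/L; constants are 1:1 stability constants.
    """
    k_zn = 10 ** log_k_zn
    k_ca = 10 ** log_k_ca
    k_mg = 10 ** log_k_mg
    # Competitors lower the apparent affinity of the chelator for Zn2+.
    apparent_k_zn = k_zn / (1 + k_ca * ca_free + k_mg * mg_free)
    return apparent_k_zn * chelator_total


# 100 uM EDTA (from the legend); 2 mM Ca2+ and 0.5 mM Mg2+ are assumed values.
ratio = bound_to_free_zinc_ratio(chelator_total=100e-6,
                                 ca_free=2e-3, mg_free=0.5e-3)
print(f"bound/free Zn2+ ratio under these assumptions: {ratio:.3g}")
```

      Under these assumptions the ratio stays far above 1, which is in line with the observation that EDTA lowered FluoZin-3 fluorescence in replete media; the exact value depends on pH and on the true media composition.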

      1. The reviewer noted that while dKO eggs showed reduced labile zinc levels, the amount of total zinc is not determined. Further, the response to thapsigargin in dKO eggs didn’t phenocopy the profile in eggs treated with TPEN. The reviewer argued that without further experimentation, such as comparing polar body extrusion and egg activation rate between WT and dKO, it seems to be a stretch to state that these eggs are zinc deficient.

      Response 3: We agree that the statement, ‘zinc deficient,’ is an overstatement without determining the total zinc levels and associated phenotypes. Therefore, in the revised version of the manuscript, we referred to dKO-derived eggs and embryos as “low-level labile Zn2+ eggs”. Our follow-up studies show that eggs from dKO females seem to undergo egg activation events, such as the timing and rate of second polar body extrusion and pronuclear formation, with a similar dynamic to WT females. Hence, we estimate that the labile Zn2+ levels in dKO eggs are not as low as those of WT eggs treated with TPEN. Consequently, these intermediate zinc levels may have subtle effects, such as changing the Thapsigargin-induced Ca2+ release through the IP3R1 without causing widespread inhibition of cellular events observed after TPEN. We would argue that this approach is significant because it can distinguish how the different cellular events and proteins and enzymes have distinct affinities or zinc requirements and, in this case, start uncovering the channel(s) present in oocytes and eggs that may contribute to regulating zinc homeostasis.

      1. The reviewer pointed out that since zinc is not redox active, it is unclear how zinc could be modifying cysteine residues of IP3R1.The reviewer suggested the possibility that excess zinc is binding to the cysteines and preventing their oxidation leading to the inhibition of the IP3R1 by blocking the channel, thereby preventing calcium release.

      Response 4: The reviewer correctly points out that the mechanism(s) whereby excess Zn2+ modifies the IP3R1 function is undetermined in our study. Further, our description of ‘modifying’ is ambiguous and could be misinterpreted. Data in the literature, some of which we cite in the manuscript, show that “oxidation of cysteine residues enhances receptor’s sensitivity to ligands in various cell types”. Zn2+ preferentially binds to reduced cysteine residues, and thus, we agree with the reviewer's suggestion that “excess zinc may occupy reduced cysteine residues, preventing their oxidization required to sensitize the receptor”. As noted by the reviewer, we cannot rule out that it might be directly blocking the IP3R1 channel. We have modified the corresponding paragraphs in the Discussion.

      1. Lines 80 and 411, there are three other reports that demonstrate the zinc reallocation to the egg shell or ejection as the zinc spark; Zebrafish: Converse et al. in Sci. Reports 10, 15673 (2020); X. laevis: Seeler et al. in Nature Chem. 13, 683-691 (2021); C. elegans: Mendoza et al. in Biology of Reproduction 107(2):406-418 (2022).

      Response 5: Thank you for pointing this out, and we have added these references.

      1. Line 129, when discussing that Zn2+ concentrations are reduced after TPEN as visualized by FluoZin-3, the authors should cite the article in which FluoZin-3 was first reported and this result was demonstrated initially: "Detection and Imaging of Zinc Secretion from Pancreatic β-Cells Using a New Fluorescent Zinc Indicator" by Gee et al. J. Am. Chem. Soc 124, 5, 776-778.

      Response 6: Thank you for pointing this out, and we have added this reference.

      1. In Figure 1E/Table 1 the authors evaluated if TPEN supplementation affects meiosis and pronuclear formation; however, the timing of TPEN treatment is unclear. When was TPEN introduced? Were the eggs left in the same media containing TPEN following fertilization, or were they transferred to different media?

      Response 7: Thank you for pointing this out, and we have noted the time of the addition in the figure and text.

      1. Line 1011 and 1012, ZnTP should be ZnPT.

      Response 8: Thank you for pointing this out, which is now corrected.

      Reviewer #2:

      1. The reviewer raises the question of whether a more complex relationship could exist between the levels of zinc in MII eggs by indicating, “a more active relationship such that zinc efflux associated with each calcium spike could be necessary for terminating the Ca spike by depleting cytoplasmic zinc.” The reviewer also states, “Perhaps, rather than simply a permissive role, the normal Zn fluxes during activation may be acutely changing IP3-R gating sensitivity.”

      Response 1: We agree that the demonstration that TPEN dose-dependently delays and consistently terminates ongoing Ca2+ rises perhaps reflects a more nuanced relationship between cytoplasmic labile zinc concentrations, Ca2+ oscillations, and IP3R1 function. Uncovering the precise nature of this relationship would require additional studies, such as determining the impact of TPEN on IP3 binding to its cognate receptor, regulation of channel gating, and more in-depth functional-structural experiments. However, these studies will demand time and complex experimental design and are beyond the scope of the current work. Nevertheless, they are excellent suggestions for future studies.

      We would argue against the reviewer’s suggestion that “zinc sparks directly contribute to shaping the oscillations.” Zn2+ released during the sparks is not labile, but Zn2+ bound to cortical granules-resident proteins, most of which are inaccessible to the cytosol and hence to IP3R1s and should not perturb its function. We examined (data not shown) that the levels of cytosolic labile Zn2+, as assessed with FluoZin3, remained steady for over three hours of Plcζ mRNA-initiated oscillations. Further, because the Zn2+ sparks cease after the third or fourth Ca2+ rise, it would mean, at the very least, that this mechanism only operates on the first few responses. Thus, while the change of cytosolic Ca2+ concentrations triggers the Zn2+ sparks, we argue that the opposite influence is unlikely to hold true.

      1. The reviewer also pointed out that the role of Trpv3 and Trpm7 in Zn2+ homeostasis seems to be minor and that the effects of genetic deletion of those channels are not as clear as those obtained by TPEN. Given that dKO eggs make it to the MII and release more but not less calcium upon thapsigargin than control despite the lowered labile Zn2+ level, the reviewer speculated that the loss of those channels changes calcium gating independent of Zn2+ concentration.

      Response 2: TRPV3, TRPM7, and Cav3.2 are the three channels identified to permeate Ca2+ during oocyte maturation and egg activation in mice. We and other groups have observed that in oocytes and eggs, these channels partly compensate for the absence of each other because the deletion of these channels individually has a limited effect on Ca2+ oscillations and fertility. Thus, in the case of oocytes from Trpv3 and Trpm7 dKO animals, the other plasma membrane channel(s), most likely Cav3.2, is plausibly compensating, and its enhanced function underlies the increased Ca2+ response to Thapsigargin.

      Nevertheless, the slower time to the peak and the less steep rise of the Thapsigargin-induced response suggest a negative impact of the dKO environment on IP3R1’s ability to mediate Ca2+ release. Based on the rest of the results in the manuscript, we attribute this change to the lower levels of labile Zn2+ in dKO eggs.

      1. Lastly, the reviewer noted the upregulation of the Fura-2AM following addition of ZnPT. The reviewer indicated that 0.05 uM ZnPT might not increase intracellular Zn2+ to change Fura-2 fluorescence, but it might be sufficient Zn2+ to enter the cell and keep the IP3R1 channels open causing a sustained rise in cytoplasmic calcium and preventing oscillations. Further, if this interpretation holds true, the inhibitory effects of high Zn2+ on IP3R1’s gating shown in figure 7 would be precluded.

      Response 3: We acknowledge that the increased levels of Fura-2 fluorescence following the addition of ZnPT could be due to the increased Zn2+ levels acting on IP3R1, increasing its open probability, and elevating cytosolic Ca2+ levels. We have added this consideration to the discussion. Nevertheless, our evidence suggests that this is unlikely because, as shown in Figure 6 H, I, the ER-Ca2+ levels as assessed by D1ER recordings did not change following the addition of ZnPT, whereas Rhod-2 fluorescence did, suggesting that the two events are seemingly uncoupled. Further, constant leak from the ER and extended high cytosolic Ca2+ would lead to egg activation or cell death, neither of which was observed.

      Reviewer #3:

      The reviewer noted that the present study deepened the understanding of the role of zinc in regulating calcium channels and stores at fertilization beyond the previously known Zn2+ requirement in oocyte maturation and the cell cycle progression. We appreciate these comments.

      1. Fig. 1. The reviewer wondered why we selected 10 μM TPEN for most of the experiments in the manuscript. The reviewer noted this concentration only stopped the Ca2+ oscillations in just half of the eggs after ICSI.

      Response 1: We used 10-μM TPEN throughout the study because it blocked ~50% of the oscillations of a robust trigger of Ca2+ responses such as ICSI and reduced the frequency in the remaining eggs. This concentration of TPEN abrogates and prevents the responses by milder stimuli, such as Acetylcholine and SrCl2. Importantly, thimerosal and Plcζ mRNA overcome the inhibition by 10μM but not 50-μM TPEN. However, 50μM TPEN inactivates Emi2, a Zn2+-dependent enzyme, causing parthenogenic activation and cell cycle progression, and we wanted to avoid this confounding factor. Therefore, we determined 10-μM is a “threshold” concentration and selected it for the remaining studies. We also reasoned that it would allow the detection of more subtle effects of reducing the levels of labile zinc, causing a milder inhibition of IP3R1 sensitivity and a progressive delay or modification of the responses to other agonists rather than fully abrogating them, which is the case with higher concentrations.

      1. Line 131 - no concentration of TPEN stated? Or "the addition of different concentrations of TPEN"?

      Response 2: We have corrected this. We have now added 50-100 µM concentrations.

      1. Line 146 - instead of TPEN, all TPEN concentrations?

      Response 3: We have added these corrections, as at the concentrations we tested here, 5μM TPEN and above, all caused a reduction in the baseline of Fura-2 fluorescence.

      1. Line 1046 - 'We submit'? Propose?

      Response 4: We have replaced the word “submit” with “propose”. Thank you for the suggestion.

    1. Author Response

      Reviewer #2 (Public Review):

      In this paper, the authors discover that postsynaptic mitochondria in C. elegans govern glutamate receptor trafficking dynamics. The core results are two-fold. For one, they find that loss or inhibition of mcu-1 - the C. elegans mitochondrial calcium uniporter - increases GLR-1 glutamate receptor accumulation at the postsynaptic dendritic sites and enhances its trafficking dynamics. The authors hypothesize that this effect on glutamate receptors may have something to do with mitochondrial ROS production. This is because ROS is a by-product of normal oxidative phosphorylation, downstream of calcium import. Indeed, the generation of artificially high amounts of mitochondrial ROS has the opposite effect of mcu-1 loss: decreased glutamate receptor subunit accumulation. Collectively, the results support the idea that mitochondrial function can control receptor dynamics at synaptic sites. This is interesting because tight control of synaptic function likely combines several mitochondrial functions: energy production, calcium buffering, and (here) ROS signaling.

      STRENGTHS

      • The C. elegans genetic model is a strength because the authors are able to make refined conclusions by classical loss-of-function mutants (e.g., mcu-1) along with an impressive cytological toolkit to examine GLR-1 dynamics.

      • The use of pharmacology as a second means to test those genetic conclusions is a strength.

      • The authors' careful reagent verification of reporters (Ca2+, ROS, etc.) is a strength.

      • The ability to link fundamental mitochondrial processes to GLR-1 exocytosis will expand how the field thinks about mitochondrial synapse function.

      WEAKNESSES

      For the most part, the data in the paper support the conclusions, and the authors were careful to try experiments in multiple ways. But please see below:

      • (Main Point) The data are good, but they fall short of mechanism (e.g., Line 322). Figure 6 is accurate as drawn. But calcium and ROS are not abstract signals. They are likely exerting affirmative actions on specific targets. The Discussion does acknowledge this in terms of ROS and it speculates on possible targets.

      We thank the reviewer for their analytical review of our manuscript. We agree that all molecular players involved in the proposed mechanism were not identified by the data presented, so we modified the text to remove overstatements. We also agree that Ca2+ and ROS signaling is not abstract. Rather, there are specific and diverse targets of both Ca2+ and ROS signaling. Follow-up experiments are underway to identify and provide evidence for the necessity of potential ROS/Ca2+ targets in this proposed mechanism. For the current manuscript, we have modified our verbiage in an attempt to not mislead or overstate what our results suggest (e.g., changes/additions to the beginning of the ‘Discussion’, lines 365-377 and 385-388) and updated the illustration of the proposed model to include dashed lines that, as mentioned in the figure legend, indicate indirect action by ROS and Ca2+ (see revised Figure 7).

      The general idea seems to be that mitochondria import calcium through MCU-1 (and interacting factors). As a result, oxidative phosphorylation successfully occurs and mitochondrial ROS is a signaling by-product that signals glutamate receptors not to undergo exocytosis. But there are other interpretations of what might happen in between. In fact, if OXPHOS is disrupted, it is known that this can generate a lot more mitochondrial ROS than the normal by-product levels.

      We do agree that an alternative explanation could be that genetic or pharmacological inhibition of mitochondrial Ca2+ uptake disrupts oxidative phosphorylation, and as a result, inefficiencies or uncoupling in the electron transport chain would lead to an even greater increase in mitochondrial ROS production. Although oxidative phosphorylation was not directly measured, one of our post hoc analyses of GLR-1 transport suggests ATP levels are comparable between controls, mcu-1 mutants, and with Ru360 treatment: the velocity of GLR-1 transport is unchanged between these experimental groups. The processivity of molecular motors (which dictates transport velocity) is highly sensitive to relative ATP abundance. Thus, if ATP levels were dramatically decreased in mcu-1 mutants or following Ru360 treatment, then one would expect a detectable change in GLR-1 transport velocities, but we observed no change (see revised Figure S2E and related discussion at lines 183-190). Although these results do not directly indicate whether ATP production is altered with loss or inhibition of MCU-1, they do suggest that basal ATP levels remain sufficient to support the metabolic demands of GLR-1 transport.

      This reviewer wonders if excess ROS would cause an extreme response. Or alternatively, if scavenging ROS via pharmacological scavengers or SOD expression would reverse the effects.

      These are good points, and we have previously published experiments that address each of them. First, we have seen that globally increasing ROS with various concentrations of H2O2 within the physiological range (<100 nM) decreased GLR-1 transport to a similar extent (PMID: 32847966), indicating that there is not a dose-dependent decrease in GLR-1 transport. We have also assessed GLR-1 transport after treatment with concentrations of H2O2 well above the physiological range (e.g., 500 nM), but these high concentrations obliterated all GLR-1 transport. Contrary to what one may expect, we showed that decreasing ROS via pharmacological or genetic means (probably below physiological range) decreased GLR-1 transport (PMID: 35622512) via a Ca2+-independent mechanism. In other words, ROS scavenging did not have the opposite effect on GLR-1 transport, but we have not combined ROS scavenging with optical induction of ROS production (e.g., via KillerRed) nor have we assessed the potential influence of ROS scavenging on synaptic recruitment. Although we agree that these are important follow-up experiments, they will require a more sensitive ROS indicator because current genetically encoded in vivo ROS sensors cannot detect decreases in ROS levels below the physiological range (< 10 nM) (PMID: 31586057).

      Small Points

      • 33.3 mHz - just making sure, do the authors mean once every 30 seconds? That would be more straightforward.

      Yes, we do mean a 1-second pulse of light every 30 seconds. We have clarified this in the manuscript text (line 115).
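The arithmetic behind this clarification is the reciprocal relation between frequency and period. A minimal sketch follows; the function name is illustrative and not taken from the manuscript.

```python
def period_seconds(frequency_mhz: float) -> float:
    """Return the period in seconds for a frequency given in millihertz."""
    frequency_hz = frequency_mhz / 1000.0  # 1 mHz = 0.001 Hz
    return 1.0 / frequency_hz


# 33.3 mHz corresponds to roughly one pulse onset every 30 seconds.
period = period_seconds(33.3)
print(f"{period:.1f} s between pulse onsets")
```

Stating the protocol as "a 1-second pulse every 30 seconds" conveys the same information without requiring this conversion.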

      • Figure 2 is confusing. The text says that the mcu-1 mutants have a GLR-1::GFP FRAP rate that is comparable to controls (Lines 165-167). But Figure 2E suggests that it is markedly less, which is the opposite result of the slight increase in rate resulting from Ru360 treatment. And is the explanation why the GLR-1::GFP results differ from the SEP::GLR-1 results a difference between total GFP vs. surface GFP?

      The confusion is due to an incorrect statement in the results text. We have corrected this error and appreciate the reviewer for bringing it to our attention (lines 173-174).

      • I could not watch Video 2 (not sure if it is the file or just the copy I downloaded).

      We thank the reviewer for bringing this to our attention and we believe we have remedied the issue.

      • It is good that the authors tried both optical stimulation and mechanical stimulation (dropping culture plates to stimulate the worms, Figure 3). Why was the mechanical stimulation set aside for further tests in the paper?

      Mechanical stimulation consisted of dropping culture plates containing 2-3 C. elegans onto a lab bench every 30 seconds for 5 or 10 minutes. This mechanical stimulation paradigm was technically cumbersome and was less effective at inducing changes in mito-roGFP fluorescence than optical stimulation. This is likely due to habituation to the mechanical stimulus, which has been well-characterized in C. elegans. The optical stimulation was therefore used as it is a more reliable and repeatable method for stimulating the AVA neuron.

      • Does this process affect all kinds of transport, or is it just the glutamate receptors? Was anything else examined?

      Transport of other proteins has not been examined in the context of mitoROS signaling. Our attempts at visualizing and quantifying the transport, synaptic delivery and exocytosis of other synaptic proteins in vivo have proven to be more technically challenging, likely due to relatively lower expression in the C. elegans neurons suitable for transport analysis.

      Reviewer #3 (Public Review):

      Reactive oxygen species (ROS) have been previously shown to regulate glutamate receptor phosphorylation, long-distance transport, and delivery of glutamate receptors to synapses, however, the source of ROS is unclear. In this study, the authors test if mitochondria act as a signaling hub and produce ROS in response to neuronal activity in order to regulate glutamate receptor trafficking. The authors use a variety of optogenetic tools including the calcium reporter mitoGCaMP and the ROS reporter mito-roGFP to monitor changes in calcium and ROS, respectively, in mitochondria after activating neurons with ChRimson in the genetic model organism C. elegans. Repeated stimulation of interneurons called AVA with ChRimson leads to increased calcium uptake into mitochondria in dendrites and increased mitochondrial ROS production. The mitochondrial calcium uniporter mcu-1 is required for these effects because mcu-1 genetic loss of function or treatment with Ru360, a drug that inhibits mcu-1, inhibits the uptake of calcium into mitochondria and ROS production after neuronal activation. Mcu-1 genetic loss of function is correlated with an increase in exocytosis of glutamate receptors but a decrease in glutamate receptor transport and delivery to dendrites. This study suggests that mitochondria monitor neuronal activity by taking up calcium and downregulating glutamate receptor trafficking via ROS, as a means to negatively regulate excitatory synapse function.

      Strengths

      -The use of multiple optogenetic tools and approaches to monitor mitochondrial calcium, reactive oxygen species, and glutamate receptor trafficking in live organisms.

      -Identifying a novel signaling role for dendritic mitochondria which is to monitor neuronal activity (via calcium uptake into mitochondria) and generate a signal (reactive oxygen species) that regulates glutamate receptors at synapses.

      Weaknesses

      -Although the use of KillerRed to generate ROS downstream of mcu-1 is a clever approach, the fact that activation of KillerRed results in reduced GLR-1 exocytosis, delivery, and transport raises the concern that KillerRed is generating a high level of ROS that might be toxic to cellular processes. Experiments showing that other cellular processes are not affected by KillerRed activation and testing if reduced ROS production mimics the effects of blocking mcu-1 would strengthen the conclusions in this study.

      We thank the reviewer for their careful analyses of our findings. It is plausible that KillerRed could cause toxic levels of ROS; in fact, it was originally used to instigate oxidative stress-induced apoptosis to achieve cell-specific ablation. These cell ablation protocols required 20+ minutes of KillerRed activation with substantially higher levels of irradiation (e.g., 3.8 mW/mm [PMID: 24209746] vs. our light dosage of 25 µW/mm2). Additionally, our transgenic C. elegans strains expressing KillerRed were designed to have a relatively low KillerRed expression and were screened for low expression based on KillerRed’s fluorescence. Using these strains, we were able to minimally activate KillerRed in the AVA neuron resulting in ROS elevations at mitochondria that were comparable to neuronal activity-induced increases in mitochondrial ROS as measured by mito-roGFP. Specifically, we found that 10 minutes of mechano-stimulation and 5 minutes of ChRimson stimulation increased the fluorescence ratio (Fratio) of mito-roGFP nearly two-fold (Figure 4A-B and 4C-E). A 15-second pulse of light focused on a small region activating mitoKR in the AVA neurite also caused a similar two-fold increase in the mito-roGFP Fratio (Figure 4C-E) comparable to what neuronal activity induced. Our 5-minute global KillerRed activation less effectively increased the mito-roGFP Fratio at mitochondria in the AVA neurite compared to neuronal activity (revised Figure 4B and 4H) but was sufficient to decrease GLR-1 transport (revised Figure 5G-H). So, we decided to do all experiments with 5 minutes of global KillerRed activation since lower activation levels of KillerRed were more likely to achieve non-toxic, signaling levels of ROS. Since we strongly agree that this data is important for tool validation, we have reorganized the manuscript such that these data are now a primary figure (see revised Figure 4 and new results sub-section starting at line 252).

      Additionally, we added supplemental transport velocity data. This data shows that local photoactivation as well as whole-cell activation of KillerRed does not alter transport velocity of GLR-1 vesicles within the neurite (revised Figure S4A and S4B and lines 272-276 and 287-289), which would be the case if ATP, microtubules, or actin dynamics were affected. This supports that our local and whole-cell activation protocol does not cause toxic levels of ROS production.

      Lastly, the reviewer questions whether decreasing ROS alters GLR-1 transport, synaptic delivery and exocytosis in a similar fashion to loss or inhibition of mcu-1, and if so, would further support the proposed mechanism. We have decreased ROS via genetic (catalase overexpression) and pharmacological (using the mitochondria-targeted antioxidant MitoTEMPO) means and seen that diminished ROS levels decrease GLR-1 transport albeit to a lesser degree than that caused by loss/inhibition of mcu-1 (PMID: 35622512). To determine if decreased GLR-1 transport during diminished ROS levels involves mcu-1, we would need to assess GLR-1 transport in mcu-1 mutants while ROS is decreased (e.g., using MitoTEMPO treatment) to see if their combined effect phenocopies the effect of mcu-1(lf) or decreased ROS alone. However, as mentioned previously, we are unable to measure ROS levels below the sensitivity of roGFP but within physiological range so we cannot currently calibrate or validate our methods for scavenging ROS in vivo. This is why we have not yet analyzed synaptic delivery or exocytosis rates of GLR-1 in the context of decreased ROS, but these would be interesting follow-up experiments that may further support our model once more sensitive ROS sensors are available.

      Reviewer #4 (Public Review):

      Using optogenetic stimulation, the authors presented compelling evidence that neuronal activity increases mitochondrial calcium levels, facilitated by the mitochondrial uniporter MCU-1. Through ratiometric measurements, they showed that mitochondrial ROS levels also increase due to neuronal activity via MCU-1. Subsequent FRAP studies were employed to investigate the trafficking of the AMPA receptor, GLR-1. By integrating genetic and pharmacological methodologies, the recovery rate of GLR-1 was assessed. The authors concluded that increased mitochondrial ROS due to neuronal activity reduces the trafficking and exocytosis of AMPA receptors. They proposed that mitochondrial ROS serves as a homeostatic mechanism regulating AMPA receptor trafficking and abundance, thus maintaining synaptic strength. This research is crucial as it provides a direct link between mitochondrial signaling and AMPA receptor trafficking.

      However, there are several significant concerns regarding the methodologies and quantifications employed in this manuscript. The authors utilized GLR-SEP to label surface AMPA receptors and relied on the "FRAP rate" as an indicator of the exocytosis rate. The absence of direct visualization of exocytosis using GLR-SEP, and the lack of direct measurements of exocytosis events, casts doubt on the conclusions about ROS's impact on AMPA receptor exocytosis. Furthermore, the "FRAP rate" determined in this study is a combination of recovery rates (incorporating both endosomal trafficking and diffusion) with the mobile fractions of AMPA receptors, potentially weakening interpretations of the findings. A more comprehensive discussion addressing the conflicting effects of MCU-1 and ROS on GLR-GFP FRAP recovery and dendritic trafficking would enable readers to grasp the intricate roles of mitochondrial calcium and ROS in modulating synaptic receptors.

      We appreciate the reviewer’s attention to detail while reviewing our article. Their major concern about directly visualizing exocytosis events is valid since changes in exocytosis and endocytosis would dictate the amount of SEP::GLR-1 at the synaptic membrane. However, streaming imaging of SEP in vivo is technically difficult: it shows only a few exocytosis events and provides short “snapshots” (1-2 minutes; longer streaming imaging causes photobleaching and photo-toxicity) which must be extrapolated to longer time frames. Our 16-minute SEP::GLR-1 FRAP protocol allows us to capture all plasma membrane recruitment and quantify the relative balance between exo- and endocytosis. It also allows for longer observational periods during which we can detect changes in GLR-1 recruitment to and retention at the synaptic membrane in genetic mutants and with drug treatments. In addition, our photobleaching approach involves photobleaching a ~40-60 µm region proximally and distally to the imaging region, which limits the influence of receptor diffusion on the FRAP rate. The reviewer makes a valid point that receptor endocytosis rates would also influence the SEP::GLR-1 FRAP rate. We have now changed the text in the results and discussion to include this information (lines 155-161, and changing “exocytosis” to “synaptic recruitment” throughout the manuscript when discussing SEP::GLR-1 FRAP results [e.g., at lines 169, 208, and 321]).
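The reviewer's point that a single "FRAP rate" can mix a rate constant with a mobile fraction can be illustrated with a toy calculation. The sketch below assumes a single-exponential recovery model, which is a common simplification and not necessarily the quantification used in the manuscript; all parameter values are hypothetical.

```python
import math


def frap_recovery(t_min: float, mobile_fraction: float, rate_per_min: float) -> float:
    """Single-exponential recovery, normalised so 0 = post-bleach and 1 = pre-bleach."""
    return mobile_fraction * (1.0 - math.exp(-rate_per_min * t_min))


def initial_slope(mobile_fraction: float, rate_per_min: float) -> float:
    """Slope of the recovery curve at t = 0, equal to mobile_fraction * rate."""
    return mobile_fraction * rate_per_min


# Two hypothetical conditions (mobile fraction, rate constant per minute)
# chosen so that the early slope is the same but the plateau differs.
condition_a = (0.8, 0.10)  # large mobile fraction, slow exchange
condition_b = (0.4, 0.20)  # small mobile fraction, fast exchange

print(initial_slope(*condition_a), initial_slope(*condition_b))
# Recovery at the end of a 16-minute observation window
print(frap_recovery(16, *condition_a), frap_recovery(16, *condition_b))
```

Under this assumed model, an early-slope measure cannot separate the two conditions, whereas the recovery reached over a longer window can, which is one reason to report the rate constant and the plateau separately.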

    1. Author Response

      The following is the authors’ response to the original reviews.

      necessary clarifications on some of the reviewers' suggestions.

      Reviewer #1 (Public Review):

      Weaknesses:

      • This is a pilot study with only 24 cases and 24 controls. Because the human microbiota entails individual variability, this work should be confirmed with a higher sample size to achieve enough statistical power.

      Thank you for your suggestion. Unlike the high sparsity of 16S rRNA data, the data density of metagenomic data is higher. Based on the experience of previous research, the sample size used this time can basically meet the requirements. However, your suggestion is very valuable: increasing the sample size would allow better in-depth analysis. Due to limitations of objective factors, it is difficult for us to continue to increase the sample size in this study.

      • The authors do not report here the use of blank controls. The use of this type of control is important to "subtract" the potential background from plasticware, buffer or reagents from the real signal. Lack of controls may lead to microbiome artefacts in the results. This can be seen in the results presented where the authors report some bacterial contaminants (Agrobacterium tumefaciens, Aequorivita lutea, Chitinophagaceae, Marinobacter vinifirmus, etc) as part of the most common bacteria found in cervical samples.

      Thank you for your suggestion. Applying blank controls in low biomass areas can effectively avoid contamination caused by the environment or kits. This opinion is consistent with that published by Raphael Eisenhofer et al. in Trends in Microbiology. When designing this study, we considered that this study described a biomass-rich site, and the abundance of dominant species was much higher than that of the possible 'kitome', so we did not set a blank control. On the other hand, our main discussion object in this study is high-abundance species, and the species filtering threshold for some analyses was raised to 50%. Therefore, we believe that the absence of the blank control has little effect on the conclusions of this study. However, your opinion is spot on. Failure to set up a negative control will affect our future research on rare species. We will add a description in the Limitations section of the Discussion section.

      • Samples used for this study were collected from the cervix. Why not collect samples from the uterine cavity and isthmocele fluid (for cases)? In their previous paper using samples from the same research protocol (IRB no. 2019ZSLYEC-005S) they used endometrial tissue from the patients, so access to the uterine cavity was guaranteed.

      Thank you for your suggestion. In Author response image 1 we show the approximate location of our cervical swab sampling. There are two main reasons for choosing cervical swabs:

      1) The adsorption of swabs allows us to obtain sufficient nucleic acid for high-depth sequencing, while the isthmocele fluid varies greatly among patients, which will introduce unnecessary batch effects.

      2) Since the female reproductive tract is a continuous whole, our sampling location is close to the lesion in the cervix, which can be effectively studied. On the other hand, the microbial biomass of the endometrium is probably two orders of magnitude lower than that of the cervix, and it is difficult to avoid contamination of the lower genital tract when sampling.

      Based on the above reasons, we selected cervical swabs for our microbial data.

      Author response image 1.

      • Through the use of shotgun genomics, results from all the genomes of the organisms present in the sample are obtained. However, the authors have only used the metagenomic data to infer the taxonomical annotation of fungi and bacteria.

      Thank you for your suggestion. The advantage of metagenomics is that it can obtain all the nucleic acid information of the entire environment. However, in the study of the female reproductive tract, the databases of viruses and archaea are still immature. To ensure the accuracy of the results, we did not analyze these groups. We look forward to the emergence of mature databases in the future.

      Reviewer #1 (Recommendations For The Authors):

      • It would be interesting to use another series of functional data coming from the metagenomic analyses (not only taxonomic) to expand and reinforce the results presented.

      Thank you for your suggestion. We have dissected the functional data of microbiota in the article.

      • The authors have previously published the 16S rRNA sequencing and transcriptomic analysis of the same set of patients. It would be nice to see the integration of all the datasets produced.

      Thank you for your suggestion. There is no doubt that integrating all the data will have more dimensional results. In our previous study we focused on microbe-host interactions. However, there is an unanswered question: What are the characteristics of the regulatory network within microbiota? Therefore, we answered this question in this study, exploring the complex interaction processes within microbial communities. In addition to direct effects, interactions between microbiota may also occur through special metabolite experiments. Therefore, we introduced the analysis of the untargeted metabolome. However, 16S rRNA can only provide bacterial information, so we did not integrate the data. In addition, the transcriptome provides host information and is not the focus of this study. However, your suggestion is very valuable, and we will integrate all the data in the next study on the exploration of treatment methods.

      Reviewer #2 (Public Review):

      Weaknesses: Methodological descriptions are minimal.

      Some examples:

      • The CON group (line 147) has not been defined. I supposed it is the control group.

      • There are no statistics related to shotgun sequencing. How many reads have been sequenced? How many have been removed from the host? How many are left to study bacteria and fungi? Are these reads proportional among the 48 samples? If not, what method has been used to normalise the data?

      • ggClusterNet has numerous algorithms to better display the modules of the microbiome network. Which one has been used?

      Thank you for your suggestion. We have added details to the method.
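To make the reviewer's question about unequal sequencing depth concrete, one simple option is total-sum scaling, which converts each sample's read counts to relative abundances. The sketch below is an illustration only: the response does not state which normalisation the authors used, and the taxon labels and counts are hypothetical.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts for one sample into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical samples with the same composition but a tenfold depth difference
deep_sample = {"taxon_A": 9000, "taxon_B": 1000}
shallow_sample = {"taxon_A": 900, "taxon_B": 100}

print(relative_abundance(deep_sample))
print(relative_abundance(shallow_sample))
```

After scaling, the two samples are directly comparable. Total-sum scaling does not address zeros caused by shallow sequencing, which is the separate concern the reviewer raises about zero-inflated models.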

      Reviewer #2 (Recommendations For The Authors):

      I think the author should take into account the points described in the "Weaknesses" section. The lack of detail extends to almost all the analyses that have been included in the manuscript. Although the results are sound, I think it is important to understand what has been analysed and how it has been analysed. It is important that all work is reproducible and this requires vital information.

      For example, what parameters have been used for bowtie2? Has a local analysis been used, or end-to-end? Some parameters like --very-sensitive are important for this kind of analysis. You can also use specific programs like kneaddata.

      The Raw data preprocessing section should be more detailed.

      The same with the "Taxa and functional annotation" section: how have the data been normalised? Has any Zero-Inflated Gamma probabilistic model algorithm been taken into account? How were the 0 (no species detected) in the shallow samples treated?

      Which algorithms have been used for LEfSe? Kruskal-Wallis -> (Wilcoxon) -> LDA?

      Which p-value has been used as cut-off? Has this p-value been corrected for multiple testing?

      • Information on ggClusterNet should be included and explained.

      The first section of the results and Table 1 should be in the Materials and Methods.

      Thank you for your suggestion. We have added details to the method.
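The reviewer's question about multiple-testing correction can be illustrated with the Benjamini-Hochberg procedure, one widely used way to control the false discovery rate when many taxa are tested at once. This is a generic sketch with hypothetical p-values; the response does not state which correction, if any, was applied in the manuscript.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five taxa
raw_p = [0.001, 0.02, 0.03, 0.04, 0.5]
adjusted_p = benjamini_hochberg(raw_p)
print(adjusted_p)
```

Reporting both the raw and the adjusted values, together with the cut-off applied, would answer the reviewer's question directly.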

      In the fungi section, it is mentioned that 431 species have been found. They should be included in a supplementary table.

      How many bacteria were found? Please include them also in a supplementary table.

      Thank you for your suggestion. We have added the corresponding table.

      Reviewer #3 (Public Review):

      Major

      1. Smoke or drink conditions, as well as diseases like hypertension and diabetes are important factors that could influence the metabolism of the host, thus the authors should add them in the exclusion criteria in the Methods.

      Thanks to reviewer #3 for professional comments. We have made corresponding additions in the method section. We also followed this standard when recruiting subjects.

      2. The sample size of this study is not large enough to draw a convincing conclusion.

      Thank you for your suggestion. Unlike the high sparsity of 16S rRNA data, the data density of metagenomic data is higher. Based on the experience of previous research, the sample size used this time can basically meet the requirements. However, your suggestion is very valuable: increasing the sample size would allow better in-depth analysis. Due to limitations of objective factors, it is difficult for us to continue to increase the sample size in this study.

      Reviewer #3 (Recommendations For The Authors):

      Please recruit more samples.

      In addition, there are many formatting and grammatical mistakes in the manuscript.

      Minor

      1. In Line 24-25 of the "Composition and characteristics of fungal communities", the format of "Goyaglycoside A and Janthitrem E." shouldn't be italic.

      2. In Line 126 of the "Metabolite detection using liquid chromatography (LC) and mass spectrometry (MS)", the "10 ul" should be changed to "Ten ul". Beginning with arabic numerals in a sentence should be avoided.

      3. In Line 170 of the "Composition and characteristics of bacterial communities", the "162 differential species" should be "One hundred and sixty-two differential species".

      4. In Line 187 of the "Composition and characteristics of fungal communities", the "42 differential" should be "Forty-two differential".

      Thanks to reviewer #3 for professional comments. We have completely revised the language of the article.

    1. Author Response

      Reviewer #1 (Public Review):

      Payne et al. have investigated the neural basis of VOR adaptation with the goal of constraining sites and mechanisms of plasticity supporting cerebellar learning. This has been an area of intense debate for decades; previous competing models have argued extensively about the sites of plasticity and the strength of eye velocity feedback/ efference copy signals to Purkinje cells has been central to the debate. This paper nicely explores the consequences of varying the strength of this feedback and in so doing, provides a potential explanation for why Purkinje cell responses during VOR cancellation could exhibit stronger responses following learning, despite net depression of the strength of their vestibular inputs. In that sense it provides some reconciliation of existing models. The work appears to be well done and the paper is well written. The manuscript could be improved and the significance of the work clarified and enhanced by contextualizing the work more appropriately within the existing literature in this area.

      We thank the reviewer for the nice summary of this work’s contribution to the long-standing debate regarding sites and mechanisms of plasticity underlying cerebellar learning.

      We have revised the manuscript to address several key points raised by the reviewer. We now emphasize that the main evidence for weak feedback arises from interpreting our model in the context of the existing experimental evidence for plasticity rules in the cerebellar cortex, and we have clarified the commonalities and differences from the Miles-Lisberger model. Several missing references are now included. Additionally, we clarify the comparison of our model to data after learning, and explain how altered signaling through the visual pathways drives paradoxical changes in neural activity without requiring plasticity in the visual pathways. We hope that these changes better situate the work to be interpreted appropriately in the context of the existing literature.

      Reviewer #2 (Public Review):

      Payne et al. use a computational approach to predict the sites and directions of plasticity within the vestibular cerebellum that explain an unresolved controversy regarding the basis of VOR learning. Specifically, the conclusion by Miles and Lisberger (1981) that vestibular inputs onto Purkinje cells (PCs) must potentiate, rather than depress (as in the Marr/Albus/Ito model), following gain-increase learning because when the VOR is cancelled, PC firing increases rather than decreases. Payne et al. provide a novel model solution that recapitulates the results of Miles and Lisberger but, paradoxically, uses plasticity in the cerebellar cortex that weakens PC output rather than strengthens it. However, the model only succeeds when efference copy feedback to the cerebellar cortex is relatively weak, thereby allowing a second feedback pathway to drive PC activity during VOR cancellation to counteract the learned change in gain. Because the model is biologically constrained, the findings are well supported. This work will likely benefit the field by providing a number of potentially experimentally testable conclusions. The findings will be of interest to a wider audience if the results can be extrapolated to other cerebellar-dependent learning behaviors rather than just VOR gain-increase learning. Overall, the manuscript is very well written with clearly delineated results and conclusions.

      We appreciate the reviewer’s comments that the model is well-constrained and provides a solution to the long-standing debate surrounding sites and directions of plasticity underlying VOR learning.

      The reviewer raises an important question: do our results generalize across the cerebellum? We note first that we are studying the cerebellum to illustrate a core problem in modeling systems throughout the brain, namely, how to disambiguate plasticity in the face of ubiquitous feedback loops, both within the brain and between the brain and the environment. Within the cerebellum, we focused on VOR learning due to the wealth of experimental data available. While the specific effect of feedback strength on plasticity will depend on the details of the relevant cerebellar circuit, our general approach can be applied to other areas, given sufficient data, in order to determine how plasticity is distributed in the face of potential feedback loops. Importantly, error-driven LTD of the parallel fiber-Purkinje cell synapse is a fundamental hypothesized mechanism for cerebellar learning which has been generally accepted elsewhere in the cerebellum, but was called into question for VOR learning in the flocculus by the Miles-Lisberger model. Thus, our study of VOR learning has broad implications for reconciling plasticity mechanisms across the cerebellum.

      We also note that, even within the VOR circuit, the direction of plasticity and the relative dependence on plasticity at each site may depend on the timescale of learning. On longer timescales, there is thought to be consolidation of learning from a cerebellar cortical site to a brainstem site. Such consolidation from a faster-learning site to a slower-learning site is known as systems consolidation and has been shown theoretically to mitigate the ‘plasticity-stability dilemma’ of having fast learning without over-writing longer-term learning. Our model is compatible with both error-driven plasticity in the cerebellar cortex and a site of plasticity in the brainstem, with brainstem plasticity potentially mediating consolidation of earlier learned changes in the cerebellar cortex. We have now updated the text significantly to discuss the broader implications of the results and to address the reviewer’s specific comments.

      Reviewer #3 (Public Review):

      Summary: In this study, the authors attempt to determine what is the role (and strength) of feedback in a closed-loop (cerebellar) system.

      Strengths:

      1) By combining extensive data fitting of cerebellar experimental observations this study provides deep insights into existing questions and more broadly on the role of feedback and what are the limitations when inferring feedback in (plastic) neural circuits.

      2) Another strength of this study is the gradual build-up of evidence by using models of different complexities to help build the argument that weak feedback is sufficient to explain experimental observations.

      3) The paper is well-written and structured.

      Weaknesses:

      1) In principle feedback can (i) drive dynamics and/or (ii) drive learning directly. Throughout the paper, the authors refer to only the first case (i.e. dynamics). However, the role of feedback in learning is already implicitly assumed by the authors when jointly fitting the model before and after learning. Note that the general conclusion that feedback (in general) is weak may apply to the first view (i.e. dynamics), but not the second. Given that a key conclusion of the paper is that no feedback is sufficient to explain the data, this suggests that feedback may instead be used for learning/plasticity.

      We fully agree with the reviewer that our conclusions do not preclude an important role for many other types of feedback, including as an instructive signal for learning. Instead of explicitly considering feedback for learning in our model, we consider static snapshots before and after learning to infer plasticity, while remaining agnostic to the neural algorithm used to achieve such plasticity. A widely held hypothesis is that motor error signals carried by climbing fibers instruct LTD at co-active parallel fiber inputs to Purkinje cells; this is indeed a form of feedback, operating on a slower timescale than “feedback for dynamics.” This “feedback for learning” is not modeled here but is fully consistent with our results, as discussed in a new paragraph of our Discussion (end of Section 3.4.1 “Pathways undergoing plasticity”).

      2) There are some potential limitations of the conclusions drawn due to the model inference methods used. The methods used (fmincon) can easily get stuck in local minima and, more importantly, they do not provide an overview of the likelihood of parameters given the data. A few studies have now shown that it is important to apply more powerful inference techniques both to infer plasticity (Bykowska et al. Frontiers 2019) and neural dynamics (Gonçalves et al. eLife 2020). As highlighted by Costa et al. Frontiers 2013, using more standard fitting methods can lead to misleading interpretations. Given the large range of experimental data used to constrain the model, this may not be an issue, but it is not explicitly shown.

      The reviewer correctly points out that we used a deterministic model-fitting procedure. To address this concern, we complemented the full dynamic model with a simple analytic model (Figure 5) for which we could fully derive the cost function landscape and analytically show that there is a line of parameters corresponding to a perfect degeneracy in the model. Thus, the challenge in the model we analyze is that there are too many solutions, rather than it being difficult to find a solution. Given this degeneracy, we chose to fix the level of efference copy feedback, find the (now non-degenerate) solutions, and then compare these different solutions with regard to their implications for the correlated strengths and changes in strengths of different pathways. We have edited the relevant section of the Discussion for clarity on this topic, and have added references to the additional strategies for model inference mentioned above, in Section 3.3 “Relation to other sloppy models”.
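
      The degeneracy argument above can be illustrated with a deliberately simplified sketch. It is not the analytic model of Figure 5: it assumes a scalar steady-state loop, y = w*x + k*y, in which `w` stands in for a feedforward weight and `k` for the efference copy feedback strength, and the target gain is a hypothetical number. In this toy case every pair (w, k) on the line w = g*(1 - k) reproduces the same gain g exactly, so the cost is zero along a whole line, and fixing k leaves a single solution for w.

```python
# Toy steady-state loop (an assumption for illustration, not the authors' model):
#   y = w * x + k * y   =>   y = w / (1 - k) * x

def closed_loop_gain(w: float, k: float) -> float:
    """Steady-state gain of a linear loop with feedforward weight w and feedback k < 1."""
    if k >= 1.0:
        raise ValueError("the loop has no stable steady state for k >= 1")
    return w / (1.0 - k)


def cost(w: float, k: float, target_gain: float) -> float:
    """Squared error between the model gain and the observed (target) gain."""
    return (closed_loop_gain(w, k) - target_gain) ** 2


def fit_w_given_k(k: float, target_gain: float) -> float:
    """With k held fixed, the cost has a unique minimum at w = target_gain * (1 - k)."""
    return target_gain * (1.0 - k)


TARGET_GAIN = 0.5                      # hypothetical observed gain
FEEDBACK_LEVELS = (0.0, 0.3, 0.6, 0.9)  # hypothetical fixed feedback strengths

# One (w, k) solution per fixed feedback level; all of them fit the data equally well.
solutions = [(fit_w_given_k(k, TARGET_GAIN), k) for k in FEEDBACK_LEVELS]
costs = [cost(w, k, TARGET_GAIN) for w, k in solutions]

if __name__ == "__main__":
    for (w, k), c in zip(solutions, costs):
        print(f"k = {k:.1f}  w = {w:.3f}  cost = {c:.2e}")
```

      In this sketch the fitted `w` differs for each fixed `k` while the cost does not, which is the sense in which the data alone cannot select a feedback strength.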

      3) There is some lack of clarity on how the feedback pathways as currently presented should be interpreted in the brain.

      We interpret this comment as referring to the questions of (1) whether our model includes a pathway for learning through feedback, (2) what the anatomical implementation of the efference copy feedback pathway and visual pathways is, and (3) how the positive weights on the efference copy feedback pathway k_PE should be interpreted. We address these below.

      (1) Feedback for learning was discussed in point 1 above.

      (2) Anatomical implementation of efference copy pathway: We have edited the Discussion to clarify that there is anatomical evidence for efference copy input to the cerebellum, but that a key aspect of ‘feedback’ is that activity functionally loops back onto itself. Instead, neurons carrying eye movement commands (such as in the vestibular nucleus) could send signals to the cerebellum, without receiving output from the same cerebellar neurons – this would correspond to a ‘spiraling’ pathway that does not form a closed feedback loop (Figure 8). Thus we argue that the existence of the gross anatomical pathways does not necessitate a role for strong, functional, efference copy feedback (Discussion, Section 3.1, lines 481-491).

      Anatomical implementation of visual pathway: The visual feedback pathways considered here are those that would receive visual motion information from the environment. This visual feedback is itself changed by eye movements, thus providing a net overall negative feedback loop that helps to stabilize gaze. This pathway has been proposed to involve cortical regions such as MST (discussed in Materials and Methods, Model Implementation, lines 769-774).

      (3) Interpretation of positive feedback loop: In our model, the efference copy feedback filter, k_PE, has positive weight. This corresponds to the positive net sign of the Purkinje cell to brainstem to Purkinje cell feedback loop. Specifically, the Purkinje cell to brainstem pathway is inhibitory (because Purkinje cells are inhibitory), the brainstem to eye velocity command pathway is inhibitory (to achieve counter-rotation of the eyes in response to head turns), and the feedback of this eye velocity command back to Purkinje cells (k_PE) is positive. Thus this loop in our model represents positive feedback. This is now clarified in Materials and Methods, Model Implementation, line 748.
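
      The sign bookkeeping in point (3) amounts to multiplying the signs of the links around the loop. A minimal sketch, in which the dictionary labels are illustrative and only the signs are taken from the paragraph above:

```python
from math import prod

# Sign of each link in the loop, as stated in the response above.
# The key names are illustrative labels, not identifiers from the model code.
LOOP_SIGNS = {
    "Purkinje cell -> brainstem": -1,                     # Purkinje cells are inhibitory
    "brainstem -> eye velocity command": -1,              # counter-rotation of the eyes
    "eye velocity command -> Purkinje cell (k_PE)": +1,   # positive filter weight
}


def net_loop_sign(signs) -> int:
    """Net sign of a feedback loop: the product of the signs of its links."""
    return prod(signs)


net = net_loop_sign(LOOP_SIGNS.values())

if __name__ == "__main__":
    print("positive feedback" if net > 0 else "negative feedback")
```

      Two inhibitory links in series cancel, so the net sign of the loop is set by the sign of k_PE.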

      4) The functional benefits of having (or not) feedback could be better discussed (related to point 1 above).

      Related to point 1 above, it is certainly the case that feedback is necessary for learning. We do not explicitly model the climbing fiber feedback thought to be involved in learning/plasticity of the parallel fiber pathway.

      We instead focus on the role of efference copy feedback, and how it functionally impacts the required sites and signs of plasticity in the circuit. As shown in the paper, if the efference copy pathway is strong, then this is most consistent with learned changes in eye movements being driven primarily by plasticity in the brainstem pathway (as in the Miles-Lisberger hypothesis), whereas if the efference copy pathway is weak, then this is most consistent with learned changes in eye movements being driven by net depression in the parallel fiber to Purkinje cell pathway (as in the classic Marr-Albus-Ito model and as suggested by most cellular and molecular studies of parallel fiber-Purkinje cell plasticity), in addition to a role of plasticity in the brainstem pathway. We also note that, in the ‘Strong Feedback’ model, the feedback is so strong that the system is on the brink of instability – this has been argued to have the functional benefit of providing ‘inertia’ to eye movements that could help to maintain eye movements during smooth pursuit when a target goes behind an occluder, but it also has the disadvantage of placing the system at a level of positive feedback near the brink of instability. We also note that the visual feedback pathway through the environment, emphasized in this work, serves as a negative feedback loop that reduces deviations between the eye and target velocity. We have extensively re-written the first section of the Discussion (Section 3.1), in order to more clearly lay out the implications of each model for circuit plasticity and feedback.
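
      The trade-off between ‘inertia’ and instability can be made concrete with a first-order sketch. This is an assumption made for illustration rather than the filter-based model of the paper: for a unit obeying tau*dy/dt = -y + k*y + u, positive feedback of strength `k` stretches the effective time constant to tau/(1 - k), so activity outlasts its input more and more as k approaches 1, and grows without bound for k > 1. The parameter values used in the test are hypothetical.

```python
import math

# First-order loop assumed for illustration:  tau * dy/dt = -y + k * y + u


def effective_time_constant(tau: float, k: float) -> float:
    """Time constant of the loop with positive feedback k < 1: tau / (1 - k)."""
    if k >= 1.0:
        raise ValueError("no finite decay time constant for k >= 1")
    return tau / (1.0 - k)


def remaining_fraction(t: float, tau: float, k: float) -> float:
    """Fraction of the initial activity left a time t after the input is removed.

    Solution of the equation above with u = 0: y(t) / y(0) = exp(-t * (1 - k) / tau).
    Values above 1 mean the activity is growing, i.e. the loop is unstable.
    """
    return math.exp(-t * (1.0 - k) / tau)
```

      With weak feedback the activity decays on roughly the intrinsic time scale, whereas with k close to 1 it persists for many intrinsic time constants, which is the sense in which strong feedback lends persistence at the cost of sitting near instability.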

      5) Some of the key conclusions of the work are not described in the abstract, namely that feedback is weak in the cerebellar system.

      Thank you for raising this point. We have added this key conclusion to the end of the abstract: “Our results address a long-standing debate regarding cerebellum-dependent motor learning, suggesting a reconciliation in which error-driven plasticity of synaptic inputs to Purkinje cells is compatible with seemingly oppositely directed changes in Purkinje cell activity. More broadly, the results demonstrate how learning-related changes in neural activity can appear to contradict the sign of the underlying plasticity when either internal feedback or feedback through the environment is present.”

      Claims:

      The argument is well-built throughout the paper, but there are some potential caveats with the general interpretation (see weaknesses).

      Impact:

      This work has the potential to bring important messages on how best to interpret and infer the role of feedback in neural systems. For the field of the cerebellum, it also proposes solutions to long-standing problems.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Cyclic Nucleotide Binding (CNB) domains are pervasive structural components involved in signaling pathways across eukaryotes and prokaryotes. Despite their similar structures, CNB domains exhibit distinct ligand-sensing capabilities. The manuscript offers a thorough and convincing investigation that clarifies numerous puzzling aspects of nucleotide binding in Trypanosoma.

      Strengths:

      One of the strengths of this study is its multifaceted methodology, which includes a range of techniques including crystallography, ITC (Isothermal Titration Calorimetry), fluorimetry, CD (Circular Dichroism) spectroscopy, mass spectrometry, and computational analysis. This interdisciplinary approach not only enhances the depth of the investigation but also offers a robust cross-validation of the results.

      Weaknesses:

      None noticed.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript clearly shows that Trypanosoma PKA is controlled by nucleoside analogues rather than cyclic nucleotides, which are the primary allosteric effectors of human PKA and PKG. The authors demonstrate that the inosine, guanosine, and adenosine nucleosides bind with high affinity and activate PKA in the tropical pathogens T. brucei, T. cruzi and Leishmania. The underlying determinants of nucleoside binding and selectivity are dissected by solving the crystal structure of T. cruzi PKAR(200-503) and T. brucei PKAR(199-499) bound to inosine at 1.4 Å and 2.1 Å resolution and through comparative mutational analyses. Of particular interest is the identification of a minimal subset of 2-3 residues that controls nucleoside vs. cyclic nucleotide specificity.

      Strengths:

      The significance of this study lies not only in the structure-activity relationships revealed for important targets in several parasite pathogens but also in the understanding of CNB's evolutionary role.

      Weaknesses:

      The main missing piece is the model for activation of the kinetoplastid PKA which remains speculative in the absence of a structure for the trypanosomatid PKA holoenzyme complex. However, this appears to be beyond the scope of this manuscript, which is already quite dense.

      We fully agree that insight into the activation mechanism and its possible deviation from the mammalian paradigm requires a holoenzyme structure revealing the details of the R-C interaction. We have attempted cryo-EM of LEXSY-produced holoenzyme, yet upscaling the purification procedures described in this manuscript has repeatedly failed in spite of numerous protocol changes and optimizations. Much more work is required to achieve this.

      Reviewer #2 (Recommendations For The Authors):

      Some minor points to consider for enhancing the impact of this interesting manuscript:

      1) The nucleoside affinities measured are mainly for the regulatory subunits unbound to the kinase domain. How would nucleoside affinities change when the regulatory subunits are bound to the kinase domain, which is presumably the case under resting conditions? An estimation of this change in affinity is important because it more closely relates to the variations in cellular nucleoside concentrations needed for activation.

      This is an important question, and we have given an indirect answer in the manuscript, although not very explicitly. The EC50 values for kinase activation of the purified holoenzyme complexes are very similar or almost identical to the kD values measured by ITC with free regulatory subunits. By inference, the binding kD for the holoenzyme and for the free R-subunit cannot be very different. In addition, we have recently determined the EC50 for PKA activation in vivo in trypanosomes using a bioluminescence complementation reporter assay. The values fit perfectly to the values obtained with purified holoenzyme (Wu et al. in preparation). A sentence in Results (lines 201-203) has been added.
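
      The inference from EC50 to kD can be illustrated under the simplest possible assumption, which is ours and not a claim about the holoenzyme mechanism: a single non-cooperative binding site whose occupancy directly sets kinase activation. In that case the half-maximal concentration equals the dissociation constant. The kD value below is a hypothetical placeholder, not a measured one.

```python
def fraction_bound(ligand: float, kd: float) -> float:
    """Occupancy of a single non-cooperative site: L / (kD + L)."""
    return ligand / (kd + ligand)


def half_max_concentration(response, low: float, high: float, iterations: int = 100) -> float:
    """Concentration giving a half-maximal response, found by bisection on a log scale.

    Assumes `response` rises monotonically from 0 to 1 between `low` and `high`.
    """
    for _ in range(iterations):
        mid = (low * high) ** 0.5  # geometric midpoint
        if response(mid) < 0.5:
            low = mid
        else:
            high = mid
    return (low * high) ** 0.5


KD_NM = 10.0  # hypothetical dissociation constant in nM


def activation(ligand_nm: float) -> float:
    """Assumption: activation is proportional to occupancy of the binding site."""
    return fraction_bound(ligand_nm, KD_NM)


ec50_nm = half_max_concentration(activation, low=1e-3, high=1e6)

if __name__ == "__main__":
    print(f"kD = {KD_NM} nM, EC50 = {ec50_nm:.6f} nM")
```

      Cooperativity between sites, or a difference in affinity between the free and the C-bound regulatory subunit, would pull EC50 away from kD, which is why near-identical values argue against a large difference.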

      2) The authors should point out that a major implication of nucleoside vs. cyclic nucleotide activation is in terms of signal termination. If phosphodiesterases (PDEs) are responsible for cAMP/cGMP signal termination, what terminates nucleoside-dependent signaling? Although the answer to this question may not be known at this stage, it is important to highlight this critical implication of the authors' study.

      The mechanism of signal termination is indeed unknown so far. We speculate that some enzymes of the purine salvage pathways are differentially localized in subcellular compartments and thereby able to establish microdomains that enable nucleoside signaling. In addition, PKA subunit phosphorylations/dephosphorylations and/or protein turnover may also regulate signal termination. As an example, free PKAC1 is rapidly degraded upon depletion of the PKAR subunit by RNAi. We have now mentioned signal termination in Discussion and have revised the last part of Discussion (lines 567-602). A possible approach to monitor compartmentalized signaling would be using the FluoSTEPs technology (Tenner et al., Sci. Adv. 2021; 7: eabe4091), but adapting this to the trypanosome system will not be a short-term task.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Assessment note: “Whereas the results and interpretations are generally solid, the mechanistic aspect of the work and conclusions put forth rely heavily on in vitro studies performed in cultured L6 myocytes, which are highly glycolytic and generally not viewed as a good model for studying muscle metabolism and insulin action.”

      While we acknowledge that in vitro models may not fully recapitulate the complexity of in vivo systems, we believe L6 myotubes are appropriate for studying the mechanisms underlying muscle metabolism and insulin action. L6 myotubes possess many important characteristics relevant to our research, including high insulin sensitivity and a similar mitochondrial respiration sensitivity compared to primary muscle fibres. Furthermore, several studies have demonstrated the utility of L6 myotubes as a model for studying insulin sensitivity and metabolism, including our own previous work (PMID: 19805130, 31693893, 19915010) and work of others (PMID:12086937, 29486284, 15193147).

      Importantly, our observations from the L6 myotube model are supported by in vivo data from both mice and humans. Chow (Figure 3J, K) and high-fat fed mice (new data - Supplementary Figure 4 H-I) demonstrated a reduction in mitochondrial ceramide and an increase in CoQ9. Muscle biopsies from humans showed a strong negative correlation between mitochondrial C18:0 ceramide levels and insulin sensitivity (PMID: 29415895). Further, complex I and IV abundance was strongly correlated with both muscle insulin sensitivity and mitochondrial ceramide (CerC18:0) (Figure 6E, F). This is consistent with our observations in L6 myotubes (Figure 6H, I). These findings support the relevance of our in vitro results to in vivo muscle metabolism.

      Points from reviewer 1

      1. Although the authors' results suggest that higher mitochondrial ceramide levels suppress cellular insulin sensitivity, they rely solely on a partial inhibition (i.e., 30%) of insulin-stimulated GLUT4-HA translocation in L6 myocytes. It would be critical to examine how much the increased mitochondrial ceramide would inhibit insulin-induced glucose uptake in myocytes using radiolabeled deoxy-glucose. Another important question to be addressed is whether glycogen synthesis is affected in myocytes under these experimental conditions. Results demonstrating reductions in insulin-stimulated glucose transport and glycogen synthesis in myocytes with dysfunctional mitochondria due to ceramide accumulation would further support the authors' claim.

      Response: We have now conducted additional experiments focusing on glycogen synthesis as a readout of insulin sensitivity, as it offers an orthogonal method for assessing GLUT4 translocation and glucose uptake. L6-myotubes overexpressing the mitochondrial-targeted ASAH1 construct (as described in Fig. 3) were challenged with palmitate, and insulin-stimulated glycogen synthesis was measured using 14C radiolabeled glucose. As shown below, palmitate suppressed insulin-induced glycogen synthesis, which was effectively prevented by overexpression of ASAH1 (N = 5, * p<0.05), supporting our previous observation using GLUT4 translocation as a readout of insulin sensitivity (Fig. 3). These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism.

      These data have now been added to Supplementary Figure 4K and the results modified as follows:

      “...For this reason, several in vitro models have been employed involving incubation of insulin sensitive cell types with lipids such as palmitate to mimic lipotoxicity in vivo. In this study we have used cell surface GLUT4-HA abundance as the main readout of insulin response...”

      “Notably, mtASAH1 overexpression protected cells from palmitate-induced insulin resistance without affecting basal insulin sensitivity (Fig. 3E). Similar results were observed using insulin-induced glycogen synthesis as an orthogonal technique for GLUT4 translocation. These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism (Sup. Fig. 5K). Importantly, mtASAH1 overexpression did not rescue insulin sensitivity in cells depleted…”

      Author response image 1.

      Additionally, the following text was added to the method section:

      “L6 myotubes overexpressing ASAH were grown and differentiated in 12-well plates, as described in the Cell lines section, and stimulated for 16 h with palmitate-BSA or EtOH-BSA, as detailed in the Induction of insulin resistance section.

      On day seven of differentiation, myotubes were serum starved in DMEM for 3.5 h. After incubation for 1 h at 37 °C with 2 µCi/ml D-[U-14C]-glucose in the presence or absence of 100 nM insulin, glycogen synthesis assay was performed, as previously described (Zarini S. et al., J Lipid Res, 63(10): 100270, 2022).”

      2. In addition, it would be critical to assess whether the increased mitochondrial ceramide and consequent lowering of energy levels affect all exocytic pathways in L6 myoblasts or just the GLUT4 trafficking. Is the secretory pathway also disrupted under these conditions?

      Response: This is an interesting point raised by the reviewer that is aimed at the next phase of this work, to identify how ceramide induced mitochondrial dysfunction drives insulin resistance. Looking at energy deficiency in more detail as well as general trafficking is part of ongoing work, but given the complexity of this question, it is beyond the scope of the current study.

      Points from reviewer 2

      1. The mechanistic aspect of the work and conclusions put forth rely heavily on studies performed in cultured myocytes, which are highly glycolytic and generally viewed as a poor model for studying muscle metabolism and insulin action. Nonetheless, the findings provide a strong rationale for moving this line of investigation into mouse gain/loss of function models.

      Response: We acknowledge that in vitro models may not fully mimic in vivo complexity as described above in the response to the “Assessment note”. We have now added to the Discussion:

      “In this study, we mainly utilised L6-myotubes, which share many important characteristics with primary muscle fibres. Both types of cells exhibit high sensitivity to insulin and respond similarly to maximal doses of insulin, with GLUT4 translocation stimulated between 2 and 4 times over basal levels in response to 100 nM insulin (as shown in Fig. 1-4 and (46,47)). Additionally, mitochondrial respiration in L6-myotubes has a similar sensitivity to mitochondrial poisons, as observed in primary muscle fibres (as shown in Fig. 5 (48)). Finally, inhibiting ceramide production increases CoQ levels in both L6-myotubes and adult muscle tissue (as shown in Fig. 2-3). Therefore, L6-myotubes possess the necessary metabolic features to investigate the role of mitochondria in insulin resistance, and this relationship is likely applicable to primary muscle fibres”.

      2. One caveat of the approach taken is that exposure of cells to palmitate alone is not reflective of in vivo physiology. It would be interesting to know if similar effects on CoQ are observed when cells are exposed to a more physiological mixture of fatty acids that includes a high ratio of palmitate, but better mimics in vivo nutrition.

      Response: We appreciate the reviewer's comment. Previously, we reported that mitochondrial CoQ depletion occurs in skeletal muscle after 14 and 42 days of HFHSD feeding, coinciding with the onset of insulin resistance (PMID: 29402381, see figure below).

      Author response image 2.

      These data demonstrated that our in vitro model recapitulates the loss of CoQ in insulin resistance observed in muscle tissue in response to a more physiological mixture of fatty acids. Further, it has been reported that different fatty acids can induce insulin resistance via different mechanisms (PMID:20609972), which would complicate interpretation of the data. Saturated fatty acids such as palmitate increase ceramides in cell-lines and humans, but unsaturated FAs generally do not (PMID: 10446195,14592453,34704121). As such we conclude that palmitate is a cleaner model for studying the effects of ceramide on skeletal muscle function.

      We have added to discussion:

      “…These findings align with our earlier observations demonstrating that mice exposed to HFHSD exhibit mitochondrial CoQ depletion in skeletal muscle (Fazakerley et al. 2018).”

      3. While the utility of targeting SMPD5 to the mitochondria is appreciated, the results in Figure 5 suggest that this manoeuvre caused a rather severe form of mitochondrial dysfunction. This could be more representative of toxicity rather than pathophysiology. It would be helpful to know if these same effects are observed with other manipulations that lower CoQ to a similar degree. If not, the discrepancies should be discussed.

      Response: As the reviewer suggests, many of these lipids can cause cell death (toxicity) if the dose is too high. We have previously found that low levels (0.15 mM) of palmitate were sufficient to trigger insulin resistance without any signs of toxicity (Hoehn, K, PNAS, 19805130). Using a similar approach, we show that mitochondrial membrane potential is maintained in SMPD5 overexpressing cells (Sup. Fig. 2J and Author response image 3). Given that toxicity is associated with a loss of mitochondrial membrane potential (e.g., 50 µM Saclac; right-hand panel), these data suggest SMPD5 overexpression is not causing overt toxicity.

      Author response image 3.

      Furthermore, we conducted an overrepresentation analysis of molecular processes within our proteomic data from SMPD5-overexpressing cells. As depicted below, no signs of cell toxicity were observed in our model at the protein level. These data are now available in supplementary table 1.

      Author response table 1.

      Our results are therefore consistent with a pathological condition induced by elevated levels of ceramides independently of cellular toxicity. The following text has been added to the discussion: “...downregulation of the respirasome induced by ceramides may lead to CoQ depletion.

      Despite the significant impact of ceramide on mitochondrial respiration, we did not observe any indications of cell damage in any of the treatments, suggesting that our models are not explained by toxicity and increased cell death (Sup. Fig. 2H & J).”

      4. The conclusions could be strengthened by more extensive studies in mice to assess the interplay between mitochondrial ceramides, CoQ depletion and ETC/mitochondrial dysfunction in the context of a standard diet versus HF diet-induced insulin resistance. Does P053 affect mitochondrial ceramide, ETC protein abundance, mitochondrial function, and muscle insulin sensitivity in the predicted directions?

      Response: We agree with the referee about the importance of performing in vivo studies to corroborate our in vitro data. We have now conducted extensive new studies in mouse skeletal muscle using targeted metabolomic and lipidomic analyses to investigate the impact of ceramide depletion on CoQ levels in HF-fed mice. Mice were fed a HF diet with or without the administration of P053 (selective inhibitor of CerS1) for 5 weeks. As illustrated in the figures below, the administration of P053 led to a reduction in ceramide levels (left panel), an increase in CoQ levels (middle panel) and a negative correlation between these molecules (right panel), which is consistent with our in vitro findings.

      Author response image 4.

      Additional suggestions:

      1. Figure 1: How does increased mitochondrial ceramide affect fatty acid oxidation (FAO) in L6-myocytes? As the accumulation of mitochondrial ceramide inhibits respirasome and mitochondrial activity in vitro, can reduced FAO in vivo, due to high mitochondrial ceramide, account for ectopic lipid deposition in skeletal muscle of obese subjects?

      Response: We thank the reviewer for bringing up this intriguing point. We would like to emphasise that Complex II activity is vital for fatty acid oxidation. As shown in Fig. 5H, our results indicate that Complex II-mediated respiration specifically was diminished in cells with SMPD5 overexpression, suggesting that ceramides hinder the mitochondria's capability to oxidise lipids. We agree that this mechanism may potentially play a role in the ectopic lipid accumulation seen in individuals with obesity.

      We have added the following text to discussion:

      “...the mitochondria to switch between different energy substrates depending on fuel availability, named “metabolic Inflexibility”...this mechanism may potentially play a role in the ectopic lipid accumulation seen in individuals with obesity, a condition linked with cardio-metabolic disease.”

      2. Figure 2: Although the authors show that mtSMPD5 overexpression does not affect ceramide abundance in whole cell lysate, it would be critical to examine the abundance of this lipid in other cellular membranes and organelles, particularly the plasma membrane. What is the effect of mtSMPD5 overexpression on plasma membrane lipid composition? Does that affect the fusion of GLUT4-containing vesicles with the plasma membrane, possibly due to depletion of v-SNARE or t-SNARE?

      Response: While we acknowledge the importance of this point we strongly feel that measuring lipids in purified membranes has its limitations because it is impossible to purify specific membranes without contamination from other kinds of membranes. For example, we have done proteomics on purified plasma membranes from different cell types and we always observe considerable mitochondrial contamination with these membranes (e.g. PMID 21928809). This was the main factor that led us to use the mitochondrial targeting approach.

      Nevertheless, we do acknowledge that there is a possibility that ceramides that are produced in the mitochondria in SMPD5 cells could leak out of mitochondria into other membranes, and this could influence other aspects of GLUT4 trafficking and insulin action. However, we believe that the studies using mitochondria-targeted ASAH mitigate against this problem. Thus, we have now included a statement in the revised manuscript as follows: “It is also possible that ceramides generated within mitochondria in SMPD5 cells leak out from the mitochondria into other membranes (e.g. PM and Glut4 vesicles) affecting other aspects of Glut4 trafficking and insulin action. However, the observation that ASAH1 overexpression reversed IR without affecting whole cell ceramides argues against this possibility.”

      3. Figure 4: One critical piece of information missing is the effect (if any) of mitochondrial ceramide accumulation on the mRNAs encoding the ETC components affected by this lipid. Although the ETC protein's lower stability may account for the effect of increased ceramide, transcriptional inhibition can't be ruled out without checking the mRNA expression levels for these ETC components.

      Response: To address this point, we have quantified the mRNA abundance of nine complex I subunits that exhibit downregulation in our proteomic dataset subsequent to mtSMPD5 overexpression (as depicted in Figure 4G).

      Induction of mtSMPD5 expression with doxycycline (below, left-hand panel) had no effect on the mRNA levels of the Complex I subunits (below, right-hand panel). This is consistent with our initial hypothesis that the reduction in electron transport chain (ETC) components, caused by heightened ceramide levels, primarily arises from alterations in protein stability rather than gene expression. While we acknowledge the possibility that certain subunits might be regulated at the transcriptional level, the absence of mRNA downregulation across our data strongly suggests that, at the very least, a portion of the observed protein depletion is attributed to diminished protein stability. We have incorporated this dataset into Supplementary Figure 6J and added the following text to the results:

      Author response image 5.

      “Importantly, CI downregulation was not associated with reduction in gene expression as shown in Sup. Fig. 6J.”

      Additionally, we have added the following text to discussion:

      “In addition, the absence of mRNA downregulation in mtSMPD5 overexpressing cells strongly suggests that at least a portion of the observed protein depletion within CI is attributed to diminished protein stability.”

      4. Figure 3: The authors state that neither palmitate nor mtASAH1 overexpression affected insulin-dependent Akt phosphorylation. However, the results in Figure 3F-G do not support this conclusion, as the overexpression of mtASAH1 does enhance the insulin-stimulated Akt (Thr-308) phosphorylation. They need to clarify this issue.

      Response: We have now analysed these data in a manner that preserves the control variance, consistent with the other figures in the manuscript, and there is no significant change in Akt phosphorylation in ASAH over-expressing cells.

      Author response image 6.

      5. Figure S2: A functional assessment of mitochondrial function in HeLa cells would be helpful to validate the small effect of Saclac treatment on CI NDUFB8.

      Response: Mitochondrial respiration was measured in cells treated with Saclac (2 µM and 10 µM) for 24 hours. As shown below, in HeLa cells, we did not detect any mitochondrial respiratory impairments at low doses, but only at high doses of Saclac. This suggests that the minor effect of Saclac on CI NDUFB8 is insufficient to alter mitochondrial function.

      Author response image 7.

      Reviewer #2 (Recommendations For The Authors):

      Additional questions and comments for consideration:

      1. The working model links ceramide-induced CoQ depletion to a reduction in ETC proteins and accompanying deficits in OxPhos capacity. The idea that mitochondrial dysfunction necessarily precedes and causes insulin resistance has been heavily debated for years because many animal and human studies have found no overt changes in ETC proteins and/or mitochondrial respiratory capacity during the early phases of insulin resistance. How do the investigators reconcile their work in the context of this controversy?

      Response: We now acknowledge this controversy more clearly in our revised manuscript, on page 21, as follows: “We present evidence that mitochondrial dysfunction precedes insulin resistance. However, previous studies have failed to observe changes in mitochondrial morphology, respiration or ETC components during early stages of insulin resistance (72). In many cases, though, such studies fail to document changes in insulin-dependent glucose metabolism in the same tissue as was used for assessment of mitochondrial function. This is crucial because we and others do not observe impaired insulin action in all muscles from high fat fed mice, for example. In addition, surrogate measures such as insulin-stimulated Akt phosphorylation may not accurately reflect tissue specific insulin action as demonstrated in figure 1C. Thus, further work is required to clarify some of these inconsistencies”.

      1. While the utility of targeting SMPD5 to the mitochondria is appreciated, the results in Figure 5 suggest that this manoeuvre caused a rather severe form of mitochondrial dysfunction. Is this representative of pathophysiology or toxicity?

      Response: We believe we have addressed this in point 3 above (Principal comments, reviewer 1, point 3)

      1. How did this affect other mitochondrial lipids (e.g. cardiolipin)?

      Response: As shown in Supplementary Figure 3, SMPD5 overexpression did not affect other lipid species such as cardiolipin (D-J). We have added to results:

      “Importantly, mtSMPD5 overexpression did not affect ceramide abundance in the whole cell lysate nor other lipid species inside mitochondria such as cardiolipin, cholesterol and DAGs (Sup. Fig. 3 A, D-J)”

      1. Are these severe effects rescued by CoQ supplementation?

      Response: We have performed additional experiments to address this point. As shown below, mitochondrial ceramide accumulation induced by palmitate was not reversed by CoQ supplementation, as demonstrated in Figure 1F. We have added to results:

      “Addition of CoQ9 had no effect on control cells but overcame insulin resistance in palmitate treated cells (Fig. 1A). Notably, the protective effect of CoQ9 appears to be downstream of ceramide accumulation, as it had no impact on palmitate-induced ceramide accumulation (Fig. 1E-F). Strikingly, both myriocin and CoQ9…”

      Additionally, we assessed mitochondrial respiration by using SeaHorse in cells with SMPD5 overexpression treated with or without CoQ supplementation. Our results, depicted below, indicate that CoQ supplementation reversed the ceramide-induced decrease in basal and ATP linked mitochondrial respiration. We have modified Fig.5.

      Author response image 8.

      We have added to results:

      “Respiration was assessed in intact mtSMPD5-L6 myotubes treated with CoQ9 by Seahorse extracellular flux analysis. mtSMPD5 overexpression decreased basal and ATP-linked mitochondrial respiration (Fig. 5 A, B &C), as well as maximal, proton-leak and non-mitochondrial respiration (Fig. 5 A, D, E & F) suggesting that mitochondrial ceramides induce a generalised attenuation in mitochondrial function. Interestingly, CoQ9 supplementation partially recovered basal and ATP-linked mitochondrial respiration, suggesting that part of the mitochondrial defects are induced by CoQ9 depletion. The attenuation in mitochondrial respiration is consistent with a depletion of the ETC subunits observed in our proteomic dataset (Fig. 4)...”

      1. Are these same effects observed with other manipulations that lower CoQ to a similar degree?

      Response: As mentioned in point 5 (additional suggestions from Reviewer 1), we conducted mitochondrial respiration measurements on HeLa cells treated with Saclac (2 µM and 10 µM) for 24 hours. Our findings showed no signs of mitochondrial respiratory impairments at low doses of Saclac in HeLa cells, despite observing CoQ depletion at this dose (Fig. Sup. 2C). We believe that this variation could be due to the varying sensitivity of mitochondrial respiration/ETC abundance to ceramide-induced CoQ depletion in different cell lines. Alternatively, it is possible that reduced mitochondrial respiration is a secondary event to other mitochondrial/cellular defects such as mitochondrial fragmentation or deficient nutrient transport inside mitochondria.

      Author response image 9.

      1. The mitochondrial concentrations of CoQ required to maintain insulin sensitivity in L6 myocytes seem to vary from experiment to experiment. Is it the absolute concentration that matters and/or the change relative to a baseline condition?

      Response: This is an excellent observation. The findings indicate that the absolute concentration of CoQ is the determining factor for insulin sensitivity, rather than the relative depletion of CoQ compared to basal conditions. We have added to discussion: “Finally, mtASAH1 overexpression increased CoQ levels. In both control and mtASAH1 cells, palmitate induced a depletion of CoQ, however the levels in palmitate treated mtASAH1 cells remained similar to control untreated cells (Fig. 3I). This suggests that the absolute concentration of CoQ is crucial for insulin sensitivity, rather than the relative depletion compared to basal conditions, thus supporting the causal role of mitochondrial ceramide accumulation in reducing CoQ levels in insulin resistance”

      1. Considering that CoQ has been shown to have antioxidant properties, does the rescue observed after a 16 h treatment require the prolonged exposure, or alternatively, are similar effects observed during short-term exposures (~1-2 h), which might imply a different or additional mechanism.

      Response: This is an excellent point that we have long considered. The problem is how to address the question in a way that will be definitive and we are concerned that the experiment suggested by the referee will not generate definitive data. A major issue is that CoQ has low solubility and needs to reach the right compartment. As such if short term treatment (as suggested) does not rescue, it would be difficult to make any definite conclusions as this might just be because insufficient CoQ is delivered to mitochondria. Conversely, if short term treatment does rescue this could be either because CoQ does get into mitochondria and regulate ETC or because of its general antioxidant function. So, even if we observe a rescue after 1 hour of incubation with CoQ, it will not clarify whether this is due to the antioxidant effect or simply because 1 hour is adequate to boost mitoCoQ levels. Thus, in our view this experiment might not get us any closer to the answer. Nevertheless, we do feel this is an important point and we have added the following statement to our revised manuscript to acknowledge this: “Because CoQ can accumulate in various intracellular compartments, it's important to consider that its impact on insulin resistance might be due to its overall antioxidant properties rather than being limited to a mitochondrial effect”

      1. In Figure 1, CoQ depletion due to 4NB treatment resulted in increased ceramide levels. Could this be due to impaired palmitate oxidation leading to rerouting of intracellular palmitate to the ceramide pathway? This could be tested using stable isotope tracers.

      Response: We have added the statement below to the manuscript to address this point. We feel that while an interesting experiment to perform it is somewhat outside of the major focus of this study.

      “One possibility is that CoQ directly controls ceramide turnover (35). An alternate possibility is that CoQ inside mitochondria is necessary for fatty acid oxidation (12) and CoQ depletion triggers lipid overload in the cytoplasm promoting ceramide production (36). Future studies are required to determine how CoQ depletion promotes Cer accumulation. Regardless, these data indicate that ceramide and CoQ have a central role in regulating cellular insulin sensitivity.”

      1. To a similar point, it would be helpful to know if the C2 ceramide analog is sufficient to cause elevated mito-ceramide and/or CoQ depletion. If not, the results might imply mitochondrial uptake of palmitate is required.

      Response: We feel this point is analogous to Point 7 above in that this experiment is not definitive enough to make any clear conclusions as it may or may not work for many different reasons. For example, C2 ceramide may not work simply because it has the wrong chain length.

      Moreover, it is clear that C2 ceramide has effects that clearly differ from those observed with palmitate most notably the inhibitory effect on Akt signalling. For these reasons we do not agree with the logic of this experiment.

      We have mentioned in the results section:

      “Based on these data we surmise that C2-ceramide does not faithfully recapitulate physiological insulin resistance, in contrast to that seen with incubation with palmitate”.

      1. Likewise, does inhibition of CPT1 ameliorate or exacerbate palmitate-induced insulin resistance?

      Response: This experiment has been performed by a number of different labs. For instance, muscle specific CPT1 overexpression is protective against high fat diet induced insulin resistance in mice (Bruce C, PMID19073774), CPT1 overexpression protects L6E9 muscle cells from fatty acid-induced insulin resistance (Sebastian D, PMID17062841) and increased beta-oxidation in muscle cells enhances insulin stimulated glucose metabolism and is protective against lipid induced insulin resistance (Perdomo G, PMID15105415). We have now cited all of these studies in our revised manuscript in the discussion: “In fact, increased fatty acid oxidation is protective against insulin resistance in several model organisms (37–39)”

      1. Does the addition of palmitate to the cells treated with mtSMPD5 further reduce CoQ9 (Figure 2I and 2J)?

      Response: This intriguing observation, as highlighted by the referee, has prompted us to conduct additional experiments to investigate the effects of palmitate and SMPD5 overexpression on Coenzyme Q (CoQ) levels in L6 myotubes. As demonstrated in the figures presented below, both palmitate and SMPD5 overexpression independently resulted in the depletion of CoQ9, with no observed additive effects, suggesting that they share a common pathway driving CoQ9 deficiency. One plausible hypothesis is that ceramides may trigger the depletion of a specific CoQ9 pool localised within the inner mitochondrial membrane, likely the pool associated with Complex I (CI) in the Electron Transport Chain (ETC). This hypothesis is supported by previous studies indicating that approximately 25-35% of CoQ binds to CI (PMID: 33722627) and by our data demonstrating that ceramide induces a selective depletion of CI in L6 myotubes (Fig. 4).

      We have added this result to Fig. 2I in the main section.

      Author response image 10.

      We have added to the result section:

      “Mitochondrial CoQ levels were depleted in both palmitate-treated and mtSMPD5-overexpressing cells without any additive effects. This suggests that these strategies to increase ceramides share a common mechanism for inducing CoQ depletion in L6 myotubes (Fig. 2I).”

      We have added to the discussion section:

      “...These are known to form supercomplexes or respirasomes where ~25 - 35 % of CoQ is localised in mammals (58,16).…The observation that both palmitate and SMPD5 overexpression trigger CoQ depletion without additive effects supports the notion that ceramides may trigger the depletion of a specific CoQ9 pool localised within the inner mitochondrial membrane.”

      1. Some of the cell-based experiments appear to be underpowered and therefore confidence in the interpretations might benefit from additional repeats. For example, in Figure 3i, it appears that palmitate still causes a substantial reduction of CoQ in the cells treated with mtASAH1, even though mito-ceramide levels are restored to baseline. Please specify if these and other results are representative of multiple cell culture experiments or a single experiment.

      Response: All data were derived from a minimum of 3-4 independent experiments from at least two separate cultures of L6 cells. Separate batches of drug treatments were prepared for each experiment. We have previously compared metabolic parameters between batches of cells differentiated at different times (i.e. at least weeks apart) in a previous study (Krycer PMID 31744882) and found variations of <20% for insulin-stimulated glucose oxidation. With an expected variance of 20% and a type I error rate of 0.05, this is sufficient to detect a 40% difference with a power of 0.8. As the reviewer has indicated this is likely underpowered in situations where variance is unexpectedly high or if a small difference needs to be detected.
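The sample-size reasoning above can be sketched as a quick calculation. This is a minimal sketch, not the authors' actual power analysis. It assumes that the "expected variance of 20%" refers to a standard deviation of 20% of the mean, that the comparison is a two-sided, two-group test, and that a normal approximation is acceptable. The function name `samples_per_group` is hypothetical.

```python
from statistics import NormalDist


def samples_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d)^2,
    where d = effect / sd is the standardized effect size.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    d = effect / sd
    return 2 * ((z_alpha + z_power) / d) ** 2


# Assumed inputs: a 40% difference and a 20% standard deviation (d = 2).
n = samples_per_group(effect=40, sd=20)
print(round(n, 2))
```

Under these assumptions the estimate is just under 4 per group, in line with the 3-4 independent experiments described. An exact t-based calculation would call for slightly more replicates at such small n, which is consistent with the caveat about underpowering.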

      In terms of Fig3, the reviewer raises an interesting point. As discussed in point 6, the fact that palmitate still appears to cause a depletion of CoQ in mtASAH1 cells likely indicates that the absolute concentration of CoQ is the determining factor for insulin sensitivity, rather than the relative depletion of CoQ compared to basal conditions. We have added to the discussion:

      “Finally, mtASAH1 overexpression increased CoQ levels. In both control and mtASAH1 cells, palmitate induced a depletion of CoQ, but this effect was less pronounced in the mtASAH1 cell line (Fig. 3I). Our results suggest that the absolute concentration of CoQ is crucial for insulin sensitivity, rather than the relative depletion compared to basal conditions, thus supporting the causal role of mitochondrial ceramide accumulation in reducing CoQ levels in insulin resistance”

      1. The color scheme of 2E is inconsistent with other panels in the figure.

      Response: Corrected

      1. It would be helpful if the axis labels for CoQ graphs were labeled as "Mito-CoQ" for clarity.

      Response: Corrected

    1. Author Response

      The following is the authors’ response to the previous reviews

      We appreciate the positive comments from the editors and reviewers. The following are the point-by-point responses to the questions and comments of the Reviewers:

      Reviewer #1 (Public Review):

      In this study, Jiamin Lin et al. investigated the potential positive feedback loop between ZEB2 and ACSL4, which regulates lipid metabolism and breast cancer metastasis. They reported a correlation between high expression of ZEB2 and ACSL4 and poor survival of breast cancer patients, and showed that depletion of ZEB2 or ACSL4 significantly reduced lipid droplets abundance and cell migration in vitro. The authors also claimed that ZEB2 activated ACSL4 expression by directly binding to its promoter, while ACSL4 in turn stabilized ZEB2 by blocking its ubiquitination. While the topic is interesting, there are several concerns with the study:

      1. My concern regarding the absence of appropriate thresholds or False Discovery Rate (FDR) adjustments for the RNA-seq analysis has not been addressed, leading to incorrect thresholds and erroneous identification of significant signals.

      Response: We thank the reviewer for the concern about the RNA-seq analysis. RNA-seq data was analyzed by the Benjamini and Hochberg’s approach for controlling the false discovery rate. The procedure of RNA-seq bioinformatic analysis is as follows: For data analysis, raw data of fastq format were firstly processed through in-house perl scripts. In this step, clean data were obtained by removing reads containing adapter, reads containing N base and low quality reads from raw data. All the downstream analyses were based on the clean data with high quality. Index of the reference genome was built using Hisat2 v2.0.5 and paired-end clean reads were aligned to the reference genome using Hisat2 v2.0.5. FeatureCounts v1.5.0-p3 was used to count the reads numbers mapped to each gene, and then FPKM of each gene was calculated based on the length of the gene and reads count mapped to this gene. Differential expression analysis of two conditions/groups was performed using the DESeq2 R package (1.20.0). The resulting P-values were adjusted using the Benjamini and Hochberg’s approach for controlling the false discovery rate. Genes with an adjusted P-value (<0.05) found by DESeq2 were assigned as differentially expressed.
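The multiple-testing adjustment named above can be illustrated with a short sketch. This is not the DESeq2 implementation, which is an R package and applies further steps such as independent filtering. It is a plain-Python version of the Benjamini-Hochberg step-up adjustment, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in padj]
print(padj, significant)
```

Genes would then be called differentially expressed when the adjusted value falls below 0.05, as in the threshold stated in the response.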

      1. In Figure 3B and C, it appears that the knockdown efficiency of ACSL4 is inadequate in these cells, which contradicts the Western blot results presented in Figure 2F.

      Response: We thank the reviewer for the concern. In Figures 3B and 3C, we used shRNA for the knockdown experiment, whereas in Figure 2F we used siRNA, so the knockdown efficiencies differed.

      1. Regarding Figure 6, the discovery of consensus binding sequences (CACCT) for ZEB2 alone is insufficient evidence to support the direct binding of ZEB2 to the ACSL4 promoter.

      Response: We thank the reviewer for the concern. We performed chromatin immunoprecipitation (ChIP), which examines the direct interaction between DNA and protein, to test whether ZEB2 directly binds to the ACSL4 promoter. The results showed that primer set 1, which covered -184 to -295 of the ACSL4 promoter region, exhibited apparent ZEB2 binding (Fig. 6F). Moreover, the mutant sequence (AAAA) of the ACSL4 promoter showed significantly decreased luciferase activity (Fig. 7H). Together, this evidence suggests that ZEB2 binds directly to the consensus sequence of the ACSL4 promoter.

      1. For Figure 7E, there are multiple bands present, and it appears that ZEB2-HA has been cropped, which should ideally be presented with unaltered raw data. Please provide the uncropped raw data.

      Response: We thank the reviewer for the concern. The raw data of the figure 7E ZEB2-HA is shown in Author response image 1:

      Author response image 1.

      1. In Figure 7C, the author claimed to have used 293T cells for the ubiquitin assay, which are not breast cancer cells. Moreover, the efficiency of over-expression differs between ZEB2 and ACSL4 in 293T cell lines. Performing the experiment in an unrelated cell line to justify an important interaction is not acceptable.

      Response: We thank the reviewer for the concern. We also performed the ubiquitination assay in MDA-MB-231 cells (Fig. 7D; Author response image 2). The results confirm that knockdown of ACSL4 markedly enhanced the ubiquitination of ZEB2. We also performed the IP experiment in MDA-MB-231 cells (Fig. 7F; Author response image 3). The results confirmed the interaction between ZEB2 and ACSL4:

      Author response image 2.

      Author response image 3.

      Reviewer #2 (Public Review):

      In this study, the authors validated a positive feedback loop between ZEB2 and ACSL4 in breast cancer, which regulates lipid metabolism to promote metastasis.

      Overall, the study is original, well structured, and easy to read.

      We appreciate the positive comments from the reviewer.

      Reviewer #3 (Public Review):

      The manuscript by Lin et al. reveals a novel positive regulatory loop between ZEB2 and ACSL4, which promotes lipid droplets storage to meet the energy needs of breast cancer metastasis.

      We appreciate the positive comments from the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      I still have some points that should be addressed by the Authors:

      The interaction between ACSL4 and ZEB2 is still not convincing, due to the cellular localization of ACSL4 and ZEB2 is different. The authors should consider utilizing the Duolink experiment to more accurately determine the interaction location of these two proteins in cells.

      Response: We appreciate the reviewer’s suggestion. We performed a GST pull-down assay to examine whether ZEB2 and ACSL4 form a complex. The assay confirmed the interaction of ZEB2 and ACSL4 (Supplementary Fig. S10). We also performed an immunofluorescence assay and found that ZEB2 co-localized with ACSL4 in certain regions of the cytoplasm (Author response image 5; Supplementary Fig. S11):

      Author response image 4.

      Author response image 5.

      In Figure S4, the authors showed both "shACSL4" and "siACSL4", which is a description error.

      Response: We thank the reviewer for pointing out the mistake. We have corrected "siACSL4" to "shACSL4".

      Author response image 6.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript is improved.

      We appreciate the positive comments from the reviewer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors attempt to describe alterations in gene expression, protein expression, and protein phosphorylation as a consequence of chronic adenylyl cyclase 8 overexpression in a mouse model. This model is claimed to have resilience to cardiac stress.

      Major strengths of the study include 1) the large dataset generated which will have utility for further scientific inquiry for the authors and others in the field, 2) the innovative approach of using cross-analyses linking transcriptomic data to proteomic and phosphoproteomic data. One weakness is the lack of a focused question and clear relevance to human disease. These are all critical biological pathways that the authors are studying and essentially, they have compiled a database that could be surveyed to generate and test future hypotheses.

      Thank you for your efforts to review our manuscript, we are delighted to learn that you found our approach to link transcriptomic, proteomic and phosphoproteome data in our analysis to be innovative. Your comment that we have not focused on a question with clear relevance to human disease is “right on point!”

      During chronic pathophysiologic states, e.g., chronic heart failure (CHF) in humans, AC/cAMP/PKA/Ca2+ signaling increases progressively as the degree of heart failure progresses, leading to cardiac inflammation mediated, in part, by cyclic-AMP-induced up-regulation of renin-angiotensin system (RAS) signaling. Standard therapies for CHF include β-adrenoreceptor blockers and RAS inhibitors, which, although effective, are suboptimal in ameliorating heart failure progression. One strategy to devise novel and better therapies for heart failure would be to uncover the full spectrum of concentric cardio-protective adaptations that become activated in response to severe, chronic AC/cAMP/PKA/Ca2+-induced cardiac stress.

      We employed unbiased omics analyses, in our prior study (https://elifesciences.org/articles/80949v1) of the mouse harboring cardiac specific overexpression of adenylyl cyclase type 8 (TGAC8), and identified more than 2,000 transcripts and proteins, comprising a broad array of biological processes across multiple cellular compartments, that differed in TGAC8 left ventricle compared to WT. These bioinformatic analyses revealed that marked overexpression of AC8 engages complex, concentric adaptation "circuitry" that has evolved in mammalian cells to confer resilience to stressors that threaten health or life. The main human disease category identified in these analyses was Organismal Injury and Abnormalities, suggesting that defenses against stress were activated as would be expected, in response to cardiac stress. Specific concentric signaling pathways that were enriched and activated within the TGAC8 protection circuitry included cell survival initiation, protection from apoptosis, proliferation, prevention of cardiac-myocyte hypertrophy, increased protein synthesis and quality control, increased inflammatory and immune responses, facilitation of tissue damage repair and regeneration and increased aerobic energetics. These TGAC8 stress response circuits resemble many adaptive mechanisms that occur in response to the stress of disease states and may be of biological significance to allow for proper healing in disease states such as myocardial infarction or failure of the heart. The main human cardiac diseases identified in bioinformatic analyses were multiple types of cardiomyopathies, again suggesting that mechanisms that confer resilience to the stress of chronic increased AC-PKA-Ca2+ signaling are activated in the absence of heart failure in the super-performing TGAC8 heart at 3 months of age.

      In the present study, we performed a comprehensive in silico analysis of transcription, translation, and post-translational patterns, seeking to discover whether the coordinated transcriptome and proteome regulation of the adaptive protective circuitry within the AC8 heart that is common to many types of cardiac disease states identified in our previous study (https://elifesciences.org/articles/80949v1) extends to the phosphoproteome.

      Reviewer #2 (Public Review):

      In this study, the investigators describe an unbiased phosphoproteomic analysis of cardiac-specific overexpression of adenylyl cyclase type 8 (TGAC8) mice that was then integrated with transcriptomic and proteomic data. The phosphoproteomic analysis was performed using tandem mass tag-labeling mass spectrometry of left ventricular (LV) tissue in TGAC8 and wild-type mice. The initial principal component analysis showed differences between the TGAC8 and WT groups. The integrated analysis demonstrated that many stress-response, immune, and metabolic signaling pathways were activated at transcriptional, translational, and/or post-translational levels.

      The authors are to be commended for a well-conducted study with quality control steps described for the various analyses. The rationale for following up on prior transcriptomic and proteomic analyses is described. The analysis appears thorough and well-integrated with the group's prior work. Confirmational data using Western blot is provided to support their conclusions. Their findings have the potential of identifying novel pathways involved in cardiac performance and cardioprotection.

      Thank you for your efforts to review our manuscript. We are delighted that you found our work to be well-conducted, that our analysis was thorough and well-integrated with our prior work in this arena, and that our findings have the potential of identifying novel pathways involved in cardiac performance and cardioprotection.

      Reviewer #1 (Recommendations For The Authors):

      I humbly suggest that the authors reconsider the title, as it could be more clear as to what they are studying. Are the authors trying to highlight pathways related to cardiac resilience? Resilience might be a clearer word than "performance and protection circuitry".

      Thank you for this important comment. We have revised the title accordingly: Reprogramming of cardiac phosphoproteome, proteome and transcriptome confers resilience to chronic adenylyl cyclase-driven stress.

      Perhaps the text can be reviewed in detail by a copy-editor, as there are many grammatically 'awkward' elements (for example, line 56: "mammalians" instead of mammals), inappropriate colloquialisms (for example, line 73: "port-of-call"), and stylistic unevenness that make it difficult to read.

      We have reviewed the text in detail, with the assistance of a copy editor, in order to identify and correct awkward elements and to search for other colloquialisms. Finally, although “stylistic unevenness” to which you refer may be difficult for us to identify during our re-edits, we have tried our best to identify and revise them.

      The best-written sections are the first few paragraphs of the discussion section, which finally clarify why the TGAC8 mouse is important in understanding cardiac resilience to stress and how the present study leverages this model to disentangle the biological processes underlying the resilience. I wish this had been presented in this manner earlier in the paper, (in the abstract and introduction) so I could have had a clearer context in which to interpret the data. It would also be helpful to point out whether the TGAC8 mouse has any correlates with human disease.

      Thank you for this very important comment. Well put! In addition to recasting the title to include the concept of resilience, we have revised both the abstract and introduction to feature what you consider to be important to the understanding of cardiac resilience to stress, and how the present study leverages this model to disentangle the biological processes underlying the resilience.

      Reviewer #2 (Recommendations For The Authors):

      1. How were the cutoffs determined to distinguish between upregulated/downregulated phosphoproteins and phosphopeptides?

      Thank you for this important question. We used the same criteria to distinguish differences between TGAC8 and WT for unnormalized and normalized phosphoproteins, -log10(p-value) > 1.3, and log2FoldChange <= -0.4 (down) or log2FoldChange >= 0.4 (up), as stated in the methods section, main text and figure legend. The results were consistent across all analyses and selectively verified by experiments.
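The cutoffs stated above can be written out as a small classification rule. This is a minimal sketch and not the authors' analysis code. The function name `classify_change` and the labels are hypothetical. Note that -log10(p) > 1.3 corresponds to p below roughly 0.05.

```python
import math


def classify_change(p_value, log2_fold_change,
                    neg_log10_p_cutoff=1.3, lfc_cutoff=0.4):
    """Label a phosphoprotein as 'up', 'down' or 'unchanged'.

    Applies the stated rule: -log10(p-value) > 1.3 together with
    log2FoldChange >= 0.4 (up) or log2FoldChange <= -0.4 (down).
    """
    if -math.log10(p_value) <= neg_log10_p_cutoff:
        return "unchanged"
    if log2_fold_change >= lfc_cutoff:
        return "up"
    if log2_fold_change <= -lfc_cutoff:
        return "down"
    return "unchanged"


# Hypothetical (p-value, log2 fold change) pairs.
for p, lfc in [(0.01, 0.5), (0.01, -0.4), (0.2, 1.0), (0.01, 0.1)]:
    print(p, lfc, classify_change(p, lfc))
```

Keeping both thresholds as parameters makes it easy to rerun the classification under a looser fold-change cutoff as a sensitivity check.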

      1. Were other models assessed for correlation between transcriptome and phosphoproteome other than a linear relationship of log2 fold change?

      Thank you for this comment. In addition to a linear relationship of log2 fold change of molecule expression, we also compared protein activities, e.g., Fig 4F, and pathways enriched from different omics, e.g., Fig 3D, 5J, 6B and 6F.

      1. Figures 1A and 5G seem to show outliers. How many biological and technical replicates would be needed to minimize error?

      Thank you for the question. Figures 1A and 5G were PCA plots which, as expected, manifested some genetic variability among the same genotypes. The PCA plots, however, are useful in determining how the identified items separated, both within and among genotypes. For bioinformatics analysis such as ours, 4-5 samples are sufficient to accomplish this, as demonstrated by separation, by genotype, of samples in PCA. Thus, in addition to discovery of true heterogeneity among the samples, our results are still able to robustly discover the true differences between the genotypes.

      1. Were the up/downregulated genes more likely to be lowly expressed (which would lead to larger log2 changes identified)?

      In response to your query, we calculated the average expression of phosphorylation levels across all samples to observe whether they were expressed in low abundance in all samples. We also generated the MA plots, an application of a Bland–Altman plot, to create a visual representation of omics data. The MA plots in Author response image 1 illustrate that the target molecules with significantly changed phosphorylation levels did not aggregate within the very low abundance. To confirm this conclusion, we adopted two sets of cutoffs: (1) change: -log10(p-value) > 1.3, and log2FoldChange < 0 (down) or log2FoldChange > 0 (up); and (2) change_2: -log10(p-value) > 1.3, and log2FoldChange <= -0.4 (down) or log2FoldChange >= 0.4 (up).

      Author response image 1.
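For readers unfamiliar with MA plots, the two plotted quantities can be computed as follows. This is a minimal sketch under the usual definition, where M is the log2 ratio between conditions and A is the mean log2 abundance. The function name and input values are hypothetical and are not taken from the study data.

```python
import math


def ma_values(abundance_wt, abundance_tg):
    """Return (M, A) for one molecule measured in two conditions.

    M = log2(tg) - log2(wt)          (log2 fold change, y-axis)
    A = (log2(wt) + log2(tg)) / 2    (mean log2 abundance, x-axis)
    """
    log_wt = math.log2(abundance_wt)
    log_tg = math.log2(abundance_tg)
    return log_tg - log_wt, (log_wt + log_tg) / 2


# Hypothetical abundances for three molecules: (WT, TGAC8).
pairs = [(4.0, 16.0), (1024.0, 1024.0), (8.0, 2.0)]
for wt, tg in pairs:
    m, a = ma_values(wt, tg)
    print(f"M={m:.2f}  A={a:.2f}")
```

If significantly changed molecules clustered at low A, that would suggest large fold changes driven by low abundance. The response above reports that no such clustering was seen.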

      1. "We verified some results through wet lab experiments" in the abstract is vague.

      Thank you for the good suggestion. What we meant to indicate here was that genotypic differences in selected proteins, phosphoproteins and RNAs discovered in the omics analyses were verified by western blots, protein synthesis detection, proteasome activity detection, and detection of soluble and insoluble protein fractions. However, we have deleted the reference to the wet lab experiments in the revised manuscript.

      1. There are minor syntactical errors throughout the text.

      Thank you very much for the suggestion. As noted in our response, we have edited and revised those errors throughout the text.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The investigators have performed a state-of-the-art systematic review and meta-analysis of studies that may help to answer the research question of whether administering multiple antibiotics simultaneously prevents antibiotic resistance development in individuals. The number of studies eligible for analysis is very low, and within that low number, there is huge variability in bug-drug combinations studied and most studies had a high risk of bias, further limiting the capability of meta-analysis to answer the research question. In addition, based on I² values there is also huge statistical heterogeneity between outcomes of studies compared, further limiting the predictive value of meta-analysis. In fact, the only 2 studies meeting all eligibility criteria addressed the treatment of Mycobacterium tuberculosis, for which the research question is hardly applicable. The authors, therefore, conclude that "our analysis could not identify any benefit or harm of using a higher or a lower number of antibiotics regarding within-patient resistance development." Apart from articulating this knowledge gap, the findings will not have consequences for patient care, but may stimulate the scientific community to better address this research question in future studies.

      Strengths:

      The systematic and rigorous approach for the review and meta-analysis.

      Weaknesses:

      None identified.

      We thank the reviewer for this thoughtful and positive appraisal of our work.

      Reviewer #2 (Public Review):

      Summary:

      The authors performed a systematic review and meta-analysis to investigate whether the frequency of emergence of resistance is different if combination antibiotic therapy is used compared to fewer antibiotics. The review shows that there is currently insufficient evidence to reach a conclusion due to the limited sample size. High-quality studies evaluating appropriate antimicrobial resistance endpoints are needed.

      Strengths:

      The strengths of the manuscript are that the article addresses a relevant research question that is often debated. The article is well-written and the methodology used is valid. The review shows that there is currently insufficient evidence to reach a conclusion due to the limited sample size. High-quality studies evaluating appropriate antimicrobial resistance endpoints are needed. I have several comments and suggestions for the manuscript.

      Weaknesses:

      Weaknesses of the manuscript are the large clinical and statistical heterogeneity and the lack of clear definitions of acquisition of resistance. Both these weaknesses complicate the interpretation of the study results.

      We thank the reviewer for the positive comments and pointing out where our work can be improved.

      Major comments:

      My main concern about the manuscript is the extent of both clinical and statistical heterogeneity, which complicates the interpretation of the results. I don't understand some of the antibiotic comparisons that are included in the systematic review. For instance the study by Paul et al (50), where vancomycin (as monotherapy) is compared to co-trimoxazole (as combination therapy). Emergence (or selection) of co-trimoxazole in S. aureus is in itself much more common than vancomycin resistance. It is logical and expected to have more resistance in the co-trimoxazole group compared to the vancomycin group, however, this difference is due to the drug itself and not due to co-trimoxazole being a combination therapy. It is therefore unfair to attribute the difference in resistance to combination therapy. Another example is the study by Walsh (71) where rifampin + novobiocin is compared to rifampin + co-trimoxazole. There is more emergence of resistance in the rifampin + co-trimoxazole group but this could be attributed to novobiocin being a different type of antibiotic than co-trimoxazole instead of the difference being attributed to combination therapy. To improve interpretation and reduce heterogeneity my suggestion would be to limit the primary analyses to regimens where the antibiotics compared are the same but in one group one or more antibiotic(s) are added (i.e. A versus A+B). The other analyses are problematic in their interpretation and should be clearly labeled as secondary and their interpretation discussed.

      We acknowledge the presence of statistical and clinical heterogeneity in our overall analysis. The decision to pursue this comprehensive examination was predefined in our previously published study protocol (PROSPERO CRD42020187257) and driven by our interest in whether, despite some differences, we could either identify an overarching effect of combination therapy on resistance or identify factors that explain potential differences in the effect of combination therapy across pathogens/drugs. We indeed find that heterogeneity is high; however, identifying the driving factors of this heterogeneity is difficult because evidence is limited.

      We carried out several subgroup analyses, e.g. explicitly focusing on specific pathogen groups and medical conditions or exploring heterogeneity in treatment arms (figure 3, supplementary materials section 6). However, it is important to highlight that the number of studies available for these subgroup analyses was low. Additionally, recognizing the high heterogeneity within treatment arms, we performed a subgroup analysis focusing solely on resistances of antibiotics common to both arms (supplementary material section 6.1.8; which would avoid comparisons such as the one between vancomycin and co-trimoxazole raised by the reviewer). Unfortunately, this also revealed substantial heterogeneity. While we aimed to address heterogeneity through these subgroup analyses, limitations arose due to the number of studies meeting specific criteria and the nature of data provided by these studies.

      Moreover, regarding the concern about interpreting co-trimoxazole as combination therapy, we acknowledge the confusion surrounding its classification as one or two antibiotics. Despite the common contemporary view of co-trimoxazole as a single antibiotic, we chose to consider it as two antibiotics due to historical practices, as observed in Black et al. (1982), where trimethoprim was compared to trimethoprim and sulfamethoxazole. We recognize that this decision may lead to confusion and we are considering conducting a further sensitivity analysis in the future version of this manuscript, exploring the possibility of considering co-trimoxazole as a single antibiotic. We agree that the slight trend of fewer antibiotics performing better observed for MRSA should not be overinterpreted, as this is driven by the two studies Walsh et al 1993 and Paul et al 2015, as pointed out by the reviewer. In lines 183-186 we discuss the issue that, for better evaluation of antibiotic combination therapy, more studies which use identical antibiotics (i.e. A versus A+B) are needed. We will try to clarify and highlight this in the future version of the manuscript.

      Another concern is about the definition of acquisition of resistance, which is unclear to me. If for example meropenem is administered and the follow-up cultures show Enterococcus species (which is intrinsically resistant to meropenem), does this constitute acquisition of resistance? If so, it would be misleading to determine this as an acquisition of resistance, as many people are colonized with Enterococci and selection of Enterococci under therapy is very common. If this is not considered as the acquisition of resistance please include how the acquisition of resistance is defined per included study.

      Thank you for pointing out this potential ambiguity. Our definition of “acquisition of resistance” is agnostic to bacterial species and hence intrinsically resistant species can be included if they were only detected during the follow-up culture by the studies. We will clarify this in the definition of “acquisition of resistance” in the manuscript (see l. 259-260). However, it was not always clear from the studies which pathogens were acquired or whether intrinsically resistant species were not reported. Therefore, we rely on the studies' specifications of resistant and non-resistant without further classifying data into intrinsic and non-intrinsic resistance. The outcome “acquisition of resistance” can be seen more as a risk assessment for having any resistant bacterium during or after treatment. In contrast, the outcome “emergence of resistance” is more rigorous, demanding the same species to be measured as more resistant during or after treatment.

      Table S1 is not sufficiently clear because it often only contains how susceptibility testing was done but not which antibiotics were tested and how a strain was classified as resistant or susceptible.

      In Table S1, we omitted the listing of antibiotics for which susceptibility testing was performed, as this information is already presented in the main text (Table 1). However, we agree that linking this information better in a future version would benefit the understanding. Given the variability in methods used to assess resistance and the variability in drugs, the comparability of breakpoints is limited. Hence, we decided not to provide further details on this aspect so far.

      Line 85: "Even though within-patient antibiotic resistance development is rare, it may contribute to the emergence and spread of resistance."

      Depending on the bug-drug combination, there is great variation in the propensity to develop within-patient antibiotic resistance. For example: within-patient development of ciprofloxacin resistance in Pseudomonas is fairly common while within-patient development of methicillin resistance in S. aureus is rare. Based on these differences, large clinical heterogeneity is expected and it is questionable whether these studies should be pooled.

      We agree that our formulation neglects differences in prevalence of within-host resistance emergence depending on bug-drug combinations. We will correct this in our upcoming version. (i.e. we will correct our statement to: “Within-patient antibiotic resistance development, even if rare, can contribute to the emergence and spread of resistance.”)

      Line 114: "The overall pooled OR for acquisition of resistance comparing a lower number of antibiotics versus a higher one was 1.23 (95% CI 0.68 - 2.25), with substantial heterogeneity between studies (I2=77.4%)"

      What consequential measures did the authors take after determining this high heterogeneity? Did they explore the source of this large heterogeneity? Considering this large heterogeneity, do the authors consider it appropriate to pool these studies?

      Thank you for highlighting this lack of clarity. In our upcoming version, we will emphasize the sub-analyses conducted to explore heterogeneity (i.e., figure 3 and supplementary materials section 6). Nevertheless, these analyses faced limitations due to the scarcity of evidence and the data provided by the studies. Given the lack of appropriate evidence, it is hard to identify the source of heterogeneity. The decision to pool all studies was pre-specified in our previously published study protocol (PROSPERO CRD42020187257) and was motivated by the question of whether there is a general effect of combination therapy on resistance development, and by our aim to identify factors that explain potential differences in the effect of combination therapy across bug-drug combinations.
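
      For readers unfamiliar with the quantities discussed here, the following is a minimal sketch of how a pooled odds ratio and the I2 statistic can be computed from per-study 2x2 counts. It assumes a DerSimonian-Laird random-effects model and a 0.5 continuity correction, which may differ from the model used in the manuscript; the function name and the example counts are hypothetical and are not data from the included studies.

```python
import math


def pooled_odds_ratio(tables):
    """Pool 2x2 tables with a DerSimonian-Laird random-effects model (an assumption).

    Each table is (events_fewer, total_fewer, events_more, total_more), i.e.
    resistance events and patients in the arm with fewer and with more antibiotics.
    Returns (pooled OR, 95% CI low, 95% CI high, I^2 in percent).
    """
    log_ors, variances = [], []
    for a, n1, c, n2 in tables:
        b, d = n1 - a, n2 - c
        # Continuity correction: add 0.5 to every cell if any cell is zero.
        if 0 in (a, b, c, d):
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        log_ors.append(math.log((a * d) / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)

    # Fixed-effect (inverse-variance) estimate and Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1

    # Between-study variance (tau^2) and I^2 = share of variation beyond chance.
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects estimate with a normal-approximation 95% interval.
    w_re = [1 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(mu), math.exp(mu - 1.96 * se), math.exp(mu + 1.96 * se), i2


# Hypothetical counts for two studies with opposite effects.
print(pooled_odds_ratio([(30, 100, 10, 100), (5, 100, 20, 100)]))
```

      With study effects pointing in opposite directions, as in the hypothetical counts above, I2 is high and the pooled interval is wide, which is the situation described for the overall analysis.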

    1. Author Response

      The following is the authors’ response to the current reviews.

      We confirm that the “count-down” parameter, mentioned by reviewer 1, is indeed counted from the first lockdown day and increases continuously, even when we do not have any data, and that this is clearly written in the manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (Note, while these authors do reference Derryberry et al., I thought that there could have been much more direct comparison between the results of the two approaches).

      We added some more discussion of the differences between the papers.

      One important drawback of the approach, which potentially calls into question the authors' conclusions, is that the acoustic sampling only occurred during the pandemic: for several lockdown periods and then for a period of 10 days immediately after the end of the final lockdown period in May of 2020. Several relevant things changed from March to May of 2020, most notably the shift from spring to summer, and the accompanying shift into and through the breeding season (differing for each of the three focal species). Although the statistical methods included an attempt to address this, neither the inclusion of the "count down" variable nor the temperature variable could account for any non-linear effects of breeding phenology on vocal activity. I found the reliance on temperature particularly troubling, because despite the authors' claims that it was "a good proxy of seasonality", an examination of the temperature data revealed a considerable non-linear pattern across much of the study duration. In addition, using a period immediately after the lockdowns as a "no-lockdown" control meant that any lingering or delayed effects of human activity changes in the preceding two months could still have been relevant (not to mention the fact that despite the end of an official lockdown, the pandemic still had dramatic effects on human activity during late May 2020).

      In general, the reviewer is correct, and we reformulated some of the text to more carefully address these points. However, we would like to note two things: (1) Changes occurred rapidly, with birds quickly changing their behavior. This is one of the main conclusions of our study, i.e., that urban-dwelling animals are highly plastic in behavior, so lingering effects were unlikely. (2) Changes occurred in both directions, and thus seasonality (which is expected to have a uni-directional effect) cannot explain everything we observed. We are not sure what the reviewer means by ‘considerable non-linear patterns’ when referring to the temperature. Except for ~5 days with temperatures that exceeded the expected average by 3-4 degrees, the temperature increased approximately linearly during the period as expected from seasonality (see Author response image 1). Following the reviewer’s comment, we tested whether exclusion of data from these days changes the results and found no change.

      We would like to note that in terms of breeding, all birds were within the same state during both the lockdown and the non-lockdown periods. Parakeets and crows have a long breeding season, from February to the end of June, with one cycle. They stay around the nest throughout this season, especially at its peak in March-May. Prinias start slightly later, at the beginning of March, with 2-3 cycles until the end of June.

      Regarding the comment about human activity, as we now also note in the manuscript, reality in Israel was actually the opposite of the reviewer’s suggestion with people returning to normal behavior towards the end of the lockdown (even before its official removal). We believe that this added noise to our results, and that the effect of the lockdown was probably higher than we observed.

      Author response image 1.

      Another weakness of the current version of the manuscript is the use of a supposed "contradiction" in the existing literature to create the context for the present study. Although the various studies cited do have many differences in their results, those other papers lay out many nuanced hypotheses for those differences. Almost none of the studies cited in this manuscript actually reported blanket increases or decreases in urban birds, as suggested here, and each of those papers includes examples of species that showed different responses. To suggest that they are on opposite sides of a supposed dichotomy is a misrepresentation. Many of those other studies also included a larger number of different species, whereas this study focused on three. Finally, this study was completed at a much finer spatial scale than most others and was examining micro-habitat differences rather than patterns apparent across landscapes. I believe that highlighting differences in scale to explain nuanced differences among studies is a much better approach that more accurately adds to the body of literature.

      We thank the reviewer for this good feedback and revised the manuscript accordingly, placing more emphasis on the micro-scale of this study.

      Finally a note on L244-247: I would recommend against discounting the possibility that lockdowns resulted in changes to the birds' vocal acoustics, as Derryberry et al. 2020 found, especially while suggesting that their results were the effects of signal processing artifacts. Audio analysis is not my area of expertise, but isn't it possible that the birds did increase call intensity, but were simply not willing (or able) to increase it to the same degree as the additional ambient noise?

      This is an important question. The fact is that when ambient noise increases (at the relevant frequency channels), then the measured vocalizations will also increase. There is no way to separate the two effects. Thus, as scientists, when we cannot measure an effect, it is safer not to suggest an effect. Unfortunately, most studies that claim an increase in vocalizations’ intensity in noise, do not account for this potential artifact (and most of them do not estimate noise at a species-specific level as we have done). This has created a lot of “noise” in the field. We do not want to criticize the Derryberry results without analyzing the data, but from reading their methods it does not seem like they took the noise into account in their acoustic measurements. But if you look at their figure 4A you will see a lot of variability in measuring the minimum frequency – which could be strongly affected by ambient noise.

      In light of the above, we thus prefer to be careful and not to state changes that are probably false. We added some of this information to the manuscript. We also added the linear equations to the graph (in the caption of figure 3) where it can be seen that the slope is always <=1.
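
      The artifact described above can be illustrated with a short sketch. It assumes that the call and the ambient noise are uncorrelated, so that their powers add; the function name and the levels are hypothetical and this is not the analysis code used in the study.

```python
import math


def measured_level_db(call_db, noise_db):
    """Level of a call plus uncorrelated noise, summed on a power scale."""
    return 10 * math.log10(10 ** (call_db / 10) + 10 ** (noise_db / 10))


# Hypothetical: a call emitted at a constant 60 dB while ambient noise rises.
for noise_db in (40, 50, 60, 70):
    print(noise_db, round(measured_level_db(60, noise_db), 2))
```

      Under this assumption the measured level rises with noise even though the emitted level is fixed, and it never rises faster than the noise itself, which is consistent with a slope of at most 1.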

      Reviewer 2:

      The explanation of methods can be improved. For example, it is not clear if data were low-pass filtered before resampling to avoid aliasing.

      We edited the methods and hopefully they are clearer now. Regarding the specific question – yes, an LPF was applied to prevent aliasing before the resampling. This information was added to the manuscript.
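
      As an illustration of low-pass filtering before resampling, here is a minimal pure-Python sketch. The Hamming-windowed sinc filter, the 101 taps and the 10% guard band are assumptions made for the example and are not the filter settings used in the manuscript.

```python
import math


def lowpass_fir(cutoff, num_taps=101):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sample rate (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at 0 Hz


def decimate(signal, factor, num_taps=101):
    """Low-pass below the new Nyquist frequency, then keep every `factor`-th sample."""
    taps = lowpass_fir(0.9 * 0.5 / factor, num_taps)  # 10% guard band (arbitrary)
    half = num_taps // 2
    out = []
    for i in range(0, len(signal), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half  # filter centred on sample i; zeros assumed past the edges
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out


# Hypothetical 8 kHz recording reduced to 2 kHz: a 3 kHz tone lies above the
# new Nyquist frequency (1 kHz) and is removed instead of aliasing.
fs = 8000
tone = [math.sin(2 * math.pi * 3000 * n / fs) for n in range(2000)]
print(max(abs(x) for x in decimate(tone, 4)[50:-50]))
```

      Without the filtering step, keeping every fourth sample of the 3 kHz tone would fold it to a lower frequency in the resampled signal.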

      It is quite possible that birds move into the trees and further from the recorders with human activity. Since sound level decreases by the square of the distance of the source from the recorders, this could significantly affect the data. As indicated in the Discussion, this is a significant parameter that could not be controlled.

      The reviewer is correct, and we addressed this point. Such biases could arise with any type of surveying including manual transects (except for perhaps, placing tags on the animals). We note that we only analyzed high SNR signals and that the species we selected somewhat overcome this bias – both crows and parakeets are not shy and Prinias are anyway shy and prefer to not be out in the open. We would also expect to see a stronger effect for human speech if this was a central phenomenon, and we did not see this, but of course this might have affected our results.
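
      To give a sense of the size of the distance effect raised by the reviewer, the sketch below applies the inverse-square law. It assumes free-field spherical spreading from a point source and ignores absorption by air or vegetation; the function name and distances are hypothetical.

```python
import math


def level_drop_db(near_m, far_m):
    """Drop in sound level (dB) when a point source moves from near_m to far_m."""
    # Intensity falls as 1/r^2, so the level falls by 10*log10((far/near)^2).
    return 20 * math.log10(far_m / near_m)


# Hypothetical: a bird moving from 5 m to 20 m from the recorder.
print(round(level_drop_db(5, 20), 1))
```

      Each doubling of distance lowers the received level by about 6 dB under these assumptions, so restricting the analysis to high-SNR signals favours birds close to the recorder.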

      In interpreting the data, the authors mention the effect of human activity on bird vocalizations in the context of inter-species predator-prey interactions; however, the presence of humans could also modify intraspecies interactions by acting as triggers for communication of warning and alarm, and/or food calls (as may sometimes be the case) to conspecifics. Along the same lines, it is important to have a better understanding of the behavioral significance of the syllables used to monitor animal activity in the present study.

      We agree with this point and added more discussion of both this potential bias and the type of syllables that were analyzed.

      Another potential effect that may influence the results but is difficult to study, relates to the examination of vocalizations near to the ambient noise level. This is the bandwidth of sound levels where most significant changes may occur, for example, due to the Lombard effect demonstrated in bird and bat species. However, as indicated, these are also more difficult to track and quantify. Moreover, human generated noise, other than speech, may be a more relevant factor in influencing acoustic activity of different bird species. Speech, per se, similar to the vocalizations of many other species, may simply enrich the acoustic environment so that the effects observed in the present study may be transient without significant long-term consequences.

      We note that we already included a noise parameter (in addition to human speech) in the original manuscript. Following the reviewer’s comment, we examined another factor, namely we replaced the previous ambient noise parameter with an estimate of ambient noise under 1kHz which should reflect most anthropogenic noise (not restricted to human speech). This model gave very similar results to the previous one (which is not very surprising as noise is usually correlated). We added this information to the revised manuscript, and we now also added examples of anthropogenic noise to the supplementary materials (Fig. S8). In general, we accept the comments made by the reviewer, but would like to emphasize that we only analyze high SNR vocalization (and not vocalizations that were close to the noise level). This strategy should have overcome biases that resulted from slight changes in ambient noise.

      In general, the authors achieved their aim of illustrating the complexity of the effect of human activity on animal behavior. At the same time, their study also made it clear that estimating such effects is not simple given the dynamics of animal behavior. For example, seasonality, temperature changes, animal migration and movement, as well as interspecies interactions, such as related to predator-prey behavior, and inter/intra-species competition in other respects can all play into site-specific changes in the vocal activity of a particular species.

      We completely agree and tried to further emphasize this in the revised manuscript. This is one of the main conclusions of this study – we should be careful when reaching conclusions.

      Although the methods used in the present study are statistically rigorous, a multivariate approach and visualization techniques afforded by principal components analysis and multidimensional scaling methods may be more effective in communicating the overall results.

      Following this comment, we ran a discriminant function analysis (DFA) with the parameters of the best model (site category, ambient noise, human activity, temperature and lockdown state) with the task of classifying the level of bird activity. The DFA managed to classify activity significantly above chance, and the weights of the parameters revealed some insight about their relative importance. We added this information to the revised manuscript.

      Suggestions for improvement:

      In Figure 2, the labeling of the Y-axis in the right panel should be moved to the left, similar to A and C. This will provide clear separation between the two side-to-side panels.

      Revised

      In Figure 3, it will be good to see the regression lines (as dashed lines) separately for the lockdown and no-lockdown conditions in addition to the overall effect.

      Revised

      Editor:

      Limitations

      Scale: The study's limited spatial and temporal scale was not addressed by the authors, which contrasts with the broader scope of other cited studies. To enhance the significance of the study, it would be beneficial to acknowledge and clearly highlight this limitation and its potential caveats, and to modify the language used throughout the text accordingly. Furthermore, although the authors examined slight variations in habitat, it is important to note that all sites were primarily located within an urban landscape.

      We revised the manuscript accordingly.

      Control period: The control period is significantly shorter than the lockdown treatment period and occurs at a different time of year, potentially impacting the vocalization patterns of birds due to different annual cycle stages. It is crucial to consider that the control period falls within the pandemic timeframe despite being shortly after the lockdowns ended.

      Revised – we included a control comparison to periods of equal length within the lockdown. People gradually stopped obeying the lockdown regulations before its removal so in fact, the official removal date is probably an overestimate for the effect of the lockdown. We now explain this.

      Recommendations

      Human-generated noise, beyond speech, might have a greater influence on the acoustic activity of various bird species, but previous studies lacked detailed human activity data. Instead of solely noting the number of human talkers, the authors could quantify other aspects of human activity such as vehicles or overall anthropogenic noise volume. Exploring the relationships between these factors and bird activity at a fine scale, while disentangling them from bird detection, would be compelling. It is important to consider the potential difficulty in resolving other anthropogenic sounds within a specific bandwidth, which could be demonstrated to readers through spectrograms and potential post-pandemic changes. Such information, including daily coefficient of variation/fluctuation rather than absolute frequency spectra, could provide valuable insights.

      We note that we have already included an ambient noise factor (in addition to human speech) in the previous version. Following the reviewers’ comments, we examined another factor, namely we replaced the current ambient noise parameter with the ambient noise under 1kHz, which should reflect most anthropogenic noise (not restricted to human speech). This model gave very similar results to the previous one (which is not surprising as noise is usually correlated). We also added several spectrograms in the Supplementary material that show examples of different types of noise.

      Authors should limit their data interpretation to the impact of lockdown on behavioral responses within small-scale variations in habitat. A key critique is the assumption that activity changes solely resulted from the lockdown, disregarding other environmental factors and phenology.

      Following the editor’s comment, we realized that our conclusions/assertions were not clear. We never claimed that activity changes solely resulted from the lockdown. While revising the manuscript we ensured that we show a significant effect of temperature, ambient noise and human activity, all of which are not dependent on lockdown. We made an effort to emphasize the complexity of the system. We show that the lockdown seemed to have an additional impact, but we never claimed it was the only factor.

      To address this, the authors could compare acoustic monitoring data within a shorter timeframe before and after the lockdown (20 days), while also controlling for temperature effects, to strengthen the validity of their claims. They would need to explain in their discussion, however, that such a comparison may still be confounded by any carry-over effects from the 10 days of treatment.

      This analysis would be difficult because although the lockdown was officially removed on a specific date, it was gradually less respected by the citizens and thus the last period of the lockdown was somewhere between lockdown and no-lockdown. This is why we chose the approach of taking 10 days randomly from within the lockdown period and comparing them with the 10 post-lockdown days. We now explain this reasoning more clearly.
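
      The equal-length comparison can be sketched as follows. This is only an illustration of the general idea of repeatedly drawing as many lockdown days as there are post-lockdown days; the function name, the number of draws, and the use of a difference in daily means are assumptions, and the activity values are made up.

```python
import random
from statistics import mean


def subsampled_differences(lockdown_days, post_days, n_draws=1000, seed=0):
    """Mean activity of random equal-length lockdown subsets minus the post-lockdown mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = len(post_days)
    post_mean = mean(post_days)
    return [mean(rng.sample(lockdown_days, k)) - post_mean for _ in range(n_draws)]


# Hypothetical daily call counts: 50 lockdown days and 10 post-lockdown days.
gen = random.Random(1)
lockdown = [gen.gauss(120, 15) for _ in range(50)]
post = [gen.gauss(100, 15) for _ in range(10)]
diffs = subsampled_differences(lockdown, post)
print(min(diffs), max(diffs))
```

      Looking at the spread of the differences across draws shows whether a lockdown effect depends on which 10 lockdown days happen to be selected.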

      An option is that authors could frame their analysis as a study of the behavior of wildlife coming out of a lockdown, to draw a distinction from other studies that compared pre-pandemic data to pandemic data.

      Good idea – revised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank all three Reviewers for their comments and have revised the manuscript accordingly.

      Reviewer #1 (Public Review):

      The main objective of this paper is to report the development of a new intramuscular probe that the authors have named Myomatrix arrays. The goal of the Myomatrix probe is to significantly advance the current technological ability to record the motor output of the nervous system, namely fine-wire electromyography (EMG). Myomatrix arrays aim to provide large-scale recordings of multiple motor units in awake animals under dynamic conditions without undue movement artifacts and maintain long-term stability of chronically implanted probes. Animal motor behavior occurs through muscle contraction, and the ultimate neural output in vertebrates is at the scale of motor units, which are bundles of muscle fibers (muscle cells) that are innervated by a single motor neuron. The authors have combined multiple advanced manufacturing techniques, including lithography, to fabricate large and dense electrode arrays with mechanical features such as barbs and suture methods that would stabilize the probe's location within the muscle without creating undue wiring burden or tissue trauma. Importantly, the fabrication process they have developed allows for rapid iteration from design conception to a physical device, which allows for design optimization of the probes for specific muscle locations and organisms. The electrical output of these arrays is processed through a variety of means to try to identify single motor unit activity. At the simplest, the approach is to use thresholds to identify motor unit activity. Of intermediate data analysis complexity is the use of principal component analysis (PCA, a linear second-order regression technique) to disambiguate individual motor units from the wide field recordings of the arrays, which benefits from the density and numerous recording electrodes. At the highest complexity, they use spike sorting techniques that were developed for Neuropixels, a large-scale electrophysiology probe for cortical neural recordings. Specifically, they use an estimation code called kilosort, which ultimately relies on clustering techniques to separate the multi-electrode recordings into individual spike waveforms.

      The biggest strength of this work is the design and implementation of the hardware technology. It is undoubtedly a major leap forward in our ability to record the electrical activity of motor units. The myomatrix arrays trounce fine-wire EMGs when it comes to the quality of recordings, the number of simultaneous channels that can be recorded, their long-term stability, and resistance to movement artifacts.

      The primary weakness of this work is its reliance on kilosort in circumstances where most of the channels end up picking up the signal from multiple motor units. As the authors quite convincingly show, this setting is a major weakness for fine-wire EMG. They argue that the myomatrix array succeeds in isolating individual motor unit waveforms even in that challenging setting through the application of kilosort.

      Although the authors call the estimated signals as well-isolated waveforms, there is no independent evidence of the accuracy of the spike sorting algorithm. The additional step (spike sorting algorithms like kilosort) to estimate individual motor unit spikes is the part of the work in question. Although the estimation algorithms may be standard practice, the large number of heuristic parameters associated with the estimation procedure are currently tuned for cortical recordings to estimate neural spikes. Even within the limited context of Neuropixels, for which kilosort has been extensively tested, basic questions like issues of observability, linear or nonlinear, remain open. By observability, I mean in the mathematical sense of well-posedness or conditioning of the inverse problem of estimating single motor unit spikes given multi-channel recordings of the summation of multiple motor units. This disambiguation is not always possible. kilosort's validation relies on a forward simulation of the spike field generation, which is then truth-tested against the sorting algorithm. The empirical evidence is that kilosort does better than other algorithms for the test simulations that were performed in the context of cortical recordings using the Neuropixels probe. But this work has adopted kilosort without comparable truth-tests to build some confidence in the application of kilosort with myomatrix arrays.

      Kilosort was developed to analyze spikes from neurons rather than motor units and, as Reviewer #1 correctly points out, despite a number of prior validation studies the conditions under which Kilosort accurately identifies individual neurons are still incompletely understood. Our application of Kilosort to motor unit data therefore demands that we explain which of Kilosort’s assumptions do and do not hold for motor unit data and explain how our modifications of the Kilosort pipeline account for important differences between neural and muscle recordings. We summarize these modifications below and have included them in the revised manuscript.

      Additionally, both here and in the revised paper we emphasize that while the presented spike sorting methods (thresholding, PCA-based clustering, and Kilosort) robustly extract motor unit waveforms, spike sorting of motor units is still an ongoing project. Our future work will further elaborate how differences between cortical and motor unit data should inform approaches to spike sorting as well as develop simulated motor unit datasets that can be used to benchmark spike sorting methods.

      For our current revision, we have added detailed discussion (see “Data analysis: spike sorting”) of the risks and benefits of our use of Kilosort to analyze motor unit data, in each case clarifying how we have modified the Kilosort code with these issues in mind:

      “Modification of spatial masking: Individual motor units contain multiple muscle fibers (each of which is typically larger than a neuron’s soma), and motor unit waveforms can often be recorded across spatially distant electrode contacts as the waveforms propagate along muscle fibers. In contrast, Kilosort - optimized for the much more local signals recorded from neurons - uses spatial masking to penalize templates that are spread widely across the electrode array. Our modifications to Kilosort therefore include ensuring that Kilosort searches for motor unit templates across all (and only) the electrode channels inserted into a given muscle. In the GitHub repository linked above, this is accomplished by setting parameter nops.sigmaMask to infinity, which effectively eliminates spatial masking in the analysis of the 32 unipolar channels recorded from the injectable Myomatrix array schematized in Supplemental Figure 1g. In cases including chronic recording from mice where only a single 8-contact thread is inserted into each muscle, a similar modification can be achieved with a finite value of nops.sigmaMask by setting parameter NchanNear, which represents the number of nearby EMG channels to be included in each cluster, to equal the number of unipolar or bipolar data channels recorded from each thread. Finally, note that in all cases Kilosort parameter NchanNearUp (which defines the maximum number of channels across which spike templates can appear) must be reset to be equal to or less than the total number of Myomatrix data channels.”

      “Allowing more complex spike waveforms: We also modified Kilosort to account for the greater duration and complexity (relative to neural spikes) of many motor unit waveforms. In the code repository linked above, Kilosort 2.5 was modified to allow longer spike templates (151 samples instead of 61), more spatiotemporal PCs for spikes (12 instead of 6), and more left/right eigenvector pairs for spike template construction (6 pairs instead of 3). These modifications were crucial for improving sorting performance in the nonhuman primate dataset shown in Figure 3, and in a subset of the rodent datasets (although they were not used in the analysis of mouse data shown in Fig. 1 and Supplemental Fig. 2a-f).”
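      The effect of the spatial-masking change described above can be illustrated with a minimal sketch. It assumes that the mask weights each channel with a Gaussian falloff in distance from a template's peak channel; that functional form, the finite value of 30, the contact distances, and the helper names `mask_weight` and `clamp_nchan_near_up` are assumptions made for illustration and are not taken from the Kilosort code or the authors' repository.

```python
import math


def mask_weight(distance_um: float, sigma_mask_um: float) -> float:
    """Assumed Gaussian spatial weight for a channel at a given distance
    from a template's peak channel (illustrative, not Kilosort's own code)."""
    if math.isinf(sigma_mask_um):
        # Infinite sigma: every channel gets full weight, so there is
        # effectively no spatial masking.
        return 1.0
    return math.exp(-(distance_um ** 2) / (2.0 * sigma_mask_um ** 2))


def clamp_nchan_near_up(requested: int, n_data_channels: int) -> int:
    """Keep NchanNearUp at or below the number of recorded data channels,
    as the quoted passage requires."""
    return min(requested, n_data_channels)


# Hypothetical contact distances (micrometers) from the peak channel.
distances_um = [0.0, 100.0, 500.0, 2000.0]

# Finite sigma (illustrative value): distant contacts are strongly penalized.
local_weights = [mask_weight(d, 30.0) for d in distances_um]

# Infinite sigma: all contacts are weighted equally.
unmasked_weights = [mask_weight(d, math.inf) for d in distances_um]

# Hypothetical request of 64 channels on a 32-channel array is reduced to 32.
nchan_near_up = clamp_nchan_near_up(64, 32)
```

      Under this assumption, a finite sigma drives the weight of distant contacts toward zero, which would suppress motor unit waveforms that propagate along a thread, whereas an infinite sigma leaves all contacts in play.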

      Furthermore, as the paper on the latest version of kilosort, namely v4, discusses, differences in the clustering algorithm are the likely reason for kilosort4 performing more robustly than kilosort2.5 (used in the myomatrix paper). Given such dependence on details of the implementation and the use of an older kilosort version in this paper, the evidence that the myomatrix arrays truly record individual motor units under all the types of data obtained is in question.

      We chose to modify Kilosort 2.5, which has been used by many research groups to sort spike features, rather than the just-released Kilosort 4.0. Although future studies might directly compare the performance of these two versions on sorting motor unit data, we feel that such an analysis is beyond the scope of this paper, which aims primarily to introduce our electrode technology and demonstrate that a wide range of sorting methods (thresholding, PCA-based waveform clustering, and Kilosort) can all be used to extract single motor units. Additionally, note that because we have made several significant modifications to Kilosort 2.5 as described above, it is not clear what a “direct” comparison between different Kilosort versions would mean, since the procedures we provide here are no longer identical to version 2.5.

      There is an older paper with a similar goal to use multi-channel recording to perform source localization that the authors have failed to discuss. Given the striking similarity of goals and the divergence of approaches (the older paper uses a surface electrode array), it is important to know the relationship of the myomatrix array to the previous work. Like myomatrix arrays, the previous work also derives inspiration from cortical recordings; in that case it uses the approach of source localization in large-scale EEG recordings using skull caps, but applies it to surface EMG arrays. Ref: van den Doel, K., Ascher, U. M., & Pai, D. K. (2008). Computed myography: three-dimensional reconstruction of motor functions from surface EMG data. Inverse Problems, 24(6), 065010.

      We thank the Reviewer for pointing out this important prior work, which we now cite and discuss in the revised manuscript under “Data analysis: spike sorting” [lines 318-333]:

      “Our approach to spike sorting shares the same ultimate goal as prior work using skin-surface electrode arrays to isolate signals from individual motor units but pursues this goal using different hardware and analysis approaches. A number of groups have developed algorithms for reconstructing the spatial location and spike times of active motor units (Negro et al. 2016; van den Doel, Ascher, and Pai 2008) based on skin-surface recordings, in many cases drawing inspiration from earlier efforts to localize cortical activity using EEG recordings from the scalp (Michel et al. 2004). Our approach differs substantially. In Myomatrix arrays, the close electrode spacing and very close proximity of the contacts to muscle fibers ensure that each Myomatrix channel records from a much smaller volume of tissue than skin-surface arrays. This difference in recording volume in turn creates different challenges for motor unit isolation: compared to skin-surface recordings, Myomatrix recordings include a smaller number of motor units represented on each recording channel, with individual motor units appearing on a smaller fraction of the sensors than is typical in a skin-surface recording. Because of this sensor-dependent difference in motor unit source mixing, different analysis approaches are required for each type of dataset. Specifically, skin-surface EMG analysis methods typically use source-separation approaches that assume that each sensor receives input from most or all of the individual sources within the muscle as is presumably the case in the data. In contrast, the much sparser recordings from Myomatrix are better decomposed using methods like Kilosort, which are designed to extract waveforms that appear only on a small, spatially-restricted subset of recording channels.”

      The incompleteness of the evidence that the myomatrix array truly measures individual motor units is limited to the setting where multiple motor units have similar magnitude of signal in most of the channels. In the simpler data setting where one motor unit dominates in some channel (this seems to occur with some regularity), the myomatrix array is a major advance in our ability to understand the motor output of the nervous system. The paper is a trove of innovations in manufacturing technique, array design, suture and other fixation devices for long-term signal stability, and customization for different muscle sizes, locations, and organisms. The technology presented here is likely to achieve rapid adoption in multiple groups that study motor behavior, and would probably lead to new insights into the spatiotemporal distribution of the motor output in more naturally behaving animals than is the current state of the field.

      We thank the Reviewer for this positive evaluation and for the critical comments above.

      Reviewer #2 (Public Review):

      Motoneurons constitute the final common pathway linking central impulse traffic to behavior, and neurophysiology faces an urgent need for methods to record their activity at high resolution and scale in intact animals during natural movement. In this consortium manuscript, Chung et al. introduce high-density electrode arrays on a flexible substrate that can be implanted into muscle, enabling the isolation of multiple motor units during movement. They then demonstrate these arrays can produce high-quality recordings in a wide range of species, muscles, and tasks. The methods are explained clearly, and the claims are justified by the data. While technical details on the arrays have been published previously, the main significance of this manuscript is the application of this new technology to different muscles and animal species during naturalistic behaviors. Overall, we feel the manuscript will be of significant interest to researchers in motor systems and muscle physiology, and we have no major concerns. A few minor suggestions for improving the manuscript follow.

      We thank the Reviewer for this positive overall assessment.

      The authors perhaps understate what has been achieved with classical methods. To further clarify the novelty of this study, they should survey previous approaches for recording from motor units during active movement. For example, Pflüger & Burrows (J. Exp. Biol. 1978) recorded from motor units in the tibial muscles of locusts during jumping, kicking, and swimming. In humans, Grimby (J. Physiol. 1984) recorded from motor units in toe extensors during walking, though these experiments were most successful in reinnervated units following a lesion. In addition, the authors might briefly mention previous approaches for recording directly from motoneurons in awake animals (e.g., Robinson, J. Neurophys. 1970; Hoffer et al., Science 1981).

      We agree and have revised the manuscript to discuss these and other prior uses of traditional EMG, including here [lines 164-167]:

      “The diversity of applications presented here demonstrates that Myomatrix arrays can obtain high-resolution EMG recordings across muscle groups, species, and experimental conditions including spontaneous behavior, reflexive movements, and stimulation-evoked muscle contractions. Although this resolution has previously been achieved in moving subjects by directly recording from motor neuron cell bodies in vertebrates (Hoffer et al. 1981; Robinson 1970; Hyngstrom et al. 2007) and by using fine-wire electrodes in moving insects (Pfluger 1978; Putney et al. 2023), both methods are extremely challenging and can only target a small subset of species and motor unit populations. Exploring additional muscle groups and model systems with Myomatrix arrays will allow new lines of investigation into how the nervous system executes skilled behaviors and coordinates the populations of motor units both within and across individual muscles…”

      For chronic preparations, additional data and discussion of the signal quality over time would be useful. Can units typically be discriminated for a day or two, a week or two, or longer?

      A related issue is whether the same units can be tracked over multiple sessions and days; this will be of particular significance for studies of adaptation and learning.

      Although the yields of single units are greatest in the 1-2 weeks immediately following implantation, in chronic preparations we have obtained well-isolated single units up to 65 days post-implant. Anecdotally, in our chronic mouse implants we occasionally see motor units on the same channel across multiple days with similar waveform shapes and patterns of behavior-locked activity. However, because data collection for this manuscript was not optimized to answer this question, we are unable to verify whether these observations actually reflect cross-session tracking of individual motor units. For example, in all cases animals were disconnected from data collection hardware in between recording sessions (which were often separated by multiple intervening days) preventing us from continuously tracking motor units across long timescales. We agree with the reviewer that long-term motor unit tracking would be extremely useful as a tool for examining learning and plan to address this question in future studies.

      We have added a discussion of these issues to the revised manuscript [lines 52-59]:

      “…These methods allow the user to record simultaneously from ensembles of single motor units (Fig. 1c,d) in freely behaving animals, even from small muscles including the lateral head of the triceps muscle in mice (approximately 9 mm in length with a mass of 0.02 g; ref. 23). Myomatrix recordings isolated single motor units for extended periods (greater than two months, Supp. Fig. 3e), although highest unit yield was typically observed in the first 1-2 weeks after chronic implantation. Because recording sessions from individual animals were often separated by several days during which animals were disconnected from data collection equipment, we are unable to assess based on the present data whether the same motor units can be recorded over multiple days.”

      Moreover, we have revised Supplemental Figure 3 to show an example of single motor units recorded >2 months after implantation:

      Author response image 1.

      Longevity of Myomatrix recordings. In addition to isolating individual motor units, Myomatrix arrays also provide stable multi-unit recordings of comparable or superior quality to conventional fine wire EMG…. (e) Although individual motor units were most frequently recorded in the first two weeks of chronic recordings (see main text), Myomatrix arrays also isolate individual motor units after much longer periods of chronic implantation, as shown here where spikes from two individual motor units (colored boxes in bottom trace) were isolated during locomotion 65 days after implantation. This bipolar recording was collected from the subject plotted with unfilled black symbols in panel (d).

      It appears both single-ended and differential amplification were used. The authors should clarify in the Methods which mode was used in each figure panel, and should discuss the advantages and disadvantages of each in terms of SNR, stability, and yield, along with any other practical considerations.

      We thank the reviewer for the suggestion and have added text to all figure legends clarifying whether each recording was unipolar or bipolar.

      Is there likely to be a motor unit size bias based on muscle depth, pennation angle, etc.?

      Although such biases are certainly possible, the data presented here are not well-suited to answering these questions. For chronic implants in small animals, the target muscles (e.g. triceps in mice) are so small that the surgeon often has little choice about the site and angle of array insertion, preventing a systematic analysis of this question. For acute array injections in larger animals such as rhesus macaques, we did not quantify the precise orientation of the arrays (e.g. with ultrasound imaging) or the muscle fibers themselves, again preventing us from drawing strong conclusions on this topic. This question is likely best addressed in acute experiments performed on larger muscles, in which the relative orientations of array threads and muscle fibers can be precisely imaged and systematically varied to address this important issue.

      Can muscle fiber conduction velocity be estimated with the arrays?

      We sometimes observe fiber conduction delays up to 0.5 msec as the spike from a single motor unit moves from electrode contact to electrode contact, so spike velocity could be easily estimated given the known spatial separation between electrode contacts. However (closely related to the above question) this will only provide an accurate estimate of muscle fiber conduction velocity if the electrode contacts are arranged parallel to fiber direction, which is difficult to assess in our current dataset. If the arrays are not parallel, this computation will produce an overestimate of conduction velocity, as in the extreme case where a line of electrode contacts arranged perpendicular to the fiber direction might have identical spike arrival times, and therefore appear to have an infinite conduction velocity. Therefore, although Myomatrix arrays can certainly be used to estimate conduction velocity, such estimates should be performed in future studies only in settings where the relative orientation of array threads and muscle fibers can be accurately measured.
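      The geometric argument above can be made concrete with a short sketch. The 2 mm contact spacing and the 4 m/s true conduction velocity are hypothetical values chosen so that the on-axis delay matches the 0.5 msec figure quoted above; the function names are likewise illustrative.

```python
import math


def apparent_velocity(contact_spacing_m: float, delay_s: float) -> float:
    """Velocity estimate from contact spacing and measured inter-contact delay."""
    if delay_s == 0.0:
        # Identical arrival times on both contacts: the estimate diverges.
        return math.inf
    return contact_spacing_m / delay_s


def expected_delay(contact_spacing_m: float,
                   true_velocity_m_s: float,
                   angle_deg: float) -> float:
    """Delay between two contacts when the array lies at angle_deg to the fibers.

    Only the along-fiber component of the spacing contributes to the delay.
    """
    along_fiber_m = contact_spacing_m * math.cos(math.radians(angle_deg))
    return along_fiber_m / true_velocity_m_s


SPACING_M = 0.002        # hypothetical 2 mm between contacts
TRUE_VELOCITY = 4.0      # hypothetical true conduction velocity, m/s

# Array parallel to the fibers: the estimate recovers the true velocity.
parallel = apparent_velocity(SPACING_M, expected_delay(SPACING_M, TRUE_VELOCITY, 0.0))

# Array at 60 degrees to the fibers: the estimate is inflated by 1 / cos(60 deg).
oblique = apparent_velocity(SPACING_M, expected_delay(SPACING_M, TRUE_VELOCITY, 60.0))

# Array perpendicular to the fibers: zero delay, so the estimate is infinite.
perpendicular = apparent_velocity(SPACING_M, 0.0)
```

      In this sketch the apparent velocity equals the true velocity divided by the cosine of the angle between array and fibers, so any misalignment can only inflate the estimate.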

      The authors suggest their device may have applications in the diagnosis of motor pathologies. Currently, concentric needle EMG to record from multiple motor units is the standard clinical method, and they may wish to elaborate on how surgical implantation of the new array might provide additional information for diagnosis while minimizing risk to patients.

      We thank the reviewer for the suggestion and have modified the manuscript’s final paragraph accordingly [lines 182-188]:

      “Applying Myomatrix technology to human motor unit recordings, particularly by using the minimally invasive injectable designs shown in Figure 3 and Supplemental Figure 1g,i, will create novel opportunities to diagnose motor pathologies and quantify the effects of therapeutic interventions in restoring motor function. Moreover, because Myomatrix arrays are far more flexible than the rigid needles commonly used to record clinical EMG, our technology might significantly reduce the risk and discomfort of such procedures while also greatly increasing the accuracy with which human motor function can be quantified. This expansion of access to high-resolution EMG signals – across muscles, species, and behaviors – is the chief impact of the Myomatrix project.”

      Reviewer #3 (Public Review):

      This work provides a novel design of implantable and high-density EMG electrodes to study muscle physiology and neuromotor control at the level of individual motor units. Current methods of recording EMG using intramuscular fine-wire electrodes do not allow for isolation of motor units and are limited by the muscle size and the type of behavior used in the study. The authors of Myomatrix arrays had set out to overcome these challenges in EMG recording and provided compelling evidence to support the usefulness of the new technology.

      Strengths:

      They presented convincing examples of EMG recordings with high signal quality using this new technology from a wide array of animal species, muscles, and behavior.

      • The design included suture holes and pull-on tabs that facilitate implantation and ensure stable recordings over months.

      • Clear presentation of specifics of the fabrication and implantation, recording methods used, and data analysis.

      We thank the Reviewer for these comments.

      Weaknesses:

      The justification for the need to study the activity of isolated motor units is underdeveloped. The study could be strengthened by providing example recordings from studies that try to answer questions where isolation of motor unit activity is most critical. For example, there is immense value in understanding muscles with a smaller innervation ratio, which tend to have many motor neurons for fine control, such as eye and hand muscles.

      We thank the Reviewer for the suggestion and have modified the manuscript accordingly [lines 170-174]:

      “…how the nervous system executes skilled behaviors and coordinates the populations of motor units both within and across individual muscles. These approaches will be particularly valuable in muscles in which each motor neuron controls a very small number of muscle fibers, allowing fine control of oculomotor muscles in mammals as well as vocal muscles in songbirds (Fig. 2g), in which most individual motor neurons innervate only 1-3 muscle fibers (Adam et al. 2021).”

      Reviewer #1 (Recommendations for The Authors):

      I would urge the authors to consider a thorough validation of the spike sorting piece of the workflow. Barring that weakness, this paper has the potential to transform motor neuroscience. The validation efforts of kilosort in the context of Neuropixels might offer a template for how to convince the community of the accuracy of myomatrix arrays in disambiguating individual motor unit waveforms.

      I have a few minor detailed comments that the authors may find of some use. My overall comment is to commend the authors for the precision of the work as well as the writing. However, exercising caution associated with kilosort could truly elevate the paper by showing where there is room for improvement.

      We thank the Reviewer for these comments - please see our summary of our revisions related to Kilosort in our reply to the public reviews above.

      L6-7: The relationship between motor unit action potential and the force produced is quite complicated in muscle. For example, recent work has shown how decoupled the force and EMG can be during nonsteady locomotion. Therefore, it is not a fully justified claim that recording motor unit potentials will tell us what forces are produced. This point relates to another claim made by the authors (correctly) that EMG provides better quality information about muscle motor output in isometric settings than in more dynamic behaviors. That same problem could also apply to motor unit recordings and their relationship to muscle force. The relationship is undoubtedly strong in an isometric setting. But as has been repeatedly established, the electrical activity of muscle is only loosely related to its force output and lacks in predictive power.

      This is an excellent point, and our revised manuscript now addresses this issue [lines 174-176]:

      “…Of further interest will be combining high-resolution EMG with precise measurement of muscle length and force output to untangle the complex relationship between neural control, body kinematics, and muscle force that characterizes dynamic motor behavior. Similarly, combining Myomatrix recordings with high-density brain recordings….”

      L12: There is older work that uses an array of skin mounted EMG electrodes to solve a source location problem, and thus come quite close to the authors' stated goals. However, the authors have failed to cite or provide an in-depth analysis and discussion of this older work.

      As described above in the response to Reviewer 1’s public review comments, we now cite and discuss these papers.

      L18-19: "These limitations have impeded our understanding of fundamental questions in motor control, ..." There are two independently true statements here. First is that there are limitations to EMG based inference of motor unit activity. Second is that there are gaps in the current understanding of motor unit recruitment patterns and modification of these patterns during motor learning. But the way the first few paragraphs have been worded makes it seem like motor unit recordings are a panacea for these gaps in our knowledge. That is not the case for many reasons, including key gaps in our understanding of how muscle's electrical activity relates to its force, how force relates to movement, and how control goals map to specific movement patterns. This manuscript would in fact be strengthened by acknowledging and discussing the broader scope of gaps in our understanding, and thus more precisely pinpointing the specific scientific knowledge that would be gained from the application of myomatrix arrays.

      We agree and have revised the manuscript to note this complexity (see our reply to this Reviewer’s other comment about muscle force, above).

      L140-143: The estimation algorithms yield potential spikes, but without validation of the sorting algorithms it is not justifiable to conclude that the myomatrix arrays have already provided information about individual motor units.

      Please see our replies to Reviewer #1's public comments (above) regarding motor unit spike sorting.

      L181-182: "These methods allow very fine pitch escape routing (<10 µm spacing), alignment between layers, and uniform via formation." I find this sentence hard to understand. Perhaps there is some grammatical ambiguity?

      We have revised this passage as follows [lines 194-197]:

      "These methods allow very fine pitch escape routing (<10 µm spacing between the thin “escape” traces connecting electrode contacts to the connector), spatial alignment between the multiple layers of polyimide and gold that constitute each device, and precise definition of “via” pathways that connect different layers of the device.”

      L240: What is the rationale for choosing this frequency band for the filter?

      Individual motor unit waveforms have peak energy at roughly 0.5-2.0 kHz, although units recorded at very high SNR often have voltage waveform features at higher frequencies. The high- and low-pass cutoff frequencies should reflect this, although there is nothing unique about the 350 Hz and 7,000 Hz cutoffs we describe, and in all recordings similar results can be obtained with other choices of low/high frequency cutoffs.
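      As a rough illustration of why the exact cutoffs matter little, the sketch below computes the analytic gain of a band-pass built from a first-order high-pass in series with a first-order low-pass. The filter order is an assumption made for simplicity (the response does not state the filter design used), and the alternative 300 Hz and 8,000 Hz cutoffs are hypothetical.

```python
import math

HIGHPASS_HZ = 350.0    # high-pass cutoff quoted in the response above
LOWPASS_HZ = 7000.0    # low-pass cutoff quoted in the response above


def bandpass_gain(freq_hz: float,
                  highpass_hz: float = HIGHPASS_HZ,
                  lowpass_hz: float = LOWPASS_HZ) -> float:
    """Magnitude response of a first-order high-pass cascaded with a
    first-order low-pass (assumed design, for illustration only)."""
    hp_ratio = freq_hz / highpass_hz
    lp_ratio = freq_hz / lowpass_hz
    highpass_gain = hp_ratio / math.sqrt(1.0 + hp_ratio ** 2)
    lowpass_gain = 1.0 / math.sqrt(1.0 + lp_ratio ** 2)
    return highpass_gain * lowpass_gain


# Gain inside the 0.5-2.0 kHz band where motor unit energy peaks.
in_band = [bandpass_gain(f) for f in (500.0, 1000.0, 2000.0)]

# Gain at a low frequency typical of mains interference or movement artifact.
low_freq = bandpass_gain(60.0)

# Same in-band frequency with hypothetical alternative cutoffs.
alt_cutoffs = bandpass_gain(1000.0, highpass_hz=300.0, lowpass_hz=8000.0)
```

      With these assumptions the 0.5-2.0 kHz band passes largely intact while low-frequency content is attenuated, and shifting the cutoffs modestly changes the in-band gain only slightly.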

      L527-528: There are some key differences between the electrode array design presented here and traditional fine-wire EMG in terms of features used to help with electrode stability within the muscle. A barb-like structure is formed in traditional fine-wire EMG by bending the wire outside the cannula of the needle used to place it within the muscle. But when the wire is pulled out, it is common for the barb to break off and be left behind. This is because of the extreme (thin) aspect ratio of the barb in fine wire EMG and low-cycle fatigue fracture of the wire. From the schematic shown here, the barb design seems to be stubbier and thus less prone to breaking off. This raises the question of how much damage is inflicted during the pull-out and the associated level of discomfort to the animal as a result. The authors should present a more careful statement and documentation with regard to this issue.

      We have updated the manuscript to highlight the ease of inserting and removing Myomatrix probes, and to clarify that in over 100 injectable insertions and removals there have been zero cases of barbs (or any other part) of the devices breaking off within the muscle [lines 241-249]:

      “…Once the cannula was fully inserted, the tail was released, and the cannula slowly removed. After recording, the electrode and tail were slowly pulled out of the muscle together. Insertion and removal of injectable Myomatrix devices appeared to be comparable or superior to traditional fine-wire EMG electrodes (in which a “hook” is formed by bending back the uninsulated tip of the recording wire) in terms of ease of injection, ease of removal of both the cannula and the array itself, and animal comfort. Moreover, in over 100 Myomatrix injections performed in rhesus macaques, there were zero cases in which Myomatrix arrays broke such that electrode material was left behind in the recorded muscle, representing a substantial improvement over traditional fine-wire approaches, in which breakage of the bent wire tip regularly occurs (Loeb and Gans 1986).”

      Reviewer #2 (Recommendations For The Authors):

      The Abstract states the device records "muscle activity at cellular resolution," which could potentially be read as a claim that single-fiber recording has been achieved. The authors might consider rewording.

      The Reviewer is correct, and we have removed the word “cellular”.

      The supplemental figures could perhaps be moved to the main text to aid readers who prefer to print the combined PDF file.

      After finalizing the paper, we will upload all main-text and supplemental figures into a single PDF on bioRxiv for readers who prefer a single PDF. However, given that the supplemental figures provide more technical and detailed information than the main-text figures, for the paper on the eLife site we prefer the current eLife format in which supplemental figures are associated with individual main-text figures online.

      Reviewer #3 (Recommendations For The Authors):

      • The work could be strengthened by showing examples of simultaneous recordings from different muscles.

      Although Myomatrix arrays can indeed be used to record simultaneously from multiple muscles, in this manuscript we have decided to focus on high-resolution recordings that maximize the number of recording channels and motor units obtained from a single muscle. Future work from our group will introduce larger Myomatrix arrays optimized for recording from many muscles simultaneously.

      • The implantation did not include mention of testing the myomatrix array during surgery by using muscle stimulation to verify correct placement and connection.

      As the Reviewer points out, electrical stimulation is a valuable tool for confirming successful EMG placement. However, we did not use this approach in the current study, relying instead on anatomical confirmation of muscle targeting (e.g. intrasurgical and postmortem inspection in rodents) and on implanting large, easy-to-target arm muscles (in primates) where the risk of mis-targeting is extremely low. Future studies will examine both electrical stimulation and ultrasound methods for confirming the placement of Myomatrix arrays.

      References cited above

      Adam, I., A. Maxwell, H. Rossler, E. B. Hansen, M. Vellema, J. Brewer, and C. P. H. Elemans. 2021. 'One-to-one innervation of vocal muscles allows precise control of birdsong', Curr Biol, 31: 3115-24 e5.

      Hoffer, J. A., M. J. O'Donovan, C. A. Pratt, and G. E. Loeb. 1981. 'Discharge patterns of hindlimb motoneurons during normal cat locomotion', Science, 213: 466-7.

      Hyngstrom, A. S., M. D. Johnson, J. F. Miller, and C. J. Heckman. 2007. 'Intrinsic electrical properties of spinal motoneurons vary with joint angle', Nat Neurosci, 10: 363-9.

      Loeb, G. E., and C. Gans. 1986. Electromyography for Experimentalists, First edition (The University of Chicago Press: Chicago, IL).

      Michel, C. M., M. M. Murray, G. Lantz, S. Gonzalez, L. Spinelli, and R. Grave de Peralta. 2004. 'EEG source imaging', Clin Neurophysiol, 115: 2195-222.

      Negro, F., S. Muceli, A. M. Castronovo, A. Holobar, and D. Farina. 2016. 'Multi-channel intramuscular and surface EMG decomposition by convolutive blind source separation', J Neural Eng, 13: 026027.

      Pfluger, H. J., and M. Burrows. 1978. 'Locusts use the same basic motor pattern in swimming as in jumping and kicking', J Exp Biol, 75: 81-93.

      Putney, J., T. Niebur, L. Wood, R. Conn, and S. Sponberg. 2023. 'An information theoretic method to resolve millisecond-scale spike timing precision in a comprehensive motor program', PLoS Comput Biol, 19: e1011170.

      Robinson, D. A. 1970. 'Oculomotor unit behavior in the monkey', J Neurophysiol, 33: 393-403.

      van den Doel, Kees, Uri M Ascher, and Dinesh K Pai. 2008. 'Computed myography: three-dimensional reconstruction of motor functions from surface EMG data', Inverse Problems, 24: 065010.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Firstly, we must take a moment to express our sincere gratitude to the editorial board for allowing this work to be reviewed, and to the peer reviewers for taking the time and effort to review our manuscript. The reviews are thoughtful and reflect the careful work of scientists who undoubtedly have many things on their schedule. We cannot express our gratitude enough. This is not a minor sentiment. We appreciate the engagement.

      Allow us to briefly highlight some of the changes made to the revised manuscript, most on behalf of suggestions made by the reviewers:

      1) A supplementary figure that includes the calculation of drug applicability and variant vulnerability for a different data set–16 alleles of dihydrofolate reductase, and two antifolate compounds used to treat malaria–pyrimethamine and cycloguanil.

      2) New supplementary figures that add depth to the result in Figure 1 (the fitness graphs): we demonstrate how the rank order of alleles changes across drug environments and offer a statistical comparison of the equivalence of these fitness landscapes.

      3) A new subsection that explains our specific method used to measure epistasis.

      4) Improved main text with clarifications, fixed errors, and other addendums.

      5) Improved referencing and citations, in the spirit of better scholarship (now with over 70 references).

      Next, we’ll offer some general comments that we believe apply to several of the reviews, and to the eLife assessment. We have provided the bulk of the responses in some general comments, and in response to the public reviews. We have also included the suggestions and made brief comments to some of the individual recommendations.

      On the completeness of our analysis

      In our response, we’ll address the completeness issue first, as iterations of it appear in several of the reviews, and it seems to be one of the most substantive philosophical critiques of the work (there are virtually no technical corrections, outside of formatting and grammar fixes, which we are grateful to the reviewers for identifying).

      To begin our response, we will relay that we have now included an analysis of a data set corresponding to mutants of a protein, dihydrofolate reductase (DHFR), from Plasmodium falciparum (a main cause of malaria), across two antifolate drugs (pyrimethamine and cycloguanil). We have also decided to include this new analysis in the supplementary material (see Figure S4).

      Author response image 1.

      Drug applicability and variant vulnerability for 16 alleles of dihydrofolate reductase.

      Here we compute the variant vulnerability and drug applicability metrics for two drugs, pyrimethamine (PYR) and cycloguanil (CYC), both antifolate drugs used to treat malaria. This is a completely different system than the one that is the focus of the submitted paper, for a different biomedical problem (antimalarial resistance), using different drugs, and targets. Further, the new data provide information on both drugs of different kinds, and drug concentrations (as suggested by Reviewer #1; we’ve also added a note about this in the new supplementary material). Note that these data have already been the subject of detailed analyses of epistatic effects, and so we did not include those here, but we do offer that reference:

      ● Ogbunugafor CB. The mutation effect reaction norm (mu-rn) highlights environmentally dependent mutation effects and epistatic interactions. Evolution. 2022 Feb 1;76(s1):37-48.

      ● Diaz-Colunga J, Sanchez A, Ogbunugafor CB. Environmental modulation of global epistasis is governed by effective genetic interactions. bioRxiv. 2022:202211.

      Computing our proposed metrics across different drugs is relatively simple, and we could have populated our paper with suites of similar analyses across data sets of various kinds. Such a paper would, in our view, be spread too thin–the evolution of antifolate resistance and/or antimalarial resistance are enormous problems, with large literatures that warrant focused studies. More generally, as the reviewers doubtlessly understand, simply analyzing more data sets does not make a study stronger, especially one like ours, that is using empirical data to both make a theoretical point about alleles and drugs and offer a metric that others can apply to their own data sets.

      Our approach focused on a data set that allowed us to discuss the biology of a system: a far stronger paper, a far stronger proof-of-concept for a new metric. We will revisit this discussion about the structure of our study. But before doing so, we will elaborate on why the “more is better” tone of the reviews is misguided.

      We also note that the study where the data originate (Mira et al. 2015) is focused on a single data set of a single drug-target system. We should also point out that Mira et al. 2015 made a general point about drug concentrations influencing the topography of fitness landscapes, not unlike our general point about metrics used to understand features of alleles and different drugs in antimicrobial systems.

      This isn’t meant to serve as a feeble appeal to authority – just because something happened in one setting doesn’t make it right for another. But other than a nebulous appeal to the fact that things have changed in the 8 years since that study was published, it is difficult to argue why one study system was permissible for other work but is somehow “incomplete” in ours. Double standards can be appropriate when they are justified, but in this case, it hasn’t been made clear, and there is no technical basis for it.

      Our study does what countless other successful ones do: utilizes a biological system to make a general point about some phenomena in the natural world. In our case, we were focused on the need for more evolution-inspired iterations of widely used concepts like druggability. For example, a recent study of epistasis focused on a single set of alleles, across several drugs, not unlike our study:

      ● Lozovsky ER, Daniels RF, Heffernan GD, Jacobus DP, Hartl DL. Relevance of higher-order epistasis in drug resistance. Molecular biology and evolution. 2021 Jan;38(1):142-51.

      Next, we assert that there is a difference between an eagerness to see a new metric applied to many different data sets (a desire we share, and plan on pursuing in the future), and the notion that an analysis is “incomplete” without it. The latter is a more serious charge and suggests that the researcher-authors neglected to properly construct an argument because of gaps in the data. This charge does not apply to our manuscript, at all. And none of the reviewers effectively argued otherwise.

      Our study contains 7 different combinatorially-complete datasets, each composed of 16 alleles (this is not including the new analysis of antifolates that now appears in the revision). One can call these datasets “small” or “low-dimensional,” if they choose (we chose to put this front-and-center, in the title). They are, however, both complete and as large or larger than many datasets in similar studies of fitness landscapes:

      ● Knies JL, Cai F, Weinreich DM. Enzyme efficiency but not thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase. Molecular biology and evolution. 2017 May 1;34(5):1040-54.

      ● Lozovsky ER, Daniels RF, Heffernan GD, Jacobus DP, Hartl DL. Relevance of higher-order epistasis in drug resistance. Molecular biology and evolution. 2021 Jan;38(1):142-51.

      ● Rodrigues JV, Bershtein S, Li A, Lozovsky ER, Hartl DL, Shakhnovich EI. Biophysical principles predict fitness landscapes of drug resistance. Proceedings of the National Academy of Sciences. 2016 Mar 15;113(11):E1470-8.

      ● Ogbunugafor CB, Eppstein MJ. Competition along trajectories governs adaptation rates towards antimicrobial resistance. Nature ecology & evolution. 2016 Nov 21;1(1):0007.

      ● Lindsey HA, Gallie J, Taylor S, Kerr B. Evolutionary rescue from extinction is contingent on a lower rate of environmental change. Nature. 2013 Feb 28;494(7438):463-7.

      These are only five of very many such studies, some of them very well-regarded.

      Having now gone on about the point about the data being “incomplete,” we’ll next move to the more tangible comment-criticism about the low-dimensionality of the data set, or the fact that we examined a single drug-drug target system (β lactamases, and β-lactam drugs).

      The criticism, as we understand it, is that the authors could have analyzed more data.

      This is a common complaint, that “more is better” in biology. While we appreciate the feedback from the reviewers, we notice that no one specified what constitutes the right amount of data. Some pointed to other single data sets, but would analyzing two different sets qualify as enough? Perhaps to person A, but not to persons B - Z. This is a matter of opinion and is not a rigorous comment on the quality of the science (or completeness of the analysis).

      ● Should we analyze five more drugs of the same target (beta lactamases)? And what bacterial orthologs?

      ● Should we analyze 5 antifolates for 3 different orthologs of dihydrofolate reductase?

      ● And in which species or organism type? Bacteria? Parasitic infections?

      ● And why only infectious disease? Aren’t these concepts also relevant to cancer? (Yes, they are.)

      ● And what about the number of variants in the aforementioned target? Should one aim for small combinatorially complete sets? Or vaster swaths of sequence space, such as the ones generated by deep mutational scanning and other methods?

      We offer these options in part because, for the most part, we were not given an objective suggestion for an appropriate level of detail. This is because there is no answer to the question of what size of dataset would be most appropriate. Unfortunately, without a technical reason why a data set of unspecified size [X] or [Y] is best, then we are left with a standard “do more work” peer review response, one that the authors are not inclined to engage seriously, because there is no scientific rationale for it.

      The most charitable explanation for why more datasets would be better is tied to the abstract notion that seeing a metric measured in different data sets somehow makes it more believable. This, as the reviewers undoubtedly understand, isn’t necessarily true (in fact, many poor studies mask a lack of clarity with lots of data).

      To double down on this take, we’ll even argue the opposite: that our focus on a single drug system is a strength of the study.

      The focus on a single-drug class allows us to practice the lost art of discussing the peculiar biology of the system that we are examining. Even more, the low dimensionality allows us to discuss–in relative detail–individual mutations and suites of mutations. We do so several times in the manuscript, and even connect our findings to literature that has examined the biophysical consequences of mutations in these very enzymes.

      (For example: Knies JL, Cai F, Weinreich DM. Enzyme efficiency but not thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase. Molecular biology and evolution. 2017 May 1;34(5):1040-54.)

      Such detail is only legible in a full-length manuscript because we were able to interrogate a system in good detail. That is, the low-dimensionality (of a complete data set) is a strength, rather than a weakness. This was actually part of the design choice for the study: to offer a new metric with broad application but developed using a system where the particulars could be interrogated and discussed.

      Surely the findings that we recover are engineered for broader application. But to suggest that we need to apply them broadly in order to demonstrate their broad impact is somewhat antithetical to both model systems research and to systems biology, both of which have been successful in extracting general principles from singular (often simple) systems and models.

      An alternative approach, where the metric was wielded across an unspecified number of datasets, would lead to a manuscript that is unfocused, reading like many modern machine learning papers, where the analysis or discussion has little to do with actual biology. We very specifically avoided this sort of study.

      To close our comments regarding data: Firstly, we have considered the comments and analyzed a different data set, corresponding to a different drug-target system (antifolate drugs, and DHFR). Moreover, we don’t think more data has anything to do with a better answer or support for our conclusions or any central arguments. Our arguments were developed from the data set that we used but achieve what responsible systems biology does: introduces a framework that one can apply more broadly. And we develop it using a complete, and well-vetted dataset. If the reviewers have a philosophical difference of opinion about this, we respect it, but it has nothing to do with our study being “complete” or not. And it doesn’t speak to the validity of our results.

      Related: On the dependence of our metrics on drug-target system

      Several comments were made that suggest the relevance of the metric may depend on the drug being used. We disagree with this, and in fact, have argued the opposite: the metrics are specifically useful because they are not encumbered with unnecessary variables. They are the product of rather simple arithmetic that is completely agnostic to biological particulars.

      We explain, in the section entitled “Metric Calculations”:

      “To estimate the two metrics we are interested in, we must first quantify the susceptibility of an allelic variant to a drug. We define susceptibility as $1 - w$, where w is the mean growth of the allelic variant under drug conditions relative to the mean growth of the wild-type/TEM-1 control. If a variant is not significantly affected by a drug (i.e., growth under drug is not statistically lower than growth of wild-type/TEM-1 control, by t-test P-value < 0.01), its susceptibility is zero. Values in these metrics are summaries of susceptibility: the variant vulnerability of an allelic variant is its average susceptibility across drugs in a panel, and the drug applicability of an antibiotic is the average susceptibility of all variants to it.”

      That is, these can be animated to compute the variant vulnerability and drug applicability for data sets of various kinds. To demonstrate this (and we thank the reviewers for suggesting it), we have analyzed the antifolate-DHFR data set as outlined above.
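
      The arithmetic in the quoted "Metric Calculations" passage can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the nested-dictionary layout, and the toy growth values are all assumptions made for the example, and the significance flags stand in for the t-test described in the quote (which is not re-implemented here).

```python
from statistics import mean


def susceptibility(w, significant):
    """Susceptibility is 1 - w when the drug significantly reduces growth, else 0.

    w is the variant's mean growth under drug relative to the wild-type control.
    """
    return 1.0 - w if significant else 0.0


def compute_metrics(rel_growth, is_significant):
    """Return (variant_vulnerability, drug_applicability) as dictionaries.

    rel_growth[variant][drug]     -> relative growth w (hypothetical layout)
    is_significant[variant][drug] -> True if growth is significantly reduced
    """
    table = {
        variant: {
            drug: susceptibility(w, is_significant[variant][drug])
            for drug, w in by_drug.items()
        }
        for variant, by_drug in rel_growth.items()
    }
    # Variant vulnerability: average susceptibility of a variant across drugs.
    variant_vulnerability = {v: mean(row.values()) for v, row in table.items()}
    # Drug applicability: average susceptibility of all variants to one drug.
    drug_names = list(next(iter(table.values())))
    drug_applicability = {d: mean(table[v][d] for v in table) for d in drug_names}
    return variant_vulnerability, drug_applicability


# Toy data (invented for illustration): two variants, two drugs.
rel_growth = {
    "0000": {"drugA": 0.2, "drugB": 0.9},
    "0001": {"drugA": 0.6, "drugB": 0.5},
}
is_significant = {
    "0000": {"drugA": True, "drugB": False},  # drugB effect not significant -> 0
    "0001": {"drugA": True, "drugB": True},
}

variant_vulnerability, drug_applicability = compute_metrics(rel_growth, is_significant)
```

      Because both metrics are plain averages over one table of susceptibilities, the same function applies unchanged to any variant-by-drug (or variant-by-concentration) panel, which is the sense in which the response calls the metrics agnostic to biological particulars.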

      Finally, we will make the following light, but somewhat cynical point (that relates to the “more data” point more generally): the wrong metric applied to 100 data sets is little more than 100 wrong analyses. Simply applying the metric to a wide number of datasets has nothing to do with the veracity of the study. Our study, alternatively, chose the opposite approach: used a data set for a focused study where metrics were extracted. We believe this to be a much more rigorous way to introduce new metrics.

      On the relevance of simulations

      Somewhat relatedly, the eLife summary and one of the reviewers mentioned the potential benefit of simulations. Reviewer 1 correctly highlights that the authors have a lot of experience in this realm, and so generating simulations would be trivial. For example, the authors have been involved in studies such as these:

      ● Ogbunugafor CB, Eppstein MJ. Competition along trajectories governs adaptation rates towards antimicrobial resistance. Nature ecology & evolution. 2016 Nov 21;1(1):0007.

      ● Ogbunugafor CB, Wylie CS, Diakite I, Weinreich DM, Hartl DL. Adaptive landscape by environment interactions dictate evolutionary dynamics in models of drug resistance. PLoS computational biology. 2016 Jan 25;12(1):e1004710.

      ● Ogbunugafor CB, Hartl D. A pivot mutation impedes reverse evolution across an adaptive landscape for drug resistance in Plasmodium vivax. Malaria Journal. 2016 Dec;15:1-0.

      From the above and dozens of other related studies, we’ve learned that simulations are critical for questions about the end results of dynamics across fitness landscapes of varying topography. To simulate across the datasets in the submitted study would be a small ask. We do not provide this, however, because our study is not about the dynamics of de novo evolution of resistance. In fact, our study focuses on a different problem, no less important for understanding how resistance evolves: determining static properties of alleles and drugs, that provide a picture into their ability to withstand a breadth of drugs in a panel (variant vulnerability), or the ability of a drug in a panel to affect a breadth of drug targets.

      The authors speak on this in the Introduction:

      “While stepwise, de novo evolution (via mutations and subsequent selection) is a key force in the evolution of antimicrobial resistance, evolution in natural settings often involves other processes, including horizontal gene transfer and selection on standing genetic variation. Consequently, perspectives that consider variation in pathogens (and their drug targets) are important for understanding treatment at the bedside. Recent studies have made important strides in this arena. Some have utilized large data sets and population genetics theory to measure cross-resistance and collateral sensitivity. Fewer studies have made use of evolutionary concepts to establish metrics that apply to the general problem of antimicrobial treatment on standing genetic variation in pathogen populations, or for evaluating the utility of certain drugs’ ability to treat the underlying genetic diversity of pathogens”

      That is, the proposed metrics aren’t about the dynamics of stepwise evolution across fitness landscapes, and so, simulating those dynamics doesn’t offer much for our question. What we have done instead is much more direct and allows the reader to follow a logic: clearly demonstrate the topography differences in Figure 1 (and Supplemental Figures S2 and S3 with rank order changes).

      Author response image 2.

      These results tell the reader what they need to know: that the topography of fitness landscapes changes across drug types. Further, we should note that Mira et al. 2015 already told the basic story that one finds different adaptive solutions across different drug environments. (Notably, without computational simulations).

      In summary, we attempted to provide a rigorous, clean, and readable study that introduced two new metrics. Appeals to adding extra analysis would be considered if they augmented the study’s goals. We do not believe this to be the case.

      Nonetheless, we must reiterate our appreciation for the engagement and suggestions. All were made with great intentions. This is more than one could hope for in a peer review exchange. The authors are truly grateful.

      eLife assessment

      The work introduces two valuable concepts in antimicrobial resistance: "variant vulnerability" and "drug applicability", which can broaden our ways of thinking about microbial infections through evolution-based metrics. The authors present a compelling analysis of a published dataset to illustrate how informative these metrics can be, but the study is still incomplete, as only a subset of a single dataset on a single class of antibiotics was analyzed. Analyzing more datasets, with other antibiotic classes and resistance mutations, and performing additional theoretical simulations could demonstrate the general applicability of the new concepts.

      The authors disagree strongly with the idea that the study is “incomplete,” and encourage the editors and reviewers to reconsider this language. Not only are the data combinatorially complete, but they are also larger in size than many similar studies of fitness landscapes. Insofar as no technical justification was offered for this “incomplete” summary, we think it should be removed. Furthermore, we question the utility of “theoretical simulations.” They are rather easy to execute but distract from the central aims of the study: to introduce new metrics, in the vein of other metrics–like druggability, IC50, MIC–that describe properties of drugs or drug targets.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Guerrero and colleagues introduces two new metrics that extend the concept of "druggability" - loosely speaking, the potential suitability of a particular drug, target, or drug-target interaction for pharmacological intervention - to collections of drugs and genetic variants. The study draws on previously measured growth rates across a combinatorially complete mutational landscape involving 4 variants of the TEM-50 (beta lactamase) enzyme, which confers resistance to commonly used beta-lactam antibiotics. To quantify how growth rate - in this case, a proxy for evolutionary fitness - is distributed across allelic variants and drugs, they introduce two concepts: "variant vulnerability" and "drug applicability".

      Variant vulnerability is the mean vulnerability (1-normalized growth rate) of a particular variant to a library of drugs, while drug applicability measures the mean across the collection of genetic variants for a given drug. The authors rank the drugs and variants according to these metrics. They show that the variant vulnerability of a particular mutant is uncorrelated with the vulnerability of its one-step neighbors and analyze how higher-order combinations of single variants (SNPs) contribute to changes in growth rate in different drug environments.

      The work addresses an interesting topic and underscores the need for evolution-based metrics to identify candidate pharmacological interventions for treating infections. The authors are clear about the limitations of their approach - they are not looking for immediate clinical applicability - and provide simple new measures of druggability that incorporate an evolutionary perspective, an important complement to the orthodoxy of aggressive, kill-now design principles. I think the ideas here will interest a wide range of readers, but I think the work could be improved with additional analysis - perhaps from evolutionary simulations on the measured landscapes - that tie the metrics to evolutionary outcomes.

      The authors greatly appreciate these comments, and the proposed suggestions by reviewer 1. We have addressed most of the criticisms and suggestions in our comments above.

      Reviewer #2 (Public Review):

      The authors introduce the notions of "variant vulnerability" and "drug applicability" as metrics quantifying the sensitivity of a given target variant across a panel of drugs and the effectiveness of a drug across variants, respectively. Given a data set comprising a measure of drug effect (such as growth rate suppression) for pairs of variants and drugs, the vulnerability of a variant is obtained by averaging this measure across drugs, whereas the applicability of a drug is obtained by averaging the measure across variants.

      The authors apply the methodology to a data set that was published by Mira et al. in 2015. The data consist of growth rate measurements for a combinatorially complete set of 16 genetic variants of the antibiotic resistance enzyme beta-lactamase across 10 drugs and drug combinations at 3 different drug concentrations, comprising a total of 30 different environmental conditions. For reasons that did not become clear to me, the present authors select only 7 out of 30 environments for their analysis. In particular, for each chosen drug or drug combination, they choose the data set corresponding to the highest drug concentration. As a consequence, they cannot assess to what extent their metrics depend on drug concentration. This is a major concern since Mira et al. concluded in their study that the differences between growth rate landscapes measured at different concentrations were comparable to the differences between drugs. If the new metrics display a significant dependence on drug concentration, this would considerably limit their usefulness.

      The authors appreciate the point about drug concentration, and it is one that the authors have made in several studies.

      The quick answer is that whether the metrics are useful for drug type-concentration A or B will depend on drug type-concentration A or B. If there are notable differences in the topography of the fitness landscape across concentration, then we should expect the metrics to differ. What Reviewer #2 points out as a “major concern,” is in fact a strength of the metrics: they are agnostic with respect to type of drug, type of target, size of dataset, or topography of the fitness landscape. And so, the authors disagree: no, that drug concentration would be a major actor in the value of the metrics does not limit the utility of the metrics. It is simply another variable that one can consider when computing the metrics.

      As discussed above, we have analyzed data from a different data set, in a different drug-target problem (DHFR and antifolate drugs; see supplemental information). These analyses demonstrate how the metrics can be computed across different drug concentrations.

      As a consequence of the small number of variant-drug combinations that are used, the conclusions that the authors draw from their analysis are mostly tentative with weak statistical support. For example, the authors argue that drug combinations tend to have higher drug applicability than single drugs, because a drug combination ranks highest in their panel of 7. However, the effect profile of the single drug cefprozil is almost indistinguishable from that of the top-ranking combination, and the second drug combination in the data set ranks only 5th out of 7.

      We reiterate our appreciation for the engagement. Reviewer #2 generously offers some technical insight on measurements of epistasis, and their opinion on the level of statistical support for our claims. The authors are very happy to engage in a dialogue about these points. We disagree rather strongly, and in addition to the general points raised above (that speak to some of this), will raise several specific rebuttals to the comments from Reviewer #2.

      For one, Reviewer #2 is free to point to what arguments have “weak statistical support.” Having read the review, we aren’t sure what this is referring to. “Weak statistical support” generally applies to findings built from underpowered studies, or designs constructed in a manner that yields effect sizes or p-values that give low confidence that a finding is believable (or is replicable). This sort of problem doesn’t apply to our study for various reasons, not the least of which being that our findings are strongly supported, based on a vetted data set, in a system that has long been the object of examination in studies of antimicrobial resistance.

      For example, we did not argue that magnetic fields alter the topography of fitness landscapes, a claim which must stand up to a certain sort of statistical scrutiny. Alternatively, we examined landscapes where the drug environment differed statistically from the non-drug environment and used them to compute new properties of alleles and drugs.

      We can imagine that the reviewer is referring to the low-dimensionality of the fitness landscapes in the study. Again: the features of the dataset are a detail that the authors put into the title of the manuscript. Further, we emphasize that it is not a weakness, but rather, allows the authors to focus, and discuss the specific biology of the system. And we responsibly explain the constraints around our study several times, though none of them have anything to do with “weak statistical support.”

      Even though we aren’t clear what “weak statistical support” means as offered by Reviewer 2, the authors have nonetheless decided to provide additional analyses, now appearing in the new supplemental material.

      We have included a new Figure S2, where we offer an analysis of the topography of the 7 landscapes, based on the Kendall rank order test. This tests the hypothesis that there is no correlation (concordance or discordance) between the topographies of the fitness landscapes.

      Author response image 3.

      Kendall rank test for correlation between the 7 fitness landscapes.
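
      For readers unfamiliar with the Kendall rank approach, the coefficient underlying such a test can be sketched as follows. This is an illustrative pure-Python version of the tau-a coefficient only (no tie correction and no p-value); which variant the authors used is not stated, and the function name and toy inputs are assumptions. In practice a library routine such as `scipy.stats.kendalltau`, which also returns a p-value, would be the more usual choice.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a between two equally long sequences of values.

    Each pair of items is concordant if both sequences order it the same way
    and discordant if they order it oppositely; tied pairs count as neither.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy example: growth of four alleles in two hypothetical drug environments.
growth_drug_1 = [1.0, 2.0, 3.0, 4.0]
growth_drug_2 = [1.0, 3.0, 2.0, 4.0]  # one pair of alleles swaps rank
tau = kendall_tau_a(growth_drug_1, growth_drug_2)
```

      A value near 1 indicates that two landscapes rank the alleles almost identically, a value near -1 that they rank them in reverse, and a value near 0 that the rank orders are unrelated.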

      In Figure S3, we test the hypothesis that the variant vulnerability values differ. To do this, we calculate a paired t-test. These are paired by haplotype/allelic variant, so the comparisons are change in growth between drugs for each haplotype.

      Author response image 4.

      Paired t-tests for variant vulnerability.
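
      The pairing described above (one observation per haplotype in each of two drug environments) corresponds to a standard paired t statistic, sketched below. This is an assumed illustration rather than the authors' analysis code: the function name and toy values are invented, and only the statistic and degrees of freedom are computed; a routine such as `scipy.stats.ttest_rel` would additionally provide the p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(values_a, values_b):
    """Paired t statistic and degrees of freedom for matched observations.

    values_a[i] and values_b[i] must refer to the same haplotype measured
    under two different drugs; the test is run on the per-haplotype differences.
    """
    if len(values_a) != len(values_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD (n - 1 denominator)
    return mean(diffs) / standard_error, n - 1


# Toy example: four haplotypes measured under two hypothetical drugs.
drug_a = [1.0, 2.0, 3.0, 4.0]
drug_b = [0.5, 1.0, 2.5, 2.0]
t_stat, dof = paired_t_statistic(drug_a, drug_b)
```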

      To this point raised by Reviewer #2:

      “For example, the authors argue that drug combinations tend to have higher drug applicability than single drugs, because a drug combination ranks highest in their panel of 7. However, the effect profile of the single drug cefprozil is almost indistinguishable from that of the top-ranking combination, and the second drug combination in the data set ranks only 5th out of 7.”

      Our study does not argue that drug combinations are necessarily correlated with a higher drug applicability. Alternatively, we specifically highlight that one of the combinations does not have a high drug applicability:

      “Though all seven drugs/combinations are β-lactams, they have widely varying effects across the 16 alleles. Some of the results are intuitive: for example, the drug regime with the highest drug applicability of the set—amoxicillin/clavulanic acid—is a mixture of a widely used β-lactam (amoxicillin) and a β-lactamase inhibitor (clavulanic acid) (see Table 3). We might expect such a mixture to have a broader effect across a diversity of variants. This high applicability is hardly a rule, however, as another mixture in the set, piperacillin/tazobactam, has a much lower drug applicability (ranking 5th out of the seven drugs in the set) (Table 3).”

      In general, we believe that the submitted paper is responsible with regards to how it extrapolates generalities from the results. Further, the manuscript contains a specific section that explains limitations, clearly and transparently (not especially common in science). For that reason, we’d encourage reviewer #2 to reconsider their perspective. We do not believe that our arguments are built on “weak” support at all. And we did not argue anything particular about drug combinations writ large. We did the opposite— discussed the particulars of our results in light of the biology of the system.

      Thirdly, to this point:

      “To assess the environment-dependent epistasis among the genetic mutations comprising the variants under study, the authors decompose the data of Mira et al. into epistatic interactions of different orders. This part of the analysis is incomplete in two ways. First, in their study, Mira et al. pointed out that a fairly large fraction of the fitness differences between variants that they measured were not statistically significant, which means that the resulting fitness landscapes have large statistical uncertainties. These uncertainties should be reflected in the results of the interaction analysis in Figure 4 of the present manuscript.”

      The authors are uncertain with regards to the “uncertainties” being referred to, but we’ll do our best to understand: our study utilized the 7 drug environments from Mira et al. 2015 with statistically significant differences between growth rates with and without drug. And so, this point about how the original set contained statistically insignificant treatments is not relevant here. We explain this in the methods section:

      “The data that we examine comes from a past study of a combinatorial set of four mutations associated with TEM-50 resistance to β-lactam drugs [39 ]. This past study measured the growth rates of these four mutations in combination, across 15 different drugs (see Supplemental Information).”

      We go on to say the following:

      “We examined these data, identifying a subset of structurally similar β-lactams that also included β-lactams combined with β-lactamase inhibitors, cephalosporins and penicillins. From the original data set, we focus our analyses on drug treatments that had a significant negative effect on the growth of wild-type/TEM-1 strains (one-tailed t-test of wild-type treatment vs. control, P < 0.01). After identifying the data from the set that fit our criteria, we were left with seven drugs or combinations (concentration in μg/ml): amoxicillin 1024 μg/ml (β-lactam), amoxicillin/clavulanic acid 1024 μg/ml (β-lactam and β-lactamase inhibitor), cefotaxime 0.123 μg/ml (third-generation cephalosporin), cefotetan 0.125 μg/ml (second-generation cephalosporins), cefprozil 128 μg/ml (second-generation cephalosporin), ceftazidime 0.125 μg/ml (third-generation cephalosporin), piperacillin and tazobactam 512/8 μg/ml (penicillin and β-lactamase inhibitor). With these drugs/mixtures, we were able to embody chemical diversity in the panel.”

      Again: The goal of our study was to develop metrics that can be used to analyze features of drugs and targets and disentangle these metrics into effects.

      Second, the interpretation of the coefficients obtained from the epistatic decomposition depends strongly on the formalism that is being used (in the jargon of the field, either a Fourier or a Taylor analysis can be applied to fitness landscape data). The authors need to specify which formalism they have employed and phrase their interpretations accordingly.

      The authors appreciate this nuance. Certainly, how to measure epistasis is a large topic of its own. But we recognize that we could have addressed this more directly and have added text to this effect.

      In response to these comments from Reviewer #2, we have added a new section focused on these points (reference syntax removed here for clarity; please see main text for specifics):

      “The study of epistasis, and discussions regarding the means to detect and measure now occupies a large corner of the evolutionary genetics literature. The topic has grown in recent years as methods have been applied to larger genomic data sets, biophysical traits, and the "global" nature of epistatic effects. We urge those interested in more depth treatments of the topic to engage larger summaries of the topic.”

      “Here will briefly summarize some methods used to study epistasis on fitness landscapes. Several studies of combinatorially-complete fitness landscapes use some variation of Fourier Transform or Taylor formulation. One in particular, the Walsh-Hadamard Transform has been used to measure epistasis across a wide number of study systems. Furthermore, studies have reconciled these methods with others, or expanded upon the Walsh-Hadamard Transform in a way that can accommodate incomplete data sets. These methods are effective for certain sorts of analyses, and we strongly urge those interested to examine these studies.”

      “The method that we've utilized, the LASSO regression, determines effect sizes for all interactions (alleles and drug environments). It has been utilized for data sets of similar size and structure, on alleles resistant to trimethoprim. Among many benefits, the method can accommodate gaps in data and responsibly incorporates experimental noise into the calculation.”

      As Reviewer #2 understands, there are many ways to examine epistasis on both high- and low-dimensional landscapes. Reviewer #2 correctly offers two sorts of formalisms that allow one to do so. The two offered by Reviewer #2 are not the only means of measuring epistasis in data sets like the one we have offered, but we acknowledge that we could have done a better job outlining this. We thank Reviewer #2 for highlighting this, and believe our revision clarifies it.
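
      For readers unfamiliar with the Fourier-type formalism mentioned above, the following is a minimal sketch of an unnormalized Walsh-Hadamard transform applied to a small, complete landscape. It is an illustration only and is not the method used in the manuscript (which relies on LASSO regression). The function name, the genotype ordering (binary index 00, 01, 10, 11) and the fitness values are assumptions made for the example, and sign and normalization conventions differ between papers.

```python
def walsh_hadamard(values):
    """Unnormalized Walsh-Hadamard transform of a list whose length is a power of two.

    Entry i of `values` is the fitness of the genotype whose binary
    representation is i (hypothetical ordering chosen for this sketch).
    """
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine entries that differ only in the bit of weight h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


# Two-locus toy landscapes (made-up numbers).
additive = walsh_hadamard([1.0, 2.0, 3.0, 4.0])
epistatic = walsh_hadamard([1.0, 2.0, 3.0, 6.0])
```

      In this convention the last coefficient equals f00 - f01 - f10 + f11, so it is zero for the additive toy landscape and non-zero for the second one. Dividing every coefficient by the number of genotypes gives one commonly used normalization.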

      Reviewer #3 (Public Review):

      The authors introduce two new concepts for antimicrobial resistance borrowed from pharmacology, "variant vulnerability" (how susceptible a particular resistance gene variant is across a class of drugs) and "drug applicability" (how useful a particular drug is against multiple allelic variants). They group both terms under an umbrella term "drugability". They demonstrate these features for an important class of antibiotics, the beta-lactams, and allelic variants of TEM-1 beta-lactamase.
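
      As a rough illustration of the two concepts summarized above, the sketch below treats both metrics as simple means over a table of susceptibility scores. The table layout (alleles as rows, drugs as columns), the function names and all numbers are hypothetical and chosen only for the example; the manuscript should be consulted for the exact definition of susceptibility.

```python
from statistics import mean

# Hypothetical susceptibility scores: one row per allelic variant, one column per drug.
susceptibility = {
    "0000": {"drug_A": 1.0, "drug_B": 0.5, "drug_C": 0.0},
    "0001": {"drug_A": 1.0, "drug_B": 1.0, "drug_C": 1.0},
    "0010": {"drug_A": 0.0, "drug_B": 0.0, "drug_C": 0.0},
    "0011": {"drug_A": 1.0, "drug_B": 0.5, "drug_C": 0.0},
}


def variant_vulnerability(table):
    """Mean susceptibility of each allele across all drugs (row means)."""
    return {allele: mean(by_drug.values()) for allele, by_drug in table.items()}


def drug_applicability(table):
    """Mean susceptibility to each drug across all alleles (column means)."""
    drugs = list(next(iter(table.values())).keys())
    return {drug: mean(row[drug] for row in table.values()) for drug in drugs}


vv = variant_vulnerability(susceptibility)
da = drug_applicability(susceptibility)
```

      Because both quantities are means over the same table, changing the panel of drugs or the set of alleles changes the values, which is the dependence on the chosen ensemble discussed elsewhere in this response.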

      The strength of the result is in its conceptual advance and that the concepts seem to work for beta-lactam resistance. However, I do not necessarily see the advance of lumping both terms under "drugability", as this adds an extra layer of complication in my opinion.

      Firstly, the authors greatly appreciate the comments from Reviewer #3. They are insightful and prescriptive. We especially thank Reviewer #3 for supplying a commented PDF with some grammatical and phrasing suggestions/edits. This is much appreciated. We have examined all these suggestions and made changes.

      In general, we agree with the spirit of many of the comments. In addition to our prior comments on the scope of our data, we’ll communicate a few direct responses to specific points raised.

      I also think that the utility of the terms could be more comprehensively demonstrated by using examples across different antibiotic classes and/or resistance genes. For instance, another good model with published data might have been trimethoprim resistance, which arises through point mutations in the folA gene (although, clinical resistance tends to be instead conferred by a suite of horizontally acquired dihydrofolate reductase genes, which are not so closely related as the TEM variants explored here).

      1. In our new supplemental material, we now feature an analysis of antifolate drugs, pyrimethamine and cycloguanil. We have discussed this in detail above and thank the reviewer for the suggestion.

      2. Secondly, we agree that the study will have a larger impact when the metrics are applied more broadly. This is an active area of investigation, and our hope is that others apply our metrics more broadly. But as we discussed, such a desire is not a technical criticism of our own study. We stand behind the rigor and insight offered by our study.

      The impact of the work on the field depends on a more comprehensive demonstration of the applicability of these new concepts to other drugs.

      The authors don’t disagree with this point, which applies to virtually every potentially influential study. The importance of a single study can generally only be measured by its downstream application. But this hardly qualifies as a technical critique of our study and does not apply to our study alone. Nor does it speak to the validity of our results. The authors share this interest in applying the metric more broadly.

      Reviewer #1 (Recommendations For The Authors):

      • The main weakness of the work, in my view, is that it does not directly tie these new metrics to a quantitative measure of "performance". The metrics have intuitive appeal, and I think it is likely that they could help guide treatment options-for example, drugs with high applicability could prove more useful under particular conditions. But as the authors note, the landscape is rugged and intuitive notions of evolutionary behavior can sometimes fail. I think the paper would be much improved if the authors could evaluate their new metrics using some type of quantitative evolutionary model. For example, perhaps the authors could simulate evolutionary dynamics on these landscapes in the presence of different drugs. Is the mean fitness achieved in the simulations correlated with, for example, the drug applicability when looking across an ensemble of simulations with the same drug but varied initial conditions that start from each individual variant? Similarly, if you consider an ensemble of simulations where each member starts from the same variant but uses a different drug, is the average fitness gain captured in some way by the variant vulnerability? All simulations will have limitations, of course, but given that the landscape is fully known I think these questions could be answered under some conditions (e.g. strong selection weak mutation limit, where the model could be formulated as a Markov Chain; see 10.1371/journal.pcbi.1004493 or doi: 10.1111/evo.14121 for examples). And given the authors' expertise in evolutionary dynamics, I think it could be achieved in a reasonable time. With that said, I want to acknowledge that with any new "metrics", it can be tempting to think that "we need to understand it all" before it is useful, and I don't want to fall into that trap here.

      The authors respect and appreciate these thoughtful comments.

      As Reviewer #1 highlighted, the authors are experienced with building simulations of evolution. For reasons we have outlined above, we don’t believe they would add to the arc of the current story and may encumber the story with unnecessary distractions. Simulations of evolution can be enormously useful for studies focused on particulars of the dynamics of evolution. This submitted study is not one of those. It is charged with identifying features of alleles and drugs that capture an allele’s vulnerability to treatment (variant vulnerability) and a drug’s effectiveness across alleles (drug applicability). Both features integrate aspects of variation (genetic and environmental), and as such, are improvements over both metrics used to describe drug targets and drugs.

      • The new metrics rely on means, which is a natural choice. Have the authors considered how variance (or other higher moments) might also impact evolutionary dynamics? I would imagine, for example, that the ultimate outcome of a treatment might depend heavily on the shape of the distribution, not merely its mean. This is also something one might be able to get a handle on with simulations.

      These are relevant points, and the authors appreciate them. Certainly, moments other than the mean might have utility. This is the reason that we computed the one-step neighborhood variant vulnerability: to see if the variant vulnerability of an allele was related to properties of its mutational neighborhood. We found no such correlation. There are many other sorts of properties that one might examine (e.g., shape of the distribution, properties of the mutational network, variance, Fano factor, etc.). As we don’t have an informed reason to pursue any one of these over the others, we would be pleased to investigate them in the future.

      Also, while we’ve addressed general points about simulations above, we want to note that our analysis of environmental epistasis does consider the variance. We urge Reviewer #1 to see our new section on “Notes on Methods Used to Measure Epistasis” where we explain some of this and supply references to that effect.

      • As I understand it, the fitness measurements here are measures of per capita growth rate, which is reasonable. However, the authors may wish to briefly comment on the limitations of this choice-i.e. the fact that these are not direct measures of relative fitness values from head-to-head competition between strains.

      Reviewer #1 is correct: the metrics are computed from means. As Reviewer 1 definitely understands, debates over what measurements are proper proxies for fitness go back a long time. We added a slight acknowledgement about the existence of multiple fitness proxies in our revision.

      • The authors consider one-step variant vulnerability. Have the authors considered looking at 2-step, 3-step, etc analogs of the 1-step vulnerability? I wonder if these might suggest potential vulnerability bottlenecks associated with the use of a particular drug/drug combo or trajectories starting from particular variants.

      This is an interesting point. We provided one-step values as a means of interrogating the mutational neighborhood of alleles in the fitness landscape. While there could certainly be other pattern-relationships between the variant vulnerability and features of a fitness landscape (as the reviewer recognizes), we don’t have a rigorous reason to test them, other than an appeal to “I would be curious if [Blank].” As in, attempting to saturate the paper with these sorts of examinations might be fun, could turn up an interesting result, but this is true for most studies.

      To highlight just how serious we are about future questions along these lines, we’ll offer one specific question about the relationship between metrics and other features of alleles or landscapes. Recent studies have examined the existence of “evolvability-enhancing mutations,” that propel a population to high-fitness sections of a fitness landscape:

      ● Wagner, A. Evolvability-enhancing mutations in the fitness landscapes of an RNA and a protein. Nat Commun 14, 3624 (2023). https://doi.org/10.1038/s41467-023-39321-8

      One present and future area of inquiry involves whether there is any relationship between metrics like variant vulnerability and these sorts of mutations.

      We thank Reviewer 1 for engagement on this issue.

      • Fitness values are measured in the presence of a drug, but it is not immediately clear how the drug concentrations are chosen and, more importantly, how the choice of concentration might impact the landscape. The authors may wish to briefly comment on these effects, particularly in cases where the environment involves combinations of drugs. There will be a "new" fitness landscape for each concentration, but to what extent do the qualitative features (or whatever features drive evolutionary dynamics) change?

      This is another interesting suggestion. We have analyzed a new data set for dihydrofolate reductase mutants that contains a range of drug concentrations of two different antifolate drugs. The general question of how drug concentrations change evolutionary dynamics has been addressed in prior work of ours:

      ● Ogbunugafor CB, Wylie CS, Diakite I, Weinreich DM, Hartl DL. Adaptive landscape by environment interactions dictate evolutionary dynamics in models of drug resistance. PLoS computational biology. 2016 Jan 25;12(1):e1004710.

      ● Ogbunugafor CB, Eppstein MJ. Competition along trajectories governs adaptation rates towards antimicrobial resistance. Nature ecology & evolution. 2016 Nov 21;1(1):0007.

      There are a very large number of environment types that might alter the drug applicability or variant vulnerability metrics. In our study, we used an established data set composed of different alleles of a β-lactamase, with growth rates measured across a number of drug environments. These drug environments consisted of individual drugs at certain concentrations, as outlined in Mira et al. 2015. For our study, we examined those drugs that had a significant impact on growth rate.

      For a new analysis of antifolate drugs in 16 alleles of dihydrofolate reductase (Plasmodium falciparum), we have examined a breadth of drug concentrations (Supplementary Figure S4). This represents a different sort of environment that one can use to measure the two metrics (variant vulnerability or drug applicability). As we suggest in the manuscript, part of the strength of the metric is precisely that it can incorporate drug dimensions of various kinds.

      • The metrics introduced depend on the ensemble of drugs chosen. To what extent are the chosen drugs representative? Are there cases where nonrepresentative ensembles might be advantageous?

      The authors thank the reviewer for this. The general point has been addressed in our comments above. Further, the general question of how a study of one set of drugs applies to other drugs applies to every study of every drug, as no single study interrogates every sort of drug ensemble. That said, we’ve explained the anatomy of our metrics, and have outlined how it can be directly applied to others. There is nothing about the metric itself that has anything to do with a particular drug type – the arithmetic is rather vanilla.

      Reviewer #2 (Recommendations For The Authors):

      1. Regarding my comment about the different formalisms for epistatic decomposition analysis, a key reference is

      Poelwijk FJ, Krishna V, Ranganathan R (2016). The Context-Dependence of Mutations: A Linkage of Formalisms. PLoS Comput Biol 12(6): e1004771.

      The authors appreciate this, are fans of this work, and have cited it in the revision.

      An example where both Fourier and Taylor analyses were carried out and the different interpretations of these formalisms were discussed is

      Unraveling the causes of adaptive benefits of synonymous mutations in TEM-1 β-lactamase. Mark P. Zwart, Martijn F. Schenk, Sungmin Hwang, Bertha Koopmanschap, Niek de Lange, Lion van de Pol, Tran Thi Thuy Nga, Ivan G. Szendro, Joachim Krug & J. Arjan G. M. de Visser Heredity 121:406-421 (2018)

      The authors are grateful for these references. While we don’t think they are necessary for our new section entitled “Notes on methods used to detect epistasis,” we did engage them, and will keep them in mind for other work that more centrally focuses on methods used to detect epistasis. As the author acknowledges, a full treatment of this topic is too large for a single manuscript, let alone a subsection of one study. We have provided a discussion of it, and pointed the readers to longer review articles that explore some of these topics in good detail:

      ● C. Bank, Epistasis and adaptation on fitness landscapes, Annual Review of Ecology, Evolution, and Systematics 53 (1) (2022) 457–479.

      ● T. B. Sackton, D. L. Hartl, Genotypic context and epistasis in individuals and populations, Cell 166 (2) (2016) 279–287.

      ● J. Diaz-Colunga, A. Skwara, J. C. C. Vila, D. Bajic, Á. Sánchez, Global epistasis and the emergence of ecological function, bioRxiv

      2. Although the authors label Figure 4 with the term "environmental epistasis", as far as I can see it is only a standard epistasis analysis that is carried out separately for each environment. The analysis of environmental epistasis should instead focus on which aspects of these interactions are different or similar in different environments, for example, by looking at the reranking of fitness values under environmental changes [see Ref. [26] as well as more recent related work, e.g. Gorter et al., Genetics 208:307-322 (2018); Das et al., eLife 9:e55155 (2020)]. To some extent, such an analysis was already performed by Mira et al., but not on the level of epistatic interaction coefficients.

      The authors have provided a new analysis of how fitness value rankings have changed across drug environments, often a signature of epistatic effects across environments (Supplementary Figure S1).

      We disagree with the idea that our analysis is not a sort of environmental epistasis; we resolve coefficients between loci across different environments. As with every interrogation of G x E effects (G x G x E in our case), what constitutes an “environment” is a messy conversation. We have chosen the route of explaining very clearly what we mean:

      “We further explored the interactions across this fitness landscape and panels of drugs in two additional ways. First, we calculated the variant vulnerability for 1-step neighbors, which is the mean variant vulnerability of all alleles one mutational step away from a focal variant. This metric gives information on how the variant vulnerability values are distributed across a fitness landscape. Second, we estimated statistical interaction effects on bacterial growth through LASSO regression. For each drug, we fit a model of relative growth as a function of M69L x E104K x G238S x N276D (i.e., including all interaction terms between the four amino acid substitutions). The effect sizes of the interaction terms from this regularized regression analysis allow us to infer higher-order dynamics for susceptibility. We label this calculation as an analysis of “environmental epistasis.””
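
      The first of the two calculations quoted above (the 1-step neighbor value) can be sketched as follows. This is an illustrative sketch rather than the authors' code: genotypes are encoded as hypothetical four-character bit strings (one character per substitution), and the variant vulnerability values are made-up numbers used only to exercise the function.

```python
from itertools import product
from statistics import mean


def one_step_neighbors(genotype):
    """All genotypes that differ from `genotype` at exactly one locus."""
    flip = {"0": "1", "1": "0"}
    return [
        genotype[:i] + flip[genotype[i]] + genotype[i + 1:]
        for i in range(len(genotype))
    ]


def neighborhood_vulnerability(vv, genotype):
    """Mean variant vulnerability of the one-step neighbors of `genotype`."""
    return mean(vv[neighbor] for neighbor in one_step_neighbors(genotype))


# Made-up variant vulnerability for all 16 combinations of four substitutions:
# here simply the fraction of loci carrying the substitution.
toy_vv = {"".join(bits): "".join(bits).count("1") / 4 for bits in product("01", repeat=4)}

wild_type_neighborhood = neighborhood_vulnerability(toy_vv, "0000")
```

      With four biallelic loci each variant has exactly four one-step neighbors, so the neighborhood value is always a mean of four numbers.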

      As the grammar for these sorts of analyses continues to evolve, the best one can do is be clear about what they mean. We believe that we communicated this directly and transparently.

      3. As a general comment, to strengthen the conclusions of the study, it would be good if the authors could include additional data sets in their analysis.

      The authors appreciate this comment and have given this point ample treatment. Further, other main conclusions and discussion points are focused on the biology of the system that we examined. Analyzing other data sets may demonstrate the broader reach of the metrics, but it would not alter the strength of our own conclusions (or if it would, Reviewer #2 has not told us how).

      4. There are some typos in the units of drug concentrations in Section 2.4 that should be corrected.

      The authors truly appreciate this. It is a great catch. We have fixed this in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      I would suggest demonstrating the concepts for a second drug class, and suggest folA variants and trimethoprim resistance, for which there is existing published data similar to what the authors have used here (e.g. Palmer et al. 2015, https://doi.org/10.1038/ncomms8385)

      The authors appreciate this insight. As previously described, we have analyzed a data set of folA mutants for the Plasmodium falciparum ortholog of dihydrofolate reductase, and included these results in new supplemental material. Please see the supplementary material.

      There are some errors in formatting and presentation that I have annotated in a separate PDF file (https://elife-rp.msubmit.net/eliferp_files/2023/04/11/00117789/00/117789_0_attach_8_30399_convrt.pdf), as the absence of line numbers makes indicating specific things exceedingly difficult.

      The authors apologize for the lack of line numbers (an honest oversight), but moreover, are tremendously grateful for this feedback. We have looked at the suggested changes carefully and have addressed many of them. Thank you.

      One thing to note: we have included a version of Figure 4 that has effects on the same axes. It appears in the supplementary material (Figure S4).

      In closing, the authors would like to thank the editors and three anonymous reviewers for engagement and for helpful comments. We are confident that the revised manuscript qualifies as a substantive revision, and we are grateful to have had the opportunity to participate.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors describe a method to decouple the mechanisms supporting pancreatic progenitor self-renewal and expansion from feed-forward mechanisms promoting their differentiation. The findings are important because they have implications beyond a single subfield. The strength of evidence is solid in that the methods, data and analyses broadly support the claims with only minor weaknesses.

      We are grateful for the substantial effort that reviewers put into reading our manuscript and providing such detailed feedback. We have strived to address, as much as possible, all comments and criticisms. Thanks to the feedback, we believe that we now have a significantly improved manuscript. Below, there is a point-by-point response.

      Reviewer #1 (Public Review)

      In this manuscript, the authors are developing a new protocol that aims at expanding pancreatic progenitors derived from human pluripotent stem cells under GMP-compliant conditions. The strategy is based on hypothesis-driven experiments that come from knowledge derived from pancreatic developmental biology.

      The topic is of major interest in the view of the importance of amplifying human pancreatic progenitors (both for fundamental purposes and for future clinical applications). There is indeed currently a major lack of information on efficient conditions to reach this objective, despite major recurrent efforts by the scientific community.

      Using their approach that combines stimulation of specific mitogenic pathways and inhibition of retinoic acid and specific branches of the TGF-beta and Wnt pathways, the authors claim to be able (in a highly robust and reproducible manner) to amplify in 10 passages the number of pancreatic progenitors (PP) by 2,000-fold, which is really an impressive breakthrough.

      The work is globally well-performed and quite convincing. I have however some technical comments mainly related to the quantification of pancreatic progenitor amplification and to their differentiation into beta-like cells following amplification.

      We thank the reviewer for the positive assessment. Below we provide a point-by-point response to specific comments and criticisms.

      Reviewer #1 (Recommendations For The Authors)

      Figure 1:

      Panel A: What is exactly counted in Fig. 1A? Is it the number of PP (as indicated in the title) or the total number of cells? If it is PPs, was it done following PDX1/NKX6.1/SOX9 staining and FACS quantification? This question applies to a number of Figures and the authors should be clear on this point.

      We now define ‘PP cells’ as ‘PP-containing cells’ (PP cells) the first time we use the term in the RESULTS section.

      Panel D: I do not understand the source of TGFb1, GDF11, FGF18, PDGFA. Which cell type(s) express such factors in culture? I was not convinced that the signals are produced by PP and act through an autocrine loop. I have the same type of questions for the receptors: PDGFR on the second page of the results; RARs and RXRs on the third page.

      We refer to these factors/receptors as components of a tentative autocrine loop. We agree we do not prove it and we now comment on this in the discussion section.

      Figure 2:

      FACS plots are very difficult to analyze for two reasons: I do not understand the meaning of the y axes (PDX1/SOX9). Does that mean that 100% of the cells were PDX1+/SOX9+? The authors should show the separated FACS plots. More importantly, the x axes indicate that NKX6.1 FACS staining is very weak. This is by far different from what can be read in publications performing the same types of experiments (publications by Millman, Otonkoski...as examples). How was quantification performed when it is so difficult to properly define positive vs negative populations? It is necessary to present proper "negative controls" for FACS experiments and to clearly indicate how positive versus negative cells were defined.

      We now explain the gating strategy better in the results section; all controls are included in Figure S2.

      Figure 3:

      What is the exact "phenotype" of the cells that incorporated EdU: It would be really instructive to add PDX1/NKX6.1/SOX9 staining on top of EdU. I am also surprised that 20% of the cells stain positive for Annexin V. This is a huge fraction. Does that mean that many cells (20%) are dying and if the case, how amplification can take place under such deleterious conditions?

      This is an interesting mechanistic point but performing these experiments would delay the publication of the final manuscript for too long. These assays were done at p3 in order to catch CINI cells that do not expand in most cases. It is important to note that cell death also appears higher in CINI cells. It is likely that the combination of these effects results in reproducible expansion under C5. We comment on the possibilities in the discussion section.

      Figure 4:

      On FACS plots the intensity at the single cell level (see x-axis of the figure) of the NKX6.1 staining is found to increase in Fig. 4G by 50-100 folds when compared to Fig. 4E. Is it expected? This should be discussed in the text. Do the authors observe the same increase by immunocytochemistry?

      The apparent difference is actually 10-fold (from 2 × 10² to 2 × 10³). We think that the most likely reason for this apparent increase is that at p0 we typically used very few cells for the FC in order to keep as many as possible for the subsequent expansion. If we had used more, we would be able to also detect cells with higher expression. As we mention in the bioinformatics analysis, NKX6 expression does increase with passaging and therefore it is also possible that at least part of this increase is real. However, we don’t have suitable data (same number of cells analyzed at each passage) to address this in a reliable manner.

      Figure 5

      Previous data from the scientific literature indicate that in vitro, by default, PP gives rise to duct-like cells. This is a bit described in the result section and supplementary figures taking into account the expression of transcription factors. However the data are not clearly explained and described in quite a qualitative manner. They should appear in a quantitative fashion (and the main figures), adding additional duct cell markers such as Carbonic anhydrase, SPP1, CFTR, and others. I assume that the authors can easily use their transcriptomic data to produce a Figure to be described and discussed in detail.

      We think it can be misleading to use such markers (other than TFs, and the latter only as a collective) because specific markers of terminal differentiation are more often than not expressed during development in multipotent progenitors, the most conspicuous example being CPA1. To illustrate the point, we used the RNA-Seq data of isolated human fetal progenitors (Ramond et al., 2017) and plotted the expression values of a panel of duct genes in these cells together with their expression in p0 PP and ePP cells from all three different procedures (please see below). All raw RNA-Seq data were processed together to enable direct comparison. According to the analysis of Ramond et al., the A population corresponds to MPCs, C to early endocrine progenitors (EP), D to late endocrine progenitors and, by inference and gene expression pattern, B to BPs. Expression levels of all these markers were very similar, suggesting that these markers cannot be used to distinguish between duct cells and progenitor cells. Importantly, SC-islets derived from either dPP or ePP cells express extremely low and similar levels of KRT19, a marker of duct cells. This latter information is now included in the last part of the results (Figure S7).

      Author response image 1.

      Fig. 7:

      The figure is a bit disappointing for 2 reasons. In A and B, the quality of INS, GCG, and SST staining is really poor. In E, GSIS is really difficult to interpret. They should not be presented as stimulatory indexes. The authors should present independently: INS content; INS secretion at low glucose; INS secretion at high glucose; INS secretion with KCL. Finally, the authors should indicate that glucose poorly (around 2 fold) activates insulin/C-Pept secretion in their stem-cell-derived islets.

      We disagree with the quality assessment of the immunofluorescence. Stimulation indexes are also used very widely but we now provide data for actual C-peptide secretion normalized for DNA content of the SC-islets. For technical reasons we do not have normalized C-peptide secretion for human islets. However, we provide a direct comparison to the stimulation index of human islets assayed under the same conditions (2.7 mM glucose / 16.7 mM glucose / 16.7 mM glucose + 30 mM KCl) without presenting SC-islets separately and tweaking the glucose basal (lowering) and stimulation (increasing) levels to inflate the stimulation index. This is unfortunately common. In any case, we do not claim an improvement in the differentiation conditions and our S5-S7 steps may not be optimal but this is not the subject of this work.
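
      For clarity on the quantity under discussion, a stimulation index is commonly computed as secretion under the stimulating condition divided by secretion under the basal condition. The minimal sketch below assumes that definition; the function name and the secretion values are hypothetical and are not data from this study.

```python
def stimulation_index(basal_secretion, stimulated_secretion):
    """Ratio of stimulated to basal secretion (both in the same units)."""
    if basal_secretion <= 0:
        raise ValueError("basal secretion must be positive")
    return stimulated_secretion / basal_secretion


# Made-up C-peptide readings, e.g. normalized to DNA content.
low_glucose = 2.0    # hypothetical reading at 2.7 mM glucose
high_glucose = 4.0   # hypothetical reading at 16.7 mM glucose
index = stimulation_index(low_glucose, high_glucose)
```

      Because the index is a ratio, lowering the basal glucose level or raising the stimulating level changes it without any change in the cells themselves, which is why assaying SC-islets and human islets under the same conditions matters for the comparison.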

      Reviewer #2 (Public Review)

      Summary

      The paper presents a novel approach to expand iPSC-derived pdx1+/nkx6.1+ pancreas progenitors, making them potentially suitable for GMP-compatible protocols. This advancement represents a significant breakthrough for diabetes cell replacement therapies, as one of the current bottlenecks is the inability to expand PP without compromising their differentiation potential. The study employs a robust dataset and state-of-the-art methodology, unveiling crucial signaling pathways (eg TGF, Notch...) responsible for sustaining pancreas progenitors while preserving their differentiation potential in vitro.

      Strengths

      This paper has strong data, guided omics technology, clear aims, applicability to current protocols, and beneficial implications for diabetes research. The discussion on challenges adds depth to the study and encourages future research to build upon these important findings.

      We thank the reviewer for the positive assessment. Below we provide a point-by-point response to general comments and criticisms.

      Weaknesses

      The paper does have some weaknesses that could be addressed to improve its overall clarity and impact. The writing style could benefit from simplification, as certain sections are explained in a convoluted manner and difficult to follow, in some instances, redundancy is evident. Furthermore, the legends accompanying figures should be self-explanatory, ensuring that readers can easily understand the presented data without the need to be checking along the paper for information.

      We have simplified the text in several places and removed redundancies, particularly in the discussion. We revisited the figure legends and made minor corrections to increase clarity. However, regarding the figure legends, we think that adding the interpretation of the results would be redundant to the main text.

      The culture conditions employed in the study might benefit from more systematic organization and documentation, making them easier to follow.

      There is a comparative Table (Table S1) where all conditions are summarized. We refer to this Table every time that we introduce a new condition. We also have a Table (Table S4) which presents all different media and components used in the differentiation procedure.

      Another important aspect is the functionality of the expanded cells after differentiation. While the study provides valuable insights into the expansion of pancreas progenitors in vitro and does the basic tests to measure their functionality after differentiation the paper could be strengthened by exploring the behavior and efficacy of these cells deeper, and in an in vivo setting.

      This will be done in a future study where we will also introduce a number of modifications in S5-S7

      Quantifications for immunofluorescence (IF) data should be displayed.

      We have not conducted quantifications of IFs because FC is much more objective and accurate. We have not conducted FC for CDX2 and AFP because all other data strongly favor C6 anyway. It should be noted that CDX2 and AFP expression is generally not addressed at all presumably because it raises uncomfortable questions and, to our knowledge, we are the first to address this so exhaustively.

      Some claims made in the paper may come across as somewhat speculative.

      We have now indicated so where applicable.

      Additionally, while the paper discusses the potential adaptability of the method to GMP-compatible protocols, there is limited elaboration on how this transition would occur practically or any discussion of the challenges it might entail.

      We have now added a paragraph discussing this in the discussion section.

      Reviewer #2 (Recommendations For The Authors)

      Related to Figure 1:

      • Unclear if CINI or SB431542 + CINI was used (first paragraph of results...)

      The paragraph was unclear and it is now rewritten

      • Was the differentiation to PP similar between the different attempts? A basic QC for each Stem Cell technology differentiation would be good to include.

      We added (Figure 1B) a comparison of expression data of general genes (QC) in PP cells showing very comparable patterns of expression. Some of these PP cells went on to expand and most did not but there is no apparent correlation of this with the gene expression data.

      • qPCR data - relative fold? over what condition? (indicate on axis label)

      We added a label as well as an explanation on p0 values in the figure legend

      • FGF18/ PDGFA - worth including background in pancreas development as in the other factors.

      Background information has been added

      • Bioinformatics is a bit biased with a few genes selected - what are the DEGs / top enriched pathways? Maybe worth showing a volcano plot of the DEGs for example.

      We have done all these standard analyses but we think that they did not contribute anything else useful to the study, with the exception of pointing to the finding that the TGFb pathway is negatively correlated with expansion, and this is included in the study. The ‘unbiased’ analysis that the reviewer suggests did not turn up anything else useful to exploit for the expansion. This does not mean that our approach is biased – in our view it is hypothesis-driven. As we also write in the manuscript, if in a certain pathway a key gene fails to be expressed, the pathway will not show up in any GO or GSEA analyses. However, the pathway will still be regulated. The RA and FGF18 cases clearly illustrate this. We realize that these analyses have become a standard but we think that they are not the only way to approach genomics data, and these approaches did not offer much in the context of this study.

      • The E2F part is very speculative

      The pathway came up as a result of ‘unbiased’ GSEA analyses. However, we do agree and rephrased.

      • The authors claim ' the negative correlation of TGFb signalling with expansion retrospectively justifies the use of A83 '. However, p0 is not treated with A83 - how can they tell that there is a correlation between TGFb signalling and expansion?

      The correlation came from the RNA Seq data analysis during expansion. We have rephrased slightly to convey the message more clearly.

      • Typo with TGFbeta inhibitor name is mispelled (A3801)

      Corrected

      • Page 5 - last paragraph - Table S3? (isn't it referring to S2?)

      Table S2 lists the regulated genes and Table S3 lists the regulated signaling pathway components. Both are relevant here, so we now refer to both.

      • In the text Figure 2G should read Figure 1G (page 7, end of 1st paragraph).

      Corrected

      • 'Autocrine loop' existence – speculative

      Added the phrase ‘we speculated’. We refer to this only as a tentative interpretation. We also elaborate in the discussion now.

      Related to Figure 2:

      • I am not sure if I would refer to chemical "activation/inhibition" of pathways as 'gain/loss of function'. Maybe this term is more adequate for genetic modifications.

      For genetic manipulations, these terms are (supposed to be) accompanied by the adjective ‘genetic’ but to avoid misinterpretations we changed the terms to activation and inhibition as suggested.

      • It would be good to include a summary of the different conditions as a schematic in one of the figures, to make it very clear to the reader what the conditions are.

      We tried this in an early version of the manuscript but, in our view, it was adding complexity rather than simplifying things. The problem is that the Table as such cannot be integrated in any figure: in Figure 2, for example, it would be too early, in Figure 4 it would be too late, and so on. All conditions show up in detail in Table S1.

      • Nkx6.1 - is the image representative? It looks like Nkx6.1 decreases over the passages.

      We do mention in the text that ‘… even though expansion (in C5) appeared to somewhat reduce the number of NKX6.1+ cells’ (Figure 2E-G). As we mentioned, this was one of the reasons to continue with other conditions (C6-C8).

      • Upregulation of AFP/ CDX2 is a bit concerning - the IF for C5 p5 shows a high proportion of CDX2+ cells (Fig S2I). perhaps it would be good to quantify the IF.

      It was concerning – this is why we then tested conditions C6-8. Since it is C6 that we propose at the end, it would be, in our view, extraneous to quantify CDX2 in C5.

      • How do C5/C1/C0 compare to CINI?

      We now remind the reader in the results section that CINI was not reproducible - so any other comparison would be extraneous.

      Related to Figure 3:

      • There is a 'Lore Ipsum' label above B

      Corrected

      Related to Figure 4:

      • It is good that AFP expression is reduced at p10, but there seems to be a high proportion of AFP at p5. IF/FACS should be quantified.

      We think that this would not add significantly since there are several other criteria, particularly the increase of the PDX1+/SOX9+/NKX6.1+ that clearly show that the C6 condition is preferable. Further elaboration of C6 could use such additional criteria. We comment on CDX2 / AFP in the discussion.

      • CDX2 should be quantified by IF / FACS.

      We think that this would not add significantly since there are several other criteria, particularly the increase of the PDX1+/SOX9+/NKX6.1+ that clearly show that the C6 condition is preferable. Further elaboration of C6 could use such additional criteria. We comment on CDX2 / AFP in the discussion.

      • Karyotype analysis is good but not very precise when analyzing genetic micro alterations... what does a low-pass sequencing of the expanding lines look like? Are there any micro-deletions in the expanding lines?

      This is an unusual request. Microdeletions may occur at any point – during passaging of hPS cells, differentiation as well as expansion – but such data are so far not shown in publications, and reasonably so in our opinion. Thus, we have not done this analysis but it certainly would be appropriate in a clinical setting as part of QC.

      • Data supporting that the cells can be cryopreserved and recovered with >85% survival rate is not provided.

      We now provide data for the C6-mediated expansion (Figure 4J). The freezing procedure was developed during the time we were testing C5 and we don’t have sufficient data to show reliably the survival of the cells during C5 expansion. Thus, we have now removed the reference in the C5 part of the manuscript.

      Related to Figure 5:

      • Figure 5C - perhaps worth commenting on the different pathways that are enriched when cells undergo expansion and show some of the genes that are up/down regulated.

      This is indeed of interest but since it will not address any specific question in the context of this work (eg is the endocrine program repressed?) and since it would not be followed by additional experiments we think that it would burden the manuscript unnecessarily. The data are accessible for any type of analysis through the GEO database.

      • Figure S5D shows in vitro clustering away from in vivo PP - it would be good to explain how in vitro generated PP differs from their in vivo counterparts instead of restricting the comparison to the in vitro protocol.

      We have added a possible interpretation of this observation in the results section and discuss how one could properly go about this comparison.

      • Quantification of Fig5F should be included. Is GP2 expression detectable by IF at p5 too?

      We have quantified GP2 expression by FC at p10 but not at earlier stages. We include now the FC data in Fig5F

      • Validation of Fig5G by qPCR would be good. PDX1 did not seem reduced by IF in Figure 4.

      The purpose of Fig5G is to compare the expression of the same genes across different expansion approaches. Therefore, in our view, qPCRs would not be appropriate since we do not have samples from the other approaches. We did not claim a reduction in PDX1 expression.

      • How can the authors explain the NGN3 expression at PP?

      In our view, differentiation is a dynamic process and not all cells are synchronized at the same cell type; this is true in vivo and in vitro. Sc-RNA Seq data indeed show a small population of cells at PP that are NEUROG3+ (our unpublished data). We have now included this in the discussion.

      Related to Figure 6:

      • How do the different lines differ? Any statistical comparison between lines?

      There is a paragraph dealing with the comparison of PP and ePP cells (p5 and p10) from different lines at the level of gene expression and the data are in Figure S6A-G. Then there is a paragraph addressing this at the level of PDX1/SOX9/NKX6.1 expression by FC. We have now expanded and rewritten the latter to include statistical comparisons across PPs from different lines at p0, p5 and p10.

      Related to Figure 7:

      • Mention the use of micropatterned

      Micropatterned wells - not really correct. They use Aggrewells, micropatterned plates are something else.

      We changed ‘micropatterned wells’ into ‘microwells’

      • Figure 7D, those are qPCR data. The label is inconsistent, why did they call it fold induction instead of fold change? Also, not sure if plotting the fold change to hPSC is the best here.

      We use fold change when comparing the expression of the same gene at different passages but fold induction when comparing to its expression in hPS cells. We made sure it is also explained in the figure legends.
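
      The distinction between the two terms can be sketched with the common 2^-ΔΔCt calculation. Note the assumptions: the response does not state which quantification method was used, and all Ct values, the housekeeping reference, and the function name below are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """2^-ddCt: expression of a target gene in a sample relative to a calibrator.

    Each Ct is first corrected by a reference (housekeeping) gene measured
    in the same sample, then the calibrator's corrected Ct is subtracted.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical (target Ct, reference Ct) pairs for one gene.
hps = (30.0, 15.0)  # undifferentiated hPS cells
p0 = (22.0, 15.0)   # PP cells before expansion
p5 = (20.0, 15.0)   # expanded PP cells at passage 5

# "Fold induction": the calibrator is the same gene in hPS cells.
induction_p0 = relative_expression(*p0, *hps)
induction_p5 = relative_expression(*p5, *hps)

# "Fold change": the calibrator is the same gene at another passage (here p0).
change_p5_vs_p0 = relative_expression(*p5, *p0)

print(induction_p0, induction_p5, change_p5_vs_p0)
```

      With this scheme the fold change between two passages equals the ratio of their fold inductions, so the two labels describe the same measurements against different baselines.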

      • Absolute values should be shown for the GSIS to determine basal insulin secretion. Also, sequential stimulation to address if the cells are able to respond to multiple glucose stimulations.

      We include now the secreted amounts of human C-peptide under the different conditions (Figure S7) normalized for cell numbers using their DNA content for the normalization. The many parameters we have used suggest that dPP and ePP SC-islets are very similar. If we were claiming a better S5-S7 procedure, such an assay would have been necessary but in this context, we think it is not absolutely necessary.

      • In vivo data would have strengthened the story. It is not clear if, in vivo, the cells will behave as the nonexpanded iPSC-derived beta cells.

      We agree and these studies are under way but we do not expect to complete them soon. We feel that it is important that this work appears sooner rather than later.

      Reviewer #3 (Public Review)

      Summary:

      In this work, Jarc et al. describe a method to decouple the mechanisms supporting progenitor self-renewal and expansion from feed-forward mechanisms promoting their differentiation.

      The authors aimed at expanding pancreatic progenitor (PP) cells, strictly characterized as PDX1+/SOX9+/NKX6.1+ cells, for several rounds. This required finding the best cell culture conditions that allow sustaining PP cell proliferation along cell passages, while avoiding their further differentiation. They achieve this by comparing the transcriptome of PP cells that can be expanded for several passages against the transcriptome of unexpanded (just differentiated) PP cells.

      The optimized culture conditions enabled the selection of PDX1+/SOX9+/NKX6.1+ PP cells and their consistent, 2000-fold, expansion over ten passages and 40-45 days. Transcriptome analyses confirmed the stabilization of PP identity and the effective suppression of differentiation. These optimized culture conditions consisted of substituting the Vitamin A containing B27 supplement with a B27 formulation devoid of vitamin A (to avoid retinoic acid (RA) signaling from an autocrine feed-forward loop), substituting A38-01 with the ALK5 II inhibitor (ALK5i II) that targets primarily ALK5, supplementation of medium with FGF18 (in addition to FGF2) and the canonical Wnt inhibitor IWR-1, and cell culture on vitronectin-N (VTN-N) as a substrate instead of Matrigel.
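
      As a rough worked example of what a 2000-fold expansion over ten passages and 40-45 days implies, the sketch below assumes uniform exponential growth with equal expansion at every passage. That uniformity is an assumption made here for illustration, not something stated in the manuscript.

```python
import math


def per_passage_fold(total_fold, passages):
    """Average fold expansion per passage, assuming equal growth each passage."""
    return total_fold ** (1.0 / passages)


def population_doublings(total_fold):
    """Number of doublings needed to reach the total fold expansion."""
    return math.log2(total_fold)


def doubling_time_days(total_fold, days):
    """Average doubling time, assuming constant exponential growth."""
    return days / population_doublings(total_fold)


total_fold = 2000
passages = 10

print(round(per_passage_fold(total_fold, passages), 2))  # about 2.14
print(round(population_doublings(total_fold), 2))        # about 10.97
for days in (40, 45):
    print(days, round(doubling_time_days(total_fold, days), 2))
```

      Under these assumptions the cells roughly double at each passage, with an average doubling time between about 3.6 and 4.1 days.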

      Strengths:

      The strength of this work relies on a clever approach to identify cell culture modifications that allow expansion of PP cells (once differentiated) while maintaining, if not reinforcing, PP cell identity. Along the work, it is emphasized that PP cell identity is associated with the co-expression of PDX1, SOX9, and NKX6.1. The optimized protocol is unique (among the other datasets used in the comparison shown here) in inducing a strong upregulation of GP2, a unique marker of human fetal pancreas progenitors. Importantly GP2+ enriched hPS cell-derived PP cells are more efficiently differentiating into pancreatic endocrine cells (Aghazadeh et al., 2022; Ameri et al., 2017).

      The unlimited expansion of PP cells reported here would allow scaling-up the generation of beta cells, for the cell therapy of diabetes, by eliminating a source of variability derived from the number of differentiation procedures to be carried out when starting at the hPS cell stage each time. The approach presented here would allow the selection of the most optimally differentiated PP cell population for subsequent expansion and storage. Among other conditions optimized, the authors report a role for Vitamin A in activating retinoic acid signaling in an autocrine feed-forward loop, and the supplementation with FGF18 to reinforce FGF2 signaling.

      This is a relevant topic in the field of research, and some of the cell culture conditions reported here for PP expansion might have important implications in cell therapy approaches. Thus, the approach and results presented in this study could be of interest to researchers working in the field of in vitro pancreatic beta cell differentiation from hPSCs. Table S1 and Table S4 are clearly detailed and extremely instrumental to this aim.

      We thank the reviewer for the positive assessment. Below we provide a point-by-point response to general comments and criticisms.

      Weaknesses

      The authors strictly define PP cells as PDX1+/SOX9+/NKX6.1+ cells, and this phenotype was convincingly characterized by immunofluorescence, RT-qPCR, and FACS analysis along the work. However, broadly defined PDX1+/SOX9+/NKX6.1+ could include pancreatic multipotent progenitor cells (MPC, defined as PDX1+/SOX9+/NKX6.1+/PTF1A+ cells) or pancreatic bipotent progenitors (BP, defined as PDX1+/SOX9+/NKX6.1+/PTF1A-) cells. It has been indeed reported that Nkx6.1/Nkx6.2 and Ptf1a function as antagonistic lineage determinants in MPC (Schaffer, A.E. et al. PLoS Genet 9, e1003274, 2013), and that the Nkx6/Ptf1a switch only operates during a critical competence window when progenitors are still multipotent and can be uncoupled from cell differentiation. It would be important to define whether culturing PDX1+/SOX9+/NKX6.1+ PP (as defined in this work) in the best conditions allowing cell expansion is reinforcing either an MPC or BP phenotype. Data from Figure S2A (last paragraph of page 7) suggests that PTF1A expression is decreased in C5 culture conditions, thus more homogeneously keeping BP cells in this media composition. However, on page 15, 2nd paragraph it is stated that "the strong upregulation of NKX6.2 in our procedure suggested that our ePP cells may have retracted to an earlier PP stage". Evaluating the co-expression of the previously selected markers with PTF1A (or CPA2), or the more homogeneous expression of novel BP markers described, such as DCDC2A (Scavuzzo et al. Nat Commun 9, 3356, 2018), in the different culture conditions assayed would shed more light on this relevant aspect.

      This is certainly an interesting point. The RNA Seq data suggest that ePP cells resemble BP cells rather than MPCs and that this occurs during expansion. We have now added a new paragraph in the results section to illustrate this and added graphs of CPA2, PTF1A and DCDC2A expression during expansion in Figure 5, S5 as well as data in Table S5. In summary, we favor the interpretation that expanded cells are close but not identical to the BP identity and refer to that in the discussion. We have also amended the statement on page 15 that ‘the strong upregulation of NKX6.2 in our procedure suggested that our ePP cells may have retracted to an earlier PP stage’.

      In line with the previous comment, it would be extremely insightful if the authors could characterize or at least discuss a potential role for YAP underlying the mechanistic effects observed after culturing PP in different media compositions. It is well known that the nuclear localization of the co-activator YAP broadly promotes cell proliferation, and it is a key regulator of organ growth during development. Importantly in this context, it has been reported that TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitors and disruption of this interaction arrests the growth of the embryonic pancreas (Cebola, I. et al. Nat Cell Biol 17, 615-26, 2015). More recently, it has also been shown that a cell-extrinsic and intrinsic mechanotransduction pathway mediated by YAP acts as gatekeeper in the fate decisions of BP in the developing pancreas, whereby nuclear YAP in BPs allows proliferation in an uncommitted fate, while YAP silencing induces EP commitment (Mamidi, A. et al. Nature 564, 114-118, 2018; Rosado-Olivieri et al. Nature Communications 10, 1464, 2019). This mechanism was further exploited recently to improve the in vitro pancreatic beta cell differentiation protocol (Hogrebe et al., Nature Protocols 16, 4109-4143, 2021; Hogrebe et al, Nature Biotechnology 38, 460-470, 2020). Thus, YAP in the context of the findings described in this work could be a key player underlying the proliferation vs differentiation decisions in PP.

      We do refer to these publications now and refer to the YAP pathway in the introduction and results sections as well as in the discussion. We have not investigated more because the kinetics of the different components of the pathway are complex and do not give an indication of whether the pathway becomes more or less active – please see below.

      Author response image 2.

      Regarding the improvements made in the PP cell culture medium composition to allow expansion while avoiding differentiation, some of the claims should be better discussed and contextualized with current state-of-the-art differentiation protocols. As an example, the use of ALK5 II inhibitor (ALK5i II) has been reported to induce EP commitment from PP, while RA was used to induce PP commitment from the primitive gut tube cell stage in recently reported in vitro differentiation protocols (Hogrebe et al., Nature Protocols 16, 4109-4143, 2021; Rosado-Olivieri et al. Nature Communications 10, 1464, 2019). In this context, and to the authors' knowledge, is Vitamin A (triggering autocrine RA signaling) usually included in the basal media formulations used in other recently reported state-of-the-art protocols? If so, at which stages? Would it be advisable to remove it?

      These points and our views are now included in the discussion

      In this line also, the supplementation of cell culture media with the canonical Wnt inhibitor IWR-1 is used in this work to allow the expansion of PP while avoiding differentiation. A role for Wnt pathway inhibition during endocrine differentiation using IWR1 has been previously reported (Sharon et al. Cell Reports 27, 2281-2291.e5, 2019). In that work, Wnt inhibition in vitro causes an increase in the proportion of differentiated endocrine cells. It would be advisable to discuss these previous findings with the results presented in the current work. Could Wnt inhibition have different effects depending on the differential modulation of the other signaling pathways?

      These points are now included in the discussion together with the points above

      Reviewer #3 (Recommendations For The Authors)

      Recommendations for improving the writing and presentation and minor comments on the text and figures:

      • In the Introduction (page 3, line 1) it is stated: "Diabetes is a global epidemic affecting > 9% of the global population and its two main forms result from .....". The authors could rephrase/remove "global" repeated twice.

      Corrected

      • On page 4 of the introduction, in the context of "Unlimited expansion of PP cells in vitro will require disentangling differentiation signals from proliferation/maintenance signals. Several pathways have been implicated in these processes..." the authors are advised to consider mentioning the YAP mediated mechanisms as another key aspect underlying MPC phenotype (Cebola, I. et al. Nat Cell Biol 17, 615-26, 2015) and the BP to endocrine progenitor (EP) commitment (Mamidi, A. et al. Nature 564, 114-118, 2018; Rosado-Olivieri et al. Nature Communications 10, 1464, 2019). This should be better discussed in the context of the Weaknesses mentioned in the Public Review. It would be worth considering adding effectors and other molecules involved in YAP and Hippo pathway signaling to Table S3.

      We have added the role of the Hippo/YAP pathway in the introduction and mentioned in the results the finding that components of the pathway are generally not regulated except two that are now added in Table S3

      • In page 4, paragraph 3, near "and SB431542, another general (ALK4/5/7) TGFβ inhibitor", consider removing "another". SB431542 is the same inhibitor mentioned in the other protocols at the beginning of the paragraph.

      The paragraph is rewritten because it was not clear – we used A83-01 and not SB431542. Other approaches had used SB431542.

      • Page 5, Table S2 is cited after Table S3, please consider reordering.

      In fact, both S2 and S3 are relevant there, therefore we quote both now.

      • Page 8, 2nd paragraph, near "Expression of both AFP and CDX2 increased transiently upon expansion, at p5 (Figure S2H-J)." How do you explain results in FigS2C, D and FigS2E (AFP/CDX2)? RT-qPCR data does not suggest transient downregulation.

      AFP and CDX2 were – wrongly – italicized in the quoted passage. Therefore, in one case we refer to the protein and in the other to the transcript levels. We corrected this and added the qualifier ‘appeared’. The difference is most likely due to translational regulation but we did not elaborate since we do not know. In any case, we have used the less favorable but more robust gene expression levels as the main criterion.

      • Page 9, end of 2nd paragraph, Figure 5A is cited but it looks like this should be Figure 4A.

      Corrected

      • Page 9, 3rd paragraph, when stating "C5 ePP cells of the same passage no..." please replace "no" with a number or a suitable abbreviation.

      Corrected

      • Page 9, 3rd paragraph. Expressing the values in the Y axis in a consistent manner for FigS2B-D and FigS4A would make a comparison easier.

      We strive to keep sections autonomous so that the reader would not have to flip between figures and sections – this is why we think that figure S4A is preferable as it is; it is a direct comparison of C6 to C5 for the different markers and has the additional advantage that one needs not to include p0 levels.

      • Page 9, 3rd paragraph. Green dots in FigS4A stand for p5 cells? if so, shouldn't these average 1 for all assayed genes?

      No, because the baseline (average 1) is the C5 expression at the corresponding passage no. We changed the y-axis label, hopefully it is clearer now.

      • Page 10 3rd paragraph, please include color labels in Fig. 5G.

      The different colors here correspond to the different expansion procedures that are compared. The samples are labelled on the x axis.

      • Page 10 3rd paragraph, Figure 6G is cited but it looks like this should be Figure 5G.

      Corrected

      • Page 11, 1st paragraph, at "TF genes such as FOXA2 and RBJ remained comparable", please double check if "RBJ" should be "RBPJ".

      Corrected

      • Page 11, end of 1st paragraph, when stating "Of note, expression of PTF1A was also undetectable in all ePP cells (Table S5)", is PTF1A expression level close to 1000 (which units?) in Table S5 considered undetectable?

      This statement regarding ‘undetectable PTF1A expression’ refers to expanded PP cells (ePP), not PP cells at p0. For the latter, expression is indeed close to 1000 in normalized RNA-sequence counts as mentioned in the Table legend.

      • Page 11, 4th paragraph, "In summary, the comparative transcriptome analyses suggested that our C6 expansion procedure is more efficient at strengthening the PP identity". In the context of comments made in the Public Review, more accuracy needs to be put when defining PP identity. Are these MPC or BP?

      The RNA Seq data suggest that expansion promotes an MPC-to-BP transition. We have added a paragraph in the corresponding results section and comment in the discussion.

      • Page 15, 2nd paragraph, the sentence "expression of PTF1A, recently shown to promote endocrine differentiation of hPS cells (Miguel-Escalada et al., 2022)" is confusing. Please double-check sentence syntax and reference. Does PTF1A expression "promote" or "create epigenetic competence" for endocrine differentiation?

      Its role is in the MPCs and it prepares the epigenetic landscape to allow for duct and endocrine specification later, thus it ‘creates epigenetic competence’. The paper was cited out of context and we have now corrected it.

      Additional recommendations by the Reviewing Editor:

      An insufficient number of experimental repetitions have been used for the following data: (Figure 1A, n = 2; Figures 2B-D, p10, n = 2; Figures 6A and B, VTN-N, n = 1).

      This is true but we do not draw quantitative conclusions from or do comparisons with these data.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their thoughtful evaluation of our manuscript. We considered all the comments and prepared the revised version. The following are our responses to the reviewers’ comments. All references, including those in the original manuscript are included at the end of this point-by-point response.

      Reviewer #1 (Public Review):

      Weaknesses:

      1) The authors should better review what we know of fungal Drosophila microbiota species as well as the ecology of rotting fruit. Are the microbiota species described in this article specific to their location/setting? It would have been interesting to know if similar species can be retrieved in other locations using other decaying fruits. The term 'core' in the title suggests that these species are generally found associated with Drosophila but this is not demonstrated. The paper is written in a way that implies the microbiota members they have found are universal. What is the evidence for this? Have the fungal species described in this paper been found in other studies? Even if this is not the case, the paper is interesting, but there should be a discussion of how generalizable the findings are.

      The reviewer inquires as to whether the microbial species described in this article are ubiquitously associated with Drosophila or not. Indeed, most of the microbes described in this manuscript are generally recognized as species associated with Drosophila spp. For example, yeasts such as Hanseniaspora uvarum, Pichia kluyveri, and Starmerella bacillaris have been detected in or isolated from Drosophila spp. collected in European countries as well as the United States and Oceania (Chandler et al., 2012; Solomon et al., 2019). As for bacteria, species belonging to the genera Pantoea, Lactobacillus, Leuconostoc, and Acetobacter have also previously been detected in wild Drosophila spp. (Chandler et al., 2011). These statements have been incorporated into our revised manuscript (lines 391-397). Nevertheless, the term “core” in the manuscript and title may lead to misunderstanding, as the generality does not ensure the ubiquitous presence of these microbial species in every individual fly. Considering this point, we replaced the “core” with “key,” a term that is more appropriate to our context.

      2) Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild? Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild?

      The reviewer asked whether the microbial species detected from the fermented banana samples were derived from flies. To address this question, additional experiments under more controlled conditions would be needed, such as artificially introducing wild flies onto fresh bananas in the laboratory. Nevertheless, the microbes potentially originate from wild flies, as supported by the literature cited in our response to Weakness 1).

      Alternative sources of microbes also merit consideration. For example, microbes may have been introduced to unfermented bananas by penetration through peel injuries (lines 1300-1301). In addition, they could be introduced by insects other than flies, given that rove beetles (Staphylinidae) and sap beetles (Nitidulidae) were observed in some of the traps. The explanation of these possibilities has been incorporated into DISCUSSION (lines 414-427) of our revised manuscript.

      Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Our sampling strategy was designed to target not only D. melanogaster but also other domestic Drosophila species, such as D. simulans, that inhabit human residential areas. For the traps where adult flies were caught, we identified the species of the drosophilids as shown in Table S1, thereby showing the presence of either or both D. melanogaster and D. simulans. We added these descriptions in MATERIALS AND METHODS (lines 511-512 and 560-562), and DISCUSSION (lines 378-379).

      3) Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning. The authors described their microarray data in terms of fed/starved in relation to the Finke article. They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning.

      Regarding the antimicrobial peptide genes, statistical comparisons of our RNA-seq data across different conditions were impracticable because most of the genes showed low expression levels. The RNA-seq data of the yeast-fed larvae are shown in Author response table 1. While a subset of genes exhibited significantly elevated expression in the non-supportive conditions relative to the supportive ones, this may be due to intra-sample variability rather than the difference in the nutritional conditions. Similar expression profiles were observed in the bacteria-fed larvae as well (data not shown). Therefore, it is difficult to discuss a change in immune genes in the paper. Additionally, the previous study that conducted larval microarray analysis (Zinke et al., 2002) did not explicitly focus on immune genes.

      Author response table 1.

      Antimicrobial peptide genes are not up-regulated by any of the microbes. Antimicrobial peptide gene expression profiles of whole bodies of first-instar larvae fed on yeasts. TPM values of all samples and comparison results of gene expression levels in the larvae fed on supportive and non-supportive yeasts are shown. Antibacterial peptide genes mentioned in Hanson and Lemaitre, 2020 are listed. NA or na, not available.
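
      For readers less familiar with the TPM (transcripts per million) values reported in this table, the normalization can be sketched as below. This is only an illustration of the standard TPM definition, not the pipeline used in this study; the function name, the gene labels, the read counts, and the transcript lengths are all hypothetical.

```python
def transcripts_per_million(read_counts, lengths_kb):
    """Convert raw read counts to TPM.

    read_counts: dict mapping gene name -> mapped read count
    lengths_kb:  dict mapping gene name -> transcript length in kilobases
    Both inputs are hypothetical placeholders for illustration.
    """
    # Step 1: length-normalize each gene (reads per kilobase).
    rates = {}
    for gene, count in read_counts.items():
        length = lengths_kb[gene]
        if length <= 0:
            raise ValueError(f"non-positive length for {gene}")
        rates[gene] = count / length

    # Step 2: rescale so that the values of one sample sum to one million.
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical counts for two placeholder genes in one sample.
    counts = {"geneA": 10, "geneB": 30}
    lengths = {"geneA": 1.0, "geneB": 3.0}
    print(transcripts_per_million(counts, lengths))
```

      Because TPM values within a sample always sum to one million, genes with very few reads yield small and noisy values, which is consistent with the difficulty of statistical comparison noted above.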

      They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      We did not observe significant differences in the gene expression profiles of the larvae fed on different microbial species within bacteria or fungi, or between those fed on bacteria and those fed on fungi. For example, the gene expression profiles of larvae fed on the various supportive microbes showed striking similarities to each other, as evidenced by the heat map showing the expression of all genes detected in larvae fed either yeast or bacteria (Author response image 1). Similarities were also observed among larvae fed on various non-supportive microbes.

      Only a handful of genes showed different expression patterns between larvae fed on yeast and those fed on bacteria. Thus, it is challenging to discuss the potential differential impacts of yeast and bacteria on larval growth, if any.

      Author response image 1.

      Gene expression profiles of larvae fed on the various supportive microbes show striking similarities to each other. Heat map showing the gene expression of the first-instar larvae that fed on yeasts or bacteria. Freshly hatched germ-free larvae were placed on banana agar inoculated with each microbe and collected after 15 h feeding to examine gene expression of the whole body. Note that data presented in Figures 3A and 4C in the original manuscript, which were obtained independently, are combined to generate this heat map. The labels under the heat map indicate the microbial species fed to the larvae, with three samples analyzed for each condition. The lactic acid bacteria (“LAB”) include Lactiplantibacillus plantarum and Leuconostoc mesenteroides, while the acetic acid bacterium (“AAB”) represents Acetobacter orientalis. “LAB + AAB” signifies mixtures of the AAB and either one of the LAB species. The asterisks in the label highlight “LAB + AAB” or “LAB” samples clustered separately from the other samples in those conditions; “” indicates a sample in a “LAB + AAB” condition (Lactiplantibacillus plantarum + Acetobacter orientalis), and “*” indicates a sample in a “LAB” condition (Leuconostoc mesenteroides). Brown abbreviations of scientific names are for the yeast-fed conditions. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; M. asi, Martiniozyma asiatica; Sa. cra, Saccharomycopsis crataegensis; P. klu, Pichia kluyveri; St. bac, Starmerella bacillaris; BY4741, Saccharomyces cerevisiae BY4741 strain.

      4) The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)? Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)?

      Although we did not investigate the microbiota in the gut of either larvae or adults, we did compare the microbiota within surface-sterilized larvae or adults with the microbiota in food samples. We found that adult flies and early-stage foods, as well as larvae and late-stage foods, harbored similar microbial species (Figure 1F). Additionally, previous studies examining the gut microbiota in wild adult flies have detected microbes belonging to the same species or taxa as those isolated from our foods (Chandler et al., 2011; Chandler et al., 2012). We have elaborated on this in our response to Weakness 1).

      While we did not investigate whether these species are capable of establishing a niche in the cardia of adults, we have cited the study by Dodge et al., 2023 in our revised manuscript and discussed the possibility that predominant microbes in adult flies may show a propensity for colonization (lines 410-413).

      Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The reviewer inquires whether the supportive microbes in our study stimulate gut signaling pathways and induce the expression of digestive protease genes, as demonstrated in a previous study (Erkosar et al., 2015). Based on our RNA-seq data, this is unlikely. The aforementioned study demonstrated that seven protease genes are upregulated through Imd pathway stimulation by a bacterium that promotes larval growth. In our RNA-seq analysis, these seven genes did not exhibit a consistent upregulation in the presence of the supportive microbes (H. uva or K. hum in Author response table 2A; Le. mes + A. ori in Author response table 2B). Rather, they exhibited a tendency to be upregulated by the presence of non-supportive microbes (St. bac or Pi. klu in Author response table 2A; La. pla in Author response table 2B).

      Author response table 2.

      Most of the peptidase genes reported by Erkosar et al., 2015 are more highly expressed under the non-supportive conditions than the supportive conditions. Comparison of the expression levels of seven peptidase genes derived from the RNA-seq analysis of yeast-fed (A) or bacteria-fed (B) first-instar larvae. A previous report demonstrated that the expression of these genes is upregulated upon association with a strain of Lactiplantibacillus plantarum, and that the PGRP-LE/Imd/Relish signaling pathway, at least partially, mediates the induction (Erkosar et al., 2015). H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; La. pla, Lactiplantibacillus plantarum; Le. mes, Leuconostoc mesenteroides; A. ori, Acetobacter orientalis; ns, not significant.

      Reviewer #2 (Public Review):

      Weaknesses:

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas. Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation. Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas.

      The reviewer asks whether the isolated microbes colonize the larval gut. Previous studies on microbial colonization associated with Drosophila have predominantly focused on adults (Pais et al., 2018), rather than larval stages. Developing larvae continually consume substrates that have already undergone microbial fermentation and are abundant in live microbes until the end of the feeding larval stage. Therefore, we consider it difficult to discuss microbial colonization in the larval gut. We have mentioned this point in DISCUSSION of the revised manuscript (lines 408-410).

      Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation.

      While we recognize the importance of comprehensive mechanistic analysis, elucidation of more detailed molecular mechanisms lies beyond the scope of this study and will be a subject of future research.

      Regarding the nutritional role of BCAAs, the incorporation of BCAAs enabled larvae fed with the non-supportive yeast to grow to the second-instar stage. This observation implies that consumption of BCAAs upregulates diverse genes involved in cellular growth processes in larvae. We mentioned a previously reported interaction between lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in the manuscript (lines 433-436). LAB may facilitate lactate provision to AAB, consequently enhancing the biosynthesis of essential nutrients such as amino acids. To test this hypothesis, future experiments will include the supplementation of lactic acid to AAB culture plates, and the co-inoculation of AAB with LAB mutant strains defective in lactate production to assess both larval growth and continuous larval association with AAB. With respect to AAB-yeast interactions, metabolites released from yeast cells might benefit AAB growth, and this possibility will be investigated through the supplementation of AAB culture plates with candidate metabolites identified in the cell suspension supernatants of the late-stage yeasts.

      Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      We appreciate the reviewer's recommendations. The explanation of the universality of our findings has been included in the revised DISCUSSION (lines 391-397). We have also added descriptions of the implications of compositional shifts occurring in adult microbiota (lines 404-413), possible inoculation routes of different microbes (lines 414-427), and hypotheses on the mechanism of larval growth promotion by yeasts (lines 469-476), all of which could be the focus of our future study.

      Reviewer #3 (Public Review):

      Weaknesses:

      Despite describing important findings, I believe that a more thorough explanation of the experimental setup and the steps expected to occur in the exposed diet over time, starting with natural "inoculation" could help the reader, in particular the non-specialist, grasp the rationale and main findings of the manuscript. When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples? What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects? Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source. Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples?

      We collected traps and early-stage samples 2.5 days after setting up the traps. This duration was determined from pilot experiments. A shorter collection time resulted in a lower likelihood of obtaining traps visited by adult flies, whereas a longer collection time caused overcrowding of larvae as well as deaths of adults from drowning in the liquid seeping out of the fruits. These procedural details have been included in the MATERIALS AND METHODS section of the revised manuscript (lines 523-526).

      What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects?

      We assume that the origins of the microbes detected in the no-fly trap foods vary depending on the species. For instance, Colletotrichum musae, the fungus that causes banana anthracnose, may have been present in fresh bananas before trap placement. The filamentous fungi could have originated from airborne spores, but they could also have been introduced by insects that feed on these fungi. We have included these possibilities in the DISCUSSION section of the revised manuscript (lines 417-421).

      Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source.

      We are grateful for the reviewer's insightful suggestion regarding shifts in the adult microbiome. We have included in the DISCUSSION section of the revised manuscript the possibility that the microbial composition may change substantially during pupal stages or after adult eclosion (lines 404-413).

      Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used?

      In this metabolomic analysis, LC-MS/MS with triple quadrupole MS monitors the formation of fragment ions from precursor ions specific to each target compound. The use of PFPP columns, which provide excellent separation of amino acids and nucleobases, allows chromatographic peaks of many structural isomers to be separated into independent peaks. In addition, all measured compounds are compared with data from a standard library to confirm retention time agreement. Structural isomers were separated either by retention time on the column or by compound-specific MRM signals (in fact, leucine and isoleucine have both unique MRM channels and column separations). Detailed MRM conditions are identical to the previously published study (Oka et al., 2017). These have been included in the revised ‘LC-MS/MS measurement’ section in MATERIALS AND METHODS (lines 810-824).

      Were standard curves produced?

      Since relative quantification of metabolite amounts was performed in this study, no standard curve was generated to determine absolute concentrations. However, a standard compound of known concentration (single point) was measured to confirm retention time and relative area values.

      Were internal, deuterated controls used?

      Deuterium-labeled internal standards were not used in this study, because obtaining a deuterium-labeled counterpart for every target is not realistic when a large number of compounds are measured. However, an internal standard (L-methionine sulfone) was added to the extraction solvent to calculate the recovery rate. This has been included in the revised ‘LC-MS/MS measurement’ section in MATERIALS AND METHODS (lines 824-825).
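
      To make the idea of relative quantification with a single internal standard concrete, a minimal sketch is given below. It assumes, purely for illustration, that each metabolite peak area is divided by the recovery of the internal standard and that conditions are then compared as ratios of mean corrected areas; the function names and all numerical values are hypothetical and do not reproduce the calculation actually performed in this study.

```python
from statistics import mean


def recovery_rate(istd_area_in_sample, istd_area_expected):
    """Fraction of the internal standard recovered after extraction.

    Both arguments are peak areas of the internal standard
    (hypothetical values; the standard here stands in for L-methionine sulfone).
    """
    if istd_area_in_sample <= 0 or istd_area_expected <= 0:
        raise ValueError("internal-standard peak areas must be positive")
    return istd_area_in_sample / istd_area_expected


def corrected_area(peak_area, recovery):
    """Scale a metabolite peak area by the internal-standard recovery (assumed correction)."""
    if recovery <= 0:
        raise ValueError("recovery must be positive")
    return peak_area / recovery


def relative_amount(condition_areas, reference_areas):
    """Relative quantification: ratio of mean corrected areas, condition vs. reference."""
    return mean(condition_areas) / mean(reference_areas)


if __name__ == "__main__":
    # Hypothetical example: 80% recovery in one sample.
    r = recovery_rate(istd_area_in_sample=80.0, istd_area_expected=100.0)
    print(corrected_area(1000.0, r))
    # Hypothetical corrected areas for a metabolite in two conditions.
    print(relative_amount([400.0, 420.0, 380.0], [100.0, 90.0, 110.0]))
```

      Under this scheme no absolute concentration is obtained; only the ratio between conditions is interpreted, which is why a full standard curve is not required.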

      Reviewer #1 (Recommendations For The Authors):

      Additional comments

      1. The authors should do a better job of presenting their data. It took me quite a while to understand the protocol of Figure 1. Panel 1A, B, C could be improved. For instance, 1A suggests that flies are transferred to the lab while this is in fact the banana trap. Indicate 'Banana trap colonized by flies' rather than 'wild-type flies in the trap'. 1C: should indicate that the food suspension comes from the banana trap. 1B,D,D: do not use pale color as legend. Avoid the use of indices in Figure 2 (Y1 rather than Y₁). Grey colors are difficult to distinguish in Figure 2. Etc. It is a pain for reviewers that figure legends are on the verso of each figure and not just below.

      We thank the reviewer for the detailed suggestions to improve the clarity and comprehensibility of our figures. We have improved the figures according to the suggestions. As for the figure legends, we have placed them below each respective figure whenever possible.

      2. Clarify in the text if 'sample' means food substratum or flies/larvae (ex. line 116 and elsewhere).

      We have revised the word “sample” throughout our manuscript and eliminated the confusion.

      3. Line 170 - clarify what you mean by fermented food.

      We have replaced “fermented larval foods” with “fermented bananas” in our revised manuscript (line 165).

      4. Line 199 - what is the meaning of 'stocks'.

      We have replaced “stocks” with “strains” (line 195).

      5. Line 320 - explain more clearly what the yeast-conditioned banana-agar plate and cell suspension supernatant are, and what the goals of using these media are. This will help in understanding the subsequent text.

      We have added a supplemental figure illustrating the sample preparation for the metabolomic analysis (Figure S6), with the following legend describing the procedure (lines 1335-1346): “Sample preparation process for the metabolomic analysis. We suspected that the supportive live yeast cells may release critical nutrients for larval growth, whereas the non-supportive yeasts may not. To test this possibility, we made three distinct sample preparations of individual yeast strains (yeast cells, yeast-conditioned banana-agar plates, and cell suspension supernatants). Yeast cells were for the analysis of intracellular metabolites, whereas yeast-conditioned banana-agar plates and cell suspension supernatants were for that of extracellular metabolites. The samples were prepared as the following procedures. Yeasts were grown on banana-agar plates for 2 days at 25°C, and then scraped from the plates to obtain “yeast cells.” Next, the remaining yeasts on the resultant plates were thoroughly removed, and a portion from each plate was cut out (“yeast-conditioned banana agar”). In addition, we suspended yeast cells from the agar plates into sterile PBS, followed by centrifugation and filtration to eliminate the yeast cells, to prepare “cell suspension supernatants.”

      6. Figure 5 is difficult to understand. Provide more explanation. Consider moving the 'all metabolites panel' to Supp. Better explain what this holidic medium is.

      The holidic medium is a medium that has been commonly used in the Drosophila research community, which contains ~40 known nutrients, and supports the larval development to pupariation (Piper et al., 2014; Piper et al., 2017). We have introduced this explanation to the RESULTS section of the manuscript (lines 322-327). However, the scope of our research reaches beyond the analysis of the holidic medium components, because feeding the holidic medium alone causes a significant delay in larval growth, suggesting a lack of nutritional components (Piper et al., 2014). Thus, we believe the "All Metabolites" panels should be placed alongside the corresponding “The holidic medium components” panels.

      7. I could not access Figure 6 when downloading the PDF. The page is white and an error message appears - it is problematic to review a paper lacking a figure.

      We regret any inconvenience caused, perhaps due to a system error. Please refer to the Author response image 2, which is identical to Figure 6 of our original manuscript.

      Author response image 2.

      Supportive yeasts facilitate larval growth by providing nutrients, including branched-chain amino acids, by releasing them from their cells (Figure 6 from the original manuscript). (A and B) Growth of larvae feeding on yeasts on banana agar supplemented with leucine and isoleucine. (A) The mean percentage of the live/dead individuals in each developmental stage. n=4. (B) The percentage of larvae that developed into second instar or later stages. The “Not found” population in Figure 6A was omitted from the calculation. Each data point represents data from a single tube. Unique letters indicate significant differences between groups (Tukey-Kramer test, p < 0.05). (C) The biosynthetic pathways for leucine and isoleucine with S. cerevisiae gene names are shown. The colored dots indicate enzymes that are conserved in the six isolated species, while the white dots indicate those that are not conserved. Abbreviations of genera are given in the key in the upper right corner. LEU2 is deleted in BY4741. (D-G) Representative image of Phloxine B-stained yeasts. The right-side images are expanded images of the boxed areas. The scale bar represents 50 µm. (H) Summary of this study. H. uvarum is predominant in the early-stage food and provides Leu, Ile, and other nutrients that are required for larval growth. In the late-stage food, AAB directly provides nutrients, while LAB and yeasts indirectly contribute to larval growth by enabling the stable larva-AAB association. The host larva responds to the nutritional environment by dramatically altering gene expression profiles, which leads to growth and pupariation. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; Pi. klu, Pichia kluyveri; St. bac, Starmerella bacillaris; GF, germ-free.

      8. Line 323 - Consider rewriting this sentence (too long, explain what the holidic medium is and why this is interesting). "In the yeast-conditioned banana-agar plates, which were anticipated to contain yeast-derived nutrients, many well-known nutrients included in a chemically defined synthetic (holidic) medium for Drosophila melanogaster (Piper et al., 2014, 2017) were not increased compared to the sterile banana-agar plates; instead, they exhibited drastic decreases irrespective of the yeast species."

      We thank the reviewer for the suggestion to improve the readability of our manuscript. We have rewritten the sentence in the revised manuscript (lines 320-328) as follows: “The yeast-conditioned banana-agar plates were expected to contain yeast-derived nutrients. On the contrary, the result revealed a depletion of various metabolites originally present in the sterile banana agar (Figure 5A). This result prompted us to focus on the metabolites in the chemically defined (holidic) medium for Drosophila melanogaster (Piper et al., 2014; Piper et al., 2017). This medium contains ~40 known nutrients, and supports the larval development to pupariation, albeit at the half rate compared to that on a yeast-containing standard laboratory food (Piper et al., 2014; Piper et al., 2017). Therefore, the holidic medium could be considered to contain the minimal essential nutrients required for larval growth. Our analysis indicated a substantial reduction of these known nutrients in the yeast-conditioned plates compared to their original quantities (Figure 5B).”

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      1. It should be clearly shown (or stated) that isolated microbes, such as H. uvarum and Pa. agglomerans, are indigenous microbes in wild Drosophila melanogaster in their outdoor sampling.

      We thank the reviewer for the suggestions. Addressing the presence of isolated microbes within wild D. melanogaster adults is important, but it is not feasible with our data for the following reasons. Our microbiota analysis of adults was conducted using pooled individuals of multiple Drosophila species, rather than using D. melanogaster exclusively. Moreover, the microbial isolation and the analysis of adult microbiota were carried out in two independent samplings (Figures 1A and 1E in the original manuscript, respectively). As a result, the microbial species detected in the adults were slightly different from those isolated from the food samples collected in the previous sampling. Nevertheless, it is worth noting that H. uvarum dominated in 2 out of the 3 adult samples, constituting >80% of the fungal composition. Pantoea agglomerans was not detected in the adults, although Enterobacterales accounted for >59% in 2 out of the 3 samples. Therefore, these isolated microbial species, or at least their phylogenetically related species, are presumed to be indigenous to wild D. melanogaster.

      If the reviewer’s suggestion was to state the dominance of H. uvarum and Pantoea agglomerans in early-stage foods, we have added a supplemental figure showing the species-level microbial compositions corresponding to Figure 1B of the original manuscript (Figure S1), and further revised the manuscript (lines 180-186).

      2. The reviewer supposes that the indigenous microbes of flies may differ from what they usually eat. In this study, the authors use banana-based food, but is it justified in terms of the natural environment of the places where those microbes were isolated? In other words, did sampled wild flies eat bananas outside the laboratory at Kyoto University?

      Drosophila spp. inhabit human residential areas and feed on various fermented fruits and vegetables. In the areas surrounding Kyoto University, they can be found in garbage in residential dwellings as well as supermarkets. In this regard, fruits are natural food sources of wild Drosophila in the area.

      Among various fruits, bananas were selected based on the following two reasons. Firstly, bananas were commonly used in previous Drosophila studies as a trap bait or a component of Drosophila food (Anagnostou et al., 2010; Stamps et al., 2012; Consuegra et al., 2020). Secondly, and rather practically, bananas can be obtained in Japan all year at a relatively low cost. Previous studies have used various fruits such as grapes (Quan and Eisen, 2018), figs (Pais et al., 2018), and raspberries (Cho and Rohlfs, 2023). However, these fruits are only available during limited seasons and are more expensive per volume than bananas. Thus, they were not practical for our study, which required large amounts of fruit-based culture media. We have included a brief explanation regarding this point in MATERIALS AND METHODS (lines 514-518).

      3. In Fig. 6B, the Leu and Ile experiment, is the added amount of those amino acids appropriate in the context that they mention "...... supportive yeasts had concentrations of both leucine and isoleucine that were at least four-fold higher than those of non-supportive yeasts"?

      We acknowledge that the supplementation should ideally be carried out in a quantity equivalent to the difference between the released amounts of supportive and non-supportive species. However, achieving this has been highly challenging. Previous studies determined the amount of amino acid supplementation by quantifying their concentration in the bacteria-conditioned media (Consuegra et al., 2020; Henriques et al., 2020). However, we found that quantifying the exact concentrations of the amino acids is not feasible with our yeasts. As shown in Figure 5B in the original manuscript, the amino acid contents were markedly reduced in the yeast-conditioned banana agar compared to the agar without yeasts, presumably because of the uptake by the yeasts. Thus, the amino acids released from yeast cells on the banana-agar plate are not expected to accumulate in the medium. As this reviewer pointed out, in the cell suspension supernatants of the supportive yeasts, concentrations of both leucine and isoleucine were at least four-fold higher compared to those of non-supportive yeasts (Figures 5G-H in the original submission). However, this measurement does not give the absolute amount of either amino acid available for larvae. Given these constraints, we opted for the amino acid concentrations in the holidic medium, which support larval growth under axenic conditions (Piper et al., 2014). We also showed that the supplementation of the amino acids at that concentration to the banana-agar plate was not detrimental to larval growth (Figures 6A-B in the original manuscript). These rationales have been included in the revised ‘Developmental progression with BCAA supplementation’ section in MATERIALS AND METHODS of our manuscript (lines 840-847).

      1. In addition to the above, it can be included other amino acids or nutrients as control experiments.

      As mentioned in our manuscript (lines 365-368), we did supplement other amino acids, lysine and asparagine, which failed to rescue the larval growth.

      1. In the experiment of Fig. 2E, how about examining larval development using heat-killed LAB or yeast with live AAB? The reviewer speculates that one possibility is that AAB needs nutrients from LAB.

      We did not feed larvae with heat-killed LAB and live AAB for the following reasons. LAB grows very poorly on banana agar compared to yeasts, and preparation of LAB required many banana-agar plates even when we fed live bacteria to larvae. Adding dead LAB to banana-agar tubes would require far more plates, which makes this preparation impractical. Furthermore, heat-killing may not allow the investigation of the contribution of heat-unstable or volatile compounds.

      As for the reviewer's suggestion regarding the addition of heat-killed yeast with AAB, heat-killed yeast itself promotes larval growth, as shown in Figures 4G and 4H in the original manuscript, so the contribution of yeast cannot be examined using this method.

      Recommendations for improving the writing and presentation.

      1. It would be good to mention that during sample collection, other insects (other than Drosophila species) were not found in the food if this is true.

      Insects other than Drosophila spp. were found in several traps in the sampling shown in Figures 1C-F. These insects, rove beetles (Staphylinidae) and sap beetles (Nitidulidae), seemed to share a niche with Drosophila in nature. Therefore, we believe that the contamination of these insects did not interfere with our goal of obtaining larval food samples. We added these descriptions and explanations to MATERIALS AND METHODS (lines 527-531).

      1. There are many different kinds of bananas. It should be mentioned the detailed information.

      We have included the information on the banana in the MATERIALS AND METHODS section (line 622).

      1. Concerning the place of sample collection, detailed longitude, and latitude information can be provided (this is easily obtained from Google Maps). When the collection was performed should also be mentioned. This may suggest the environment of the "wild flies" they collected.

      We added a table listing the dates of our collections, along with the longitude and latitude of each sampling place (Table S1A).

      1. The reviewer could not find how the authors conducted heat killing of yeast.

      We added the following procedure to the ‘Quantification of larval development’ section in MATERIALS AND METHODS (lines 680-688). “When feeding heat-killed yeasts to larvae, yeasts were added to the banana-agar tubes and subsequently heated as following procedures. The yeasts were revived from frozen stocks on banana-agar plates, incubated at 25°C, and then streaked on fresh agar plates. After 2-day incubation, yeast cells were scraped from the plates and suspended in PBS at the concentration of 400 mg of yeast cells in 500 µL of PBS. 125 µL of the suspensions were added to banana-agar tubes prepared as described, and after centrifugation at 3,000 x g for 5 min, the supernatants were removed. The amount of cells in each tube is ~50x compared to that when feeding live yeasts, which compensates for the reduced amount due to their inability to proliferate. The tubes were subsequently heated at 80°C for 30 min before adding germ-free larvae.”

      1. The reviewer prefers that all necessary information on how to see figures be provided in figure legends. For example, an explanation of some abbreviations is missing.

      We carefully re-examined the figure legends and added necessary information.

      1. Many of the figures are not kind to readers, i.e., one needs to refer to the legends and main text very frequently. Adding subheadings (titles) to each figure may help.

      We added subheadings to our figures to improve the comprehensibility.

      Reviewer #3 (Recommendations For The Authors):

      I have some minor questions/suggestions about the manuscript that, if addressed, may increase the clarity and quality of the work.

      1. Please, when referring to microbial species in the abbreviated form, use only the first letter of the genus. For example, P. agglomerans should be used, not Pa. agglomerans.

      We are concerned about the potential confusion caused by using only the first letter of genera, since several genera mentioned in our work share the first letters, such as P (Pichia and Pantoea), S (Starmerella, Saccharomyces, and Saccharomycopsis), or L (Lactiplantibacillus and Leuconostoc). Therefore, we used only the unabbreviated form of the above seven genera in our revised manuscript. We have also made every effort to avoid abbreviations in our figures and tables, but found it necessary to retain two-letter abbreviations when spaces are particularly limiting.

      1. In lines 294-298, how exactly was the experiment where yeasts were killed by anti-fungal agents performed? If these agents killed the yeast, how was the microbial growth on plates required to have biomass for fly inoculation obtained? Please, clarify this section.

      The yeasts were grown on normal banana-agar plates before the addition onto the anti-fungal agents-containing banana agar. We added the following procedure to MATERIALS AND METHODS (lines 689-695). “When feeding yeasts on banana agar supplemented with antifungal agents, the yeasts were individually grown on normal banana agar twice before being suspended in PBS at the concentration of 400 mg of yeast cells in 500 µL of PBS. 125 µL of the suspensions was introduced onto the anti-fungal agents (10 mL/L 10% p-hydroxybenzoic acid in 70% ethanol and 6 mL/L propionic acid, following the concentration described in Kanaoka et al., 2023)-containing banana agar in 1.5 mL tubes. After centrifugation, the supernatants were removed. The amount of cells in each tube is ~50x compared to that when feeding live yeasts.”

      1. In lines 557-558, please clarify how rDNA copy numbers can be calculated in this way.

      Considering the results of the ITS and 16S sequencing analysis, it was highly likely that rDNAs from bananas and Drosophila were amplified along with microbial rDNA in this qPCR. To estimate the microbial rDNA copy number, we assumed that the proportion of microbial rDNA within the total amplification products remains consistent between the qPCR and the corresponding sequencing analysis, because the template DNA samples and amplified regions were shared between the analyses. Based on this, the copy number of microbial rDNA was estimated by multiplying the qPCR results with the microbial rDNA ratio observed in the ITS or 16S sequencing analysis of each sample. This methodology has been detailed in the MATERIALS AND METHODS section (lines 609-615).
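The estimate described above reduces to a single multiplication, which can be sketched as follows. This is only an illustrative sketch: the function name, argument names, and the example numbers are hypothetical and are not taken from the manuscript, and it assumes, as stated in the reply, that the microbial fraction of amplification products is the same in the qPCR and in the sequencing library.

```python
def estimate_microbial_copies(qpcr_total_copies, microbial_reads, total_reads):
    """Scale a total rDNA copy number from qPCR by the microbial read fraction.

    qpcr_total_copies: copy number measured by qPCR (microbial + host/plant rDNA).
    microbial_reads:   reads assigned to microbes in the ITS or 16S library.
    total_reads:       all reads in that library (microbial + banana + Drosophila).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= microbial_reads <= total_reads:
        raise ValueError("microbial_reads must lie between 0 and total_reads")
    microbial_fraction = microbial_reads / total_reads
    return qpcr_total_copies * microbial_fraction


# Hypothetical sample: 1e6 copies by qPCR, 2,500 of 10,000 reads are microbial.
example_estimate = estimate_microbial_copies(1.0e6, 2_500, 10_000)
print(example_estimate)
```

The estimate is only as good as the shared-proportion assumption, so it is best read as an approximation for each sample rather than an absolute count.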

      1. In lines 609-611, how did you check for cells left from the previous day? Microscopy? Or do you mean that if there was liquid still in the sample you would not add more bacterial cultures? Please, clarify.

      We observed with the naked eye from outside the tubes to determine if additional AAB should be introduced. Since we placed AAB on the banana agar in a lump, we examined whether the lumps were gone or not. We have added these procedures in MATERIALS AND METHODS (lines 671-673).

      1. In Figure 2A, it is hard to differentiate between the gray tones. Please, improve this.

      We have distinguished the plots for different conditions by changing the shape of the markers on the graphs.

      1. In the legend of Figure 4, line 1101, I believe the panel letters are incorrect.

      We have corrected the manuscript (lines 1241-1242) from “heat-killed yeasts on banana agar (H and I) or live yeasts on a nutritionally rich medium (J and K)” to “heat-killed yeasts on banana agar (G and H) or live yeasts on a nutritionally rich medium (I and J).”

      1. In Figure S1, authors showed that bananas that were not inoculated still had detectable rDNA signal. Is this really because bacteria can penetrate the peel? Or could this be the “reagent microbiome”? Alternatively, could these microbes have been introduced during sample prep, such as cutting the bananas?

      The detection of rDNA in bananas that were not inoculated with microbes was unlikely to be due to microbial contamination during experimental manipulation. The reviewer pointed out the possibility that the “reagent microbiome”, presumably the microbes in PBS, are detected from the uninoculated bananas. This seems to be unlikely, considering the PBS was sterilized by autoclaving before use. To ensure that no viable microbe was left in the autoclaved PBS, we applied a portion of the PBS onto a banana-agar plate and confirmed no colony was formed after incubation for a few days. DNA derived from dead microbes might be present in the PBS, but the PBS-added bananas were incubated for 4 days, so it is also unlikely that a detectable amount of DNA remained until sample collection. Furthermore, we believe that no contamination occurred during sample preparation. Banana peels were treated with 70% ethanol before removing them extremely carefully to avoid touching the fruit inside. All tools were sterilized before use. Taking all of these into account, we speculate that the microbes were already present in the bananas before peeling. We added the details of the sample preparation processes in MATERIALS AND METHODS (lines 518-521 and 540).

      Other major revisions

      1. We deposited our yeast genome annotation data in the DDBJ Annotated/Assembled Sequences database, and the accession numbers have been added to the ‘Data availability’ section in MATERIALS AND METHODS (lines 868-873).

      2. The bacterial composition data in Figure 1B was corrected, because in the original version, the data for Place 3 and Place 4 were plotted in reverse. The original and revised plots are shown side by side in Author response image 3. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p5, lines 117-120).

      Author response image 3.

      Comparison of the original and revised version of bacterial composition graph in Figure 1B. Comparison of the original (left) and revised (right) version of the graph at the bottom of Figure 1B, which shows the result of bacterial composition analysis. The color key, which is unmodified, is placed below the revised version.

      1. The plot data and labels in the RNA-seq result heatmaps (Figures 3A and 4C) have been corrected. In these figures, row Z-scores of log2(TPM + 1) were to be plotted, as indicated by the key in each figure. However, in the original version, row Z-scores of TPM were erroneously plotted. Thus, Figures 3A and 4C of the original version have been replaced with the correct plots, and the original and revised plots are shown side by side in Author response images 4A and 4B. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p7, lines 222-226 and p9, lines 277-281).
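For readers unfamiliar with the transformation named above, a minimal sketch of row Z-scores of log2(TPM + 1) is given here. The function name and the toy values are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption; the response does not state which variant was used for the figures.

```python
import math
import statistics


def row_zscores_log2_tpm(tpm_rows):
    """Return row-wise Z-scores of log2(TPM + 1).

    tpm_rows: list of rows, one row per gene, one value per sample.
    Rows with zero variance are returned as all zeros to avoid division by zero.
    """
    scaled = []
    for row in tpm_rows:
        logged = [math.log2(value + 1.0) for value in row]
        mean = statistics.fmean(logged)
        # Sample standard deviation (n - 1); an assumption for this sketch.
        sd = statistics.stdev(logged) if len(logged) > 1 else 0.0
        if sd == 0.0:
            scaled.append([0.0] * len(logged))
        else:
            scaled.append([(x - mean) / sd for x in logged])
    return scaled


# Toy matrix: two genes measured in three samples (hypothetical TPM values).
toy_tpm = [[0.0, 1.0, 3.0], [5.0, 5.0, 5.0]]
toy_z = row_zscores_log2_tpm(toy_tpm)
print(toy_z)
```

Because the logarithm compresses large values, Z-scores computed on raw TPM and on log2(TPM + 1) generally differ in magnitude, which is why the two versions of a heatmap can look different even when the ranking of samples within a row is unchanged.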

      Author response image 4.

      Comparison of the original and revised version of Figures 3A and 4C. (A and B) Comparison of the original (left) and revised (right) version of Figures 3A (A) or 4C (B).

      1. The keys in the original Figures 3D and 4F indicate that log2(fold change) was used to plot all data. However, when plotting the data from the previous study (Zinke et al., 2002), their “fold change value” was used. We have corrected the keys, plots, and legend of Figure 3D to reflect the different nature of the data from our RNA-seq analysis and those from microarray analysis by Zinke et al. The original and revised plots are shown side by side in Author response image 5. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p7, lines 228-230 and p9, lines 277-284).

      Author response image 5.

      Comparison of the original and revised version of Figures 3D and 4F. (A and B) Comparison of the original (left) and revised (right) version of Figures 3D (A) or 4F (B).

      1. The labels in Figure S5C and S5D (Figure S4C and S4D in the original version) have been corrected (they are "Pichia kluyveri > Supportive" and "Starmerella bacillaris > Supportive" rather than "Non-support. > H. uva" and "Non-support. > K. hum"). Additionally, we have reintroduced the circle indicating the number of “dme04070: Phosphatidylinositol signaling system” DEGs in Figure S5D, which was missing in Figure S4D of the original version. The original and revised figures are shown in Author response image 6.

      Author response image 6.

      Comparison of the original and revised version of Figures S5C and S5D. (A and B) Comparison of the original (left) and revised (right) versions of Figures S5C (A) or S5D (B). The original figures corresponding to the aforementioned figures were Figures S4C and S4D, respectively.

      1. The "Fermentation stage" column in Table 1, which indicated whether each microbe was considered an early-stage microbe or a late-stage microbe, has been removed to avoid confusion. This is because some of the microbes (Hanseniaspora uvarum, Pichia kluyveri, and Pantoea agglomerans) were employed in both of the feeding experiments using the microbes detected from the early-stage foods (Figures 2A, 2B, S2A, and S2B) and those from the late-stage foods (Figures 2C, 2D, S2C, and S2D).

      2. The leftmost column in Table S7 has been edited to indicate species names rather than “Sample IDs,” because the IDs were not used anywhere else in the paper.

      References

      Chandler, J. A., Lang, J., Bhatnagar, S., Eisen, J. A. and Kopp, A. (2011). Bacterial communities of diverse Drosophila species: Ecological context of a host-microbe model system. PLoS Genetics 7, e1002272.

      Chandler, J. A., Eisen, J. A. and Kopp, A. (2012). Yeast communities of diverse Drosophila species: Comparison of two symbiont groups in the same hosts. Applied and Environmental Microbiology 78, 7327–7336.

      Cho, H. and Rohlfs, M. (2023). Transmission of beneficial yeasts accompanies offspring production in Drosophila—An initial evolutionary stage of insect maternal care through manipulation of microbial load? Ecology and Evolution 13, e10184.

      Consuegra, J., Grenier, T., Akherraz, H., Rahioui, I., Gervais, H., da Silva, P. and Leulier, F. (2020). Metabolic Cooperation among Commensal Bacteria Supports Drosophila Juvenile Growth under Nutritional Stress. iScience 23, 101232.

      Dodge, R., Jones, E. W., Zhu, H., Obadia, B., Martinez, D. J., Wang, C., Aranda-Díaz, A., Aumiller, K., Liu, Z., Voltolini, M., et al. (2023). A symbiotic physical niche in Drosophila melanogaster regulates stable association of a multi-species gut microbiota. Nat Commun 14, 1557.

      Erkosar, B., Storelli, G., Mitchell, M., Bozonnet, L., Bozonnet, N. and Leulier, F. (2015). Pathogen Virulence Impedes Mutualist-Mediated Enhancement of Host Juvenile Growth via Inhibition of Protein Digestion. Cell Host & Microbe 18, 445–455.

      Hanson, M. A. and Lemaitre, B. (2020). New insights on Drosophila antimicrobial peptide function in host defense and beyond. Current Opinion in Immunology 62, 22–30.

      Henriques, S. F., Dhakan, D. B., Serra, L., Francisco, A. P., Carvalho-Santos, Z., Baltazar, C., Elias, A. P., Anjos, M., Zhang, T., Maddocks, O. D. K., et al. (2020). Metabolic cross-feeding in imbalanced diets allows gut microbes to improve reproduction and alter host behaviour. Nat Commun 11, 4236.

      Oka, M., Hashimoto, K., Yamaguchi, Y., Saitoh, S., Sugiura, Y., Motoi, Y., Honda, K., Kikko, Y., Ohata, S., Suematsu, M., et al. (2017). Arl8b is required for lysosomal degradation of maternal proteins in the visceral yolk sac endoderm of mouse embryos. Journal of Cell Science jcs.200519.

      Pais, I. S., Valente, R. S., Sporniak, M. and Teixeira, L. (2018). Drosophila melanogaster establishes a species-specific mutualistic interaction with stable gut-colonizing bacteria. PLOS Biology 16, e2005710.

      Piper, M. D. W., Blanc, E., Leitão-Gonçalves, R., Yang, M., He, X., Linford, N. J., Hoddinott, M. P., Hopfen, C., Soultoukis, G. A., Niemeyer, C., et al. (2014). A holidic medium for Drosophila melanogaster. Nature Methods 11, 100–105.

      Piper, M. D. W., Soultoukis, G. A., Blanc, E., Mesaros, A., Herbert, S. L., Juricic, P., He, X., Atanassov, I., Salmonowicz, H., Yang, M., et al. (2017). Matching Dietary Amino Acid Balance to the In Silico-Translated Exome Optimizes Growth and Reproduction without Cost to Lifespan. Cell Metab 25, 610–621.

      Quan, A. S. and Eisen, M. B. (2018). The ecology of the drosophila-yeast mutualism in wineries. PLOS ONE 13, e0196440.

      Solomon, G. M., Dodangoda, H., McCarthy-Walker, T. T., Ntim-Gyakari, R. R. and Newell, P. D. (2019). The microbiota of Drosophila suzukii influences the larval development of Drosophila melanogaster. PeerJ 7, e8097.

      Zinke, I., Schütz, C. S., Katzenberger, J. D., Bauer, M. and Pankratz, M. J. (2002). Nutrient control of gene expression in Drosophila: microarray analysis of starvation and sugar-dependent response. The EMBO Journal 21, 6162–6173.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their thoughtful comments and constructive suggestions. Point-by-point responses to comments are given below:

      Reviewer #1 (Recommendations For The Authors):

      This manuscript provides an important case study for in-depth research on the adaptability of vertebrates in deep-sea environments. Through analysis of the genomic data of the hadal snailfish, the authors found that this species may have entered and fully adapted to extreme environments only in the last few million years. Additionally, the study revealed the adaptive features of hadal snailfish in terms of perceptions, circadian rhythms and metabolisms, and the role of ferritin in high-hydrostatic pressure adaptation. Besides, the reads mapping method used to identify events such as gene loss and duplication avoids false positives caused by genome assembly and annotation. This ensures the reliability of the results presented in this manuscript. Overall, these findings provide important clues for a better understanding of deep-sea ecosystems and vertebrate evolution.

      Reply: Thank you very much for your positive comments and encouragement.

      However, there are some issues that need to be further addressed.

      1. L119: Please indicate the source of any data used.

      Reply: Thank you very much for the suggestion. All data sources used are indicated in Supplementary file 1.

      1. L138: The demographic history of hadal snailfish suggests a significant expansion in population size over the last 60,000 years, but the results only show some species, do the results for all individuals support this conclusion?

      Reply: Thank you for this suggestion. The estimated demographic history of the hadal snailfish reveals a significant population increase over the past 60,000 years for all individuals. The corresponding results have been incorporated into Figure 1-figure supplements 8B.

      Author response image 1.

      (B) Demographic history for 5 hadal snailfish individuals and 2 Tanaka’s snailfish individuals inferred by PSMC. A generation time of one year was assumed for Tanaka’s snailfish and three years for hadal snailfish.

      1. Figure 1-figure supplements 8: Is there a clear source of evidence for the generation time of 1 year chosen for the PSMC analysis?

      Reply: We apologize for the inclusion of an incorrect generation time in Figure 1-figure supplements 8. It is important to note that different generation times do not change the shape of the PSMC curve; they only shift the curve along the axis. Due to the absence of definitive evidence regarding the generation time of the hadal snailfish, we have followed Wang et al., 2019, and assumed a generation time of one year for Tanaka's snailfish and three years for hadal snailfish. The generation time has been incorporated into the main text (lines 516-517): “The generation time of one year for Tanaka snailfish and three years for hadal snailfish.”.
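The point about generation time can be illustrated with a small sketch. It assumes that the mutation rate is specified per generation and held fixed, so that inferred times in generations are unchanged and only the conversion to years depends on the generation time. The function name and the example time points are hypothetical and are not PSMC output.

```python
import math


def generations_to_years(times_in_generations, generation_time_years):
    """Convert time points from generations to years for a given generation time."""
    return [t * generation_time_years for t in times_in_generations]


# Hypothetical time points (in generations) along an inferred demographic curve.
times = [1_000, 10_000, 100_000]

years_g1 = generations_to_years(times, 1.0)  # generation time of one year
years_g3 = generations_to_years(times, 3.0)  # generation time of three years

# On a log10 time axis every point moves by the same amount, log10(3),
# so the curve is translated along the axis and its shape is preserved.
log_shifts = [math.log10(b) - math.log10(a) for a, b in zip(years_g1, years_g3)]
print(log_shifts)
```

If the mutation rate were instead given per year, changing the generation time would also change the per-generation rate, so the simple translation shown here would not be the whole story.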

      1. L237: Transcriptomic data suggest that the greatest changes in the brain of hadal snailfish compared to Tanaka's snailfish, what functions these changes are specifically associated with, and how these functions relate to deep-sea adaptation.

      Reply: Thank you for this suggestion. Through comparative transcriptome analysis, we identified 3,587 up-regulated genes and 3,433 down-regulated genes in the brains of hadal snailfish compared to Tanaka's snailfish. Subsequently, we conducted Gene Ontology (GO) functional enrichment analysis on the differentially expressed genes, revealing that the up-regulated genes were primarily associated with cilium, DNA repair, protein binding, ATP binding, and microtubule-based movement. Conversely, the down-regulated genes were associated with membranes, GTP-binding, proton transmembrane transport, and synaptic vesicles, as shown in the following table (Supplementary file 15). Previous studies have shown that high hydrostatic pressure induces DNA strand breaks and damage, so the DNA repair-related genes that are up-regulated in the brain may help hadal snailfish overcome these challenges.

      Author response table 1.

      GO enrichment of expression up-regulated and down-regulated genes in hadal snailfish brain.

      We have added new results (Supplementary file 15) and descriptions to show the changes in the brains of hadal snailfish (lines 250-255): “Specifically, there are 3,587 up-regulated genes and 3,433 down-regulated genes in the brain of hadal snailfish compared to Tanaka snailfish, and Gene Ontology (GO) functional enrichment analyses revealed that up-regulated genes in the hadal snailfish are associated with cilium, DNA repair, and microtubule-based movement, while down-regulated genes are enriched in membranes, GTP-binding, proton transmembrane transport, and synaptic vesicles (Supplementary file 15).”

      1. L276: What is the relationship between low bone mineralization and deep-sea adaptation, and can low mineralization help deep-sea fish better adapt to the deep sea?

      Reply: Thank you for this suggestion. The hadal snailfish exhibits lower bone mineralization compared to Tanaka's snailfish, which may have facilitated its adaptation to the deep sea. On one hand, this reduced bone mineralization could have contributed to the hadal snailfish's ability to maintain neutral buoyancy without excessive energy expenditure. On the other hand, the lower bone mineralization may have also rendered their skeleton more flexible and malleable, enhancing their resilience to high hydrostatic pressure. Accordingly, we added the following new descriptions (lines 295-300): “Nonetheless, micro-CT scans have revealed shorter bones and reduced bone density in hadal snailfish, from which it has been inferred that this species has reduced bone mineralization (M. E. Gerringer et al., 2021); this may be a result of lowering density by reducing bone mineralization, allowing to maintain neutral buoyancy without expending too much energy, or it may be a result of making its skeleton more flexible and malleable, which is able to better withstand the effects of HHP.”

      1. L293: The abbreviation HHP was mentioned earlier in the article and does not need to be abbreviated here.

      Reply: Thank you for the correction. We have corrected the word. Line 315.

      1. L345: It should be "In addition, the phylogenetic relationships between different individuals clearly indicate that they have successfully spread to different trenches about 1.0 Mya".

      Reply: Thank you for the correction. We have corrected the word. Line 374.

      1. It is curious what functions are associated with the up-regulated and down-regulated genes in all tissues of hadal snailfish compared to Tanaka's snailfish, and what functions have hadal snailfish lost in order to adapt to the deep sea?

      Reply: Thank you for this suggestion. We added a description of this finding in the results section (lines 337-343): “Next, we identified 34 genes that are significantly more highly expressed in all organs of hadal snailfish in comparison to Tanaka’s snailfish and zebrafish, while only seven genes were found to be significantly more highly expressed in Tanaka’s snailfish using the same criterion (Figure 5-figure supplements 1). The 34 genes are enriched in only one GO category, GO:0000077: DNA damage checkpoint (Adjusted P-value: 0.0177). Moreover, five of the 34 genes are associated with DNA repair.” This suggests that up-regulated genes in all tissues in hadal snailfish are associated with DNA repair in response to DNA damage caused by high hydrostatic pressure, whereas down-regulated genes do not show enrichment for a particular function.

      Overall, the functions lost in hadal snailfish adapted to the deep sea are mainly related to the effects of the dark environment, which can be summarized as follows (lines 375-383): “The comparative genomic analysis revealed that the complete absence of light had a profound effect on the hadal snailfish. In addition to the substantial loss of visual genes and loss of pigmentation, many rhythm-related genes were also absent, although some rhythm genes were still present. The gene loss may not only come from relaxation of natural selection, but also for better adaptation. For example, the grpr gene copies are absent or down-regulated in hadal snailfish, which could in turn increased their activity in the dark, allowing them to survive better in the dark environment (Wada et al., 1997). The loss of gpr27 may also increase the ability of lipid metabolism, which is essential for coping with short-term food deficiencies (Nath et al., 2020).”

      Reviewer #2 (Recommendations For The Authors):

      I have pointed out some of the examples that struck me as worthy of additional thought/writing/comments from the authors. Any changes/comments are relatively minor.

      Reply: Thank you very much for your positive comments on this work.

      For comparative transcriptome analyses, reads were mapped back to reference genomes and TPM values were obtained for gene-level count analyses. 1:1 orthologs were used for differential expression analyses. This is indeed the only way to normalize counts across species, by comparing the same gene set in each species. Differential expression statistics were run in DEseq2. This is a robust way to compare gene expression across species and where fold-change values are reported (e.g. Fig 3, creatively by coloring the gene name) the values are best-practice.

      In other places, TPM values are reported (e.g. Fig 2D, Fig 4C, Fig 5A, Fig 4-Fig supp 4) to illustrate expression differences within a tissue across species. The comparisons look robust, although it is not made clear how the values were obtained in all cases. For example, in Fig 2D the TPM values appear to be from eyes of individual fish, but in Fig 4C and 5A they must be some kind of average? I think that information should be added to the figure legends.

      Of note: TPM values are sensitive to the shape of the RNA abundance distribution from a given sample: A small number of very highly expressed genes might bias TPM values downward for other genes. From one individual to another or from one species to another, it is not obvious to me that we should expect the same TPM distribution from the same tissues, making it a challenging metric for comparison across samples, and especially across species. An alternative measure of RNA abundance is normalized counts that can be output from DEseq2. See:

      Zhao, Y., Li, M.C., Konaté, M.M., Chen, L., Das, B., Karlovich, C., Williams, P.M., Evrard, Y.A., Doroshow, J.H. and McShane, L.M., 2021. TPM, FPKM, or normalized counts? A comparative study of quantification measures for the analysis of RNA-seq data from the NCI patient-derived models repository. Journal of translational medicine, 19(1), pp.1-15.

      If the authors would like to keep the TPM values, I think it would be useful for them to visualize the TPM value distribution that the numbers were derived from. One way to do this would be to make a violin plot for species/tissue and plot the TPM values of interest on that. That would give a visualization of the ranked value of the gene within the context of all other TPM values. A more highly expressed gene would presumably have a higher rank in context of the specific tissue/species and be more towards the upper tail of the distribution. An example violin plot can be found in Fig 6 of:

      Burns, J.A., Gruber, D.F., Gaffney, J.P., Sparks, J.S. and Brugler, M.R., 2022. Transcriptomics of a Greenlandic Snailfish Reveals Exceptionally High Expression of Antifreeze Protein Transcripts. Evolutionary Bioinformatics, 18, p.11769343221118347.

      Alternatively, a comparison of TPM and normalized count data (heatmaps?) would be of use for at least some of the reported TPM values to show whether the different normalization methods give comparable outputs in terms of differential expression. One reason for these questions is that DEseq2 uses normalized counts for statistical analyses, but values are expressed as TPM in the noted figures (yes, TPM accounts for transcript length, but can still be subject to distribution biases).
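The distribution sensitivity of TPM raised above can be made concrete with a toy comparison. The sketch below is a simplified illustration, not DESeq2 itself: `median_of_ratios_size_factors` is a hypothetical pure-Python re-implementation of the median-of-ratios idea, and the gene counts and lengths are invented for the example.

```python
import math
import statistics


def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def median_of_ratios_size_factors(samples):
    """Per-sample size factors: median ratio of counts to per-gene geometric means.

    samples: list of count vectors, one per sample, in the same gene order.
    Genes with a zero count in any sample are skipped.
    """
    n_genes = len(samples[0])
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    log_geo_mean = {
        g: sum(math.log(s[g]) for s in samples) / len(samples) for g in usable
    }
    return [
        math.exp(statistics.median(math.log(s[g]) - log_geo_mean[g] for g in usable))
        for s in samples
    ]


# Invented counts for three genes of equal length; only gene 3 differs.
lengths = [1.0, 1.0, 1.0]
sample_a = [100, 200, 300]
sample_b = [100, 200, 30_000]  # one very highly expressed gene

tpm_a = tpm(sample_a, lengths)
tpm_b = tpm(sample_b, lengths)
size_factors = median_of_ratios_size_factors([sample_a, sample_b])
normalized_b = [c / size_factors[1] for c in sample_b]

print(tpm_a[0], tpm_b[0])          # gene 1 TPM drops in sample_b
print(size_factors, normalized_b)  # gene 1 normalized count is unchanged
```

In this toy case the raw count of the first gene is identical in both samples, yet its TPM falls sharply in the second sample because one dominant gene absorbs most of the million, whereas the median-based size factor is unaffected by that single outlier.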

      Reply: Thank you for your suggestions. Following them, we modified Fig 2D, Fig 4C, Fig 4-Fig supp 4, and Fig 5-Fig supp 1. In the differential expression analyses, DESeq2 normalized counts are available only for one-to-one orthologues of hadal snailfish and Tanaka's snailfish, so we now show the DESeq2 normalized counts in Fig 2D, Fig 4C, Fig 4-Fig supp 4, and Fig 5-Fig supp 1. For Fig 5A, because the fthl27 gene has undergone a specific copy-number expansion in hadal snailfish, we instead visualized the ranking of all fthl27 copies across tissues with violin plots in Fig 5-Fig supp 2.

      Author response image 2.

      (D) Log10-transformed DESeq2 normalized counts (COUNTDESEQ2) of vision-related genes in the eyes of hadal snailfish and Tanaka's snailfish. * represents genes significantly downregulated in hadal snailfish (corrected P < 0.05).

      Author response image 3.

      (C) Deletion of one copy of grpr and down-regulated expression of the other copy in hadal snailfish. The relative positions of genes on chromosomes are indicated by arrows, with arrows to the right representing the forward strand and arrows to the left representing the reverse strand. The heatmap shows the average of the DESeq2 normalized counts (COUNTDESEQ2) across all replicate samples from each tissue. * represents tissues in which grpr-1 was significantly down-regulated in hadal snailfish (corrected P < 0.05).

      Author response image 4.

      Expression of the vitamin D related genes in various tissues of hadal snailfish and Tanaka's snailfish. The heatmap presented is the average of the normalized counts for DESeq2 (COUNTDESEQ2) in all replicate samples from each tissue.

      Author response image 5.

      (B) Expression of the ROS-related genes in different tissues of hadal snailfish and Tanaka's snailfish. The heatmap presented is the average of the normalized counts for DESeq2 (COUNTDESEQ2) in all replicate samples from each tissue.

      Author response image 6.

      Ranking of the expression of individual copies of fthl27 gene in hadal snailfish and Tanaka's snailfish in various tissues showed that all copies of fthl27 in hadal snailfish have high expression. The gene expression presented is the average of TPM in all replicate samples from each tissue.

      Line 96: Which BUSCOs? In the methods it is noted that the actinopterygii_odb10 BUSCO set was used. I think it should also be noted here so that it is clear which BUSCO set was used for completeness analysis. It could even be informally the ray-finned fish BUSCOs or Actinopterygii BUSCOs.

      Reply: Thank you for this suggestion. We used the Actinopterygii_odb10 database and added the BUSCO set to the main text as follows (lines 92-95): “The new assembly filled 1.26 Mb of gaps that were present in our previous assembly and has a much higher level of genome continuity and completeness (with complete BUSCOs of 96.0 % [Actinopterygii_odb10 database]) than the two previous assemblies.”

      Lines 102-105: The medaka genome paper proposes the notion that the ancestral chromosome number between medaka, tetraodon, and zebrafish is 24. There may be other evidence of that too. Some of that evidence should be cited here to support the notion that sticklebacks had chromosome fusions to get to 21 chromosomes rather than scorpionfish having chromosome fissions to get to 24. Here's the medaka genome paper:

      Kasahara, M., Naruse, K., Sasaki, S., Nakatani, Y., Qu, W., Ahsan, B., Yamada, T., Nagayasu, Y., Doi, K., Kasai, Y. and Jindo, T., 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature, 447(7145), pp.714-719.

      Reply: Thank you for your great suggestion. Accordingly, we modified the sentence and added the citation as follows (lines 100-105): “We noticed that there is no major chromosomal rearrangement between hadal snailfish and Tanaka’s snailfish, and chromosome numbers are consistent with the previously reported MTZ-ancestor (the last common ancestor of medaka, Tetraodon, and zebrafish) (Kasahara et al., 2007), while the stickleback had undergone several independent chromosomal fusion events (Figure 1-figure supplements 4).”

      Line 161-173: "Along with the expression data, we noticed that these genes exhibit a different level of relaxation of natural selection in hadal snailfish (Figure 2B; Figure 2-figure supplements 1)." With the above statement and evidence, the authors are presumably referring to gene losses and differences in expression levels. I think that since gene expression was not measured in a controlled way it may not be a good measure of selection throughout. The reported genes could be highly expressed under some other condition, selection intact. I find Fig2-Fig supp 1 difficult to interpret. I assume I am looking for regions where Tanaka’s snailfish reads map and Hadal snailfish reads do not, but it is not abundantly clear. Also, other measures of selection might be good to investigate: accumulation of mutations in the region could be evidence of relaxed selection, for example, where essential genes will accumulate fewer mutations than conditional genes or (presumably) genes that are not needed at all. The authors could complete a mutational/SNP analysis using their genome data on the discussed genes if they want to strengthen their case for relaxed selection. Here is a reference (from Arabidopsis) showing these kinds of effects:

      Monroe, J.G., Srikant, T., Carbonell-Bejerano, P., Becker, C., Lensink, M., Exposito-Alonso, M., Klein, M., Hildebrandt, J., Neumann, M., Kliebenstein, D. and Weng, M.L., 2022. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature, 602(7895), pp.101-105.

      Reply: Thank you for pointing out this important issue. Following your suggestion, we have removed the mention of the down-regulation of some visual genes in the eyes of hadal snailfish and the results of the original Fig2-Fig supp 1 that were based on reads mapping to confirm whether the genes were lost or not. To investigate the potential relaxation of natural selection in the opn1sw2 gene in hadal snailfish, we conducted precise gene structure annotation. Our findings revealed that the opn1sw2 gene is pseudogenized in hadal snailfish, indicating a relaxation of natural selection. We have included this result in Figure 2-figure supplements 1.

      Author response image 7.

      Pseudogenization of opn1sw2 in hadal snailfish. The deletion changed the protein’s sequence, causing its premature termination.

      Accordingly, we have toned down the related conclusions in the main text as follows (lines 164-173): “We noticed that the lws gene (long wavelength) has been completely lost in both hadal snailfish and Tanaka’s snailfish; rh2 (central wavelength) has been specifically lost in hadal snailfish (Figure 2B and 2C); sws2 (short wavelength) has undergone pseudogenization in hadal snailfish (Figure 2-figure supplements 1); while rh1 and gnat1 (perception of very dim light) are both still present and expressed in the eyes of hadal snailfish (Figure 2D). A previous study has also proven the existence of rhodopsin protein in the eyes of hadal snailfish using proteome data (Yan, Lian, Lan, Qian, & He, 2021). The preservation and expression of genes for the perception of very dim light suggests that they are still subject to natural selection, at least in the recent past.”

      Line 161-170: What tissue were the transcripts derived from for looking at expression level of opsins? Eyes?

      Reply: Thank you for your question. The transcripts used to assess opsin expression levels were obtained from the eye.

      Line 191: What does tmc1 do specifically?

      Reply: Thank you for this suggestion. The tmc1 gene encodes transmembrane channel-like protein 1, involved in the mechanotransduction process in sensory hair cells of the inner ear that facilitates the conversion of mechanical stimuli into electrical signals used for hearing and homeostasis. We added functional annotations for the tmc1 in the main text (lines 190-196): “Of these, the most significant upregulated gene is tmc1, which encodes transmembrane channel-like protein 1, involved in the mechanotransduction process in sensory hair cells of the inner ear that facilitates the conversion of mechanical stimuli into electrical signals used for hearing and homeostasis (Maeda et al., 2014), and some mutations in this gene have been found to be associated with hearing loss (Kitajiri, Makishima, Friedman, & Griffith, 2007; Riahi et al., 2014).”

      Line 208: "it is likely" is a bit proscriptive

      Reply: Thank you for this suggestion. We rephrased the sentence as follows (lines 213-215): “Expansion of cldnj was observed in all resequenced individuals of the hadal snailfish (Supplementary file 10), which provides an explanation for how the hadal snailfish breaks the depth limitation on calcium carbonate deposition and has become one of the few teleost species in the hadal zone.”

      Line 199: maybe give a little more info on exactly what cldnj does? e.g. "cldnj encodes a claudin protein that has a role in tight junctions through calcium independent cell-adhesion activity" or something like that.

      Reply: Thank you for this suggestion. We have added functional annotations for cldnj to the main text (lines 200-204): “Moreover, cldnj, the gene involved in lifelong otolith mineralization, which encodes a claudin protein that has a role in tight junctions through calcium-independent cell-adhesion activity, has three copies in hadal snailfish but only one copy in other teleost species (Figure 3B, Figure 3C) (Hardison, Lichten, Banerjee-Basu, Becker, & Burgess, 2005).”

      Lines 199-210: Paragraph on cldnj: there are extra cldnj genes in the hadal snailfish, but no apparent extra expression. Could the authors mention that in their analysis/discussion of the data?

      Reply: Thank you for your suggestions. Despite not observing significant changes in cldnj expression in the brain tissue of hadal snailfish compared to Tanaka's snailfish, it is important to consider that the brain may not be the primary site of cldnj expression. Previous studies in zebrafish have consistently shown expression of cldnj in the otocyst during the critical early growth phase of the otolith, with a lower level of expression observed in the zebrafish brain. However, because otocyst samples from hadal snailfish were unavailable in our current study, our findings cannot confirm any additional expression changes resulting from cldnj amplification. Consequently, future comprehensive investigations are needed to explore the expression patterns of cldnj specifically in the otocyst of hadal snailfish. Accordingly, we added a discussion of this result in the main text (lines 209-214): “In our investigation, we found that the expression of cldnj was not significantly up-regulated in the brain of the hadal snailfish compared with Tanaka’s snailfish, which may be related to the fact that cldnj is mainly expressed in the otocyst, while the expression in the brain is lower. However, due to the immense challenge in obtaining samples of hadal snailfish, the expression of cldnj in the otocyst deserves more in-depth study in the future.”

      Lines 225-231: I wonder whether low expression of a circadian gene might be a time of day effect rather than an evolutionary trait. Could the authors comment?

      Reply: Thank you for your suggestions. Previous studies have shown that the grpr gene is expressed relatively consistently in the mouse suprachiasmatic nucleus (SCN) throughout the day (Figure 4-figure supplements 1), and we therefore hypothesize that the low expression of the grpr-1 gene in hadal snailfish is an evolutionary trait. We have modified this result in the main text (lines 232-242): “In addition, in the teleosts closely related to hadal snailfish, there are usually two copies of grpr encoding the gastrin-releasing peptide receptor; we noticed that in hadal snailfish one of them is absent and the other is barely expressed in brain (Figure 4C), whereas a previous study found that the grpr gene in the mouse suprachiasmatic nucleus (SCN) did not fluctuate significantly during a 24-hour light/dark cycle and had a relatively stable expression (Pembroke, Babbs, Davies, Ponting, & Oliver, 2015) (Figure 4-figure supplements 1). It has been reported that grpr deficient mice, while exhibiting normal circadian rhythms, show significantly increased locomotor activity in dark conditions (Wada et al., 1997; Zhao et al., 2023). We might therefore speculate that the absence of that gene might in some way benefit the activity of hadal snailfish under complete darkness.”

      Author response image 8.

      (B) Expression of grpr over a 24-hour light/dark cycle in the mouse suprachiasmatic nucleus (SCN). Data source: http://www.wgpembroke.com/shiny/SCNseq.

      Line 253: What is gpr27? G protein coupled receptor?

      Reply: We apologize for the ambiguous description. Gpr27 is a G protein-coupled receptor, belonging to the family of cell surface receptors. We introduced gpr27 in the main text as follows (lines 270-273): “Gpr27 is a G protein-coupled receptor, belonging to the family of cell surface receptors, involved in various physiological processes and expressed in multiple tissues including the brain, heart, kidney, and immune system.”

      Line 253: Fig4 Fig supp 3 is a good example of pseudogenization!

      Reply: Thank you very much for your recognition.

      Line 279: What is bglap? It regulates bone mineralization, but what specifically does that gene do?

      Reply: We apologize for the ambiguous description. The bglap gene encodes a highly abundant bone protein secreted by osteoblasts that binds calcium and hydroxyapatite and regulates bone remodeling and energy metabolism. We introduced bglap in the main text as follows (lines 300-304): “The gene bglap, which encodes a highly abundant bone protein secreted by osteoblasts that binds calcium and hydroxyapatite and regulates bone remodeling and energy metabolism, had been found to be a pseudogene in hadal fish (K. Wang et al., 2019), which may contribute to this phenotype.”

      Line 299: Introduction of another gene without providing an exact function: acaa1.

      Reply: We apologize for the ambiguous description. The acaa1 gene encodes acetyl-CoA acetyltransferase 1, a key regulator of fatty acid β-oxidation in the peroxisome, which plays a controlling role in fatty acid elongation and degradation. We introduced acaa1 in the main text as follows (lines 319-324): “In regard to the effect of cell membrane fluidity, relevant genetic alterations had been identified in previous studies, i.e., the amplification of acaa1 (encoding acetyl-CoA acetyltransferase 1, a key regulator of fatty acid β-oxidation in the peroxisome, which plays a controlling role in fatty acid elongation and degradation) may increase the ability to synthesize unsaturated fatty acids (Fang et al., 2000; K. Wang et al., 2019).”

      Fig 5 legend: The DCFH-DA experiment is not an immunofluorescence assay. It is better described as a redox-sensitive fluorescent probe. Please take note throughout.

      Reply: Thank you for pointing out our mistake. We corrected the wording at lines 1048 and 1151 as follows: “ROS levels were confirmed by redox-sensitive fluorescent probe using DCFH-DA molecular probe in 293T cell culture medium with or without fthl27-overexpression plasmid added with H2O2 or FAC for 4 hours.”

      Line 326: Manuscript notes that ROS levels in transfected cells are "significantly lower" than the control group, but there is no quantification or statistical analysis of ROS levels. In the methods, I noticed the mention of flow cytometry, but do not see any data from that experiment. Proportion of cells with DCFH-DA fluorescence above a threshold would be a good statistic for the experiment... Another could be average fluorescence per cell. Figure 5B shows some images with green dots and it looks like more green in the "control" (which could better be labeled as "mock-transfection") than in the fthl27 overexpression, but this could certainly be quantified by flow cytometry. I recommend that data be added.

      Reply: Thank you for your suggestions. We apologize for the error in the main text: we used a fluorescence microscope, not a flow cytometer, to observe fluorescence in our experiments. We have corrected this in the methods section as follows (lines 651-653): “ROS levels were measured using a DCFH-DA molecular probe, and fluorescence was observed through a fluorescence microscope with an optional FITC filter, with the background removed to observe changes in fluorescence.” We also processed the images with ImageJ to obtain the respective mean fluorescence intensities (MFI) and found that the MFI of the fthl27-overexpression cells was lower than that of the control group, which indicates that ROS levels in the fthl27-overexpression cells were lower than in the control group. MFI has been added to Figure 5B.
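
      To make the MFI comparison concrete, the following Python sketch shows one way a background-subtracted mean fluorescence intensity could be computed from pixel values. It is an illustration only, not the ImageJ procedure used in the study: the tiny pixel grids, the constant background value, and the clipping of negative values to zero are all assumptions.

```python
def mean_fluorescence_intensity(pixels, background=0.0):
    """Mean of background-subtracted pixel values; negatives are clipped to zero."""
    values = [max(p - background, 0.0) for row in pixels for p in row]
    if not values:
        raise ValueError("image has no pixels")
    return sum(values) / len(values)


# Hypothetical 2x2 green-channel images (arbitrary intensity units).
control_image = [[12.0, 40.0], [36.0, 12.0]]
overexpression_image = [[12.0, 16.0], [14.0, 10.0]]

mfi_control = mean_fluorescence_intensity(control_image, background=10.0)
mfi_overexpression = mean_fluorescence_intensity(overexpression_image, background=10.0)
```

      A single MFI per image gives no measure of variability, so a claim of a significant difference would still need replicate images or fields and a statistical test.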

      Author response image 9.

      ROS levels were confirmed by redox-sensitive fluorescent probe using DCFH-DA molecular probe in 293T cell culture medium with or without fthl27-overexpression plasmid added with H2O2 or FAC for 4 hours. Images are merged from bright field images with fluorescent images using ImageJ, while the mean fluorescence intensity (MFI) is also calculated using ImageJ. Green, cellular ROS. Scale bars equal 100 μm.

      Regarding the ROS experiment: Transfection of HEK293T cells should be reasonably straightforward, and the experiment was controlled appropriately with a mock transfection, but some additional parameters are still needed to help interpret the results. Those include: Direct evidence that the transfection worked, like qPCR, western blots (is the fthl27 tagged with an antigen?), coexpression of a fluorescent protein. Then transfection efficiency should be calculated and reported.

      Reply: Thank you for your suggestions. To assess the success of the transfection, we randomly selected a subset of fthl27-transfected HEK293T cells for transcriptome sequencing. This approach allowed us to examine the gene expression profiles and confirm the efficacy of the transfection process. As control samples, we obtained transcriptome data from two untreated HEK293T cells (SRR24835259 and SRR24835265) from NCBI. Subsequently, we extracted the fthl27 gene sequence of the hadal snailfish, along with 1,000 bp upstream and downstream regions, as a separate scaffold. This scaffold was then merged with the human genome to assess the expression levels of each gene in the three transcriptome datasets. The results demonstrated that the fthl27 gene exhibited the highest expression in fthl27-transfected HEK293T cells, while in the control group, the expression of the fthl27 gene was negligible (TPM = 0). Additionally, the expression patterns of other highly expressed genes were similar to those observed in the control group, confirming the successful fthl27 transfection. These findings have been incorporated into Figure 5-figure supplements 3.
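
      For readers unfamiliar with the TPM values quoted above, a minimal Python sketch of the standard transcripts-per-million calculation follows. The read counts, transcript lengths, and the names geneA and geneB are hypothetical; the response does not state which quantification tool produced the reported values.

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million: length-normalize the counts, then scale to 1e6."""
    rates = {gene: read_counts[gene] / (lengths_bp[gene] / 1000.0)
             for gene in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical untreated sample: no reads map to the added fthl27 scaffold.
counts = {"fthl27": 0, "geneA": 100, "geneB": 300}
lengths = {"fthl27": 1000, "geneA": 1000, "geneB": 1000}

untreated_tpm = tpm(counts, lengths)
```

      Because TPM values within a sample always sum to one million, a TPM of 0 for fthl27 simply means that no reads were assigned to that scaffold in the untreated cells.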

      Author response image 10.

      (B) Reads depth of fthl27 gene in fthl27-transfected HEK293T cells and 2 untreated HEK293T cells (SRR24835259 and SRR24835265) transcriptome data. (C) Expression of each gene in the transcriptome data of fthl27-transfected HEK293T cells and 2 untreated HEK293T cells (SRR24835259 and SRR24835265), where the genes shown are the 4 most highly expressed genes in each sample.

      Lines 383-386: expression of DNA repair genes is mentioned, but not shown anywhere in the results?

      Reply: Thank you for your suggestions. Accordingly, we added a description of this finding in the results section (lines 337-343): “Next, we identified 34 genes that are significantly more highly expressed in all organs of hadal snailfish in comparison to Tanaka’s snailfish and zebrafish, while only seven genes were found to be significantly more highly expressed in Tanaka’s snailfish using the same criterion (Figure 5-figure supplements 1). The 34 genes are enriched in only one GO category, GO:0000077: DNA damage checkpoint (Adjusted P-value: 0.0177). Moreover, five of the 34 genes are associated with DNA repair.”. And we added the information in the Figure 5-figure supplements 1C.

      Author response image 11.

      (C) Genes were significantly more highly expressed in all tissues of the hadal snailfish compared to Tanaka's snailfish, and 5 genes (purple) were associated with DNA repair.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #3 comment

      1) One suggestion for improvement is to consider incorporating the results from Figure S9 into the main Figure 6, which would enhance readers' comprehension.

      We appreciate your valuable feedback. Based on the reviewer’s suggestion, we have incorporated the results from Figure S9 into the main Figure 6, as shown below. The manuscript and figure legends have also been modified accordingly.

      Author response image 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for truly valuable advice and comments. We have made multiple corrections and revisions to the original pre-print accordingly per the following comments:

      1. Pro1153Leu is extremely common in the general population (allele frequency in gnomAD is 0.5). Further discussion is warranted to justify the possibility that this variant contributes to a phenotype documented in 1.5-3% of the population. Is it possible that this variant is tagging other rare SNPs in the COL11A1 locus, and could any of the existing exome sequencing data be mined for rare nonsynonymous variants?

      One possible avenue for future work is to return to any existing exome sequencing data to query for rare variants at the COL11A1 locus. This should be possible for the USA MO case-control cohort. Any rare nonsynonymous variants identified should then be subjected to mutational burden testing, ideally after functional testing to diminish any noise introduced by rare benign variants in both cases and controls. If there is a significant association of rare variation in AIS cases, then they should consider returning to the other cohorts for targeted COL11A1 gene sequencing or whole exome sequencing (whichever approach is easier/less expensive) to demonstrate replication of the association.

      Response: Regarding the genetic association of the common COL11A1 variant rs3753841 (p.(Pro1335Leu)), we do not propose that it is the sole risk variant contributing to the association signal we detected and have clarified this in the manuscript. We concluded that it was worthy of functional testing for reasons described here. Although there were several common variants in the discovery GWAS within and around COL11A1, none were significantly associated with AIS and none were in linkage disequilibrium (R2>0.6) with the top SNP rs3753841. We next reviewed rare (MAF<=0.01) coding variants within the COL11A1 LD region of the associated SNP (rs3753841) in 625 available exomes representing 46% of the 1,358 cases from the discovery cohort. The LD block was defined using Haploview based on the 1KG_CEU population. Within the ~41 kb LD region (chr1:103365089-103406616, GRCh37) we found three rare missense mutations in 6 unrelated individuals (see table below). Two of them (NM_080629.2:c.G4093A:p.A1365T; NM_080629.2:c.G3394A:p.G1132S), from two individuals, are predicted to be deleterious based on CADD and GERP scores and are plausible AIS risk candidates. At this rate we could expect to find only 4-5 individuals with linked rare coding variants in the total cohort of 1,358, which collectively are unlikely to explain the overall association signal we detected. Of course, there also could be deep intronic variants contributing to the association that we would not detect by our methods. However, given this scenario, the relatively high predicted deleteriousness of rs3753841 (CADD=25.7; GERP=5.75), and its occurrence in a Gly-X-Y triplet repeat, we hypothesized that this variant itself could be a risk allele worthy of further investigation.

      Author response table 1.
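
      The extrapolation to "4-5 individuals" in the response above can be checked with a short Python sketch. The response does not say which carrier count was scaled up; the sketch assumes it was the two individuals carrying the predicted-deleterious variants, because that is the count that reproduces the quoted figure. The function name is hypothetical.

```python
def expected_carriers(carriers_observed, n_sequenced, n_total):
    """Scale an observed carrier count linearly from the sequenced subset to the cohort."""
    return carriers_observed / n_sequenced * n_total


N_EXOMES = 625    # exomes available, from the response
N_CASES = 1358    # cases in the discovery cohort, from the response

fraction_sequenced = N_EXOMES / N_CASES                         # about 0.46
deleterious_only = expected_carriers(2, N_EXOMES, N_CASES)      # about 4.3
all_rare_missense = expected_carriers(6, N_EXOMES, N_CASES)     # about 13.0
```

      Under the same linear scaling, counting all six carriers of rare missense variants would give roughly 13 individuals, still under 1% of the 1,358 cases.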

      We also appreciate the reviewer’s suggestion to perform a rare variant burden analysis of COL11A1. We did conduct a pilot gene-based analysis in 4534 European ancestry exomes, including 797 of our own AIS cases and 3737 controls, and tested the burden of rare variants in COL11A1. The SKATO P value was not significant (COL11A1_P=0.18), but this could be due to lack of power and/or background from rare benign variants that could be screened out using the functional testing we have developed.

      1. COL11A1 p.Pro1335Leu is pursued as a direct candidate susceptibility locus, but the functional validation involves both: (a) a complementation assay in mouse GPCs, Figure 5; and (b) cultured rib cartilage cells from Col11a1-Ad5 Cre mice (Figure 4). Please address the following:

      2A. Is Pro1335Leu a loss of function, gain of function, or dominant negative variant? Further rationale for modeling this change in a Col11a1 loss of function cell line would be helpful.

      Response: Regarding functional testing, by knockdown/knockout cell culture experiments, we showed for the first time that Col11a1 negatively regulates Mmp3 expression in cartilage chondrocytes, an AIS-relevant tissue. We then tested the effect of overexpressing the human wt or variant COL11A1 by lentiviral transduction in SV40-transformed chondrocyte cultures. We deleted endogenous mouse Col11a1 by Cre recombination to remove the background of its strong suppressive effects on Mmp3 expression. We acknowledge that Col11a1 missense mutations could confer gain of function or dominant negative effects that would not be revealed in this assay. However, as indicated in our original manuscript, spinal deformity is described in the cho/cho mouse, a Col11a1 loss of function mutant. We also note the recent publication by Rebello et al. showing that missense mutations in Col11a2 associated with congenital scoliosis fail to rescue a vertebral malformation phenotype in a zebrafish col11a2 KO line. Although the connection between AIS and vertebral malformations is not altogether clear, we surmise that loss of the components of collagen type XI disrupts spinal development. In vivo experiments in vertebrate model systems are needed to fully establish the consequences and genetic mechanisms by which COL11A1 variants contribute to an AIS phenotype.

      2B. Expression appears to be augmented compared WT in Fig 5B, but there is no direct comparison of WT with variant.

      Response: Expression of the mutant (from the lentiviral expression vector) is increased compared to wildtype. We observed this effect in repeated experiments. Sequencing confirmed that the mutant and wildtype constructs differed only at the position of the rs3753841 SNP. At this time, we cannot explain the difference in expression levels. Nonetheless, even when the variant COL11A1 is relatively overexpressed, it fails to suppress MMP3 expression as observed for the wildtype form.

      2C. How do the authors know that their complementation data in Figure 5 are specific? Repetition of this experiment with an alternative common nonsynonymous variant in COL11A1 (such as rs1676486) would be helpful as a comparison with the expectation that it would be similar to WT.

      Response: We agree that testing an allelic series throughout COL11A1 could be informative, but we have shifted our resources toward in vivo experiments that we believe will ultimately be more informative for deciphering the mechanistic role of COL11A1 in MMP3 regulation and spine deformity.

      2D. The y-axes of histograms in panel A need attention and clarification. What is meant by power? Do you mean fold change?

      Response: Power is directly comparable to fold change but allows comparison of absolute expression levels between different genes.
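
      The response does not define "power" further. One common qRT-PCR convention that fits the description is to report 2^(-ΔCt) relative to a reference gene, whereas fold change is the ratio of two such values (2^(-ΔΔCt)). The Python sketch below illustrates that reading only; it is an assumption rather than the authors' stated method, the Ct values are hypothetical, and 100% amplification efficiency is assumed.

```python
def relative_power(ct_target, ct_reference):
    """2^(-dCt): target expression relative to a reference gene in one sample."""
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """2^(-ddCt): ratio of the test sample's 2^(-dCt) to the control sample's."""
    return (relative_power(ct_target_test, ct_ref_test)
            / relative_power(ct_target_ctrl, ct_ref_ctrl))


# Hypothetical Ct values for a target gene and a reference gene.
power_test = relative_power(22.0, 20.0)        # 2^-2
power_control = relative_power(24.0, 20.0)     # 2^-4
change = fold_change(22.0, 20.0, 24.0, 20.0)   # ratio of the two values above
```

      Under this reading, 2^(-ΔCt) values for different genes share the same reference-gene scale and can be compared with one another, whereas a fold change is always 1 in the control condition for every gene.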

      2E. Figure 5: how many technical and biological replicates? Confirm that these are stated throughout the figures.

      Response: Thank you for pointing out this oversight. This information has been added throughout.

      1. Figure 2: What does the gross anatomy of the IVD look like? Could the authors address this by showing an H&E of an adjacent section of the Fig. 2 A panels?

      Response: Panel 2 shows H&E staining. Perhaps the reviewer is referring to the WT and Pax1 KO images in Figure 3? We have now added H&E staining of WT and Pax1 KO IVD as supplemental Figure 3E to clarify the IVD anatomy.

      1. Page 9: "Cells within the IVD were negative for Pax1 staining ..." There seems to be specific PAX1 expression in many cells within the IVD, which is concerning if this is indeed a supposed null allele of Pax1. This data seems to support that the allele is not null.

      Response: We have now added updated images for the COL11A1 and PAX1 staining to include negative controls in which we omitted primary antibodies. As can be seen, there is faint autofluorescence in the PAX1 negative control that appears to explain the “specific staining” referred to by the reviewer. These images confirm that the allele is truly a null.

      1. There is currently a lack of evidence supporting the claim that "Col11a1 is positively regulated by Pax1 in mouse spine and tail". Therefore, it is necessary to conduct further research to determine the direct regulatory role of Pax1 on Col11a1.

      Response: We agree with the reviewer and have clarified that Pax1 may have either a direct or indirect role in Col11a1 regulation.

      1. There is no data linking loss of COL11A1 function and spine defects in the mouse model. Furthermore, due to the absence of P1335L point mutant mice, it cannot be confirmed whether P1335L can actually cause AIS, and the pathogenicity of this mutation cannot be directly verified. These limitations need to be clearly stated and discussed. A Col11a1 mouse mutant called chondrodysplasia (cho) was shown to be perinatal lethal with severe endochondral defects (https://pubmed.ncbi.nlm.nih.gov/4100752/). This information may help contextualize this study.

      Response: We partially agree with the reviewer. Spine defects are reported in the cho mouse (for example, please see reference 36 Hafez et al). We appreciate the suggestion to cite the original Seegmiller et al 1971 reference and have added it to the manuscript.

      1. A recent article (PMID37462524) reported mutations in COL11A2 associated with AIS and functionally tested in zebrafish. That study should be cited and discussed as it is directly relevant for this manuscript.

      Response: We agree with the reviewer that this study provides important information supporting loss of function in type XI collagen in spinal deformity. Language to this effect has been added to the manuscript and this study is now cited in the paper.

      1. Please reconcile the following result on page 10 of the results: "Interestingly, the AIS-associated gene Adgrg6 was amongst the most significantly dysregulated genes in the RNA-seq analysis (Figure 3c). By qRT-PCR analysis, expression of Col11a1, Adgrg6, and Sox6 were significantly reduced in female and male Pax1-/- mice compared to wild-type mice (Figure 3d-g)." In Figure 3f, the downregulation of Adgrg6 appears to be modest so how can it possibly be highlighted as one of the most significantly downregulated transcripts in the RNAseq data?

      Response: By “significant” we were referring to the P-value significance in RNAseq analysis, not in absolute change in expression. This language was clearly confusing, and we have removed it from the manuscript.

      1. It is incorrect to refer to the primary cell culture work as growth plate chondrocytes (GPCs), instead, these are primary costal chondrocyte cultures. These primary cultures have a mixture of chondrocytes at differing levels of differentiation, which may change differentiation status during the culturing on plastic. In sum, these cells are at best chondrocytes, and not specifically growth plate chondrocytes. This needs to be corrected in the abstract and throughout the manuscript. Moreover, on page 11 these cells are referred to as costal cartilage, which is confusing to the reader.

      Response: Thank you for pointing out these inconsistencies. We have changed the manuscript to say “costal chondrocytes” throughout.

      Minor points

      • On page 10 of the Results: "These data support a mechanistic link between Pax1 and Col11a1, and the AIS-associated genes Gpr126 and Sox6, in affected tissue of the developing tail." qRT-PCR validation of Sox6, although significant, appears to be very modestly downregulated in KO. Please soften this statement in the text.

      Response: We have softened this statement.

      • Have you got any information about how the immortalized (SV40) costal cartilage affected chondrogenic differentiation? The expression of SV40 seemed to stimulate Mmp13 expression. Do these cells still make cartilage nodules? Some feedback on this process and how it affects the nature of the culture would be appreciated.

      Response: The “+” or “–” in Figure 5 refers to Ad5-cre. Each experiment was performed in SV40-immortalized costal chondrocytes. We have removed SV40 from the figure and have clarified the legend to say “qRT-PCR of human COL11A1 and endogenous mouse Mmp3 in SV40 immortalized mouse costal chondrocytes transduced with the lentiviral vector only (lanes 1,2), human WT COL11A1 (lane 3), or COL11A1P1335L.” Otherwise, we absolutely agree that understanding Mmp13 regulation during chondrocyte differentiation is important. We plan to study this using in vivo systems.

      • Figure 1: is the average Odds ratio, can this be stated in the figure legend?

      Response: We are not sure what is being asked here. The “combined odds ratio” is calculated as a weighted average of the log of the odds.
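
      The response describes the combined odds ratio as a weighted average of the log odds ratios without specifying the weights. A minimal Python sketch of a common choice, fixed-effect inverse-variance weighting, is given below. The weighting scheme, the function name, and the example odds ratios and standard errors are assumptions for illustration and are not taken from the manuscript.

```python
import math


def combined_odds_ratio(odds_ratios, standard_errors):
    """Fixed-effect pooled OR: inverse-variance weighted mean of the log ORs."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    log_ors = [math.log(odds) for odds in odds_ratios]
    pooled_log = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return math.exp(pooled_log), pooled_se


# Hypothetical per-cohort odds ratios and standard errors of the log OR.
pooled_or, pooled_se = combined_odds_ratio([1.2, 1.2], [0.1, 0.1])
balanced_or, _ = combined_odds_ratio([2.0, 0.5], [0.2, 0.2])
```

      Averaging on the log scale treats an odds ratio of 2.0 and one of 0.5 as equal and opposite effects, which is why the second example pools to 1.0 rather than to the arithmetic mean of 1.25.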

      • A more consistent use of established nomenclature for mouse versus human genes and proteins is needed.

      Human:GENE/PROTEIN

      Mouse: Gene/PROTEIN

      Response: Thank you for pointing this out. The nomenclature has been corrected throughout the manuscript.

      • There is no Figure 5c, but a reference to results in the main text. Please reconcile.

      • There is no Figure 5-figure supplement 5a, but there is a reference to it in the main text. Please reconcile.

      Response: Figure references have been corrected.

      • Please indicate dilutions of all antibodies used when listed in the methods.

      Response: Antibody dilutions have been added where missing.

      • On page 25, there is a partial sentence missing information in the Histologic methods; "#S36964 Invitrogen, CA, USA)). All images were taken..."

      Response: We apologize for the error. It has been removed.

      • Table 1: please define all acronyms, including cohort names.

      Response: We apologize for the oversight. The legend to the Table has been updated with definitions of all acronyms.

      • Figure 2: Indicate that blue staining is DAPI in panel B. Clarify that "-ab" as an abbreviation is primary antibody negative.

      Response: A color code for DAPI and COL11A1 staining has been added and “-ab” is now defined.

      • Page 4: ADGRG6 (also known as GPR126)...the authors set this up for ADGRG6 but then use GPR126 in the manuscript, which is confusing. For clarity, please use the gene name Adgrg6 consistently, rather than alternating with Gpr126.

      Response: Thank you for pointing this out. GPR126 has now been changed to ADGRG6 throughout the manuscript.

      • REF 4: Richards, B.S., Sucato, D.J., Johnston C.E. Scoliosis, (Elsevier, 2020). Is this a book, can you provide more clarity in the Reference listing?

      Response: Thank you for pointing this out. This reference has been corrected.

      • While isolation was addressed, the methods for culturing Rat cartilage endplate and costal chondrocytes are poorly described and should be given more text.

      Response: Details about the cartilage endplate and costal chondrocyte isolation and culture have been added to the Methods.

      • Page 11: 1st paragraph, last sentence "These results suggest that Mmp3 expression"... this sentence needs attention. As written, I am not clear what the authors are trying to say.

      Response: This sentence has been clarified and now reads “These results suggest that Mmp3 expression is negatively regulated by Col11a1 in mouse costal chondrocytes.”

      • Page 13: line 4 from the bottom, "ECM-clearing"? This is confusing do you mean ECM degrading?

      Response: Yes and thank you. We have changed to “ECM-degrading”.

      • Please use version numbers for RefSeq IDs: e.g. NM_080629.3 instead of NM_080629

      Response: This change has been made in the revised manuscript.

      • It would be helpful for readers if the ethnicity of the discovery case cohort was clearly stated as European ancestry in the Results main text.

      Response: “European ancestry” has been added at first description of the discovery cohort in the manuscript.

      • Avoid using the term "mutation" and use "variant" instead.

      Response: Thank you for pointing this out. “Variant” is now used throughout the manuscript.

      • Define error bars for all bar charts throughout and include individual data points overlaid onto bars.

      Response: Thank you. Error bars are now clarified in the Figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Positive comments:

      We appreciate the positive comments of the editor and reviewers. The editor noted that the paper presents a “technological advance” that has enabled “important insights about the brain circuits through which the cerebellum could participate in social interactions.” Reviewer 1 thought this was a “timely and important study with solid evidence for correlative conclusions” and that the experiments were “technically challenging” and “well-performed”. Reviewer 2 stated that the finding of correlated activity between the regions is “interesting as non-motor functions of the cerebellum are relatively little explored.” They also thought “that the data are presented clearly, and the manuscript is well-written”. Reviewer 3 mentioned that “this approach can be useful for many neuroscientists”. We thank the editor and all the reviewers for their positive comments.

      Reviewer #1 (Public Review)

      While the novelty of the device is strongly emphasized, I find that its value is somewhat diminished by the wire-free device developed by the same group as it should thus be possible to perform calcium imaging wire-free and electrophysiological recording via a single conventional cable (or also via wireless headstages).

      While it would be potentially possible to use a wire-free Miniscope in parallel with a wired electrophysiology recording system, this would result in a larger footprint on the animal’s head, more than a gram in increased weight due to an added LiPo battery, a larger electrophysiology head-stage, and limited recording length due to a battery capacity of around 20 minutes. Our main goal for the development of the E-scope platform was to develop an expandable electrophysiology recording board that would work with all previously built UCLA Miniscopes while also streamlining the integration of power and data into the coaxial cable connection already familiar to hundreds of labs using Miniscopes. The vast majority of Miniscope experiments are done using wired systems and we aimed to support the expansion of those systems instead of requiring a more substantial switch to using wire-free Miniscopes.

      The role of the identified network activations in social interactions is not touched upon.

      We agree with the reviewer that we have not discovered a causal role for the co-modulated activity patterns we have observed. As these causal experiments will require the development of real-time techniques for blocking socially evoked changes in firing rate in cerebellum and ACC, we are currently planning experiments to address causality. These results will be described in a future publication.

      Reviewer #1 (Recommendations for the Authors):

      Please provide the number of recorded mice.

      The number is now provided in the revised manuscript.

      If the recorded areas (cerebellar cortex, DN, and ACC) are part of the same circuit regulating social interactions, it would be nice to get insights into the directionality of the circuit. The authors favor the possibility that during social behavior, cerebellar efferences indirectly influence ACC activities (as in Figure 4A), however, no evidence is presented to support this interpretation. ACC activities might also indirectly influence PC firing. It may be possible to get insights into this by comparing the timing of neuronal activity in the different areas with respect to social onset.

      For this study, we mainly focused on the output of the cerebellar circuit to the cortex, as previous work shows that the dentate nucleus projects to the thalamus, which in turn projects to ACC and other cortical regions (Badura et al., eLife, 2018; Kelly et al., Nat. Neurosci., 2020). The temporal resolution of calcium imaging is limited (the rise time of calcium events with genetically-encoded indicators takes hundreds of milliseconds), so it is insufficient to precisely assess the relative onset timing of the two regions. Our work certainly does not rule out cortical influences on PC firing.

      Reviewer #2 (Public Review)

      However, the causal relationship is far from established with the methods used, leaving it unclear if these two brain regions are similarly engaged by the behavior or if they form a pathway/loop.

      As indicated in our response to Reviewer #1’s similar critique, the goal of the presented study is to demonstrate the feasibility and capabilities of this novel device. This new tool will allow us to conduct a comprehensive and rigorous study to assess the causal role of the interactions between the cerebellum and ACC in social behavior (as well as other behaviors). These experiments are being designed now.

      Reviewer #2 (Recommendations for the Authors):

      It is unclear what is entirely unique about the E-scope. It seems that its advance is simply a common cable that allows interfacing with both devices (lighter weight than two cables is stated in the Discussion). Is this really an advance? What are its limitations? E.g., how close can the recording sites be to one another? How can it be configured for any other extracellular recording approach (tetrodes, 64-channel arrays, or Neuropixels)?

      In our experience, multiple lines of wires tethered to different head-mounted devices on an animal significantly impact its behavior. Therefore, one of the major advantages of the UCLA Miniscope Platform is the use of a single, flexible coaxial cable to minimize the impact of tethering on behavior. The E-Scope platform builds on top of this work by incorporating electrophysiology recording capabilities into this single, flexible coaxial cable. Additionally, the electrophysiology recording hardware is backwards compatible with all previously built UCLA Miniscopes and can run through open-source and commercial commutators already used in Miniscope experiments.

      The available bandwidth within the shared single coaxial cable can handle megapixel Miniscope imaging along with the maximum data output of a 32 channel Intan Ephys IC. The E-Scope platform presented here runs the Intan Ephys IC at 20 kSps for all 32 channels instead of the maximum 30 kSps due to microcontroller speed limitations, but this could be overcome by using a faster microcontroller or clock, or by slightly reducing the total number of electrodes sampled. Finally, the E-Scope was designed to support any electrode types supported by the Intan Ephys IC. This includes up to 32 channels of passive probes such as single electrodes, tetrodes, silicon probes, and flexible multi-channel arrays, but does not include Neuropixels, as Neuropixels use custom active electronics on the probe to multiplex, sample, and serialize electrophysiology data.
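
To put the sampling rates above in perspective, the raw electrophysiology payload can be estimated with simple arithmetic. The sketch below is an illustration only: it assumes 16-bit samples (not stated in this response) and ignores any protocol or framing overhead on the coaxial link, and the function name is hypothetical.

```python
def ephys_data_rate_mbps(n_channels, sample_rate_hz, bits_per_sample=16):
    """Raw payload in megabits per second for a multi-channel recording.

    Assumption: each sample is stored as `bits_per_sample` bits and no
    link overhead is counted.
    """
    return n_channels * sample_rate_hz * bits_per_sample / 1e6


# 32 channels at the rate used here versus the stated maximum rate.
rate_used = ephys_data_rate_mbps(32, 20_000)  # 10.24 Mbit/s
rate_max = ephys_data_rate_mbps(32, 30_000)   # 15.36 Mbit/s
```

Under these assumptions, the 20 kSps configuration carries about two thirds of the raw payload of the 30 kSps maximum.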

      The authors only analyzed simple spikes in PCs for social-related activity. What about complex spikes? Is this correlated with ACC activity?

      Complex spikes were detectable to the extent that we were able to define that the recorded cell was a PC, but because these cells were recorded in freely behaving mice, accurate complex spike detection was not reliable enough to be used for further correlational analyses.

      The data is sampled in the two regions (cerebellum and ACC) at very different rates (imaging is much slower than electrophysiology; ephys data was binned). How does this affect the correlation plots?

      We generated firing rate maps for the cerebellar neural activity using a bin size that matched the sampling frequency of calcium imaging (see Methods). As mentioned in our methods, to study the relationship between the electrophysiology and calcium imaging data we binned the spike trains using 33 ms bins to match the calcium imaging sampling rate (~30 Hz). This limits the temporal resolution available for calculating fine-scale correlations, but the correlations that we report are on a behaviorally relevant temporal scale. However, the fine temporal resolution of the electrophysiology data can still be used to examine the relationship between cerebellar output and specific social behavior epochs in more detail.
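
The binning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: spike times are assumed to be in seconds from the start of the recording, the calcium trace is assumed to have one value per imaging frame, and both function names are hypothetical.

```python
import math


def bin_spike_train(spike_times_s, duration_s, bin_width_s=0.033):
    """Count spikes in consecutive bins of `bin_width_s` (33 ms, ~30 Hz)."""
    n_bins = int(duration_s // bin_width_s)
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = int(t // bin_width_s)
        if 0 <= idx < n_bins:  # ignore spikes outside the binned window
            counts[idx] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equally sampled series (lag 0)."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)
```

After binning, the spike counts and the calcium trace share one time base, so `pearson(counts, calcium_trace)` gives a lag-0 correlation at the imaging frame rate.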

      For the correlation analysis, over what time frame was the activity relationship examined? How was this duration determined?

      Author response image 1.

      The main criterion for the time frame used in the correlation analysis was the behavioral timescale of social interaction [see figure above for the number of social (red) and object (blue) interaction bouts (a), their duration (b) and coefficient of variation (CV) (c)]. Overall, the activity relationship time frame was based on the average duration of the social interactions (~3 sec). Periods of 3.8 sec before and 5.8 sec after interaction onset were used for the analysis. Accordingly, the cross-correlograms were constructed using a maximum lag length of 5 sec. In the article we reported the correlation at lag 0.
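
A cross-correlogram with a bounded lag can be sketched as below. This is an illustration only and not the authors' implementation: it assumes both signals are already on a common ~30 Hz time base (33 ms bins), so a 5 sec maximum lag corresponds to roughly 151 bins, and all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def cross_correlogram(a, b, max_lag_bins):
    """Correlate a[t] with b[t + lag] for every lag in [-max_lag, +max_lag].

    A positive peak lag means that `b` follows `a`.
    """
    n = min(len(a), len(b))
    if max_lag_bins >= n - 1:
        raise ValueError("max_lag_bins must leave at least two samples")
    result = {}
    for lag in range(-max_lag_bins, max_lag_bins + 1):
        if lag >= 0:
            x, y = a[:n - lag], b[lag:n]
        else:
            x, y = a[-lag:n], b[:n + lag]
        result[lag] = pearson(x, y)
    return result


# Assumed conversion of the 5 sec maximum lag into 33 ms bins.
MAX_LAG_BINS = int(5.0 / 0.033)  # 151
```

The value at key `0` of the returned dictionary is the lag-0 correlation, which is the quantity reported in the article.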

      The relationship between the cerebellum and ACC seems unconvincing. If two brain regions are similarly engaged by the behavior, wouldn't they have a high correlation? Is the activity in one region driving the other?

      We reference studies showing an anatomical and functional indirect connection between the cerebellum and the ACC or prefrontal cortex (Badura et al., eLife, 2018). Also, as stated in the introduction, the ACC is a recognized brain area for social behavioral studies. In the results, we stated that the increase in correlations among groups of neurons that are similarly engaged during a specific epoch of the social interaction was an expected finding. What was not expected was that there would be no difference in the distribution of correlations when the social epochs were removed, suggesting that intrinsic connectivity does not drive a difference in correlations.

      However, since there is a cerebello-cortical loop, further study will be needed to understand which area initiates this type of activity during social behavior.

      • In the figures, the color-coded scale bars should be labeled as z-scores (confusing without them).

      • In Figure 4, the color differences for Soc-ACC, Soc+ACC and SocNS ACC should be more striking as it is hard to tell them apart because they are all similar shades of blue-gray.

      We thank the reviewer for their suggestions for improving the figures. We have incorporated these changes in Figures 2 and 3, along with their figure supplements. Graphs in Figure 4D-G have been edited to make the lines more visible to the reader.

      Reviewer #3 (Public Review)

      However, a mouse weighs between 20 and 40 g, so that an implant of 4.5 g is still quite considerable. It can be expected that this has an impact on the behavior and, possibly, the well-being of the animals. Whether this is the case or not, is not really addressed in this study.

      The weight of the E-Scope (4.5 g) is near the maximum that is tolerated by animals in our experience. We therefore acclimated the mouse to the weight with dummy scopes of increasing weights over a 7-10 day period. During this period, we observed the animal to have normal exploratory behavior. Specifically, there is no change in the sociability of the animals (Figure 2A) and animals cover the large arena (48 × 48 cm, Figure 2H).

      Overall, the description of animal behavior is rather sparse. The methods state only that stranger age-matched mice were used, but do not state their gender. The nature of the social interactions was not described. Was there aggressive behavior, sexual approach and/or intercourse? Did the stranger mice attack/damage the E-Scope? Were the interactions comparable (using which parameters?) with and without E-Scope attached? It is not even described what the authors define as an "interaction bout" (Figure 2A). The number of interaction bouts is counted per 7 minutes, I presume? This is not specified explicitly.

      As mentioned in the methods section of the original version of our manuscript, all the target mice were age-matched “male” mice. As per the reviewer’s suggestion, we now have added in the manuscript that before any of our social interaction behavioral experiments, aggressive or agitated mice were removed after assessing their behavior in the arena during habituation. For all trials, all mice were introduced for the first time.

      We also mention in the methods section of our manuscript that social behaviors were evaluated by proximity between the subject mouse and novel target mouse (2 cm from the body, head, or base of tail). In our recordings, we did not observe any aggressive, mounting, or other dominance behavior toward the E-Scope subject mouse during the 7 minutes of social interaction assessment. Social interaction bouts in Figure 2A show the average number of social interaction bouts during the recording time. This has now been expanded upon in our revised manuscript.

      It would be very insightful if the authors would describe which events they considered to be action potentials, and which not. Similarly, the raw traces of Figure 1E are declared to be single-unit recordings of Purkinje cells. Partially due to the small size of the traces (invisible in print and pixelated in the digital version), I have a hard time recognizing complex spikes and simple spikes in these traces. This is a bit worrisome, as the authors declare the typical duration of the pause in simple spike firing after a complex spike to be 20-100 ms. In my experience, such long pauses are rare in this region, and definitely not typical. In the right panel of Figure 1A, an example of a complex spike-induced pause is shown. This pause is around 15 ms, so not typical according to the text, and starts only around 4 ms after the complex spike, which should not be the case and suggests either a misalignment of the figure or the detection of complex spike spikelets as simple spikes, while the abnormally long pause suggests that the authors fail to detect a lot of simple spikes. The authors could provide more confidence in their data by including more raw data, making explicit how they analyzed the signals, and by reporting basic statistics of firing properties (like rate, cv or cv2, pause duration). In this respect, Figure 2 - figure supplement 3 shows quite a large percentage of cells to have either a very low or a very high firing rate.

      We now provide a better example of simple spikes and complex spikes in Fig 1E and have corrected our comment in the body of the manuscript. As the reviewer mentions, the previous version of the SS x CS cross-correlation histogram in Figure 1G was not the best example because of the detected CS spikelets. However, the detection of CS spikelets has little impact on the interpretation of the results. We have replaced this figure with a better example of the SS x CS cross-correlation histogram.

      The number of Purkinje cells recorded during social interactions is quite low: only 11 cells showed a modulation in their spiking activity (unclear whether in complex spikes, simple spikes or both). During object interaction, only 4 cells showed a significant modulation. Unclear is whether the latter 4 are a subset of the former 11, or whether "social cells" and "object cells" are different categories. Having so few cells, and with these having different types of modulation, the group of cells for each type of modulation is really small, going down to 2 cells/group. It is doubtful whether meaningful interpretation is possible here.

      While the number of neurons is not as high as those reported for other regions, the number presented depicts the full range of responses to social behavior. It is extremely difficult to obtain stable neurons in freely behaving socially interacting animals and only a handful of neurons could be recorded in each animal. Among these recorded neurons only a subset responds to social interactions further reducing the numbers. The results however are consistent among cell types and the direction of modulation fits with the inhibitory connectivity between PCs and DN neurons. To our knowledge, we are the first group to publish neuronal activity of PC and DN neurons from freely behaving mice during social behavior.

      Neural activity patterns observed during social interaction do not necessarily relate specifically to social interaction, but can also occur in a non-social context. The authors control this by comparing social interactions with object interactions, but I miss a direct comparison between the two conditions, both in terms of behavior (now only the number of interactions is counted, not their duration or intensity), and in terms of neural activity. There is some analysis done on the interaction between movement and cerebellar activity (Figure 2 - figure supplement 4), but it is unclear to what extent social interactions and movements are separated here. It would already help to indicate the social interactions in the plots with trajectories (e.g., Fig. 2H), for example with social interaction-related movements in red and the rest of the trajectories in black.

      We have updated the social interaction plots in Figure 2H in the revised version of the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      Increase the number of cerebellar neurons that are recorded.

      Due to the difficulty of the experiment and the low yield which we get for cerebellar recordings, substantially increasing the number of neurons will require many more experiments which are not feasible at this time.

      Include more raw data and make the analysis procedure more insightful with illustrations of intermediate steps.

      We have included a more thorough description of the analysis in the methods section of the revised manuscript.

      Provide a better description of the behavior.

      We have increased the level of detail regarding the mouse behavior in the Results and Methods sections. This includes a more detailed description of the parameters we used to analyze the social interaction.

    1. Author Response

      We are grateful to the reviewers for their positive feedback and for their comments and suggestions on the manuscript. Reviewer 1 has indicated two weaknesses and Reviewer 2 has none. With this provisional reply, we address the two concerns of Reviewer 1:

      1) Data obtained from a single aminoacyl-tRNA (D-Tyr-tRNATyr) have been generalized to imply that what is relevant to this model substrate is true for all other D-aa-tRNAs. This is not a risk-free extrapolation. Why do the authors believe that the length of the amino acid side chain will not matter in the activity of DTD2?

      We thank the reviewer for bringing up this important point. We wish to clarify that only a few of the aminoacyl-tRNA synthetases are known to charge D-amino acids and only D-Leu (Yeast), D-Asp (Bacteria, Yeast), D-Tyr (Bacteria, Cyanobacteria, Yeast) and D-Trp (Bacteria) show toxicity in vivo in the absence of known DTD (Soutourina J. et al., JBC, 2000; Soutourina O. et al., JBC, 2004; Wydau S. et al., JBC, 2009). D-Tyr-tRNATyr is used as a model substrate to test the DTD activity in the field because of the conserved toxicity of D-Tyr in various organisms. DTD2 has been shown to recycle D-Asp-tRNAAsp and D-Tyr-tRNATyr with the same efficiency both in vitro and in vivo (Wydau S. et al., NAR, 2007). Moreover, we have previously shown that it recycles acetaldehyde-modified D-Phe-tRNAPhe and D-Tyr-tRNATyr in vitro (Mazeed M. et al., Science Advances, 2021). We have earlier shown that DTD1, another conserved chiral proofreader across bacteria and eukaryotes, acts via a side chain-independent mechanism (Ahmad S. et al., eLife, 2013). Considering the action on multiple side chains with different chemistry and size, it can be proposed with reasonable confidence that DTD2 also operates in a side chain-independent manner.

      2) While the use of EFTu supports that the ternary complex formation by the elongation factor can resist modifications of L-Tyr-tRNATyr by the aldehydes or other agents, in the context of the present work on the role of DTD2 in plants, one would want to see the data using eEF1alpha. This is particularly relevant because there are likely to be differences in the way EFTu and eEF1alpha may protect aminoacyl-tRNAs (for example see description in the latter half of the article by Wolfson and Knight 2005, FEBS Letters 579, 3467-3472).

      We thank the reviewer for raising another important point. We analysed the aa-tRNA-bound elongation factor structures from both bacteria (PDB id: 1TTT) and mammals (PDB id: 5LZS) and found that the amino acid binding site is highly conserved, with the side chain of the amino acid projecting outward. Modelling of a D-amino acid in the same site shows serious clashes, indicating D-chiral rejection during aa-tRNA binding by the elongation factor. In addition, the amino group of the amino acid is tightly selected by the main chain atoms of the elongation factor, leaving no space for aldehydes to enter and modify the L-aa-tRNAs and Gly-tRNAs. Minor differences near the amino acid side chain binding site (as indicated in Wolfson and Knight, FEBS Letters, 2005) might induce amino acid-specific binding differences. However, those changes will have no influence when a D-chiral amino acid enters the pocket, as the whole side chain would clash with the active site. We will present a sequence and structural conservation analysis to clarify this important point in our revised manuscript. Overall, our structural analysis suggests a conserved mode of aa-tRNA selection by the elongation factor across life forms; therefore, our biochemical results with bacterial elongation factor Tu (EF-Tu) reflect the protective role of the elongation factor in general across species.

      In our revised manuscript, we will provide a thorough point-by-point response to the above as well as all the specific reviewer comments. We also intend to include new analysis with updated data that would address the key questions raised by the reviewers.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The biogenesis of outer membrane proteins (OMPs) into the outer membranes of Gram-negative bacteria is still not fully understood, particularly substrate recognition and insertion by beta-assembly machinery (BAM). In this study, the authors report that, in addition to recognition by the last strand of an OMP, sometimes referred to as the beta-signal, an additional signal upstream of the last strand is also important for OMP biogenesis.

      Strengths:

      1. Overall the manuscript is well organized and written, and addresses an important question in the field. The idea that BAM recognizes multiple signals on OMPs has been presented previously, however, it was not fully tested.

      2. The authors here re-address this idea and propose that it is a more general mechanism used by BAM for OMP biogenesis.

      3. The notion that additional signals assist in biogenesis is an important concept that indeed needs to be fully tested in OMP biogenesis.

      4. A significant study was performed with extensive experiments reported in an attempt to address this important question in the field.

      5. The identification of important crosslinks and regions of substrates and Bam proteins that interact during biogenesis is an important contribution that gives clues to the path substrates take en route to the membrane.

      Weaknesses:

      Major critiques (in no particular order):

      1. The title indicates 'simultaneous recognition', however no experiments were presented that test the order of interactions during OMP biogenesis.

      We have replaced the word “Simultaneous” with “Dual” so as not to reflect on the timing of the recognition events for the distinct C-terminal signal and -5 signal.

      1. Aspects of the study focus on the peptides that appear to inhibit OmpC assembly, but should also include an analysis of the peptides that do not, to determine whether the motif(s) are still present or not.

      We thank the reviewer for this comment. Our study focuses on the peptides which exhibited an inhibitory effect in order to elucidate further interactions between the BAM complex and substrate proteins, especially in the early stage of the assembly process. In the case of peptide 9, which contains all of our proposed elements but did not have an inhibitory effect, there is an arginine residue at the polar residue position next to the hydrophobic residue in position 0 (0 Φ). As seen in Fig S5, S6, and S7, there are no positively charged amino acids in the polar residue positions in the -5 or last strands. This might be the reason why peptide 9, as well as peptide 24, which is the β-signal derived from the mitochondrial OMP Tom40 and contains a lysine at the polar position, did not display an inhibitory effect. Incorporating the reviewer's suggestions might elucidate conditions that should not be added to the elements, but this is not the focus of this paper and was not discussed to avoid complicating the paper.

      1. The β-signal is known to form a β-strand, therefore it is unclear why the authors did not choose to chop OmpC up according to its strands, rather than by a fixed peptide size. What was the rationale for how the peptide lengths were chosen since many of them partially overlap known strands, and only partially (2 residues) overlap each other? It may not be too surprising that most of the inhibitory peptides consist of full strands (#4, 10, 21, 23).

      A simple scan of known β-strands would have been an alternative approach, however this comes with the bias of limiting the experiments to predicted substrate (strand) sequences, and it presupposes that the secondary structure element would be formed by this tightly truncated peptide.

      Instead, we allowed for the possibility that OMPs meet the BAM complex in an unfolded or partially folded state, and that the secondary structure (β-strand) might only form via β-augmentation after the substrate is placed in the context of the lateral gate. We therefore used peptides that mapped right across the entirety of OmpC, with a two amino acid overlap.

      To clarify this important point regarding the unbiased nature of our screen, we have revised the text:

      (Lines 147-151) "We used peptides that mapped the entirety of OmpC, with a two amino acid overlap. This we considered preferable to peptides that were restricted by structural features, such as β-strands, in consideration that β-strand formation may or may not have occurred in early-stage interactions at the BAM complex."
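
The tiling scheme described above, fixed-length peptides covering the whole protein with a two amino acid overlap, can be sketched as follows. This is an illustration only: the peptide length is left as a parameter because it is not stated in this response, the function name is hypothetical, and the sequence in the usage note is a toy string, not OmpC.

```python
def tile_peptides(sequence, peptide_length, overlap=2):
    """Split `sequence` into consecutive peptides that share `overlap` residues.

    Returns (1-based start position, peptide) pairs. The final peptide may be
    shorter than `peptide_length` if the sequence does not divide evenly.
    """
    if not 0 <= overlap < peptide_length:
        raise ValueError("overlap must be >= 0 and smaller than peptide_length")
    step = peptide_length - overlap
    peptides = []
    start = 0
    while start < len(sequence):
        peptides.append((start + 1, sequence[start:start + peptide_length]))
        if start + peptide_length >= len(sequence):
            break  # the end of the sequence has been covered
        start += step
    return peptides
```

For example, `tile_peptides("ABCDEFGHIJ", 4)` yields `ABCD`, `CDEF`, `EFGH` and `GHIJ`, so each peptide starts with the last two residues of the previous one regardless of where secondary structure elements lie.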

      1. It would be good to have an idea of the propensity of the chosen peptides to form β-strands and participate in β-augmentation. We know from previous studies with darobactin and other peptides that they can inhibit OMP assembly by competing with substrates.

      We appreciate the reviewer's suggestion. However, we have not conducted biophysical characterizations of the peptides to calculate the propensity of each peptide to form β-strands and participate in β-augmentation. The sort of detailed biophysical analysis done for darobactin (by the Maier and Hiller groups; “The antibiotic darobactin mimics a β-strand to inhibit outer membrane insertase”, Nature 593:125-129) was a Nature publication based on this single peptide. A further biophysical analysis of all of the peptides presented here goes well beyond the scope of our study.

      1. The recognition motifs that the authors present span up to 9 residues which would suggest a relatively large binding surface, however, the structures of these regions are not large enough to accommodate these large peptides.

      The β-signal motif (ζxGxx[Ω/Φ]x[Ω/Φ]) is an 8-residue consensus, some of the inhibitory peptides include additional residues before and after the defined motif of 8 residues, and the lateral gate of BamA has been shown to interact with a 7-residue span (e.g. Doyle et al., 2022). Cross-linking presented in our study showed BamD residues R49 and G65 cross-linked to positions 0 and 6 of the internal signal in OmpC (Fig. 6D).

      We appreciate this point of clarification and have modified the text to acknowledge that in the final registering of the peptide with its binding protein, some parts of the peptide might sit beyond the bounds of the BamD receptor’s binding pocket and the BamA lateral gate:

      (Lines 458-471) "The β-signal motif (ζxGxx[Ω/Φ]x[Ω/Φ]) is an eight-residue consensus, and the internal signal motif is composed of a nine-residue consensus. Recent structures have shown the lateral gate of BamA interacts with a 7-residue span of substrate OMPs. Interestingly, inhibitory compounds, such as darobactin, mimic only three residues of the C-terminal side of the β-signal motif. Cross-linking presented here in our study showed that BamD residues R49 and G65 cross-linked to the positions 0 and 6 of the internal signal in OmpC (Fig. 6D). Both signals are larger than the assembly machinery's signal binding pocket, implying that the signal might sit beyond the bounds of the signal binding pocket in BamD and the lateral gate in BamA. These findings are consistent with similar observations in other signal sequence recognition events, such as the mitochondrial targeting presequence signal that is longer than the receptor groove formed by Tom20, the subunit of the translocator of outer membrane (TOM) complex (Yamamoto et al., 2011). The presequence has been shown to bind to Tom20 in several different conformations within the receptor groove (Nyirenda et al., 2013)."

      Moreover, the distance between the amino acids of BamD which cross-linked to the internal signal, R49 and Y62, is approximately 25 Å (PDB ID used: 7TT3). The maximum length of the internal signal of OmpC, from F280 to Y288, is approximately 22 Å (PDB ID used: 2J1N). This would allow the signal to fit within the confines of the TPR motif of BamD.

      Author response image 1.

      6. The authors highlight that the sequence motifs are common among the inhibiting peptides, but do not test if this is a necessary motif to mediate the interactions. It would have been good to see if a library of non-OMP related peptides that match this motif could also inhibit or not.

      With respect, this additional work would not address any biological question relevant to the function of BamD. To randomize sequences and then classify those that do or don’t fit the motif would help in refining the parameters of the β-signal motif, but that was not our intent.

      We have identified the peptides from within the total sequence of an OMP, shown which peptides inhibit in an assembly assay, and then observed that the inhibitory peptides conform to a previously published (β-signal) motif.

      7. In the studies that disrupt the motifs by mutagenesis, an effect was observed and attributed to disruption of the interaction of the 'internal signal'. However, the literature is filled with point mutations in OMPs that disrupt biogenesis, particularly those within the membrane region. F280, Y286, V359, and Y365 are all residues that are in the membrane region that point into the membrane. Therefore, more work is needed to confirm that these mutations are in parts of a recognition motif rather than on the residues that are disrupting stability/assembly into the membrane.

      As the reviewer pointed out, the side chains of the amino acids constituting the signal elements we determined all face the lipid side; of these, Y286 and Y365 were important for folding as well as for recognition. However, F280A and V359A had no effect on folding, but only on assembly through the BAM complex. That position 0 functions as a signal has been demonstrated by peptidomimetics (Fig. 1) and point mutant analysis (Fig. 2). We appreciate this clarification and have modified the text to acknowledge that all of the signal elements face the lipid side, which ultimately contributes to their stability in the membrane, and that before this the BAM complex actively recognizes them and determines their orientation:

      (Lines 519-526) "After OMP assembly, all elements of the internal signal are positioned such that they face into the lipid-phase of the membrane. This observation may be a coincidence, or may be utilized by the BAM complex to register and orientate the lipid facing amino acids in the assembling OMP away from the formative lumen of the OMP. Amino acids at position 6, such as Y286 in OmpC, are not only a component of the internal signal for binding by the BAM complex, but also act in a structural capacity to register the aromatic girdle for optimal stability of the OMP in the membrane."

      8. The title of Figure 3 indicates that disrupting the internal signal motif disrupts OMP assembly, however, the point mutations did not seem to have any effect. Only when both 280 and 286 were mutated was an effect observed. And even then, the trimer appeared to form just fine, albeit at reduced levels, indicating assembly is just fine, rather the rate of biogenesis is being affected.

      We appreciate this point and have revised the title of Figure 3 to be:

      (Lines 1070-1071) "Modifications in the putative internal signal slow the rate of OMP assembly in vivo."

      9. In Figure 4, the authors attempt to quantify their blots. However, this seems to be a difficult task given the lack of quality of the blots and the spread of the intended signals, particularly of the 'int' bands. However, the more disturbing trend is the obvious reduction in signal from the post-urea treatment, even for the WT samples. The authors are using urea washes to indicate removal of only stalled substrates. However a reduction of signal is also observed for the WT. The authors should quantify this blot as well, but it is clear visually that both WT and the mutant have obvious reductions in the observable signals. Further, this data seems to conflict with Fig 3D where no noticeable difference in OmpC assembly was observed between WT and Y286A, why is this the case?

      We have addressed this point by adding a statistical analysis on Fig. 4A. As the reviewer points out, BN-PAGE band quantification is a difficult task given the broad spread of the bands on these gels. Statistical analysis showed that the increase in intermediates (int) was statistically significant for Y286A at all times until 80 min, when the intermediate form signals decrease.

      (Lines 1093-1096) "Statistical significance was indicated by the following: N.S. (not significant); *, p<0.05; **, p<0.005. Exact p values of intermediate formed by WT vs Y286A at each timepoint were as follows; 20 minutes: p = 0.03077, 40 minutes: p = 0.02402, 60 minutes: p = 0.00181, 80 minutes: p = 0.0545."
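For readers who want to check how the reported p values map onto the significance labels, a minimal sketch follows. It assumes the conventional thresholds for this legend (one asterisk for p < 0.05, two for p < 0.005, N.S. otherwise); the function name `significance_label` is hypothetical, and the p values are those quoted in the legend above.

```python
def significance_label(p_value):
    """Map a p value to the legend symbol (assumed thresholds: 0.05 and 0.005)."""
    if p_value < 0.005:
        return "**"
    if p_value < 0.05:
        return "*"
    return "N.S."


# p values for WT vs Y286A intermediate, keyed by time point in minutes.
reported = {20: 0.03077, 40: 0.02402, 60: 0.00181, 80: 0.0545}
labels = {minutes: significance_label(p) for minutes, p in reported.items()}
```

Under these assumed thresholds the 20-, 40- and 60-minute time points are labelled significant and the 80-minute time point is not, which matches the statement that the difference holds until 80 min.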

      Regarding the int band, we have corrected the statement as follows:

      (Lines 253-254) "Consistent with this, the assembly intermediate which was prominently observed with OmpC(Y286A) can be extracted from the membranes with urea;"

      OMP assembly in vivo has additional periplasmic chaperones and factors present in order to support the assembly process. Therefore, it is likely that some proteins were assembled properly in vivo compared to their in vitro counterparts. Such a decrease has been observed not only in E. coli but also in mitochondrial OMP import (Yamano et al., 2010).

      10. The pull-down assays with BamA and BamD should include a no protein control at the least to confirm there is no non-specific binding to the resin. Also, no detergent was mentioned as part of the pull downs that contained BamA or OmpC, nor was it detailed if OmpC was urea solubilized.

      We have performed pull-down experiments with a no-protein (Ni-NTA only) control as noted (Author response image 2). The results showed that the amount of OmpC carrying through on beads only was significantly lower than the amount of OmpC bound in the presence of BamD or BamA. The added OmpC was not treated with urea, but was synthesized by in vitro translation; the in vitro translated OmpC is the standard substrate in the EMM assembly assay (Supp Fig. S1) where it is recognized by the BAM complex. Thus, we used it for pull-down as well and, to make this clearer, we have revised the text as follows:

      Author response image 2.

      Pull-down assay of radio-labelled OmpC with the indicated protein or Ni-NTA alone (Ni-NTA). T, total; FT, flow-through; W, wash; E, elute.

      (Lines 252-265) "Three subunits of the BAM complex have been previously shown to interact with the substrates: BamA, BamB, and BamD (Hagan et al., 2013; Harrison, 1996; Ieva et al., 2011). In vitro pull-down assay showed that while BamA and BamD can independently bind to the in vitro translated OmpC polypeptide (Fig. S9A), BamB did not (Fig. S9B)."

      11.

      • The neutron reflectometry experiments are not convincing primarily due to the lack of controls to confirm a consistent uniform bilayer is being formed and even if so, uniform orientations of the BamA molecules across the surface.

      • Further, no controls were performed with BamD alone, or with OmpC alone, and it is hard to understand how the method can discriminate between an actual BamA/BamD complex versus BamA and BamD individually being located at the membrane surface without forming an actual complex.

      • Previous studies have reported difficulty in preparing a complex with BamA and BamD from purified components.

      • Additionally, little signal differences were observed for the addition of OmpC. However, an elongated unfolded polypeptide that is nearly 400 residues long would be expected to produce a large distinct signal given that only the C-terminal portion is supposedly anchored to BAM, while the rest would be extended out above the surface.

      • The depiction in Figure 5D is quite misleading when viewing the full structures on the same scales with one another.

      We have addressed these five points individually as follows.

      i. The uniform orientation of BamA on the surface is guaranteed by the fixation through a His-tag engineered into extracellular loop 6 of BamA and has been validated in previous studies as cited in the text. To test this further, we constructed an alternative theoretical model in which BamA is not well oriented in the system, as shown below. The fitted curves (solid lines) did not align well with the experimental data. We therefore assume that BamA is well oriented in the membrane bilayer.

      Author response image 3.

      Experimental (symbols) and fitted (curves) NR profiles of BamA not oriented well in the POPC bilayer in D2O (black), GMW (blue) and H2O (red) buffer.

      ii. There is no means by which to do a control with OmpC alone or BamD alone, as neither protein binds to the lipid layer chip. OmpC is diluted from urea and the unbound OmpC is then washed from the chip before NR measurements. BamD does not have an acyl group to anchor it to the lipid layer; without BamA to anchor to, it too is washed from the chip before NR measurements. We constructed another theoretical model in which BamA and BamD are both embedded in the membrane bilayer, and the fits are shown below. These fits did not align well with the experimental data, which argues against BamA and BamD being located individually at the membrane surface without forming an actual complex.

      Author response image 4.

      Experimental (symbols) and fitted (curves) NR profiles of BamA+D embedding together in the POPC bilayer in D2O (black), GMW (blue) and H2O (red) buffer.

      iii. The previous studies that reported difficulty in preparing a complex with BamA and BamD from purified components were assays done in aqueous solution including detergent solubilized BamA, or with BamA POTRA domains only. Our assay is superior in that it reports the binding of BamD to a purified BamA that has been reconstituted in a lipid bilayer.

      iv. The relatively small signal differences observed for the addition of OmpC are expected, since OmpC is an elongated, unfolded polypeptide of nearly 400 residues long which, in the context of this assay, can occupy a huge variation in the positions at which it will sit with only the C-terminal portion anchored to BAM, and the rest moving randomly about and extended from the surface.

      v. We appreciate the point raised and have now added a note in the Figure legend that these are depictions of the results and not a scale drawing of the structures.

      12. In the crosslinking studies, the authors show 17 crosslinking sites (43% of all tested) on BamD crosslinked with OmpC. Given that the authors are presenting specific interactions between the two proteins, this is worrisome as the crosslinks were found across the entire surface of BamD. How do the authors explain this? Are all these specific or non-specific?

      The crosslinking experiment using purified BamD was an effective assay for comprehensive analysis of the interaction sites between BamD and the substrate. However, as the reviewer pointed out, cross-linking was observed even at the sites that, in the context of the BAM complex, interact with BamC as a protein-protein interaction and would not be available for substrate protein-protein interactions. To complement this analysis and to address this issue, we also performed the experiment in Fig. 6C.

      In Fig. 6C, the interaction of BamD with the substrate is examined in vivo, and the results demonstrate that when BPA is introduced into the site we designated as the substrate recognition site, it is cross-linked to the substrate. On the other hand, position 114 was found to cross-link with the substrate by in vitro cross-linking, but not in vivo. It should be noted that position 114 has also been confirmed to form cross-link products with BamC; we therefore believe that the in vivo experiment reports on BamD-substrate interactions in the native state. To explain the above, we have added the following description to the Results section.

      (Lines 319-321) "Structurally, these amino acids are located on both the lumen side of the funnel-like structure (e.g. 49 or 62) and outside of the funnel-like structure, such as the BamC binding site (e.g. 114) (fig. S12C)."

      (Lines 350-357) "Positions 49, 53, 65, and 196 of BamD face the interior of the funnel-like structure of the periplasmic domain of the BAM complex, while position 114 is located outside of the funnel-like structure (Bakelar et al., 2016; Gu et al., 2016; Iadanza et al., 2016). We note that while position 114 was cross-linked with OmpC in vitro using purified BamD, this was not seen with in vivo cross-linking. Instead, in the context of the BAM complex, position 114 of BamD binds to the BamC subunit and would not be available for substrate binding in vivo (Bakelar et al., 2016; Gu et al., 2016; Iadanza et al., 2016)."

      13. The study in Figure 6 focuses on defined regions within the OmpC sequence, but a more broad range is necessary to demonstrate specificity to these regions vs binding to other regions of the sequence as well. If the authors wish to demonstrate a specific interaction to this motif, they need to show no binding to other regions.

      The region of affinity for the BAM complex was determined by peptidomimetic analysis, and the signal region was further identified by mutational analysis of OmpC. Subsequently, the subunit that recognizes the signal region was identified as BamD. In other words, in the process leading up to Fig. 6, we were able to analyze in detail that other regions were not the target of the study. We have revised the text to make clear that we focus on the signal region including the internal signal, and have not also analyzed other parts of the signal region:

      (Lines 329-332) "As our peptidomimetic screen identified conserved features in the internal signal, and cross-linking highlighted the N-terminal and C-terminal TPR motifs of BamD as regions of interaction with OmpC, we focused on amino acids specifically within the β-signals of OmpC and regions of BamD which interact with β-signal."

      14. The levels of the crosslinks are barely detectable via western blot analysis. If the interactions between the two surfaces are required, why are the levels for most of the blots so low?

      These are western blots of cross-linked products – the efficiency of cross-linking is far less than 100% of the interacting protein species present in a binding assay and this explains why the levels for the blots are ‘so low’. We have added a sentence to the revised manuscript to make this clear for readers who are not molecular biologists:

      (Lines 345-348) "These western blots reveal cross-linked products representing the interacting protein species. Photo cross-linking of unnatural amino acid is not a 100% efficient process, so the level of cross-linked products is only a small proportion of the molecules interacting in the assays."

      15.

      • Figure 7 indicates that two regions of BamD promote OMP orientation and assembly, however, none of the experiments appears to measure OMP orientation?

      • Also, one common observation from panel F was that not only was the trimer reduced, but also the monomer. But even then, still a percentage of the trimer is formed, not a complete loss.

      (i) We appreciate this point and have revised the title of Figure 7 to be:

      (Lines 1137-1138) "Key residues in two structurally distinct regions of BamD promote β-strand formation and OMP assembly."

      (ii) In our description of Fig. 7F (Lines 356-360) we do not distinguish between the amount of monomer and trimer forms, since both are reflective of the overall assembly rate i.e. assembly efficiency. Rather, we state that:

      "The EMM assembly assay showed that the internal signal binding site was as important as the β-signal binding site to the overall assembly rates observed for OmpC (Fig. 7F), OmpF (fig. S15D), and LamB (fig. S15E). These results suggest that recognition of both the C-terminal β-signal and the internal signal by BamD is important for efficient protein assembly."

      16.

      • The experiment in Fig 7B would be more conclusive if it was repeated with both the Y62A and R197A mutants and a double mutant. These controls would also help resolve any effect from crowding that may also promote the crosslinks.

      • Further, the mutation of R197 is an odd choice given that this residue has been studied previously and was found to mediate a salt bridge with BamA. How was this resolved by the authors in choosing this site since it was not one of the original crosslinking sites?

      As stated in the text, the purpose of the experiment in Figure 7B is to measure the impact of pre-forming a β-strand in the substrate (OmpC) before providing it to the receptor (BamD). We thank the reviewer for the comment on the R197 position of BamD. The C-terminal domain of BamD has been suggested to mediate the BamA-BamD interface; specifically, BamD R197 forms a salt bridge with BamA E373 (Ricci et al., 2012). It has been postulated that the formation of this salt bridge is not strictly structural, with R197 highlighted as a key amino acid in BamD activity and the salt bridge acting as a “check-point” in BAM complex activity (Ricci et al., 2012; Storek et al., 2023). Our results agree with this, showing that the C-terminus of BamD acts in substrate recognition and alignment of the β-signal (Fig. 6, Fig. S12). We show that amino acids in the vicinity of R197 (N196, G200, D204) cross-linked well to substrate and that mutations to the β-signal prevent this interaction (Fig. S12B, D). For mutational analysis of BamD, we then looked at the conservation of the C-terminus of BamD and determined that R197 was the most highly conserved amino acid (Fig. 6C). To account for this, we have adjusted the manuscript:

      (Lines 376-377) "R197 has previously been isolated as a suppressor mutation of a BamA temperature sensitive strain (Ricci et al., 2012)."

      (Lines 495-496) "This adds an additional role of the C-terminus of BamD beyond a complex stability role (Ricci et al., 2012; Storek et al., 2023)."

      17. As demonstrated by the authors in Fig 8, the mutations in BamD lead to reduction in OMP levels for more than just OmpC and issues with the membrane are clearly observable with Y62A, although not with R197A in the presence of VCN. The authors should also test with rifampicin which is smaller and would monitor even more subtle issues with the membrane. Oddly, no growth was observed for the Vec control in the lower concentration of VCN, but was near WT levels for 3 times VCN, how is this explained?

      While it would be interesting to correlate the extent of differences to the molecular size of different antibiotics such as rifampicin, such correlations are not the intended aim of our study. Vancomycin (VCN) is a standard measure of outer membrane integrity in our field, hence its use in our tests for membrane integrity.

      We apologize to the reviewer as Figure 8D-G may have been misleading. Figure 8D, E use bamD shut-down cells expressing plasmid-borne BamD mutants, whereas Figure 8F, G use the same strain as in Figure 3. We have adjusted the figure as well as the figure legend: (Lines 1165-1169) D, E, E. coli bamD depletion cells expressing mutations at residues, Y62A and R197A, in the β-signal recognition regions of BamD were grown in the presence of VCN. F, G, E. coli cells expressing mutations to OmpC internal signal, as shown in Fig 3, grown in the presence of VCN. Mutations to two key residues of the internal signal were sensitive to the presence of VCN.

      18. While Fig 8I indeed shows diminished levels for FY as stated, little difference was observed for the trimer for the other mutants compared to WT, although differences were observed for the dimer. Interestingly, the VY mutant has nearly WT levels of dimer. What do the authors postulate is going on here with the dimer to trimer transition? How do the levels of monomer compare, which is not shown?

      The BN-PAGE gel system cannot resolve protein species that migrate below ~50 kDa, and the monomer species of the OMPs is below this size. We cannot comment on effects on the monomer because it is not visualized. The non-cropped gel image is shown here. Recently, Hussain et al. showed that, in an in vitro proteoliposome system, OmpC assembly progresses through a “short-lived dimeric” form before the final process of trimerization (Hussain et al., 2021). However, their findings suggest that LPS plays the final role in stimulating the dimer-to-trimer transition, a step well past the recognition step of the β-signals. Mutations to the internal signal of OmpC result in the formation of an intermediate, the substrate stalled on the BAM complex. This stalling presumably hinders the BAM complex, resulting in reduced trimer and loss of dimer OmpF signal in the EMM of cells expressing the OmpC double mutant, FY. We have noted this in the revised text:

      Author response image 5.

      Non-cropped gel of Fig. 8I. The asterisk indicates a band observed in the sample loading wells at the top of the gel.

      (Lines 417-418) "The dimeric form of endogenous OmpF was prominently observed in both the OmpC(WT) as well as the OmpC(VY) double mutant cells."

      19. In the discussion, the authors indicate they have '...defined an internal signal for OMP assembly', however, their study is limited and only investigates a specific region of OmpC. More is needed to definitively say this for even OmpC, and even more so to indicate this is a general feature for all OMPs.

      We acknowledge the reviewer's comment on this point and have expanded the statement to make sure that the conclusion is justified with the specific evidence that is shown in the paper and the supplementary data. We now state:

      (Lines 444-447) "This internal signal corresponds to the -5 strand in OmpC and is recognized by BamD. Sequence analysis shows that similar sequence signatures are present in other OMPs (Figs. S5, S6 and S7). These sequences were investigated in two further OMPs: OmpF and LamB (Fig. 2C and D)."

      Note, we did not state that this is a general feature for all OMPs. That would not be a reasonable proposition.

      20.

      • In the proposed model in Fig 9, it is hard to conceive how 5 strands will form along BamD given the limited surface area and tight space beneath BAM.

      • More concerning is that the two proposed interaction sites on BamD, Y62 and R197, are on opposite sides of the BamD structure, not along the same interface, which makes this model even more unlikely.

      • As evidence against this model, in Figure 9E, the two indicated sites of BamD are not even in close proximity of the modeled substrate strands.

      We can address the reviewer’s three concerns here:

      i. The first point is that the region (formed by BamD engaged with POTRA domains 1-2 and 5 of BamA) is not sufficient to accommodate five β-strands. Structural analysis reveals that the interface between the N-terminal side of BamD and POTRA1-2 changes conformation substantially upon substrate binding, and that this surface is greatly extended. This surface does have enough space to accommodate five β-strands, as now documented in Fig. 9D, 9E using the latest structures (7TT5 and 7TT2) as illustrations of this. The text now reads:

      (Lines 506-515) "Spatially, this indicates that BamD can serve to organize two distinct parts of the nascent OMP substrate at the periplasmic face of the BAM complex, either prior to or in concert with engagement to the lateral gate of BamA. Assessing this structurally showed that, of the N-terminal region of BamD (interacting with the POTRA1-2 region of BamA) and the C-terminal region of BamD (interacting with POTRA5 proximal to the lateral gate of BamA) (Bakelar et al., 2016; Gu et al., 2016; Tomasek et al., 2020), the N-terminal region of BamD changes conformation depending on the folding states of the last four β-strands of the substrate OMP, EspP (Doyle et al., 2022). The overall effect is a change in the dimensions of this cavity, a change which is dependent on the folded state of the substrate engaged in it (Fig 9 B-E)."

      ii. The second point raised regards the orientation of the substrate recognition residues of BamD. Both Y62 and R197 are located on the lumen side of the funnel in the EspP-BAM transport intermediate structure (PDB ID: 7TTC). Y62 is located relatively close to the edge of BamD, but given that POTRA1-2 undergoes a conformational change and opens this region, as described above, both residues are positioned where they could bind to substrates. This is explained in the following text in the Results section of the revised manuscript.

      (Lines 377-379) "Each residue was located on the lumen side of the funnel-like structure in the EspP-BAM assembly intermediate structure (PDB ID: 7TTC) (Doyle et al., 2022)."

      Reviewer #2 (Public Review):

      Previously, using a bioinformatics study, the authors identified potential sequence motifs that are common to a large subset of beta-barrel outer membrane proteins in Gram-negative bacteria. Interestingly, in that study, some of those motifs are located in the internal strands of barrels (not near the termini), in addition to the well-known "beta-signal" motif in the C-terminal region.

      Here, the authors carried out rigorous biochemical, biophysical, and genetic studies to prove that the newly identified internal motifs are critical to the assembly of outer membrane proteins and the interaction with the BAM complex. The authors' approaches are rigorous and comprehensive, and the results support the conclusions reasonably well. While overall enthusiastic, I have some scientific concerns with the rationale of the neutron reflectometry study, and the distinction between "the intrinsic impairment of the barrel" vs "the impairment of interaction with BAM" that the internal signal may play a role in. I hope that the authors will be able to address this.

      Strengths:

      1. It is impressive that the authors took multi-faceted approaches using the assays on reconstituted, cell-based, and population-level (growth) systems.

      2. Assessing the role of the internal motifs in the assembly of model OMPs in the absence and presence of BAM machinery was a nice approach for a precise definition of the role.

      Weaknesses:

      1. The result section employing neutron reflectometry (NR) needs to be clarified and strengthened in the main text (from line 226). In the current form, the NR result seems not so convincing.

      What is the rationale of the approach using NR?

      We have now modified the text to make clear that:

      (Lines 276-280) "The rationale for these experiments is that NR: (i) provides information on the distance of specified subunits of a protein complex away from the atomically flat gold surface to which the complex is attached, and (ii) allows the addition of samples between measurements, so that multi-step changes can be made to, for example, detect changes in domain conformation in response to the addition of a substrate."

      What is the molecular event (readout) that the method detects?

      We have now modified the text to make clear that:

      (Lines 270-274) "While the biochemical assay demonstrated that the OmpC(Y286A) mutant forms a stalled intermediate with the BAM complex, in a state in which membrane insertion was not completed, biochemical assays such as this cannot elucidate where on BamA-BamD this OmpC(Y286A) substrate is stalled."

      What are "R"-y axis and "Q"-x axis and their physical meanings (Fig. 5b)?

      The neutron reflectivity, R, refers to the ratio of the incoming and exiting neutron beams, and it is measured as a function of momentum transfer Q, which is defined as Q = 4π sinθ/λ, where θ is the angle of incidence and λ is the neutron wavelength. R(Q) is approximately given by R(Q) = 16π²/Q² |ρ(Q)|², where ρ(Q) is the one-dimensional Fourier transform of ρ(z), the scattering length density (SLD) distribution normal to the surface. SLD is the sum of the coherent neutron scattering lengths of all atoms in the sample layer divided by the volume of the layer. Therefore, the intensity of the reflected beams is highly dependent on the thickness, densities and interface roughness of the samples. This was explained in the following text in the Methods section of the revised manuscript.

      (Lines 669-678) "Neutron reflectivity, denoted as R, is the ratio of the incoming to the exiting neutron beams. It is measured as a function of the momentum transfer Q, which is defined by the formula Q = 4π sinθ/λ, where θ represents the angle of incidence and λ stands for the neutron wavelength. The approximate value of R(Q) can be expressed as R(Q) = 16π²/Q² |ρ(Q)|², where ρ(Q) is the one-dimensional Fourier transform of ρ(z), which is the scattering length density (SLD) distribution perpendicular to the surface. SLD is calculated by dividing the sum of the coherent neutron scattering lengths of all atoms in a sample layer by the volume of that layer. Consequently, factors such as thickness, volume fraction, and interface roughness of the samples significantly influence the intensity of the reflected beams."
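To make the two relations above concrete, the following sketch evaluates Q = 4π sinθ/λ and the approximate reflectivity R(Q) = 16π²/Q² |ρ(Q)|², with ρ(Q) taken as the one-dimensional Fourier transform of ρ(z), for the simplest possible profile: a single uniform slab. This is an illustration only and not the multi-layer fitting procedure used in the study; the function names, the slab geometry and the numerical values are all assumptions. For a slab of SLD ρ₀ and thickness d, |ρ(Q)|² = (2ρ₀ sin(Qd/2)/Q)².

```python
import math


def momentum_transfer(theta_deg, wavelength):
    """Q = 4*pi*sin(theta)/lambda; Q has the inverse units of `wavelength`."""
    return 4.0 * math.pi * math.sin(math.radians(theta_deg)) / wavelength


def slab_reflectivity(q, sld, thickness):
    """Approximate R(Q) = 16*pi^2/Q^2 * |rho(Q)|^2 for one uniform slab.

    rho(z) = sld for 0 < z < thickness, so |rho(Q)|^2 = (2*sld*sin(Q*d/2)/Q)^2.
    The approximation is only meaningful well above the critical edge (R << 1).
    """
    ft_squared = (2.0 * sld * math.sin(q * thickness / 2.0) / q) ** 2
    return 16.0 * math.pi ** 2 / q ** 2 * ft_squared


# Illustrative values (hypothetical): SLD in Å^-2, thickness in Å, wavelength in Å.
sld, thickness = 6.0e-6, 100.0
q = momentum_transfer(theta_deg=1.0, wavelength=5.0)
r = slab_reflectivity(q, sld, thickness)
```

The sin(Qd/2) term is what makes the reflected intensity sensitive to layer thickness: the minima of R(Q) are spaced by 2π/d, so a thicker layer gives more closely spaced fringes.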

      How are the "layers" defined from the plot (Fig. 5b)?

      The “layers” in the plot (Fig. 5b) represent different regions of the sample being studied. In this study, we used a seven-layer model to fit the experimental data (chromium - gold - NTA - His8 - β-barrel - P3-5 - P1-2). This was explained in the following text in the figure legend of the revised manuscript. (Lines 1115-1116) The experimental data was fitted using a seven-layer model: chromium - gold - NTA - His8 - β-barrel - P3-5 - P1-2.

      What are the meanings of "thickness" and "roughness" (Fig. 5c)?

      We used neutron reflectometry to determine the relative positions of BAM subunits in a membrane environment. The binding of certain subunits induced conformational changes in other parts of the complex. When a substrate membrane protein is added, the periplasmic POTRA domain of BamA extends further away from the membrane surface. This could result in an increase in thickness as observed in neutron reflectometry measurements.

      As for roughness, it is related to the interface properties of the sample. In neutron reflectometry, the intensity of the reflected beams is highly dependent on the thickness, densities, and interface roughness of the samples. An increase in roughness could suggest changes in these properties, possibly due to protein-membrane interactions or structural changes within the membrane.

      (Lines 1116-1120) "Table summarizing the thickness, roughness and volume fraction data of each layer from the NR analysis. The thickness refers to the depth of the layered structures being studied, as measured in Å. The roughness refers to the irregularities in the surface of the layered structures being studied, as measured in Å."

      What does "SLD" stand for?

      We apologize for not defining the abbreviation SLD at its first occurrence. We have now defined it in the revised manuscript. (Line 298)

      1. In the result section, "The internal signal is necessary for insertion step of assembly into OM" This section presents an important result that the internal beta-signal is critical to the intrinsic propensity of barrel formation, distinct from the recognition by BAM complex. However, this point is not elaborated in this section. For example, what is the role of these critical residues in the barrel structure formation? That is, are they involved in any special tertiary contacts in the structure or in membrane anchoring of the nascent polypeptide chains?

      We appreciate the reviewer's comment on this point. Both position 0 and position 6 appear to be important amino acids for recognition by the BAM complex, since mutations introduced at these positions in peptide 18 prevent competitive inhibition activity.

      In terms of the tertiary structure of OmpC, position 6 is an amino acid that contributes to the aromatic girdle, and since Y286A and Y365A affected OMP folding as measured in folding experiments, it is perhaps their position in the aromatic girdle that contributes to the efficiency of β-barrel folding in addition to its function as a recognition signal. We have added a sentence in the revised manuscript:

      (Lines 233-236) "Position 6 is an amino acid that contributes to the aromatic girdle. Since Y286A and Y365A affected OMP folding as measured in folding experiments, their positioning in the aromatic girdle may contribute to the efficiency of β-barrel folding, in addition to contributing to the internal signal."

      The mutations made at position 0 had no effect on folding, so this residue may function solely in the signal. Given the register of each β-strand in the final barrel, the position 0 residues have side-chains that face out into the lipid environment. From examination of the OmpC crystal structure, the residue at position 0 makes no special tertiary contacts with other, neighbouring residues.  

      Reviewer #1 (Recommendations For The Authors):

      Minor critiques (in no particular order):

      1. Peptide 18 was identified based on its strong inhibition for EspP assembly but another peptide, peptide 23, also shows inhibition and has no particular consensus.

      We would like to correct this point. Peptide 23 closely matches the consensus of the canonical β-signal. We have explained the sequence consensus of the β-signal in the Results section of the text. In the third paragraph, we have added a sentence indicating the relationship between peptide 18 and peptide 23.

      (Lines 152-168) "Six peptides (4, 10, 17, 18, 21, and 23) were found to inhibit EspP assembly (Fig. 1A). Of these, peptide 23 corresponds to the canonical β-signal of OMPs: it is the final β-strand of OmpC and it contains the consensus motif of the β-signal (ζxGxx[Ω/Φ]x[Ω/Φ]). The inhibition seen with peptide 23 indicated that our peptidomimetics screening system using EspP can detect signals recognized by the BAM complex. In addition to inhibiting EspP assembly, five of the most potent peptides (4, 17, 18, 21, and 23) inhibited additional model OMPs; the porins OmpC and OmpF, the peptidoglycan-binding OmpA, and the maltoporin LamB (fig. S3). Comparing the sequences of these inhibitory peptides suggested the presence of a sub-motif from within the β-signal, namely [Ω/Φ]x[Ω/Φ] (Fig. 1B). The sequence codes refer to conserved residues such that: ζ, is any polar residue; G is a glycine residue; Ω is any aromatic residue; Φ is any hydrophobic residue and x is any residue (Hagan et al., 2015; Kutik et al., 2008). The non-inhibitory peptide 9 contained some elements of the β-signal but did not show inhibition of EspP assembly (Fig. 1A).

      Peptide 18 also showed a strong sequence similarity to the consensus motif of the β-signal (Fig. 1B) and, like peptide 23, had a strong inhibitory action on EspP assembly (Fig. 1A). Variant peptides based on the peptide 18 sequence were constructed and tested in the EMM assembly assay (Fig. 1C)."
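
      To make the motif notation concrete, the following sketch translates the consensus ζxGxx[Ω/Φ]x[Ω/Φ] and the sub-motif [Ω/Φ]x[Ω/Φ] into regular expressions. The residue groupings (which one-letter codes count as polar, aromatic or hydrophobic) are assumptions made for illustration and may differ from those used in the cited studies; the helper name is hypothetical, and the sequences in the usage note are invented, not taken from OmpC.

```python
import re

# Assumed residue classes (one-letter codes); illustrative only.
POLAR = "STNQDEKRH"          # zeta: any polar residue (assumed grouping)
AROMATIC = "FWY"             # Omega: any aromatic residue
HYDROPHOBIC = "AVLIMFWY"     # Phi: any hydrophobic residue (assumed grouping)

OMEGA_OR_PHI = f"[{AROMATIC}{HYDROPHOBIC}]"

# zeta-x-G-x-x-[Omega/Phi]-x-[Omega/Phi]
BETA_SIGNAL = f"[{POLAR}].G..{OMEGA_OR_PHI}.{OMEGA_OR_PHI}"
# [Omega/Phi]-x-[Omega/Phi]
SUB_MOTIF = f"{OMEGA_OR_PHI}.{OMEGA_OR_PHI}"


def find_motif(sequence: str, pattern: str):
    """Return (start_index, matched_text) for every match, including overlaps."""
    # A lookahead with a capture group lets finditer report overlapping hits.
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({pattern}))", sequence.upper())]
```

      For example, `find_motif("AASAGAAFAFAA", BETA_SIGNAL)` reports one hit starting at index 2 in this invented sequence. A pattern scan of this kind only flags candidate positions; whether a segment acts as a signal still has to be tested experimentally, as in the peptide inhibition assays described above.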

      1. It is unclear why the authors immediately focused on BamD rather than BamB, given that both were mentioned to mediate interaction with substrate. Was BamB also tested?

      We thank the reviewer for this comment. Following the reviewer's suggestion, we have now performed a pull-down experiment on BamB and added it to Fig. S9. We also modified the text of the results as follows.

      (Lines 262-265) "Three subunits of the BAM complex have been previously shown to interact with the substrates: BamA, BamB, and BamD (Hagan et al., 2013; Harrison, 1996; Ieva et al., 2011). In vitro pull-down assay showed that while BamA and BamD can independently bind to the in vitro translated OmpC polypeptide (Fig. S9A), BamB did not (Fig. S9B)."

      1. For the in vitro folding assays of the OmpC substrates, labeled and unlabeled, no mention of adding SurA or any other chaperone which is known to be important for mediating OMP biogenesis in vitro.

      We appreciate the reviewer’s concerns on this point; however, chaperones such as SurA are non-essential factors in the OMP assembly reaction mediated by the BAM complex: the surA gene is not essential, and the assembly of OMPs can be measured in the absence of exogenously added SurA. It remains possible that addition of SurA to some of these assays could be useful in detailing aspects of chaperone function in the context of the BAM complex, but that was not the intent of this study.

      1. For the supplementary document, it would be much easier for the reader to have the legends grouped with the figures.

      Following the reviewer's suggestion, we have placed the legends of Supplemental Figures together with each Figure.

      1. Some of the figures and their captions are not grouped properly and are separated which makes it hard to interpret the figures efficiently.

      We thank the reviewer for this comment. We have revised the manuscript and figures so that each figure and its caption are grouped together on a single page.

      1. The authors begin their 'Discussion' with a question (line 454), however, they don't appear to answer or even attempt to address it; suggest removing rhetorical questions.

      As per the reviewer’s suggestion, we have removed this question.

      1. Line 464, 'unbiased' should be removed. This would imply that if not stated, experiments are 'negatively' biased.

      We removed this word and revised the sentence as follows:

      (Lines 431-433) "In our experimental approach to assess for inhibitory peptides, specific segments of the major porin substrate OmpC were shown to interact with the BAM complex as peptidomimetic inhibitors."

      1. Lines 466-467; '...go well beyond expected outcomes.' What does this statement mean?

      Our peptidomimetics led to unexpected results in elucidating the additional essential signal elements. The manuscript was revised as follows:

      (Lines 433-435) "Results for this experimental approach went beyond expected outcomes by identifying the essential elements of the signal Φxxxxxx[Ω/Φ]x[Ω/Φ] in β-strands other than the C-terminal strand."

      1. Line 478; '...rich information that must be oversimplified...'?

      We appreciate the reviewer pointing this out. For clarity, the manuscript was revised as follows:

      (Lines 450-453) "The abundance of information which arises from modeling approaches and from the multitude of candidate OMPs, is generally oversimplified when written as a primary structure description typical of the β-signal for bacterial OMPs (i.e. ζxGxx[Ω/Φ]x[Ω/Φ]) (Kutik et al., 2008)."

      1. There are typos in the supplementary figures.

      We have revised and corrected the Supplemental Figure legends.  

      Reviewer #2 (Recommendations For The Authors):

      1. In Supplementary Information, I recommend adding the figure legends directly to the corresponding figures. Currently, it is very inconvenient to go back and forth between legends and figures.

      Following the reviewer's suggestion, we have placed the legends of Supplemental Figures together with each Figure.

      1. Line 94 (p.3): "later"

      Lateral?

      Yes. We have corrected this.

      1. Line 113 (p.3): The result section, "Peptidomimetics derived from E. coli OmpC inhibit OMP assembly" Rationale of the peptide inhibition assay is not clear. How can the peptide sequence that effectively inhibit the assembly interpreted as the b-assembly signal? By competitive binding to BAM or by something else? What is the authors' hypothesis in doing this assay?

      In the revision, we have added the following sentences to explain the aim and design of the peptidomimetics:

      (Lines 140-145) "Peptides with BAM complex affinity, such as the OMP β-signal, are capable of exerting an inhibitory effect by competing for binding of substrate OMPs to the BAM complex (Hagan et al., 2015). Thus, the addition of peptides derived from the entirety of OMPs to the EMM assembly assay, which can evaluate assembly efficiency with high accuracy, is expected to identify novel regions that have affinity for the BAM complex."

      1. Line 113- (p.3) and Fig. S1: The result section, "Peptidomimetics derived from E. coli OmpC inhibit OMP assembly"

      Some explanation seems to be needed why b-barrel domain of EspP appears even without ProK?

      We appreciate the reviewer pointing this out. We have added the following sentences to explain:

      (Lines 128-137) "EspP, a model OMP substrate, belongs to the autotransporter family of proteins. Autotransporters have two domains: (1) a β-barrel domain, assembled into the outer membrane via the BAM complex, and (2) a passenger domain, which traverses the outer membrane via the lumen of the β-barrel domain itself and is subsequently cleaved by the correctly assembled β-barrel domain (Celik et al., 2012). When EspP is correctly assembled into the outer membrane, a visible decrease in the molecular mass of the protein is observed due to the self-proteolysis. Once the barrel domain is assembled into the membrane it becomes protease-resistant, with residual unassembled and passenger domains degraded (Leyton et al., 2014; Roman-Hernandez et al., 2014)."

      1. Line 186 (p.6): "Y285"

      Y285A?

      We have corrected the error; it should read Y285A.

      1. Lines 245- (p. 7)/ Lines 330- (p. 10)

      It needs to be clarified that the results described in these paragraphs were obtained from the assays with EMM.

      We appreciate the reviewer’s concerns on these points. For the first half, the following text was added at the beginning of the applicable paragraph to indicate that all of Fig. 4 is the result of the EMM assembly assay.

      (Line 241) "We further analyzed the role of the internal β-signal by the EMM assembly assay."

      For the second half, we used purified BamD rather than EMM. We have stated this clearly with the following sentence:

      (Lines 316-318) "We purified 40 different BPA variants of BamD, and then irradiated them with UV after incubating with 35S-labelled OmpC."

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The bacterial neurotransmitter:sodium symporter homologue LeuT is a well-established model system for understanding the fundamental basis for how human monoamine transporters, such as the dopamine and serotonin transporters, couple ions with neurotransmitter uptake. Here the authors provide convincing data to show that K+ catalyses the return step of the transport cycle in LeuT by binding to one of the two sodium sites. The paper is an important contribution, but it is still unclear exactly where K+ binds in LeuT, and how to incorporate K+ binding into a transport cycle mechanism.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript tackles an important question, namely how K+ affects substrate transport in the SLC6 family. K+ effects have previously been reported for DAT and SERT, but the prototypical SLC6-fold transporter LeuT was not known to be sensitive to the K+ concentration. In this manuscript, the authors demonstrate convincingly that K+ inhibits Na+ binding, and Na+-dependent amino acid binding at high concentrations, and that K+ inside of vesicles containing LeuT increases the transport rate. However, outside K+ apparently had very little effect. Uptake data are supplemented with binding data, using the scintillation proximity assay, and transition metal FRET, allowing the observation of the distribution of distinct conformational states of the transporter.

      Overall, the data are of high quality. I was initially concerned about the use of solutions of very high ionic strength (the Km for K+ is in the 200 mM range), however, the authors performed good controls with lower ionic strength solutions, suggesting that the K+ effect is specific and not caused by artifacts from the high salt concentrations.

      The major issue I have with this manuscript is with the interpretation of the experimental data. Granted that the K+ effect seems to be complex. However, it seems counterintuitive that K+ competes with Na+ for the same binding site, while at the same time accelerating the transport rate. Even if K+ prevents rebinding of Na+ on the inside of vesicles, it would be expected that K+ then stabilizes this Na+-free conformation, resulting in a slowing of the transport rate. However, the opposite is found. I feel that it would be useful to perform some kinetic modeling of the transport cycle to identify a mechanism that would allow K+ to act as a competitive inhibitor of Na+ binding and rate-accelerator at the same time.

      This ties into the second point: It is not mentioned in the manuscript what the configuration of the vesicles is after LeuT reconstitution. Are they right-side out? Is LeuT distributed evenly in inside-out and right-side out orientation? Is the distribution known? If yes, how does it affect the interpretation of the uptake data with and without K+ gradient?

      Finally, mutations were only made to the Na1 cation binding site. These mutations have an effect mostly to be expected, if K+ would bind to this site. However, indirect effects of mutations can never be excluded, and the authors acknowledge this in the discussion section. It would be interesting to see the effect of K+ on a couple of mutants that are far away from Na+/substrate binding sites. This could be another piece of evidence to exclude indirect effects, if the K+ affinity is less affected.

      Reviewer #2 (Public Review):

      To characterize the relationship between Na+ and K+ binding to LeuT, the effect of K+ on Na+-dependent [3H]leucine binding was studied using a scintillation proximity assay. In the presence of K+ the apparent affinity for sodium was reduced but the maximal binding capacity for this ion was unchanged, consistent with a competitive mechanism of inhibition between Na+ and K+.
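
      The signature of competitive inhibition described here (reduced apparent affinity, unchanged maximal binding) can be sketched with the standard single-site competitive binding equation. This is a simplified illustration, not the model fitted in the manuscript: it assumes one Na+ site without cooperativity, and the function and parameter names are hypothetical.

```python
def apparent_kd(kd_na: float, k_conc: float, ki_k: float) -> float:
    """Apparent Kd for Na+ in the presence of a competitive inhibitor (K+).

    Kd_app = Kd_Na * (1 + [K+] / Ki_K)
    """
    return kd_na * (1.0 + k_conc / ki_k)


def bound(na_conc: float, k_conc: float, kd_na: float, ki_k: float,
          bmax: float = 1.0) -> float:
    """Na+-dependent binding signal under simple competitive inhibition.

    Assumption: a single, non-cooperative Na+ site; Bmax is unaffected by K+.
    """
    return bmax * na_conc / (apparent_kd(kd_na, k_conc, ki_k) + na_conc)
```

      In this sketch, raising the K+ concentration shifts the Na+ concentration needed for half-maximal binding to higher values, whereas saturating Na+ still drives the signal to the same Bmax.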

      To obtain a more direct readout of K+ binding to LeuT, tmFRET was used. This method relies on the distance-dependent quenching of a cysteine-conjugated fluorophore (FRET donor) by a transition metal (FRET acceptor). This method is a conformational readout for both ion- and ligand-binding. Along with the effect of K+ on Na+-dependent [3H]leucine binding, the findings support the existence of a specific K+ binding site in LeuT and that K+ binding to this site induces an outward closed conformation.

      It was previously shown that in liposomes inlaid with LeuT by reconstitution, intra-vesicular K+ increases the concentrative capacity of [3H]alanine. To obtain insights into the mechanistic basis of this phenomenon, purified LeuT was reconstituted into liposomes containing a variety of cations, including Na+ and K+ followed by measurements of [3H]alanine uptake driven by a Na+ gradient.

      The ionic composition of the external medium was manipulated to determine if the stimulation of [3H]alanine uptake by K+ was due to an outward directed potassium gradient serving as a driving force for sodium-dependent substrate transport by moving in the direction opposite to that of sodium and the substrate. Remarkably it was found that it is the intra-liposomal K+ per se that increases the transport rate of alanine and not a K+ gradient, suggesting that binding of K+ to the intra-cellular face of the transporter could prevent the rebinding of sodium and the substrate thereby reducing their efflux from the cell. These conclusions assume that the measured radioactive transport is via right-side-out liposomes rather than from their inverted counterparts (in case of a random orientation of the transporters in the proteoliposomes). Even though this assumption is likely to be correct, it should be tested.

      Since K+- and Na+-binding are competitive and K+ excludes substrate binding, the Authors chose to focus on the Na1 site where the carboxyl group of the substrate serves as one of the groups which coordinate the sodium ion. This was done by the introduction of conservative mutations of the amino acid residues forming the Na1 site. The potassium interaction in these mutants was monitored by sodium dependent radioactive leucine binding. Moreover, the effect of Na+ with and without substrate as well as that of potassium on the conformational equilibria was measured by tmFRET measurements on the mutants introduced in the construct enabling the measurements. The results suggest that K+-binding to LeuT modulates substrate transport and that the K+ affinity and selectivity for LeuT is sensitive to mutations in the Na1 site, pointing toward the Na1 site as a candidate site for facilitating the interaction with K+ in some NSS members.

      The data presented in this manuscript are of very high quality. They are a detailed extension of results by the same group (Billesbolle et al., Ref. 16 from the list) providing more detailed information on the importance of the Na1 site for potassium interaction. Clearly this begs for the identification of the binding site in a potassium bound LeuT structure in the future. Presumably LeuT was studied here because it appears that it is relatively easy to determine structures of many conformational states. Furthermore, convincing evidence showed that the stimulatory effect of K+ on transport is not because of energization of substrate accumulation but is rather due to the binding of this cation to a specific site.

      Reviewer #1 (Recommendations For The Authors):

      • Include a transport mechanism that can account for the K+ effects.

      We appreciate the opportunity to elaborate further regarding how we envision this complex mechanism. It is generally known that, within the LeuT-fold transporters, the return step is rate-limiting for the transport process. Our data suggest that K+ binds to the inward-facing apo form.

      Accordingly, we propose that the role of K+ binding is to help LeuT overcome the rate-limiting step. We propose the following mechanistic model: When Na+ and substrate are released to the intracellular environment, the transporter must return to the outward-facing conformation. This can happen in (at least) two ways: 1) The transporter in its apo form closes the inner gate and opens to the extracellular side, now ready to perform a new transport cycle. 2) The transporter rebinds Na+, which allows for the rebinding of substrate. It can now go in reverse (efflux) or once again release its content. The transporter can naturally also rebind only Na+ and release it again to the cytosol.

      The purpose of K+ binding is to prevent Na+ rebinding and to promote a conformational state of the transporter, which does not allow Na+ binding. Even though Na+ has a higher affinity for the site, K+ is much more abundant.

      This model is supported by our previous experiment, showing that intravesicular K+ prevents [3H]alanine efflux while LeuT performs Na+-dependent alanine transport. Thus, the increase in Vmax could be due to a decreased efflux (exchange mode), or a facilitation of the rate-limiting step, or a combination of the two.

      Note that the model does not require that K+ is counter-transported. It just has to prevent Na+ rebinding. However, even though we failed to show K+ counter-transport, it does not mean that it does not happen. Further experiments must clarify this issue.

      To be more explicit about our proposed mechanistic model, we have expanded the last paragraph in the Discussion section. It now reads:

      “We propose that K+ binding either facilitates LeuT transition from inward- to outward-facing (the rate limiting step of the transport cycle), or solely prevents the rebinding and possible efflux of Na+ and substrate. It could also be a combination of both. Either way, intracellular K+ will lead to an increase in Vmax and concentrative capacity. Note that our previous experiment showed an increased [3H]alanine efflux when LeuT transports alanine in the absence of intra-vesicular K+ (ref. 16). Specifically, the mechanistic impact of K+ could be to catalyze LeuT away from the state that allows the rebinding of Na+ and substrate. This way, K+ binding would decrease the possible rebinding of intracellularly released Na+ and substrate, thereby rectifying the transport process and increasing the concentrative capacity and Vmax (Figure 6). Our results suggest that K+ is not counter-transported but rather promotes LeuT to overcome an internal rate limiting energy barrier. However, further investigations must be performed before any conclusive statement can be made here.”
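
      The qualitative argument can be illustrated with a toy rapid-equilibrium calculation showing how a cation that competes with Na+ could nonetheless speed up the return step. This is a sketch under stated assumptions and not a model fitted to the data: it assumes that the inward-facing transporter is apo, Na+-bound or K+-bound; that ion binding equilibrates quickly compared with the return step; and that the Na+-bound state cannot return directly. All names and rate constants are hypothetical.

```python
def effective_return_rate(na_in: float, k_in: float,
                          kd_na: float, kd_k: float,
                          k_return_apo: float, k_return_k: float) -> float:
    """Effective inward-to-outward return rate in a toy three-state model.

    Inward-facing states: apo, Na+-bound (assumed unable to return),
    K+-bound. Occupancies follow rapid-equilibrium competitive binding.
    """
    denom = 1.0 + na_in / kd_na + k_in / kd_k
    p_apo = 1.0 / denom                 # fraction in the apo state
    p_k = (k_in / kd_k) / denom         # fraction in the K+-bound state
    return k_return_apo * p_apo + k_return_k * p_k
```

      Even when the K+-bound and apo forms are assigned the same return rate, adding internal K+ raises the effective rate in this sketch, because K+ draws occupancy away from the Na+-bound state that cannot return. If the K+-bound form were assumed to return faster than the apo form, the effect would be larger. Neither case requires K+ to be counter-transported.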

      • Describe the orientation of the transporter in the vesicles.

      When working with reconstituted NSS, the transport activity is determined by the Na+ gradient. This is also evident in the experiments where we dissipate the Na+ gradient. Here we find transport activity comparable to background. We can also see in the literature that directionality is rarely determined for transport proteins in reconstituted systems. That said, it is difficult to know how the inside-out LeuT contributes to the transport process. Will they work in reverse and contribute to the accumulation of intravesicular [3H]alanine? If so, to what extent? They will likely not be affected by the intravesicular K+. Therefore, their possible contribution will ‘work against’ our results and decrease the apparent K+ effects reported herein. Taken together, unless the vast majority of LeuT molecules are inside-out, knowing the actual proportion will not, in our view, affect our interpretations and conclusions of the data.

      That said, we have also been curious about this issue and, prompted by the reviewer's question, we performed the suggested experiment. We have inserted the results in Figure 3 – Figure supplement 1D. The figure shows that a fraction of the reconstituted LeuT is susceptible to thrombin cleavage of the accessible C-terminal. We have quantified the cleaved fraction to around 40% of the total (see Author response image 1 below). It is, however, a crude estimate since it is difficult to perform reliable densitometry with bands that close together. Thus, we are reluctant to add a quantitative measure in the article text.

      Author response image 1.

      We have inserted the following in the main text:

      “It is difficult to control the directionality of proteins when they are reconstituted into lipid vesicles. They will be inserted in both orientations: outside-out and inside-out. In the case of LeuT it is the imposed Na+-gradient which determines the directionality of transport. Uptake through the inside-out transporters will probably also happen. Note that the inside-out LeuT will not have the K+ binding site exposed to the intra-vesicular environment. Accordingly, a proportion of transporters will likely not be influenced by the added K+ and will tend to mask the contribution of K+ to the transport mode from the right-side out LeuT. To investigate LeuT directionality in our reconstituted samples, we performed thrombin cleavage of accessible C-terminals on intact and perforated vesicles, respectively. The result suggests that the proportion of LeuT inserted as outside-out is larger than the proportion with an inside-out directionality (Figure 3 – Figure supplement 1D).”

      For the inserted Figure 3 – Figure supplement 1D, we have added the following legend:

      “(D) SDS-PAGE analysis of LeuT proteoliposomes following time-dependent thrombin digestion of accessible C-terminals (reducing the mass of LeuT by ~1.3 kDa). The reaction was terminated by the addition of PMSF at the specified time points. The lanes corresponding to the time-dependent proteolysis are flanked by lanes containing proteoliposomes without thrombin (left, 0 min) or digested in the presence of DDM (right, 180 min+DDM). Arrows indicate bands of full-length (top) and cleaved (bottom) LeuT.”

      • Check the effects of mutations away from the Na1 cation binding site.

      We have included LeuT K398C in the study as a negative control for unspecific effects on Na+ and K+ binding. The mutant exhibits Na+-dependent [3H]leucine binding and K+-dependency similar to LeuT WT – see Table 2 and Table 2 - Figure Supplement 1G.

      As a minor point, the authors use the term "affinity" liberally. However, unless these are direct binding experiments, the term "apparent affinity" may be more appropriate, since Km values are affected by the transport cycle (in uptake), as well as binding of cations/substrate.

      We thank the reviewer for emphasizing this important point. We have revised the manuscript accordingly. We use ‘affinity’ when it has been determined under equilibrium conditions, either as a SPA binding experiment or based on tmFRET. We use the term ‘Km’ when the apparent affinity has been determined during non-equilibrium conditions such as during substrate transport.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in part 2, it is important to show the effect of internal potassium on transport in-sided liposomes. This could be done using the methodology developed by Tsai et al., Biochemistry 51 (2012) 1557-1585.

      We appreciate this important point and have performed the suggested experiment. See our response to the second recommendation of Reviewer #1.

      In the Abstract and throughout it is mentioned that K+ is not counter transported, yet on the bottom of p. 16 it is mentioned that this is possible.

      We have tried to be very cautious with any interpretation about whether K+ is only binding or whether it is also counter-transported. Either way, it must facilitate a transition towards a non-Na+ binding state. We tried to differentiate between the two possibilities by investigating if an outward-directed K+ gradient alone could drive transport (Figure 3E). We do not observe any significant difference from background (no gradient). However, the gained information is rather weak: It is still possible that K+ is counter-transported, but the K+ gradient does not impose any driving force. Instead, it ensures a rectification of the Na+-dependent substrate transport. If so, this experiment would come up negative even if K+ is counter-transported.

      To be more explicit, we have changed the wording on page 16.

      “Our results suggest that K+ is not counter-transported, but rather promotes LeuT to overcome an internal rate limiting energy barrier. However, further investigations must be performed before any conclusive statement can be made here.”

      Fig.2-Fig. Supplement 1: it is important to show that the effect of leucine is sodium-dependent by adding the control K+ and leucine.

      We thank the reviewer for suggesting this important control. We have added the experiment to Figure 2 – Figure supplement 1 as suggested. The effect is not different from K+ alone, supporting the SPA-binding data that K+-binding does not promote substrate binding.

      Point for discussion: Whereas potassium is counter transported in SERT, there are conflicting interpretations on this in DAT (Ref. 15 from the list and Bhat et al. eLife (2021) 10:e67996). The situation in LeuT seems like the scenario described by Bhat et al.

      We appreciate the suggestion for a proposed link between LeuT and hDAT. However, as mentioned above, we find it too early to be certain about this option. We have now mentioned the mechanistic similarity in the Discussion following our description of the proposed mechanistic model (see first request from reviewer #1):

      “If K+ is not counter-transported, LeuT might comply with the mechanism previously suggested for the human DAT31.”

      Fig. 5-Fig. Supplement 1: Why are no data on N27Q and N286Q given? If these mutants have no transport activity this should be stated. Moreover, alanine uptake by A22V is almost sodium independent and is also very fast, suggesting binding, not transport. Are the counts sensitive to ionophores like nigericin?

      We appreciate this important point. Indeed, the LeuT N27Q and N286Q are transport inactive. This information is now inserted in the main text when describing the conformational dynamics of N27QtmFRET and N286QtmFRET.

      We agree with the reviewer that the [3H]alanine uptake for A22V is not very conclusive. The vesicles with Na+ on both sides (open diamonds) do allow [3H]alanine binding. Vesicles with added gramicidin are similar in activity. The fast rate could indeed suggest a binding event. This we also do not rule out in the main text. However, the contribution in activity from LeuT A22V in vesicles with a Na+ gradient cannot be explained by a binding event alone. Then it should bind more [3H]alanine in the presence of a Na+ gradient, which is possible, but hard to imagine. Also, the alanine affinity for LeuT A22V is ~1 µM (Table 1). At this affinity it should be literally impossible to detect any binding because the off-rate is so fast that it would all dissociate during the washing procedure.

      We have described the data and left out any interpretation (e.g. changed ‘[3H]alanine transport’ to ‘[3H]alanine activity’). In addition, we have replaced: “This correlates with the lack of changes in conformational equilibrium observed in the tmFRET data between the NMDG+, Na+ and K+ states.” with: “Further investigations must clarify whether the changes in observed [3H]alanine activity constitute a transport- or a binding event.”

      Lower part of p. 16. The Authors speculate "that the mechanistic impact of K+ binding could be to accelerate a transition away from the conformation where Na+ and substrate are released, to a state where they can no longer rebind and thus revert the transport process (efflux)". This could be easily tested by measuring exchange, which should not be influenced by potassium.

      We performed this experiment in Billesbolle et al. 2016. Nat Commun (Fig. 1f). We show that the exchange is decreased in the presence of K+. We hypothesize that this is because K+ binding forces LeuT away from the exchange mode.

    1. Author Response

      Response to the Reviews

      We are grateful for these balanced, nuanced evaluations of our work concerning the observed epistatic trends and our interpretations of their mechanistic origins. Overall, we think the reviewers have done an excellent job at recognizing the novel aspects of our findings while also discussing the caveats associated with our interpretations of the biophysical effects of these mutations. We believe it is important to consider both of these aspects of our work in order to appreciate these advances and what sorts of pertinent questions remain.

      Notably, both reviewers suggest that a lack of experimental approaches to compare the conformational properties of GnRHR variants weakens our claims. We would first humbly suggest that this constitutes a more general caveat that applies to nearly all investigations of the cellular misfolding of α-helical membrane proteins. Whether or not any current in vitro folding measurements report on conformational transitions that are relevant to cellular protein misfolding reactions remains an active area of debate (discussed further below). Nevertheless, while we concede that our structural and/ or computational evaluations of various mutagenic effects remain speculative, prevailing knowledge on the mechanisms of membrane protein folding suggests our mutations of interest (V276T and W107A) are highly unlikely to promote misfolding in precisely the same way. Thus, regardless of whether or not we were able to experimentally compare the relevant folding energetics of GnRHR variants, we are confident that the distinct epistatic interactions formed by these mutations reflect variations in the misfolding mechanism and that they are distinct from the interactions that are observed in the context of stable proteins. In the following, we provide detailed considerations concerning these caveats in relation to the reviewers’ specific comments.

      Reviewer #1 (Public Review):

      The paper carries out an impressive and exhaustive non-sense mutagenesis using deep mutational scanning (DMS) of the gonadotropin-releasing hormone receptor for the WT protein and two single point mutations that i) influence TM insertion (V276T) and ii) influence protein stability (W107A), and then measures the effect of these mutants on correct plasma membrane expression (PME).

      Overall, most mutations decreased mGnRHR PME levels in all three backgrounds, indicating poor mutational tolerance under these conditions. The W107A variant wasn't really recoverable with low levels of plasma membrane localisation. For the V276T variant, most additional mutations were more deleterious than WT based on correct trafficking, indicating a synergistic effect. As one might expect, there was a higher degree of positive correlation between V276T/W107A mutants and other mutants located in TM regions, confirming that improper trafficking was a likely consequence of membrane protein co-translational folding. Nevertheless, context is important, as positive synergistic mutants in the V276T could be negative in the W107A background and vice versa. Taken together, this important study highlights the complexity of membrane protein folding in dissecting the mechanism-dependent impact of disease-causing mutations related to improper trafficking.

      Strengths

      This is a novel and exhaustive approach to dissecting how receptor mutations under different mutational backgrounds related to co-translational folding, could influence membrane protein trafficking.

      Weaknesses

      The premise for the study requires an in-depth understanding of how the single-point mutations analysed affect membrane protein folding, but the single-point mutants used seem to lack proper validation.

      Given our limited understanding of the structural properties of misfolded membrane proteins, it is unclear whether the relevant conformational effects of these mutations can be unambiguously validated using current biochemical and/ or biophysical folding assays. X-ray crystallography, cryo-EM, and NMR spectroscopy measurements have demonstrated that many purified GPCRs retain native-like structural ensembles within certain detergent micelles, bicelles, and/ or nanodiscs. However, helical membrane protein folding measurements typically require titration with denaturing detergents to promote the formation of a denatured state ensemble (DSE), which will invariably retain considerable secondary structure. Given that the solvation provided by mixed micelles is clearly distinct from that of native membranes, it remains unclear whether these DSEs represent a reasonable proxy for the misfolded conformations recognized by cellular quality control (QC, see https://doi.org/10.1021/acs.chemrev.8b00532). Thus, the use and interpretation of these systems for such purposes remains contentious in the membrane protein folding community. In addition to this theoretical issue, we are unaware of any instances in which GPCRs have been found to undergo reversible denaturation in vitro- a practical requirement for equilibrium folding measurements (https://doi.org/10.1146/annurev-biophys-051013-022926). We note that, while the resistance of GPCRs to aggregation, proteolysis, and/ or mechanical unfolding have also been probed in micelles, it is again unclear whether the associated thermal, kinetic, and/ or mechanical stability should necessarily correspond to their resistance to cotranslational and/ or posttranslational misfolding. Thus, even if we had attempted to validate the computational folding predictions employed herein, we suspect that any resulting correlations with cellular expression may have justifiably been viewed by many as circumstantial. 
Simply put, we know very little about the non-native conformations that are generally involved in the cellular misfolding of α-helical membrane proteins, much less how to measure their relative abundance. From a philosophical standpoint, we prefer to let cells tell us what sorts of broken protein variants are degraded by their QC systems, then do our best to surmise what this tells us about the relevant properties of cellular DSEs.

      Despite this fundamental caveat, we believe that the chosen mutations and our interpretation of their relevant conformational effects are reasonably well-informed by current modeling tools and by prevailing knowledge on the physicochemical drivers of membrane protein folding and misfolding. Specifically, the mechanistic constraints of translocon-mediated membrane integration provide an understanding of the types of mutations that are likely to disrupt cotranslational folding. Though we are still learning about the protein complexes that mediate membrane translocation (https://doi.org/10.1038/s41586-022-05336-2), it is known that this underlying process is fundamentally driven by the membrane depth-dependent amino acid transfer free energies (https://doi.org/10.1146/annurev.biophys.37.032807.125904). This energetic consideration suggests introducing polar side chains near the center of nascent TMDs should almost invariably reduce the efficiency of topogenesis. To confirm this in the context of TMD6 specifically, we utilized a well-established biochemical reporter system to confirm that V276T attenuates its translocon-mediated membrane integration (Fig. S1)- at least in the context of a chimeric protein. We also constructed a glycosylation-based topology reporter for full-length GnRHR, but ultimately found its in vitro expression to be insufficient to detect changes in the nascent topological ensemble. In contrast to V276T, the W107A mutation is predicted to preserve the native topological energetics of GnRHR due to its position within a soluble loop region. W107A is also unlike V276T in that it clearly disrupts tertiary interactions that stabilize the native structure. This mutation should preclude the formation of a structurally conserved hydrogen bonding network that has been observed in the context of at least 25 native GPCR structures (https://doi.org/10.7554/eLife.5489). 
However, without a relevant folding assay, the extent to which this network stabilizes the native GnRHR fold in cellular membranes remains unclear. Overall, we admit that these limitations have prevented us from measuring how much V276T alters the efficiency of GnRHR topogenesis, how much the W107A destabilizes the native fold, or vice versa. Nevertheless, given these design principles and the fact that both reduce the plasma membrane expression of GnRHR, as expected, we are highly confident that the structural defects generated by these mutations do, in fact, promote misfolding in their own ways. We also concede that the degree to which these mutagenic perturbations are indeed selective for specific folding processes is somewhat uncertain. However, it seems exceedingly unlikely that these mutations should disrupt topogenesis and/ or the folding of the native topomer to the exact same extent. From our perspective, this is the most important consideration with respect to the validity of the conclusions we have made in this manuscript.

      Furthermore, plasma membrane expression has been used as a proxy for incorrect membrane protein folding, but this may not necessarily be the case, as even correctly folded membrane proteins may not be trafficked correctly, at least, under heterologous expression conditions. In addition, mutations can affect trafficking and potential post-translational modifications, like glycosylation.

      While the reviewer is correct that the sorting of folded proteins within the secretory pathway is generally inefficient, it is also true that the maturation of nascent proteins within the ER generally bottlenecks the plasma membrane expression of most α-helical membrane proteins. Our group and several others have demonstrated that the efficiency of ER export generally appears to scale with the propensity of membrane proteins to achieve their correct topology and/ or to achieve their native fold (see https://doi.org/10.1021/jacs.5b03743 and https://doi.org/10.1021/jacs.8b08243). Notably, these investigations all involved proteins that contain native glycosylation and various other post-translational modification sites. While we cannot rule out that certain specific combinations of mutations may alter expression through their perturbation of post-translational GnRHR modifications, we feel confident that the general trends we have observed across hundreds of variants predominantly reflect changes in folding and cellular QC. This interpretation is supported by the relationship between observed trends in variant expression and Rosetta-based stability calculations, which we identified using unbiased unsupervised machine learning approaches (compare Figs. 6B & 6D).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Chamness and colleagues make a pioneering effort to map epistatic interactions among mutations in a membrane protein. They introduce thousands of mutations to the mouse GnRH Receptor (GnRHR), either under wild-type background or two mutant backgrounds, representing mutations that destabilize GnRHR by distinct mechanisms. The first mutant background is W107A, destabilizing the tertiary fold, and the second, V276T, perturbing the efficiency of cotranslational insertion of TM6 to the membrane, which is essential for proper folding. They then measure the surface expression of these three mutant libraries, using it as a proxy for protein stability, since misfolded proteins do not typically make it to the plasma membrane. The resulting dataset is then used to shed light on how diverse mutations interact epistatically with the two genetic background mutations. Their main conclusion is that epistatic interactions vary depending on the degree of destabilization and the mechanism through which they perturb the protein. The mutation V276T forms primarily negative (aggravating) epistatic interactions with many mutations, as is common to destabilizing mutations in soluble proteins. Surprisingly, W107A forms many positive (alleviating) epistatic interactions with other mutations. They further show that the locations of secondary mutations correlate with the types of epistatic interactions they form with the above two mutants.

      Strengths:

      Such a high throughput study for epistasis in membrane proteins is pioneering, and the results are indeed illuminating. Examples of interesting findings are that: (1) No single mutation can dramatically rescue the destabilization introduced by W107A. (2) Epistasis with a secondary mutation is strongly influenced by the degree of destabilization introduced by the primary mutation. (3) Misfolding caused by mis-insertion tends to be aggravated by further mutations. The discussion of how protein folding energetics affects epistasis (Fig. 7) makes a lot of sense and lays out an interesting biophysical framework for the findings.

      Weaknesses:

      The major weakness comes from the potential limitations in the measurements of surface expression of severely misfolded mutants. This point is discussed quite fairly in the paper, in statements like "the W107A variant already exhibits marginal surface immunostaining" and many others. It seems that only about 5% of the W107A makes it to the plasma membrane compared to wild-type (Figures 2 and 3). This might be a low starting point from which to accurately measure the effects of secondary mutations.

      The reviewer raises an excellent point that we considered at length during the analysis of these data and the preparation of the manuscript. Though we remain confident in the integrity of these measurements and the corresponding analyses, we now realize this aspect of the data merits further discussion and documentation in our forthcoming revision, in which we will outline the following specific lines of reasoning.

      Still, the authors claim that measurements of W107A double mutants "still contain cellular subpopulations with surface immunostaining intensities that are well above or below that of the W107A single mutant, which suggests that this fluorescence signal is sensitive enough to detect subtle differences in the PME of these variants". I was not entirely convinced that this was true.

      We made this statement based on the simple observation that the distribution of surface immunostaining intensities across the population of recombinant cells expressing the library of W107A double mutants was consistently broader than that of recombinant cells expressing W107A GnRHR alone (see Author response image 1 for reference). Given that the recombinant cellular library represents a mix of cells expressing ~1600 individual variants that are each present at low abundance, the pronounced tails within this distribution presumably represent the composite staining of many small cellular subpopulations that express collections of variants that deviate from the expression of W107A to an extent that is significant enough to be visible on a log intensity plot.

      Author response image 1.

      Firstly, I think it would be important to test how much noise these measurements have and how much surface immunostaining the W107A mutant displays above the background of cells that do not express the protein at all.

      For reference, the average surface immunostaining intensity of HEK293T cells transiently expressing W107A GnRHR was 2.2-fold higher than that of the IRES-eGFP negative, untransfected cells within the same sample- the WT immunostaining intensity was 9.5-fold over background by comparison. Similarly, recombinant HEK293T cells expressing the W107A double mutant library had an average surface immunostaining intensity that was 2.6-fold over background across the two DMS trials. Thus, while the surface immunostaining of this variant is certainly diminished, we were still able to reliably detect W107A at the plasma membrane even under distinct expression regimes. We will include these and other signal-to-noise metrics for each experiment in a new table in the revised version of this manuscript.

      Beyond considerations related to intensity, we also previously noticed the relative intensity values for W107A double mutants exhibited considerable precision across our two biological replicates. If signal were too poor to detect changes in variant expression, we would have expected a plot of the intensity values across these two replicates to form a scatter. Instead, we found DMS intensity values for individual variants to be highly correlated from one replicate to the next (Pearson’s R= 0.97, see Author response image 2 for reference). This observation empirically demonstrates that this assay consistently differentiated variants that exhibit slightly enhanced immunostaining from those that have even lower immunostaining than W107A GnRHR.
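The replicate comparison described above can be illustrated with a minimal sketch. It assumes two vectors of per-variant intensity values, one per replicate; the function name `replicate_correlation` and the synthetic data are hypothetical and are not taken from the manuscript's analysis code. It contrasts the correlation expected when both replicates share an underlying signal with the near-zero correlation expected when the readout is pure noise.

```python
import numpy as np

def replicate_correlation(rep1, rep2):
    """Pearson correlation between two replicate intensity vectors."""
    rep1 = np.asarray(rep1, dtype=float)
    rep2 = np.asarray(rep2, dtype=float)
    # Keep only variants that have a finite value in both replicates
    keep = np.isfinite(rep1) & np.isfinite(rep2)
    return float(np.corrcoef(rep1[keep], rep2[keep])[0, 1])

# Synthetic illustration only (not the DMS data): ~1600 hypothetical variants
rng = np.random.default_rng(0)
true_signal = rng.normal(size=1600)
rep_a = true_signal + rng.normal(scale=0.2, size=1600)
rep_b = true_signal + rng.normal(scale=0.2, size=1600)

# Replicates that share a signal correlate strongly
r_signal = replicate_correlation(rep_a, rep_b)
# Replicates that are independent noise do not
r_noise = replicate_correlation(rng.normal(size=1600), rng.normal(size=1600))
```

In this toy setting the shared-signal replicates correlate strongly while the noise-only replicates do not, which is the contrast the argument above relies on.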

      Author response image 2.

      But more importantly, it is not clear if under this regimen surface expression still reports on stability/protein fitness. It is unknown if the W107A retains any function or folding at all. For example, it is possible that the low amount of surface protein represents misfolded receptors that escaped the ER quality control.

      While we believe that such questions are outside the scope of this work, we certainly agree that it is entirely possible that some of these variants bypass QC without achieving their native fold. This topic is quite interesting to us but is quite challenging to assess in the context of GPCRs, which have complex fitness landscapes that involve their propensity to distinguish between different ligands, engage specific components associated with divergent downstream signaling pathways, and navigate between endocytic recycling/ degradation pathways following activation. In light of the inherent complexity of GPCR function, we humbly suggest our choice of a relatively simple property of an otherwise complex protein may be viewed as a virtue rather than a shortcoming. Protein fitness is typically cast as the product of abundance and activity. Rather than measuring an oversimplified, composite fitness metric, we focused on one variable (plasma membrane expression) and its dominant effector (folding). We believe restraining the scope in this manner was key for the elucidation of clear mechanistic insights.

      The differential clustering of epistatic mutations (Fig. 6) provides some interesting insights as to the rules that dictate epistasis, but these too are dominated by the magnitude of destabilization caused by one of the mutations. In this case, the secondary mutations that had the most interesting epistasis were exceedingly destabilizing. With this in mind, it is hard to interpret the results that emerge regarding the epistatic interactions of W107A. Furthermore, the most significant positive epistasis is observed when W107A is combined with additional mutations that almost completely abolish surface expression. It is likely that either mutation destabilizes the protein beyond repair. Therefore, what we can learn from the fact that such mutations have positive epistasis is not clear to me. Based on this, I am not sure that another mutation that disrupts the tertiary folding more mildly would not yield different results. With that said, I believe that the results regarding the epistasis of V276T with other mutations are strong and very interesting on their own.

      We agree with the reviewer. In light of our results we believe it is virtually certain that the secondary mutations characterized herein would be likely to form distinct epistatic interactions with mutations that are only mildly destabilizing. Indeed, this insight reflects one of the key takeaway messages from this work- stability-mediated epistasis is difficult to generalize because it should depend on the extent to which each mutation changes the stability (ΔΔG) as well as initial stability of the WT/ reference sequence (ΔG, see Figure 7). Frankly, we are not so sure we would have pieced this together as clearly had we not had the fortune (or misfortune?) of including such a destructive mutation like W107A as a point of reference.
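The dependence of epistasis on both ΔΔG and the reference ΔG can be illustrated with a toy two-state model. This is a sketch under stated assumptions, not the framework of Figure 7 itself: it assumes expression tracks the two-state fraction folded, that ΔΔG values add, and that epistasis is scored as the deviation from additivity on the linear fraction-folded scale. All free energy values below are hypothetical.

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K

def fraction_folded(dG):
    """Two-state fraction folded; dG is the folding free energy (negative = stable)."""
    return 1.0 / (1.0 + math.exp(dG / RT))

def epistasis(dG_ref, ddG_background, ddG_secondary):
    """Double-mutant fraction folded minus the additive expectation from the single mutants."""
    f_ref = fraction_folded(dG_ref)
    f_bg = fraction_folded(dG_ref + ddG_background)
    f_sec = fraction_folded(dG_ref + ddG_secondary)
    # Assumption: free energy effects of the two mutations are additive
    f_double = fraction_folded(dG_ref + ddG_background + ddG_secondary)
    expected = f_bg + f_sec - f_ref
    return f_double - expected

# Hypothetical numbers: the same secondary mutation in two different backgrounds
mild = epistasis(dG_ref=-2.0, ddG_background=0.5, ddG_secondary=1.0)
severe = epistasis(dG_ref=-2.0, ddG_background=4.0, ddG_secondary=1.0)
```

With these illustrative values, the mildly destabilized background gives negative epistasis (the double mutant falls off the stability plateau), whereas the severely destabilized background gives positive epistasis (the protein is already near the floor, so the second mutation costs less than additivity predicts).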

      Additionally, the study draws general conclusions from the characterization of only two mutations, W107A and V276T. At this point, it is hard to know if other mutations that perturb insertion or tertiary folding would behave similarly. This should be emphasized in the text.

      We agree and will be sure to emphasize this point in the revised manuscript.

      Some statistical aspects of the study could be improved:

      1. It would be nice to see the level of reproducibility of the biological replicates in a plot, such as scatter or similar, with correlation values that give a sense of the noise level of the measurements. This should be done before filtering out the inconsistent data.

      We thank the reviewer for this suggestion and will include scatters for each genetic background like the one shown above in the supplement of the revised version of the manuscript.

      2. The statements "Variants bearing mutations within the C-terminal region (ICL3-TMD6-ECL3-TMD7) fare consistently worse in the V276T background relative to WT (Fig. 4 B & E)." and "In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form TMD4, ICL2, and ECL2 (Fig. 4 C & F)." are both hard to judge. Inspecting Figures 4B and C does not immediately show these trends, and importantly, a solid statistical test is missing here. In Figures 4E and F the locations of the different loops and TMs are not indicated on the structure, making these statements hard to judge.

      We apologize for this oversight and thank the reviewer for pointing this out. We will include additional statistical tests to reinforce these conclusions in the revised version of the manuscript.

      3. The following statement lacks a statistical test: "Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD)." Is this enrichment significant? Further in the same paragraph, the claim that "In contrast to the sparse epistasis that is generally observed between mutations within soluble proteins, these findings suggest a relatively large proportion of random mutations form epistatic interactions in the context of unstable mGnRHR variants" needs to be backed by relevant data and statistics, or at least a reference.

      We will include additional statistical tests for this in the revised manuscript and will ensure the language we use is consistent with the strength of the indicated statistical enrichment.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for organizing the reviews for our manuscript: “Behavioral entrainment to rhythmic auditory stimulation can be modulated by tACS depending on the electrical stimulation field properties,” and for the positive eLife assessment. We also thank the reviewers for their constructive comments. We have addressed every comment, which has helped to improve the transparency and readability of the manuscript. The main changes to the manuscript are summarized as follows:

      1. Surrogate distributions were created for each participant and session to estimate the effect of tACS-phase lag on behavioral entrainment to the sound that could have occurred by chance or because of our analysis method (R1). The actual tACS-amplitude effects were normalized relative to the surrogate distribution, and statistical analysis was performed on the normalized (z-score) values. This analysis did not change our main outcome: that tACS modulates behavioral entrainment to the sound depending on the phase lag between the auditory and the electrical signals. This analysis has now been incorporated into the Results section and in Fig. 3c-d.

      2. Two additional supplemental figures were created to include the single-participant data related to Fig. 3b and 3e (R2).

      3. Additional editing of the manuscript has been performed to improve the readability.

      Below, you will find a point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      We are grateful for the reviewer’s positive assessment of the potential impact of our study. The reviewer’s primary concerns were 1) the tACS lag effects reported in the manuscript might be noise because of the realignment procedure, and 2) no multiple comparisons correction was conducted in the model comparison procedure.

      In response to point 1), we have reanalyzed the data in exactly the manner prescribed by the reviewer. Our effects remain, and the new control analysis strengthens the manuscript. In response to point 2), the model selection procedure was not based on evaluating the statistical significance of any model or predictor. Instead, the single model that best fit the data was selected as the model with the lowest Akaike’s information criterion (AIC), and its superiority relative to the second-best model was corroborated using the likelihood ratio test. Only the best model was evaluated for significance and analyzed in terms of its predictors and interactions. This model is an omnibus test and does not require multiple comparison correction unless there are posthoc decompositions. For a similar approach, see Kasten et al. (2019).
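The selection logic described above (lowest AIC, then a likelihood ratio test against the runner-up) can be sketched as follows. This is an illustration under assumptions, not the manuscript's statistical code: ordinary least-squares models with Gaussian errors stand in for whatever model family was actually used, the two candidate models are nested and differ by a single parameter, and all function names and data are hypothetical.

```python
import math
import numpy as np

def gaussian_fit(X, y):
    """Fit OLS; return the maximised log-likelihood and parameter count (coefficients + variance)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = float(resid @ resid) / n  # maximum-likelihood estimate of the noise variance
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return loglik, X.shape[1] + 1

def aic(loglik, k):
    """Akaike's information criterion; lower is better."""
    return 2 * k - 2 * loglik

def lrt_pvalue_df1(loglik_small, loglik_big):
    """Likelihood ratio test for nested models differing by one parameter (chi-square, 1 df)."""
    stat = max(2 * (loglik_big - loglik_small), 0.0)
    # Survival function of a chi-square with 1 df expressed through erfc
    return math.erfc(math.sqrt(stat / 2))

# Synthetic data in which the extra predictor truly matters
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

X_small = np.ones((n, 1))                  # intercept only
X_big = np.column_stack([np.ones(n), x])   # intercept + predictor

ll_small, k_small = gaussian_fit(X_small, y)
ll_big, k_big = gaussian_fit(X_big, y)
aic_small, aic_big = aic(ll_small, k_small), aic(ll_big, k_big)
p_value = lrt_pvalue_df1(ll_small, ll_big)
```

Only one model is carried forward for inference here, which is why the selection step itself involves no family of significance tests to correct.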

      Below, we have responded to each comment specifically or referred to this general comment.

      Summary of what the authors were trying to achieve.

      This paper studies the possible effects of tACS on the detection of silence gaps in an FM-modulated noise stimulus. Both FM modulation of the sound and the tACS are at 2Hz, and the phase of the two is varied to determine possible interactions between the auditory and electric stimulation. Additionally, two different electrode montages are used to determine if variation in electric field distribution across the brain may be related to the effects of tACS on behavioral performance in individual subjects.

      Major strengths and weaknesses of the methods and results.

      The study appears to be well-powered to detect modulation of behavioral performance with N=42 subjects. There is a clear and reproducible modulation of behavioral effects with the phase of the FM sound modulation. The study was also well designed, combining fMRI, current flow modeling, montage optimization targeting, and behavioral analysis. A particular merit of this study is to have repeated the sessions for most subjects in order to test repeat-reliability, which is so often missing in human experiments. The results and methods are generally well-described and well-conceived. The portion of the analysis related to behavior alone is excellent. The analysis of the tACS results is also generally well described, candidly highlighting how variable results are across subjects and sessions. The figures are all of high quality and clear. One weakness of the experimental design is that no effort was made to control for sensation effects. tACS at 2Hz causes prominent skin sensations which could have interacted with auditory perception and thus, detection performance.

      The reviewer is right that we did not control for the sensation effects in our paradigm. We asked the participants to rate the strength of the perceived stimulation after each run. However, this information was used only to assess the safety and tolerability of the stimulation protocol. Nevertheless, we did not consider controlling for skin sensations necessary given the within-participant nature of our design (all participants experienced all six tACS–audio phase lag conditions, which were identical in their potential to cause physical sensations; the only difference between conditions was related to the timing of the auditory stimulus). That is, while the reviewer is right that 2-Hz tACS can indeed induce skin sensation under the electrodes, in this study, we report the effects that depend on the tACS-phase lag relative to the FM-stimulus. Note that the starting phase of the FM-stimulus was randomized across trials within each block (all six tACS audio lags were presented in each block of stimulation). We have no reason to expect the skin sensation to change with the tACS-audio lag from trial to trial, and therefore do not consider this to be a confound in our design. We have added some sentences with this information to the Discussion section:

      Pages 16-17, lines 497-504: “Note that we did not control for the skin sensation induced by 2-Hz tACS in this experiment. Participants rated the strength of the perceived stimulation after each run. However, this information was used only to assess the safety and tolerability of the stimulation protocol. It is in principle possible that skin sensation would depend on tACS phase itself. However, in this study, we report effects that depend on the relationship between tACS-phase and FM-stimulus phase, which changed from trial to trial as the starting phase of the FM-stimulus was randomized across trials. We have no reason to expect the skin sensation to change with the tACS-audio lag and therefore do not consider this to be a confound in our data.”

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      Unfortunately, the main effects described for tACS are encumbered by a lack of clarity in the analysis. It does appear that the tACS effects reported here could be an artifact of the analysis approach. Without further clarification, the main findings on the tACS effects may not be supported by the data.

      Likely impact of the work on the field, and the utility of the methods and data to the community.

      The central claim is that tACS modulates behavioral detection performance across the 0.5s cycle of stimulation. However, neither the phase nor the strength of this effect reproduces across subjects or sessions. Some of these individual variations may be explainable by individual current distribution. If these results hold, they could be of interest to investigators in the tACS field.

      The additional context you think would help readers interpret or understand the significance of the work.

      The following are more detailed comments on specific sections of the paper, including details on the concerns with the statistical analysis of the tACS effects.

      The introduction is well-balanced, discussing the promise and limitations of previous results with tACS. The objectives are well-defined.

      The analysis surrounding behavioral performance and its dependence on the phase of the FM modulation (Figure 3) is masterfully executed and explained. It appears that it reproduces previous studies and points to a very robust behavioral task that may be of use in other studies.

      Again, we would like to thank the reviewer for the positive assessment of the potential impact of our work and for the thoughtful comments regarding the methodology. For readability in our responses, we have numbered the comments below.

      1. There is a definition of tACS(+) vs tACS(-) based on the relative phase of tACS that may be problematic for the subsequent analysis of Figures 4 and 5. It seems that phase 0 is adjusted to each subject/session. For argument's sake, let's assume the curves in Fig. 3E are random fluctuations. Then aligning them to best-fitting cosine will trivially generate a FM-amplitude fluctuation with cosine shape as shown in Fig. 4a. Selecting the positive and negative phase of that will trivially be larger and smaller than a sham, respectively, as shown in Fig 4b. If this is correct, and the authors would like to keep this way of showing results, then one would need to demonstrate that this difference is larger than expected by chance. Perhaps one could randomize the 6 phase bins in each subject/session and execute the same process (fit a cosine to curves 3e, realign as in 4a, and summarize as in 4b). That will give a distribution under the Null, which may be used to determine if the contrast currently shown in 4b is indeed statistically significant.

      We agree with the reviewer’s concerns regarding the possible bias induced by the realignment procedure used to estimate tACS effects. Certainly, when adjusting phase 0 to each participant/session’s best tACS phase (peak in the fitting cosine), selecting the positive phase of the realigned data will be trivially larger than sham (Fig. 4a). This is why the realigned zero-phase and opposite phase (trough) bins were excluded from the analysis in Fig. 4b. Therefore, tACS(+) vs. tACS(-) do not represent behavioral entrainment at the peak positive and negative tACS lags, as both bins were already removed from the analysis. tACS(+) and tACS(-) are the averages of two adjacent bins from the positive and negative tACS lags, respectively (Zoefel et al., 2019). Such an analysis relies on the idea that if the effect of tACS is sinusoidal, presenting the auditory stimulus at the positive half cycle should be different than when the auditory stimulus lags the electrical signal by the other half. If the effect of tACS was just random noise fluctuations, there is no reason to assume that such fluctuations would be sinusoidal; therefore, any bias in estimating the effect of tACS should be removed when excluding the peak to which the individual data were realigned. Similar analytical procedures have been used previously in the literature (Riecke et al., 2015; Riecke et al., 2018). We have modified the colors in Fig. 4a and 4c (former 4b) and added a new panel to the figure (new 4b) to make the realignment procedure, including the exclusion of the realigned peak and trough data, more visually obvious.

      Moreover, we very much like the reviewer’s suggestion to normalize the magnitude of the tACS effect using a permutation strategy. We performed additional analyses to normalize our tACS effect in Fig. 4c by the probability of obtaining the effect by chance. For each subject and session, tACS-phase lags were randomized across trials for a total of 1000 iterations. For each iteration, the gaps were binned by the FM-stimulus phase and tACS-lag. For each tACS-lag, the amplitude of behavioral entrainment to the FM-stimulus was estimated (FM-amplitude), as shown in Fig. 3. Similar to the original data, a second cosine fit was estimated for the FM-amplitude by tACS-lag. Optimal tACS-phase was estimated from the cosine fit and FM-amplitude values were realigned. Again, the realigned phase 0 and trough were removed from the analysis, and their adjacent bins were averaged to obtain the FM-amplitude at tACS(+) and tACS(−), as shown in Fig. 4c. We then computed the difference between 1) tACS(+) and sham, 2) tACS(-) and sham, and 3) tACS(+) and tACS (-), for the original data and the permuted datasets. This procedure was performed for each participant and session to estimate the size of the tACS effect for the original and surrogate data. The original tACS effects were transformed to z-scores using surrogate distributions, providing us with an estimate of the size of the real effect relative to chance. We then computed one-sample t-tests to compare whether the effects of tACS were statistically significant. In fact, this analysis showed that the tACS effects were still statistically significant. This analysis has been added to the Results and Methods sections and is included in Figure 4d.

      Page 10, lines 282-297: “In order to further investigate whether the observed tACS effect was significantly larger than chance and not an artifact of our analysis procedure (33), we created 1000 surrogate datasets per participant and session by permuting the tACS lag designation across trials. The same binning procedure, realignment, and cosine fits were applied to each surrogate dataset as for the original data. This yielded a surrogate distribution of tACS(+) and tACS(-) values for each participant and session. These values were averaged across sessions since the original analysis did not show a main effect of session. We then computed the difference between tACS(+) and sham, tACS(-) and sham, and tACS(+) and tACS(-), separately for the original and surrogate datasets. The obtained differences for the original data were then z-scored using the mean and standard deviation of the surrogate distribution. Note that in this case we used data of all 42 participants who had at least one valid session (37 participants with both sessions). Three one-sample t-tests were conducted to investigate whether the size of the tACS effect obtained in the original data was significantly larger than that obtained by chance (Fig. 4d). This analysis showed that all z-scores were significantly higher than zero (all t(41) > 2.36, p < 0.05, all p-values corrected for multiple comparisons using the Holm-Bonferroni method).”
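
      The Holm-Bonferroni step-down correction named in the quoted passage can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `holm_bonferroni` and the example p-values are hypothetical, and only the standard step-down rule is assumed.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return Holm-adjusted p-values and reject decisions, in input order.

    Hypothetical helper for illustration; implements the standard
    step-down rule: the k-th smallest p-value (k = 0, 1, ...) is
    multiplied by (m - k), and adjusted values are made monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value can never be smaller
        # than the adjusted value of a smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Made-up p-values for three contrasts (not taken from the manuscript).
adjusted, reject = holm_bonferroni([0.01, 0.04, 0.03])
```

      With three tests, the smallest p-value is multiplied by 3, the next by 2, and the largest by 1, so the correction is never more conservative than a plain Bonferroni correction.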

      Page 31, lines 962-972: “To further control that the observed tACS effects were not an artifact of the analysis procedure, the differences between the tACS conditions (sham, tACS(+), and tACS(-)) were normalized using a permutation approach. For each participant and session, 1000 surrogate datasets were created by permuting the tACS lag designation across trials. The same binning procedure, realignment, and cosine fits were applied to each surrogate dataset as for the original data (see above). FM-amplitude at sham, tACS(+) and tACS(-) were averaged across sessions since the original analysis did not show a main effect of session. Differences between tACS conditions were estimated for the original and surrogate datasets and the resulting values from the original data were z-scored using the mean and standard deviation from the surrogate distributions. One-sample t-tests were conducted to test the statistical significance of the z-scores. P-values were corrected for multiple comparisons using the Holm-Bonferroni method.”
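
      The general logic of this control analysis, z-scoring an observed effect against surrogate datasets obtained by permuting the tACS-lag labels across trials, can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, the per-trial `scores` stand in for the behavioral measure, and the statistic is the cosine-fit amplitude across six lag bins, used here as a stand-in for the tACS(+)/tACS(-) contrasts described in the manuscript.

```python
import numpy as np


def cosine_amplitude(bin_means):
    """Amplitude of a least-squares cosine fit across equally spaced lag bins."""
    n = len(bin_means)
    phases = 2 * np.pi * np.arange(n) / n
    design = np.column_stack([np.cos(phases), np.sin(phases), np.ones(n)])
    a, b, _ = np.linalg.lstsq(design, bin_means, rcond=None)[0]
    return float(np.hypot(a, b))


def bin_means_by_lag(scores, lags, n_lags=6):
    """Mean score per tACS-lag bin (lags are integer bin labels 0..n_lags-1)."""
    return np.array([scores[lags == k].mean() for k in range(n_lags)])


def surrogate_zscore(scores, lags, n_perm=1000, n_lags=6, seed=0):
    """z-score the observed statistic against a label-permutation null."""
    rng = np.random.default_rng(seed)
    observed = cosine_amplitude(bin_means_by_lag(scores, lags, n_lags))
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the lag labels breaks any true lag dependence while
        # keeping the number of trials per bin unchanged.
        shuffled = rng.permutation(lags)
        null[i] = cosine_amplitude(bin_means_by_lag(scores, shuffled, n_lags))
    z = (observed - null.mean()) / null.std(ddof=1)
    return z, observed, null
```

      Because the same fitting step is applied to the original and the shuffled data, any bias introduced by the fit itself is present in the null distribution as well, which is what makes the resulting z-score interpretable relative to chance.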

      2. Results of Fig 5a and 5b seem consistent with the concern raised above about the results of Fig. 4. It appears we are looking at an artifact of the realignment procedure, on otherwise random noise. In fact, the drop in "tACS-amplitude" in Fig. 5c is entirely consistent with a random noise effect.

      Please see our response to the comment above.

      3. "To better understand what factors might be influencing inter-session variability in tACS effects, we estimated multiple linear models ..." This post hoc analysis does not seem to have been corrected for multiple comparisons of these "multiple linear models". It is not clear how many different things were tried. The fact that one of them has a p-value of 0.007 for some factors with amplitude-difference, but these factors did not play a role in the amplitude-phase, suggests again that we are not looking at a lawful behavior in these data.

      We suspect that the reviewer did not have access to the supplemental materials where all tables (relevant here is Table S3) are provided. This post hoc analysis was performed as an exploratory analysis to better understand the factors that could influence the inter-session variability of tACS effects. In Table S3, we provide the formula for each of the seven models tested, including their Akaike information criteria corrected for small samples (AICc), R2, F, and p-values. As described in the methods section, the winning model was selected as the model with the smallest AICc. A similar procedure has been previously used in the literature (Kasten et al., 2019). Moreover, to ensure that our winning model was better at explaining the data than the second-best unrestricted model, we used the likelihood ratio test. After choosing the winning model and before reporting the significance of the predictors, we examined the significance of the model in and of itself, taking into account its R2 as well as F- and p-values relative to a constant model. Thus, only one model is being evaluated in terms of statistical significance. Therefore, to our understanding, there are no multiple comparisons to correct for. We added the information regarding the selection procedure, hoping this will make the analysis clearer.

      See page 12, lines 354-360: “This model was selected because it had the smallest Akaike’s information criterion (corrected for small samples), AICc. Moreover, the likelihood ratio test showed no evidence for choosing the more complex unrestricted model (stat = 2.411, p = 0.121). Following the same selection criteria, the winning model predicting inter-session variability in tACS-phase, included only the factor gender (Table S4). However, this model was not significant in and of itself when compared to a constant model (F-statistic vs. constant model: 3.05, p = 0.09, R2 = 0.082).”
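
      The selection criterion described here, the Akaike information criterion corrected for small samples, can be sketched as below. This is an illustration and not the authors' code: it assumes the Gaussian least-squares form of AIC (correct up to an additive constant that is identical for models fitted to the same data), and the function name and the candidate models in the example are hypothetical.

```python
import math


def aicc_least_squares(rss, n, k):
    """Small-sample corrected AIC for a least-squares fit.

    rss: residual sum of squares; n: number of observations;
    k: number of estimated parameters (conventions differ on whether the
    residual variance is counted, so use one convention for all models).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Made-up candidate models: name -> (rss, k). Not values from Table S3.
candidates = {"model_a": (12.0, 3), "model_b": (11.5, 5), "model_c": (14.0, 2)}
n_obs = 34
scores = {name: aicc_least_squares(rss, n_obs, k) for name, (rss, k) in candidates.items()}
winner = min(scores, key=scores.get)
```

      The correction term grows quickly as k approaches n, so with few observations a more complex model has to reduce the residual error substantially to be preferred.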

      4. "So far, our results demonstrate that FM-stimulus driven behavioral modulation of gap detection (FM-amplitude) was significantly affected by the phase lag between the FM-stimulus and the tACS signal (Audio-tACS lag) ..." There appears to be nothing in the preceding section (Figures 4 and 5) to show that the modulation seen in 3e is not just noise. Maybe something can be said about 3b on an individual subject/session basis that makes these results statistically significant on their own. Maybe these modulations are strong and statistically significant, but just not reproducible across subjects and sessions?

      Please see our response to the first comment regarding the validity of our analysis for demonstrating the significant effect of tACS lag on behavioral entrainment to the FM-stimulus (FM-amplitude), and the new control analysis. After performing the permutation tests to make sure the reported effects are not noise, our statistical analysis still shows that tACS-lag significantly modulates behavioral entrainment to the sound (FM-amplitude). Thus, the reviewer is right to say “these modulations are strong and statistically significant, just not reproducible across subjects and sessions”. In this regard, we consider our evaluation of the session-to-session reliability of tACS effects to be of high relevance for the field, as this is often overlooked in the literature.

      5. "Inter-individual variability in the simulated E-field predicts tACS effects" Authors here are attempting to predict a property of the subjects that was just shown to not be a reliable property of the subject. Authors are picking 9 possible features for this, testing 33 possible models with N=34 data points. With these circumstances, it is not hard to find something that correlates by chance. And some of the models tested had interaction terms, possibly further increasing the number of comparisons. The results reported in this section do not seem to be robust, unless all this was corrected for multiple comparisons, and it was not made clear?

      We thank the reviewer very much for this comment. While the reviewer is right that in these models, we are trying to predict an individual property (tACS-amplitude) that was not test–retest reliable across sessions, we still consider this to be a valid analysis. Here, we take the tACS-amplitude averaged across sessions, trying to predict the probability of a participant to be significantly modulated by tACS, in general, regardless of day-to-day variability. Regarding the number of multiple regression models, how we chose the winning model and the appropriateness/need of multiple-comparisons correction in this case, please see our explanation under “Reviewer 1 (Public review)” and our response to comment 3.

      6. "Can we reduce inter-individual variability in tACS effects ..." This section seems even more speculative and with mixed results.

      We agree with the reviewer that this section is somewhat speculative. We are trying to plant some seeds for future research that can help move the field forward in the quest for better stimulation protocols. We have added a sentence at the end of the section to state explicitly that more evidence is needed in this regard.

      Page 14, lines 428-429: “At this stage, more evidence is needed to prove the superiority of individually optimized tACS montages for reducing inter-individual variability in tACS effects.”

      Given the concerns with the statistical analysis above, there are concerns about the following statements in the summary of the Discussion:

      7. "2) does modulate the amplitude of the FM-stimulus induced behavioral modulation (FM-amplitude)"

      This seems to be based on Figure 4, which leaves one with significant concerns.

      Please see our response to comment 1. We hope the reviewer is satisfied with our additional analysis, which was performed to make sure that the tACS effect reported here is not noise.

      8. "4) individual variability in tACS effect size was partially explained by two interactions: between the normal component of the E-field and the field focality, and between the normal component of the E-field and the distance between the peak of the electric field and the functional target ROIs."

      The complexity of this statement alone may be a good indication that this could be the result of false discovery due to multiple comparisons.

      We respectfully disagree with the reviewer’s opinion that this is a complex statement. We think that these interaction effects are very intuitive as we explain in the results and discussion sections. These significant interactions show that for tACS to be effective, it matters that current gets to the right place and not to irrelevant brain regions. We believe this finding is of great importance for the field, since most studies on the topic still focus mostly on predicting tACS effects from the absolute field strength and neglect other properties of the electric field.

      For the same reasons as stated above, the following statements in the Abstract do not appear to have adequate support in the data:

      "We observed that tACS modulated the strength of behavioral entrainment to the FM sound in a phase-lag specific manner. ... Inter-individual variability of tACS effects was best explained by the strength of the inward electric field, depending on the field focality and proximity to the target brain region. Spatially optimizing the electrode montage reduced inter-individual variability compared to a standard montage group."

      Please see our responses to all previous comments.

      In particular, the evidence in support of the last sentence is unclear. The only finding that seems related is that "the variance test was significant only for tACS(-) in session 2". This is a very narrow result to be able to make such a general statement in the Abstract. But perhaps this can be made clearer.

      We changed this sentence in the abstract to:

      Page 2, lines 41-43: “Although additional evidence is necessary, our results also provided suggestive insights that spatially optimizing the electrode montage could be a promising tool to reduce inter-individual variability of tACS effects.”

      Reviewer #3 (Public Review):

      In "Behavioral entrainment to rhythmic auditory stimulation can be modulated by tACS depending on the electrical stimulation field properties" Cabral-Calderin and collaborators aimed to document 1) the possible advantages of personalized tACS montage over standard montage on modulating behavior; 2) the inter-individual and inter-session reliability of tACS effects on behavioral entrainment and, 3) the importance of the induced electric field properties on the inter-individual variability of tACS.

      To do so, in two different sessions, they investigated how the detection of silent gaps occurring at random phases of a 2 Hz amplitude-modulated sound could be enhanced with 2 Hz tACS, delivered at different phase lags. In addition, they evaluated the advantage of using spatially optimized tACS montages (information-based procedure - using anatomy and functional MRI to define the target ROI and simulation to compare to a standard montage applied to all participants) on behavioral entrainment. They first show that the optimized and the standard montages have similar spatial overlap to the target ROI. While the optimized montage induced a more focal field compared to the standard montage, the latter induced the strongest electric field. Second, they show that tACS does not modify the optimal phase for gap detection (phase of the frequency-modulated sound) but modulates the strength of behavioral entrainment to the frequency-modulated sound in a phase-lag specific manner. However, and surprisingly, they report that the optimal tACS lag, and the magnitude of the phasic tACS effect were highly variable across sessions. Finally, they report that the inter-individual variability of tACS effects can be explained by the strength of the inward electric field as a function of the field focality and on how well it reached the target ROI.

      The article is interesting and well-written, and the methods and approaches are state-of-the-art.

      Strengths:

      • The information-based approach used by the authors is very strong, notably with the definition of subject-specific targets using a fMRI localizer and the simulation of electric field strength using 3 different tACS montages (only 2 montages used for the behavioral experiment).

      • The inter-session and inter-individual variability are well documented and discussed. This article will probably guide future studies in the field.

      Weaknesses:

      • The addition of simultaneous EEG recording would have been beneficial to understand the relationship between tACS entrainment and the entrainment to rhythmic auditory stimulation.

      We are grateful for the Reviewer’s positive assessment of our work and for the reviewer’s recommendations. We agree with the reviewer that adding simultaneous EEG or MEG to our design would have been beneficial to understand tACS effects. However, as the reviewer may know, such a combination also poses additional challenges due to the strong artifacts induced by tACS in the EEG signals, which occur at the frequency of interest and are several orders of magnitude larger than the signal of interest. Unfortunately, an adequate setup for simultaneous tACS-EEG was not available at the time of the study. Nevertheless, since we are using a paradigm that we have repeatedly studied in the past and have shown to entrain neural activity and modulate behavior rhythmically, we are confident our results are of interest on their own. For readability of our answers, we have numbered the comments below.

      1. It would have been interesting to develop the fact that tACS did not "overwrite" neural entrainment to the auditory stimulus. The authors try to explain this effect by mentioning that "tACS is most effective at modulating oscillatory activity at the intended frequency when its power is not too high" or "tACS imposes its own rhythm on spiking activity when tACS strength is stronger than the endogenous oscillations but it decreases rhythmic spiking when tACS strength is weaker than the endogenous oscillations". However, it is relevant to note that the oscillations in their study are by definition "not endogenous" and one can interpret their results as a clear superiority of sensory entrainment over tACS entrainment. This potential superiority should be discussed, documented, and developed.

      We thank the reviewer very much for this remark. We completely agree that our results could be interpreted as a clear superiority of sensory entrainment over tACS entrainment. We have now incorporated this possibility in the discussion.

      Page 16, line 472-478: “Alternatively, our results could simply be interpreted as a clear superiority of the auditory stimulus for entrainment. In other words, sensory entrainment might just be stronger than tACS entrainment in this case where the stimulus rhythm was strong and salient. It would be interesting to further test whether this superiority of sensory entrainment applies to all sensory modalities or if there is a particular advantage for auditory stimuli when they compete with electrical stimulation. However, answering this question was beyond the scope of our study and needs further investigations with more appropriate paradigms.”

      2. The authors propose that "by applying tACS at the right lag relative to auditory rhythms, we can aid how the brain synchronizes to the sounds and in turn modulate behavior." This should be developed as the authors showed that the tACS lags are highly variable across sessions. According to their results, the optimal lag will vary for each tACS session and subtle changes in the montage could affect the effects.

      We thank the reviewer for this remark. We believe that the right procedure in this case would be to use closed-loop protocols in which the optimal tACS-lag is estimated online, as we discuss in the summary and future directions sub-section. We tried to make this clearer in the same sentence that the reviewer mentioned.

      Page 17, line 506-508: “Since optimal tACS phase was variable across participants and sessions, this approach would require closed-loop protocols where the optimal tACS lag is estimated online (see next section).”

      3. In a related vein, it would be very useful to show the data presented in Figure 3 (panels b,d,e) for all participants to allow the reader to evaluate the quality of the data (this can be added as a supplementary figure).

      Thank you very much for the suggestion. We have added two new supplemental figures (Fig S1 and S2) to show individual data for Fig. 3b and 3e. Note that Fig. 3d already shows the individual data as each circle represents optimal FM-phase for a single participant.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      "was optimized in SimNIBS to focus the electric field as precisely as possible at the target ROI" It appears that some form of constrained optimization was used. It would be good to clarify which method was used, including a reference.

      Indeed, SimNIBS implements a constrained optimization approach based on pre-calculated lead fields. We have added the corresponding reference. All parameters used for the optimization are reported in the methods (see sub-section Electric field simulations and montage optimization). Regarding further specifics, the readers are invited to check the MATLAB code that was used for the optimization which is made available at: https://osf.io/3yutb

      "Thus, each montage has its pros and cons, and the choice of montage will depend on which of these dependent measures is prioritized." Well put. It would be interesting to know if authors considered optimizing for intensity on target. That would give the strongest predicted intensity on target, which seems like an important desideratum. Individualizing for something focal, as expected, did not give the strongest intensity. In fact, the method struggled to achieve the desired intensity of 0.1V/m in some subjects. It would be interesting to have a discussion about why this particular optimization method was selected.

      The specific optimization method used in this study was somewhat arbitrary, as there is no standard in the field. It was validated in prior studies, where it was also demonstrated that it performs favorably compared to alternative methods (Saturnino et al., 2019; Saturnino et al., 2021). The underlying physics of the head volume conductor generally limits the maximally achievable focality, and requires a tradeoff between focality and the desired intensity in the target. This tradeoff depends on the maximal amount of current that can be injected into the electrodes due to safety limits (4 mA in total in our case). Further constraints of the optimization in our application were the simultaneous targeting of two areas, and achieving field directions in the targets roughly parallel to those of auditory dipoles. Given the combination of these constraints, as the reviewer noticed, we could not even achieve the desired intensity of 0.1 V/m in some subjects. As we wanted to stimulate both auditory cortices equally, our priority was to have the E-fields as similar as possible between hemispheres. In future studies targeting only one region, it would be easier to optimize for target intensity (assuming the same maximal total current injection). Alternatively, relaxing the constraint on direction and optimizing only for field intensity would help to increase the field intensities in the targets, but would lead to differing field directions in the two targets. As an example, see Rev. Fig.1 below. We extensively discuss some of these points in the discussion section: “Are individually optimized tACS montage better?” (Pages 21-22).

      Additionally, we added a few sentences in the Results and Methods giving more details about the optimization approach.

      Page 5, lines 115-116: “Using individual finite element method (FEM) head models (see Methods) and the lead field-based constrained optimization approach implemented in SimNIBS (31)”

      Page 27, lines 819-822: “The optimization pipeline employed the approach described in (31) and was performed in two steps. First, a lead field matrix was created per individual using the 10-10 EEG virtual cap provided in SimNIBS and performing electric field simulations based on the default tissue conductivities listed below.”

      Author response image 1.

      E-field distributions for one example participant. Brain maps show the results from the same optimization procedure described in the main manuscript but with no constraint for the current direction (top) or constraining the current direction (bottom). Note that the desired intensity of 0.1 V/m can be achieved when the current direction is not constrained.

      The terminology of "high-definition HD" used here is unconventional and may confuse some readers. The paper cited for ring electrodes (18) does not refer to it as HD. A quick search for high-definition HD yields mostly papers using many small electrodes, not ring electrodes. They look more like what was called "individualized". More conventional would be to call the first configuration a "ring-electrode", and the "individualized" configuration might be called "individualized HD".

      We thank the reviewer for this remark. We changed the label of the high-definition montage to ring-electrode. Regarding the individualized configuration, we prefer not to use individualized HD as it has the same number of electrodes as the standard montage.

      "So far, we have evaluated whether tACS at different phase lags interferes with stimulus-brain synchrony and modulates behavioral signatures of entrainment" The paper does not present any data on stimulus-brain synchrony. There is only an analysis of behavior and stimulus/tACS phase.

      We agree with the reviewer. To be more careful with this statement, we have modified the sentence to say:

      Page 10, lines 303-304: “So far, we have evaluated whether tACS at different phase lags modulates behavioral signatures of entrainment: FM-amplitude and FM-phase.”

      "However, the strength of the tACS effect was variable across participants." and across sessions, and the phase also was variable across subjects and sessions.

      "tACS-amplitude estimates were averaged across sessions since the session did not significantly affect FM-amplitude (Fig. 5a)." More importantly, the authors show that "tACS-amplitude" was not reproducible across sessions.

      Unfortunately, we did not understand what the reviewer is suggesting here, and would have to ask the reviewer in this case to provide us with more information.

      References

      Kasten FH, Duecker K, Maack MC, Meiser A, Herrmann CS (2019) Integrating electric field modeling and neuroimaging to explain inter-individual variability of tACS effects. Nat Commun 10:5427.

      Riecke L, Sack AT, Schroeder CE (2015) Endogenous Delta/Theta Sound-Brain Phase Entrainment Accelerates the Buildup of Auditory Streaming. Curr Biol 25:3196-3201.

      Riecke L, Formisano E, Sorger B, Baskent D, Gaudrain E (2018) Neural Entrainment to Speech Modulates Speech Intelligibility. Curr Biol 28:161-169 e165.

      Saturnino GB, Madsen KH, Thielscher A (2021) Optimizing the electric field strength in multiple targets for multichannel transcranial electric stimulation. J Neural Eng 18.

      Saturnino GB, Siebner HR, Thielscher A, Madsen KH (2019) Accessibility of cortical regions to focal TES: Dependence on spatial position, safety, and practical constraints. Neuroimage 203:116183.

      Zoefel B, Davis MH, Valente G, Riecke L (2019) How to test for phasic modulation of neural and behavioural responses. Neuroimage 202:116175.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public review):

      Weaknesses: The interpretation is somewhat model-dependent, and it is unclear if the interpretation is unique. For example, it is unclear if the heterogeneous release probability among sites, silent sites, can explain the results. N estimates out of variance-mean analysis for example may be limited by the availability of postsynaptic receptors.

      To address this criticism, we have added a paragraph in the Discussion outlining the main assumptions underlying our work and how possible deviations from these assumptions may have affected our conclusions. This new paragraph is titled 'Assumptions behind our analysis, and possible limitations of our conclusions'.

      Reviewer 1, Recommendations to Authors:

      Without molecular evidence or anatomical evidence, the model and conclusions may remain as a postulate at this stage. This can be discussed carefully. Also, the study looks a bit narrow regarding the scope, only dealing with RS-DS model vs TS-LS model. Maybe, the authors pick up a bit more qualitative findings that directly support RS-DS model.

      To address these issues, another paragraph has been added to the Discussion titled 'Functional evidence in favor of the RS/DS model at PF-MLI synapses, and remaining uncertainties on the molecular composition and morphological arrangement of docking sites'.

      Minor: Fukaya et al. did not study cerebellar mossy fiber synapses.

      We apologize for this error, which has now been rectified.

      Reviewer 2 (Public review):

      It remains unclear how generalizable the findings are to other types of synapses.

      We agree with the Reviewer: this is a limitation of our study. In the Discussion we have a paragraph titled 'Maximum RRP size for other synaptic types' where we discuss this point. As we say in this paragraph, central synapses are clearly diverse, and the level of applicability of our results across preparations will depend on our ability to extend SV counting to various types of brain synapses. For the moment SV counting has been applied to only two types of synapses: PF-MLI synapses and hMF-IN synapses. We are encouraged by the fact that the simple synapse study by Tanaka et al. (2021), carried out at hMF-IN synapses, offers another example where the ratio between RRP size and N is larger than 1.

      Recommendations to Authors,

      Minor comments:

      The manuscript is at times difficult to read or reads like a review. The introduction could be shortened to concisely outline the motivation and premises for the study. The results and methods sections should not contain excessive interpretation and discussion. Although very informative, it distracts from the simple principal message.

      To address these criticisms, we have shortened the Introduction and parts of the Results section. These changes have resulted in a presentation of Results that is shorter and more focused on data and simulations than in the previous version. Nevertheless, readers need to be informed of ongoing research on docking sites and the principles of sequential models to understand the usefulness of our work. For this reason, we have maintained a theoretical section at the beginning of Results.

      The rationale for the choice of synapse and experimental conditions remains unclear until the discussion. This needs to be clearly addressed at the beginning, in the introduction, or in the results. In particular, the extracellular calcium concentration and the addition of 4-AP to the recording solution should be addressed in the results.

      The reason to choose the PF-MLI synapse is now indicated at the end of the Introduction. The rationale underlying our choice of experimental conditions, including the extracellular calcium concentration and the addition of 4-AP, is now briefly explained at the beginning of the second section of Results (titled 'Maximizing RRP size and its release during AP trains'), and more extensively in the Methods section (as in the previous version of the manuscript).

      Potential confounds of the approach should be discussed (e.g. could a broadened AP in 4-AP alter the synchronicity of release, i.e. cause desynchronization of release, especially during trains?). That could be complemented with information on the EPSC kinetics (rise, decay) under different experimental conditions, as well as during train stimulation. How could presynaptic calcium concentration and time course in 4-AP impact the conclusions?

      To study the effects of 4-AP on AP broadening we have performed a new analysis of EPSC latencies in control and in 4-AP. In both cases the first latencies were independent of i. In 4-AP, first latencies displayed a small right shift of 0.2 ms (see additional figure below). This indicates that 4-AP does broaden the AP waveform, but that the extent of this broadening is limited. This new information has been added in the Methods of the revised manuscript.

      As suspected by the Reviewer, the latency distribution changes as a function of i and in the presence of 4-AP. Consistent with earlier findings (Miki et al., 2018), the proportion of 2-step release (with longer latencies) augments as a function of i both in control and in 4-AP. We also find that the value of the fast time constant of the latency distribution, τf, is larger in 4-AP than in control. This last result probably indicates a longer presynaptic calcium entry in 4-AP.

      In the revised version, we describe these results in the Methods section, in a new paragraph titled 'Changes in latency distributions as a function of i and of experimental conditions'.

      While the latency distributions change as a function of i and as a function of experimental conditions, this does not affect our conclusions, because these conclusions are based on the summed number of release events after each AP (or in other words, on the integral of the latency distributions).

      The kinetics of mEPSCs (risetime and decay time) are unchanged by 4-AP or by PTP. Consequently, in a given experiment, we used the same template to perform our deconvolution analysis for all conditions that were examined (starting with 3 mM Cao up to 200 Hz). This information has now been added in Methods.

      Following an AP stimulation, the amount of calcium entry in the presence of 4-AP is presumably much larger than in control. TEA, a weaker K channel blocker than 4-AP at PF-MLI synapses, elicits a marked increase in calcium entry (Malagon et al., 2020). This suggests an even larger increase with 4-AP, even though this has not been directly confirmed in the present work. The enhanced calcium entry translates into an increase in the parameters pr, r and s of our model. The important thing for our study is to increase pr and r as much as possible to promote the emptying of the RRP during trains. Knowing the exact amount of calcium entry and its relation to the increase in pr and r is not essential for this purpose. Likewise, whether r (and/or s) increase as a function of i is of little practical importance since much of the RRP is emptied already after the second stimulation, at least in the most extreme case (200 Hz stimulation).

      The applicability of this model to other synapses needs to be addressed more thoroughly. This synapse, under physiological conditions, has a very low Pr, and the experimental conditions have to be adjusted dramatically to achieve a high-Pr. How applicable are the conclusions to high-Pr synapses and/or synapses that operate in a multivesicular release regime? Although that might be difficult to test experimentally it should be addressed in the discussion.

      The applicability issue to other synapses has been addressed above, in response to the public comments of the same Reviewer.

      As the Reviewer points out, the PF-MLI synapse has a small P value under physiological conditions. One can speculate that synapses that exhibit a higher P value may have a higher docking site occupancy than PF-MLI synapses. This feature would increase their chance of having a ratio of RRP size over N larger than 1, as it occurs in PF-MLI synapses in high docking occupancy conditions. A sentence making this point has been added to the paragraph titled 'Maximum RRP size for other synaptic types' in the revised manuscript.

      Author response image 1.

      Latency histograms for s1 in control and in the presence of 4-AP. After normalization, the averaged latency histogram in 4-AP displays an additional delay of 0.2 ms, and a slowing of the time constant τf from 0.47 ms to 0.70 ms.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      “The exact levels of inhibition, excitation, and neuromodulatory inputs to neural networks are unknown. Therefore, the work is based on fine-tuned measures that are indirectly based on experimental results. However, obtaining such physiological information is challenging and currently impossible. From a computational perspective it is a challenge that in theory can be solved. Thus, although we have no ground-truth evidence, this framework can provide compelling evidence for all hypothesis testing research and potentially solve this physiological problem with the use of computers.”

      Response: We agree with the reviewer. This work was intended to determine the feasibility of reverse engineering motor unit firing patterns, using neuron models with a high degree of realism. Given that the results support this feasibility, our model and technique will serve both to construct new hypotheses and to test them.

      • Common input structure lines 115

      I agree with the following concepts, but I would specify that there is not only one dominant common input. It has been shown that there are multiple common inputs to the same motor nuclei (e.g., the two inputs are orthogonal and are shared with a subset of the active motoneurons) particularly for agonist motoneuron pools of synergistic muscles. On the hand muscles the authors are correct that there is only one dominant common input. Moreover, there is also some animal work suggesting that common input is just an epiphenomenon. This is completely in contradiction to what we observe in-vivo in the firing patterns of motor units, but perhaps worth mentioning and discussing.

      Response: Thanks for emphasizing this point. We have cited a recent reference discussing the important issue of common drive and the possibility of more than one source. Our simulations assume the net form of the excitatory input to all motoneurons in the pool is the same, except for noise. This net form (which produces the linear CST output in each case) essentially represents the sum of all inputs, both descending and sensory. Our results show the same overall pattern as human data, i.e. that all motor unit firing patterns have similar trajectories (again allowing for the impact of noise). Future studies will consider separating excitatory inputs into different sources.

      It is interesting that the authors mention suprathreshold rate modulation. Could the authors just discuss more on how the model would respond to a simulated suprathreshold current for all simulated motoneurons (i.e., like the ones generated during a suprathreshold-injected current or voluntary maximal feedforward movement?)

      Response: Thank you for this point. Our use of the term “suprathreshold” was not applied correctly. We meant “suprathreshold” to refer to the amount of input above the recruitment threshold. We have decided to remove this term, so the sentence now reads “…so less is available for rate modulation…”.

      194 a full point is missing.

      Response: We addressed the error.

      204-231 and 232-259, these two paragraphs have been copied twice.

      Response: We addressed the error.

      Line 475 typo

      Response: We addressed the error.

      591 It would be interesting to add the time it takes a standard computer with known specs and a super computer to run over one batch of simulation (i.e., how long one of the 6,300,000 simulations takes).

      Response: Each simulation took about 20 minutes of real time. Assuming a standard computer with 16 processor cores using a similar microarchitecture as Bebop (Intel Broadwell architecture), the standard computer could run 16 simulations at a time (one simulation assigned per core). This would take the standard computer about 15 years to complete all 6.3M simulations.
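      The 15-year figure can be checked with a short calculation. The sketch below is only a back-of-the-envelope estimate: it uses the numbers quoted in the response and assumes perfect parallel scaling with no scheduling or I/O overhead, which a real machine would not achieve. The function name is illustrative.

```python
# Back-of-the-envelope wall-clock estimate using the figures quoted above.
TOTAL_SIMULATIONS = 6_300_000   # total simulations reported
MINUTES_PER_SIMULATION = 20     # approximate real time per simulation
CORES = 16                      # one simulation assigned per core


def wall_clock_years(n_sims: int, minutes_each: float, cores: int) -> float:
    """Idealized wall-clock time in years, assuming perfect parallel scaling."""
    batches = -(-n_sims // cores)            # ceiling division
    total_minutes = batches * minutes_each
    return total_minutes / (60 * 24 * 365)   # minutes in a 365-day year


years = wall_clock_years(TOTAL_SIMULATIONS, MINUTES_PER_SIMULATION, CORES)
print(f"{years:.1f} years")
```

      With these inputs the estimate comes out just under 15 years, in line with the value given in the response.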

      594 I don't understand why there are 6M simulations, could the authors provide more info on the combinations and why there are 6M simulations.

      Response: The 6M simulations are the total number of simulations that were performed for this work. A detailed explanation can be found in section: “Machine learning inference of motor pool characteristics” at line 591. Briefly, there were 315,000 simulations of a pool of 20 motoneurons (20 x 315,000 = 6.3 million). The 315,000 simulations were required to run all possible combinations of 15 patterns of inhibition, 5 of neuromodulation, 7 of distribution of excitatory inputs and 30 different repeats of synaptic noise with different seeds. In addition, there were 20 iterations for each of these combinations to generate a linear CST output (as illustrated in Fig. 3). 15 x 5 x 7 x 30 x 20 = 315,000.
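      The size of this parameter grid can be reproduced in a few lines. This is a minimal sketch, not the authors' code: the factor names are hypothetical labels, and only the number of levels per factor is taken from the response.

```python
from itertools import product
from math import prod

# Number of levels per factor, as stated in the response.
# The dictionary keys are hypothetical labels chosen for illustration.
LEVELS = {
    "inhibition_pattern": 15,
    "neuromodulation_level": 5,
    "excitatory_distribution": 7,
    "noise_seed": 30,
    "cst_iteration": 20,   # iterations used to reach a linear CST output
}
MOTONEURONS_PER_POOL = 20

n_pool_simulations = prod(LEVELS.values())
n_total = n_pool_simulations * MOTONEURONS_PER_POOL

# Lazily enumerate the grid points; each point is one pool simulation.
grid = product(*(range(n) for n in LEVELS.values()))
first_point = dict(zip(LEVELS, next(grid)))

print(n_pool_simulations, n_total)
```

      Enumerating the grid lazily with `itertools.product` avoids holding all combinations in memory, which matters more as factors are added.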

      In several simulations it seems that there was a lot of fine-tuning of inputs to match the measured motor unit firing pattern. Have the authors ever considered a fully black-box AI approach? If they think is interesting maybe it could spice up the discussion.

      Response: We agree that AI has potential for reverse engineering the whole system and we are looking into adding it to future versions of this algorithm as an alternative. We started with a simple but powerful grid search to enhance our understanding of the interaction between inputs, neuron properties and outputs.

      Reviewer 2

      Comment 1:

      “First, I believe that the relation between individual motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties can be illustrated more clearly. Although this is explained in the text, I believe that this is not optimally supported by figures. Figure 6 to some extent shows this, but figures 8 and 9 as well as Table 1 shows primarily the goodness of fit rather than the actual fit.”

      Response: We agree with the reviewer that showing the relationship between the motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties would be a great addition to the manuscript. Because the regression models have multiple dimensions (7 inputs and 3 outputs) it is difficult to show the relationship in a static image. We thought it best to show the goodness of fit even though it is more abstract and less intuitive. We added a supplemental diagram to Figure 8 to show the structure of the reverse engineered model that was fit (see Figure 8D).

      Author response image 1.

      Figure 8. Residual plots showing the goodness of fit of the different predicted values: (A) Inhibition, (B) Neuromodulation and (C) excitatory Weight Ratio. The summary plots are for the models showing highest R² results in Table 1. The predicted values are calculated using the features extracted from the firing rates (see Figure 7, section Machine learning inference of motor pool characteristics and Regression using motoneuron outputs to predict input organization). Diagram (D) shows the multidimensionality of the RE models (see Model fits) which have 7 feature inputs (see Feature Extraction) predicting 3 outputs (Inhibition, Neuromodulation and Weight Ratio).

      Comment 2:

      “Second, I would have expected the discussion to have addressed specifically the question of which of the two primary schemes (push-pull, balanced) is the most prevalent. This is the main research question of the study, but it is to some degree left unanswered. Now that the authors have identified the relation between the characteristics of motor neuron behaviors (which has been reported in many previous studies), why not exploit this finding by summarizing the results of previous studies (at least a few representative ones) and discuss the most likely underlying input scheme? Is there a consistent trend towards one of the schemes, or are both strategies commonly used?”

      Response: We agree with the reviewer that our discussion should have addressed which of the two primary schemes, push-pull or balanced, is the most prevalent. At first glance, the upper right of Figure 6 looks the most realistic when compared to real data. We would thus expect the push-pull scheme to dominate for the given task.

      We added a brief section (Push-Pull vs Balance Motor Command) in the discussion to address the reviewer’s comments. This section is not exhaustive but frames the debate using relevant literature. We are also now preparing to deploy these techniques on real data.

      Comment 3:

      In addition, it seems striking to me that highly non-linear excitation profiles are necessary to obtain a linear CST ramp in many model configurations. Although somewhat speculative, one may expect that an approximately linear relation is desired for robust and intuitive motor control. It seems to me that humans generally have a good ability to accurately grade the magnitude of the motor output, which implies that either a non-linear relation has been learnt (complex task), or that the central nervous system can generally rely on a somewhat linear relation between the neural drive to the muscle and the output (simpler task).

      Response: We agree with the reviewer, and we were surprised by these results. Our motoneuron pool is equipped with persistent inward currents (PICs) which are nonlinear. Therefore, for the motoneuron to produce a linear output the central nervous system would have to incorporate these nonlinearities into its commands.

      Following this reasoning, it could be interesting to report also for which input scheme, the excitation profile is most linear. I understand that this is not the primary aim of the study, but it may be an interesting way to elaborate on the finding that in many cases non-linear excitation profiles were needed to produce the linear ramp.

      This is a very interesting point. The most realistic firing patterns – with respect to human data – are found in the parameter regions in the upper right in Figure 6, which in fact produce the most nonlinear input (see push-pull pattern in Figure 4C). However, in future studies we hope to separate the total motor command illustrated here into descending and feedback commands. This may result in a more linear descending drive.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper investigates host and viral factors influencing transmission of alpha and delta SARS-CoV-2 variants in the Syrian hamster model and fundamentally increases knowledge regarding transmission of the virus via the aerosol route. The strength of evidence is solid and could be improved with a clearer presentation of the data.

      We thank the editors for their assessment. We are excited to present a revised version of the manuscript with improved data presentation and an improved discussion addressing the reviewer’s concerns.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the submitted manuscript, Port et al. investigated the host and viral factors influencing the airborne transmission of SARS-CoV-2 Alpha and Delta variants of concern (VOC) using a Syrian hamster model. The authors analyzed the viral load profiles of the animal respiratory tracts and air samples from cages by quantifying gRNA, sgRNA, and infectious virus titers. They also assessed the breathing patterns, exhaled aerosol aerodynamic profile, and size distribution of airborne particles after SARS-CoV-2 Alpha and Delta infections. The data showed that male sex was associated with increased viral replication and virus shedding in the air. The relationship between co-infection with VOCs and the exposure pattern/timeframe was also tested. This study appears to be an expansion of a previous report (Port et al., 2022, Nature Microbiology). The experimental designs were rigorous, and the data were solid. These results will contribute to the understanding of the roles of host and virus factors in the airborne transmission of SARS-CoV-2 VOCs.

      Reviewer #2 (Public Review):

      This manuscript by Port and colleagues describes rigorous experiments that provide a wealth of virologic, respiratory physiology, and particle aerodynamic data pertaining to aerosol transmission of SARS-CoV-2 between infected Syrian hamsters. The data is particularly significant because infection is compared between alpha and delta variants, and because viral load is assessed via numerous assays (gRNA, sgRNA, TCID) and in tissues as well as the ambient environment of the cage. The paper will be of interest to a broad range of scientists including infectious diseases physicians, virologists, immunologists and potentially epidemiologists. The strength of evidence is relatively high but limited by unclear presentation in certain parts of the paper.

      Important conclusions are that infectious virus is only detectable in air samples during a narrow window of time relative to tissue samples, that airway constriction increases dynamically over time during infection limiting production of fine aerosol droplets, that variants do not appear to exclude one another during simultaneous exposures and that exposures to virus via the aerosol route lead to lower viral loads relative to direct inoculation suggesting an exposure dose response relationship.

      While the paper is valuable, I found certain elements of the data presentation to be unclear and overly complex.

      Reviewer #1 (Recommendations For The Authors):

      We thank the reviewer for their comments and their attention to detail. We have taken the following steps to address their suggestions and concerns.

      However, the following concerns need to be addressed.

      1. Summary seems to be too simple, and some results are not clearly described in the summary.

      We have edited the summary and hope to have addressed the concerns raised by providing more information. We think that the summary includes all relevant findings.

      “It remains poorly understood how SARS-CoV-2 infection influences the physiological host factors important for aerosol transmission. We assessed breathing pattern, exhaled droplets, and infectious virus after infection with Alpha and Delta variants of concern (VOC) in the Syrian hamster. Both VOCs displayed a confined window of detectable airborne virus (24-48 h), shorter than compared to oropharyngeal swabs. The loss of airborne shedding was linked to airway constriction resulting in a decrease of fine aerosols (1-10µm) produced, which are suspected to be the major driver of airborne transmission. Male sex was associated with increased viral replication and virus shedding in the air. Next, we compared the transmission efficiency of both variants and found no significant differences. Transmission efficiency varied mostly among donors, 0-100% (including a superspreading event), and aerosol transmission over multiple chain links was representative of natural heterogeneity of exposure dose and downstream viral kinetics. Co-infection with VOCs only occurred when both viruses were shed by the same donor during an increased exposure timeframe (24-48 h). This highlights that assessment of host and virus factors resulting in a differential exhaled particle profile is critical for understanding airborne transmission.”

      2. Aerosol transmission experiment should be described in Materials and Methods although it is cited as Reference 21#;

      We have modified Line 433:

      “Aerosol caging

      Aerosol cages as described by Port et al. [2] were used for transmission experiments and air sampling as indicated. The aerosol transmission system consisted of plastic hamster boxes (Lab Products) connected by a plastic tube. The boxes were modified to accept a 7.62 cm (3") plastic sanitary fitting (McMaster-Carr), which enabled the length between the boxes to be changed. Airflow was generated with a vacuum pump (Vacuubrand) attached to the box housing the naïve animals and was controlled with a float-type meter/valve (McMaster-Carr).”

      And Line 458: “During the first 5 days, hamsters were housed in modified aerosol cages (only one hamster box) hooked up to an air pump.”.

      In particular, one superspreading event of Alpha VOC (donor animal) was observed in iteration A (Figure 4). What caused that event? Was it the experimental system?

      Based on the observed variation in airborne shedding (of the cages from which this was directly measured), we believe that one plausible explanation for the super-spreading event was that the Alpha-infected donor shed considerably more virus during the exposure than other donors, and thus more readily infected the sentinels. That said, it is also conceivable that other factors such as hamster behavior (e.g., closeness to the cage outlet, sleeping) or variable sentinel susceptibility could affect the distribution of transmissions.

      3. Same reference is repeatedly listed as Refs 2 and 21#.

      Addressed. We thank the reviewer for their attention to detail. We have also removed reference 53, which was the same as 54.

      4. Two forms of time notation (hour and h) are used in the manuscript. A single form should be chosen.

      This has been addressed.

      5. The virus designation in line 371 and line 583 is inconsistent and needs to be revised.

      For consistency we have chosen this nomenclature for the viruses used: SARS-CoV-2 variant Alpha (B.1.1.7) (hCoV-19/England/204820464/2020, EPI_ISL_683466) and variant Delta (B.1.617.2) (hCoV-19/USA/KY-CDC-2-4242084/2021, EPI_ISL_1823618).

      6. In Figure 5F, at what time were lung and nasal turbinate tissues collected after virus infection?

      This has been added to the legend. Day 5. Line 904.

      7. Line 562-563, what is the coating antigen (spike protein, generated in-house)? Is it purified or recombinant protein?

      It is in-house purified recombinant protein. This has been added to the methods.

      8. Line 575 and line 578: 10,000x is not a standard description, and it should be revised.

      Done.

      Reviewer #2 (Recommendations For The Authors):

      We thank the reviewer for their comments and suggestions to improve the manuscript, and hope we have addressed all concerns adequately.

      • Direct interpretation of the linear regression slope in Figure 3 is challenging. Is the most relevant parameter for transmission known? Intuitively, it would be the absolute number of small droplets at a given timepoint rather than the slope and it would be easier to interpret if the data were reported in this fashion.

      We decided to show a percentage of counts to normalize the data among animals, as we observed large inter-individual variation in counts. The reviewer is correct that it is most likely the number of particles that would be most relevant to transmission, though much (including the role of particle size) remains to be determined. We have added a sentence to the results which explains this in L157.

      Therefore, we decided in this first analysis to utilize the slope measurement and not raw counts. The focus was on the slopes and how particle profiles were changing post inoculation. Because we have focused on percentages, it does not seem appropriate to present particle counts within each diameter range, because the analysis, model, and results are based on these percentages of particles.

      Use of regression to compute slope is a useful measure because it uses data from all timepoints to estimate the regression line and, therefore, the % of particles on each day. We decided on these methods because efficiency is especially important in a study with a relatively small number of animals and slopes are also a good surrogate for how animal particle profiles are changing post-inoculation.

      To assist with the interpretation: 1) We removed Figure 3C and D and replaced Figure 3B with individual line plots for all conditions to visualize the slopes. The figure legend was corrected to reflect these changes.

      2) We replaced L169 onwards to read: (Figure 3B). Females had a steeper decline at an average rate of 2.2 per day after inoculation in the percent of 1-10 μm particles (and a steeper incline for <0.53 μm) when compared to males, while holding variant group constant. When we compared variant group while holding sex constant, we found that the Delta group had a steeper decline at an average rate of 5.6 per day in the percent of 1-10 μm particles (and a steeper incline for <0.53 μm); a similar trend, but not as steep, was observed for the Alpha group.

      The estimated difference in slopes for Delta vs. controls and Alpha vs. controls in the percent of <0.53 μm particles was 5.4 (two-sided adjusted p= 0.0001) and 2.4 (two-sided adjusted p = 0.0874), respectively. The estimated difference in slopes for percent of 1-10 μm particles was not as pronounced, but similar trends were observed for Delta and Alpha. Additionally, a linear mixed model was considered and produced virtually the same results as the simpler analysis described above; the corresponding linear mixed model estimates were the same and standard errors were similar.
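      To make the slope measure concrete, the following sketch fits an ordinary least-squares line to the percentage of particles in one size range over days post inoculation. The data are invented for illustration and are not taken from the study; the function name is also an assumption, and the authors' actual analysis compared slopes between groups rather than fitting a single animal.

```python
def ols_slope(days, percents):
    """Ordinary least-squares slope of percent (y) against day (x)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(percents) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, percents))
    return sxy / sxx


# Hypothetical values for a single animal: percent of all counted particles
# that fall in the 1-10 um range on each day post inoculation.
days = [0, 1, 2, 3, 4, 5]
percent_1_10um = [40.0, 38.0, 36.0, 34.0, 32.0, 30.0]

slope = ols_slope(days, percent_1_10um)
print(f"{slope:.1f} percentage points per day")
```

      A negative slope here means that the share of 1-10 μm particles falls over time, which is the kind of decline the response reports for the inoculated groups.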

      • Fig 4: what is "limit of quality" mentioned in the legend? Are these samples undetectable?

      We have clarified this in the legend: “3.3 = limit of detection for RNA (<10 copies/rxn)”. The limit of detection is 10 copies/rxn. Samples with fewer than 10 copies per reaction are considered negative and are set to 10 copies/rxn, which equals 3.3 log10 copies/mL of oral swab.
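      The censoring rule described above can be sketched as follows. The conversion factor between copies per reaction and copies per mL is not given in the response; the value of 200 used here is an assumption, chosen only because it reproduces the quoted 3.3 log10 copies/mL at the limit of detection.

```python
import math

LOD_COPIES_PER_RXN = 10
# ASSUMPTION: factor converting copies/rxn to copies/mL of swab medium.
# Not stated in the source; 200 is chosen so that the LOD maps to ~3.3 log10.
COPIES_PER_ML_PER_COPY_PER_RXN = 200


def log10_copies_per_ml(copies_per_rxn: float) -> float:
    """Set values below the limit of detection to the LOD, then log-transform."""
    censored = max(copies_per_rxn, LOD_COPIES_PER_RXN)
    return math.log10(censored * COPIES_PER_ML_PER_COPY_PER_RXN)


for raw in (0, 3, 10, 1000):
    print(raw, round(log10_copies_per_ml(raw), 1))
```

      Plotting censored samples at the limit of detection keeps them visible on a log axis while marking them as negative.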

      • Fig 4C would be easier to process in graphical rather than tabular form. The meaning of the colors is unclear.

      We agree with the reviewer that this is difficult to interpret, but we are uncertain if the same data in a tabular format would be easier to digest. We realized that the legend was misplaced and have added this back into the figure, which we hope clarifies the colors and the limit of detection.

      • Figure 4D & E are uninterpretable. What do the pie charts represent?

      We have remodeled this part of the figure to a schematic representation of the majority variant which transmitted for each individual sentinel, and have added a table (Table S1) which summarizes the exact sequencing results for the oral swabs. The reviewer is correct that it was difficult to interpret the pie charts, considering most values are either 0 or close to 100%. We hope this addresses the question. The legend states:

      Author response image 1.

      Airborne attack rate of Alpha and Delta SARS-CoV-2 variants. Donor animals (N = 7) were inoculated with either the Alpha or Delta variant with 10³ TCID50 via the intranasal route and paired together randomly (1:1 ratio) in 7 attack rate scenarios (A-G). To each pair of donors, one day after inoculation, 4-5 sentinels were exposed for a duration of 4 h (i.e., h 24-28 post inoculation) in an aerosol transmission set-up at 200 cm distance. A. Schematic figure of the transmission set-up. B. Day 1 sgRNA detected in oral swabs taken from each donor after exposure ended. Individuals are depicted. Wilcoxon test, N = 7. Grey = Alpha, teal = Delta inoculated donors. C. Respiratory shedding measured by viral load in oropharyngeal swabs; measured by sgRNA on day 2, 3, and 5 for each sentinel. Animals are grouped by scenario. Colors refer to legend below. 3.3 = limit of detection of RNA (<10 copies/rxn). D. Schematic representation of majority variant for each sentinel as assessed by percentage of Alpha and Delta detected in oropharyngeal swabs taken at day 2 and day 5 post exposure by deep sequencing. Grey = Alpha, teal = Delta, white = no transmission.

      • Fig S2G is uninterpretable. Please label and explain.

      We have now included an explanation of Figure S2G. The figure is a graphic representation of the neutralization data depicted in Figure S2F. The spacing between grid lines is 1 unit of antigenic distance, corresponding to a twofold dilution of serum in the neutralization assay. The resulting antigenic distance depicted between Alpha and Delta is roughly a 4-fold difference in neutralization between homologous (e.g., Alpha sera with the Alpha virus) and heterologous (e.g., Alpha sera with the Delta virus) pairings.
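      Since one grid unit corresponds to a twofold serum dilution, a fold difference in neutralization converts to antigenic units through a base-2 logarithm. The short sketch below illustrates that conversion; the function name is illustrative and not taken from the manuscript.

```python
import math


def antigenic_units(fold_difference: float) -> float:
    """Antigenic distance in grid units, where 1 unit = a twofold dilution."""
    return math.log2(fold_difference)


print(antigenic_units(4))   # a 4-fold difference spans 2 grid units
```

      Under this convention, the roughly 4-fold difference between Alpha and Delta corresponds to about two grid units on the map.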

      • I would consider emphasizing lines 220-225 in the summary and abstract. The important implication is that aerosol transmission is more representative of natural heterogeneity of exposure dose and downstream viral kinetics. This is an often-overlooked point.

      We agree with the reviewer and have added this in Line 43.

      • Fig 5: A cartoon similar to Fig 4A showing timing of sentinel exposure with number of animals would be helpful.

      We have added this as a new panel A for Figure 5. See the redrafted Figure 5 below.

      • For Fig 5E & F It would be helpful to use a statistical test to more formally assess whether proportion at exposure predicts proportion of variants in downstream sentinel infection.

      This has been added as a new Figure 5 panel H and I, which we hope addresses the reviewer’s comment.

      Author response image 2.

      Airborne competitiveness of Alpha and Delta SARS-CoV-2 variants. A. Schematic. Donor animals (N = 8) were inoculated with Alpha and Delta variant with 5 x 10² TCID50, respectively, via the intranasal route (1:1 ratio), and three groups of sentinels (Sentinels 1, 2, and 3) were exposed subsequently at a 16.5 cm distance. Animals were exposed at a 1:1 ratio; exposure occurred on day 1 (Donors → Sentinels 1) and day 2 (Sentinels → Sentinels). B. Respiratory shedding measured by viral load in oropharyngeal swabs; measured by gRNA, sgRNA, and infectious titers on days 2 and 5 post exposure. Bar-chart depicting median, 96% CI and individuals, N = 8, ordinary two-way ANOVA followed by Šídák's multiple comparisons test. C/D/E. Corresponding gRNA, sgRNA, and infectious virus in lungs and nasal turbinates sampled five days post exposure. Bar-chart depicting median, 96% CI and individuals, N = 8, ordinary two-way ANOVA, followed by Šídák's multiple comparisons test. Dark orange = Donors, light orange = Sentinels 1, grey = Sentinels 2, dark grey = Sentinels 3, p-values indicated where significant. Dotted line = limit of quality. F. Percentage of Alpha and Delta detected in oropharyngeal swabs taken at days 2 and 5 post exposure for each individual donor and sentinel, determined by deep sequencing. Pie-charts depict individual animals. Grey = Alpha, teal = Delta. G. Lung and nasal turbinate samples collected on day 5 post inoculation/exposure. H. Summary of data of variant composition, violin plots depicting median and quantiles for each chain link (left) and for each set of samples collected (right). Shading indicates majority of variant (grey = Alpha, teal = Delta). I. Correlation plot depicting Spearman r for each chain link (right, day 2 swab) and for each set of samples collected across all animals (left). Colors refer to legend on right. Abbreviations: TCID, Tissue Culture Infectious Dose.

      We have additionally added to the results section: L284: “Combined, a trend, while not significant, was observed for increased replication of Delta after the first transmission event, but not after the second, and in the oropharyngeal cavity (swabs) as opposed to lungs (Figure 5H) (Donors compared to Sentinels 1: p = 0.0559; Donors compared to Sentinels 2: p > 0.9999; Kruskal-Wallis test, followed by Dunn’s test). Swabs taken at 2 DPI/DPE did significantly predict variant patterns in swabs on 5 DPI/DPE (Spearman’s r = 0.623, p = 0.00436) and virus competition in the lower respiratory tract (Spearman’s r = 0.60, p = 0.00848). Oral swab samples taken on day 5 strongly correlate with both upper (Spearman’s r = 0.816, p = 0.00001) and lower respiratory tract tissue samples (Spearman’s r = 0.832, p = 0.00002) taken on the same day (Figure 5I).”
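      For readers less familiar with the statistic quoted above, the following is a minimal sketch of how a Spearman rank correlation between two paired sets of variant proportions can be computed. It is an illustration only, not the analysis pipeline used for Figure 5I; the function names and the example values are hypothetical and are not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's r: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical fraction of Delta reads per animal (NOT study data).
    day2_swab = [0.10, 0.40, 0.35, 0.80]
    day5_swab = [0.20, 0.50, 0.45, 0.90]
    print(spearman_r(day2_swab, day5_swab))
```

      Because the statistic depends only on ranks, it captures whether animals with a higher Delta fraction at one sampling point also tend to have a higher fraction at another, without assuming a linear relationship.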

      • Fig 1A: how are pfu/hour inferred? This is somewhat explained in the supplement, but I found the inclusion of model output as the first panel confusing and am still not 100% clear how this was done. Consider explaining this in the body of the paper.

      We have added a more detailed explanation of the PFU/h inference to the main text: The motivation for the model was to link more readily measurable quantities such as RNA measured in oral swabs to the quantity of greatest interest for transmission (infectious virus per unit time in the air). To do this, we jointly infer the kinetics of shed airborne virus and parameters relating observable quantities (infected sentinels, plaques from purified air sample filters) to the actual longitudinal shedding. The inferential model uses mechanistic descriptions of deposition of infectious virus into the air, uptake from the air, and loss of infectious virus in the environment to extract estimates of the key kinetic parameters, as well as the resultant airborne shedding, for each animal.

      We have added this information to L106 in the results and hope this clarifies the rationale and execution of the model.

      More minor points:

      • Line 292: "poor proxy" seems too strong as peak levels of viral RNA correlate with positive airway cultures. It might be more accurate to say that high levels of viral RNA during early infection only somewhat correlate with positive airway cultures.

      We have rephrased this to clarify that while peak RNA viral loads are predictive of positive cultures, measuring RNA, especially early during infection and only once, may not be sufficient to infer the magnitude or time-dependence of infectious virus shedding into the air. See Line 308: “We found that swab viral load measurements are a valuable but imperfect proxy for the magnitude and timing of airborne shedding. Crucially, there is a period early in infection (around 24 h post-infection in inoculated hamsters) when oral swabs show high infectious virus titers, but air samples show low or undetectable levels of virus. Viral shedding should not be treated as a single quantity that rises and falls synchronously throughout the host; spatial models of infection may be required to identify the best correlates of airborne infectiousness [32]. Attempts to quantify an individual’s airborne infectiousness from swab measurements should thus be interpreted with caution, and these spatiotemporal factors should be considered carefully.”

      • Line 352: Re is dependent on time of an outbreak (population immunity) and cannot be specified for a given variant as it depends on multiple other variables

      We agree that the current phrasing here could be interpreted to suggest, incorrectly, that Re is an intrinsic property of a variant. We have deleted that language and reworded the section to emphasize that the critical question is heterogeneity in transmission, not mean reproduction number. Line 348: “Moreover, at the time of emergence of Delta, a large part of the human population was either previously exposed to and/or vaccinated against SARS-CoV-2; that underlying host immune landscape also affects the relative fitness of variants. Our naïve animal model does not capture the high prevalence of pre-existing immunity present in the human population and may therefore be less relevant for studying overall variant fitness in the current epidemiological context. Analyses of the cross-neutralization between Alpha and Delta suggest subtly different antigenic profiles [35], and Delta’s faster kinetics in humans may have also helped it cause more reinfections and “breakthrough” infections [36].

      Our two transmission experiments yielded different outcomes. When sentinel hamsters were sequentially exposed, first to Alpha and then to Delta, generally no dual infections—both variants detectable—were observed. In contrast, when we exposed hamsters simultaneously to one donor infected with Alpha and another infected with Delta, we were able to detect mixed-variant virus populations in sentinels in one of the cages (Cage F, see Appendix figures S1, S2). The fact that we saw both single-lineage and multi-lineage transmission events suggests that virus population bottlenecks at the point of transmission do indeed depend on exposure mode and duration, as well as donor host shedding. Notably, our analysis suggests that the Alpha-Delta co-infections observed in the Cage F sentinels could be due to that being the one cage in which both the Alpha and the Delta donor shed substantially over the course of the exposure (Appendix figures S2, S3). Mixed variant infections were not retained equally, and the relative variant frequencies differed between investigated compartments of the respiratory tract, suggesting roles for randomness or host-and-tissue specific differences in virus fitness.

      A combination of host, environmental and virus parameters, many of which vary through time, play a role in virus transmission. These include virus phenotype, shedding in air, individual variability and sex differences, changes in breathing patterns, and droplet size distributions. Alongside recognized social and environmental factors, these host and viral parameters might help explain why the epidemiology of SARS-CoV-2 exhibits classic features of over-dispersed transmission [37]. Namely, SARS-CoV-2 circulates continuously in the human population, but many transmission chains are self-limiting, while rarer superspreading events account for a substantial fraction of the virus’s total transmission. Heterogeneity in the respiratory viral loads is high and some infected humans release tens to thousands of SARS-CoV-2 virions/min [38, 39]. Our findings recapitulate this in an animal model and provide further insights into mechanisms underlying successful transmission events. Quantitative assessment of virus and host parameters responsible for the size, duration and infectivity of exhaled aerosols may be critical to advance our understanding of factors governing the efficiency and heterogeneity of transmission for SARS-CoV-2, and potentially other respiratory viruses. In turn, these insights may lay the foundation for interventions targeting individuals and settings with high risk of superspreading, to achieve efficient control of virus transmission [40].”

      • The limitation section should mention that this animal model does not capture the large prevalence of pre-existing immunity at present in the population and may therefore be less relevant in the current epidemiologic context.

      We agree and have added this more clearly, see response above.

      • Limitation: it is unclear if airway and droplet dynamics in the hamster model are representative of humans.

      We have added the following sentence: Line 331: “It remains to be determined how well airway and particle size distribution dynamics in Syrian hamsters model those in humans.”

      • The mathematical model is termed semi-mechanistic but I think this is not accurate as the model appears to have no mechanistic assumptions.

      We describe the model as semi-mechanistic because it uses mechanistic descriptions of the shedding and uptake process (as described above), incorporating factors including respiration rate and environmental loss, and makes the mechanistic assumption that measurable swab and airborne shedding all stem from a shared within-host infection process that produces exponential growth of virus up to a peak, followed by exponential decay. The model is only semi-mechanistic, however, as we do not attempt a full model of within-host viral replication and shedding (e.g. a target-cell limited virus kinetics model).
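      The growth-then-decay shape described above can be sketched as a piecewise exponential curve. This is a minimal illustration under stated assumptions, not the fitted inferential model: the function name, the parameterization (per-hour rates in natural-log units) and the example values are all hypothetical and are not taken from the manuscript.

```python
import math

LN10 = math.log(10.0)


def log10_virus(t_hours, t_peak, log10_peak, growth_rate, decay_rate):
    """Return log10 virus quantity at t_hours.

    Assumes exponential growth at `growth_rate` (per hour) up to a peak of
    10**log10_peak at `t_peak`, followed by exponential decay at
    `decay_rate` (per hour). Both rates are assumed to be positive.
    """
    if growth_rate <= 0 or decay_rate <= 0:
        raise ValueError("growth_rate and decay_rate must be positive")
    if t_hours <= t_peak:
        # Rising limb: the quantity is below the peak by exp(-growth * dt).
        return log10_peak - growth_rate * (t_peak - t_hours) / LN10
    # Falling limb: the quantity decays from the peak by exp(-decay * dt).
    return log10_peak - decay_rate * (t_hours - t_peak) / LN10


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the curve.
    for t in (0, 12, 24, 48, 96):
        print(t, round(log10_virus(t, 24.0, 3.0, 0.5, 0.05), 2))
```

      In a joint model of the kind described, swab and airborne measurements could share the timing of such a curve while differing in peak height and rates, which is one way a swab signal and an air signal can rise and fall out of step.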

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Comment 1: It is worth mentioning that the authors show that there are Arid1a transcripts that escape the Cre system. This might mask the phenotype of the Arid1a knockout, given that many sequencing techniques used here are done on a heterogeneous population of knockout and wild-type spermatocytes.

      Response: The proportions of undifferentiated spermatogonia (PLZF+) with detectable (ARID1A+) and non-detectable (ARID1A−) levels of ARID1A protein, determined by immunostaining of testis cryosections obtained from 1-month-old Arid1afl/fl (control) and Arid1acKO (CKO) males, were 74% ARID1A negative and 26% ARID1A positive in the CKO, as compared to 95% ARID1A positive and 5% ARID1A negative in WT controls. The manuscript includes these data (page 5, lines 114-116). Furthermore, Western blot analysis of STA-Put purified pachytene WT and mutant spermatocytes showed significantly reduced levels of ARID1A protein in mutant cells (95% reduction). These data have been added to the manuscript (page 5, line 116 and Fig. S2).

      Comment 2: In relation to this, I think that the use of the term "pachytene arrest" might be overstated, since this is not the phenotype truly observed (these mice produce sperm).

      Response: Based on the profiling of prophase-I spermatocytes by co-staining for SYCP3 and ARID1A, we observed a marked reduction in mid-late pachytene spermatocytes that lacked ARID1A, indicating a failure to progress beyond pachynema in the absence of ARID1A (Table 1 in manuscript). Furthermore, we were unable to detect diplotene spermatocytes lacking ARID1A protein. Haploid spermatid populations isolated from Arid1acKO males appeared normal, expressing the wild-type allele, suggesting that they originated from spermatocytes that failed to undergo efficient Cre recombination (Fig. S3). Arid1acKO also produces viable sperm at a level equal to their wild-type controls (see page 5, lines 123-126). It is reasonable to conclude that the absence of ARID1A results in a pachynema arrest and that the viable sperm are from escapers. We cannot make any conclusions regarding the requirement of ARID1A for progression beyond pachynema.

      Comment 3: ARID1A is present throughout prophase I, and it might have pre-MSCI roles that impact earlier stages of Meiosis I, and cell death might be happening in these earlier stages too.

      Response: We did not observe an effect on the frequency of leptotene and zygotene spermatocytes lacking ARID1A. There appeared to be an accumulation of these prophase-I populations in response to the loss of ARID1A, consistent with a failure in progression beyond pachynema in the mutants (Table 1 in the manuscript).

      Additionally, we did not detect any significant difference in the numbers of undifferentiated spermatogonia expressing PLZF (also known as ZBTB16) in 1-month-old Arid1acKO relative to Arid1afl/fl males (see Table below, now included in the manuscript as supplemental Table 1). Therefore, the Arid1a conditional knockouts generated with a Stra8-Cre did not appear to impact earlier stages of spermatogenesis. However, potential roles of ARID1A early in spermatogenesis might be revealed using a more efficient and earlier-acting germline Cre transgene. In this case, an inducible Cre transgene would be needed, given the haploinsufficiency associated with Arid1a. Such haploinsufficiency was why we used the Stra8-Cre. The lack of Cre expression in the female germline allowed the transmission of the floxed allele maternally.

      Author response table 1.

      Comment 4: Overall, the research presented here is solid, adds new knowledge on how sex chromatin is silenced during meiosis, and has generated relevant databases for the field.

      Response: We thank the reviewer for this comment.

      Reviewer 2

      Comment 1: The conditional deletion mouse model of ARID1A using Stra8-cre showed inefficient deletion; spermatogenesis did not appear to be severely compromised in the mutants. Using this data, the authors claimed that meiotic arrest occurs in the mutants. This is obviously a misinterpretation.

      Response: As stated in response to Reviewer 1, testes cryosections obtained from 1-month-old control and mutant males showed that 74% are ARID1A negative (CKO) and 26% ARID1A positive (CKO) as compared to 95% ARID1A positive and 5% ARID1A negative in WT controls (page 5, lines 114-116). This difference is dramatic. Western blot analysis of STA-Put purified pachytene WT and mutant spermatocytes also showed a significant reduction of ARID1A protein in mutant cells (Fig. S2). We observed a marked decrease in mid-late pachytene spermatocytes that lacked ARID1A, indicating a failure to progress beyond pachynema without ARID1A (Table 1 from the manuscript). Furthermore, we were unable to detect any diplotene spermatocytes lacking ARID1A protein. These data suggest that the haploid spermatids originated from spermatocytes that failed to undergo efficient Cre recombination (Fig. S3). Comparison of cKO and wild-type littermates yielded nearly identical sperm concentrations (Avg total conc WT = 32.65 M/ml; Avg total conc cKO = 32.06 M/ml), indicating that the cKOs produce viable sperm at a level equal to their wild-type controls. Taken together, the conclusion that the absence of ARID1A results in a pachynema arrest and that the escapers produce the haploid spermatids is firm. By IF, we see that ~70% of the spermatocytes have deleted ARID1A. Therefore, we disagree with the reviewer’s comments that “spermatogenesis did not appear to be severely compromised in the mutants”.

      Comment 2: In the later parts, the authors performed next-gen analyses, including ATAC-seq and H3.3 CUT&RUN, using the isolated cells from the mutant mice. However, with this inefficient deletion, most cells isolated from the mutant mice appeared not to undergo Cre-mediated recombination. Therefore, these experiments do not tell any conclusion pertinent to the Arid1a mutation.

      Response: We agree that the ATAC-seq and CUT&RUN data were derived from a mixed population of pachytene spermatocytes consisting of mutants and, to a much lesser extent, escapers. As stated, based on our previous study (Menon et al., 2021, Nat. Commun., PMID: 34772938) and additional analyses in this current work, the proportion of undifferentiated spermatogonia lacking ARID1A indicates that Stra8-Cre is ~70% efficient. With this efficiency, we can detect striking changes in H3.3 occupancy and chromatin accessibility in the mutants relative to wild-type spermatocytes.
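      The effect of a mixed population on a bulk measurement can be illustrated with a simple linear mixing calculation. This is a hedged sketch, assuming that a bulk signal is the cell-fraction-weighted average of the mutant and escaper signals; the function names and example numbers are hypothetical and do not come from the study's data.

```python
def mixed_population_signal(wt_signal, mutant_signal, mutant_fraction):
    """Bulk signal from a mix of mutant cells and wild-type-like escapers.

    Assumes the bulk readout is a weighted average of the two populations.
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    return mutant_fraction * mutant_signal + (1.0 - mutant_fraction) * wt_signal


def apparent_fold_change(true_fold_change, mutant_fraction):
    """Fold change seen in bulk when only a fraction of cells are mutant."""
    return mixed_population_signal(1.0, true_fold_change, mutant_fraction)


if __name__ == "__main__":
    # Hypothetical example: a 4-fold change in fully deleted cells,
    # measured in a population in which about 70% of cells are deleted.
    print(apparent_fold_change(4.0, 0.7))  # about 3.1
```

      Under this assumption, incomplete deletion attenuates a bulk signal rather than abolishing it, so large effects in the deleted cells remain detectable while small effects may be missed.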

      Comment 3: Furthermore, many of the later parts of this study focus on the analysis of H3.3 CUT&RUN. However, Fig. S7 clearly suggests that the H3.3 CUT&RUN experiment in the wild-type simply failed. Thus, none of the analyses using the H3.3 CUT&RUN data can be interpreted.

      Response: We would like to draw the attention of the reviewer to a recent study (Fontaine et al., 2022, NAR, PMID: 35766398) where the authors observed an identical X chromosome-wide spreading of H3.3 in mouse meiotic cells by ChIP-seq. The genomic distribution matches the microscopic observation of H3.3 coating of the sex chromosomes. Therefore, in normal spermatocytes, H3.3 distribution is pervasive across the X chromosome, with very few peaks observed in intergenic regions. Additionally, we detected H3.3 enrichment at TSSs of ARID1A-regulated autosomal genes in wild-type pachytene spermatocytes, albeit reduced relative to the mutants, indicating that the H3.3 CUT&RUN worked. For these reasons, we do not agree with the reviewer’s assessment that the H3.3 CUT&RUN experiment failed in the wild type.

      Comment 4: If the author wishes to study the function of ARID2 in spermatogenesis, they may need to try other cre-lines to have more robust phenotypes, and all analyses must be redone using a mouse model with efficient deletion of ARID2.

      Response: As noted, we chose Stra8-Cre to conditionally knock out Arid1a because ARID1A is haploinsufficient during embryonic development. The lack of Cre expression in the maternal germline allows for transmission of the floxed allele, allowing for the experiments to progress.

      Reviewer 3

      Comment 1: A challenge with the author's CKO model is the incomplete efficiency of ARID1A loss, due to incomplete CRE-mediated deletion. The authors effectively work around this issue, but they don't state specifically what percentage of CKO cells lack ARID1A staining. This information should be added.

      Response: Our data indicate that Stra8-Cre is ~70% efficient. This information has been added.

      Comment 2: They refer to cells that retain ARID1A staining in CKO testes as 'internal controls' but this reviewer finds that label inappropriate.

      Response: We have dropped ‘internal controls’ and used ‘escapers’ instead.

      Comment 3: Although some cells that retain ARID1A won't have undergone CRE-mediated excision, others may have excised but possibly have delayed kinetics of deletion or ARID1A RNA/protein turnover and loss. Such cells likely have partial ARID1A depletion to different extents and, therefore, in some cases, are no longer wild-type. In subsequent figures in which co-staining for ARID1A is done, it would be appropriate for the authors to specify if they are quantifying all cells from CKO testes, or only those that lack ARID1A staining.

      Response: We were unable to detect any diplotene spermatocytes lacking ARID1A protein. The data suggest that the haploid spermatids originated from spermatocytes that failed to undergo efficient Cre recombination (Fig. S3). Thus, we conclude that the absence of ARID1A results in a pachynema arrest and that the escapers produce haploid spermatids. In figures displaying quantification data, we indicate whether the quantification was performed on spermatocytes lacking or containing ARID1A from cKO testes. By IF, we see that ~70% of the spermatocytes have deleted ARID1A.

      Comment 4: The authors don't see defects in a few DDR markers in ARID1A CKO cells and conclude that the role of ARID1A in silencing is 'mutually exclusive to DDR pathways' (p 12) and 'occurs independently of DDR signaling' (p30). The data suggest that ARID1A may not be required for DDR signaling, but do not rule out the possibility that ARID1A is downstream of DDR signaling (and the authors even hypothesize this on p30). The data provided do not justify the conclusion that ARID1A acts independently of DDR signaling.

      associated DDR factors such as: H2Ax; ATR; and MDC1. We observed an abnormal persistence of elongating RNA polymerase II on the mutant XY body in response to the loss of ARID1A, emphasizing its role in the transcriptional repression of the XY during pachynema. The loss of ARID1A results in a failure to silence sex-linked genes and does so in the presence of DDR signaling factors in the XY body. As the reviewer notes, we highlighted the possibility that DDR pathways might influence ARID1A recruitment to the XY, evidenced by the hyperaccumulation of ARID1A on the sex body late in diplonema. Therefore, whether ARID1A is dependent on DDR signaling remains an open question.

      Comment 5: After observing no changes in levels or localization of H3.3 chaperones, the authors conclude that 'ARID1A impacts H3.3 accumulation on the sex chromosomes without affecting its expression or incorporation during pachynema.' It's not clear to this reviewer what the authors mean by this. Aside from the issue of not having tested DAXX or HIRA activity, are they suggesting that some other process besides altered incorporation leads to H3.3 accumulation, and if so, what process would that be?

      Response: The loss of ARID1A might result in an abnormal redistribution of DAXX or HIRA on the XY, potentially contributing to the defects in H3.3 accumulation and canonical H3.1/3.2 eviction on the XY. While speculative at this point, it is also possible that the persistence of elongating RNAPII in response to the loss of ARID1A might prevent the sex chromosome-wide coating of H3.3. Addressing the mechanism underlying ARID1A-governed H3.3 accumulation on the XY body remains a topic for future investigation.

      Comment 6: The authors find an interesting connection between certain regions that gained chromatin accessibility after ARID1A loss (clusters G1 and G3) and the presence of the PRDM9 sequence motif. The G1 and G3 clusters also show DMC1 occupancy and H3K4me3 enrichment. However, an additional cluster with gained accessibility (G4) also shows DMC1 occupancy and H3K4me3 enrichment but has modest H3.3 accumulation. The paper would benefit from additional discussion about the G4 cluster (which encompasses 960 peak calls). Is there any enrichment of PRDM9 sites in G4? If H3.3 exclusion governs meiotic DSBs, how does cluster G4 fit into the model?

      Response: We agree that, compared to G1+G3, cluster G4 shows an insignificant increase in H3.3 occupancy in the absence of ARID1A (Figure 6B). The plot profile associated with the heatmap confirms this result (Figure 6B). Therefore, cluster G4 is very distinct in its chromatin composition from G1+G3 upon the loss of ARID1A and, as such, is not inconsistent with our model of H3.3 antagonism with DSB sites. Additionally, we did not observe an enrichment of PRDM9 sites in G4. Since G4 does not display similar dynamics in H3.3 occupancy to G1+G3, DMC1 association might not be perturbed at G4 in response to the loss of ARID1A. Future studies will be required to determine the genomic associations of DMC1 and H3K4me3 in response to the loss of ARID1A.

      Comment 7: The impacts of ARID1A loss on DMC1 focus formation (reduced sex chromosome association) are very interesting and also raise additional questions. Are DMC1 foci on autosomes also affected during pachynema? The corresponding lack of apparent effect on RAD51 implies that breaks are still made and resected, enabling RAD51 filament formation. A more thorough quantitative assessment of RAD51 focus formation will be interesting in the long run, enabling determination of the number of break sites and the kinetics of repair, which the authors suggest is perturbed by ARID1A loss but don't directly test. It isn't clear how a nucleosomal factor (H3.3) would influence loading of recombinases onto ssDNA, especially if the alteration is not at the level of resection and ssDNA formation. Additional discussion of this point is warranted. Lastly, there currently are various notions for the interplay between RAD51 and DMC1 in filament formation and break repair, and brief discussion of this area and the implications of the new findings from the ARID1A CKO would strengthen the paper further.

      Response: The impact of H3.3 on the loading of recombinases might be an indirect consequence of ARID1A-governed sex-linked transcriptional repression. In a recent study, Alexander et al. (Nat. Commun., 2023, PMID: 36990976) showed that transcriptional activity and meiotic recombination are spatially compartmentalized during meiosis. Therefore, the persistence of elongating RNA polymerase II on a sex body depleted for H3.3 in the absence of ARID1A might contribute to the defect in DMC1 association. RAD51 and DMC1 are known to bind ssDNA at PRDM9/SPO11 designated DSB hotspots. However, these recombinases occupy unique domains. DMC1 localizes nearest the DSB breakpoint, promoting strand exchange, whereas RAD51 is further away (Hinch et al., PMID: 32610038). We show that loss of Arid1a decreases DMC1 foci on the XY chromosomes without affecting RAD51. These findings indicate that BAF-A plays a role in the loading and/or retention of DMC1 to the XY chromosomes. This information has been added to the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to both reviewers for taking the time to review our manuscript and data in great detail. We thank you for the fair assessment of our work, the helpful feedback, and for recognizing the value of our work. We have done our best to address your concerns below:

      eLife assessment

      This work reports a valuable finding on glucocorticoid signaling in male and female germ cells in mice, pointing out sexual dimorphism in transcriptomic responsiveness. While the evidence supporting the claims is generally solid, additional assessments would be required to fully confirm an inert GR signaling despite the presence of GR in the female germline and GR-mediated alternative splicing in response to dexamethasone treatment in the male germline. The work may interest basic researchers and physician-scientists working on reproduction and

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cincotta et al. set out to investigate the presence of glucocorticoid receptors in the male and female embryonic germline. They further investigate the impact of tissue-specific genetically induced receptor absence and/or systemic receptor activation on fertility and RNA regulation. They are motivated by several lines of research that report inter- and transgenerational effects of stress and/or glucocorticoid receptor activation, and suggest that their findings provide a mechanism to explain phenotypes in the offspring induced by parental stress hormone exposure.

      Strengths:

      A chronological immunofluorescent assessment of GR in fetal and early life oocyte and sperm development.

      RNA seq data that reveal novel cell type specific isoforms validated by q-RT PCR E15.5 in the oocyte.

      2 alternative approaches to knock out GR to study transcriptional outcomes. Oocytes: systemic GR KO (E17.5) with low input 3-tag seq and germline-specific GR KO (E15.5) on fetal oocyte expression via 10X single cell seq and 3-cap sequencing on sorted KO versus WT oocytes both indicating little impact on polyadenylated RNAs

      2 alternative approaches to assess the effect of GR activation in vivo (systemic) and ex vivo (ovary culture): here the RNA seq did show again some changes in germ cells and many in the soma.

      They exclude oocyte-specific GR signaling inhibition via beta isoforms.

      Perinatal male germline shows differential splicing regulation in response to systemic Dex administration, results were backed up with q-PCR analysis of splicing factors.

      Weaknesses:

      COMMENT #1: The presence of a protein cannot be entirely excluded based on IF data

      We agree that very low levels of GR could escape the detection by IF and confocal imaging. We feel that our IF data do match transcript data in our validation studies of the GR KO using (1) qRT-PCR on fetal ovary in Fig 2E and (2) scRNA-seq in germ cells and ovarian soma in Fig S2B.

      COMMENT #2: (staining of spermatids is referred to but not shown).

      You are correct that this statement was based on a morphological identification of spermatids using DAPI morphology. We have performed a co-stain for GR with the spermatocyte marker SYCP3, and the spermatid/spermatozoa marker PNA (Peanut Agglutinin; from Arachis hypogaea) in adult testis tissue. We have updated Figure 4D to reflect this change, as well as the corresponding text in the Results section.

      COMMENT #3: The authors do not consider post-transcriptional level a) modifications also triggered by GR activation b) non-coding RNAs (not assessed by seq).

      We thank the reviewer for raising this very important point about potential post-transcriptional (non-genomic) effects of GR in the fetal oocyte. We agree that while our RNA-seq results show only a minimal transcriptional response, we cannot rule out a non-canonical signaling function of GR, such as the regulation of cellular kinases (as reviewed elsewhere; ref. 1), or the regulation of non-coding RNAs at the post-transcriptional level, and we have amended the discussion to include a sentence on this point. However, while we fully acknowledge the possibility of GR regulating non-genomic level cellular signaling, we chose not to explore this option further based on the lack of any overall functional effect on meiotic progression when GR signaling was perturbed, either by KO (Figure 2D) or dex-mediated activation (Figure S3C).

      COMMENT #4: Sequencing techniques used are not total RNA but either are focused on all polyA transcripts (10x) or only assess the 3' prime end and hence are not ideal to study splicing

      We thank the reviewer for raising this concern; however, this statement is not correct, and we have clarified this point in the Results section to explain how the sequencing libraries of the male germ cell RNA-seq were prepared. We agree that certain sequencing techniques (such as 3’ Tag-Seq) that generate sequencing libraries from a limited portion of an entire transcript molecule are not appropriate for analysis of differential splicing. This was not the case, however, for the RNA-seq libraries prepared on our male germ cells treated with dexamethasone. These libraries were constructed using full length transcripts that were reverse transcribed using random hexamer priming, thus accounting for sequencing coverage across the full transcript length. As a result, this type of library prep technique should be sufficient for capturing differential splicing events along the length of the transcript. We do, however, point out that these libraries were constructed on polyA-enriched transcripts. Thus, while we obtained full length transcript coverage for these polyA transcripts, any differential splicing taking place in non-polyadenylated RNA species was not captured. While we are excited about the possibility of exploring GR-mediated splicing regulation of other RNA species in the future, we chose to focus the scope of our current study on polyA mRNA molecules specifically.

      COMMENT #5: The number of replicates in the low input seq is very low and hence this might be underpowered

      While the number of replicates (n=3-4 per condition) is sufficient for performing statistical analysis of a standard RNA-seq experiment, we do acknowledge and agree with the reviewer that low numbers of FACS-sorted germ cells from individual embryos combined with the low input 3’ Tag-Seq technique could have led to higher sample variability than desired. Given that we validated our bulk RNA-seq analysis of GR knockout ovaries using an orthogonal single-cell RNA-seq approach, we feel that our conclusions regarding a lack of transcriptional changes upon GR deletion remain valid.

      COMMENT #6: Since Dex treatment showed some (modest) changes in oocyte RNA - effects of GR depletion might only become apparent upon Dex treatment as an interaction.

      We may be missing the nuance of this point, but our interpretation of an effect that is seen only when the KO is treated with Dex would be that the mechanism would not be autonomous in germ cells but indirect or off-target.

      COMMENT #7: Effects in oocytes following systemic Dex might be indirect due to GR activation in the soma.

      As both the oocytes and ovarian soma express GR during the window of dex administration, we agree that it is possible that the few modest changes seen in the oocyte transcriptome are the result of indirect effects following robust GR signaling in the somatic compartment. However, given that these modest oocyte transcript changes in response to dex treatment did not significantly alter the ability of oocytes to progress through meiosis, we chose not to explore this mechanism further.

      COMMENT #8: Even though ex vivo culture of ovaries shows GR translocation to the nucleus it is not sure whether the in vivo systemic administration does the same.

      AND

      The conclusion that fetal oocytes are resistant to GR manipulation is very strong, given that "only" poly A sequencing and few replicates of 3-prime sequencing have been analyzed and information is lacking on whether GR is activated in germ cells in the systemically dex-injected animals.

      If we understand correctly, the first part refers to a technical limitation and the second part takes issue with our interpretation of the data. For the former, we appreciate this astute insight on the conundrum of detecting a response to systemic dex in fetal oocytes, which is generally monitored by nuclear translocation of GR. As shown in Figure 1A and 1B, GR localization is overwhelmingly nuclear in fetal oocytes of WT animals at E13.5 without addition of any dex. We could not, therefore, use GR translocation as a proxy for activation in response to dex treatment. We instead used ex vivo organ culture to monitor localization changes, as we were able to maintain fetal ovaries ex vivo in hormone-depleted and ligand negative conditions. As shown in Fig. 3, these defined culture conditions elicited a shift of GR to the cytoplasm of fetal oocytes. This led us to conclude that GR is capable of translocating between nucleus and cytoplasm in fetal oocytes, and we were able to counteract this loss in nuclear localization by providing dex ligand in the media.

      We feel that our conclusion that oocytes are resistant to manipulation of glucocorticoid signaling despite their possession of the receptor and capacity for nuclear translocation is substantiated by multiple results: meiotic phenotyping, bulk RNA-seq and scRNA-seq analysis of both GR KO and dex dosed mice. Our basis for testing the timing and fidelity of meiotic prophase I was the coincident onset of GR expression in female germ cells at E13, and the disappearance of GR in neonatal oocytes as they enter meiotic arrest. The lack of transcriptional changes observed in oocytes in response to dex has made it even more challenging to demonstrate a bona fide “activation” of GR. Observation of a dose-dependent induction of the canonical GR response gene Fkbp5 in the somatic cells of the fetal ovary (Figure S3A and 3A) affirmed that dex traverses the placenta. We agree with the reviewer that it remains possible that dex or GR KO could lead to changes in epigenetic marks or small RNAs in oocytes, and have mentioned these possibilities in the discussion, but we note that even epigenetic perturbations during oocyte development such as the loss of Tet1 or Dnmt1 result in measurable changes in the transcriptome and the timing of meiotic prophase [2–4].

      COMMENT #9: This work is a good reference point for researchers interested in glucocorticoid hormone signaling, fertility, and RNA splicing. It might spark further studies on germline-specific GR functions and the impact of GR activation on alternative splicing. While the study provides a characterization of GR and some aspects of GR perturbation, and the negative findings in this study do help to rule out a range of specific roles of GR in the germline, there is still a range of other potential unexplored options. The introduction of the study alludes to implications for intergenerational effects via epigenetic modifications in the germline; however, it does not mention that the indirect effects of reproductive tissue GR signaling on the germline have indeed already been described in the context of intergenerational effects of stress.

      The reviewer raises an excellent point that we have not made sufficient distinction in our manuscript between prior studies of gestational stress and preconception stress and the light that our work may shed on those findings. We have revised the introduction to clarify this difference, and added a reference to an outstanding study that identifies glucocorticoid-induced changes to the microRNA cargo of extracellular vesicles shed by epididymal epithelial cells that, when transferred to mature sperm, can induce changes in the HPA axis and brain of offspring [5]. Interestingly, this GR-mediated effect in the epididymal epithelial cells concurs with our observation in the adult testis that GR can be detected only in cKit+ spermatogonia but not in subsequent stages of spermatids.

      COMMENT #10: Also, the study does not assess epigenetic modifications.

      We agree with the reviewer that exploring the role of GR in regulating epigenetic modifications within the germline is an area of extreme interest given the potential links between stress and transgenerational epigenetic inheritance. As this is a broader topic that requires a more thorough and comprehensive set of experiments, we have intentionally chosen to keep this work separate from the current study, and hope to expand upon this topic in the future.

      COMMENT #11: The conclusion that the persistence of a phenotype for up to three generations suggests that stress can induce lasting epigenetic changes in the germline is misleading. For the reader who is unfamiliar with the field, it is important to define much more precisely what is referred to as "a phenotype". Furthermore, this statement evokes the impression that the very same epigenetic changes in the germline have been observed across multiple generations.

      We see how this may be misleading, and we have amended the text of the introduction and discussion accordingly to avoid the use of the term “phenotype”.

      COMMENT #12: The evidence of the presence of GR in the germline is also somewhat limited - since other studies using sequencing have detected GR in the mature oocyte and sperm.

      As described above in response to Comment #2, we have included immunostaining of adult testis in a revised Figure 4D and shown that we detect GR in PLZF+ and cKIT+ spermatogonia. We also show low/minimal expression in some (SYCP3+) early meiotic spermatocytes, but not in (Lectin+) spermatids. We are not aware of any studies that have shown expression of GR protein in the mature oocyte.

      COMMENT #13: The discussion ends again on the implications of sex-specific differences of GR signaling in the context of stress-induced epigenetic inheritance. It states that the observed differences might relate to the fact that there is more evidence for paternal lineage findings, without considering that maternal lineage studies in epigenetic inheritance are generally less prevalent due to some practical factors - such as more laborious study design making use of cross-fostering or embryo transfer.

      We thank the reviewer for this valid point, and we have amended the discussion section.

      Reviewer #2 (Public Review):

      Summary:

      There is increasing evidence in the literature that rodent models of stress can produce phenotypes that persist through multiple generations. Nevertheless, the mechanism(s) by which stress exposure produces phenotypes are unknown in the directly affected individual as well as in subsequent offspring that did not directly experience stress. Moreover, it has also been shown that glucocorticoid stress hormones can recapitulate the effects of programmed stress. In this manuscript, the authors test the compelling hypothesis that glucocorticoid receptor (GR)-signaling is responsible for the transmission of phenotypes across generations. As a first step, the investigators test for a role of GR in the male and female germline. Using knockouts and GR agonists, they show that although germ cells in male and female mice have GR that appears to localize to the nucleus when stimulated, oocytes are resistant to changes in GR levels. In contrast, the male germline exhibits changes in splicing but no overt changes in fertility.

      Strengths:

      Although many of the results in this manuscript are negative, this is a careful and timely study that informs additional work to address mechanisms of transmission of stress phenotypes across generations and suggests a sexually dimorphic response to glucocorticoids in the germline. The work presented here is well-done and rigorous and the discussion of the data is thoughtful. Overall, this is an important contribution to the literature.

      Reviewer #1 (Recommendations For The Authors):

      RECOMMENDATION #1: To assess whether in females the systemic Dex administration directly activates GR in oocytes it would be great to assess GR activation following Dex administration, and ideally to see the effects abolished when Dex is administered to germline-specific KO animals.

      In regard to the recommendation to assess GR activation in response to systemic dex administration, we refer the reviewer back to our response to Comment #8 highlighting the difficulties of defining and measuring GR activation in the germline.

      This therefore has made it difficult to assess whether any of the modest effects seen in response to dex are abolished in our germline-specific KO animals. While repeating our RNA-seq experiment in dex-dosed germline KO animals would address whether the ~60 genes induced in oocytes are the result of oocyte-intrinsic GR activity, we have decided not to explore this mechanism further due to the overall lack of a functional effect on meiotic progression in response to dex (Figure S3C).

      RECOMMENDATION #2: To further strengthen the link between GR and alternative splicing it would be great to see the dex administration experiment repeated in germline specific GR KO's.

      While we understand the reviewer’s suggestion to explore whether deletion of GR in the spermatogonia is sufficient to abrogate the dex-mediated decreases in splice factor expression, we chose not to explore the details of this mechanism given that deletion of GR in the male germline does not impair fertility (Figure 6).

      RECOMMENDATION #3: I am wondering how much a given reduction in one of the splicing factors indeed affects splicing events. Can the authors relate this to literature, or maybe an in vitro experiment can be done to see whether the level of differential splicing events detected is in a range that can be expected in the case of the magnitude of splicing factor reduction?

      It has been shown in many instances in the literature that a full genetic deletion of a single splice factor leads to impairments in spermatogenesis, and ultimately infertility [6–16]. We suspect that dex treatment leads to fewer differential splicing events than a full splice factor deletion, given that dex treatment causes a broader decrease in splice factor expression without entirely abolishing any single splice factor. We have amended the discussion section to include this point. While we share the reviewer’s curiosity to compare the effects of dex vs genetic deletion of splicing machinery on the overall magnitude of differential splicing events, we unfortunately do not have access to mice with a floxed splice factor at this time. While we have considered knocking out one or more splice factors in an ex vivo cultured testis to compare alongside dex treatment, our efforts to date have proven unsuccessful due to high cell death upon culture of the postnatal testis for more than 24 hours.

      RECOMMENDATION #4: It is unclear from the methods whether in germline-specific KO's also the controls received tamoxifen.

      We thank the reviewer for catching this missing piece of information. All control embryos that were assessed received an equivalent dose of tamoxifen to the germline-specific KO embryos. The only difference between cKOs and controls was the presence of the Cre transgene. We have updated the Materials and Methods 3’ Tag-Seq sample preparation section to include the sentence: “Both GRcKO/cKO and control GRflox/flox embryos were collected from tamoxifen-injected dams, and thus were equally exposed to tamoxifen in utero”.

      Reviewer #2 (Recommendations For The Authors):

      I just have only a few comments/questions.

      RECOMMENDATION #5: It is somewhat surprising that GR is expressed in female germ cells, yet there doesn't seem to be a requirement. Is there any indication of what it does? Is the long-term stability of the germline compromised?

      We thank the reviewer for these questions, and we agree that it was quite surprising to find a lack of GR function in the female germline despite its robust expression. The question of whether loss of GR affects the long-term stability of the female germline is interesting, given that similar work in GR KO zebrafish has shown impairments to female reproductive capacity, yet only upon aging [17–19].

      While we have shared interest in this question, technical limitations thus far have prevented us from properly assessing the effect of GR loss in aged females. Homozygous deletion of GR results in embryonic lethality at approximately E17.5. Conditional deletion of GR using Oct4-CreERT2 with a single dose of tamoxifen (2.5 mg / 20g mouse) at E9.5 results in complete deletion of GR by E10.5, although dams consistently suffer from dystocia and are no longer able to deliver viable pups. While using the more active tamoxifen metabolite (4OHT) at 0.1 mg / 20g has allowed for successful delivery, the resulting deletion rate is very poor (see qPCR results in panel below, left). While using half the dose of standard tamoxifen (1.25 mg / 20g mouse) at E9.5 has on rare occasions led to a successful delivery, the resulting recombination efficiency is insufficient (Author response image 1 right panel).

      Author response image 1.

      While a Blimp1-Cre conditional KO model was used to assess male fertility on GR deletion, we believe this model may not be ideal for studying fertility in the context of aging. While Blimp1-Cre is highly specific to the germ cells within the gonad, there are many cell types outside of the gonad that express Blimp1, including the skin and certain cells of the immune system. It is unclear, particularly over the course of aging, whether any effects on fertility seen would be due to an oocyte-intrinsic effect, or the result of GR loss elsewhere in the body. While we hope to explore the role of GR in the aging oocyte further using alternative Cre models in the future, this is currently outside the scope of this work.

      RECOMMENDATION #6: Figure 5b: what is the left part of that panel? Is it the same volcano plot for germ cells as shown in part a but with splicing factors?

      We apologize if this panel was unclear. Yes, the left panel of Figure 5B is in fact the same volcano plot in 5A, labeled with splicing factors instead of top genes. We have edited Figure 5B and corresponding figure legend to clarify this.

      References:

      1. Oakley, R.H., and Cidlowski, J.A. (2013). The biology of the glucocorticoid receptor: New signaling mechanisms in health and disease. J. Allergy Clin. Immunol. 132, 1033–1044. 10.1016/j.jaci.2013.09.007.

      2. Hargan-Calvopina, J., Taylor, S., Cook, H., Hu, Z., Lee, S.A., Yen, M.-R., Chiang, Y.-S., Chen, P.-Y., and Clark, A.T. (2016). Stage-Specific Demethylation in Primordial Germ Cells Safeguards against Precocious Differentiation. Dev. Cell 39, 75–86. 10.1016/j.devcel.2016.07.019.

      3. Hill, P.W.S., Leitch, H.G., Requena, C.E., Sun, Z., Amouroux, R., Roman-Trufero, M., Borkowska, M., Terragni, J., Vaisvila, R., Linnett, S., et al. (2018). Epigenetic reprogramming enables the transition from primordial germ cell to gonocyte. Nature 555, 392–396. 10.1038/nature25964.

      4. Eymery, A., Liu, Z., Ozonov, E.A., Stadler, M.B., and Peters, A.H.F.M. (2016). The methyltransferase Setdb1 is essential for meiosis and mitosis in mouse oocytes and early embryos. Development 143, 2767–2779. 10.1242/dev.132746.

      5. Chan, J.C., Morgan, C.P., Leu, N.A., Shetty, A., Cisse, Y.M., Nugent, B.M., Morrison, K.E., Jašarević, E., Huang, W., Kanyuch, N., et al. (2020). Reproductive tract extracellular vesicles are sufficient to transmit intergenerational stress and program neurodevelopment. Nat Commun 11, 1499. 10.1038/s41467-020-15305-w.

      6. Kuroda, M., Sok, J., Webb, L., Baechtold, H., Urano, F., Yin, Y., Chung, P., Rooij, D.G. de, Akhmedov, A., Ashley, T., et al. (2000). Male sterility and enhanced radiation sensitivity in TLS−/− mice. Embo J 19, 453–462. 10.1093/emboj/19.3.453.

      7. Liu, W., Wang, F., Xu, Q., Shi, J., Zhang, X., Lu, X., Zhao, Z.-A., Gao, Z., Ma, H., Duan, E., et al. (2017). BCAS2 is involved in alternative mRNA splicing in spermatogonia and the transition to meiosis. Nat Commun 8, 14182. 10.1038/ncomms14182.

      8. Li, H., Watford, W., Li, C., Parmelee, A., Bryant, M.A., Deng, C., O’Shea, J., and Lee, S.B. (2007). Ewing sarcoma gene EWS is essential for meiosis and B lymphocyte development. J Clin Invest 117, 1314–1323. 10.1172/jci31222.

      9. O’Bryan, M.K., Clark, B.J., McLaughlin, E.A., D’Sylva, R.J., O’Donnell, L., Wilce, J.A., Sutherland, J., O’Connor, A.E., Whittle, B., Goodnow, C.C., et al. (2013). RBM5 Is a Male Germ Cell Splicing Factor and Is Required for Spermatid Differentiation and Male Fertility. Plos Genet 9, e1003628. 10.1371/journal.pgen.1003628.

      10. Zagore, L.L., Grabinski, S.E., Sweet, T.J., Hannigan, M.M., Sramkoski, R.M., Li, Q., and Licatalosi, D.D. (2015). RNA Binding Protein Ptbp2 Is Essential for Male Germ Cell Development. Mol Cell Biol 35, 4030–4042. 10.1128/mcb.00676-15.

      11. Xu, K., Yang, Y., Feng, G.-H., Sun, B.-F., Chen, J.-Q., Li, Y.-F., Chen, Y.-S., Zhang, X.-X., Wang, C.-X., Jiang, L.-Y., et al. (2017). Mettl3-mediated m6A regulates spermatogonial differentiation and meiosis initiation. Cell Res 27, 1100–1114. 10.1038/cr.2017.100.

      12. Horiuchi, K., Perez-Cerezales, S., Papasaikas, P., Ramos-Ibeas, P., López-Cardona, A.P., Laguna-Barraza, R., Balvís, N.F., Pericuesta, E., Fernández-González, R., Planells, B., et al. (2018). Impaired Spermatogenesis, Muscle, and Erythrocyte Function in U12 Intron Splicing-Defective Zrsr1 Mutant Mice. Cell Reports 23, 143–155. 10.1016/j.celrep.2018.03.028.

      13. Ehrmann, I., Crichton, J.H., Gazzara, M.R., James, K., Liu, Y., Grellscheid, S.N., Curk, T., Rooij, D. de, Steyn, J.S., Cockell, S., et al. (2019). An ancient germ cell-specific RNA-binding protein protects the germline from cryptic splice site poisoning. Elife 8, e39304. 10.7554/elife.39304.

      14. Legrand, J.M.D., Chan, A.-L., La, H.M., Rossello, F.J., Änkö, M.-L., Fuller-Pace, F.V., and Hobbs, R.M. (2019). DDX5 plays essential transcriptional and post-transcriptional roles in the maintenance and function of spermatogonia. Nat Commun 10, 2278. 10.1038/s41467-019-09972-7.

      15. Yuan, S., Feng, S., Li, J., Wen, H., Liu, K., Gui, Y., Wen, Y., and Wang, X. (2021). hnRNPH1 recruits PTBP2 and SRSF3 to cooperatively modulate alternative pre-mRNA splicing in germ cells and is essential for spermatogenesis and oogenesis. 10.21203/rs.3.rs-1060705/v1.

      16. Wu, R., Zhan, J., Zheng, B., Chen, Z., Li, J., Li, C., Liu, R., Zhang, X., Huang, X., and Luo, M. (2021). SYMPK Is Required for Meiosis and Involved in Alternative Splicing in Male Germ Cells. Frontiers Cell Dev Biology 9, 715733. 10.3389/fcell.2021.715733.

      17. Maradonna, F., Gioacchini, G., Notarstefano, V., Fontana, C.M., Citton, F., Valle, L.D., Giorgini, E., and Carnevali, O. (2020). Knockout of the Glucocorticoid Receptor Impairs Reproduction in Female Zebrafish. Int J Mol Sci 21, 9073. 10.3390/ijms21239073.

      18. Facchinello, N., Skobo, T., Meneghetti, G., Colletti, E., Dinarello, A., Tiso, N., Costa, R., Gioacchini, G., Carnevali, O., Argenton, F., et al. (2017). nr3c1 null mutant zebrafish are viable and reveal DNA-binding-independent activities of the glucocorticoid receptor. Sci Rep-uk 7, 4371. 10.1038/s41598-017-04535-6.

      19. Faught, E., Santos, H.B., and Vijayan, M.M. (2020). Loss of the glucocorticoid receptor causes accelerated ovarian ageing in zebrafish. Proc Royal Soc B 287, 20202190. 10.1098/rspb.2020.2190.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors develop a method to fluorescently tag peptides loaded onto dendritic cells using a two-step method: a peptide is modified with a tetracysteine motif, and a labelling step is then done on the surface of live DC using a dye with high affinity for the added motif. The results are convincing in demonstrating in vitro and in vivo T cell activation and efficient label transfer to specific T cells in vivo. The label transfer technique will be useful to identify T cells that have recognised a DC presenting a specific peptide antigen to allow the isolation of the T cell and cloning of its TCR subunits, for example. It may also be useful as a general assay for in vitro or in vivo T-DC communication that can allow the detection of genetic or chemical modulators.

      Strengths:

      The study includes both in vitro and in vivo analysis including flow cytometry and two-photon laser scanning microscopy. The results are convincing and the level of T cell labelling with the fluorescent pMHC is surprisingly robust and suggests that the approach is potentially revealing something about fundamental mechanisms beyond the state of the art.

      Weaknesses:

      The method is demonstrated only at high pMHC density and it is not clear if it can operate at lower peptide doses where T cells normally operate. However, this doesn't limit the utility of the method for applications where the peptide of interest is known. It's not clear to me how it could be used to de-orphan known TCR and this should be explained if they want to claim this as an application. Previous methods based on biotin-streptavidin and phycoerythrin had single pMHC sensitivity, but there were limitations to the PE-based probe so the use of organic dyes could offer advantages.

      We thank the reviewer for the valuable comments and suggestions. Indeed, we have demonstrated and optimized this labeling technique for a commonly used peptide at rather high doses to provide a proof of principle for the possible use of tetracysteine-tagged peptides in in vitro and in vivo studies. However, we completely agree that studies requiring different peptides and/or lower pMHC concentrations may need preliminary experiments if the use of biarsenical probes is attempted. We think the method can help investigate the functional and biological properties of the peptides for TCRs deorphaned by other techniques. Tetracysteine tagging of such peptides would provide a readily available antigen-specific reagent for downstream assays and validation. Other possible uses for modified immunogenic peptides could be visualizing the dynamics of neoantigen vaccines or peptide delivery methods in vivo. For these additional uses, we recommend further optimization based on the needs of the prospective assay.

      Reviewer #2 (Public Review):

      Summary:

      The authors here develop a novel Ovalbumin model peptide that can be labeled with a site-specific FlAsH dye to track agonist peptides both in vitro and in vivo. The utility of this tool could allow better tracking of activated polyclonal T cells particularly in novel systems. The authors have provided solid evidence that peptides are functional, capable of activating OTII T cells, and that these peptides can undergo trogocytosis by cognate T cells only.

      Strengths:

      -An array of in vitro and in vivo studies are used to assess peptide functionality.

      -Nice use of cutting-edge intravital imaging.

      -Internal controls such as non-cognate T cells to improve the robustness of the results (such as Fig 5A-D).

      -One of the strengths is the direct labeling of the peptide and the potential utility in other systems.

      Weaknesses:

      1. What is the background signal from FlAsH? The baselines for the Figure 1 flow plots are all quite different and hard to follow. What does the background signal look like without FlAsH (how much of a fluorescence shift is there from unlabeled cells to No antigen+FlAsH)? How much of the FlAsH in cells is actually conjugated to the peptide? In Figure 2E, it doesn't look like it's very specific to pMHC complexes. Maybe you could double-stain with Ab for MHCII. Figure 4e suggests there is no background without MHCII, but I'm not fully convinced. Potentially some MassSpec for FlAsH-containing peptides.

      We thank the reviewer for pointing out a possible area of confusion. In fact, we have done extensive characterization of the background and found that it varies with the batch of FlAsH, the TCEP, and the cytometer, and also with the oxidation-prone nature of the reagents. Because the Figure 1 subfigures were derived from different experiments, a combination of the factors above has likely contributed to the inconsistent background. To display the background more objectively, we have now added the No antigen+FlAsH background to the revised Fig 1.

      It is also worth noting that nonspecific FlAsH incorporation can be toxic at increasing doses, and live cells that display high backgrounds may undergo early apoptotic changes in vitro. However, when these cells are adoptively transferred and tracked in vivo, the compromised cells with high background possibly undergo apoptosis and get cleared by macrophages in the lymph node. The lack of clearance in vitro further contributes to different backgrounds between in vitro and in vivo, which we think is also a possible cause for the inconsistent backgrounds throughout the manuscript. Altogether, comparison of absolute signal intensities from different experiments would be misleading and the relative differences within each experiment should be relied upon. We have added further discussion about this issue.

      2. On the flip side, how much of the variant peptides are getting conjugated in cells? I'd like to see some quantification (HPLC or MassSpec). If it's ~10% of peptides that get labeled, this could explain the low shifts in fluorescence and the similar T cell activation to native peptides if FlAsH has any deleterious effects on TCR recognition. But if it's a high rate of labeling, then it adds confidence to this system.

      We agree that mass spectrometry or, more specifically, tandem MS/MS would be an excellent addition to support our claim that peptide labeling by FlAsH is reliable and non-disruptive. Therefore, we have recently undertaken a tandem MS/MS quantitation project with our collaborators. However, this would require significant time to determine the internal-standard-based calibration curves and to run both analytical and biological replicates. Hence, we have decided to pursue this as a follow-up study and have added further discussion on quantification of the FlAsH-peptide conjugates by tandem MS/MS.

      3. Conceptually, what is the value of labeling peptides after loading with DCs? Why not preconjugate peptides with dye, before loading, so you have a cleaner, potentially higher fluorescence signal? If there is a potential utility, I do not see it being well exploited in this paper. There are some hints in the discussion of additional use cases, but it was not clear exactly how they would work. One mention was that the dye could be added in real-time in vivo to label complexes, but I believe this was not done here. Is that feasible to show?

      We have already addressed preconjugation as a possible avenue for labeling peptides. In our hands, preconjugation resulted in low FlAsH intensity overall for both the control and tetracysteine-labeled peptides (Author response image 1). While we don’t have a satisfactory answer as to why the signal was blunted by preconjugation, it could be that the tetracysteine-tagged peptides attract biarsenical compounds better intracellularly. This may be due to the redox potential of the intracellular environment, which limits disulfide bond formation (PMID: 18159092).

      Author response image 1.

      Preconjugation yields poor FlAsH signal. Splenic DCs were pulsed with peptide then treated with FlAsH or incubated with peptide-FlAsH preconjugates. Overlaid histograms show the FlAsH intensities on DCs following the two-step labeling (left) and preconjugation (right). Data are representative of two independent experiments, each performed with three biological replicates.

      4. In Figure 5D-F, the imaging data isn't fully convincing. For example, in 5F and 2G, the speeds for T cells with no Ag should be much higher (10-15 micron/min or 0.16-0.25 micron/sec). The fact that yours are much lower speeds suggests technical or biological issues that might need to be acknowledged, or the use of other readouts such as flow cytometry.

      We thank the reviewer for drawing attention to this technical point. We would like to point out that the imaging data in Figure 5D-F were obtained from agarose-embedded live lymph node sections. Briefly, the lymph nodes were removed, suspended in 2% low-melting-temperature agarose in DMEM and cut into 200 µm sections with a vibrating microtome. Prior to imaging, tissue sections were incubated in complete RPMI medium at 37 °C for 2 h to resume cell mobility. Thus, we think the cells resuming their typical speeds ex vivo may account for slightly reduced T cell speeds overall, for both control and antigen-specific T cells (PMID: 32427565, PMID: 25083865). We have added text to remove the ambiguity about the technique used for dynamic imaging. The speeds in Figure 2G come from live imaging of DC-T cell cocultures, in which the basal cell movement could be hampered by the cell density. Additionally, glass-bottom dishes were coated with fibronectin to facilitate DC adhesion, which may be responsible for the lower average speeds of the T cells in vitro.
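      For readers comparing the speeds quoted in this exchange, the unit conversion between micron/min and micron/sec can be checked with a short sketch. The function name below is illustrative only and is not taken from the manuscript or from any analysis software used in the study; the only inputs are the 10-15 micron/min range cited by the reviewer.

```python
def um_per_min_to_um_per_sec(speed_um_per_min: float) -> float:
    """Convert a cell speed from micron/min to micron/sec (hypothetical helper)."""
    return speed_um_per_min / 60.0


# Range quoted by the reviewer for T cells in the absence of antigen.
low = um_per_min_to_um_per_sec(10.0)   # 10 / 60 micron/sec
high = um_per_min_to_um_per_sec(15.0)  # 15 / 60 micron/sec

print(f"{low:.3f} to {high:.3f} micron/sec")
```

      The lower bound, 10/60, is approximately 0.167 micron/sec, which the reviewer's comment truncates to 0.16; the upper bound is exactly 0.25 micron/sec.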

      Reviewer #1 (Recommendations For The Authors):

      Does the reaction of ReAsH with reactive sites on the surface of DC alter them functionally? Functions have been attributed to redox chemistry at the cell surface; could this alter this chemistry?

      We thank the reviewer for the insight. It is possible that the nonspecific binding of biarsenical compounds to cysteine residues, which we refer to as background throughout the manuscript, contributes to some alterations. One possible way biarsenicals could affect the redox events in DCs is by reducing glutathione levels (PMID: 32802886). Glutathione depletion is known to impair DC maturation and antigen presentation (PMID: 20733204). To avoid toxicity, we carried out a stringent titration to optimize ReAsH and FlAsH concentrations for labeling and conducted experiments using doses that did not cause overt toxicity or alter DC function.

      Have the authors compared this to a straightforward approach where the peptide is just labelled with a similar dye and incubated with the cell to load pMHC, using the MHC knockout to assess specificity? Why is this approach, which involves exposing the DC to a high concentration of TCEP, better than just labelling the peptide? The Davis lab also arrived at a two-step method with biotinylated peptide and streptavidin-PE, but I still wonder if this was really necessary as the sensitivity will always come down to the ability to wash out the reagents that are not associated with the MHC.

      We agree with the reviewer that small, non-disruptive fluorochrome-labeled peptide alternatives would greatly improve the workflow and signal-to-noise ratio. In fact, we have been actively searching for such alternatives since we started working on the tetracysteine-containing peptides. So far, we have tried commercially available FITC- and TAMRA-conjugated OVA323-339 for loading the DCs, but these failed to elicit any discernible signal. We also have an ongoing study in which we are producing and testing various in-house modified OVA323-339 peptides with fluorogenic properties. Unfortunately, at this moment, the ones that provided us with a crisp, bright signal upon loading have also incorporated into the DC membrane in a nonspecific fashion and have been taken up by non-cognate T cells from double antigen-loaded DCs. We are actively pursuing this area of investigation and developing better optimized peptides with low/non-significant membrane incorporation.

      Lastly, we would like to point out that tetracysteine tags are visible by transmission electron microscopy without FlAsH treatment. Thus, this application could add a new dimension for addressing questions about the antigen/pMHCII loading compartments in future studies. We have now added more in-depth discussion about the setbacks and advantages of using tetracysteine labeled peptides in immune system studies.

      The peptide dosing at 5 µM is high compared to the likely sensitivity of the T cells. It would be helpful to titrate the system down to the EC50 for the peptide, which may be nM, and determine if the specific fluorescence signal can still be detected in the optimal conditions. This will not likely be useful in vivo, but it will be helpful to see if the labelling procedure would impact T cell responses when antigen is limited, which will be more of a test. At 5 µM it's likely the system is at a plateau and even a 10-fold reduction in potency might not impact the T cell response, but it would shift the EC50.

      We thank the reviewer for the comment and suggestion. We agree that it is possible to miss minimally disruptive effects at 5 µM and that titrating the native peptide vs. the modified peptide down to nM doses would provide a clearer view. This can certainly be addressed in future studies and also with other peptides with different affinity profiles. We chose a relatively high dose for this study because lowering the peptide dose cost us the specific FlAsH signal; we therefore proceeded with the lowest peptide concentration that still gave a detectable signal.
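      The reviewer's point about the plateau can be illustrated with a minimal sketch. It assumes a simple Hill dose-response model with a Hill coefficient of 1; the EC50 values of 10 nM (native peptide) and 100 nM (a hypothetical 10-fold loss of potency) are illustrative assumptions, not measurements from this study.

```python
def hill_response(dose_nm, ec50_nm, hill=1.0):
    """Fraction of the maximal response at a given dose (simple Hill model)."""
    return dose_nm**hill / (dose_nm**hill + ec50_nm**hill)


# Hypothetical EC50 values, chosen only to illustrate a 10-fold potency shift.
EC50_NATIVE_NM = 10.0
EC50_MODIFIED_NM = 100.0

# Compare a saturating dose (5 uM = 5000 nM) with a dose at the native EC50.
for dose_nm in (5000.0, 10.0):
    native = hill_response(dose_nm, EC50_NATIVE_NM)
    modified = hill_response(dose_nm, EC50_MODIFIED_NM)
    print(f"{dose_nm:>7.0f} nM: native={native:.3f}, modified={modified:.3f}")
```

      Under these assumptions, the two curves differ by less than two percentage points at 5 µM but by roughly five-fold at 10 nM, which is why a titration near the EC50 would be the more sensitive test.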

      In Fig 3b the level of background in the dsRed channel is very high after DC transfer. What cells is this associated with and does this appear to be debris? Also, I wonder where the ReAsH signal is in the experiments in general. I believe this is a red dye and it would likely be quite bright given the reduction of the FlAsH signal. Will this signal overlap with signals like dsRed and PKH-26 if the DC is also treated with this to reduce the FlAsH background?

      We have already shown that the ReAsH signal, together with DsRed, can be used for cell-tracking purposes, as neither is transferred to other cells during antigen-specific interactions (Author response image 2). In fact, combining their exceptionally bright fluorescence provided us with a robust signal to track the adoptively transferred DCs in the recipient mice. On the other hand, the lipophilic membrane dye PKH-26 is transferred by trogocytosis, while the remaining signal contributes to the red fluorescence for tracking DCs. Therefore, the signal that we show to be transferred from DCs to T cells comes only from the lipophilic dye. We have added a sentence to the results section to elaborate on this. Regarding the reviewer’s comment on the DsRed background in Figure 3b, we agree that the signal from cells outside the gate in recipient mice seems slightly higher than that of the control mice. This may suggest that macrophages clearing debris from apoptotic/dying DCs contribute to the background in the recipient lymph node. Nevertheless, it does not contribute to any DsRed/ReAsH signal in the antigen-specific T cells.

      Author response image 2.

      ReAsH and DsRed are not picked up by T cells during immune synapse formation. DsRed+ DCs were labeled with ReAsH, pulsed with 5 μM OVACACA, labeled with FlAsH and adoptively transferred into CD45.1 congenic mice (1-2 × 10⁶ cells) via footpad. Naïve e450-labeled OTII and e670-labeled polyclonal CD4+ T cells were mixed 1:1 (0.25-0.5 × 10⁶ per T cell type) and injected i.v. Popliteal lymph nodes were removed at 42 h post-transfer and analyzed by flow cytometry. Overlaid histograms show the ReAsH/DsRed, MHCII and FlAsH intensities of the T cells. Data are representative of two independent experiments with n=2 mice per group.

      In Fig 5b there is a missing condition. If they look at Ea-specific T cells for DC with or without the Ova peptide, do they see no transfer of PKH-26 to the OTII T cells? Also, the MFI of the FlAsH signal transferred to the T cells seems very high compared to other experiments. Can the author estimate the number of peptides transferred (this should be possible) and would each T cell need to be collecting antigens from multiple DC? Could the debris from dead DC also contribute to this if picked up by other DC or even directly by the T cells? Maybe this could be tested by transferring DC that are killed (perhaps by sonication) prior to inoculation?

      To address the reviewer’s question on the PKH-26 acquisition by T cells, Ea-T cells pick up PKH-26 from Ea+OVA double-pulsed DCs, but not from the unpulsed or single OVA-pulsed DCs. OTII T cells acquire PKH-26 from OVA-pulsed DCs, whereas Ea T cells don’t (as expected) and serve as an internal negative control for that condition. Regarding the reviewer’s comment on the high FlAsH signal intensity of T cells in Figure 5b, a plausible explanation is that the T cells accumulate pMHCII through serial engagements with APCs. In fact, a comparison of the T cell FlAsH intensities at 18 h and 36-48 h post-transfer demonstrates an increase (Author response image 3) and thus hints at a cumulative signal. As DCs are known to be short-lived after adoptive transfer, the debris of dying DCs along with its peptide content may indeed be passed on to macrophages, neighboring DCs and eventually back to T cells again (or for the first time, depending on the T:DC ratio, which may not allow all T cells to contact the transferred DCs within the limited time frame). We agree that the number and the quality of such contacts can be gauged using fluorescent peptides. However, we think peptides chemically conjugated to fluorochromes with optimized signal-to-noise profiles and a less oxidation-prone nature would be more suitable for quantification purposes.

      Author response image 3.

      FlAsH signal acquisition by antigen specific T cells becomes more prominent at 36-48 h post-transfer. DsRed+ splenic DCs were double-pulsed with 5 μM OVACACA and 5 μM OVA-biotin and adoptively transferred into CD45.1 recipients (2 × 10⁶ cells) via footpad. Naïve e450-labeled OTII (1 × 10⁶ cells) and e670-labeled polyclonal T cells (1 × 10⁶ cells) were injected i.v. Popliteal lymph nodes were analyzed by flow cytometry at 18 h or 48 h post-transfer. Overlaid histograms show the T cell levels of OVACACA (FlAsH). Data are representative of three independent experiments with n=3 mice per time point.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in weaknesses 1 & 2, more validation of how much of the FlAsH fluorescence is on agonist peptides and how much is non-specific would improve the interpretation of the data. Another option would be to preconjugate peptides but that might be a significant effort to repeat the work.

      We agree that mass spectrometry would be the gold standard technique to measure the percentage of tetracysteine-tagged peptide that is conjugated to FlAsH in DCs. However, due to the scope of such an endeavor, this can only be addressed in a separate follow-up study. As for the preconjugation, we have tried and unfortunately failed to get it to work (Reviewer Figure 1). Therefore, we have shifted our focus to generating in-house peptide probes that are chemically conjugated to stable and bright fluorophore derivatives. With that, we aim to circumvent the problems that the two-step FlAsH labeling poses.

      Along those lines, do you have any way to quantify how many peptides you are detecting based on fluorescence? Being able to quantify the actual number of peptides would push the significance up.

      We think the two-step procedure and the background would pose challenges to such quantification in this study. Although it would provide tremendous insight into antigen-specific T cell-APC interactions in vivo, we think it should be performed using peptides chemically conjugated to fluorochromes with optimized signal-to-noise profiles.

      In Figure 3D or 4 does the SA signal correlate with Flash signal on OT2 cells? Can you correlate Flash uptake with T cell activation, downstream of TCR, to validate peptide transfers?

      To answer the reviewer’s question about FlAsH and SA correlation, we have revised Figure 3d to show the correlation between OTII uptake of FlAsH, Streptavidin and MHCII. We also thank the reviewer for the suggestion on correlating FlAsH uptake with T cell activation and/or events downstream of TCR activation. We have used proliferation and CD44 expression as proxies of activation (Fig 2, 6). Nevertheless, we agree that the early events that correspond to the initiation of the T-DC synapse and FlAsH uptake would be valuable to demonstrate the temporal relationship between peptide transfer and activation. Therefore, we have addressed this in the revised discussion.

      Author response image 4.

      FlAsH signal acquisition by antigen specific T cells correlates with the OVA-biotin (SA) and MHCII uptake. DsRed+ splenic DCs were double-pulsed with 5 μM OVACACA and 5 μM OVA-biotin and adoptively transferred into CD45.1 recipients (2 × 10⁶ cells) via footpad. Naïve e450-labeled OTII (1 × 10⁶ cells) and e670-labeled polyclonal T cells (1 × 10⁶ cells) were injected i.v. Popliteal lymph nodes were analyzed by flow cytometry. Overlaid histograms show the T cell levels of OVACACA (FlAsH) at 48 h post-transfer. Data are representative of three independent experiments with n=3 mice.

      Minor:

      Figure 3F, 5D, and videos: Can you color-code polyclonal T cells a different color than magenta (possibly white or yellow), as they have the same look as the overlay regions of OT2-DC interactions (Blue+red = magenta).

      We apologize for the inconvenience about the color selection. We have had difficulty in assigning colors that are bright and distinct. Unfortunately, yellow and white have also been easily mixed up with the FlAsH signal inside red and blue cells respectively. We have now added yellow and white arrows to better point out the polyclonal vs. antigen specific cells in 3f and 5d.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This nice study by Miyano combines slice electrophysiology and superresolution microscopy to address the role of RBP2 in Ca2+ channel clustering and neurotransmitter release at hippocampal mossy fiber terminals. While a number of studies demonstrated a critical role for RBPs in clustering Ca2+ channels at other synapses and some provided evidence for a role of the protein in molecular coupling of Ca2+ channels and release sites, the present study targets another key synapse that is an important model for presynaptic studies and offers access to a microdomain controlled synaptic vesicle (SV) release mechanism with low initial release probability.

      Summarizing a large body of high-quality work, the authors demonstrate reduced Ca2+ currents and a reduced release probability. They attribute the latter to the reduced Ca2+ influx and can restore release by increasing Ca2+ influx. Moreover, they propose an altered fusion competence of the SVs, which is not so strongly supported by the data in my view.

      The effects are relatively small, but I think the careful analysis of the RBP role at the mossy fiber synapse is an important contribution.

      We thank the reviewer for the careful assessment of the paper. We agree that while the reduced Ca influx in the KO is relatively straightforward, the impaired priming is somewhat indirect and remains a suggestion. We also note that Moser and colleagues have analyzed the function of RIM-BP2 at hair cell synapses and also showed reduced Ca influx. In cortical synapses, there has been no study using direct presynaptic recording. In the revision, we carefully cited previous studies and tried to be fair. We hope that the current revision is much improved.

      Reviewer #2 (Public Review):

      The proper expression and organization of CaV channels at the presynaptic release sites are subject to coordinative and redundant control of many active zone-specific molecules including RIM-BPs. Previous studies have demonstrated that ablation of RIM-BPs in various mammalian synapses causes significant impairment of synaptic transmission, either by reducing CaV expression or decoupling CaV from synaptic vesicles. The mechanisms remain unknown.

      In the manuscript, Sakaba and colleagues aimed to examine the specific role of RIM-BP2 at the hippocampal mossy fiber-CA3 pyramidal cell synapse, which is well-characterized by low initial release probability and strong facilitation during repetitive stimulation. By directly recording Ca2+ currents and capacitance jumps from the MF boutons, which is very challenging but feasible, they showed that depolarization-evoked Ca2+ influx was reduced significantly (~39%) by KO of RIM-BP2, with no impact on Ca-induced exocytosis and RRP (measured by capacitance change). They used STED microscopy to image the spatial distribution of the CaV2.1 cluster but found no change in the cluster number with a slight decrease in cluster intensity (~20%). They concluded that RIM-BP2 functions in tonic synapses by reducing CaV expression and thus differentially from phasic synapses by decoupling CaV-SV.

      In general, they provide solid data showing that RIM-BP2 KO reduces Ca influx at the MF-CA3 synapse, but the phenotype is not new, as Moser and colleagues have also used presynaptic recording and capacitance measurement and shown that RIM-BP2 KO reduces Ca2+ influx at the hair cell active zone (Krinner et al., 2017), although in a different synapse model expressing CaV1.3 instead of CaV2.1. Further, the concept that RIM-BP2 plays diverse functions in transmitter release at different central synapses has also been proposed with solid evidence (Brockmann et al., 2019).

      We thank the reviewer for the careful reading of the ms. We agree that previous studies have shown reduced Ca influx at hair cells, and that diverse functions of RIM-BP2 at different central synapses have been proposed by Brockmann et al. The new point of this study is that we firmly and quantitatively show the reduced Ca currents using direct presynaptic recording, which has not been done in mossy fiber synapses or cortical synapses in general. Quantitative and time-resolved measurements of the presynaptic currents cannot be done by other methods so far. In this revision, we point this out carefully.

      Reviewer #1 (Recommendations For The Authors):

      The MS is overall carefully prepared and I have only a few minor comments to help with further improving the manuscript.

      Abstract:

      I think the notion of different RBP function at tonic and phasic synapses is not so well founded. The reduced number of Ca2+ channels and their altered topography have been shown in multiple synapses that also include those with phasic release. Quantitative structural and functional analysis of presynaptic Ca2+ channels of RBP-2 and RBP1-2 DKO deficient AZs closely related to the present study has e.g. been provided for auditory synapses (e.g. hair cells, endbulb/calyx of Held synapses) that provide both phasic and sustained release.

      In the abstract, we have omitted the description of phasic vs tonic synapses, because it is not well founded, as the reviewer pointed out. Specifically, in the abstract (Line 13~):

      “Synaptic vesicles dock and fuse at the presynaptic active zone (AZ), the specialized site for transmitter release. AZ proteins play multiple roles such as recruitment of Ca2+ channels as well as synaptic vesicle docking, priming and fusion. However, the precise role of each AZ protein type remains unknown. In order to dissect the role of RIM-BP2 at mammalian cortical synapses having low release probability, we applied direct electrophysiological recording and super-resolution imaging to hippocampal mossy fiber terminals of RIM-BP2 KO mice. By using direct presynaptic recording, we found the reduced Ca2+ currents. The measurements of EPSCs and presynaptic capacitance suggested that the initial release probability was lowered because of the reduced Ca2+ influx and impaired fusion competence in RIM-BP2 KO. Nevertheless, larger Ca2+ influx restored release partially. Consistent with presynaptic recording, STED microscopy suggested less abundance of P/Q-type Ca2+ channels at AZs deficient in RIM-BP2. Our results suggest that the RIM-BP2 regulates both Ca2+ channel abundance and transmitter release at mossy fiber synapses.”

      Intro:

      Line 48: consider adding Butola et al., 2021 /endbulb of Held to reference which concurs on the notion made for Calyx. However, a contrasting finding was made for another synapse with tight coupling: RBP2 deletion did not alter tight coupling in hair cells (Krinner et al., 2017). Line 51: RBP-DKO/lack of additional effect of RBP1 deletion: suggest adding Krinner et al., 2021 to reference, which concurs with the notion made for hair cells.

      We cited Butola et al., 2021 (Line 49) and Krinner et al., 2021 (Line 52), as the reviewer suggested.

      Results:

      STED microscopy: I am concerned with two aspects of the analysis/presentation. I) I recommend replacing density with abundance as the authors do not resolve single channels. II) I appreciate the note of caution about the fact that STED nanoscopy due to the non-linear nature of the depletion process should/could not be easily used to quantify copy numbers based on immunofluorescence. I would recommend the authors perform 2D Gaussian fitting to at least the Cav2.1 immunofluorescent spots neighboring Munc13-1 spots and report the short and long axis estimates as well as potentially the area. Should the authors have confocal Cav2.1 and Cav2.2 immunofluorescent data co-acquired with STED of Munc13-1, this would be very valuable additional information, but I do not think the experiment is essential for the sake of publication if it was not done already, given the large body of high-quality physiology data.

      I) We have changed the term from density to abundance as the reviewer suggested throughout the manuscript.

      II) As the reviewer suggested, we have carried out 2D Gaussian fitting of Cav2.1 spots. The length, width, and area of Cav2.1 clusters in the AZ were not different between WT and RIM-BP2 KO terminals (Line 431-433, Figure 7-figure supplement 4). The spatial resolution of STED, especially at mossy fiber synapses in the tissue, and a small difference between WT and KO (~30 % expected from electrophysiology) could prevent detection of the difference, unlike ribbon synapses and fly NMJ where release sites and Ca channel clusters are well defined. We should also note that the intensity was calculated similar to previous studies (integral of signal intensity, Krinner et al., 2017), and not absolute peak intensity.  

      As the reviewer suggested, we have added confocal data (Line 434-436, Figure 7-figure supplement 5). We have determined the AZ area from the Munc13-1 STED data, and Munc13-1, Cav2.1 and Cav2.2 intensities were quantified. As shown in the figure, only Cav2.1 intensity was reduced in KO, consistent with the STED data.

      Nevertheless, we should be cautious about interpretation of the intensity as the reviewer suggested, and are aware that the data are just consistent with electrophysiology. From imaging, we only see a qualitative rather than quantitative difference between WT and KO.

      Discussion:

      I think the focus on alterations of presynaptic Ca channels could be further strengthened along with the discussion of the relevant previous studies.

      Thank you for the suggestion. We have added a paragraph as shown below in the discussion (Line 531~).

      “By using direct presynaptic patch clamp recordings, we here observed a decrease of Ca2+ current amplitudes (~30%) in RIM-BP2 KO mice (Fig. 1). Consistently, STED microscopy supported reduced abundance of P/Q-type Ca2+ channels (Cav2.1) in the mutant mossy fiber terminal (Fig. 7). Interestingly, this observation is similar to that at Drosophila NMJ and hair cell synapses (Liu et al., 2011; Krinner et al., 2017), but not that at other synapses (Acuna et al., 2015; Grauel et al., 2016; Butola et al., 2021), suggesting that the functional role of RIM-BP2 in recruiting Ca2+ channels differs among synapse types.”

      Reviewer #2 (Recommendations For The Authors):

      Minor questions:

      1) The title is misleading as it only shows RIM-BP2 regulates CaV expression but not clustering.

      This has been pointed out by the 1st reviewer, too. We have adopted the term “abundance” as suggested by the 1st reviewer and changed to “RIM-BP2 regulates Ca2+ channel abundance and neurotransmitter release at hippocampal mossy fiber terminals.”

      2) Figure 7 legend. Again, RIM-BP2 only changes the intensity of CaV2.1 clusters but not the density.

      Changed Figure 7 title from “RIM-BP2 deletion alters the density …” to “RIM-BP2 deletion alters the signal intensity …”.

      3) Line 31: "Ca2+ influx through voltage-gated Ca2+ channels triggers neurotransmitter release from synaptic vesicles within a millisecond" is not correct. Ca-evoked transmitter release can only occur with such fast speed at very specialized synapses such as the calyx of Held but not at general chemical synapses.

      We changed “within a millisecond” to “within milliseconds” (Line 30).

      4) Line 44-46: In Drosophila NMJs and at Drosophila NMJs are redundant.

      We eliminated “at Drosophila NMJs”.

      5) The authors should use the verb tense consistently throughout the manuscript such as "In RIM-BP1,2 DKO mice, the coupling between Ca2+ channels and synaptic vesicles became loose, and action potential-evoked neurotransmitter release was reduced at the calyx of Held synapse (Acuna et al., 2015). At hippocampal CA3-CA1 synapses, RIM-BP2 deletion alters Ca2+ channel localization at the AZs without altering total Ca2+ influx. Besides, RIM-BP1,2 DKO has no additional effect...".

      We changed verb tenses in Line 46-49, Line 55-58, and Line 62-67. We also checked the ms once more. Thank you for pointing this out.

      6) Line 59: technically difficulty should be technical difficulty.

      Fixed.

      7) Figure 4A-B are representative traces of 0.5 mM EGTA (black) or 5 mM EGTA (red) recorded from the same terminals or from different terminals but simply superimposed?

      The representative traces were recorded from different terminals. We now state this in the figure legend (Fig 4A). We apologize for the confusion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (public):

      1) “It is unclear whether new in vivo experiments were conducted for this study”.

      All in vivo experiments were conducted for this study by using previously published fly stocks to directly compare N- and C-terminal shedding side-by-side in two Hh-dependent developmental systems. This is now clearly stated in the revised supplement (Fig. S8). We also conducted these experiments because previous in vivo studies in flies often relied on Hh overexpression in the fat body, raising questions about their physiological relevance. Our in vivo analyses of Hh function in wing and eye discs are more physiologically relevant and can explain the previously reported presence of non-lipidated bioactive Hh in disc tissue (PMID: 23554573).

      2) “A critical shortcoming of the study is that experiments showing Shh secretion/export do not include a Shh(-) control condition. Without demonstration that the bands analyzed are specific for Shh(+) conditions, these experiments cannot be appropriately evaluated”.

      The Cell Signaling Technology C9C5 anti-Shh antibody used in our study is highly specific for Shh, and it has been used in over 60 publications. C9C5 even lacks cross-reactivity with the highly similar Ihh or Dhh (https://www.cellsignal.com/products/primary-antibodies/shh-c9c5-rabbit-mab/2207?_requestid=1528451). We confirmed C9C5 specificity repeatedly (one example is shown below; another quality control that includes media of mock-transfected cells is now shown in Fig. S1) and never observed nonspecific bands under any experimental condition. As shown below, C9C5 and R&D AF464 anti-Shh antibodies (the latter were previously used in our lab) detect the same bands.

      Author response image 1.

      Shh immunoblot. R&D 8908-SH served as a size control for full-length dual-lipidated Shh, and C25S;26-35Shh served as a size control for N-terminally truncated monolipidated Shh. Both C25SShh bands are specific: One represents the full-length protein and the bottom band represents N-truncated processed proteins. The blot was first incubated with antibody AF464 and reincubated (after stripping) with the much more sensitive antibody C9C5.

      3) “A stably expressing Shh/Hhat cell line would reduce condition to condition and experiment to experiment variability”.

      We agree and therefore have previously aimed to establish stable Hhat-expressing cell lines. However, we found that long-term Hhat overexpression eliminated transfected cells after several passages, or cells gradually ceased to express Hhat. This prevented us from establishing stable cell lines co-expressing Shh/Hhat despite several attempts and different strategies. Instead, we established transient co-expression of Shh/Hhat from the same mRNA as the next-best strategy for reliable near-quantitative Shh palmitoylation in our assays.

      4) “Unusual normalization strategies are used for many experiments, and quantification/statistical analyses are missing for several experiments”.

      We repeated all qPCR assays to eliminate this shortcoming. Biological activities and transcriptional responses of palmitoylated Shh and non-palmitoylated C25AShh are now directly compared and quantified (revised Fig. 4A,B, newly included Fig. 6, revised Fig. S5B). The original comparison of both proteins with dual-lipidated R&D 8908-SH is still important in order to show that both Shh and C25AShh in serum-containing media have equally high, and not equally low, activities because R&D 8908-SH is generally seen as the Shh form with the highest biological activity. These comparisons are therefore still discussed in the main manuscript text and are now shown in Fig. S5E.

      5) “The study provides a modest advance in the understanding of the complex issue of Shh membrane extraction”

      We believe that the revised manuscript advances our understanding of Shh membrane extraction beyond the modest in three important ways. First, although Disp was indeed known as a furin-activated Hh exporter, our findings show for the first time that furin activation of Disp is strictly linked to proteolytic Shh processing as the underlying release mode, fully consistent with data obtained from the Disp-/- cells.

      Second, Scube2 was known as a Shh release enhancer and several lipoproteins were previously shown to play a role in the process, but our findings are the first to show that synergistic Disp/Scube2 function depends on the presence of lipoprotein and that HDL (but no other lipoprotein) accepts free cholesterol or a novel monolipidated Shh variant from Disp. This challenges the dominant model of Scube2 chaperone function in Hh release and transport (PMID 22902404, PMID 22677548, PMID 36932157).

      Third, we show that this Shh variant is fully bioactive, despite the lack of the palmitate. Therefore, N-palmitate is dispensable for Shh signaling to Ptch1 receptors, but only if the morphogen is released by, and physically linked to, HDL. In contrast, previously published studies analyzed monolipidated Shh variants in the absence of HDL, resulting in variably reduced bioactivity of these physiologically irrelevant forms. Therefore, our findings challenge the current dominating model of N-palmitate-dependent Shh signaling to Ptch1 (this model also does not postulate any role for lipoproteins, PMID 36932157) and essential roles of N-palmitate (stating that the N-palmitate is sufficient for signaling, PMID 27647915).

      Reviewer 2 (public):

      1) “However, the results concerning the roles of lipoproteins and Shh lipid modifications are largely confirmatory of previous results, and molecular identity/physiological relevance of the newly identified Shh variant remain unclear”.

      We disagree with this assessment on several points. First, our findings do not confirm, but strongly challenge, the current dogma of Disp-mediated handover of dual-lipidated Shh to Scube2 as a soluble acceptor (instead of to HDL, PMID 36932157). Second, we report three new findings: Disp, Scube2, and lipoproteins all interact to specifically increase N-terminal Shh shedding, whereas C-terminal shedding is optional; Disp function depends on the presence of HDL; and HDL modulates Shh shedding (dual Shh shedding in the absence of HDL versus N-shedding and HDL association in its presence). Our work also directly determines the molecular identity of a previously unknown Shh variant as monolipidated (by RP-HPLC), HDL associated (by SEC and density gradient centrifugation), and fully bioactive (in two cell-based reporter assays).

      Third, regarding the physiological relevance of our findings: Fig. S8 demonstrates that deletion of the N-terminal sheddase target site of Hh abolishes all Hh biofunction in Drosophila eye discs and wing discs, which strongly supports physiological relevance of N-terminal Hh shedding during release. N-terminal shedding is further consistent with in vivo findings of others. These studies showed that artificial monolipidated Shh variants (C25SShh and ShhN) generate highly variable loss-of-function phenotypes in vivo, but can also generate gain-of-function phenotypes if compared with the dual-lipidated cellular protein [1, 2, 3, 4, 5]. These observations are difficult to align with the dominating model of essential N-palmitate function at the level of Ptch1 (PMID 36932157), because the lack of N-palmitate is expected to always diminish signaling in all tissue contexts and developmental stages. Our finding that dual-lipidated Shh is strictly released in a Disp/Scube2-controlled manner from producing cells, while artificial monolipidated Shh variants leak uncontrolled from the cellular surface, explains these seemingly paradoxical in vivo findings much better. This is because uncontrolled Shh release can increase Shh signaling locally (when physiological release would normally be prevented at this site [6] or time), while it can also decrease it (for example, in situations requiring timed pulses of Shh release and signaling [7, 8, 9, 10, 11]). This is discussed in our manuscript (Discussion, first paragraph).

      2) The molecular properties of the processed Shh variants are unclear – incorporation of cholesterol/palmitate and removal of peptides were not directly demonstrated…

      We also disagree on this point. Our study is the only one that uses RP-HPLC and defined controls (dual-lipidated commercial R&D 8908-SH, dual-lipidated cellular proteins eluting at the same positions, non-lipidated or monolipidated controls, Fig. S1F-K) to compare the lipidation status of cellular and corresponding solubilized Shh and to determine their exact lipidation status (Figs. 1, 3, 5, Figs. S4, S6, S7). Co-expressed Hhat assures full Shh palmitoylation during biosynthesis (as shown in original Figs. 1A and S2F-K & S4A and as confirmed by R&D 8908-SH) as an essential prerequisite to reliably conduct and interpret these analyses. The removal of peptides is demonstrated by the increase in electrophoretic mobility of soluble forms, if compared with their dual-lipidated cellular precursor, because chemical delipidation results in a decrease in electrophoretic mobility in SDS-PAGE (as discussed in detail in [12], which we now cite in our work).

      3) This (N-terminal palmitoylation status) is particularly relevant …, as the signaling activity of non-palmitoylated Hedgehog proteins is controversial.

      We agree with this comment and are aware of the published data. However, in our work, we have demonstrated strong signaling activities by using C25AShh mutants that are fully impaired in their ability to undergo N-palmitoylation (Fig. 4, Fig. S5). These are highly bioactive if associated with HDL. Therefore, we do not see any ambiguity in our findings and suggest that the reports of others resulted from different experimental conditions.

      4) A decrease in hydrophobicity is no proof for cleavage of palmitate, this could also be due to addition of a shorter acyl group.

      As shown in the original manuscript, we have controlled for this possibility: RP-HPLC was established by using defined controls (dual-lipidated, non-lipidated, or monolipidated, Fig. S1F-K and corresponding color coding). Because the cellular Shh precursor prior to release was always dual-lipidated, whereas the soluble form was not, lipids were clearly lost during release (because a decrease in the hydrophobicity of soluble proteins is always shown relative to that in their dual-lipidated cellular precursors). The increase in electrophoretic mobility detected for the very same proteins in SDS-PAGE demonstrates delipidation during their release (please see our reply to point 2 above). Finally, the suggested possibility of palmitate exchange for shorter acyls during Shh release at the cell surface is extremely unlikely, as there is no known machinery to catalyze this exchange at the plasma membrane. Hh acylation only occurs in the ER membrane via Hhat 13.

      5) “It would be important to demonstrate key findings in cells that secrete Shh endogenously”.

      We now show that Panc1 cells release endogenous Shh in truncated form, as our transfected cells do (Fig. S1). Moreover, the experimental data shown in Fig. S8B demonstrate that engrailed-controlled expression of sheddase-resistant Hh variants in wing disc cells completely blocks endogenous Hh produced in the same cells by stalling Disp-mediated morphogen export. Both findings strongly support our key finding that N-processing is not optional but absolutely required to finalize Hh release.

      6) Co-fractionation of Shh and ApoA1 is not convincing, as the two proteins peak at different molecular weights…. The authors could use an orthogonal approach, optimally a demonstration of physical interaction, or at least fractionation by a different parameter

      Shifted Shh peaks upon physiologically relevant Shh transfer via Disp to HDL must be expected in SEC, because Shh association with HDL subfractions increases their size. Comparing relative peaks of Shh-loaded HDL with Shh-free reference HDL suggests 10-15 Shh molecules per HDL (adding 200-300 kDa to its molecular mass). This is now stated in the revised manuscript (page 10, line 2).
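
      The size estimate above is simple arithmetic, and a minimal sketch may make it easier to check. It assumes a mass of roughly 20 kDa per bound Shh molecule (a rounding of the 19 kDa signaling-active protein; the value and the function name are illustrative assumptions, not taken from the manuscript).

```python
# Assumption: each bound Shh molecule contributes about 20 kDa
# (rounded from the ~19 kDa signaling-active protein).
SHH_MASS_KDA = 20.0

def added_mass_kda(n_shh, shh_mass_kda=SHH_MASS_KDA):
    """Mass added to one HDL particle by n_shh bound Shh molecules."""
    return n_shh * shh_mass_kda

# Range quoted in the response: 10-15 Shh molecules per HDL particle.
low, high = added_mass_kda(10), added_mass_kda(15)
print(f"added mass: {low:.0f}-{high:.0f} kDa")
```

      Under this assumption, 10 to 15 molecules add 200 to 300 kDa, which is the range quoted for the shift of the Shh-containing HDL peak.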

      Still, to further support direct Shh/HDL association, we analyzed high molecular weight Shh SEC fractions by subsequent RP-HPLC. This approach confirms direct physical interactions between cholesteroylated Shh and HDL (now shown in Fig. S6G).

      We support this possibility further by density gradient centrifugation, again demonstrating that Shh and HDL interact physically (now shown in Fig. S6 E,F).

      Recommendations from the reviewing editor:

      1) “The authors should certainly tone down statements of novelty because much of the work is confirmatory in nature”

      We followed this request in our revised manuscript and now clearly point out what was known and what we add to the concept of Disp and lipoprotein-mediated Hh export. Still, as outlined in our response to reviewer 2, our findings align with only one previously published model of lipoprotein-mediated Hh transport, while they do not support the most current models of Disp-mediated handover of dual-lipidated Shh to Scube2 (PMID 36932157) and essential signaling roles of N-palmitate at the level of the receptor Ptch1. Thus, our work should not be viewed solely as confirmatory of one of the many previous models, because at the same time it also contradicts the other models of Hh solubilization and transport.

      2) “Inclusion of the Shh(-) control”

      Please see our reply to reviewer 1 above. The Cell Signaling Technology C9C5 anti-Shh antibody used in our study is highly specific against Shh. We also carefully characterized the C9C5 antibody before any of the experiments shown in our work had been initiated. We never observed any unspecific C9C5 reactivity that otherwise would – of course – have prevented us from switching to this antibody from the AF464 antibodies that we had previously used. Consistent C9C5 antibody specificity is evident from the representative example shown below that was recently produced in our lab: no cellular proteins or TCA-precipitated serum-depleted media components from mock-transfected cells (left two lanes) react with C9C5.

      Author response image 2.

      Top left: C9C5 detects the cellular 45 kDa Shh precursor and the 19 kDa signaling-active protein. No unspecific signals are detected in untransfected cells and supernatants of such cells (left two lanes). Right: Loading control on the stripped blot.

      3) “Clean up how the data are normalized for quantification”

      Please see our reply to reviewer 1 above. Normalization has been changed for the indicated figures. We also repeated qPCR analyses and added new ones to the manuscript that include required controls. We also changed figure outlines in accordance with the request.

      4) “The issue of a non-specific band of this Shh antibody is critical”

      Please see our replies above. In our hands, unspecific C9C5 antibody binding was never observed.

      5) “Regarding experimental rigor, I would add that the HPLC … should just show the real data points”

      We agree and added individual data points to our revised manuscript.

      Recommendations for the authors:

      1) I would like to see the controls in the same figure with the experimental results.

      We show antibody specificity controls together with released Shh in Fig. S1.

      2) Figure 2 confirms previously published results. It was shown in PMC5811216 that Disp processing by furin is required for Shh release from producing cells.

      Indeed, it was shown that furin processing of Disp increases Shh release (supposedly together with lipids), but we show here that furin-activated Disp specifically mediates proteolytic Shh shedding and loss of lipids – which is not the same. Indeed, we show this finding because we interpret it the other way around: Because it is known that furin activation of Disp increases Shh release by some means (PMC5811216), our observation that furin-mediated Disp activation specifically increases Shh shedding independently supports our model.

      3) Figure 3: it is stated that there is no increase in Shh release into the media…

      We removed this statement.

      4) Figure S5: Scale bars are missing.

      We added scale bars to the figures.

      5) Figure 4: A direct comparison between wt Shh and C25A conditioned media for qPCR is needed.

      We agree and repeated all experiments. Results confirm our previous findings and are shown in revised Fig. 4 and in Fig. S5.

      6) What other components can be examined in addition to ApoA1 as a marker for HDL? Why is the Shh peak shifted to the left? What about exovesicles?

      We also detected ApoE4, a mobile lipoprotein present on expanding (large) HDL (Figs. 5, 6, Figs. S6, 7) 14. We also used density gradient centrifugation to support the Shh/HDL association. Regarding the leftwards Shh size shift relative to the major HDL peak in SEC, please refer to our explanation above – if loaded with Shh, a size increase of the respective HDL subfraction is expected. Finally, we did not test the role of exovesicles in our assays. However, due to their large size (60-120 nm, compared with 7-12 nm for HDL), Shh associated with exovesicles should have eluted in the void volume of our gel filtration column. This we never observed.

      7) Why is osteoblast differentiation used?

      C3H10T1/2 osteoblast differentiation is strongly driven by Ihh and Shh activity and is established as a sensitive and robust assay. Still, following this reviewer’s advice, we conducted qPCR assays on these cells and in addition on NIH3T3 cells to support our findings.

      Finally, we corrected all minor mistakes regarding spelling and figure labeling. We also improved the readability of the revised manuscript, as suggested by reviewer 2.

      References

      1. Gallet A, Ruel L, Staccini-Lavenant L, Therond PP. Cholesterol modification is necessary for controlled planar long-range activity of Hedgehog in Drosophila epithelia. Development 133, 407-418 (2006).

      2. Porter JA, et al. Hedgehog patterning activity: role of a lipophilic modification mediated by the carboxy-terminal autoprocessing domain. Cell 86, 21-34 (1996).

      3. Lewis PM, et al. Cholesterol modification of sonic hedgehog is required for long-range signaling activity and effective modulation of signaling by Ptc1. Cell 105, 599-612 (2001).

      4. Huang X, Litingtung Y, Chiang C. Region-specific requirement for cholesterol modification of sonic hedgehog in patterning the telencephalon and spinal cord. Development 134, 2095-2105 (2007).

      5. Lee JD, et al. An acylatable residue of Hedgehog is differentially required in Drosophila and mouse limb development. Dev Biol 233, 122-136 (2001).

      6. Corrales JD, Rocco GL, Blaess S, Guo Q, Joyner AL. Spatial pattern of sonic hedgehog signaling through Gli genes during cerebellum development. Development 131, 5581-5590 (2004).

      7. Cordero D, Marcucio R, Hu D, Gaffield W, Tapadia M, Helms JA. Temporal perturbations in sonic hedgehog signaling elicit the spectrum of holoprosencephaly phenotypes. J Clin Invest 114, 485-494 (2004).

      8. Dessaud E, et al. Interpretation of the sonic hedgehog morphogen gradient by a temporal adaptation mechanism. Nature 450, 717-720 (2007).

      9. Garcia-Morales D, Navarro T, Iannini A, Pereira PS, Miguez DG, Casares F. Dynamic Hh signalling can generate temporal information during tissue patterning. Development 146, (2019).

      10. Harfe BD, Scherz PJ, Nissim S, Tian H, McMahon AP, Tabin CJ. Evidence for an expansion-based temporal Shh gradient in specifying vertebrate digit identities. Cell 118, 517-528 (2004).

      11. Nahmad M, Stathopoulos A. Dynamic interpretation of hedgehog signaling in the Drosophila wing disc. PLoS Biol 7, e1000202 (2009).

      12. Ehring K, et al. Conserved cholesterol-related activities of Dispatched 1 drive Sonic hedgehog shedding from the cell membrane. J Cell Sci 135, (2022).

      13. Coupland CE, et al. Structure, mechanism, and inhibition of Hedgehog acyltransferase. Mol Cell 81, 5025-5038 e5010 (2021).

      14. Sacks FM, Jensen MK. From High-Density Lipoprotein Cholesterol to Measurements of Function: Prospects for the Development of Tests for High-Density Lipoprotein Functionality in Cardiovascular Disease. Arterioscler Thromb Vasc Biol 38, 487-499 (2018).

    1. Author Response

      The following is the authors’ response to the previous reviews

      The revised manuscript is much improved - many unclear points are now better explained. However, in our opinion, some issues could still be significantly improved.

      1. Statistics: none of us are experts in statistics but several things remain questionable in our opinion and if it were our study, we would consult with an expert:

      a) while we understand the authors note about N-chasing and p-hacking, we wonder how the number of N's was premeditated before obtaining the results. Why in 4M an N of 3 is sufficient while in 3E the N is >20 (and not mentioned). At the very least, we think it would be wise to be cautious when stating something as not-significant when it is clear (as in 4M) that the likelihood of it actually being statistically significant is quite large.

      b) In most analyses, the data is not only normalized by actin or some other measure but also to the first (i.e left side on the graph) condition, resulting in identical data points that equal '1' (in Figure 4 alone - C; I; K; M; and O) - while this might be scientifically sound, it should be mentioned (the specific normalization) and also note that this technique shadows any real variance that exists in the original data in this condition. consider exploring techniques to overcome this issue.

      c) In 3C, - if we understand the experiment, you want to convince us that the DIFFERENCE between eB2-FC compared to FC is larger in the control compared to the experiment. We are not absolutely sure that the statistical tools employed here are sufficient - which is why we would consult an expert.

      A) We are aware that many studies do not consistently quantify such experiments. For example, there are essentially no published examples of the signalling timelines of EphB2 receptors as in Fig. 5. Because we strive to quantify such biochemical effects, an unquantified experiment stands out, and so perhaps we were too strict by trying to quantify as many experiments as possible, resulting in low n’s for some of them. We acknowledge that additional experiments on EPHB1 protein stability may reach significance. We have adjusted our text on lines 332-335 to point to this interesting trend, and slightly changed the conclusion to this section. Similarly, we commented on similar trends when describing Figs. 1E and 4G on lines 901 and 952.

      B) For the Western blot band intensity normalisation, we believe that our method is scientifically sound. Normally, when the replicate samples are loaded on one gel and blotted on the same membrane, the experimenter only needs to normalise the target band intensity to its cognate loading control band intensity for quantitation. However, we usually have a large number of samples from multiple experiments, carried out on different dates. For example, in Fig. 4B,C there are 7 biological replicates collected from 7 experiments and in Fig. 4D there are 10 protein samples. It is not possible for us to run all samples on the same gel. In addition, due to the combined effects of variance in transfer efficiency, the potency of antibodies, detection efficiency and the developing time for each blot, it is practically impossible to generate similar band intensity for each batch. Thus, we use normalisation of test bands to the loading control for individual experiments, and this analysis method is widely accepted by reputable journals with a focus on biochemical experiments (for example: PMID 37695914: Fig. 3 A,B,C; PMID 36282215: Fig. 3 B,C,D,E; PMID 33843588: Fig. 3 C,D,E,F,G,H). Since the value of the first sample on the plot is 1, which is a hypothetical value and does not meet the parametric test requirement, we performed a one-sample t-test when other samples were compared with the first sample (PMID 35243233 Fig. 6 A,B,C,D; https://www.graphpad.com/quickcalcs/oneSampleT1/, “A one sample t-test compares the mean with a hypothetical value. In most cases, the hypothetical value comes from theory. For example, if you express your data as 'percent of control', you can test whether the average differs significantly from 100.”). Thus, we believe that our normalisation and statistical methods are both correct and supported by a large number of precedents.
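
      To make the one-sample approach concrete, the sketch below computes the t statistic for a set of ratios that were each normalised to their own control (so the control is fixed at 1). The ratio values and the function name are hypothetical illustrations, not data from the manuscript.

```python
import math
import statistics

def one_sample_t(values, hypothetical_mean=1.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation (n - 1).
    sem = statistics.stdev(values) / math.sqrt(n)
    return (mean - hypothetical_mean) / sem, n - 1

# Hypothetical band-intensity ratios (test / control) from three blots.
ratios = [0.8, 0.7, 0.9]
t_stat, dof = one_sample_t(ratios)
print(f"t = {t_stat:.3f}, df = {dof}")
```

      A two-sided p value then follows from the t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_1samp(ratios, 1.0)` if SciPy is available. As the reviewers note, the variance of the control condition itself is not represented in this test.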

      C) This comment refers to the cell collapse experiment shown in Fig. 3C for which the data are plotted in Fig. 3D. We stand by the statistical method used. There are two groups of cells (CTRLCRISPR and MYCBP2 CRISPR) and two treatments for each cell group (Fc control and eB2), thus we should use two-way ANOVA. Since we compared the cell retraction effects of Fc and eB2 on the two groups of cells, Sidak post hoc comparison is the right method to avoid errors introduced by multiple comparisons. Here is an example of an eLife article that used the same statistical method for similar comparisons: PMID 37830910, Fig. 1 H,I. To make the comparison easier, we grouped the experiments by cell type (CTRLCRISPR and MYCBP2 CRISPR) as opposed to by treatment. Below, the old version is on the right, and the new version is on the left. The conclusion is that eB2 induces less cell collapse in cells depleted of MYCBP2, when compared to the control cells. However, eB2 is still able to collapse cells lacking MYCBP2.
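
      For readers unfamiliar with the Sidak correction mentioned above, the following minimal sketch shows the adjustment itself. It does not reproduce the two-way ANOVA, and the number of comparisons used here is an illustrative assumption.

```python
def sidak_adjusted_p(p_value, n_comparisons):
    """Sidak-adjusted p value for one of n_comparisons independent tests."""
    return 1.0 - (1.0 - p_value) ** n_comparisons

def sidak_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold that keeps the family-wise rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)

# Illustrative case: two post hoc comparisons (Fc versus eB2 within each cell group).
print(sidak_adjusted_p(0.01, 2))
print(sidak_alpha(0.05, 2))
```

      The adjustment grows with the number of comparisons, which is why the family of comparisons should be fixed before the analysis.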

      Author response image 1.

      Revisiting these data, we noticed an error introduced when CC compiled the data used to generate Fig. 3D. The data were acquired from nine biological replicates per condition. CC used a mix of two methods for cell collapse rate calculation: the first method involved the sum of collapsed cells and all cells from multiple regions of one coverslip (biological replicate). The second method involved computing a collapse rate in each region which then was used to calculate the average collapse rate for the entire coverslip (technical replicate). Given the small cell numbers due to sparse culture conditions, we believe that the first method is a more conservative approach. We hence re-plotted all replicate data using the first method. This resulted in slightly different % collapse and p values. These were changed accordingly in the text and plot and do not affect the conclusion of this experiment.
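
      The two calculation methods described above can give different values for the same coverslip when regions contain few cells. The sketch below illustrates this with hypothetical counts; the function names are illustrative only.

```python
def pooled_collapse_rate(regions):
    """First method: sum collapsed cells and all cells across regions, then divide."""
    collapsed = sum(c for c, _ in regions)
    total = sum(t for _, t in regions)
    return collapsed / total

def mean_of_region_rates(regions):
    """Second method: compute a rate per region, then average the rates."""
    return sum(c / t for c, t in regions) / len(regions)

# Hypothetical coverslip: (collapsed cells, all cells) in each imaged region.
regions = [(1, 2), (9, 30)]
print(pooled_collapse_rate(regions))   # weights every cell equally
print(mean_of_region_rates(regions))   # weights every region equally
```

      In this example the sparse region (two cells) has a large influence on the second method but little on the first, which is the motivation for pooling when cultures are sparse.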

      2) thanks for the clarification that the interaction between the extracellular domain of EPHB2 and MYCBP2 might not occur directly - however, unless we missed this it was not clearly stated in the text. It is an important point and also a cool direction for the future - to find the elusive co-receptor that actually helps EPHB2 and MYCBP2 form a complex.

      We now also refer to this in the results section on line 215.

      “Since EPHB2 is a transmembrane protein and MYCBP2 is localised in the cytosol, these experiments suggest that the interaction between the extracellular domain of EPHB2 and MYCBP2 might be indirect and mediated by other unknown transmembrane proteins.”

      3) The Hela CRISPR cell line is better explained in the response letter but still not sufficiently explained in the text for a non-expert reader. If the authors want any reader to comprehend this, we would strongly recommend adding a scheme.

      We now include a schematic outlining the CRISPR cell generation as Fig. 3A and its description on line 926.

      Author response image 2.

      4) To clarify some of our previous (and persisting) concerns about Figure 3D/E - it is true that a reduction in 25% of cell size is dramatic. But (if we understand correctly) your claim is that a reduction in 22% (this is a guess, as the actual numbers are not supplied) is significantly less than 25%. Even if it is, statistically speaking, significant, what is the physiological relevance of this very slight effect? In this experiment, the N was quite large, and we wonder if the images in D are representative - it would be nice to label the data points in E to highlight which images you used.

      We now mention the average cell area contraction measurements in the legend to Fig. 3F on line 935. We also tracked down the individual cells shown in Fig. 3E and they are now labelled as data points in blue in Fig. 3F. HeLa cell collapse is a simplified model of EPHB2 function and we do not know whether the difference between the behaviour of CTRLCRISPR and MYCBP2 CRISPR cells is physiologically significant and thus we prefer not to speculate on this.

      5) Figure 3F and other stripe assays - In the end, it is your choice how to quantify. We believe that quantifying area of overlap is a more informative and objective measurement that might actually benefit your analyses. That said, if you do keep the quantification as it is now, you have to define the threshold of what you mean by "cell/s (or an axon in 7A, where it is even more complicated as you may be alluding to primary, secondary, or even smaller branches) are RESIDING within the stripe". Is 1% overlap sufficient or do you need 10 or 50% overlap?

      We have now added this statement to the methods on line 745: “A cell was considered to be on an ephrin-B2 stripe when more than 50% of its nucleus was located on that stripe”. For the chick explant stripe assay, when measuring the length of an axon on a stripe, we only measured the main axons originating from the explants.

      For explant/stripe experiments in Fig. 7 AB, we now use the term “GFP-expressing neurite” rather than “branch”. This was already present in the results of the previous version, but the methods and legend needed to be brought up to date (lines 786 and 1008). We think that “branch” was a confusing term that was supposed to mean the same thing as “neurite” but came across as some indication of branching. We do not know whether the GFP+ neurites were primary or secondary extensions of explants, or in fact, whether some of them contained more than one axon. We also adjusted the method to reflect the fact that some stripes were used in conjunction with a single explant and added a reference to a previous study extensively using this method (Poliak et al., 2015) on line 778.

      6) We still don't get the link to the lysosomal degradation. Your data suggests that in your cells EPHB2 is primarily degraded by the lysosomal pathway and not proteasome. Any statement about MYCBP2 is not strongly supported by the data, in our opinion - Unless you develop some statistical measurement that shows that the effect of BafA1 is statistically different in MYCBP2 cells than in control cells. Currently, this is not the case and the link is therefore not warranted in our opinion.

      We generated a new version of Fig. 4K with average increase in EPHB2 levels in the presence of BafA1 and CoQ, compared to DMSO treated controls (see below). BafA1 and CoQ restored EPHB2 protein levels by 19% and 14% respectively in CtrlCRISPR cells, while the inhibitors restored EPHB2 protein levels by 40% and 35% respectively in MYCBP2 CRISPR cells.

      Author response image 3.

      For each of the 4 replicates, the increase in EPHB2 levels by BafA1 compared to DMSO is as follows:

      Author response table 1.

      These values are not significantly different between CtrlCRISPR cells and MYCBP2 CRISPR cells (p = 0.08, Student’s t-test). The same holds for the CoQ experiment. We now temper our conclusion for this experiment: Although the difference in percentage increase between CTRLCRISPR cells and MYCBP2CRISPR cells is not significant, this trend raises the possibility that the loss of MYCBP2 promotes EPHB2 receptor degradation through the lysosomal pathway (line 319). We also adjusted the section title (line 306).

      7) While the C. elegans part is now MUCH better explained - we are not sure we understand the additional insight. The fact that vab-1 and glo4 double mutants are additive as are vab1 and fsn1, suggest they act in parallel (if the mutants are NULL, and not if they are hypomorphs, if one wants to be accurate) - how this relates to your story is unclear. The vab1/rpm1 double mutant is still uninformative and incomplete. rpm1 phenotype is so severe that nothing would make it more severe. We read the Jin paper that the authors directed to - nothing makes the rpm1 phenotype more severe. Yes, some DOWNSTREAM elements make the rpm1 phenotype LESS severe - this is not something you were testing, to the best of our knowledge. Rather, you wanted to see if rpm1 mutant resulted in stabilization of vab1 and thus suppression of vab1 phenotype - we are just not sure the system is amenable to test (actually reject) your hypothesis that Vab1 is degraded by rpm1. Also, assuming we are talking about NULLs, the fact that the rpm1 phenotype is WAY stronger than the vab1 mutant, suggests that rpm1 functions via multiple routes, adding even more complexity to the system. Given these results, despite the much improved clarity, we are still not sure that the worm data adds new insight, rather than potentially confusing the reader.

      We realise that the genetic interactions between vab-1 and the RPM-1/MYCBP2 signalling network are complicated. However, we insist on keeping the data for the sake of its availability for future studies and completeness. We also think it is important for readers and the community to see these data, even if the authors and reviewers are not entirely in agreement about the importance/interpretation of experimental outcomes. It is our hope that the community will examine the results and draw their own conclusions.

      A few points of clarification:

      The C. elegans experiments were designed to test genetically if the vertebrate interactions between EPHB2 and MYCBP2 and its signalling network are conserved. We studied two kinds of interactions: (1) between vab-1 and RPM-1/MYCBP2 downstream proteins (GLO-4 and FSN-1) and (2) between vab-1 and rpm-1. For these studies, we used null alleles for vab-1, glo-4 and fsn-1 which is now noted on lines 440, 453, 475 and 859. Our findings are consistent with the VAB-1 Ephrin receptor functioning in parallel to known RPM-1 binding proteins. This is further supported by new data: vab-1; fsn-1 double mutants showed enhanced incidence of axon overextension defects using a second transgenic background, zdIs5 (Pmec-4::GFP), to visualize axon termination (Fig. 8F).

      This second transgenic background also allowed us to generate new data to address your concerns about phenotypic saturation in rpm-1 mutants. To do this, we used the zdIs5 (Pmec-4::GFP) genetic background, in which axon termination defects are not saturated in rpm-1 mutants (Fig. 8F) because they can be enhanced by other mutants such as cdc-42 and unc-33 (Fig. 7C, D, in Borgen et al. Development 144, 4658–4672 (2017), PMID 29084805). In this new background, we found that vab-1 loss of function fails to enhance the incidence of severe “hook” defects in rpm-1 mutants, which indicates that the two genes function in the same pathway. Importantly, prior studies in this background also showed that mutants in the RPM-1 signalling network (e.g. fsn-1, glo-4 and ppm-2) do not enhance the incidence of severe “hook” defects as double mutants with rpm-1 compared to rpm-1 single mutants (Fig. 7B, ibid.).

      To reflect these ideas more clearly, we revised the Results section pertaining to C. elegans genetics (starting on line 418) and tempered our discussion (line 517). Basically, this section now says that we studied genetic interactions between vab-1 and the RPM-1/MYCBP2 signalling network. From these experiments we conclude that: (1) The enhancement of overextension defects in vab-1; glo-4 and vab-1; fsn-1 double mutants compared to single mutants indicates that VAB-1/EPHR functions in parallel to known RPM-1 binding proteins to facilitate axon termination, and (2) Since the vab-1; rpm-1 double mutants do not display an increased frequency or severity of overextension defects compared to rpm-1 single mutants, VAB-1/EPHR functions in the same genetic pathway as RPM-1/MYCBP2.

      The new genetic data included in this version were generated by Karla J. Opperman who is now included as a co-author.

      Further corrections:

      Author response image 4.

      Because of the errors associated with quantifications in Fig. 3D (see above), we reviewed other quantification methodologies and noticed another discrepancy that required a correction. In the hippocampal neuron growth cone collapse assay shown in the previous version of Fig. 7 D (left), the growth cones were classified into three groups: 1, fully collapsed; 2, hard to tell, but not fully collapsed; 3, fan-shape cones. Two different quantifications were performed as follows: (1), number of fully collapsed cones divided by the numbers of all growth cones; (2), number of fully collapsed cones divided by [number of fully collapsed cones + fan-shape cones]. CC erroneously used the second method to generate Fig. 7D.
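
      The difference between the two growth cone quantifications described above comes down to the denominator. A minimal sketch with hypothetical counts follows; the function names are illustrative.

```python
def collapse_fraction_all_cones(collapsed, ambiguous, fan_shaped):
    """Method (1): fully collapsed cones divided by all growth cones."""
    return collapsed / (collapsed + ambiguous + fan_shaped)

def collapse_fraction_excluding_ambiguous(collapsed, ambiguous, fan_shaped):
    """Method (2): fully collapsed cones divided by collapsed plus fan-shaped cones."""
    return collapsed / (collapsed + fan_shaped)

# Hypothetical counts for one condition: 30 collapsed, 20 ambiguous, 50 fan-shaped.
counts = (30, 20, 50)
print(collapse_fraction_all_cones(*counts))
print(collapse_fraction_excluding_ambiguous(*counts))
```

      Dropping the ambiguous class from the denominator always gives a collapse fraction at least as high as method (1), so the two methods are not interchangeable within one plot.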

      We think that the first method is more appropriate. Furthermore, since n=5 for the Fc and eB1-Fc conditions, but n=3 for the eB2-Fc condition, we decided to omit the eB2-Fc condition. The final plot for Fig. 7D is the following:

      Author response image 5.

      Our conclusion still stands that exogenous FBD1 WT overexpression impaired the growth cone collapse mediated by EphB.

    1. Author Response

      Response to Reviewer 1:

      Summary of what the author was trying to achieve: In this study, the author aimed to develop a method for estimating neuronal-type connectivity from transcriptomic gene expression data, specifically from mouse retinal neurons. They sought to develop an interpretable model that could be used to characterize the underlying genetic mechanisms of circuit assembly and connectivity.

      Strengths: The proposed bilinear model draws inspiration from commonly implemented recommendation systems in the field of machine learning. The author presents the model clearly and addresses critical statistical limitations that may weaken the validity of the model such as multicollinearity and outliers. The author presents two formulations of the model for separate scenarios in which varying levels of data resolution are available. The author effectively references key work in the field when establishing assumptions that affect the underlying model and subsequent results. For example, correspondence between gene expression cell types and connectivity cell types from different references are clearly outlined in Tables 1-3. The model training and validation are sufficient and yield a relatively high correlation with the ground truth connectivity matrix. Seemingly valid biological assumptions are made throughout, however, some assumptions may reduce resolution (such as averaging over cell types), thus missing potentially important single-cell gene expression interactions.

      Thank you for acknowledging the strengths of this work. The assumption to average gene expression data across individual cells within a given cell type was made in response to the inherent limitations of, for example, the mouse retina dataset, where individual cell-level connectivity and gene expression data are not profiled jointly (the second scenario in our paper). This approach was a necessary compromise to facilitate the analysis at the cell type level. However, in datasets where individual cell-level connectivity and gene expression data are matched, such as the C. elegans dataset referenced below, our model can be applied to achieve single-cell resolution (the first scenario in our paper), offering a more detailed understanding of genetic underpinnings in neuronal connectivity.

      Weaknesses: The main results of the study could benefit from replication in another dataset beyond mouse retinal neurons, to validate the proposed method. Dimensionality reduction significantly reduces the resolution of the model and the PCA methodology employed is largely non-deterministic. This may reduce the resolution and reproducibility of the model. It may be worth exploring how the PCA methodology of the model may affect results when replicating. Figure 5, ’Gene signatures associated with the two latent dimensions’, lacks some readability and related results could be outlined more clearly in the results section. There should be more discussion on weaknesses of the results e.g. quantification of what connectivity motifs were not captured and what gene signatures might have been missed.

      I value the suggestion of validating the proposed method in another dataset. In response, I found the C. elegans dataset in the references the reviewer suggested below to be a good candidate for this purpose, and I plan to explore this dataset and incorporate findings in the revised manuscript. I understand the concerns regarding the PCA methodology and its potential impact on the model’s resolution and reproducibility. In response, alternative methods, such as regularization techniques, will be explored to address these issues. Additionally, I agree that enhancing the clarity and readability of Figure 5, as well as including a more comprehensive discussion of the model’s limitations, would significantly strengthen the manuscript.

      The main weakness is the lack of comparison against other similar methods, e.g. methods presented in Barabási, Dániel L., and Albert-László Barabási. "A genetic model of the connectome." Neuron 105.3 (2020): 435-445. Kovács, István A., Dániel L. Barabási, and Albert-László Barabási. "Uncovering the genetic blueprint of the C. elegans nervous system." Proceedings of the National Academy of Sciences 117.52 (2020): 33570-33577. Taylor, Seth R., et al. "Molecular topography of an entire nervous system." Cell 184.16 (2021): 4329-4347.

      Thank you for highlighting the importance of comparing our model with others, particularly those mentioned in your comments. After reviewing these papers, I find that our bilinear model aligns closely with the methods described, especially in [1, 2]. To see this, let’s start with Equation 1 in Kovács et al. [2]:

      $$B = X O X^{T}$$

      In this equation, B represents the connectivity matrix, while X denotes the gene expression patterns of individual neurons in C.elegans. The operator O is the genetic rule operator governing synapse formation, linking connectivity with individual neuronal expression patterns. It’s noteworthy that the work of Barabási and Barabási [1] explores a specific application of this framework, focusing on O for B that represents biclique motifs in the C.elegans neural network.

      To identify the operator O, the authors sought to minimize the squared residual error:

      $$\min_{O} \lVert B - X O X^{T} \rVert^{2}$$

      with regularization on O.

      Adopting the notation from our bilinear model paper and using Z to represent the connectivity matrix, the above becomes:

      $$\min_{O} \lVert Z - X O X^{T} \rVert^{2}$$

      Coming back to the bilinear model formulation, the optimization problem, as formulated for the C. elegans dataset where individual neuron connectivity and gene expression are accessible, takes the form:

      $$\min_{A,B} \lVert Z - X A B^{T} Y^{T} \rVert^{2}$$

      where we consider each neuron as a distinct neuronal type. In addition, we extend the dimensions of X and Y to encompass the entire set of neurons in C. elegans, with $X = Y \in \mathbb{R}^{n \times p}$, where $n$ signifies the total number of neurons and $p$ the number of genes. Accordingly, our optimization challenge evolves into:

      $$\min_{A,B} \lVert Z - X A B^{T} X^{T} \rVert^{2}$$

      Upon comparison with the earlier stated equation, it becomes clear that our approach aligns consistently with the notion of $O = AB^{T}$. This effectively results in a decomposition of the genetic rule operator O. This decomposition extends beyond mere mathematical convenience, offering several substantial benefits reminiscent of those seen in the collaborative filtering of recommendation systems:

      • Computational Efficiency: The primary advantage of this approach is its improvement in computational efficiency. For instance, solving for $O \in \mathbb{R}^{p \times p}$ necessitates determining $p^{2}$ entries. In contrast, solving for $A \in \mathbb{R}^{p \times d}$ and $B \in \mathbb{R}^{p \times d}$ involves determining only $2pd$ entries, where $p$ is the number of genes and $d$ is the number of latent dimensions. Assuming the existence of a lower-dimensional latent space ($d \ll p$) that captures the essential variability in connectivity, resolving A and B becomes markedly more efficient than resolving O. Additionally, from a computational system design perspective, inferring the connectivity of a neuron allows for caching the latent embeddings of presynaptic neurons $XA$ or postsynaptic neurons $XB$ with a space complexity of $\mathcal{O}(nd)$. This is significantly more space-efficient than caching $XO$ or $OX^{T}$, which has a space complexity of $\mathcal{O}(np)$. This difference is particularly notable when dealing with large numbers of neurons, such as those in the entire mouse brain. The bilinear modeling approach thus enables effective handling of large datasets, simplifying the optimization problem and reducing computational load, thereby making the model more scalable and faster to execute.

      • Interpretability: The separation into A for presynaptic features and B for postsynaptic features provides a clearer understanding of the distinct roles of pre- and postsynaptic neurons in forming the connection. By projecting the pre- and postsynaptic neurons into a shared latent space through $XA$ and $YB$, one can identify meaningful representations within each axis, as exemplified by the different motifs from the mouse retina dataset. The linear characteristics of A and B facilitate direct evaluation of each gene’s contribution to a latent dimension. This interpretability, offering insights into the genetic factors influencing synaptic connections, goes beyond what O alone could provide.

      • Flexibility and Adaptability: The bilinear model’s adaptability is another strength. Much like collaborative filtering, which can manage very different user and item features, our bilinear model can be tailored to synaptic partners with genetic data from varied sources. A potential application of this model is in deciphering the genetic correlates of long-range projectomic rules, where pre- and post-synaptic neurons are processed and sequenced separately, or even involving post-synaptic targets being brain regions with genetic information acquired through bulk sequencing. This level of flexibility also allows for model adjustments or extensions to incorporate other biological factors, such as proteomics, thereby broadening its utility across various research inquiries into the determinants of neuronal connectivity.
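      The equivalence between the full operator and its factorization, and the savings listed above, can be checked numerically. The following is a minimal sketch in plain Python, not the paper's implementation: the sizes n, p, and d, the random matrices, and the helper functions are all hypothetical, chosen only to show that scoring connections from cached embeddings $XA$ and $XB$ reproduces scoring through $O = AB^{T}$ while storing fewer parameters.

```python
import random


def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def random_matrix(rows, cols, rng):
    """Fill a rows x cols matrix with standard normal draws."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, p, d = 6, 8, 2  # hypothetical sizes: neurons, genes, latent dimensions

X = random_matrix(n, p, rng)  # expression matrix, one row per neuron
A = random_matrix(p, d, rng)  # presynaptic projection (illustrative values)
B = random_matrix(p, d, rng)  # postsynaptic projection (illustrative values)

# Route 1: build the full p x p operator O = A B^T, then score every pair.
O = matmul(A, transpose(B))
Z_full = matmul(matmul(X, O), transpose(X))

# Route 2: cache the n x d embeddings once and take inner products.
pre = matmul(X, A)
post = matmul(X, B)
Z_lowrank = matmul(pre, transpose(post))

# Both routes give the same predicted connectivity up to rounding error.
max_diff = max(
    abs(u - v) for row_u, row_v in zip(Z_full, Z_lowrank) for u, v in zip(row_u, row_v)
)

params_full = p * p         # entries needed to specify O
params_lowrank = 2 * p * d  # entries needed to specify A and B
```

      With these toy sizes the factorized form needs 32 parameters instead of 64, and the gap widens as p grows relative to d.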

      In the study by Taylor et al. [3], the authors introduced a generalization of differential gene expression (DGE) analysis called network DGE (nDGE) to identify genetic determinants of synaptic connections. It focuses on genes co-expressed across pairs of connected neurons, compared with pairs that are not connected.

      As the authors acknowledged in the Methods section of the paper, nDGE can only examine single genes co-expressed at synaptic terminals: "While the nDGE technique introduced here is a generalization of standard DGE, interrogating the contribution of pairs of genes in the formation and maintenance of synapses between pairs of neurons, nDGE can only account for a single co-expressed gene in either of the two synaptic terminals (pre/post)."

      In contrast, the bilinear model offers a more comprehensive analysis by seeking a linear combination of gene expressions in both pre- and post-synaptic neurons. This model goes beyond the scope of examining individual co-expressed genes, as it incorporates different weights for the gene expressions of pre- and post-synaptic neurons. This feature of the bilinear model enables it to capture not only homogeneous but also complex and heterogeneous genetic interactions that are pivotal in synaptic connectivity. This highlights the bilinear model’s capability to delve into the intricate interactions of synaptic gene expression.

      Appraisal of whether the author achieved their aims, and whether results support their conclusions: The author achieved their aims by recapitulating key connectivity motifs from single-cell gene expression data in the mouse retina. Furthermore, the model setup allowed for insight into gene signatures and interactions, however could have benefited from a deeper evaluation of the accuracy of these signatures. The author claims the method sets a new benchmark for single-cell transcriptomic analysis of synaptic connections. This should be more rigorously proven. (I’m not sure I can speak on the novelty of the method)

      I value your appraisal. In response, additional validation of the bilinear model on a second dataset will be undertaken.

      Discussion of the likely impact of the work on the field, and the utility of methods and data to the community: This study provides an understandable bilinear model for decoding the genetic programming of neuronal type connectivity. The proposed model leaves the door open for further testing and comparison with alternative linear and/or non-linear models, such as neural network-based models. In addition to more complex models, this model can be built on to include higher resolution data such as more gene expression dimensions, different types of connectivity measures, and additional omics data.

      Thank you for your positive assessment of the potential impact of the study.

      Response to Reviewer 2:

      Summary: In this study, Mu Qiao employs a bilinear modeling approach, commonly utilized in recommendation systems, to explore the intricate neural connections between different pre- and post-synaptic neuronal types. This approach involves projecting single-cell transcriptomic datasets of pre- and post-synaptic neuronal types into a latent space through transformation matrices. Subsequently, the cross-correlation between these projected latent spaces is employed to estimate neuronal connectivity. To facilitate the model training, connectomic data is used to estimate the ground-truth connectivity map. This work introduces a promising model for the exploration of neuronal connectivity and its associated molecular determinants. However, it is important to note that the current model has only been tested with Bipolar Cell and Retinal Ganglion Cell data, and its applicability in more general neuronal connectivity scenarios remains to be demonstrated.

      Strengths: This study introduces a succinct yet promising computational model for investigating connections between neuronal types. The model, while straightforward, effectively integrates single-cell transcriptomic and connectomic data to produce a reasonably accurate connectivity map, particularly within the context of retinal connectivity. Furthermore, it successfully recapitulates connectivity patterns and helps uncover the genetic factors that underlie these connections.

      Thank you for your positive assessment of the paper.

      Weaknesses:

      1. The study lacks experimental validation of the model’s prediction results.

      Thank you for pointing out the importance of experimental validation. I acknowledge that the current version of the study is focused on the development and validation of the computational model, using the datasets presently available to us. Moving forward, I plan to collaborate with experimental neurobiologists. These collaborations are aimed at validating our model’s predictions, including the delta-protocadherins mentioned in the paper. However, considering the extensive time and resources required for conducting and interpreting experimental results, I believe it is more pragmatic to present a comprehensive experimental study, including the design and execution of experiments informed by the model’s predictions, in a separate follow-up paper. I intend to include a paragraph in the discussion of this paper outlining the future direction for experimental validation.

      2. The model’s applicability in other neuronal connectivity settings has not been thoroughly explored.

      I recognize the importance of assessing the model across different neuronal systems. In response to similar feedback from Reviewer 1, I am keen to extend the study to include the C.elegans dataset mentioned earlier. The results from applying our bilinear model to the second dataset will be incorporated into the revised manuscript.

      3. The proposed method relies on the availability of neuronal connectomic data for model training, which may be limited or absent in certain brain connectivity settings.

      The concern regarding the dependency of our model on the availability of connectomic data is valid. While complete connectomes are available for organisms like C.elegans and Drosophila, and efforts are underway to map the connectome of the entire mouse brain, such data may not always be accessible for all research contexts. Recognizing this limitation, part of the ongoing research is to explore ways to adapt our model to the available data, such as projectomic data. Furthermore, our bilinear model is compatible with trans-synaptic virus-based sequencing techniques [4, 5], allowing us to leverage data from these experimental approaches to uncover the genetic underpinnings of neuronal connectivity. These initiatives are crucial steps towards broadening the applicability of our model, ensuring its relevance and usefulness in diverse brain connectivity studies where detailed connectomic data may not be readily available.

      References

      [1] Dániel L. Barabási and Albert-László Barabási. A genetic model of the connectome. Neuron, 105(3):435–445, 2020.

      [2] István A. Kovács, Dániel L. Barabási, and Albert-László Barabási. Uncovering the genetic blueprint of the C. elegans nervous system. Proceedings of the National Academy of Sciences, 117(52):33570–33577, 2020.

      [3] Seth R. Taylor, Gabriel Santpere, Alexis Weinreb, Alec Barrett, Molly B. Reilly, Chuan Xu, Erdem Varol, Panos Oikonomou, Lori Glenwinkel, Rebecca McWhirter, Abigail Poff, Manasa Basavaraju, Ibnul Rafi, Eviatar Yemini, Steven J. Cook, Alexander Abrams, Berta Vidal, Cyril Cros, Saeed Tavazoie, Nenad Sestan, Marc Hammarlund, Oliver Hobert, and David M. Miller 3rd. Molecular topography of an entire nervous system. Cell, 184(16):4329–4347, 2021.

      [4] Nicole Y. Tsai, Fei Wang, Kenichi Toma, Chen Yin, Jun Takatoh, Emily L. Pai, Kongyan Wu, Angela C. Matcham, Luping Yin, Eric J. Dang, Denise K. Marciano, John L. Rubenstein, Fan Wang, Erik M. Ullian, and Xin Duan. Trans-seq maps a selective mammalian retinotectal synapse instructed by nephronectin. Nat Neurosci, 25(5):659–674, May 2022.

      [5] Aixin Zhang, Lei Jin, Shenqin Yao, Makoto Matsuyama, Cindy van Velthoven, Heather Sullivan, Na Sun, Manolis Kellis, Bosiljka Tasic, Ian R. Wickersham, and Xiaoyin Chen. Rabies virus-based barcoded neuroanatomy resolved by single-cell RNA and in situ sequencing. bioRxiv, 2023.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a potentially valuable discovery which indicates that activation of the P2RX7 pathway can reduce the lung fibrosis after its establishment by inflammatory damage. If confirmed, the study could clarify the role of specific immune networks in the establishment and progression of lung fibrosis. However, the presented data and analyses are incomplete as they primarily rely on limited pharmacological treatments with modest effect sizes.

      I hope the following explanations and information will convince you of the validity of our approaches, and I remain at your disposal to discuss them.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this revised preprint the authors investigate whether a presumably allosteric P2RX7 activating compound that they previously discovered reduces fibrosis in a bleomycin mouse model. They chose this particular model as publicly available mRNA data indicate that the P2RX7 pathway is downregulated in idiopathic pulmonary fibrosis patients compared to control individuals. In their revised manuscript, the authors use three proxies of lung damage, Ashcroft score, collagen fibers, and CD140a+ cells, to assess lung damage following the administration of bleomycin. These metrics are significantly reduced on HEI3090 treatment. Additional data implicate specific immune cell infiltrates and cytokines, namely inflammatory macrophages and damped release of IL-17A, as potential mechanistic links between their compound and reduced fibrosis. Finally, the researchers transplant splenocytes from WT, NLRP3-KO, and IL-18-KO mice into animals lacking the P2RX7 receptor to specifically ascertain how the transplanted splenocytes, which are WT for P2RX7 receptor, respond to HEI3090 (a P2RX7 agonist). Based on these results, the authors conclude that HEI3090 enhanced IL-18 production through the P2RX7-NLRP3 inflammasome axis to dampen fibrosis.

      These findings could be interesting to the field, as there are conflicting results as to whether NLRP3 activation contributes to fibrosis and if so, at what stage(s) (e.g., acute damage phase versus progression). The revised manuscript is more convincing in that three orthogonal metrics for lung damage were quantified. However, major weaknesses of the study still include inconsistent and small effect sizes of HEI3090 treatment versus either batch effects from transplanted splenocytes or the effects of different genetic backgrounds. Moreover, the fundamental assumption that HEI3090 acts specifically and functionally through the P2RX7 pathway in this model cannot be directly tested, as the authors now provide results indicating that P2RX7 knockout mice do not establish lung fibrosis on bleomycin treatment.

      I am particularly concerned by Reviewer 1’s assumption that P2RX7 knockout mice do not establish lung fibrosis on bleomycin treatment.

      Indeed, what we showed in the point-by-point response is that BLM induces fibrosis in both WT and P2RX7 KO mice, but that the intensity of the fibrosis is reduced in P2RX7 KO mice (panel A). Therefore, as discussed in our first response, our results confirm the previous publication of Riteau et al. showing that P2RX7 participates in BLM-induced lung fibrosis (see panel B).

      Author response image 1.

      Bleomycin-induced lung fibrosis in WT versus p2rx7 KO mice. A: Lungs from BLM-treated mice were stained with H&E and fibrosis was quantified using the Ashcroft protocol. Results show that fibrosis induced by BLM in KO mice is reduced compared to WT mice. B: Representative images of lung sections at day 14 after BLM treatment stained with H&E, as published in Riteau et al., illustrating that fibrosis induced by BLM in KO mice is reduced compared to WT mice. WT (n=4) or p2rx7 KO (n=6) mice. Two-tailed Mann-Whitney test, p values: **p < 0.01.

      Importantly, this lower intensity of lung fibrosis in P2RX7 KO mice does not interfere with the capacity of our molecule to attenuate lung fibrosis, as demonstrated by the adoptive transfer of IL1B KO splenocytes into P2RX7 KO mice, in which HEI3090 decreases the Ashcroft score, the % of fibrosis and the collagen fibers (see below).

      Author response image 2.

      HEI3090 activity requires P2RX7-expressing immune cells: Experimental design. p2rx7-/- mice were given $3 \times 10^{6}$ il1β-/- splenocytes i.v. one day prior to BLM delivery (i.n. 2.5 U/kg). Mice were treated daily i.p. with 1.5 mg/kg HEI3090 or vehicle for 14 days. (C) Representative images of lung sections at day 14 after treatment stained with H&E and Sirius Red with il1β-/- splenocytes, bar = 100 µm (left) and fibrosis score assessed by the Ashcroft method, the % of fibrosis and the content of collagen fibers (right). Each point represents one mouse (n=2 in WT and NLRP3 experiments, n=1 in IL18 and IL1B experiments), data represented as violin plot or mean±SEM, two-tailed Mann-Whitney test, *p < 0.05. WT: Wildtype, KO: P2RX7 knock-out.

      Importantly, in the same experimental setting, i.e., adoptive transfer of splenocytes from different genetic backgrounds, HEI3090 decreases the fibrosis intensity only with WT and IL1B KO splenocytes and not with NLRP3 KO and IL18 KO splenocytes.

      Author response image 3.

      HEI3090 activity requires P2RX7-expressing immune cells: Experimental design. p2rx7-/- mice were given $3 \times 10^{6}$ WT, NLRP3-/-, IL18-/- or IL1β-/- splenocytes i.v. one day prior to BLM delivery (i.n. 2.5 U/kg). Mice were treated daily i.p. with 1.5 mg/kg HEI3090 or vehicle for 14 days. Fibrosis in whole lung was assessed by the % of fibrosis (upper panel) and the content of collagen fibers (lower panel). Each point represents one mouse (n=2 in WT and NLRP3 experiments, n=1 in IL18 and IL1B experiments). Data represented as violin plot or mean±SEM, two-tailed Mann-Whitney test, *p < 0.05. WT: Wildtype, KO: P2RX7 knock-out.

      In order to provide clear evidence that HEI3090 functions through P2RX7, a different lung fibrosis model that does not require P2RX7 would be necessary. For example, in such a system the authors could demonstrate a lack of HEI3090-mediated therapeutic effect on P2RX7 knockout.

      Since BLM induces lung fibrosis in P2RX7 KO mice, as we showed in this manuscript and as already published by Riteau et al. in 2010 (shown earlier in our response, first figure), and because HEI3090 is able to decrease the intensity of fibrosis in WT → P2RX7 KO and IL1B-/- → P2RX7 KO mice but not in KO → P2RX7 KO, NLRP3-/- → P2RX7 KO and IL18-/- → P2RX7 KO mice, we believe that our data support the conclusions that:

      1. HEI3090 requires the expression of P2RX7 in immune cells to mediate its antifibrotic activity,

      2. IL1B is not a crucial effector mediating the antifibrotic effect of HEI3090.

      Molecularly, additional evidence on specificity, such as thermal proteome profiling and direct biophysical binding experiments, would also enhance the authors' argument that the compound indeed binds P2RX7 directly and specifically. Since all small molecules have some degree of promiscuity, the absence of an additional P2RX7 modulator, or direct recombinant IL-18 administration (as suggested by another reviewer), is needed to orthogonally validate the functional importance of this pathway. Another way the authors could probe pathway specificity would involve co-administering α-IL-18 with HEI3090 in several key experiments (similar to Figure 4L).

      At the moment we have no funds to do these experiments and given the high competition, we have decided to publish our story without these new data.

      Reviewer #2 (Public Review):

      In the study by Hreich et al, the potency of P2RX7-specific positive modulator HEI3090, developed by the authors, for the treatment of Idiopathic pulmonary fibrosis (IPF) was investigated. Recently, the authors have shown that HEI3090 can protect against lung cancer by stimulating dendritic cell P2RX7, resulting in IL-18 production that stimulates IFN-γ production by T and NK cells (DOI: 10.1038/s41467-021-20912-2). Interestingly, HEI3090 increases IL-18 levels only in the presence of high eATP. Since the treatment options for IPF are limited, new therapeutic strategies and targets are needed. The authors first show that P2RX7/IL-18/IFNG axis is downregulated in patients with IPF. Next, they used a bleomycin-induced lung fibrosis mouse model to show that the use of a positive modulator of P2RX7 leads to the activation of the P2RX7/IL-18 axis in immune cells that limits lung fibrosis onset or progression. Mechanistically, treatment with HEI3090 enhanced IL-18-dependent IFN-γ production by lung T cells leading to a decreased production of IL-17 and TGFβ, major drivers of IPF. The major novelty is the use of the small molecule HEI3090 to stimulate the immune system to limit lung fibrosis progression by targeting the P2RX7, which could be potentially combined with current therapies available. Overall, the study was well performed, and the manuscript is clear.

      We thank the reviewer for these very positive comments.

      However, there is need for more details on the description and interpretation of the adoptive transfer experiments, as well as the statistical analyses and number of replicate independent experiments.

      I am concerned by the reviewer’s comments, and I would like to provide additional information and explanation, which I hope will convince you of the validity of our approaches.

      Author response image 4.

      Adoptive transfer experiment. Adoptive transfer experiments are classically used to document which immune cells participate in immune responses (more than 150 publications in PubMed match the keywords adoptive transfer and onco immunology), and intravenous administration is a common route to target the lungs (PMID: 23336716). To characterize the molecular effectors (P2RX7, NLRP3, IL18 and IL1B) accounting for the antifibrotic effect of HEI3090, we purified splenocytes from donor mice and administered them intravenously to P2RX7 KO mice. As shown in Author response image 4, HEI3090 has no antifibrotic activity when splenocytes isolated from p2rx7-deficient mice are injected i.v. into P2RX7 KO mice (KO in KO). By contrast, HEI3090 has antifibrotic activity when WT splenocytes expressing P2RX7 (isolated from WT mice) are transferred into P2RX7 KO mice (WT in KO).

      This experiment provides strong evidence that the adoptive transfer approach can identify the molecular effectors required to mediate the antifibrotic effect of HEI3090.

      Statistical analyses and number of replicate independent experiments

      We thank the reviewer for this comment, and we apologize for not having been sufficiently clear in our previous response, with the misphrased statement “the experiment was stopped when significantly statistical results were observed” when we should have written “the experiment was stopped when each experimental group contained at least 5 mice”.

      To define the size of experimental groups we did a pilot experiment, with 4 WT mice (e.g. 4 biological replicates) in each group (as shown aside), and a statistical forecasting based on the result of the pilot experiment (40% difference, standard error: 0.9, α risk: 0.05, power: 0.8). Since we focused on the effect of HEI3090 we based our statistical analysis on a one-way ANOVA analysis comparing in each experiment the vehicle and the treated group.

      The pilot experiment and statistical forecasting indicated that 4 mice per group suffice to characterize the effect of HEI3090 on BLM-induced lung fibrosis. Each experiment was started with 6 to 8 mice per group. Being aware that 30% of mice can unexpectedly die due to BLM treatment, we duplicated the experiment, when necessary, to include at least 5 mice in each group of each experiment, meaning 5 biological replicates, knowing that 4 mice are sufficient to statistically analyze the results. In each experiment we checked for the presence of outliers using the ROUT method and removed them when necessary.
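      The enrollment reasoning above can be sketched as simple arithmetic. This illustration assumes, hypothetically, that each mouse is lost independently with probability 0.3; the function names are invented for the example, and it is not the authors' power calculation.

```python
import math


def mice_to_enroll(target_survivors, attrition_rate):
    """Smallest starting group whose expected survivors reach the target."""
    return math.ceil(target_survivors / (1.0 - attrition_rate))


def prob_at_least(n_enrolled, min_survivors, survival_prob):
    """Binomial probability that at least min_survivors animals survive,
    assuming independent losses (an assumption made for this sketch)."""
    return sum(
        math.comb(n_enrolled, k)
        * survival_prob ** k
        * (1.0 - survival_prob) ** (n_enrolled - k)
        for k in range(min_survivors, n_enrolled + 1)
    )


start_for_four = mice_to_enroll(4, 0.3)  # target: the statistical minimum of 4
start_for_five = mice_to_enroll(5, 0.3)  # target: the 5 replicates aimed for
chance_five_of_eight = prob_at_least(8, 5, 0.7)
```

      Under these assumptions, targets of 4 and 5 surviving mice correspond to starting groups of 6 and 8, in line with the 6 to 8 mice per group described above. Because a starting group of 8 still leaves a real chance of ending with fewer than 5 survivors, repeating an experiment when necessary is a reasonable safeguard.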

    1. Author Response

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Summary:

      This paper tests the idea that schooling can provide an energetic advantage over solitary swimming. The present study measures oxygen consumption over a wide range of speeds, to determine the differences in aerobic and anaerobic cost of swimming, providing a potentially valuable addition to the literature related to the advantages of group living.

      Response: Thank you for the positive comments.

      Strengths:

      The strength of this paper is related to providing direct measurements of the energetics (oxygen consumption) of fish while swimming in a group vs solitary. An energetic advantage has been claimed to be one of the major advantages of schooling and therefore a direct energetic assessment is a useful result.

      Response: Thank you for the positive comments.

      Weaknesses:

      1) Regarding the fish to water volume ratio, the arguments raised by the authors are valid. However, the ratio used is still quite high (as high as >2000 in solitary fish), much higher than that recommended by Svendsen et al (2006). Hence this point needs to be discussed in the ms (summarising the points raised in the authors' response)

      Response: Thank you for the comments. We have addressed this point in the previous comments. In short, our ratio is within the range of the published literature. We conducted the additional signal-to-noise analysis for quality assurance.

      2) Wall effects: Fish in a school may have been swimming closer to the wall. The fact that the convex hull volume of the fish school did not change as speed increased is not a demonstration that fish were not closer to the wall, nor is it a demonstration that wall effects were not present. Therefore the issue of potential wall effects is a weakness of this paper.

      Response: Thank you for the comments. We have addressed this point in the previous comments. We provided many other considerations in addition to the convex hull volume. In particular, our boundary layer is < 2.5 mm thick, which is narrower than the ~10 mm width of a giant danio.

      3) The authors stated "Because we took high-speed videos simultaneously with the respirometry measurements, we can state unequivocally that individual fish within the school did not swim closer to the walls than solitary fish over the testing period". This is however not quantified.

      Response: Thank you for the comments. We have addressed this point in the previous comments. We want to note that the statement in the response letter was meant to elaborate on the discussion points and is not stated as data in the manuscript. The bottom line is that very few studies have used PIV to quantify the thickness of the boundary layer, as we did in our experiment.

      4) Statistical analysis. The authors have dealt satisfactorily with most of the comments.

      However:

      (a) the following comment has not been dealt with directly in the ms "One can see from the graphs that schooling MO2 tends to have a smaller SD than solitary data. This may well be due to the fact that schooling data are based on 5 points (five schools) and each point is the result of the MO2 of five fish, thereby reducing the variability compared to solitary fish."

      (b) Different sizes were used for solitary and schooling fishes. The authors justify using larger fish as solitary to provide a better ratio of respirometer volume to fish volume in the tests on individual fish. However, mass scaling for tail beat frequency was not provided. Although (1) this is because of lack of data for this species and (2) using scaling exponent of distant species would introduce errors of unknown magnitude, this is still a weakness of the paper that needs to be acknowledged here and in the ms.

      Response: Thank you for the comments. We have addressed both points in the previous comments and provided comprehensive discussions. We also stated the caveats in the method section of the manuscript.

      Reviewer #3 (Public Review):

      Zhang and Lauder characterized both aerobic and anaerobic metabolic energy contributions in schools and solitary fishes in the Giant danio (Devario aequipinnatus) over a wide range of water velocities. By using a highly sophisticated respirometer system, the authors measure the aerobic metabolisms by oxygen uptake rate and the non-aerobic oxygen cost as excess post-exercise oxygen consumption (EPOC). With these data, the authors model the bioenergetic cost of schools and solitary fishes. The authors found that fish schools have a J-shaped metabolism-speed curve, with reduced total energy expenditure per tail beat compared to solitary fish. Fish in schools also recovered from exercise faster than solitary fish. Finally, the authors conclude that these energetic savings may underlie the prevalence of coordinated group locomotion in fish.

      The conclusions of this paper are mostly well supported by data.

      Response: Thank you for the positive comments.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      I have read carefully the revised version of the manuscript and would like to thank the authors for addressing all my comments/suggestions.

      I have no additional comments/suggestions. Now, I strongly believe that this manuscript deserves to be published in eLife.

      Response: Thank you for the positive comments.


      The following is the authors’ response to the original reviews.

      General responses

      Many thanks to the reviewers and editors for their very helpful comments on our manuscript. Below we respond (in blue text) to each of the reviewer comments, both the public ones and the more detailed individual comments in the second part of each review. In some cases, we consider these together where the same point is made in both sets of comments. We have made several changes to the manuscript in response to reviewer suggestions, and we respond in detail to the comments of reviewer #2 who feels that we have overstated the significance of our manuscript and suggests several relevant literature references. We prepared a table summarizing these references and why they differ substantially from the approach taken in our paper here.

      Overall, we would like to emphasize to both reviewers and readers of this response document that previous studies of fish schooling dynamics (or collective movement of vertebrates in general, see Commentary Zhang & Lauder 2023 J. Exp. Biol., doi:10.1242/jeb.245617) have not considered a wide speed range and thus the importance of measuring EPOC (excess post-exercise oxygen consumption) as a key component of energy use. Quantifying both aerobic and non-aerobic energy use allows us to calculate the total energy expenditure (TEE) which we show differs substantially and, importantly, non-linearly with speed between schools and measurements on solitary individuals. Comparison between school total energy use and individual total energy use are critical to understanding the dynamics of schooling behaviour in fishes.

      The scope of this study is the energetics of fish schools. By quantifying the TEE over a wide range of swimming speeds, we also show that the energetic performance curve is concave upward, and not linear, and how schooling behaviour modifies this non-linear relationship.

      In addition, one key implication of our results is that kinematic measurements of fish in schools (such as tail beat frequency) are not a reliable metric by which to estimate energy use. Since we recorded high-speed video simultaneously with energetic measurements, we are able to show that substantial energy savings occur by fish in schools with little to no change in tail beat frequency, and we discuss in the manuscript the various fluid dynamic mechanisms that allow this. Indeed, studies of bird flight show that when flying in a (presumed) energy-saving V-formation, wing beat frequency can actually increase compared to flying alone. We believe that this is a particularly important part of our findings: understanding energy use by fish schools must involve actual measurements of energy use and not indirect and sometimes unreliable kinematic measurements such as tail beat frequency or amplitude.

      Reviewer #1 (Public Review):

      Summary:

      In the presented manuscript the authors aim at quantifying the costs of locomotion in schooling versus solitary fish across a considerable range of speeds. Specifically, they quantify the possible reduction in the cost of locomotion in fish due to schooling behavior. The main novelty appears to be the direct measurement of absolute swimming costs and total energy expenditure, including the anaerobic costs at higher swimming speeds.

      In addition to metabolic parameters, the authors also recorded some basic kinematic parameters such as average distances or school elongation. They find both for solitary and schooling fish, similar optimal swimming speeds of around 1BL/s, and a significant reduction in costs of locomotion due to schooling at high speeds, in particular at ~5-8 BL/s.

      Given the lack of experimental data and the direct measurements across a wide range of speeds comparing solitary and schooling fish, this appears indeed like a potentially important contribution of interest to a broader audience beyond the specific field of fish physiology, in particular for researchers working broadly on collective (fish) behavior.

      Response: Thank you for seeing the potential implications of this study. We also believe that this paper has broader implications for collective behaviour in general, and outline some of our thinking on this topic in a recent Commentary article in the Journal of Experimental Biology: (Zhang & Lauder 2023 doi:10.1242/jeb.245617). Understanding the energetics of collective behaviours in the water, land, and air is a topic that has not received much attention despite the widespread view that moving as a collective saves energy.

      Strengths:

      The manuscript is for the most part well written, and the figures are of good quality. The experimental method and protocols are very thorough and of high quality. The results are quite compelling and interesting. What is particularly interesting, in light of previous literature on the topic, is that the authors conclude that based on their results, specific fixed relative positions or kinematic features (tail beat phase locking) do not seem to be required for energetic savings. They also provide a review of potential different mechanisms that could play a role in the energetic savings.

      Response: Thank you for seeing the nuances we bring to the existing literature and for commenting on the quality of the experimental method and protocols. Despite a relatively large literature on fish schooling based on previous biomechanical research, our studies suggest that direct measurement of energetic cost clearly demonstrates the energy savings that result from the sum of different fluid dynamic mechanisms depending on where fish are, and also emphasizes that simple metrics like fish tail beat frequency do not adequately reflect energy savings during collective motion.

      Weaknesses:

      A weakness is the actual lack of critical discussion of the different mechanisms as well as the discussion on the conjecture that relative positions and kinematic features do not matter. I found the overall discussion on this rather unsatisfactory, lacking some critical reflections as well as different relevant statements or explanations being scattered across the discussion section. Here I would suggest a revision of the discussion section.

      Response: The critical discussion of the different possible energy-saving mechanisms is indeed an important topic. We provided a discussion about the overall mechanism of ‘local interactions’ in the first paragraph of “Schooling Dynamics and energy conservation”. To clarify, our aim with Figure 1 is to introduce the current mechanisms proposed in the existing engineering/hydrodynamic literature that have studied a number of possible configurations both experimentally and computationally. Thank you for the suggestion of better organizing the discussion to critically highlight different mechanisms that would enable a dynamic schooling structure to still save energy and why the appendage movement frequency does not necessarily couple with the metabolic energy expenditure. Much of this literature uses computational fluid dynamic models or experiments on flapping foils as representative of fish. This exact issue is of great interest to us, and we are currently engaged in a number of other experiments that we hope will shed light on how fish moving in specific formations do or don’t save energy.

      Our aim in presenting Figure 1 at the start of the paper was to show that there are several ways that fish could save energy when moving in a group as shown by engineering analyses, but before investigating these various mechanisms in detail we first have to show that fish moving in groups actually do save energy with direct metabolic measurements. Hence, our paper treats the various mechanisms as inspiration to determine experimentally if, in fact, fish in schools save energy, and if so how much over a wide speed range. Our focus is to experimentally determine the performance curve that shows energy use as speed increases, for schools compared to individuals. Therefore, we have elected not to go into detail about these different hydrodynamic mechanisms in this paper, but rather to present them as a summary of current engineering literature views and then proceed to document energy savings (as stated in the second last paragraph of Introduction). We have a Commentary paper in the Journal of Experimental Biology that addresses this issue generally, and we are reluctant to duplicate much of that discussion here (Zhang & Lauder 2023 doi:10.1242/jeb.245617). We are working hard on this general issue as we agree that it is very interesting. We have revised the Introduction (second last paragraph of Introduction) and Discussion (first paragraph of Discussion) to better indicate our approach, but we have not added any significant discussion of the different hydrodynamic energy saving proposals as we believe that it is outside the scope of this first paper and more suitable as part of follow-up studies.

      Also, there is a statement that Danio regularly move within the school and do not maintain inter-individual positions. However, there is no quantitative data shown supporting this statement, quantifying the time scales of neighbor switches. This should be addressed as core conclusions appear to rest on this statement and the authors have 3d tracks of the fish.

      Response: Thank you for pointing out this very important future research direction. Based on our observations and the hypothesized mechanisms for fish within the school to save energy (Fig. 1), we have been conducting follow-up experiments to decipher the multiple dynamic mechanisms that enable the fish within the school to save energy. Tracking the position of each individual fish body in 3D within the school has proven difficult. We currently have 3D data on the nose position obtained simultaneously with the energetic measurements, but we do not have full 3D fish body positional data. Working with our collaborators, we are developing a 3-D tracking algorithm that will allow us to quantify how long fish spend in specific formations, and we currently have a new capability to record high-speed video of fish schools moving in a flow tank for many hours (see our recent perspective by Ko et al., 2023 doi.org/10.1098/rsif.2023.0357). The new algorithms and the results will be published as separate studies and we think that these ongoing experiments are outside the scope of the current study with its focus on energetics. Nevertheless, the main point of Fig. 1 is to provide possible mechanisms to inspire future studies to dissect the detailed hydrodynamic mechanisms for energy saving, and the points raised by this comment are indeed extremely interesting to us and our ongoing experiments in this area. We provide a statement to clarify this point in the 1st paragraph of “Schooling dynamics and energy conservation” section.

      Further, there is a fundamental question on the comparison of schooling in a flow (like a stream or here flow channel) versus schooling in still water. While it is clear from a pure physics point of view that the situation for individual fish is equivalent, as it is about maintaining a certain relative velocity to the fluid, I do think that it makes a huge qualitative difference from a biological point of view in the context of collective swimming. In a flow, individual fish have to align with the external flow to ensure that they remain stationary and do not fall back, which then leads to highly polarized schools. However, this high polarization is induced also for completely non-interacting fish. At high speeds, also the capability of individuals to control their relative position in the school is likely very restricted, simply by being forced to put most of their effort into maintaining a stationary position in the flow. This appears to me fundamentally different from schooling in still water, where the alignment (high polarization) has to come purely from social interactions. Here, relative positioning with respect to others is much more controlled by the movement decisions of individuals. Thus, I see clearly how this work is relevant for natural behavior in flows and that it provides some insights on the fundamental physiology, but I at least have some doubts about how far it extends actually to “voluntary” highly ordered schooling under still water conditions. Here, I would wish at least some more critical reflection and/or explanation.

      Response: We agree completely with this comment that animal group orientations in still fluid can have different causes from their locomotion in a moving fluid. We very much agree with the reviewer that social interactions in still water, which typically involve low-speed locomotion and other behaviours such as searching for food by the group, can be important and could dictate fish movement patterns. In undertaking this project, we wanted to challenge fish to move at speed, and reasoned that if energy savings are important in schooling behaviour due to hydrodynamic mechanisms, we should see this when fish are moving forward against drag forces induced by fluid impacting the school. Drag forces scale as velocity squared, so we should see energy savings by the school, if any, as speed increases.
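
      As a rough illustration of the scaling argument above, the short sketch below tabulates how drag force, and the power needed to overcome it, grow with speed. It assumes a constant drag coefficient and a reference speed of 1 BL/s; the function names and the chosen speeds are ours for illustration and are not taken from the manuscript's analysis.

```python
def relative_drag_force(speed_bl_s, reference_bl_s=1.0):
    """Drag force relative to the reference speed, assuming force ~ velocity**2."""
    return (speed_bl_s / reference_bl_s) ** 2


def relative_drag_power(speed_bl_s, reference_bl_s=1.0):
    """Power to overcome drag (force x velocity), so power ~ velocity**3."""
    return (speed_bl_s / reference_bl_s) ** 3


if __name__ == "__main__":
    # Illustrative speeds in body lengths per second (BL/s).
    for speed in (1.0, 2.0, 4.0, 8.0):
        force = relative_drag_force(speed)
        power = relative_drag_power(speed)
        print(f"{speed:.0f} BL/s: drag x{force:.0f}, drag power x{power:.0f}")
```

      Under these simplifying assumptions, the hydrodynamic load rises steeply across the tested speed range, which is why any schooling benefit would be expected to become more visible at higher speeds.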

      We also quantified fish school swimming speeds in the field from the literature and presented a figure showing that in nature fish schools can and do move at considerable speeds. This figure is part of our overview on collective behaviour recently in J. Exp. Biol. (Zhang & Lauder 2023 doi:10.1242/jeb.245617). It is only by studying fish schools moving over a speed range that we can understand the performance curve relating energy use to swimming speed. Indeed, we wonder if fish moving in still water as a collective versus as solitary individuals would show energy savings at all. We now provide the justification for studying fish schooling in moving fluids in the second and third paragraphs of the Introduction. When animals are challenged hydrodynamically (e.g. at higher speed), it introduces the need to save energy. Movement in still water lacks the need for fish to save energy. When fish do not need to save locomotor energy in still water, it is hard to justify why we would expect to observe energy saving and related physiological mechanisms in the first place. As the reviewer said, the ‘high polarization in still water has to come purely from social interactions’. Our study does not dispute this consideration, and indeed we agree with it! In our supplementary materials, we acknowledged that the different scenarios of fish schooling, as we define them, can have different behavioural and ecological drivers. Using these definitions, we explicitly stated, in the introduction, that our study focuses on active and directional schooling behaviour to understand the possible hydrodynamic benefits of energy expenditure for collective movements of fish schools. By stating the scope of our study at the outset, we hope that this will keep the discussion focused on the energetics and kinematics of fish schools, without unnecessarily addressing the many other possible reasons for fish schooling behaviours, such as anti-predator grouping, food searching, or reproduction, to name three examples.

      That being said, we acknowledge (in the 2nd paragraph of the introduction) that fish schooling behaviour can have other drivers when the flow is not challenging. Also, there are robotic-&-animal interaction studies and computational fluid dynamic simulation studies (that we cited) that show individuals in fish schools interact hydrodynamically. Hydrodynamic interactions are not the same as behavioural interactions, but this does not mean that individuals within a fish school in moving flow are not interacting and coordinating.

      Related to this, the reported increase in the elongation of the school at a higher speed could have also different explanations. The authors speculate briefly it could be related to the optimal structure of the school, but it could be simply inter-individual performance differences, with slower individuals simply falling back with respect to faster ones. Did the authors test for certain fish being predominantly at the front or back? Did they test for individual swimming performance before testing them in groups together? Again this should be at least critically reflected somewhere.

      Response: Thank you for raising this point. If the more streamlined schooling structure above 2 BL/s is due to the weaker individuals not catching up with the rest of the school, we would expect the weaker individuals to quit swimming tests well before 8 BL/s. However, we did not observe this phenomenon. Although we did not specifically test for the two questions the reviewer raises here, our results suggest that inter-individual variation in the swimming performance of giant Danio is not as large as the 2 to 8 BL/s range (a four-fold difference). While inter-individual differences certainly exist, we believe that they are small relative to the speeds tested as we did not see any particular individuals consistently unable to keep up with the school or certain individuals maintaining a position near the back of the school. That said, we provide additional interpretations for the elongated schooling structure at the end of the 2nd paragraph of the “schooling dynamics and energy conservation” section.

      Reviewer #1 (Recommendations For The Authors):

      Line 58: The authors write "How the fluid dynamics (...) enable energetic savings (...)". However, the paper focuses rather on the question of whether energetic savings exist and does not enlighten us on the dominant mechanisms. Although it gives a brief overview of all possible mechanisms, it remains speculative on the actual fluid dynamical and biomechanical processes. Thus, I suggest changing "How" to "Whether".

      Response: Great point! We changed “How” to “Whether”.

      Lines 129-140: In the discussion of the U-shaped aerobic rate, there is no direct comparison of the minimum cost values between the schooling and solitary conditions. Only the minimum costs during schooling are named/discussed. In addition to the data in the figure, I suggest explicitly comparing them as well for full transparency.

      Response: Thanks for raising this point. We did not belabor this point because there was no statistical significance. As requested, we added a statement to address this with statistics in the 1st paragraph of the Results section.

      Line 149: The authors note that the schooling fish have a higher turning frequency than solitary fish. Here, a brief discussion of potential explanations would be good, e.g. need for coordination with neighbors -> cost of schooling.

      Response: Thank you for the suggestion. In the original version of the manuscript, we discussed that the higher turning frequency could be related to higher postural costs for active stability adjustment at low speeds. As requested, we now added that high turn frequency can relate to the need for coordination with neighbours in the last paragraph of the “Aerobic metabolic rate–speed curve of fish schools” section. As indicated above, the suspected costs of coordination did not result in higher costs of schooling at the lower speed (< 2 BL/s, where the turn frequency is higher).

      Line 151: The authors discuss the higher maximum metabolic rate of schooling fish as a higher aerobic performance and lower use of aerobic capacity. This may be confusing for non-experts in animal physiology and energetics of locomotion. I recommend providing somewhere in a paper an additional explanation to clarify it to non-experts. While lines 234-240 and further below potentially address this, I found this not very focused or accessible to non-experts. Here, I suggest the authors consider revisions to make it more comprehensible to a wider, interdisciplinary audience.

      Response: We agree with the reviewer that the difference between maximum oxygen uptake and maximum metabolic rate can be confusing. In fact, among animal physiologists, these two concepts are often muddled. One of the authors is working on an invited commentary from J. Exp. Biol. to clearly define these two concepts. We have made the language in the section “Schooling dynamics enhances aerobic performance and reduces non-aerobic energy use” more accessible to a general audience. In addition, the original version presented the relevant framework in the first and the second paragraphs of the Introduction when discussing aerobic and non-aerobic energy contribution. In brief, when vertebrates exhibit maximum oxygen uptake, they use aerobic and non-aerobic energy contributions that both contribute to their metabolic rate. Therefore, the maximum total metabolic rate is higher than the one estimated from only maximum oxygen uptake. We used the method presented in Fig. 3a to estimate the maximum metabolic rate for metabolic energy use (combining aerobic and non-aerobic energy use). In kinesiology, maximum oxygen uptake is used to evaluate the aerobic performance and energy use of human athletes is estimated by power meters or doubly labelled water.

      Line 211: The authors write that Danio regularly move within the school and do not maintain inter-individual positions. Given that this is an important observation, and the relative position and its changes are crucial to understanding the possible mechanisms for energetic savings in schools, I would expect some more quantitative support for this statement, in particular as the authors have access to 3d tracking data. For example introducing some simple metrics like average time intervals between swaps of nearest neighbors, possibly also resolved in directions (front+back versus right+left), should provide at least some rough quantification of the involved timescales, whether it is seconds, tens of seconds, or minutes.

      Response: As noted in our response above, 3-D tracking of both body position and body deformation of multiple individuals in a school is not a trivial research challenge and we have ongoing research on this issue. We hope to have results on the 3D positions of fish in schools soon! For this manuscript, we believe that the data in Figure 4E, which shows the turning frequency of fish in schools and solitary controls, illustrates the general phenomenon of fish moving around (as fish turn to change positions within the school), but we agree that more could be done to address this point and we are indeed working on it now.

      Lines 212-217: There is a very strong statement that energetic savings by collective motion do not require fixed positional arrangements or specific kinematic features. While possibly one of the most interesting findings of the paper, I found that in its current state, it was not sufficiently/satisfactorily discussed. For example for the different mechanisms summarized, there will be clearly differences in their relevance based on relative distance and position. For example mechanisms 3 and 4 likely have significant contributions only at short distances. Here, the question is how relevant can they be if the average distance is 1 BL? Also, 1BL side by side is very much different from 1BL front to back, given the elongated body shape. For mechanisms 1 and 2, it appears relative positioning is quite important. Here, having maybe at least some information from the literature (if available) on the range of wall or push effects or the required precision in relative positioning for having a significant benefit would be very much desired. Also, do the authors suggest that a) these different effects overlap giving any position in the school a benefit, or b) that there are specific positions giving benefits due to different mechanisms and that fish "on purpose" switch only between these energetic "sweet" spots, I guess this what is towards the end referred to as Lighthill conjecture? Given the small group size I find a) rather unlikely, while b) actually also leads to a coordination problem if every fish is looking for a sweet spot. Overall, a related question is whether the authors observed a systematic change in leading individuals, which likely have no, or very small, hydrodynamic benefits.

      Response: Thank you for the excellent discussion on this point. As we responded above, we have softened the tone of the statement. In the original version, we were clear that the known mechanisms as summarized in Fig. 1 lead us to ‘expect’ that fish do not need to be in a fixed position to save energy.

      In general, current engineering/hydrodynamic studies suggest that any fish positioned within one body length (both upstream and downstream and side by side) will benefit from one or more of the hydrodynamic mechanisms that we expect will reduce energy costs, relative to a solitary individual. Our own studies using robotic systems suggest that a leading fish will experience an added mass “push” from a follower when the follower is located within roughly ½ body length behind the leader. We cited a Computational Fluid Dynamic (CFD) study about the relative distance among individuals for energy saving to be in effect. Please keep in mind that CFD simulation is a simplified model of the actual locomotion of fish and involves many assumptions and currently only resolves the time scale of seconds (see commentary of Zhang & Lauder 2023 doi:10.1242/jeb.245617 in J. Exp. Biol. for the current challenges of CFD simulation). To really understand the dynamic positions of fish within the school, we will need 3-D tracking of fish schools with tools that are currently being developed. Ideally, we would also have simultaneous energetic measurements, but of course, this is enormously challenging and it is not clear at this time how to accomplish this.

      We certainly agree that the relative positions of fish (vertically staggered or in-line swimming) do affect the specific hydrodynamic mechanisms being used. We cited the study that discussed this, but the relative positions of fish remain an active area of research. More studies will be out in the next few years to provide more insight into the effects of the relative positions of fish on energy saving. The Lighthill conjecture is observed in flapping foils, and whether fish schools use the Lighthill conjecture for energy saving is an active area of research but still unclear. We also provided a citation about the implication of the Lighthill conjecture on fish schools. Hence, our original version stated ‘The exact energetic mechanisms….would benefit from more in-depth studies’. We agree with the reviewer that not all fish can benefit from the Lighthill conjecture (if fish schools use it) at any given time point, hence the fish might need to rotate in using the Lighthill conjecture. This is one more explanation for the dynamic positioning of fish in a school.

      Overall, in response to the question raised, we do not believe that fish are actively searching for “sweet spots” within the school, although this is only speculation on our part. We believe instead that fish, located in a diversity of positions within the school, get the hydrodynamic advantage of being in the group at that configuration.

      We believe that fish, once they group and maintain a grouping where individuals are all within around one body length distance from each other, will necessarily get hydrodynamic benefits. As a collective group, we believe that at any one time, several different hydrodynamic mechanisms are all acting simultaneously and result in reduced energetic costs (Fig. 1).

      Figure 4E: The y-axis is given in the units of 10-sec^-1 which is confusing is it 10 1/s or 1/(10s)? Why not use simply the unit of 1/s which is unambiguous?

      Response: Thank you for the suggestions. We counted the turning frequency over the course of 10 seconds. To reflect more accurately on what we did, we used the suggested unit of 1/(10s) to more correctly correspond to how we made the measurements and the duration of the measurement. We recognize that this is a bit non-standard but would like to keep these units if possible.
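
      To make the relationship between the two unit conventions explicit, the minimal sketch below converts a turn count taken over a 10-second window into turns per second. The function name and the example count are hypothetical and are not values from Figure 4E.

```python
def turns_per_second(turn_count, window_s=10.0):
    """Convert turns counted over an observation window into turns per second."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return turn_count / window_s


if __name__ == "__main__":
    # Hypothetical example: 5 turns counted during one 10-second window.
    print(turns_per_second(5))
```

      A value plotted in units of 1/(10 s) is therefore ten times the corresponding value in units of 1/s.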

      Figure 4F: The unit in the school length is given in [mm], which suggests that the maximal measured school length is 4mm, this can't be true.

      Response: Thank you for pointing this out. The unit should be [cm], which we corrected.

      Reviewer #2 (Public Review):

      Summary:

      This paper tests the idea that schooling can provide an energetic advantage over solitary swimming. The present study measures oxygen consumption over a wide range of speeds, to determine the differences in aerobic and anaerobic cost of swimming, providing a potentially valuable addition to the literature related to the advantages of group living.

      Response: Thank you for acknowledging our contribution is a valuable addition to the literature on collective movement by animals.

      Strengths:

      The strength of this paper is related to providing direct measurements of the energetics (oxygen consumption) of fish while swimming in a group vs solitary. The energetic advantages of schooling have been claimed to be one of the major advantages of schooling and therefore a direct energetic assessment is a useful result.

      Response: Thank you for acknowledging our results are useful and provide direct measurements of energetics to prove a major advantage of schooling relative to solitary motion over a range of speeds.

      Weaknesses:

      The manuscript suffers from a number of weaknesses which are summarised below:

      1) The possibility that fish in a school show lower oxygen consumption may also be due to a calming effect. While the authors show that there is no difference at low speed, one cannot rule out that calming effects play a more important role at higher speed, i.e. in a more stressful situation.

      Response: Thank you for raising this creative point on “calming”. When vertebrates are moving at high speeds, their stress hormones (adrenaline, catecholamines & cortisol) increase. This phenomenon has been widely studied, and therefore, we do not believe that animals are ‘calm’ when moving at high speed and that somehow a “calming effect” explains our non-linear concave-upward energetic curves. “Calming” would have to have a rather strange non-linear effect over speed to explain our data, and act in contrast to known physiological responses involved in intense exercise (whether in fish or humans). It is certainly not true for humans that running at high speeds in a group causes a “calming effect” that explains changes in metabolic energy expenditure. We have added an explanation in the third paragraph in the section “Schooling dynamics enhances aerobic performance and reduces non-aerobic energy use”. Moreover, when animal locomotion has a high frequency of appendage movement (for both solitary individual and group movement), they are also not ‘calm’ from a behavioural point of view. Therefore, we respectfully disagree with the reviewer that the ‘calming effect’ is a major contributor to the energy saving of group movement at high speed. It is difficult to believe that giant danio swimming at 8 BL/s which is near or at their maximal sustainable locomotor limits are somehow “calm”. In addition, we demonstrated by direct energetic measurement that solitary individuals do not have a higher metabolic rate at the lower speed and thus directly show that there is very likely no cost of “uncalm” stress that would elevate the metabolic rate of solitary individuals. Furthermore, the current version of this manuscript compared the condition factor of the fish in the school and solitary individuals and found no difference (see Experimental Animal Section in the Methods). 
This also suggests that the measurement on the solitary fish is likely not confounded by any stress effects.

      Finally, and as discussed further below, since we have simultaneous high-speed videos of fish swimming as we measure oxygen consumption at all speeds, we are able to directly measure fish behaviour. Since we observed no alteration in tail beat kinematics between schools and individuals (a key result that we elaborate on below), it’s very hard to justify that a “calming” effect explains our results. Fish in schools swimming at speed (not in still water) appear to be just as “calm” as solitary individuals.

      2) The ratio of fish volume to water volume in the respirometer is much higher than that recommended by the methodological paper by Svendsen et al. (J Fish Biol 2016)

      Response: The ratio of respirometer volume to fish volume is an important issue that we thought about in detail before conducting these experiments. While Svendsen et al., (J. Fish Biol. 2016) recommend a respirometer volume-to-fish volume ratio of 500, we are not aware of any experimental study comparing volumes with oxygen measuring accuracy that gives this number as optimal. In addition, the Svendsen et al. paper does not consider that their recommendation might result in fish swimming near the walls of the flume (as a result of having relatively larger fish volume to flume volume) and hence able to alter their energetic expenditure by being near the wall. In our case, we needed to be able to study both a school (with higher animal volumes) and an individual (relatively lower volume) in the same exact experimental apparatus. Thus, we had to develop a system to accurately record oxygen consumption under both conditions.

      The ratio of our respirometer to individual volume for schools is 693, while the value for individual fish is 2200. Previous studies (Parker 1973, Abrahams & Colgan, 1985, Burgerhout et al., 2013) that used a swimming-tunnel respirometer (i.e., a sealed treadmill) to measure the energy cost of group locomotion used values that range between 1116 and 8894 which are large and could produce low-resolution measurements of oxygen consumption. Thus, we believe that we have an excellent ratio for our experiments on both schools and solitary individuals, while maintaining a large enough value that fish don’t experience wall effects (see more discussion on this below, as we experimentally quantified the flow pattern within our respirometer).

      The goal of the recommendation by Svendsen et al. is to achieve a satisfactory R² (coefficient of determination) value for oxygen consumption data. However, Chabot et al., 2020 (DOI: 10.1111/jfb.14650) pointed out that relying only on R² values is not always successful at excluding non-linear slopes. Much worse, pursuing only high R² values risks removing linear slopes with low R² merely because of a low signal-to-noise ratio, resulting in an overestimation of the low metabolic rate. Although we acknowledge the excellent efforts and recommendations provided by Svendsen et al., 2016, we perhaps should not treat the ratio of respirometer to organism volume of 500 as the gold standard for swim-tunnel respirometry. Svendsen et al., 2016 did not indicate how they reached the recommendation of using the ratio of respirometer to organism volume of 500. Moreover, Svendsen et al., 2016 stated that using an extended measuring period can help to resolve the low signal-to-noise ratio. Hence, the key consideration is to obtain a reliable signal-to-noise ratio, which we will discuss below.
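
      The point about the coefficient of determination and the measuring period can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the analysis used in the manuscript: the function names, the 1 Hz sampling, the true slope, and the sensor noise level are all hypothetical. It fits a straight line to two simulated oxygen traces that share the same perfectly linear decline and the same noise level, and differ only in how long they were recorded.

```python
import random


def slope_and_r2(time_s, o2):
    """Least-squares slope of O2 against time, and the coefficient of determination."""
    n = len(time_s)
    mean_t = sum(time_s) / n
    mean_o = sum(o2) / n
    sxx = sum((t - mean_t) ** 2 for t in time_s)
    syy = sum((o - mean_o) ** 2 for o in o2)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(time_s, o2))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)  # R^2 for simple linear regression
    return slope, r2


def simulate_trace(duration_s, true_slope=-1e-5, noise_sd=0.01, seed=0):
    """Hypothetical 1 Hz O2 trace (mg/L): a linear decline plus Gaussian sensor noise."""
    rng = random.Random(seed)
    time_s = [float(t) for t in range(duration_s)]
    o2 = [8.0 + true_slope * t + rng.gauss(0.0, noise_sd) for t in time_s]
    return time_s, o2


# Same underlying linear decline, two measuring periods (assumed values).
slope_short, r2_short = slope_and_r2(*simulate_trace(120))
slope_long, r2_long = slope_and_r2(*simulate_trace(3600))
print(f"120 s window:  slope={slope_short:.2e}, R2={r2_short:.3f}")
print(f"3600 s window: slope={slope_long:.2e}, R2={r2_long:.3f}")
```

      In this sketch the short window yields a low coefficient of determination even though the underlying decline is exactly linear, which is why a fixed threshold alone could discard a valid slope, while the longer window recovers the signal.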

      To ensure we obtain reliable data quality, we installed a water mixing loop (Steffensen et al., 1984) and used the currently best available technology of oxygen probe (see method section of Integrated Biomechanics & Bioenergetic Assessment System) to improve the signal-to-noise ratio. The water mixing loop is not commonly used in swim-tunnel respirometer. Hence, if a previously published study used a respirometer-to-organism ratio up to 8894, our updated oxygen measuring system is completely adequate to produce reliable signal-to-noise ratios in our system with a respirometer-to-organism ratio of 2200 (individuals) and 693 (schools). In fact, our original version of the manuscript used a published method (Zhang et al., 2019, J. Exp. Biol. https://doi.org/10.1242/jeb.196568) to analyze the signal-to-noise ratio and provided the quantitative approach to determine the sampling window to reliably capture the signal (Fig. S5).

      3) Because the same swimming tunnel was used for schools and solitary fish, schooling fish may end up swimming closer to the wall (because of less volume per fish) than solitary fish. Distances to the wall of schooling fish are not given, and they could provide an advantage to schooling fish.

      Response: This is an issue that we considered carefully in designing these experiments. After considering the volume of the respirometer and the size of the fish (see the response above), we decided to use the same respirometer to avoid any other confounding factors when using different sizes of respirometers with potentially different internal flow patterns. In particular, different sizes of Brett-type swim-tunnel respirometers differ in the turning radius of water flow, which can produce different flow patterns in the swimming section. Please note that we quantified the flow pattern within the flow tank using particle image velocimetry (PIV) (so we have quantitative velocity profiles across the working section at all tested speeds), and modified the provided baffle system to improve the flow in the working section.

      Because we took high-speed videos simultaneously with the respirometry measurements, we can state unequivocally that individual fish within the school did not swim closer to the walls than solitary fish over the testing period (see below for the quantitative measurements of the boundary layer). Indeed, many previous respirometry studies do not obtain simultaneous video data and hence are unable to document fish locations when energetics is measured.

      In studying schooling energetics, we believe that it is important to control as many factors as possible when making comparisons between school energetics and solitary locomotion. We took great care as indicated in the Methods section to keep all experimental parameters the same (same light conditions, same flow tank, same O2 measuring locations with the internal flow loop, etc.) so that we could detect differences if present. Changing the flow tank respirometer apparatus between individual fish and the schools studied would have introduced an unacceptable alteration of experimental conditions and would be a clear violation of the best experimental practices.

      We have made every effort to be clear and transparent about the choice of experimental apparatus and explained at great length the experimental parameters and setup used, including the considerations about the wall effect in the extended Methods section and supplemental material provided.

      Our manuscript provides the measurement of the boundary layer (<2.5 mm at speeds > 2 BL s-1) in the methods section of the Integrated Biomechanics & Bioenergetic Assessment System. We also state that the boundary layer is much thinner than the body width of the giant danio (~10 mm) so that the fish cannot effectively hide near the wall. Due to our PIV calibration, we are able to quantify flow near the wall.

      In the manuscript, we also provide details about the wall effects and fish schools, as follows: “…the convex hull volume of the fish school did not change as speed increased, suggesting that the fish school was not flattening against the wall of the swim tunnel, a typical feature when fish schools are benefiting from wall effects. In nature, fish in the centre of the school effectively swim against a ‘wall’ of surrounding fish where they can benefit from hydrodynamic interactions with neighbours.” The notion that the lateral motion of surrounding slender bodies can be represented by a streamlined wall was also proposed by Newman et al., 1970 J. Fluid Mech. These considerations provide ample justification for the comparison of locomotor energetics by schools and solitary individuals.

      4) The statistical analysis has a number of problems. The values of MO2 of each school are the result of the oxygen consumption of each fish, and therefore the test is comparing 5 individuals (i.e. an individual is the statistical unit) vs 5 schools (a school made out of 8 fish is the statistical unit). Therefore the test is comparing two different statistical units. One can see from the graphs that schooling MO2 tends to have a smaller SD than solitary data. This may well be due to the fact that schooling data are based on 5 points (five schools) and each point is the result of the MO2 of five fish, thereby reducing the variability compared to solitary fish. Other issues are related to data (for example Tail beat frequency) not being independent in schooling fish.

      Response: We cannot agree with the reviewer that fish schools and solitary individuals are different statistical units. Indeed, these are the two treatments in the statistical sense: a school versus the individual. This is why we invested extra effort to replicate all our experiments on multiple schools of different individuals and compare the data to multiple different solitary individuals. This is a standard statistical approach, whether one is comparing a tissue with multiple cells to an individual cell, or multiple locations to one specific location in an ecological study. Our analysis treats the collective movement of the fish school as a functional unit, just like the solitary individual is a functional unit. At the most fundamental level of oxygen uptake measurements, our analysis results from calculating the declining dissolved oxygen as a function of time (i.e. the slope of oxygen removal). Comparisons are made between the slope of oxygen removal by fish schools and the slope of oxygen removal by solitary individuals. This is the correct statistical comparison.
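
      For illustration, the slope-based calculation described above can be sketched as follows. This is a minimal sketch, not the analysis code used in the manuscript: the function names, the water volume, the animal mass, and the oxygen trace are hypothetical, and background respiration is ignored. The assumption made here is that the same function is applied to every replicate, whether the chamber holds a solitary fish or a whole school, so that each replicate contributes one slope and one metabolic rate.

```python
def least_squares_slope(time_s, o2_mg_per_l):
    """Slope (mg O2 per litre per second) of dissolved oxygen against time."""
    n = len(time_s)
    mean_t = sum(time_s) / n
    mean_o = sum(o2_mg_per_l) / n
    sxx = sum((t - mean_t) ** 2 for t in time_s)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(time_s, o2_mg_per_l))
    return sxy / sxx


def mo2_from_trace(time_s, o2_mg_per_l, water_volume_l, animal_mass_kg):
    """Oxygen uptake rate in mg O2 per kg per hour.

    water_volume_l: respirometer volume minus animal volume (assumed known).
    animal_mass_kg: mass of the solitary fish, or total mass of the school.
    """
    slope = least_squares_slope(time_s, o2_mg_per_l)
    # A declining trace has a negative slope, so the sign is flipped.
    return -slope * water_volume_l * 3600.0 / animal_mass_kg


# Hypothetical noise-free trace: 10 minutes at 1 Hz, declining by 1e-4 mg/L each second.
time_s = [float(t) for t in range(600)]
trace = [8.0 - 1e-4 * t for t in time_s]
mo2_example = mo2_from_trace(time_s, trace, water_volume_l=9.0, animal_mass_kg=0.02)
print(f"MO2 = {mo2_example:.1f} mg O2 kg^-1 h^-1")
```

      Under this assumption the comparison between treatments is a comparison between sets of such per-replicate values, one set from schools and one from solitary individuals.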

      The larger SD in individuals can be due to multiple biological reasons other than the technical reasons suggested here. Fundamentally, the different SD between fish schools and individuals can be the result of differences between solitary and collective movement and the different fluid dynamic interactions within the school could certainly cause differences in the amount of variation seen. Our interpretation of the ‘numerically’ smaller SD in fish schools than that of solitary individuals suggests that interesting hydrodynamic phenomena within fish schools remain to be discovered.

      Reviewer #2 (Recommendations For The Authors):

      I have reviewed a previous version of this paper. This new draft is somewhat improved but still presents a number of issues which I have outlined below.

      Response: Thanks for your efforts to improve our paper with reviews, but a number of your comments apply to the previous version of the paper, and we have made a number of revisions before submitting it to eLife. We explain below how this version of the manuscript addresses many of your comments from both the previous and current reviews. As readers can see from our responses below, this version of the manuscript no longer uses only ‘two-way ANOVA’ as we have implemented an additional statistical model. (Please see the comments below for more detailed responses related to the statistical models).

      1) One of the main problems, and one of the reasons (see below) why many previous papers have measured TBF and not the oxygen consumption of a whole school, is that schooling also provides a calming effect (Nadler et al 2018) which is not easily differentiated from the hydrodynamic advantages (Abrahams and Colgan 1985). This effect can reduce the MO2 while swimming and the EPOC when recovering. The present study does not fully take this potential issue into account and therefore its results are confounded by such effects. The authors state (line 401) that "the aerobic locomotion cost of solitary individuals showed no statistical difference from (in fact, being numerically lower) that of fish schools at a very low testing speed. The flow speed is similar to some areas of the aerated home aquarium for each individual fish. This suggests that the stress of solitary fish likely does not meaningfully contribute to the higher locomotor costs". While this is useful, the possibility that at higher speeds (i.e. a more stressful situation) solitary fish may experience more stress than fish in a school, cannot be ruled out.

      Response: Thank you for finding our results and data useful. We have addressed the comments on calming or stress effects in our response above. The key point is that either solitary or school fish are challenged (i.e. stressed) at a high speed where the sizable increases in stress hormones are well documented in the exercise physiology literature. We honestly just do not understand how a “calming” effect could possibly explain the upward concave energetic curves that we obtained, and how “calming” could explain the difference between schools and solitary individuals. Since we have simultaneous high-speed videos of fish swimming as we measure oxygen consumption at all speeds, we are able to directly observe fish behaviour. It is not exactly clear what a “calming effect” would look like kinematically or how one would measure this experimentally, but since we observed no alteration in tail beat kinematics between schools and individuals (a key result that we elaborate on below), it’s very hard to justify that a “calming” effect explains our results. Fish in schools appear to be just as “calm” as solitary individuals.

      If the reviewer's “calming effect” is a general issue, then birds flying in a V-formation should also experience a “calming effect”, but at least one study shows that birds in a V-formation experience higher wing beat frequencies.

      In addition, Nadler et al., 2018 (https://doi.org/10.1242/bio.031997) did not study any such “calming effect”. We assume the reviewer is referring to Nadler et al., 2016, which showed that shoaling reduced fish metabolic rates in a resting respirometer that has little-to-no water current that would motivate fish to swim (which is very different from the swim-tunnel respirometer we used). Moreover, the inter-loop system used by Nadler et al., 2016 has the risk of mixing the oxygen uptake of the fish shoal and solitary individuals. Hence, we believe that it is not appropriate to extend the results of Nadler et al., 2016 to infer and insist on a calming effect for the fish schools that we studied, which are actively and directionally swimming over a wide speed range up to and including high speeds, especially since our data clearly show that ‘the aerobic locomotion cost of solitary individuals showed no statistical difference from (in fact, being numerically lower) that of fish schools at very low testing speeds’. More broadly, shoaling and schooling are very different in terms of polarization as well as the physiological and behavioural mechanisms used in locomotion. Shoaling behaviour by fish in still water is not the same as active directional schooling over a speed range. Our supplementary Table 1 provides a clear definition for a variety of grouping behaviours and makes the distinction between shoaling and schooling.

      Our detailed discussion about other literature mentioned by this reviewer can be seen in the comments below.

      2) The authors overstate the novelty of their work. Line 29: "Direct energetic measurements demonstrating the energy-saving benefits of fluid-mediated group movements remain elusive" The idea that schooling may provide a reduction in the energetic costs of swimming dates back to the 70s, with pioneering experimental work showing a reduction in tail beat frequency in schooling fish vs solitary by Zuyev, G. V. & Belyayev, V. V. (1970) and theoretical work by Weihs (1973). Work carried out in the past 20 years (Herskin and Steffensen 1998; Marras et al 2015; Burgerhout et al 2013; Hemelrijk et al 2014; Li et al 2021, Wiwchar et al 2017; Verma et al 2018; Ashraf et al 2019) based on a variety of approaches has supported the idea of a reduction in swimming costs in schooling vs solitary fish. In addition, group respirometry has actually been done in early and more recent studies testing the reduction in oxygen consumption as a result of schooling (Parker, 1973; Itazawa et al., 1978; Abrahams and Colgan 1985; Davis & Olla, 1992; Ross & Backman, 1992, Burgerhout et al 2013; Currier et al 2020). Specifically, Abrahams and Colgan (1985) and Burgerhout et al (2013) found that the oxygen consumption of fish swimming in a school was higher than when solitary, and Abrahams and Colgan (1985) made an attempt to deal with the confounding calming effect by pairing solitary fish up with a neighbor visible behind a barrier. These issues and how they were dealt with in the past (and in the present manuscript) are not addressed by the present manuscript. Currier et al (2020) found that the reduction of oxygen consumption was species-specific.

      Response: We cannot agree with this reviewer that we have overstated the novelty of our work, and, in fact, we make very specific comments on the new contributions of our paper relative to the large previous literature on schooling. We are well aware of the literature cited above and many of these papers have little or nothing to do with quantifying the energetics of schooling. In addition, many of these papers rely on simple kinematic measurements which are unrelated to direct energetic measurements of energy use. To elaborate on this, we present Author response table 1 below, which evaluates and compares each of the papers this reviewer cites above. The key message (as we wrote in the manuscript) is that none of the previous studies measured non-aerobic cost (and thus do not calculate the total energy expenditure (TEE)), which we show to be substantial. In addition, many of these studies do not compare schools to individuals, do not quantify both energetics and kinematics, and do not study a wide speed range. Only 33% of previous studies used direct measurements of aerobic metabolic rate to compare the locomotion costs of fish schools and solitary individuals (an experimental control). We want to highlight that most of the citations in the reviewer’s comments are not about the kinematics or hydrodynamics of fish schooling energetics, although they provide peripheral information on fish schooling in general. We also provide an overview of the literature on this topic in our paper in the Journal of Experimental Biology (Zhang & Lauder 2023 doi:10.1242/jeb.245617) and do not wish to duplicate that discussion here. We summarized and cited the relevant papers about the energetics of fish schooling in Table 1.

      Author response table 1.

      Papers cited by Reviewer #2, and a summary of their contributions and approach.

      References cited above:

      Zuyev, G., & Belyayev, V. V. (1970). An experimental study of the swimming of fish in groups as exemplified by the horsemackerel [Trachurus mediterraneus ponticus Aleev]. J Ichthyol, 10, 545-549.

      Weihs, D. (1973). Hydromechanics of fish schooling. Nature, 241(5387), 290-291.

      Herskin, J., & Steffensen, J. F. (1998). Energy savings in sea bass swimming in a school: measurements of tail beat frequency and oxygen consumption at different swimming speeds. Journal of Fish Biology, 53(2), 366-376.

      Marras, S., Killen, S. S., Lindström, J., McKenzie, D. J., Steffensen, J. F., & Domenici, P. (2015). Fish swimming in schools save energy regardless of their spatial position. Behavioral Ecology and Sociobiology, 69, 219-226.

      Burgerhout, E., Tudorache, C., Brittijn, S. A., Palstra, A. P., Dirks, R. P., & van den Thillart, G. E. (2013). Schooling reduces energy consumption in swimming male European eels, Anguilla anguilla L. Journal of Experimental Marine Biology and Ecology, 448, 66-71.

      Hemelrijk, C. K., Reid, D. A. P., Hildenbrandt, H., & Padding, J. T. (2015). The increased efficiency of fish swimming in a school. Fish and Fisheries, 16(3), 511-521.

      Li, L., Nagy, M., Graving, J. M., Bak-Coleman, J., Xie, G., & Couzin, I. D. (2020). Vortex phase matching as a strategy for schooling in robots and in fish. Nature Communications, 11(1), 5408.

      Wiwchar, L. D., Gilbert, M. J., Kasurak, A. V., & Tierney, K. B. (2018). Schooling improves critical swimming performance in zebrafish (Danio rerio). Canadian Journal of Fisheries and Aquatic Sciences, 75(4), 653-661.

      Verma, S., Novati, G., & Koumoutsakos, P. (2018). Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences, 115(23), 5849-5854.

      Ashraf, I., Bradshaw, H., Ha, T. T., Halloy, J., Godoy-Diana, R., & Thiria, B. (2017). Simple phalanx pattern leads to energy saving in cohesive fish schooling. Proceedings of the National Academy of Sciences, 114(36), 9599-9604.

      Parker Jr, F. R. (1973). Reduced metabolic rates in fishes as a result of induced schooling. Transactions of the American Fisheries Society, 102(1), 125-131.

      Itazawa, Y., & Takeda, T. (1978). Gas exchange in the carp gills in normoxic and hypoxic conditions. Respiration Physiology, 35(3), 263-269.

      Abrahams, M. V., & Colgan, P. W. (1985). Risk of predation, hydrodynamic efficiency and their influence on school structure. Environmental Biology of Fishes, 13, 195-202.

      Davis, M. W., & Olla, B. L. (1992). The role of visual cues in the facilitation of growth in a schooling fish. Environmental Biology of Fishes, 34, 421-424.

      Ross, R. M., Backman, T. W., & Limburg, K. E. (1992). Group-size-mediated metabolic rate reduction in American shad. Transactions of the American Fisheries Society, 121(3), 385-390.

      Currier, M., Rouse, J., & Coughlin, D. J. (2021). Group swimming behaviour and energetics in bluegill Lepomis macrochirus and rainbow trout Oncorhynchus mykiss. Journal of Fish Biology, 98(4), 1105-1111.

      Halsey, L. G., Wright, S., Racz, A., Metcalfe, J. D., & Killen, S. S. (2018). How does school size affect tail beat frequency in turbulent water?. Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology, 218, 63-69.

      Johansen, J. L., Vaknin, R., Steffensen, J. F., & Domenici, P. (2010). Kinematics and energetic benefits of schooling in the labriform fish, striped surfperch Embiotoca lateralis. Marine Ecology Progress Series, 420, 221-229.

      3) In addition to the calming effect, measuring group oxygen consumption suffers from a number of problems as discussed in Herskin and Steffensen (1998) such as the fish volume to water volume ratio, which varies considerably when testing a school vs single individuals in the same tunnel and the problem of wall effect when using a small volume of water for accurate O2 measurements. Herskin and Steffensen (1998) circumvented these problems by measuring tailbeat frequencies of fish in a school and then calculating the MO2 of the corresponding tailbeat frequency in solitary fish in a swim tunnel. A similar approach was used by Johansen et al (2010), Marras et al (2015), Halsey et al (2018). However, it is not clear how these potential issues were dealt with here. Here, larger solitary D. aequipinnatus were used to increase the signal-to-noise ratio. However, using individuals of different sizes makes other variables not so directly comparable, including stress, energetics, and kinematics. (see comment 7 below).

      Response: We acknowledge the great efforts made by previous studies to understand the energetics of fish schooling. These studies, as detailed in the table and elaborated in the response above (see comment 2) are very different from our current study. Our study achieved a direct comparison of energetics (including both aerobic and non-aerobic cost) and kinematics between solitary individuals and fish schools that has never been done before. Our detailed response to the supposed “calming effect” is given above.

      As highlighted in the previous comments and opening statement, our current version has addressed the wall effect, tail beat frequency, and experimental and analytical efforts invested to directly compare the energetics between fish schools and solitary individuals. As readers can see in our comprehensive method section, achieving the direct comparison between solitary individuals and fish schools is not a trivial task. Now we want to elaborate on the role of kinematics as an indirect estimate of energetics. Our results here show that kinematic measurements of tail beat frequency are not reliable estimates of energetic cost, and the previous studies cited did not measure EPOC and those costs are substantial, especially as swimming speed increases. Fish in schools can save energy even when the tail beat frequency does not change (although school volume can change as we show). We elaborated (in great detail) on why kinematics does not always reflect the energetics in the submitted version (see last paragraph of “Schooling dynamics and energy conservation” section). Modeling what energy expenditure should be, based only on tail kinematics, is, in our view, a highly unreliable approach that has never been validated (e.g., fish use more than just tails for locomotion). Indeed, we believe that this is an inadequate substitute for direct energy measurements. We disagree that using slightly differently sized individuals is an issue since we recorded fish kinematics across all experiments and included the measurements of behaviour in our manuscript. Slightly altering the size of individual fish was done on purpose to provide a better ratio of respirometer volume to fish volume in the tests on individual fish, thus we regard this as a benefit of our approach and not a concern.

      Finally, in another study of the collective behaviour of flying birds (Usherwood, J. R., Stavrou, M., Lowe, J. C., Roskilly, K. and Wilson, A. M. (2011). Flying in a flock comes at a cost in pigeons. Nature 474, 494-497), the authors observed that wing beat frequency can increase during flight with other birds. Hence, again, we cannot regard movement frequency of appendages as an adequate substitute for direct energetic measurements.

      4) Svendsen et al (2016) provide guidelines for the ratio of fish volume to water volume in the respirometer. The ratio used here (2200) is much higher than that recommended. RFR values higher than 500 should be avoided in swim tunnel respirometry, according to Svendsen et al (2016).

      Response: Thank you for raising this point. Please see the detailed responses above to the same comment. We believe that our experimental setup and ratios are very much in line with those recommended, and represent a significant improvement on previous studies which use large ratios.

      5) Lines 421-436: The same goes for wall effects. Presumably, using the same size swim tunnel, schooling fish were swimming much closer to the walls than solitary fish but this is not specifically quantified here in this paper. Lines 421-436 provide some information on the boundary layer (though wall effects are not just related to the boundary layer) and some qualitative assessment of school volume. However, no measurement of the distance between the fish and the wall is given.

      Response: Please see the detailed responses above to the same comment. Specifically, we used the particle image velocimetry (PIV) system to measure the boundary layer (<2.5 mm at speeds > 2 BL s-1) and stated the parameters in the methods section of the Integrated Biomechanics & Bioenergetic Assessment System. We also state that the boundary layer is much thinner than the body width of the giant danio (~10 mm) so that the fish cannot effectively hide near the wall. Due to our PIV calibration, we are able to quantify flow near the wall.

      Due to our video data obtained simultaneously with energetic measurements, we do not agree that fish were swimming closer to the wall in schools and also note that we took care to modify the typical respirometer to both ensure that flow across the cross-section did not provide any refuges and to quantify flow velocities in the chamber using particle image velocimetry. We do not believe that any previous experiments on schooling behaviour in fish have taken the same precautions.

      6) The statistical tests used have a number of problems. Two-way ANOVA was based on school vs solitary and swimming speed. However, there are repeated measures at each speed and this needs to be dealt with. The degrees of freedom of one-way ANOVA and T-tests are not provided. These tests took into account five groups of fish vs. five solitary fish. The values of MO2 of each school are the result of the oxygen consumption of each fish, and therefore the test is comparing 5 individuals (i.e. an individual is the statistical unit) vs 5 schools (a school made out of 8 fish is the statistical unit). Therefore the test is comparing two different statistical units. One can see from the graphs that schooling MO2 tend to have a smaller SD than solitary data. This may well be due to the fact that schooling data are based on 5 points (five schools) and each point is the result of the MO2 of five fish, thereby reducing the variability compared to solitary fish. TBF, on the other hand, can be assigned to each fish even in a school, and therefore TBF of each fish could be compared by using a nested approach of schooling fish (nested within each school) vs solitary fish, but this is not the statistical procedure used in the present manuscript. The comparison between TBFs presumably is comparing 5 individuals vs all the fish in the schools (6x5=30 fish). However, the fish in the school are not independent measures.

      Response: We cannot agree with this criticism, which may be based on this reviewer having seen a previous version of the manuscript. We did not use two-way ANOVA in this version. This version of the manuscript reported the statistical value based on a General Linear Model (see statistical section of the method). We are concerned that this reviewer did not in fact read either the Methods section or the Results section. In addition, it is hard to accept that, from examination of the data shown in Figure 3, there is not a clear and large difference between schooling and solitary locomotion, regardless of the statistical test used.

      Meanwhile, the comments about the ‘repeated’ measures from one speed to the next are interesting, but we cannot agree. The ‘repeated’ measures are proper when one testing subject is assessed before and after treatment. Going from one speed to the next is not a treatment. Instead, the speed is a dependent and continuous variable. In our experimental design, the treatment is fish school, and the control is a solitary individual. Second, we never compared any of our dependent variables across different speeds within a school or within an individual. Instead, we compared schools and individuals at each speed. In this comparison, there are no ‘repeated’ measures. We agree with the reviewer that fish in the school are interacting (not independent). This is one more reason to support our approach of treating fish schools as a functional and statistical unit in our experiment design (more detailed responses are stated in the response to the comment above).

      7) The size of solitary and schooling individuals appears to be quite different (solitary fish range 74-88 cm, schooling fish range 47-65 cm). While scaling laws can correct for this in the MO2, was this corrected for TBF and for speed in BL/s? Using BL/s for speed does not completely compensate for the differences in size.

      Response: Our current version has provided justifications for not conducting scaling in the values of tail beat frequency. Our justification is “The mass scaling for tail beat frequency was not conducted because of the lack of data for D. aequipinnatus and its related species. Using the scaling exponent of distant species for mass scaling of tail beat frequency will introduce errors of unknown magnitude.” Our current version also acknowledges the consideration about scaling as follows: “Fish of different size swimming at 1 BL s-1 will necessarily move at different Reynolds numbers, and hence the scaling of body size to swimming speed needs to be considered in future analyses of other species that differ in size.”
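
      The Reynolds number statement in the quoted passage follows directly from the standard definition. The symbols below are introduced here only for illustration and do not appear in the manuscript: L is body length, k is the swimming speed expressed in body lengths per second, U is the absolute swimming speed, and ν is the kinematic viscosity of water.

```latex
\mathrm{Re} = \frac{U L}{\nu}, \qquad
U = k L \;\Longrightarrow\; \mathrm{Re} = \frac{k L^{2}}{\nu}, \qquad
\frac{\mathrm{Re}_{1}}{\mathrm{Re}_{2}} = \left(\frac{L_{1}}{L_{2}}\right)^{2}
\quad \text{at equal } k \text{ and } \nu .
```

      Two fish swimming at the same speed in BL s-1 therefore differ in Reynolds number by the square of the ratio of their body lengths, which is why expressing speed in body lengths does not fully equalise the hydrodynamic regime.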

      Reviewer #3 (Public Review):

      Summary:

      Zhang and Lauder characterized both aerobic and anaerobic metabolic energy contributions in schools and solitary fishes in the Giant danio (Devario aequipinnatus) over a wide range of water velocities. By using a highly sophisticated respirometer system, the authors measure the aerobic metabolisms by oxygen uptake rate and the non-aerobic oxygen cost as excess post-exercise oxygen consumption (EPOC). With these data, the authors model the bioenergetic cost of schools and solitary fishes. The authors found that fish schools have a J-shaped metabolism-speed curve, with reduced total energy expenditure per tail beat compared to solitary fish. Fish in schools also recovered from exercise faster than solitary fish. Finally, the authors conclude that these energetic savings may underlie the prevalence of coordinated group locomotion in fish.

      The conclusions of this paper are mostly well supported by data, but some aspects of methods and data acquisition need to be clarified and extended.

      Response: Thank you for seeing the value of our study. We provided clarification of the data acquisition system with a new panel of pictures included in the supplemental material to show our experimental system. We understand that our methods have more details and justifications than the typical method sections. First, the details are to promote the reproducibility of the experiments. The justifications are the responses to reviewer 2, who reviewed our previous manuscript version and also posted the same critiques after we provided the justifications for the construction of the system and the data acquisition.

      Strengths:

      This work aims to understand whether animals moving through fluids (water in this case) exhibit highly coordinated group movement to reduce the cost of locomotion. By calculating the aerobic and anaerobic metabolic rates of school and solitary fishes, the authors provide direct energetic measurements that demonstrate the energy-saving benefits of coordinated group locomotion in fishes. The results of this paper show that fish schools save anaerobic energy and reduce the recovery time after peak swimming performance, suggesting that fishes can allocate more energy to other fitness-related activities when they move collectively through water.

      Response: Thank you. We are excited to share our discoveries with the world.

      Weaknesses:

      Although the paper does have strengths in principle, the weakness of the paper is the method section. There is too much irrelevant information in the methods that sometimes is hard to follow for a researcher unfamiliar with the research topic. In addition, it was hard to imagine the experimental (respirometer) system used by the authors in the experiments; therefore, it would be beneficial for the article to include a diagram/scheme of that respiratory system.

      Response: We agree with the reviewer and have added pictures of the experimental system to the supplementary materials (Fig. S4). We think pictures present the system more realistically than schematics. We also provide a picture of the system during the energetic measurements, to show the care taken to ensure that fish are not affected by any external stimulation other than the water velocity. The careful experimental protocol is critical for revealing the concave-upward curve of bony fish schools, which has not been reported before. Many details in the methods have been included in response to Reviewer 2.

      Reviewer #3 (Recommendations For The Authors):

      Overall, this is a very interesting, well-written, and nice article. However, many times the method section looks like a discussion. Furthermore, the authors need to check the use of the word "which" throughout the text. I got the feeling that it is overused/misused sometimes.

      Response: Thank you for the positive comments. The method is written in that way to address the concerns of Reviewer 2 who reviewed our previous versions. We corrected the overuse of ‘which’ throughout the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      Receptor tyrosine kinases such as ALK play critical roles during appropriate development and behaviour and are nodal in many disease conditions, through molecular mechanisms that weren't completely understood. This manuscript identifies a previously unknown neuropeptide precursor as a downstream transcriptional target of Alk signalling in Clock neurons in the Drosophila brain. The experiments are well designed with attention to detail, the data are solid and the findings will be useful to those interested in events downstream of signalling by receptor tyrosine kinases.

      Authors response: We thank the reviewers for this assessment of our Manuscript. We are happy to accept the current eLife assessment of our manuscript. In our revised manuscript we have addressed all of the major reviewer comments, including additional experiments suggested by the reviewers, which have significantly strengthened the revised version.

      Reviewer #1 (Public Review):

      Sukumar et al build on a body of work from the Palmer lab that seeks to unravel the transcriptional targets of Alk signaling (a receptor tyrosine kinase). Having uncovered its targets in the mesoderm in an earlier study, they seek to determine its targets in the central nervous system. To do this, they use Targeted DamID (TaDa) in the wild-type and Alk dominant negative background and identify about 1700 genes that might be under the control of Alk signalling. Using their earlier data and applying a set of criteria - upregulated in gain-of-Alk, downregulated in loss-of-Alk, and co-expressed with Alk positive cells in single cell datasets - they arrive upon a single gene, Sparkly, which is predicted to be a neuropeptide precursor.

      They generate antibodies and mutants for Sparkly and determine that it is responsive to Alk signalling and is expressed in many neuroendocrine cells, as well as in clock neurons. Though the mutants survive, they have reduced lifespans and are hyperactive. In summary, the authors identify a previously unidentified transcriptional target of Alk signalling, which is likely cleaved into a neuropeptide and is involved in regulating circadian activity.

      The data support claims made, are generally well presented and the manuscript clearly written. The link between circadian control of Alk signalling in Clock neurons > Spar expression > ultimately controlling circadian activity, however, was not clear.

      Authors response: We thank the reviewer for this thorough reading of our manuscript and for kindly highlighting the important takeaways from the study. The role of Alk signalling in activity, circadian rhythm and sleep has previously been reported by other groups in the following studies (Bai and Sehgal, 2015; Weiss et al, 2017; Gouzi, Bouraimi et al 2018), which we have discussed in our manuscript. We have also identified a hyperactivity phenotype in our Alk CNS-specific loss-of-function allele, AlkΔRA, which is similar to the Spar loss-of-function mutant phenotype. We hypothesize that one of the ways in which Alk signalling regulates fly activity is through regulating Spar gene expression in neuroendocrine cells. This is supported by our data showing Alk expression in Clock neurons, as well as by the new experimental data showing an activity phenotype in flies expressing Spar RNAi driven by the Clk678-Gal4 driver.

      Reviewer #2 (Public Review):

      This manuscript illustrates the power of "combined" research, incorporating a range of tools, both old and new to answer a question. This thorough approach identifies a novel target in a well-established signalling pathway and characterises a new player in Drosophila CNS development.

      Largely, the experiments are carried out with precision, meeting the aims of the project, and setting new targets for future research in the field. It was particularly refreshing to see the use of multi-omics data integration and Targeted DamID (TaDa) findings to triage scRNA-seq data. Some of the TaDa methodology was unorthodox (and should be justified/caveats mentioned in the main text), however, this does not affect the main finding of the study.

      Their discovery of Spar as a neuropeptide precursor downstream of Alk is novel, as well as its ability to regulate activity and circadian clock function in the fly. Spar was just one of the downstream factors identified from this study, therefore, the potential impact goes beyond this one Alk downstream effector.

      Authors response: We thank the reviewer for the positive comments highlighting the strengths of our study. TaDa was used as a semi-quantitative readout of the transcriptional activity in an Alk loss-of-function background with an emphasis on relative differences in peaks close to GATC sites, providing an important dataset for integration with bulk and single cell RNAseq. As the reviewer points out, there are important considerations when interpreting this data, and we have now added sentences in the discussion to inform readers of possible caveats of our TaDa dataset.

      Reviewer #3 (Public Review):

      Summary:

      The receptor tyrosine kinase Anaplastic Lymphoma Kinase (ALK) in humans is nervous system expressed and plays an important role as an oncogene. A number of groups have been studying ALK signalling in flies to gain mechanistic insight into its various roles. In flies, ALK plays a critical role in development, particularly embryonic development and axon targeting. In addition, ALK was also shown to regulate adult functions including sleep and memory. In this manuscript, Sukumar et al., used a suite of molecular techniques to identify downstream targets of ALK signalling. They first used targeted DamID, a technique that involves fusing a DNA methylase to RNA polymerase II, so that GATC sites in close proximity to PolII binding sites are marked. They performed these experiments in wild-type and ALK loss of function mutants (using an Alk dominant negative AlkDN), to identify Alk responsive loci. Comparing these loci with a larval single-cell RNAseq dataset identified neuroendocrine cells as an important site of Alk action. They further combined these TaDa hits with data from RNA seq in Alk Loss and Gain of Function manipulations to identify a single novel target of Alk signalling - a neuropeptide precursor they named Sparkly (Spar) for its expression pattern. They generated a mutant allele of Spar, raised an antibody against Spar, and characterised its expression pattern and mutant behavioural phenotypes including defects in sleep and circadian function.

      Strengths:

      The molecular biology experiments using TaDa and RNAseq were elegant and very convincing. The authors identified a novel gene they named Spar. They also generated a mutant allele of Spar (using CrisprCas technology) and raised an antibody against Spar. These experiments are lovely, and the reagents will be useful to the community. The paper is also well written, and the figures are very nicely laid out making the manuscript a pleasure to read.

      Weaknesses:

      My main concerns were around the genetics and behavioural characterisation which is incomplete. The authors generated a novel allele of Spar - Spar ΔExon1 and examined sleep and circadian phenotypes of this allele. However, they have only one mutant allele of Spar, and it doesn't appear as if this mutant was outcrossed, making it very difficult to rule out off-target effects. To make this data convincing, it would be better if the authors had a second allele, perhaps they could try RNAi?

      Further, the sleep and circadian characterisation could be substantially improved. In Fig 8 E-F it appears as if sleep was averaged over 30 days! This is a little bizarre. They then bin the data as day 1 - 12 and 12-30. This is not terribly helpful either. Sleep in flies, as in humans, undergoes ontogenetic changes - sleep is high in young flies, stabilises between day 3-12, and shows defects by around 3 weeks of age (cf Shaw et al., 2000 PMID 10710313). The standard in the sleep field is to average over 3 days or show one representative day. The authors should reanalyse their data as per this standard, and perhaps show data from 3-10 day old flies, and if they like from 20-30 day old flies. Further, sleep data is usually analysed and presented from lights on to lights on. This allows one to quantify important metrics of sleep consolidation including bout lengths in day and night, and sleep latency. These metrics are of great interest to the community and should be included.

      The authors also claim there are defects in circadian anticipatory activity. However, these data, as presented are not solid to me. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827). Further, circadian period could also be evaluated. There are several free software packages to perform these analyses so it should not be hard to do.

      Authors response: We thank the reviewer for the thorough reading of our manuscript and for generously praising the positives as well as pointing out the weakness of our study. We have now addressed the highlighted weaknesses in behavioural experiments. In particular, we have reanalysed our data according to the reviewer’s suggestions. In addition, we provide experimental data, driving Spar RNAi in Clock neurons, that support our Spar mutant analysis.

      Point-by-point response to the reviewers’ concerns:

      Point 1. “My main concerns were around the genetics and behavioural characterisation which is incomplete. The authors generated a novel allele of Spar - Spar ΔExon1 and examined sleep and circadian phenotypes of this allele. However, they have only one mutant allele of Spar, and it doesn't appear as if this mutant was outcrossed, making it very difficult to rule out off-target effects. To make this data convincing, it would be better if the authors had a second allele, perhaps they could try RNAi?”

      Authors response: As per the reviewer's suggestion, we conducted a targeted knockdown of Sparkly specifically in clock neurons (Clk-Gal4 > Spar-RNAi) and assessed the circadian phenotypes. Flies were monitored for 5 days in LD followed by a shift to DD, similar to our previous LD-DD experiments. The results revealed a significant disruption in both activity and sleep during the DD transition period upon knockdown of Spar in circadian clock neurons. These findings strongly align with the expression pattern of Spar in clock neurons (Figure 7i-l’’). We have now included a new main figure (Figure 9) together with several supplementary figures (Figure 9 – figure supplements 1 and 2) and discussed these experiments on pages 17-18 of the results section of the revised manuscript.

      Point 2. “Further, the sleep and circadian characterisation could be substantially improved. In Fig 8 E-F it appears as if sleep was averaged over 30 days! This is a little bizarre. They then bin the data as day 1 - 12 and 12-30. This is not terribly helpful either. Sleep in flies, as in humans, undergoes ontogenetic changes - sleep is high in young flies, stabilises between day 3-12, and shows defects by around 3 weeks of age (cf Shaw et al., 2000 PMID 10710313). The standard in the sleep field is to average over 3 days or show one representative day. The authors should reanalyse their data as per this standard, and perhaps show data from 3–10-day old flies, and if they like from 20–30-day old flies.”

      Authors response: We have reanalysed these data according to the reviewer's suggestions and revised the sleep data presented. Specifically, we have focused on two 3-day periods, days 5-7 as well as days 20-22. By averaging the mean sleep during these time points, we observed a significant decrease in average sleep duration in the SparΔExon1 and AlkΔRA mutant flies at a younger age (Figure 8h-h’, Figure 8 – figure supplement 2). However, no significant effect was observed in older flies (Figure 8h-h’, Figure 8 – figure supplement 2). We have incorporated this new data into Figure 8 and provided a detailed description in the results section (page 16) of the revised manuscript.

      Point 3. “Further, sleep data is usually analysed and presented from lights on to lights on. This allows one to quantify important metrics of sleep consolidation including bout lengths in day and night, and sleep latency. These metrics are of great interest to the community and should be included.”

      Authors response: We have now reanalysed these data as per the reviewer's suggestion. From the raw data collected over a span of 3 days, we specifically selected the lights on-lights on data and examined the average sleep duration. Notably, we observed a significant downregulation of average sleep in SparΔExon1 and AlkΔRA flies, but only at a younger age (Figure 8h-h’, Figure 8 – figure supplement 2). Furthermore, we assessed the number of sleep bouts using this data and found a significant increase in the number of bouts in younger SparΔExon1 and AlkΔRA flies, with no changes observed at an older age (Figure 8 – figure supplement 2). Additionally, we evaluated the number of bouts in flies that were initially monitored in LD and then shifted to DD, observing a significant decrease in the number of sleep bouts in SparΔExon1 flies following the transition to DD (Figure 9d). This new data is described in detail in the results section (pages 16-18) of the revised manuscript.
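      The bout metrics mentioned in this response can be illustrated with a minimal sketch. It assumes per-minute activity counts and the convention widely used in the fly sleep field that sleep is a run of at least 5 consecutive minutes without activity; the response itself does not state the threshold or the software used, so the function names and the `min_bout` default below are hypothetical.

```python
from typing import Dict, List


def sleep_bouts(counts: List[int], min_bout: int = 5) -> List[int]:
    """Return the length in minutes of each sleep bout.

    counts   : activity counts, one value per 1-minute bin (assumed layout).
    min_bout : shortest run of inactive minutes scored as sleep
               (5 min is an assumed, conventional threshold).
    """
    bouts: List[int] = []
    run = 0  # length of the current run of inactive minutes
    for c in counts:
        if c == 0:
            run += 1
        else:
            if run >= min_bout:
                bouts.append(run)
            run = 0
    if run >= min_bout:  # close a bout that reaches the end of the record
        bouts.append(run)
    return bouts


def summarise_sleep(counts: List[int], min_bout: int = 5) -> Dict[str, float]:
    """Total sleep, number of bouts and mean bout length for one window."""
    bouts = sleep_bouts(counts, min_bout)
    n = len(bouts)
    total = sum(bouts)
    return {
        "total_sleep_min": total,
        "n_bouts": n,
        "mean_bout_min": total / n if n else 0.0,
    }
```

      Applied to a lights-on to lights-on window, a function of this kind would yield the total sleep and bout counts of the sort reported in Figure 8 – figure supplement 2; more bouts with less total sleep indicates fragmented sleep.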

      Point 4. “The authors also claim there are defects in circadian anticipatory activity. However, these data, as presented are not solid to me. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827).”

      Authors response: We appreciate the valuable suggestion provided by the reviewer. In accordance with the referenced paper by Harrisingh et al. (2007), we calculated the "anticipation score", defined as the percentage of activity in the 6-hour period preceding the lights-on or lights-off transition that occurs in the 3-hour window just before the transition. To analyse the mean activity of the flies, we selected the data corresponding to the 6 hours before lights-on and the 6 hours before lights-off, averaged over a 14-day period under normal LD conditions. Interestingly, we observed a significant increase in the mean activity of SparΔExon1 flies during both morning anticipation (a.m. anticipation) and evening anticipation (p.m. anticipation) (Figure 8f). Furthermore, we analysed this parameter for flies entrained in DD and found that SparΔExon1 flies exhibited lower mean activity during both morning and evening anticipation (Figure 8g). We have incorporated this new data into Figure 8 and provided a detailed description in the results section (pages 16-18) of the revised manuscript.
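      The anticipation score as defined in this response reduces to a simple ratio. The following is a minimal sketch under stated assumptions: activity is binned at a fixed interval (30 min here), and `transition_idx` is the index of the first bin after the lights-on or lights-off transition. Both the function name and the binning are illustrative, not taken from the manuscript.

```python
import math
from typing import Sequence


def anticipation_score(
    counts: Sequence[float], transition_idx: int, bin_minutes: int = 30
) -> float:
    """Percentage of the activity in the 6 h before a light transition
    that falls within the final 3 h before that transition.

    counts         : binned activity counts (assumed fixed-width bins).
    transition_idx : index of the first bin after the transition.
    bin_minutes    : bin width in minutes (assumed 30).
    """
    bins_6h = (6 * 60) // bin_minutes
    bins_3h = (3 * 60) // bin_minutes
    if transition_idx < bins_6h:
        raise ValueError("need at least 6 h of data before the transition")
    last_6h = counts[transition_idx - bins_6h : transition_idx]
    last_3h = counts[transition_idx - bins_3h : transition_idx]
    total = sum(last_6h)
    if total == 0:
        return math.nan  # no activity at all: score is undefined
    return 100.0 * sum(last_3h) / total
```

      With this definition a fly whose activity is flat across the 6-hour window scores 50, and values above 50 indicate activity ramping up ahead of the transition.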

      Point 5. Further, circadian period could also be evaluated. There are several free software packages to perform these analyses so it should not be hard to do.

      Authors response: We have now evaluated the circadian period as suggested by the reviewer, generating a chi-square periodogram for each fly to calculate the free-running period, both for flies under normal LD conditions and for those entrained in DD. We calculated the percentage of flies that had a shorter or longer period than 1440 min (24 h) and observed that w1118 and SparΔExon1 flies have a longer circadian period (Figure 8 – figure supplement 4) but following the shift to DD, they tend to have a shorter circadian period (Figure 9 – figure supplement 3). This new data is described in the results (pages 16-18).
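      For readers unfamiliar with the chi-square periodogram, a minimal sketch of the classical statistic follows. It assumes the standard formulation in which the series is folded at each candidate period P into K rows, and Qp = K·N·Σ(M_h − M)² / Σ(x_i − M)², where M_h are the column means and M is the grand mean. The response does not say which software or settings were used, so the function names are hypothetical, and picking the period by the raw maximum of Qp is a simplification (in practice Qp is compared with a chi-square threshold with P − 1 degrees of freedom).

```python
import numpy as np


def chi_square_periodogram(x, periods):
    """Return the Qp statistic for each candidate period.

    x       : 1-D activity series with fixed-width bins.
    periods : candidate periods, expressed in number of bins.
    """
    x = np.asarray(x, dtype=float)
    q = []
    for p in periods:
        k = len(x) // p          # number of complete cycles of length p
        n = k * p                # samples actually used
        seg = x[:n]
        grand_mean = seg.mean()
        col_means = seg.reshape(k, p).mean(axis=0)  # mean profile at period p
        denom = ((seg - grand_mean) ** 2).sum()
        num = k * n * ((col_means - grand_mean) ** 2).sum()
        q.append(num / denom if denom > 0 else 0.0)
    return np.array(q)


def dominant_period(x, periods):
    """Candidate period with the largest Qp (simplified selection rule)."""
    q = chi_square_periodogram(x, periods)
    return periods[int(np.argmax(q))]
```

      With 1-minute bins, a candidate period of 1440 corresponds to 24 h, so flies can be classified as shorter or longer than 24 h by comparing the selected period with 1440.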

      Recommendations for the authors:

      There are two major concerns that we recommend the authors address:

      1) The behaviour: There are a number of unconventional representations of the behavioural data in this manuscript. We recommend that the authors revisit their data representation to adhere to conventions in the field - specific suggestions are in the reviews. We also suggest an additional experiment - an RNAi/different allele/rescue experiment to ensure that the phenotypes the authors observe are not due to off-target effects of the mutant they have generated.

      Authors response: In the revised manuscript, we have reanalysed the behavioural data according to the reviewers’ recommendations (included in Figures 8 and 9 of the revised version). In addition, we have performed a targeted Spar RNAi experiment in clock neurons (included in Figure 9 of the revised version), identifying a hyperactive behavioural phenotype similar to that of Spar mutants. The inclusion of these new analyses and data strengthens the manuscript and supports the conclusion that Spar plays a role in regulation of behaviour.

      2) TaDa analyses: We were concerned that the authors might be picking up false positives with the way they have analysed their data. While this may not matter for this study, it will be useful to reason out their approach and keep this in mind for any other targets they choose from these data for further studies.

      Authors response: In line with the reviewers’ concerns, we have now highlighted the potential caveats and drawbacks of our TaDa dataset in the discussion section of the revised manuscript (detailed in response to Reviewer #2 below).

      Reviewer #1 (Recommendations For The Authors):

      Though generally well written, I felt that some sections could be written in more detail. For example, the text around Figure 5 was not very informative. Many of the other approaches to the analyses and details of datasets used were glossed over. Since the manuscript uses a lot of previously published data, it would be nice to give more details about them in the context of the results.

      Authors response: We thank the reviewer for this recommendation. We have now added additional information about peptidomics analysis in the results and in the legend of Figure 5. We have also included a table in the Methods that summarises the datasets used in this study, including the dataset name, a brief description and the reference.

      In the panels where co-localisations have been represented, it would be nice to include enlarged insets depicting the co-labelling. It is not always obvious in the way the figures have currently been represented. For example, in Fig 2G, Alk stain appears to be everywhere, but the authors make the point that it is enriched in neuroendocrine cells (as labelled by dimmed), but the co-localisation isn't evident. Similar issues come up with the sparkly colocalisations.

      Authors response: As suggested by the reviewer, we have now added additional panels to complement the stainings in Figure 2G. These new data are included as Figure 2 – figure supplement 1 (Alk/Dimm-Gal4>UAS-GFPcaax staining) and as Figure 4 – figure supplement 1 (Alk/Spar staining), which indicate colocalization in the central brain and ventral nerve cord prosecretory cells with enlarged panels.

      Supplementary figures S3C and 3F appear garbled to me? Maybe it didn't upload properly?

      Authors response: Unfortunately, this issue is not apparent to us. However, we have now re-uploaded these Figures.

      Sparkly's responsiveness to Alk signalling: Visually, there does not seem to be an increase or decrease in spar levels in the images in Fig 4F-H. How was the quantification done? I would suggest a more detailed interpretation of their results related to spar's responsiveness to Alk signalling - at the mRNA vs protein levels and the GOF vs LOF conditions.

      Authors response: We thank the reviewer for this constructive recommendation. In the revised manuscript, we have now repeated this experiment with increased numbers of larval CNS followed by blinded image analysis. These results also show an increased fluorescence intensity as measured by corrected total cell fluorescence (CTCF), confirming our previous observation of increased Spar protein expression in Alk gain-of-function conditions compared to controls. In this analysis, changes in Spar levels in Alk loss-of-function remained non-significant compared to control, in agreement with our previous data. As suggested by the reviewer, we have now included several additional sentences discussing the possible reasons for these observations. The following text is now included on Page 11 of the results section:

      “While our bulk RNA-seq and TaDa datasets show a reduction in Spar transcript levels in Alk loss-of-function conditions, this reduction is not reflected at the protein level. This observation may reflect additional uncharacterised pathways that regulate Spar mRNA levels as well as translation and protein stability. Taken together, these observations confirm that Spar expression is responsive to Alk signaling in CNS, although Alk is not critically required to maintain Spar protein levels.” We have also added an additional Image analysis method section explaining the methodology of the CTCF fluorescent intensity quantification on Page 28.
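      As a worked illustration of the CTCF quantification mentioned above, the sketch below uses the commonly cited definition, CTCF = integrated density − (area of the selected region × mean fluorescence of background readings). That the authors used exactly this formula is an assumption, since the response only names the metric; the function and parameter names are hypothetical.

```python
from typing import Sequence


def ctcf(
    integrated_density: float, area: float, background_means: Sequence[float]
) -> float:
    """Corrected total cell fluorescence for one region of interest.

    integrated_density : summed pixel intensity inside the region.
    area               : region area, in the same pixel units.
    background_means   : mean intensities of nearby background regions.
    """
    if not background_means:
        raise ValueError("at least one background reading is required")
    mean_background = sum(background_means) / len(background_means)
    return integrated_density - area * mean_background
```

      For example, a region with integrated density 5000 and area 100, measured against background readings of 10, 12 and 14, gives 5000 − 100 × 12 = 3800.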

      Reviewer #2 (Recommendations For The Authors):

      It was surprising to see that the authors did not use Dam-only controls. This is to control for background methylation by Dam (i.e. accessible chromatin). This does not invalidate the main results of the manuscript, however, there could be false positives in the dataset for genes that are seen to be up-regulated in the mutant condition (e.g. if accessibility is increased in the mutant but not transcription, then it would look like increased Pol II binding, when it isn't). As the study was focusing on genes down-regulated in the mutant, this is less of an issue, as it is very unlikely to see an increase in transcription with a decrease in accessibility (that could provide a false positive). The authors should explain their rationale for not using Dam-only controls, and the associated caveats, in the manuscript.

      Authors response: We agree with the reviewer’s comment on the possibility of identifying false positive candidates from our TaDa dataset, especially if one is seeking to find a gene with increased Pol II occupancy in an Alk dominant negative condition. However, our analysis only focuses on genes which are responsive to Alk manipulation, namely, genes which are downregulated in the Alk dominant negative condition. One of the rationales for not using a Dam-only control was that in our previous Mendoza-Garcia et al, 2021 study, we employed a similar method and were able to successfully identify already known and novel targets of Alk signalling in embryonic mesoderm by comparing the Dam-Pol II versus Dam-Pol II; Alk dominant negative conditions. In the current version of the manuscript, we have expanded our discussion of these caveats as follows (Discussion, Page 19-20):

      “A potential drawback of our TaDa dataset is the identification of false positives, due to non-specific methylation of GATC sites at accessible regions in the genome by Dam protein. Hence, our experimental approach likely more reliably identifies candidates which are downregulated upon Alk inhibition. In our analysis, we have limited this drawback by focusing on genes downregulated upon Alk inhibition and integrating our analysis with additional datasets, followed by experimental validation. This approach is supported by the identification of numerous previously identified Alk targets in our TaDa candidate list.”

      Related to this, could the authors make it clear/justify why they chose to use peak-based analysis of the Dam-Pol II data rather than looking at signals across whole transcripts? For example, this could result in false positives if a gene switches from having no Pol II to having paused Pol II.

      Authors response: In our opinion, a peak-based analysis is dependable in this context. We chose to prioritize peaks close (+/- 1 kb) to transcription start sites (TSS) to increase the chances of finding true Pol II occupancy peaks. Also, during bioinformatics analysis using the Damid-seq pipeline (Maksimov et al, 2016), fragments not aligning to GATC borders are excluded. Therefore, a whole-transcript Pol II occupancy analysis may not always be feasible. We agree with the reviewer that paused Pol II can produce false positives; however, it will only result in an increase of a specific peak, and in our case we are seeking to identify peaks with lower Pol II occupancy as a result of Alk knockdown. Furthermore, we depend on integration with additional relevant datasets to minimise false positive candidates for detailed analysis. In the current version of the manuscript these caveats have been mentioned and discussed (see point above).
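      The TSS-proximity criterion described in this response can be sketched as a simple interval test. This is an illustration only: it assumes a peak counts as "close" when any annotated TSS on the same chromosome lies within 1 kb of the peak interval, which is one plausible reading of "+/- 1 kb". The `Peak` class and function names are hypothetical and are not part of the Damid-seq pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Peak:
    chrom: str
    start: int  # peak start coordinate (bp)
    end: int    # peak end coordinate (bp)


def near_tss(peak: Peak, tss_by_chrom: Dict[str, List[int]], window: int = 1000) -> bool:
    """True if any TSS on the peak's chromosome lies within `window` bp
    of the peak interval (assumed interpretation of '+/- 1 kb')."""
    for tss in tss_by_chrom.get(peak.chrom, []):
        if peak.start - window <= tss <= peak.end + window:
            return True
    return False


def filter_peaks(
    peaks: List[Peak], tss_by_chrom: Dict[str, List[int]], window: int = 1000
) -> List[Peak]:
    """Keep only the peaks that pass the TSS-proximity test."""
    return [p for p in peaks if near_tss(p, tss_by_chrom, window)]
```

      A linear scan over TSS positions is adequate for illustration; a genome-scale analysis would normally use sorted coordinates or an interval index.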

      Do the authors have any theories about the mode of action of Spar? Or ideas about how this might be followed up? If so, that could be included in the Discussion.

      Authors response: Other than identifying modified Spar-derived peptides, which suggest a target receptor, possibly a GPCR, we have no other data currently that allows us to speculate further on the mode of action of Spar. We are currently working hard to try to identify a receptor, but this is a challenging and ongoing process. In the discussion we speculate regarding the identity of the Spar receptor, as well as its location, which is likely in the CNS and body muscle; however, these are open questions that we can hopefully answer in a future study.

      Reviewer #3 (Recommendations For The Authors):

      Spar protein expression was unchanged in Alk loss of function. This is a curious result as the authors used RNA seq data from Alk loss of function to identify Spar. This could be commented on in the discussion.

      Authors response: We thank the reviewer for this comment, and they are correct in noticing this. We have also thought about this, and reviewer #1 also commented. To confirm this result, we repeated this experiment with increased numbers of larval CNS followed by blinded image analysis for the revised version. These results also show an increased fluorescence intensity as measured by corrected total cell fluorescence (CTCF), confirming our previous observation of increased Spar protein expression in Alk gain-of-function conditions compared to controls. In this analysis, changes in Spar levels in Alk loss-of-function remained non-significant compared to control, in agreement with our previous data. As suggested by reviewer #1, we have now included several additional sentences discussing the possible reasons for these observations. The following text is now included on Page 11 of the results section:

      “While our bulk RNA-seq and TaDa datasets show a reduction in Spar transcript levels in Alk loss-of-function conditions, this reduction is not reflected at the protein level. This observation may reflect additional uncharacterised pathways that regulate Spar mRNA levels as well as translation and protein stability. Taken together, these observations confirm that Spar expression is responsive to Alk signaling in CNS, although Alk is not critically required to maintain Spar protein levels.”

      Pg 19: Spar is expressed in the Mushroom Bodies (MBs). Do they mean in Kenyon Cells (KCs)? I don't see this expression in the figures. Maybe this could be highlighted in the figure. It would definitely be of interest if this were true.

      Authors response: We agree with the reviewer that this would be interesting. We have not performed detailed staining of the mushroom bodies at this point, however, Spar mRNA expression in a transcriptomics analysis performed by Crocker et al, 2016, identifies Spar in all cell types, including Kenyon cells. We have now included this and cited this reference in the discussion.

      Spar is also expressed in multiple potential sleep regulatory sites including clock neurons, the PI, AstA cells and so on. Some of these might be arousal-promoting and some sleep-promoting. Taking out Spar in both sleep and arousal-promoting subsets might have complex effects. The authors might want to knock down Alk in different subsets of neurons to make more targeted manipulations.

      Authors response: We thank the reviewer for this suggestion regarding interesting experiments to further investigate Spar function. We are planning to follow up and study the role of Alk signalling in different neuronal subsets, with a specific interest in neuroendocrine/prosecretory cells.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer No.1 (public)

      The authors present a study focused on addressing the key challenge in drug discovery, which is the optimization of absorption and affinity properties of small molecules through in silico methods. They propose active learning as a strategy for optimizing these properties and describe the development of two novel active learning batch selection methods. The methods are tested on various public datasets with different optimization goals and sizes, and new affinity datasets are curated to provide up-to-date experimental information. The authors claim that their active learning methods outperform existing batch selection methods, potentially reducing the number of experiments required to achieve the same model performance. They also emphasize the general applicability of their methods, including compatibility with popular packages like DeepChem.

      Strengths:

      Relevance and Importance: The study addresses a significant challenge in the field of drug discovery, highlighting the importance of optimizing the absorption and affinity properties of small molecules through in silico methods. This topic is of great interest to researchers and pharmaceutical industries.

      Novelty: The development of two novel active learning batch selection methods is a commendable contribution. The study also adds value by curating new affinity datasets that provide chronological information on state-of-the-art experimental strategies.

      Comprehensive Evaluation: Testing the proposed methods on multiple public datasets with varying optimization goals and sizes enhances the credibility and generalizability of the findings. The focus on comparing the performance of the new methods against existing batch selection methods further strengthens the evaluation.

      Weaknesses:

      Lack of Technical Details: The feedback lacks specific technical details regarding the developed active learning batch selection methods. Information such as the underlying algorithms, implementation specifics, and key design choices should be provided to enable readers to understand and evaluate the methods thoroughly.

      Evaluation Metrics: The feedback does not mention the specific evaluation metrics used to assess the performance of the proposed methods. The authors should clarify the criteria employed to compare their methods against existing batch selection methods and demonstrate the statistical significance of the observed improvements.

      Reproducibility: While the authors claim that their methods can be used with any package, including DeepChem, no mention is made of providing the necessary code or resources to reproduce the experiments. Including code repositories or detailed instructions would enhance the reproducibility and practical utility of the study.

      Suggestion 1:

      Elaborate on the Methodology: Provide an in-depth explanation of the two active learning batch selection methods, including algorithmic details, implementation considerations, and any specific assumptions made. This will enable readers to better comprehend and evaluate the proposed techniques.

      Answer: We thank the reviewer for this suggestion. Following this comment, we have extended the text in Methods (in Section: Batch selection via determinant maximization and Section: Approximation of the posterior distribution) and in Supporting Methods (Section: Toy example). We have also included pseudocode for the batch optimization method.
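
      To give a flavour of what batch selection via determinant maximization can look like, here is a minimal greedy sketch. It is not the authors' pseudocode or the ALIEN implementation: the function names, the greedy strategy, and the toy covariance matrix are all assumptions made for illustration. The idea shown is that a batch with a large covariance determinant contains candidates that are individually uncertain and not redundant with one another.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def greedy_max_determinant_batch(cov, batch_size):
    """Greedily add the candidate that maximizes the determinant of the
    covariance submatrix of the batch selected so far."""
    selected = []
    for _ in range(min(batch_size, len(cov))):
        best_idx, best_det = None, float("-inf")
        for i in range(len(cov)):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[cov[a][b] for b in idx] for a in idx]
            d = determinant(sub)
            if d > best_det:
                best_idx, best_det = i, d
        selected.append(best_idx)
    return selected


# Hypothetical epistemic covariance for four candidate molecules:
# candidates 0 and 1 are the most uncertain but strongly correlated.
cov = [
    [1.00, 0.95, 0.10, 0.00],
    [0.95, 1.00, 0.10, 0.00],
    [0.10, 0.10, 0.80, 0.05],
    [0.00, 0.00, 0.05, 0.50],
]
print(greedy_max_determinant_batch(cov, batch_size=2))
```

      In this toy case the second pick is candidate 2 rather than candidate 1: candidate 1 has higher variance, but it is almost fully correlated with candidate 0 and would add little new information.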

      Suggestion 2:

      Clarify Evaluation Metrics: Clearly specify the evaluation metrics employed in the study to measure the performance of the active learning methods. Additionally, conduct statistical tests to establish the significance of the improvements observed over existing batch selection methods.

      Answer: Following this comment we added to Table 1 details about the way we computed the cutoff times for the different methods. We also provide more details on the statistics we performed to determine the significance of these differences.

      Suggestion 3:

      Enhance Reproducibility: To facilitate the reproducibility of the study, consider sharing the code, data, and resources necessary for readers to replicate the experiments. This will allow researchers in the field to validate and build upon your work more effectively.

      Answer: This is something we already included with the original submission. The code is publicly available. In fact, we provide a Python library, ALIEN (Active Learning in data Exploration), which is published on the Sanofi GitHub (https://github.com/Sanofi-Public/Alien). We also provide details on the public data used and expect to provide the internal data as well. We included a short paragraph on code and data availability.

      Reviewer No.2 (public)

      Suggestion 1:

      The authors presented a well-written manuscript describing the comparison of active learning methods with state-of-the-art methods for several datasets of pharmaceutical interest. This is a very important topic since active learning is similar to a cyclic drug design campaign such as testing compounds followed by designing new ones which could be used to further tests and a new design cycle and so on. The experimental design is comprehensive and adequate for proposed comparisons. However, I would expect to see a comparison regarding other regression metrics and considering the applicability domain of models which are two essential topics for the drug design modelers community.

      Answer: We want to thank the reviewer for these comments. We provide a detailed response to the specific comments below. 

      Reviewer No.1 (Recommendations For The Authors)

      Recommendation 1:

      The description provided regarding the data collection process and the benchmark datasets used in the study raises some concerns. The comment specifically addresses the use of both private (Sanofi-owned) and public datasets to benchmark the various batch selection methods. Lack of Transparency: The comment lacks transparency regarding the specific sources and origins of the private datasets. It would be crucial to disclose whether these datasets were obtained from external sources or if they were generated internally within Sanofi. Without this information, it becomes difficult to assess the potential biases or conflicts of interest associated with the data.

      Answer: We would like to thank the reviewer for this comment. As mentioned in the paper, the public github page contains links to all the public data and we expect also to the internal Sanofi data. We also now provide more information on the specific experiments that were internally done by Sanofi to collect that data.

      Potential Data Accessibility Issues: The utilization of private datasets, particularly those owned by Sanofi, may raise concerns about data accessibility. The lack of availability of these datasets to the wider scientific community may limit the ability of other researchers to replicate and validate the study’s findings. It is essential to ensure that the data used in research is openly accessible to foster transparency and encourage collaboration.

      Answer: Again, as stated above we expect to release the data collected internally on the github page.

      Limited Information on Dataset Properties: The comment briefly mentions that the benchmark datasets cover properties related to absorption, distribution, pharmacokinetic processes, and affinity of small drug molecules to target proteins. However, it does not provide any specific details about the properties included in the datasets or how they were curated. Providing more comprehensive information about the properties covered and the methods used for curation would enhance the transparency and reliability of the study.

      To address these concerns, it is crucial for the authors to provide more detailed information about the data sources, dataset composition, representativeness, and curation methods employed. Transparency and accessibility of data are fundamental principles in scientific research, and addressing these issues will strengthen the credibility and impact of the study.

      Answer: We agree with this comment and believe that it is important to be explicit about each of the datasets and to provide information on the new data. We note that we already discuss the details of each of the experiments in Methods and, of course, provide links to the original papers for the public data. We have now added text to Supporting Methods that describes the experiments in more detail and provides literature references for the experimental protocols used. As noted above, we expect to provide our new internal data on the public git page.

      Recommendation 2:

      Some comments on the modeling example Approximation of the posterior distribution. Lack of Methodological Transparency: The comment fails to provide any information regarding the specific method or approach used for approximating the posterior distribution. Without understanding the methodology employed, it is impossible to evaluate the quality or rigor of the approximation. This lack of transparency undermines the credibility of the study.

      Answer: We want to thank the reviewer for pointing this out. Based on this comment we added more information to Section: Approximation of the posterior distribution. Moreover, we now provide details on the posterior approximation in Section: Two approximations for computing the epistemic covariance.

      Questionable Assumptions: The comment does not mention any of the assumptions made during the approximation process. The validity of any approximation heavily depends on the underlying assumptions, and their omission suggests a lack of thorough analysis. Failing to acknowledge these assumptions leaves room for doubt regarding the accuracy and relevance of the approximation.

      Answer: We are not entirely sure which assumptions the reviewer is referring to here. The main assumption we can think of that we have used is the fact that getting within X% of the optimal model is a good enough approximation. We have specifically discussed this assumption and tested multiple values of X. While it would have been great to have X = 0 this is unrealistic for retrospective studies. For Active Learning the main question is how many experiments can be saved to obtain similar results and the assumptions we used are basically ’what is the definition of similar’. We now added this to Discussion.
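
      The "within X% of the optimal model" criterion can be illustrated with a short sketch. The function name, the learning curve, and the tolerance values below are hypothetical; the sketch only shows how a cutoff could be read off a retrospective learning curve once X is fixed.

```python
def experiments_to_reach(rmse_by_step, optimal_rmse, tolerance_pct):
    """Return the first number of labelled samples at which the RMSE is within
    tolerance_pct percent of optimal_rmse, or None if that never happens."""
    threshold = optimal_rmse * (1.0 + tolerance_pct / 100.0)
    for n_samples, rmse in rmse_by_step:
        if rmse <= threshold:
            return n_samples
    return None


# Hypothetical learning curve: (number of labelled samples, test RMSE).
curve = [(100, 1.40), (200, 1.10), (300, 0.95), (400, 0.91), (500, 0.90)]
for x in (5, 10, 25):
    print(x, experiments_to_reach(curve, optimal_rmse=0.90, tolerance_pct=x))
```

      A looser tolerance is reached with fewer experiments, which is why results for several values of X are informative.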

      Inadequate Validation: There is no mention of any validation measures or techniques used to assess the accuracy and reliability of the approximated posterior distribution. Without proper validation, it is impossible to determine whether the approximation provides a reasonable representation of the true posterior. The absence of validation raises concerns about the potential biases or errors introduced by the approximation process.

      Answer: We sincerely appreciate your concern regarding the validation of the approximated posterior distribution. We acknowledge that our initial submission might not have clearly highlighted our validation strategy. It is, of course, very hard to determine the accuracy of the distribution our model learns, since such a distribution cannot be directly inferred from experiments (there is no 'ground truth'). Instead, we use an indirect method to determine the accuracy. Specifically, we conducted retrospective experiments using the learned distribution. In these experiments, we indirectly validated our approximation by measuring the error with the respective method. The results from these retrospective experiments provided evidence for the accuracy and reliability of our approximation in representing the true posterior distribution. We now emphasize this in Methods.

      Uncertainty Quantification: The comment does not discuss the quantification of uncertainty associated with the approximated posterior distribution. Properly characterizing the uncertainty is crucial in statistical inference and decision-making. Neglecting this aspect undermines the usefulness and applicability of the approximation results.

      Answer: Thank you for pointing out the importance of characterizing uncertainty in statistical inference and decision-making, a sentiment with which we wholeheartedly agree. In our work, we have indeed addressed the quantification of uncertainty associated with the approximated posterior distribution. Specifically, we utilized Monte Carlo Dropout (MC Dropout) as our method of choice. MC Dropout is a widely recognized and employed technique in the neural networks domain to approximate the posterior distribution, and it offers an efficient way to estimate model uncertainty without requiring any changes to the existing network architecture [1, 2]. In the revised version, we provide a more detailed discussion on the use of Monte Carlo Dropout in our methodology and its implications for characterizing uncertainty.
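
      As a rough illustration of the MC Dropout idea, the sketch below keeps dropout active at prediction time and summarizes repeated stochastic forward passes by their mean and variance. It uses a tiny hand-written network with fixed, hypothetical weights rather than the graph neural networks used in the study, and every name in it is an assumption made for this example.

```python
import random
import statistics


def forward_with_dropout(x, w_hidden, w_out, drop_prob, rng):
    """One stochastic forward pass: ReLU hidden layer with inverted dropout kept active."""
    output = 0.0
    for weights, w_o in zip(w_hidden, w_out):
        activation = max(0.0, sum(w * xi for w, xi in zip(weights, x)))
        if rng.random() >= drop_prob:  # keep this hidden unit
            output += w_o * activation / (1.0 - drop_prob)
    return output


def mc_dropout_predict(x, w_hidden, w_out, drop_prob=0.2, n_passes=200, seed=0):
    """Mean and variance of the predictions over repeated dropout passes."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(x, w_hidden, w_out, drop_prob, rng)
        for _ in range(n_passes)
    ]
    return statistics.fmean(samples), statistics.pvariance(samples)


# Hypothetical fixed weights standing in for a trained network.
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
w_out = [1.0, -0.5, 0.7]
mean, variance = mc_dropout_predict([1.0, 2.0], w_hidden, w_out)
print(mean, variance)
```

      The variance across passes serves as the uncertainty estimate; with the dropout probability set to zero, every pass is identical and the variance collapses to zero.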

      Comparison with Gold Standard: There is no mention of comparing the approximated posterior distribution with a gold standard or benchmark. Failing to provide such a comparison leaves doubts about the performance and accuracy of the approximation method. A lack of benchmarking makes it difficult to ascertain the superiority or inferiority of the approximation technique employed.

      Answer: As noted above, it is impossible to find gold standard information for the uncertainty distribution. It is not even clear to us how such a gold standard could be experimentally determined, since it is a function of a specific model and data. If the reviewer is aware of such a gold standard, we would be happy to test it. Instead, in our study, we opted to benchmark our results against state-of-the-art batch active learning methods, which also rely on uncertainty prediction (such uncertainty prediction is at the heart of any active learning method, as we discuss). The results clearly indicate that our method outperforms prior methods, though we agree that this is only an indirect way to validate the uncertainty approximation.

      Reviewer No.2 (Recommendations For The Authors)

      Recommendation 1:

      The text is kind of messy: there are two results sections, for example. It seems that part of the text was duplicated. Please correct it.

      Answer: We want to thank the reviewer for pointing this out. These were typos and we fixed them accordingly.

      Recommendation 2:

      Text in figures is very small and difficult to read. Please redraw the figures, increasing the font size: 10-12pt is ideal in comparison with the main text.

      Answer: We want to thank the reviewer for this comment and we have made the graphics larger.

      Recommendation 3: Please, include specific links to data availability instead of just stating it is available at the Sanofi-Public repository.

      Answer: We want to thank the reviewer for this comment and added the links and data to the Sanofi Github page listed in the paper.

      Recommendation 4:

      What are the descriptors used to train the models?

      Answer: We represented the molecules as molecular graphs using the MolGraphConvFeaturizer from the DeepChem library. We now explicitly mention this in Methods.

      Recommendation 5:

      Regarding the quality of the models, I strongly suggest two approaches instead of using only RMSE as metrics of models’ performance. I recommend using the most metrics as possible as reported by Gramatica (https://doi.org/10.1021/acs.jcim.6b00088). I also recommend somehow comparing the increment on the dataset diversity according to the employed descriptors (applicability domain) as a measurement to further applications on the unseen molecules.

      Answer: We want to thank the reviewer for these great suggestions. As suggested, we added new comparison metrics to the Supplement:

      • Distribution plot for the range of the Y values (Figure 8)

      • Clustering of the data sets represented as fingerprints (Supplementary material, Figures 5 and 6)

      • Retrospective experiments with Spearman correlation coefficient (Supplementary material, Figures 2, 3, and 4)
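
      For reference, the Spearman correlation coefficient mentioned in the list above is the Pearson correlation of the ranks of two variables. The sketch below is a plain-Python illustration with hypothetical measured and predicted values; it is not taken from the authors' code.

```python
def average_ranks(values):
    """1-based ranks, with ties given the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical measured and predicted property values for five molecules.
measured = [5.1, 6.3, 7.0, 4.2, 8.8]
predicted = [5.5, 6.0, 7.4, 4.9, 8.1]
print(spearman(measured, predicted))
```

      Unlike RMSE, this metric only rewards getting the ordering of compounds right, which is often what matters when prioritizing molecules.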

      I suggest also a better characterization of datasets including the nature and range of the Y variable, the source of data in terms of experimentation, and chemical (structural and physicochemical) comparison of samples within each dataset.

      Answer: As noted above in response to a similar comment by Reviewer 1, we have added more detailed information about the different experiments we tested to Supporting Methods.

      References

      [1] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1050–1059, New York, New York, USA, 20–22 Jun 2016. PMLR.

      [2] N.D. Lawrence. Variational Inference in Probabilistic Models. University of Cambridge, 2001.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a very well written and performed study describing a TOPBP1 separation of function mutation, resulting in defective MSCI maintenance but normal sex body formation. The phenotype differs from that of a previous TOPBP1 null allele, in which both MSCI and sex body formation were defective. Additional defects in CHK phosphorylation and SETX localization are also described.

      Strengths:

      The study is very rigorous, with a remarkably large number of MSCI marks assayed, phosphoproteomics (leading to the interesting SETX discovery) and 10X RNAseq, allowing the MSCI phenotype to be further deconvolved. The approaches in most cases are robust.

      Weaknesses:

      There aren't many; please find list below:

      1) The authors are committed to the idea that maintenance of MSCI is the major defect here. However, based on the data, an alternative would be that some cells achieve sex body formation and MSCI normally, while others do not. It would only take a small percentage of cells exhibiting MSCI failure to kill all the cells in the same germinal epithelium, so this could still explain the complete pachytene block. This isn't a major point...this phenotype is clearly different to the TOPBP1 KO, but a broader discussion of possibilities in the discussion would help. I raise this in the context of both the cytology and 10X analysis:

      a) The assessment that sex body formation is normal is based on cytology in Supp 8 and 9, but a more rigorous approach would be to assess condensation of the XY pair in stage-matched spread cells (maybe they have that data already) by measuring distances between the X and Y centromere, or looking at stage IV of the seminiferous cycle, where all cells should have oval sex bodies but sex body mutants have persistent elongated XY pairs (see work of Namekawa and Turner). The authors do actually mention that gH2AX spreading is defective in many cells....and if this is true, condensation to form a sex body would almost certainly not have taken place in those cells.

      We appreciate the reviewer’s comment and have performed the experiment suggested, counting the number of elongated sex bodies in all sex body-positive cells in seminiferous tubules stained with γH2AX and DAPI (as done by Turner in Hirota et al., 2018). The experiment did not show significant differences between Topbp1+/+ and Topbp1B5/B5 as shown in Author response image 1.

      Author response image 1.

      Topbp1B5/B5 displays normal condensation of the XY-pair. A) Immunostaining of XY condensation in Topbp1+/+ and Topbp1B5/B5 testes sections (γH2AX: green and DAPI: gray). B) Quantification of all sex body-positive cells per tubule (Topbp1+/+ number of cells counted = 781, number of tubules counted = 28, number of mice = 3; Topbp1B5/B5 number of cells counted = 967, number of tubules counted = 28, number of mice = 3). C) Quantification of elongated-sex body cells per tubule (Topbp1+/+ number of cells counted = 19 and 762 normal round/oval-sex bodies cells, number of tubules counted = 28, number of mice = 3; Topbp1B5/B5 number of cells counted = 45 and 922 normal round/oval-sex bodies cells, number of tubules counted = 28, number of mice = 3).

      b) Regarding the 10X data, the finding that expression of some XY genes is elevated and others are not is also consistent with a "partial" phenotype (some cells have normal XY bodies and MSCI, others fail in both). In Fig 6E, X expression looks to be elevated in B5 vs wt at all stages...if this were a maintenance issue, shouldn't it be equal to that in wt and then elevate later?

      We understand the point raised by the reviewer, however we do not favor the “partial” phenotype model because of the absence of any post-pachytene spermatocytes in the B5 mutant. If some cells had escaped the MSCI defect, we would expect to detect cells progressing further in meiosis. Because we cannot rule out completely the possibility of a subtle disruption in XY silencing initiation, we decided to better emphasize this point in the discussion (lines 391-394).

      In Figure 6E, the X-linked genes were normalized against chromosome 9-linked genes. The normalization against pre-leptotene was done for the results displayed on Figure 7, in which we demonstrate the maintenance issue. Furthermore, for the 10X analysis, while the same number of cells were loaded for wild-type and mutant, the composition of cells varied between these two samples. Despite the fact that very few “spermatocyte 3” cells were detected in the mutant, those cells displayed much higher X-linked gene expression than the wild-type spermatocyte 3 cells.
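
      The two normalizations described above can be sketched as follows. The gene labels, stage names, and expression values are hypothetical placeholders, and the helper functions are assumptions for illustration rather than the analysis code used for Figures 6E and 7.

```python
from statistics import fmean


def x_to_reference_ratio(expression, x_genes, reference_genes):
    """Mean expression of X-linked genes divided by mean expression of
    reference (here, chromosome 9-linked) genes within one cell population."""
    x_mean = fmean([expression[g] for g in x_genes])
    ref_mean = fmean([expression[g] for g in reference_genes])
    return x_mean / ref_mean


def normalize_to_baseline(stage_expression, baseline_stage="pre-leptotene"):
    """Express each stage relative to the baseline stage."""
    baseline = stage_expression[baseline_stage]
    return {stage: value / baseline for stage, value in stage_expression.items()}


# Hypothetical expression values for one stage.
expression = {"x_gene_a": 2.0, "x_gene_b": 4.0, "chr9_gene_a": 5.0, "chr9_gene_b": 7.0}
print(x_to_reference_ratio(expression, ["x_gene_a", "x_gene_b"],
                           ["chr9_gene_a", "chr9_gene_b"]))

# Hypothetical mean X-linked expression per stage for one genotype.
by_stage = {"pre-leptotene": 2.0, "leptotene": 1.0, "pachytene": 0.5}
print(normalize_to_baseline(by_stage))
```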

      2) How is the quantitation showing impaired localization of select markers (e.g. SETX) normalized? How do we know that the antibody staining simply didn't work as well on the mutant slides?

      The quantification showing impaired localization of the selected markers such as SETX was done as described by Sims, et al. 2022 and Adams, et al. 2018. In brief, the green signal was measured along (XY cores) or across (XY DNA loops) the X and Y chromosomes and normalized against the analogous signal on the autosomal chromosomes. The possibility that the antibody simply did not work as well on the mutant is unlikely since multiple biological replicates were performed and we reproducibly followed standard practices in the field for meiotic spreads staining, imaging, and quantification. We also note that our findings published in Sims et al, 2022 show that ATR inhibition strongly impairs SETX localization to the sex body, further substantiating our claim that signaling via ATR-TOPBP1 controls SETX.
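
      A minimal sketch of this kind of internal normalization is given below, assuming intensity values have already been sampled along the XY chromosomes and along autosomes in the same nucleus. The function name and the numbers are hypothetical; the cited papers describe the actual measurement procedure.

```python
from statistics import fmean


def relative_xy_signal(xy_intensities, autosome_intensities):
    """Mean signal on the XY chromosomes divided by the mean signal on
    autosomes measured in the same cell."""
    return fmean(xy_intensities) / fmean(autosome_intensities)


# Hypothetical intensity samples from a single pachytene spread.
xy = [120.0, 130.0, 110.0]
autosomes = [40.0, 35.0, 45.0]
print(relative_xy_signal(xy, autosomes))
```

      Because the autosomal signal comes from the same cell, a slide-wide difference in staining efficiency should scale both values and largely cancel out in the ratio.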

      3) Is testis TOPBP1 protein expression reduced in the B5 mutant?

      TOPBP1 protein abundance in the B5 mutant is reduced in lysates from whole testis, measured via western blot. We did not detect a significant reduction in TOPBP1 signal intensity measured by immunofluorescence in pachytene spreads of the B5 mutant.

      4) 10X analysis: how were the genes on the y-axis in Supp 24 arranged? Is this by location on the X chromosome?

      These genes were sorted by their location along the X chromosome.

      5) The final analyses in Fig 7: X-genes are subdivided based on their behavior (up, down, unchanged). What isn't clear to me is whether the authors have considered the fact that there are global changes in gene expression during meiosis (very low in lep , zyg and early pach, then ramps up hugely from mid pach). In other words, is this normalized to autosomal gene expression?

      For the final analysis in Fig 7, X-linked gene expression was normalized to expression at the pre-leptotene stage. Moreover, the analysis compared X-linked gene behavior in wild-type versus the B5 mutant.

      6) Again regarding the 10X analysis, my prediction would be that not ALL X and Y gene would increase in pach if MSCI were ablated...we should remember that XY genes have been subject to MSCI for some 160 million years of evolution, and this will mean that many enhancers that originally drove their expression prior to the evolution of MSCI will now be lost. This has been our experience: many XY genes aren't elevated at pach even in mutants in which MSCI is totally defective. I'd urge the authors to consider this possibility when they use XY gene expression patterns to diagnose the severity or timing of the MSCI phenotype. This could be a discussion point.

      We greatly appreciate the reviewer’s suggestion and have added discussion of this point (lines 392-400).

      Reviewer #2 (Public Review):

      Summary:

      This paper described the role of BRCT repeat 5 in TOPBP1, a DNA damage response protein, in the maintenance of meiotic sex chromosome inactivation (MSCI). By analyzing a Topbp1 mutant mouse with amino acid substitutions in BRCT repeat 5, the authors found reduced phosphorylation of a DNA/RNA helicase, Senataxin, and decreased localization of the protein to the X-Y sex body in pachynema. Moreover, the authors also found decreased repression of several genes on the sex chromosomes in the male mice.

      Strengths:

      The works including phospho-proteomics and single-cell RNA sequencing with lots of data have been done with great care and most of the results are convincing.

      Weaknesses:

      One concern is that, although the Topbp1 mutant spermatocytes show very severe defects after the stage of late pachynema, the defect in gene silencing in the sex body is relatively weak. It is a bit difficult to explain how such a weak misregulation of gene silencing in mice causes the complete loss of cells in the late stage of spermatogenesis.

      We appreciate the reviewer’s comment. We note that even subtle mis-regulation of XY gene silencing has been reported to lead to significant loss of cells in late stage of prophase I (Ichijima et al., 2011; Modzelewski et al., 2012). Moreover, it is possible that some cells with drastic changes in X-gene expression were excluded from the downstream analysis due to high levels of mitochondrial gene expression (cells that were likely dying due to apoptosis). The exclusion of cells with high levels of mitochondrial gene expression is a common practice in downstream analysis of sc-RNA sequencing data.
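
      The mitochondrial filter mentioned above is typically a simple threshold on the fraction of a cell's counts that come from mitochondrial genes. The sketch below assumes mouse gene symbols with the "mt-" prefix and uses a hypothetical cutoff of 10%; the response does not state the threshold actually applied.

```python
def mito_fraction(counts, mito_prefix="mt-"):
    """Fraction of a cell's counts that map to mitochondrial genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts.items() if gene.lower().startswith(mito_prefix))
    return mito / total


def filter_cells(cells, max_mito_fraction=0.1):
    """Keep only cells whose mitochondrial fraction is at or below the cutoff."""
    return {
        cell_id: counts
        for cell_id, counts in cells.items()
        if mito_fraction(counts) <= max_mito_fraction
    }


# Hypothetical count tables for two cells.
cells = {
    "cell_a": {"mt-Co1": 5, "Actb": 95},
    "cell_b": {"mt-Co1": 40, "mt-Nd1": 20, "Actb": 40},
}
print(sorted(filter_cells(cells)))
```

      Here cell_b would be discarded before any X-linked expression is analyzed, which is how strongly affected, dying cells can drop out of the downstream comparison.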

      Reviewer #3 (Public Review):

      The work presented by Ascencao and coworkers aims to examine in depth the process of sex chromosome inactivation during meiosis (MSCI) as a critical factor in the regulation of meiosis progression in male mammals. For this purpose, they have generated a transgenic mouse model in which a specific domain of TOPBP1 protein has been mutated, hampering the binding of a number of protein partners and interfering with the regulatory cascade initiated by ATR. Through the use of immunolocalization of an impressive number of markers of MSCI, phosphoproteomics and single cell RNA sequencing (scRNAseq), the authors are able to show that despite a proper morphological formation of the sex body and the incorporation of most canonical MSCI markers, sex chromosome-linked genes are reactivated at some point during pachytene and this triggers meiosis progression breakdown, likely due to a defective phosphorylation of the helicase SETX.

      The manuscript presents a clear advance in the understanding of MSCI and meiosis progression with two main strengths. First, the generation of a mouse model with a very uncommon phenotype. Second, the use of a vast methodological approach. The results are well presented and illustrated. Nevertheless, the discussion could be still a bit tuned by the inclusion of some ideas, and perhaps speculations, that have not been considered.

      We appreciate the reviewer’s comment and have improved the discussion section to address the points raised in the “Recommendations For The Authors”.

      Reviewer #1 (Recommendations For The Authors):

      I don't have any additional points here

      Reviewer #2 (Recommendations For The Authors):

      The paper by Ascencao et al. describes a separation-of-function allele of TOPBP1, a protein critical for the DNA damage response (DDR), that confers a specific defect in XY sex chromosome inactivation during male mouse meiosis. The authors constructed a Topbp1 separation-of-function mouse by introducing amino acid substitutions in BRCT repeat 5 and found that the mice, which have a normal DDR in mitosis and meiosis, show male infertility. Topbp1(B5/B5) mice do not contain spermatocytes after diplonema and, as a result, have few spermatids or sperm. In these mice, most of the meiotic events in prophase I, including chromosome synapsis and meiotic recombination as well as the formation of the sex body, are normal. The detailed proteomic analysis revealed reduced ATR-dependent phosphorylation of a DNA/RNA helicase, Senataxin. Single-cell RNA sequencing also found that some genes on the sex chromosomes are not silenced well compared to the control. The work, which includes a large amount of data, has been done with great care, and most of the results are convincing. One clear concern is that, although the authors nicely showed a defect in gene silencing on the sex chromosomes in the Topbp1(B5/B5) mice, how a small defect in gene silencing leads to the complete loss of diplotene spermatocytes remains unaddressed.

      Major points:

      Although the authors showed a change in the transcriptome in spermatocytes of Topbp1(B5/B5) male mice, the authors cannot explain the complete lack of spermatids in this mouse. Even the transcriptome seems not to provide a clue.

      1) Given that the TOPBP1-B5 protein cannot bind to both 53BP1 and BLM, it is interesting to check the localization of both proteins on meiotic chromosome spreads (in the case of 53BP1, the localization in MEFs with DNA damage).

      We appreciate the reviewer’s comment. We have tried to stain BLM in meiotic spreads using several different antibodies, however we were not successful getting specific signals for BLM. In the case of 53BP1, we monitored its localization, and it was not significantly different from Topbp1-/- meiotic spreads, please refer to Supplemental Figure 11. While we appreciate the reviewer’s suggestion of looking at the localization of 53BP1 in MEFs with DNA damage, we opted not to perform the experiment because we have shown that 53BP1 can still bind the BRCT 1 and 2 domains of TOPBP1 as previously described (Bigot et al., 2019; Cescutti et al., 2010; Liu et al., 2017). Additionally, both male and female 53BP1 KO mice are fertile (Ward et al., 2003), thus the partial disruption in binding to 53BP1 that we observed in TOPBP1 B5 mutant is likely not causing the infertility phenotype.

      2) A recent preprint by Fujiwara et al. (doi: https://doi.org/10.1101/2023.04.12.536672) showed the accumulation of R-loops in spermatocyte spreads in Senataxin knockout mice. The authors may check the R-loop on the sex body in Topbp1-B5 mice.

      We thank the reviewer for the suggestion. We have tried several protocols to stain R-loops (including the protocol used in the paper mentioned above) but were not successful.

      3) The authors need to check the protein level (and band shift) of Senataxin in the testis by western blotting analysis.

      We have tried several SETX antibodies, and none worked for western blot analysis.

      4) If possible, the authors can see any protein interaction between TOPBP1 and Senataxin.

      We appreciate the suggestion, and we will investigate this interaction in future work.

      5) The authors need to check the statistics in the paper.

      (1) It is better to show actual P-values in the case of "ns".

      P-values were added to the respective figure legends.

      (2) In focus counting such as Figures 3D, G, H, 4B, D, F, H, 5E, and F (and in Supplemental Figures), please indicate how many spreads were counted in each mouse. Moreover, the distribution of focus numbers and intensity of fluorescence are not parametric (not normal distribution). It is better to use a non-parametric method such as Mann-Whitney's U test.

      We appreciate the reviewer's comment. Upon consulting with a statistician at the Cornell Statistical Consulting Unit (CSCU), we were advised to use a linear mixed effect model to take into account the variability among cells within each mouse when comparing mice between groups (Topbp1+/+ vs Topbp1B5/B5). We then reanalyzed all quantified meiotic spreads using this mixed effect model, and the p-value, number of mice, and number of cells counted for each group are displayed in the respective figure legends. Upon going through all the quantified meiotic spreads, we found a minor error in one of the previous data points related to SETX staining in Topbp1+/+ and have fixed it. Using the previous quantification data and the new statistical analysis, the p-value for cores was 0.5598 and the p-value for loops was 0.0273. Using the corrected values and the new statistical analysis, the p-value for cores is 0.5987 and the p-value for loops is 0.0452. The correction did not change the conclusion drawn from these data and is now displayed in the new Figure 5. We also found a mistake in the ATR quantification that arose when the spreadsheet was moved from Excel to GraphPad. Using the previous quantification and the new statistical analysis, the p-value for cores was 0.2451 and the p-value for loops was 0.8933. Using the corrected values and the new statistical analysis, the p-value for cores is 0.4068 and the p-value for loops is 0.9396. The correction did not change the conclusion drawn from these data and is now displayed in the new Figure 4. Moreover, we realized that we used n = 8 (n = number of mice) for MDC1 quantification and n = 2 for pCHK1_S345, instead of n = 3 as shown in the preprint version of the manuscript. Corrected values were added to their respective figures and figure legends.
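
      The nesting of cells within mice that motivates the mixed effect model can be illustrated with a much simpler, swapped-in approach: collapsing the cells of each mouse to a single mean so that the mouse, not the cell, is the unit of analysis. The sketch below is not the linear mixed effect model used for the figures; the focus counts, mouse identifiers, and function names are hypothetical and serve only to show the data structure.

```python
from statistics import mean

# Hypothetical focus counts: {genotype: {mouse_id: [count per cell, ...]}}
counts = {
    "Topbp1+/+": {"m1": [10, 12, 11], "m2": [9, 10], "m3": [13, 12, 14, 13]},
    "Topbp1B5/B5": {"m4": [7, 8], "m5": [6, 9, 9], "m6": [8, 8, 7, 9]},
}


def per_mouse_means(group):
    """Collapse the cells of each mouse to one mean value."""
    return {mouse: mean(cells) for mouse, cells in group.items()}


def group_summary(data):
    """Return {genotype: (number of mice, number of cells, mean of per-mouse means)}."""
    summary = {}
    for genotype, group in data.items():
        mouse_means = per_mouse_means(group)
        n_cells = sum(len(cells) for cells in group.values())
        summary[genotype] = (len(group), n_cells, mean(list(mouse_means.values())))
    return summary


summary = group_summary(counts)
for genotype, (n_mice, n_cells, avg) in summary.items():
    print(f"{genotype}: mice={n_mice}, cells={n_cells}, mean of mouse means={avg:.2f}")
```

      Averaging per mouse discards the within-mouse variability that a mixed effect model retains, which is one reason a mixed model is generally preferred when many cells are scored per animal.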

      (3) From Figures 6E, 7B, and 7C, the authors conclude the difference in the expression profile between wild type and Topbp1(B5) spermatocytes. It is better to show P-values for the comparison. Particularly, in Figure 7C, Xiap expression kinetics look similar between wild type and the mutant.

      We have added p-values to Figures 6E and 7B, either in the figures or in their legends. In Figure 7C, we now recognize that the Δ could have been misleading, as we meant to compare Wild-type SP2 to Wild-type SP3 and Mutant SP2 to Mutant SP3, not Wild-type SP3 to Mutant SP3. Therefore, the Δ was removed from Figure 7C. For the comparisons between expression levels of SP2 and SP3, it is challenging to calculate p-values for a single gene since these cells have started X-gene silencing and expression values are very low. Meaningful p-values for the comparison between Wild-type SP3 and Mutant SP3 can be visualized in Figure 7B, where the comparison is based on the number of genes instead of the expression level of each gene.

      Minor comments:

      1) Line 34: SPO11 is NOT a nuclease. Just delete it.

      It has been deleted (see line 34).

      2) Line 71, a protein: Is this protein ATR? Is so, please write it. If not, please give the name of the protein.

      In line 71 (now lines 79-80), we refer to TOPBP1-interacting proteins in general, since many of these interactions depend on phosphorylation of the TOPBP1 interactor. This is the case for BLM, 53BP1, FANCJ, and RAD9. ATR interacts with TOPBP1 through TOPBP1’s AAD domain, and this is not a phospho-mediated interaction. We restructured the sentence for clarity.

      3) In the Introduction, the authors often refer to a review by Cimprich and Cortez (2008) in various places. It is better to cite an original paper or the other an appropriate review.

      We have accepted the reviewer’s suggestion and added original papers when appropriate.

      4) Line 143-145: The authors generated eight charge reversal point mutations in the BRCT domain 5 of TOPBP1. If possible, it is helpful to mention the logic to generate these substitutions and also why BRCT domain 5, is not other domains.

      We generated eight charge reversal point mutations to abrogate all possible phospho-dependent interactions and avoid potential residual interactions. We have mutated other BRCT domains as well, which will be published separately.

      5) Line 174 (and Figure 2E): RPA should be either RPA2 or RPA32.

      Corrected (it is RPA2).

      6) Figure 5C-F: Please explain in more detail how the authors quantified the SETX signals. Why the two results are different?

      The quantification was done as described by Sims et al. (2022), yielding separate data for XY cores and DNA loops. In brief, the green signal was measured along (XY cores) or across (XY DNA loops) the X and Y chromosomes. Signals were normalized to the signal in the autosomes.

      Reviewer #3 (Recommendations For The Authors):

      I have no major criticisms, but I include a list of comments and suggestions (some of them conceptual, and disputable) that could help the authors to improve some parts of the manuscript.

      1) Line 52: I realize that the term protein "sequestration" (used in many instances along the manuscript) has been widespread in the literature related to MSCI in the last years. While this might be a cool way to describe the dynamics of proteins accumulating in the sex body, this reviewer considers this term is totally inappropriate. It is confusing and introduces at least two mistaken ideas about protein accumulation in the sex body. First, it seems to indicate that once trapped in the sex body, proteins are incapable of leaving it, which might be completely wrong (histone replacement refutes this idea). Second, it is suggested that DDR proteins are attracted by the sex body and cannot remain associated with autosomes even if DNA repair has not been completed. This has also been demonstrated to be incorrect (see for example PMID 19714216). Moreover, DDR proteins can associate de novo with chromosomes if needed, for instance upon DNA damage caused by chemicals or irradiation. Thus, I suggest that the use of "sequestration" should be evaluated more critically, evaluating the misleading ideas that are subjacent to this term. The use of protein "accumulation" is much more objective and descriptive of the real facts.

      We thank the reviewer for the suggestion and have addressed it in lines 52, 97 and 324.

      2) Line 88: Just as a deference to the original ideas, it would be nice to acknowledge that the inactivation of sex chromosomes and the formation of a sex body in mouse meiosis was described more than 50 years ago (PMID 5833946; 4854664). Likewise, the ideas about the sequential achievement and reinforcement of MSCI during pachytene have been developed during the last 20 years, far before the recent reports cited in the manuscript. Citations to these "old-fashioned" works would be great.

      We appreciate the reviewer’s suggestion and have addressed it in line 86.

      3) Line 90. Please, take into consideration that such a strong effect on meiosis progression occurs mainly in some knockout mice models and that in many other models (including hybrid mice models from natural populations) autosomal regions can remain unsynapsed and accumulate DDR proteins without impairing meiosis. In other mammalian species, meiosis is even more permissive to these MSUC phenomena.

      We appreciate the reviewer’s suggestion and have addressed it at line 88.

      4) Line 211: The differences in the abundance of MLH1 and MLH3 are remarkable. If these two proteins are supposed to form a heterodimer leading to crossover formation, then the increase of only MLH1 might be related to a different process, not leading to crossover (even not class II ones).

      We agree with the reviewer’s comment and have included this point in the discussion (lines 491- 497).

      5) Line 217: I have some doubts about the results presented in Supplementary Figure 9. First, it is not clear to me how the represented cells counts were performed. Each spot is supposed to represent cell counts in a single individual, but how many cells were counted per individual? The proportion of cells could be a better indicator. Second, some B5/B5 individuals' counts were close to the ones displayed in the wild type. Did mutant animals show a high divergence compared to each other? It could be great to have each individual data displayed in a pie chart, and not only the aggregated data.

      We have now addressed this in the new Supplemental Figure 9 legend. Each dot in the graph represents the total number of cells counted for one individual. We counted cells from 8 mice for each genotype, Topbp1+/+ and Topbp1B5/B5.

      Here we summarize the total cells counted per individual:

      Author response table 1.

      6) Line 222: The data on 53BP1 deserve further attention. On the one side, from the analysis presented in Supplementary Figure 11, it seems that 53BP1 tends to show a lower intensity in Topbp1B5/B5 mice. Since only 2 mice were analyzed, while for most of the other proteins 3-8 animals were studied, I suggest increasing the number of animals analyzed for 53BP1 localization, to test if this slight difference turns significant. This is relevant since: 1) the association of 53BP1 protein in somatic cells was clearly affected, and 2) 53BP1 is one of the last MSCI markers incorporated to the sex body at mid-late pachytene. These results should be moved to the main text and not appear as supplementary data. On the other hand, if no differences were to be found in meiosis, compared to somatic cells, how do authors explain these differences? Would 53BP1 have another partner at the sex body apart from TOPBP1? Could TOPBP1 have other BRCT domains (apart from domain 5) able to bind 53BP1?

      We appreciate the reviewer’s suggestion; however, we had an issue with the 53BP1 antibody. We analyzed 2 mice and then needed to re-order the antibody. It was backordered for almost one year, and when we finally received the order, the company had changed the clone for this antibody, and it no longer worked for meiotic spreads. In somatic cells (HEK-293T), we see a partial disruption in the binding of 53BP1 to TOPBP1 B5 by IP-MS and IP-Western blot. The disruption is only partial due to the binding of 53BP1 to other domains in TOPBP1 such as BRCT 1 and 2 (Bigot et al., 2019; Cescutti et al., 2010; Liu et al., 2017). However, in assays in which we would expect a phenotypic response caused by impaired 53BP1, such as survival after IR (using the mice) and survival after phleomycin challenge (using MEFs), we did not see any effect. Moreover, 53BP1 KO mice, both males and females, are fertile (Ward et al., 2003), so the partial disruption in binding to 53BP1 that we observed in the TOPBP1 B5 mutant is likely not causing the infertility phenotype.

      7) Line 250: I do not understand what is represented in Figure 5A. Why did the author mix two different experiments (differences in phosphoprotein abundance in B5/B5 compared to wild type and the interference of ATR with AZ20)?

      To account for the differences in cell population observed in the whole testis between Topbp1+/+ and Topbp1B5/B5, and to know exactly which phosphorylation changes were due to disruption of ATR signaling rather than pleiotropic effects, we combined two different phosphoproteomes: one from the comparison between Topbp1+/+ and Topbp1B5/B5, and another from the comparison between vehicle-treated and ATR inhibitor-treated mice. With this approach, we only consider hits that were disrupted in both analyses. A similar method was used by Sims et al. (2022).
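
      The logic of keeping only hits that are disrupted in both phosphoproteomes can be sketched as a set intersection. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the manuscript: the site names, log2 fold changes, the cutoff of -1.0, and the choice to define "disrupted" as reduced abundance are all hypothetical.

```python
# Hypothetical log2 fold changes per phosphosite in each comparison.
b5_vs_wt = {"site_A": -1.8, "site_B": -0.2, "site_C": -1.4, "site_D": 1.1}
atri_vs_vehicle = {"site_A": -2.1, "site_B": -1.6, "site_C": -1.2, "site_E": -1.9}


def reduced_sites(log2fc, cutoff=-1.0):
    """Return the sites whose log2 fold change is at or below the (assumed) cutoff."""
    return {site for site, fc in log2fc.items() if fc <= cutoff}


def shared_hits(first, second, cutoff=-1.0):
    """Return the sites that are reduced in both comparisons, sorted by name."""
    return sorted(reduced_sites(first, cutoff) & reduced_sites(second, cutoff))


hits = shared_hits(b5_vs_wt, atri_vs_vehicle)
print(hits)
```

      In this toy example, site_B is reduced only by the inhibitor and site_E is absent from the genotype comparison, so neither is retained; only sites reduced in both data sets remain.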

      8) It is not clearly explained what is represented in Figure 6B. There is no explanation in the text or the figure legend. Does this represent the difference between scRNAseq in control and Topbp1B5/B5? If so, please clarify.

      We thank the reviewer for the comment and have addressed it in the legend of Figure 6B.

      9) Line 342 and following. The authors describe a decrease of gene silencing. The use of two negative concepts is always confusing and results in the conversion to a positive one. I suggest considering the possibility of just talking about increase of gene expression, in order to make the message clearer.

      We appreciate the reviewer’s point here, but it is important to note that the phenomenon disrupted in our mutants is MSCI, which is by definition a gene silencing mechanism. This phenotype is not as simple as “increased gene expression”, it is the removal of a mechanism that is a key feature of prophase I. Thus, because we are focusing on the mechanism of MSCI, it is crucial to maintain this (albeit unusual) terminology.

      10) As for the classification of spermatocytes into 9 categories, I am curious about which spermatocytes are included in each of these categories. For instance, from cytology it seems that in Topbp1B5/B5 mice, spermatocytes are able to reach mid-late pachytene. However, in the spermatocyte categories established by scRNAseq they only reach class 3. Therefore, which are the populations included in the remaining 6 classes of spermatocytes? Do authors have any morphological correlation to these scRNAseq categories? Is it possible that in this mutant morphological advance of meiosis and gene expression profiles are uncoupled?

      The clustering of cells into a specific group is based on RNA expression, which does not always match cytological features. Moreover, during the analysis, cells with high expression of mitochondrial genes are excluded (these are dying cells that do not pass quality control). Thus, while Topbp1B5/B5 spermatocytes reach a mid-late pachytene stage according to cytological analyses, in the single-cell RNA-seq analysis we could only detect one pachytene stage. The remaining 6 categories of spermatocytes can be classified according to their best-fit gene expression profile. For that, we use the classification described by Chen et al., 2018 and Lau et al., 2020: Spermatocytes 3-5 = Pachytene, Spermatocytes 6-7 = Diplotene, Spermatocytes 8-9 = secondary spermatocytes (metaphase I/II). The gene markers used for this classification are displayed in Author response image 2.

      Author response image 2.

      Genes used as markers of spermatocytes captured in the scRNAseq analysis. Violin plots display the distribution of cells expressing Gm960 (Leptotene marker), Meiob (Leptotene/Zygotene marker), Psma8 (Pachytene marker), Piwil1 (Pachytene marker), Pou5f2 (Diplotene marker), and Ccna1 (Secondary Spermatocytes marker).

      11) Figure 6E shows that overexpression of X-linked genes is not a feature of spermatocytes but it is initiated in spermatogonia. This fact has not been properly stated in the text and perhaps not sufficiently highlighted.

      We noticed subtle changes during the spermatogonia stage and have addressed the reviewer’s comment in lines 317-322; however, the downstream analyses of the defect in X-gene silencing maintenance displayed in Figure 7 were based on normalization of gene expression to the respective pre-leptotene stage.

      12) Supplementary Figure 24 shows that some X-linked genes are more expressed in Topbp1B5/B5 compared to control mice. In the figure it can be observed that many genes accumulate at the bottom of the graph. Does this have any correlation to the location of these genes along the X chromosome, for instance near or within the PAR? This could correlate with the defects in γH2AX accumulation at this region.

      These are the locations along the chromosome. Only the bottom 5 rows are within the PAR region, so this accumulation is not within the PAR region specifically. The bottom tenth of the genes in the heatmap correspond to roughly a 17 Mb region.

      13) The authors only analyzed the overexpression of genes located on the X chromosome. It would be interesting to show the behavior of Y-linked genes as well.

      The coverage of Y-linked genes was not very high and that is why we have not shown the results in the paper. However, the results for Y-linked genes were similar to the X-linked genes and can be visualized in Author response image 3.

      Author response image 3.

      Single cell RNAseq reveals that Topbp1B5/B5 spermatocytes initiate MSCI but fail to promote full silencing of Y chromosome-linked genes. Violin plot displaying the ratio of the average expression of Y chromosome genes by the average expression of chromosome 9 genes at different stages of spermatogenesis for Topbp1+/+ and Topbp1B5/B5 cells.
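
      The ratio plotted in Author response image 3 (average expression of Y chromosome genes divided by average expression of chromosome 9 genes) can be sketched per cell as follows. This is an illustrative sketch only: the gene names, expression values, and the choice to return None when the chromosome 9 average is zero are assumptions, not details taken from the analysis.

```python
from statistics import mean


def y_to_chr9_ratio(cell_expression, y_genes, chr9_genes):
    """Ratio of mean Y-linked expression to mean chromosome 9 expression for one cell.

    Genes missing from the cell are treated as not expressed (0.0).
    Returns None when the chromosome 9 average is zero (assumed convention).
    """
    y_avg = mean([cell_expression.get(gene, 0.0) for gene in y_genes])
    chr9_avg = mean([cell_expression.get(gene, 0.0) for gene in chr9_genes])
    if chr9_avg == 0:
        return None
    return y_avg / chr9_avg


# Hypothetical normalized expression values for a single cell.
cell = {"geneY1": 2.0, "geneY2": 4.0, "gene9a": 6.0, "gene9b": 6.0}
ratio = y_to_chr9_ratio(cell, ["geneY1", "geneY2"], ["gene9a", "gene9b"])
print(ratio)
```

      Normalizing to an autosome in this way expresses sex chromosome output relative to the general transcriptional activity of the same cell, so a ratio that falls at pachytene indicates silencing rather than a global drop in expression.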

      14) Line 425: Authors indicate that it is not known if association of TOPBP1 and BLM, 53BP1 or other proteins is disrupted in Topbp1B5/B5 spermatocytes. Could these experiments be performed in the testis, as they were in somatic cells?

      The cellular composition in Topbp1+/+ and Topbp1B5/B5 testes is very different so it would not be a fair comparison. While we have tried to isolate pachytene cells to perform these experiments, we were successful only when using Topbp1+/+ but not Topbp1B5/B5, likely due to the extremely small size of the mutant testis.

      15) Line 455 and following. I find that the discussion about the role of SETX is not completely clear. It seems that a failure of SETX function could result in defective or no transcription, as a consequence of the impossibility to resolve RNA-DNA hybrid molecules. Therefore, should impairment of SETX lead to reduced or enhanced transcription? Please clarify. On the other hand, this defect in SETX function should affect the whole genome, and not only sex chromosomes. Do authors have any clues about this broad effect?

      We thank the reviewer for the comment and have expanded the discussion in lines 470-474. We agree with the reviewer’s point that an impairment of SETX should affect the whole genome; however, during the pachytene stage, SETX is mostly localized to the sex body. Topbp1B5/B5 shows a specific defect in X and Y silencing maintenance during the pachytene stage; thus, we hypothesized that an impairment in SETX localization during pachytene should especially affect the X and Y chromosomes.

      16) As a general comment to the discussion section, I think authors could extend into some specific ideas or speculations. It is shocking that sex chromosome-linked genes are able to escape silencing without dismantling the complex (almost complete) MSCI response in the Topbp1 mutant (although perhaps this is not so surprising considering the high number of escapees reported in the inactivated X chromosome in female somatic cells).

      How to explain this paradox? One possibility (which would make a real breakthrough) is that the expression of sex chromosome-linked genes represents a regulated response to meiotic defects, and not just an unfortunate consequence of a defective MSCI. Thus, MSCI might be somehow irrelevant to prevent the execution of this sex chromosome-based program to stop meiosis progression when needed. The fact that this regulated activation was never proposed is perhaps due to the fact that most of the meiosis mutants characterized so far are unable to reach the stage at which MSCI is properly established, which is the most remarkable difference with the Topbp1 mutant studied here.

      Although naïve, the critical point for the activation of this sex chromosome-based program seems to depend simply on the transcription of Zfy1 and Zfy2 (encoding transcription factors). The signaling cascades upstream and downstream of these genes are the real mystery, awaiting further studies.

      We thank the reviewer for raising this very interesting point. Our interpretation of the data is that X and Y silencing, being a dynamic process, requires an initiation step and a maintenance step driven or controlled by the DDR machinery, and that Topbp1B5/B5 shows a grossly normal initiation of X and Y silencing but fails to maintain MSCI. Moreover, the expression of Zfy1 and Zfy2 has previously been demonstrated to be enough to trigger cell death (Royo et al., 2010; Vernet et al., 2016), and Topbp1B5/B5 cells show increased expression of these genes. However, we do not exclude the very interesting possibility, raised by the reviewer, that the expression of XY-linked genes represents a regulated response to meiotic defects that stops meiosis progression, leading to the cell death observed in Topbp1B5/B5. This would make Topbp1B5/B5 a unique model for these studies, as most of the previous meiosis mutants are unable to reach the stage at which MSCI is properly established. We have added discussion of this exciting point in lines 513-522.

      17) Scale bars are impossible to read in Figures 1I and J, and are missing in all the other image figures. Please, correct.

      We have addressed this in the new Figure 1. For figures displaying meiotic spreads, adding a scale bar is not a common practice in the field as these cells are swollen while being prepared.

      18) Line 828. Since Paula Cohen is an author of the manuscript, it seems weird to acknowledge herself in this section.

      Corrected.

      References

      Adams SR, Maezawa S, Alavattam KG, Abe H, Sakashita A, Shroder M, Broering TJ, Sroga Rios J, Thomas MA, Lin X, Price CM, Barski A, Andreassen PR, Namekawa SH. 2018. RNF8 and SCML2 cooperate to regulate ubiquitination and H3K27 acetylation for escape gene activation on the sex chromosomes. PLoS Genet 14. doi:10.1371/journal.pgen.1007233

      Bigot N, Day M, Baldock RA, Watts FZ, Oliver AW, Pearl LH. 2019. Phosphorylation-mediated interactions with TOPBP1 couple 53BP1 and 9-1-1 to control the G1 DNA damage checkpoint. Elife 8:1–28.

      Cescutti R, Negrini S, Kohzaki M, Halazonetis TD. 2010. TopBP1 functions with 53BP1 in the G1 DNA damage checkpoint. EMBO J 29:3723–3732.

      Chen Y, Zheng Y, Gao Y, Lin Z, Yang S, Wang T, Wang Q, Xie N, Hua R, Liu M, Sha J, Griswold MD, Li J, Tang F, Tong M-H. 2018. Single-cell RNA-seq uncovers dynamic processes and critical regulators in mouse spermatogenesis. Cell Res 28:879–896.

      Hirota T, Blakeley P, Sangrithi MN, Mahadevaiah SK, Encheva V, Snijders AP, ElInati E, Ojarikre OA, de Rooij DG, Niakan KK, Turner JMA. 2018. SETDB1 Links the Meiotic DNA Damage Response to Sex Chromosome Silencing in Mice. Dev Cell 47:645-659.e6.

      Ichijima Y, Ichijima M, Lou Z, Nussenzweig A, Daniel Camerini-Otero R, Chen J, Andreassen PR, Namekawa SH. 2011. MDC1 directs chromosome-wide silencing of the sex chromosomes in male germ cells. Genes and Development 25:959–971.

      Lau X, Munusamy P, Ng MJ, Sangrithi M. 2020. Single-Cell RNA Sequencing of the Cynomolgus Macaque Testis Reveals Conserved Transcriptional Profiles during Mammalian Spermatogenesis. Dev Cell 54:548-566.e7.

      Liu Y, Cussiol JR, Dibitetto D, Sims JR, Twayana S, Weiss RS, Freire R, Marini F, Pellicioli A, Smolka MB. 2017. TOPBP1Dpb11 plays a conserved role in homologous recombination DNA repair through the coordinated recruitment of 53BP1Rad9. J Cell Biol 216:623–639.

      Modzelewski AJ, Holmes RJ, Hilz S, Grimson A, Cohen PE. 2012. AGO4 regulates entry into meiosis and influences silencing of sex chromosomes in the male mouse germline. Dev Cell 23:251–264.

      Royo H, Polikiewicz G, Mahadevaiah SK, Prosser H, Mitchell M, Bradley A, De Rooij DG, Burgoyne PS, Turner JMA. 2010. Evidence that meiotic sex chromosome inactivation is essential for male fertility. Curr Biol 20:2117–2123.

      Sims JR, Faça VM, Pereira C, Ascenção C, Comstock W, Badar J, Arroyo-Martinez GA, Freire R, Cohen PE, Weiss RS, Smolka MB. 2022. Phosphoproteomics of ATR signaling in mouse testes. Elife 11. doi:10.7554/eLife.68648

      Vernet N, Mahadevaiah SK, de Rooij DG, Burgoyne PS, Ellis PJI. 2016. Zfy genes are required for efficient meiotic sex chromosome inactivation (MSCI) in spermatocytes. Hum Mol Genet 25:5300–5310.

      Ward IM, Minn K, van Deursen J, Chen J. 2003. p53 Binding protein 53BP1 is required for DNA damage responses and tumor suppression in mice. Mol Cell Biol 23:2556–2563.

      Yeo AJ, Becherel OJ, Luff JE, Graham ME, Richard D, Lavin MF. 2015. Senataxin controls meiotic silencing through ATR activation and chromatin remodeling. Cell Discovery 1. doi:10.1038/celldisc.2015.25

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their work, and the very useful comments.

      Public reviews:

      Reviewer #2

      1) The authors discussed possible reasons for the different results of the RRP sizes between this study and Alten et al., 2021. One of them is how the hypertonic solution is applied. The authors thought that the long application of hypertonic solution in Alten et al., 2021 caused an overlapping release of RRP and upstream vesicle pools because Alten et al., 2021 measured 10-fold larger RRP size than what was measured in this study. However, Alten et al., 2021 measured RRP from IPSCs and a single inhibitory vesicle fusion causes larger charge transfer than an excitatory vesicle. The authors need to take this into consideration and 10-fold is likely an overestimate.

      Answer: Thank you for pointing out this important difference. We have modified the text in the Discussion accordingly and we no longer refer to the 10-fold difference.

      2) Statistical tests should be performed for protein expression levels (Fig 2A and Fig 10A) and in vitro fusion assays (Fig 8D,E and Fig 9 B,C).

      Answer: We inserted new panels B and C in Fig. 2 and Fig. 10 showing all the Western Blot data and performed statistical tests (none were significant). For the in vitro fusion assays, we have inserted statistical tests in panels 8E and 9C. The quantities in those panels (subdivided into “Pre Ca2+”, “post Ca2+” and “end fusion”) are based on the data in Figure 8D and 9B. We have therefore not inserted separate statistical tests in Figures 8D and 9B.

      Reviewer #1 (Recommendations For The Authors):

      It would be quite interesting for future studies to address how these three mutations in SNAP-25 behave in the Syt1 null background in their electrophysiological experiments. Does the I167N allele block the enhanced spontaneous release in the Syt1 null? Do the V48F and D166Y alleles synergize with Syt1 to enhance spontaneous release to even higher levels? By examining how different components interact to shape the energy landscape for priming and fusion, these types of approaches should be quite revealing.

      Answer: We agree with the reviewer that these future studies would be interesting. Unfortunately, they are beyond our current capacities.

      Reviewer #2 (Recommendations For The Authors):

      1) In the introduction, when discussing haploinsufficiency of Munc18-1 causes a decrease in release, additional references should be included, for example, the studies in flies (Wu et al., 1998, EMBO), human neurons (Patzke et al., 2015 JCI), and mouse neurons (Toonen et al., 2006 PNAS; Chen et al., 2020 eLife).

      Answer: Thank you for the suggestion. We have rewritten the text and added additional references.

      2) The authors may consider introducing additional motivations and significance of this study. For example, the evoked EPSCs cannot be properly measured in the cultures of Alten et al., 2021, but was properly studied here.

      Answer: We agree and have added additional motivations in the Introduction.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input, than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.

      We thank the reviewer for recognizing the strengths of our work!
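
      The heuristic mentioned by the reviewer (at least 50% precision among the top 50 predicted contacts) can be sketched as follows. This is a minimal illustration, not the evaluation code of PLMGraph-Inter: the function name, the residue-pair keys, the scores, and the small value of k used in the toy example are assumptions.

```python
def top_k_precision(scores, true_contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores: dict mapping (residue_i, residue_j) to a predicted contact score.
    true_contacts: set of residue pairs that are in contact in the native complex.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    return sum(pair in true_contacts for pair in ranked) / len(ranked)


# Toy example with hypothetical scores; k=4 stands in for the top 50.
scores = {(1, 10): 0.9, (2, 11): 0.8, (3, 12): 0.7, (4, 13): 0.6, (5, 14): 0.1}
true_contacts = {(1, 10), (3, 12), (5, 14)}

precision = top_k_precision(scores, true_contacts, k=4)
acceptable = precision >= 0.5  # the 50% threshold from the heuristic
print(precision, acceptable)
```

      Because only the ranking of the scores matters, this criterion can be applied to any contact predictor that outputs a score per inter-protein residue pair.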

      Weaknesses:

      My biggest issue with this work is the evaluations made using bound monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.

      We thank the reviewer for the suggestion! We evaluated PLMGraph-Inter with the predicted monomers and analyzed the results in detail (see the “Impact of the monomeric structure quality on contact prediction” section and Figure 3). To mimic real cases, we even deliberately reduced the performance of AF2 by using reduced MSAs (see the 2nd paragraph in the “Impact of the monomeric structure quality on contact prediction” section). Some of these results are currently in the supplementary material of the manuscript (Table S2). In the revision, we will move them to the main text to emphasize the performance of PLMGraph-Inter with the predicted monomers.

      In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, not Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.

      We thank the reviewer for the suggestion! Yes! The performance of PLMGraph-Inter drops when the predicted monomers are used in the prediction. However, it is difficult to say which is a fairer comparison, Figure 6 or Figure S2, since AFM also searched monomer templates (see the third paragraph in 7. Supplementary Information : 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) in the prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native templates. We will provide the AFM confidence values of the AFM predictions in the revision.

      Besides, in cases where any experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.

      We thank the reviewer for the suggestion! We would like to note that AFM also searched monomer templates (see the third paragraph in 7. Supplementary Information : 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) in the prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native template.
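
      For reference, the top-50 precision discussed above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the manuscript: the function name `top_k_precision`, the nested-list inputs, and the tie handling are assumptions made for the example.

```python
def top_k_precision(scores, contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores   -- L1 x L2 nested list of predicted inter-protein contact scores
    contacts -- L1 x L2 nested list of ground-truth labels (truthy = contact)
    Both layouts are assumptions for this sketch.
    """
    # Flatten both maps into (score, is_contact) pairs.
    pairs = [
        (score, bool(label))
        for score_row, label_row in zip(scores, contacts)
        for score, label in zip(score_row, label_row)
    ]
    if not pairs:
        raise ValueError("empty contact map")
    # Rank residue pairs from most to least confident.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    top = pairs[:k]  # fewer than k pairs are used if the map is small
    return sum(is_contact for _, is_contact in top) / len(top)


example_scores = [[0.9, 0.1], [0.8, 0.2]]
example_contacts = [[1, 0], [0, 1]]
precision_at_2 = top_k_precision(example_scores, example_contacts, k=2)
```

      Averaging this value over all targets in a test set gives the mean precision that the reviewer asks to be reported for both methods.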

      It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?

      We thank the reviewer for the suggestion! The biggest challenge in objectively evaluating AFM is that, as far as we know, AFM has not released the PDB IDs of its training set or of the “Recent-PDB-Multimers” dataset. “Benchmark 2” only includes 17 heterodimer proteins, and the number can be further decreased after removing targets redundant to our training set. We think it is difficult to draw conclusions from such a small number of targets. In the revision, we will analyze the performance of AFM on targets released after the date cutoff of the AFM training set, although for these targets we cannot completely remove the redundancy between the training and test sets of AFM.

      It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.

      We thank the reviewer for reminding us of the new version of AFM. The only difference between AFM V3 and V2 is the cutoff date of the training set. Our test set would have more overlap with the training set of AFM V3, which is one reason that we think AFM V2 is more appropriate to use in the comparison.

      Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.

      We agree with the reviewer that testing whether the model can keep its performance on targets with no templates (i.e. non-redundant in structure) is important. We will perform the analysis in the revision.

      Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.

      Using the protein geometric graph to integrate multiple protein language models is the main idea of PLMGraph-Inter. Compared with our previous work (DRN-1D2D_Inter), we consider the building of the geometric graph to be one major contribution of this work. To emphasize the efficacy of this geometric graph, we chose to use the “geometry-only” model as the base model. We will further clarify this in the revision.

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) still remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for the model training.

      The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.

      We thank the reviewer for recognizing the significance of our work! We will revise the manuscript carefully to address the reviewer’s concerns.

      1. The sequence identity cutoff to remove redundancies between training and test set was set to 40%, which is a bit high to remove test examples having homology to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancies between training and test set examples. To make their results more solid, the authors should have curated test examples with lower sequence identity cutoffs, or have provided the performance changes against sequence identities to the closest training examples.

      We thank the reviewer for the valuable suggestion! Using different thresholds to reduce the redundancy between the test set and the training set is a very good suggestion, and we will perform the analysis in the revision. In the current version of the manuscript, 40% sequence identity is used as the cutoff because many previous studies used this cutoff (e.g. the Recent-PDB-Multimers used in AlphaFold-Multimer (see: 7.8 Datasets in the AlphaFold-Multimer paper); the work of DSCRIPT: https://www.cell.com/action/showPdf?pii=S2405-4712%2821%2900333-1 (see: the PPI dataset paragraph in the METHODS DETAILS section of the STAR METHODS)). One reason for using the relatively higher threshold for PPI studies is that PPIs are generally not as conserved as protein monomers.

      We performed a preliminary analysis using different thresholds to remove redundancy when preparing this provisional response letter:

      Author response table 1.

      Table1. The performance of PLMGraph-Inter on the HomoPDB and HeteroPDB test sets using native structures(AlphaFold2 predicted structures).

      Method:

      To remove redundancy, we clustered 11096 sequences from the training set and test sets (HomoPDB, HeteroPDB) using MMseqs2 with different sequence identity thresholds (40%, 30%, 20%, 10%) (the lowest cutoff for CD-HIT is 40%, so we switched to MMseqs2). Each sequence is then uniquely labeled by the cluster (e.g. cluster 0, cluster 1, …) to which it belongs, from which each PPI can be marked with a pair of clusters (e.g. cluster 0-cluster 1). The PPIs belonging to the same cluster pair (note: cluster n-cluster m and cluster m-cluster n were considered as the same pair) were considered as redundant. For each PPI in the test set, if the cluster pair it belongs to also contains a PPI from the training set, we remove that PPI from the test set.
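
      The cluster-pair filtering described above can be sketched as follows. This is an illustrative sketch rather than the script used for the analysis: it assumes the cluster assignments have already been computed and loaded into a dictionary, and the names `cluster_pair`, `filter_test_set`, and the toy identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

PPI = Tuple[str, str]  # (sequence id of partner 1, sequence id of partner 2)


def cluster_pair(ppi: PPI, cluster_of: Dict[str, int]) -> Tuple[int, int]:
    """Return the order-independent pair of cluster labels for one PPI."""
    first, second = ppi
    # Sorting makes (n, m) and (m, n) map to the same key.
    low, high = sorted((cluster_of[first], cluster_of[second]))
    return (low, high)


def filter_test_set(
    train: List[PPI], test: List[PPI], cluster_of: Dict[str, int]
) -> List[PPI]:
    """Drop every test PPI whose cluster pair also occurs in the training set."""
    train_pairs = {cluster_pair(ppi, cluster_of) for ppi in train}
    return [ppi for ppi in test if cluster_pair(ppi, cluster_of) not in train_pairs]


# Toy example: cluster labels as they might come from a clustering run.
toy_clusters = {"A": 0, "B": 1, "C": 1, "D": 0, "E": 2, "F": 3}
toy_train = [("A", "B")]                          # cluster pair (0, 1)
toy_test = [("C", "D"), ("E", "F"), ("A", "E")]   # pairs (0, 1), (2, 3), (0, 2)
kept = filter_test_set(toy_train, toy_test, toy_clusters)
```

      In the toy example, the test PPI ("C", "D") is removed because its cluster pair matches that of the training PPI ("A", "B") with the partners swapped, while ("A", "E") is kept because only one of its partners falls in a training cluster.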

      We will perform more detailed analyses in the revised manuscript.

      2. Figures with head-to-head comparison scatter plots are hard to understand as scatter plots because too many different methods are abstracted into a single plot with multiple colors. It would be better to provide individual head-to-head scatter plots as supplementary figures, not in the main figure.

      We thank the reviewer for the suggestion! We will include the individual head-to-head scatter plots as supplementary figures in the revision.

      3) The authors claim that PLMGraph-Inter is complementary to AlphaFold-multimer as it shows better precision for the cases where AlphaFold-multimer fails. To strengthen the point, the qualities of predicted complex structures via protein-protein docking with predicted contacts as restraints should have been compared to those of AlphaFold-multimer structures.

      We thank the reviewer for the suggestion! We will add this comparison in the revision.

      4) It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).

      We thank the reviewer for the suggestion! We will perform such analysis in the revision.

    1. Author Response

      We are grateful for the constructive comments of the reviewers. Here is a provisional response to major questions.

      To Question 1: we appreciate the reviewer pointing out that, in flies with pan-neuronal knockout of PDFR by unmodified Cas9 (Fig 2H-2I in the previous manuscript), morning anticipation still exists at some level (Fig a), and that the decrease in the morning anticipation index (Fig b) and the advance of evening activity were not as pronounced as those observed in han5304 (Fig 3C, Hyun et al., 2005). Our response is that the difference between the pan-neuronal knockout of PDFR by unmodified Cas9 and han5304 might be caused by the limited efficiency of unmodified Cas9 in our conditional system. We will adjust the relevant conclusions in the revised version; these findings underscore the necessity of enhancing the efficiency of the original Cas9.

      Author response image 1.

      To Question 2, that some expression profiles of clock neurons are not consistent with previous reports, such as Dh31 and ChAT in s-LNvs, our response is that the differences can be attributed to the variation in expression patterns between 3’ terminal KI-LexA (used in this gene expression dissection) and KO-GAL4, KI-GAL4, or transgenic GAL4. We have indeed observed differences when identical sites were inserted in frame with Gal4 or LexA.

      To Question 3, that our description of advanced morning anticipation versus no morning anticipation with the term "opposite" is not accurate enough, our response is that we will modify that. Mutants of CNMa or CNMaR exhibit advanced morning activity, suggesting an inhibitory role of CNMa/CNMaR. Mutants of Pdf/Pdfr, on the other hand, showed no morning anticipation, indicating a promoting role in morning anticipation.

      To Question 4, whether we have generated transgenic UAS-sgRNA flies for all CCT genes or only a subset, our response is that we have indeed generated UAS-sgRNA flies for all CCT genes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Like the "preceding" co-submitted paper, this is again a very strong and interesting paper in which the authors address a question that is raised by the finding in their co-submitted paper - how does one factor induce two different fates. The authors provide an extremely satisfying answer - only one subset of the cells neighbors a source of signaling cells that trigger that subset to adopt a specific fate. The signal here is Delta and the read-out is Notch, whose intracellular domain, in conjunction with, presumably, SuH cooperates with Bsh to distinguish L4 from L5 fate (L5 is not neighbored by signal-providing cells). Like the back-to-back paper, the data is rigorous, well-presented and presents important conclusions. There's a wealth of data on the different functions of Notch (with and without Bsh). All very satisfying.

      Thanks!

      I have again one suggestion that the authors may want to consider discussing. I'm wondering whether the open chromatin that the authors convincingly measure is the CAUSE or the CONSEQUENCE of Bsh being able to activate L4 target genes. What I mean by this is that currently the authors seem to be focused on a somewhat sequential model where Notch signaling opens chromatin and this then enables Bsh to activate a specific set of target genes. But isn't it equally possible that the combined activity of Bsh/Notch(intra)/SuH opens chromatin? That's not a semantic/minor difference, it's a fundamentally different mechanism, I would think. This mechanism also solves the conundrum of specificity - how does Notch know which genes to "open" up? It would seem more intuitive to me to think that it's working together with Bsh to open up chromatin, with chromatin accessibility then being a "mere" secondary consequence. If I'm not overlooking something fundamental here, there is actually also a way to distinguish between these models - test chromatin accessibility in a Bsh mutant. If the authors' model is true, chromatin accessibility should be unchanged.

      I again finish by commending the authors for this terrific piece of work.

      Thanks! It is a crucial question whether Notch signaling regulates chromatin landscape independently of a primary HDTF. We will include this discussion in the text and pursue it in our next project.

      We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in larval ventral nerve cord, all premotor neurons are NotchON neurons while all postsensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors explore how Notch activity acts together with Bsh homeodomain transcription factors to establish L4 and L5 fates in the lamina of the visual system of Drosophila. They propose a model in which differential Notch activity generates different chromatin landscapes in presumptive L4 and L5, allowing the differential binding of the primary homeodomain TF Bsh (as described in the cosubmitted paper), which in turn activates downstream genes specific to either neuronal type. The requirement of Notch for L4 vs. L5 fate is well supported, and complete transformation from one cell type into the other is observed when altering Notch activity. However, the role of Notch in creating differential chromatin landscapes is not directly demonstrated. It is only based on correlation, but it remains a plausible and intriguing hypothesis.

      Thanks for the positive feedback!

      Strengths:

      The authors are successful in characterizing the role of Notch to distinguish between L4 and L5 cell fates. They show that the Notch pathway is active in L4 but not in L5. They identify L1, the neuron adjacent to L4 as expressing the Delta ligand, therefore being the potential source for Notch activation in L4. Moreover, the manuscript shows molecular and morphological/connectivity transformations from one cell type into the other when Notch activity is manipulated.

      Thanks!

      Using DamID, the authors characterize the chromatin landscape of L4 and L5 neurons. They show that Bsh occupies distinct loci in each cell type. This supports their model that Bsh acts as a primary selector gene in L4/L5 that activates different target genes in L4 vs L5 based on the differential availability of open chromatin loci.

      Thanks!

      Overall, the manuscript presents an interesting example of how Notch activity cooperates with TF expression to generate diverging cell fates. Together with the accompanying paper, it helps thoroughly describe how lamina cell types L4 and L5 are specified and provides an interesting hypothesis for the role of Notch and Bsh in increasing neuronal diversity in the lamina during evolution.

      Thanks for the positive feedback on both manuscripts.

      Weaknesses:

      Differential Notch activity in L4 and L5:

      ● The manuscript focuses its attention on describing Notch activity in L4 vs L5 neurons. However, from the data presented, it is very likely that the pool of progenitors (LPCs) is already subdivided into at least two types of progenitors that will give rise to L4 and L5, respectively. Evidence to support this is the activity of E(spl)-mɣ-GFP and the Dl puncta observed in the LPC region. Discussion should naturally follow that Notch-induced differences in L4/L5 might preexist L1-expressed Dl that affect newborn L4/L5. Therefore, the differences between L4 and L5 fates might be established earlier than discussed in the paper. The authors should acknowledge this possibility and discuss it in their model.

      We agree. Historically, LPCs are thought to be homogenous; our data suggests otherwise. We now emphasize this in the Discussion as requested. We are also investigating this question using single-cell RNAseq on LPCs to look for molecular heterogeneities. Nevertheless, whether L4 is generated by E(spl)mɣ-GFP+ (NotchON) LPCs does not affect our conclusion that Notch signaling and the primary HDTF Bsh are integrated to specify L4 fate over L5.

      ● The authors claim that Notch activation is caused by L1-expressed Delta. However, they use an LPC driver to knock down Dl. Dl-KD should be performed exclusively in L1, and the fate of L4 should be assessed.

      Dl is transiently expressed in newborn L1 neurons. To knock down Dl in newborn L1, we need to express Dl-RNAi before the onset of Dl expression in newborn L1; the only known Gal4 line expressed that early is the LPC-Gal4, which is the one that we used.

      ● To test whether L4 neurons are derived from NotchON LPCs, I suggest performing MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter.

      We agree! Whether L4 neurons are derived from NotchON LPCs is a great question. However, MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter will not work because E(spl)-mɣ-GFP reporter is only expressed in LPCs but not lamina neurons. We now mention this in the Discussion.

      ● The expression of different Notch targets in LPCs and L4 neurons may be further explored. I suggest using different Notch-activity reporters (i.e., E(spl)-GFP reporters) to further characterize these differences. What causes the switch in Notch target expression from LPCs to L4 neurons should be a topic of discussion.

      Thanks! It is a great question why Notch induces Espl-mɣ in LPCs but Hey in newborn neurons. However, it is not the question we are tackling in this paper and it will be a great direction to pursue in future. We will add this to our Discussion.

      Notch role in establishing L4 vs L5 fates:

      ● The authors describe that 27G05-Gal4 causes a partial Notch Gain of Function caused by its genomic location between Notch target genes. However, this is not further elaborated. The use of this driver is especially problematic when performing Notch KD, as many of the resulting neurons express Ap, and therefore have some features of L4 neurons. Therefore, Pdm3+/Ap+ cells should always be counted as intermediate L4/L5 fate (i.e., Fig3 E-J, Fig3-Sup2), irrespective of what the mechanistic explanation for Ap activation might be. It's not accurate to assume their L5 identity. In Fig4 intermediate-fate cells are correctly counted as such.

      We disagree that the use of 27G05-Gal4 is problematic when performing Notch-KD because our conclusion from Notch-KD is that Bsh without Notch signaling activates Pdm3 and specifies L5 fate. However, 27G05-Gal4 does not have any effect on Pdm3 expression. To make this clearer, we will quantify the percentage of Pdm3+ L5 neurons in Bsh+ lamina neurons for Notch-KD experiment. We are sorry this wasn't clearer.

      ● Lines 170-173: The temporal requirement for Notch activity in L5-to-L4 transformation is not clearly delineated. In Fig4-figure supplement 1D-E, it is not stated if the shift to 29°C is performed as in Fig4-figure supplement 1A-C.

      Thank you for catching this. We will correct it in the text.

      ● Additionally, using the same approach, it would be interesting to explore the window of competence for Notch-induced L5-to-L4 transformation: at which point in L5 maturation can fate no longer be changed by Notch GoF?

      Our data show that Bsh with transient Notch signaling in newborn neurons specifies L4 fate while Bsh without Notch signaling in newborn neurons specifies L5 fate. Therefore, we think the window of fate competence is during newborn neurons.

      However, as suggested by the reviewer, we did the experiment (see figure below). We used Gal80 (Gal80 inhibits Gal4 activity at 18°C) to temporally control Bsh-Gal4 activity for expressing N-ICD (the active form of Notch) in L5 neurons. We found that tub-Gal80ts, Bsh-Gal4>UAS-N-ICD is unable to induce ectopic L4 neurons when we shift the temperature from 18°C to 30°C to inactivate Gal80 at 15 hours after pupal formation, which is close to the end of lamina neurogenesis. However, it is unknown how many hours it takes to inactivate Gal80 and activate Bsh-Gal4, and thus we decided not to include this data in our manuscript.

      Author response image 1.

      L4-to-L3 conversion in the absence of Bsh

      ● Although interesting, the L4-to-L3 conversion in the absence of Bsh is never shown to be dependent on Notch activity. Importantly, L3 NotchON status is assumed based on their position next to Dl-expressing L1, but it is not empirically tested. Perhaps screening Notch target reporter expression in the lamina, as suggested above, could inform this issue.

      Our data show that, in the absence of Bsh, L4-to-L3 conversion occurs in the presence of Notch activity, whereas L5-to-L1 conversion occurs in the absence of Notch activity. Therefore, Notch activity is necessary for the L4-to-L3 conversion. Unfortunately, currently, we only have Hey as an available Notch target reporter in newborn neurons. To tackle this challenge in the future, we will profile the genome-binding targets of endogenous Notch in newborn neurons. This will identify novel genes as Notch signaling reporters in neurons for the field.

      ● Otherwise, the analysis of Bsh Loss of Function in L4 might be better suited to be included in the accompanying manuscript that specifically deals with the role of Bsh as a selector gene for L4 and L5.

      That is an interesting suggestion, but without knowing that Bsh + Notch = L4 identity the experiment would be hard to interpret. Note that we took advantage of Notch signaling to trace the cell fate in the absence of Bsh and found the L4-to-L3 conversion (see Figure 5G-K).

      Different chromatin landscape in L4 and L5 neurons

      ● A major concern is that, although L4 and L5 neurons are shown to present different chromatin landscapes (as expected for different neuronal types), it is not demonstrated that this is caused by Notch activity. The paper proves unambiguously that Notch activity, in concert with Bsh, causes the fate choice between L4 and L5. However, that this is caused by Notch creating a differential chromatin landscape is based only on correlation (NotchON cells having a different profile than NotchOFF). Although the authors are careful not to claim that differential chromatin opening is caused directly by Notch, this is heavily suggested throughout the text and must be toned down, e.g., Line 294: "With Notch signaling, L4 neurons generate distinct open chromatin landscape" and Line 298: "Our findings propose a model that the unique combination of HDTF and open chromatin landscape (e.g. by Notch signaling)". These claims are not supported well enough, and alternative hypotheses should be provided in the discussion. An alternative hypothesis could be that LPCs are already specified towards L4 and L5 fates. In this context, different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.

      We agree and appreciate the comment; it is well justified. We have toned down our comments and clearly state that this is a correlation that needs to be tested for a causal relationship. The reviewer posits: “An alternative hypothesis: different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.” Yes, it is a crucial question whether Notch signaling regulates chromatin landscape independently of a primary HDTF (e.g., Bsh). We will include this discussion in the text and pursue it in our next project. We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in larval ventral nerve cord, all premotor neurons are NotchON neurons while all post-sensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.

      ● The correlation between open chromatin and Bsh loci with Differentially Expressed genes is much higher for L4 than L5. It is not clear why this is the case, and should be discussed further by the authors.

      We agree and think in L5 neurons, the secondary HDTF Pdm3 also contributes to L5-specific gene transcription during the synaptogenesis window, in addition to Bsh. We will include this in the text.

    1. Author Response

      The following is the authors’ response to the latest reviews.

      A revised version of the manuscript models "slope-based" excitability changes in addition to "threshold-based" changes. This serves to address the above concern that as constructed here changes in excitability threshold are not distinguishable from changes in input. However, it remains unclear what the model would do should only a subset of neurons receive a given, fixed input. In that case, are excitability changes sufficient to induce drift? This remains an important question that is not addressed by the paper in its current form.

      Thank you for this important point. In the simulation of two memories (Fig. S6), we stimulated half of the neural population for each of the two memories. We therefore also showed that drift happens when only a subset of neurons is stimulated.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Current experimental work reveals that brain areas implicated in episodic and spatial memory have a dynamic code, in which activity representing familiar events/locations changes over time. This paper shows that such reconfiguration is consistent with underlying changes in the excitability of cells in the population, which ties these observations to a physiological mechanism.

      Delamare et al. use a recurrent network model to consider the hypothesis that slow fluctuations in intrinsic excitability, together with spontaneous reactivations of ensembles, may cause the structure of the ensemble to change, consistent with the phenomenon of representational drift. The paper focuses on three main findings from their model: (1) fluctuations in intrinsic excitability lead to drift, (2) this drift has a temporal structure, and (3) a readout neuron can track the drift and continue to decode the memory. This paper is relevant and timely, and the work addresses questions of both a potential mechanism (fluctuations in intrinsic excitability) and purpose (time-stamping memories) of drift.

      The model used in this study consists of a pool of 50 all-to-all recurrently connected excitatory neurons with weights changing according to a Hebbian rule. All neurons receive the same input during stimulation, as well as global inhibition. The population has heterogeneous excitability, and each neuron's excitability is constant over time apart from a transient increase on a single day. The neurons are divided into ensembles of 10 neurons each, and on each day, a different ensemble receives a transient increase in the excitability of each of its neurons, with each neuron experiencing the same amplitude of increase. Each day for four days, repetitions of a binary stimulus pulse are applied to every neuron.
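
      The excitability protocol summarized above can be sketched as follows. This is only a schematic of the day-by-day excitability schedule, not the authors' network model: the Gaussian baseline, the assignment of ensembles to contiguous index blocks, and the names `excitability_schedule` and `amplitude` are assumptions made for illustration.

```python
import random


def excitability_schedule(n_neurons=50, ensemble_size=10, n_days=4,
                          amplitude=1.0, seed=0):
    """Return (baseline, schedule); schedule[day][i] is neuron i's excitability.

    Each neuron has a fixed baseline excitability. On each day, one ensemble
    receives the same transient increase `amplitude`; all other neurons stay
    at baseline. Ensembles are taken as contiguous index blocks (assumption).
    """
    if n_days * ensemble_size > n_neurons:
        raise ValueError("not enough neurons for one distinct ensemble per day")
    rng = random.Random(seed)
    # Heterogeneous baseline excitability (the distribution is an assumption).
    baseline = [rng.gauss(0.0, 0.1) for _ in range(n_neurons)]
    schedule = []
    for day in range(n_days):
        start = day * ensemble_size
        boosted = range(start, start + ensemble_size)
        schedule.append([
            value + amplitude if index in boosted else value
            for index, value in enumerate(baseline)
        ])
    return baseline, schedule


baseline, schedule = excitability_schedule()
```

      With the default arguments, neurons 0 to 9 are boosted on the first day, neurons 10 to 19 on the second, and so on, while the last ten neurons are never boosted.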

      The modeling choices focus on the parameter of interest, the excitability, and other details are generally kept as straightforward as possible. That said, I wonder if certain aspects may be overly simple. The extent of the work already performed, however, does serve the intended purpose, and so I think it would be sufficient for the authors to comment on these choices rather than to take more space in this paper to actually implement these choices. What might happen were more complex modeling choices made? What is the justification for the choices that are made in the present work?

      The two specific modeling choices I question are (1) the excitability dynamics and (2) the input stimulus. The ensemble-wide synchronous and constant-amplitude excitability increase, followed by a return to baseline, seems to be a very simplified picture of the dynamics of intrinsic excitability. At the very least, justification for this simplified picture would benefit the reader, and I would be interested in the authors' speculation about how a more complex and biologically realistic dynamics model might impact the drift in their network model. Similarly, the input stimulus being binary means that, on the single-neuron level, the only type of drift that can occur is a sort of drop-in/drop-out drift; this choice excludes the possibility of a neuron maintaining significant tuning to a stimulus but changing its preferred value. How would the use of a continuous input variable influence the results?

      (1) In our model, neurons tend to compete for allocation to the memory ensemble: neurons with higher excitability tend to be preferentially allocated and neurons with lower excitability do not respond to the stimulus. Because relative, but not absolute excitability biases this competition, we suggest that the exact distribution of excitability would not impact the results qualitatively. On the other hand, the results might vary if excitability was considered dependent on the activity of the neurons as previously reported experimentally (Cai 2016, Rachid 2016, Pignatelli 2019). An increase in excitability following neural activity might induce higher correlation among ensembles on consecutive days, decreasing the drift.

      (2) We thank the reviewer for this very good point. Indeed, two recent studies (Geva 2023, Khatib 2023) have highlighted distinct mechanisms for a drift of the mean firing rate and the tuning curve. We extended the last part of the discussion to include this point: “Finally, we intended to model drift in the firing rates, as opposed to a drift in the tuning curve of the neurons. Recent studies suggest that drifts in the mean firing rate and tuning curve arise from two different mechanisms [33, 34]. Experience drives a drift in neurons' tuning curves while the passage of time drives a drift in neurons' firing rates. In this sense, our study is consistent with these findings by providing a possible mechanism for a drift in the mean firing rates of the neurons driven by dynamic excitability. Our work suggests that drift can depend on any experience having an impact on excitability dynamics such as exercise as previously shown experimentally [9, 35] but also neurogenesis [9, 31, 36], sleep [37] or increase in dopamine level [38]”

      Result (1): Fluctuations in intrinsic excitability induce drift

      The two choices highlighted above appear to lead to representations that never recruit the neurons in the population with the lowest baseline excitability (Figure 1b: it appears that only 10 neurons ever show high firing rates) and produce networks with very strong bidirectional coupling between this subset of neurons and weak coupling elsewhere (Figure 1d). This low recruitment rate may not necessarily be problematic, but it stands out as a point that should at least be commented on. The fact that only 10 neurons (20% of the population) are ever recruited in a representation also raises the question of what would happen if the model were scaled up to include more neurons.

      This is a very good point. To test how the model depends on the network size, we plotted the drift index against the size of the ensemble. With this current implementation, we did not observe a significant correlation between the drift rate and size of the initial ensemble (Figure S2).

      Author response image 1.

      The rate of the drift does not depend on the size of the engram. Drift rate against the size of the original engram. Each dot shows one simulation (Methods). n = 100 simulations.

      Result (2): The observed drift has a temporal structure

      The authors then demonstrate that the drift has a temporal structure (i.e., that activity is informative about the day on which it occurs), with methods inspired by Rubin et al. (2015). Rubin et al. (2015) compare single-trial activity patterns on a given session with full-session activity patterns from each session. In contrast, Delamare et al. here compare full-session patterns with baseline excitability (E = 0) patterns. This point of difference should be motivated. What does a comparison to this baseline excitability activity pattern tell us? The ordinal decoder, which decodes the session order, gives very interesting results: that an intermediate amplitude E of excitability increase maximizes this decoder's performance. This point is also discussed well by the authors. As a potential point of further exploration, the use of baseline excitability patterns in the day decoder had me wondering how the ordinal decoder would perform with these baseline patterns.

      This is a good point. Here, we aimed at dissociating the role of excitability from that of the recurrent currents. We introduced a time decoder that compares the pattern with baseline excitability (E = 0), in order to test whether the temporal information was encoded in the ensemble, i.e. in the recurrent weights. By contrast, because the neural activity is by construction biased towards excitability, a time decoder performed on the full session would work in a trivial way.

      Result (3): A readout neuron can track drift

      The authors conclude their work by connecting a readout neuron to the population with plastic weights evolving via a Hebbian rule. They show that this neuron can track the drifting ensemble by adjusting its weights. These results are shown very neatly and effectively and corroborate existing work that they cite very clearly.

      Overall, this paper is well-organized, offers a straightforward model of dynamic intrinsic excitability, and provides relevant results with appropriate interpretations. The methods could benefit from more justification of certain modeling choices, and/or an exploration (either speculative or via implementation) of what would happen with more complex choices. This modeling work paves the way for further explorations of how intrinsic excitability fluctuations influence drifting representations.

      Reviewer #2 (Public Review):

      In this computational study, Delamare et al identify slow neuronal excitability as one mechanism underlying representational drift in recurrent neuronal networks and that the drift is informative about the temporal structure of the memory and when it has been formed. The manuscript is very well written and addresses a timely as well as important topic in current neuroscience namely the mechanisms that may underlie representational drift.

      The study is based on an all-to-all recurrent neuronal network with synapses following Hebbian plasticity rules. On the first day, a cue-related representation is formed in that network and on the next 3 days it is recalled spontaneously or due to a memory-related cue. One major observation is that representational drift emerges day-by-day based on intrinsic excitability with the most excitable cells showing highest probability to replace previously active members of the assembly. By using a day decoder, the authors state that they can infer the order at which the reactivation of cell assemblies happened but only if the excitability state was not too high. By applying a read-out neuron, the authors observed that this cell can track the drifting ensemble which is based on changes of the synaptic weights across time. The few questions that emerged and could be addressed either theoretically or in the discussion are as follows:

      1. Would similar results be obtained if, instead of all-to-all recurrent connections, more realistic connectivity profiles, such as those estimated for CA1 and CA3, had been modeled?

      This is a very interesting point. We performed further simulations to show that the results are not dependent on the exact structure of the network. In particular, we show that all-to-all connectivity is not required to observe a drift of the ensemble. We found similar results when the recurrent weights matrix was made sparse (Fig. S4a-c, Methods). Similarly to all-to-all connectivity, we found that the ensemble is informative about its temporal history (Fig. S4d) and that an output neuron can decode the ensemble continuously (Fig. S4e).

      Author response image 2.

      Sparse recurrent connectivity shows similar drifting behavior as all-to-all connectivity. The same simulation protocol as Fig. 1 was used while the recurrent weights matrix was made 50% sparse (Methods). a) Firing rates of the neurons across time. The red traces correspond to neurons belonging to the first assembly, namely that have a firing rate higher than the active threshold after the first stimulation. The black bars show the stimulation and the dashed line shows the active threshold. b) Recurrent weights matrices after each of the four stimuli show the drifting assembly. c) Correlation of the patterns of activity between the first day and every other day. d) Student's t-test t-value of the ordinal time decoder, for the real (blue) and shuffled (orange) data and for different amplitudes of excitability E. e) Center of mass of the distribution of the output weights (Methods) across days. c-e) Data are shown as mean ± s.e.m. for n = 10 simulations.
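
      One way to make a recurrent weight matrix 50% sparse, as described in the caption above, is sketched below. This is an assumption-laden illustration rather than the procedure from the Methods: the helper sparsify, the fixed seed, the exclusion of self-connections, and the network size of 50 neurons (inferred from Reviewer #1's remark that 10 neurons are 20% of the population) are all hypothetical choices.

```python
import random

def sparsify(weights, sparsity, seed=0):
    """Zero a fixed fraction of the off-diagonal entries of a square matrix.

    Hypothetical helper: the manuscript's Methods may build the sparse
    connectivity differently (for example with a per-synapse probability).
    """
    n = len(weights)
    rng = random.Random(seed)
    off_diagonal = [(i, j) for i in range(n) for j in range(n) if i != j]
    n_removed = int(sparsity * len(off_diagonal))
    removed = set(rng.sample(off_diagonal, n_removed))
    return [[0.0 if (i, j) in removed else weights[i][j] for j in range(n)]
            for i in range(n)]

n_neurons = 50  # assumed network size
# All-to-all connectivity without self-connections (assumed).
dense = [[0.0 if i == j else 1.0 for j in range(n_neurons)]
         for i in range(n_neurons)]
sparse = sparsify(dense, sparsity=0.5)

# Count the synapses that survive the mask.
n_connections = sum(1 for row in sparse for w in row if w != 0.0)
```

      Removing an exact number of synapses, rather than dropping each one independently with probability 0.5, keeps the connection count identical across simulations; either choice would be compatible with the description given here.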

      2. How does the number of excited cells that could potentially contribute to an engram influence the representational drift and the decoding quality?

      This is indeed a very good question. We did not observe a significant correlation between the drift rate and size of the initial ensemble (Fig. S2).

      Author response image 3.

      The rate of the drift does not depend on the size of the engram. Drift rate against the size of the original engram. Each dot shows one simulation (Methods). n = 100 simulations.

      3. How does the rate of the drift influence the quality of readout from the read-out neuron?

      We thank the reviewer for this interesting question. We introduced a measure of the “read-out quality” and plotted this value against the rate of the drift. We found a small correlation between the two quantities. Indeed, the read-out quality decreases with the rate of the drift.

      Author response image 4.

      The quality of the read-out decreases with the rate of the drift. Read-out quality computed on the firing rate of the output neuron against the rate of the drift (Methods). Each dot shows one simulation. n = 100 simulations.

      Reviewer #3 (Public Review):

      The authors explore an important question concerning the underlying mechanism of representational drift, which despite intense recent interest remains obscure. The paper explores the intriguing hypothesis that drift may reflect changes in the intrinsic excitability of neurons. The authors set out to provide theoretical insight into this potential mechanism.

      They construct a rate model with all-to-all recurrent connectivity, in which recurrent synapses are governed by a standard Hebbian plasticity rule. This network receives a global input, constant across all neurons, which can be varied with time. Each neuron also is driven by an "intrinsic excitability" bias term, which does vary across cells. The authors study how activity in the network evolves as this intrinsic excitability term is changed.

      They find that after initial stimulation of the network, those neurons where the excitability term is set high become more strongly connected and are in turn more responsive to the input. Each day the subset of neurons with high intrinsic excitability is changed, and the network's recurrent synaptic connectivity and responsiveness gradually shift, such that the new high intrinsic excitability subset becomes both more strongly activated by the global input and also more strongly recurrently connected. These changes result in drift, reflected by a gradual decrease across time in the correlation of the neuronal population vector response to the stimulus.
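
      The drift measure described above, a declining correlation between population vectors across days, can be sketched as follows. The binary activity patterns are hypothetical: they assume an ensemble of four active neurons out of ten in which one member is replaced each day, and they are not data from the manuscript.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length activity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical population vectors: one ensemble member is swapped per day.
days = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
]

# Correlation of each day's pattern with the first day's pattern.
correlations = [pearson(days[0], day) for day in days]
```

      With these toy patterns the correlation with day 1 falls monotonically as the overlap shrinks, which is the signature of drift that the reviewer summarizes.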

      The authors are able to build a classifier that decodes the "day" (i.e. which subset of neurons had high intrinsic excitability) with perfect accuracy. This is despite the fact that the excitability bias during decoding is set to 0 for all neurons, and so the decoder is really detecting those neurons with strong recurrent connectivity, and in turn strong responses to the input. The authors show that it is also possible to decode the order in which different subsets of neurons were given high intrinsic excitability on previous "days". This second result depends on the extent by which intrinsic excitability was increased: if the increase in intrinsic excitability was either too high or too low, it was not possible to read out any information about past ordering of excitability changes.

      Finally, using another Hebbian learning rule, the authors show that an output neuron, whose activity is a weighted sum of the activity of all neurons in the network, is able to read out the activity of the network. What this means specifically, is that although the set of neurons most active in the network changes, the output neuron always maintains a higher firing rate than a neuron with randomly shuffled synaptic weights, because the output neuron continuously updates its weights to sample from the highly active population at any given moment. Thus, the output neuron can readout a stable memory despite drift.
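
      The tracking behavior summarized above can be sketched with a toy readout. The update rule below, in which each weight moves toward its presynaptic rate at a speed gated by the output rate, is an assumed Hebbian-style rule chosen for illustration and is not the rule from the manuscript; the learning rate eta, the number of update steps, the initial weights, and the drifting patterns are likewise hypothetical.

```python
import random

def readout(weights, rates):
    """Output rate as a weighted sum of the network rates."""
    return sum(w * r for w, r in zip(weights, rates))

def hebbian_step(weights, rates, eta=0.2):
    """Move each weight toward its presynaptic rate, gated by the output rate.

    Assumed illustrative rule; with eta * y < 1 each update is a convex
    combination, so the weights stay between 0 and 1.
    """
    y = readout(weights, rates)
    return [w + eta * y * (r - w) for w, r in zip(weights, rates)]

# Hypothetical drifting ensemble: one member is replaced each day.
days = [
    [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
]

weights = [0.1] * 10
for rates in days:
    for _ in range(50):  # repeated updates within one day
        weights = hebbian_step(weights, rates)

tracked = readout(weights, days[-1])

# Control: the same weights randomly reassigned to presynaptic neurons.
rng = random.Random(0)
shuffled_outputs = []
for _ in range(200):
    shuffled = list(weights)
    rng.shuffle(shuffled)
    shuffled_outputs.append(readout(shuffled, days[-1]))
mean_shuffled = sum(shuffled_outputs) / len(shuffled_outputs)
```

      In this sketch the readout keeps a high rate on the last day because consecutive ensembles overlap, so the output stays active and the weights can follow the newly recruited neurons; the shuffled control loses that alignment. Tracking relies on the overlap between consecutive days, which is a property of the toy patterns rather than a claim about the model.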

      Strengths:

      The authors are clear in their description of the network they construct and in their results. They convincingly show that when they change their "intrinsic excitability term", upon stimulation, the Hebbian synapses in their network gradually evolve, and the combined synaptic connectivity and altered excitability result in drifting patterns of activity in response to an unchanging input (Fig. 1, Fig. 2a). Furthermore, their classification analyses (Fig. 2) show that information is preserved in the network, and their readout neuron successfully tracks the active cells (Fig. 3). Finally, the observation that only a specific range of excitability bias values permits decoding of the temporal structure of the history of intrinsic excitability (Fig. 2f and Figure S1) is interesting, and as the authors point out, not trivial.

      Weaknesses:

      1. The way the network is constructed, there is no formal difference between what the authors call "input", Δ(t), and what they call "intrinsic excitability" Ɛ_i(t) (see Equation 3). These are two separate terms that are summed (Eq. 3) to define the rate dynamics of the network. The authors could have switched the names of these terms: Δ(t) could have been considered a global "intrinsic excitability term" that varied with time and Ɛ_i(t) could have been the external input received by each neuron i in the network. In that case, the paper would have considered the consequence of "slow fluctuations of external input" rather than "slow fluctuations of intrinsic excitability", but the results would have been the same. The difference is therefore semantic. The consequence is that this paper is not necessarily about "intrinsic excitability", rather it considers how a Hebbian network responds to changes in excitatory drive, regardless of whether those drives are labeled "input" or "intrinsic excitability".

      This is a very good point. We performed further simulations to model “slope-based”, instead of “threshold-based”, changes in excitability (Fig. S5a, Methods). In this new definition of excitability, we changed the slope of the activation function, which is initially sampled from a random distribution. By introducing a varying excitability, we found results very similar to those obtained when excitability was varied as the threshold of the activation function (Fig. S5b-d). We also found similarly that the ensemble is informative about its temporal history (Fig. S5e) and that an output neuron can decode the ensemble continuously (Fig. S5f).

      Author response image 5.

      Change of excitability as a variable slope of the input-output function shows similar drifting behavior as considering a change in the threshold. The same simulation protocol as Fig. 1 was used while the excitability changes were modeled as a change in the activation function slope (Methods). a) Schema showing two different ways of defining excitability, as a threshold (top) or slope (bottom) of the activation function. Each line shows one neuron and darker lines correspond to neurons with increased excitability. b) Firing rates of the neurons across time. The red traces correspond to neurons belonging to the first assembly, namely that have a firing rate higher than the active threshold after the first stimulation. The black bars show the stimulation and the dashed line shows the active threshold. c) Recurrent weights matrices after each of the four stimuli show the drifting assembly. d) Correlation of the patterns of activity between the first day and every other day. e) Student's t-test t-value of the ordinal time decoder, for the real (blue) and shuffled (orange) data and for different amplitudes of excitability E. f) Center of mass of the distribution of the output weights (Methods) across days. d-f) Data are shown as mean ± s.e.m. for n = 10 simulations.
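
      The two definitions of excitability contrasted in panel a) can be sketched as follows. The rectified-linear form of the activation function, the baseline threshold and slope of 1.0, and the function names are assumptions made for illustration; the manuscript's activation function may differ.

```python
def activation(x, threshold=1.0, slope=1.0):
    """Rectified-linear input-output function (an assumed form)."""
    return slope * max(0.0, x - threshold)

def threshold_based(x, excitability):
    """Higher excitability lowers the firing threshold."""
    return activation(x, threshold=1.0 - excitability)

def slope_based(x, excitability):
    """Higher excitability steepens the input-output curve."""
    return activation(x, slope=1.0 + excitability)

# A drive above the baseline threshold: both definitions raise the rate.
strong_drive = 1.5
baseline_rate = activation(strong_drive)
rate_threshold = threshold_based(strong_drive, excitability=0.5)
rate_slope = slope_based(strong_drive, excitability=0.5)

# A drive below the baseline threshold: only the threshold shift recruits the neuron.
weak_drive = 0.8
weak_threshold = threshold_based(weak_drive, excitability=0.5)
weak_slope = slope_based(weak_drive, excitability=0.5)
```

      With this assumed form, a threshold shift acts additively, exactly like an extra input current, whereas a slope change acts multiplicatively and cannot be rewritten as a constant added input. That distinction is why a slope-based definition speaks to the reviewer's concern that "input" and "intrinsic excitability" were formally interchangeable.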

      2. Given how the learning rule that defines input to the readout neuron is constructed, it is trivial that this unit responds to the most active neurons in the network, more so than a neuron assigned random weights. What would happen if the network included more than one "memory"? Would it be possible to construct a readout neuron that could classify two distinct patterns? Along these lines, what if there were multiple, distinct stimuli used to drive this network, rather than the global input the authors employ here? Does the system, as constructed, have the capacity to provide two distinct patterns of activity in response to two distinct inputs?

      This is an interesting point. In order to model multiple memories, we introduced non-uniform feedforward inputs, defining different “contexts” (Methods). We adapted our model so that two contexts target two random sub-populations in the network. We also introduced a second output neuron to decode the second memory. The simulation protocol was adapted so that each of the two contexts is stimulated every day (Fig. S6a). We found that the network is able to store two ensembles that drift independently (Fig. S6 and S7a). We were also able to decode temporal information from the patterns of activity of both ensembles (Fig. S7b). Finally, both memories could be decoded independently using two output neurons (Fig. S7c and d).

      Author response image 6.

      Two distinct ensembles can be encoded and drift independently. a) and b) Firing rates of the neurons across time. The red traces in panel b) correspond to neurons belonging to the first assembly and the green traces to the second assembly on the first day. They correspond to neurons having a firing rate higher than the active threshold after the first stimulation of each assembly. The black bars show the stimulation and the dashed line shows the active threshold. c) Recurrent weights matrices after each of the eight stimuli showing the drifting of the first (top) and second (bottom) assembly.

      Author response image 7.

      The two ensembles are informative about their temporal history and can be decoded using two output neurons. a) Correlation of the patterns of activity between the first day and every other day, for the first assembly (red) and the second assembly (green). b) Student's t-test t-value of the ordinal time decoder, for the first (red, left) and second ensemble (green, right) for different amplitudes of excitability E. Shuffled data are shown in orange. c) Center of mass of the distribution of the output weights (Methods) across days for the first (W_1^out, red) and second (W_2^out, green) ensemble. a-c) Data are shown as mean ± s.e.m. for n = 10 simulations. d) Output neurons firing rate across time for the first ensemble (y_1, top) and the second ensemble (y_2, bottom). The red and green traces correspond to the real output. The dark blue, light blue and yellow traces correspond to the cases where the output weights were randomly shuffled for every time point after presentation of the first, second and third stimulus, respectively.

      Impact:

      Defining the potential role of changes in intrinsic excitability in drift is fundamental. Thus, this paper represents a potentially important contribution. Unfortunately, given the way the network employed here is constructed, it is difficult to tease apart the specific contribution of changing excitability from changing input. This limits the interpretability and applicability of the results.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weinberger et al. use different fate-mapping models, the FIRE model and PLX-diet to follow and target different macrophage populations and combine them with single-cell data to understand their contribution to heart regeneration after I/R injury. This question has already been addressed by other groups in the field using different models. However, the major strength of this manuscript is the usage of the FIRE mouse model that, for the first time, allows specific targeting of only fetal-derived macrophages. The data show that the absence of resident macrophages is not influencing infarct size but instead is altering the immune cell crosstalk in response to injury, which is in line with the current idea in the field that macrophages of different origins have distinct functions in tissues, especially after an injury. To fully support the claims of the study, specific targeting of monocyte-derived macrophages or the inhibition of their influx at different stages after injury would be of high interest. In summary, the study is well done and important for the field of cardiac injury. But it also provides a novel model (FIRE mice + RANK-Cre fate-mapping) for other tissues to study the function of fetal-derived macrophages while monocyte-derived macrophages remain intact.

      Response from the authors: We thank the reviewer for the thorough review and the positive feedback, and we agree that the Csf1r-FIRE mice represent an interesting model for studying the role of resident embryo-derived macrophages in different tissues and pathologies.

      Recent work of the Cochain lab demonstrated by combined CITE-seq analysis and CCR2 antibody treatment that monocyte depletion does not affect levels of resident tissue macrophages after myocardial infarction (Rizzo et al, PMID: 35950218), supporting the concept to specifically investigate the role of resident and recruited macrophages. While previous work has addressed the effects of broad CCR2-mediated monocyte depletion, information on differential macrophage subsets derived from blood monocytes has been lacking. We agree with the reviewer that targeting subsets of monocyte-derived macrophages, such as for example Ly6Chi monocytes, MHCII+Il1b+ macrophages, and Isg15hi populations (Rizzo et al, PMID: 35950218), or interference with their recruitment at different time-points after myocardial infarction would be of interest and could help to decipher their functions in the different stages of cardiac healing. However, these studies would go beyond the scope of the current analysis and will be addressed in a separate project.

      Reviewer #2 (Public Review):

      In this study Weinberger et al. investigated cardiac macrophage subsets after ischemia/reperfusion (I/R) injury in mice. The authors studied a ∆FIRE mouse model (deletion of a regulatory element in the Csf1r locus), in which only tissue resident macrophages might be ablated. The authors showed a reduction of resident macrophages in ∆FIRE mice and characterized its macrophage populations via scRNAseq at baseline conditions and after I/R injury. Two days after the I/R protocol, ∆FIRE mice showed an enhanced pro-inflammatory phenotype in the RNAseq data and differential effects on echocardiographic function 6 and 30 days after I/R injury. Via flow cytometry and histology the authors confirmed existing evidence of increased bone marrow-derived macrophage infiltration to the heart, specifically to the ischemic myocardium. Macrophage populations in ∆FIRE mice after I/R injury were only changed in the remote zone. Further RNAseq data on resident or recruited macrophages showed transcriptional differences between both cell types in terms of homeostasis-related genes and inflammation. Depleting all macrophages using a Csf1r inhibitor resulted in reduced cardiac function and increased fibrosis.

      Strengths

      1) The authors utilized robust methodology encompassing state of the art immunological methods, different genetic mouse models and transcriptomics.

      2) The topic of this work is important given the emerging role of tissue resident macrophages in cardiac homeostasis and disease.

      Response from the authors: We thank the reviewer for pointing out the strengths of our study, and putting the findings in context of the current view of the role of resident macrophages.

      Weaknesses:

      1) Specificity of ∆FIRE mouse model for ablating resident macrophages.

      The study builds on the assumption that only resident macrophages are ablated in ∆FIRE mice, while bone marrow-derived macrophages are unaffected. While the effects of the ∆FIRE model are nicely shown for resident macrophages, the authors did not directly assess bone marrow-derived macrophages. Moreover, in the immunohistological images in Fig. 1D nearly all macrophages appear to be absent. It would be helpful to further address the question of whether recruited macrophages are influenced in ∆FIRE mice. Evaluation of YFP positive heart and blood cells in ∆FIRE mice crossed with Flt3CreRosa26eYFP mice could clarify whether bone marrow-derived cardiac macrophages are influenced in ∆FIRE mice. This would be even more relevant in the I/R model where recruitment of bone marrow-derived macrophages is increased. A more direct assessment of recruited macrophages in ∆FIRE mice could also help to discuss potential similarities or discrepancies to the study of Bajpai et al, Circ Res 2018, which showed distinct effects of resident versus recruited macrophages after myocardial infarction. Providing the quantification of flow cytometry data (Fig. 1E-F) would be supportive.

      Response from the authors: We thank the reviewer for these comments. The reviewer addresses the specificity of the ∆FIRE mouse model for ablating resident macrophages and its potential effects on bone marrow-derived macrophages. Our single-cell sequencing data support the specificity of the ∆FIRE model regarding embryo-derived resident macrophages in two ways. First, the ∆FIRE mice are characterized by the specific reduction of embryo-derived macrophage clusters (e.g. homeostatic macrophages as well as antigen-presenting macrophages) in baseline conditions, while the abundance of recruited macrophages (e.g. Ccr2hiLy6chi macrophages, Cx3Cr1hi macrophages) is not altered (Fig. 2B-D). Second, transcriptomic analysis of bone marrow-derived macrophage clusters (e.g. Ccr2hiLy6chi macrophages, Cx3Cr1hi macrophages) and of monocytes revealed no differences in ∆FIRE compared to control mice. On the other hand, we found substantial transcriptome differences in clusters that were mainly of embryonic origins (e.g. homeostatic macrophages as well as antigen-presenting macrophages) (Fig. 2 and Fig. S4). These findings indicate that the ∆FIRE model mainly induces changes in embryo-derived macrophages.

      We agree with this reviewer that crossbreeding of ∆FIRE mice with Flt3CreRosa26eYFP mice would be of interest, and we have been working hard to establish this line. However, our breeding efforts have thus far been in vain, which is probably due to the necessity to keep a CBA/Ca background for the FIRE model (as reported by JAX: https://www.jax.org/strain/032783) and requires further backcrossing of Flt3CreRosa26eYFP mice with the respective CBA strain. In future work, we plan to carry out this experiment and also to specifically target monocyte-derived macrophages.

      The reviewer further asks about the modality to quantify cardiac macrophages, and suggests flow cytometry to quantify their number and not only use immunohistology. The quantification of cardiac immune cells shown in Fig. 1D (formerly 1C) was in fact performed by flow cytometry. We apologize for the lack of clarity. We rearranged the figure and added this information to the figure legend. We also added quantification by immunohistology, which is now shown in Fig. 1G.

      2) Limited adverse cardiac remodeling in ∆FIRE mice after I/R.

      The authors suggested an adverse cardiac remodeling in ∆FIRE mice. However, the relevance of a <5% reduction in ejection fraction/stroke volume within an overall normal range in ∆FIRE mice is questionable. Moreover, 6 days after I/R injury ∆FIRE mice were protected from the impairment in ejection fraction and had a smaller viability defect. Based on the data, a few questions arise: Why was ablation of resident macrophages beneficial at earlier time points? Are recruited macrophages affected in ∆FIRE mice (see above)? Overall, the manuscript could benefit if the claim of an adverse remodeling in ∆FIRE mice were discussed more carefully.

      Underlying mechanisms:

      The study did not functionally evaluate targets from transcriptomics to provide further mechanistic insights. It would be helpful if the authors discussed potential mechanisms of the differential effects of macrophages after ischemia in more detail.

      Response from the authors: The reviewer raises the question why the ablation of resident macrophages trends towards a beneficial effect at earlier time points after I/R injury. Further, the reviewer questions the relevance of a <5% reduction in ejection fraction/stroke volume over time in the light of an otherwise modestly reduced ejection fraction.

      In this study we used the experimental mouse model of ischemia-reperfusion injury with transient (1h) coronary artery occlusion. The potential disadvantage of this model is the smaller infarct size and smaller effects on cardiac function. However, it better represents the clinical picture and pathology of myocardial infarction in human patients with timely reperfusion by percutaneous coronary intervention. Infarct size after I/R was approx. 25% in control animals, indicating relevant cardiac injury. Further, infarct size was reduced to approx. 16% in ∆FIRE mice 6 days after infarction; however, the difference did not reach statistical significance. In line with this, the ejection fraction was numerically reduced on d6 after infarction in the control group, though without statistical significance. In the chronic phase after infarction, the ejection fraction improved over time in the control group by approx. 5% and decreased in ∆FIRE mice by 4%, which resulted in a difference (delta) of 9% in the change of ejection fraction. This indicated adverse remodeling in ∆FIRE mice.

      We agree that the different impact of the absence of resident cardiac macrophages during the course of myocardial healing after injury is of great interest to the field. We discuss potential mechanisms of the differential effects of resident macrophage ablation in lines 290-314 in the revised manuscript. However, to decipher the influence of embryo-derived macrophages at different time points after infarction, an inducible model for specific depletion of this macrophage population would be necessary, which to our knowledge does not exist.

      In the revised manuscript, we now discuss the effects on cardiac healing in ∆FIRE and also the limitations more thoroughly.

      Other:

      • It is unclear why the authors performed RNAseq experiments 2 days after I/R (Fig. 5/6), while the proposed functional phenotype occurred later.

      • A sample size of 2 animals per group appears very limited for RNAseq in ∆FIRE mice (Fig. 6).

      Response from the authors: We chose a time point in the “late early phase” of myocardial infarction (= day 2 post I/R) as we were also interested in the effect of resident macrophage depletion on other immune cell subsets (e.g. neutrophils) which could only be captured in this time period.

      We aimed to analyse 10000 cells per condition. The applied sample size allowed us to analyse 13452 CD45+ cells from ∆FIRE mice and 9152 cells from control mice in infarct condition.

      Lines 299-324 "Ablation of resident macrophages altered macrophage crosstalk to non-macrophage immune cells, especially lymphocytes and neutrophils. This was characterized by a proinflammatory gene signature, such as neutrophil expression of inflammasome-related genes and a reduction in anti-inflammatory genes like Chil3 and Lcn2. Interestingly, inflammatory polarization of neutrophils has also been associated with poor outcome after ischemic brain injury (Cuartero et al, 2013). Clinical trials in myocardial infarction patients showed a correlation of inflammatory markers with the extent of myocardial damage (Sanchez, 2006) and with short- and long-term mortality (Mueller, 2002).

      Our study provides evidence that the absence of resident macrophages negatively influences cardiac remodeling in the late postinfarction phase in ∆FIRE mice indicating their biological role in myocardial healing. In the early phase after I/R injury, absence of resident macrophages had no significant effect on infarct size or LV function. These observations potentially indicate a protective role in the chronic phase after myocardial infarction by modulating the inflammatory response, including adjacent immune cells like neutrophils or lymphocytes.

      Deciphering in detail the specific functions of resident macrophages is of considerable interest but requires both cell-specific and temporally-controlled depletion of respective immune cells in injury, which to our knowledge is not available at present. These experiments could be important to tailor immune-targeted treatments of myocardial inflammation and postinfarct remodelling."

      Reviewer #1 (Recommendations For The Authors):

      1) Fetal-derived macrophages are often involved in organ development and function during steady-state. The authors should show heart morphology/function before I/R injury to make sure that the cause for a worsened outcome in FIRE mice is not due to a developmental/functional defect.

      Response from the author: We conducted a gross analysis of cardiac morphology by histology and did not detect differences compared to littermate controls. However, we have not conducted a detailed investigation of cardiac development, since this was not within the scope of this study. Further, our study mainly shows differences in cardiac healing between d6 and d30, which are unlikely to be influenced by developmental defects.

      2) Line 164: The authors state that they have analysed macrophages via flow cytometry, but Figure 4a only shows IF. Quantification of different macrophage subsets via flow cytometry should be included in this model.

      Response from the author: The sentence “To gain a deeper understanding of the inflammatory processes taking place in the infarcted heart, we quantified macrophage distribution by immunofluorescence and flow cytometry analysis of ischemic and remote areas after I/R.” beginning line 164 describes the entire figure 4 and not only 4a. Here we show IF as well as flow cytometry to describe numbers but also different subpopulations of macrophages (BM-derived vs. resident).

      3) Lines 254-255 (now starting 267): it is not entirely true that the heart does not harbor BM-derived macrophages under steady state. Of course, there are many more after I/R injury, but the authors should take also their own data into account (Figure 1c, e showing a clear reduction but not complete absence of macrophages) and not claim a "scarce" population. See also Dick et al (PMID: 30538339), where both, the Ccr2-Tim4- and Ccr2+ populations are (slowly) replaced by BM monocytes.

      Response from the author: We thank the reviewer for this comment. We changed “scarce population” to “small population”.

      4) Lines 269-273 (now starting line 283): The point that DT-mediated depletion of cells causes inflammation that may have an impact on macrophages is compelling. However, the approach of combining and correlating data from PLX diet and FIRE mice is not proof that the significant increase in infarct size and deterioration of left ventricular function after I/R injury is driven by monocyte-derived macrophages. The authors could use Ccr2KO mice or injection of Ly6C antibody to show the specific functions of recruited macrophages.

      Response from the author: In this study we combine a specific genetic depletion of resident macrophages (FIRE) with a pharmacological depletion of all macrophage populations (Csf1r-inhibition with PLX5622). We did not aim to specifically deplete monocyte-derived macrophages, which has been addressed previously by Bajpai et al. (PMID: 30582448) using the CCR2-DTR mouse line. Addressing the functions of recruited macrophages would go beyond the scope of the manuscript.

      Along these lines: the authors discuss that neutrophils may have been targeted in the Ccr2-DTR model. However, the egress of neutrophils in the CCR2 KO model is not affected and should be a good model to look at the impact of monocyte-derived macrophages after I/R injury in the heart.

      Response from the author: We agree with the reviewer that CCR2 under steady state conditions might not be important for the egress of neutrophils. However, after ischemic injury CCR2-inhibition has been shown to impair neutrophil egress as well as neutrophil recruitment to ischemic tissue in an ischemia-reperfusion injury model (PMID: 28670376).

      5) Line 299 (now line 332): Reference is missing for Ccr2-DTR mice study

      Response from the author: We added the respective reference.

      6) Can the authors also take the timing of treatment/cell depletion into account in their discussion? Incoming monocytes may be required in the first days after injury to promote the regeneration process, so that targeting them before the onset of the injury may be detrimental while targeting them during the chronic phase may be beneficial.

      Response from the author: We thank the reviewer for this comment. We added the following sentence to the manuscript (Lines 343-346):

      “An explanation of this controversy might be the timing and duration of macrophage depletion. Bajpai et al. depleted recruited macrophages only in the initial phase of myocardial infarction which improved cardiac healing (Bajpai et al., 2019), while depletion of macrophages over a longer period of time, as shown in our study, is detrimental for cardiac repair.”

      7) Figure 6E, F: Why are the outgoing signals pooled? The data has the strength of distinguishing between distinct populations. This data should be used and exploited to work out distinct pathways of distinct macrophage populations in more detail. From the representation, it remains unclear which pathways are active and distinct between Ctrl and FIRE mice besides the few chosen ones (inflammasome). Also, legends are missing (what is red/blue?)

      Response from the author: We thank the reviewer for this comment. The aim of this analysis was to evaluate the effect of the FIRE ko on communication of immune cells in infarct conditions. To address changes in all populations which are affected by the FIRE ko we pooled the respective clusters (e.g. homeostatic, antigen-presenting and Ccr2loLy6clo Mø clusters). We provided the detailed analysis of the individual clusters in the new Supplemental Figure 9. Further, we added the respective legend to the Figure.

      8) The methods part mentioned CD169-DTR mice, however, there are no experiments shown in the manuscript. Further, how did the authors breed the FIRE mice? It is known in the field that they have big developmental issues and behavioural deficits if kept on a B6 background, which was likely the case in the study, at least for the fate-mapping approach.

      Response from the author: We removed the CD169-DTR reference from the methods part. FIRE mice were kept on a CBA/Ca background. As mentioned by the reviewer, this was not the case for the experiment where reporter mice were bred with FIRE mice (Csf1rΔFIRE/+RankCreRosa26eYFP), as these mice are on a C57Bl6 background. All experiments evaluating cardiac function and outcome after infarction in FIRE mice were performed on mice kept on a CBA/Ca background.

      Reviewer #2 (Recommendations For The Authors):

      • Please provide the sample size for Fig. 5.

      We described the sample size in the methods part (lines 448-450: “Cell sorting was performed on a MoFlo Astrios (Beckman Coulter) to obtain cardiac macrophages from CD45.2; Mx1CreMybflox/flox after BM-transplantation of CD45.1 BM (n=3 for 2 days after I/R injury) for bulk sequencing,..”). We also added the sample size to the figure legend.

      • Please state in the methods how the normality of data was tested.

      We added the respective normality test to the methods part: “The Shapiro-Wilk test was used to test normality.”

      • How did the authors ensure a standardized infarct size?

      The authors ensured a standardized infarct size in mice following myocardial infarction through a carefully controlled experimental protocol. We employed the well-established I/R procedure for inducing myocardial infarction in mice by ligation of the LAD for 1h to mimic the transient blockage of blood flow to the anterior wall of the heart. Success of the ligation of the LAD and the induction of ischemia was confirmed by the pale color of the myocardium after ligation and the success of reperfusion by the return of color after removing the suture. The surgical technique was consistently performed by the same highly trained veterinarian in a blinded fashion to minimize variability.

    1. Author Response

      The following is the authors’ response to the original reviews.

      To the reviewers.

      We appreciate the detailed and deep review of our manuscript. Below are our comments and responses. Many of the requested data are present in the Supplementary figures of the manuscript. There seem to be two main concerns: one regarding the evidence of TLR2 expression in HFSCs, and a second regarding CEP/TLR2. As detailed below, we utilized 3 different methods to document TLR2 expression: TLR2-reporter mouse, staining for TLR2 and qPCR of isolated cells for TLR2. The source (the data are in Supplementary Fig. 5A, B and in references below) and nature of CEP (it is not a protein, but a metabolic product of oxidation of the polyunsaturated fatty acid DHA by MPO amongst other ROS sources) are also explained below.

      1) “The expression analysis of TLR2 is questionable. Many of the conclusions about the level of target genes are based on quantifying fluorescence intensity in microscopy images (e.g., TLR2 level in young or aged mice, BMP7 levels in mice with/without TLR2 KO). This could be strengthened by using qPCR to measure gene expression levels in FACS-sorted HFSCs, which would provide more accurate quantification. Additionally, the authors should test if the TLR2 antibody used is valid.”

      In most instances we used a TLR2 reporter mouse, which presents an advantage over immunostaining. Fig.2 (A-H) shows expression of the TLR2 reporter, not staining with TLR2 abs. For selected experiments we utilized immunostaining with an anti-TLR2 antibody (Santa Cruz Biotechnology, sc-21759), which was validated in our previous publication (see Michael G. McCoy et al. Endothelial TLR2 promotes proangiogenic immune cell recruitment and tumor angiogenesis. // Sci Signal. 2021 Jan 19; 14(666): eabc5371; doi: 10.1126/scisignal.abc5371). In Fig.S2E of that manuscript we validated these abs using a knockout of TLR2. In the current paper, we further validate the anti-TLR2 abs by showing their co-localization with the TLR2-GFP reporter (Fig. S1A).

      We then confirmed reporter and immunostaining data by qPCR showing Tlr2 expression in FACS-purified mouse HFSCs in anagen, telogen, and catagen (Fig.2J), in mouse epidermal cells and FACS-purified HFSCs (Fig.2K), and FACS-purified HFSCs isolated from Control and TLR2HFSC-KO mice (Fig.4E).

      The mechanistic link between TLR2 and BMP signaling was identified using RNAseq on FACS-purified HFSCs (Supplementary Fig.4), then verified using qPCR (Fig.4E shows Bmp7, Bmp2, Bmpr1a), and only then was immunohistochemistry staining for BMP7 and phosphoSMAD1/5/9 used (Fig.4A-D, F-H). Note that a large body of the requested evidence is presented in the Supplementary data. Other mechanistic links shown using qPCR include Nfkb2, Il1b, Il6, and Bmp7 in FACS-purified mouse HFSCs treated with BSA control or CEP (Fig.6Q,6R).

      “As the reviewers note, it is not clear whether the TLR2+ signal is located at the basal side of bulge stem cells, basement membrane underlying bulge stem cells, or dermal sheath cells encapsulating bulge structure. Co-staining with basement membrane markers such as collagen and laminin or HFSC basal side membrane markers such as Itga6, Itgb1, and Itgb4 will clarify this. In addition, showing the expression pattern of TLR2 in full skin including epidermis and dermis would be helpful. As TLR2 is highly expressed in immune cells or blood endothelial cells, if the antibody staining is valid, strong positive signals should present in the cells. Moreover, testing the TLR2 antibody in Tlr2 knock-out mouse tissues would be an appropriate control experiment.”

      Once again, in most instances we used not staining for TLR2 but the TLR2 reporter mouse (Fig.2 legend). The anti-TLR2 abs have been verified in TLR2 KO as described above. Fig.2K shows a comparison of Tlr2 mRNA expression in mouse epidermal cells to FACS-purified HFSCs by qPCR.

      TLR2 signal is detected in several cell types within the hair follicle as well as in dermal cells surrounding the hair follicles, such as lymphocytes, resident tissue macrophages, fibroblasts, and fibroblast precursors, etc. (https://www.proteinatlas.org/ENSG00000137462-TLR2/single+cell+type). In Author response image 1 below, white arrows point to the TLR2-positive cells around the hair follicle. In our paper, we focus on HFSC TLR2 and use the respective inducible tissue-specific TLR2 KO. The contribution of TLR2 on other cell types can be assessed by comparing the phenotypes of global TLR2 KO, TLR2 KO-WT bone marrow chimeras and HFSC-specific TLR2 KO. The results are presented in both main and supplementary figures (Fig.5D-I and SFig.5I-K show global TLR2 KO; Fig.6H-I and SFig.5G-h show bone marrow chimeras; and Figs.3,4, 5 (J-M), Fig.5 (J-N) show the main focus, HFSC-TLR2 KO). Overall, the phenotype (delay of hair regeneration after wounding) seems to be the strongest in TLR2 KO, whereas the bone marrow chimera and HFSC phenotypes are comparable. Thus, TLR2 on bone marrow-derived cells complements the main role for TLR2 on HFSCs.

      Author response image 1.

      Staining for TLR2 (white), DAPI (blue) and Keratin 17 (purple) is shown.

      “The increase in expression of TLR2 during the hair follicle stem cell activation should be documented by FACS and/or qPCR. This is important because as noted by one of the reviewers.”

      While the original observation was made using both a TLR2 reporter mouse and immunostaining, the data were confirmed by qPCR showing Tlr2 mRNA expression in FACS-purified mouse HFSCs in anagen, telogen, and catagen (Fig.2J).

      “In Fig 1D, the authors mentioned that they re-analyzed published RNA-seq data (Greco et al., 2009) to show the increase of Tlr2 and Tlr6 expression in late telogen compared to early telogen. However, there is no RNA-seq data in that paper, but only microarray data of bulge vs HG comparison and dermal papillae cells (DP) in early, mid, late Telo. If the authors used DP data to show the increase of Tlr2 transcripts in late Telo, the analysis is completely wrong and has to be corrected. The problem is compounded by the fact that in other published HFSC RNA-seq datasets (Yang et al., Cell, 2017, Adam et al., Nature Cell Biology, 2020), the expression levels of Tlr2 and Tlr6 are very low (below 5 TPM). In Fig 1G, the authors also re-analyzed Morinaga et al., 2021 data to show the reduction of Tlr2 expression in HFSCs in high-fat diet mice. However, in the raw data of Morinaga et al., 2021 (GSE169173), Tlr2 expression FPKM values are below 1 in both normal diet and high-fat diet samples, which are too low to perform comparative analysis and are not statistically meaningful. Like Tlr2, the expressions of Tlr1 and Tlr6, which form heterodimer with TLR2, are almost 0. Thus, the authors should revisit the dataset and revise their analysis and conclusion.”

      “To document the existence of Tlr2 and Tlr6 expression in HFSCs, the authors should perform RNA-seq-based gene expression analysis by themselves. Otherwise, the authors' TLR2 expression analyses in Fig 1 are not convincing. These are serious issues that the authors will want to rectify so that eLife readers will not discount their findings and importance.”

      It is correct: we analyzed published microarray data, not RNAseq data (Greco et al., 2009), using the GEO2R tool, which allowed us to compare mRNA expression levels between early, middle, and late telogen in bulge CD34-positive cells. We changed “RNA-seq” (the term was used incorrectly) to “RNA microarray” in the main text.

      In our manuscript, TLR2 expression is documented not only in Fig.1, but also in Fig.2 and S.Fig.1. We utilized 3 different methods to document TLR2 expression: TLR2-reporter mouse, staining for TLR2 and qPCR of isolated cells for TLR2. Fig.2K shows comparison of Tlr2 mRNA expression in mouse epidermal cells to FACS-purified HFSCs by qPCR to document increased TLR2 expression on HFSCs. Likewise, Fig.2J shows qPCR for TLR2 on HFSC during various phases of hair growth.

      “In Fig 2, to support the expression of Tlr2 in HFSCs, the authors utilized TLR2-GFP mice and showed the strong GFP expression in HFSCs, hair bulb, and ORS. However, as the expression data in Fig 1 are questionable, the GFP reporter data should be carefully analyzed with proper control experiments. For example, although TLRs are highly expressed in immune cells and endothelial cells, which are abundantly present in skin, Fig 2 data did not show the GFP expression in these cells. Instead, the GFP signals looked very specific to epithelial compartments, which is odd. Again, to convince readers, the authors should provide more comprehensive analyses of expression patterns of TLR2-GFP mice in skin. Also, if the TLR2-GFP signals faithfully reflect the actual expression of Tlr2 mRNA, the GFP signals should increase in late telogen compared to early telogen. The authors should check whether TLR2-GFP expression follows this pattern.”

      The specificity of the TLR2 reporter was characterized in Price et al., 2018. A Map of Toll-like Receptor Expression in the Intestinal Epithelium Reveals Distinct Spatial, Cell Type-Specific, and Temporal Patterns. Immunity, 49. Thus, the TLR2 reporter mouse is well characterized (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6152941/) and represents one of the best available tools to show TLR2 expression.

      Expression of TLR2 on endothelial cells and validation of the anti-TLR2 abs were performed in McCoy et al, Science Signaling, as mentioned above. Also, as discussed above, we show a strong correlation between TLR2-GFP reporter expression and TLR2 expression using coimmunostaining with GFP and TLR2 antibodies, with appropriate isotype-matched non-immune antibodies as negative controls.

      There is no doubt that TLR2 is expressed on immune, endothelial and epithelial cells. According to the Human Protein Atlas, TLR2 expression is identified in skin fibroblasts, keratinocytes, melanocytes, etc., so our findings are well supported by the literature (https://www.proteinatlas.org/ENSG00000137462-TLR2/single+cell+type). Indeed, we detected TLR2 in cells surrounding the hair follicle (see the pictures above). TLR2 signal was detected in nearly all niches of hair follicles including the CD34-positive cells.

      In Fig.S1 we demonstrated an increased level of TLR2 in the late (competent) telogen compared to the early (refractory) telogen using immunostaining for TLR2-GFP. The results mirrored published RNA-array data in Fig.1D. Again, reporter and immunostaining results have been validated by qPCR for TLR2.

      The levels of TLR2 might be heavily influenced by the environment, i.e., pathogen availability. In this regard, note that mice for this study were kept in normal, not pathogen-free, conditions.

      “Overall, the existence of Tlr2 expression in HFSCs is still questionable. Without resolving these, genetic deletion of Tlr2 in HFSCs cannot be rationalized.”

      In our manuscript, TLR2 expression is documented not only in Fig.1, but also in Fig.2 and S.Fig.1. We utilized 3 different methods to document TLR2 expression: TLR2-reporter mouse, staining for TLR2 and qPCR of isolated cells for TLR2. Besides these data, we show the functional responses to canonical TLR2 ligand, PAM3CSK4, and previously characterized endogenous ligand, CEP, using proliferation, western blotting and many other approaches. In numerous immunostainings we show co-localization of TLR2 and CD34 (Fig.2) using IMARIS surface rendering and colocalization tools. Our conclusions are further supported by published results as discussed above.

      2) “The central conclusion of this study is that the activation of TLR2 can suppress BMP signaling; however, the molecular link between TLR2 and BMP signaling is still missing. Given the importance of this finding, it would be intriguing to further investigate how TLR2 activation suppresses BMP signaling. A better characterization of the molecular-level interaction between TLR2 and BMP signaling can further enhance the impact of this study.

      -The published dataset should be re-analyzed, as some images and their quantification do not appear to be matched. Representative images should be used.” “In Fig 4, the authors propose that the activation of TLR2 pathway inhibits the BMP signaling pathway, which makes HFSCs quiescent. In TLR2-HFSC-KO, the authors showed that BMP7 is increased and pSMAD1/5/9 is sustained. The increase in BMP7 expression and SMAD activation should be demonstrated by additional assays. Are SMAD target genes activated in the cKO mice?”

      This mechanistic link between TLR2 and BMP was originally identified by RNAseq, confirmed by qPCR and then by immunostaining for both BMP7 and BMP pathway activation based on phosphoSMAD1/5/9 levels. The connection to the BMP pathway was also shown by western blotting (S.Fig.4B,C). The rescue experiments were performed using Noggin injections. According to our data, numerous SMAD target genes are upregulated in TLR2-HFSC-KO, such as Kank2, Ptk2b, Scarf2, Camk1, Dpysl2, as well as BMP2 and BMP7, and these changes were confirmed by qPCR analysis in Fig.4E. Additional evidence is shown in Fig.6, which demonstrates that the endogenous TLR2 ligand CEP (carboxyethylpyrrole) acts by a similar, BMP-dependent pathway. Also, Supplemental Fig.4 adds more details to this link. SFig.4B,C shows that TLR2 activation by the canonical ligand PAM3CSK4 inhibits pSMAD levels induced by BMP (western blot is shown). At the same time, as anticipated, PAM3CSK4 upregulated NFkB; however, little or no effect of BMP stimulation on NFkB is observed. To summarize: TLR2 affects both BMP7 production and BMP-induced downstream signaling, as judged by phosphoSMADs. The latter connection appears to go in one direction: TLR2 signaling affects BMP-induced pSMADs; however, BMP signaling does not seem to substantially change TLR2-dependent NFkB. We plan to delve into the intersection of these important pathways in the future.

      “Functionally, downregulation of BMP signaling by injecting Noggin, a BMP antagonist, in TLR2HFSC-KO mice induces HFSC proliferation. These functional data are solid. However, it is still curious how TLR2 signaling interacts with the BMP pathway molecularly. Is it transcriptional regulation or translational regulation? Perhaps, RNA-seq analysis of TLR2HFSC-KO could give some hints to answer this question. Furthermore, checking out other signaling pathways such as WNT/LEF1 and pCREB, which are important for hair cycle activation, and NFkB, a downstream effector of TLR signaling, would be helpful to interrogate mechanistic insights.”

      As discussed above, TLR2 affects both BMP7 production and BMP-induced downstream signaling, as judged by phosphoSMADs. The latter connection appears to go in one direction: TLR2 signaling affects BMP-induced pSMADs; however, BMP signaling does not seem to substantially change TLR2-dependent NFkB.

      Indeed, in addition to BMP signaling, Wnt signaling and β-catenin stabilization within HFSCs are known to trigger their activation (Deschene et al., 2014). However, this axis remained unchanged upon TLR2HFSC-KO (as shown in Supplementary Fig. 4J). There are several published reports on the crosstalk between TLR and BMP signaling, such as (doi: 10.1089/scd.2013.0345. Epub 2013 Nov 7), showing that activation of TLR4 inhibits BMP-induced pSMAD1/5/8 and that this connection requires NFkB. We probed NFkB activation; please see the responses above.

      However, we were not able to detect a substantial effect of NFkB inhibition on BMP signaling in hair follicles (not shown).

      3) “The function of CEP, a proposed endogenous ligand of TLR2, is still not clear. The authors imply that the decreased CEP level in aged mice could lead to deficient TLR2 signaling, which could further cause aging-associated hair regeneration defects. But this has not been demonstrated. What are the BMPs and pSmad1/5 levels in aged skin? Another important experiment to confirm the importance of this link during aging would be to inject CEP into the aged skin and examine whether this could restore hair regeneration in aged mice. Does CEP activate hair cycling during the endogenous pathway? What might be the source of CEP? Does CEP treatment activate BMP7 signaling? The authors should clarify these issues. The authors suggested that CEP is an endogenous ligand of TLR2, and administration of CEP induces hair cycle entry in a TLR2-dependent manner. How potent is CEP in terms of HFSC activation? In Fig 6Q, CEP increases the expression of Nfkb2, Il1b, and Il6, but the fold changes are marginal. Also, if CEP is a critical ligand, the loss of CEP by a genetic deletion or a pharmacological inhibition should result in the delay of hair cycle entry. Furthermore, the source of CEP expression is curious. Is it expressed by HFSCs or dermal fibroblast or immune cells? Finally, comparing the effect of CEP to the effect of other bacterial origin Tlr2 ligands such as heat killed bacteria, purified microbial cell-wall components, and synthetic agonists (Pam3CSK4) would be helpful. It is curious if HFSC directly senses the bacterial materials and triggers hair follicle regeneration or are indirectly directed by immune cells and endothelial cells, which could be primary sensor.”

      CEP is not a protein; it is an oxidative stress-generated metabolite of the polyunsaturated fatty acid DHA (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5360178/); thus, it is impossible to generate a knockout of this molecule. As demonstrated in previous publications (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2990914/, https://pubmed.ncbi.nlm.nih.gov/34871763/), CEP serves as a critical endogenous ligand supporting TLR2 signaling in the absence of pathogens. While other TLR2 endogenous ligands, such as HMGBs or HSPs, exist (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4373479/), CEP binds to TLR2 directly, and its generation is aided by MPO (myeloperoxidase) amongst other peroxidases and sources of reactive oxygen/nitrogen species. MPO (produced by immune cells amongst others) serves as an innate immunity response against pathogens, but it also generates CEP adducts (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034644/) in both protein and lipid form. The knockout of MPO diminishes CEP generation in skin (PMC6034644), thereby demonstrating the causative relationship between CEP and MPO.

      Author response image 2.

      Additional immunostaining of mouse skin for Keratin 17 (purple), CEP (green) and MPO (red). Similar staining is in S.Fig.5A and quantification is in S.Fig.5B.

      Also, the above-mentioned manuscripts show that CEP effects are milder than, but overall comparable with, those of the canonical TLR2 agonist PAM3CSK4. As we mention in the present manuscript, the tissues of normal young mice are devoid of CEP (which is generated in response to inflammation), with the exception of hair follicles. This is likely attributable to the secretion of MPO by hair follicles (PMID: 36402231), especially in conditions of inflammation (PMID: 32893875). Supplementary Fig.5A,B show that MPO is present at a high level in the sebaceous gland (as a part of an anti-microbial mechanism). Again, MPO is a secreted enzyme and it is likely to be a source of continuous DHA oxidation into CEP in hair follicles. We also document that both TLR2 and CEP levels in hair follicles (but not in other tissues, an important point for CEP) are reduced in aging. Likewise, SFig.5A,B shows that MPO secretion in the hair follicle is reduced by more than 60% in aging mice. Thus, it is likely that reduced MPO levels in the aging hair follicle produce less CEP. Together with reduced TLR2 levels, the lack of CEP might contribute to hair loss in aging.

      We show that, similar to TLR2, CEP in hair follicles operates via a BMP-7 dependent pathway (see Fig.6). We also provide results using the canonical bacterial TLR2 ligand PAM3CSK4, whose effect on HFSC proliferation is similar to that of CEP and is TLR2-dependent. TLR2 blocking approaches were used (Supp. Fig.4B, C, D, E, Supp. Fig.5D-5F). It remains to be seen whether CEP is required for normal hair cycling and whether its administration might improve hair loss in aging subjects.

      “The impacts of CEP/TLR2 on proliferation of keratinocytes is still weak. How much of this effect is a result of NFkB activation, and how much is simply due to inhibiting BMP signaling?”

      The impact of TLR2 on proliferation was demonstrated using a variety of mouse models, from global TLR2 KO to bone marrow chimeras to HFSC-specific TLR2 KO, again using multiple approaches. The same applies to the effects of CEP as well as of the canonical TLR2 ligand PAM3CSK4, which were demonstrated both in vivo and in culture to be TLR2-dependent (Fig.6M-O and Supplementary Fig.4E-D). As for the NFkB connection, see our responses above. It seems that the connection between TLR2 and the BMP pathway occurs independently of NFkB activation.

      4) “The links between TLR2 pathway and aging and obesity are only correlative. Although the authors suggest that the reduction of TLR2 expression in aging and obesity may diminish hair growth (Fig 1), there is no direct functional evidence that supports this possibility. If the authors wish to make this claim, they should test the roles of TLR2 and CEP in aging and obesity conditions.”

      We show that both TLR2 and CEP are reduced in aging and that this pathway contributes to hair cycling and regeneration upon wounding; we do not wish to claim more.

      5) More minor points:

      “Fig.4: The Noggin treatment in TLR2 KO mice is an important experiment. However, it is unclear why Noggin only enhances proliferation (Ki67 level) in HG but not in the bulge. This discrepancy should be addressed.”

      As we showed in Fig. 3B-3F, TLR2 HFSC-KO mice have a prolonged first telogen. Noggin treatment at the first postnatal telogen promotes the telogen-to-anagen transition in TLR2HFSC-KO mice, which is characterized by the activation of HG cells prior to the bulge cells. According to the literature, the bulge cells remain silent during late telogen, whereas HG cells become Ki67-positive, and the proliferation of HG cells contributes to the telogen-to-anagen transition.

      (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2668200/

      https://www.sciencedirect.com/science/article/pii/S0022202X15404518?via%3Dihub

      https://journals.biologists.com/jcs/article/114/19/3419/34892/Hair-follicle-predetermination).

      “Fig.5: Does TLR2 cKO slow down wound healing, in addition to affecting pigmentation and the number of hair follicles?”

      In our previous publication, we demonstrated that deletion of TLR2 in HFSCs does not affect the wound healing process. Instead, endothelial TLR2 promotes wound vascularization and healing.

      (see Xiong et al. Timely Wound Healing Is Dependent on Endothelial but Not on Hair Follicle Stem Cell Toll-Like Receptor 2 Signaling. // Journal of Investigative Dermatology, Volume 142, Issue 11, November 2022, Pages 3082-3092.e1).

      “There is no panel B in Fig.4. There is no image in Fig 4D. Please correct this properly.”

      We corrected Fig.4

      “Discussion: The constant production of CEP in homeostatic skin and in the absence of inflammation should be further discussed. Additionally, the possible causes of reducing CEP levels during aging should also be further discussed.”

      We explained the sources of CEP generation, such as MPO as one of the key enzymes, above. The data on MPO levels in hair follicles of young and old mice are presented in Supplementary Fig.5A,B. Since we previously showed that MPO produces CEP from DHA (PMC6034644), the reduction in MPO in aging is likely to contribute to reduced CEP levels.

    1. Author Response

      We are grateful to the three reviewers and the editors who have provided comments about our manuscript, “Formation of malignant, metastatic small cell lung cancers through overproduction of cMYC protein in TP53 and RB1 depleted pulmonary neuroendocrine cells derived from human embryonic stem cells.”

      We are pleased that the reviewers recognized the importance of the problem we have addressed – namely, the need for better models of small cell lung cancer, a relatively common and refractory cancer. We also appreciate their acknowledgement of the significance of our major finding: that addition of an efficiently expressed CMYC transgene to neuroendocrine cells derived from human embryonic stem cells in which the RB1 and TP53 genes have been suppressed serves to drive aggressive growth and metastatic spread, rendering this system an appealing one for future studies of this recalcitrant cancer. Further, we acknowledge that more work needs to be done to more fully characterize and better understand the mechanistic features of this model system and to exploit it for therapeutic purposes.

      More specifically, we agree with the reviewers that this manuscript would be stronger if it included: (i) tests of other oncogenes, especially other members of the MYC gene family, to serve as drivers of tumor growth and metastasis and tests of orthotopic implantation of cells into the lung; (ii) descriptions of how such tumors with various genotypes respond to therapeutic approaches, both established and novel; and (iii) a more complete assessment of the contribution of abundant MYC proteins to physiological changes in tumor cells, such as growth, apoptosis, and invasion.

      While we wish we could provide such information, it is unrealistic to believe that it will be generated by the current constellation of authors in the foreseeable future. Data in the present manuscript has been generated over nearly five years, mostly in the early phases of that interval. Since then, some of us have moved from one institution to another, and some have shifted the focus of our studies. Further delays in publishing the main messages in this paper will only delay the pursuit of further studies, most likely by others. Indeed, one of the strongest justifications for the novel publication policies at eLife is to return control of the time for dissemination of results to the hands of the authors. Our situation illustrates the wisdom of that approach.

      We also note that the reviewers have raised a few issues that we aim to clarify by revisions of the current manuscript, thereby creating an improved Version of Record, within the next few weeks. We acknowledge here the significance of those issues and the ambiguities noted by the reviewers.

      The issues include the following point noted by more than one reviewer: our claim that expression of the CMYC oncogene increases the neuroendocrine character of the tumors. We recognize that this observation may be influenced by the nature of the analysis (single cell or bulk RNA sequencing), the choice of lineage markers (eg, NEUROD1 or ASCL1 or others), and the statistical evaluation of the data. We will review these aspects of the problem and make appropriate changes in the text to be submitted as the Version of Record.

      Reviewer 1 also makes a good point about the possible effects of CMYC on the differentiation of hESC-derived lung progenitors (LPs). In this paper, we examine this issue only in LPs in which the tumor suppressor genes, RB1 and TP53, have been suppressed. Further studies of the effect of CMYC on differentiation of LPs with various combinations of functional tumor suppressor genes might well prove valuable in exploring the origins of SCLC.

      Finally, we wish to note that a topic discussed by Reviewer 1 (and by us) about the still poorly understood relationship between cancer genotypes and cell lineages has been partially addressed in a paper from our group that has been accepted for publication in Science.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      1) A single biomarker seems very unlikely to be of much help in the detection of glaucoma due to the etiological heterogeneity of the disease, the existence of different subtypes, and the genetic variability among patients. Rather, a panel of biomarkers may provide more useful information for clinical prediction, including better sensitivity and specificity. The inclusion of additional metabolites already identified in the study, in combination, may provide more reliable and correct assignment results.

      The authors’ answer: Thank you for your comment. We recognize the constraints of using single biomarkers for diagnosis. In upcoming research, we aim to incorporate multiple biomarkers to improve diagnostic accuracy and will consider adding more metabolites as suggested.

      2) The number of samples in the supplementary phase is low, larger sample sizes are mandatory to confirm the diagnostic accuracy.

      The authors’ answer: Thank you for your comment. Collecting aqueous humor is invasive, making samples scarce. We acknowledge the small sample size limitation. In future studies, we plan to use larger samples to verify the biomarker's diagnostic accuracy. Your feedback emphasizes the need for thorough validation in our next study.

      3) Cohorts from different populations are needed to verify the applicability of this candidate biomarker.

      The authors’ answer: Thank you for the suggestion. We agree on the need to test the biomarker's relevance across varied populations. Reports from other groups will help confirm and broaden our results.

      4) Sex hormones seem to be associated also with other types of glaucoma, such as primary open-angle glaucoma (POAG), although the molecular mechanisms are unclear (see doi:10.1167/iovs.17-22708). The inclusion of patients diagnosed with other subtypes of glaucoma, like POAG, may contribute to determining the sensitivity and specificity of the proposed biomarker. Androstenedione levels should be determined in POAG, NTG, or PEXG patients.

      The authors’ answer: I agree with your comment and thank you for your suggestion. PACG is a major cause of irreversible blindness in Asians. While this study centers on PACG, the link between sex hormones and other glaucoma subtypes, like POAG, merits investigation. Future studies will include POAG and other subtypes to further assess androstenedione's diagnostic relevance.

      5) In addition, the levels of androstenedione were found significantly altered during other diseases as described by the authors or by conditions like polycystic ovary syndrome, limiting the utility of the proposed biomarker.

      The authors’ answer: Thank you for your advice. Androstenedione levels also change in conditions like polycystic ovary syndrome, which could affect the biomarker's specificity. We plan to further study androstenedione's unique changes in glaucoma versus other conditions to clarify its diagnostic value.

      6) Uncertainty of the androstenedione levels compromises its usefulness in clinical practice.

      The authors’ answer: The uncertainty surrounding androstenedione levels and its impact on clinical applicability is a valid concern. We plan to delve deeper into understanding the variability and determinants of androstenedione levels to better assess its clinical relevance.

      Reviewer #2 (Public Review):

      The "predict" part is on much less solid ground. The visual field progression and association with serum androstenedione within the current experimental design alludes to a correlation. It truly cannot be stated as predictive. To predict, one needs to put the substance where nothing is there and demonstrate that the desired endpoint is reached. Conversely, the substance (androstenedione) can be removed to show that the condition regresses. None of these are possible without model system experiments, which have not been done. The authors could put some additional details in the methods, such as: 1) how much sample was collected, 2) whether equal serum volume for analysis had equal serum proteins (or cells). They have used an LC-MS/MS and a Chemiluminescence method, but another independent method such as GC-MS/MS or NMR to detect androstenedione for a subset of patients with different stages of visual field defect would be desirable.

      The authors’ answer: We acknowledge your constructive critique concerning our use of the term "predict". In the present study, we elucidated a discernible correlation between visual field progression and serum androstenedione concentrations. We are cognizant of the critical distinction between correlation and causation, and we concur that our application of the term “predict” may have been overly assertive in this context.

      Your emphasis on the imperative of employing model system experiments to unequivocally ascertain causative relationships is well-received. The experimental approach of modulating the substance, androstenedione in this case, to empirically observe its consequential impact on the condition, is a pivotal direction that warrants exploration in subsequent research endeavors. With regard to the variability of serum protein concentrations across participants, we adopted a methodological standardization by ensuring that the analyzed serum volume remained consistent across samples. This was implemented to enhance the reliability and generalizability of our findings.

      Your recommendation to consider alternative detection methodologies, specifically GC-MS/MS or NMR, is duly noted. Although our choice of LC-MS/MS and Chemiluminescence was predicated on available resources, we recognize the scientific merit in leveraging multiple analytical techniques. In future investigations, we endeavor to incorporate a broader spectrum of detection methodologies for androstenedione, particularly when assessing patients with varied visual field defect stages, thereby bolstering the robustness and validity of our conclusions.

      Reviewer #1 (Recommendations for The Authors):

      1) POAG is the leading cause of irreversible blindness worldwide (see reference #4). The prevalence of PACG is highest in Asia, but the major form of glaucoma is still POAG. The authors should modify the abstract and background sections accordingly (see line 30 and lines 61-62).

      The authors’ answer: Thank you for your suggestion, and we apologize for this mistake. The sentence “Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness worldwide” has been changed to “Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness in Asia”. (Page 2, line 33; Page 3, lines 62-64)

      2) Line 69, please change the sentence "the He et al. taught us..." to the following "the He et al. study taught us.".

      The authors’ answer: Thank you for your comment. The sentence "the He et al. taught us..." has been changed to "the He et al. study taught us.". (Page 3, lines 72)

      3) I suggest including the name of the identified candidate biomarker in the title of the manuscript. The title must be straightforward.

      The authors’ answer: We agree with your comment and thank you for your suggestion. The sentence “Metabolomics Identifies and Validates Serum Novel Biomarker for Diagnosing Primary Angle Closure Glaucoma and Predicting the Visual Field Progression” has been changed to “Metabolomics Identifies and Validates Serum Androstenedione as Novel Biomarker for Diagnosing Primary Angle Closure Glaucoma and Predicting the Visual Field Progression”. (Page 1, lines 1)

      4) Line 88, please change "normal subjects" to "control individuals".

      The authors’ answer: Thank you for your comment. We have changed "normal subjects" to "control individuals”. (Page 4, lines 91)

      5) Line 95 and so on along the manuscript, avoid the term "normal controls" or "normal" and use only the term "controls".

      The authors’ answer: Thank you for your advice. "normal subjects" has been changed to "controls". (Page 4, line 113; Page 5, lines 118, 120, 124, 128, 133)

      6) In the participants section, indicate the ocular treatments of PACG patients. For example, on line 141, which "treatment" are you referring to?

      The authors’ answer: Thank you for your comment. We apologize for this vague statement. Treatment included medical treatment and surgical treatment. We have revised it in the manuscript. (Page 5, line 142)

      7) The entire section 2.4 is confusing. According to Figure S2, untargeted metabolomics was conducted with a mixed sample containing "all" serum extracts in order to obtain an in-house database with molecular features present in serum by LCHRMS. Then, this database was used for targeted metabolomics in individual serum samples using LCQQQ. However, as it is described in the manuscript, it seems that first, an untargeted metabolomics analysis was carried out to identify altered metabolites, then targeted metabolomics was carried out to validate the untargeted analysis and finally, a profiling analysis was carried out to construct the database. The workflow must be clearly discussed and amended to be understandable.

      The authors’ answer: Thank you for your comment. We have revised the description of the experimental method section 2.4. (Page 7, lines 195-198)

      8) Please, briefly explain what widely-targeted metabolomics is and how it works in this study (see section 2.4).

      The authors’ answer: Thank you for your comment. For widely targeted metabolome detection, a local database was first established from the standard database, and ion pair information was obtained by scanning the mixed (QC) samples with Q-TOF. A wide range of metabolites were identified qualitatively by comparison with the local self-built database, and the metabolites in each sample were then measured qualitatively and quantitatively in MRM scanning mode on a triple quadrupole (QQQ) instrument. In this project, the database constructed from the non-targeted scan against public databases was combined with the widely targeted local database to build a new database; the mixed samples of this project were scanned with Q-TOF against it, and the metabolites of each sample were then detected qualitatively and quantitatively in MRM mode. (Figure S2)

      9) On Table 1, indicate the number of patients and controls with cataracts.

      The authors’ answer: For the glaucoma group and the control group, we have excluded people with cataracts. This section is described in the inclusion and exclusion criteria for supplementary materials. (Inclusion and exclusion criteria)

      10) On "Sample processing" section, lines 152 and 153: Have you used cold methanol to ensure metabolic quenching? If not, how metabolite quenching was carried out?

      The authors’ answer: Thank you for your comment. We use cold methanol to extract metabolites, and the early blood samples have been stored in a -80°C refrigerator to ensure a low temperature process and ensure metabolic quenching. (Page 6, lines 196)

      11) On the same "Sample processing" section, have you used internal standards during metabolite extraction? If yes, which ones? If not, why?

      The authors’ answer: Thank you for your comment. During metabolite extraction, the same internal standard was added to each sample, and the same volume of serum (50 μL) was extracted from each sample. The specific internal standard name has been added to the "Sample processing" section. (Page 6, lines 153-155)

      12) Lines 161-163, I suggest including in the supplementary material the worklist of the entire experiment run by LC-MS, including analytical replicates and QCs.

      The authors’ answer: Thank you for your comment. Worklist for mass spectrometry can be found in supplementary sheet1. (Page 6, lines 165)

      13) The title of the section "Detection method" does not seem appropriate, please change it to "Analytical methods "or something similar.

      The authors’ answer: Thank you for your advice. "Detection method" has been changed to "Analytical methods". (Page 6, line 168)

      14) Section 2.4.1, I suggest changing "Untargeted detection conditions" to "Untargeted metabolomics analysis".

      The authors’ answer: Thank you for your comment. "Untargeted detection conditions" has been changed to "Untargeted metabolomics analysis". (Page 6, lines 169)

      15) Lines 170-172, the column used is compatible with 100% water, why start with 5% acetonitrile?

      The authors’ answer: Thank you for your comment. If the acetonitrile starting gradient is 0, it will cause a lot of water-soluble substances to elute and easily clog the column, so we want to use 5% organic phase.

      16) Section 2.4.1, the chromatographic conditions (mobiles phases) were the same in both positive and negative ion mode? It is desirable to change or adjust a basic pH when working in negative, so please amend and clarify it.

      The authors’ answer: Thank you for your comment. In the negative ion mode, the peak shape of the chromatogram under the acidic system is better than that under the alkaline system, so we choose the acidic system.

      17) I am not able to clearly understand what is "widely targeted conditions" (see section 2.4.2). What is the difference with the conventional targeted metabolomics analysis? In my view, widely-targeted metabolomics refers to the combination of untargeted metabolomics and targeted metabolomics. This must be clarified and simplified.

      The authors’ answer: Thank you for your suggestion. The characterization of metabolites in this study was conducted using a non-targeted database and a self-built database. Non-targeted metabolites were characterized with mixed samples and then combined with the laboratory's self-established database to form a new metabolome database for this study. In Section 2.4.2, "widely targeted" refers to the use of the MWDB standard self-built database to characterize metabolites, followed by quantification of the metabolites in QQQ MRM mode. In order to describe the detection process clearly, this part of the methods has been modified. (Figure S2)

      18) Line 199, please, indicate the normalization carried out.

      The authors’ answer: We agree with your comment and thank you for your suggestion. The normalization description was missing from the data processing steps and has been added to the manuscript. (Page 7, line 203)

      19) How many instrumental replicates have you carried out both in untargeted and targeted metabolomics? Please, indicate it.

      The authors’ answer: Thank you for your advice. In this project, a mixture of all samples was used as the QC sample, which was injected repeatedly during the run (one QC sample was inserted after every 10 samples). The correlation between repeated QC injections was more than 99%, which ensured the stability of sample testing. (Sheet1)
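
      The QC scheme described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`build_worklist`, `pearson`, `min_pairwise_qc_correlation`) are hypothetical, Pearson correlation is assumed as the measure of agreement between QC injections (the response does not name one), and nothing here is taken from the authors' actual pipeline.

```python
from itertools import combinations
from math import sqrt


def build_worklist(sample_ids, qc_every=10, qc_label="QC"):
    """Return an injection order with one QC injection after every `qc_every` samples."""
    worklist = []
    for position, sample_id in enumerate(sample_ids, start=1):
        worklist.append(sample_id)
        if position % qc_every == 0:
            worklist.append(qc_label)
    return worklist


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def min_pairwise_qc_correlation(qc_profiles):
    """Lowest Pearson r over all pairs of QC injections (assumed stability metric)."""
    return min(pearson(a, b) for a, b in combinations(qc_profiles, 2))
```

      A run would be considered stable under this assumed criterion when `min_pairwise_qc_correlation(...)` exceeds 0.99, where each QC profile is the list of metabolite intensities measured in one QC injection.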

      20) Line 267, why did you select a fold changes threshold greater than 1.15 (or lower 0.85)? In metabolomics, it would be desirable to have a minimum of 1.5-fold change considering the variability of data.

      The authors’ answer: Thank you for your comment. We chose a lower FC threshold in order to broaden the pool of potential candidate metabolites. The candidates selected in this way were reproducible across three batches, and the screening threshold follows the method used in "Blood metabolomics uncovers inflammation-associated mitochondrial dysfunction as a potential mechanism underlying ACLF".
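
      To make the effect of the two thresholds concrete, the following Python sketch applies both cut-offs to the group means quoted in the answer to point 27 of this response (discovery set 1: controls 33987, PACG 42852; discovery set 2: controls 31559, PACG 37934). It assumes that the fold change is the ratio of group means and that the lower bound matching the reviewer's 1.5 threshold is its reciprocal; neither detail is stated in the response, and the function names are hypothetical.

```python
def fold_change(case_mean, control_mean):
    """Fold change as the ratio of group means (an assumption, see lead-in)."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return case_mean / control_mean


def is_candidate(fc, upper=1.15, lower=0.85):
    """True when the fold change lies outside the [lower, upper] band."""
    return fc > upper or fc < lower


# Group means of androstenedione quoted in this response (controls, PACG).
discovery_sets = {
    "discovery set 1": (33987, 42852),
    "discovery set 2": (31559, 37934),
}

for name, (control_mean, pacg_mean) in discovery_sets.items():
    fc = fold_change(pacg_mean, control_mean)
    relaxed = is_candidate(fc)  # thresholds used by the authors
    strict = is_candidate(fc, upper=1.5, lower=1 / 1.5)  # reviewer's suggestion; lower bound assumed
    print(f"{name}: FC={fc:.2f}, passes 1.15/0.85={relaxed}, passes 1.5={strict}")
```

      Under these assumptions, androstenedione clears the 1.15 threshold in both discovery sets but would not clear a 1.5-fold threshold, which illustrates why the choice of cut-off matters for this candidate.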

      21) To include anywhere the molecular formula of androstenedione.

      The authors’ answer: I agree with your comment and thank you for your suggestion. We have added the molecular formula of androstenedione to the supplementary material. (Page 17, lines 475)

      22) Line 290 is not Figure 4B and 4C, you may refer to Figure 3B and 3C.

      The authors’ answer: Thank you for your advice. We apologize for this mistake. Figure 4B and 4C have been changed to Figure 3B and 3C.

      23) Figure S3 was lost from Supplementary material, please include it.

      The authors’ answer: Thank you for your comment. We apologize for this mistake. There was an error in the ordering of the supplementary figures. Figure 3 is redundant, and we have modified it in the supplementary materials.

      24) Figure 4 B, indicate in the text the average and uncertainty of androstenedione levels in both control and PACG groups.

      The authors’ answer: Thank you for your comment. In the manuscript, we have added the mean ± standard deviation of androstenedione levels in the control group and the disease group. (Page 11, lines 311-312)

      25) Section 3.6. please include the average and uncertainty of androstenedione levels in males and females in both control and PACG groups.

      The authors’ answer: Thank you for your advice. For 3.6 section, we supplemented the mean ± standard deviation of androstenedione levels in the control and disease groups. (Page 13, lines 350-356)

      26) Figure S9 seems missing.

      The authors’ answer: Thank you for your comment. We apologize for this mistake. Figure S9 has been added to the Supplementary material.

      27) Lines 345-346, indicate the levels obtained for the metabolite in the compared groups.

      The authors’ answer: Thank you for your suggestion. The levels of androstenedione in each group are seen in “The results from both discovery set 1 (Figure S9A, Mild:32600±17011, Moderate:33215±17855, Severe:46060±21789) and discovery set 2 (Figure S9B, Mild:27866±19873, Moderate:27057±13166, Severe:43972±19234) indicated that the mean serum androstenedione levels were significantly higher in the severe PACG group compared to the moderate and mild PACG groups (P<0.001). These findings were further validated in both validation phase 1 (Figure S9C, Mild:75726±45719, Moderate:65798±30610, Severe:94348±30858) and validation phase 2 (Figure S9D, Mild:1.121±0.3143 ng/ml, Moderate:1.461±0.4391 ng/ml, Severe:2.147±0.6476 ng/ml).” and “Notably, the level of androstenedione was found to be significantly higher in PACG patients than in normal subjects in both discovery set 1 (Figure 4B, P=0.0081, Normal:33987±11113, PACG:42852±20767) and discovery set 2 (Figure 4C, P=0.0078, Normal:31559±10975, PACG:37934±18529).”

      28) Line 368, you don't need to indicate the PACG abbreviation again.

      The authors’ answer: Thank you for your comment. We apologize for this mistake. We have changed "patients with PACG" to "patients". (Page 13, line 377)

      29) Figure 6, panels A and B are not labeled (i.e., commented) in the body text of the manuscript.

      The authors’ answer: Thank you for your suggestion. We’re very sorry for this mistake. Figure 6, panels A and B have been labeled in the manuscript. (Page 13, lines 377-379)

      30) Section 3.7., when you indicate "after therapy" are you referring to surgical treatment? Please, clarify.

      The authors’ answer: Thank you for your comment. We apologize for this vague statement. Blood samples were taken before and three months after surgery. “therapy” has been changed to “surgical treatment” in the manuscript. (Page 13, line 377)

      31) Line 370, "97th patient" should be replaced by "nine patients"?

      The authors’ answer: Thank you for your advice. We apologize for this mistake. "97th patient" has been changed to "nine patients". (Page 13, lines 378-379)

      32) Lines 370-372, it difficult to understand, please clarify why these findings indicate that severity is related to increased PACG according to Figure 6B.

      The authors’ answer: Thank you for your comment. We’re very sorry for this vague statement. The sentence “These findings showed that the levels of androstenedione that were tightly connected with PACG severity rose dramatically as PACG progressed.” has been removed.

      33) Line 447, the word "corrected" should be changed to "correlated"?

      The authors’ answer: Thank you for your comment. "corrected" has been changed to "correlated". (Page 16, lines 453,456)

      34) According to the literature, the levels found in control subjects are within the range of the "normal" values, i.e., are they comparable?

      The authors’ answer: Thank you for your advice. Androstenedione ranges from 0.4 to 2 in the normal population. The mean ± standard deviation of androstenedione in the normal population was 1.552 ± 0.4859.

      35) Lines 471-474, why "steroid hormone biosynthesis appears to be the critical node to high-match PACG pathophysiological concepts" while the high enrichment was observed in the "metabolic pathways"?

      The authors’ answer: Metabolic pathways encompass a series of chemical reactions within a cell that enable the synthesis or breakdown of molecules to maintain the cell's energy balance. Steroid hormone biosynthesis is one of these metabolic pathways, and its products, steroid hormones, participate in a wide range of physiological processes, including metabolism, immune response, and the regulation of inflammation. In a different context, a study related to fatigue during Androgen Deprivation Therapy (ADT) showed a significant difference in metabolite levels within the steroid hormone biosynthesis pathways, emphasizing the role these pathways play in metabolic alterations. The mentioned findings suggest that steroid hormone biosynthesis and metabolic pathways are intertwined. (Page 17, lines 481-488)

      36) Figure S13 and Figure S14A are the same.

      The authors’ answer: Thank you for your comment. Figure S14A has been removed.

      37) On lines 476-485, it would be interesting to discuss whether alterations of this metabolite could be a cause or consequence of PACG.

      The authors’ answer: Based on the literature found, androstenedione is a naturally occurring steroid hormone produced by the gonads and adrenal glands, and serves as an intermediate in testosterone biosynthesis (Androstenedione (a Natural Steroid and a Drug Supplement): A Comprehensive Review of Its Consumption, Metabolism, Health Effects, and Toxicity with Sex Differences). Early events in the pathobiology of glaucoma involve oxidative, metabolic, or mechanical stress acting on retinal ganglion cells (RGCs), leading to their rapid release of danger signals such as extracellular ATP, thus triggering microglial and macroglial activation as well as neuroinflammation (Immune Responses in the Glaucomatous Retina: Regulation and Dynamics). However, one might speculate that since androstenedione is a steroid hormone, it could potentially impact the inflammatory and metabolic stress observed in the pathophysiological processes of glaucoma (Adaptive responses to neurodegenerative stress in glaucoma). Metabolic and anti-inflammatory avenues might be crucial in understanding the relationship between alterations in androstenedione levels and the severity of glaucoma. Nevertheless, more research and literature analysis would be necessary to better understand the precise relationship and its underlying mechanisms between these two entities.

      38) I suggest sending the MS and MS/MS into a publicly available repository.

      The authors’ answer: Thank you for your suggestion. Further research will necessitate the utilization of the raw mass spectrometry data. We anticipate making this raw data available in a public repository upon the conclusion of subsequent experiments.

      Reviewer #2 (Recommendations for The Authors):

      The authors should aim to describe methods in greater detail.

      The authors could improve the writing to accurately describe their results and their interpretation and state what else could be done to make the result truly "predictive".

      The authors’ answer: (1) Detail Enhancement in the Methods section: We have expanded the description of methods such as sample pre-processing, mass spectrometry detection, and result analysis to provide more detailed information about the procedures, equipment, and materials used. (2) Improvement in Writing Quality: We have engaged a scientific editor to review our manuscript for clarity, coherence, and consistency to ensure that the results and interpretations are accurately and clearly conveyed. Terminologies and phrases have been revised to better reflect the findings and interpretations. (3) Limitation supplement: We have included a discussion on the limitations of our study and suggested additional studies and analyses that could be conducted to enhance the predictive value of our findings. We sincerely appreciate the constructive feedback from the reviewer, which has greatly contributed to improving the quality and rigor of our manuscript.