  1. Mar 2024
    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging. To further support the authors' claims, several improvements are necessary, particularly in the methods of quantification and the points that should be demonstrated quantitatively. It is crucial to investigate the correlation between aging and the proteins eIF2β and eIF2α.

      Thank you so much for your comments. We will further investigate the correlation between aging and the proteins eIF2β and eIF2α and include the results in the revised version.

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy caused by the depletion of axonal mitochondria is one of the main claims of the paper. The authors should describe their LC3-II/LC3-I ratio results in more detail, because LC3 blotting can be interpreted in multiple ways when assessing autophagy. Lysosomal defects cause LC3-II to accumulate, so the LC3-II/LC3-I ratio increases. On the other hand, a defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. In the actual blots, LC3-I abundance is the main source of the difference in all conditions (Milton RNAi and eIF2β overexpression and depletion). In the text, the authors simply state what they observed in their LC3 blots. The manuscript does not explain how the LC3-II/LC3-I ratio should be evaluated. It also does not elaborate on what the LC3 blotting results indicate about the state of autophagy after the depletion of axonal mitochondria.

      We agree with the reviewer that LC3 blotting can be interpreted in multiple ways when assessing autophagy. We therefore analyzed the levels of p62, an autophagy substrate, and found that milton knockdown elevated p62 levels (Figure 2B). Together, these results suggest that autophagic degradation is reduced.

      Another main point of the paper is that the up-regulation of eIF2β caused by depleting axonal mitochondria leads to a proteostasis crisis. This claim is based on the proteome analyses, and the authors should have presented and explained their proteomic data much more thoroughly. As shown in the experimental scheme in Figure 4A, the authors performed two proteome analyses: one on 7-day-old samples and the other on 21-day-old samples. The manuscript only shows a plot of the results from the 7-day-old samples, not of those from the 21-day-old samples. For the 21-day-old samples, the authors only provided data in the supplemental table. There, the abundance ratio of eIF2β is 0.753, meaning that eIF2β is depleted in the 21-day-old samples. The authors should have explained the impact of this eIF2β depletion in the 21-day-old samples so that the reader could fully understand their interpretation of the role of eIF2β in proteostasis.

      Thank you for your comments. We will include more analyses of the proteomic data in the next version of our manuscript. In this study, we aimed to elucidate the mechanisms by which depletion of axonal mitochondria induces premature disruption of proteostasis. Thus, we did not investigate the roles in proteostasis of the proteins differentially expressed in 21-day-old milton knockdown flies. Aging disrupts proteostasis via multiple pathways. At 21 days, eIF2β levels may be lowered by feedback from earlier changes or through interactions with other age-related changes. We will include more discussion in the next version of our manuscript.

      The manuscript has several weaknesses in its data and explanations regarding translation.

      (1) The authors are likely misunderstanding the effect of eIF2α phosphorylation on translation. P-eIF2α inhibits translation initiation. However, the authors seem to assume, mistakenly, that down-regulation of P-eIF2α inhibits translation.

      Thank you for your comment. We understand that phosphorylation of eIF2α inhibits translation initiation, as we described on page 9, lines 312-314. We propose a model in which the autophagic defects caused by milton knockdown are mediated by upregulation of eIF2β. However, we are not arguing that the translational suppression in milton knockdown is caused by a reduction in p-eIF2α. We found that milton knockdown increases eIF2β, and that overexpression of eIF2β phenocopied milton knockdown, including the autophagic defects (Figures 5 and 6). We also found that the increase in eIF2β reduces the level of p-eIF2α (Supplemental Figure 2). The reduced eIF2α phosphorylation in milton knockdown may therefore be caused by the increase in eIF2β. However, the effects of eIF2β upregulation on the function of the eIF2 complex are not fully understood. The translational suppression in milton knockdown may be caused by disruption of the eIF2 complex. It may also be mediated by a yet-to-be-determined function of eIF2β, or by pathways other than eIF2. We will include more details in the revised version.

      (2) The polysome profiling result in Figure 4H is implausible. Polysomes are not expected to be resolved on a 10%-25% sucrose density gradient. The authors should have used a much denser gradient, such as 10%-50%.

      Thank you for pointing this out. We apologize for the error: the gradient was actually 10%-50%, and we described it incorrectly. We will correct it in the revised version.

      (3) Also regarding the polysome profiling, the Methods section suggests that the authors fractionated the ultracentrifuged samples from top to bottom and then measured A260 with a plate reader. In that case, the authors should have provided a line plot showing the individual data points rather than the smoothly connected curve shown in the manuscript.

      Thank you for pointing this out. We will replace the graph.

      (4) In both the polysome profiling and puromycin incorporation results (Figure 4H and I), the differences between control siRNA and Milton siRNA are subtle, if not nonexistent. This might arise from the lack of spatial resolution in the experiment. The authors used head lysate for these data, but according to their results in Figure 4E-G, the Phospho-eIF2α/eIF2α ratio only changes in the axons. The authors could have attempted to resolve axonal translation spatially to see whether it differs between control siRNA and Milton siRNA.

      Thank you for your comment. Resolving axonal translation spatially will require a new, technically challenging set of experiments. We will work on this and hope to achieve it in the future.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations For The Authors):

      I would like to thank the authors for their comments. However, my requests for additional experiments to consolidate this manuscript and for text changes (points 1 and 2) have not been addressed. I believe these are essential for completing this manuscript.

      The reviewer raised the question about the relevant substrates of PARG in S-phase cells (point 1). As we explained in our previous response, the most important substrate of PARG is PARP1, since we observed increased chromatin-associated PARP1 and PARylated PARP1 in cells with PARG depletion. Moreover, PARP1 or PARP1/2 depletion rescued cell lethality caused by PARG depletion. These data strongly suggest that PARP1 is the major substrate of PARG in S phase cells. Of course, PARG may have additional substrates. In the future, we will perform proteomics experiments as suggested by this reviewer to identify additional PARG substrates, which may reveal new roles of PARG in S phase progression.

      The reviewer also suggested that we reorganize our manuscript (point 2). However, we prefer to keep the manuscript as it is, since this is how the project evolved. Another reason is that we would like to share with readers the challenge of validating KO cells, an important lesson we learned from this study. We hope this will raise awareness of the hypomorphic mutant cells we often use to draw conclusions about gene functions and/or genetic interactions. We understand that the current flow of our manuscript may cause some confusion. To avoid it, we added explanations at the beginning of the manuscript to alert readers that our initial KO cells may not be complete PARG KO cells, i.e. they may have residual PARG activity. We also added further discussion of this important point in the Discussion section.

      Moreover, the WB analysis of PARG KO clones is inconclusive, as the additional prominent band at 50 kDa could be a degradation product. The authors should check PARG levels and localization by IF, which detects intact proteins and their cellular localization, since the shorter isoform should localize to the cytosol. The WB of PARG isoforms omits important information: the Mw of the PARG constructs and the Mw labels on the western blots. This makes it difficult to evaluate these data and compare them with the KO. Ideally, the KO and PARG isoform samples should all be run on one gel for proper comparison with different antibodies.

      We appreciate the concerns raised by this reviewer. We agree that the additional prominent band at 50 kDa could be a degradation product. As we explained in our previous response, despite using several PARG antibodies, we could not draw a clear conclusion about which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Immunostaining experiments may not be more conclusive, since IF relies on the same antibodies to recognize endogenous PARG. Additionally, even if a protein mainly localizes to the cytosol, we cannot exclude the possibility that a small fraction of it localizes to the nucleus and has nuclear functions.

      Instead, as we presented in our manuscript, we used a biochemical assay to measure PARG activity in cell lysate and showed that our initial PARG KO cells still have residual PARG activity. However, we could not detect any PARG activity in our complete/conditional PARG KO cells (cKO cells; these cells can only survive in the presence of PARP inhibitor). These data strongly suggest that PARG is essential for cell survival.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should evaluate the possibility of naturally occurring arrhythmia due to the geometry of the tissues, using voltage- or calcium-sensitive dyes.

      Answer: We thank the reviewer for this suggestion. We have performed new experiments using a voltage-sensitive fluorescent dye (FluoVolt). The data are reported in the new Figure 4 and in a new Results section, "arrhythmia analysis". Briefly, we found that our ring-shaped tissues are compatible with live fluorescence imaging. We were then able to show that our cardiac tissues beat regularly, without naturally occurring arrhythmias or extra beats. We could not detect any re-entrant waves in our tissues within the temporal resolution of our camera. A specific paragraph has also been added to the Discussion.

      (2) There is only 50% survival after 20 days of culture in the optimized seeding group. Is there any way to improve it? The tissues had two compartments, cardiac and fibroblast-rich regions, where fibroblasts are responsible for maintaining the attachment to the glass slides. Do the cardiac rings detach from the glass slides and roll up? The SD of the force measurement is a quarter of the value, which is not ideal with such a high replicate number.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. We are currently expanding to other cell lines with improved survival (see https://insight.jci.org/articles/view/161356). We confirm that the rings do not detach; the pillar was specifically designed to prevent this (see Figure 1B).

      As the platform uses image analysis to derive contractile dynamics, calibration should be performed for the angle and distance of the camera lens relative to the individual tissues to reduce error. In addition, how reproducible are the pillars? It is highly recommended to mechanically evaluate the consistency of the hydrogel-based pillars across and within wells to understand the variance.

      Answer: We propose a system and measurement method that do not require calibration. Contraction amplitude is expressed as the ratio between the contracted and relaxed areas (see Figure 3A). The distance of the camera lens therefore has no influence.

      To evaluate the consistency of the mechanical properties of the hydrogel, we repeated the experiment shown in Figure 1-Supplement 1. We measured the Young's modulus of three different gel solutions on different days. In the three experiments, we found values of 10.0-12.2 kPa, giving a final average of 11.2 (+/- 0.6) kPa, consistent with the value reported in the article. We are therefore confident that the mechanical properties are consistent across and within wells. More extensive mechanical characterization of the molded gels would require access to an atomic force microscope (AFM) and will be considered in the future.

      The author should address the longevity and reproducibility issues, by working on the calibration of camera lens position/distance to tissues and further optimizing the seeding conditions with hydrogels such as collagen or fibrin, and/or making sure the PEG gels have high reproducibility and consistency.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. This platform (including its design, approach and choice of polymers) allows fast and reproducible formation of a large number of cardiac tissues from a limited number of cells. It supports up to 21 tissues per well in a 96-well format, i.e. a potential total of about 2,000 tissues.

      (3) The evaluation of the arrhythmia should be more extensively explained and demonstrated.

      Answer: See answer to comment 1.

      (4) The isoproterenol results should be checked, as non-paced tissues should show increased beating frequency with increasing doses. Dofetilide does not typically have a negative inotropic effect on the tissues. Please check cell viability before and after dosing.

      Answer: We agree with this reviewer in principle. However, we repeated the experiments and confirmed our results: increasing concentrations of isoproterenol induced a trend towards increased contraction force and significantly increased contraction and relaxation speeds, without changing the beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that, on average in our study, the isoproterenol-induced increase in contraction and relaxation speeds translates into an increase in contractile force rather than an increase in contraction frequency. This may depend on the cell line used, as illustrated very well in a recent paper from Mannhardt and colleagues (Stem Cell Reports. 2020; 15(4):983–998). Of the 10 different cell lines tested in engineered heart tissues, all showed an increase in contraction and relaxation speeds after isoproterenol administration. In 4 cell lines this translated into an increase in contractile force, in 3 into a shortening of the beat, and only 2 cell lines showed an increase in both parameters. Indeed, since iPSC-CMs are immature cardiac cells, it is rare to obtain a positive force-frequency relationship without a maturation medium or mechanical or electrical training. We agree that above 10 nM, dofetilide is cardiotoxic in our tissues, as the tissues stop beating completely.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the general comments in the public review, I have the following specific suggestions to the authors, that would help improve the manuscript.

      (1) Please describe the protocol for preparation of cardiac rings (shown in Figure 1C) in more detail. In particular, please describe how the tissues were transferred from the mold into the 96-well plate and how are they positioned and characterized during the study.

      Answer: The tissues are not transferred, as they form directly in the well, which is pre-equipped with the molded PEG gel (see Figure 1B and the Methods section). This in situ analysis is a strong asset of the platform.

      (2) Please clarify the timepoints in this study. The overall schematic in Figure 1C shows that the rings were formed on day 22 and then studied for 14 days. However, Figure 2B shows data over 20 days following seeding, and Figure 3 shows data 14 days after seeding. It appears that these were separate studies (optimization of the myocyte/fibroblast ratio followed by the main study).

      Answer: Figure 1C shows the timeline, including cardiomyocyte differentiation. hiPSC-CMs are seeded in the wells 22 days after the start of differentiation, which represents Day 0 of tissue formation. We apologize for the confusion.

      (3) Please explain if the number of rings per well (Figure 2) was used as the only criterion for selecting the myocyte/fibroblast ratio, and if so, why. Were these rings also characterized for their structural and contractile properties?

      Answer: Figure 2-Supplement 1 reports the contractility data for the different ratios tested and shows no differences. The number of generated ring-shaped tissues was indeed the only criterion retained.

      (4) Please provide rationale for using the dermal rather than cardiac fibroblasts.

      Answer: We had previous experience generating EHTs using dermal fibroblasts which are easier to obtain commercially. Our approach could in theory also work using cardiac fibroblasts, which we have not tested in the present study.

      (5) Figure 2 panels C-E show an interesting segregation of cardiomyocytes into a thin cylindrical layer that does not appear to contain fibroblasts and a shorter and thicker cylinder containing fibroblasts mixed with occasional myocytes. Please specify at which time point this structure forms, and how does it change over time in culture? At which time point were the images taken? It would be helpful to include serial images taken over 1-14 days of study.

      Answer: We thank the reviewer for this interesting comment. We have performed additional immunostainings (reported in Figure 2-Supplement 3) on tissues at Day 1 and Day 7 after seeding. The segregation appears within the first 7 days. One day after seeding, the fibroblasts are not yet attached, although the cardiac fiber has already started to form. Seven days after seeding, the fibroblasts are fully spread and attached, and the contractile ring is formed and well aligned. Brightfield images are reported in Figure 1E.

      (6) In the cardiomyocyte region (Figure 2D), the cells staining for troponin seem to be only at the surfaces. The layer is only about 30-40 µm thick, so one would assume that cell viability was not an issue. Please specify and discuss the composition of this region.

      Answer: We agree, but we think this is a technical issue. At the center of the tissue, tissue thickness limits laser penetration, whereas at the surfaces (inner or outer), the laser penetrates easily between the tissue and the PEG. Moreover, the zoomed view of the tissue in Figure 2-Supplement 2 shows staining inside the cardiac fiber, which only appears weaker because of tissue thickness.

      (7) Please also discuss segregation in terms of possible causes and the implications of apparently very limited contact between the two cell types, i.e., how representative is this two-region morphology of native heart tissue. Also, it would be interesting to know how the segregation has changed with the change in myocyte/fibroblast ratio.

      Answer: We are not sure that contact is very limited, as fibroblasts are critical for tissue formation (i.e. no tissues form without fibroblasts). We agree that these ring-shaped cardiac tissues are not particularly representative of native heart tissue in terms of interactions between cell types. They were developed as a surrogate for physiopathological and pharmacological experiments (see a recent application in https://insight.jci.org/articles/view/161356).

      (8) There is interest in, and demonstrated ability for, culturing engineered cardiac tissues over longer periods. Please comment on the rationale for selecting a 14-day culture and on whether the system allows longer culture durations.

      Answer: In line with this comment, we studied the contractile parameters of our rings 28 days after seeding and compared them with those at D14. We found a slight increase in all parameters, which is significant for the maximum contraction speed. Nevertheless, the data are much more variable and the number of tissues is lower (29 at D14 versus 17 at D28). We therefore demonstrated that long-term culture of our tissues is possible, although not yet optimized. Hence, the subsequent physiological and pharmacological tests were performed at D14.

      (9) Figure 3 documents the development of contractile parameters over 14 days of culture. Would it be possible to replace the arbitrary units with the actual values? Also, would it be possible to include the corresponding images of the rings taken at the same time points, to show the associated changes in ring morphologies.

      Answer: Contraction amplitude is expressed as the ratio between the contracted and relaxed areas (see Figure 3A). Being a ratio, it has no unit. Corresponding images can be seen in Figure 1E.

      (10) The measured contraction stress, strain, and speeds of contraction and relaxation improve from day 1 to day 7 and then plateau (Figure 3, Supplemental Figure 3). Please discuss this result.

      Answer: The new immunostainings performed on tissues at Day 1 and Day 7 show the progressive alignment of the cardiomyocytes and the muscular fibers, with an almost complete organization at Day 7.

      (11) The beating frequency does not appear to markedly change over time, while Figure 3B shows strong statistical significance (***) throughout the 14-day period. Please check/confirm.

      Answer: We confirm this result.

      (12) Please comment on the lack of effect of isoproterenol on beating frequency.

      Answer: We agree with this reviewer in principle. However, we repeated the experiments and confirmed our results: increasing concentrations of isoproterenol induced a trend towards increased contraction force and significantly increased contraction and relaxation speeds, without changing the beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that, on average in our study, the isoproterenol-induced increase in contraction and relaxation speeds translates into an increase in contractile force rather than an increase in contraction frequency. This may depend on the cell line used, as illustrated very well in a recent paper from Mannhardt and colleagues (Stem Cell Reports. 2020; 15(4):983–998). Of the 10 different cell lines tested in engineered heart tissues, all showed an increase in contraction and relaxation speeds after isoproterenol administration. In 4 cell lines this translated into an increase in contractile force, in 3 into a shortening of the beat, and only 2 cell lines showed an increase in both parameters. Indeed, since iPSC-CMs are immature cardiac cells, it is rare to obtain a positive force-frequency relationship without a maturation medium or mechanical or electrical training.

      (13) Please compare the contractile function of cardiac tissues measured in this study with data reported for other iPSC-derived tissue models.

      Answer: A specific paragraph in the Discussion addresses this aspect.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews

      We thank the reviewers for their insightful comments and helpful suggestions that allowed us to improve the manuscript.

      Reviewer #1:

      Thermogenic adipocyte activity is associated with cardiometabolic health in humans but declines with age. Identifying the underlying mechanisms of this decline is therefore highly important.

      To address this, Holman and co-authors investigated the effects of two major determinants of thermogenic activity. The first is cold, which induces thermogenic de novo differentiation as well as conversion of dormant thermogenic inguinal adipocytes. The second is aging, which strongly reduces thermogenic activity. The authors study young and middle-aged mice at thermoneutrality and following cold exposure.

      Using lineage tracing, the authors conclude that the older group produces fewer thermogenic adipocytes from progenitor differentiation. However, they found no difference in thermogenic differentiation capacity between the age groups when progenitors were isolated and differentiated in vitro. This is consistent with previous findings in humans showing that progenitor cells derived from dormant perirenal brown fat differentiate into thermogenic adipocytes in vitro. Taken together, this suggests that the age-related decline in thermogenic capacity is explained by changes in the microenvironment rather than by cell-autonomous alterations in the ASPCs. This finding is important for identifying new approaches to switch dormant adipocytes to an active thermogenic phenotype.

      To gain insight into the age-related changes, the authors use single-cell and single-nucleus RNA sequencing to map their two age groups, comparing thermoneutral and cold conditions between the groups. Previous literature demonstrated that de novo lipogenesis (DNL) occurs in relation to thermogenic activation. Interestingly, the authors show that DNL is in fact activated in a white adipocyte cell type, whereas the beige thermogenic adipocytes form a separate cluster.

      Recent findings show that adipose tissue contains several subtypes of ASPCs and adipocytes. In light of this, mapping the changes at single-cell resolution following cold intervention is an important contribution to the field. This is particularly true because an older group with limited thermogenic adaptation is analyzed in parallel with a younger, more responsive group. This model also allowed the microenvironment to be identified as a determining factor of the thermogenic response.

      The use of only two time points (young and middle-aged) along the aging continuum limits the conclusions that can be made on aging as the only driver of the observed differences between the groups. It should for example be noted that the older mice had higher weights and larger fat depots, thus the phenotype is complex and this should be taken into consideration when interpreting the data.

      In conclusion, this study provides an important resource for further studies on how to reactivate dormant thermogenic fat and potentially improve metabolic health.

      (1) The authors claim "Aging impairs cold-induced beige adipogenesis and adipocyte metabolic reprogramming". It has previously been established in humans that aging is strongly associated with a decline in thermogenic capacity. With this in mind, it is easy to accept that the reduced browning observed in the older group is due to age. However, the older group also has larger adipose depots, which could be a confounding factor. I therefore recommend bringing this into the discussion and putting more focus on the complexity of the phenotype. For example, the authors could discuss whether de novo lipogenesis is lower because the adipocytes of older mice are already filled with more lipid. Additional time points along the aging continuum would be needed to draw a strong conclusion about age as the determinant. Even so, aging is complex, and further definitions and discussion would be needed.

      We agree with the reviewer regarding the confounding effect of body weight changes. We have added a paragraph to the discussion (pasted below) to comment on the complexity of the phenotype and the contributing role of linked changes in body weight/composition.

      “Aging is a complex process, and unsurprisingly, many pathways have been linked to the aging-related decline in beiging capacity. For example, increased adipose cell senescence, impaired mitochondrial function, elevated PDGF signaling and dysregulated immune cell activity during aging diminish beige fat formation (Benvie et al., 2023; Berry et al., 2017; Goldberg et al., 2021; Nguyen et al., 2021). Of note, older mice exhibit higher body and fat mass, which is associated with metabolic dysfunction and reduced beige fat development. While the effects of aging and altered body composition are difficult to separate, previous studies suggest that the beiging deficit in aged mice is not solely attributable to changes in body weight (Rogers et al., 2012). Further studies, including additional time points across the aging continuum may help clarify the role of aging and ascertain when beiging capacity decreases.”

      (2) The study would benefit from more comparisons with existing human studies and from discussion of the translational potential of the findings. For example, how do the adipocyte subtypes identified in the current study relate to subtypes identified in human adipose tissue (e.g. Emont et al.)?

      We analyzed the human adipose tissue atlas from Emont et al. 2022 (PMID: 35296864). We did not find any obvious homologous human adipocyte subtypes. However, this and other available human single cell studies have not investigated the effects of cold exposure on white adipose tissue depots, which may be necessary to reveal DNL-high and especially beige adipocytes.

      (3) The group has contributed multiple studies demonstrating that Prdm16 is a major inducer of a thermogenic phenotype, and the literature shows that Prdm16 promotes a thermogenic phenotype in favour of a fibrogenic aging phenotype. It would therefore be interesting to see how Prdm16 is regulated in the current dataset across adipocyte subtypes, age groups and temperature conditions.

      We thank the reviewer for this comment. Previous studies showed that PRDM16 protein and not mRNA levels are downregulated during aging (Wang et al., 2019, Cell Metab, PMID: 31155495; Wang et al., 2022, Nature, PMID: 35978186). Consistent with this, we did not observe an aging-associated reduction in Prdm16 mRNA levels in adipocytes in our dataset. We did observe enrichment of Prdm16 mRNA levels in beige adipocytes relative to other adipocyte clusters. We included these data in Fig. 5F.

      (4) In Figure 1, it is difficult to understand why the 6-week cold exposure is not shown alongside thermoneutrality and the 3-day and 2-week cold exposures. It would be useful to show all four marker genes for all time points in the same graph so that their levels can be compared.

      These experiments were done at different times using separate groups of mice. We have now clarified this in the figure legend.

      (5) The older mice had larger inguinal fat depots, suggesting more lipids stored. The morphology of adipose tissue has previously been shown to be modulated by cold acclimation and is also the main similarity between brown adipose tissue in adult humans and beige adipose tissue in young mice. Fig S2b suggests smaller adipocytes in the young group. For comparison with published data, it would also be useful if the authors showed H&E-stained tissue sections from their model.

      Good point. We added panels showing H&E staining of serial iWAT sections, showing changes in tissue morphology across age and temperature conditions (Figure S1F).

      (6) The authors use t-tests to compare the differences induced by e.g. cold or min vs max cell culture media etc, within each age group. However, in my opinion, a two-way Anova with post-tests would be more informative as this would allow for testing the effects of the two age categories on any quantitative variable and allow for addressing whether there is an interaction between the categories.

      Following the reviewer’s recommendation, we applied two-way ANOVA with a Tukey correction for multiple comparisons for categorical comparisons with different age groups and conditions. P values from all significant multiple comparison tests are now included within the methods section.
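      For readers less familiar with the test, the variance partition behind a balanced two-way ANOVA can be sketched in a few lines of Python. This is a hypothetical illustration, not the analysis code used for the manuscript: the factor levels and values are invented, the design is assumed to be balanced, and the Tukey post hoc comparisons are not included. The interaction term is what allows the test to ask whether the effect of temperature differs between age groups, which separate t-tests within each age group cannot do.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA: sums of squares, degrees of freedom and F statistics.

    `cells` maps (level of factor A, level of factor B) to a list of replicate
    observations; every cell must have the same number of replicates (n >= 2).
    P values and Tukey post hoc tests are not computed in this sketch.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    for key in product(a_levels, b_levels):
        if len(cells.get(key, [])) != n or n < 2:
            raise ValueError("design must be complete and balanced with n >= 2")

    values = [y for obs in cells.values() for y in obs]
    grand = mean(values)
    cell_mean = {key: mean(obs) for key, obs in cells.items()}
    a_mean = {a: mean([y for b in b_levels for y in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([y for a in a_levels for y in cells[(a, b)]]) for b in b_levels}

    ss = {
        "A": len(b_levels) * n * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": len(a_levels) * n * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        # Interaction: how far each cell mean departs from the purely additive prediction
        "AxB": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a, b in product(a_levels, b_levels)
        ),
        "error": sum((y - cell_mean[key]) ** 2 for key, obs in cells.items() for y in obs),
        "total": sum((y - grand) ** 2 for y in values),
    }
    df = {
        "A": len(a_levels) - 1,
        "B": len(b_levels) - 1,
        "AxB": (len(a_levels) - 1) * (len(b_levels) - 1),
        "error": len(a_levels) * len(b_levels) * (n - 1),
    }
    ms_error = ss["error"] / df["error"]
    f = {term: (ss[term] / df[term]) / ms_error for term in ("A", "B", "AxB")}
    return {"SS": ss, "df": df, "F": f}


# Hypothetical data: factor A = age, factor B = temperature, 2 replicates per cell
example = {
    ("young", "thermoneutral"): [1.0, 3.0],
    ("young", "cold"): [5.0, 7.0],
    ("aged", "thermoneutral"): [2.0, 4.0],
    ("aged", "cold"): [6.0, 8.0],
}
result = two_way_anova(example)
print(result["F"])
```

      With these invented values the cold effect is identical in both age groups (cell means 2 to 6 and 3 to 7), so the interaction sum of squares is 0, and the four components add up to the total (2 + 32 + 0 + 8 = 42).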

      (7) In Figure 5F, please include Adipoq expression between clusters and please add a reference to why Nnat is considered a canonical white adipocyte marker.

      We added Adipoq to the violin plot in Figure 5F, showing differential expression across adipocyte clusters. We included a line in the results section to highlight this observation:

      “Interestingly, Adiponectin (Adipoq) was differentially expressed across adipocyte clusters, with higher levels in Npr3-high and DNL-high cells.”

      We removed “canonical” and added references for Nnat and Lep as white marker genes.

      (8) After 14 days of cold exposure, it looks like the DNL-high population divides into two populations. Did the authors explore whether there were any differences between these clusters?

      We also noticed this apparent division and explored this question. However, upon increasing the clustering resolution to split the DNL-high population, we found no obvious differentially expressed genes that defined the two subclusters. Thus, we opted to keep them together.

      (9) As cold treatment transforms a subset of cells, can the authors perform a data-driven analysis to visualize the directions in their single-nucleus datasets using Monocle pseudotime and/or RNA velocity analyses?

      This is a good question. We spent a long time trying to address it using several trajectory and pseudotime analysis methods, including RNA velocity (scVelo), Slingshot and dynverse. Unfortunately, we were unable to obtain concordant results from at least two different methods and felt that the analyses were unreliable.

      Reviewer #2:

      This manuscript focused on why aging leads to decreased beiging of white adipose tissue. The authors used an inducible lineage tracing system and provided in vivo evidence that de novo beige adipogenesis from Pdgfra+ adipocyte progenitor cells is blocked during early aging in subcutaneous fat. Single-cell RNA sequencing of adipocyte progenitor cells and in vitro assays showed that these cells have similar beige adipogenic capacities in vitro. Single-nucleus RNA sequencing of mature adipocytes indicated that subcutaneous fat from aged mice contains more Npr3-high adipocytes.

      Meanwhile, adipocytes from aged mice have significantly lower expression of genes involved in de novo lipogenesis, which may contribute to the declined beige adipogenesis.

      The mechanism that leads to age-related impairment of white adipose tissue beiging is not very clear. The finding that Pdgfra+ adipocyte progenitor cells contribute to beige adipogenesis is novel and interesting. It is even more intriguing that aging represses the differentiation of Pdgfra+ adipocyte progenitor cells into beige adipocytes during cold stimulation. The finding that mature adipocytes with high de novo lipogenesis activity may support beige adipogenesis is also novel and worth pursuing further. The study was carried out with a nice experimental design, and the authors provided sufficient data to support the major conclusions. I only have a few comments that could potentially improve the manuscript.

      (1) It is interesting that after three days of cold exposure, aged mice also have far fewer beige adipocytes. Is de novo adipogenesis involved at this early stage? Or do previously beige adipocytes that acquired white morphology undergo more efficient "reactivation" in young mice? It would be nice if the authors could discuss these possibilities.

      This is a good question. We did not evaluate beige adipogenesis at the 3d time point. However, a previous study demonstrated that 3d of cold exposure is sufficient to promote de novo beige adipogenesis (Wang et al., Nat Med. 2013, PMID: 23995282). We observed that beige adipogenesis from Pdgfra+ cells is a relatively minor contributor to beige adipocyte development, even after long-term cold exposure in young mice. Based on these data, we presume that beige adipocyte activation (or reactivation) is the dominant mechanism for beige adipocyte development.

      To clarify this point, we have included the following lines in the manuscript:

      “Previous studies in mice using an adipocyte fate tracking system show that a high proportion of beige adipocytes arise via the de novo differentiation of ASPCs as early as 3 days of cold (Wang et al., 2013).”

      “Based on these findings, we presume that mature (dormant beige) adipocytes serve as the major source of beige adipocytes in our cold-exposure paradigm. However, long-term cold exposure also recruits smooth muscle cells to differentiate into beige adipocytes, a process that we did not investigate here (Berry et al., 2016; Long et al., 2014; McDonald et al., 2015; Shamsi et al., 2021).”

      (2) Is the absolute number of Pdgfra+ cells decreased in aged mice? It would be nice to include quantifications of the percentage of tomato+ beige adipocytes in total tomato+ cells to reflect the adipogenic rate.

      We presented FACS quantification of tdTomato+/Pdgfra+ cells in Fig. 2B. We added a graph showing the percentage of Pdgfra+ cells among total live, lin- cells in adipose tissue; this showed no difference between young and aged mice. We did not perform FACS quantification of tdTomato+ beige adipocytes due to the technical challenges of sorting adipocytes. Quantification of total tdTomato+ cells was also unreliable and inconsistent due to the widespread labeling of fibroblasts and blood vessels in addition to traced adipocytes. Thus, we did not include this analysis.

      (3) Line 112, the sentence seems to be not finished.

      This has been corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewers’ Public Comments

      We are grateful for the reviewers’ comments. We have modified the manuscript accordingly and detail our responses to their major comments below.

      (1) Reviewer 2 was concerned that transformation of continuous functional data into categorical form could reduce precision in estimating the genetic architecture.

      We agree that transforming continuous data into categories may reduce resolution, but it also improves accuracy when the continuous data are affected by measurement noise. In our dataset, many genotypes are at the lower bound of measurement, and the variation in measured fluorescence among these genotypes is largely or entirely caused by measurement noise. By transforming to categorical data, we dramatically reduced the effect of this noise on the estimation of genetic effects. We modified the results and discussion sections to address this point.

      (2) Reviewer 2 asked about generalizability of our findings.

      Because our paper is the first use of reference-free analysis of a 20-state combinatorial dataset, generalizability is at this point unknown. However, a recent manuscript from our group confirms the generality of the simplicity of genetic architecture: using reference-free methods to analyze 20 published combinatorial deep mutational scans, several of which involve 20-state libraries, we found that main and pairwise effects account for virtually all of the genetic variance across a wide variety of protein families and types of biochemical functions (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057). Concerning the facilitating effect of epistasis on the evolution of new functions, we speculate that this result is likely to be general: we have no reason to think that the underlying cause of this observation – epistasis brings genotypes with different functions closer in sequence space to each other and expands the total number of functional sequences – arises from some peculiarity of the mechanisms of steroid receptor DBD folding or DNA binding. However, we acknowledge that our data involve sequence variation at those sites in the protein that directly mediate specific protein-DNA contact; it is plausible that sites far from the “active site” may have weaker epistatic interactions and therefore have weaker effects on navigability of the landscape. We have addressed these issues in the discussion.

      (3) Reviewer 3 asked “in which situation would the authors expect that pairwise epistasis does not play a crucial role for mutational steps, trajectories, or space connectedness, if it is dominant in the genotype-phenotype landscape?”

      The question addressed in our paper is not whether epistasis shapes steps, trajectories or connectedness in sequence space but how it does so and what its particular effects are on the evolution of new functions. The dominant view in the field has been that the primary role of epistasis is to block evolutionary paths. We show, however, that in multi-state sequence space, epistasis facilitates rather than impedes the evolution of new functions. It does this by increasing the number of functional genotypes and bringing genotypes with different functions closer together in sequence space. This finding was possible because of the difference in approach between our paper and prior work: most prior work considered only direct paths in a binary sequence space between two particular starting points – and typically only considering optimization of a single function – whereas we studied the evolution of new functions in a multi-state amino acid space, under empirically relevant epistasis informed by complete combinatorial experiments. The result is a clear demonstration that the net effect of real-world levels of epistasis on navigability of the multidimensional sequence landscape is to make the evolution of new functions easier, not harder.

      (4) Reviewer 3 asked for “an explanation of how much new biological results this paper delivers as compared with the paper in which the data were originally published.”

      Starr et al. (2017) did not use their data to characterize the underlying genetic architecture of function by estimating main and epistatic effects of amino acid states and combinations; they also did not evaluate the importance of epistasis in generating functional variants, determining the transcription factor’s specificity, or shaping evolutionary navigability on the landscape.

      (5) Reviewer 3 requested an explanation of how the results would have been (potentially) different if a reference-based approach were used, and how reference-based analysis compares with other reference-free approaches to estimating epistasis.

      This topic has been covered in detail in a recent manuscript from our group (Park et al. Biorxiv 2023.09.02.556057). Briefly, reference-free approaches provide the most efficient explanation of an entire genotype-phenotype map, explaining the maximum amount of genetic variance and reducing sensitivity to experimental noise and missing genotypes compared to reference-based approaches. Reference-based approaches tend to infer much more epistasis, especially higher-order epistasis, because measurement error and local idiosyncrasy near the wild-type sequence propagate into spurious high-order terms. Reference-based analyses are appropriate for characterizing only the immediate sequence neighborhood of a particular “wild-type” protein of interest. Reference-free approaches are therefore best suited to understanding genotype-phenotype landscapes as a whole. We have clarified these issues in the revised discussion.

      (6) Reviewer 3 suggested that the comparison between the full and main-effects-only model should involve a re-estimation of main effects in the latter case.

      This is indeed what we did in our analysis. We have clarified the description in the results and methods sections to make this clear.

      (7) Reviewer 3 asked about the applicability of the approach to data beyond those analyzed in the present study and requirements to use it.

      Our approach could be used for any combinatorial DMS dataset in which the phenotypic data are categorical (or can be converted to categorical form). Complete sampling is not required: because reference-free analysis averages the estimated effects of states and combinations over all variants that contain them, it is highly robust to missing data (except at the highest possible order of epistasis, where only a single variant represents a high-order effect), as long as variant sampling is unbiased with respect to phenotype. All required code is publicly available at the GitHub link provided in this manuscript. We have also described a general form of reference-free analysis for continuous data and applied it to 20 protein datasets in a recent publication (Park et al. bioRxiv 2023.09.02.556057).

      (8) Reviewer 3 suggested that the text could be shortened and made less dense.

      We agree and have done a careful edit to streamline the narrative.

      Response to Reviewers’ Non-Public Recommendations

      (1) Reviewer 1 noted that specific epistatic effects might in some cases produce global nonlinearities in the genotype-phenotype relationship. They then asked how our results might change if we did not impose a nonlinear transformation as part of the genotype-phenotype model. The reviewer’s underlying concern was that the non-specific transformation might capture high-order specific epistatic effects and thus reduce their apparent importance.

      Because our data are categorical, we required a model that characterizes the effect of particular amino acid states and combinations on the probability that a variant is in a null, weak, or strong activation class. A logistic model is the classic approach to this kind of analysis. The model structure assumes that amino acid states and combinations have additive effects on the log-odds of being in one functional class versus the lower functional class(es); the only nonlinear transformation is the one that arises mathematically when log-odds are transformed into probability through the logistic link function. Thinking through the reviewer’s comment, we have concluded that our model does not make any explicit transformation to account for nonlinearity in the relationship between the effects of specific sequence states/combinations and the measured phenotype (activation class). If additional global nonlinearities are present in the genotype-phenotype relationship, such as could be imposed by limited dynamic range in the production of the fluorescence phenotype or in the assay used to measure it, the sigmoid shape of the logistic link function may also accommodate these nonlinearities. We have noted this point in the revised manuscript.
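      The model structure described above can be sketched as a cumulative (ordinal) logistic model. The Python below is an illustration under stated assumptions, not the published model code: the genotype, effect sizes and thresholds are hypothetical, and the two thresholds are assumed to mark the null/weak and weak/strong boundaries. Effects add on the log-odds scale, and the logistic function is the only nonlinearity.

```python
import math


def logistic(x):
    """Logistic link: converts log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def genetic_score(genotype, intercept, first_order, pairwise):
    """Additive score on the log-odds scale: intercept + main effects + pairwise effects.

    first_order maps (site, state) -> effect; pairwise maps
    ((site_i, state_i), (site_j, state_j)) with i < j -> effect.
    Terms missing from the dictionaries contribute 0.
    """
    score = intercept
    for site, state in enumerate(genotype):
        score += first_order.get((site, state), 0.0)
    for i in range(len(genotype)):
        for j in range(i + 1, len(genotype)):
            score += pairwise.get(((i, genotype[i]), (j, genotype[j])), 0.0)
    return score


def class_probabilities(score, thresholds):
    """Cumulative logit: P(class >= k) = logistic(score - thresholds[k - 1]).

    With thresholds [t_weak, t_strong] this returns [P(null), P(weak), P(strong)].
    """
    if any(hi <= lo for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    at_least = [1.0] + [logistic(score - t) for t in thresholds] + [0.0]
    return [at_least[k] - at_least[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical two-site example (states and effect sizes are invented)
score = genetic_score(
    ("A", "B"),
    intercept=-1.0,
    first_order={(0, "A"): 0.5},
    pairwise={((0, "A"), (1, "B")): 1.5},
)
probs = class_probabilities(score, thresholds=[0.0, 2.0])
print(score, [round(p, 3) for p in probs])
```

      In this toy case the score of 1.0 sits midway between the two thresholds, so the null and strong classes receive equal probability; changing any main or pairwise effect shifts the score and moves probability between classes only through the logistic link.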

      (2) Reviewer 1 observed that our model seems to prefer sets of several pairwise interactions among states across sites rather than fewer high-order interactions among those same states.

      This finding arises because the pattern of phenotypic variation across genotypes in our dataset is consistent with that which would be produced by pairwise interactions rather than by high-order interactions. In a reference-free framework, these patterns are distinct from each other: a group of second-order terms cannot fit the patterns produced by high-order epistasis, and high-order terms cannot fit the pattern produced by pairwise interactions. Similarly, main-effect terms cannot fit the pattern of phenotypes produced by a pairwise interaction, and a pairwise epistatic term cannot fit the pattern produced by main effects of states at two sites. For example, third-order terms are required when the genotypes possessing a particular triplet of states deviate from that expected given all the main and second-order effects of those states; this deviation cannot be explained by any combination of first- and second-order effects.

      We explain this point in detail in our recent manuscript (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057) and we summarize it here. Consider the simple example of two sites with two possible states (genotypes 00, 01, 10, and 11). If there are no main effects and no pairwise effects, this architecture will generate the same phenotype for all four variants – the global average (or zero-order effect). If there are pairwise effects but no main effects, this architecture will generate a set of phenotypes on which the average phenotype of genotypes with a 0 at the first site (00 and 01) equals the global average – as does the average of those with 0 at the second site (00 and 10). The epistatic effect causes the individual genotypes to deviate from the global average. This pattern can be fit only by a pairwise epistatic term, not by first-order terms. Conversely, if there are main effects but no pairwise effects, then the average phenotype of genotypes 00 and 01 will deviate from the global average (by an amount equal to the first-order effect), as will the average of 00 and 10: the phenotype of each genotype will be equal to the sum of the relevant first-order effects for the states it contains. This pattern cannot be fit by second-order model terms. The same logic extends to higher orders: a cluster of second-order terms cannot explain variation generated by third-order epistasis, because third-order variation is by definition the deviation from the best second-order model.
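      The two-site, two-state argument can be checked numerically. The Python sketch below is a minimal illustration of a reference-free decomposition for a complete set of four genotypes; it is not the code from the repository cited in this response, and the two phenotype maps are invented. The zero-order term is the global mean, each first-order effect is the mean phenotype of genotypes carrying a state minus the global mean, and the pairwise term is whatever remains.

```python
from itertools import product
from statistics import mean


def reference_free_effects(phenotype):
    """Reference-free decomposition for two sites with states '0' and '1'.

    `phenotype` maps each of the genotypes '00', '01', '10', '11' to a value.
    Returns (zero-order term, first-order effects, pairwise effects).
    """
    zero = mean(phenotype.values())  # global average
    first = {}
    for site, state in product(range(2), "01"):
        carriers = [g for g in phenotype if g[site] == state]
        # Average over all genotypes carrying this state, relative to the global mean
        first[(site, state)] = mean(phenotype[g] for g in carriers) - zero
    # Pairwise term: deviation not explained by the zero- and first-order terms
    pairwise = {
        g: phenotype[g] - zero - first[(0, g[0])] - first[(1, g[1])]
        for g in phenotype
    }
    return zero, first, pairwise


pure_pairwise = {"00": 1.0, "01": -1.0, "10": -1.0, "11": 1.0}
pure_main = {"00": 0.0, "01": 1.0, "10": 2.0, "11": 3.0}

for name, pheno in (("pairwise only", pure_pairwise), ("main effects only", pure_main)):
    zero, first, pairwise = reference_free_effects(pheno)
    print(name, zero, first, pairwise)
```

      For the first map every first-order effect is zero and all variation is carried by the pairwise terms; for the second, every pairwise term is zero and the variation is carried by first-order effects of 1.0 (site 1) and 0.5 (site 2) for state 1.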

      (3) Reviewer 1 suggested several places in the text where citations to prior work would be appropriate.

      We appreciate these suggestions and have modified the manuscript to refer to most of these works.

      (4) Reviewer 1 pointed to the paper of Gong et al. (eLife 2013) and asked whether it is known how robust the proteins in our study are to changes in conformation/stability compared to other proteins, and whether this might impact the likelihood of observing higher-order epistasis in this system.

      The DBDs that we study here are very stable, and previous work shows that mutations affect DNA specificity primarily by modifying the DBD’s affinity rather than its stability (McKeown et al., Cell 2014). Additionally, Gong et al.’s findings pertain to a globally nonlinear relationship between stability and function, which arises from the Boltzmann relationship between the energy of folding and occupancy of the folded state. Because our data are categorical – based on rank-order of measured phenotype rather than fluorescence as a continuous phenotype – the kind of global nonlinearity observed in Gong’s study is not expected to produce spurious estimates of epistasis in our work. We have modified the discussion to address this point.

      (5) Reviewer 1 asked a) why the epistatic models produce landscapes on which variants have fewer neighbors on average than main-effects only models and b) why the average distance from all ERE-specific nodes to all SRE-specific nodes is greater with epistasis (but the average distance from ERE to nearest SRE is lower with epistasis).

      In the main effects-only landscape, the functional genotypes are relatively similar to each other, because each must contain several of the states that contribute the most to a positive genetic score. Moreover, ERE-specific nodes are similar to each other, and SRE-specific nodes are similar to each other, because each must contain one or more of a relatively small number of specificity-determining states. When epistasis is added to the genetic architecture, two things happen: 1) more genotypes become functional because there are more combinations that can exceed the threshold score to produce a functional activator and 2) these additional functional variants are more different from each other – in general, and within the classes of ERE- or SRE-specific variants – because there are now more diverse combinations of states that can yield either phenotype. As a result, a broader span of sequence space is occupied, but ERE- and SRE-specific variants are more interspersed with each other. This means that the average distance between all pairs of nodes is greater, and this applies to all ERE-SRE pairs, as well. However, the interspersing means that the closest single SRE to any particular ERE is closer than it was without epistasis. We have added this explanation to the main text.
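      A toy calculation with invented six-site genotypes (not genotypes from the dataset) illustrates how the two distance summaries can move in opposite directions: spreading and interspersing the ERE- and SRE-specific sets raises the mean distance over all ERE-SRE pairs while lowering the mean distance from each ERE-specific genotype to its nearest SRE-specific genotype.

```python
from statistics import mean


def hamming(a, b):
    """Number of sites at which two equal-length genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_summary(ere, sre):
    """Return (mean distance over all ERE-SRE pairs, mean distance to the nearest SRE)."""
    all_pairs = mean(hamming(e, s) for e in ere for s in sre)
    nearest = mean(min(hamming(e, s) for s in sre) for e in ere)
    return all_pairs, nearest


# Hypothetical toy sets (not data from the study)
clustered = (["AAAAAA", "AAAAAB"], ["AAABBA", "AAABBB"])     # each class tightly grouped
interspersed = (["AAAAAA", "BBBBBB"], ["AAAAAB", "BBBBBA"])  # classes spread out and mixed

print("clustered:", distance_summary(*clustered))
print("interspersed:", distance_summary(*interspersed))
```

      In this toy case the all-pairs mean rises from 2.5 to 3, while the mean nearest-neighbor distance falls from 2 to 1.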

      (6) Reviewer 2 asked us to explain why average path length increases with pairwise epistasis as the strength of selection for specificity increases.

      This behavior occurs because of the existence of a local peak in the pairwise model. Genotypes on this peak had few connections to other genotypes, all of which were less SRE-specific. Thus, with strong selection, i.e., a large population size, the simulations became stuck on the local peak, cycling among its genotypes many times before leaving, which resulted in a large increase in the mean step number. As shown in the rest of the figure, when the longest set of paths is removed, there are still differences in the average number of steps with and without epistasis. This issue is described in the methods section.

      (7) Reviewers made several suggestions for clarity in the text and figures.

      We have modified the paper to address all of these comments.

      (8) Reviewer 3 stated that the code should be available.

      The code is available at https://github.com/JoeThorntonLab/DBD.GeneticArchitecture.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors were trying to understand the relationship between the development of large trunks and longirostrine mandibles in bunodont proboscideans of the Miocene, and how this relationship reflects variation in dietary patterns.

      Strengths:

      The study is very well supported, written, and illustrated, with plenty of supplementary material. The findings are highly significant for understanding the diversification of bunodont proboscideans in Asia during the Miocene, as well as for explaining the cranial/jaw disparity of fossil lineages. Using several methods, this work elucidates the diversification of paleobiological aspects of fossil proboscideans and their evolutionary response to open environments in the Neogene. The authors included all Asian bunodont proboscideans with long mandibles, and I suggest that they use the expression "bunodont proboscideans" instead of gomphotheres.

      Weaknesses:

      I believe that the only weakness is the lack of discussion comparing their results with the development of gigantism and long limbs in proboscideans from the same epoch.

      Thank you for your comprehensive review and positive feedback on our study regarding the co-evolution of feeding organs in bunodont proboscideans during the Miocene. We appreciate your suggestion and have decided to use the term "bunodont elephantiforms" instead of "gomphotheres" (we use "elephantiforms" to explicitly exclude some early proboscideans, such as Moeritherium); we will make this change in our revised manuscript. We also appreciate your pointing out the lack of discussion comparing our results with the development of gigantism and long limbs in proboscideans from the same epoch. We agree with this suggestion, and we are aware that gigantism and long limbs are potential factors in trunk development. Gigantism resulted in a loss of flexibility in elephantiforms, and long limbs made it more challenging for them to reach the ground; a long trunk serves as compensation for these limitations. However, limb bones were rarely found in our material, especially those preserved in association with the skull.

      Reviewer #2 (Public Review):

      This study focuses on the eco-morphology, feeding behaviors, and co-evolution of feeding organs of longirostrine gomphotheres (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae), which are characterised by their distinctive mandible and mandibular tusk morphologies. They also show different evolutionary stages of food-acquisition organs, which may have co-evolved with their extremely elongated mandibular symphyses and tusks. Although these three longirostrine gomphothere families were widely distributed in northern China in the Early-Middle Miocene, their relative abundances and distributions changed through time as a result of climatic and ecosystem changes.

      These three groups have different feeding behaviors, indicated by their different mandibular symphysis and tusk morphologies. Additionally, they show different evolutionary stages of the trunk, which are reflected in the morphology of the narial region. To reconstruct the feeding behavior of early elephantiformes and the relationship between their mandibles and trunks, the authors examined crania and mandibles of these three groups from the Early and Middle Miocene of northern China, housed in three different museums, and performed several analyses.

      The analyses made in the study are:

      (1) Finite Element (FE) analysis: They conducted two kinds of tests: a distal-force test and a twig-cutting test. The distal-force test established the advantageous and disadvantageous mechanical performance of each group under distal vertical and horizontal external forces. In the twig-cutting test, a cylindrical twig model with orthotropic elastoplasticity was applied in three directions to the distal end of the mandibular tusk, and the sum of the equivalent plastic strain (SEPS) was calculated. The results indicate that all three groups have different mandibular specializations for cutting plants.

      (2) Phylogenetic reconstruction: These groups have different narial region morphologies and, correspondingly, different stages of trunk evolution. The phylogenetic tree shows the degree of specialization of the narial morphology, and the evolutionary level of the narial region is correlated with that of the character combination related to horizontal cutting. In the trilophodont longirostrine gomphotheres, co-evolution between the narial region and horizontal cutting behaviour is strongly suggested.

      (3) Enamel isotopes analysis: The results of stable isotope analysis indicate an open environment with a diverse range of habitats and that the niches of these groups overlapped without obvious differentiation.

      The analysis shows that different eco-adaptations led to the diverse mandibular morphologies and that open-land grazing drove the development of trunk-specific functions and the loss of the long mandible. This conclusion is supported by evidence from the palaeoecological reconstruction, the reconstruction of feeding behaviors, and the detailed examination of mandibular and narial region morphology.

      All of the analyses are explained in detail in the supplementary files. The 3D models and movies in the supplementary files are detailed and understandable and explain the conclusion. The conclusions of the study are well supported by data.

      We appreciate your detailed and insightful review of our study. Your summary accurately captures the essence of our research, and we are pleased to note that multiple research methods were used to demonstrate our conclusions. Your recognition of the evidence-based conclusions from paleoecological, feeding behavior reconstruction, and morphological analyses reinforces the validity of our findings. Once again, we appreciate your time and thoughtful reviews.

      Reviewer #1 (Recommendations For The Authors):

      Thank you very much for the invitation to review this amazing manuscript. It is very well written and supported, and I have only minor suggestions to improve the text:

      (1) Some references are not in chronological sequence in the text, and this should be reviewed.

      We greatly appreciate the positive comments of the reviewer. We have revised the references in the manuscript following the reviewer’s suggestion.

      (2) I suggest the use of the expression "bunodont proboscideans" instead of Gomphotheres because there is no agreement on whether Amebelodontidae and Choerolophodontidae are within Gomphotheriidae, as well as some brevirostrine bunodont proboscideans from South America. So I think it is ok to use "Gomphotheriidae", but not gomphotheres to refer to all bunodont proboscideans included in the study.

      The reviewer is correct. Using “gomphotheres” to refer to these three groups is inappropriate. We have replaced “gomphotheres” with “bunodont elephantiforms” throughout the entire manuscript. Here, we use “elephantiforms”, not “proboscideans”, to avoid confusion with some early proboscidean members, such as Moeritherium.

      (3) I was expecting some discussion on the development of large trunks related to the gigantism in these bunodont proboscideans, regarding the huge skulls and the columnar limbs.

      We appreciate this suggestion, and we are aware that gigantism is a potential factor for trunk development. It is difficult to compare the three groups (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae) in terms of their weight and limb bone length, because in our material, limb bones were rarely found, especially those associated with cranial material. Nevertheless, at this stage, all elephantiforms had significantly enlarged cranial sizes and limb bone lengths compared to early members like Phiomia. Gigantism caused the loss of flexibility in elephantiforms, and even the long limbs made it more difficult for an elephantiform to reach the ground. A long trunk compensates for this evolutionary change. Exploring these aspects further is a part of our future work.

      (4) The reference to Alejandro et al should be replaced by Kramarz et al (and the correct surname of the authors). The name and surname of this reference need to be corrected. The correct names are Kramarz, A., Garrido, A., Bond, M. 2019. Please correct this in the text too.

      We thank the reviewer for catching this error. This reference has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      I believe your paper will lead to other studies on other Proboscidean groups on the evolution of the mandible and trunk. There are some corrections in the text:

      • In line 199 in the text in pdf, "Tassy, 1994" should be "Tassy, 1996".

      • In line 241, "studied" should be "studies"

      • In line 313, "," after the word "tool" should be "."

      We appreciate the reviewer for pointing these errors out and have revised these based on the suggestions.

      • In the References, you write "et al." in some references. You should write the names of all of the authors.

      • In the References: "Lister AM. 2013" and "Shoshani&Tassy" are not referenced in the text.

      • In the References: "Tassy P. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn." should be "Tassy P. 1994. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn. 112, 1-2, 101-117" and placed before "Tassy P. 1996".

      We appreciate the reviewer’s suggestions and have revised these references.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1

      The authors provided experimental data in response to my comments/suggestions in the revision. Overall, most points were appropriate and satisfactory, but some issues remain.

      (1) It is not fully addressed how atypical survivors are generated independently of Rad52-mediated homologous recombination.

      The newly provided data indicate that the formation of atypical telomeres is independent of the Rad52 homologous recombination pathway.

      "The atypical telomeres clones exhibit non-uniform telomere pattern", but the TG-hybridized signals after XhoI digestion are clear and uniform.

      "Atypical telomere" clones may carry circular chromosomes embedded with short TG repeats, rather than linear chromosomes. In other words, atypical telomeres may differ from telomeres, the ends of chromosomes. Is atypical telomere formation dependent on NHEJ? Given that "two chromosomes underwent intra-chromosomal fusions" (Line 248), are atypical telomere clones detected frequently in SY13 cells containing two chromosomes?

      We thank the reviewer for these questions. Frankly, we have not been able to determine the chromosome structures in these so-called "atypical survivors". As we mentioned in the manuscript, they could contain mixed telomere structures, e.g. TG tract amplification, intra-chromosome telomere fusion and inter-chromosome telomere fusion. Worse still, these "atypical survivors" may not have maintained a stable genome, and their karyotype may have undergone stochastic changes during passages. To avoid misunderstanding, we have changed the term "atypical" to "uncharacterized" in the revised manuscript.

      We have previously shown that deletion of YKU70 does not affect MMEJ-mediated intra-chromosome fusion in single-chromosome SY14 cdc13Δ cells (Wu et al., 2020). In SY12 cells, double knockout of TLC1 and YKU resulted in synthetic lethality, so we were unable to continue this investigation. The synthetic lethality of the TLC1 YKU70 double deletion was shown in Figure 7B of Reviewed Preprint version 1; in accordance with the reviewer's instructions, this result was not included in Reviewed Preprint version 2.

      "Atypical" survivors could be detected in SY13 cells (Figure 1D), but they appeared to form less frequently in the SY13 strain than in SY12. As one might expect, because SY13 contains two chromosomes, its survivors should show a higher frequency of intra-chromosome fusions.

      (2) From their data, it is possible that X and Y elements influence homologous recombination, type 1 and type 2 (type X), at telomeres. In particular, the presence of X and Y elements appears to be important for promoting type 1 recombination. In other words, although not essential, subtelomeres have some function in maintaining telomeres. I suggest that the authors include author response image 4 in the text. They could revise their conclusion and the paper title accordingly.

      Following this suggestion, we have included Author response image 4 in the revised manuscript as Figure 2E, Figure 5D, Figure 6C and Figure 6E. Accordingly, we have changed the title to “Elimination of subtelomeric repeat sequences exerts little effect on telomere essential functions in Saccharomyces cerevisiae”.

      (3) Minor points: The newly added data indicate that X survivors are generated in a type 2-dependent manner. The authors could discuss how Y elements were eroded while retaining X elements (line 225, Figure 2A).

      We thank the reviewer for this suggestion and have discussed this point in the revised manuscript (p. 13, lines 244–245). When telomeres become deprotected, chromosome ends undergo resection. Because SY12 has only one Y’-element, it is difficult to find homologous sequences to repair the Y’-element on XVI-L. Once the X-element on XVI-L is exposed by further resection, homologous sequences for repair are easier to find. Thus, in Type X survivors the Y’-element was eroded while the X-element was retained.

      Reviewer #2

      I would like to congratulate the authors for their work and the efforts they put into improving the manuscript. The major criticism I had previously, i.e. testing the genetic requirements for the survivor subtypes, has been met. Below are a few minor comments that don't necessarily require a response.

      (1) I think the Author response image 6 could have been included in the manuscript. I understand that the authors don't want to overinterpret survivor subtype frequencies, but this figure would have suggested some implication of Rad51 in the emergence of survivors even in the absence of Y' elements. At this stage, however, it is up to the authors, and leaving this figure out is also fine in my opinion.

      According to the suggestion, the author response image 6 has been presented as Figure 6—figure supplement 7.

      (2) Chromosome circularization seems to rely on microhomologies. Previously, the authors proposed that SY14 circularization depended on SSA (Wu et al. 2020), but here, since circularization appears to be Rad52-independent, it is likely to be based on MMEJ rather than SSA (although there are contradictory results on Rad52's role in MMEJ in the literature).

      We agree, and we have noted this in the revised manuscript.

      (3) p. 28 lines 511-513: "The erosion sites and fusion sequences differed from those observed in SY12 tlc1Δ-C1 cells (Figure 2D), suggesting the stochastic nature of chromosomal circularization": I don't think they are necessarily stochastic, because the sequences beyond the telomeres are now modified, the available microhomologies have changed as well.

      We agree. Individual chromosomes appear to have hotspots for fusion. For example, in Figures 6C and 6F the resection sites on Chr1 and Chr2 were the same in SY12XYΔ+Y tlc1Δ-C1 and SY12XYΔ tlc1Δ-C1. We therefore speculate that hotspots for chromosome fusion exist, but which site a cell uses in any given fusion event is stochastic.

      (4) Typos and other errors:

      • p. 3 line 52: "subtelomerice" and "varies" are misspelled.

      • p. 5 line 78: "processes" should be "process".

      • Supp files are mislabelled (the numbers do not correspond to file name).

      • Supp file 2: how come SY12 has only one Y' element and SY13 has two?

      • p. 10 line 175: "emerging" should be "emergence".

      • p.15 line 276: "counter-selected" should be "being counter-selected" or "counterselection".

      • p. 29 line 523: "the formation of them" should be "their formation".

      • p. 37 line 653: "could have been an ideal tool": the sentence is grammatically incorrect. Writing "AND could have been an ideal tool" is enough to make it structurally correct.

      Thank you for pointing out these errors. We have corrected them in the revised manuscript. Regarding the question “how come SY12 has only one Y' element and SY13 has two?”, we are not certain at present. We speculate that one of the Y’ elements might have been lost during engineering of the chromosomes with the CRISPR–Cas9 system.

      Reviewer #3

      The authors included statistical analyses of the qPCR data (Fig 4B) as requested, but did not comment on the striking difference in expression of MPH3 and HSP32 in the SY12 strain compared to BY4742. An improvement of the manuscript is the inclusion of rad52 tlc1 strains in their analyses, demonstrating that the "atypical and circular survivors" arose independently of homologous recombination. In addition, by analyzing rad51 and rad50 mutant strain they could demonstrate that the "type X" survivors had similar molecular requirements to type II survivors. Overall, the revised submission improves the article.

      We thank the reviewer for the comments and suggestions. The SY12 strain (with three chromosomes) exhibited lower expression of both MPH3 and HSP32 than the parental strain BY4742 (with 16 chromosomes). We speculate that, because the number of chromosomes (and hence telomeres) is reduced, silencing proteins are no longer titrated away by the telomeres that have been deleted. We have added these comments to the revised manuscript.

      Wu, Z.J., Liu, J.C., Man, X., Gu, X., Li, T.Y., Cai, C., He, M.H., Shao, Y., Lu, N., Xue, X., et al. (2020). Cdc13 is predominant over Stn1 and Ten1 in preventing chromosome end fusions. Elife 9.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This valuable study describes a new role of epithelial intercellular adhesion molecule 1 (ICAM-1) protein in controlling bile duct size. The effect is mediated via EBP-50 and subapical actomyosin to regulate size of bile canaliculi. These solid findings have theoretical and practical implications in hepatology and human disorders of bile ducts.

      Public Reviews:

      In this study, Cacho-Navas et al. describe the role of ICAM-1 expressed on the apical membrane of bile canaliculi and its function to control the bile canaliculi (BCs) homeostasis. This is a previously unrecognized function of this protein in hepatocytes. The same authors have previously shown that basolateral ICAM-1 plays a role in controlling lymphocyte adhesion to hepatocytes during inflammation and that this interaction is responsible for the loss of polarity of hepatocytes during disease states.

      This new study shows that ICAM-1 is mainly localized in the apical domain of the BC and, in association with EBP-50, communicates with the subapical acto-myosin ring to regulate the size and morphology of the BC. The authors used the well-known immortalized liver cell line HepG2, in which they deleted the ICAM-1 gene by CRISPR-Cas9 editing, and hepatic organoids derived from WT and ICAM-1-KO mice, and performed both KO and rescue experiments. They show that in the absence of apical ICAM-1, the BCs become dilated.

      The data sufficiently support the conclusions of the study.

      Recommendations for the authors:

      We would like to thank the editor and reviewer for recognizing the manuscript's value and the solid nature of the data. We are also thankful to them for acknowledging that the data support the conclusions. Below, we address their comments and questions point by point:

      We have a few suggestions to improve the manuscript:

      (1) HepG2 cells form canaliculi-like structures but are not the ideal system to study apical-basal polarity. On the other hand, hepatic organoids can assume a hepatocyte-like phenotype when cultured under specific conditions, but they are not functionally comparable to hepatocytes: they are organized in a 3D structure with a hollow lumen that does not recapitulate the physiological structure of the BC. Therefore, primary hepatocytes in collagen sandwich culture would be the best model to study the polarization of BCs; these could be isolated from WT and ICAM-1-KO mice, which are available. Some of the major findings should be confirmed in this system.

      We adopted hepatic organoid culture as an experimental strategy because of the difficulties we experienced in culturing primary hepatocytes in previous analyses (Reglero-Real, Cell Rep, 2014). The generation of organoids or mature hepatocytes from various sources of stem cells is a commonly employed strategy in hepatocyte cell biology (Meyer et al. EMBO Rep, 2023), owing to the difficulty of maintaining mature hepatic epithelial cell cultures for longer than a few hours.

      The hepatic organoids used in the manuscript are accepted as advanced cellular models in a broad range of fields (Belenguer, Nat Commun, 2022; de Crignis, eLife, 2021; Huch, Cell, 2015). Although they show some morphological differences from real hepatocytes, we conducted a thorough characterization of their organization, identifying canalicular-like structures with functional (CFDA) and molecular (HA-4) markers, which we believe adds value to the manuscript. In addition, the organoid technology allowed us to import the bipotent precursors and obtain a permanent source of hepatic cells without the need to import and use ICAM-1_KO mice, in line with current guidelines to reduce animal experimentation.

      Taking this into account, and to further validate the data obtained with our cellular systems, we quantified the canalicular diameter in livers from WT and ICAM-1_KO mice (New Figure 8B), which validates our data on human cell lines and organoids. We acknowledge that data obtained from hepatic tissues cannot rule out a contribution of immune cell adhesion to changes in hepatocyte architecture. However, these experiments, together with the aforementioned organoids and human cell lines, strongly suggest a role for hepatic ICAM-1 in regulating canalicular size.

      (2) Overexpression of proteins was used in the study. While this approach is an easier means to visualize, without the use of specific antibodies, it is known to alter the distribution of the protein compared to the endogenous one.

      Most of our characterization has been done with antibodies or other fluorescent tools against endogenous proteins localized at BCs: CD59, F-actin, EBP50, MHC, MLC…. In addition, we have included MDR1-GFP and GFP-Rab11, the latter to analyze the subapical compartment (SAC) surrounding BCs. As requested by the reviewer, we now include in a new Supplementary Figure 1C the confocal analyses of endogenous canalicular markers, radixin and MRP2, as well as a new Supplementary Figure 1D containing the staining of an endogenous marker of the SAC, plasmolipin/PLLP (Fraticelli et al, Nat Cell Biol, 2015; Cacho-Navas, Cell Mol Life Sci, 2022), which is consistent with the previous analyses performed with GFP-Rab11.

      (3) In the absence of ICAM-1, BCs change shape and dimension but still show the presence of microvilli. What happens to the distribution of polarized transporters like Mrp2, or the transport of bile acids (CFDA clearance) in vivo in the KO animal?

      Thank you for this comment. We have analyzed this transporter in murine livers and human hepatic cells. MRP2 distribution does not change significantly, and MRP2 remains concentrated in BCs in ICAM-1_KO livers (New Figure 8C). Likewise, ICAM-1 gene editing does not affect MRP2 localization in the polarized human hepatic epithelial cell line in vitro (Supplementary Figure 1C). We cannot rule out changes in this transporter in other murine liver cell types in vivo, such as sinusoidal endothelial cells, which we believe should be addressed in a separate study.

      (4) Does the lack of ICAM-1 affect the cell viability, proliferation or cell size?

      ICAM-1_KO cells proliferate slightly more slowly than their WT counterparts, with no detectable changes in cell size or cell death. We present these data in Supplementary Figure 1, A and B.

      (5) Are the findings recapitulated in the livers of ICAM-1 KO animals?

      ICAM-1 KO animals present enlarged BCs, which is consistent with the main findings of the manuscript (Figure 8B).

      The text needs to be more concise. Some of the concepts, in particular those already published, should be condensed. There is a large amount of experiments that are difficult to connect logically. Possibly, cartoons summarizing the approach of the figure could help the reader.

      The text of the Results and Discussion sections has been shortened by almost 100 words, even though the additional panels and experiments are now described and discussed. New cartoons have been added in Figure 5G and Figure 8F, in addition to those previously included in Figure 1 and Supplementary Figure 6, the latter containing a graphical description of the main conclusions.

      Also, more detailed information about statistical analysis (what post-test was used?), concentration of cytokines, and description of the mouse model should be included in the methods.

      Cytokine concentrations have been included in the legend of Figure 3 and in the Cell and Culture section of Methods. A brief description of the ICAM-1_KO mouse and the corresponding reference for further information are also provided in the Organoid Culture section of Methods. A statistical analysis section describing the post-test used is also included at the end of Methods. The references of the anti-plasmolipin, anti-radixin and anti-MRP2 antibodies, as well as the new fixation methods used for immunofluorescence, are included in the Antibody List and in the Confocal Microscopy section of Methods, respectively.

      Figure 3D. Sample names should be added as in the rest of the figures.

      The arrangement of sample names in Figure 3D has been revised and is now similar to that of Figure 3A.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Yao et al. explored the transcriptomic characteristics of neural stem cells (NSCs) in the human hippocampus and their changes under different conditions using single-nucleus RNA sequencing (snRNA-seq). They generated single-nucleus transcriptomic profiles of human hippocampal cells from neonatal, adult, and aging individuals, as well as from stroke patients. They focused on the cell groups related to neurogenesis, such as neural stem cells and their progeny. They revealed genes enriched in different NSC states and performed trajectory analysis to trace the transitions among NSC states and towards astroglial and neuronal lineages in silico. They also examined how NSCs are affected by aging and injury using their datasets and found differences in NSC numbers and gene expression patterns across age groups and injury conditions. One major issue of the manuscript is questionable cell type identification. For example, more than 50% of the cells in the astroglial lineage clusters are NSCs, which is extremely high and inconsistent with classic histology studies.

      While the authors have made efforts to address previous criticisms, major concerns have not been adequately addressed, including a very limited sample size and poor patient information. In addition, some analytical approaches are still questionable, and the authors acknowledged that they cannot address some of them. Therefore, while the topic is interesting, some results are preliminary and some conclusions are not fully supported by the data presented.

      We thank the reviewer for reevaluating our revised manuscript. We respect the reviewer’s comments and now discuss the technical and conceptual limitations of this work. Our responses to Reviewer #1 (Public Review) are provided below.

      Firstly, we appreciate the concern raised by Reviewer 1 regarding the high proportion of NSCs within the astroglial lineage clusters. It is worth mentioning that distinguishing hippocampal qNSCs from astrocytes by transcriptional profiling poses a significant challenge in the field because of their high transcriptional similarity. In the previous global UMAP analysis, AS1 (adult-specific) could be separated from qNSCs, but AS2 (NSC-like astrocytes) could not. Therefore, the data presented in Figure 2C to G aimed to further distinguish qNSCs from AS2 using gene set score analysis. Based on the different scores, we categorized the qNSC/AS lineages into qNSC1, qNSC2 and AS2. Figure 2C presents the UMAP plot of the qNSC/AS2 population from the neonatal sample only. We apologize for not clarifying this in the figure legend, and we have now added this information to the legend of Figure 2C. More importantly, we have added UMAP plots and quantifications for the other groups (adult, aging, and injury samples) in Figure 2-Supplement 2A and B. This supplementary figure provides more complete information on cell type composition and its dynamic variation during aging and injury. Although the proportion of NSCs in the astroglial lineage clusters remains higher than in classic histology studies, the trends indicate a reduction in qNSCs and an increase in astrocytes during aging and injury, which supports that cell type identification by gene set score analysis is effective, although still not optimal. Combined methods to accurately distinguish qNSCs from astrocytes will be required in the future, and we also discuss this in the corresponding text.

      Secondly, we cannot adequately address the major concern regarding sample size raised by the reviewer due to the scarcity of stroke and neonatal human brain samples. We have collected additional details about the donors. Please refer to Figure 1-source data 1 for the updated information. Other information regarding the lifestyle parameters of these donors has not been sufficiently recorded by the hospital. Therefore, we cannot improve the patient information further.

      Thirdly, regarding the questionable subpopulations of granule cells (GCs) derived from neuroblasts in Figure 4A-4D, which are inconsistent with previous single-cell transcriptomic studies, we tried various strategies to confirm the identity of these two subpopulations but could not reach a clear conclusion. As a result, we can only provide an objective description of the differences in their gene expression and developmental trajectories, and speculate that these differences may be related to their degree of maturity, although the two subpopulations do not lie on the same trajectory.

      Finally, we have discussed the technical and conceptual limitations of this work and added a brief discussion of these limitations in the last paragraph of the main text. We hope readers will interpret our data critically and objectively.

      Reviewer #2 (Public Review):

      In this manuscript, Yao et al. present a series of experiments aiming at generating a cellular atlas of the human hippocampus across aging, and how it may be affected by injury, in particular, stroke. Although the aim of the study is interesting and relevant for a larger audience, due to the ongoing controversy around the existence of adult hippocampal neurogenesis in humans, a number of technical weaknesses result in poor support for many of the conclusions drawn from these experiments.

      In particular, a recent meta-analysis of five previous studies applying similar techniques to human samples has identified different aspects of sample size as the main determinants of the statistical power needed to draw significant conclusions. Some of these aspects are the number of nuclei sequenced and subject stratification. Both are of concern in Yao's study. First, the number of sequenced nuclei is lower than the calculated number of nuclei required for detecting rare cell types. However, Yao et al. report succeeding in detecting rare populations, including several types of neural stem cells in different proliferation states, which previous studies have demonstrated to be extremely scarce. It would be very interesting to read how the authors interpret these differences. Secondly, the number of donors included in some of the groups is extremely low (n=1), and the information provided about the donors is practically nonexistent. As individual factors such as chronic conditions, medication, and lifestyle parameters are considered determinants of the variability of adult hippocampal neurogenesis across individuals, this represents a serious limitation of the current study. Overall, several technical weaknesses severely limit the relevance of this study and the ability of the authors to achieve their experimental aims.

      After a first review round, the manuscript still lacks a clear discussion of its several technical limitations, which would help the audience to grasp the relevance of the findings. In particular, detailed information about individual patients' health status and relevant lifestyle parameters that may have affected it is lacking. The authors themselves make the point that "the discrepancies among studies might be caused by health state differences across hippocampi, which subsequently lead to different degrees of hippocampal neurogenesis." So, even in the authors' own interpretation, this is a serious limitation that, although out of the authors' control, impacts the quality of their findings.

      Reviewer #2 (Recommendations For The Authors):

      Please see the public review. I do understand the authors' point about incomplete patient data collection and low patient numbers, and that the former is out of their control. Nevertheless, these are crucial parameters that negatively impact the quality and relevance of several of the bold claims in the manuscript, especially given the low number of patients included. The current version still lacks a clear and honest discussion, for the readership, of the several technical and conceptual limitations of the authors' work (some of which are presented to the reviewers in the rebuttal letter), so that readers could critically evaluate the relevance of the authors' findings in a bigger perspective.

      We thank the reviewer for reevaluating our revised manuscript. We respect the reviewer’s comments and now discuss the technical and conceptual limitations of this work. Our responses to Reviewer #2 (Public Review) are provided below.

      We understand the reviewer’s concern and have also noted that, according to the computational modeling conducted by Tosoni et al. (Neuron, 2023), at least 21 neuroblasts (NBs) can be identified out of 30,000 granule cells (GCs) from a total of 180,000 dentate gyrus (DG) cells. In our dataset, we sequenced 24,671 GC nuclei and 92,966 total DG nuclei, including neonatal samples. The number of nuclei we sequenced is 4.5 times higher than that of Wang et al. (Cell Research, 2022), who also detected NBs. Therefore, it is possible that we were able to detect NBs. Importantly, we implemented strict quality control measures to support the reliability of our sequencing data: (1) tissue samples were collected soon after death (3–4 h post-mortem) to ensure the quality of the isolated nuclei; (2) only nuclei expressing more than 200 genes but fewer than 5,000–8,600 genes (depending on the peak of gene enrichment) were retained, and on average around 3,000 genes were detected per cell; (3) the average proportion of mitochondrial genes in each sample was approximately 1.8%, with no sample exceeding 5%. We have shown that the number of cells captured from individual samples and the average number of genes detected per cell are sufficient, indicating overall good sequencing quality (Figure 1-supplement 1A, B and F, and Figure 1-source data 1). Additionally, we further confirmed the presence of these low-abundance cell types by integrating immunofluorescence staining (Figures 4E, 5D and 6B), cell type-specific gene expression (Figure 1C and D), overall transcriptomic characteristics (Figure 1-supplement 1E), and developmental potential (Figure 4A–D, Figure 6E and F). We hope this evidence together explains why we can identify these rare neurogenic populations.

      Regarding the limited sample size and poor patient information, we cannot adequately address these two major concerns. Due to the scarcity of stroke or neonatal human samples, it was not feasible to collect a larger sample size within the expected timeframe. We have collected additional details about the donors. Please refer to Figure 1-source data 1 for the updated information. Other information regarding the lifestyle parameters of these donors has not been sufficiently recorded by the hospital. Therefore, we cannot improve the patient information further.

      As per the reviewer’s recommendation, in the latest version we have discussed the technical and conceptual limitations of this work and added a brief discussion of these limitations in the last paragraph of the main text. We hope readers will interpret our data critically and objectively.

    1. Author Response

      We thank both reviewers for the positive evaluation of our work and suggestions on how to improve it.

      We agree with Reviewer #1 that reporting uncertainties will both clarify and strengthen our arguments. Where applicable, uncertainties will be added in a revised version.

      Regarding Reviewer #2’s suggestion to include free energy calculations to estimate the free energies of hydrogen bond and hydrophobic interactions: current free energy methods can give accurate estimates of the relative binding free energies of similar ligands; however, accurate calculation of the absolute free energies of hydrogen bond and hydrophobic interactions is not yet feasible.

      Again, we thank the reviewers for their assessment and suggestions. We will update the manuscript as we have outlined above.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review

      [...] A particular strength of the present study is the structural characterization of human PURA, which is a challenging target for structural biology approaches. The molecular dynamics simulations are state-of-the-art, allowing a statistically meaningful assessment of the differences between wild-type and mutant proteins. The functional consequences of PURA mutations at the cellular level are fascinating, particularly the differential compartmentalization of wild-type and mutant PURA variants into certain subcellular condensates.

      Weaknesses that warrant rectification relate to (i) The interpretation of statistically non-significant effects seen in the molecular dynamic simulations.

      We removed the sentence that discussed statistically non-significant effects from the manuscript. This point has therefore been resolved.

      (ii) The statistical analysis of the differential compartmentalization of PURA variants into processing bodies vs. stress granules, and

      We re-analyzed all cell-biological data and adjusted the statistical analysis of P-body and stress-granule intensities. The new, improved statistics have replaced the original analyses in the corresponding figures (Figs. 1C and 2B).

      (iii) Insufficient documentation of protein expression levels and knock-down efficiencies.

      Quantification of protein expression levels by Western blotting is shown in Appendix Figure S1, and quantification of knock-down efficiencies by Western blotting is shown in Appendix Figure S3.

      Recommendations for the authors: Reviewer #1

      Concerns and Suggested Changes

      (a) I have only one concern about the computational part and that is about statements such as "There are also large differences in the residue surrounding the mutation spot (residues 90 to 100), where the K97E mutant also shows much greater fluctuation. However, these differences are not significant due to the large standard deviations." If the differences are not statistically significant, then I would suggest either removing such a statement or increasing the statistics.

      We agree with the Reviewer’s comment. We removed this sentence from the text.

      Recommendations for the authors: Reviewer #2

      General Comments

      This is a challenging structural target and the authors have made considerable efforts to determine the effect of several mutations on the structure and function. Many of the constructs, however, could not be expressed and/or purified in bacteria. However, it is not clear to what extent other expression systems (e.g. Drosophila or human) were considered and if this would have been beneficial.

      We did not use other expression systems because the wild-type protein is well-behaved when expressed in E. coli. If a mutant variant cannot be expressed or does not behave well in E. coli, this is a clear indication that the respective mutation impairs the protein’s integrity. Thus, by using E. coli as a reference system for all variants of the PURA protein, we could assess the influence of the mutations on structural integrity and solubility. Only for the variants whose expression in E. coli was not impaired did we continue to assess in more detail why they are nevertheless functionally impaired and cause PURA syndrome.

      Concerns and Suggested Changes

      (a) The schematic in Figure 3A would have been helpful for interpreting the mutations discussed in Figures 1 and 2. I would suggest moving it earlier in the text.

      We changed the figure according to the Reviewer’s suggestion.

      (b) I believe the RNA used for binding studies in Figures 3C and D was (CGG)8. Are the two "free" RNA bands a monomer and a dimer (duplex?)?

      Although we do not know for certain, it is indeed likely that the two free RNA bands represent either different secondary structures of the free RNA or a duplex of two molecules. Of note, PURA binds to both “free” RNA bands, indicating that it either does not discriminate between them or melts double-stranded RNA in these EMSAs.

      There also seems to be considerable cooperativity in the binding, so I wonder if a shorter RNA oligonucleotide might facilitate the measurement of Kds.

      The length of the RNA used was selected based on the estimated elongated size of full-length PURA and the presence of 3 PUR repeats. Assuming that one PUR repeat interacts with about 6-7 bases (data from the co-structure of Drosophila PURA with DNA; PDB-ID: 5FGP) and that full-length PURA forms a dimer consisting of three PUR repeats, the full-length protein in its extended form should cover a nucleic-acid stretch of about 24 bases.

      Also, it is not clear how the affinities were measured particularly for hsPURA III since free band is never fully bound at the highest protein concentration.

      It was not our goal to measure Kds for the interaction of PURA variants with RNA. The EMSA experiments were conducted to detect relative differences in the interaction between PURA variants and RNA. To estimate these differences, we measured the total intensity of the bound (shifted) and unbound RNA. Band intensities on the scanned EMSA gels were quantified with Fiji (ImageJ). We calculated the percentage of shifted RNA and normalized it. The hsPURA III fragment shows much lower affinity and therefore does not fully shift the RNA at the highest protein concentration, in contrast to full-length PURA and PURA I-II.
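      For illustration, the per-lane calculation described above can be sketched in a few lines of Python. This is a minimal sketch with hypothetical intensities, not data from this study; it assumes background-subtracted band intensities exported from the gel scan, and the normalization reference (here, the maximal shift in the titration) is an assumption, since the exact normalization is not specified above. No Kd is fitted, consistent with the purpose of the experiment.

```python
# Minimal sketch of EMSA band quantification (hypothetical values, not data from this study).
# Assumption: each lane yields background-subtracted bound (shifted) and unbound (free) intensities.

def percent_shifted(bound: float, unbound: float) -> float:
    """Percentage of the lane's RNA signal found in the shifted (protein-bound) band(s)."""
    total = bound + unbound
    if total <= 0:
        raise ValueError("total lane intensity must be positive")
    return 100.0 * bound / total


def normalize(percents, reference):
    """Express each lane relative to a reference value (assumed: the maximal shift in the titration)."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return [p / reference for p in percents]


# Illustrative titration: (bound, unbound) intensities for increasing protein concentrations.
lanes = [(0.0, 100.0), (20.0, 80.0), (45.0, 55.0), (60.0, 40.0)]
shifted = [percent_shifted(b, u) for b, u in lanes]
print(shifted)
print(normalize(shifted, reference=shifted[-1]))
```

      Comparing normalized curves between constructs in this way reports relative binding only; a construct that never reaches full shift (as for hsPURA III) simply plateaus lower.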

      (c) Do the human PURA I+II and dmPURA I+ II crystallize in the same space group and have similar packing? Can the observed structural flexibility be due to crystal contacts?

      hsPURA I+II and dmPURA I+II crystallize in different space groups with different crystal packing. In both cases, the asymmetric unit contains 4 independent molecules, with the flexible part of the structure, composed of strands β4 and β8 (β ridge), exposed to solvent. In the Drosophila structure, we do not observe any flexibility in either β-strand. In contrast, in the human PURA structure the β ridge exhibits considerable flexibility and adopts different conformations in all 4 molecules of the asymmetric unit. We observe similar flexibility of β4 and β8 (β ridge) in the structure of the K97E mutant, which contains 2 molecules in the asymmetric unit. We would add that we expect crystal contacts to stabilize rather than destabilize domains.

      Similarly, can the conformations observed for the K97E mutant be partially explained by packing?

      Regarding the sequence shift observed for the β5 and β6 strands in the hsPURA I+II K97E variant: although the β5 strand with the shifted amino acid sequence contacts another β5 strand of a symmetry-related molecule, we do not consider this interaction to be the source of the shift. To make sure that the shift is not forced by crystallization, we performed NMR measurements, which confirmed a strong change in the β-strands between WT and the K97E mutant in solution. This is an unambiguous indication that the structural changes observed in the crystal structure also occur in solution. In addition, the MD simulations provide further confirmation of our interpretation that K97E destabilizes the corresponding PUR domain. Taken together, we provide evidence from three different angles that the observed differences indeed affect the integrity and hence the function of the protein.

      (d) Perhaps, it is my misunderstanding, but I find the NMR data on the Arg sidechains for the K97E confusing. If they are visible for K97E and not WT, doesn't this indicate that there is an exchange between two conformations or more dynamics in the WT structure? This does not seem to be the opposite of the expectation if K97E is thought to have more conformational flexibility.

      Due to a technical issue (peak contour level), arginine side-chain resonances were not clearly visible in the WT spectrum. Figure 5F has been updated, and these resonances now correspond to those seen in the mutant spectrum. However, to prevent any confusion or over-interpretation, we removed the sentence regarding arginine side chains: "Intriguingly, arginine side chain resonances Nε-Hε were only visible in the K97E variant, while they were broadened out in the wild-type spectrum."

      (e) The most speculative part of the paper is the interpretation of SG and PB localization of PURA in Fig 1 and 2. There is an important issue with the statistics that must be clarified because it would appear that statistical significance was determined using each SG or PB as an independent measurement. This is incorrect and significance should be measured by only using the means of three biological replicates. This is well described here. It is not clear at this time if the reported P values will be confirmed upon reanalysis, and this may require reinterpretation of the data.

      We are grateful for this clarifying comment and agree that the statistical analysis of P-body and stress granule was misleading. Of note, while the figures depicted all the values independent of the biological repeats, the statistical analyses were done on the mean value of each replicate of each cell line and not all raw data points.

      We prepared new plots showing only the mean value of each replicate and re-calculated the P-values. The values changed only slightly in this new analysis because we now also included the previously labeled outliers (red points), to demonstrate that significance persists even when these points are considered.
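      The corrected procedure, in which each biological replicate contributes a single value, can be sketched as follows. This is a minimal illustration with hypothetical numbers; Welch's unequal-variance t statistic is used only as an example, since the specific test applied in the manuscript is not restated here. Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the corresponding P value directly.

```python
import math
from statistics import mean, stdev

# Hypothetical per-granule intensities grouped by biological replicate (illustrative values only).
wild_type = [[1.00, 1.10, 0.90], [1.20, 1.00], [1.05, 0.95, 1.10, 1.00]]
mutant = [[0.80, 0.90], [0.70, 0.85, 0.75], [0.95, 0.85]]


def replicate_means(values_by_replicate):
    """Collapse per-granule measurements to one mean per biological replicate (n = number of replicates)."""
    return [mean(values) for values in values_by_replicate]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


wt_means = replicate_means(wild_type)
mut_means = replicate_means(mutant)
t, df = welch_t(wt_means, mut_means)
print(len(wt_means), len(mut_means), round(t, 2), round(df, 1))
```

      The design point is that n equals the number of biological replicates (three per condition here), not the number of individual granules, which would inflate the effective sample size.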

      In the new analysis of stress granule association, only the K97E mutant lost its significance, indicating that its association with stress granules is not significantly reduced. We therefore adjusted the following sentences in the manuscript.

      Results:

      Original: "While quantification showed a reduced association of hsPURA K97E mutant with G3BP1-positive granules (Fig 1B), the two other mutants, I206F and F233del, showed the same co-localization to stress granules as the wild type control."

      Corrected: "In all the patient-related mutations, no significant reduction in stress granule association was seen when compared to the wild type control (Fig 1C)."

      Original: "The observation that only one of the patient-related mutations of hsPURA, K97E, showed reduced stress granule association indicates that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      Corrected: "As we did not observe significant changes in the association of patient-related mutations of hsPURA to stress granules, it is suggested that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      (f) A western blot showing the level of overexpression of the PURA proteins should be shown in Figure 1 as well as the KD of endogenous PURA for Figure S2?

      As requested, a Western blot showing the level of overexpression of the different PURA proteins has been added as Appendix Figure S1.

      A Western blot of the siRNA-mediated knock-down experiments of PURA and their corresponding control has been added to Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock down.

      (g) While I appreciate that rewriting is time-consuming, I would recommend considering restructuring the manuscript because I think that it would aid the overall clarity. I think the foundation of the work is the structural characterization and would suggest beginning the paper with this data and the biochemical characterization. The co-localization with SGs and PBs and how this may be relevant to disease is much more speculative and is therefore better to present later. While I appreciate that the structural interpretation of why some mutants localize to PBs differently is not entirely clear, I do think that this would provide some context for the discussion.

      In the initial version of the manuscript, we first presented the structural characterization of PURA and afterwards the co-localization with SGs and PBs. As the reviewer noted in (e), we also consider the SG and PB interpretation to be the most speculative part of this manuscript. We felt that placing it at the end of the results section would weaken the manuscript. Conversely, we consider the structural interpretation of the mutations to be much stronger and of greater impact for future research. After long discussion, we therefore swapped the order so that the most important results appear at the end of the manuscript.

      Recommendations for the authors: Reviewer #3

      Concerns and Suggested Changes:

      (a) For the characterization of G3BP1-positive stress granules in HeLa cells upon depletion of PURA, it remains unclear what is the efficiency of siRNA? The authors should provide a western blot to indicate how much the endogenous levels were reduced.

      We completely agree with the stated concern and addressed it accordingly. We had performed this experiment prior to submission but for some unknown reason it was not included in the manuscript.

      The Western blot of siRNA-mediated knock-down experiments of PURA and their corresponding control is now shown in Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock-down.

      (b) How does knocking down PURA affect DCP1A-positive structures in HeLa cells? Would P bodies be formed even in the absence (or reduction) of total PURA?

      Indeed, this is a very interesting question. We have already shown in our recent publication (Molitor et al., 2023) that knock-down of PURA in HeLa and NHDF cells leads to a significant reduction of P-bodies. We referred to this finding on page 6:

      "Since hsPURA was recently shown to be required for P-body formation in HeLa cells and fibroblasts (Molitor et al. 2023), PURA-dependent liquid phase separation could potentially also directly contribute to the formation of these granules."

      On the same page, we also refer to the underlying molecular mechanism:

      "However, when putting this observation in perspective with previous reports, it seems unlikely that P-body formation directly depends on phase separation by hsPURA, but rather on its recently reported function as gene regulator of the essential P-body core factors LSM14a and DDX6 (Molitor et al., 2023)."

    1. Author Response

      The following is the authors’ response to the previous reviews.

      • Is the coronal slice in Figure 2 the corresponding mid-coronal plane to compute Dice scores? If so, the authors could mention it so that readers have an idea where the selected slice is.

      This is indeed a good point. The coronal slice in Figure 2 is not among the slices that we used to compute Dice scores. Because showing such a slice is important, we have added a small figure to the appendix with one of these slices, along with the corresponding automated segmentations.

      • SIFT descriptors were adopted to detect fiducials only. Maybe it could also be applied to align stacked photographs of brain slices.

      While SIFT is robust against changes in pose (e.g., object rotation), perspective, and lighting, it is not robust against changes in the object itself, such as the differences between one slice and the next, as is the case in our work. We have added a sentence to the methods section clarifying this issue.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Weaknesses:

      Start site fidelity in purified reconstituted systems can be dramatically altered in different buffer conditions. Interpretation of the observed changes to start site selection in mRNAs in the absence or presence of Ded1 using only the one buffer condition used is therefore limited.

      This is an excellent point and is something we could explore in future studies using the Rec-Seq system. We have added this caveat to the Discussion on lines 797-809. We have previously studied the fidelity of start codon recognition in the reconstituted system (Kolitz et al., [2009] RNA, 15:138-152) and found that under our standard buffer conditions the codon specificity generally reflects what we observed in vivo using a dual-luciferase reporter assay, with the most stable 48S complexes forming on AUG codons, followed by first-position mismatches (GUG, UUG, CUG), with second- and third-position mismatches leading to significantly less stable complexes. However, as the reviewer notes, there are some deviations: ACG and AUA are poor codons in the in vitro system under the buffer conditions used but allowed relatively strong expression in our in vivo reporter assay. It should also be noted that the hierarchy of near-cognate start codon usage in vivo in yeast differs according to the study and the reporter used, making it difficult to establish a “ground truth” for start codon fidelity.

      I have some specific comments to strengthen the manuscript and address some minor issues.

      It is not clear to me whether the authors refold the purified mRNA after phenol/chloroform extraction. Have the authors observed different results if the mRNA is refolded or not? This is appropriate since the authors compare their Rec-Seq data to PARS scores that were generated from refolded mRNAs. One assumes that the total mRNA used is refolded in the same way as the PARS score study, but this is not clearly stated. The authors should make this point clear in the text and methods.

      This is an excellent point. We did not use the final refolding protocol that Kertesz et al. used when they developed their PARS scores and now clarify this in the Methods section (lines 962-967). It is possible that we would have seen stronger correlations in the analyses using PARS scores had we followed the renaturation protocol, although the fact that we observed significant correlations (e.g., Fig. 3E-H) suggests the structures in the Kertesz et al. mRNAs were similar to those in our mRNAs.

      It is not clear how the authors determine the concentration of total mRNA that is used in the assay (reported as 60 nM). Are the authors assuming a molecular weight of an average mRNA to determine the concentration? The authors should provide more detail on how they quantify their mRNA concentration and its stoichiometry compared to 43S PICs.

      We thank the reviewer for pointing out this oversight and have now included this information on lines 849-855 of the Methods section.

      Comments regarding start site fidelity in the reconstituted system:

      The authors use in vitro transcribed tRNAi-Met. Since tRNA modifications may play a role in start site fidelity, the authors should perhaps mention in the discussion that this will need to be investigated in a future study.

      This is a good point and we now note it as a caveat in the Discussion on lines 806-809.

      The authors state that Ded1 promotes leaky scanning regardless of the mAUG start site context (page 24; lines 533-534). The authors then state on page 25 that the level of iAUG initiation relative to mAUG initiation does depend on the mAUG context (lines 545-546). This seems contradictory unless I am not understanding this correctly? It would certainly be surprising that mAUG context didn't regulate leaky scanning in the reconstituted system given the fact that initiation codon context regulates selection in cells (when Ded1 is present).

      These statements are correct as written. As shown in Figure 5O, the frequency of leaky scanning, measured as relative ribosome occupancy (RRO; ribosome occupancy of the internal region of the ORF, excluding the main start codon, relative to occupancy of the whole ORF, including the main start codon), decreases as the context score around the start codon gets stronger (green and purple lines). Adding 500 nM Ded1 increases the RRO to the same extent regardless of the strength of the start codon context, indicating that Ded1 enhances leaky scanning equally (compare the slope of the green line without Ded1 to that of the purple line with Ded1). Because of this, the effect of Ded1 on RRO (ΔRRO) is constant across context score bins (orange line). There is therefore no discrepancy between our two conclusions: leaky scanning of the mAUG increases as the context score decreases, and Ded1 increases leaky scanning equally for good and poor mAUG contexts. This indicates that Ded1 does not inspect the mAUG context and simply decreases the dwell time equally at all contexts.
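      The arithmetic behind this argument can be illustrated with a short sketch. The read counts below are hypothetical and were chosen to mirror the pattern described (higher RRO for weaker contexts, the same Ded1 increment in every bin); they are not values from Figure 5O. We also assume here that ΔRRO is a simple difference between the RRO with and without Ded1.

```python
# Minimal sketch of the RRO / ΔRRO arithmetic (hypothetical read counts).
# Assumption: ΔRRO is taken as a simple difference, RRO(+Ded1) - RRO(-Ded1).

def rro(internal_reads: float, orf_reads: float) -> float:
    """Relative ribosome occupancy: internal-ORF reads (main AUG excluded) / whole-ORF reads (main AUG included)."""
    if orf_reads <= 0:
        raise ValueError("whole-ORF read count must be positive")
    return internal_reads / orf_reads


def delta_rro(rro_with_ded1: float, rro_without_ded1: float) -> float:
    """Effect of Ded1 on RRO, expressed as a difference."""
    return rro_with_ded1 - rro_without_ded1


# Illustrative context-score bins: (internal reads, whole-ORF reads) without and with Ded1.
bins = {
    "weak context": {"minus_ded1": (40, 100), "plus_ded1": (60, 100)},
    "strong context": {"minus_ded1": (10, 100), "plus_ded1": (30, 100)},
}
for name, counts in bins.items():
    before = rro(*counts["minus_ded1"])
    after = rro(*counts["plus_ded1"])
    print(name, before, after, round(delta_rro(after, before), 3))
```

      In this toy example the baseline RRO depends on context while ΔRRO is the same in both bins, which is the combination of observations described above.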

      Further to the start site context question: it is possible that the fidelity of the reconstituted system (i.e. buffer conditions) is not fully reflecting in vivo-like start site selection. A rigorous characterization of commercially available reticulocyte lysate systems identified buffer conditions that provided similar start site fidelity to that observed in live cells (Kozak. Nucleic Acids Res. 1990 May 11;18(9):2828). While I feel that it is beyond the context of the current work to undertake a similar rigorous buffer characterization, one must be careful about interpreting the results about leaky scanning and upstream initiation sites in the current work. Perhaps one would observe similar results to Guenther et al. if the fidelity (buffer conditions) of the reconstituted system were different? I appreciate that the authors state that their results only apply to their reconstituted system and do not necessarily suggest that previous data are incorrect, but with only one buffer condition being tested in the current study it may be appropriate to further soften the interpretation of the current results when compared to published data in live cells.

      This point is well-taken. As noted above, we have added a caveat about possible effects of buffer conditions on start codon fidelity to the Discussion (lines 797-809). Regarding the possibility that upstream initiation is more frequent in vivo than we observe in the in vitro Rec-Seq system, we previously studied 5’UTR translation in vivo using ribosome profiling (Kulkarni et al. [2019] BMC Biol., 17:101). The ratio of RPFs in 5’UTRs to coding sequences in that study was 0.0027, very similar to the value measured in the in vitro Rec-Seq system in the presence of Ded1 (0.0016-0.0017). Thus, the frequency of upstream initiation does not appear to be dramatically higher in vivo than in our in vitro system. We have now noted this point in the Results (lines 594-598). Guenther et al. employed a ribosome profiling protocol in which cycloheximide was added to cells prior to lysis, which has been shown to create significant artifacts, particularly in 5’UTR translation (e.g., Gerashchenko and Gladyshev [2014] Nucleic Acids Res., 42:e134). Nevertheless, as suggested by the reviewer, we have modified the text in the Results and Discussion to soften the interpretation somewhat (lines 582-583; 616-618; 761-763).

      Reviewer #2

      Weaknesses:

      Several findings in this report are quite surprising and may require additional work to fully interpret. Primary among these is the finding that Ded1p stimulates accumulation of PICs at internal sites in mRNA coding sequences at an incidence of up to ~50%. The physiological relevance of this is unclear.

      We agree with the reviewer that understanding the physiological significance, if any, of the apparent leaky scanning of main AUG start codons induced by Ded1 is an unanswered question that will require additional studies. It is possible that rapid 60S subunit joining and formation of the 80S initiation complex after start codon recognition on most mRNAs reduces the leaky scanning effect in vivo. We now bring up this possibility in the Discussion section (lines 804-809). However, as noted in lines 568-580, mRNAs that display significantly decreased mRPFs at 500 nM Ded1 in the Rec-Seq system also tend to have TEs that are increased in the ded1-cs mutant relative to WT yeast in in vivo ribosome profiling experiments, suggesting that Ded1 activity also diminishes initiation on mAUG codons in these mRNAs in vivo.

      A limitation of the methodology is that, as an endpoint assay, Rec-Seq does not readily decouple effects of Ded1p on PIC-mRNA loading from those on the subsequent scanning step where the PIC locates the start codon. Considering that Ded1p activity may influence each of these initiation steps through distinct mechanisms (i.e., binding to the mRNA cap-recognition factor eIF4F, or direct mRNA interaction outside eIF4F), additional studies may be needed to gain deeper mechanistic insights.

      We agree that this is a limitation of the Rec-Seq assay and now mention this point in the Discussion section (lines 810-817). It is possible that future work using cross-linking agents to stabilize 43S complexes bound near the cap and scanning the 5’UTR, similar to the methodology used in 40S ribosome profiling, could enable us or others to disentangle these steps from one another.

      As the authors note, the achievable Ded1p concentrations in Rec-Seq may mask potential effects of Ded1p-based granule formation on translation initiation. Additional factors present in the cell could potentially also promote this mechanism. Consequently, the results do not fully rule out granule formation as a potential parallel Ded1p-mediated translation-inhibitory mechanism in cells.

      We agree. As stated in the Discussion section (lines 735-741): “It is possible that at higher concentrations of Ded1 than were achievable in these in vitro experiments or in the presence of additional factors that modify Ded1’s ATPase or RNA binding activities the factor could directly inhibit a subset of mRNAs, by acting as an mRNA clamp that impedes scanning by the PIC, or by sequestering the mRNAs in insoluble condensates. It might be interesting in the future to test candidate factors in Rec-Seq to determine if they switch Ded1 from being a stimulatory helicase to an inhibitory mRNA clamp that removes transcripts from the soluble phase.”

      It is certainly clear why the 15-minute timepoint was chosen for these assays. However, I wondered whether data from an earlier timepoint would provide useful information. The description on line 210 of the compiled PDF suggests data from different timepoints may be available; if so, in my view it could be a useful addition. More generally, including language about the single-turnover nature of these reactions may be helpful for the benefit of a broad audience.

      In preliminary experiments, we have used the Rec-Seq system to measure the kinetics of 48S PIC formation transcriptome-wide. As you can probably imagine, this is a challenging experiment and requires additional work before we would feel comfortable publishing it. We very much agree with the reviewer that resolving the kinetics of these events will provide important additional information. As suggested, we have added caveats about the endpoint and single-turnover nature of the assay to the Discussion (lines 821-828).

      I wondered whether it might be useful to present additional information on the mRNAs not found in the assay. For example, are these the least abundant mRNAs, which may not have had time to recruit the 43S PIC?

      75% of mRNAs (2719 of 3640) not observed in the Rec-Seq analysis had densities below the median (2.3 reads per nucleotide). We now mention this in the Methods section (lines 855-856).

      The Rec-Seq recruitment reactions were carried out at 22 °C. Considering that remodeling of RNA structure by helicase enzymes is a focal point of the study, linking the results to the recruitment landscape at a closer-to-physiological temperature may bolster the conclusions.

      In the future, it would be interesting to test the effects of temperature on 48S PIC formation using the Rec-Seq system. As the reviewer suggests, the interplay between temperature and mRNA structure could reveal interesting phenomena. It is worth noting, however, that there is no clear “physiological” temperature for S. cerevisiae. For consistency and convenience, lab yeast is usually grown at 30 °C, but in the wild, yeast live at a wide range of temperatures, which generally change throughout the day. From this standpoint, 22 °C seems reasonably physiological.

      Results from Rec-Seq experiments conducted at 15 °C might be more directly comparable to in vivo Ribo-seq data with the ded1-cs mutant. However, ~90% of the Ded1-hyperdependent mRNAs identified by Ribo-seq analysis of that mutant were already identified here as Ded1-stimulated mRNAs in Rec-Seq experiments at 22 °C. The Ribo-seq experiments by Guenther et al. were conducted on the ded1-ts mutant at 37 °C; thus, any structures that confer Ded1-dependent leaky scanning through uORFs detected in that study should have been stable in our Rec-Seq experiments.

      The introduction provides an important, detailed exposition of the state of the field with respect to Ded1p activity. Nevertheless, in my view, it is quite lengthy and could be streamlined for clarity. As just one example, the proposed function of Ded1p in the nucleus seems like a detail that could be dispensed with for the present work.

      We have attempted to shorten the Introduction, as suggested. However, we did not remove the short section describing Ded1’s possible roles in the nucleus and ribosome biogenesis because we felt it was important to emphasize that one of the strengths of the Rec-Seq system is that it allows us to isolate the early steps of translation initiation from later steps and from other cellular processes. In addition, at the suggestion of Reviewer #3, we added a brief explanation of Ded1’s possible role in the subunit joining step of translation.

      Reviewer #3

      Weaknesses:

      The slow nature of the biochemical experiments could bias results.

      We agree that the 15-minute time point used could mask effects that manifest at a purely kinetic level. It should be noted that we have measured the observed rate constants for 48S formation on a variety of mRNAs in the in vitro reconstituted system in the presence of saturating Ded1 (Gupta et al. [2018] eLife, https://elifesciences.org/articles/38892) and found that they are generally in the range of estimates of rate constants for translation initiation in vivo in yeast (~1-10 min⁻¹; e.g., Siwiak and Zielenkiewicz [2010], PLOS Comput. Biol., 6: e100865). In preliminary experiments, we have used the Rec-Seq system to measure the kinetics of 48S PIC formation transcriptome-wide in the absence of Ded1 and find that the mean observed rate constant (~2 min⁻¹) is also within the range of estimates of the rate of translation initiation in vivo in yeast. We hope to publish this analysis in a future manuscript.
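      Why an endpoint measurement can mask kinetic differences can be seen from a simple model. The sketch below assumes, purely for illustration, that 48S PIC formation on each mRNA approaches its endpoint as a single exponential, fraction = 1 - exp(-k·t); the rate constants are illustrative values around the range quoted above, not measurements.

```python
import math

# Minimal sketch: first-order (single-exponential) approach to an endpoint, an assumed kinetic model.
def fraction_formed(k_per_min: float, t_min: float) -> float:
    """Fraction of the final 48S PIC level reached after t minutes for a first-order rate constant k."""
    return 1.0 - math.exp(-k_per_min * t_min)


# Illustrative rate constants (min^-1) for a slow, a mid-range, and a fast mRNA.
for k in (0.2, 2.0, 10.0):
    early = fraction_formed(k, 1.0)
    endpoint = fraction_formed(k, 15.0)
    print(f"k={k:>4} min^-1: {early:.2f} at 1 min, {endpoint:.2f} at 15 min")
```

      Under this assumed model, mRNAs with k = 2 and 10 min⁻¹ are essentially indistinguishable at 15 min but clearly different at 1 min, which is the kind of purely kinetic difference that a time series, rather than a single endpoint, would resolve.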

      It has been suggested that Ded1 and its human homolog DDX3X could play a role in subunit joining post-scanning (Wang et al. 2022, Cell and Geissler et al. 2012 Nucleic Acids Res). Could the authors potentially investigate this by adding GTP, eIF5B and 60S subunits into the reaction mixture and isolating 80S complexes?

      This is a very interesting suggestion. One of our plans with the Rec-Seq system is to see if we can also observe 80S formation with it and distinguish 80S from 48S complexes. Although we haven’t yet tried this and there might be technical obstacles to doing it, if it works we would like to examine the potential effects of Ded1, as suggested. We now mention this possibility in the Discussion section (lines 709-716 and 810-817).

      An incubation time of 15 minutes is quite long on the timescale of translation initiation. Presumably, the competition for 40S among mRNAs is partially kinetically controlled, so it would be interesting if the authors could do a time series on the incubation time. Does Ded1 increase initiation on more structured UTRs even at shorter incubations or are those only observed with longer incubations?

      We agree. See the response to the question about kinetics above.

      Does GDPNP lead to off-pathway events? What happens when GTP is used in the TC? Presumably in the absence of eIF5B the 48S PIC should remain stalled at the start codon.

      In previous experiments in the reconstituted system, we showed that using GTP instead of GDPNP resulted in 48S complexes that were less stable than those stalled prior to GTP hydrolysis (e.g., Algire et al. [2002] RNA 8:382-397). This is presumably because eIF2•GDP and eIF5 release from the complex and the Met-tRNAi can dissociate in the absence of subunit joining. Although we haven’t tried it in the Rec-Seq system, we suspect that the resulting PICs would fall apart during sucrose gradient sedimentation.

      The authors use assembly of a 48S PIC at the start codon as evidence of scanning but could use more evidence to back this claim up. Does removing the cap structure on the two luciferase mRNA controls disrupt initiation using this approach? That would be direct evidence of 5' end 40S loading and scanning to the start codon.

      In previous work using the reconstituted system, we studied the effect of the 5’-cap on 48S PIC formation (Mitchell et al. [2010] Mol. Cell 39:950-962; Yourik et al. [2017] eLife https://elifesciences.org/articles/31476 ). We found that stable 48S PIC formation is strongly dependent on the presence of the 5’-cap. In addition, the cap prevents off-pathway events and enforces a requirement for the full set of initiation factors to achieve efficient 48S PIC formation. As the reviewer indicates, the cap-dependence of the system supports the conclusion that 5’-end loading and scanning take place. We have now added this information and the relevant citations to the Introduction (lines 147-153). We thank the reviewer for pointing out this oversight. It should also be noted that the cases of mRNAs in which 5’UTR translation is increased by addition of Ded1 support the conclusion that the factor promotes attachment of the PIC to the 5’ ends of mRNAs and subsequent 5’ to 3’ scanning, as noted in lines 608-618.

      The authors state that "The correlation between CDS length and RE could be indirect because CDS length also correlates with 5'UTR length". Could the authors bin the transcripts into different 5' UTR length ranges and then probe for CDS length differences on RE for each 5' UTR length bin? This could be useful to truly parse the mechanism by which CDS length is influencing RE.

      This was an excellent suggestion. We now include this analysis in a new supplementary figure, Figure 3S-2. Corresponding text was added in lines 380-387:

      “Importantly, correlations between Ded1 stimulation and 5’ UTR lengths are evident for all three groups of mRNAs containing distinct ranges of CDS lengths (Fig. 3-S2A-C). In contrast, a marked correlation between Ded1 stimulation and CDS length was detected only for the group of mRNAs with longest 5’UTRs (Fig. 3-S2D-F), and only the latter group showed a clear correlation between 5’UTR length and CDS length (Fig. 3-S2G-I). Thus, the correlation between Ded1 stimulation and CDS length appears to be indirect, driven by the tendency for the mRNAs with the longest 5’UTRs to also have correspondingly longer CDSs.”

      We thank the reviewer for this very useful idea.

      In Figure 3I, why does RE dip for the middle bins of CDS length in both the 100 nM and 500 nM conditions, and then rise again for the later bins? In other words, why do the shortest and longest CDSs have the highest RE in the presence of Ded1?

      We do not know the reason for this dip and now say this in the Results on lines 377-378.

      The Discussion section would be well served by discussing proposed roles of Ded1 post-scanning and how those fit, if at all, with the data presented throughout the manuscript.

      We have now added this to the Discussion (lines 709-716 and 810-817). We thank the reviewer for pointing out this oversight.

      Minor comments:

      • Define bins on figures rather than using bin number for axis labels. For example, Figure 3A-D x-axis labels indicate the length range of each bin.

      Thank you for the suggestion. We have made this change.

      • Figure 3I: the data seem to indicate that the shortest CDSs have a Ded1 dependence similar to that of the longest CDSs. This result seems inconsistent with the stated relationship between UTR length, structure, and CDS length. Please clarify.

      See answer to this question above.

      • Replace qualitative statements, such as "substantially smaller reductions", with percent changes, numbers, etc.

      We have tried to replace qualitative statements with quantitative ones, where possible.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study advances our understanding of the cell-specific treatment of cone photoreceptor degeneration by Txnip. The evidence supporting the conclusions is convincing, with rigorous genetic manipulation of Txnip mutations; however, there are a few areas in which the article may be improved through further analysis and application of the data. The work will be of broad interest to vision researchers, cell biologists and biochemists.

      Reviewer #1 (Public Review):

      Summary:

      This is a follow-up study to the authors' previous eLife report about the roles of an alpha-arrestin called thioredoxin-interacting protein (Txnip) in cone photoreceptors and in the retinal pigment epithelium. The findings are important because they provide new information about the mechanism of glucose and lactate transport to cone photoreceptors and because they may become the basis for therapies for retinal degenerative diseases.

      Strengths:

      Overall, the study is carefully done and, although the analysis is fairly comprehensive with many different versions of the protein analyzed, it is clearly enough described to follow. Figure 4 greatly facilitated my ability to follow, understand and interpret the study.

      Weaknesses:

      I have just one concern that I would like the authors to address. It is about the text that begins at line 133: "We assayed their ability to clear GLUT1 from the RPE surface (Figure 2A)". Please provide more details about this. From the figure it appears that n = 1 for this experiment, but given how careful the authors are with these types of studies that seems unlikely. How did the authors quantify the ability to clear GLUT1 from the surface? Was it cleared from both the apical and basal surface? (It is hard to resolve the apical and basal surfaces in the images provided). The experiments shown in Fig. 1H and Fig. 1I of PMID 31365873 show how GLUT1 disappears only from the apical surface (under the conditions of that experiment and through the mechanism described in their text). It would be helpful for the authors to discuss their current results in the context of that experiment.

      We repeated all eight AAV-Best1-Txnip alleles for RPE GLUT1 staining with more than three eyes of each condition. We also quantified the GLUT1 intensity on the RPE basal surface. A new Figure 2-figure supplement 1 with these data has been added to this submission. The results and conclusions are similar to those in our initial submission.

      As mentioned in our provisional responses: GLUT1 on the basal surface of the RPE is more easily scored than that on the apical surface. The photoreceptor inner segments and Müller glia microvilli also have GLUT1, and their processes are juxtaposed and/or intertwined with the apical processes of the RPE, making the apical process GLUT1 staining of the RPE much more difficult to score. In some sections where the RPE and the retina separate, we can score the apical process GLUT1 staining of the RPE, but we do not always have this situation in our sections. The current quantification in the new Figure 2-figure supplement 1 thus concerns only the basal staining.

      As a separate issue, Reviewer #1 mentioned the work of another group (Wang et al., 2019, PMID: 31365873), which claimed that, on the apical surface of the RPE, GLUT1 is down-regulated in an RP mouse strain, RhoP23H. We have not consistently observed such a down-regulation of GLUT1 in other RP mouse strains such as rd1, rd10 or Rho-/- (unpublished data; see review Xue and Cepko, 2023, PMID: 37460158). However, as we pointed out above, it is difficult to score GLUT1 staining on the RPE apical surface. It is even more difficult in the degenerating retina where RPE and photoreceptor processes degenerate. For reference, one can see images of degenerating RPE apical processes in Wu et al. 2021 (PMID: 33491671).

      Reviewer #2 (Public Review):

      The hard work of the authors is much appreciated. With overexpression of the α-arrestin Txnip in the RPE, in cones, and in both combined, the authors show a potential gene-agnostic treatment that can be applied to retinitis pigmentosa. Furthermore, since Txnip is related to multiple intracellular signaling pathways, this study is also of value for research into the mechanism of secondary cone dystrophy.

      There are a few areas in which the article may be improved through further analysis and application of the data, as well as some adjustments that should be made to clarify specific points in the article.

      Reviewer #3 (Public Review):

      Summary:

      Xue et al. extended their groundbreaking discovery demonstrating the protective effect of Txnip on cone photoreceptor survival. This was achieved by investigating the protection of cone degeneration through the overexpression of five distinct mutated variants of Txnip within the retinal pigment epithelium (RPE). Moreover, the study explored the roles of two proteins, HSP90AB1 and Arrdc4, which share similarities or associations with Txnip. They found that Txnip expressed in RPE cells protects cones, and that its mechanism there differs from its mechanism of protection in cone cells. These discoveries have significant implications for advancing our understanding of the mechanisms underlying Txnip's protection of cone cells.

      Strengths:

      (1) Identify the roles of different Txnip mutations in the RPE and their effects on the expression of the glucose transporter.

      (2) Dissect the mechanism of Txnip in the RPE vs. cone photoreceptors in retinal degeneration models.

      (3) Explore the functions of Arrdc4, a protein similar to Txnip, and of HSP90AB1 in cone degeneration.

      Weaknesses:

      (1) Arrdc4 has a deleterious effect on cone survival, but its mechanism is not discussed.

      (2) Inhibition of HSP90 is known to cause retinal degeneration. It is unclear why its inhibition enhances the protection by Txnip.

      As mentioned in our provisional responses, little was known about the function of Arrdc4 or HSP90AB1 in cones. We summarize some of the recent discoveries regarding these two proteins in the new Discussion:

      “Arrdc4, the most similar α-arrestin protein to Txnip that also has Arrestin N- and C- domains, accelerated RP cone death when transduced via AAV (Figure 1). This observation suggests that Txnip has unique functions that protect RP cones. Recently, Arrdc4 has been proposed to be critical for liver glucagon signaling, which could be negated by insulin (Dagdeviren et al. 2023). The implication of this potential role in RP cone survival is unclear, but interestingly, the activation of the insulin/mTORC1 pathway is beneficial to RP cone survival (Punzo et al. 2009; Venkatesh et al. 2015).”

      “Little is known about the function of HSP90AB1. Knocking down Hsp90ab1 improved mitochondrial metabolism of skeletal muscle in a diabetic mouse model (Jing et al. 2018). Knocking out HSP90AA1, a paralog of HSP90AB1 that differs from it at 14% of amino acid positions, led to rod death and correlated with PDE6 dysregulation (Munezero et al. 2023). Inhibiting HSP90AA1 with small molecules transiently delayed cone death in human retinal organoids under low glucose conditions (Spirig et al. 2023). However, the exact role of HSP90AA1 in photoreceptors needs to be clarified, and the implications for HSP90AB1 in RP cones are still unclear.”

      In addition, we used AlphaFold Multimer, an AI algorithm based on AlphaFold-2, to explore the possible interaction between TXNIP, PARP1 and HSP90AB1 in the revision. One of the predicted models is shown as the new Figure 5-figure supplement 2. The C-terminus of Txnip is predicted to link HSP90AB1 and PARP1 together in this model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have just one concern that I would like the authors to address. It is about the text that begins at line 133: "We assayed their ability to clear GLUT1 from the RPE surface (Figure 2A)". Please provide more details about this. From the figure it appears that n = 1 for this experiment, but given how careful the authors are with these types of studies that seems unlikely. How did the authors quantify the ability to clear GLUT1 from the surface? Was it cleared from both the apical and basal surface? (It is hard to resolve the apical and basal surfaces in the images provided). The experiments shown in Fig. 1H and Fig. 1I of PMID 31365873 show how GLUT1 disappears only from the apical surface (under the conditions of that experiment and through the mechanism described in their text). It would be helpful for the authors to discuss their current results in the context of that experiment.

      See our responses to Reviewer #1’s public review section above.

      Also, is the clearance from the RPE plasma membrane homogenous throughout the RPE monolayer?

      In the area of AAV infection, the effects are very homogeneous. In the uninfected area, the clearance does not occur, and we consider the uninfected area of the same eye to be an excellent internal control.

      A statistical analysis (as was provided for other experiments in the manuscript) would help to make the surprising conclusion about C.Txnip.C247S more convincing.

      In this revision, we used the Mann-Whitney U test with the Bonferroni correction for GLUT1 intensity quantification. For the cone survival statistics, we used the t-test or ANOVA with Dunnett multiple comparison test. The information has been added to each figure legend.

      Another improvement I suggest for this figure is to include normal full length Txnip as a positive control to show how completely it removes GLUT1 from the surface.

      Added. See the new Figure 2-figure supplement 1.

      Another point that should be discussed is whether, when Txnip prevents GLUT1 from reaching the surface, all of the GLUT1 is fully degraded within the cell. A brief description of how Txnip influences GLUT1 stability and localization would be helpful.

      We are unable to track the fate of the GLUT1 after it is removed, i.e. we do not see definitive intracellular staining. We do not know if this is due to degradation or a hidden epitope.

      Minor point

      (1) Confusing citation on lines 99-100: "We previously showed that overexpressing the Txnip wt allele in the RPE using an RPE specific promoter, derived from the Best1 gene (Esumi et al. 2009),.." makes it sound like Esumi et al. is the citation for their previous study, which is not correct.

      We have amended this to: "We previously showed (Xue et al. 2021) that overexpressing the Txnip wt allele in the RPE using an RPE-specific promoter, derived from the Best1 gene (Esumi et al., 2009), did not improve RP cone survival."

      Reviewer #2 (Recommendations For The Authors):

      Regarding the manuscript, here are some suggestions that authors can take into consideration for the completeness of the study:

      (1) The text references the relationship between α-arrestin and glucose metabolism in cone cells, but fails to provide an explanation for its specific involvement in glucose metabolism. Consequently, readers may struggle to discern the targeted metabolic pathway.

      We understand the Reviewer's point, and would love to know more about its mechanism, which is one reason why we undertook the current study. The mechanism(s) by which Txnip affects metabolism remains to be elucidated. To summarize our findings from our previous study, we showed that LDHB, which converts lactate to pyruvate, was required for Txnip-mediated rescue. Addition of the LDHB gene, however, did not boost rescue. We also showed that mitochondrial size and membrane potential were improved, and the Na/K pump function was improved, in Txnip-treated cones. Improved mitochondria were not sufficient, however, as revealed by a PARP-1 KO mouse with improved mitochondria that did not extend cone survival. In addition, using a Txnip mutant that does not remove the glucose transporter, we still saw cone rescue, so this function cannot be required for Txnip-mediated rescue. How does Txnip lead to improved mitochondria and to a reliance on lactate? We do not know.

      (2) Although the authors conducted an experiment on Arrdc4 due to its similarity to Txnip, why Arrdc4, with 60% amino acid similarity, did not yield the same effects as Txnip remains unaddressed. Highlighting structural disparities or differences in intracellular signaling pathways could potentially shed light on this incongruity. Subsequently, an additional experiment may be warranted to test the hypothesis regarding the effective component of α-arrestin for cone rescue.

      Additional experiments are needed to learn of the relevant differences between Arrdc4 and Txnip, but are beyond the scope of our work at present. However, we have added a paragraph on newly published data on the function of Arrdc4 in the new Discussion:

      “Arrdc4, the most similar α-arrestin protein to Txnip that also has Arrestin N- and C- domains, accelerated RP cone death when transduced by AAV (Figure 1). This observation suggests that Txnip has unique functions that protect RP cones. Recently, Arrdc4 has been proposed to be critical for liver glucagon signaling, which could be negated by insulin (Dagdeviren et al. 2023). The implication of this potential role regarding RP cone survival is unclear, but interestingly, the activation of the insulin/mTORC1 pathway is beneficial to RP cone survival (Punzo et al. 2009; Venkatesh et al. 2015).”

      (3) The utilization of distinct mutant Txnip variants to impact RPE, cones, and their combined influence is noted. A comparative table elucidating the impact of cone rescue on these three targets would greatly enhance clarity.

      We presented these data in Figure 4 in a table format.

      Additionally, the text does not definitively establish whether Txnip.C247S.LL351 and 352AA, as well as Txnip.C247S, indeed manifest discrepancies when exclusively affecting RPE.

      We edited a sentence in Results to: “Similar to Best1-wt Txnip (Xue et al., 2021), Best1-Txnip.C247S did not show significant improvement of cone survival, ruling out the C247S mutation alone as promoting the cone survival by Best1-Txnip.C247S.LL351 and 352AA.”

      (4) While the text mentions that Txnip stimulates lactate utilization within cones, it remains unclear whether this effect extends to RPE. If applicable, this trait could potentially contribute to its role in cone rescue.

      We agree with the Reviewer, and hope to address this question in our next study.

      (5) The discussion introduces the notion that one potential mechanism for cone rescue by Txnip.C247S involves facilitating unhindered movement of Thioredoxin for redox processes. To validate this hypothesis and elucidate the mechanics of Txnip's involvement in cone rescue, it may be prudent to conduct further experiments concentrating on the interaction between Txnip and thioredoxin. Alternatively, an experiment aimed at upregulating Thioredoxin expression would be a valuable addition.

      We hope to address this question in the future. However, the effect may be more complicated than our simple hypothesis regarding release of Thioredoxin. More than a dozen proteins were found to differentially interact with Txnip vs. Txnip.C247S (Forred et al. 2016).

      Reviewer #3 (Recommendations For The Authors):

      (1) Glucose transporter 1 is identified as an important mechanism in the protection of cone degeneration. It is unclear why GLUT1 is upregulated in retinal cells although the Txnip mutants are expressed specifically in the RPE in Figure 2.

      This retinal GLUT1 upregulation was not consistently observed in the treated eyes, so we did not comment on it in the text.

      (2) The Discussion mentions that the N.Txnip mutant causes obvious retinal degeneration. Quantifying retinal thickness from Figure 2 would make this more rigorous.

      Unlike the robust effects of Best1-N.Txnip on RPE GLUT1 level, this negative effect of Best1-N.Txnip on ONL thickness was not consistent. This result does not undermine the other major conclusions. Therefore, we deleted the related sentence of the original text: “This hypothesis is supported by the observation that N.Txnip led to an obvious thinning of the outer nuclear layer of the wt retina, reflecting a loss of photoreceptors”. We did leave in the related finding as follows:

      “The N-terminal half of Txnip (1-228aa) might exert harmful effects in the RPE, that negate the beneficial effects from the C-terminal half, suggested by the observation that its removal, in the C-terminal 149-397 allele, led to better cone survival when expressed in the RPE (Figure 2). In cones, the C-terminal half, including the C-terminal IDR tail, may cooperate with the N-terminal half, or negate its negative effects, to benefit RP cone survival. However, the C-terminal half is not sufficient for cone rescue when expressed in cones, as the 149-397 allele did not rescue.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Sang et al. proposed a pair of IR60b-expressing pharyngeal neurons in Drosophila use IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels - or with their IR60b-expressing neurons selectively silenced - showed much reduced rejection of high Na+, but restored rejection when these channels were reintroduced back in the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+ and such response went away when animals lacked any of the three channels.

      Strengths:

      The experiments were thorough and well designed. The results are compelling and support the main claim. The development and use of the DrosoX two-choice assay allow a more quantitative and automated/unbiased assessment of ingestion volume and preference.

      Weaknesses:

      There are a few inconsistencies with respect to the exact role by which IR60b neurons limit high salt consumption and the contribution of external (labellar) high-salt sensors in regulating high salt consumption. These weaknesses do not significantly impact the main conclusion, however.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Sang et al. set out to identify gustatory receptors involved in salt taste sensation in Drosophila melanogaster. In a two-choice assay screen of 30 Ir mutants, they identified that Ir60b is required for avoidance of high salt. In addition, they demonstrate that activation of Ir60b neurons is sufficient for gustatory avoidance using either optogenetics or TRPV1 to specifically activate Ir60b neurons. Then, using tip recordings of labellar gustatory sensory neurons and proboscis extension response behavioral assays in Ir60b mutants, the authors demonstrate that Ir60b is dispensable for labellar taste neuron responses to high salt and the suppression of proboscis extension by high salt. Since external gustatory receptor neurons (GRNs) are not implicated, they look at Poxn mutants, which lack external chemosensory sensilla but have intact pharyngeal GRNs. High salt avoidance was reduced in Poxn mutants but was still greater than Ir60b mutants, suggesting that pharyngeal gustatory sensory neurons alone are sufficient for high salt avoidance. The authors use a new behavioral assay to demonstrate that Ir60b mutants ingest a higher volume of sucrose mixed with high salt than control flies do, suggesting that the action of Ir60b is to limit high salt ingestion. Finally, they identify that Ir60b functions within a single pair of gustatory sensory neurons in the pharynx, and that these neurons respond to high salt but not bitter tastants.

      Strengths:

      A great strength of this paper is that it rigorously corroborates previously published studies that have implicated specific Irs in salt taste sensation. It further introduces a new role for Ir60b in limiting high salt ingestion, demonstrating that Ir60b is necessary and sufficient for high salt avoidance and convincingly tracing the action of Ir60b to a particular subset of gustatory receptor neurons. Overall, the authors have achieved their aim by identifying a new gustatory receptor involved in limiting high salt ingestion. They use rigorous genetic, imaging, and behavioral studies to achieve this aim, often confirming a given conclusion with multiple experimental approaches. They have further done a great service to the field by replicating published studies and corroborating the roles of a number of other Irs in salt taste sensation. An aspect of this study that merits further investigation is how the same gustatory receptor neurons and Ir in the pharynx can be responsible for regulating the ingestion of both appetitive (sugar) and aversive tastants (high salt).

      A previous report published in eLife from John Carlson’s lab (Joseph et al, 2017) showed that the Ir60b GRN in the pharynx responds to sucrose resulting in sucrose repulsion. Thus, stimulation of this pharyngeal GRN results in gustatory avoidance only, not both attraction and avoidance. (lines 205-207)

      Weaknesses:

      There are several weaknesses that, if addressed, could greatly improve this work.

      (1) The authors combine the results and discussion but provide a very limited interpretation of their results. More discussion of the results would help to highlight what this paper contributes, how the authors interpret their results, and areas for future study.

      We agree and have now separated the Results and Discussion, and in so doing have greatly expanded discussion of the results.

      (2) The authors rename previously studied populations of labellar GRNs to arbitrary letters, which makes it difficult to understand the experiments and results in some places. These GRN populations would be better referred to according to the gustatory receptors they are known to express.

      One of the corresponding authors (Craig Montell) introduced this alternative GRN nomenclature in a 2021 review (Montell, C. Drosophila sensory receptors—a set of molecular Swiss Army Knives. Genetics 217, 1-34) (Montell, 2021). We are not fans of referring to different classes of GRNs based on the receptors that they express since it is not obvious which receptors to use. For example, the GRNs that respond to bitter compounds all express multiple GR co-receptors. The same is true for the GRNs that respond to sugars. The former system of referring to GRNs simply as sugar, bitter, salt and water GRNs is also not ideal since the repertoire of chemicals that stimulates each class is complex. For example, the Class A GRNs (formerly sugar GRNs) are also activated by low Na+, glycerol, fatty acids, and acetic acid, while the B GRNs (formerly bitter GRNs) are also stimulated by high Na+, acids, polyamines, and tryptophan. In addition, there are five classes of GRNs. At first mention of the Class A—E GRNs, we mention the most commonly used former nomenclature of sugar, bitter, salt and water GRNs. In addition, for added clarity, we now also include a mention of one of the receptors that mark each class. (lines 51-59)

      (3) The conclusion that GRNs responsible for high salt aversion may be inhibited by those that function in low salt attraction is not well substantiated. This conclusion seems to come from the fact that overexpression of Ir60b in salt attraction and salt aversion sensory neurons still leads to salt aversion, but there need not be any interaction between these two types of sensory neurons if they act oppositely on downstream circuits.

      We did not make this claim.

      (4) The authors rely heavily on a new Droso-X behavioral apparatus that is not sufficiently described here or in the previous paper the authors cite. This greatly limits the reader's ability to interpret the results.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      Reviewer #3 (Public Review):

      Summary:

      Sang et al. successfully demonstrate that a set of single sensory neurons in the pharynx of Drosophila promotes avoidance of food with high salt concentrations, complementing previous findings on Ir7c neurons with an additional internal sensing mechanism. The experiments are well-conducted and presented, convincingly supporting their important findings and extending the understanding of internal sensing mechanisms. However, a few suggestions could enhance the clarity of the work.

      Strengths:

      The authors convincingly demonstrate the avoidance phenotype using different behavioral assays, thus comprehensively analyzing different aspects of the behavior. The experiments are straightforward and well-contextualized within existing literature.

      Weaknesses:

      Discussion

      While the authors effectively relate their findings to existing literature, expanding the discussion on the surprising role of Ir60b neurons in both sucrose and salt rejection would add depth. Additionally, considering Yang et al. 2021's (https://doi.org/10.1016/j.celrep.2021.109983) result that Ir60b neurons activate feeding-promoting IN1 neurons, the authors should discuss how this aligns with their own findings.

      Yang et al. demonstrated that the activation of Ir60b neurons can trigger the activation of IN1 neurons akin to pharyngeal multimodal (PM) neurons, potentially leading to enhanced feeding (Yang et al, 2021). However, our research reveals a specific pattern of activation for Ir60b neurons. Instead of being generalists, they are specialized for certain sugars, such as sucrose, and for high salt. Consequently, while Ir60b GRNs activate IN1 neurons, we contend that there are other neurons in the brain responsible for inhibiting feeding. (lines 412-417)

      Line 187: The discussion primarily focuses on taste sensilla outside the labellum, neglecting peg-type sensilla on the inner surface. Clarification on whether these pegs contribute to the described behaviors, and whether the Poxn mutants described also affect the pegs, would strengthen the discussion.

      We added the following to the Discussion section. “We also found that the requirement for Ir60b appears to be different when performing a binary liquid capillary assay (DrosoX), versus solid food binary feeding assays. When we employed the DrosoX assay to test mutants that were missing salt aversive GRNs in labellar bristles but still retained functional Ir60b GRNs, the flies behaved the same as wild-type flies (e.g. Figure 3J and 3L). However, using solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al, 2015), displayed repulsion to high salt food that was intermediate between control flies and the Ir60b mutant (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), and these hairless taste organs become exposed to food only when the labial palps open. We suggest that there are high-salt sensitive GRNs associated with taste pegs, which are accessed when the labellum contacts a solid substrate, but not when flies drink from the capillaries used in DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B).” (lines 430-444)

      In line 261 the authors state: "We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, similar to what was observed with Ir56b 8; however, this did not generate a salt receptor (Figure S6A)"

      An obvious explanation would be that these neurons are missing the identified necessary co-receptors Ir76b and Ir25a. The authors should discuss here if the Gr33a neurons they target also express these co-receptors, if yes this would strengthen their conclusion that an additional receptor might be missing.

      We clarified this point in the Discussion section as follows, “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al, 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al, 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites”. (lines 464-477)

      Methods

      The description of the Droso-X assay seems to be missing some details. Currently, it is not obvious how the two-choice design is established. Only one capillary is mentioned; I assume two were used? Also, the meaning of the variables used in the equation (DrosoX and DrosoXD) is not explained.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      The description of the ex-vivo calcium imaging prep. is unclear in several points:

      (1) It is lacking information on how the stimulus was applied (was it manually washed in? If so how was it removed?).

      We expanded the description of the apparatus in the ex vivo calcium imaging section of the Materials and Methods. (lines 682-716)

      (2) The authors write: "A mild swallow deep well was prepared for sample fixation." I assume they might have wanted to describe a "shallow well"?

      We deleted the word “deep.” (line 691)

      (3) "...followed by excising a small portion of the labellum in the extended proboscis region to facilitate tastant access to pharyngeal organs." It is not clear to me how one would excise a small portion of the labellum, the labellum depicts the most distal part of the proboscis that carries the sensillae and pegs. Did the authors mean to say that they cut a part of the proboscis?

      Yes. We changed the sentence to “…followed by excising a small portion of the extended proboscis to facilitate tastant access to the pharyngeal organs.” (lines 693-695)

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In this manuscript, Sang et al. proposed a pair of IR60b-expressing pharyngeal neurons in Drosophila use IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels - or with their IR60b-expressing neurons selectively silenced - showed much reduced rejection of high Na+, but restored rejection when these channels were reintroduced back in the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+ and such response went away when animals lacked any of the three channels. In general, I find the collective evidence presented by the authors convincing. But I feel the MS can benefit from having a discussion section and a few simple experiments. Below I listed some inconsistencies I hope the authors can address or at least discuss.

      We have now added a Discussion section, and expanded the discussion.

      (1) The role of IR60b neurons on suppressing PER appeared inconsistent. On the one hand, optogenetic activation of these neurons suppressed PER (Fig 1D), on the other hand, IR60b mutants were as competent to suppress PER in response to high salt as WT (Fig 2G). Are pharyngeal neurons expected to modulate PER? It might be worth including a retinal-free or genotype control to ascertain the PER suppression exhibited by IR60b>CsChrimson is genuine.

      Please note that Figure 2G is now Figure 2H.

      Our interpretation is that activation of aversive GRNs by high salt either in labellar bristles or in the pharynx is sufficient to induce repulsion to high salt. Consistent with this conclusion, optogenetic activation of Ir60b GRNs, which are specific to the pharynx, is sufficient to reduce the PER to sucrose-containing food (Figure 1D). However, mutation of Ir60b has no impact on the PER to sucrose plus high (300 mM) NaCl since the high-salt activated GRNs in labellar bristles are not impaired by the Ir60b mutation. In contrast, Ir25a and Ir76b are required in both labellar bristles and in the pharynx to reject high salt. As a consequence, mutation of either Ir25a or Ir76b impairs the repulsion to high salt. Thus, there is no inconsistency between the optogenetics and PER results. We clarified this point in the Discussion section. In terms of controls for IR60b>CsChrimson, we show that UAS-CsChrimson alone or UAS-CsChrimson in combination with the Gr5a driver has no impact on the PER (Figure 1D). In addition, we now include a retinal-free control (Figure 1D). These findings provide the key genetic controls and are described in the Results section. (lines 167-170)

      (2) The role of labellar high-salt sensors in regulating salt intake appeared inconsistent. On the one hand, they appeared to have a role in limiting high salt consumption because Poxn mutants were significantly more receptive to high salt than WT (Fig. 2J). On the other hand, selectively restoring IR76b or IR25a in only the IR60b neurons in these mutants - thus leaving the labellar salt sensors still defective - reverted the flies to behave like WT when given a choice between sucrose vs. sucrose+high salt (Fig 3J, L).

      We now offer an explanation for these seemingly conflicting results in the Discussion section. When we employed the DrosoX assay to test mutants that retained functional Ir60b GRNs but were missing salt-aversive GRNs in labellar bristles, the flies behaved the same as control flies (e.g. Figure 3J and L). However, using solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al., 2015), display an aversion to high salt food that is intermediate between control and Ir60b mutant flies (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), which are exposed to food substrates only when the labial palps open. We suggest that the taste pegs harbor high salt sensitive GRNs, and they may be exposed to solid substrates, but not to the liquid in capillary tubes used in the DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B). (lines 433-444)

      (3) The behavioral sensitivity of the IR60b mutant to high salt again appeared somewhat inconsistent when assessed in the two different choice assays. IR60b mutant flies were indifferent to 300 mM NaCl when assayed with DrosoX (Fig 3A, B) but were clearly still sensitive to 300 mM NaCl when assayed with the "regular" assay - they showed much reduced preference for 5 mM sucrose over 1 mM sucrose when the 5 mM sucrose was adulterated with 300 mM NaCl (Fig 1B).

      The explanation provided above may also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but not when selecting between 300 mM NaCl and 5 mM sucrose versus 1 mM sucrose in the solid food binary assay (Figure 1B). Alternatively, the different behavioral responses might be due to the variation in sucrose concentrations in each of these two assays, which employed 5 mM sucrose in the solid food binary assay, as opposed to 100 mM sucrose in the DrosoX assay. This disparity in attractive valence between these two concentrations of sucrose might consequently impact feeding amount and preference. This point is now also included in the Discussion section. (lines 441-449)

      (4) Given the IR60b neurons exhibited clear IR60b/IR25a/IR76b-dependent sucrose sensitivity, too, I am curious how the various mutant animals behave when given a choice between 100 mM sorbitol vs. 100 mM sorbitol + 300 mM NaCl, a food choice assay not complicated by the presence of sucrose. Similarly, I am curious if the Ca2+ response of IR60b neurons differs significantly when presented with 100 mM sucrose vs. when presented with 100 mM sucrose + 300 mM NaCl. In principle, the magnitude for the latter should be significantly larger than the former as animals appeared to be capable of discriminating these two choices solely relying on their IR60b neurons.

      To investigate the aversion induced by high salt in the absence of a highly attractive sugar, such as sucrose, we combined 300 mM salt with 100 mM sorbitol, which is a tasteless but nutritive sugar (Burke & Waddell, 2011; Fujita & Tanimura, 2011). Using two-way choice assays, we found that the Ir25a, Ir60b, and Ir76b mutants exhibited substantial reductions in high salt avoidance (Figure 3—figure supplement 2A). In addition, we performed DrosoX assays using 100 mM sorbitol alone, or sorbitol mixed with 300 mM NaCl. Sorbitol alone provoked less feeding than sucrose since it is a tasteless sugar (Figure 3—figure supplement 2B and C). Nevertheless, addition of high salt to the sorbitol reduced food consumption (Figure 3—figure supplement 2B and C). (lines 300-308)

      We also conducted a comparative analysis of the Ca2+ responses within the Ir60b GRN, examining its reaction to various stimuli, including 100 mM sucrose alone, 300 mM NaCl alone, and a combination of 100 mM sucrose and 300 mM NaCl. We found that the Ca2+ responses were significantly higher when we exposed the Ir60b GRN to 300 mM NaCl alone, compared with the response to 100 mM sucrose alone (Figure 4—figure supplement 1D). However, the GCaMP6f response was not higher when we presented 100 mM sucrose with 300 mM NaCl, compared with the response to 300 mM NaCl alone (Figure 4—figure supplement 1D). (lines 360-367)

      Minor issues

      (1) The labels of sucrose concentration on Figure 2D were flipped.

      This has been corrected.

      (2) The phrasing of the sentence that begins in line 196 (i.e., "This suggests the internal sensor ...") is not optimal.

      We changed the sentence to, “We found that the aversive behavior to high salt was reduced in the Poxn mutants relative to the control (Figure 2J), consistent with previous studies demonstrating roles for GRNs in labellar bristles in high salt avoidance (Jaeger et al, 2018; McDowell et al, 2022; Zhang et al, 2013).”. (lines 217-219)

      (3) In Line 231, I am not sure why the authors think ectopically expressing IR60b in labellar neurons would allow them to become activated by Na+. It seems highly unlikely to me, especially given IR60b also plays a role in sensing sugar.

      We added the following paragraph to the Discussion addressing this point, “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al., 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al., 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites.”. (lines 464-477)

      Reviewer #2 (Recommendations For The Authors):

      Line 41, acutely excessive salt ingestion can lead to death, not just health issues

      We now state that, “consumption of excessive salt can contribute to various health issues in mammals, including hypertension, osteoporosis, gastrointestinal cancer, autoimmune diseases, and can lead to death.”. (lines 41-43)

      Line 46, delete the comma after flies

      Done. (line 47)

      Lines 51-56: This description is unnecessarily confusing and does not cite proper sources. Renaming these GRNs arbitrarily can only create confusion, plus this description lacks nuance. If E GRNs are Ir94e positive, this description is out of date. Furthermore, if D GRNs are ppk23 and Gr66a positive then they will respond to both bitter and high salt.

      Papers to consult: https://elifesciences.org/articles/37167 10.1016/j.cell.2023.04.038

      We have now added citations. We prefer the A—E nomenclature, which was introduced in a 2021 Genetics review by one of the authors of this manuscript (Montell) (Montell, 2021) since naming different classes of GRNs on the basis of markers or as sweet, bitter, salt and water GRNs is misleading and an oversimplification. We cite the Genetics 2021 review, and for added clarity include both types of former names (markers and sweet, bitter, salt and water). Class D GRNs are not marked by Gr66a. The eLife reference cited above provided the initial rationale for stating that Class E GRNs are marked by Ir94e and activated by low salt. According to the Taisz et al reference (Cell 2023), the Class E GRNs, which are marked by Ir94e, are also activated by pheromones, which we now mention (Taisz et al, 2023). (lines 51-59)

      Line 62, E GRNs are not required for low salt behaviors

      We do not state that E GRNs are required for low salt behaviors, only that they sense low Na+ levels. (line 58)

      Lines 70-81, there is a great deal of emphasis on labellar GRNs but then no mention of how pharyngeal GRNs fit into categories A-E

      We devote the following paragraph to pharyngeal GRNs. We do not mention how they fit in with the A—E categories because it is not clear.

      “In addition to the labellum and taste bristles on other external structures, such as the tarsi, fruit flies are endowed with hairless sensilla on the surface of the labellum (taste pegs), and three internal taste organs lining the pharynx, the labral sense organ (LSO), the ventral cibarial sense organ (VCSO), and the dorsal cibarial sense organ (DCSO), which also function in the decision to keep feeding or reject a food (Chen & Dahanukar, 2017, 2020; LeDue et al., 2015; Nayak & Singh, 1983; Stocker, 1994). A pair of GRNs in the LSO express a member of the gustatory receptor family, Gr2a, and knockdown of Gr2a in these GRNs impairs the avoidance to slightly aversive levels of Na+ (Kim et al, 2017). Pharyngeal GRNs also promote the aversion to bitter tastants, Cu2+, L-canavanine, and bacterial lipopolysaccharides (Choi et al, 2016; Joseph et al., 2017; Soldano et al, 2016; Xiao et al, 2022). Other pharyngeal GRNs are stimulated by sugars and contribute to sugar consumption (Chen & Dahanukar, 2017; Chen et al, 2021; LeDue et al., 2015). Remarkably, a pharyngeal GRN in each of the two LSOs functions in the rejection rather than the acceptance of sucrose (Joseph et al., 2017).”. (lines 74-89)

      Line 89, aversive --> aversion

      We changed this part.

      Line 90, the gain of capsaicin avoidance suggests they are sufficient for avoidance, not essential for avoidance.

      We changed “essential” to “sufficient.”. (line 100)

      Line 104, what are you recording from here? Labellar or pharyngeal GRNs?

      We added “S-type and L-type sensilla” to the sentence. (line 119)

      Line 107, How are A GRNs marked with tdTomato? It is important to mention how you are defining A GRNs.

      We modified the sentence as follows: “Using Ir56b-GAL4 to drive UAS-mCD8::GFP, we also confirmed that the reporter was restricted to a subset of Class A GRNs, which were marked with LexAop-tdTomato expressed under the control of the Gr64f-LexA (Figure 1—figure supplement 1D—F).”. (lines 120-123)

      Line 124, should read "concentrated as sea water."

      We made the change. (line 142)

      Line 125, I am not sure what is meant by "alarm neurons"

      We changed “additional pain or alarm neurons” to “nociceptive neurons.”. (line 144)

      Line 141, Are you defining A GRNs as only labellar GRNs, i.e. the Gr5a-GAL4 pattern with labellar plus few pharyngeal GRNs? Or are you defining them as Gr64f-GAL4 (i.e. labellar plus many pharyngeal GRNs)?

      We refer to the Class A—E GRNs as labellar GRNs. Therefore, in this instance, we removed the reference to A GRNs and B GRNs, and simply mention the drivers that we used (Gr5a-GAL4 and Gr66a-GAL4) to express UAS-CsChrimson. The modified sentence is, “As controls we drove UAS-CsChrimson under control of either the Gr5a-GAL4 or the Gr66a-GAL4.”. (lines 51-59, 160-161)

      Line 180, labellar hairs--> labellar taste bristles

      We made the change. (line 204)

      Line 190, possess only --> only possess

      We made the change. (line 216)

      Line 202, Should this read increased?

      Yes. We changed “reduced” to “increased.”. (line 225)

      Line 206, The information provided here and in reference 47 was not sufficient for me to understand how the Droso-X system works and whether it has been validated. Better diagrams and much more description is required for the reader to understand this system and assess its validity

      We now explain that the DrosoX “system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary is then monitored automatically over the course of 6 hours and recorded on a computer.”. (lines 238-243)

      Lines 218-219, It would be helpful to expand on this to explain how the previous paper detected no difference. Is this because the contact time with the food is the same but the rate of ingestion is slower?

      Yes. This is correct. We now clarify this point by stating that, “In a prior study, it was observed that the repulsion to high salt exhibited by the Ir60b mutant was indistinguishable from wild-type (Joseph et al., 2017). Specifically, the flies were presented with a drop of liquid (sucrose plus salt) at the end of a probe, and the Ir60b mutant flies fed on the food for the same period of time as control flies (Joseph et al., 2017). However, this assay did not discern whether or not the volume of the high salt-containing food consumed by the Ir60b mutant flies was reduced relative to control flies. Therefore, to assess the volume of food ingested, we used the DrosoX system, which we recently developed (Figure 3—figure supplement 1A) (Sang et al, 2021). This system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary was then monitored automatically over the course of 6 hours and recorded on a computer. We found that control flies consumed approximately four times more of the 100 mM sucrose than the sucrose mixed with 300 mM NaCl (Figure 3A). In contrast, the Ir25a, Ir60b, and Ir76b mutants consumed approximately two-fold less of the sucrose plus salt (Figure 3A). Consequently, they ingested similar amounts of the two food options (Figure 3B; ingestion index). Thus, while the Ir60b mutant and control flies spend similar amounts of time in contact with high salt-containing food when it is the only option (Joseph et al., 2017), the mutant consumes considerably less of the high salt food when presented with a sucrose option without salt.”. (lines 226-251)

      Lines 231-235, Is there evidence for this, that Ir60b expression in the Ir25a or Ir76b pattern will induce high salt responses in the labellum? You should elaborate on this to clearly state what you mean rather than implying it. I do not think that overexpression of one Ir is enough evidence for this sweeping conclusion.

      We agree. We eliminated this point. (lines 227-232)

      Lines 261-263, Please elaborate here, how did you target the I-type sensilla and where are these neurons? Do they already express Ir76b and Ir25a?

      We now explain in the Results that, “We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. Gr33a is co-expressed with Gr66a (Moon et al., 2009), which has been shown to be co-expressed with Ir25a and Ir76b (Li et al., 2023). When we performed tip recordings from I7 and I10 sensilla, we did not observe a significant increase in action potentials in response to 300 mM NaCl (Figure 4—figure supplement 1A), indicating that ectopic expression of Ir60b in combination with Ir25a and Ir76b is not sufficient to generate a high salt receptor.”. (lines 324-330)

      Lines 300-303, The discussion needs to be greatly expanded. What is the proposed mechanism by which the same neurons/receptors can inhibit sucrose and high salt feeding? What is the authors' interpretation of what this study adds to our understanding of taste aversion?

      We have now added a Discussion section and greatly expanded the discussion.

      Reviewer #3 (Recommendations For The Authors):

      In line 73 there is a typo in "esophagus"

      We changed this part.

      In line 331, the use of a mixture of sucrose and "saponin" seems to be a mistake; "NaCl" is likely intended.

      We made the correction. (lines 546 and 640)

      On several occasions, the authors refer to the pharynx as a taste organ (for example 1st sentence of the abstract). I am not sure this is correct, the actual pharyngeal taste organs are the LSO, DCSO, and VCSO which are located in the pharynx.

      We made the corrections. (lines 24, 90, 92, 93, and 356)

      In line 155 the authors refer to Ir25a and Ir76b as "broadly tuned". I think it is not correct to refer to co-receptors this way; I'd suggest just calling them co-receptors.

      We made the correction. (lines 177-178)

      In line 182, stating "Gr2a is also expressed in the proboscis" is unclear. Clarify whether it refers to sensillae, pharyngeal taste organs, etc.

      We clarified it refers to pharyngeal taste organs. (lines 206-207)

      Line 253: "These finding imply that all three Irs are coexpressed in the pharynx." "The pharynx" is very unspecific, did the authors mean to say "the same neuron"?

      We now clarify by saying “in the Ir60b GRN in the pharynx.”. (line 317)

      Figures & Legends

      I found it confusing that the same color scale is being reused for different panels with different meanings repeatedly and in inconsistent ways. For example in Figure 2, red and blue are being used for Ir25a² mutants, while blue is also being used for Gr64f-Gal4 and S type sensilla. It is also not easily visible nor mentioned in the caption which of the 3 color scales presented belong to which panels.

      We modified the colors in the figures so that they are used in a consistent way. We now also define the colors in the legends.

      In Figure 2 F-I, indicating the stimulus sequence in each panel would enhance clarity. The color scale in Figure 3 could benefit from explicit explanations of different shades in the caption for easier interpretation.

      For example: "The ingestion of (a, dark color) 100 mM sucrose alone and (b, light color) in combination with 300 mM"

      We made the suggested modification.

      In Figure 4a the authors highlight that Ir76b and Ir25a label 2 neurons in the LSO. Did the imaging in 4c also capture the second cell, and if so did it respond to their stimulation?

      No, the focal plane differs, and the signal in Figure 4C is considerably weaker compared to the immunohistochemistry shown in Figure 4A. Notably, the other neuron did not exhibit a response to NaCl.

      In Figure 4f a legend for the color scale is missing, or the color might not be necessary at all. Also, the asterisks seem to be shifted to the right.

      We fixed the shifted asterisks and eliminated the color.

      Figure 4i is mislabeled 4f

      We made the correction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This study highlights new insights into the mechanism of pheochromocytoma pathogenesis that remains poorly understood. In the context of hereditary syndromes, such as multiple endocrine neoplasia 2 (MEN-2), where RET mutation is the major driver of thyroid, parathyroid, and adrenal pathologies, including pheochromocytoma, this mechanistic dissection of RET and TMEM127 is fundamentally sound. While the significance was deemed important, the strength of the evidence was found to be solid,

      Recognizing the limitations of models available for study of neuroendocrine cancers, and specifically for pheochromocytomas, we have revised and clarified the text of the current manuscript version and provide specific responses to the additional comments provided below, highlighting changes and new data.

      Reviewer #1 (Recommendations For The Authors):

      A current lack of pheochromocytoma cell lines and the use of generated cell lines for mechanistic studies presents a significant challenge that may undermine the inferred value of these findings in mock in vitro systems and call into question their reproducibility in pheochromocytoma. Consideration of 3-dimensional patient-derived pheochromocytoma organoid in vitro and patient-derived organoid xenograft in vivo models would enable the authors to confirm or refute the novel findings described.

      We agree completely with Reviewer 1 that ideally, we should replicate these findings with PCC-derived cells in vitro and in organoids. Despite many attempts, PCC cell lines have proved a major challenge for the field of neuroendocrine cancers. Cell line models are not available, and PDOs have proven to grow poorly and to be resistant to manipulations, such as CRISPR KOs or siRNA KD. In studies completed since the submission and review of the present manuscript, and subsequently published elsewhere, we have shown that RET protein is highly expressed in TMEM127-mutant PCC by immunohistochemistry. We also showed that the TMEM127-KO SH-SY5Y cell model does grow more robustly than Mock-KO cells in nude mice and that RET inhibition (Selpercatinib) does lead to tumor regression (Guo et al., 2023), suggesting that our findings may be reproducible in vivo. These findings, and potential caveats of the cell models used, have been further discussed in the text.

      Reviewer #2 (Recommendations For The Authors):

      Most notably, all experiments are conducted in an isogenic single-cell line. This exposes the whole story to be potentially confounded by unknown variables.

      In addition, studies would benefit from the adding back of TMEM127, or other methods to modulate endosome and plasma membrane dynamics to mechanistically secure the cause of the findings.

      As suggested by Reviewer 2, we have generated a TMEM127 KO in HEK293, an unrelated cell line which expressed low levels of TMEM127 but does not express RET. Consistent with our findings in SH-SY5Y, we saw increased membrane accumulation of endogenous membrane proteins N-cadherin and transferrin receptor-1 in these cells in the absence of TMEM127. Additionally, re-expression of a wildtype TMEM127 (FLAG-TMEM127) in these cells led to dramatic decreases in membrane localization of these proteins (Supplemental Figure 1D). These data suggest that membrane accumulation is indeed TMEM127 dependent, and that these processes are not directly dependent on RET expression.

      References

      Guo, Q., Z.M. Cheng, H. Gonzalez-Cantu, M. Rotondi, G. Huelgas-Morales, P. Ethiraj, Z. Qiu, J. Lefkowitz, W. Song, B.N. Landry, H. Lopez, C.M. Estrada-Zuniga, S. Goyal, M.A. Khan, T.J. Walker, E. Wang, F. Li, Y. Ding, L.M. Mulligan, R.C.T. Aguiar, and P.L.M. Dahia. 2023. TMEM127 suppresses tumor development by promoting RET ubiquitination, positioning, and degradation. Cell Rep. 42:113070.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript by DeHaro-Arbona et al., the authors wish to understand how a signaling pathway (Notch) is dynamically decoded to elicit a specific transcriptional output. In particular, they investigate the kinetic properties of Notch-responsive nuclear complexes (the DNA binding factor CSL and its co-activator Mastermind (mam) along with several candidate interacting partners). Their experimental model is the polytene chromosome of the Drosophila salivary gland, in which the naturally inactive Notch can be artificially induced through the expression of a constitutively active form of Notch.

      The authors develop a series of CRISPR and transgenic lines enabling the live imaging of these complexes at a specific locus and in various backgrounds (genetic perturbations/drug treatments). This quantitative live imaging data suggests that Notch nuclear complexes form hubs, and the authors characterize their binding dynamics. Interestingly, they elegantly demonstrate that the content of these hubs and their kinetic properties can evolve, even within Notch ON cells. Hence, they propose the existence of distinct hubs, distinguishing an open (CSL), engaged (CSL-Mam), or active (CSL-Mam-Med-PolII) configuration in Notch ON cells and an inactive hub state (in Notch OFF cells that have previously been exposed to Notch), which would explain the surprising transcriptional memory that the authors observe hours after Notch withdrawal.

      We thank the reviewer for this constructive summary of our work.

      Reviewer #2 (Public Review):

      The manuscript from deHaro-Arbona et al, entitled "Dynamic modes of Notch transcription hubs conferring memory and stochastic activation revealed by live imaging the co-activator Mastermind", uses single molecule microscopy imaging in live tissues to understand the dynamics and molecular determinants of transcription factor recruitment to the E(spl)-C locus in Drosophila salivary gland cells under Notch-ON and -OFF conditions. Previous studies have identified the major players that are involved in transcription regulation in the Notch pathway, as well as the importance of general transcriptional coregulators, such as CBP/P300 and the Mediator CDK module, but the detailed steps and dynamics involved in these processes are poorly defined. The authors present a wealth of single molecule data that provides significant insights into Notch pathway activation, including:

      (1) Activation complexes, containing CSL and Mam, have slower dynamics than the repressor complexes, containing CSL and Hairless.

      (2) Contribution of CSL, NICD, and Mam IDRs to recruitment.

      (3) CSL-Mam slow-diffusing complexes are recruited and form a hub of high protein concentrations around the target locus in Notch-ON conditions.

      (4) Mam recruitment is not dependent on transcription initiation or RNA production.

      (5) CBP/P300 or its associated HAT activity is not required for Mam recruitment.

      (6) Mediator CDK module and CDK8 activity are required for Mam recruitment, and vice-versa, but not CSL recruitment.

      (7) Mam is not required for chromatin accessibility but is dependent on CSL and NICD.

      (8) CSL recruitment and increased chromatin accessibility persist after NICD removal and loss of Mam, which confers a memory state that enables rapid re-activation in response to subsequent Notch activation.

      (9) Differences between the proportions of nuclei with Pol II enrichment and with Mam enrichment, which result in transcription being probabilistic/stochastic. These data demonstrate that the presence of Mam complexes is not sufficient to drive all the steps required for transcription in every Notch-ON nucleus.

      (10) The switch from more stochastic to robust transcription initiation was elicited when ecdysone was added.

      Overall, the manuscript is well written, concise, and clear, and makes significant contributions to the Notch field, which are also important for a general understanding of transcription factor regulation and behavior in the nucleus. I recommend that the authors address my relatively minor criticisms detailed below.

      We thank the reviewer for their thorough and constructive summary of our work. We are glad that they overall found it insightful and interesting. Below we have addressed the points they have raised.

      Page 7, bottom. The authors speculate, "It is possible therefore that, once recruited, Mam can be retained at target loci independently of CSL by interactions with other factors so that it resides for longer." Is it possible that another interpretation of that data is that Mam is a limiting factor?

      As indicated, our comment is speculative and is based on the observations summarized in the paragraph. We are not entirely sure what the reviewer is proposing as an alternate model. However, if it relates to the relative concentrations of the different factors, this would not account for the differences in trajectory durations. And for most aspects of our analysis, K_off has the most profound influence on the results. Furthermore, differences persist even when CSL levels are considerably reduced (as in conditions with Hairless RNAi).

      Page 9. The authors write, "A very low level of enrichment was evident for... for the CSL C-terminus..". The recruitment of CSL ct IDR does not appear to be statistically significant or there is no apparent difference (Figure S2C), suggesting the CSL ct IDR does not play a role in enrichment.

      We agree with the comments of the reviewer and have adjusted the text on page 9 accordingly.

      Page 9. The authors write, "Notably, MamnIDR::GFP fusion was present in droplets, suggesting it can self-associate when present in a high local concentration (Figure S2B)." Is this result only valid for Mam nIDR or does full-length Mam also localize into droplets, as has been previously observed for full-length mammalian Maml1 in transfected cells?

      We agree that the observed foci of MamL1 that have been detected in mammalian cells are interesting. We have not tried to replicate those data because the large size of Mam has made it challenging to produce a full-length form in over-expression. We note however that another portion of Mam, MamIDR, does not make droplets when over-expressed despite it containing a large section of the disordered region of the Drosophila Mam. We have now included a comment about the mammalian data in the text (page 9) to put our findings in context.

      Previous studies in mammalian cells suggest that Maml1 is a high-confidence target for phosphorylation by CDK8, see Poss et al 2016 Cell Reports https://doi.org/10.1016/j.celrep.2016.03.030. By sequence comparison, does fly Mam have similar potential phosphorylation sites, and might these be critical for Mam/CDK module recruitment?

      We thank the reviewer for highlighting this point. Indeed, we were very excited when we learnt that MamL1 was found to be a high confidence CDK8 target and we looked hard in the Mam sequence for potential phosphorylation sites. Sadly, there is very little conservation between the fly and the mammalian proteins beyond the helical region that contacts CSL and NICD. Furthermore, there are no identifiable putative CDK8 phosphorylation sites based on conventional motifs. It therefore remains to be established whether or not Mam is a direct target of the CDK8 kinase activity. We have added an explanatory comment in the text (page 11).

      Page 11: The authors write, "The differences in the effects on Mam and CSL imply that the CDK module is specifically involved in retaining Mam in the hub, and that in its absence other CSL complexes "win-out", either because the altered conditions favour them and/or because they are the more abundant." Are the "other" complexes the authors are referring to Hairless-containing complexes? With the reagents the authors have in hand, couldn't this be explicitly shown for CSL complexes rather than speculated upon?

      The reviewer is correct that CSL complexes containing Hairless are good candidates to be recruited in these conditions. We have compared the levels of Hairless at E(spl)-C following treatments with Senexin and have not detected a difference. However, it appears that the high proportion of unbound Hairless makes it difficult to detect/quantify the enrichment at E(spl)-C. We have therefore taken a different strategy, which is to measure the recruitment of a mutant form of CSL that is compromised for Hairless binding. Recruitment of the mutant CSL is detected in Notch-ON conditions, but is significantly reduced/absent following Senexin treatment. These data favour the model proposed by the reviewer that in the absence of CDK8 activity, the CSL-Hairless complexes win out. These new data have been added in new Supplementary Figures S3F and S3G (see text, page 11).

      Page 12/13: The authors write, "Based on these results we propose that, after Notch activity decays, the locus remains accessible because when Mam-containing complexes are lost they are replaced by other CSL complexes (e.g. co-repressor complexes)." Again, why not actually test this hypothesis rather than speculate? The dynamics of Hairless complexes following the removal of Notch would be very interesting and build upon previously published results from the Bray lab.

      We thank the reviewer for this comment and we agree it’s possible that the proportion of Hairless complexes increases after Notch withdrawal. However, for the reasons outlined above, it is difficult to quantify changes in Hairless, (and our preliminary experiment did not reveal any large-scale effect) and because of the complexity of the genetics we cannot straightforwardly extend the experiment to analyze the behaviour of the mutant CSL as above. Therefore, at present, we cannot say whether the loss of Mam is compensated by an increase in Hairless. We hope in future to investigate the characteristics of the memory in more depth.

      Page 13: The authors write, "As Notch removal leads to a loss of Mam, but not CSL, from the hub, it should recapitulate the effects of MamDN." While the data in Figure 5B seem to support this hypothesis, it's not clear to me that the loss of Mam and MamDN should phenocopy each other, because in the case of MamDN, NICD would still be present.

      We apologise that this sentence was a bit misleading. We have now rewritten it to improve accuracy (page 13) “As Notch removal leads to a loss of Mam, but not CSL, from the hub, we hypothesised it would recapitulate the effects of MamDN on chromatin accessibility and transcription of targets.”

      The temporal dynamics for Mam recruitment using the temperature- and optogenetic-paradigms are quite different. For example, in the optogenetic time course experiments, the preactivated cells are in the dark for 4 hours, while in the temperature-controlled experiments, there is still considerable enrichment of Mam at 4 hours. For the preactivated optogenetic experiments, how sure are the authors that Mam is completely gone from the locus, and alternatively, can the optogenetic experimental results be replicated in the temperature-controlled assays? My concern is whether the putative "memory" observation is just due to incomplete Mam removal from the previous activation event.

      We appreciate the concerns of the reviewer. However, we are confident that the 4-hour optogenetic inactivation is much more effective than the equivalent time for temperature shifts. The temperature-sensitive experiment involves a longer decay, because not only the protein but also the mRNA has to decay to fully remove NICD activity. The optogenetic experiments involve only protein decay and so are more acute. Furthermore, we have tested (and we show in Figure 5H) that Mam is fully depleted after 4 hours “Off” in the optogenetic experiments.

      In order to further strengthen the evidence in favour of the memory hub, we have extended the time-frame further to show that CSL is retained at the locus even after 24 hours “Notch OFF” in both the temperature and the optogenetic paradigm. We have also measured the effects on transcription after a 24hr OFF period using the optogenetic paradigm and seen that robust transcription is initiated in cells that have experienced a previous activation (preactivated) compared to those that have not (naïve). These new data have been added to new Figure 5 C-F and strongly support the memory model.

      Reviewer #3 (Public Review):

      Summary:

      DeHaro-Arbona and colleagues investigate the in vivo dynamics of Notch-dependent transcriptional activation with a focus on the role of the Mastermind (MAM) transcriptional co-activator. They use GFP and HALO-tagged versions of the CSL DNA-binding protein and MAM to visualize the complex, and Int/ParB to visualize the site of Notch-dependent E(Spl)-C transcription. They make several conclusions. First, MAM accumulates at E(Spl)-C when Notch signaling is active, just like CSL. Second, MAM recruits the CDK module of Mediator but does not initiate chromatin accessibility. Third, after signaling is turned off, MAM leaves the site quickly but CSL and chromatin accessibility are retained. Fourth, RNA pol II recruitment, Mediator recruitment, and active transcription were similar and stochastic. Fifth, ecdysone enhances the probability of transcriptional initiation.

      Strengths:

      The conclusions are well supported by multiple lines of extensive data that are carefully executed and controlled. A major strength is the strategic combination of Drosophila genetics, imaging, and quantitative analyses to conduct compelling and easily interpretable experiments. A second major strength is the focus on MAM to gain insights into the dynamics of transcriptional activation specifically.

      We thank the reviewer for their positive comments about the strengths of our work.

      Weaknesses:

      Weaknesses are minor. There were no p-values reported for data presented in Figure S1D and no indication of how variable measurements were. In addition, the discussion of stochasticity was not integrated optimally with relevant literature.

      We thank the reviewer for noting these points. The statistical tests have now been included for Figure S1D (now Figure S1F). We have amplified the discussion about stochasticity, to include more reference to the literature and to make clear also the distinction with transcription bursting (page 19, 20).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have an elegant series of manipulations that provide strong evidence for their hypotheses and conclusions. Their exploitation of a unique biological system amenable to imaging in the larval salivary gland is well-considered and well-performed. Most of the conclusions are supported by the data. I only have the concerns below.

      (1) One of the main findings is the composition of Notch nuclear complexes and their interactions within a 'hub'. Yet most of the data showing hubs focus on labeling one protein component (plus the locus or transcription), but multi-color imaging is rarely used to show how CSL-Mam, Mam-Med... protein signals coalesce to form a hub. Given the powerful tool developed, it would be important to show these multi-state hubs. Related to this, if the authors expect that hubs are formed independently of transcription or Notch pathway activation, do the authors see clustering at other non-specific loci in the nucleus? If not, can the authors comment on why they think that is the case? If so, do these clusters show residence time profiles consistent with those at the tracked E(spl) locus?

      We apologise that it was not evident from the data shown that the proteins co-localize. First, we stress that all the experiments are multicolor and most rely on very powerful methods to measure co-recruitment at a chromosomal locus, something that is very rarely achieved by others studying hubs. Second, we have in all cases confirmed that the proteins do colocalize. We have modified the diagram of our analysis pipeline to make clearer that this relies on multi-colour imaging, and adjusted all the figure labels to indicate the position of E(spl)-C. We have also added panels to new Supplementary Figure S1C with examples of the co-localization between CSL and Mam and a plot confirming their levels of recruitment are correlated across multiple nuclei.

      We would like to clarify that our data show that the hubs do require Notch activation for their establishment. Other regions of enrichment are detected in Notch-ON conditions, but these are less prominent and, with no independent method for identifying them, can’t be compared between nuclei. In SPT experiments, other clusters with consistent residence are detected as reported in our recent paper which expanded on the SPT data (Baloul et al, 2023). We also detect co-localizations and “hubs” in other tissues, but those analyses are ongoing and beyond the scope of this paper.

      (2) The authors convincingly show that Notch hub complexes exhibit a memory. While the data showing rapid hub reformation upon Notch withdrawal are solid and convincing (Figure 5, in particular, F), the claim that this memory fosters rapid transcriptional reactivation is less clear. Yet in order to invoke transcriptional memory, it's necessary to solidify this transcriptional response angle. The authors should consider quantifying the changes in transcription activity (at the TS and not in the cytoplasm as currently shown), as well as the timing of transcriptional reactivation (with the MS2 system or smFISH). Manipulating the duration of the activation and dark recovery periods could help to draw a better correlation between the timing of hub reformation and that of transcriptional response and would also help determine how persistent this phenomenon is.

      We thank the reviewer for these suggestions. We have carried out several new experiments to probe further the persistence of memory and to show the effects on transcription when Notch is inactivated/reactivated. First, we have extended the time period for Notch inactivation by temperature control and show that the CSL hub persists even at 24 hours and that no transcription from the target E(spl)m3 is detected, neither at the transcription start-site nor in the cytoplasm. Second, we have extended the Notch OFF time period to 24 hours using the optogenetic approach and show that transcription is robustly reinitiated in preactivated nuclei when Notch is re-activated with 30 mins light treatment, while little if any E(spl)m3 transcription is detected in naïve nuclei with the same treatment. These new data are included in new Figure 5 C-F (see pages 13-14). Both these new experiments substantiate the model that the nuclei retain transcriptional memory.

      (3) The manuscript ends with the finding that the presence of a Mam hub does not always correlate with transcription. They conclude that transcription is initially stochastic. The authors find this surprising and even state that this could not be observed without their in vivo live imaging approaches. I don't understand why this result is surprising or unexpected, as we now know that transcription is generally a stochastic process and that most (if not all) loci are transcribed in a bursting manner. The fact that E(spl)-C locus is bursty is already obvious from the smFISH data. The fact that active nascent transcription does not correlate with local TF hubs was already observed in early Drosophila embryos (with Zelda hubs and two MS2 reporters, hb-MS2, sna-MS2). If, in spite of the inherent stochasticity of transcription (bursting), the data are surprising for other reasons, the authors should explain it better.

      We apologise that we had not made clear the reasons why the results were unexpected. We have substantially rewritten this section, and the discussion section, to clarify. We have also moderated the language used to better reflect the overall context of our results. We briefly summarise here. As the reviewer correctly states, it is well known that transcription is inherently bursty. Indeed, the MS2 transcription profiles in “ON” nuclei are bursty, which likely reflects the switching of the promoter. However, in other contexts where we have monitored transcription, although it is bursty, it has nevertheless been initiated synchronously in response to Notch in all nuclei in a manner that was fully penetrant. What we observe in our current conditions is that some nuclei never initiate transcription over the time-course of our experiments (2-3 hours), and those that are ON rarely switch off. This implies that there is another rate-limiting step. Supplying a second signal can modulate this so that it occurs with much higher frequency/penetrance. We consider this to be a second tier of regulation above the fundamental transcriptional bursting.

      The fact that Mam is recruited in all nuclei, whether or not they are actively transcribing was surprising because recruitment of the activation complex has been considered as the limiting step. This is somewhat different from Zelda, which is thought to be permissive and needed at an early step to prime genes for later activation rather than to be the last step needed to fire transcription. We note also that we are not monitoring the position of the hub with respect to the promoter, as in the Zelda experiments (Zelda hubs may still persist, but they are not overlapping with the nascent RNA), we are monitoring the presence or absence of Mam hub in proximity to a genomic region.

      Minor suggestions:

      (1) The genotypes of the samples should be indicated in the figure legends.

      We thank the reviewer for this suggestion. We have provided a table (new Table S3) where all of the genetic combinations are provided in detail for each figure. We considered that this approach would be preferable because it would be quite cumbersome to have the genotypes in each legend as they would become very long and repetitive.

      (2) While the schematic Fig1A explains how the locus is detected, the presence of ParS/ParB is never indicated in subsequent panels and Figure. I assume that all panels depicting enrichment profiles, use a given radius from the ParS/ParB dot to determine the zero of the x-axis (grey zone). This should be clearly stated in all panels/figure legends concerned.

      We apologise if this was not made explicit. Yes, all panels depicting enrichment profiles use immunofluorescence signal from ParA/ParB recruitment to determine the zero of the x-axis. We have now marked this more clearly in all figures (grey bar, grey shading or labelled 0). All images where the locus is indicated by an arrowhead, by a coloured bar above the intensity plots or by grey shading in the graphs have been captured with dual colour and the signal from ParA/B recruitment used to define its location. This is now clearly stated in the analysis methods and in the legend. We have also modified the diagram in new Supplementary Figure S1B, showing our analysis pipeline, to make that more explicit.

      (3) FRAP/SPT experiments: the author should provide more details. How many traces? Are traces showing bleaching removed?

      P7: does the statement ' The residences are likely an underestimation because bleaching and other technical limitations also affect track durations' imply that traces showing bleaching have not been removed from the analysis?

      The authors could justify the choice of the model for fitting FRAP/Spt experiments and be cautious about their interpretation. For example, interpreting a kinetic behavior as a DNA-specific binding event can be accurate, only if backed up with measurements with a mutant version of the DNA binding domain.

      We apologise if some of this information was not evident. The number of trajectories is provided in new Figure S1F, which indicates the number of trajectories analyzed for each condition in Figure 1.

      We have now added also the numbers of trajectories analyzed for the ring experiments.

      The comments on page 7 about bleaching refer to the technical limitations of the SPT approach. However, as bleached particles cannot be distinguished from those that leave the plane of imaging, they have not been filtered or removed. We have not sought to make claims about absolute residence times for that reason. Rather the point is to make a comparison between the different molecules. As the same fluorescent ligand and imaging conditions are used in all the experiments, all the samples are equivalently affected by bleaching. We subdivide trajectories according to their properties and infer that those which are essentially stationary are bound to chromatin, as is common practice in the field. We note that we have previously shown that a DNA binding mutant of CSL does not produce a hub at E(spl)-C in Notch-ON conditions and has a markedly more rapid recovery in FRAP experiments (Gomez-Lamarca et al, 2018) consistent with the slow recovery being related to DNA binding. This point has been added to the text (page 8).

      (4) The authors should quantify their RNAi efficiency for Hairless-RNAi, Med13-RNAi, white-RNAi, yellow-RNAi, CBP-RNAi, and CDK8-RNAi.

      We thank the reviewer for this comment. We have made sure that we are using well validated RNAis in all our experiments and have included the references in Table S2 where they have been used. We have now evaluated the knock-down in the precise conditions used in our experiments by quantitative RT-PCR and added those data, which show efficient knock-down is occurring, to new Supplementary Figure S1D and Figure S3J. We note also that the RNAi experiments are complemented by experiments inhibiting the complexes with specific drugs and that these yield similar results.

      (5) Figure 3 A: could the author show that transcription is indeed inhibited upon triptolide treatment with smFISH (with for example m3 probes)? Why not use alpha-amanitin?

      We thank the reviewer for this suggestion. We had omitted the smFISH data from this experiment in error. These data have now been added to new Supplementary Figure S3A and clearly show that transcription is inhibited following 1 hour exposure to triptolide. Triptolide is a very fast acting and very efficient inhibitor of transcription that acts at a very early step in transcription initiation. In our experience it is much more efficient than alpha-amanitin and is now the inhibitor of choice in many transcription studies.

      (6) Figure 4 typo: panel B should be D and vice versa. Accessibility panels are referred to as Figure 4D, D' in the text but presented as panel B in the Figure.

      We thank the reviewer for noting this mistake, it is now changed in the main text.

      (7) The authors must add their optogenetic manipulation protocol to their methods section.

      The method is described in detail in a recently published paper that reports its design and use. We have now also added a section explaining the paradigm in the methods (Page 31) as requested.

      (8) Figure 3G needs a Y-axis label.

      Our apologies, this has now been added.

      (9) The authors should note why there was a change of control in Figure 3D compared to 3E and G (yellow RNAi vs white RNAi).

      This is a pragmatic choice that relates to the chromosomal site of the RNAis being tested. Controls were chosen according to the chromosome that carries the UAS-RNAi: for the second chromosome this was yellow RNAi and for the third white RNAi. This is explained in the methods.

      (10) Figure 1 would benefit from a diagram describing the genomic structure of the E(spl) locus and the relative position of the labelled locus within it.

      We thank the reviewer for this suggestion and have added a diagram to Supplementary Figure S1A.

      Reviewer #2 (Recommendations For The Authors):

      Minor criticisms and typos:

      Pet peeve: in some of the figure panels they are labeled Notch ON or OFF, but in others they are not, albeit that info is included in the figure legend. For the ease of the reader/reviewer, would it be possible to label all relevant figure panels either Notch ON or OFF for clarity?

      We thank the reviewer for this suggestion and have modified the figures accordingly.

      Page 7, top. "In comparison to their average distribution across the nucleus, both CSL and Mam trajectories were significantly enriched in a region of approximately 0.5 μm around the target locus in Notch-ON conditions, reflecting robust Notch dependant recruitment to this gene complex." Are the authors referring to Figure 1D here?

      Thank you, this figure call-out has been added in the text.

      Page 9. "...reported to interact with p300 and other factors (Figure S2B)." I believe the authors mean Figure S2C and not S2B.

      Thank you, this has been corrected in the text.

      Page 9. There is no Figure S2D.

      Apologies, this was referring to Figure S1D, and is now corrected in the text.

      Page 11: "...were at very reduced levels in nuclei co-expressing MamDN (Figure 4B).." Should be Figure 4CD.

      Thank you, this has been corrected in the text.

      Page 12: "...which was maintained in the presence of MamDN (Figure 4D, D')." Should be Figure 4B.

      Thank you, this has been corrected in the text.

      Reviewer #3 (Recommendations For The Authors):

      In the Results section on Hub, the paragraph starting with "Third, we reasoned . ." the callout to Figure S2D should be Fig S1D.

      Thank you, this has been corrected in the text.

      Figures: The font size in the Figures is so small that most words and numbers cannot be read on a printout. One has to go to the electronic version and increase the size to read it. This reviewer found that inconvenient and often annoying.

      We apologise for this oversight, the font size has now been adjusted on all the graphs etc.

      Figure legends: the legends are terse and in some cases leave explanations to the imagination (e.g. "px" in Figure 2E). It would be useful to go through them and make sure those who are not a Drosophila Notch person and not a transcription biochemist can make sense of them.

      Our apologies for the lack of clarity in the legends. We have gone over them to make them more accessible and less succinct.

    1. Author Response

      We are very pleased to hear the overall positive views and constructive criticisms of eLife Editors and Reviewers on our work. In particular, we appreciate their comments highlighting the value of our new pipeline for high-throughput quantification of fly embryonic movement and the positive views of reviewers and editors that our data on the roles of miR-2b-1 in embryonic movement are well supported.

      Regarding Reviewer 1, we thank them for their positive comments that our work is experimentally sound and well-written, their kind words on the value of our new embryonic movement pipeline, and their overall appreciation of the quality, scope, and significance of our work. In a revised version of the manuscript we will consider discussing and addressing some of the interesting points raised by Rev1.

      Turning to the comments by Rev2, we are grateful to them for their recognition of the novelty of our miRNA findings and appreciation of the utility of our novel quantitative pipeline for assessing embryonic movement. Nonetheless, we politely – but strongly – disagree with their suggestion that the findings are inflated by our language. For example, they criticise our use of the verb ‘control’, yet this is a standard textbook term in molecular biology to describe biological processes regulated by genetic factors: given that miR-2b-1 regulates movement patterns during embryogenesis, to say that miR-2b-1 ‘controls’ embryonic movement in the Drosophila embryo is reasonable and in line with the language used in the field. It is not inflation. In connection to other comments, in a revised manuscript we will propose a different name for the gene here described as Janus to avoid annotation issues at FlyBase due to other, unrelated genes that include this word as part of their names.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Weaknesses are the absence of correlation between the results from the animal studies and human pancreatic cancers.

      Author response: We appreciate the reviewer’s attention to the importance of human pancreatic cancer studies. In a previous study (D’Amico et al. Genes & Development 2018 doi: 10.1101/gad.311852.118), we evaluated the expression of STAT3 in human pancreatic tissue microarrays and data from the Human Protein Atlas. Mutations in STAT3 are infrequent in human pancreatic cancers; however, there is a trend of decreased STAT3 activity in poorly differentiated carcinomas.

      In the current study, STAT3 and SMAD4 gene signature scores (computed from KO KPC cells) were aligned with human pancreatic ductal adenocarcinoma samples from the TCGA cohort, and statistical analyses supported the selective antagonism of STAT3 and SMAD4 (Fig 4D, Fig 4E).

      The complex process of EMT is difficult to characterize rigorously in human cancers. Mouse models offer an opportunity to study the relationships between cancer phenotypes and genetic alterations.

      Reviewer #2 (Public Review):

      [...] While correlations are strong, the study would benefit from additional cause-and-effect type experiments. It would also be beneficial to better tie together the first and second parts of the paper.

      Author response: We understand the Reviewer’s interest in additional experiments that could further elucidate mechanisms that drive EMT and/or KRAS dependency in relation to STAT3 and TGF-beta antagonism. We previously investigated the development of mutant KRAS knockout tumors (Ischenko et al. Nature Communications 2021 doi:10.1038/s41467-021-21736) and found that loss of KRAS promotes EMT, similar to loss of STAT3. Additional experiments are underway but are outside the scope of the current study.

      The first part of the paper is mechanistic and used KRAS-transformed mouse embryo fibroblasts to perform in vitro studies with foci formation. The cell-based foci formation assay has been shown to best evaluate malignant transformation and oncogenic potential. In the second part we transitioned to epithelial cells and pancreatic ductal adenocarcinomas to combine mechanistic relationships with genetic models.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript represents a cleanly designed experiment for assessing biological motion processing in children (mean age = 9) with and without ADHD. The group differences concerning accuracy in global and local motion processing abilities are solid, but the analyses suggesting dissociable relationships between global and local processing and social skills, age, and IQ need further interrogation. The results are useful in terms of understanding ADHD and the ontogenesis of different components of the processing of biological motion.

      We thank the editors for the positive assessment of our manuscript. We have carefully considered the reviewers’ constructive and helpful comments and revised our manuscript accordingly. To address the question about the dissociable relationships between global and local BM processing, we have provided more evidence and additional analyses in this revised version.

      Reviewer #1 (Public Review):

      Summary:

      The paper presents a nice study investigating differences in biological motion perception in participants with ADHD in comparison with controls. Motivated by the idea that there is a relationship between biological motion perception and social capabilities, the authors investigated local and global (holistic) biological motion perception in both groups, together with several additional behavioral variables that are affected in ADHD (IQ, social responsiveness, and attention/impulsivity). Both local and global biological motion perception are reduced in ADHD participants. In addition, the study demonstrates a significant correlation between local biological motion perception skills and the social responsiveness score in the ADHD group, but not the controls. A path analysis in the ADHD data suggests that general performance in biological motion perception is influenced mainly by global biological motion perception performance and attentional and perceptual reasoning skills.

      Strengths:

      It is true that little work exists on biological motion perception and ADHD. Therefore, the presented study contributes an interesting new result to the biological motion literature and potentially also adds new behavioral markers for this clinical condition. The design of the study is straightforward and technically sound, and the drawn conclusions are supported by the presented results.

      Thank you for your positive assessment of our work.

      Weaknesses:

      Some of the claims about the relationship between genetic factors and ADHD and the components of biological motion processing have to remain speculative at this point because genetic influences were not explicitly tested in this paper.

      We agree that the relationship between genetic factors and BM processing in ADHD needs more investigation. We have modified our statement in the Discussion section as follows:

      “Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from the dissociated genetic bases. The former, to a great degree, seems to be acquired phylogenetically20,21,59,60, while the latter is primarily obtained through individual development19.” (lines 421 - 425).

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a relatively clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate your positive assessment of our work.

      Weaknesses:

      Except for the main analysis, it is unclear what the authors' specific predictions are regarding the three different tasks they employ. The three BM tasks are used to probe different processes underlying BM perception, but it is difficult to gather from the introduction why these three specific tasks were chosen and what predictions the authors have about the performance of the ADHD group in these tasks. Relatedly, the authors do not report whether (and if so, how) they corrected for multiple comparisons in their analyses. As the number of tests one should control for depends on the theoretical predictions (http://daniellakens.blogspot.com/2016/02/why-you-dont-need-to-adjust-you-alpha.html), both are necessary for the reader to assess the statistical validity of the results and any inferences drawn from them. The same is the case for the secondary analyses exploring relationships between the 3 individual BM tasks and social function measured by the social responsivity scale (SRS).

      We appreciate these constructive suggestions. In response, we have included a detailed description in the Introduction section explaining why we employed three different tasks and our predictions about the performance in ADHD:

      “Despite initial indications, a comprehensive investigation into BM perception in ADHD is warranted. We proposed that it is essential to deconstruct BM processing into its multiple components and motion features, since treating them as a single entity may lead to misleading or inconsistent findings31. To address this issue, we employed a carefully designed behavioral paradigm used in our previous study19, making slight adjustments to adapt for children. This paradigm comprises three tasks. Task 1 (BM-local) aimed to assess the ability to process local BM cues. Scrambled BM sequences were displayed and participants could use local BM cues to judge the facing direction of the scrambled walker. Task 2 (BM-global) tested the ability to process the global configuration cues of the BM walker. Local cues were uninformative, and participants used global BM cues to determine the presence of an intact walker. Task 3 (BM-general) tested the ability to process general BM cues (local + global cues). The stimulus sequences consisted of an intact walker and a mask containing similar target local cues, so participants could use general BM cues (local + global cues) to judge the facing direction of the walker.” (lines 116 - 130)

      “In Experiment 1, we examined three specific BM perception abilities in children with ADHD. As mentioned earlier, children with ADHD also show impaired social interaction, which implies atypical social cognition. Therefore, we speculated that children with ADHD would perform worse than TD children in all three tasks.” (lines 131 - 134)

      Additionally, we have reported the p values corrected for multiple comparisons (false discovery rate, FDR) in the revised manuscript wherever it was necessary to adjust the alpha (lines 310 - 316; Table 2). The pattern of the results remained unchanged.
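      To make the FDR correction concrete, the following is a minimal Python sketch of the standard Benjamini-Hochberg step-up adjustment. It assumes the Benjamini-Hochberg variant of FDR control; the response does not state which FDR procedure or software was used, and the p-values in the usage example are hypothetical, not values from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Step up from the largest p-value, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = p_values[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for illustration only
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      For these four hypothetical p-values the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, so all would remain below an FDR threshold of 0.05.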

      In relation to my prior point, the authors could provide more clarity on how the conclusions drawn from the results relate to their predictions. For example, it is unclear what specific conclusions the authors draw based on their findings that ADHD show performance differences in all three BM perception tasks, but only local BM is related to social function within this group. Here, the claim is made that their results support a specific hypothesis, but it is unclear to me what hypothesis they are actually referring to (see line 343 & following). This lack of clarity is aggravated by the fact that throughout the rest of the discussion, in particular when discussing other findings to support their own conclusions, the authors often make no distinction between the two processes of interest. Lastly, some of the authors' conclusions related to their findings on local vs global BM processing are not logically following from the evidence: For instance, the authors conclude that their data supports the idea that social atypicalities are likely to reduce with age in ADHD individuals. However, according to their own account, local BM perception - the only measure that was related to social function in their study - is understood to be age invariant (and was indeed not predicted by age in the present study).

      Thank you for pointing out this issue. We have carefully revised the Discussion section about our findings to clarify these points:

      “Our study contributes several promising findings concerning atypical biological motion perception in ADHD. Specifically, we observe atypical local and global BM perception in children with ADHD. Notably, a potential dissociation between the processing of local and global BM information is identified. The ability to process local BM cues appears to be linked to the traits of social interaction among children with ADHD. In contrast, global BM processing exhibits an age-related development. Additionally, general BM perception may be affected by factors including attention.” (lines 387 - 393)

      We have provided a detailed discussion of the two processes of interest to clarify their potential differences and the possible reasons behind the divergent developmental trajectories of local and global BM processing:

      “BM perception is considered a multi-level phenomenon56-58. At least in part, processing information of local BM and global BM appears to involve different genetic and neural mechanisms16,19. Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from the dissociated genetic bases. The former, to a great degree, seems to be acquired phylogenetically20,21,59,60, while the latter is primarily obtained through individual development19. The sensitivity to local rather than global BM cues seems to emerge early in life. Visually inexperienced chicks exhibit a spontaneous preference for the BM stimuli of hen, even when the configuration was scrambled20. The same finding was reported in newborns. On the contrary, the ability to process global BM cues rather than local BM cues may be influenced by attention28,29 and shaped by experience24,56.” (lines 419 - 430)

      “We found that the ability to process global and general BM cues improved significantly with age in both TD and ADHD groups, which implies that the processing module for global BM cues tends to mature with development. In the ADHD group, the improvement in processing general and global BM cues is greater than that in processing local BM cues, while no difference was found in the TD group. This may be due to the relatively higher baseline abilities of BM perception in TD children, resulting in a relatively milder improvement. These findings also suggest a dissociation between the development of local and global BM processing. There seems to be an acquisition of the ability to process global BM cues, akin to the potential age-related improvements observed in certain aspects of social cognition deficits among individuals with ADHD5, whereas local BM may be considered an intrinsic trait19.” (lines 438 - 449)

      In addition, we have rephrased some inaccurate statements in the revised manuscript. Although some studies have found that part of the social dysfunction in ADHD individuals diminishes with age, another part may be stable and attributable to atypical local BM perception. One reason for the age-related reduction may be that some factors related to social dysfunction, such as hyperactivity symptoms, improve with age.

      Results reported are incomplete, making it hard for the reader to comprehensively interpret the findings and assess whether the conclusions drawn are valid. Whenever the authors report negative results (p-values > 0.05), the relevant statistics are not reported, and the data not plotted. In addition, summary statistics (group means) are missing for the main analysis.

      Thanks for your comments. We have provided the complete statistical results in the revised manuscript (lines 309 - 316) and supplementary material, including the relevant statistics and plots of negative results (Figure 4, Figure S2 and S3), in accordance with our research questions. We have also included summary statistics in the Results section (lines 287 - 293).

      Some of the conclusions/statements in the article are too strong and should be rephrased to indicate hypotheses and speculations rather than facts. For example, in lines 97-99 the authors state that the finding of poor BM performance in TD children in a prior study 'indicated inferior applicability' or 'inapplicable experimental design'. While this is one possibility, a perhaps more plausible interpretation could be that TD children show 'poor' performance due to outstanding maturation of the underlying (global) BM processes (as the authors suggest themselves that BM perception can improve with age). There are several other examples where statements are too strong or misleading, which need attention.

      We thank you for pointing out the issue. We have toned down and rephrased the strong statements and made the necessary revisions.

      “Another study found that children with ADHD performed worse in BM detection with moderate ratios of noise34. This may be due to the fact that BM stimuli with noise dots will increase the difficulty of identification, which highlights the difference in processing BM between the two groups33,35.” (lines 111 - 115)

      Reviewer #3 (Public Review):

      Summary:

      The authors presented point light displays of human walkers to children (mean = 9 years) with and without ADHD to compare their biological motion perception abilities and relate them to IQ, social responsiveness scale (SRS) scores and age. They report that children with ADHD were worse at all three biological motion tasks, but that those loading more heavily on local processing related to social interaction skills and global processing to age. The important and solid findings are informative for understanding this complex condition, as well as biological motion processing mechanisms in general. However, I am unsure that these differences between local and global skills are truly supported by the data and suggest some further analyses.

      Strengths:

      The authors present clear differences between the ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      We appreciate your positive feedback. In revised manuscript, we have added more analyses to support the differences between local and global motion processing. Please refer to our response to the point #3 you mentioned below.

      Weaknesses:

      I am unsure that the data are strong enough to support claims about differences between global and local processing wrt social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but do not seem so plausible to me. I am also concerned about gender, and possible autism, confounds when examining the effect of ADHD. Specifics:

      Gender confound. There are proportionally more boys in the ADHD than TD group. The authors appear to attempt to overcome this issue by including gender as a covariate. I am unsure if this addresses the problem. The vast majority of participants in the ADHD group are male, and gender is categorically, not continuously, defined. I'm pretty sure this violates the assumptions of ANCOVA.

      We appreciate your comments. We concur that, although we observed a clear difference between local and global BM processing in ADHD, the evidence is to some extent preliminary. The mechanistic possibilities for why these abilities may dissociate have been discussed in the revised manuscript; please refer to the response to reviewer 2’s point #2. To further examine whether gender played a role in the observed results, we used a statistical matching technique to obtain a sub-dataset. The pattern of results remained the same with this more balanced dataset (see Supplementary Information part 1). Following your suggestion, we have also presented the results without gender as a covariate in the main text and separated the data of boys and girls on the plots (see Figure 1 and Figure S1). There were indeed no signs of a gender effect.

      Autism. Autism and ADHD are highly comorbid. The authors state that the TD children did not have an autism or ADHD diagnosis, but they do not state that the ADHD children did not have an autism diagnosis. Given the nature of the claims, this seems crucial information for the reader.

      Thanks for your suggestion. We have confirmed that none of the children with ADHD in our study had been diagnosed with autism. We used a semi-structured interview instrument (K-SADS-PL-C) to confirm the ADHD diagnosis and rule out ASD in every recruited child with ADHD. The exclusion criteria for both groups were stated in the Materials and methods section:

      “Exclusion criteria for both groups were: (a) neurological diseases; (b) other neurodevelopmental disorders (e.g., ASD, Mental retardation, and tic disorders), affective disorders and schizophrenia…” (lines 158 - 162)

      Conclusions. The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. I think that if the authors wish to make strong claims here they must show inferential stats supporting (1) a difference between ADHD and TD SRS-Task 1 correlations, and (2) a difference in those differences for Task 2 and 3 relative to Task 1. I think they should also show a scatterplot of this correlation, with separate lines of best fit for the two groups, for Tasks 2 and 3 as well. I.e. Figure 4 should have 3 panels. I would recommend the same type of approach for age. Currently, they have small samples for correlations, and are reading much of theoretical significance between some correlations passing significance threshold and others not. It would be incredibly interesting if the social skills (as measured by SRS) only relate to local BM abilities, and age only to global, but I think the data are not so clear with the current information. I would be surprised if all BM abilities did not improve with age. Even if there is some genetic starter kit (and that this differs according to particular BM component), most abilities improve with learning/experience/age.

      Thank you for this recommendation. We have added statistics testing the differences between correlations: (1) a difference between the ADHD and TD groups in the SRS-Task 1 correlation (see the first paragraph of Supplementary Information part 2); (2) differences in the SRS-response accuracy correlations for Tasks 2 and 3 relative to Task 1 (see the second paragraph of Supplementary Information part 2); and (3) differences in the age-response accuracy correlations for Tasks 2 and 3 relative to Task 1 in the ADHD group (see Supplementary Information part 3). Additionally, to make the results clear, we have included scatterplots for SRS-Task 1, SRS-Task 2 and SRS-Task 3, each with separate lines of best fit for the two groups (see Figure 4), and for SRS-ADHD, SRS-TD, age-ADHD and age-TD, each with separate lines of best fit for the three tasks (see Figure S2 and S3). Detailed results are presented in the revised manuscript and Supplementary Information. We expect these further analyses to strengthen our conclusions.
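      One standard way to test whether a correlation differs between two independent groups (for example, the SRS-Task 1 correlation in ADHD versus TD children) is Fisher's r-to-z test, sketched below in Python. This is an assumption for illustration: the response does not specify which test was used. Comparisons between tasks within the same children involve dependent, overlapping correlations, which need a different procedure (for example, Steiger's test) and are not covered by this sketch. The r and n values in the usage example are hypothetical.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided Fisher r-to-z test of H0: rho1 == rho2 for independent samples."""
    if not (-1.0 < r1 < 1.0 and -1.0 < r2 < 1.0):
        raise ValueError("correlations must lie strictly between -1 and 1")
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    # Fisher transformation makes each r approximately normal with SE 1/sqrt(n - 3)
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the study
    z, p = compare_independent_correlations(r1=-0.45, n1=40, r2=-0.10, n2=40)
    print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

      The test works on the Fisher-transformed scale because the sampling distribution of r itself is skewed when the true correlation is far from zero.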

      Theoretical assumptions. The authors make some sweeping statements about local vs global biological motion processing that need to be toned down. They assume that local processing is specifically genetically determined whereas global processing is a product of experience. The fact their global, but not local, task performance improves with age would tend to suggest there could be some difference here, but the existing literature does not allow for this certainty. The chick studies showing a neonatal preference are controversial and confounded - I cannot remember the specifics but I think there is an upper vs lower visual field complexity difference here.

      Thank you for pointing out this issue. Following your suggestion, we have toned down and rephrased our claims about the difference between local and global BM processing:

      “These findings suggest that local and global mechanisms might play different roles in BM perception, though the exact mechanisms underlying the distinction remain unclear. Exploring the two components of BM perception will enhance our understanding of the difference between local and global BM processing, shedding light on the psychological processes involved in atypical BM perception.” (lines 87 - 92)

      Reviewer #1 (Recommendations For The Authors):

      I have only a number of minor points that should be addressed prior to publication:

      L. 95ff: What is meant by 'inapplicability of experimental designs' ? This paragraph is somewhat unclear.

      In the revised manuscript, we have clarified this point (lines 111 - 115).

      L. 146: The groups were not perfectly balanced for sex. Would results change fundamentally in a more balanced design, or can arguments be given that gender does not play a role, like it seems to be the case for some functions in biological motion perception (e.g. Pavlova et al. 2015; Tsang et al 2018). One could provide a justification that this imbalance does not matter or test for subsampled balanced data sets maybe.

      This point is similar to the point #1 from reviewer 3, and we have addressed this issue in our response above.

      L. 216 f.: In this paragraph it does not become very clear that the mask for the global task consisted of scrambles generated from walkers walking in the same direction. The mask for the local task then should consist of a balanced mask that contains the same amount of local motion cues indicating right and leftwards motion. Was this the case? (Not so clear from this paragraph.)

      Regarding the local task, introducing a mask would make the task too difficult for children. Therefore, in the local task, we displayed only a scrambled walker without a mask, which was more suitable for children. We have clarified this point in the corresponding paragraph (lines 232 - 241).

      L. 224 ff.: Here it would be helpful to see the 5 different 'facing' directions of the walkers. What does this exactly mean? Do they move on oblique paths that are not exactly orthogonal to the viewing directions, and how much did these facing directions differ?

      Out of the five walkers we used, two faced straight left or right, orthogonal to the viewing directions. Two walked with their bodies oriented 45 degrees from the observer, to the left or right. The last one walked towards the observer. We have included a video (Video 4) to demonstrate the 5 facing directions.

      L. 232: How was the number of 5 practicing trials determined/justified?

      As mentioned in the main text, global BM processing is susceptible to learning. Therefore, too many practice trials would increase BM visual experience and could influence the results. We set the number of practice trials to 5 based on the results of a pilot experiment, in which we observed that nearly all children understood the task requirements well after completing 5 practice trials.

      L 239: Apparently no non-parametric statistics were applied. Maybe it would be good to mention in the Statistics section briefly why this was justified.

      We appreciate your suggestion and have cited two references in the Statistics section (Fagerland et al. 2012; Rochon et al. 2012). Fagerland et al. noted that the t-test becomes more robust as the sample size increases. According to the central limit theorem, the sampling distribution of the mean approaches normality as the sample size grows; a common rule of thumb is that it can be treated as approximately normal when the sample size is greater than 30.

      (http://www2.psychology.uiowa.edu/faculty/mordkoff/GradStats/part%201/I.07%20normal.pdf). In fact, we also ran non-parametric tests on our data and found the results to be robust.

      L 290: 'FIQ' this abbreviation should be defined.

      The abbreviation ’FIQ’ stands for full-scale intellectual quotient, which is defined in the Materials and methods section:

      “Scores of the four broad areas constitute the full-scale intellectual quotient (FIQ).”

      L. 290 ff.: This model 'BM-local = age + gender etc ' is a pretty sloppy notation. I think what is meant is that a GLM was used that uses the predictors gender etc. times appropriate beta_i values. This formula should be corrected or one just says that a GLM was run with the predictors gender ....

      The same criticism applies to these other models that follow.

      We thank you for pointing this out. We have modified all formulas accordingly in the revised manuscript (see part 3 of the Results section).
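      As an illustration of the notation the reviewer asked for, each model can be written as a general linear model with explicit coefficients. The form below is generic and the symbols are illustrative: $x_{ij}$ denotes the value of predictor $j$ (for example, age, or gender coded as a 0/1 indicator) for participant $i$, $\beta_j$ is its coefficient, and the actual predictor set of each model is the one given in the revised Results section, not restated here.

$$
\text{BM-local}_i = \beta_0 + \sum_{j=1}^{k} \beta_j \, x_{ij} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$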

      All these models assume linearity of the combination of the predictors. Was this assumption verified?

      We referred to previous studies of BM perception in children, which found that the main predictor variables, including IQ (Rutherford et al., 2012; Jones et al., 2011) and age (Annaz et al., 2010; van et al., 2016), have a linear relationship with the ability of BM processing.

      L. 296ff.: For model (b) it looks like general BM performance is strongly driven by the predictor global BM performance in the group of patients. Does the same observation also apply to the normals?

      The same phenomenon was not observed in TD children. We have briefly discussed this point in the Discussion section of the revised manuscript (lines 449 - 459).

      Reviewer #2 (Recommendations For The Authors):

      (1) Please add public access to the data repository so data availability can be assessed.

      The data of the study will be available at https://osf.io/37p5s/.

      (2) Although overall, the language was clear and understandable, there are a few parts where language might confuse a reader and lead to misconceptions. For instance, line 52: Did the authors mean to refer to 'emotions and intentions' instead of 'emotions and purposes'? See also examples where rephrasing may help to reflect a statement is speculation rather than fact.

      Thanks for the comments. We have carefully checked the full text and rephrased the confusing statements.

      (3) Line 83/84: Autism is not a 'mental disorder' - please change to something like 'developmental disability'. Authors are encouraged to adapt their language according to terms preferred by the community (e.g., see Fig. 5 in this article:

      https://onlinelibrary.wiley.com/doi/10.1002/aur.2864)

      Suggestion well taken. We have changed the wording accordingly:

      “In recent years, BM perception has received significant attention in studies of mental disorders (e.g., schizophrenia30) and developmental disabilities, particularly in ASD, characterized by deficits in social communication and social interaction31,32.” (lines 93 - 95)

      (4) Please report how the sample size for the study was determined.

      In the Materials and methods section (lines 168 - 173), we explained how the sample size was determined.

      (5) Line 94: It would be helpful to have a brief description of what neurophysiological differences have been observed upon BM perception in children with ADHD.

      Thanks for the comment. We have added a brief description of neurophysiological findings in children with ADHD (lines 108 - 111).

      (6) Line 106/107 and 108/109: please add references.

      We have revised this part, and the relevant findings and references are in line with the revised manuscript (lines 77, 132 - 133).

      (7) Line 292: Please add what order the factors were entered into each regression model.

      We used SPSS 26 for the main analysis, which by default uses Type III sums of squares to evaluate models. Because Type III sums of squares test each effect after adjusting for all other terms in the model, the results do not depend on the order in which the factors are entered into the GLM. For more information, please refer to the documentation of SPSS 26 (https://www.ibm.com/docs/en/spss-statistics/26.0.0?topic=features-glm-univariate-analysis).

      Reviewer #3 (Recommendations For The Authors):

      (1) Task specifics. It is key to understanding the findings, as well as the dissociation between tasks, that the precise nature of the stimuli is clear. I think there is room for improvement in description here. Task 1 is described as involving relocating dots within the range of the intact walker. Of course, PLWs are created by presenting dots at the joints, so relocation can involve either moving to another place on the body, or random movement within the 2D spatial array (which likely involves moving it off the body). Which was done? It is said that Ps must indicate the motion direction, but what was the display of the walker? Sagittal? Task 2 requires detecting whether there is an intact walker amongst scrambled walkers. Were all walkers completely overlaid? Task 3 requires detecting the left v right facing of an intact walker at different orientations, presented amongst noise. So Task 3 requires determining facing direction and Task 1 walking direction. Are these tasks the same but described differently? Or can walkers ever walk backwards? Wrt this point, I also think it would help the reader if example videos were uploaded.

      Thank you for bringing this to our attention. With regard to Task 1, your second speculation is correct. We scrambled the original dots and presented them at random positions within the 2D spatial array (which likely involved moving them off the body). As a result, the global configuration of the 13 dots was completely disrupted while the motion trajectory of each individual dot was preserved, so the monitor displayed scrambled dots that do not resemble a human. These local BM cues nevertheless contain information about motion direction. In Task 2, the target walker was completely overlaid by a mask approximately 1.44 times the size of the intact walker. Task 1 and Task 3 have the same requirement, namely judging the motion (walking) direction; the difference is that Task 1 displayed a scrambled walker, while Task 3 displayed an intact walker within a mask. We have clarified these points and improved our descriptions in the Procedure section, and we have created example videos for each task, which we believe will help readers understand each task.

      (2) Gender confound (see above). I think that the authors should present the results without gender as a covariate. Can they separate boys and girls on the plots with different coloured individual datapoints, such that readers can see whether it's actually a gender effect driving the supposed ADHD effect? And show that there are no signs of a gender effect in their TD group?

      This point is similar to point #1 you raised above; please refer to our response to that point.

      (3) Autism possible confound (see above). I think the authors must report whether any of the ADHD group had an autism diagnosis.

      Please refer to our response to point #2 you mentioned above.

      (4) Conclusions concerning differences between the local and global tasks wrt SRS and age (see above). I believe the authors should add stats demonstrating differences between the correlations to support such claims, as well as demonstrating appropriate scatterplots for SRS-Task 1, SRS-Task 2, SRS-Task 3 and age-Task 1, age-Task2 and age-Task 3 (with separate lines of best fit for the two groups in each).

      Please refer to our response to point #3 you mentioned above.

      (5) Theoretical assumptions (see above). I would suggest rephrasing all claims here to outline that these discussed mechanistic differences between local and global BM processing are only possibilities and not known on the basis of existing data.

      Please refer to our response to point #4 you mentioned above.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I only have a few minor suggestions:

      Abstract: I really liked the conclusion (that IM and VWM are two temporal extremes of the same process) as articulated in lines 557--563. (It is always satisfying when the distinction between two things that seem fundamentally different vanishes). If something like this but shorter could be included in the Abstract, it would highlight the novel aspects of the results a little more, I think.

      Thank you for this comment. We have added the following to the abstract:

      “A key conclusion is that differences in capacity classically thought to distinguish IM and VWM are in fact contingent upon a single resource-limited WM store.”

      L 216: There's an orphan parenthesis in "(justifying the use".

      Fixed.

      L 273: "One surprising result was the observed set size effect in the 0 ms delay condition". In this paragraph, it might be a good idea to remind the reader of the difference between the simultaneous and zero-delay conditions. If I got it right, the results differ between these conditions because it takes some amount of processing time to interpret the cue and free the resources associated with the irrelevant stimuli. Recalling that fact would make this paragraph easier to digest.

      That is correct. However, at this point in the text, we have not yet fitted the DyNR model to the data. Therefore, we believe that introducing cue processing and resource reallocation as concepts that differentiate between those two conditions would disrupt the flow of this paragraph. We address these points soon after, in a paragraph starting on line 341.

      Figures 3, 5: The labels at the bottom of each column in A would be more clear if placed at the top of each column instead. That way, the x-axis for the plots in A could be labeled appropriately, as "Error in orientation estimate" or something to that effect.

      We edited both figures, now Figure 4 and Figure 6, as suggested.

      L 379: It should be "(see Eq 6)", I believe.

      That is correct, line 379 (currently line 391) should read ‘Eq 6’. Fixed.

      L 379--385: I was a bit mystified as to why the scaled diffusion rate produced a worse fit than a constant rate. I imagine the scaled version was set to something like

      sigma^2_diff_scaled = sigma^2_base + K*(N-1)

      where N is the set size and sigma^2_base and K are parameters. If this model produced a similar fit as with a constant diffusion rate, the AIC would penalize it because of the extra parameter. But why would the fit be worse (i.e., not match the pattern of variability)? Shouldn't the fitter just find that the K=0 solution is the best? Not a big deal; the Nelder-Mead solutions can wobble when that many parameters are involved, but if there's a simple explanation it might be worth commenting on.

      The scaled diffusion was implemented by extending Eq 6 in the following way:

      $\sigma^2(t) = (t - t_{\mathrm{offset}})\,\dot{\sigma}^2_{\mathrm{diff}}\,N$

      where N is set size. The scaling was therefore not associated with a free parameter that could shrink to 0 if set size did not affect the diffusion rate; instead, variability necessarily increased with set size. We now clarify this in the text:

      “The second variant was identical to the proposed model, except that we replaced the constant diffusion rate with a set size scaled diffusion rate by multiplying the right side of Eq 6 by N.“
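
      The difference between the two scaling schemes can be sketched in a few lines. This is a minimal illustration, assuming Eq 6 gives the diffusion variance as elapsed time since t_offset multiplied by a squared diffusion rate, and assuming no diffusion before t_offset; the function names are hypothetical. The multiplicative variant has no parameter that can switch the set-size effect off, whereas the additive form suggested by the reviewer reduces to the constant-rate model when k = 0.

```python
def diffusion_variance(t, t_offset, diff_rate, set_size=1,
                       scale_by_set_size=False):
    """Variance added by diffusion at time t (a sketch of Eq 6).

    diff_rate stands in for the squared diffusion rate; no diffusion is
    assumed before t_offset.
    """
    elapsed = max(t - t_offset, 0.0)
    variance = elapsed * diff_rate
    if scale_by_set_size:
        # Set-size-scaled variant: right-hand side multiplied by N.
        # No free parameter here can turn the scaling off.
        variance *= set_size
    return variance


def additive_rate(base_rate, k, set_size):
    """Reviewer's suggested form: base_rate + k * (N - 1).

    Setting k = 0 recovers the constant-rate model, so this form nests it.
    """
    return base_rate + k * (set_size - 1)
```

      With diff_rate = 0.5 and one second of elapsed time, the scaled variant yields 0.5 at set size 1 but 2.0 at set size 4, so a fitter cannot recover constant diffusion from it.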

      Figure 4 is not mentioned in the main text. Maybe the end of L 398 would be a good place to point to it. The paragraph at L 443-455 would also benefit from a couple of references to it.

      Thank you for this suggestion. Figure 4 (now Figure 5) was previously mentioned on line 449 (previously line 437), but now we have included it on line 410 (previously line 398), within the paragraph spanning lines 455-467 (previously 443-455), and also on line 136 where we first discuss masking effects.

      L 500: Figure S7 is mentioned before Figures S5 and S6. Quite trivial, I know....

      Thank you for this comment. There was no specific reason for Figure S7 to appear after S5 & S6, so we simply swapped their order to be consistent with how they are referred to in the manuscript (i.e., S7 became S5, S5 became S6, and S6 became S7).

      Reviewer #2 (Recommendations For The Authors):

      (1) One potential weakness is that the model assumes sensory information is veridical. However, this isn't likely the case. Acknowledging noise in sensory representations could affect the model interpretation in a couple of different ways. First, neurophysiological recordings have shown normalization affects sensory representations, even when a stimulus is still present on the screen. The DyNR model partially addresses this concern because reports are drawn from working memory, which is normalized. However, if sensory representations were also normalized, then it may improve the model variant where subjects draw directly from sensory representations (an alternative model that is currently described but discarded).

      Thank you for this suggestion. We can consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility involves assuming that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, occurring before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data, which failed to show a significant difference between set sizes and are more closely aligned with the null hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF₁₀ = 0.47). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.
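
      The contrast between the two scenarios can be illustrated with a toy calculation. This sketch assumes a fixed total gain shared equally among the normalized items; it is not the DyNR implementation, and the function name is hypothetical. Pre-attentive normalization makes the cued item's gain fall with set size, whereas post-attentive normalization leaves it unchanged because only the cued item is attended.

```python
def cued_item_gain(total_gain, set_size, stage):
    """Gain available to the cued item under a toy divisive normalization.

    stage="pre":  activity is normalized across all displayed items before
                  attention selects the cued one.
    stage="post": normalization acts only on the attended (cued) item, so
                  the number of displayed items is irrelevant.
    """
    if stage == "pre":
        return total_gain / set_size
    if stage == "post":
        return total_gain
    raise ValueError(f"unknown stage: {stage!r}")


for n in (1, 4, 10):
    print(n, cued_item_gain(1.0, n, "pre"), cued_item_gain(1.0, n, "post"))
```

      Under the pre-attentive scheme the cued item's gain drops from 1.0 to 0.1 between set sizes 1 and 10, which would predict set-size effects in the simultaneous cue condition that were not observed.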

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al., 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni & Maunsell, 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note comparable recall errors across set size in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      Second, visual adaptation predicts sensory information should decrease over time. This would predict that for long stimulus presentation times, the error would increase. Indeed, this seems to be reflected in Figure 5B. This effect is not captured by the DyNR model.

      Indeed, neural responses in the visual cortex have been observed to quickly adapt during stimulus presentation, showing reduced responses to prolonged stimuli after an initial transient (Groen et al., 2022; Sawamura et al., 2006; Zhou et al., 2019). This adaptation typically manifests as 1) reduced activity towards the end of stimulus presentation and 2) a faster decay towards baseline activity after stimulus offset.

      In the DyNR model, we use an idealized solution in which we convolve the presented visual signal with a response function (i.e., a temporal filter). At the longest presentation durations in DyNR, the sensory signal plateaus and remains stable until stimulus offset. Because our psychophysical data do not allow us to identify the exact neural coding scheme that underlies the sensory signal, we favour this simple implementation, which is broadly consistent with some previous attempts to model temporal dynamics in sensory responses (e.g., Carandini and Heeger, 1994). However, we agree with the reviewer that some adaptation of the sensory signal with prolonged presentation would also be consistent with our data.
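
      The idealized sensory response described above can be sketched as a boxcar stimulus convolved with a causal temporal filter. This is a minimal illustration, assuming an exponential filter with unit area and a 50 ms time constant; the actual DyNR response function and its parameters may differ, and the function name is hypothetical. The signal rises to a plateau during a long presentation and decays gradually after offset, leaving a residual signal that could supplement VWM after the cue.

```python
import numpy as np


def sensory_signal(duration, t_max=1.0, dt=0.001, tau=0.05):
    """Boxcar stimulus convolved with a causal exponential temporal filter.

    duration: stimulus presentation time in seconds.
    tau:      assumed filter time constant in seconds.
    """
    t = np.arange(0.0, t_max, dt)
    stimulus = (t < duration).astype(float)
    kernel = np.exp(-t / tau)
    kernel /= kernel.sum()  # unit area, so the plateau sits at 1.0
    # Causal convolution: keep only the first len(t) samples.
    response = np.convolve(stimulus, kernel)[: t.size]
    return t, response
```

      With a 500 ms presentation, the response at 450 ms is within 0.1% of the plateau, and 100 ms after offset roughly 13% of the signal remains. Adding adaptation (for example, a gain that declines during prolonged stimulation) would lower that residual for long presentations.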

      We have added the following to the manuscript:

      “In Experiment 2, the longest presentation duration shows an upward trend in error at set sizes 4 and 10. While this falls within the range of measurement error, it is also possible that this is a meaningful pattern arising from visual adaptation of the sensory signal, whereby neural populations reduce their activity after prolonged stimulation. This would mean less residual sensory signal would be available after the cue to supplement VWM activity, predicting a decline in fidelity at higher set sizes. Visual adaptation has previously been successfully accounted for by a type of delayed normalization model in which the sensory signal undergoes a series of linear and nonlinear transformations (Zhou et al., 2019). Such a model could in future be incorporated into DyNR and validated against psychophysical and neural data.”

      Carandini, M., & Heeger, D. J. (1994). Summation and division by neurons in primate visual cortex. Science, 264(5163), 1333–1336. https://doi.org/10.1126/science.8191289

      Groen, I. I. A., Piantoni, G., Montenegro, S., Flinker, A., Devore, S., Devinsky, O., Doyle, W., Dugan, P., Friedman, D., Ramsey, N. F., Petridou, N., & Winawer, J. (2022). Temporal Dynamics of Neural Responses in Human Visual Cortex. The Journal of Neuroscience, 42(40), 7562–7580. https://doi.org/10.1523/JNEUROSCI.1812-21.2022

      Sawamura, H., Orban, G. A., & Vogels, R. (2006). Selectivity of Neuronal Adaptation Does Not Match Response Selectivity: A Single-Cell Study of the fMRI Adaptation Paradigm. Neuron, 49(2), 307–318. https://doi.org/10.1016/j.neuron.2005.11.028

      Zhou, J., Benson, N. C., Kay, K., & Winawer, J. (2019). Predicting neuronal dynamics with a delayed gain control model. PLOS Computational Biology, 15(11), e1007484. https://doi.org/10.1371/journal.pcbi.1007484

      (2) A second potential weakness is that, in Experiment 1, the authors briefly change the sensory stimulus at the end of the delay (a 'phase shift', Fig. 6A). I believe this is intended to act as a mask. However, I would expect that, in the DyNR model, this should be modeled as a new sensory input (in Experiment 2, 50 ms is plenty of time for the subjects to process the stimuli). One might expect this change to disrupt sensory and memory representations in a very characteristic manner. This seems to make a strong testable hypothesis. Did the authors find evidence for interference from the phase shift?

      The phase shift was implemented with the intention of reducing retinal after-effects, essentially acting as a mask for retinal information only; crucially the orientation of the stimulus is unchanged by the phase shift, so from the perspective of the DyNR model, it transmits the same orientation information to working memory as the original stimulus.

      If our objective were to model sensory input at the level of individual neurons and their receptive fields, we would indeed need to treat this phase shift as a novel input. Nevertheless, for DyNR, conceived as an idealization of a biological system for encoding orientation information, we can safely assume that visual areas in biological organisms have a sufficient number of phase-sensitive simple cells and phase-indifferent complex cells to maintain the continuity of input to VWM.

      When comparing conditions with and without the phase shift of stimuli (Fig S1B), we found performance to be comparable in the perceptual condition (simultaneous presentation) and with the longest delay (1 second), suggesting that the phase shift did not change the visibility or encoding of information into VWM. In contrast, we found strong evidence that observers had access to an additional source of information over intermediate delays when the phase shift was not used. This was evident through enhanced recall performance from 0 ms to 400 ms delay. Based on this, we concluded that the additional source of information available in the absence of a phase shift was accessible immediately following stimulus offset and had a brief duration, aligning with the theoretical concept of retinal afterimages.

      (3) It seems odd that the mask does not interrupt sensory processing in Experiment 2. Isn't this the intended purpose of the mask? Should readers interpret this as all masks not being effective in disrupting sensory processing/iconic memory? Or is this specific to the mask used in the experiment?

      Visual masks are often described as instantly and completely halting the visual processing of information that preceded the mask. We also anticipated the mask would entirely terminate sensory processing, but our data indicate the effect was not complete (as indicated by model variants in Experiment 2). Nevertheless, we believe we achieved our intended goal with this experiment – we observed a clear modulation of response errors with changing stimulus duration, indicating that the post-stimulus information that survived masking did not compromise the manipulation of stimulus duration. Moreover, the DyNR model successfully accounted for the portion of signal that survived the mask.

      We can identify two possible reasons why masking was incomplete. First, it is possible that the continuous report measure used in our experiments is more sensitive than the discrete measures (e.g., forced-choice methods) commonly employed in experiments that found masks to be 100% effective. Second, despite using a flickering white noise mask at full contrast, it is possible that it may not have been the most effective mask; for instance, a mask consisting of many randomly oriented Gabor patches matched in spatial frequency to the stimuli could prove more effective. We decided against such a mask because we were concerned that it could potentially act as a new input to orientation-sensitive neurons, rather than just wiping out any residual sensory activity.

      (4) I apologize if I missed it, but the authors did not compare the DyNR model to a model without decaying sensory information for Experiment 1.

      We tested two DyNR variants in which the diffusion process was solely responsible for the dynamics of memory fidelity. These models assumed that the sensory signal terminates abruptly at stimulus offset, and that the VWM signal encoding the stimuli was equal to the limit imposed by normalization, independent of the delay duration.

      As variants of this model failed to account for the observed response errors both quantitatively (see 'Fixed neural signal' under Model variants) and qualitatively (Figure S3), we decided not to test any more restrictive variants, such as the one without sensory decay and diffusion.

      (5) In the current model, selection is considered to be absolute (all or none). However, this need not be the case (previous work argues for graded selection). Could a model where memories are only partially selected, in a manner that is mediated by load, explain the load effects seen in behavior?

      Thank you for this point. If attentional selection were partial, it would affect the observers’ efficiency in discarding uncued objects to release allocated resources and encode additional information about the cued item. We and others have previously examined whether humans can efficiently update their VWM when previous items become obsolete. For example, Taylor et al. (2023) showed that observers could efficiently remove uncued items from VWM and reallocate the released resources to new visual information. These findings align with results from other studies (e.g., Ecker, Oberauer, & Lewandowsky, 2014; Kessler & Meiran, 2006; Williams & Woodman, 2012).

      Based on these findings, we feel justified in assuming that observers in our current task were capable of fully removing all uncued objects, allowing them to continue the encoding process for the cued orientation that was already partially stored in VWM, such that the attainable limit on representational precision for the cued item equals the maximum precision of VWM.

      Partial removal could in principle be modelled in the DyNR model by introducing an additional plateau parameter specifying a maximum attainable precision after the cue. Our concern would be that such a plateau parameter would trade off with the parameter associated with Hick’s law (i.e., cue interpretation time). The former would control the amount of information that can be encoded into VWM, while the latter regulates the amount of sensory information available for encoding. We are wary of adding additional parameters, and hence flexibility, to the model where we do not have the data to sufficiently constrain them.
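
      The trade-off described above can be illustrated with a deliberately simplified toy model. Everything here is an assumption for illustration: cue interpretation time follows a Hick's-law form a + b·log2(N + 1), the residual sensory signal decays exponentially from offset, and a hypothetical plateau caps post-cue precision. None of these functions or parameter values are taken from DyNR.

```python
import math


def cue_time(set_size, a=0.1, b=0.05):
    """Hick's-law-style cue interpretation time in seconds: a + b*log2(N + 1)."""
    return a + b * math.log2(set_size + 1)


def post_cue_precision(set_size, b, plateau=math.inf, tau=0.1, gain=10.0):
    """Toy precision gained for the cued item after the cue is interpreted.

    The residual sensory signal is assumed to decay exponentially (time
    constant tau) from stimulus offset; whatever remains once the cue has
    been interpreted is encoded, capped by an optional `plateau`.
    """
    remaining = math.exp(-cue_time(set_size, b=b) / tau)
    return min(plateau, gain * remaining)
```

      At set size 4, capping precision with plateau = 1.0 and slowing cue interpretation (a larger b) with no cap can produce exactly the same prediction, so at a single set size the two parameters are interchangeable. Only data spanning several set sizes and cue timings could in principle separate them, which is the constraint noted above.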

      Ecker, U. K. H., Oberauer, K., & Lewandowsky, S. (2014). Working memory updating involves item-specific removal. Journal of Memory and Language, 74, 1–15. https://doi.org/10.1016/j.jml.2014.03.006

      Kessler, Y., & Meiran, N. (2006). All updateable objects in working memory are updated whenever any of them are modified: Evidence from the memory updating paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 570–585. https://doi.org/10.1037/0278-7393.32.3.570

      Taylor, R., Tomić, I., Aagten-Murphy, D., & Bays, P. M. (2023). Working memory is updated by reallocation of resources from obsolete to new items. Attention, Perception, & Psychophysics, 85(5), 1437–1451. https://doi.org/10.3758/s13414-022-02584-2

      Williams, M., & Woodman, G. F. (2012). Directed forgetting and directed remembering in visual working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(5), 1206–1220. https://doi.org/10.1037/a0027389

      (6) Previous work, both from the authors and others, has shown that memories are biased as if they are acted on by attractive/repulsive forces. For example, the memory of an oriented bar is biased away from horizontal and vertical and biased towards diagonals. This is not accounted for in the current model. In particular, this could be one mechanism to generate a non-uniform drift rate over time. As noted in the paper, a non-uniform drift rate could capture many of the behavioral effects reported.

      The reviewer is correct that the model does not currently include stimulus-specific effects, although our work on that topic provides a clear template for incorporating them in future (e.g. Taylor & Bays, 2018). Specifically on the question of generating a non-uniform drift, we have another project that currently looks at this exact question (cited in our manuscript as Tomic, Girones, Lengyel, and Bays; in prep.). By examining various datasets with varying memory delays, including the Additional Dataset 1 reported in the Supplementary Information, we found that stimulus-specific effects on orientation recall remain constant with retention time. Specifically, although there is a clear increase in overall error over time, estimation biases remain constant in direction and amplitude, indicating that the bias does not manifest in drift rates (see also Rademaker et al., 2018; Figure S1).

      Taylor, R., & Bays, P. M. (2018). Efficient coding in visual working memory accounts for stimulus-specific variations in recall. The Journal of Neuroscience, 1018–18. https://doi.org/10.1523/JNEUROSCI.1018-18.2018

      Rademaker, R. L., Park, Y. E., Sack, A. T., & Tong, F. (2018). Evidence of gradual loss of precision for simple features and complex objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance. https://doi.org/10.1037/xhp0000491

      (7) Finally, the authors use AIC to compare many different model variants to the DyNR model. The delta-AICs are high (>10), indicating a strong preference for the DyNR model over the variants. However, the overall quality of fit to the data is not clear. What proportion of the variance in data was the model able to explain? In particular, I think it would be helpful for the reader if the authors reported the variance explained on withheld data (trials, conditions, or subjects).

      Thank you for this comment.

      Below we report the estimates of r², representing the goodness of fit between the observed data (i.e., RMSE) and the DyNR model predictions.

      In Experiment 1, r² between observations and predictions was computed across delays for each set size, yielding the following estimates: r² = 0.60 (set size 1), 0.87 (set size 4) and 0.95 (set size 10). Note that the lower explained variance for set size 1 arises because both data and model predictions show near-constant precision.

      In Experiment 2, we calculated r² between observations and predictions across presentation durations, separately for each set size, resulting in the following estimates: r² = 0.88 (set size 1), 0.71 (set size 4) and 0.70 (set size 10). Note that in this case the decreasing percentage of explained variance with set size is a consequence of less variability in both data and model predictions at larger set sizes.
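
      A minimal sketch of such a per-set-size goodness-of-fit calculation is given below, assuming r² is the squared Pearson correlation between observed and predicted RMSE across delays (or durations) within one set size. The coefficient of determination, 1 − SS_res/SS_tot, is a common alternative that can give different values. The function name is hypothetical.

```python
import numpy as np


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    r = np.corrcoef(obs, pred)[0, 1]
    return float(r ** 2)
```

      For example, with hypothetical values r_squared([1, 2, 3, 4], [1, 3, 2, 4]) is about 0.64. When observed values barely change across conditions, as at set size 1 in Experiment 1, small measurement noise makes up a large share of the total variance, so r² falls even if the predictions are close in absolute terms.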

      While these estimates suggest that the DyNR model effectively fits the psychophysical data, a more rigorous validation approach would involve cross-validation checks across all conditions with a withheld portion of trials. Regrettably, due to the large number of conditions in each experiment, we could only collect 50 trials per condition. We are sceptical that fitting the model to even fewer trials, as necessary for cross-validation, would provide a reliable assessment of model performance.

      Minor: It isn't clear to me why the behavioral tasks are shown in Figure 6. They are important for understanding the results and are discussed earlier in the manuscript (before Figure 3). This just required flipping back and forth to understand the task before I could interpret the results.

      Thank you for this comment. We have now moved the behavioural task figure to appear early in the manuscript (as Figure 3).

      Reviewer #3 (Recommendations For The Authors):

      (1) Dynamics of sensory signals during perception

      I believe that the modeled sensory signal is a reasonable simplification and different ways to model the decay function are discussed. I would like to ask the authors to discuss the implications of slightly more complex initial sensory transients such as the ones shown in Teeuwen (2021). Specifically for short exposure times, this might be particularly relevant for the model fits as some of the alternative models diverge from the data for short exposures. In addition, the role of feedforward (initial transient?) and feedback signaling (subsequent "plateau" activity) could be discussed. The first one might relate more strongly to sensory signals whereas the latter relates more to top-down attention/recurrent processing/VWM.

      Particularly, this latter response might also be sensitive to the number of items present on the screen which leads to a related question pertaining to the limitations of attention during perception. Some work suggests that perception is similarly limited in the amount of information that can be represented concurrently (Tsubomi, 2013). Could the authors discuss the implications of this hypothesis? What happens if maximum sensory amplitude is set as a free parameter in the model?

      Tsubomi, H., Fukuda, K., Watanabe, K., & Vogel, E. K. (2013). Neural limits to representing objects still within view. Journal of Neuroscience, 33(19), 8257-8263.

      Thank you for this question. Below, we unpack it and answer it point by point.

      While we agree our model of the sensory response is justified as an idealization of the biological reality, we also recognise that recent electrophysiological recordings have illuminated intricacies of neuronal responses within the striate cortex, a critical neural region associated with sensory memory (Teeuwen et al, 2021). Notably, these recordings reveal a more nuanced pattern where neurons exhibit an initial burst of activity succeeded by a lower plateau in firing rate, and stimulus offset elicits a second small burst in the response of some neurons, followed by a gradual decrease in activity after the stimulus disappears (Teeuwen et al, 2021).

      In general, asynchronous bursts of activity in individual neurons will tend to average out in the population making little difference to predictions of the DyNR model. Synchronized bursts at stimulus onset could affect predictions for the shortest presentations in Exp 2, however the model appears to capture the data very well without including them. We would be wary of incorporating these phenomena into the model without more clarity on their universality (e.g., how stimulus-dependent they are), their significance at the population level (as opposed to individual neurons), and most importantly, their prominence in visual areas outside striate cortex. Specifically, while Teeuwen et al. (2021) described activity in V1, our model does not make strong assumptions about which visual areas are the source of the sensory input to WM. Based on these uncertainties we believe the idealized sensory response is justified for use in our model.

      Next, thank you for the comment on feedforward and feedback signals. We have added the following to our manuscript:

      “Following onset of a stimulus, the visual signal ascends through visual areas via a cascade of feedforward connections. This feedforward sweep conveys sensory information that persists during stimulus presentation and briefly after it disappears (Lamme et al., 1998). Simultaneously, reciprocal feedback connections carry higher-order information back towards antecedent cortical areas (Lamme and Roelfsema, 2000). In our psychophysical task, feedback connections likely play a critical role in orienting attention towards the cued item, facilitating the extraction of persisting sensory signals, and potentially signalling continuous information on the available resources for VWM encoding. While our computational study does not address the nature of these feedforward and feedback signals, a challenge for future research is to describe the relative contributions of these signals in mediating transmission of information between sensory and working memory (Semedo et al., 2022).”

      Lamme, V. A., Supèr, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in the visual cortex. Current Opinion in Neurobiology, 8(4), 529–535. https://doi.org/10.1016/S0959-4388(98)80042-1

      Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23(11), 571–579. https://doi.org/10.1016/S0166-2236(00)01657-X

      Semedo, J. D., Jasper, A. I., Zandvakili, A., Krishna, A., Aschner, A., Machens, C. K., Kohn, A., & Yu, B. M. (2022). Feedforward and feedback interactions between visual cortical areas use different population activity patterns. Nature Communications, 13(1), 1099. https://doi.org/10.1038/s41467-022-28552-w

      Finally, both you and Reviewer 2 raised a similar interesting question regarding capacity limitations of attention during perception. Such a limitation could be modelled by freely estimating sensory amplitude and implementing divisive normalization of that signal, similar to how VWM is constrained. We can consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility involves assuming that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, occurring before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data, which failed to show a significant difference between set sizes and are more closely aligned with the null hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF₁₀ = 0.47). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.
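      The contrast between the two placements can be illustrated with a toy sketch. This is not the DyNR model itself: it assumes, purely for illustration, a fixed total sensory gain shared equally among whichever items are normalized together. The function name `cued_amplitude` and the set sizes used are hypothetical. Under pre-attentive normalization the cued item's amplitude falls as 1/N with set size N, whereas under post-attentive normalization only the single attended item is normalized, so its amplitude does not depend on how many objects were displayed.

```python
# Toy sketch (hypothetical, not the DyNR implementation): where divisive
# normalization acts determines whether the cued item's sensory amplitude
# depends on the number of displayed items.


def cued_amplitude(set_size: int, total_gain: float = 1.0, scheme: str = "pre") -> float:
    """Return the normalized sensory amplitude of the cued item.

    scheme="pre":  all displayed items share the normalization pool
                   before attention selects the cued item.
    scheme="post": only attended items are normalized; with a simultaneous
                   cue, that is the single cued item.
    """
    if set_size < 1:
        raise ValueError("set_size must be at least 1")
    if scheme == "pre":
        n_normalized = set_size  # every displayed item divides the gain
    elif scheme == "post":
        n_normalized = 1  # only the cued item enters normalization
    else:
        raise ValueError("scheme must be 'pre' or 'post'")
    return total_gain / n_normalized


if __name__ == "__main__":
    # Illustrative set sizes only.
    for n in (1, 4, 10):
        print(n, cued_amplitude(n, scheme="pre"), cued_amplitude(n, scheme="post"))
```

Because a lower amplitude implies noisier recall in population-coding accounts, only the pre-attentive scheme predicts a set-size effect in the simultaneous cue condition; the post-attentive scheme predicts the flat pattern described above.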

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al., 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni & Maunsell, 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note the comparable recall errors across set sizes in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      (2) Effectivity of retro-cues at long delays

      Can the authors discuss how cues presented at long delays (>1000 ms) can still lead to increased memory fidelity when sensory signals are likely to have decayed? A list of experimental work demonstrating this can be found in Souza & Oberauer (2016).

      Souza, A. S., & Oberauer, K. (2016). In search of the focus of attention in working memory: 13 years of the retro-cue effect. Attention, Perception, & Psychophysics, 78, 1839-1860.

      The increased memory fidelity observed with longer delays between memory array offset and cue does not result from integrating available sensory signals into VWM because the sensory signal would have completely decayed by that time. Instead, research so far has indicated several alternative mechanisms that could lead to higher recall precision for cued items, and we can briefly summarize some of them, which are also reviewed in more detail in Souza and Oberauer (2016).

      One possibility is that, after a highly predictive retro-cue indicates the to-be-tested item, uncued items can simply be removed from VWM. This could reduce interference for the cued item and consequently increase recall precision. Secondly, the retro-cue could indicate which item should be selectively attended, thereby differentially strengthening it in memory. Furthermore, the retro-cue could allow evidence to accumulate for the target item ahead of decision-making, which could increase the probability that the correct information is selected for response. Finally, the retro-cued stimulus could be insulated from interference by subsequent visual input, while the uncued stimuli may remain prone to such interference.

      A neural account of this retro-cue effect based on the original neural resource model has been proposed in Bays & Taylor, Cog Psych, 2018. However, as we did not use a retro-cue design in the present experiments, we have decided not to elaborate on this in the manuscript.

      (3) Swap errors

      I am somewhat surprised by the empirically observed and predicted pattern of swap errors displayed in Figure S2. For set size 10, swap probability does not consistently increase with the duration of the retention interval, although this was predicted by the authors' model. At long intervals, swap probability is significantly higher for large compared to small set sizes, which also seems to contrast with the idea of shared, limited VWM resources. Can the authors provide some insight into why the model fails to reproduce part of the behavioral pattern for swap errors? The sentence in line 602 might also need some reconsideration in this regard.

      Determining the ground truth for swap errors poses a challenge. The prevailing approach has been to employ a simpler model that estimates swap errors, such as a three-component mixture model, and use those estimates as a proxy for ground truth. However, this method is not without its shortcomings. For example, the variability of swap frequency estimates tends to increase with variability in the report feature dimension (here, orientation). This is due to the increasing overlap of response probability distributions for swap and non-swap responses. Consequently, the discrepancy between any two methods of swap estimation is most noticeable when there is substantial variability in orientation reports (e.g., 10 items and long delay or short exposure).
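      For reference, the three-component mixture model in its commonly used form (the exact parameterization used for Fig. S2 may differ) expresses the probability of a report $\hat{\theta}$, given the target feature $\theta$ and the $m$ non-target features $\psi_1, \dots, \psi_m$, with orientations mapped onto the full circle, as:

$$
p(\hat{\theta}) = \alpha\,\phi_\kappa(\hat{\theta} - \theta) + \beta\,\frac{1}{m}\sum_{i=1}^{m}\phi_\kappa(\hat{\theta} - \psi_i) + \gamma\,\frac{1}{2\pi}, \qquad \alpha + \beta + \gamma = 1,
$$

      where $\phi_\kappa$ is a von Mises density with mean zero and concentration $\kappa$, and $\beta$ is the estimated swap probability. As $\kappa$ decreases (greater report variability), the target and non-target components broaden and overlap, so $\beta$ becomes harder to separate from $\alpha$ and $\gamma$ and its estimates become more variable.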

      When modelling swap frequency in the DyNR model, our aim was to provide a parsimonious account of swap errors while implementing dynamics in the spatial (cue) feature similar to those in the orientation (report) feature. This parametric description captured the overall pattern of swap frequency across set size, retention time, and encoding time, but it remains only an approximation of the predictions that would arise from fully modelling memory for the conjunction of cue and report features (as in, e.g., Schneegans & Bays, 2017; McMaster et al., 2022).

      We expanded the existing text in the section ‘Representational dynamics of cue-dimension features’ of our manuscript:

      “… Although we did not explicitly model the neural signals representing location, the modelled dynamics in the probability of swap errors were consistent with those of the primary memory feature. We provided a more detailed neural account of swap errors in our earlier works that is theoretically compatible with the DyNR model (McMaster et al., 2022; Schneegans & Bays, 2017).

      The DyNR model successfully captured the observed pattern of swap frequencies (intrusion errors). The only notable discrepancy between DyNR and the three-component mixture model (Fig. S2) arises with the largest set size and longest delay, where there was also considerable interindividual variability. As variability in the report dimension increases, estimates of swap frequency become more variable because of the growing overlap between the probability distributions of swap and non-swap responses. This may explain the apparent deviations from the modelled swap frequencies at the highest set size and longest delay, where orientation response variability was greatest.”

      McMaster, J. M. V., Tomić, I., Schneegans, S., & Bays, P. M. (2022). Swap errors in visual working memory are fully explained by cue-feature variability. Cognitive Psychology, 137, 101493. https://doi.org/10.1016/j.cogpsych.2022.101493

      Schneegans, S., & Bays, P. M. (2017). Neural Architecture for Feature Binding in Visual Working Memory. The Journal of Neuroscience, 37(14), 3913–3925. https://doi.org/10.1523/JNEUROSCI.3493-16.2017

      (4) Direct sensory readout

      The model assumes that readout from sensory memory and from VWM happens with identical efficiency. Currently, we don't know if these two systems are highly overlapping or are fundamentally different in terms of architecture and computation. In the case of the latter, it might be less reasonable to assume that information readout would happen at similar efficiencies, as it is currently assumed in the manuscript. Perhaps the authors could briefly discuss this possibility.

      In the direct sensory read-out model, we did not explicitly model the efficiency of readout from either sensory or VWM store. However, the distinctive prediction of this model is that the precision of recall changes exponentially with delay at every set size, including one item. This prediction does not depend on the relative efficiency of readout from sensory and working memory, but only on the principle that direct readout from sensory memory bypasses the capacity limit on working memory. This prediction is inconsistent with the pattern of results observed in Experiment 1, where early cues did not show a beneficial effect on recall error for set size 1. While the proposal raised by the reviewer is intriguing, even if we were to model the process of readout from both the sensory and VWM stores with different efficiencies, the direct read-out model could not account for the near-constant recall error with delay for set size one.

      (5) Encoding of distractors

      One of the model assumptions is that, for simultaneous presentation of memory array and cue, only the cued feature will be encoded. Previous work has suggested that participants often accidentally encode distractors even when they are cued before memory array onset (Vogel et al., 2005). Given these findings, how reasonable is this assumption in the authors' model?

      Vogel, E. K., McCollough, A. W., & Machizawa, M. G. (2005). Neural measures reveal individual differences in controlling access to working memory. Nature, 438(7067), 500-503.

      Although previous research suggested that observers can misinterpret the pre-cue and encode one of the uncued items, our results argue against this occurring in the current experiment. Such encoding failures would increase overall recall error and produce a gradient of error with set size, owing to the presence of more adjacent distractors at larger set sizes. However, when we compared recall errors between set sizes in the simultaneous cue condition, we found no significant difference, and our results were more consistent with the null hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF10 = 0.47). If observers occasionally encoded and reported one of the uncued items in the simultaneous cue condition, those errors were extremely infrequent and did not affect the overall error distributions.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Zeng and Staley provide a valuable analysis of the molecular requirements for the export of a reporter mRNA that contains a lariat structure at its 5' end in the budding yeast S. cerevisiae. The authors provide evidence that this is regulated by the main mRNA export machinery (Yra1, Mex67, Nab2, Npl3, Tom1, and Mlp1). Of note, Mlp1 has been mainly implicated in the nuclear retention of unspliced pre-mRNA (i.e. quality control), and relatively little has been done to investigate its role in mRNA export in budding yeast.

      Strengths:

      There is relatively little information in the current literature about the nuclear export of splicing intermediates. This paper provides one of the first analyses of this process and dissects the molecular components that promote this form of RNA export. Overall, the strength of the data presented in the manuscript is solid. The paper is well written and the message is clear and of general interest to the mRNA community.

      We thank the reviewer for highlighting these strengths.

      Weaknesses:

      There are three problems with the paper, although these are not major and likely would not affect the final model as most aspects of the molecular details are confirmed by multiple complementary assays.

      (1) The brG reporter produces both unspliced pre-mRNA and a lariat-containing intermediate RNA. Based on the primer extension assay the authors claim that only 33% of the final product is in pre-mRNA form and that this "is insufficient to account for the magnitude of the cytoplasmic signal from the brG reporter (83%)". Nevertheless, it is possible that primer extension is incomplete or that the lariat-containing RNA is inaccessible for smFISH. The authors could easily perform a dual smFISH experiment (similar to Adivarahan et al., Molecular Cell 2018) where exon 1 is labelled with probes of one color, and the region that overlaps the lariat-containing intermediate is labelled with probes of a second color. If the authors are correct, then one-third of the smFISH foci should have both labels and the rest would have only the second label. This would also confirm that the latter (i.e. the lariat-containing RNAs) are exported to the cytoplasm. Using this approach, the authors could then show that MLP1 depletion (or depletion of any of the other factors) affects one pool of RNAs (i.e. those that are lariat-containing) but not the other (i.e. pre-mRNA). Including these experiments would make the evidence for their model more convincing.

      We appreciate the reviewer’s comments and suggestions. Concerning the primer extension analysis, we are considering alternative assays to quantitate the pre-mRNA and lariat intermediate levels. Concerning the accessibility of the lariat intermediate in smRNA-FISH, in a dbr1∆ strain the only major species from the UAc reporter that is detected by primer extension is the lariat intermediate (Fig. S3), and this reporter is readily detected by smRNA-FISH, indicating that the lariat intermediate is accessible to smRNA-FISH. Concerning discriminating between pre-mRNA and lariat intermediate by smRNA-FISH, we agree with the reviewer that a dual smFISH experiment would directly distinguish between the signals of these species. The brG reporter we used in most smRNA-FISH experiments has a 5’ exon that is too short for smRNA-FISH probes, as is typical of most budding yeast 5’ exons. We tried to replace the 5’ exon with a longer sequence (GFP) to allow for smRNA-FISH; however, this substitution inhibited splicing. Therefore, to distinguish signals from pre-mRNA versus lariat intermediate, we used additional reporters: the G1c and brC reporters, which accumulate pre-mRNA essentially exclusively (Fig. S2A-C), and the UAc reporter, which accumulates lariat intermediate exclusively in a dbr1∆ strain (Fig. S3). Whereas the mlp1 deletion did not change beta-galactosidase activities of the G1c and brC pre-mRNA-accumulating reporters (Fig. S2E), the mlp1 deletion in a dbr1∆ background did reduce the beta-galactosidase activity of the UAc lariat intermediate-accumulating reporter (Fig. 3D) and did increase the nuclear smRNA-FISH signal of this reporter (Fig. 3E). These observations corroborate our interpretation, based on the brG reporter, that Mlp1p is required for efficient export of lariat intermediates but not pre-mRNAs.

      (2) In some cases, the number of smFISH foci appears to change drastically depending on the genetic background. This could either be due to the stochastic nature of mRNA expression between cells or reflect real differences between the genetic backgrounds that could alter the interpretation of the other observations.

      We thank the reviewer for raising this point. We will review our data to distinguish between these possibilities.

      (3) The authors state in the discussion that "the general mRNA export pathway transports discarded lariat intermediates into the cytoplasm". Although this appears to be the case for the reporters that are investigated in this paper, I don't think that the authors should make such a broad sweeping claim. It may be that some discarded lariat intermediates are exported to the cytoplasm while others are targeted for nuclear retention and/or decay.

      The reviewer’s point is well-taken. We will revise the wording accordingly.

      Reviewer #2 (Public Review):

      In this report, Zeng and Staley have used an elegant combination of RNA imaging approaches (single molecule FISH), RNA co-immunoprecipitations, and translation reporters to characterize the factors and pathways involved in the nuclear export of splicing intermediates in budding yeast. Their study notably involves the use of specific reporter genes, which lead to the accumulation of pre-mRNA and lariat species, in a battery of mutants impacting mRNA export and quality control.

      The authors convincingly demonstrate that mRNA species expressed from such reporters are exported to the cytoplasm in a manner depending on the canonical mRNA export machinery (Mex67 and its adaptors) and the nuclear pore complex (NPC) basket (Mlp1). Interestingly, they provide evidence that the export of splicing intermediates requires docking and subsequent undocking at the nuclear basket, a step possibly more critical than for regular mRNAs.

      We thank the reviewer for this overall positive assessment.

      However, their assays do not always allow us to define whether the impacted mRNA species correspond to lariats and/or pre-mRNAs. This is all the more critical since their findings apparently contradict previous reports that supported a role for the nuclear basket in pre-mRNA quality control. These earlier studies, which were similarly based on the use of dedicated yet distinct reporters, had found that the nuclear basket subunit Mlp1, together with different cofactors, prevents the export of unspliced mRNA species. It would be important to clarify experimentally and discuss the possible reasons for these discrepancies.

      It is true that we did not assess export of all reporters in all mutant strains by smFISH; however, we did validate the key conclusion that the export of lariat intermediates requires the nuclear basket gene MLP1: the export of both the brG reporter (mostly lariat intermediate) and the UAc reporter (exclusively lariat intermediate) showed a dependence on MLP1 (Fig. 3). Further, by beta-galactosidase activity, we tested in total five separate reporters (three that accumulated lariat intermediate and two that accumulated exclusively pre-mRNA); only the three reporters accumulating lariat intermediate showed a dependence of export on MLP1 (Fig. 4B,D; Fig. S2D), whereas the reporters accumulating pre-mRNA did not (Fig. S2E), further validating our main conclusion. We are considering additional experiments to validate this key conclusion even further. Also, see our response to comment 1 from Reviewer 1.

      We agree that the main conclusion from this manuscript differs from earlier studies. A key difference is that prior studies monitored exclusively pre-mRNA. In our study, we monitored both pre-mRNA and lariat intermediate species and in doing so revealed a role for MLP1 in the export of lariat intermediates. This study, our previous study, and previous studies by others have all provided evidence for efficient export of pre-mRNA; all of these studies conflict with studies proposing a general role for the nuclear basket in retaining immature mRNA. Still, these apparently conflicting past studies can be re-interpreted in the context of our model that the export of such species requires docking at the nuclear basket, followed by undocking. In a revised manuscript, we will discuss the possibility that pre-mRNAs apparently “retained” by the nuclear basket are stalled in export at the undocking stage.

      Reviewer #3 (Public Review):

      Summary:

      Zeng and Staley show that in yeast, intron-lariat intermediates that accumulate due to defects in pre-mRNA splicing are transported to the cytoplasm using the canonical mRNA export pathway. Moreover, they demonstrate that export requires the nuclear basket, a sub-structure of the nuclear pore complex previously implicated in the retention of immature mRNAs. These observations are important as they call into question a longstanding model that the main role of the nuclear basket is to ensure nuclear retention of immature or faulty mRNAs.

      Strengths:

      The authors elegantly combine genetic, biochemical, and single-molecule resolution microscopy approaches to identify the cellular pathway that mediates the cytoplasmic accumulation of lariat intermediates. Cytoplasmic accumulation of such splicing intermediates had been observed in various previous studies, but how these RNAs reach the cytoplasm had not yet been investigated. By using smFISH, the authors present compelling and, for the first time, direct evidence that these intermediates accumulate in the cytoplasm and that this requires the canonical mRNA export pathway, including the RNA export receptor Mex67 as well as various RNA-binding proteins including Yra1, Npl3 and Nab2. Moreover, they show that the export of lariat intermediates, but not mRNAs, requires the nuclear basket (Mlp1) and basket-associated proteins previously linked to mRNP rearrangements at the nuclear pore. This is a surprising and important observation with respect to a possible function of the nuclear basket in mRNA export and quality control, as it challenges a longstanding model that the role of the basket in mRNA export is primarily to act as a gatekeeper to ensure that immature mRNAs are not exported. As discussed by the authors, their finding suggests a role for the basket in promoting the export of certain types of RNAs rather than retention, a model also supported by more recent studies in mammalian cells. Moreover, their findings also corroborate a recent paper showing that in yeast, not all nuclear pores contain a basket (PMID: 36220102), an observation that also questioned the gatekeeper model of the basket, as it is difficult to imagine how the basket can serve as a gatekeeper if not all nuclear pores contain such a structure.

      We thank the reviewer for highlighting the importance and surprising nature of our findings.

      Weaknesses:

      One weakness of this study is that all their experiments rely on using synthetic splicing reporter containing a lacZ gene that produces a relatively long transcript compared to the average yeast mRNA.

      We are considering repeating some of our experiments to monitor export of RNAs with more average lengths.

      The rationale for using a reporter containing the brG (G branch point), which yields more stable lariat intermediates because they are inefficient substrates for the debranching enzyme Dbr1, could be described earlier in the manuscript; otherwise it only becomes clear towards the end, which is confusing.

      We thank the reviewer for this comment. We will revise the text to explain sooner the rationale for using the brG reporter to assess the export of lariat intermediates.

      Discussion of their observation in the context that, in yeast, not all pores contain a basket would be useful.

      Thanks for this suggestion. We will raise this point that a nuclear basket is not present on all nuclear pores and discuss the implications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      This paper reports how mycobacterial cAMP level is increased under stressful conditions and that the increase is important in the survival of the bacterium in animal hosts.

      Strengths:

      The authors show that under different stresses the response regulator PhoP represses a phosphodiesterase (PDE) that degrades cAMP specifically. Identification of a PDE specific to cAMP is significant progress in understanding Mtb pathogenesis. An increase in cAMP apparently increases bacterial survival upon infection. On the practical side, the reduction of cAMP by increasing PDE can be a means to attenuate the growth of the bacilli. The results have wider implications since PhoP is implicated in controlling diverse mycobacterial stress responses and many bacterial pathogens modulate host cell cAMP level. The results here are straightforward, internally consistent, and of both theoretical and applied interests.

      We thank the reviewers for these extremely encouraging comments.

      Weaknesses:

      Repression of PDE promoter by binding of phosphorylated PhoP could have been shown at higher precision. The binding is now somewhere along a roughly 500 bp region. Although the regulation of PDE is shown to be by transcriptional repression only, it has been described as a homeostatic mechanism. The latter would have required a demonstration of both repression and activation by negative feedback.

      We agree. We have now performed EMSA (Electrophoretic Mobility Shift Assay) experiments and included the data showing DNA binding of PhoP to the upstream regulatory region of rv0805 (rv0805up) as a supplemental figure (see Figure 2-figure supplement 1). The supplemental figure, figure caption, and the relevant results have been adjusted accordingly in the revised manuscript.

      Further, as recommended by the reviewer we have now removed the term ‘homeostatic mechanism’ and rephrased it with ‘maintenance of cAMP level’ in the manuscript.

      Response to Reviewers’ comments

      Reviewer #1:

      The authors have used homeostasis inappropriately. Homeostasis usually requires negative feedback (a clear example is the regulation of Lambda prm promoter). Here, there is no feedback from changes in PDE or cAMP level to their synthesis. Homeostasis does not belong to this paper anywhere.

      As recommended by the reviewer, we have now removed “homeostasis” from the manuscript and mostly replaced it with “maintenance of cAMP level” in the revised manuscript.

      The authors have frequently used adverbs at the beginning of a sentence, such as Notably (l.240, 272, 376), Importantly (l.66, 213), More importantly (l.134), Remarkably (l.264), Interestingly (l.115,301), Intriguingly (l.344), unambiguously (l.347), etc. The use of these words is generally counter-productive. The authors should scan the ms. to eliminate them as far as possible. The sentences would read more clearly and become more impactful.

      Following the reviewer’s recommendation, we have now eliminated most of these adverbs, which were mostly used at the beginning of sentences, in the revised manuscript.

      Specific comments

      (1) L.1: "maintenance of homeostasis" or increasing cAMP level.

      As suggested by the reviewer, we have now replaced “maintenance of cAMP homeostasis” with “maintenance of cAMP level”.

      (2) L.27: mechanism or reason; varying or various.

      As recommended by the reviewer, we have now replaced “mechanism” with “reason” and the word “varying” is deleted while incorporating suggested changes in the abstract.

      (3) L.28-29: The logic of connecting PhoP to cAMP doesn't follow well. The logic is much better in l.54, l.112-5 and l.130.

      We thank the reviewer for this suggestion. We have now modified the statement within the ‘abstract’ in the revised manuscript (duplicated below):

      “cAMP is one of the most widely used second messengers which impacts on a wide range of cellular responses in microbial pathogens including M. tuberculosis. Herein, we hypothesized that intra-mycobacterial cAMP level could be controlled by the phoP locus since the major regulator plays a key role in bacterial response against numerous stress conditions.”

      (4) L.30: discovers or reveals (?). Also, in l.101.

      As recommended by the reviewer, we have now replaced ‘discovers’ with ‘reveals’ in the Abstract and ‘uncovered’ with ‘revealed’ in the Introduction section of the manuscript.

      (5) L.31: Delete "The most - - derived". It is not obvious what most fundamental means here. I suggest: We find that PhoP-dependent ---involves specific binding of the regulator---PDE gene.

      As recommended by the reviewer, we have modified the statement (duplicated below): “In keeping with these results, we find specific recruitment of the regulator within the promoter region of rv0805 PDE, and absence of phoP or ectopic expression of rv0805 independently accounts for elevated PDE synthesis leading to depletion of intra-mycobacterial cAMP level.”

      (6) L.36: --pathway decreases cAMP level, stress tolerance, and survival of the bacilli.

      As recommended by the reviewer, we have now modified the statement (duplicated below): “Thus, genetic manipulation to inactivate PhoP-Rv0805-cAMP pathway decreases cAMP level, stress tolerance, and intracellular survival of the bacilli.”

      (7) L.41: 'keeps encountering" or encounters?

      As suggested by the reviewer, we have replaced ‘keeps encountering’ with ‘encounters’ in the ‘Introduction’ section of the revised manuscript.

      (8) L.61: responds, carries.

      Our apologies for the embarrassing grammatical mistakes. We have rectified these errors in the revised manuscript.

      (9) L.67: you mean burst in synthesis level, not burst of cAMP itself.

      To improve clarity, we have now modified the statement in the revised manuscript (duplicated below): “Agarwal and colleagues had shown that a burst in synthesis of bacterial cAMP upon infection of macrophages improved bacterial survival by interfering with host signalling pathways (Agarwal et al., 2009).”

      Reference

      Agarwal N, Lamichhane G, Gupta R, Nolan S, Bishai WR (2009) Cyclic AMP intoxication of macrophages by a Mycobacterium tuberculosis adenylate cyclase. Nature 460: 98-102

      (10) L.77: Change Off to Of.

      We are sorry for the inaccuracy. The suggested change has been made to the text.

      (11) L.83: Did not discuss "degradation" earlier.

      Following the reviewer’s recommendation, we have now modified the statement in the revised manuscript (duplicated below).

      “Together, these results strongly suggest that a balance between cAMP synthesis by adenylate cyclases and cAMP degradation by phosphodiesterases contributes to rapid adaptive response of mycobacteria in a hostile intracellular environment (Johnson and McDonough, 2018; McDonough and Rodriguez, 2011).”

      Reference

      Johnson RM, McDonough KA (2018) Cyclic nucleotide signaling in Mycobacterium tuberculosis: an expanding repertoire. Pathog Dis 76 (5)

      McDonough KA, Rodriguez A (2011) The myriad roles of cyclic AMP in microbial pathogens: from signal to sword. Nature reviews Microbiology 10: 27-38

      (12) L.95: Isn't PhoPR a two-component signal transduction system, the terminology that is more specific than a two-protein regulatory system?

      As recommended by the reviewer, we have replaced “two protein regulatory system” with more specific “two-component signal transduction system” in the revised manuscript.

      (13) L.124: check-point prevents things from happening. Here the mechanism you found allows growth and survival.

      We agree. As recommended by the reviewer, we have now modified the sentence in the revised manuscript (duplicated below).

      “Together, the newly identified mechanism regulating cAMP level enables the intraphagosomal survival and growth program of mycobacteria.”

      (14) L.132: why not say directly: "---under normal, and NO and acid stress conditions (Fig. 1A)."

      As recommended by the reviewer, we have now deleted the first part of the sentence and directly stated that “we compared cAMP levels … under normal, NO and acidic stress conditions” (duplicated below).

      “We compared cAMP levels of WT and phoPR-KO (lacking both phoP and phoR), grown under normal, NO stress and acid stress conditions (Fig. 1A).”

      (15) L.134: The complementation is quite variable. Also true in Fig. 2A. If no simple answer, you can say- cAMP values increased in complemented cells, although to a variable extent, for reasons unknown.

      We agree with the reviewer. We have now incorporated new text in the ‘Results’ section of the revised manuscript (duplicated below):

      “A higher cAMP level in the complemented strain under NO stress is possibly attributable to reproducibly higher phoP expression in the complemented mutant under specific stress conditions (Khan et al., 2022).”

      (16) L.154: It is better not to say "conclude" and "most likely" at the same time. How about replacing "we conclude" with "suggests"? In that case, there is no need to say "most likely". Also in l.306-7 and l.322-3.

      We thank the reviewer for these suggestions. We have now modified the statements in the revised manuscript (duplicated below).

      “We suggest that the lower cAMP level of the mutant is not due to more efficient cAMP secretion.”

      Following the reviewer’s recommendation, we have incorporated similar changes in two other places in the ‘Results’ section of the revised manuscript.

      (17) L.161: introduce both the acronyms here and not in l.162.

      Following the reviewer’s recommendation, we have made the suggested changes.

      (18) L.164: Second, (to be in line with First).

      We have made the suggested change.

      (19) Fig. 2C: There are no black and white bars. This is an important figure because the results appear in the abstract. The signal change from pH 7 to 4.5 is not much. An independent approach would have been desirable. If it were E. coli, I would have suggested beta-gal assay or in vivo footprints. Is a PhoP binding site recognizable in the promoter region of rv0805?

      We apologize for the inaccuracy, which has been corrected in the revised manuscript. We have also carried out DNA binding assays and included EMSA data showing binding of phosphorylated PhoP (P~PhoP) to the rv0805 upstream regulatory region as a supplemental figure (Figure 2-figure supplement 1A-B). This figure also presents our results on the likely PhoP binding site within rv0805up. The new figure, figure caption and relevant results have been adjusted accordingly in the revised manuscript.

      (20) L.209: ORFs; also delete "of growth" from the sentence.

      The suggested changes were made to the text.

      (21) L.213: Delete Importantly and change "failed to" to 'did not' (since you did not motivate the expectation earlier, it is better to state the results in an unbiased way).

      As recommended by the reviewer, both changes were included in the revised manuscript.

      (22) L.217: The requirement of PhoR is a new result - why say "confirm". Change it to indicate. Also, delete "indeed" here and from L.233.

      As recommended by the reviewer, both changes were included in the revised manuscript.

      (23) L.224: Are the results in Fig 3-S1A under inducing conditions?

      The results shown in Fig 3-S1A were not obtained under inducing conditions. For clarity, we have modified the sentence describing Figure 3-figure supplement 1A (duplicated below).

      “The rv0805 ORF was cloned into the multicloning site of the integrative vector pSTki (Parikh et al., 2013) between the EcoRI and HindIII sites under the control of the Pmyc1tetO promoter, and expression of rv0805 under non-inducing conditions was verified by measuring the mRNA level (Figure 3-figure supplement 1A).”

      Reference:

      Parikh et al. (2013) Development of a new generation of vectors for gene expression, gene replacement, and protein-protein interaction studies in mycobacteria. Applied and Environmental Microbiology 79: 1718-1729

      (24) L.225: ---cAMP level. Add (Fig. 3C) at the end of the next sentence.

      As recommended by the reviewer, both the suggested changes were made to the revised text.

      (25) L.231: Delete "Most importantly"- you didn't specify what are other less important results.

      We agree. We have now deleted “most importantly” from the sentence in the revised text.

      (26) L.243 & 254: Change homeostasis to level? Here you are showing mechanisms that can change cAMP level. Homeostasis here would mean how fluctuations in cAMP level are adjusted, usually requiring negative feedback.

      As recommended by the reviewer, ‘homeostasis’ was replaced with ‘level’ in both places.

      (27) L.256: stress response or stress? Also, in l.272

      We are sorry for the inaccuracy. We have corrected these errors in the revised version of the manuscript.

      (28) L.259: Change "maintenance of homeostasis" to 'repressing the rv0805 PDE gene'. It is safer to use a fact-based title. In this section, direct measurement of rv0805 mRNA, and/or cAMP levels in different genetic backgrounds seem desirable.

      We agree. As recommended by the reviewer, we have modified the title of the ‘Results’ section in the revised manuscript (duplicated below).

      “PhoP contributes to mycobacterial stress tolerance and intracellular survival by repressing rv0805 PDE expression.”

      Please note that direct measurements of rv0805 mRNA and cAMP levels are shown in Fig. 3 and Figure 3-figure supplement 1A.

      (29) Fig. 4A: White and grey symbols are not easily discriminated without zooming. Use color for phoPR-KO.

      We agree. We have now indicated the phoPR-KO in blue in the revised Fig. 4.

      (30) L.264: Delete remarkable or explain what is so remarkable. Aren't the results expected- the PDE level would go up in both cases. Direct measurement of PDE /cAMP levels would take the mystery out of the results.

      As recommended by the reviewer, we have deleted ‘remarkably’ from the revised text. The cAMP and PDE expression levels of the four strains are shown in Fig. 3 and Figure 3-figure supplement 1.

      (31) L.273: --suggesting a role of ---

      We have modified this sentence in the revised version of the manuscript (duplicated below).

      “A previous study had reported that a phoP-deleted mutant strain was more sensitive to cumene hydroperoxide (CHP), suggesting a role of PhoP in regulating the mycobacterial response to oxidative stress (Walters et al., 2006).”

      Reference:

      Walters et al. (2006) The Mycobacterium tuberculosis PhoPR two-component system regulates genes essential for virulence and complex lipid biosynthesis. Mol Microbiol 60: 312-330

      (32) L.275: Delete "transcriptome". CHP sensitivity alone doesn't speak for transcriptome.

      As suggested by the reviewer, we have deleted “transcriptome”. Also, please see our response to the previous comment (above).

      (33) Fig. 4D and E: % Colocalization in the Merge panels is not much different among the four strains tested (to an untrained eye). Can the results be explained to readers not used to in vivo studies?

      As recommended by the reviewer, we have now incorporated new text to explain the in vivo experiment (duplicated below).

      “In this assay, WT-H37Rv inhibits phagosome maturation, whereas phagosomes containing phoPR-KO mature into phagolysosomes (Anil Kumar et al., 2016).”

      Further, to clarify the results shown in Fig. 4D, we have (a) increased the size of the figure to highlight the difference in the ‘merge’ panel; (b) added white arrowheads in the merge panels of Fig. 4D to indicate auramine-labeled mycobacteria whose trafficking into lysosomes was either inhibited or facilitated; and (c) described the method used to calculate percent co-localization in greater detail in the ‘Materials and Methods’ section of the revised manuscript.

      Reference

      Anil Kumar et al. (2016) EspR-dependent ESAT-6 secretion of Mycobacterium tuberculosis requires the presence of virulence regulator PhoP. J Biol Chem 291: 19018-19030

      (34) L.275-6: Delete "next" (also in l.347) and "Note that". In this paragraph, I was expecting some explanation on how phoPR-KO and WT-Rv0805 are behaving similarly. Even if the reason is not known, it should be mentioned.

      The suggested changes have been made to the text. Also, as recommended by the reviewer, we have included the following text in the revised manuscript (duplicated below):

      “Together, these results reveal similar behaviour of phoPR-KO and WT-Rv0805, both of which are comparably more susceptible than WT bacteria to acidic pH and oxidative stress, and indicate a link between intra-mycobacterial cAMP level and bacterial stress response. Collectively, these data suggest that at least one of the mechanisms by which PhoP contributes to the global stress response is the maintenance of cAMP level.”

      (35) L.281: ---WT and indicate a link between cAMP level and stress response in mycobacteria. (No mention of homeostasis).

      The suggested change has been made to the revised text. Please see above our response to point # 34.

      (36) L.288, 290: No Thus and no clearly.

      Both the suggested changes have been made to the text.

      (37) L.297: Can you be more direct and state --is due to reduced cAMP level?

      As recommended by the reviewer, we have now modified the sentence to make it more direct in the revised manuscript (duplicated below):

      “Together, these findings suggest that the higher susceptibility of WT-Rv0805 to stress conditions is attributable to its reduced cAMP level.”

      (38) L.307: May delete "most likely----homeostasis". cAMP is not discussed here. The same deletion is desired in l.324.

      We agree. As recommended by the reviewer, we have now modified the relevant texts in the revised manuscript. These are duplicated below.

      “From these results, we suggest that ectopic expression of rv0805 impacts phagosome maturation, arguing in favour of a role of PhoP in influencing phagosome-lysosome fusion in macrophages.”

      “Thus, we suggest that one reason for the attenuated phenotype of phoPR-KO in both cellular and animal models is the loss of PhoP-dependent repression of rv0805, whose PDE activity controls mycobacterial cAMP level.”

      (39) L.342: cAMP level is regulated remains---

      The suggested change has been made to the revised text (duplicated below):

      “Although many bacterial pathogens modulate host cell cAMP level as a common strategy, the mechanism of regulation of mycobacterial cAMP level remains unknown.”

      (40) L.373: tone down "most fundamental". It is not obvious what is so profound about a stress-response system that depends on PhoP also depends on PhoR. OR justify what is most fundamental about it.

      We agree. Following the reviewer’s recommendation, we have modified the text in the revised manuscript (duplicated below):

      “In keeping with these results, we find that PhoP-dependent rv0805 expression requires PhoR (Figs. 3A-B), the cognate kinase which activates PhoP in a signal-dependent manner (Gupta et al., 2006; Singh et al., 2023).”

      References:

      Gupta et al. (2006) Transcriptional autoregulation by Mycobacterium tuberculosis PhoP involves recognition of novel direct repeat sequences in the regulatory region of the promoter. FEBS Lett 580: 5328-5338

      Singh et al. (2023) Dual functioning by the PhoR sensor is a key determinant to Mycobacterium tuberculosis virulence. PLoS Genetics 19(12): e1011070.

      (41) L.395: delete correspondingly (?)

      The suggested change has been made to the text.

      (42) L.396: Delete "appear to" and "somewhat". The uncertainty is already implied in "suggest". The evidence that ectopic expression of rv0805 is functionally equivalent to phoP deletion is quite clear in this paper and not saying that clearly is confusing.

      We agree with the reviewer. The suggested changes have been made to the revised text (duplicated below):

      “Thus, our results suggest that ectopic expression of rv0805 is functionally equivalent to deletion of the phoP locus.”

      (43) L.401: --over-expressing bacilli, induction level of rv0805 expression was significantly different in Matange et al and our studies. The next sentence is also very wordy.

      We have made changes to the text to address the reviewer’s concern. Also, the next sentence has been rewritten (duplicated below).

      “Although both studies used rv0805 over-expressing bacilli, the level of rv0805 expression differed significantly between the study by Matange et al. (2013) and our assays, which most likely accounts for this discrepancy. While we cannot rule out cleavage of other cyclic nucleotides by Rv0805 (Keppetipola & Shuman, 2008; Shenoy et al., 2007; Shenoy et al., 2005), our results, consistent with a previous study, correlate rv0805 expression with intra-mycobacterial cAMP level (Agarwal et al., 2009).”

      References:

      Matange et al. (2013) Overexpression of the Rv0805 phosphodiesterase elicits a cAMP-independent transcriptional response. Tuberculosis (Edinb) 93: 492-500.

      Keppetipola N, Shuman S (2008) A phosphate-binding histidine of binuclear metallophosphodiesterase enzymes is a determinant of 2',3'-cyclic nucleotide phosphodiesterase activity. J Biol Chem 283: 30942-30949

      Shenoy et al. (2007) Structural and biochemical analysis of the Rv0805 cyclic nucleotide phosphodiesterase from Mycobacterium tuberculosis. Journal of Molecular Biology 365: 211-225

      Shenoy et al. (2005) The Rv0805 gene from Mycobacterium tuberculosis encodes a 3',5'-cyclic nucleotide phosphodiesterase: biochemical and mutational analysis. Biochemistry 44: 15695-15704

      Agarwal N, Lamichhane G, Gupta R, Nolan S, Bishai WR (2009) Cyclic AMP intoxication of macrophages by a Mycobacterium tuberculosis adenylate cyclase. Nature 460: 98-102

      (44) L.409: To avoid saying "conclude" and "most likely" at the same time, can you start the sentence thus: 'We infer that Pho-----rv0805 is a---.

      We agree. We have made suggested changes to the text. The modified sentence is duplicated below:

      “We infer that PhoP-dependent regulation of Rv0805 is a critical determinant of intra-mycobacterial cAMP level.”

      (45) L.424. Delete "According to this model". In the preceding sentence, the subject is results, not model. This whole paragraph needs to be rewritten in fewer lines. The shorter the summary statement, the greater would be its impact (less is more here). I would delete the red circles from the figure- it appears that in the repressed state, you are making more products. Replace the circles with an arrow. The legend could be "Increased cAMP level and effective stress response" and "Decreased cAMP---and reduced---.

      We thank the reviewer for these suggestions. Following the reviewer’s recommendations, we have made numerous changes and rewritten the paragraph in the revised manuscript (duplicated below):

      “In summary, upon sensing acidic pH, PhoR activates PhoP, and P~PhoP binds to the rv0805 upstream regulatory region and functions as a specific repressor of rv0805 expression. Consistent with this, we observed (a) a reproducibly lower cAMP level in phoPR-KO relative to WT-H37Rv, (b) significantly reduced expression of rv0805 in WT-H37Rv grown under acidic pH relative to normal conditions, and (c) comparable cAMP levels in phoPR-KO and WT-Rv0805. As a result, both strains fail to mount an appropriate stress response, most likely because dysregulation of intra-mycobacterial cAMP level prevents coordinated regulation of gene expression. However, without uncoupling the regulatory control of PhoPR and rv0805 expression, we cannot confirm that dysregulation of cAMP level accounts for the virulence attenuation of phoPR-KO. Paradoxically, although rv0805-depleted M. tuberculosis is growth-attenuated in vivo (McDowell et al., 2023), ectopic expression of rv0805 leads to dysregulated metabolic adaptation, resulting in reduced stress tolerance and intracellular survival.”

      Also, the suggested changes have been incorporated in Fig. 6 and the figure caption.

      Reference

      McDowell JR, Bai G, Lasek-Nesselquist E, Eisele LE, Wu Y, Hurteau G, Johnson R, Bai Y, Chen Y, Chan J et al (2023) Mycobacterial phosphodiesterase Rv0805 is a virulence determinant and its cyclic nucleotide hydrolytic activity is required for propionate detoxification. Mol Microbiol 119: 401-422

      (46) L.458 & 500: ---was used to transform.

      Following the reviewer’s recommendation, the suggested changes were made to the text in the Materials and Methods section of the revised manuscript.

      (47) L.460: --- antibiotics plates.

      Both suggested changes were made to the text.

      (48) L.466-7: --they were transferred-pH 4.5) and grown for further-

      We thank the reviewer for these suggestions. The suggested changes were made to the text.

      (49) L.486: ---full-length ORFs of interest were---

      The suggested changes were incorporated in the revised manuscript.

      (50) L.497: The RNAs were 20 nt long and complementary---

      As recommended by the reviewer, we have modified the text in the revised manuscript (duplicated below).

      “The RNAs were 20 nt long and complementary to the non-template strand of the target gene.”

      Reviewer #2:

      (1) Rephrase this sentence in the abstract: “Because growing evidence connects PhoP with varying stress response, we hypothesized that the level of 3’,5’ cAMP, one of the most widely used second messengers, was regulated by the phoP locus, linking numerous stress responses with cAMP production”.

      As recommended by the reviewer, we have now rewritten the sentence. The modified text is incorporated in the revised manuscript (duplicated below):

      “cAMP is one of the most widely used second messengers and impacts a wide range of cellular responses in microbial pathogens, including M. tuberculosis. Herein, we hypothesized that intra-mycobacterial cAMP level could be controlled by the phoP locus, since this major regulator plays a key role in bacterial responses to numerous stress conditions.”

      Also, please see our response to specific comments #1-3 of Reviewer 1.

      (2) Line 134: please describe the complementation strain features as it is mentioned for the first time (plasmid, copy number, promoter etc.) in the manuscript. Especially under NO stress what could be the authors' justification regarding the high cAMP concentration in the complementation strain?

      As recommended by the reviewer, the details of construction of the complemented strain have been incorporated in the ‘Materials and Methods’ section of the revised manuscript (duplicated below):

      “To complement phoPR expression, pSM607, containing a 3.6-kb DNA fragment of M. tuberculosis phoPR including the 200-bp phoP promoter region, a hygromycin resistance cassette, an attP site and the gene encoding phage L5 integrase, as detailed earlier (Walters et al., 2006), was used to transform the phoPR mutant for integration at the L5 attB site.”

      To address the reviewer’s other concern, we have now included the following sentence in the ‘Results’ section of the revised manuscript (duplicated below):

      “A higher cAMP level in the complemented strain under NO stress is possibly attributable to reproducibly higher phoP expression in the complemented mutant under specific stress conditions (Khan et al., 2022).”

      Reference:

      Khan et al. (2022) Convergence of two global regulators to coordinate expression of essential virulence determinants of Mycobacterium tuberculosis. eLife 11: e80965

      (3) In Figure 1C, it is a bit confusing to see the numbers 1,2,3 and 4 and nothing is referred to these numbers in the figure legend so it's better to remove them.

      We agree with the reviewer. We have now removed the lane numbers from the figure (Fig. 1C) in the revised manuscript.

      (4) Line 852: rephrase it "insignificantly different".

      The suggested change has been made to the text. The modified text is incorporated in the manuscript (duplicated below):

      “Note that the difference in expression levels of rv0805 between WT and phoPR-KO was significant (p<0.01), whereas the fold difference in mRNA level between WT and the complemented mutant (Compl.) remains nonsignificant (not indicated).”

      (5) Line198-200: There are no open/black bars, they all are coloured bars. Correct the same. The significance test should be done for the same gene (suppose rv0805 up) in different pH conditions. Right now, it is not revealing anything and misleading.

      We apologize for the inaccuracy, which we have now rectified. As recommended by the reviewer, Fig. 2C was modified, and significance tests were carried out comparing enrichment of the same promoter under different pH conditions. The modified figure, figure legend, and relevant results have been adjusted accordingly in the revised manuscript.

      (6) Line 213: Is there any difference between this complementation strain (phoPR-KO::phoPphoR) and the one used in Figures 1A, 1B, and 2A? If yes, then please describe it.

      The same complemented mutant strain, described in the ‘Materials and Methods’ section of the revised manuscript, was used in the experiments shown in Fig. 1A, Fig. 1B and Fig. 2A.

      (7) Line 223: Please mention the copy number and promoter of the vector construct.

      As recommended by the reviewer, we have now mentioned the promoter of the vector and incorporated new text with regard to copy number of the expression vector in the revised manuscript (duplicated below).

      “Although the copy number of episomal vectors with the pAL5000 origin of replication (oriM) has been reported to be 3 by Southern hybridization (Ranes et al., 1990), in this case wild-type and mutant Rv0805 proteins were expressed from single-copy chromosomal integrants (Parikh et al., 2013).”

      References

      Ranes et al. (1990) Functional analysis of pAL5000, a plasmid from Mycobacterium fortuitum: construction of a "mini" mycobacterium-Escherichia coli shuttle vector. J Bacteriol 172: 2793-2797

      Parikh et al. (2013) Development of a new generation of vectors for gene expression, gene replacement, and protein-protein interaction studies in mycobacteria. Applied and Environmental Microbiology 79: 1718-1729

      (8) Figure 3 - Figure Supplement 1: not sure why the authors measured mRNA levels of rv1357 and rv2387? These genes were not overexpressed!

      The mRNA levels of rv1357 and rv2387 were measured to show that overexpression of either the wild-type or mutant Rv0805 did not influence expression of other PDEs like Rv1357 and Rv2387. We have now mentioned it explicitly in the revised manuscript (duplicated below).

      “In contrast, other PDE encoding genes (rv1357 and rv2387), under identical conditions, demonstrate comparable expression levels in WT-H37Rv and rv0805 over-expressing strains.”

      (9) Line 234: Wrong interpretation it should be PDE mRNA levels in WT-Rv0805 and WT-Rv0805M.

      As recommended by the reviewer, we have now modified the statement to improve clarity (duplicated below).

      “The mRNA levels of the wild-type and mutant PDEs are approximately 4.5- to 6-fold higher than the level of genomic rv0805 expression in WT-H37Rv (Figure 3-figure supplement 1A).”

      (10) Line 237: Remove the sentence "Thus, we conclude......identical expression strategy", you have already talked about why phosphodiesterase activity is crucial for cAMP concentration and it is well understood.

      Following the reviewer’s recommendation, we have now removed the sentence from the revised manuscript.

      (11) Figure 3E: Authors should comment on why the cAMP concentration is not significantly changed even though the mRNA level changes are drastic (~90%). How do you correlate that? Is it because of other PDEs?

      We agree. As suggested by the reviewer, we have now incorporated new text in the revised manuscript (duplicated below).

      “We speculate that the effective knockdown of phoP or rv0805 is not fully reflected in the extent of variation in cAMP levels, possibly because of the presence of numerous other mycobacterial PDEs.”

      (12) Line 505,506: Is it the translation start site or the transcription start site? Because mRNA level changes are reported.

      These are the translational start sites; the gene-specific small guide RNAs were designed to inhibit expression of the target mRNAs.

      (13) Line 292: There is a difference between red and green bars. Authors should do statistical analysis and then comment on whether overexpression of WT and mutant pde are different or similar, to me they are different; also, explain why the WT-Rv0805 strain is different than the phoPR-KO strain in the context of cell wall metabolism.

      As recommended by the reviewer, we have now included statistical significance of the data in the revised version, and modified the text accordingly in the manuscript.

      Also, we have included text explaining why WT-Rv0805 differs from the phoPR-KO strain in the context of cell wall metabolism (duplicated below).

      “Together, these results suggest that both strains expressing wild-type or mutant PDEs share largely similar cell-wall properties, consistent with (a) a recent study reporting no significant effect of cAMP dysregulation on mycobacterial cell wall structure/permeability (Wong et al., 2023), and (b) the role of PhoP in cell wall composition and complex lipid biosynthesis (Walters et al., 2006; Asensio et al., 2006; Goyal et al., 2011).”

      References:

      Wong et al. (2023) Cyclic AMP is a critical mediator of intrinsic drug resistance and fatty acid metabolism in M. tuberculosis. eLife 12: e81177

      Walters et al. (2006) The Mycobacterium tuberculosis PhoPR two-component system regulates genes essential for virulence and complex lipid biosynthesis. Mol Microbiol 60: 312-330

      Asensio et al. (2006) The Virulence-associated Two-component PhoP-PhoR System Controls the Biosynthesis of Polyketide-derived Lipids in Mycobacterium tuberculosis. J Biol Chem 281: 1313-1316.

      Goyal et al. (2011) Phosphorylation of PhoP protein plays direct regulatory role in lipid biosynthesis of Mycobacterium tuberculosis. J Biol Chem 286: 45197-45208

      (14) Line 299-303: Authors should explain how the colocalization % are calculated. Also, in the figure 4D merge panel please highlight the difference.

      As suggested by the reviewer, we have now explained the methodology used to calculate percent colocalization in greater detail. Also, we have modified Figure 4D to highlight the difference between the samples shown in the merge panel. Please see our response to comment #33 from Reviewer 1.

      (15) General comment: There are multiple instances where writing needs to be improved.

      We are sorry for the inaccuracies. We have now thoroughly edited the manuscript and made numerous corrections throughout.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The weaknesses are the brevity of the simulations, the concomitant lack of scope of the simulations, the lack of depth in the analysis, and the incomplete relation to other relevant work.

      A 1 µs simulation of CCh (Video 1, part 2) shows that m3 (ACHA) is stable throughout. The ΔG comparisons, in silico versus in vitro, indicate that 200 ns simulations are sufficient to identify LA versus HA conformational populations. Figure 6-table supplement 1 shows distances. New citations have been added.

      Reviewer #2 (Public Review):

      Weaknesses:

      After carrying out all-atom molecular dynamics, the authors revert to a model of binding using continuum Poisson-Boltzmann, surface area, and vibrational entropy. The motivations for and limitations associated with this approximate model for the thermodynamics of binding, rather than using modern atomistic MD free energy methods (that would fully incorporate configurational sampling of the protein, ligand, and solvent) could be provided. Despite this, the authors report a correlation between their free energy estimates and those inferred from the experiment. This did, however, reveal shortcomings for two of the agonists. The authors mention their trouble getting correlation to experiment for Ebt and Ebx and refer to up to 130% errors in free energy. But this is far worse than a simple proportional error, because -24 vs. -10 kcal/mol is a massive overestimation of free energy, as would be evident if the authors were to instead express results in terms of KD values (which would have an error exceeding a billion-fold). The MD analysis could be improved with better measures of convergence, as well as a more careful discussion of free energy maps as a function of identified principal components, as described below. Overall, however, the study has provided useful observations and interpretations of agonist binding that will help understand pentameric ligand-gated ion channel activation.
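      The reviewer's conversion of the free-energy discrepancy into a K_D error can be checked with a short calculation. This is an illustrative sketch only: it assumes the standard relation ΔG = RT ln K_D and a temperature of 298.15 K (neither is stated in the text), and the helper name `kd_fold_error` is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def kd_fold_error(dg_calc: float, dg_exp: float, temp_k: float = 298.15) -> float:
    """Fold error in K_D implied by two binding free energies (kcal/mol).

    Assumes dG = RT ln(K_D), so the ratio of the two dissociation
    constants is exp(|dG_calc - dG_exp| / RT).
    """
    rt = R_KCAL * temp_k  # about 0.59 kcal/mol at 298.15 K
    return math.exp(abs(dg_calc - dg_exp) / rt)


if __name__ == "__main__":
    # Calculated vs experimental free energies quoted by the reviewer
    fold = kd_fold_error(-24.0, -10.0)
    print(f"K_D fold error: {fold:.1e}")
```

      With the -24 and -10 kcal/mol values quoted above, a 14 kcal/mol difference corresponds to a K_D ratio of roughly 2 × 10^10, in line with the reviewer's "exceeding a billion-fold".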

      The objective of the calculations was to identify structural populations, not to estimate binding free energies. We knew the actual LA and HA binding free energies (for all 4 agonists) from electrophysiology experiments. We conclude that the simple PBSA method worked as an identification tool because the calculated efficiencies match those from experiments (Figure 4B, Figure 4-Source Data 1). We discuss the mismatches in absolute ΔG in the Results and Discussion. Methods for estimating experimental binding free energies are described in a cited companion eLife paper. The ΔG ratio relates to agonist efficiency.

      Main points:

      Regarding the choice of model, some further justification of the reduced 2 subunit ECD-only model could be given. On page 5 the authors argue that, because binding free energies are independent of energy changes outside the binding pocket, they could remove the TMD and study only an ECD subunit dimer. While the assumption of distant interactions being small seems somewhat reasonable, provided conformational changes are limited and localised, how do we know the packing of TMD onto the ECD does not alter the ability of the alpha-delta interface to rearrange during weak or strong binding? They further write that "fluctuations observed at the base of the ECD were anticipated because the TMD that offers stability here was absent.". As the TMD-ECD interface is the "gating interface" that is reshaped by agonist binding, surely the TMD-ECD interface structure must affect binding. It seems a little dangerous to completely separate the agonist binding and gating infrastructure, based on some assumption of independence. Given the model was only the alpha and delta subunits and not the pentamer with TMD, I am surprised such a model was stable without some heavy restraints. The authors state that "as a further control we carried out MD simulation of a pentamer docked with ACh and found similar structural changes at the binding pocket compared to the dimer." Is this sufficient proof of the accuracy of the simplified model? How similar was the model itself with and without agonist in terms of overall RMSD and RMSD for the subunit interface and the agonist binding site, as well as the free energy of binding to each model to compare?

      The statement that distant interactions are small is not an "assumption" but a conclusion based on data. Mutant cycle analysis of 83 pairs shows that, with a few exceptions, non-additivity of free energy changes occurs only at separations below ~15 Å (Fig. 3 in Gupta et al. 2017). Regardless, the adequacy of the dimer model and convergence by 200 ns are supported by the match between calculated and experimental agonist efficiencies (Figure 4B) and by the 1 ms simulation (Video 1 part 2). A 200 ns apo simulation of the ECD dimer has now been added (Figure 2-figure supplement 2), and the dimer interface appears adequate (stable).

      Although the authors repeatedly state that they have good convergence with their MD, I believe the analysis could be improved to convince us. On page 8 the authors write that the RMSD of the system converged in under 200 ns of MD. However, I note that the graph is of the entire ECD dimer, not a measure for the local binding site region. An additional RMSD of local binding site would be much more telling. You could have a structural isomerisation in the site and not even notice it in the existing graph. On page 9 the authors write that the RMSF in Figure S2 showed instability mainly in loops C and F around the pocket. Given this flexibility at the alpha-delta interface, this is why collecting those regions into one group for the calculation of RMSD convergence analysis would have been useful. They then state "the final MD configuration (with CCh) was well-aligned with the CCh-bound cryo-EM desensitized structure (7QL6)... further demonstrating that the simulation had converged." That may suggest a change occurred that is in common with the global minimum seen in cryo EM, which is good, but does not prove the MD has "converged". I would also rename Figure S3 accordingly.

      The description has been changed to “aligns well with the desensitized structure (7QL6.PDB)”. Not only the binding pocket but the whole ECD dimer aligns well (by RMSD) with the apo structure (m1) and with the desensitized structure (m3).
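
      The reviewer's distinction between superposing on the whole ECD dimer and measuring RMSD over the binding-site residues only can be made concrete with a short sketch. It assumes coordinates are already loaded as NumPy arrays of shape (n_atoms, 3) and that the binding-site atom indices (for example, Cα atoms of loops C and F) are chosen by the user. The function names are hypothetical, and this is an illustration, not the analysis used in the manuscript.

```python
import numpy as np


def superpose(mobile, ref, fit_idx):
    """Rigidly superpose `mobile` onto `ref` using only atoms in fit_idx (Kabsch)."""
    p, q = mobile[fit_idx], ref[fit_idx]
    p_c, q_c = p.mean(axis=0), q.mean(axis=0)
    # Cross-covariance matrix of the centered fitting atoms
    h = (p - p_c).T @ (q - q_c)
    u, _, vt = np.linalg.svd(h)
    # Correct for an improper rotation (reflection) if one appears
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Apply the same rigid transform to every atom of the mobile structure
    return (mobile - p_c) @ rot.T + q_c


def rmsd(mobile, ref, fit_idx, measure_idx):
    """RMSD over measure_idx after superposition on fit_idx."""
    moved = superpose(mobile, ref, fit_idx)
    diff = moved[measure_idx] - ref[measure_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

      With fit_idx set to all dimer Cα atoms and measure_idx set to the site atoms, the result reports how the pocket moves relative to the dimer. Setting both to the site atoms reports internal rearrangement of the pocket itself. A rearrangement confined to the site can be diluted in a whole-dimer curve but shows up in either site-restricted measure.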

      The authors draw conclusions about the dominant states and pathways from their PCA component free energy projections that need clarification. It is important first to show data demonstrating that the two PCA components chosen were dominant and accounted for most of the variance. Then, when mapping free energy as a function of those two PCA components, it is important to show that those maps have sufficient convergence to be interpreted. Moreover, if the free energies themselves cannot be used to measure state stability (as seems to be the case), the limitations should be carefully explained. First, was PCA done on all MD trajectories combined to find a common PC1 & PC2, or was it done separately on each simulation? If so, how similar are they? The authors write "the first two principal components (PC-1 and PC-2) that capture the most pronounced Cα displacements". How much of the total variance did these two components capture? The authors write that the changes mostly concern loop C and loop F, but which data prove this? For example, a plot of PC1 and PC2 over residue number might help.

      The PCA analyses have been expanded. Figure 3-Source Data 1 shows the dominance of PC1 and PC2. Because the binding energy match was sufficient to identify affinity states, we did not explore additional PCs. Residue-wise PC1 and PC2 analysis and a comparison with RMSF are in Figure 2-figure supplement 2. PC1 and PC2 both correlate with fluctuations in loops C and F. Overlap analysis across runs is shown in Figure 3-figure supplement 1. Lower variance in a particular region of the PCA landscape indicates that the system frequently visits these states, suggesting stability (a preference for these conformations).
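
      For readers who want to check how much variance PC1 and PC2 capture and which residues dominate them, a minimal NumPy sketch of Cα PCA follows. It assumes the frames have already been superposed on a common reference and stacked into an array of shape (n_frames, n_atoms, 3). The function name is hypothetical, and a production analysis would normally use an MD analysis package.

```python
import numpy as np


def ca_pca(coords, n_components=2):
    """PCA of superposed Cα coordinates with shape (n_frames, n_atoms, 3).

    Returns the fraction of total variance captured by each of the first
    n_components PCs, the per-frame projections (PC scores), and the per-atom
    magnitude of each PC eigenvector (a residue-wise profile).
    """
    n_frames, n_atoms, _ = coords.shape
    x = coords.reshape(n_frames, 3 * n_atoms)
    x = x - x.mean(axis=0)  # center on the average structure
    # Rows of vt are the PC directions; s**2 is proportional to their variance
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    pcs = vt[:n_components]
    scores = x @ pcs.T
    per_atom = np.linalg.norm(pcs.reshape(n_components, n_atoms, 3), axis=2)
    return explained[:n_components], scores, per_atom
```

      Here explained.sum() is the fraction of total variance that a two-component landscape represents. Plotting per_atom[0] and per_atom[1] against residue number gives the residue-wise view requested above.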

      The authors map -kT ln ρ as a free energy for each simulation as a function of PC1 & PC2. It is important to reveal how well that PC1-2 space was sampled, and how those maps converged over time. The shapes of the maps and the relative depths of the wells look very different for each agonist. If the maps were sampled well and converged, the free energies themselves would tell us the stabilities of each state. Instead, the authors do not mention this and talk about "variance" being the indicator of stability, stating that m3 is most stable in all cases. While I can believe 200 ns could not converge a PC1-2 map and that meaningful ΔG values might not be obtained from them, the issue of lack of sampling must be dealt with. On page 12 they write "Although the bottom of the well for 3 energy minima from PCA represent the most stable overall conformation of the protein, they do not convey direct information regarding agonist stability or orientation". The reasons why not must be explained, since the minima should convey exactly that information if the two order parameters PC1 and PC2 captured the slowest degrees of freedom for binding and sampling was sufficient. The authors write that "For all agonists and trajectories, m3 had the least variance (was most stable), again supporting convergence by 200 ns." Again, the issue of actual free energy values in the maps needs to be dealt with. The probabilities expressed as -kT ln ρ in kcal/mol might suggest that m2 is the most stable. Instead, the authors base stability only on variance (I guess breadth of the well?), where m3 may be more localised in the chosen PC space, despite apparently having less preference during the MD (not the lowest free energy in the maps).
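
      The convergence question above can be checked directly. Build the -kT ln ρ surface on fixed bins, then compare surfaces computed from the first and second halves of a trajectory. The sketch below assumes PC1 and PC2 scores are already available as 1D arrays. The temperature, bin count, and function names are illustrative assumptions, not the settings used in the manuscript.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant, kcal/(mol*K)


def free_energy_surface(pc1, pc2, temp_k=300.0, bins=50):
    """-kT ln(rho) on a 2D histogram of PC scores, shifted so the minimum is 0.

    Empty (never visited) bins are returned as +inf.
    """
    rho, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins, density=True)
    with np.errstate(divide="ignore"):
        g = -KB_KCAL * temp_k * np.log(rho)
    g -= g[np.isfinite(g)].min()
    return g, xedges, yedges


def half_split_difference(pc1, pc2, temp_k=300.0, bins=50):
    """RMS difference (kcal/mol) between surfaces from the two trajectory halves.

    Both halves use the bin edges of the full trajectory. Each half is zeroed
    at its own minimum, and only bins visited in both halves are compared.
    """
    pc1, pc2 = np.asarray(pc1), np.asarray(pc2)
    _, xedges, yedges = free_energy_surface(pc1, pc2, temp_k, bins)
    half = len(pc1) // 2
    g_a, _, _ = free_energy_surface(pc1[:half], pc2[:half], temp_k, [xedges, yedges])
    g_b, _, _ = free_energy_surface(pc1[half:], pc2[half:], temp_k, [xedges, yedges])
    both = np.isfinite(g_a) & np.isfinite(g_b)
    return float(np.sqrt(np.mean((g_a[both] - g_b[both]) ** 2)))
```

      A half-split difference that is small relative to the well depths suggests the surface is converged at that bin resolution. A large difference indicates that well depths, and therefore relative state stabilities, should not be read from the map. Note also that the breadth of a well (variance along the PCs) and its depth (free energy) are different quantities.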

      The motivations and justifications for using approximate PBSA energetics instead of atomistic MD free energies should be dealt with in the manuscript, and the limitations should be discussed more clearly. Rather than using modern all-atom MD free energy methods for relative or absolute binding free energies, the authors select clusters from their identified states and perform Poisson-Boltzmann estimates (electrostatic, vdW, surface area, vibrational entropy). I do not believe the following sentence begins to deal with the limitations of that method: "there are limitations with regard to MM-PBSA accurately predicting absolute binding free energies (Genheden & Ryde, 2015; Hou et al., 2011) that depends on the parameterization of the ligand (Oostenbrink et al., 2004)." What are the assumptions and limitations in taking continuum electrostatics (presumably with parameters for dielectric constants and their assignments to regions after discarding solvent), surface area (with its assumptions and limitations), and of course assuming vibration of a normal mode can capture entropy? On page 30, regarding their vibrational entropy estimate, they write that the "entropy term provides insights into the disorder within the system, as well as how this disorder changes during the binding process". The extent of disorder captured by the vibrational estimate should be discussed. It is not obvious that this estimate captures entropy involving multiple minima on the system's true 3N-dimensional energy surface, and especially the contribution from solvent disorder in bound vs dissociated states.

      As discussed above, errors in the free energy estimates need to be more faithfully represented, as fractional errors are not meaningful. On page 21 the authors write "The match improved when free energy ratios rather than absolute values were compared." But a ratio of free energies is not a typical or expected measure of error in ΔG. They also write "For ACh and CCh, there is good agreement between ΔGm1 and ΔGLA and between ΔGm3 and ΔGHA. For these agonists, in silico values overestimated experimental ones only by ~8% and ~25%. The agreement was not as good for the other 2 agonists, as calculated values overestimated experimental ones by ~45% (Ebt) and ~130% (Ebt). However, the fractional overestimation was approximately the same for ΔGLA and ΔGHA." See the above comment on how this may misrepresent the error. On page 21 they write, in relation to their large fractional errors, that they "do not know the origin of this factor but speculate that it could be caused by errors in ligand parameterization". However, the estimates from the PBSA approach are, by design, only approximate. Both errors in parameterisation (and their likely origin) and the approximate model used need discussion.

      Again, the goal of calculating binding free energies was to identify structural correspondence to LA and HA, not to obtain absolute binding free energy values. In addition to having the least variance (distribution) along the principal components, m3 also had the highest binding free energy. We associated m1 with LA and m3 with HA after comparing the calculated values with experimental ones (efficiencies). This comparison not only validates our approach but also underscores the utility of PBSA in supplementing MD and PCA analyses with a broader energetic perspective.
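
      The reviewer's point and the authors' point can be reconciled with a short calculation. Write the binding free energy as ΔG = RT ln Kd. Then suppose, hypothetically, that the calculation overestimates both ΔGLA and ΔGHA by the same fraction f:

```latex
\begin{aligned}
\Delta G^{\mathrm{calc}} &= (1+f)\,\Delta G^{\mathrm{exp}}
\;\Rightarrow\;
\Delta G^{\mathrm{calc}} - \Delta G^{\mathrm{exp}} = f\,\Delta G^{\mathrm{exp}},
\qquad
\frac{K_d^{\mathrm{calc}}}{K_d^{\mathrm{exp}}} = \exp\!\left(\frac{f\,\Delta G^{\mathrm{exp}}}{RT}\right) \\[4pt]
\frac{\Delta G^{\mathrm{calc}}_{m1}}{\Delta G^{\mathrm{calc}}_{m3}}
&= \frac{(1+f)\,\Delta G_{LA}}{(1+f)\,\Delta G_{HA}}
= \frac{\Delta G_{LA}}{\Delta G_{HA}}
\end{aligned}
```

      The ratio, and therefore any efficiency-type comparison, is unaffected by a common fractional error. This is why matching ratios can still identify m1 with LA and m3 with HA. The absolute error is not small, however. For an illustrative ΔG of -8 kcal/mol (not a value from the manuscript), f = 0.25 gives a 2 kcal/mol error. At RT ≈ 0.59 kcal/mol, that corresponds to roughly a 30-fold error in Kd.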

      Reviewer #3 (Public Review):

      Weaknesses:

      Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for two other ligands were significantly different than the experiment. It is unclear to what extent the choice of method for the energy calculations influenced the results. See above.

      A control simulation, such as for an apo site, is lacking.

      Figure 2-figure supplement 2 shows the results of 200 ns MD simulations of the apo structure (n=2).

      Reviewer #4 (Public Review):

      Weaknesses:

      Timescales (200 ns) do not capture global rearrangements of the extracellular domain, let alone gating transitions of the channel pore, though this work may provide a launching point for more extended simulations. A more general concern is the reproducibility of the simulations, and how representative states are defined. It is not clear whether replicates were included in principal component analysis or subsequent binding energy calculations, nor how simulation intervals were associated with specific states.

      We are interested eventually in using MD to study the full isomerization, but these investigations are for the future and will likely involve full-length pentamers and longer timescales. However, in response to this query, we now raise this issue in the Discussion and offer some speculation. As noted above, PCA has been compared between replicates (Figure 3-figure supplement 1).

      Structural analysis largely focuses on snapshots, with limited direct evidence of consistency across replicates or clusters. Figure legends and tables could be clarified.

      Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from the m1, m2 and m3 plateau regions of the trajectories. This is now stated in the legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study gives interesting insights into the possible dynamics of ligand binding in ACh receptors and establishes some prerequisites for necessary and urgent further work. The broad interest in this receptor class means this work will have some reach.

      Suggestions:

      (1) I found the citation of relevant literature to be rather limited. In the following paper, the agonist glutamate was shown to bind in two different orientations, and also to convert. These are much longer simulations than what is presented here (nearly 50 µs), which allowed a richer view of conformational changes and ligand binding dynamics in the AMPA Receptor. Albert Lau has published similar work on NMDA, delta, and kainate receptors, including some of it in eLife. Perhaps the authors could draw some helpful comparisons with this work.

      Yu A et al. (2018) Neurotransmitter Funneling Optimizes Glutamate Receptor Kinetics. Neuron

      Likewise, the comparison to a similar piece of work on glycine receptors (not cited, https://pubs.acs.org/doi/10.1021/bi500815f) could be instructive. Several similar computational techniques were used, and interactions observed (in the simulations) between the agonist and the receptor were tested in the context of wet experiments. In the absence of an equivalent process in this paper (no findings were tested using an orthogonal approach, only compared against known results, from perhaps a narrow spectrum of papers), we have to view the major findings of the paper (docking in cis that leads to a ligand somersault) with some hesitancy.

      The Gharpure 2019 paper is cited in the context of the delta subunit, but this paper was about α3β4 neuronal nicotinic receptors. This could be tidied up. Also, the simulations from that paper could be used as an index of the stability of the HA state (if ligand orientation is being cited as transferable, other observations could be too).

      New citations have been added. It is difficult to generalize from Yu A et al. and Yu R et al., because in neither study was the ligand orientation associated with LA versus HA binding energy.

      (2) "To start, we associated the agonist orientation in the hold end states as cis in AC-LA versus trans in AC-HA."

      I think this is a valid start, but one is left with the feeling that this is all we have and the validity of the starting state is not tested. What was really shown here? Is the docking reliable? What evidence can the authors summon for the ligand orientation that they use as a starting structure?

      In addition to docking energies, the match between PBSA and electrophysiology ΔG values and the temporal sequence (m1-m2-m3) support the assignment.

      Given that these simulations cover a circumscribed part of the binding process, I think the limitations should be acknowledged. Indeed the authors do mention a number of remaining open questions.

      Paragraphs regarding 'catch' have been added to the Discussion.

      (3) Results around line 90. Hypothetical structures and states that were determined from Markov analyses are discussed as if they are well understood and identified. Plausible though these are, I think the text should underline at least the source of such information. In these simulations, a further intermediate has been identified.

      The model in Figure 1B was first published in 2012 and has been used and extended over the intervening years. In our lab, catch-and-hold is standard. We have published many papers (in top journals), plus reviews, regarding this scheme, and we have made presentations that are on YouTube. Here, at the end of the Introduction, we now cite a new review article (Biophysical Journal, 2024). I am not sure what more we can do to raise awareness regarding catch and hold.

      (4) The figures are dense and could be better organised. Figure 2 is key but has a muddled organization. The placement of the panel label (C) makes it look like the top row (0 ns) is part of (A). Panel B- what is shown in the oval inset (not labeled or in legend). Why not show more than one view, perhaps a sequence of time points? It is confusing to change the colour of the loops in (C). Please show the individual values in D.

      Figure 2 has been redone.

      (5) A lot is made of the αK145 salt bridge with αD200 and the distances, but I didn't see any measurements or time course. This part is vague to the point of having no meaning ("bridge tightening").

      We now present a table of distance measurements in the SI (Figure 6-table supplement 1).

      Reviewer #2 (Recommendations For The Authors):

      All main comments have been given in the above review. There are a few other minor comments below.

      The 4 agonists examined were acetylcholine (ACh), carbamylcholine (CCh), epibatidine (Ebt), and epiboxidine (Ebx). Could the choices be motivated for the reader?

      New in Methods: the agonists are about the same size yet represent different efficiency classes (citation to companion eLife paper). One of our (unmet) objectives was to understand the structural correlates of agonist efficiency.

      The authors write that state structures generated in the MD simulation were identified by aligning free energy values with those from experiments. It would be good to explain to the reader, in the introduction, how LA and HA free energies were extracted from experiments, rather than relying on them to read older papers.

      In the Introduction, we say that to get ΔG, one simply measures an equilibrium constant and takes the log. We think it is excessive to explain in detail in this paper how to measure the equilibrium binding constants (several methods suffice). However, we have added our basic approach to Methods: measure KLA and L2 by electrophysiology, and compute KHA from the thermodynamic cycle using L0. We think this paper is best understood in the context of its companion, also in eLife.
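
      The approach summarized above can be sketched in a few lines of Python. The sketch makes three assumptions. First, it uses the two-site cycle L2/L0 = (KdLA/KdHA)^2, with two equivalent, independent binding sites. Second, it takes the binding free energy as ΔG = RT ln Kd, with Kd in M. Third, it adopts the efficiency definition η = 1 - ΔGLA/ΔGHA. The function names, the default temperature, and the example numbers are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def delta_g(kd_molar, temp_k=296.0):
    """Binding free energy (kcal/mol) from a dissociation constant in M.

    Uses a 1 M standard state; the 296 K default is an assumed room temperature.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_high_affinity(kd_la, l0, l2):
    """KdHA from the cycle L2/L0 = (KdLA/KdHA)**2 (two equivalent, independent sites assumed)."""
    return kd_la * math.sqrt(l0 / l2)


def efficiency(dg_la, dg_ha):
    """Assumed definition: eta = 1 - dG_LA / dG_HA."""
    return 1.0 - dg_la / dg_ha


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    kd_la, l0, l2 = 1e-4, 1e-6, 10.0
    kd_ha = kd_high_affinity(kd_la, l0, l2)
    dg_la, dg_ha = delta_g(kd_la), delta_g(kd_ha)
    print(f"KdHA = {kd_ha:.2e} M, dG_LA = {dg_la:.2f} kcal/mol, "
          f"dG_HA = {dg_ha:.2f} kcal/mol, eta = {efficiency(dg_la, dg_ha):.2f}")
```

      Because ΔG is proportional to ln Kd, the ratio ΔGLA/ΔGHA (and hence η) does not depend on the choice of RT.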

      In all equilibrium equations of the type A to B (e.g. on page 5), rather than using "=" signs it would be much better to use equilibrium reversible arrow symbols.

      This has been incorporated.

      Reviewer #3 (Recommendations For The Authors):

      (1) Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for Ebt and Ebx were significantly different than the experiment. Are there any alternative methods for calculating binding energies from the MD simulations that could be readily compared to?

      See above. We did not use more sophisticated energy calculations because we already knew the answers. Our objective was to identify states, not to calculate energies.

      (2) It would be nice to see control simulations of an apo site to ensure that the conformational changes during the MD are due to the ligands and not an artifact of the way the system is set up. I am primarily asking about this as the simulation of the isolated ECDs for the binding site interface seems like it may be unhappy without the neighboring domains that would normally surround it. On that note, was the protein constrained in any way during the MD?

      Apo simulation results are presented in Figure 2-figure supplement 2. The dimer interface seems to be adequate (stable).

      (3) Figure 4A-B: Should the colors for m1 and m3 be reversed?

      Colors have been changed and a bar chart has been added.

      Reviewer #4 (Recommendations For The Authors):

      (1) Although simulations are commendably run in triplicate, it is difficult in some places to discern their consistency.

      (1a) Table S1 provides important quantification of deviations in different replicates and with different agonists. Please confirm that the reported values are accurate. All values reported for the epibatidine system are identical to those reported for carbamylcholine, which seems statistically improbable. Similarly, runs 1 and 3 with epiboxidine seem identical to one another, and runs 1 and 2 with acetylcholine are nearly the same.

      Figure 2-Source Data 1 has been corrected.

      (1b) In reference to Figure S3, the authors comment that the simulated system (one replicate with carbamylcholine) converges within 0.5 Å RMSD of a desensitized experimental structure. This seems amazing; please specify over what atoms this deviation was calculated and with reference to what alignment. It would be interesting to know the reproducibility of this remarkable convergence in additional replicates or with other ligands; for example, Figure 5 indicates that loop C transitions to a lesser extent in the context of epibatidine than other agonists.

      The comparison was for the entire ECD dimer, and 0.5 Å is the result. It may be worthwhile to pursue this remarkable convergence, but not in this paper. Here, we are concerned with identifying AC-LA and AC-HA. The similarity between AC-HA and AD structures is for a different study.

      (1c) For principal-component and subsequent analyses, it appears that only one trajectory was considered for each system. Please clarify whether this is the case; if so, a rationale for the selection would be helpful, and some indication of how reproducible other replicates are expected to be.

      We have added new PCA results (Results, Figure 3-figure supplement 1) that show comparable principal components in other replicates.

      (2) Figure 3 shows free energy landscapes defined by principal components of fluctuation in Cα positions.

      (2a) Do experimental structures (e.g. PDB IDs 6UWZ, 7QL6) project onto any of these landscapes in informative ways?

      6UWZ.pdb matches the apo structure (7QKO.pdb) well and is comparable to m1, and 7QL6.pdb matches m3.

      (2b) Please indicate the meaning of colored regions in the righthand panels.

      The color key in the top-left panel also applies to the colored regions in the right-hand panels, which indicate the direction and magnitude of changes along PC1 and PC2.

      (2c) Please also check the legend; do the porcupine plots really "indicate the direction and magnitude of changes between PC1 and PC2," or rather between negative and positive values of each principal component?

      The plots indicate the direction and magnitude of changes along PC1 and PC2.

      (3) It would be helpful to clarify how trajectory segments were assigned to specific minima, particularly m2 and m3.

      (3a) Please verify the timeframes associated with the m2 minima, reported as "20-50 ns [with acetylcholine], 50-60 ns [with carbamylcholine], 60-100 ns [with epibatidine, and] 100-120 ns [with epiboxidine]." It seems improbable that these intervals would interleave so precisely in independent systems. Furthermore, the intervals associated with acetylcholine and epiboxidine do not appear to correspond to the m2 regions indicated in Figure S8.

      Times are given in Figure 4-Source Data 1 and Figure 3-figure supplement 2. The m2 classification is based on loop displacement as well as agonist orientation. For all agonists, the selection was strictly from PCA and cluster analysis.

      (3b) The text (and legend to Figure 3) indicate that 180+ ns of each trajectory was assigned to m3, which seems surprisingly consistent. However, Figure S5 indicates this minimum is more variable, appearing at 160 ns with acetylcholine but at 186 ns with carbamylcholine. Please clarify.

      See above: the selection was made from PCA and cluster analysis. Times are given in Figure 3-figure supplement 2 and in Figure 4-Source Data 1 (they are not in the Figure 3 legend).

      (3c) Figures 5, 6, S6, and S7 illustrate structural features of free-energy minima in each ligand system. Please clarify what is shown, e.g. a representative snapshot, centroid, or average structure from a particular prominent cluster associated with a given minimum.

      They are all representative snapshots (now in Methods). Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from m1, m2 and m3 plateau regions of trajectories.

      (4) Figure S4 helpfully shows the behavior of a pentameric control system; however, some elements are unclear.

      (4a) The 2.5-6.5 Å jump in RMSD at ~40 ns seems abrupt; can it be clarified whether this corresponds to a transition to either m2 or m3 poses, or to another feature of e.g. alignment?

      The bottom-left panel of Figure 2-figure supplement 4 shows the ligand only. The jump is the flip from m1 to m2.

      (4b) It seems difficult to reconcile the apparently bimodal distribution of states with the proposed 3-state model. Into which RMSD peak would the m2 intermediate fall?

      The pentamer simulations extend only to 100 ns, within which we found a complete flip of the agonist, as represented in the histograms. This confirmed that the dimer showed a pattern similar to that of the pentamer. In-depth analysis was done only on dimers.

      (4c) The top panel is labeled "Com" with a graphical legend indicating "ACh." Does this indicate the ligand or, as described in the text legend, "the pentamer" (i.e. the receptor)? For both panels, please verify whether they are calculated on the basis of center-of-mass, heavy atoms, Cα, etc.

      "Com" (for complex) has been changed to system (protein+ligand).

      (5) Minor concerns:

      (5a) In Figures 1 and S3, correct the PDB references (6UWX and 7QL7 are not nAChRs).

      They are now corrected.

      (5b) In Figure 4, do all panels represent mean ± standard deviation calculated across all cluster-frames reported in Table 1?

      Yes.

      Also check the graphical legend in panel A: presumably the red bars correspond to m1/LA, and the blue to m3/HA?

      Corrected

      (5c) In the legend to Figure S1, please clarify that panel B is reproduced from Indurthi & Auerbach 2023.

      This figure has been deleted.

      (5d) As indicated in Figure S2, it seems surprising that the RMSF is so apparently low at the periphery, where the subunits should contact neighbors in the extracellular domain; how might the authors account for this? Specify whether these results apply to all replicates of each system.

      The red coloring at the periphery in all four systems indicates the magnitude of fluctuation. Because we focus on the orthosteric site, we highlight the loops around the agonist binding pocket and render other regions 75% transparent. We now include apo simulations, and the dimer appears to be stable even without an agonist present.

      (5e) Within each minimum in Figure S5, three "prominent" clusters appear to be colored (by heteroatom) with carbons in cyan, pink, and yellow respectively. If this is correct, note these colors in the text legend.

      Colors have been added to the legend.

      (5f) In Figure S6, note in the legend that key receptor sidechains are shown as spheres, with the ligand as balls-and-sticks, and that ligand conformations in both low- and high-affinity complexes are shown in both receptor states for comparison.

      This is now added in the legend.

      (5g) The legend to Figure S6 also notes "The agonists are as in Fig S4," but that figure contains a single replicate of a different system; please check this reference.

      This has been updated to Figure 5.

      (5h) In Figure S8, the colors in the epibatidine system appear different from the others.

      The colors are the same for m1, m2 and m3 in all systems including epibatidine.

      (5i) In Table 1, does "n clusters" indicate the number of simulation frames included in the three prominent clusters chosen for MM-PBSA analysis? Perhaps "n frames" would be more clear.

      Thank you for the good suggestion. The column has been renamed ‘n frames’.

      (5j) Pg 24-ln 453 presumably should read "...that separate it from m1 and m3..."

      This sentence is now changed in the discussion.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We thank you and the two reviewers for the thorough review of our manuscript, and for the positive evaluation and your encouragement to continue in this fascinating topic. In this version we made minor changes to the text to address the comments and suggestions of the second reviewer and to increase the clarity of the text.

      Reviewer #2 Recommendation to the authors

      We thank the reviewer for the sharp comments that helped us improve the clarity of the paper. Below we list the changes we made to correct and revise the paper in accordance with the reviewer’s comments.

      (1) Line 90. Isn't the genus Paracentrotus?

      Yes, it is; thank you. We corrected the typo.

      (2) Figure 1 and supplementary figure 2. To this reviewer supplementary Figure 2 doesn't really help the story as written in the paragraph from line 96-110. You want to report expression of ROCK in skeletogenic cells. You do that quite well in Figure 1. Since Fig. S2 reports whole embryo expression of ROCK when only 5% of the cells in the embryo are the subject of interest here, and the Axitinib is selective, presumably for skeletogenic cells, the relative lack of effect in Fig. S2 is not surprising and again, doesn't really help the theme you wish to establish by focusing on the role of ROCK in skeletogenic cells over time. If anything, the data reported in Fig. S2 shows that perturbation of VEGF signaling has very little effect embryo-wide, while Fig. 1 shows that perturbation of VEGF signaling has a noticeable effect on ROCK expression in skeletogenic cells. If you choose to keep Fig. S2, I recommend that you indicate that embryo-wide vs skeletogenic cell difference more succinctly than given at present. It will also strengthen your paragraph in lines 110-127.

      The importance of the western blot presented in Fig. S2 is to validate that the antibody recognizes a protein of the expected size. This strengthens the credibility of this commercial antibody for detecting the sea urchin ROCK protein. We agree with the reviewer that the fact that skeletogenic cells make up less than 5% of the embryonic cells is important for explaining why we did not see an effect of VEGFR inhibition in the western blot, and we changed the text to express this (lines 108-111): “Yet, this measurement was done on proteins extracted from whole embryos, of which the skeletogenic cells, where VEGFR is active, are less than 5% of the total cell mass (42). We therefore wanted to study the spatial expression of ROCK and specifically, its regulation in the skeletogenic cells.”

      (3) Comparison of Fig. 2 and Fig. S3. To me the reader is confused when Fig. S3 is 33hpf as reported in the text (but not in the figure legend), and Fig. 2 shows 2 day old embryos - on the figure and figure legend but not in the text. So, the reader sees the text indicating 33hpf and looks around and the figure 2 says 2dpf. Does that mean 33hpf = 2dpf, the reader is thinking. To clarify, I suggest including the 2dpf in the text or simply drop the time in the text and report it in the two figures. Further, in the middle of the paragraph 130-143 you switch from reporting on Fig.S3 to Fig. 2, yet the reader doesn't know that. The reader is still looking at Fig. S3. The problem here is that at 33hpf the skeleton doesn't yet show the reduction or abnormalities that are shown later at 2dpf in Fig. 2. In clarifying this paragraph both the reduction in ROCK expression and the subsequent alterations in growth and patterning of the skeleton will be clear to the reader.

      Thank you for raising this point. We added to the caption of Fig. S3 that the measurements were done at 33 hpf. We also added to the text that the skeletogenic phenotypes were observed at 2 dpf (48 hpf). Finally, we separated the paragraph discussing Fig. S3 from the paragraph discussing Fig. 2.

      (4) The experiment with Y27632, an inhibitor of ROCK, is significantly improved in this revision. The concern earlier was the possibility that at the concentration used there might be off-target effects since other kinases are affected by higher concentrations of this selective inhibitor. The authors have modified this component of the paper and performed experiments at lower concentrations where other reports indicate the inhibitor is highly selective for ROCK, and they still demonstrate an inhibition of skeletal production. This, plus the added citations greatly increases confidence that this inhibition is selective for ROCK, thus enabling a stronger conclusion that ROCK has a role in skeletal growth and patterning.

      Thank you for asking us to test this lower concentration, which improved the credibility of our findings.

      Line 239 - should be: indicating instead of indicting.

      We corrected that.

      (5) Line 402-403."The first step in generating the sea urchin spicules is the construction of the spicule cavity, a membrane filled with calcium carbonate and coated with F-actin (Fig. 8A)". I suggest more precise language. The way this now reads (above) is that somehow the spicule cavity is a membrane and that membrane is filled with CaCO3. And further the membrane is coated with F-actin. Isn't the spicule cavity what is filled with CaCO3? And isn't that cavity surrounded by a membrane? And the F-actin must be in the cortex of the cell since there is very little cytoplasm associated with the pseudopodial extensions that surround the spicule.

      We changed this sentence to: “The first step in generating the sea urchin spicules is the construction of the spicule cavity where the mineral is engulfed in a membrane coated with F-actin” (lines 403-404). Our observations show that F-actin is enriched around the spicule cavity. It could be an extension of the cell cortex, but we did not prove it, so we prefer to simply describe what we saw.

      Lines 405-408. Thank you for acknowledging this unknown. It is important to point out that while you've shown that ROCK contributes to regulation of actomyosin, it is not clear whether this is direct or indirect. You have also shown that ROCK somehow contributes to regulation of the GRN that leads to skeletogenesis. Thus, your data are consistent in showing that ROCK perturbation cripples normal skeletogenesis both via morpholino and with a selective inhibitor. The last part of your discussion then offers speculation as to what might be affected specifically. That discussion sets the stage for digging even deeper to identify specific targets of ROCK activity.

      Thank you, we agree with you that there is an exciting road ahead of us!

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The manuscript needs proper editing and is not complete. Some wordings lack precision and make it difficult to follow (e.g. line 98 "we assembled a chromosome-scale genome of ..." should instead read "we assembled a chromosome-scale genome sequence of ..."). Also, panel Figure 2E is missing.

      We will make the suggested change of adding “sequence”. Concerning additional changes, we have carefully edited our manuscript and looked for any incomplete sections. Unfortunately, it is difficult to see what other issues are being raised here without further information. The example given does not help us ascertain what other changes may be necessary, since we cannot see any problem with the sentence “we assembled a chromosome-scale genome of”, as this phrase is widely used in many similar publications.

      As for panel E of figure 2, it is not missing: the panel is located to the right, just below “Target Cells”.

      The shortcomings of the manuscript are not limited to the writing style, and important technical and technological information is missing or not clear enough, thereby preventing a proper evaluation of the resolution of the genomic resources provided:

      • Several RNASeq libraries from different tissues have been built to help annotate the genome and identify transcribed regions. This is fine. But throughout the manuscript, gene expression changes are summarized into a single panel where it is not clear at all which tissue this comes from (whole embryo or a specific tissue?), or whether it is a cumulative expression level computed across several tissues (and how it was computed), etc. This is essential information needed for data interpretation.

      No fertilised eggs or embryos have been sequenced; individual tissues derived from juvenile fish were used for the genome annotation, and whole larval fish for the developmental analysis. We will specify in the figures and text that the results shown are from whole larvae, and add more detail to the Material and Methods section about which type of sample was analysed in which way.

      • The bioinformatic processing, especially of the assembly and annotation, is very poorly described. This is also a sensitive topic, as illustrated by the numerous "assemblathon" and "annotathon" initiatives to evaluate tools and workflows. Importantly, providing configuration files and an in-depth description of workflows and parameter settings is highly recommended. These can be made available through data store services, and such documents can even receive DOIs. This provides others with more information to evaluate the resolution of this work. No doubt it is well done, but especially in the field of genome assembly and annotation, high resolution is VERY cost- and time-intensive. Not surprisingly, most projects are conditioned by trade-offs between cost, time, and labor. The authors should provide others with the information needed to evaluate this.

      We will upload the code used to assemble and annotate this genome to a public repository or add it to the supplementary material.

      The genome assembly did not use a specific workflow (e.g., nextflow), but was done with a simple command and standard parameters in IPA. Scaffolding was carried out by Phase Genomics using their standardised proprietary workflow; a detailed description provided by Phase Genomics can be found in the supplementary material. The annotation workflow has already been described in a previous publication, but an in-depth description can also be found in the Material and Methods section, including the parameters used for specific steps. The RNA-seq mapping and analysis part has also been described in the Material and Methods section, including parameters and models for DESeq2.

      • Quantifications of T3 and T4 levels look fairly low and not so convincing. The work would clearly benefit from a discussion about why the signal is so low and what are the current technological limitations of these quantifications. This would really help (general) readers.

      We will add a comment on this in the manuscript as suggested. Basically, the T3/T4 levels are consistent with other published work in fish. In the present manuscript for grouper we have a peak level of 1.2 ng/g (1,200 pg/g) of T4 and 0.06 ng/g (60 pg/g) of T3. This is a higher level of T4 and comparable level of T3 to what was found in convict tang (Holzer et al. 2017; Figure 2) with 30 pg/g of T4 and 100 pg/g of T3. Of course, there are also examples with higher levels, such as clownfish (Roux et al. 2023; Figure 1), with 10 ng/g (10,000 pg/g) of T4 and 2 ng/g (2,000 pg/g) of T3.

      The differences could be due to different structures of fish tissues and therefore different hormone extraction efficiencies, different hormone measurement protocols, different fish physiology, or different fish size (e.g., weighing tiny grouper larvae is more difficult and less precise than weighing convict tang larvae). What is important is not the absolute level but the relative level, which shows the change across larval stages of a species measured with identical extraction and measurement protocols. This means our data are internally consistent and coherent with the grouper literature.

      Holzer, Guillaume, et al. "Fish larval recruitment to reefs is a thyroid hormone-mediated metamorphosis sensitive to the pesticide chlorpyrifos." Elife 6 (2017): e27595.

      Roux, Natacha, et al. "The multi-level regulation of clownfish metamorphosis by thyroid hormones." Cell Reports 42.7 (2023).

      • Differential analysis highlights up to ~ 15,000 differentially expressed genes (DEG), out of a predicted 26k genes. This corresponds to more than half of all genes. ANOVA-based differential analysis relies on the simple fact that only a minority of genes are DEG. Having >50% DEG is well beyond the validity of the method. This should be addressed, or at least discussed.

      As the reviewer notes, there are a large number of differentially expressed genes due to the fact that this is coming from a larval developmental transcriptome going from one day old larva to fully metamorphosed juveniles at around day 60.

      While DESeq2 indeed works on the assumption that most genes are not differentially expressed, this assumption affects normalisation but not hypothesis testing (Wald test, LRT or ANOVA). Normalisation in DESeq2 is fairly robust to violations of this assumption. According to the author of DESeq2, Michael Love, DESeq2 uses the median ratio method for normalisation, and as long as the numbers of up- and down-regulated genes are relatively balanced, DESeq2 will be able to handle the data. As part of our general quality control for this project we examined the MA plots, which do not show any overrepresented up- or down-regulation patterns. Additionally, see Michael Love's comment on comparing different tissues, which also applies here when comparing vastly different larval stages (https://support.bioconductor.org/p/63630/): “For experiments where all genes increase in expression across conditions, the median ratio method will not be able to capture this difference, but this is typically not the case for a tissue comparison, as there are many "housekeeping" genes with relatively similar expression pattern across tissues.”
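      To illustrate why a balanced split of up- and down-regulated genes leaves the median ratio method usable, here is a minimal NumPy sketch of the median-of-ratios size-factor idea. It is an illustrative re-implementation under stated assumptions, not DESeq2's own code (DESeq2 is an R/Bioconductor package), and the counts are invented toy values rather than data from this study.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Per-sample size factors using the median-of-ratios idea.

    counts: array of raw read counts, shape (genes, samples).
    """
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a geometric mean of zero, so skip them
    usable = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[usable])
    # Pseudo-reference: log geometric mean of each gene across samples
    log_ref = log_counts.mean(axis=1)
    # Size factor: median (over genes) of each sample's ratio to the reference
    return np.exp(np.median(log_counts - log_ref[:, None], axis=0))


# Toy data (invented): sample 2 is sequenced twice as deeply as sample 1, and
# 8 of 10 genes change 4-fold, split evenly between up and down.
base = np.array([50, 80, 120, 200, 300, 400, 500, 650, 800, 1000], dtype=float)
fold = np.array([4, 4, 4, 4, 0.25, 0.25, 0.25, 0.25, 1, 1])
counts = np.column_stack([base, 2 * base * fold])

sf = median_of_ratios_size_factors(counts)
print(sf[1] / sf[0])  # ~2.0: the depth difference is recovered despite 80% "DE" genes
```

      If instead nearly all genes shifted in the same direction, the median ratio would absorb that shift into the size factor, which is the failure case described in the quoted comment.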

      Reviewer #3 (Public Review):

      Weaknesses:

      However, the authors make substantial considerations that are not proven by experimental or functional data. In fact, this is a descriptive study that does not provide any functional evidence to support the claims made.

      We agree with the reviewer that our paper lacks functional experiments but despite that, the transcriptomic data clearly show the activation of TH and corticoid pathways during two distinct periods; an early activation between D1 and D10, and a second one between D32 and juvenile stage. These data are interesting as they call for further examination of 1) the possible interaction of corticoids and TH during metamorphosis, a question that is certainly not settled yet in teleost fishes, and 2) the existence of an early larval developmental step also involving TH and corticosteroids.

      Point 2) is of particular interest and importance, since this early activation (unique, to our knowledge, among the teleost fish studied so far) raises a lot of new questions and will certainly be scrutinised by other groups in the years to come, therefore ensuring a good citation impact of our study. We hope that the reviewer, while disagreeing with some of our statements, will recognize that our study will be stimulating at that level and that this is what scientific studies should do.

      The idea that cortisol is involved in metamorphosis in teleosts has never been demonstrated, and the only example cited by the authors (REF 20) clearly states that cortisol alone does not induce flatfish metamorphosis. In that work, the authors clearly state that in vivo cortisol treatment had no synergistic effect with TH in inducing metamorphosis. Moreover, in the Senegalese sole (Solea senegalensis), the number of preoptic CRH neurons decreases during metamorphosis, further arguing that, at least in flatfish, cortisol is not involved in metamorphosis (PMID: 25575457).

      We will do our best to improve the clarity of the revised manuscript to avoid any misunderstanding about our claims. However, we would like to point out the semantic shift in the reviewer's first sentence: “being involved” is not the same as “cortisol alone does not induce”. In ref 20 the authors explicitly wrote that “Cortisol further enhanced the effects of both T4 and T3, but was ineffective in the absence of thyroid hormones”, and in our view this indeed corresponds to ”being involved in metamorphosis”.

      We are not claiming that cortisol alone is involved in metamorphosis, as the reviewer suggests, but simply that cortisol may be involved together with TH in metamorphosis. We stand by this claim, as we indeed observed an activation of corticoid pathway genes around D32, which is sufficient to say it is involved. We do agree that functional experiments will be needed to properly demonstrate the involvement of corticoids in grouper metamorphosis, but this was not possible in the current study as it would require setting up a full grouper life cycle under lab conditions, which is impossible within the scope of this manuscript.

      We also mentioned in the discussion that the role of corticoids in fish larval development is still debated, and we agree that this remains a contentious issue.

      We wrote that “there is contrasting evidence of communication between these two pathways [TH and corticosteroids] in teleost fish, with some data suggesting a synergistic and others an antagonistic relationship. In terms of synergy, an increase in cortisol level concomitant with an increase in TH levels has been observed in flatfish (ref 19), golden sea bream (ref 100) and silver sea bream (ref 101). Cortisol was also shown to enhance in vitro the action of TH on fin ray resorption (a phenomenon occurring during flatfish metamorphosis) in flounder (ref 20). TH exposure increases MR and GR gene expression in zebrafish embryos (ref 55). It has also been shown that cortisol regulates local T3 bioavailability in the juvenile sole via regulation of deiodinase 2 in an organ-specific manner (ref 56). On the antagonistic side, it has been shown that experimentally induced hyperthyroidism in common carp decreases cortisol levels (ref 57), whereas cortisol exposure decreases TH levels in European eel (ref 58). Given this scattered evidence, the existence of a crosstalk active during teleost metamorphosis has never been formally demonstrated. The results we obtained in grouper clearly indicate that the HPI axis and cortisol synthesis are activated (i) during early development and (ii) during metamorphosis. This may suggest that in some aspects cortisol synthesis can work in concert with TH, as has been shown in several different contexts in amphibians (ref 17).” In the revised manuscript, we will also add the interesting case of the Senegalese sole mentioned by the reviewer.

      In the last revision, we had also added that our results “brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy” meaning that we clearly acknowledge that we are only revealing a hypothesis that remains to be tested. We later follow up with a discussion about the most novel observation and focus of our study, the increase in THs and cortisol during early development, which was unexpected and very intriguing. Again, these results suggest that there might be a link between the two, as has been shown in amphibians. This is typically the kind of results that should encourage more investigations into other fish species. Indeed, this has been pointed out by other authors and in particular by Bob Denver (probably the foremost expert on this topic) in Crespi and Denver 2012: “Elevation in HPA/I axis activity has been described prior to Metamorphosis in amphibians and fish, birth in mammals (reviewed in Crespi & Denver 2005a; Wada 2008)”. B. Denver also adds that: “Experiments in which GCs were elevated prior to metamorphosis or prior to hatching or birth (e.g. Weiss, Johnston & Moore 2007) or inhibited by treatments with GC synthesis blockers (e.g. metyrapone) or receptor antagonists (e.g. RU486, Glennemeir & Denver 2002) demonstrate that GCs play a causal role in precipitating these life-history transitions (also reviewed in Crespi & Denver 2005a; Wada 2008).” We believe the reviewer will be convinced by these elements coming from a colleague unanimously respected in the field.

      Furthermore, the authors need to recognise that the transcriptomic analysis is whole-body and that HPA axis genes are upregulated, which does not mean they are involved in regulating the HPT axis. The authors do not show that in thyrotrophs, any CRH receptor is expressed or in any other HPT axis-relevant cells and that changes in these genes correlate with changes in TSH expression. An in-situ hybridisation experiment showing co-expression on thyrotrophs of HPA genes and TSH could be a good start. However, the best scenario would be conducting cortisol treatment experiments to see if this hormone affects grouper metamorphosis.

      We agree that functional experiments are needed to validate our hypothesis. As the early peaks of expression levels observed for many genes were very intriguing for us, we did carry out thyroid hormone and goitrogen treatments on young grouper larvae to test their effect on the morphological changes. Unfortunately, such experiments, already tricky on metamorphosing larvae, are even riskier on such tiny individuals just after hatching, and we encountered high mortality rates. We must add that because we cannot establish a full grouper life cycle under lab conditions, we carried out these experiments in a commercial husbandry system in Japan, which, while excellent, limits the scope of possible experiments. We were thus not able to provide functional validation of our hypothesis. Such experiments would be a full project in themselves, requiring a rearing system suitable for both larval survival and the economic constraints of drug treatments. We were further limited by the spawning times of the grouper in the operational aquaculture farm, which are restricted to a short period each year. So even if we strongly agree with the necessity of conducting such experiments, we think that this is not within the scope of the present paper, but something future research can explore.

      High TSH and Tg levels usually parallel whole-body TH levels during teleost metamorphosis. However, in this study, high Tg expression levels are only achieved at the juvenile stage, whereas high TSH is achieved at D32, and at the juvenile stage, they are already at their lowest levels.

      This is exactly our point. We observe two peaks in TSH expression, one at D3 and one at D32. The peak at D3 coincides with high thyroid hormone levels on the same day, and while we have not measured TH at D32, existing literature shows that there is a peak in TH during that time (e.g., de Jesus et al., 1998). Similarly, there is a small peak of Tg at D3. Our manuscript focused more on the upregulation of these genes at D3, which has not been reported before in the literature and raised the question of the role of TH so early in the larval development, outside of the metamorphosis period.

      Regarding the respective levels of TSH and Tg, we first would like to add that their respective order of appearance before metamorphosis (TSH at D32, Tg after) is consistent with what we would expect. We agree however that the strong increase of Tg and TPO expression is later than expected. We will make this clear in the revised manuscript.

      It is very difficult to conclude anything with the TH and cortisol levels measurements. The authors only measured up until D10, whereas they argue that metamorphosis occurs at D32. In this way, these measurements could be more helpful if they focus on the correct developmental time. The data is irrelevant to their hypothesis.

      We respectfully disagree with the reviewer, considering that 1) TH levels have already been investigated in groupers, coinciding with pigmentation changes and fin ray resorption, 2) there is also evidence in numerous fish species that an increase in TH levels is concomitant with an increase in the expression of TH-related genes, and 3) we observed in our data an increase in the expression of TH-related genes as well as pigmentation changes and fin ray resorption. Based on our experience in fish metamorphosis and the literature, we can say confidently that those observations indicate that metamorphosis is occurring between D32 and the juvenile stage. To reinforce our point, we plan to add a figure to the revised manuscript which puts our data in the context of earlier studies done in grouper. This will clearly show that our inference is correct. Additionally, we would like to point out that, from our experience in several fish species, transcriptomic data are more robust and precise than hormone measurements.

      However, as we were surprised by the activation of TH and corticoid pathway genes very early in larval development (at D3), which is clearly outside of the metamorphosis period, we decided to measure TH and cortisol levels during this period to determine whether this surprising early activation indeed corresponded to an increase in both TH and cortisol. As such an observation has never been made in other teleost species (to our knowledge), and as we wondered whether gene activation was accompanied by a hormonal increase, the measurements we made for TH and cortisol between D1 and D10 are relevant. We will make sure to improve the clarity of the revised version of the manuscript to avoid any confusion between the two periods we are studying: early larval development (between D1 and D10) and metamorphosis (between D32 and juvenile stage).

      Moreover, as stated in the previous review, a classical sign of teleost metamorphosis is the upregulation of TSHb and Tg, which does not occur at D32 therefore, it is very hard for me to accept that this is the metamorphic stage. With the lack of TH measurements, I cannot agree with the authors. I think this has to be toned down and made clear in the manuscript that D32 might be a putative metamorphic climax but that several aspects of biology work against it. Moreover, in D10, the authors show the highest cortisol level and lowest T4 and T3 levels. These observations are irreconcilable, with cortisol enhancing or participating in TH-driven metamorphosis.

      We thank the reviewer for this comment, but we think that there might be a misunderstanding here.

      (1) We clearly observed an increase of TSHb (occurring between D18 and the juvenile stage) and an increase of tg from D32, which coincide with the activation of other genes involved in the TH pathway (dio2, dio3, and also a strong increase of TRb). All this, put in the context of what we know from previous grouper studies, clearly supports our conclusion that TH-regulated metamorphosis starts at around D32 in grouper. We also observed morphological changes such as fin ray resorption and pigmentation changes between D32 and the juvenile stage. Such morphological changes have already been associated with metamorphosis in groupers (De Jesus et al 1998), as they occur during the increase in TH levels, and they are also under the control of TH in grouper (De Jesus et al 1998). Based on this study, and also on studies conducted on many other teleost species showing that the increase of TH levels is always associated with an activation of TH pathway genes and with morphological and pigmentation changes, we concluded that metamorphosis of E. malabaricus occurs between D32 and the juvenile stage. We will improve the clarity of the manuscript to make sure that it is clear our conclusion is based on our transcriptomic and morphological data plus the available literature.

      (2) We clearly observed another activation of TH-related genes earlier in development (between D1 and D10), with a surge of trhrs, tg and tpo at D3. As this activation was very unexpected, we decided to focus the analysis of TH levels on D1 to D10, and very interestingly we observed a high level of T4 at D3, indicating that THs are instrumental very early in the larval development of the Malabar grouper, which has never been shown before. We stated at line 195 that our “data reinforce the existence of two distinct periods of TH signalling activity, one early on at D3 and one late corresponding to classic metamorphosis at D32”. However, we agree that we could have explained more clearly that this early activation was very intriguing for us and that we wanted to investigate hormonal levels around that period. We never claimed anywhere in the manuscript that this early developmental period corresponds to metamorphosis. Something else is occurring, and both TH and cortisol seem to be involved, but further experiments need to be conducted to understand their role and their possible interaction.

      (3) Finally, regarding the comment about cortisol enhancing or participating in TH-driven metamorphosis, our data clearly showed an activation of corticoid pathway genes around metamorphosis (between D32 and the juvenile stage), suggesting a potential implication of corticoids in metamorphosis, but we agree with the reviewer that further experiments are needed to test that. We never claimed that cortisol was enhancing or participating in metamorphosis; on the contrary, we are “suggesting a possible interaction between TH and corticoid pathway during metamorphosis”. We also say that our “results brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy.” Nonetheless, we agree that some parts of our manuscript can be confusing with regard to cortisol synthesis during metamorphosis, as we did not measure cortisol levels between D32 and the juvenile stage. We will correct this in the revised version.

      Given this, the authors should quantify whole-body TH levels throughout the entire developmental window considered to determine where the peak is observed and how it correlates with the other hormonal genes/systems in the analysis.

      We did not measure TH levels at later stages as it has already been measured during Epinephelus coioides metamorphosis and the morphological changes observed in this species around the TH peak corresponds to what we observed in Epinephelus malabaricus around the peak of expression of TH pathway genes (see De Jesus et al., 1998 General and Comparative Endocrinology, 112:10-16). We are planning to add a figure reconciling all these data together. However, the main focus of this manuscript is the novel observation of the existence of an early activation period observed at D3, and for which we needed TH levels to determine if they were involved in another early developmental process (not related to metamorphosis). Our hypothesis is that this early activation might be related to the growth of fin rays necessary to enhance floatability during the oceanic larval dispersal. As we may have arrived at the explanation of this hypothesis too rapidly without setting up the context well enough, we will pay attention to improve that part too.

      Even though this is a solid technical paper and the data obtained is excellent, the conclusions drawn by the authors are not supported by their data, and at least hormonal levels should be present in parallel to the transcriptomic data. Furthermore, toning down some affirmations or even considering the different hypotheses available that are different from the ones suggested would be very positive.

      We thank the reviewer for acknowledging the solidity of the method of our paper and the quality of the results. We agree that there were several parts where our message is unclear, which we will address in the revised version of the manuscript to make sure there is no more confusion between the two distinct periods we studied in this paper (early larval development and metamorphosis). We will also make sure that our claims about TH/corticoids interaction during both periods remain hypothetical as we cannot yet, despite trials, sustain them with functional experiment.

    1. Author Response

      We provide here a provisional response to the Public Comments and main issues raised by the reviewers. We appreciate the opportunity to submit a revision and will give all of the reviewers’ comments careful consideration when modifying the manuscript.

      (1) BioRxiv version history.

      Reviewer 1 correctly noted that we have posted different versions of the paper on bioRxiv and that there were significant changes between the initial version and the one posted as part of the eLife preprint process. Here we provide a summary of that history.

      We initially posted a bioRxiv preprint in November, 2021 (Version 1) that included the results of two experiments. In Experiment 1, we compared conditions in which the stimulation frequency was at 2 kHz, 3.5 kHz, or 5.0 kHz. In Experiment 2, we replicated the 3.5 kHz condition of Experiment 1 and included two amplitude-modulated (AM) conditions, with a 3.5 kHz carrier signal modulated at 20 Hz or 140 Hz. Relative to the sham stimulation, non-modulated kTMP at 2 kHz and 3.5 kHz resulted in an increase in cortical excitability in Experiment 1. This effect was replicated in Experiment 2.

      In the original posting, we reported that there was an additional boost in excitability in the 20 Hz AM condition above that of the non-modulated condition. However, in re-examining the results, we recognized that the 20 Hz AM condition included an outlier that was pulling the group mean higher. We should have caught this outlier in the initial submission given that the resultant percent change for this individual is 3 standard deviations above the mean. Given the skew in the distribution, we also performed a log transform on the MEPs (which improves the normality and homoscedasticity of MEP distributions) and repeated the analysis. However, even here the participant’s results remained well outside the distribution. As such, we removed this participant and repeated all analyses. In this new analysis, there was no longer a significant difference between the 20 Hz AM and nonmodulated conditions in Experiment 2. Indeed, all three true stimulation conditions (nonmodulated, AM 20 Hz, AM 140 Hz) produced a similar boost in cortical excitability compared to sham. Thus, the results of Experiment 2 are consistent with those of Experiment 1, showing, in three new conditions, the efficacy of kHz stimulation on cortical excitability. But the results fail to provide evidence of an additional boost from amplitude modulation.
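      The screening described above (flagging a participant whose percent change lies roughly 3 standard deviations above the group, then repeating the check on log-transformed MEPs) can be sketched in Python as follows. This is a hypothetical illustration, not the analysis code used for the paper: the values are invented, and computing each participant's z-score against the mean and SD of the other participants (leave-one-out) is our assumption, chosen because in small groups an extreme value inflates the SD it is being compared against.

```python
import numpy as np


def loo_z_scores(values):
    """z-score of each value relative to the mean and SD of the *other* values."""
    values = np.asarray(values, dtype=float)
    z = np.empty_like(values)
    for i in range(values.size):
        others = np.delete(values, i)
        z[i] = (values[i] - others.mean()) / others.std(ddof=1)
    return z


# Hypothetical per-participant % change in MEP amplitude (post vs. pre kTMP); not study data
pct_change = np.array([12.0, 25.0, 8.0, 30.0, 18.0, 22.0, 15.0, 5.0, 27.0, 140.0])

# Screen on the raw percent-change scale (one-sided: values far above the group)
raw_flags = np.flatnonzero(loo_z_scores(pct_change) > 3)

# Repeat on a log scale: log(post/pre) = log(post MEP) - log(pre MEP)
log_ratio = np.log1p(pct_change / 100.0)
log_flags = np.flatnonzero(loo_z_scores(log_ratio) > 3)

print(raw_flags, log_flags)  # [9] [9]: the last participant is flagged on both scales
```

      Working with log(post/pre) mirrors the idea of log-transforming MEPs: it makes proportional changes symmetric, so a value that is still flagged on that scale is unlikely to be an artifact of skew alone.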

      We posted a second bioRxiv preprint in May, 2023 (Version 2) with the corrected results for Experiment 2, along with changes throughout the manuscript given the new analyses.

      Given the null results for the AM conditions, we decided to run a third experiment prior to submitting the work for publication. Here we used an alternative form of amplitude modulation (see Kasten et al., NeuroImage 2018). In brief, we again observed a boost in cortical excitability from non-modulated kTMP at 3.5 kHz, but no additional effect of amplitude modulation. This work is included in the third bioRxiv preprint (Version 3), the paper that was submitted and reviewed at eLife.

      (2) Statistical analysis.

      Reviewer 1 raised a concern with the statistical analyses performed on aggregate data across experiments. We recognize that this is atypical and was certainly not part of an a priori plan. Here we describe our goal with the analyses and the thought process that led us to combine the data across the experiments.

      Our overarching aim is to examine the effect on corticospinal excitability of different kTMP waveforms (carrier frequency and amplitude modulated frequency) matched at the same estimated cortical E-field (2 V/m). Our core comparison was of the active conditions relative to a sham condition (E-field = 0.01 V/m). We included the non-modulated 3.5 kHz condition in Experiments 2 and 3 to provide a baseline from which we could assess whether amplitude modulation produced a measurable difference from that observed with non-modulated stimulation. Thus, this non-modulated condition, as well as the sham condition, was repeated in all three experiments. This provided an opportunity to examine the effect of kTMP with a relatively large sample, as well as to assess how well the effects replicate, and resulted in the strategy we have taken in reporting the results.

      As a first step, we present the data from the 3.5 kHz non-modulated and sham conditions (including the individual participant data) for all three experiments in Figure 4. We used a linear mixed effect model to examine if there was an effect of Experiment (Exps 1, 2, 3) and observed no significant difference within each condition. Given this, we opted to pool the data for the sham and 3.5 kHz non-modulated conditions across the three experiments. Once data were pooled, we examined the effect of the carrier frequency and amplitude modulated frequency of the kTMP waveform.

      (3) Carry-over effects

      As suggested by Reviewer 1, we will examine in the revision if there is a carry-over effect across sessions (for the most part, 2-day intervals between sessions). For this, we will compare MEP amplitude in baseline blocks (pre-kTMP) across the four experimental sessions.

      Reviewer 1 also commented that mixing the single- and paired-pulse protocols might have impacted the results. While our a priori focus was on the single-pulse results, we wanted to include multiple probes given the novelty of our stimulation method. Mixing single- and different paired-pulse protocols has been relatively common in the noninvasive brain stimulation literature (e.g., Nitsche, 2005; Huang et al., 2005; López-Alonso, 2014; Batsikadze et al., 2013), and we are unaware of any reports suggesting that mixed designs (single and paired) distort the picture compared to pure designs (single only).

      (4) Sensation and Blinding

      Reviewer 2 brought up concerns about the sham condition and blinding of kTMP stimulation. We do think that kTMP is nearly ideal for blinding. The amplifier does emit an audible tone (at least for individuals with normal hearing) when set to an intensity that produces a 2 V/m E-field. For this reason, the participants and the experimenter wore earplugs. Moreover, we played a 3.5 kHz tone in all conditions, including the sham condition, which effectively masked the amplifier sound. We measured participants’ subjective ratings of annoyance, pain, and muscle twitches after each kTMP session (active and sham). Using a linear mixed effect model, we found no difference between active and sham for any of these ratings, suggesting that sensation was similar for active and sham (Fig 8). This matches our experience that kHz stimulation in the range used here induces no perceptible sensation from the coil. To blind the experimenters (and participants), we used a coding system in which the experimenter typed in a number that had been randomly paired with a stimulation condition; the pairing varied across participants in a manner unknown to the experimenter.
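      The coding system described above can be illustrated with a short sketch. This is not the software used in the study: the condition labels, the three-digit code range, and the function name make_blinding_key are hypothetical, chosen only to show how a per-participant random pairing of codes to conditions might be generated.

```python
import random

# Hypothetical condition labels, for illustration only.
CONDITIONS = ["sham", "3.5kHz_unmodulated", "condition_C", "condition_D"]

def make_blinding_key(conditions, rng):
    """Pair each condition with a unique 3-digit code, reshuffled per participant.

    Only the code is shown to the experimenter; the key itself is kept hidden.
    """
    codes = rng.sample(range(100, 1000), len(conditions))  # unique random codes
    order = list(conditions)
    rng.shuffle(order)                                     # new pairing each call
    return dict(zip(codes, order))

# One key per participant; seeding by participant ID lets the key be regenerated later.
keys = {pid: make_blinding_key(CONDITIONS, random.Random(pid)) for pid in ("P01", "P02")}
```

      Because the pairing is regenerated for every participant, a given code does not map to the same condition across participants, which is what keeps the experimenter blind.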

      Reviewer 1 asked why we did not explicitly ask participants if they thought they were in an active or sham condition. This would certainly be a useful question. However, we did not want to alert them to the presence of a sham condition, preferring to simply describe the study as one testing a new method of non-invasive brain stimulation. Thus, we opted to focus on their subjective ratings of annoyance, pain, and finger twitches after kTMP stimulation for each experimental session.

    1. Author Response

      Provisional Response to Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The work by Zeng et al. comprehensively explored the differences in the effects of leaf and soil microbes on the seed germination, seedling survival, and seedling growth of an invasive forb, Ageratina adenophora, and found evidence of stronger effects of leaf microbes on Ageratina compared with soil microbes, which were negative for seed germination and seedling survival but positive for seedling growth. By further DNA sequencing and fungal strain cultivation, the authors were able to identify some of the key microbial guilds that may facilitate such negative and positive feedback.

      Thank you very much for your assessment.

      Strengths:

      (1) The theoretic framework is well-established.

      (2) Relating the direction of plant-microbe feedback to certain microbial guilds is always hard, but the authors have done a great job of identifying and interpreting such relationships.

      Thank you very much for your assessment.

      Weaknesses:

      (1) In the G0 and G21 inoculation experiments, allelopathic effects from leaf litters had not been accounted for, while these two experiments happened to be the ones where negative feedback was detected.

      We did not directly test the allelopathic effects. However, our inoculation with sterilized litter or soil indicated a potential allelopathic role in germination and seedling mortality. Interestingly, such allelopathic effects were elicited by leaf litter but not by soil, and included delayed germination (see Fig. 1a) and the death of some seedlings (see Fig. 1c). Nonetheless, microbial effects were significantly more adverse than allelopathic effects (also see Fig. 1e). We will discuss this point in the resubmitted version.

      (2) The authors did not compare the fungal strains accumulated in dead seedlings to those accumulated in live seedlings to prove that the live seedlings indeed accumulated lower abundances of the strains that were identified to increase seedling mortality.

      Thank you for raising this concern. We have not isolated fungi from healthy seedlings for a comparative study. However, our team previously found that the seedling-killing Allophoma strains obtained in this study had the same ITS sequences as the leaf endophyte and leaf spot pathogen Allophoma associated with mature A. adenophora individuals; some seedling-killing Alternaria also occur in healthy seedlings inoculated with leaf litter. We thus assumed that these seedling-killing fungi, e.g., Allophoma and Alternaria, likely persist in mature A. adenophora individuals and undergo a lifestyle switch from endophytic to pathogenic, and that they can kill seedlings only at a very early life stage of A. adenophora.

      Thus, we discussed this point as: “We did not isolate fungi from healthy seedlings in this study. However, a previous report revealed that the dominant genera in healthy seedlings inoculated with leaf litter were Didymella and Alternaria (Kai Fang et al., 2019). Based on these results, these fungal genera likely exist in A. adenophora by a lifestyle switch from endophytic to pathogenic. The virulence of these strains for seedling survival under certain conditions may play an essential role in limiting the population density of A. adenophora monocultures.” See Lines 416-435.

      Here, we will also consider adding the following sentence to the resubmitted version to address your concern: “It is worth exploring the dynamics of these strains during seedling development and determining whether they kill seedlings only at a very early stage.”

      (3) The data of seed germination and seedling mortality could have been analyzed in the same manner as that of seedling growth, which makes the whole result section more coherent. I don't understand why the authors had not calculated the response index (RI) for germination/mortality rate and conducted analyses on the correlation between these RIs with microbial compositions.

      Thank you. The response index (RI) was calculated as RI = (variable_non-sterile − variable_sterile)/variable_sterile. Because the mortality rates of some sterile groups were zero, their RIs could not be calculated. In addition, microbes rarely affected seed germination time (GT) and germination rate (GR) (see Fig. 1a,b). Therefore, we preferred to compare these variables directly between non-sterile and sterile treatments (see also Figure S2), and we correlated these values, rather than RIs, with microbial composition (see Fig. 4).
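      The RI formula, and why it breaks down for groups with zero mortality, can be written as a small function. This is an illustrative sketch: the function name and example values are hypothetical, and returning None for an undefined ratio is just one possible convention.

```python
from typing import Optional

def response_index(non_sterile: float, sterile: float) -> Optional[float]:
    """RI = (non_sterile - sterile) / sterile.

    Returns None when the sterile baseline is zero, because the ratio is
    undefined (e.g., a sterile group in which no seedlings died).
    """
    if sterile == 0:
        return None
    return (non_sterile - sterile) / sterile

ri_growth = response_index(0.6, 0.4)     # defined: roughly 0.5
ri_mortality = response_index(0.2, 0.0)  # undefined baseline -> None
```

      A zero baseline is therefore not a small RI but a missing one, which is why a direct non-sterile versus sterile comparison is the more complete option for mortality.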

      We will emphasize this point in the Materials and Methods when we resubmit our revision.

      (4) The language of the manuscript could be improved to increase clarity.

      We will improve this in the resubmitted version.

      Reviewer #2 (Public Review):

      Summary:

      The study provides strong evidence that leaf microbes mediate self-limitation at an early life stage. It highlights the importance of leaf microbes in population establishment and community dynamics.

      Thank you very much for your assessment.

      The authors conducted three experiments to test their hypothesis, elucidating the effects of leaf and soil microbial communities on the seedling growth of A. adenophora at different stages, screening potential microbial sources associated with seed germination and seedling performance, and identifying the fungus related to seedling mortality. The conclusions are justified by their results. Overall, the paper is well-structured, providing clear and comprehensive information.

      Thank you very much for your assessment.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Public Reviewer Comments

      We again thank the reviewers for the time and effort they clearly put into reviewing our manuscript. We have revised our manuscript to take into account the majority of their suggestions, primary among them being refinements of our model and classification approach, detailed sensitivity analysis of our model, and several new simulations. Their very constructive feedback has resulted in what we feel is a much-improved paper. In what follows, we respond to each of their points.

      Reviewer #1:

      COMMENT: The reviewer suggested that our control policy classification thresholds should be increased, especially if the behavioral labels are to be subsequently used to guide analyses of neural data which “is messy enough, but having trials being incorrectly labeled will make it even messier when trying to quantify differences in neural processing between strategies.”

      REPLY: We appreciate the observation and agree with the suggestion. In the revised manuscript, we simplified the model (as another reviewer suggested), which allowed for better training of the classifier. This allowed us to raise the classification threshold to 95%, giving more confidence in the identified control strategies. Figures 7 and 8 were regenerated based on the new threshold.
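      Raising the threshold amounts to leaving a trial unlabeled unless the classifier is sufficiently confident. A minimal sketch, assuming a binary classifier that outputs the probability that a trial was performed under Position Control; the function name and this interface are hypothetical, not the authors' code.

```python
def classify_trial(p_position: float, threshold: float = 0.95) -> str:
    """Label a trial only when the classifier's confidence reaches `threshold`."""
    if not 0.0 <= p_position <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p_position >= threshold:
        return "Position"
    if 1.0 - p_position >= threshold:   # confidence in the other class
        return "Velocity"
    return "Unclassified"               # ambiguous trials are left out

labels = [classify_trial(p) for p in (0.99, 0.60, 0.02)]
```

      The trade-off is fewer labeled trials in exchange for fewer mislabeled ones, which is the concern raised for subsequent neural analyses.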

      COMMENT: The reviewer asked if we could discuss what one might expect to observe neurally under the different control policies, and also suggested that an extension of this work could be to explore perturbation trials, which might further distinguish between the two control policies.

      REPLY: It is indeed interesting to speculate what neural activity could underlie these different behavioral signatures. As this task is novel to the field, it is difficult to predict what we might observe once we examine neural activity through the lens of these control regimes. We hope this will be the topic of future studies, and one aspect worthy of investigation is how neural activity prior to the start of the movement may reflect two different control objectives. Previous work has shown that motor cortex is highly active and specific as monkeys prepare for a cued movement and that this preparatory activity can take place without an imposed delay period (Ames et al., 2014; Cisek & Kalaska, 2005; Dekleva et al., 2018; Elsayed et al., 2016; Kaufman et al., 2014; Lara et al., 2018; Perich et al., 2018; Vyas et al., 2018; Zimnik & Churchland, 2021). It seems possible that the control strategies we observed correspond to different preparatory activity in the motor cortex. We added these speculations to the discussion.

      The reviewer’s suggestion to introduce perturbations to probe sensory processing is very good and was also suggested by another reviewer. We therefore conducted additional simulations in which we introduced perturbations (Supplementary Material; Figure S10). Indeed, in these model simulations the two control objectives separated more. However, testing these predictions via experiments must await future work.

      COMMENT: “It seems like a mix of lambda values are presented in Figure 5 and beyond. There needs to be some sort of analysis to verify that all strategies were equally used across lambda levels. Otherwise, apparent differences between control strategies may simply reflect changes in the difficulty of the task. It would also be useful to know if there were any trends across time?”

      REPLY: We appreciate and agree with the reviewer’s suggestion. We have added a complementary analysis of control objectives with respect to task difficulty, presented in the Supplementary Material (Figures S7 and S8). We demonstrate that, overall, the control objectives remain generally consistent throughout trials and difficulty levels. Therefore, it can be concluded that the difference in behavior associated with different control objectives does not depend on the trial sequence or difficulty of the task. A statement to this effect was added to the main text.

      COMMENT: “Figure 2 highlights key features of performance as a function of task difficulty. …However, there is a curious difference in hand/cursor Gain for Monkey J. Any insight as to the basis for this difference?”

      REPLY: The apparently different behavior of Monkey J in the hand/cursor RMS ratio could be due to subject-to-subject variability. Given that we have data from only two monkey subjects, we examined inter-individual variations between human subjects in the Supplementary Material by presenting individual hand/cursor gain data for all individual human subjects (Figure S1). As can be seen, there was indeed variability, with some subjects not exhibiting the same clear trend with task difficulty. However, on average, the RMS ratio shows a slight decrease as trials grow more difficult, as was earlier shown in Figure 2. We added a sentence about the possibility of inter-individual variations to address the difference in behavior of monkey J with reference to the supplementary material.

      Reviewer #2:

      (Reviewer #2's original review is with the first version of the Reviewed Preprint. Below is the authors' summary of those comments.)

      COMMENT: The reviewer commends the care and effort taken to characterize control policies that may be used to perform the CST, via dual human and monkey experiments and model simulations, noting the importance of doing so as a precursor to future neural recordings or BMI experiments. But the reviewer also wondered if it is all that surprising that different subjects might choose different strategies: “... it makes sense that different subjects might choose to favor different objectives, and also that they can do so when instructed. But has this taught us something about motor control or simply that there is a natural ambiguity built into the task?”

      REPLY: The redundancy in the task that allowed different solutions to achieve the task was deliberate, and the motivation for choosing this task for this study. We therefore did not regard the resulting subject-to-subject variability as a finding of our study. Rather, redundancy and inter-individual variability are features ubiquitous in all everyday actions and we explicitly wanted to examine behavior that is closer to such behavior. As commended by the reviewers, CST is a rich task that extends our research beyond the conventional highly-constrained reaching task. The goal of our study was to develop a computational account to identify and classify such differences to better leverage future neural analyses of such more complex behaviors. This choice of task has now been better motivated in the Introduction of the revised manuscript.

      COMMENT: The reviewer asks about our premise that subjects may use different control objectives in different trials, and whether instead a single policy may be a more parsimonious account for the different behavioral patterns in the data, given noise and instability in the system. In support of this view, the reviewer implemented a simple fixed controller and shared their own simulations to demonstrate its ability to generate different behavioral patterns simply by changing the gain of the controller. The reviewer concludes that our data “are potentially compatible with any of these interpretations, depending on which control-style model one prefers.”

      REPLY: We first address the reviewer’s concern that a simple “fixed” controller can account for the two types of behavioral patterns observed in Experiment 2 (instructed groups) by a small change in the control gain. We note that our controller is also fixed in terms of the plant, the actuator, and the sensory feedback loop; the only change we explore is in the relative weights of position vs. velocity in the Q matrix. This determines whether it is deviations in position or in velocity that predominate in the cost function. This, in turn, generates changes in the gain vector L in our model, since the optimal solution (i.e. the gains L that minimize the cost function) depends on the Q matrix as well as the dynamics of the plant (specifically, the lambda value). Hence, one could interpret the differences arising from changes in the control objective (the Q matrix) as changes in the gains of our “fixed” controller.
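      To illustrate how the weighting in Q alone changes the optimal gain vector L, here is a deliberately simplified sketch. It is not the authors' model: we assume cursor dynamics dx/dt = λ(x + p), take hand velocity as the control input, ignore hand mass, sensory delay, noise, and state estimation, and use illustrative values for λ, the time step, and the effort cost. The gains come from iterating the discrete-time Riccati equation of a standard LQR problem.

```python
import numpy as np

def lqr_gain(A, B, Q, R, tol=1e-10, max_iter=100_000):
    """Infinite-horizon discrete-time LQR gain by iterating the Riccati equation.

    Returns L such that u_k = -L @ s_k minimizes sum(s'Qs + u'Ru).
    """
    P = Q.copy()
    for _ in range(max_iter):
        S = R + B.T @ P @ B
        L = np.linalg.solve(S, B.T @ P @ A)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ L
        if np.max(np.abs(P_next - P)) < tol:
            return L
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")

lam, dt, r = 3.0, 0.01, 1e-3   # illustrative values, not the paper's
# State s = [x, p]: cursor position x, hand position p (forward-Euler discretization).
A = np.eye(2) + dt * np.array([[lam, lam], [0.0, 0.0]])
B = dt * np.array([[0.0], [1.0]])
R = np.array([[r]])

Q_pos = np.array([[1.0, 0.0], [0.0, 0.0]])           # penalize cursor position x^2
Q_vel = lam**2 * np.array([[1.0, 1.0], [1.0, 1.0]])  # penalize cursor velocity (lam*(x+p))^2

L_pos = lqr_gain(A, B, Q_pos, R)
L_vel = lqr_gain(A, B, Q_vel, R)
```

      Under these assumptions the plant and actuator are identical in both cases; only Q differs. Velocity Control yields equal gains on x and p (the controller acts only on x + p, so it does not pull the cursor back to the center), whereas Position Control yields unequal gains that stabilize both states.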

      More importantly, while the noise and instability in the system may indeed occasionally result in distinct behavioral patterns (and we have observed such cases in our simulations as well), these factors are far from giving an alternative account for the structural differences in the behavior that we attribute to the control objective. To substantiate this point, we performed additional simulations that are provided in the Supplementary Material (Figures S4—6). These simulations show that neither a change in noise nor in the relative cost of effort can account for the two distinct types of behavior. These differences are more consistently attributed to a change in the control objective.

      In addition, our approach provides a normative account of the control gains needed to simulate the observed data, as well as the control objectives that underlie those gains. As such, the two control policies in our model (Position and Velocity Control) resulted in control gains that captured the differences in the experimental groups (Experiment 2), both at the single trial and aggregate levels and across different task difficulties. Figure S9 in the Supplementary Material shows how the control gains differ between Position and Velocity Control in our model across different difficulty levels.

      We agree with the reviewer’s overall point that there are no doubt many models that can exhibit the variability observed in our experimental data, our simulations, or the reviewer’s simulations. Our study aimed to explore in detail not only the model’s ability to generate the variable behavior observed in experimental data, but also to match experimental results in terms of performance levels, gains, lags and correlations across a wide range of lambda values, wherein the only changes in the model were the lambda value and the control objective. Without the details of the reviewer’s model, we are unable to perform a detailed analysis of that model. Even so, we are not claiming that our model is the ‘ground truth,’ only that it is certainly a reasonable model, adopted from the literature, that provides an intuitive and normative explanation of the performance of humans and monkeys over a range of metrics, system dynamics, and experimental conditions.

      Finally, we understand the reviewer’s concern regarding whether the trial-by-trial identification of control strategy in Figure 8 suggests that (uninstructed) subjects constantly switch control objectives between Position and Velocity. Although it is not unreasonable to imagine that individuals would intuitively try different strategies between ‘keeping the cursor still’ and ‘keeping the cursor at the center’ across trials, we agree that it is generally difficult to determine such trial-to-trial changes, especially when the behavior lies somewhere in between the two control objectives. In such cases, as we originally discussed in the manuscript, an alternative explanation could be a mixed control objective that generates behavior at the intersection of Position and Velocity Control, i.e., between the two slopes in Figure 8. We believe, however, that our modeling approach is still helpful in cases where performance is predominantly based on Position or Velocity Control. After all, the motivation for this study was to parse neural data into two classes associated with each control objective to potentially better identify structure underlying these behaviors.

      We clarified these points in the main text by adding further explanation in the Discussion section.

      COMMENT: The reviewer suggested additional experiments, such as perturbation trials, that might be useful to further explore the separability of control objectives. They also suggested that we temper our conclusion that our approach can reliably discriminate amongst different control policies on individual trials. Finally, the reviewer suggested that we modify our Introduction and/or Discussion to note past human/monkey research as well as investigations of minimization of velocity-error versus position-error in the smooth pursuit system.

      REPLY: We have expanded our simulations to investigate the effects of perturbation on the separability of different control objectives (Figure S10 in Supplementary Materials). We demonstrated that introducing perturbations more clearly differentiated between Position and Velocity Control. These results provide a good basis for further experimental verifications of the control objectives, but we defer these for future work.

      We also appreciate the additional past work that bridges human and monkey research that the reviewer highlights, including the related discussions in the eye movement literature on position versus velocity control. We have modified our Introduction and Discussion accordingly.

      Reviewer #3:

      COMMENT: The reviewer asked whether the observed differences in behavior might be due to some other factors besides the control policy, such as motor noise or effort cost, and suggested that we more systematically ruled out that possibility.

      REPLY: We appreciate and have heeded the reviewer’s suggestion. The revised manuscript now includes additional simulations in which the control objective was fixed to either Position or Velocity Control, while other parameters were systematically varied. Specifically, we examined the influence of the relative effort cost, the sensory delay, and motor noise on performance. The results of these sensitivity analyses are presented in the Supplementary Material, Figures S4–S6. In brief, we found that changing the relative effort cost, delay, or noise levels mainly affected the success rate in performance (as expected), but did not affect the behavioral features originally associated with control objectives. We include a statement about this result in the main text with reference to the details provided in the Supplementary Material.

      COMMENT: The reviewer questioned our choice of classification features (RMS position and velocity) and wondered if other features might yield better class separation, such as the hand/cursor gain. In a similar vein, reviewer 2 suggested in their recommendations that we examine the width of the autocorrelation function as a potentially better feature.

      REPLY: We note first that our choice of cursor velocity and position stems from a dynamical systems perspective, where position-velocity phase-space analysis is common. However, we also explored other features as suggested. We found that they, too, exhibited overlap between the two different control objectives, and did not provide any significant improvement in classification performance (Figures S2 and S3; Supplementary Materials). Of course, that is not to say that a more exhaustive examination of features may not find ones that yield better classification performance than those we investigated, but that is beyond the scope of our study. We refer to this consideration of alternative metrics in the discussion.
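      The two classification features discussed here can be computed from a sampled cursor trajectory roughly as follows. This is a sketch under stated assumptions: cursor position is measured relative to the screen center, samples are evenly spaced by dt, and velocity is estimated by finite differences. The function name is hypothetical, and the authors' preprocessing (e.g., any filtering) may differ.

```python
import numpy as np

def rms_features(x, dt):
    """Return (RMS cursor position, RMS cursor velocity) for one trial.

    Velocity is estimated with central finite differences (np.gradient).
    """
    x = np.asarray(x, dtype=float)
    v = np.gradient(x, dt)
    return float(np.sqrt(np.mean(x**2))), float(np.sqrt(np.mean(v**2)))
```

      Each trial then becomes a single point in the (RMS position, RMS velocity) plane, where a classifier can separate the two control objectives.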

      COMMENT: The reviewer notes that “It seems that the classification problem cannot be solved perfectly, at least on a single-trial level.” To address this point, the reviewer suggested that we conduct additional simulations under the two different control objectives, and quantify the misclassifications.

      REPLY: We appreciate the reviewer’s suggestion, and have conducted the additional simulations as suggested, the results of which are included in the revised manuscript.

      COMMENT: “The problem of inferring the control objective is framed as a dichotomy between position control and velocity control. In reality, however, it may be a continuum of possible objectives, based on the relative cost for position and velocity. How would the problem differ if the cost function is framed as estimating a parameter, rather than as a classification problem?”

      REPLY: A blended control strategy, formulated as a cost function that is a weighted combination of position and velocity costs, is indeed a possibility that we briefly discussed in the original manuscript. This possibility arises particularly for individuals whose performance metrics lie somewhere between the purely Position or purely Velocity Control. While our model allows for a weighted cost function, which we will explore in future work, we felt in this initial study that it was important to first identify the behavioral features unique to each control objective.

      Response to Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      None beyond those stated above.

      Reviewer #2 (Recommendations For The Authors):

      COMMENT: Line 166 states "According to equation (1), this behavior was equivalent to reducing the sum (𝑝 + 𝑥) when 𝜆 increased, so as to prevent rapid changes in cursor velocity". This doesn't seem right. In equation 1, velocity (not acceleration) depends on p+x. So a large p+x doesn't create a "rapid change in cursor velocity", but rather a rapid change in cursor position.

      REPLY: The reviewer is correct and we have corrected this misworded sentence; thank you for catching that.

      COMMENT: The reviewer points out the potential confusion readers may have, given our unclear use of ‘control strategy’ vs. ‘control policy’ vs. ‘control objective’. The reviewer suggests that “It would be helpful if this could be spelled out early and explicitly. 'Control strategy' seems perilously close to 'control policy', and it would be good to avoid that confusion. The authors might prefer to use the term 'cost function', which is really what is meant. Or they might prefer 'control objective', a term that they introduce as synonymous with 'control strategy'.”

      REPLY: We thank the reviewer for noting this ambiguity. We have clarified the language in the Introduction to explicitly note that by strategy, we mean the objective or cost function that subjects attempt to optimize. We then use ‘control objective’ consistently and removed the term ‘policy’ from the paper to avoid confusion. We also now use Position Control and Velocity Control as the labels for our two control objectives.

      COMMENT: The reviewer notes that in Figure 2B and the accompanying text in the manuscript, we need to be clearer about what is being correlated; namely, cursor and hand position.

      REPLY: Thank you for pointing out this lack of clarity, which we have corrected as suggested.

      COMMENT: The reviewer questions our attribution of decreasing lag with task difficulty as a consequence of subjects becoming more attentive/responsive when the task is harder, and points out that our model doesn’t include this possible influence yet the model reproduces the change in lag. The reviewer suggests that a more likely cause is due to phase lead in velocity compared to position, with velocity likely increasing with task difficulty, resulting in a phase advance in the response.

      REPLY: Our attribution of the decrease in lag with task difficulty being due to attention/motivation was a recapitulation of this point made in the paper by Quick et al. [2018]. But as noted by the reviewer, this potential influence on lag is not included in our model. Accordingly, the change in lag is more likely a reflection of the phase response of the closed loop system, which does change with task difficulty since the optimal gains depend upon the plant dynamics (i.e., the value of lambda). We have, therefore, deleted the text in question.

      COMMENT: “The Methods tell us rather a lot about the dynamics of the actual system, and the cost functions are also well defined. However, how they got from the cost function to the controller is not described. I was also a bit confused about the controller itself. Is the 50 ms delay assumed when deriving the controller or only when simulating it (the text seems to imply the latter, which might make sense given that it is hard to derive optimal controllers with a hard delay)? How similar (or dissimilar) are the controllers for the two objectives? Is the control policy (the matrix that multiplies state to get u) quite different, or only subtly?”

      REPLY: Thanks for pointing this out. For brevity, we had omitted the details and referred readers to the original paper (Todorov, 2005). However, we have now revised the manuscript to include all the details in the Methods section. Hence, the entire section on the model is new. This also necessitated updating all data figures (Figures 3, 4, 5, 6, 7, 8), as they contain modeling results.

      COMMENT: “Along similar lines, I had some minor to moderate confusions regarding the OFC model as described in the main text. Fig 3 shows a model with a state estimator, but it isn't explained how this works. …Here it isn't clear whether there is sensory noise, or a delay. The methods say a delay was included in the simulation (but perhaps not when deriving the controller?). Noise appears to have been added to u, but I'm guessing not to x or x'? The figure legend indicates that sensory feedback contains only some state variables, and that state estimation is used to estimate the rest. Presumably this uses a Kalman filter? Does it also use efference copy, as would be typical? My apologies if this was stated somewhere and I missed it. Either way, it would be good to add a bit more detail to the figure and/or figure legend.”

      REPLY: As the lack of detail evidently led to some confusion, we now more clearly spell out the details of the model in the Methods, including the state estimation procedure.
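      For readers with the same question, a generic form of such an estimator is written out below. This is the standard Kalman-type predictor used in LQG models of this kind (e.g., Todorov, 2005), in which the efference copy enters through the B u_k term. H (the observation matrix selecting the fed-back variables, here cursor position and velocity), the sensory noise ω_k, and the filter gain K_k are generic symbols. The exact form in the revised Methods, including how the 50 ms delay is handled, may differ.

```latex
\begin{aligned}
\mathbf{y}_k &= H\,\mathbf{s}_k + \boldsymbol{\omega}_k \\
\hat{\mathbf{s}}_{k+1} &= A\,\hat{\mathbf{s}}_k + B\,u_k + K_k\left(\mathbf{y}_k - H\,\hat{\mathbf{s}}_k\right)
\end{aligned}
```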

      COMMENT: The reviewer wondered why we chose to plot mean velocity vs. mean position as in Figure 5, noting that, “ignoring scale, all scatter plots would be identical if the vertical axis were final position (because mean velocity determines final position). So what this plot is really examining is the correlation between final position and average position. Under position control, the autocorrelation of position is short, and thus final position tends to have little to do with average position. Under velocity control, the autocorrelation of position is long, and thus final position tends to agree with average position. Given this, why not just analyze this in terms of the autocorrelation of position? This is expected to be much broader under velocity control (where they are not corrected) than under position control (where they are, and thus disappear or reverse quickly). To me, thinking of the result in terms of autocorrelation is more natural.”

      REPLY: The reviewer is correct that the scatter plots in Fig. 5 would be the same (to within a scale factor of the vertical axis) had we plotted final position vs. mean position instead of mean velocity vs. mean position as we did. Our preference for mean velocity vs. mean position stems from a dynamical systems perspective, where position-velocity phase-space analysis is common. We now mention these perspectives in the revised manuscript for the benefit of the reader.
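      The scale equivalence follows directly from the definition of mean velocity. For a trial of duration T:

```latex
\bar{v} = \frac{1}{T}\int_0^{T} \dot{x}(t)\,dt = \frac{x(T) - x(0)}{T}
\quad\Longrightarrow\quad
x(T) = x(0) + T\,\bar{v}
```

      Assuming trials of equal duration that start with the cursor at the center (x(0) = 0), final position is simply mean velocity scaled by T, so the two versions of the plot differ only in the units of the vertical axis.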

      As suggested, we also investigated the width of the (temporal) autocorrelation function (acf) of cursor position for 200 simulated position control trials and 200 simulated velocity control trials, at four different lambda values (50 simulated trials per lambda). Figs. S2A and B (Supplementary Materials) show example trials and histograms of the acf width, respectively. As the reviewer surmised, velocity control trials tend to have wider acfs than position control trials. However, as with the metrics we chose to analyze, there is overlap and there is no visible benefit for the classification.
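      One way to compute an acf width is sketched below. The definition of "width" is an assumption (the first lag at which the normalized autocorrelation falls below 0.5), as are the function name and the toy signals. White noise and a random walk merely stand in for strongly corrected and uncorrected, drifting cursor positions; they are not simulated CST trials.

```python
import numpy as np

def acf_width(x, dt, level=0.5):
    """Lag (seconds) at which the normalized autocorrelation of x first falls below `level`.

    Uses the biased autocorrelation estimate; returns the trial length if the
    autocorrelation never falls below `level`.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    acf = np.correlate(x, x, mode="full")[n - 1:]  # lags 0, 1, ..., n-1
    if acf[0] == 0:
        raise ValueError("signal has zero variance")
    acf = acf / acf[0]
    below = np.flatnonzero(acf < level)
    return below[0] * dt if below.size else n * dt

rng = np.random.default_rng(0)
corrected = rng.standard_normal(2000)           # deviations removed quickly: short acf
drifting = np.cumsum(rng.standard_normal(2000))  # deviations persist: long acf
```

      With this definition, the drifting signal has a much wider acf than the corrected one, mirroring the expected contrast between Velocity and Position Control.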

      COMMENT: “I think equation ten is incorrect, but would be correct if the identity matrix were added? Also, why is the last term of B set to 1/(Tau*M). What is M? Is it mass (which above was lowercase m)? If so, mass should also be included in A (it would be needed in two places in the last column). Or if we assume m = 1, then just ignore mass everywhere, including here and equation 5. Or perhaps I'm confused, and M is something else?”

      REPLY: Thanks for pointing this out. The matrix A shown in the paper is for the continuous-time representation of the model. However, as the reviewer correctly noted, the discrete-time implementation of the model requires a modification (adding the identity matrix), which we applied in our simulations. We have now clarified this in the Methods section of the revised manuscript. Also, as correctly pointed out, M is the mass of the hand. Depending on whether the hand acceleration (d^2 p/dt^2) or the hand force (F) is taken as a state, M appears in the A matrix (force as a state) or in the B matrix (acceleration as a state). In our case, the A matrix is modified according to the state vector, and the B matrix is modified accordingly. This is now clarified in the Methods section of the manuscript.
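      A single-axis hand model illustrates the role of the identity matrix and of the mass M. This is a hypothetical sketch, not the model in the manuscript. It assumes a point mass driven by a first-order muscle with time constant tau, forward-Euler discretization with step dt, and illustrative values M = 1 kg, tau = 0.06 s and dt = 0.01 s. With hand force as a state, M enters A. With hand acceleration as a state, M moves into B as 1/(tau*M), the term the reviewer asked about.

```python
import numpy as np

def hand_model(M=1.0, tau=0.06, dt=0.01, force_state=True):
    """Continuous (A_c, B_c) and forward-Euler discrete (A_d, B_d) matrices.

    force_state=True : x = [p, v, F], dv/dt = F/M, dF/dt = (u - F)/tau
    force_state=False: x = [p, v, a], dv/dt = a,   da/dt = (u/M - a)/tau
    """
    if force_state:
        A_c = np.array([[0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0 / M],
                        [0.0, 0.0, -1.0 / tau]])
        B_c = np.array([[0.0], [0.0], [1.0 / tau]])
    else:
        A_c = np.array([[0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0],
                        [0.0, 0.0, -1.0 / tau]])
        B_c = np.array([[0.0], [0.0], [1.0 / (tau * M)]])
    # x(t+dt) ~ x(t) + dt * dx/dt, which is where the identity matrix comes from
    A_d = np.eye(3) + dt * A_c
    B_d = dt * B_c
    return A_c, B_c, A_d, B_d

def simulate_position(A_d, B_d, u_seq):
    """Roll out x(k+1) = A_d x(k) + B_d u(k) from rest; return hand positions."""
    x = np.zeros((3, 1))
    positions = []
    for u in u_seq:
        x = A_d @ x + B_d * u
        positions.append(x[0, 0])
    return np.array(positions)
```

      Because a = F/M, the two parameterizations produce identical position trajectories for the same input. The choice of state therefore only decides whether M sits in A or in B.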

      Reviewer #3 (Recommendations For The Authors):

      COMMENT: “Equations 4-8 are written in continuous time, but Equation 9 is written in discrete time. Then Equation 10 is in discrete time. This needs to be tidied up. … I would suggest being more detailed and systematic, perhaps formulating the control problem in continuous time and then converting to discrete time.”

      REPLY: Thank you for this helpful suggestion. The model section in the Methods has been expanded to provide further details of the equation of motion, the discretization process, the control law calculation and the state estimation process.

      COMMENT: “It seems slightly odd for the observation to include only position and velocity of the cursor. Presumably participants can also observe the state of their own hand through proprioception (even if it were occluded). How would it affect the model predictions if the other states were observable?”

      REPLY: Thanks for pointing this out. We initially included only cursor position and velocity because we felt these were the most prominent state feedback, and the system is observable in that case. Nevertheless, we revised the manuscript and repeated all simulations using a full observability matrix. Our findings and conclusions remain unchanged. Because of the changes to the model, Figs. 3, 4, 5, 6, 7, and 8 were also updated.

      COMMENT: “It seems unnecessary to include the acceleration of the cursor in the formulation of the model. …the acceleration is not even part of the observed state according to line 668… I think the model could therefore be simplified by omitting cursor acceleration from the state vector.”

      REPLY: We agree. We have simplified the model and generated new simulations and figures. Our results and conclusions were unchanged by this modification. Because of the changes to the model, Figs. 3, 4, 5, 6, 7, and 8 were also updated.

      COMMENT: “In the cost function, it's not clear why any states other than position and velocity of the cursor need to have non-zero values. …The choice to have the cost coefficient for these other states be 1 is completely arbitrary… If the point is that the contribution of these other costs should be negligible, then why not just set them to 0?”

      REPLY: We agree, and have made this change in the Methods section. Our findings and conclusions were unaffected.

      COMMENT: “It seems that the cost matrices were specified after transforming to discrete-time. It is possible however (and perhaps recommended) to formulate in continuous time and convert to discrete time. This can be done cleanly and quite straightforwardly using matrix exponentials. Depending on the discretization timestep, this can also naturally lead to non-zero costs for other states in the discrete-time formulation even if they were zero under continuous time. … A similar comment applies to discretization of the noise.”

      REPLY: Thanks for the suggestion. We have expanded on the discretization process in our Methods section, which uses a common approximation of the matrix exponentiation method.
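      The exact discretization the reviewer mentions can be written with a single block-matrix exponential. The approximation A_d ≈ I + A dt is its first-order truncation. The sketch below is ours, not the code used in the manuscript. To avoid extra dependencies, it computes the matrix exponential by scaling and squaring with a truncated Taylor series (`scipy.linalg.expm` does the same job in practice). It assumes a zero-order hold on the control input.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / (2 ** s)                      # scaled so that ||X|| <= 1/2
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ X / k               # X^k / k!
        E = E + term
    for _ in range(s):                    # undo the scaling: exp(M) = exp(X)^(2^s)
        E = E @ E
    return E

def zoh_discretize(A_c, B_c, dt):
    """Exact zero-order-hold discretization via the block matrix [[A, B], [0, 0]]."""
    n, m = B_c.shape
    blk = np.zeros((n + m, n + m))
    blk[:n, :n] = A_c
    blk[:n, n:] = B_c
    E = expm_taylor(blk * dt)
    return E[:n, :n], E[:n, n:]           # A_d = e^{A dt}, B_d = int_0^dt e^{A s} ds B
```

      For a double integrator this returns A_d = [[1, dt], [0, 1]] and B_d = [[dt^2/2], [dt]]. The Euler approximation gives B_d = [[0], [dt]], so the two differ only at second order in dt.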

      COMMENT: “Most of the parameters of the model seem to be chosen arbitrarily. I think this is okay as the point is to illustrate that the kinds of behaviors observed are within the scope of the model. However, it would be helpful to provide some rationale as to how the parameters were chosen. e.g. Were they taken directly from prior literature, or were they hand-tuned to approximately match observed behavior?”

      REPLY: We have revised the manuscript to note more clearly that the noise parameters, as well as the parameters of the mechanical system (mass, muscle force, time scale, etc.) in our model, were taken from previous publications (Todorov, 2005; Cluff et al., 2019). As described in the manuscript, the parameter values of the cost function (Q matrix) were obtained by tuning them so that the model achieved a range of success rates similar to that observed in the experimental data. This is now clarified in the Methods section.

      COMMENT: “The ‘true’ cost function for this task is actually a 'well' in position space - zero cost within the screen and very high cost elsewhere. In principle, it might be possible to derive the optimal control policy for this more veridical cost function. It would be interesting to consider whether or not this model might reproduce the observed behaviors.”

      REPLY: This is indeed a very interesting suggestion, but it is difficult to implement within the current optimal feedback control framework. We will consider it in future work.

      Minor Comments:

      COMMENT: “In Figs 4 and 5, the data points are drawn from different conditions with varying values of lambda. How did the structure of this data depend on lambda? Might it be possible to illustrate in the figure (e.g. the shade/color of each dot) what the difficulty was for each trial?”

      REPLY: We performed additional analyses to show the effects of task difficulty on the choice of control objective. Overall, we found that the main behavioral characteristics of the control objective remained fairly unchanged across task difficulties and over time. The results of this analysis are included in Figs. S7 and S8 of the Supplementary Materials.

      COMMENT: “Should mention trial duration (6s) in the main narrative of the intro/results.”

      REPLY: We now mention this detail when we describe the task for the first time.

      COMMENT: “As an alternative to training on synthetic data (which might not match behavior that precisely, and was also presumably fitted to subject data at some level) it might be worth considering to do a cross-validation analysis, i.e. train the classifier on subsets of the data with one participant removed each time, and classify on the held-out participant.”

      REPLY: This is indeed a valid point. The main reason to train the classifier based on model simulations was two-fold: first, to have confidence in the training data, as the experimental data was limited and noisy, which would result in less reliable classifications; and second, the model simulations are available for different contexts and conditions, where experimental data is not necessarily available. The latter is a more practical reason to be able to identify control objectives for any subject (who received no instructions), without having to collect training data from matching control subjects who received explicit instructions. Nonetheless, we appreciate the reviewer’s recommendation and will consider that for our future studies.
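      The reviewer's suggested alternative, leave-one-participant-out cross-validation, can be sketched as follows. This only illustrates that scheme, which we have not run. The nearest-centroid classifier, the feature layout (one row of summary metrics per trial) and the label names are hypothetical placeholders for whatever classifier and metrics are actually used.

```python
import numpy as np

def leave_one_participant_out(X, y, groups, fit, predict):
    """Hold out each participant in turn; return held-out accuracy per participant.

    `fit(X, y) -> model` and `predict(model, X) -> labels` are placeholders
    for any classifier.
    """
    X, y, groups = map(np.asarray, (X, y, groups))
    scores = {}
    for g in np.unique(groups):
        test = groups == g                       # all trials of one participant
        model = fit(X[~test], y[~test])          # train on everyone else
        scores[g] = float(np.mean(predict(model, X[test]) == y[test]))
    return scores

def fit_centroids(X, y):
    """Hypothetical toy classifier: one mean feature vector per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_centroids(model, X):
    """Assign each trial to the class with the nearest centroid."""
    classes = list(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
    return np.array(classes)[dists.argmin(axis=1)]
```

      The held-out participant never contributes to training. This is what makes the scheme a test of generalization to new, uninstructed subjects.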

      COMMENT: “line 690 - Presumably the optimal policy was calculated without factoring in any delay (this would be tricky to do), but the 50ms delay was incorporated at the time of simulation?”

      REPLY: The discretization of the system equations allowed us to incorporate the delay in the system dynamics and to solve for the optimal controller with the delay present. This was done by system augmentation (e.g., Crevecoeur et al., 2019): the state of the system at the current time step was augmented with the states from the d = 5 preceding time steps to form the new state vector x_aug(t) = [x(t), x(t-1), …, x(t-d)]. Similarly, the matrices A, B, and H of the system dynamics were expanded accordingly to form the new dynamical system:

      $$x_{\mathrm{aug}}(t+1) = A_{\mathrm{aug}}\, x_{\mathrm{aug}}(t) + B_{\mathrm{aug}}\, u(t)$$

      Then, the optimal control was implemented on the new (augmented) system dynamics.

      We have revised the manuscript (Methods) to clarify this issue.
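      The augmentation step described above can be sketched as follows. It follows the general idea of Crevecoeur et al. (2019) but does not reproduce the authors' code, and the function name is hypothetical. It assumes the augmented state is ordered [x(t), x(t-1), ..., x(t-d)] and that the sensory observation reports only the oldest stacked state, y(t) = H x(t-d). With 10 ms steps, d = 5 corresponds to the 50 ms delay.

```python
import numpy as np

def augment_for_delay(A, B, H, d):
    """Build A_aug, B_aug, H_aug for a d-step sensory delay (hypothetical helper)."""
    n, m = B.shape
    N = n * (d + 1)
    A_aug = np.zeros((N, N))
    A_aug[:n, :n] = A                          # newest block: x(t+1) = A x(t) + B u(t)
    for i in range(1, d + 1):                  # shift register: block i-1 moves to block i
        A_aug[i * n:(i + 1) * n, (i - 1) * n:i * n] = np.eye(n)
    B_aug = np.zeros((N, m))
    B_aug[:n, :] = B                           # control only acts on the newest block
    H_aug = np.zeros((H.shape[0], N))
    H_aug[:, d * n:] = H                       # observe only the oldest block, x(t-d)
    return A_aug, B_aug, H_aug
```

      A standard LQG solver applied to (A_aug, B_aug, H_aug) then accounts for the delay automatically. For example, an input pulse at step 0 first shows up in the observation d steps later than it would without the delay.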

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The present study's main aim is to investigate the mechanism by which VirR controls the magnitude of MEV release in Mtb. The authors used various techniques, including genetics, transcriptomics, proteomics, and ultrastructural and biochemical methods. Several observations were made to link VirR-mediated vesiculogenesis with PG metabolism, lipid metabolism, and cell wall permeability. Finally, the authors presented evidence of a direct physical interaction of VirR with the LCP proteins involved in linking PG with AG, providing clues that VirR might act as a scaffold for LCP proteins and remodel the cell wall of Mtb. Since the Mtb cell wall provides a formidable anatomical barrier to the entry of antibiotics, targeting VirR might increase the permeability of the pathogen while also stimulating the immune system through enhanced vesiculogenesis. Therefore, VirR could be an excellent drug target. Overall, the study addresses an essential area of TB biology.

      Strengths:

      The authors have done a commendable job of comprehensively examining the phenotypes associated with the VirR mutant using various techniques. Application of Cryo-EM technology confirmed the increased thickness and altered arrangement of the CM-L1 layer. The authors also confirmed that increased vesicle release in the mutant was not due to cell lysis, which contrasts with studies in other bacterial species.

      Another strength of the manuscript is that biochemical experiments show altered permeability and PG turnover in the mutant, which fits with later experiments where authors provide evidence of a direct physical interaction of VirR with LCP proteins.

      Transcriptomics and proteomics data were helpful in making connections with lipid metabolism, which the authors confirmed by analyzing the lipids and metabolites of the mutant.

      Lastly, using three approaches, the authors confirm that VirR interacts with LCP proteins in Mtb via the LytR_C terminal domain.

      Altogether, the work is comprehensive, experiments are designed well, and conclusions are made based on the data generated after verification using multiple complementary approaches.

      Weaknesses:

      The major weakness is that the mechanism of VirR-mediated EV release remains enigmatic. Most of the findings are observational and only associate the enhanced vesiculogenesis observed in the VirR mutant with cell wall permeability and PG metabolism. The authors suggest that EV release occurs during cell division, when PG is most fragile. However, this has yet to be tested in the manuscript: by AFM, the VirR mutant produces thicker PG with greater pore density, yet it displays enhanced vesiculogenesis. No evidence was presented to show that the PG of the mutant is fragile, or that differences in cell division explain the increased vesiculogenesis. These observations, which run counter to the authors' hypothesis, need detailed experimental verification.

      Response: We thank the reviewer for this comment. We would like to convince the reviewer that the VirR mutant truly carries a more fragile PG, and we will perform additional experiments to support this notion. We will determine the degree of PG release into the extracellular space and acquire additional mass spectrometry data on isolated PG.

      The transcriptomic data add little of substance and do not correlate with the proteomics data. It remains unclear how VirR deregulates transcription. TLCs of lipids are not quantitative; for example, the TLC image of PDIM is poor, and quantitative estimation requires metabolic labeling of lipids with radioactive precursors. Further, a change in PDIM levels is likely to affect other lipids (SL-1, PAT/DAT) that share a common precursor (propionyl-CoA).

      Response: We agree with the reviewer that TLC analysis is not quantitative. Additional TLCs will be run to investigate other lipids that share common precursors. At present, we cannot run radioactive experiments in our laboratory.

      The connection of cholesterol with cell wall permeability is tenuous. Cholesterol serves as a carbon source and contributes to the biosynthesis of methyl-branched lipids such as PDIM, SL-1, and PAT/DAT. Carbon sources also affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability and the import/export of drugs. The authors should investigate whether the restoration of normal permeability and EV release in the VirR mutant upon cholesterol exposure is instead due to the maintenance of cell wall lipid balance.

      Response: We concur with the reviewer that cholesterol as the sole carbon source introduces many changes in Mtb cells besides permeability. Our central hypothesis regarding these data is that cholesterol makes the Mtb cell membrane less fluid, which in turn reduces EV release. We will try to measure membrane fluidity in the presence and absence of cholesterol. However, permeability changes in Mtb cells can manifest at different levels of the cell envelope. This suggests that the increased permeability observed in the VirR mutant could differ from that observed upon TRZ treatment. The main point is that vesiculogenesis could be a general process responding to changes in permeability, regardless of which cell envelope compartment is affected. The specific experiments remain to be defined, but we will try to demonstrate this.

      Finally, the protein interaction data are based on experiments performed once, without statistical analysis. If the interaction between VirR and the LCP proteins is expected to occur at the mycobacterial membrane, it is unclear how the split-GFP system, expressed in the cytoplasm, is physiologically relevant. No explanation was provided as to why VirR interacts with truncated versions of the LCP proteins and not with the full-length proteins.

      Response: Split-GFP has previously been used successfully with cell membrane proteins. However, we will repeat the experiments and perform statistical analysis.

      Reviewer #2 (Public Review):

      Summary:

      In this work, Vivian Salgueiro et al. have comprehensively investigated the role of VirR in the vesicle production process in Mtb using state-of-the-art omics, imaging, and several biochemical assays. From the present study, the authors have drawn a positive correlation between cell membrane permeability and vesiculogenesis and have implicated VirR in affecting membrane permeability, thereby impacting vesiculogenesis.

      Strengths:

      The authors have discovered a critical factor (i.e. membrane permeability) that affects vesicle production and release in mycobacteria, which can broadly be applied to other bacteria and may be of significant interest to other scientists in the field. Through omics and multiple targeted assays, such as targeted metabolomics, PG isolation, analysis of the diaminopimelic acid and glycosyl composition of the cell wall, and, importantly, molecular interactions with the canonical PG-AG-ligating LCP proteins, the authors have established that VirR is a central scaffold in the cell envelope remodelling process, which is critical for MEV production.

      Response: We thank the reviewer for these kind words.

      Weaknesses:

      Throughout the study, the authors have utilized a CRISPR knockout of VirR. VirR is a non-essential gene for the growth of Mtb; a null mutant of VirR would have been a better choice for the study.

      Response: We thank the reviewer for bringing up this issue. Contrary to predictions, we believe that virR is an essential gene, as we have tried to delete it several times without success. In this study, we used a transposon mutant and its complemented strain, since these strains underpinned previous studies establishing the genetic role of virR in vesiculogenesis in Mtb. We chose CRISPRi to run similar experiments in a genetic background other than transposon mutagenesis. Our data support similar phenotypes in terms of vesicle release.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      More details about the classification and how it is trained

      We included a sentence in the introduction to clarify which data we are using: "In order to demonstrate this improvement, we apply our methods to two classification datasets: a synthetic dataset and a public clinical dataset where the predicted outcome is the survival of the patient".

      We also clarified how the classifier is trained in the "Results" section: "we used the default parameters of the classifier, since our focus is comparing the different imputation methods".

      Availability of the code

      Now the code is publicly available in a github repository https://github.com/AstraZeneca/dpp_imp/ (see Availability of Data and Code section)

      Reviewer #2

      Clarifying that Determinantal Point Processes and their deterministic version have been introduced before but are applied for the first time for data imputation in this work:

      We added an explanation in the 6th paragraph of the introduction stating that we use pre-existing DPP and deterministic-DPP algorithms for our imputation methods, and we included the references to avoid confusion.

      We also added a paragraph at the end of the introduction to summarize this work's contribution.

      Explaining the claim about the computational advantage of using quantum determinantal point processes for the imputation methods:

      In the fourth paragraph of the "Discussion" section (page 8), we give an imputation example by numerically comparing the classical and quantum algorithms running time for DPP sampling, which shows the advantage of using the quantum algorithm.

      Regarding running time for classical DPP and quantum DPP sampling algorithms:

      We included Table VIII (page 13), which compares the preprocessing and sampling complexities of the classical and quantum DPP algorithms. We consider the case where we sample d rows from an (n, d) matrix with n = O(d), which is usually the case for our DPP-Random Forest algorithm.

      We added some details regarding the quantum advantage in the first paragraph of page 12.

      Regarding the comment about the modest improvement of the DPP methods and questions about their practical benefit:

      As mentioned in the third paragraph of the "Discussion" section, the consistency of the improvement and the removal of variance achieved by the DPP and deterministic DPP methods make them very beneficial for use on clinical data. Further exploration with different datasets could lead to a more complete understanding of the practical advantages of the methods.

      Algorithmic complexity of the deterministic DPP algorithm:

      Detailed in the last sentence of the "Determinantal Point Processes" subsection of the "Methods" section: O(N^2 d) for the preprocessing step and O(N d^3) for the sampling step.
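      The sketch below makes the sampled object concrete. It enumerates the distribution of a k-DPP over subsets of rows of a data matrix X, with kernel L = X X^T, so that P(S) is proportional to det(L_S). This brute-force enumeration is exponential in n and is meant only for intuition on toy inputs. It is not the O(N^2 d) / O(N d^3) algorithm cited above. Treating the deterministic variant as "return the most probable subset" is a simplifying assumption made only for illustration.

```python
import itertools
import numpy as np

def kdpp_distribution(X, k):
    """All size-k row subsets of X with their k-DPP probabilities (brute force)."""
    L = X @ X.T                                          # similarity (Gram) kernel
    subsets = list(itertools.combinations(range(X.shape[0]), k))
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    weights = np.clip(weights, 0.0, None)                # remove tiny negative round-off
    return subsets, weights / weights.sum()              # P(S) = det(L_S) / sum of dets

def sample_subset(X, k, rng):
    """Randomized draw: pick a subset with probability P(S)."""
    subsets, p = kdpp_distribution(X, k)
    return subsets[rng.choice(len(subsets), p=p)]

def most_probable_subset(X, k):
    """Deterministic variant (assumed here): the mode of the distribution."""
    subsets, p = kdpp_distribution(X, k)
    return subsets[int(np.argmax(p))]
```

      With rows [1, 0], [0.9, 0.1] and [0, 2], the nearly collinear pair {0, 1} receives a probability of only about 0.1%. The mode is {0, 2}. This illustrates how a DPP favours diverse rows.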

      Running time for the quantum deterministic DPP sampling and how it is done in practice:

      While it is difficult to assess the real running time for the quantum detDPP algorithm for large circuits (100 or more qubits), due to the unavailability of such devices, we give more details about our practical implementation in the last paragraph of the "Methods" section. In our case (up to 10 qubits) we used 1000 shots to sample the highest probability elements.

      On which quantum simulator was used

      We point out in the first paragraph of page 5 that we employ the qiskit noiseless simulator.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study identifies the gene mamo as a new regulator of pigmentation in the silkworm Bombyx mori, a function that was previously unsuspected based on extensive work in Drosophila, where the mamo gene is involved in gamete production. The evidence supporting the role of Bm-mamo in pigmentation is convincing, including high-resolution linkage mapping of two mutant strains, expression profiling, and reproduction of the mutant phenotypes with state-of-the-art RNAi and CRISPR knock-out assays. While the discussion about genetic changes being guided or accelerated by the environment is extremely speculative and has little relevance for the findings presented, the work will be of interest to evolutionary biologists and geneticists studying color patterns and the evolution of gene networks.

      Response: Thank you very much for your careful work. In the revised version, we conducted a comparative genomic analysis of the upstream regions of the Bm-mamo gene in 51 wild silkworms and 171 domesticated local silkworms. We also analyzed the nucleotide diversity (pi) and the fixation index (FST) of the Bm-mamo genomic sequences in the wild and domesticated silkworm populations. The results showed that the Bm-mamo genomic sequence of local silkworms was relatively conserved, whereas the upstream sequence of wild silkworms exhibited high nucleotide diversity. This finding suggests a high degree of variability in the regulatory region of the Bm-mamo gene in wild strains, and that the sequence of this region may have been fixed by domestication selection. We have revised the description in the Discussion section accordingly.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper performs fine-mapping of the silkworm mutant bd and its fertile allelic version, bdf, narrowing down the causal interval to a small region containing a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. All these experiments convincingly show that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument.

      The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths:

      The genotype-to-phenotype workflow, combining forward (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) linking mamo to pigmentation are extremely convincing.

      This revision is a much improved manuscript, and I commend the authors for many of their edits.

      Response: Thank you very much for your careful work. With the help of reviewers and editors, we have revised the manuscript to improve its readability.

      I find the last part of the discussion, starting at "It is generally believed that changes in gene expression patterns are the result of the evolution of CREs", to be confusing.

      In this section, I believe the authors sequentially:

      • emphasize the role of CRE in morphological evolution (I agree)

      • emphasize that TFs, and in particular their own CREs, are themselves important mutational targets of evolution (I agree, but the phrasing needs to make clear that the authors are talking here about the CREs found at the TF locus, not the CREs bound by the TF).

      • use the stickleback Pel enhancer as an example, which I think is a good case study, but the authors also then make an argument about DNA fragility sites, which is hard to connect with the present study.

      • then continue on "DNA fragility" using the peppered moth and butterfly cortex locus. There is no evidence of DNA fragility at these loci, so the connection does not work. "The cortex gene locus is frequently mutated in Lepidoptera", the authors say. But a more accurate picture would be that the cortex locus is repeatedly involved in the generation of color pattern variants. Unlike for the Pel fragile enhancer, we don't know whether the causal mutations at this locus are repeatedly the same, and the haplotypes that have been described could be collateral rather than causal. Overall, it is important to clarify that mutation bias is one possible factor explaining "genetic hotspots of evolution" (or genetic parallelism sensu 10.1038/nrg3483), but it is also possible that many genetic hotspots are repeated mutational targets because of their "optimal pleiotropy" (e.g. a hub position in GRNs, as mamo might have), or because of particularly modular CRE regions that allow fine-tuning. Thus, I find the "fragility" argument misleading here. In fact, the finding that the "bd" and "bdf" alleles differ in nature argues against a fragility bias (unless the authors can show increased mutation rates at this locus in a wild silkmoth species). These alleles were also artificially selected, i.e. they increased in frequency through breeding rather than through natural selection in the wild, so while they are interesting for our understanding of the genotype-phenotype map, they are not necessarily representative of the mutations that may underlie evolution in the wild.

      Response: Thank you very much for your careful work. DNA fragility is an interesting topic, but some explanations of DNA fragility are confusing. One study measured the rate of DNA double-strand breaks (DSBs) in yeast artificial chromosomes (YACs) and found that YACs containing the marine Pel enhancer broke ~25 to 50 times more frequently than controls. The authors attributed this increased mutation rate to DNA sequence characteristics, particularly TG-dinucleotide repeats. Moreover, they found that Pel fragility depended on the orientation of the sequence relative to the replication origin: with the origin placed on the opposite side of Pel, the forward sequence was stable and the reverse complement was fragile. Thus, Pel fragility also depends on the direction of DNA replication. In summary, they suggested that this particular DNA sequence is the cause of DNA fragility. In addition, the sequence features associated with DNA fragility in the Pel region are also found at thousands of other positions in the stickleback and human genomes (Xie et al., 2019, Science).

      In YACs, DNA sequence characteristics such as TG-dinucleotide repeats may be important causes of DNA fragility, and these breaks occur during DNA replication. However, sequences inserted into YACs often undergo deletion or recombination during cultivation and passage. In addition, yeast is a single-celled organism. Therefore, the results in yeast cannot simply be extrapolated to multicellular organisms. If the same mechanism operated in multicellular organisms, several issues would arise:

      (1) In a multicellular organism, DNA replication occurs separately in each cell. Because DNA breakage and repair would be independent events in each cell, different cells could carry different alleles, potentially producing extensive chimerism. However, we have not found such a situation in the genome sequencing of many multicellular organisms.

      (2) If the DNA sequence (TG-dinucleotide repeats) were the determining factor, mutations near this sequence would lose their strong correlation with environmental change. The researchers conducted the YAC experiments in a single environment and found that DNA containing TG-dinucleotide repeats broke 25 to 50 times more frequently than the control. This would mean that this part of the stickleback genome mutates frequently in both marine and lake populations. However, according to related research, it is lake populations, not marine populations, that often show pelvic reduction.

      (3) Researchers have found thousands of loci in the stickleback and human genomes that contain such sequences (TG-dinucleotide repeats). This would mean that thousands of sites undergo frequent mutation during DNA replication. Unless these sites are nonfunctional, such mutations would affect the organism and could even be harmful. Even if they are nonfunctional, these sequences would gradually be lost or replaced through frequent mutation rather than persisting in large numbers in the genome.

      Therefore, the study of DNA fragility in yeast cannot explain the situation in multicellular organisms.

      As you noted, we intended to convey that the frequent variation at the cortex locus is likely governed by targeted regulation involving the GRN in Lepidoptera. In addition, studies of the epigenetic modifications at fragile DNA sites identified by genome-wide mapping suggest that DNA fragility is determined not by the DNA sequence itself (Ji et al., 2020, Cell Res) but by other factors, such as epigenetic factors. In our view, the sequence features found at fragile DNA sites are traces of frequent mutation, not its cause.

      In this revision, we analyzed the nucleotide diversity of the mamo locus in 51 wild and 171 domestic silkworms. We found high nucleotide diversity from the third exon to the upstream region of this gene in wild silkworms. We randomly selected 12 wild silkworms and 12 domestic silkworms and compared approximately 1 kb of their upstream sequences. In wild silkworms, the upstream sequences are highly diverse. In domestic silkworms, the sequences are highly conserved, although a long interspersed nuclear element (LINE) is inserted in some individuals. This finding suggests that the sequence of this region varies frequently in wild silkworms, whereas it has become fixed in domesticated silkworms. These genomic data are sourced from the silkworm pangenome (Tong et al., 2022, Nat Commun). In the pangenome study, 1078 strains (205 local strains, 194 improved strains, 632 mutant strains, and 47 wild silkworms) were obtained, including 545 genomes sequenced with third-generation technology. An online website was built to make these data accessible (http://silkmeta.org.cn/). We warmly welcome you to use these data.
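      The two statistics mentioned above can be written down directly for aligned sequences. This is a minimal definitional sketch: nucleotide diversity is the mean number of pairwise differences per site, and Hudson's FST = 1 - H_within / H_between. It assumes equal-length, gap-free alignments and is not the pipeline behind the pangenome analysis. Genome-scale analyses usually rely on dedicated variant-based tools that handle missing data and sliding windows.

```python
from itertools import combinations, product

def _differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site within one population."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    return sum(_differences(a, b) for a, b in pairs) / (len(pairs) * length)

def hudson_fst(pop1, pop2):
    """Hudson's FST = 1 - H_within / H_between for two aligned populations."""
    length = len(pop1[0])
    h_within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    h_between = sum(_differences(a, b) for a, b in product(pop1, pop2)) / (
        len(pop1) * len(pop2) * length)
    return 1 - h_within / h_between
```

      High pi in one population with low pi in the other, together with a high FST, is the pattern expected when a variable ancestral region becomes fixed under selection.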

      In summary, for clearer expression, we have rewritten this section.

      Xie KT, Wang G, Thompson AC, Wucherpfennig JI, Reimchen TE, MacColl ADC, Schluter D, Bell MA, Vasquez KM, Kingsley DM. DNA fragility in the parallel evolution of pelvic reduction in stickleback fish. Science. 2019 Jan 4;363(6422):81-84. doi: 10.1126/science.aan1425.

      Ji F, Liao H, Pan S, Ouyang L, Jia F, Fu Z, Zhang F, Geng X, Wang X, Li T, Liu S, Syeda MZ, Chen H, Li W, Chen Z, Shen H, Ying S. Genome-wide high-resolution mapping of mitotic DNA synthesis sites and common fragile sites by direct sequencing. Cell Res. 2020 Nov;30(11):1009-1023. doi: 10.1038/s41422-020-0357-y.

      Tong X, Han MJ, Lu K, Tai S, Liang S, Liu Y, Hu H, Shen J, Long A, Zhan C, Ding X, Liu S, Gao Q, Zhang B, Zhou L, Tan D, Yuan Y, Guo N, Li YH, Wu Z, Liu L, Li C, Lu Y, Gai T, Zhang Y, Yang R, Qian H, Liu Y, Luo J, Zheng L, Lou J, Peng Y, Zuo W, Song J, He S, Wu S, Zou Y, Zhou L, Cheng L, Tang Y, Cheng G, Yuan L, He W, Xu J, Fu T, Xiao Y, Lei T, Xu A, Yin Y, Wang J, Monteiro A, Westhof E, Lu C, Tian Z, Wang W, Xiang Z, Dai F. High-resolution silkworm pan-genome provides genetic insights into artificial selection and ecological adaptation. Nat Commun. 2022 Sep 24;13(1):5619. doi: 10.1038/s41467-022-33366-x.

      Lu K, Pan Y, Shen J, Yang L, Zhan C, Liang S, Tai S, Wan L, Li T, Cheng T, Ma B, Pan G, He N, Lu C, Westhof E, Xiang Z, Han MJ, Tong X, Dai F. SilkMeta: a comprehensive platform for sharing and exploiting pan-genomic and multi-omic silkworm data. Nucleic Acids Res. 2024 Jan 5;52(D1):D1024-D1032. doi: 10.1093/nar/gkad956.

      Curiously, the last paragraph ("Some research suggests that common fragile sites...") elaborates on the idea that some sites of the genome are prone to mutation. The connection with mamo and the current article is extremely thin. There is an attempt here to connect meiotic and mitotic breaks to Bm-mamo, but this is confusing: it seems to propose Bm-mamo as a recruiter of epigenetic modulators that may drive higher mutation rates elsewhere. Not only am I not convinced by this argument without actual data, but this would not explain how the mutations at the Bm-mamo locus itself evolved.

      Response: Thank you very much for your careful work. This section mainly illustrates that, in animals, DNA fragility is not determined by sequence but is regulated by other factors. In fruit flies, mamo has been identified as an important candidate gene for setting recombination hotspots during meiosis. We first discussed PRDM9, which plays an important role in setting recombination hotspots during meiosis. Our purpose in mentioning this information is to illustrate that chromosomal recombination proceeds through programmed double-strand breaks, and to answer another reviewer's question about programmed events in the genome. In summary, we suggest that some variations in DNA sequences are programmed outcomes. We have improved the description of this section in this version.

      On a more positive note, I find it fascinating that the authors identified a TF that clearly articulates or orchestrates larval pattern development and that, when deleted, can still yield healthy individuals. In other words, while it is a TF with many targets, it is not too pleiotropic. This idea, that the genetically causal modulators of developmental evolution are regulatory genes, has been described elsewhere (e.g. Fig 4c in 10.1038/s41576-020-0234-z, and associated refs). To me, the beautiful findings about Bm-mamo make sense in the general, existing framework that developmental processes and regulatory networks "shape" the evolutionary potential and trajectories of organisms. There is a degree of "programmability" in genomes, because some loci are particularly prone to modulate a given type of trait. Here, Bm-mamo, as a potential regulator of both CPs and melanin pathway genes, appears to be a potent modulator of epithelial traits. Claiming that there are inherent mutational biases behind this is unwarranted.

      Response: Thank you very much for your careful work. I completely agree with your statement that the genome exhibits a certain degree of programmability. On the one hand, some transcription factors can precisely control the spatiotemporal expression levels of structural genes (such as pigment synthesis genes). On the other hand, these transcription factors are themselves subject to strict expression regulation. Because color patterns are complex, changes in a single structural gene or a few structural genes result in incomplete or imprecise changes in coloring patterns. In contrast, some regulatory factors control multiple downstream target genes, so changes in their expression patterns can lead to holistic and significant changes in color patterns. Many important transcription factors have long upstream intergenic regions, ranging from tens to hundreds of kilobases (kb), which may contain many different regulatory elements for finer control of their expression patterns. Therefore, gene regulatory networks can act directly on transcription factors to modulate a given type of trait. A transcription factor and its downstream target genes can form a functional module, similar to a functional module in software or an operating system. Regulating such a module through its transcription factor takes fewer steps, much like a single switch that turns the whole module on or off. That gene regulatory networks regulate these modules in response to environmental changes is widely recognized.

      Some researchers do not accept that genetic variation can also be regulated and argue that it is completely random. The infinite monkey theorem (Félix-Édouard-Justin-Émile Borel, 1909) states that if an infinite number of monkeys were given typewriters and an infinite amount of time, they would eventually produce the complete works of Shakespeare. Although the theorem appears to be about randomness, its conclusion is one of inevitability: given unlimited trials, even an extremely improbable (tail) event will eventually occur. In nature, some phenomena we observe show no obvious regularity because they involve relatively complex factors, and their underlying logic is obscure and difficult to understand. We often call such phenomena random. However, as we gradually understand the logic behind these complex events, we can also recognize the programmed nature of this apparent randomness.

      Previously, chromosomal recombination during meiosis was believed to be a random event; it is now understood to be a programmed process. The occurrence of meiotic recombination mentioned earlier indicates that the genome can itself set the positions of double-strand breaks to form new allelic forms. Because meiotic recombination is programmed, and because transcription factors that recognize specific DNA sites, enzymes that cleave both DNA strands, and DNA repair systems exist, programmed processes could also introduce genetic variation. A study in plants has provided insights into such programmed mutation (Monroe JG, 2022, Nature). Changes in the expression patterns of some transcription factors occur frequently between and/or within species. In this article, we discuss the possible reasons for variation in the expression patterns of some transcription factors only in general terms and with simple reasoning. We have added an analysis of wild silkworms and improved the relevance of the discussion.

      Monroe JG, Srikant T, Carbonell-Bejerano P, Becker C, Lensink M, Exposito-Alonso M, Klein M, Hildebrandt J, Neumann M, Kliebenstein D, Weng ML, Imbert E, Ågren J, Rutter MT, Fenster CB, Weigel D. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature. 2022 Feb;602(7895):101-105. doi: 10.1038/s41586-021-04269-6. Epub 2022 Jan 12. Erratum in: Nature. 2023 Aug;620(7973):

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Please structure your Discussion with section headers.

      Response: Thank you very much for your careful work. We have added relevant section headers.

      • As explained in my public review, I found the last two sections of the Discussion to be dispersed and confusing. I must also say that I carefully read the Response to Reviewers on this, which helped me better understand the authors' intentions here. Please consider revising this part of the Discussion, as it feels extremely speculative and difficult to connect with Bm-mamo.

      Response: Thank you very much for your careful work. We have rewritten this part of the content.

      • typo: were found near the TTS of yellow --> TSS

      Response: Thank you very much for your careful work. We have made these modifications.

      • l. 234 :"expression level of the 18 CP genes in the integument". Consider adding a mention of Figure 7 here, as only Fig. S10 is cited here.

      Response: Thank you very much for your careful work. We have made these modifications.

      • Editorial comment on the second half of the Abstract:

      Wu et al : "We found that Bm-mamo can comprehensively regulate the expression of related pigment synthesis and cuticular protein genes to form color patterns. This indicates that insects have a genetic basis for coordinate regulation of the structure and shape of the cuticle, as well as color patterns. This genetic basis provides the possibility for constructing the complex appearances of some insects. This study provides new insight into the regulation of color patterns."

      I respectfully suggest a more accurate rephrasing, where the methods are mentioned, and where the logical argument is more straightforward. For example

      "Using RNAi and CRISPR, we show that Bm-mamo is a repressor of dark melanin patterns in the larval epithelium. Using in vitro binding assays and gene expression profiling in wild-type and mutant larvae, we also show that Bm-mamo likely regulates the expression of related pigment synthesis and cuticular protein genes in a coordinated manner to mediate its role in color pattern formation. This mechanism is consistent with a dual role of this transcription factor in regulating both the structure and shape of the cuticle and the pigments that are embedded within it. This study provides new insight into the regulation of color patterns as well as into the construction of more complex epithelial features in some insects."

      I hope this lets the ideas of the original version come through as the authors intended.

      Response: Thank you very much for your careful work. We have made these modifications.

    1. Author Response

      We would like to thank the reviewers for their thoughtful feedback on our work. One important point they raise is a potential issue with our method for accounting for excess noncrossover (NCO) events that are detected because of the higher marker resolution in introgressed regions. We simulated NCO tracts of average size over both introgressed and non-introgressed windows to determine the expected increase in NCO detection due to marker density, and then used that expected increase to correct our per-window NCO counts in all windows. We applied these corrections to all results and analyses involving genomic windows (maps and genome-wide comparisons), but not when focusing on introgression-specific characteristics (e.g. analyzing fine-scale sequence differences around NCO tracts in introgressed regions). We chose this method based on previous work in the field and on additional analyses of our own data that we did not include in the final manuscript. In our revised manuscript, we will aim to communicate our decision-making process more clearly and will include some of the exploratory results that guided us. We look forward to responding to all comments and to highlighting additional aspects of our findings that we think will interest the evolution and recombination communities, including significant changes to the recombination landscape between closely related strains and the impact of introgression on allelic shuffling.
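      The kind of marker-density correction described above can be illustrated with a hypothetical Python sketch. It assumes that a noncrossover tract is detected only if it spans at least one informative marker, places tracts of a fixed average length uniformly at random within a window, and rescales a window's observed count by the ratio of a baseline (non-introgressed) detection rate to that window's detection rate. The function names, the uniform-placement model, and the rescaling formula are illustrative assumptions, not the authors' code.

```python
import random
from bisect import bisect_left


def detection_rate(markers, window_len, tract_len, n_sims=10_000, seed=0):
    """Fraction of simulated tracts that overlap at least one marker.

    Hypothetical model: each tract of length tract_len starts uniformly at random
    within [0, window_len - tract_len] and is 'detected' if any marker position
    falls inside [start, start + tract_len].
    """
    if tract_len > window_len:
        raise ValueError("tract longer than window")
    rng = random.Random(seed)  # fixed seed for reproducibility
    sorted_markers = sorted(markers)
    hits = 0
    for _ in range(n_sims):
        start = rng.uniform(0, window_len - tract_len)
        i = bisect_left(sorted_markers, start)  # first marker at or after start
        if i < len(sorted_markers) and sorted_markers[i] <= start + tract_len:
            hits += 1
    return hits / n_sims


def corrected_count(observed, window_rate, baseline_rate):
    """Rescale an observed NCO count to the detection rate of a baseline window."""
    if window_rate == 0:
        raise ValueError("no tracts detectable in this window")
    return observed * baseline_rate / window_rate


# Toy windows: dense markers (introgressed-like) vs. sparse markers (baseline-like).
dense = detection_rate(range(0, 10_000, 100), 10_000, 500)
sparse = detection_rate(range(0, 10_001, 2_000), 10_000, 500)
print(dense, round(sparse, 2))
print(corrected_count(12, dense, sparse))
```

      Because every 500 bp tract in the dense window covers a marker while only about a fifth do in the sparse window, raw counts from marker-dense windows would be inflated. The rescaling removes that inflation before windows are compared.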

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The study investigates the role of cylicin-1 (CYLC1) in sperm acrosome-nucleus connections and its clinical relevance to male infertility. Using mouse models, the researchers demonstrate that cylicin-1 is specifically expressed in the post-acrosomal sheath-like region in spermatids and plays a crucial role in mediating acrosome-nucleus connections. Loss of CYLC1 results in severe male subfertility, characterized by acrosome detachment and aberrant head morphology in sperm. Further analysis of a large cohort of infertile men reveals CYLC1 variants in patients with sperm head deformities. The study provides valuable insights into the role of CYLC1 in male fertility and proposes CYLC1 variants as potential risk factors for human male infertility, emphasizing the importance of mouse models in understanding the pathogenicity of such variants.

      We appreciate the comprehensive summary of reviewer 1.

      Strengths:

      This article demonstrates notable strengths in various aspects. Firstly, the clarity and excellent writing style contribute to the accessibility of the content. Secondly, the employed techniques are not only relevant but also complementary, enhancing the robustness of the study. The precision in their experimental design and the meticulous interpretation of results reflect the scientific rigor maintained throughout the study. Furthermore, the decision to create a second mouse model with the exact CYLC1 mutation found in humans adds significant qualitative value to the research. This approach not only validates the clinical relevance of the identified variant but also strengthens the translational impact of the findings.

      We appreciate the positive comment of reviewer 1.

      Weaknesses:

      There are no obvious weaknesses. While a few minor refinements, as suggested in the recommendations to authors, could enhance the overall support for the data and the authors' messages, these suggested improvements in no way diminish the robustness of the already presented data.

      In the recommendations for the authors, reviewer 1 mentioned a recent study (Schneider et al., eLife, 2023) showing that Cylc1-KO mice exhibit a reduced sperm count, an observation not noted in our current study. We would like to comment that the main and most important phenotypes of Cylc1-KO mice in both studies are quite similar, including male subfertility and abnormal head morphology. We think the different targeting strategies and mouse strains may account for this discrepancy. Neither Schneider's study nor ours observed an abnormality in total motility in Cylc1-KO mice. We appreciate reviewer 1's suggestion to further examine detailed motility parameters such as VCL, VSL, and ALH. Because head deformation is the most obvious phenotype of Cylc1-KO mice and the focus of our study, we regret that this detailed analysis of sperm motility was not performed at the current stage. Reviewer 1 also asked whether Cylc1-KO female mice are fertile. Because Cylc1 is an X-linked gene and Cylc1-KO (Cylc1-/Y) males are severely subfertile, we could not obtain enough Cylc1-KO female mice to examine their fecundity. We would also like to thank reviewer 1 for pointing out several inaccurate descriptions.

      Reviewer #2 (Public Review):

      Summary:

      To verify the function of PT-associated protein CYLC1, the authors generated a Cylc1-KO mouse model and revealed that loss of cylicin-1 leads to severe male subfertility as a result of sperm head deformities and acrosome detachment. Then they also identified a CYLC1 variant by WES analysis from 19 infertile males with sperm head deformities. To prove the pathogenicity of the identified mutation site, they further generated Cylc1-mutant mice that carried a single amino acid change equivalent to the variant in human CYLC1. The Cylc1-mutant mice also exhibited male subfertility with detached acrosomes of sperm cells.

      We appreciate the comprehensive summary of reviewer 2.

      Strengths:

      The phenotypes observed in the Cylc1-KO mice provide strong evidence for the function of CYLC1 as a PT-associated protein in spermatogenesis and male infertility. Further mechanistic studies indicate that loss of cylicin-1 in mice may disrupt the connections between the inner acrosomal membrane and acroplaxome, leading to detached acrosomes of sperm cells.

      We appreciate the positive comment of reviewer 2.

      Weaknesses:

      The authors identified a missense mutation (c.1377G>T/p. K459N) from 19 infertile males with sperm head deformities. The information for the variant in Table 1 is insufficient to determine the pathogenicity and reliability of the mutation site. More information should be added, including all individuals in gnomAD, East Asians in gnomAD, 1000 Genomes Project for allele frequency in the human population; MutationTaster, M-CAP, FATHMM, and more other tools for function prediction. Then, the expression of CYLC1 in the spermatozoa from men with CYLC1 mutation should be explored by qPCR, Western blot, or IF staining analyses. Although 19 infertile males were found carrying the same missense mutation (c.1377G>T/p. K459N), their phenotypes are somewhat different. For example, sperm concentrations for individuals AAX765, BBA344, and 3086 are extremely low but this is not observed in other infertile males. Then, progressive motility for individuals AAT812, 3165, 3172, 3203, and 3209 are extremely low but this is also not observed in other infertile males. It is worth considering why different phenotypes are observed in probands carrying the same mutation.

      We appreciate the suggestion of reviewer 2. First, Table 1 shows information on the variant identified in the CYLC1 gene, including its allele frequency in gnomAD and functional predictions by SIFT, PolyPhen-2, and CADD. Because mutant mice are the gold standard for confirming the pathogenicity of a variant, we generated Cylc1-mutant mice, which exhibit male subfertility with sperm acrosome detachment. This animal evidence is much more solid than bioinformatic prediction for confirming the pathogenicity of the identified variant in the CYLC1 gene. Second, the expression of CYLC1 in spermatozoa from patients has been examined by IF staining (Fig. 5B). Unfortunately, the patients declined to continue in the project and donate more semen for qPCR and Western blot analyses. Third, reviewer 2 asks why not all patients with the CYLC1 mutation show an identical phenotype. Although some patients exhibit a low sperm count or reduced motility, sperm head deformity is the phenotype shared by all 19 patients. Many factors, such as lifestyle, may affect sperm quality, so a perfectly identical phenotype across all 19 patients carrying the CYLC1 mutation would be an unrealistic expectation in clinical diagnosis. We also appreciate the other suggestions from reviewer 2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We appreciate the insightful feedback provided by the editors and reviewers, who have recognized the novelty of our study. We have mapped the spatial distribution of six endogenous somatic histone H1 variants within the nuclei of several human cell lines using specific antibodies, which strongly suggests functional differences between variants. We are submitting a revised version of the manuscript that addresses the reviewers' comments and recommendations.

      Reviewer #1 (Recommendations For The Authors):

      Minor Comments:

      (1) In Figure 1C, since H1.4 is uniformly distributed among the four sections (A1-A4), differences in its levels among the four sections are not expected to be significant, yet they are depicted as such. Even the violin plots shown do not seem to differ significantly from each other. This requires an explanation.

      We agree with this reviewer that significant differences in H1.4 abundance among areas A1 to A4 do not seem to exist, judging from either the images or the violin plots, as discussed in the manuscript. Nonetheless, statistical testing found these differences significant because the large sample size (N) of the analysis renders even small differences significant. Clearly, H1.4 does not show a relevant peripheral enrichment like that shown for the other variants.
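      The point that a very large N can make a small difference statistically significant can be shown with a normal-approximation two-sample z-test. The numbers below (a 0.05 difference in mean intensity with a standard deviation of 1) are hypothetical and not taken from Figure 1C, and this test is not necessarily the one used in the analysis. The standardized effect size is identical in both calls, yet only the larger sample reaches p < 0.05.

```python
import math


def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized groups
    sharing standard deviation sd (normal approximation)."""
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    z = abs(mean_diff) / se
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


# Same hypothetical effect (0.05 SD), two sample sizes.
for n in (50, 5_000):
    print(n, round(two_sample_z_pvalue(0.05, 1.0, n), 4))
```

      Reporting an effect size alongside the p-value makes it easier to separate statistically significant from biologically relevant differences.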

      (2) At the end, it would be better to include a figure panel (chart, table, or schematic) summarizing the work done on all the histone variants, as several histone H1 variants are studied under different conditions and contexts.

      A table summarizing the location and characteristics of the different H1 variants has been included in the manuscript (Figure 6).

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors may consider adding controls for the specificity of the antibodies used for the studies. While the antibodies used here are commercial, it does not guarantee the quality for immunofluorescence, especially considering their unreliability in the past. The authors may consider including peptide/ recombinant protein-based adsorption controls in addition to knockdown or knockout controls. Having these data will strengthen the exciting observations presented in this MS and significantly increase the impact of the presented findings.

      We fully agree with the reviewer that the use of commercially available antibodies does not guarantee their quality and specificity. As this issue was crucial for our studies, we extensively assayed the performance and specificity of the antibodies using different approaches. The validations were shown in our previous publications, where these antibodies were successfully used for ChIP-seq (Serna-Pujol et al. 2022 NAR 50:3892; Salinas-Pena et al. 2024 NAR doi 10.1093/nar/gkae014). In summary, the performance of the H1.0 (05-629l, Millipore), H1.2 (ab4086, abcam), H1.4 (702876; Invitrogen), H1.5 (711912, Invitrogen) and H1X (ab31972; abcam) antibodies was tested by Western blot, ChIP and proteomic analyses (all the results are included in Supplem. Figure 1 in Serna-Pujol et al. 2022 NAR 50:3892). Specifically, we tested specificity using inducible KDs for the depletion of each of the somatic H1 variants in T47D. We also checked that the antibodies did not recognize additional H1 variants, using recombinant proteins or cell lines naturally lacking some of the variants. All the experiments confirmed that the antibodies were variant-specific. In addition, when the corresponding epitope was absent, the antibodies did not gain new cross-reactivity with other variants. More recently, the specificity of the H1.3 antibody (ab203948) was validated following the same experimental approaches described for the other antibodies (Supplem. Figure 1 in Salinas-Pena et al. 2024 NAR doi 10.1093/nar/gkae014).

      (2) Histone H1 is overexpressed in several cancers. While the authors do not use an overexpression strategy, the cells used in this study are all cancer cell lines. The study would benefit greatly if some of the findings- primarily regarding the spatial distribution of the H1 were to reproduce in non-tumorigenic, diploid cells.

      We have also studied and discussed the spatial distribution of H1 variants in the non-tumorigenic cell lines 293T and IMR-90, and we have added this to the revised manuscript (Figure 5D and Figure 5-figure supplement 3). The nuclear radiality of H1.4 in 293T cells is also shown (Figure 5-figure supplement 4A).

      Reviewer #3 (Recommendations For The Authors):

      This is an interesting paper that provides convincing evidence of distinct distributions to individual histone H1 variants. There are several aspects of the study that leave me unconvinced that the study accurately captures histone H1 variant distributions.

      (1) Antibody accessibility: (see PMID: 32505195). One means to address this is to express a fluorescent protein-tagged version of histone H1 and demonstrate that the antibody can detect that tagged version of histone H1 independent of its location in the nucleus. In general, these FP-tagged H1s show a much more even distribution than what is observed here. Of course, that could reflect artifacts related to the fusion or the expression of the exogenous construct. However, even if all of the above are true, this will test the ability of the antibodies to recognize their epitopes in different chromatin environments. The fluorescent protein tag enables unambiguous knowledge of the presence or absence of the H1 histone.

      We used cells expressing an HA-tagged H1.0 variant and performed immunofluorescence with HA and H1.0 antibodies to investigate co-localization and to test whether the H1 antibody recognizes the tagged protein in different chromatin environments, irrespective of its location in the nucleus. We found a very high correlation between the two antibodies (Figure 1-figure supplement 1B).

      (2) At high concentrations, the fluorescence signal intensity can be quenched. For example, this is common with high-affinity histone H3 serine 10 phosphorylation antibodies in late interphase/prophase nuclei. The artifact can be minimized by serial dilution of the antibody and identifying the minimum usable concentration for immunofluorescence. While I am not certain that this is taking place here, the rate and manner that the intensity drops off from the periphery in the peripheral H1 variant distribution are very similar in appearance. There are biological explanations related to constraints on diffusion that one could imagine also explaining the data so I'm not stating that this must be an artefact. However, I am concerned that it might be. An improved staining may reveal the same result but more convincingly.

      We have performed immunofluorescence with serial dilutions of the H1.3 antibody to show that the peripheral distribution was not due to quenching of the fluorescence signal (Figure 1-figure supplement 1A).

      (3) Histone H1 is highly mobile and there is some concern that they could reorganize during the relatively long period of time that it takes to fully fix the cells for both ChIP and immunofluorescence. This should be acknowledged in the manuscript.

      We have added this reviewer's concern to the Discussion section.

      (4) The paper would benefit from a more rigorous quantification of histone H1 subtypes. Mass spectrometry would be ideal, but more classical techniques such as 2D AU-SDS PAGE, HPLC, etc. would be an improvement over immunoblotting. The authors did not explain the quantification of the immunoblots or the assignment of relative contributions of H1 subtypes to the individual Coomassie bands in the ImageJ section of the Methods, which is cited as the method of quantification in the immunoblotting methods.

      We have further explained how the relative quantification of H1 variants in different cell lines was performed (Methods section). We agree that more sophisticated mass spectrometry-based quantification is desirable, and we are collaborating to do this using internal H1 peptide controls (Parallel Reaction Monitoring), but this is beyond the scope of this manuscript, as the observed distribution patterns of H1 variants do not depend on mild differences in variant abundance. Only the absence of H1.3 and H1.5 in some cell lines alters the distribution of other variants.

      Additional author responses to the Public Review comments made by some reviewers:

      (1) With respect to the functional significance of the results presented here, we want to stress that, as a consequence of the differential distribution and abundance of H1 variants among cell types, depletion of different variants has different consequences. For example, H1.2 depletion, but not depletion of other variants, has a great impact on chromatin compaction. Besides, cell lines lacking H1.3/H1.5 expression show basal up-regulation of some interferon-stimulated genes (ISGs) and particular repetitive elements, as previously described upon induced depletion of H1.2/H1.4 in a breast cancer cell line or in pancreatic adenocarcinomas with lower levels of replication-dependent H1 variants (Izquierdo et al. 2017 NAR 45:11622). Thus, our results reinforce the existing link between H1 content and immune signature. We have added these data to the revised manuscript (Figure 5-figure supplement 5).

      Moreover, we also analyzed the chromatin structural changes upon combined depletion of H1.2 and H1.4. Combined H1.2/H1.4 depletion triggers global chromatin decompaction, which supports previous observations from ATAC-Seq and Hi-C experiments in these cells (Izquierdo et al. 2017 NAR 45:11622; Serna-Pujol et al. 2022 NAR 50:3892). Although H1 content is more compromised in these cells (30% total H1 reduction) than in single H1 KDs, the phenotype could not be recapitulated with other H1 KD combinations in which total H1 content was similarly reduced (Izquierdo et al. 2017 NAR 45:11622), supporting the idea that the deleterious defects were due to the non-redundant roles of H1.2 and H1.4. Indeed, this manuscript supports this notion, as H1.2 and H1.4 show different genome-wide and nuclear distributions.

      (2) Our immunofluorescence data, together with ChIP-seq data, do not discard binding of H1 variants to a great variety of chromatin, but show enrichment or preferential binding to certain regions or chromatin types. Our data on the interphase nuclei does not suggest at all any type of quenching or saturation. Obviously, detection with antibodies depends on epitope accessibility, just like all immunofluorescence data ever published, and we have acknowledged that post-translational modifications of H1 may occlude antibody accessibility as some phospho-H1 antibodies give distribution patterns different than total/unmodified H1 antibodies. Thus, we cannot exclude that specific modified-H1s exhibit particular distribution patterns that are not being recapitulated in our data. This represents another layer of complexity in H1 diversity and we agree that exploration of the repertoire of H1 PTMs and their functional roles are an interesting matter of study that needs to be addressed. Still, our data is highly relevant as it demonstrates for the first time the unique distribution patterns of H1 variants among multiple cell lines and it does not use overexpression of tagged H1 variants that in our experience may produce mislocalization of H1s.

      (3) We have investigated co-localization of H1 variants with the HP1alpha protein and have added these data to the revised version of this manuscript (Figure 1-figure supplement 1C-D).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      The authors report a novel hepatic lncRNA FincoR regulating FXR with therapeutic implications in the treatment of MASH. The findings are important and use an appropriate methodology in line with the current state-of-the-art, with convincing support for the claims.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the article titled "Hammerhead-type FXR agonists induce an eRNA FincoR that ameliorates nonalcoholic steatohepatitis in mice," the authors explore the role of the Farnesoid X Receptor (FXR) in treating metabolic disorders like NASH. They identify a new liver-specific long non-coding RNA (lncRNA), FincoR, regulated by FXR, notably induced by agonists such as tropifexor. The study shows that FincoR plays a significant role in enhancing the efficacy of tropifexor in mitigating liver fibrosis and inflammation associated with NASH, suggesting its potential as a novel therapeutic target. The study makes a promising contribution to understanding the role of FincoR in alleviating liver fibrosis in NASH, providing initial insights into the mechanisms involved. While it offers a valuable starting point, there is potential for further exploration into the functional roles of FincoR and their specific actions in human NASH cases. Building upon the current findings to elucidate more detailed mechanistic pathways through which FincoR exerts its therapeutic effects in liver disease would elevate the research's significance and potential impact in the field.

      Strengths:

      This study stands out for its comprehensive and unbiased approach to investigating the role of FincoR, a liver-specific lncRNA, in the treatment of NASH. Key strengths include: 1) The application of advanced sequencing methods like GRO-seq and RNA-seq offered a comprehensive and unbiased view of the transcriptional changes induced by tropifexor, particularly highlighting the role of FincoR. 2) Utilizing a genetic mouse model of FXR KO and a FincoR liver-specific knockdown (FincoR-LKD) mouse model provided a controlled and relevant environment for studying NASH, allowing for precise assessment of tropifexor's therapeutic effects. 3) The inclusion of tropifexor, an FDA-approved FXR agonist, adds significant clinical relevance to the study. It bridges the gap between experimental research and potential therapeutic application, providing a direct pathway for translating these findings into real-world clinical benefits for NASH patients. 4) The study's rigorous experimental design, incorporating both negative and positive controls, ensured that the results were specifically attributable to the action of FincoR and tropifexor.

      Weaknesses:

      The study presents several notable weaknesses that could be addressed to strengthen its findings and conclusions: 1) The authors focus on FincoR, but do not extensively test other lncRNAs identified in Figure 1A. A more comprehensive approach, such as rescue experiments with these lncRNAs, would provide a better understanding of whether similar roles are played by other lncRNAs in mitigating NASH. 2) FincoR was chosen for further study primarily because it is the most upregulated lncRNA induced by GW4064. Including another GW4064-induced lncRNA as a control in functional studies would strengthen the argument for FincoR's unique role in NASH. 3) The study does not conclusively demonstrate whether FincoR is specifically expressed in hepatocytes or other liver cell types. Conducting FincoR RNA-FISH with immunofluorescent experiments or RT-PCR, using markers for different liver cell types, would clarify its expression profile. 4) Understanding the absolute copy number of FincoR is crucial. Determining whether there are sufficient copies of FincoR to function as proposed would lend more credibility to its suggested role. 5) The manuscript, although technically proficient, does not thoroughly address the relevance of these findings to human NASH. Questions like the conservation of FincoR in humans and its potential role in human NASH should be discussed.

      Reviewer #2 (Public Review):

      Summary:

      Nonalcoholic steatohepatitis (NASH), recently renamed metabolic dysfunction-associated steatohepatitis (MASH), is a leading cause of liver-related death. Farnesoid X receptor (FXR) is a promising drug target for treating NASH, and several drugs targeting FXR are under clinical investigation for their efficacy in treating NASH. The authors intended to address whether FXR mediates its hepatic protective effects through the regulation of lncRNAs, which would provide novel insights into the pharmacological targeting of FXR for NASH treatment. The authors went from unbiased transcriptomic profiling to identify a novel enhancer-derived lncRNA, FincoR, enriched in the liver, and showed that the knockdown of FincoR in a murine NASH model attenuated part of the effect of tropifexor, an FXR agonist, namely on inflammation and fibrosis, but not on steatosis. This study provides a framework for how one can investigate the role of noncoding genes in pharmacological interventions targeting known protein-coding genes. Given that many disease-associated genetic variants are located in non-coding regions, this study, together with others, may provide useful information for improved and individualized treatment of metabolic disorders.

      Strengths:

      The study leverages both transcriptional profiles and epigenetic signatures to identify the top candidate eRNA for further study. The subsequent biochemical characterization of FincoR using FXR-KO mice combined with GRO-seq and luciferase reporter assays convincingly demonstrates that this eRNA is an FXR transcriptional target sensitive to FXR agonists. The use of cultured cells in vitro and the in vivo mouse model of NASH provides a multi-level evaluation of the context-dependent importance of FincoR downstream of FXR in the regulation of functions related to liver dysfunction.

      Weaknesses:

      As discussed, future work to dissect the mechanisms by which FincoR facilitates the action of FXR and its agonists is warranted. It would be helpful if the authors could draw on the current understanding of eRNA modes of action and the observed biochemical features of FincoR to speculate on potential molecular mechanisms explaining the observed functional phenotype. It is unclear whether this eRNA is conserved in humans in any way; such conservation would establish relevance to human disease. Additionally, the eRNA knockdown was achieved by deletion of a region upstream of the eRNA transcript. A more direct approach to altering eRNA levels, e.g., overexpression of FincoR in the liver, would provide important data for interpreting its functional regulation.

      We thank the Editor and Reviewers for their constructive comments. We believe we have addressed all of the issues (detailed below) and the revisions have greatly strengthened the manuscript.

      Reviewer 1:

      The study presents several notable weaknesses that could be addressed to strengthen its findings and conclusions:

      (1) The authors focus on FincoR, but do not extensively test other lncRNAs identified in Figure 1A. A more comprehensive approach, such as rescue experiments with these lncRNAs, would provide a better understanding of whether similar roles are played by other lncRNAs in mitigating NASH.

      (2) FincoR was chosen for further study primarily because it is the most upregulated lncRNA induced by GW4064. Including another GW4064-induced lncRNA as a control in functional studies would strengthen the argument for FincoR's unique role in NASH.

      (3) The study does not conclusively demonstrate whether FincoR is specifically expressed in hepatocytes or other liver cell types. Conducting FincoR RNA-FISH with immunofluorescent experiments or RT-PCR, using markers for different liver cell types, would clarify its expression profile.

      (4) Understanding the absolute copy number of FincoR is crucial. Determining whether there are sufficient copies of FincoR to function as proposed would lend more credibility to its suggested role.

      Response to 1 - 4): We thank Reviewer 1 for the positive comments on the strength of our work, including the open-ended approach, the novel eRNA FincoR and its strong relevance to liver disease. We also value the constructive feedback provided by the reviewer and agree that additional studies are important to fully understand the mechanisms of FincoR and the functional significance of other FXR-induced lncRNAs. In this manuscript we report the discovery and initial characterization of FincoR, as well as its potential function in FXR action in response to hammerhead agonists, but a number of interesting questions are raised. Future experiments, as suggested by the reviewer, will be needed to examine the role of other FXR-induced lncRNAs, the potential role of FincoR induction by other nuclear receptors with binding sites at FincoR, whether FincoR is expressed in liver cell types in addition to hepatocytes, and the expression abundance of FincoR. These are all excellent suggestions for future experimentation which we feel are beyond the scope of the present report. For example, generating a CRISPR/Cas9 genetic model for another lncRNA is not trivial, as it takes a significant amount of work with murine models. Also, we do not mean to exclude the possibility that other lncRNAs induced by FXR are also functional. Technically, a rescue experiment is not possible because FincoR RNA can potentially be very long (~10 kb if estimated from the RNA-seq pattern in Fig. 1C), and it is currently not feasible to express it from exogenous vectors at levels similar to endogenous ones. We therefore consider that these important questions are more suitable for future work to fully address. Our belief is that a comprehensive exploration of FXR-regulated lncRNAs holds the potential to unveil novel insights crucial for the development of therapies targeting NASH and other metabolic diseases. The study of FincoR is the beginning of this area of research.

      (5) The manuscript, although technically proficient, does not thoroughly address the relevance of these findings to human NASH. Questions like the conservation of FincoR in humans and its potential role in human NASH should be discussed.

      Response: These are important questions. To respond to the reviewer’s comment, new experiments are presented in our final revised manuscript in which we utilized mouse models of NAFLD/NASH and cholestatic liver injury to determine FincoR’s role in these diseases. Hepatic FincoR levels were significantly increased in mice fed a high-fat diet (HFD) for 12 weeks (Supplementary Figure S1A) and in mice fed an HFD with high fructose in drinking water (HFHF) for 12 weeks (Supplementary Figure S1B). Elevated hepatic FincoR levels were also observed in mice treated with α-naphthylisothiocyanate (ANIT), a chemical inducer of liver cholestasis (Supplementary Figure S1C), and in mice with bile duct ligation (BDL), a surgical method to induce cholestatic liver injury (Supplementary Figure S1D).

      In terms of human relevance, we have provided additional information and figures showing that there is sequence similarity between mouse FincoR and a human locus. The FincoR sequence is moderately conserved between mice and humans, as displayed in the UCSC genome browser (Supplementary Figure S1E). Annotation of these conserved human sequences revealed that they overlap with a functionally uncharacterized human lncRNA, XR_007061585.1 (Supplementary Figure S1F). Further, we conducted qRT-PCR experiments on RNA samples from human patients, which demonstrated that hepatic lncRNA XR_007061585.1 levels are elevated in patients with NAFLD and PBC, but not in patients with severe NASH-fibrosis (Supplementary Figure S1G, H). These results demonstrate that hepatic levels of a potential human analog of FincoR are elevated in NAFLD and PBC patients, which is consistent with FincoR’s upregulation in mouse models of chronic liver disease with hepatic inflammation and liver injury. Whether human lncRNA XR_007061585.1 is entirely analogous to mouse FincoR in terms of functions and mechanisms, and whether the elevation of this human lncRNA has a role in liver disease progression or is an adaptive response to liver injury, remain to be determined.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the introduction Line 96, "..., while the vast majority are transcribed into ncRNAs" may not be accurate. Please refer to Ponting and Haerty, Annu Rev 2022, for a related discussion.

      Response: We would like to thank the reviewer for pointing out this inaccurate statement in the introduction. We have revised the text to read: “While a significant portion of the genome was initially thought to be "junk DNA", it has been established that many non-coding regions give rise to functional non-coding RNAs.”

      (2) Figure 5: the authors should provide a clear illustration demonstrating the sequence targeted by the sgRNA in relation to the transcriptional and epigenetic profile (i.e., RNA-seq and H3K27ac ChIP-seq data).

      Response: The illustration (Figure 5-figure supplement 1A, right panel) demonstrating the sequence targeted by the sgRNA has been updated as suggested by the reviewer.

      In this model, the region upstream of FincoR is deleted, leading to the inhibition of FincoR transcription. Does the deleted region include FXR binding sites? If so, could the phenotype be due to the deletion of these binding sequences, rather than to the decreased FincoR transcripts? Accordingly, this limitation or alternative interpretation should be discussed.

      Response: The reviewer made a good point. The deleted region includes FXR binding sites, so we cannot rule out that decreased FXR binding or decreased transcription of the region per se, in addition to decreased FincoR levels, contributes to the phenotypic changes we observed. In the final revision, we have added a discussion of this alternative (6th paragraph of the revised Discussion section).

      (3) Figure 6C: the images should be accompanied by quantification. The FincoR-KD mice appear to show a visible difference compared with tropifexor-treated control mice, which does not entirely match what is written in the results.

      Response: Oil Red O staining has been quantified as suggested by the reviewer (Figure 6C). The result is consistent with the triglyceride data: tropifexor treatment markedly reduced neutral lipids, as determined by Oil Red O staining of liver sections (Figure 6C), and liver TG levels (Figure 6D), and these beneficial effects on reducing fatty liver were not altered by FincoR knockdown.

      (4) Figure 7, does AST show the same pattern as ALT? As indicated from Line 335, "tropifexor treatment reduced mRNA levels of several genes that promote fibrosis (Col1a1, Col1a2, ...)". Fig. 7D does not seem to match the description of Col1a1. Authors may need to modify the results.

      Response: AST has been measured and has the same pattern as ALT. The new data have been added to Figure 7B. Col1a1 expression has been re-measured and the results have been updated in Figure 7D.

      (5) Is FincoR level reduced in NASH conditions?

      Response: We thank the Reviewer for this question. We have now added new data examining the levels of FincoR in mouse liver disease models, and we also examined the levels of a potential human analog of FincoR in human liver specimens from PBC, NAFLD, and NASH patients. Please see our new data and description above in the response to comment 5 by Reviewer 1 (most data are now included in the new Supplementary Figure S1).

      (6) Please provide information on the conservation of FincoR (DNA and RNA) in humans. This would be important to provide the human disease relevance.

      Response: As described above in the response to comment 5 of Reviewer 1, a human locus shows sequence similarity to mouse FincoR, and this conserved region contains an annotated, uncharacterized human lncRNA. We also examined the levels of this human homolog in diseased human liver samples. Our new results demonstrate that hepatic levels of a potential human analog of FincoR are elevated in NAFLD and PBC patients, which is consistent with FincoR’s upregulation in mouse models of chronic liver disease with hepatic inflammation and liver injury. Whether human lncRNA XR_007061585.1 is entirely analogous to mouse FincoR in terms of functions and mechanisms, and whether the elevation of this human lncRNA has a role in liver disease progression or is an adaptive response to liver injury, remain to be determined.

      (7) Several discussion points for the authors' consideration:

      (7.1) human-mouse conservation as alluded to in #6;

      Response: Potential human-mouse conservation is discussed with new data in the last paragraph of the Results section.

      (7.2) potential molecular mechanism involved in FincoR-regulated hepatocyte function;

      Response: We thank the Reviewer for this comment. We have added more discussion, as shown below: “RNA inside cells usually associates with different RNA-binding proteins (RBPs). To predict potential FincoR-binding proteins, we performed additional bioinformatic analysis, which identified proteins that potentially bind FincoR, including KHDRBS1, RBM38, YBX2 and YBX3 (Supplemental Table S5). These findings and the potential functions of the binding proteins are discussed in the 5th paragraph of the Discussion section in the final revised manuscript. Whether these predicted RBPs interact with FincoR, and the underlying mechanisms, will need to be investigated in future experiments to understand the mechanisms involved in FincoR-regulated hepatocyte function.”

      (7.3) any disease-associated SNPs in the FincoR locus.

      Response: No SNPs were noted in the annotation of the human locus with sequence similarity to mouse FincoR in the NCBI Genome Data Viewer.

      (7.4) the in vitro induction of FincoR is transient but in vivo this occurs after 12 days of drug treatment. How do the authors reconcile the differential induction patterns?

      Response: To clarify, the induction of FincoR after a single dose of GW4064 in vivo was transient, peaked within 1 h and then declined gradually (Figure 1-figure Supplement 1C). In the tropifexor treatment protocol (also in vivo), the mice were treated daily with tropifexor for 12 days so that the multiple doses maintained FincoR induction. The beneficial effect of tropifexor by inducing FincoR, therefore, accumulated over the 12 days.

      It is worth noting that we failed to see induction of FincoR in isolated primary mouse hepatocytes treated with GW4064 in vitro. We could detect FincoR only in primary hepatocytes isolated from the livers of GW4064-treated mice. This may be due to the loss of key factors mediating FincoR induction in the cultured primary hepatocytes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript describes valuable information on how the extraocular muscles (EOM) are preserved in a mouse model of familial amyotrophic lateral sclerosis (ALS) that carries a G93A mutation in the Sod1 gene. The authors provide convincing evidence that the integrity of the neuromuscular junction is preserved in EOM but not in limb and diaphragm muscles of G93A mice. Overall, this interesting work provides new evidence regarding the etiopathogenesis of ALS and insights for the development of therapeutic targets to slow the loss of neuromuscular function in ALS.

      Public Reviews:

      Reviewer#1 (Public Review):

      Summary:

      The study explores the mechanisms that preserve satellite cell function in extraocular muscles (EOMs) in a mouse model of familial amyotrophic lateral sclerosis (ALS) that carries the G93A mutation in the Sod1 gene. ALS is a fatal neuromuscular disorder driven by motor neuron degeneration, leading to progressive wasting of most skeletal muscles but not EOM. The study first established that the integrity of the neuromuscular junction (NMJ) is preserved in EOM but not in limb and diaphragm muscles of G93A mice, and that sodium butyrate (NaBu) treatment partially improves NMJ integrity in limb and diaphragm muscles of G93A mice. They also found a loss of synaptic satellite cells and of the renewability of cultured myoblasts in hindlimb and diaphragm muscles of G93A mice, but not in EOM, and NaBu treatment restores myoblast renewability. Using RNA-seq analysis, they identify that axon guidance molecules, particularly Cxcl12, are highly expressed in EOM myoblasts, along with more sustainable renewability. Using a neuromuscular co-culture model, they convincingly show that AAV-mediated Cxcl12 expression in G93A myotubes enhances motor axon extension and innervation. Strikingly, NaBu-mediated preservation of NMJ in limb muscles of G93A mice is associated with elevated expression of Cxcl12 in satellite cells and improved renewability of myoblasts. Together, these results offer molecular insights into genes critical for maintaining satellite cell function and reveal a mechanism through which NaBu ameliorates ALS.

      Strengths:

      Combination of in vivo and cell culture models. Nice imaging of NMJ and associated satellite cells. Using motoneuron-myotube coculture to establish the mechanism. Tested and illustrated a mechanism through which a clinically used drug ameliorates ALS.

      Weaknesses:

      Data presentation could be improved (see details in the Recommendation for Authors).

      It would have been nice to have included G93A motoneurons in the coculture study.

      This is indeed planned for our future studies. In the revised version, we discuss the limitation of not including G93A motor neurons in the coculture assay. (Page 11, Line 445-448)

      “However, it is possible that motor neurons carrying ALS mutations will respond differently to Cxcl12-mediated axon guidance than WT motor neurons. This is a limitation of the current study, which will be investigated in future co-culture studies.”

      Reviewer #2 (Public Review):

      Summary:

      The work is potentially interesting as it outlines the role of satellite cells in supporting the functional decline of skeletal muscle due to the denervation process. In this context the authors analyze the functional and molecular characteristics of satellite cells in different muscle types differently affected by the degenerative process in the ALS model.

      Strengths:

      The work illustrates a relevant aspect of the differences in stem cell potential in different skeletal muscles in a mouse model of the disease through a considerable amount of data and experimental models.

      Weaknesses:

      However, there are some criticisms of the structuring of the results:

      It is not clear how many animals were used in each experimental group (Figs 1 and 2, Fig. 2-9). In particular, it is unclear whether the dots in the histograms represent biological or technical replicates. Furthermore, the gender used in experimental groups is never specified. This last point appears to be important considering the gender differences observed in the SOD1G93A mouse model.

      The original quantification data and mouse gender specifications were listed in the corresponding supplementary tables. We have now added the gender specification and number of mice used to all corresponding figure legends. The number of mice used for sorting SCs from different muscles is also specified in the Methods section of the revised manuscript. (Page 12, Line 489-493).

      We also added one more supplementary figure (Figure 1-figure supplement 2) to compare the innervation status between male and female mice. The following description has been added in the updated manuscript (Page 3-4; Line 125-130):

      “The data shown in Figure 1B have also been replotted to compare the innervation status between male and female mice (Figure 1-figure supplement 2). In terms of well- or partially-innervated ratios, no significant gender difference was observed under our experimental conditions, in which the muscle samples were collected at the end stage of the disease, although there is a marginally lower “poorly innervated ratio” in the EDL muscle of G93A female mice compared with G93A male mice.”

      However, we acknowledge that the current study is limited in its ability to fully detect cross-gender differences due to the low “n” numbers per gender. We hope this is understandable, as we had to split the limited resources of ALS G93A mice among different kinds of experiments in this pioneering study, including NMJ integrity assessment, peri-nuclear SC abundance assessment, whole-muscle qPCR, cell sorting for imaging, cell sorting for RNA-Seq, cell sorting for qPCR, and cell sorting for neuromuscular co-culture. We do intend to gradually build up “n” numbers for the characterization of cross-gender differences in our ongoing studies.

      As to what the dots in each plot represent, we have inserted the description in each relevant figure legend as detailed below:

      For Fig 1, each dot represents quantification result from a single mouse. Please see Figure 1-figure supplement 1, Figure 1-figure supplement 2 and Figure 1-table supplement 1 for NMJs measured per muscle type per gender. Briefly, EDL, soleus and diaphragm muscles were from 4 male and 6 female mice per group; WT EOM group was from 4 male and 4 female mice; G93A EOM group was from 3 male and 4 female mice; G93A EOM with NaBu feeding group was from 6 female mice.

      For Fig 2, each dot represents quantification result from a single mouse. Please see Figure 2-table supplement 1 for NMJs measured per muscle type per gender. Briefly, WT EDL group was from 2 male and 2 female mice; G93A EDL group was from 3 male and 3 female mice; G93A EDL with NaBu feeding group was from 2 male and 4 female mice; WT soleus group was from 2 male and 3 female mice; G93A soleus group was from 3 male and 2 female mice; G93A soleus with NaBu feeding group was from 1 male and 4 female mice; WT diaphragm group was from 1 male and 4 female mice; G93A diaphragm group was from 1 male and 4 female mice; G93A diaphragm with NaBu feeding group was from 4 female mice; WT EOM group was from 1 male and 3 female mice; G93A EOM group was from 5 female mice; G93A EOM with NaBu feeding group was from 1 male and 3 female mice.

      For Fig 3, each dot in the box-and-dot plots represents result from one round of sorting. WT HL SCs were from 8 male and 6 female mice; G93A HL SCs were from 9 male and 5 female mice; WT diaphragm SCs were from 6 male and 3 female mice; G93A diaphragm SCs were from 12 male and 5 female mice. WT EOM SCs were from 6 batches of male and 1 batch of female mice (each batch contains 5-6 mice of the same gender). G93A EOM SCs were from 5 batches of male and 2 batches of female mice.

      *Please note that these results were from sorting rounds in which the FACS profiles were recorded. The FACS profile was not recorded for every round of sorting.

      For Fig 4A, each dot in the box-and-dot plots represents one image analyzed. For WT HL SCs, 94 images from 3 rounds of sorting; For WT Dia SCs, 107 images from 3 rounds of sorting; For WT EOM SCs, 75 images from 3 rounds of sorting; For G93A HL SCs, 96 images from 3 rounds of sorting; For G93A Dia SCs, 62 images from 3 rounds of sorting; For G93A EOM SCs, 79 images from 3 rounds of sorting. For the 3 rounds of sorting, 1 was from male and 2 were from female mice.

      *Please note that the number of mice used for sorting SCs from different muscles is specified in the Methods section of the revised manuscript. (Page 12, Line 489-493)

      For Fig 4B, each dot in the box-and-dot plots represents one image analyzed. For WT HL SCs, 52 images from 3 rounds of sorting; For WT Dia SCs, 51 images from 3 rounds of sorting; For WT EOM SCs, 51 images from 3 rounds of sorting; For G93A HL SCs, 52 images from 3 rounds of sorting; For G93A Dia SCs, 47 images from 3 rounds of sorting; For G93A EOM SCs, 56 images from 3 rounds of sorting. For the 3 rounds of sorting, 1 was from male and 2 were from female mice.

      For Fig 5A, each dot in the box-and-dot plots represents one replicate of culture. HL SCs were from male mice.

      For Fig 5B, each dot in the box-and-dot plots represents one image analyzed. For G93A HL SCs, 52 images from 3 rounds of sorting; 1-day NaBu treatment, 45 images from 3 rounds of sorting; 3-day NaBu treatment, 51 images from 3 rounds of sorting; For G93A Dia SCs, 47 images from 3 rounds of sorting; 1-day NaBu treatment, 60 images from 3 rounds of sorting; 3-day NaBu treatment, 57 images from 3 rounds of sorting. For the 3 rounds of sorting, 2 were from male and 1 was from female mice.

      For Fig 6, all samples used for bulk RNA-Seq were from female mice.

      For Fig 7C, each dot in the box-and-dot plots represents one replicate of culture. RNA samples were collected from 3-6 rounds of sorting and sorted cells were seeded into 3 dishes as replicates. WT HL SCs were from 3 male and 1 female mice. WT diaphragm SCs were from 2 male and 2 female mice; WT EOM SCs were from 3 male mice; G93A HL SCs were from 4 male and 2 female mice. G93A diaphragm SCs were from 1 male and 3 female mice; G93A EOM SCs were from 3 male mice.

      For Fig 7D, each dot in the box-and-dot plots represents one replicate of culture. RNA samples were collected from 6 rounds of sorting and sorted cells were seeded into 3 dishes as replicates. G93A HL SCs were from 4 male and 2 female mice; G93A diaphragm SCs were from 2 male and 4 female mice.

      For Fig 8D, each dot in the box-and-dot plot represents one neurite measured. HL and EOM SCs used for co-culture experiments were all from male mice.

      For Fig 9D, each dot in the box-and-dot plot represents one image analyzed. HL and EOM SCs used for co-culture experiments were all from male mice.

      For Figure 1-figure supplement 1, each dot in the box-and-dot plots represents quantification result from one mouse. Please also see Figure 1-table supplement 2. Briefly, muscles in WT and G93A groups were from 3 male and 3 female mice per group; G93A EDL with NaBu feeding group was from 3 male and 3 female mice. G93A soleus with NaBu feeding group was from 2 male and 3 female mice; G93A diaphragm with NaBu feeding group was from 2 male and 4 female mice; G93A EOM with NaBu feeding group was from 4 male and 2 female mice.

      The first paragraph of the results lacks a functional analysis of the motor decline of the animals after the administration of sodium butyrate. The authors, in fact, administered NaBu around 90 days of age while in previous work the drug had been administered at a pre-symptomatic age. It would therefore be useful, to make the message more effective, to characterize the locomotor functions of the treated animals in parallel with the histological evidence of the integrity of the NMJ.

      We are still in the process of collecting locomotor function data for G93A mice with and without NaBu treatment. We plan to report them in a future manuscript, as this manuscript focuses on the molecular and histological aspects. Additionally, in the revised manuscript, we revised the rationale for starting NaBu treatment after disease onset. (Page 4, Line 131-134)

      “In the previous study, NaBu treatment initiated at a pre-symptomatic age delayed disease progression in G93A mice. As treatment of ALS patients is initiated after symptoms appear, we further tested whether NaBu treatment started after disease onset (at the age of 3 months, 2% NaBu in water for 1 month) was effective in preserving NMJ integrity.”

      Figure 5 should be completed with the administration of NaBu also to satellite cells isolated from WT mice; the same applies to Figure 9, where AAV-CMV-Cxcl12 transduction of WT myotubes is missing.

      We appreciate the reviewer’s suggestion of conducting the additional experiment with AAV delivery of CXCL12 into myotubes derived from WT mice. Extensive studies by other investigators have been performed with butyrate on satellite cells derived from WT mice. To name a few: Fiszman et al., 1980 (DOI: 10.1016/0014-4827(80)90467-X); Johnston et al., 1992 (DOI: 10.1128/mcb.12.11.5123-5130.1992); Lezzi et al., 2002 (DOI: 10.1073/pnas.112218599). To avoid performing redundant experiments, we focus on the effect of butyrate on the proliferation and differentiation of SCs derived from G93A mice. Following the reviewer’s comment, we added further discussion in the Results section (Page 6, line 216-217). Regarding the effect of Cxcl12, published studies have demonstrated its role in promoting axon growth. To name a few: Negro et al., 2017 (DOI: 10.15252/emmm.201607257); Lieberam et al., 2005 (DOI: 10.1016/j.neuron.2005.08.011); Whitman et al., 2018 (DOI: 10.1167/iovs.18-25190). (Page 10, line 434, 440-442).

      In the experiment illustrated in Figure 8, treating the cell cultures with NaBu would improve the outcome, and interfering with Cxcl12 expression in myotubes derived from G93A EOM SCs (Fig. 9) would strengthen the evidence for the specificity of this protein in axon guidance at the NMJ of a muscle spared in ALS.

      This is a great suggestion. Our study demonstrated that overexpression of CXCL12 in G93A myotubes can enhance the axonal guidance and innervation of co-cultured myotubes and motor neurons. We have also demonstrated that NaBu treatment can enhance the expression of CXCL12 and slow ALS progression. Combining NaBu treatment with CXCL12 overexpression may indeed have additive therapeutic benefits in slowing ALS progression. We have added this statement to the revised Discussion. (Page 11, Line 466-468)

      In the "materials and methods" section the paragraph relating to the methods used for statistical analysis is missing.

      We have added it accordingly. (Page 15, Line 631-636)

      Reviewer #3 (Public Review):

      Summary:

      In their paper, Li et al. investigate the transcriptome of satellite cells obtained from different muscle types including hindlimb, diaphragm, and extraocular muscles (EOM) from wild-type and G93A transgenic mice (end-stage ALS) in order to identify potential factors involved in the maintenance of the neuromuscular junction. The underlying hypothesis is that since EOMs are largely spared from this debilitating disease, they may secrete NMJ-protective factors. The results of their transcriptome analysis identified several axon guidance molecules including the chemokine Cxcl12, which are particularly enriched in EOM-derived satellite cells. Transduction of hindlimb-derived satellite cells with AAV encoding Cxcl12 reverted hindlimb-derived myotubes from the G93A mice into myotubes sharing phenotypic characteristics similar to those of EOM-derived satellite cells. Additionally, the authors were able to demonstrate that EOM-derived satellite cell myotube cultures are capable of enhancing axon extensions and innervation in co-culture experiments.

      Strengths:

      The strength of the paper is that the authors successfully isolated and purified different populations of satellite cells, compared their transcriptomes, identified specific factors released by EOM-derived satellite cells, and overexpressed one of these factors (the chemokine Cxcl12) by AAV-mediated transduction of hindlimb-derived satellite cells. The transduced cells were then able to support axon guidance and NMJ integrity. They also show that administration of Na butyrate to mice decreased NMJ denervation and satellite cell depletion in hindlimb muscles. Furthermore, the addition of Na butyrate to hindlimb-derived satellite cell myotube cultures increased Cxcl12 expression. These are impressive results providing important insights for the development of therapeutic targets to slow the loss of neuromuscular function characterizing ALS.

      Weaknesses:

      Several important aspects have not been addressed by the authors. These include the following points, which weaken the conclusions and interpretation of the results.

      (a) Na butyrate was shown to extend the survival of G93A mice by Zhang et al. Na butyrate has a variety of biological effects: it is anti-inflammatory, inhibits mitochondrial oxidative stress, positively influences mitochondrial function, and is a class I/II HDAC inhibitor, among others. What is the mechanism underlying its beneficial effects, both on mouse muscle function in the ALS G93A mice and in the in vitro myotube assay? Cytokine levels as well as histone acetylation/methylation can be assessed experimentally, and this important point has not been appropriately investigated.

      We thank the reviewer for this great suggestion.

      Our previous publications (DOI: 10.3390/biom12020333; DOI: 10.3390/ijms22147412) have shown the beneficial roles of NaBu in ameliorating mitochondrial function in both motor neuron-like cells and adult muscle fibers. A focus of the current study is to test whether NaBu treatment also affects SCs by regulating their gene transcription. Potential effects on HDAC activity and acetylation have been addressed in previous studies by other investigators. We have added these references to the Discussion (Page 11, line 466-468).

      (b) In the context of satellite cell characterization, on lines 151-152 the authors state that soleus muscles were excluded from further studies since they have a higher content of slow twitch fibers and are more similar to the diaphragm. This justification is not valid in the context of ALS as well as many other muscle disorders. Indeed, soleus and diaphragm muscles contain a high proportion of slow twitch fibers (up to 80% and 50% respectively) but soleus muscles are more spared than diaphragm muscles. What makes soleus muscles (and EOMs) more resistant to ALS NMJ injury? Satellite cells from soleus muscles need to be characterized in detail as well.

      We agree with the reviewer’s comment that our original statement is misleading regarding the difference between soleus and diaphragm muscles in terms of the content of slow twitch fibers. Our histological studies revealed similar defects in denervation of diaphragm and soleus muscles derived from the G93A mice. Most importantly, the degree of NMJ degeneration and atrophy is less severe in soleus compared to other hindlimb muscles, such as EDL, during ALS progression. We have cited related studies such as Valdez et al., 2012 (DOI: 10.1371/journal.pone.0034640), Atkin et al., 2005 (DOI: 10.1016/j.nmd.2005.02.005). To avoid any confusion, we have removed the original statement and revised the paragraph (Page 4, line 159-162).

      “The three groups were chosen because they represent the most severely affected, moderately affected and least affected muscles in ALS progression, respectively. Soleus was not included in the hindlimb SC pool because it is less affected than other hindlimb muscles based on our study and others [6,42].”

      Furthermore, EOMs are complex muscles, containing many types of fibers and expressing different myosin heavy chain isoforms and muscle proteins. The fact that in mice both the global and orbital layers of EOMs express the slow myosin heavy chain isoform as well as myosin heavy chains 2X, 2A, and 2B (Zhou et al., 2010 IOVS 51:6355-6363) also indicates that the sparing is not directly linked to the fast- or slow-twitch nature of the muscle fiber. This needs to be considered.

      We greatly appreciate your suggestions and have included these points in the revised Discussion: “It is known that EOMs are complex muscles. Besides the developmental myosin isoforms, EOMs also express both adult fast and slow myosin contractile elements (Zhou et al., 2010 IOVS 51:6355-6363), suggesting that the sparing may not be solely linked to the fast- or slow-twitch nature of the muscle fiber; rather, the changes in SCs may play a pivotal role in preserving EOM function during the progression of ALS.” (Page 9, line 389-392)

      (c) In the context of myotube formation from cultured satellite cells on lines 178-179 the authors stained the myotubes for myosin heavy chain. Because of the diversity of myosin heavy chain isoforms and different muscle origins of the satellite cells investigated, the isoform of myosin heavy chain expressed by the myotubes needs to be tested and described. It is not sufficient to state anti-MYH.

      We used the pan-MYH antibody (MF20 from DSHB) to immunostain myosin heavy chain and identify the differentiated myotubes. As described on the supplier's website (https://dshb.biology.uiowa.edu/MF-20), MF20 recognizes all myosin heavy chain isoforms. We are happy to examine in future studies whether specific myosin heavy chain isoforms contribute to the observed differences.

      (d) The original RNAseq results have not been deposited. While it is true that the authors have analyzed the results and described them in Figures 6 and 7 and the related supplements, the original data need to be shown both as an xls list and as volcano plots (q value versus log2 fold change). This will allow readers to interpret the results independently, as some transcripts may not be listed. As presented, it is rather difficult to identify which transcripts aside from Cxcl12 are commonly upregulated. Can the data be presented in a more visual way?

      We have uploaded the FastQ files and the text files containing TPM values to the Gene Expression Omnibus (GEO) database and included the GEO accession number GSE249484 in the revised text. As recommended by the reviewer, we have added supplementary tables for Figure 6 listing the top 20 differentially expressed genes (ranked by Log2FC, both upregulated and downregulated) comparing 1) EOM SCs to their hindlimb and diaphragm counterparts (Figure 6-table supplement 1); 2) G93A SCs to WT SCs of the same muscle origin (Figure 6-table supplement 2); 3) G93A hindlimb and diaphragm SCs with 3-day NaBu treatment to those without (Figure 6-table supplement 3). (Page 6, Line 237-257)

      (e) There is no section describing the statistical analysis methods used. In many figures, more than 2 groups are compared so the authors need to use an ANOVA followed by a post hoc test.

      Thank you for the comment. We have added it accordingly. (Page 15, Line 631-636)
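      For readers less familiar with the test the reviewer recommends: a one-way ANOVA compares the spread between group means to the spread within groups, F = (SS_between / (k - 1)) / (SS_within / (N - k)) for k groups and N observations. The sketch below computes only this F statistic in plain Python; it is illustrative, the function name is hypothetical, and this response does not state which specific tests the authors used.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over k groups.

    Hypothetical helper for illustration only; it does not compute p-values.
    """
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")

    means = [fmean(g) for g in groups]
    grand_mean = fmean(x for g in groups for x in g)

    # Between-group sum of squares: group size times squared deviation of each group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    # Raises ZeroDivisionError if every group has zero variance.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, df_b, df_w)  # F = 27.0 with (2, 6) degrees of freedom
```

      In practice a statistics package would supply the p-value and the post hoc comparisons; for example, SciPy provides `scipy.stats.f_oneway` and, in recent versions, `scipy.stats.tukey_hsd`.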

      The authors have achieved their aim in showing that satellite cells derived from EOMs have a distinct transcriptome and that this may be the basis of their sparing in ALS. Furthermore, this work may help develop future therapeutic interventions for patients with ALS.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The prevailing hypothesis of ALS is that motoneuron degeneration subsequently induces muscle atrophy and wasting. However, evidence also suggests that ALS is a muscle disease independent of motoneuron degeneration. The results from the current study support the latter. The RNA-seq data from cultured myoblasts (without innervation) suggest a cell-autonomous effect of G93A on muscle cells. While the current analyses in this study identify axon guidance pathways in EOM satellite cells that may underlie their unique gene program that enhances motoneuron function, the power of the RNA-seq data is underutilized. I suggest that the authors explore the RNA-seq data further by comparing genes and pathways altered by G93A in various muscles to better pinpoint how G93A influences satellite cell function.

      Thanks for the comments and advice. Further analysis of the RNA-seq data is planned. As our original sequencing provider has been unavailable to us since last year, we are currently negotiating with other sequencing providers. We have deposited the raw data files in the GEO database (GSE249484) to foster further analyses by other research teams.

      To address the reviewer’s concern, we have added three supplementary tables for Figure 6, which list the top 20 differentially expressed genes (DEG) (ranked by Log2FC, both upregulated and downregulated) comparing 1) EOM SCs to their hindlimb and diaphragm counterparts (new Figure 6-table supplement 1); 2) G93A SCs to WT SCs of the same muscle origin (new Figure 6-table supplement 2); 3) G93A hindlimb and diaphragm SCs with and without 3-day NaBu treatment (new Figure 6-table supplement 3). These three DEG lists are discussed in the Results section of the revised manuscript (Page 6, Line 237-257).

      Figure 4 presentation could be improved by adopting a similar comparison (WT vs G93A) as used in Figure 1-3. The current comparison is not straightforward. In addition, a magnified image of panel A would demonstrate the loss of myoblast homeostasis more clearly. (AKA Figure 2B)

      The WT vs G93A comparison was presented in the supplementary figure of Figure 4 (Figure 4-figure supplement 1 in the previous version, and now in Figure 4-figure supplement 2 in the revised version).

      As requested, we have added magnified single channel representative images of cultured SCs in the new Figure 4-figure supplement 1 in the revised manuscript.

      Co-culture results in Figure 8 are very impressive. It would be nice if the data were quantified. The figure legend states that panel D is the quantification, but I don't see panel D. As the study used rat motoneurons (presumably SOD1 wildtype), it is unknown if G93A motoneurons would respond to muscle-derived CXCL12 similarly to the wildtype motoneurons. This information is crucial for understanding whether the SOD1 mutant ALS1 is a motoneuron disease or muscle disease or both. Some discussion should be provided to reflect the limitation (of not including G93A motoneurons in the coculture).

      Panel D (the quantification data) was included in the original figure, but may not have been obvious. We have now revised Figure 8 to enlarge panel D so that the quantification data are clearly presented.

      We acknowledge the limitation of not including mutant G93A motor neurons in the coculture assay, and have added this important point (and our future plans to do so) in the discussion section of the revised manuscript: (Page 11, Line 445-448)

      “However, it is possible that motor neurons carrying ALS mutations may respond differently to Cxcl12 mediated axon guidance than WT motor neurons. This is a limitation of the current study, which will be investigated in future co-culture studies.”

      Reviewer #2 (Recommendations For The Authors):

      Line 108. The sentence "Z-stack scans of glycerol-cleared whole muscles were obtained using a high working distance lens in a confocal microscope. The z-stacks were compacted into 2D images by maximal intensity projection" should be moved to the Materials and Methods section.

      Removed from the Result section and added to the Method section as recommended (Page 13, Line 564-568).

      Line 113. The sentence "In order to quantify the extent of denervation in a categorical manner, NMJs were arbitrarily defined as "well innervated" if SYP staining was present in >60% of the BTX positive area, "partially innervated" if between 60% and 30%, and "poorly innervated" if SYP staining corresponded to less than 30% of the BTX positive area" already appears in the figure legend.

      Thanks for the advice. We have rephrased the sentence to remove the redundant part.
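      The categorical innervation rule quoted above is simple enough to state unambiguously in code. The following is a minimal Python sketch, assuming the per-NMJ value (the percentage of the BTX-positive area that is also SYP-positive) has already been measured; the function names are hypothetical, and assigning values of exactly 60% and exactly 30% to "partially innervated" is an assumption, since the text does not specify boundary handling.

```python
CATEGORIES = ("well innervated", "partially innervated", "poorly innervated")


def classify_nmj(syp_overlap_pct: float) -> str:
    """Categorize one NMJ by the % of its BTX-positive area that is SYP-positive.

    Thresholds follow the quoted text (>60% well, 30-60% partial, <30% poor);
    treating exactly 60% and exactly 30% as "partially innervated" is an assumption.
    """
    if not 0.0 <= syp_overlap_pct <= 100.0:
        raise ValueError("overlap must be a percentage between 0 and 100")
    if syp_overlap_pct > 60.0:
        return "well innervated"
    if syp_overlap_pct >= 30.0:
        return "partially innervated"
    return "poorly innervated"


def summarize(overlaps):
    """Return the fraction of NMJs falling into each category (hypothetical helper)."""
    if not overlaps:
        raise ValueError("no NMJs to summarize")
    counts = dict.fromkeys(CATEGORIES, 0)
    for pct in overlaps:
        counts[classify_nmj(pct)] += 1
    return {name: n / len(overlaps) for name, n in counts.items()}


print(summarize([80, 45, 10, 90]))
```

      Writing the rule this way makes the boundary cases explicit, which is the main ambiguity in the prose definition.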

      In lines 445-7, it would be better to indicate the enzymatic units instead of the concentrations.

      We included enzymatic units for the four enzymes in the Methods Section of revised manuscript (Page 12, Line 497-499).

      Reviewer #3 (Recommendations for The Authors):

      There are several points that need to be addressed by the authors including:

      (a) The authors need to provide experimental evidence as to the mode of action of Na Butyrate and more specifically whether its beneficial effect is mediated by its anti-inflammatory action, inhibition of HDACs, or the combination of several mechanisms. Additionally, it should be clearer why Na Butyrate was administered. The sentence referring to reference 36 is not sufficient and some mechanistic insight needs to be provided in the results section.

      Thanks for the great suggestion. We have revised the Results section accordingly to clarify the rationale for NaBu usage (please also see our detailed response to your suggestion above). (Page 4, line 131-134)

      (b) Their reason for excluding soleus-derived satellite cells from the analysis is not valid. Soleus muscles are "more" spared than diaphragm muscles, and analyzing them may help shed light on this observation.

      Please see our response to your question (b) in the above public review section.

      (c) DATA AVAILABILITY: The RNAseq raw untransformed data has not been provided and Volcano plots are also not shown. I find it quite difficult to follow the results of the RNAseq experiments and this is central to the interpretation of the paper's results. Ideally, one should be able to look at the data and draw his/her own conclusions but as it stands this is difficult to do.

      We have uploaded the raw FastQ files and the Excel files containing TPM values to the GEO database under accession number GSE249484.

      (d) A detailed description of all statistical tests that were used needs to be provided.

      Yes, this has been added to the revised manuscript.

      (e) Many figure legends are incomplete and some panels are not described appropriately, indicating that the authors need to thoroughly revise all aspects of the manuscript.

      We have extensively edited the figure legends to address the issues raised by reviewers.

      (f) Line 96-98: it is unlikely that muscles from ALS patients will be biopsied frequently. Furthermore, what biomarkers exactly could be followed in patients in response to therapy? This is unclear.

      While muscle biopsy is not generally part of the diagnostic workup for ALS, it is increasingly being used pre- and post-treatment in ALS clinical trials to examine responses to potential new therapies. Muscle is also being explored in several ongoing studies as an ALS-relevant peripheral tissue amenable to biopsy (as opposed to brain or spinal cord) for predictive, pharmacodynamic, and prognostic biomarkers. This includes studies attempting to recapitulate pathophysiological patient clusters observed in CNS autopsy tissues and studies to detect aberrant TDP-43 aggregates in intramuscular nerve twigs, among others. Indeed, Dr. Ostrow’s clinical duties include performing muscle biopsies and interpreting muscle pathology, and he is involved in several ongoing studies attempting to correlate postmortem CNS and muscle analyses for these purposes.

      To avoid potential controversy over the feasibility of multiple biopsies, we rephrased the sentence as follows (Page 3, Line 96-98):

      “Characterizing the distinct EOM SC transcriptomic pattern could provide clues for identifying potential biomarkers in therapeutic trials in both ALS patients and animal models, in addition to identifying therapeutic targets.”

      (g) Line 388-389. What do the authors mean by this sentence? It is not clear.

      Thanks for the comment. We have expanded the discussion to make it clearer in the revision. (Page 10, Line 428-431)

      “It is possible that the more frequent self-renewal and spontaneous activation of EOM SCs contribute to a higher rate of mitochondrial DNA replication, leading to accelerated spreading of mitochondrial DNA defects and resulting in a higher proportion of COX-deficient myofibers than in other muscles.”

      (h) Were the experimenters blinded as to the results shown in Figures 2, 7, 8, and 9?

      We endeavored to blind experiments whenever possible. Not all experiments were blinded, due to logistical complexity and the clear difference in microscopic and gross appearance between wild-type and mutant muscle. The differences observed in Figures 2, 7, 8, and 9 are qualitative (i.e., more than just quantitative), which should minimize the impact of possible human bias. Additionally, we employed multiple different experimental approaches to test our hypotheses.

      For Fig 2, the physical appearance is notably different between G93A and WT muscles. The different innervation status (Fig 2A) is also not amenable to blinding.

      For Fig 7, the expression levels of Hmga2, Notch3 and Cxcl12 detected by qPCR are substantially greater in EOM-derived SCs than in their counterparts from other muscles, and these results are consistent with the RNA-seq and immunofluorescence assays. For Fig 8, Cxcl12 overexpression and co-culture with EOM SC-derived myotubes not only increased the length of the longest neurites but also promoted axon branching, which can be easily observed.

      For Fig 9, only the EOM SC-derived myotubes were capable of aligning neurites along them on a global scale. This qualitative difference is easy to appreciate, even under low magnification.

      (i) Line 64 -65 The authors refer to a very old paper by Fischer et al in 2002 for the expression profile of EOMs. There are more recent papers including that of Eckhardt et al. (eLife 2023, 12:e83618) showing the differences in proteome between EOMs and soleus and EOMs and EDL muscles. There are more than 2000 (and not 300!!) differentially expressed proteins.

      Thank you for the newly published reference. We have revised the Introduction section to include this new proteomic study. (Page 2, Line 64-69)

      (j) Figure 7C. The y-axis is mislabeled: it should read log2 fold change, not the growth conditions.

      Thank you for catching this. We have fixed it.

      (k) In all figures, if each symbol represents the results obtained on 1 mouse, this needs to be clearly stated. What do the panels on the right of Figures 4 and 5B show?

      Thanks for the comments. For Figures 1B and 2C, as well as Figure 1-figure supplement 1B, each dot in the box-and-dot plots represents the result obtained from one mouse. For Figure 3B, each dot represents one round of sorting. Generally, one mouse was euthanized for each round of sorting of HL and diaphragm SCs, but sorting EOM SCs could take up to 6 mice (as EOMs are much smaller). For Figures 4 and 5B, each dot represents one image analyzed; all images were collected from three rounds of sorting. For Figure 5A, each dot represents one replicate culture. We have indicated these details in the revision.

      Please also see our response to the 1st question of Reviewer 2 in the public review section.

      (l) Figure 6 Table supplement 3 does NOT show the FDR but only the log2 fold change. Please amend.

      We have amended the supplementary table accordingly.

    1. Author Response

      We would like to thank the editors for giving us an opportunity to address the insightful comments made by the referees. In our response to the comments, we provide a guide to important information that may have been overlooked, and hope to elaborate on the context for better evaluating this study.

      As mentioned in the introduction of our manuscript, mosquito-transmitted diseases cause nearly a million deaths every year and significant worldwide morbidity. Moreover, the geographical range of mosquito vectors is rapidly expanding due to climate change and mosquito-borne disease risks are emerging in new parts of the world. DEET was discovered in the 1940s and has remained the primary insect repellent for >70 years in the developed world. The US Environmental Protection Agency (EPA) regulates mosquito repellents, and DEET-based commercial products are typically assigned protection times that vary with concentration. Products with lower concentration need repeated applications, whereas those with higher concentrations feel oily and cost more.

      We also mentioned that DEET inhibits mammalian cation channels and human acetylcholinesterase. The latter is a target of carbamate insecticides that are commonly used in disease-endemic areas, raising additional concerns about prolonged use of DEET. DEET is also a solvent and damages several forms of plastics, synthetic fabrics, and painted surfaces. Unfortunately, DEET has been of little value in disease control in Africa and Asia. Even in developed countries, a natural, cosmetically pleasant alternative could benefit millions of people who currently avoid repellents.

      Innovation in finding new repellents has been slow due to limitations in current research approaches and the high cost of EPA registration (especially for synthetic compounds). Since DEET, only five additional active ingredients have been approved by the EPA for repellent products. In the 20+ years since the discovery of insect odorant receptors from genomes, not a single novel repellent compound has been identified and registered by the EPA. Thus, there is both a strong need for new approaches to finding insect repellents and a need for new active ingredients that are safe and strategically effective. In fact, the goal of finding new mosquito repellents has been the topic of multiple Gates Foundation Grand Challenges grants and numerous NIH-funded grants to research groups around the world.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors set up a pipeline to predict insect repellents that are pleasant and safe for humans. This is done by daisy-chaining a new classification model for predicting repellents with a published model for predicting human perception. The models use a feature-engineered selection of chemical features to make their predictions. The predicted molecules are then validated against a proxy humanoid (a heated brick), and their safety is tested by molecular assays of human cells. The humanistic approach to modeling these authors have taken (which considers cosmetic/aesthetic appeal and safety) is novel and a necessary step for consumer usage. However, the importance of pleasantness over effectiveness is still up for debate (DEET is unpleasant but still used often), and the generalization of the safety tests is unknown and assumed. Further evidence of the effectiveness of the prediction models is also warranted. They pass the authors' own behavioral tests, but their contribution to the field is unknown, as neither model (new or published) has been rigorously benchmarked against previous models. Moreover, the authors' coverage of the literature in this field is sparse, ignoring directly related studies.

      Strengths:

      Humanistic approach to modeling considers pleasantness and safety. Chaining models can help limit the candidate odorants from the vastness of odor space.

      Weaknesses:

      The current models need to be benchmarked against leading models that predict similar outcomes. Similarly, many of these papers need to be addressed and discussed in the introduction. The authors might even reconsider their data sources for model training to increase performance, and their lexical categorization for interoperability. For instance, the Dravnieks lexicon, currently used for the human perception predictions, has been heavily criticized for its overlapping and hard-to-interpret descriptive terms ("FRAGRANT", "AROMATIC").

      Human Perception:

      Khan, R. M., Luk, C. H., Flinker, A., Aggarwal, A., Lapid, H., Haddad, R., & Sobel, N. (2007). Predicting odor pleasantness from odorant structure: pleasantness as a reflection of the physical world. Journal of Neuroscience, 27(37), 10015-10023.

      Keller, A., Gerkin, R. C., Guan, Y., Dhurandhar, A., Turu, G., Szalai, B., ... & Meyer, P. (2017). Predicting human olfactory perception from chemical features of odor molecules. Science, 355(6327), 820-826.

      Gutiérrez, E. D., Dhurandhar, A., Keller, A., Meyer, P., & Cecchi, G. A. (2018). Predicting natural language descriptions of mono-molecular odorants. Nature Communications, 9(1), 4979.

      Lee, B. K., Mayhew, E. J., Sanchez-Lengeling, B., Wei, J. N., Qian, W. W., Little, K. A., ... & Wiltschko, A. B. (2023). A principal odor map unifies diverse tasks in olfactory perception. Science, 381(6661), 999-1006.

      Author Response: The human perception predictions were performed using models that we reported in two earlier publications: Kowalewski & Ray, iScience (2020b) and Kowalewski, Huynh & Ray, Chem. Senses (2021). Three of the four references pointed out by the referee were cited in these prior studies, which involved computational validation by predicting on a test set held out from training (as is typically done), and also predicting across different human studies with a high degree of success. A rigorous benchmarking of the odor perception models was done in Kowalewski, Huynh & Ray, Chem. Senses (2021) and discussed in a mini-review published in the same issue of the journal by Gerkin, Chem. Senses (2021). This included a favorable comparison with two of the references indicated by the referee: Keller et al. Science (2017) and Gutiérrez et al. Nat. Commun. (2018). The fourth reference, Lee et al. Science (2023), describes a neural network approach and was published well after our mosquito behavior studies were completed. Although Lee et al. used an advanced neural network model, they worked with 2-D structures of compounds, in contrast to our 3-D approach. They also did not report cross-study validations, comparisons with Keller et al. (2017), or benchmarking against past studies, so it is difficult to assess what advances, if any, their model offers.

      The intent of the current study was to move beyond testing approaches, of which there are many, and instead work on a practical use case. As we see it, it is not necessarily the prediction of fragrance character or quality alone that matters but overlap with other predicted bioactivities. From the perspective of human use, a molecule with a pleasing scent that also repels insects is likely to be far more useful than one with an unappealing scent. Accordingly, our task in this study was to select molecules that fit into specific use categories: display strong insect repellency, have pleasing scent profiles, are natural in origin and are potentially repurposed from flavors and fragrances.

      Insect Repellents:

      Wright, R. H. (1956). Physical basis of insect repellency. Nature, 178(4534), 638-638.

      Katritzky, A. R., Wang, Z., Slavov, S., Tsikolia, M., Dobchev, D., Akhmedov, N. G., ... & Linthicum, K. J. (2008). Synthesis and bioassay of improved mosquito repellents predicted from chemical structure. Proceedings of the National Academy of Sciences, 105(21), 7359-7364.

      Bernier, U. R., & Tsikolia, M. (2011). Development of Novel Repellents Using Structure-Activity Modeling of Compounds in the USDA Archival Database. In Recent Developments in Invertebrate Repellents (pp. 21-46). American Chemical Society.

      Author response: The Katritzky et al. PNAS (2008) paper is cited in our study, and we have indicated that the chemical analogs reported therein are part of the training data set in our study. We thank the reviewer for pointing us to the book chapter by Bernier & Tsikolia (2011), which reviews the QSAR approaches taken for repellent discovery and in large measure focuses on the Katritzky et al. PNAS (2008) paper. We did cite two relevant studies by Uli Bernier, but agree that citing the book chapter would be a nice addition.

      The current study assumes that insect repellents repel via their odor valence to the insect, but this is not accurate. Insect repellents also mask human body odor, making hosts hard to locate. The authors need to consult the literature on the mechanisms insects use to locate and land on their hosts. There, they will see that heat alone is not the attractant, as their behavioral assay would suggest. I suggest the authors test other behavioral assays to provide more convincing evidence of effectiveness. See the following studies:

      De Obaldia, M. E., Morita, T., Dedmon, L. C., Boehmler, D. J., Jiang, C. S., Zeledon, E. V., ... & Vosshall, L. B. (2022). Differential mosquito attraction to humans is associated with skin-derived carboxylic acid levels. Cell, 185(22), 4099-4116.

      McBride, C. S., Baier, F., Omondi, A. B., Spitzer, S. A., Lutomiah, J., Sang, R., ... & Vosshall, L. B. (2014). Evolution of mosquito preference for humans linked to an odorant receptor. Nature, 515(7526), 222-227.

      Wei, J. N., Vlot, M., Sanchez-Lengeling, B., Lee, B. K., Berning, L., Vos, M. W., ... & Dechering, K. J. (2022). A deep learning and digital archaeology approach for mosquito repellent discovery. bioRxiv, 2022-09.

      Author response: In this study we took an unbiased approach to compile the training data set, including several known insect repellents of varying chemical structures and volatility, for most of which there is no information on how they are sensed by insects. Not surprisingly, the repellents we identified are varied in structure and in functional groups, and are likely detected in more than one way by the mosquitoes, using olfactory and/or gustatory systems. We did not consider “masking” of skin attraction as a factor in the training data set in this study, which precluded the need to discuss the papers pointed out by the referee in any detail. In fact there is an extremely vast and rich body of literature regarding human skin odor, CO2 and breath emanations, which includes our own contributions of research and review articles that are not discussed in the current paper.

      We did in fact conduct human arm-in-cage experiments with a few of the compounds reported in this study using female Aedes aegypti mosquitoes; a preprint describing this smaller-scale analysis, the results of which show strong repellency, is available as Boyle et al. bioRxiv (2016) https://doi.org/10.1101/060178 (Figure 4). However, heat offers a practical proxy for evaluating prospective repellents in a high-throughput manner. It would certainly be desirable to further evaluate additional candidates from the heat attraction assay with human subjects in the future.

      We thank the reviewer for pointing out the preprint by Wei et al. bioRxiv (2022). Our approaches differ in that Wei et al. do not consider properties such as fragrance and toxicity. We also cannot assume that their newer neural network model is superior: although it uses a large training dataset, it does not use 3D chemical structures, which are highly relevant for biological activity. While very little information is available for the actives reported in Wei et al., we independently evaluated their top compounds reported as similar to or better than DEET (CAS#3731-16-6, 4282-32-0, 2040-04-2, 32940-15-1 and 3446-90-0) and could not find information about their toxicity, smell, or natural source. In contrast, the top repellents that we identify here as similar to or better than DEET (N=8) are all classified as GRAS (Generally Recognized as Safe) compounds by the Flavor and Extract Manufacturers Association (FEMA), are all naturally occurring (plum, jasmine, mushroom, grapes, etc.), and have pleasant smells. The dermal toxicity values in rabbits are known for six of our compounds and are at the best possible level (5000 mg/kg).

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting study that seeks to identify novel mosquito repellents that smell attractive to humans.

      Strengths:

      The combination of standard machine learning methods with mosquito behavioral tests is a strength.

      Weaknesses:

      The study would be strengthened by describing how other modern ML approaches (RF, decision trees) would classify and identify other potential repellents.

      Author response: The current approach already shows a success rate >85% for repellency coefficient >0.5 and identifies eight naturally occurring GRAS compounds with repellency as strong as or greater than DEET. This substantially expands the repertoire of strong natural repellents. Since the 1950s only six active ingredients have been registered by US EPA for use in topical repellents, of which only two are natural in origin (Oil of lemon eucalyptus and catmint oil) and they typically do not protect as well as DEET does. That being said, we have since explored other predictive algorithms, for instance Neural Networks. The experimental evaluation of these newer pipelines will take significant resources and time and will be the focus of future grants.

      A comparison in the repellent activity between DEET and the top ten hits identified in this new study indicates little change in repellent activity (~3%), suggesting that DEET remains the gold standard. Without additional toxicity tests, the study is arguably incremental. The study's novelty should be better clarified.

      Author response: There is an urgent need to find new insect repellents that have a better chance of being adopted by people who avoid DEET, such as those in Africa and Asia. Having more effective natural actives expands the tools available against disease-transmitting mosquitoes. As mentioned above, the top repellents that we identified as similar to or better than DEET (N=8) are all classified as GRAS (Generally Recognized as Safe) compounds by the Flavor and Extract Manufacturers Association (FEMA), are all naturally occurring (plum, jasmine, mushroom, grapes), and have pleasant smells. Dermal toxicity values in rabbits are known for six of them and are at the best possible level (5000 mg/kg).

      The Methods in the repellency tests are sparse, and more information would be useful. Testing the top repellents at low doses (<<1%) and for long periods (2-12 h) would strengthen the manuscript. Without this information, the manuscript is lacking in depth.

      Author response: The US Environmental Protection Agency (EPA) regulates mosquito repellents, and DEET-based commercial products are typically assigned protection times that vary with concentration (10%, ~2 h; 30%, ~5 h; 100%, ~8 h). These, rather than the lower doses suggested, would be the relevant concentrations for testing protection times on human volunteers. Such studies fall within the realm of EPA registration efforts, which involve extensive GLP testing for safety and physical chemistry, as well as Human Subjects Board approvals. This is outside the scope of the current study and is typically accomplished during development.

      Testing human subjects on their olfactory perceptions of the repellents would also increase the depth and utility of the manuscript. Without additional experiments, the authors' conclusions lack support and have limited impact on the state-of-the-art.

      This manuscript is a mix of different approaches, which makes it lack cohesion. There is the ML method for classifying new repellents that smell good, but no testing of the repellents on human volunteers. The repellents are not tested at realistic concentrations and durations. And the calcium mobilization test is strange and makes little sense in the context of the other experiments and framing of the manuscript.

      Author response: The human olfaction validation that we present in this paper is consistent with most current publications in the field (for example, Keller et al., Gutiérrez et al.). A more systematic validation of the human odor character prediction pipelines used here was presented in two previous papers, Kowalewski & Ray, iScience (2020b) and Kowalewski, Huynh & Ray, Chem. Senses (2021), and in a mini-review by Gerkin published in the same issue of Chem. Senses (2021).

      Reviewer #3 (Public Review):

      While I am not a specialist in this field, I do have some knowledge of the subject matter and the computational aspects involved. The authors employ simple machine learning techniques (such as SVM) for the following purposes:

      (a) Prediction of aversive valence.

      (b) Predicting anti-repellent chemicals.

      (c) Predicting calcium mobilization.

      The approach is commonplace in chemoinformatics literature.

      Weaknesses:

      • All the above models are presented discretely, making it difficult to discern experiment design principles and connectedness.

      • The ML work is rudimentary, lacking adequate details. Chemoinformatics has reached great heights, and SVM does not seem contemporary.

      • There is significant existing research on finding repellents.

      Author response: In the current study, we aimed to showcase how computational research may be combined with basic science to create scalable pipelines that address real-world problems, rather than to demonstrate methodological novelty in chemoinformatics. Specifically, we wanted to use different predictive models to identify compounds that display strong insect repellency, have pleasing scent profiles, are natural in origin and can potentially be repurposed from flavors and fragrances. Unfortunately, there is very little existing research on insect repellents with these properties, which would make them better candidates for EPA registration. Most tested compounds are synthetic, are often analogs of known repellents like DEET, and require substantial time and resources to register. Moreover, the identities of the chemosensory receptors that are responsible for repellency to DEET and other compounds, and that are conserved across Anopheles, Aedes and Culex mosquitoes, are not known.

      It is true that the field of cheminformatics has experimented with a variety of newer approaches, based in part on neural networks (e.g., Graph Neural Networks and graph embeddings to encode chemical structure rather than a more conventional Extended Connectivity Fingerprint (ECFP)). Importantly, however, novelty does not imply usefulness. The mosquito behavior experiments that we present show a very high success rate (>85%), validating our approach and identifying several excellent candidates already.

      Strengths:

      • Authors attempt to make a case for calcium mobilization in the context of repellency. This aspect sounds interesting but is not surprising.

      • Behavioral profiling of repellents could be useful.

      Author response: We thank the referee for this comment. We have indeed carried out behavioral profiling for several repellents that evoke calcium mobilization, but we do not see any clear correlation thus far.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presents a valuable approach to exploring CD4+ T-cell response in mice across stimuli and tissues through the analysis of their T-cell receptor repertoires. The authors use a transgenic mouse model, in which the possible diversity of the T-cell receptor repertoire is reduced, such that each of a diverse set of immune exposures elicits more detectably consistent T-cell responses across different individuals. However, whereas the proposed experimental system could be utilized to study convergent T-cell responses, the analyses done in this manuscript are incomplete and do not support the claims due to limitations in the statistical analyses and lack of data/code access.

      We address the reviewers' concerns point by point below.

      All data on immune repertoires are deposited here: https://figshare.com/articles/dataset/Convergence_plasticity_and_tissue_residence_of_regulatory_and_effector_T_cell_response/22226155

      We added the Data availability statement to the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the alpha chain TCR landscape in conventional vs regulatory CD4 T cells. Overall, I think it is a very well-thought-out and well-executed study with interesting conclusions. The authors have investigated CDR3 alpha repertoires coupled with a transgenic fixed CDR3beta in a mouse system.

      Strengths:

      • One-of-a-kind evidence and dataset.

      • State-of-the-art analyses using tools that are well-accepted in the literature.

      • Interesting conclusions on the breadth of the immune response across different types of challenges (tumor, viral and parasitic).

      Thank you for the positive view.

      Weaknesses:

      • Some conclusions regarding the eCD4->eTreg transition are not strongly supported by the data alone.

      The overlaps between the top nucleotide-level clonotypes in both the LLC and PYMT challenges are prominently above the average, and this result is reproducible in lungs and skin, so we are confident in our conclusions based on these data. Further experiments with different methods, including tracking clonal fates, should clarify and confirm, correct or disprove our findings.

      • Some formatting issues.

      We are working on the manuscript to correct minor errors and formatting.

      Reviewer #2 (Public Review):

      This study investigates T-cell repertoire responses in a mouse model with a transgenic beta chain, such that all T-cells in all mice share a fixed beta chain, and repertoire diversity is determined solely by alpha chain rearrangements. Each mouse is exposed to one of a few distinct immune challenges, sacrificed, and T-cells are sampled from multiple tissues. FACS is used to sort CD4 and Treg cell populations from each sample, and TCR repertoire sequencing from UMI-tagged cDNA is done.

      Various analyses using repertoire diversity, overlap, and clustering are presented to support several principal findings: 1) TCR repertoires in this fixed beta system have highly distinct clonal compositions for each immune challenge and each cell type, 2) these are highly consistent across mice, so that mice with shared challenges have shared clones, and 3) induction of CD4-to-Treg cell type transitions is challenge-specific.

      The beta chain used for this mouse model was previously isolated based on specificity for Ovalbumin. Because the beta chain is essential for determining TCR antigen specificity, and is highly diverse in wildtype mice, I found it surprising that these mice are reported to have robust and consistently focused clonal responses to very diverse immune challenges, for which a fixed OVA-specific beta chain is unlikely to be useful. The authors don't comment on this aspect of their findings, but I would think it is not expected a priori that this would work. If this does work as reported, it is a valuable model system: due to massively reduced diversity, the TCR repertoire response is much more stereotyped across individual samples, and it is much easier to detect challenge-specific TCRs via the statistics of convergent responses.

      This was to some extent expected, since these mice live almost normally and mount productive, protective adaptive immune responses. In real life, TCR-pMHC interactions in which the TCR-alpha chain dominates are frequent (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5701794/; https://pubmed.ncbi.nlm.nih.gov/37047500/). On the fixed TCR-beta background, this alpha-dominated recognition comes fully into play, essentially substituting for TCR-beta diversity, at the expense of a relatively simplified TCRαβ repertoire and probably higher cross-reactivity.

      We agree that this is a valuable model, as we indicated in the last sentence of our Discussion. We have now also added this point to the abstract.

      While the data and analyses present interesting signals, they are flawed in several ways that undermine the reported findings. I summarize below what I think are the most substantive data and analysis issues.

      (1) There may be systematic inconsistencies in repertoire sampling depth that are not described in the manuscript. Looking at the supplementary tables (and making some plots), I found that the control samples (mice with mock challenge) have consistently much shallower sampling (in terms of both read count and UMI count) than the other challenge samples. There is also a strong pattern of lower counts for Treg vs CD4 cell samples within each challenge.

      The immune response in control mice is less extensive, as expected, just as Tregs are less numerous in tissues than conventional CD4 T cells. This all follows expectations. Please note, however, that we applied appropriate data normalisation throughout, drawing on our previous extensive experience (https://pubmed.ncbi.nlm.nih.gov/29080364/).

      In particular (we have now added these details to the Methods):

      For diversity metrics, we randomly sampled 1000 UMIs from each cloneset. Samples with fewer than 700 UMIs were excluded from the analysis.

      For amino acid overlap metrics, we selected the 1000 largest clonotypes from each cloneset. Samples with fewer than 700 clonotypes were excluded from the analysis.

      For nucleotide overlap metrics (eCD4-eTreg), we selected the top 100 clonotypes from each cloneset. Samples with fewer than 100 clonotypes were excluded from the analysis.

      The top N clonotypes were selected by randomly shuffling the clonotypes and then sorting them by count in descending order. This prevents clonotypes with equal counts (e.g. count = 1 or 2) from being ordered alphabetically.

      Downsampling was carried out using the VDJtools software (v1.2.1).
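      The two normalisation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the VDJtools implementation: the function names are hypothetical, a sample is assumed to be a mapping from clonotype to UMI (or read) count, and samples with 700 to 999 UMIs, which cannot supply 1000 UMIs, are assumed to be kept whole, since the text does not say how they were handled.

```python
import random
from collections import Counter


def downsample_umis(clonotype_umis, n=1000, min_umis=700, seed=0):
    """Draw n UMIs at random, without replacement, from one sample.

    Hypothetical helper. clonotype_umis maps clonotype -> UMI count.
    Returns a Counter of drawn UMIs per clonotype, or None if excluded.
    """
    total = sum(clonotype_umis.values())
    if total < min_umis:
        return None  # too shallow: excluded from analysis
    # One pool entry per UMI, so clonotypes are drawn in proportion to size.
    pool = [clone for clone, k in clonotype_umis.items() for _ in range(k)]
    rng = random.Random(seed)
    # Assumption: samples with min_umis <= total < n are kept whole.
    drawn = rng.sample(pool, min(n, total))
    return Counter(drawn)


def top_n_clonotypes(clonotype_counts, n, min_clonotypes, seed=0):
    """Keep the n largest clonotypes, breaking count ties at random.

    Hypothetical helper. Returns None if the sample has fewer than
    min_clonotypes clonotypes.
    """
    if len(clonotype_counts) < min_clonotypes:
        return None  # excluded from analysis
    items = list(clonotype_counts.items())
    random.Random(seed).shuffle(items)  # randomise order among equal counts
    # Python's sort is stable (also with reverse=True), so tied clonotypes
    # keep their shuffled order instead of falling back to alphabetical order.
    items.sort(key=lambda kv: kv[1], reverse=True)
    return dict(items[:n])
```

      Under these assumptions, diversity metrics would use downsample_umis(sample), amino acid overlaps top_n_clonotypes(sample, 1000, 700), and nucleotide eCD4-eTreg overlaps top_n_clonotypes(sample, 100, 100). Shuffling before a stable sort is one simple way to make tie-breaking random rather than alphabetical.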

      (2) FACS data are not reported. Although the graphical abstract shows a schematic FACS plot, there are no such plots in the manuscript. Related to the issue above, it would be important to know the FACS cell counts for each sample.

      Yes, we agree that this is valuable information that should be provided. Unfortunately, these data were not preserved.

      (3) For diversity estimation, UMI-wise downsampling was performed to normalize samples to 1000 random UMIs, but this procedure is not validated (the optimal normalization would require downsampling cells). What is the influence of possible sampling depth discrepancies mentioned above on diversity estimation? All of the Treg control samples have fewer than 1000 total UMIs; doesn't that pose a problem for sampling 1000 random UMIs?

      Indeed, I simulated this procedure and found systematic effects on diversity estimates when taking samples of different numbers of cells (each with a simulated UMI count) from the same underlying repertoire, even after normalizing to 1000 random UMIs. I don't think UMI downsampling corrects for cell sampling depth differences in diversity estimation, so it's not clear that the trends in Fig 1A are not artifactual: they would seem to show higher diversity for control samples, but these are the very same samples with an apparent systematic sampling depth bias.

      We have evaluated this approach throughout our work and summarised the results in https://pubmed.ncbi.nlm.nih.gov/29080364/. Altogether, normalising to the same number of randomly sampled UMIs appears to be the best approach (although, ideally, the initial sequencing depth of every sample should be substantially higher than the sampling threshold used). Sorting identical numbers of cells and achieving perfectly uniform library preparation and sequencing is generally not realistic in practice, whereas UMI downsampling achieves the same goal much more reliably.

      (4) The Figures may be inconsistent with the data. I downloaded the Supplementary Table corresponding to Fig 1 and made my own version of panels A-C. This looked quite different from the diversity estimations depicted in the manuscript. The data does not match the scale or trends shown in the manuscript figure.

      An incorrect column had been used for Chao1; this is now corrected. Please also note that we only used samples with >700 UMIs, and the Supplementary Table has been corrected accordingly. Finally, Figure 1 shows the results for lung samples only.
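      For readers checking the Chao1 values in the Supplementary Table, a short sketch of the estimator may help. Chao1 extrapolates clonotype richness from the observed number of clonotypes S_obs and the numbers of singletons (f1) and doubletons (f2); the classic form is S_obs + f1^2 / (2 f2). The sketch below uses the bias-corrected form S_obs + f1 (f1 - 1) / (2 (f2 + 1)), which stays finite when there are no doubletons. Which variant VDJtools reports is not stated here, so this is an illustration only; the function name is hypothetical, and the counts are assumed to be per-clonotype UMI counts after downsampling to 1000 UMIs.

```python
from collections import Counter


def chao1(clonotype_counts):
    """Bias-corrected Chao1 richness estimate (hypothetical helper).

    clonotype_counts maps clonotype -> count (e.g. UMIs after downsampling).
    """
    counts = [c for c in clonotype_counts.values() if c > 0]
    s_obs = len(counts)           # observed number of clonotypes
    freq = Counter(counts)        # how many clonotypes have each count
    f1 = freq.get(1, 0)           # singletons
    f2 = freq.get(2, 0)           # doubletons
    # Bias-corrected form; remains finite when f2 == 0.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

      For example, five clonotypes with counts 1, 1, 1, 2 and 5 give S_obs = 5, f1 = 3 and f2 = 1, so the estimate is 5 + 6/4 = 6.5.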

      (5) For the overlap analysis, a different kind of normalization was performed, but also not validated. Instead of sampling 1000 UMIs, the repertoires were reduced to their top 1000 most frequent clones. It is not made clear why a different normalization would be needed here. There are several samples (including all Treg control samples) with only a couple hundred clones. It's also likely that the noted systematic sampling depth differences may drive the separation seen in MDS1 between Treg and CD4 cell types. I also simulated this alternative downsampling procedure and found strong effects on MDS clustering due to sampling effects alone.

      That’s right: in the overlap analysis, the metric values are mathematically proportional to the clonotype counts in both compared repertoires, so differences in counts cause major biases, and the right approach is to take the same number of clonotypes from each sample. See Ref. https://pubmed.ncbi.nlm.nih.gov/29080364/.

      We kept only samples with >700 clonotypes for the overlap analyses. Relatively shallow samples are present in all challenge groups, while the MDS1 localisation follows a clear, reproducible logic, so we are confident in these results.

      It is not made clear how the overlap scores were converted to distances for MDS. It's hard to interpret this without seeing the overlap matrix.

      This is a built-in feature in VDJtools software (https://pubmed.ncbi.nlm.nih.gov/26606115/). See also here: https://vdjtools-doc.readthedocs.io/en/master/overlap.html.

      (6) The cluster analysis is superficial, and appears to have been cherry-picked. The clusters reported in the main text have illegibly small logo plots, and no information about V/J gene enrichments. More importantly, as the caption states they were chosen from the columns of a large (and messier-looking) cluster matrix in the supplementary figure based on association with each specific challenge. There's no detail about how this association was calculated, or how it controlled for multiple tests. I don't think it is legitimate to simply display a set of clusters that visually correlate; in a sufficiently wide random matrix you will find columns that seem to correlate with any given pattern across rows.

      The particular CDR3 sequences and V/J segments are not central to the results of this manuscript. The logos are provided only to illustrate what the consensus motifs of the clusters look like.

      We have now added two more Supplementary Tables and a Supplementary Figure with full information about the clusters.

      We disagree that Supplementary Figure 1 (representing all the clusters) looks “messy”. On the contrary, it is surprisingly “digital”, showing clear patterns of responses and homing. This becomes clear if you study it visually for a while. It is, however, too large to let the reader focus on any particular aspect. That is why we selected individual TCR clusters to illustrate the specific aspects discussed in the work; these clusters were chosen from an overall picture that is already structured.

      (7) The findings on differential plasticity and CD4 to Treg conversion are not supported. If CD4 cells are converting to Tregs, we expect more nucleotide-level overlap of clones. This intuition makes sense. But it seems that this section affirms the consequent: variation in nucleotide-level clone overlap is a readout of variation in CD4 to Treg conversion. It is claimed, based on elevated nucleotide-level overlap, that the LLC and PYMT challenges induce conversion more readily than the other challenges. It is not noted in the textual interpretations, but Fig 4 also shows that the control samples had a substantially elevated nucleotide-level overlap. There is no mention of a null hypothesis for what we'd expect if there was no induced conversion going on at all. This is a reduced-diversity mouse model, so convergent recombination is more likely than usual, and the challenges could be expected to differ in the parts of TCR sequence space they induce focus on. They use the top 100 clones for normalization in this case, but don't say why (this is the 3rd distinct normalization procedure).

      Your point is absolutely correct: “This is a reduced-diversity mouse model, so convergent recombination is more likely than usual”. A distinct normalisation procedure was required to focus on the most expanded clonotypes and to avoid the tail of identical (presumably cross-reactive) TCRs present in all repertoires in these limited-repertoire mice. We therefore downsampled as strictly as possible to minimise this background signal of nucleotide overlap, and only this strict downsampling to the top-100 clonotypes allowed us to visualise the difference between the challenges. This explanation is rather complicated and would overload the manuscript, but your comments and our answers will be available to readers who wish to go into the details.

      The overlaps observed (at this strict downsampling) between the top nucleotide-level clonotypes in both the LLC and PYMT challenges are prominently above the average, and this result is reproducible in lungs and skin, so we are confident in our interpretation of these data. Further experiments with different methods, including tracking clonal fates, should clarify and confirm, correct or disprove our findings.

      Although interpretations of the reported findings are limited due to the issues above, this is an interesting model system in which to explore convergent responses. Follow-up experimental work could validate some of the reported signals, and the data set may also be useful for other specific questions.

      Yes, thank you for your really thorough analysis. We fully agree with your conclusion.

      Reviewer #3 (Public Review):

      Nakonechnaya et al present a valuable and comprehensive exploration of CD4+ T cell response in mice across stimuli and tissues through the analysis of their TCR-alpha repertoires.

      The authors compare repertoires by looking at the relative overlap of shared clonotypes and observe that they sometimes cluster by tissue and sometimes by stimulus. They also compare different CD4+ subsets (conventional and Tregs) and find distinct yet convergent responses with occasional plasticity across subsets for some stimuli.

      The observed lack of a general behaviour highlights the need for careful comparison of immune repertoires across cell subsets and tissues in order to better understand their role in the adaptive immune response.

      In conclusion, this is an important paper to the community as it suggests several future directions of exploration.

      Unfortunately, the lack of code and data availability does not allow the reproducibility of the results.

      Thank you for your positive view.

      All data on immune repertoires are deposited here: https://figshare.com/articles/dataset/Convergence_plasticity_and_tissue_residence_of_regulatory_and_effector_T_cell_response/22226155

      We added the Data availability statement to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • In the manuscript at "yielding 13,369 ± 1,255 UMI-labeled TCRα cDNA molecules and 3233 ± 310 TCRα CDR3 clonotypes per sample", I'm not sure how there can be fewer unique DNA molecules than clonotypes in each sample.

      This was indeed our mistake; it is now corrected.

      • In the manuscript at "This indicates that the amplitude and focused nature of the effector and regulatory T cell response in lungs is generally comparable."

      I'm not sure it's possible to conclude that a drop in diversity in all conditions necessarily signals a focused nature. Since, at this stage, the nature of the clonotypes was not compared between conditions, it is not possible to claim a focused nature of the response.

      We have softened the wording:

      "This could indicate that the amplitude and focused nature of the effector and regulatory T cell response in lungs is generally comparable."

      • What are your thoughts on why there is such a large overlap between Treg and Teff in the Lung in control? For some replicates it is almost as much as a post-LLC challenge!

      There is some natural dispersion in the data, which is generally to be expected. The overlaps between the top nucleotide-level clonotypes in both the LLC and PYMT challenges are prominently above the average, and this result is reproducible in lungs and skin, so we are confident in our conclusions based on these data. Further experiments with different methods, including tracking clonal fates, should clarify and confirm, correct or disprove our findings.

      • In the manuscript at "These results indicate that distinct antigenic specificities are generally characteristic for eTreg cells that preferentially reside in particular lymphatic niches" I'm not sure we can conclude this from the Figure. Wouldn't you expect the samples to be grouped by color (the different challenges)? Maybe I'm not understanding the sentence!

      This is a different story, about resident Tregs, irrespective of the challenge.

      The full explanation is given in the text:

      “Global CDR3α cluster analysis revealed that characteristic eTreg TCR motifs were present in distinct lymphatic tissues, including spleen and thymus, irrespective of the applied challenge (Supplementary Fig. 1). To better illustrate this phenomenon, we performed MDS analysis of CDR3α repertoires for distinct lymphatic tissues, excluding the lungs due to their otherwise dominant response to the current challenge. This analysis demonstrated close proximity of eTreg repertoires obtained from the same lymphatic tissues upon all lung challenges and across all animals (Fig. 5a, b). These results indicate that distinct antigenic specificities are generally characteristic for eTreg cells that preferentially reside in particular lymphatic niches. Notably, the convergence of lymphatic tissue-resident TCR repertoires was less prominent for the eCD4 T cells (Fig. 5c, d).”

      And in the abstract:

      “Additionally, our TCRα repertoire analysis demonstrated that distinct antigenic specificities are characteristic for eTreg cells residing in particular lymphatic tissues, regardless of the challenge, revealing the homing-specific, antigen-specific resident Treg populations.”

      • In the manuscript at " Notably, the convergence of lymphatic tissue-resident TCR repertoires was less prominent for the eCD4 T cells ":

      5b and 5d seem to have the same pattern: Spleen and MLN group together, AxLN and IgLN together and thymus is separate. Do you mean to say that the groups are more diffuse? I feel like the pattern really is the same and it's likely due to some noise in the data…

      Yes, we simply mean here that the eTreg groups are less diffuse, that is, more convergent.

      • I'm not sold on the eCD4 to eTreg conversion evidence. Why only limit to the top 100 clones? The top 1000 clones were used in previous analyses! Moreover, the authors claim that calculating relative overlap (via F2) of matching CDR3+V+J genes is evidence of a conversion between eCD4 and eTreg. To convince myself of a real conversion, I would track the cells between groups; unfortunately, I'm not sure how to track this. Maybe looking at the thymus population? For example, what is the overlap in the thymus vs. after the challenge? I don't have an answer on how to verify this, but I feel that this conclusion is a bit on the weaker end.

      A distinct normalisation procedure was required to focus on the most expanded clonotypes and to avoid the tail of identical (presumably cross-reactive) TCRs present in all repertoires in these limited-repertoire mice. We therefore downsampled as strictly as possible to minimise this background signal of nucleotide overlap, and only this strict downsampling to the top-100 clonotypes allowed us to visualise the difference between the challenges. This explanation is rather complicated and would overload the manuscript, but your comments and our answers will be available to readers who wish to go into the details.

      The overlaps observed (at this strict downsampling) between the top nucleotide-level clonotypes in both the LLC and PYMT challenges are prominently above the average, and this result is reproducible in lungs and skin, so we are confident in our interpretation of these data. Further experiments with different methods, including tracking clonal fates, should clarify and confirm, correct or disprove our findings.
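      Reviewer #1 refers to the F2 overlap of matching CDR3 + V + J clonotypes. A minimal Python sketch follows, assuming the F2 definition given in the VDJtools documentation (the sum, over clonotypes shared by both samples, of the geometric mean of their frequencies in the two samples) and assuming that frequencies are renormalised within the selected top-100 subset of each sample. The function name and the example clonotype tuples are hypothetical; this is not the VDJtools code.

```python
import math


def f2_overlap(rep_a, rep_b):
    """F2-style overlap between two repertoires (hypothetical helper).

    Each repertoire maps a clonotype key, here assumed to be a
    (CDR3 nucleotide sequence, V gene, J gene) tuple, to its count.
    Frequencies are computed within each repertoire as given
    (assumption: already reduced to its top-100 clonotypes).
    """
    total_a = sum(rep_a.values())
    total_b = sum(rep_b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    shared = rep_a.keys() & rep_b.keys()  # exact nucleotide + V + J matches
    # Sum of geometric means of the two frequencies of each shared clonotype.
    return sum(
        math.sqrt((rep_a[k] / total_a) * (rep_b[k] / total_b)) for k in shared
    )
```

      Under this definition, identical repertoires give F2 = 1 and repertoires with no shared clonotypes give 0, so higher values indicate that more of the expanded eCD4 and eTreg clonotypes share exact nucleotide sequences.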

      • There is a nuance in the analysis between Figure 3 and Figure 5 which I think I am not grasping. Both Figures use the same method and the same data but what is different? I think the manuscript would benefit from making this crystal clear. The conclusions will likely be more evident as well!

      As explained in the text and above, on Figure 5 “we performed MDS analysis of CDR3α repertoires for distinct lymphatic tissues, excluding the lungs due to their otherwise dominant response to the current challenge.”

      The aim of this section of the manuscript is to reveal tissue-resident Tregs that are distinct for each tissue and resident there in all these mice, irrespective of the challenge we applied. And they are indeed there.

      • Do the authors plan to share their R scripts?

      All calculations were performed in VDJtools. R was only used to build figures. Corrected this in Methods.

      Minor typos and formatting issues to address:

      • Typo in Figure 2a the category should read "worm" instead of "warm"

      Corrected.

      • Figure 2a heatmap is missing a color bar indicating the value ranges

      Detailed information can be found in the additional Supplementary Materials.

      • Figure 2f is never mentioned in the manuscript!

      Corrected.

      • "eTreg repertoire upon lung challenge is reflected in the draining lymph node": the word "upon" is in a smaller font size.

      Corrected.

      • The authors should make the spelling of eTreg uniform across the manuscript (reg in subscript vs just lower-case letters). The same goes for CDR3a vs CDR3\alpha.

      Corrected.

      • Figure 4a-d p-values annotations are not shown. Is it because they are not significant?

      Corrected.

      • The spelling of FACS buffer should be uniform (FACs vs FACS, see methods)

      Corrected.

      • In the gating strategy, I would make a uniform annotation for the cluster of differentiation, for example, "CD44 high" vs "CD44^{hi}", pos vs + etc.

      Corrected.

      • Citation for MIGEC software (if available) is missing from methods

      The citation is present in the main text, so we believe this is sufficient.

      Reviewer #2 (Recommendations For The Authors):

      I noticed the data was made available via Figshare in the preprint, but there is no data availability statement in the current ms.

      We have provided a Data availability statement.

      The methods state that custom scripts were written to perform the various analyses. Those should be made available in a code repository, and linked in the ms.

      All calculations were performed in VDJtools. R was only used to build figures. Corrected this in Methods.

      The title mentioned "TCR repertoire prism", so I thought "prism" was the name of a new method or software. But then the word "prism" didn't appear anywhere in the ms.

      By "prism" we simply mean viewing or understanding something from a different perspective, or through a lens that reveals different aspects or nuances.

      Figure 1D lacks an x-axis label.

      We have revised the figures throughout.

      Reviewer #3 (Recommendations For The Authors):

      • The paper is very concise, possibly a bit too much. It could use additional explanations to properly affirm its relevance, for example:

      Why was the CDR3β background fixed?

      To make the repertoires more similar across mice, and to track all features of the repertoire using only one chain.

      To what is it fixed?

      As explained in the Methods:

      “C57BL/6J DO11.10 TCRβ transgenic mice (kindly provided by Philippa Marrack) and crossed to C57BL/6J Foxp3eGFP TCRa-/- mice.”

      What do you expect to see and not to see in this specific system, and why is it important?

      As stated above, we expected the repertoires to be more similar across mice. This is important for finding antigen-specific TCR clusters shared across mice and for tracking all features of the TCR repertoire using only one chain.

      Does this system induce more convergent responses? If so, can we extrapolate the results from this system to the full alpha-beta response?

      Compared to conventional mice, such a model is much more powerful for monitoring convergent TCR responses. At the same time, the system behaves naturally and the mice live almost normally, so we believe it reflects the natural behaviour of the full-fledged alpha-beta T cell repertoire.

      • Is the lack of similarity of other tissues to Lung/MLN due to a lack of a response?

      As indicated in the title of the corresponding subsection, “eTreg repertoire upon lung challenge is reflected in the draining lymph node”, and the conclusion of this subsection is that “these results demonstrate the selective tissue localization of the antigen-focused Treg response.”

      Can you do a dendrogram like 2a for the other tissues to better clarify what is going on there? There is space in the supplementary material.

      We built many of these, but, being one-dimensional, they are mostly less informative than the 2D MDS plots.

      • Figure 5 seems a bit out of place as it looks more related to Figure 2. It could maybe be integrated there, sent to supplementary or become Figure 3?

      This is a different story, about resident Tregs, irrespective of the challenge.

      The full explanation is given in the text:

      “Global CDR3α cluster analysis revealed that characteristic eTreg TCR motifs were present in distinct lymphatic tissues, including spleen and thymus, irrespective of the applied challenge (Supplementary Fig. 1). To better illustrate this phenomenon, we performed MDS analysis of CDR3α repertoires for distinct lymphatic tissues, excluding the lungs due to their otherwise dominant response to the current challenge. This analysis demonstrated close proximity of eTreg repertoires obtained from the same lymphatic tissues upon all lung challenges and across all animals (Fig. 5a, b). These results indicate that distinct antigenic specificities are generally characteristic for eTreg cells that preferentially reside in particular lymphatic niches. Notably, the convergence of lymphatic tissue-resident TCR repertoires was less prominent for the eCD4 T cells (Fig. 5c, d).”

      And in the abstract:

      “Additionally, our TCRα repertoire analysis demonstrated that distinct antigenic specificities are characteristic for eTreg cells residing in particular lymphatic tissues, regardless of the challenge, revealing the homing-specific, antigen-specific resident Treg populations.”

      • Have you explored more systematically the role of individual variability? If you stratify by individual, do you observe any trend? If not this is also an interesting observation to highlight and discuss.

      Individual variability is incorporated in the calculations and figures (one dot = one mouse), so this natural variation is already represented.

      • Regarding the MDS plots: why are 2 dimensions the right amount? Maybe with 3, you can see both tissue specificity and stimuli contributions. Can you do a stress vs # dimensions plot to check what should be the right amount of dimensions to more accurately reproduce the distance matrix?

      Tissue specificity and stimulus contributions are hard to distinguish without focusing on the appropriate samples, as we did in Figs. 3 and 5. The work is already fairly complex, and analysing this in a higher-dimensional space is far beyond our current capacity. However, this is an interesting point for future work, thank you.

      • Figure 2: A better resolution is needed in order to properly resolve the logo plots at the bottom.

      Yes, we have improved the figures, and we also provide a new Supplementary Figure with all the logos.

      • No code or data are made available. There is also a lack of supplementary figures that complement and expand the results presented in the main text.

      We believe that the main text, although succinct, contains a substantial amount of information to analyse and (preliminary) conclusions to draw, so we do not think it reasonable to overload it further.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The manuscript investigates the role of membrane contact sites (MCSs) and sphingolipid metabolism in regulating vacuolar morphology in the yeast Saccharomyces cerevisiae. The authors show that tricalbin (1-3) deletion leads to vacuolar fragmentation and the accumulation of the sphingolipid phytosphingosine (PHS). They propose that PHS triggers vacuole division through MCSs and the nuclear-vacuolar junction (NVJ). The study presents some solid data and proposes potential mechanisms underlying vacuolar fragmentation driven by this pathway. Although the manuscript is clear about what the data indicate and what is more hypothetical, the story would benefit from more conclusive evidence to support these hypotheses. Overall, the study provides valuable insights into the connection between MCSs, lipid metabolism, and vacuole dynamics.

      We thank Reviewer #1 for the positive review. We hope that our hypotheses are supported by the "Author Response to Recommendations" and by further research in the future.

      Reviewer #2 (Public Review):

      This manuscript explores the mechanism underlying the accumulation of phytosphingosine (PHS) and its role in initiating vacuole fission. The study posits the involvement of membrane contact sites (MCSs) in two key stages of this process. Firstly, MCSs tethered by tricalbin between the endoplasmic reticulum (ER) and the plasma membrane (PM) or Golgi regulate the intracellular levels of PHS. Secondly, the amassed PHS triggers vacuole fission, most likely through the nuclear-vacuolar junction (NVJ). The authors propose that MCSs play a regulatory role in vacuole morphology via sphingolipid metabolism. While some results in the manuscript are intriguing, certain broad conclusions occasionally surpass the available data. Despite the authors' efforts to enhance the manuscript, certain aspects remain unclear. It is still uncertain whether subtle changes in PHS levels could induce such effects on vacuolar fission. Additionally, it is regrettable that the lipid measurements are not comparable with previous studies by the authors. Future advancements in methods for determining intracellular lipid transport and levels are anticipated to shed light on the remaining uncertainties in this study.

      We thank Reviewer #2 for the careful comments. As Reviewer #2 pointed out, how slight changes in PHS levels can induce vacuolar fission remains unresolved in this manuscript. We agree that this issue must be addressed in future studies.

      Reviewer #3 (Public Review):

      In this manuscript, the authors investigated the effects of deleting the ER-plasma membrane/Golgi tethering proteins tricalbins (Tcb1-3) on vacuolar morphology to demonstrate the role of membrane contact sites (MCSs) in regulating vacuolar morphology in Saccharomyces cerevisiae. Their data show that tricalbin deletion causes vacuolar fragmentation, possibly in parallel with the TORC1 pathway. In addition, their data reveal that levels of various lipids, including ceramides, long-chain base (LCB)-1P, and phytosphingosine (PHS), are increased in tricalbin-deleted cells. The authors find that exogenously added PHS can induce vacuole fragmentation and, by analyzing genes involved in sphingolipid metabolism, they conclude that vacuolar fragmentation in tricalbin-deleted cells is due to the accumulated PHS in these cells. Importantly, vacuole fragmentation induced by exogenous PHS or by tricalbin deletion was suppressed by loss of the nucleus-vacuole junction (NVJ), suggesting the possibility that PHS transported from the ER to vacuoles via the NVJ triggers vacuole fission. Of note, the authors find that hyperosmotic shock increases intracellular PHS levels, suggesting a general role of PHS in vacuole fission in response to physiological vacuolar division-inducing stimuli. This work provides valuable insights into the relationship between MCS-mediated sphingolipid metabolism and vacuole morphology. The conclusions of this paper are mostly supported by their results, but the inclusion of direct evidence indicating increased transport of PHS from the ER to vacuoles via the NVJ in response to vacuolar division-inducing stimuli would have strengthened this study. There is another weakness in their claim that the transmembrane domain of Tcb3 contributes to the formation of the tricalbin complex, which is sufficient for tethering the ER to the plasma membrane and the Golgi complex. This claim is based only on structural simulation rather than on biochemical experiments such as co-immunoprecipitation and pull-down.

      We appreciate the careful feedback from Reviewer #3. We have responded in the "Recommendations to Authors" section and hope this partially addresses the weakness in our claim regarding the physical interaction between Tcb1, 2, and 3.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest that the authors include some of the data (e.g., Tcb interactions) that they refer to in the response to the reviewers. I think that this could enhance the message in this manuscript. Also, maybe it's a typo and you were referring to some other image panel, but in the rebuttal letter a "Fig. S3B" is mentioned, but I could not find it.

      Following the suggestions of reviewers #1 and #3, we have added the data of co-immunoprecipitation which confirmed that Tcb3 binds to both Tcb1 and Tcb2 as Supplemental Figure 2. With this change, the person (Ms. Saku Sasaki) who performed this analysis was also added as a co-author.

      Also, we appreciate the careful remark and apologize for the mistake. In the previous Author's response, we mentioned the vacuole observation using SD medium, but this data was Fig 5C, not Fig S3B.

      Reviewer #3 (Recommendations For The Authors):

      I would recommend that the authors include the IP data mentioned in their rebuttal letter to show the interactions among Tcb1-3. Also, the authors should quantify all lipid species in Fig 5B, as shown in Fig 3A.

      Following the suggestions of reviewers #1 and #3, we have added the co-immunoprecipitation data (Fig S2). In a further study, we would like to test if the transmembrane domain of Tcb3 is sufficient for the interaction among Tcb1-3. Also, we quantified all lipid species and replaced the data in Fig 5B.

      Minor points:

      (1) The function of vps4 is not mentioned in the manuscript.

      (2) The function of Sur2p is not mentioned in the manuscript. It should be clearly mentioned that DHS is converted to PHS by Sur2p.

      (1) We have added text stating that VPS4 is required for normal ESCRT function and that its deletion serves as an example of inhibited GFP-Cps1p transport into the vacuole.

      (2) We have added text to the manuscript stating that Sur2p is the hydroxylase that catalyzes the conversion of DHS to PHS.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Overall, the magnitude of the effect size due to FNDC5 deficiency in both male and female mice is rather modest. Looking at the data from a qualitative perspective, it is clear that knockout females still lose bone during lactation and on the low calcium diet (LCD). It is difficult to assess the physiologic consequence of the modest quantitative 'protection' seen in FNDC5 mutants since the mutants still show clear and robust effects of lactation and LCD on all parameters measured. Similarly, the magnitude of the 'increased' cortical bone loss in FNDC5 mutant males is also modest and perhaps could be related to the fact that these mice are starting with slightly more cortical bone. Since the authors do not provide a convincing molecular explanation for why FNDC5 deficiency causes these somewhat subtle changes, I would like to offer a suggestion for the authors to consider (below, point #2) which might de-emphasize the focus of the manuscript on FNDC5. If the authors chose not to follow this suggestion, the manuscript could be strengthened by addressing the consequences of the modest changes observed in WT versus FNDC5 KO mice.

      Response: We agree that the effect of FNDC5 deficiency on the quantitative cortical bone parameters is modest. However, the differences are greater when one examines the changes in osteocyte lacunar size and the mechanical properties of these bones. As shown in Figure 3E, on a low calcium diet the lacunar area increases by over 30% in WT females and by less than 20% in KO females, whereas in males it increases by approximately 38% in WT compared to 46% in KO mice. According to Sims and Buenzli (PMID: 25708054), a potential total loss of ~16,000 mm³ (16 mL) of bone occurs through lactation in the human skeleton. This estimate was based on our measurements of lactation-induced murine osteocytic osteolysis (Qing et al., PMID: 22308018): they used our 2D sections of tibiae from lactating mice, which showed an increase in lacunar size from 38 to 46 µm². In that paper we also showed that canalicular width increases with lactation. Therefore, our data would suggest a dramatic decrease in intracortical porosity due to the osteocyte lacunocanalicular system in female KO mice on a low calcium diet compared to WT females, and a dramatic increase in KO males compared to WT males. Also, on a low calcium diet, serum PTH was higher in female WT than in female KO mice, whereas the opposite was true for males, in order to maintain normal calcium levels (see Table 1). Based on these data from FNDC5-null animals, we speculate that irisin, the product of FNDC5, has a highly significant effect on the ultrastructure of bone in both males and females challenged with a low calcium diet.

      (2) The bone RNA-seq findings reported in Figures 4-6 are quite interesting. Although Youlten et al previously reported that the osteocyte transcriptome is sex-dependent, the work here certainly advances that notion to a considerable degree and likely will be of high interest to investigators studying skeletal biology and sexual dimorphism in general. To this end, one direction for the authors to consider might be to refocus their manuscript toward sexually-dimorphic gene expression patterns in osteocytes and the different effects of LCD on male versus female mice. This would allow the authors to better emphasize these major findings, and to then use FNDC5 deficiency as an illustrative example of how sexually-dimorphic osteocytic gene expression patterns might be affected by deletion of an osteocyte-acting endocrine factor. Ideally, the authors would confirm RNA-seq data comparing male versus female mice in osteocytes using in situ hybridization or immunostaining.

      Response: Thank you for this suggestion. We have compared the different effects of LCD on male versus female mice in our revised version and have added a figure containing this information.

      (3) Along the lines of point #2 (above), the presentation of the RNA-seq studies in Figures 4-6 is somewhat confusing in that the volcano plot titles seem to be reversed. For example, Figure 4A is titled "WT M: WT F", but the genes in the upper right quadrant appear to be up-regulated in female cortical bone RNA samples. Should this plot instead be titled "WT F: WT M"? If so, then all other volcano plots should be re-titled as well.

      Response: We have now ensured that the plots are appropriately labeled.

      (4) Have the authors compared male versus female transcriptomes of LCD mice?

      Response: We have now compared the male vs female transcriptomes of LCD mice and added an additional figure.

      (5) It would be appreciated if the authors could provide additional serum parameters (if possible) to clarify incomplete data in both lactation and low-calcium diet models: RANKL/OPG ratio, Ctx, PTHrP, and 1,25-dihydroxyvitamin D levels.

      Response: It is not possible to quantitate each of these, as the serum has been exhausted. We have checked the RANKL/OPG ratio in the RNA-seq and qPCR data from osteocyte-enriched bone chips and found no difference.

      (6) Lastly, the data that overexpressing irisin improved bone properties in Fig 2G was somewhat confusing. Based on Kim et al.'s (2018) work, irisin injection increased sclerostin gene expression and serum levels, thus reducing bone formation. Were sclerostin levels affected by irisin overexpression in this study? Was irisin's role in modulating sclerostin levels attenuated with additional calcium deficiency?

      Response: We have not observed any differences in the osteocyte Sost mRNA expression between WT and KO normal and low-calcium-diet male and female mice in our RNAseq and qPCR data. As such, we did not check the Sost levels for the 2G experiment.

      Reviewer #2 (Public Review):

      Summary:

      The goal of this study was to examine the role of FNDC5 in the response of the murine skeleton to either lactation or a calcium-deficient diet. The authors find that female FNDC5 KO mice are somewhat protected from bone loss and osteocyte lacunar enlargement caused by either lactation or a calcium-deficient diet. In contrast, male FNDC5 KO mice lose more bone and have a greater enlargement of osteocyte lacunae than their wild-type controls. Based on these results, the authors conclude that in males irisin protects bone from calcium deficiency but that in females it promotes calcium removal from bone for lactation.

      While some of the conclusions of this study are supported by the results, it is not clear that the modest effects of FNDC5 deletion have an impact on calcium homeostasis or milk production.

      Specific comments:

      (1) The authors sometimes refer to FNDC5 and other times to irisin when describing causes for a particular outcome. Because irisin was not measured in any of the experiments, the authors should not conclude that lack of irisin is responsible. Along these lines, is there any evidence that either lactation or a calcium-deficient diet increases the production of irisin in mice?

      therefore we have extrapolated that the observed effects are due to a lack of circulating irisin. However, this does not rule out a function for Fndc5 itself, but such a function would most likely be in muscle and not in the osteocyte, as we do not detect significant levels of irisin in either primary osteoblasts or primary osteocytes compared to muscle and C2C12 cells. As such, we concluded that the phenotypic differences seen in our experiments are due to a lack of irisin. We now address the reviewer’s point in the Discussion. Circulating irisin has not been measured in normal mice during lactation or on a low calcium diet.

      (2) The results of the irisin-rescue experiment shown in figure 2G cannot be appropriately interpreted without normal diet controls. In addition, some evidence that the AAV8-irisin virus actually increased irisin levels in the mice would strengthen the conclusion.

      Response: We do not have the normal diet controls at this time. We have quantitated tagged irisin in other AAV experiments and found highly significant expression.

      (3) There is insufficient evidence to support the idea that the effect of FNDC5 on bone resorption and osteocytic osteolysis is important for the transfer of calcium from bone to milk. Previous studies by others have shown that bone resorption is not required to maintain milk or serum calcium when dietary calcium is sufficient but is critical if dietary calcium is low (Endo. 156:2762-73, 2015). To support the conclusions of the current study, it would be necessary to determine whether FNDC5 is required to maintain calcium levels when lactating mice lack sufficient dietary calcium.

      Response: We agree that it would be important to measure calcium levels in the milk to test the hypothesis that FNDC5 is important to maintain calcium levels in milk. However, as the calcium levels are normal in the serum, we are assuming they are normal in milk. This would require future experiments.

      (4) The amount of cortical bone loss due to lactation is very similar in both WT and FNDC5 KO mice. The results of the statistical analysis of the data presented in figure 1B are surprising given the very similar effect size of lactation. The key result from the 2-way ANOVA is whether there is an effect of genotype on the effect size of lactation (genotype-lactation interaction). The interaction terms were not provided. Similar concerns are noted for the results shown in figure 1G and H.

      Response: We agree and thank the reviewer. We will add the interaction terms to the figure legends.
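      To make the reviewer's point about the interaction term concrete, here is a minimal Python sketch (an illustration, not the authors' analysis pipeline) of the genotype × condition interaction F statistic for a balanced two-way ANOVA with independent groups, i.e. not a repeated-measure design. The function name `interaction_f`, the group labels, and all numbers are hypothetical placeholders, not data from this study. When the lactation effect has the same size in both genotypes, the interaction sum of squares is zero, however large each main effect is.

```python
from statistics import mean
from typing import Dict, List, Tuple

# A cell of the design is identified by (genotype, condition).
Cell = Tuple[str, str]


def interaction_f(data: Dict[Cell, List[float]]) -> float:
    """Interaction F statistic for a balanced two-way ANOVA
    with independent (not repeated-measure) groups."""
    genotypes = sorted({g for g, _ in data})
    conditions = sorted({c for _, c in data})
    n = len(next(iter(data.values())))
    if any(len(values) != n for values in data.values()):
        raise ValueError("this sketch assumes equal n in every cell")

    cell_mean = {cell: mean(values) for cell, values in data.items()}
    grand = mean(list(cell_mean.values()))
    g_mean = {g: mean(cell_mean[(g, c)] for c in conditions) for g in genotypes}
    c_mean = {c: mean(cell_mean[(g, c)] for g in genotypes) for c in conditions}

    # Interaction SS: the part of each cell mean not explained by
    # additive genotype and condition effects.
    ss_ab = n * sum(
        (cell_mean[(g, c)] - g_mean[g] - c_mean[c] + grand) ** 2
        for g in genotypes
        for c in conditions
    )
    # Error SS: variation of individual animals around their cell mean.
    ss_e = sum((y - cell_mean[cell]) ** 2 for cell, values in data.items() for y in values)

    df_ab = (len(genotypes) - 1) * (len(conditions) - 1)
    df_e = len(genotypes) * len(conditions) * (n - 1)
    return (ss_ab / df_ab) / (ss_e / df_e)


# Hypothetical values (one number per animal), not data from the study.
same_effect = {  # lactation lowers the outcome by 3 units in both genotypes
    ("WT", "control"): [10.0, 12.0], ("WT", "lactation"): [7.0, 9.0],
    ("KO", "control"): [11.0, 13.0], ("KO", "lactation"): [8.0, 10.0],
}
different_effect = {  # lactation effect: -3 in WT, -1 in KO
    ("WT", "control"): [10.0, 12.0], ("WT", "lactation"): [7.0, 9.0],
    ("KO", "control"): [11.0, 13.0], ("KO", "lactation"): [10.0, 12.0],
}
print(interaction_f(same_effect))       # 0.0
print(interaction_f(different_effect))  # 1.0
```

      A p-value for the interaction would then come from the F distribution with (df_ab, df_e) degrees of freedom, for example via `scipy.stats.f.sf`. A real analysis would normally use a statistics package that also handles unbalanced designs; this sketch only shows which quantity the interaction test is based on.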

      (5) It is not clear what justifies the term 'primed' or 'activated' for resorption. Is there evidence that a certain level of TRAP expression lowers the threshold for osteocytic osteolysis in response to a stimulus?

      Response: The number of TRAP-positive osteocytes is lower in female KO mice than in female WT mice, and lower in WT males than in WT females. We propose that irisin regulates the number of TRAP-positive osteocytes in normal WT females by readying or preparing these cells to respond rapidly to low calcium. We will use the term ‘primed’ and will not use the term ‘activated’. We are open to any terminology or description as to why this is observed and what irisin could be doing to the osteocyte.

      Reviewer #3 (Public Review):

      Summary:

      Irisin has previously been demonstrated to be a muscle-secreted factor that affects skeletal homeostasis. Through the use of different experimental approaches, such as genetic knockout models, recombinant Irisin treatment, or different cell lines, the role of Irisin in skeletal homeostasis has been revealed to be more complex than previously thought, which warrants further examination. Therefore, the current study sought to rigorously examine the effects of global Irisin knockout (KO) on male and female mouse bone. The authors demonstrated that in calcium-demanding settings, such as lactation or a low-calcium diet, female Irisin KO mice lose less bone than wild-type (WT) female mice. Interestingly, male Irisin KO mice exhibited worse skeletal deterioration than WT male mice when fed a low-calcium diet. When they examined the transcriptomic profiles of osteocyte-enriched cortical bone, the authors found that Irisin KO altered the expression of osteocytic osteolysis genes as well as steroid and fatty acid metabolism genes in males but not in females. These data support the authors' conclusion that Irisin regulates skeletal homeostasis in a sex-dependent manner.

      Strengths:

      The major strength of the study is the rigorous examination of the effects of Irisin deletion in the settings of skeletal maturity and increased calcium demands in female and male mice. Since many of the common musculoskeletal disorders are dependent on sex, examining both sexes in the preclinical setting is crucial. Had the investigators only examined females or males in this study, the conclusions from each sex would have contradicted each other regarding the role of Irisin on bone. Also, the approaches are thorough and comprehensive that assess the functional (mechanical testing), morphological (microCT, BSEM, and histology), and cellular (RNA-seq) properties of bone.

      Weaknesses:

      One of the weaknesses of this study is a lack of detailed mechanistic analysis of why Irisin has a sex-dependent role in skeletal homeostasis. This absence is particularly notable in the osteocyte transcriptomic results, where such data could have been used to further probe potential candidate pathways between LC females vs. LC males.

      Response: Our future studies will focus on understanding the molecular mechanism behind the sex-dependent effects of irisin. Our RNA-seq data show a significant difference in the lipid, steroid, and fat metabolism pathways between male and female mice, as well as between WT and KO mice. Future studies will focus on these pathways.

      Another weakness is that the authors did not present data that convincingly demonstrate that Irisin secretion from skeletal muscle differs between female and male WT mice in response to calcium restriction. The supplementary skeletal muscle data only present functional and electrophysiological outcomes. Since Itgav and Itgb5 were not different in any of the experimental groups, it is assumed that changes in the level of Irisin are responsible for the phenotypes observed in WT mice. Assessing Irisin expression would further strengthen the conclusions drawn from the skeletal changes observed in Irisin KO male and female mice.

      Response: The problem is that the commercial assays for irisin are not dependable, and results can differ widely across and beyond the physiologic range of 1-10 ng/ml. In part this is due to the nature of the polyclonal antibodies used and the resultant cross reactivity with other proteins. It was shown in Islam et al, 2021 (Nature Metabolism) that the commercial ELISAs were completely unreliable in mice and the only reliable method of measuring circulating irisin is mass spectrometry.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Were there any differences in food intake or body weight on the low calcium diet between littermates and FNDC5 KO mice?

      Response: We measured both and now include the body weight and food intake data in the supplement. We do not observe any significant difference between the groups.

      (2) In Fig 1, ideally the authors would provide the osteocyte lacunar density along with the lacunar area.

      Response: We do not observe any difference in osteocyte density in any of the groups. There is not sufficient time within 2 weeks to see a change in osteocyte density because there is no new bone formation.

      (3) What is the authors' comment on the involvement of irisin in TGF-β signaling, since the authors observed perilacunar remodeling in FNDC5 KO mice? The authors should also discuss Irisin-TGF-β signaling in the Discussion section in light of the observed increase in matrix-related signals.

      Response: Perilacunar remodeling is the removal followed by the replacement of the perilacunar and pericanalicular matrix, as occurs with lactation (Qing et al 2012). Osteocytic osteolysis is the first half of that process, in which the matrix is removed. Alliston and colleagues generated transgenic mice with reduced expression of the TGF-β type II receptor using Dmp1-Cre (PMID: 32282961). They found clear, significant differences between the sexes in bone parameters, in the appearance of the osteocyte lacunocanalicular network, and in markers of osteocyte perilacunar remodeling; however, they did not compare the lacunar remodeling process in males with that in females. The females were subjected to lactation and were found to be resistant to osteocytic osteolysis. To compare males and females, they would have had to challenge both sexes with a high calcium demand, such as the low calcium diet used in the current study. Their study does suggest that TGF-β is involved in the osteocytic osteolysis that occurs with lactation. However, because the null males showed an abnormal lacunocanalicular network compared to wildtype males, this does not necessarily indicate a defect in perilacunar remodeling; it is more likely that the defect arose during bone formation, when osteoblasts were differentiating into osteocytes. Therefore, we will cite this paper regarding the role of TGF-β in osteocytic osteolysis in females with lactation, but not in the comparison of males to females. We have examined the normalized expression of TGF-β1, -β2, and -β3 in the present study and found no significant differences in TGF-β1 or -β2 in any of the groups, but did find significantly higher expression of TGF-β3 in females compared to males for WT (FDR < 0.05), LCD WT (FDR < 0.05), and Control KO (p value < 0.01). Perhaps this isoform plays a major role in the osteocytic osteolysis that occurs with lactation.

      (4) Did the authors compare the transcriptomic dataset between lactated female WT vs. KO groups? Or were the RNA-seq studies only performed on LCD study samples?

      Response: We performed RNA-seq only on the LCD study samples, not on the lactating females.

      Reviewer #2 (Recommendations For The Authors):

      Line 401 on page 14 states that the sexes respond differently to calcium deficiency. Lacunar area increases in both sexes, so the response is very similar. What appears to be different between the sexes is the role of FNDC5 in this process.

      Response: Female WT mice have higher osteocyte lacunar area at baseline with normal diet compared to WT males. With the low calcium diet, lacunar area increases in both sexes, with female WTs having a greater increase. We agree that what appears to be different between the sexes is the role of FNDC5 when challenged with high calcium demand.

      Reviewer #3 (Recommendations For The Authors):

      • The authors state in the abstract and discussion that 'We propose Irisin ensures the survival of offspring by targeting the osteocytes...'. However, this appears to be an overinterpretation of their findings, as they have not assessed the number of offspring surviving to weaning or their growth rate between WT and KO breeders.

      Response: That was a proposal, and we agree that it could be an overinterpretation. However, we would like to keep it as a speculation that could be tested in future studies.

      • Figures 1 and 2 should include cortical Total Area (and maybe Marrow Cavity data from Supp as well). These data will help readers to assess whether the thinning of the cortex is driven by impaired periosteal expansion or accelerated endosteal resorption (or both). Marrow cavity area data seem to suggest increased endosteal resorption (Supp. Table 2), but unclear if periosteal expansion is altered.

      Response: The data are included in the supplementary tables. We do not observe any difference in the periosteal area between the groups.

      • To further support the author's statement that male KO mice exhibit different material properties of bone compared to WT mice, estimated elastic modulus should be calculated from the stiffness data (see https://doi.org/10.1002/jbmr.2539).

      Response: We looked into the elastic modulus, but estimating it requires a stress-strain curve rather than the force-displacement data we used in our calculations; therefore, we were not able to estimate the elastic modulus from our raw data.

      • In Figure 3 there is no legend indicating females or males. Based on the data and results texts it is assumed that red is Female and blue is Male. However, please confirm in the figure legend.

      Response: This has now been added to the figure legends.

      • Transcriptomic data should be deposited to NCBI GEO data repository. Also, please indicate whether cutoff p-value for DEG analysis was adjusted or not.

      Response: We have submitted our data to the GEO data repository: GSE242445. Significant genes were defined as those with a p-value below 0.01 and an absolute log2 fold change greater than 1. The p-value was not adjusted for multiple comparisons. This information has now been added.

      • The statistical analysis section indicates that a two-way repeated-measure ANOVA was used. However, the data presented in the study are from independent groups, in which case repeated-measure statistical approaches should not be used. Please clarify the statistical tests that were used.

      Response: We now use regular ANOVA instead of repeated-measures ANOVA, since repeated-measures ANOVA is intended for paired data. The results remain significant.

      In summary, we thank the reviewers for their very useful and thoughtful suggestions for improving our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Response to reviewer 1 comments on “weaknesses”:

      “A weakness in the approach is the use of genetic models that do not offer complete deletion of the prolactin receptor from targeted neuronal populations...”

      We acknowledge that neither model used provided a complete deletion of the prolactin receptor (Prlr) from the targeted neuronal populations. We suspect that incomplete deletion of targeted genes is not uncommon in these sorts of studies, but this remains the best approach to addressing our question, and we believe we have been thorough and transparent in reporting the degree of deletion observed. We thought we had appropriately discussed the implications of the low proportion of Kiss1 cells still expressing Prlr, but we will certainly revisit this to ensure it is discussed thoroughly. This does not detract, however, from the key conclusion that prolactin action is necessary for full suppression of fertility in lactation in the mouse.

      “Results showing no impact of progesterone on LH secretion during lactation are surprising, given the effectiveness of progesterone-containing birth control in lactating women...”

      We think that this comment misrepresents what has been done in our study. We did not report a lack of impact of progesterone, as exogenous progesterone was never administered to mice. We did, however, give mifepristone as a progesterone receptor antagonist to determine whether endogenous progesterone contributed to the suppression of kisspeptin neuronal activity. We found that mifepristone, at levels sufficient to terminate pregnancy, had no effect on pulsatile LH secretion in lactating mice. This is consistent with our prior observation that progesterone levels are low in mouse lactation, suggesting that progesterone does not contribute significantly to the suppression of kisspeptin neuronal activity during lactation in the mouse. We agree with the reviewer that if we had given exogenous progesterone, it likely would result in suppression of pulsatile LH secretion (as it does in women). Indeed, in other work, we have found that progesterone administration profoundly suppresses activity of the kisspeptin neurons in mice (https://doi.org/10.1210/en.2019-00193). But this was not the point of the present experiment. We will review how we have described this experiment to ensure that this is absolutely clear.

      “While the authors assert their findings may reflect an important role for prolactin in lactational infertility in other mammalian species, that remains to be seen….”

      We acknowledge that our study cannot address whether prolactin is necessary for the suppression of fertility during lactation in other mammalian species. We hope, however, that our data may stimulate a re-examination of this question in other species, as some of the prior methodology (such as pharmacological suppression of prolactin) may have had off-target effects that confound interpretation. We thought that this point was discussed appropriately in the manuscript, but we will certainly check and make sure it is addressed suitably.

    1. Author Response

      The following is the authors’ response to the current reviews.

      eLife assessment

      This important study used Voltage Sensitive Dye Imaging (VSDI) to measure neural activity in the primary visual cortex of monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. The authors show convincingly that the initial effect of the mask ran counter to the behavioral effects of the mask, a pattern that reversed in the latter phase of the response. They interpret these results in terms of influences from the receptive field center, and although an alternative view that emphasizes the role of the receptive field surround also seems reasonable, this study stands as an interesting and important contribution to our understanding of mechanisms of visual perception.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a clear account of some interesting work. The experiments and analyses seem well done and the data are useful. It is nice to see that VSDI results square well with those from prior extracellular recordings.

      The authors have done a good job responding to the main points of my previous review. One important question remains, as stated in that review:

      "My reading is that this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature, and although they engage with some of the literature they do not directly mention surround suppression in the text. Their major effect - what they repeatedly describe as a "paradoxical" result in which the responses initially show a stronger response to matched targets and backgrounds and then reverse - seems to pretty clearly match the expected outcome of a stimulus that initially evokes additional excitation due to increased center contrast followed by slightly delayed surround suppression tuned to the same peak orientation. Their dynamics result seems entirely consistent with previous work, e.g. Henry et al 2020, particularly their Fig. 3 https://elifesciences.org/articles/54264, so it seems like a major oversight to not engage with that work at all, and to explain what exactly is new here."

      Their rebuttal of my first review is not convincing -- I still believe that surround influences are important and perhaps predominant in determining the outcome of the experiments. This is particularly clear for the "paradoxical" dynamics that they observe, which seem exactly to reflect the behavior of the surround.

      The authors' arguments to the contrary are based on three main points. First, their stimuli cover the center and surround, unlike those of many previous experiments, so they argue that this somehow diminishes the impact of the surround. But the argument is not accompanied by data showing the effects of center stimuli alone or surround stimuli alone. Second, their model -- a normalization model -- does not need surround influences to account for the masking effect. Third, they cite human psychophysical masking results from their collaborators (Sebastian et al 2017), but do not cite an equally convincing demonstration that surround contrast creates potent orientation selective masking when presented alone (Petrov et al 2005, https://doi.org/10.1523/JNEUROSCI.2871-05.2005).

      At the end of the day, these issues will be resolved by further experiments, not argumentation. The paper stands as an excellent contribution, but it might be wise for the authors to be less doctrinaire in their interpretations.

      We thank the reviewer for their positive comments and constructive criticism. In general, we agree with the reviewer’s comments. Importantly, we do not claim that there is no effect from the surround. What we say in the discussion is:

      “Because our targets are added to the background rather than occluding it, it is likely that a significant portion of the behavioral and neural masking effects that we observe come from target-mask interactions at the target location rather than from the effect of the mask in the surround.”

      We still stand by this assessment. We also make the point that, at least within the framework of our delayed normalization model, there is no need for the normalization mechanism to extend beyond the center mechanism to account for our results, and even if the normalization mechanism is somewhat larger than the center, the overlap region at the center would still contribute substantially to the modulations. Overall, we agree that these issues will need to be resolved by future experiments.

      For the reasons discussed in our previous reply, we disagree with the reviewer's statement "…this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature". For similar reasons, we disagree with the statement "It is nice to see that VSDI results square well with those from prior extracellular recordings".

      Reviewer #2 (Public Review):

      Summary

      In this experiment, Voltage Sensitive Dye Imaging (VSDI) was used to measure neural activity in macaque primary visual cortex in monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. Monkeys' ability to detect the target (indicated by a saccade to its location) was impaired by the mask, with the greatest impairment observed when the mask was matched in orientation to the target, as is also the case in human observers. VSDI signals were examined to test the hypothesis that the target-evoked response would be maximally suppressed by the mask when it matched the orientation of the target. In each recording session, fixation trials were used to map out the spatial response profile and orientation domains that would then be used to decode the responses on detection trials. VSDI signals were analyzed at two different scales: a coarse scale of the retinotopic response to the target and a finer scale of orientation domains within the stimulus-evoked response. Responses were recorded in three conditions: target alone, mask alone, and target presented with mask. Analyses were focused on the target-evoked response in the presence of the mask, defined as the difference in response evoked by the mask with target (target present) versus the mask alone (target absent). These were computed across five 50 msec bins (250 msec in total, which was the duration of the mask alone (target-absent trials, 50% of trials) or mask + target (target-present trials, 50% of trials)). Analyses revealed that in an initial (transient) phase the target-evoked response increased with similarity between target and mask orientation. As the authors note, this is surprising given that this was the condition where the mask maximally impaired detection of the target in behavior. Target-evoked responses in a later ('sustained') phase fell off with orientation similarity, consistent with the behavioral effect. When analyzed at the coarser scale, the target-evoked response integrated over the full 250 msec period showed a very modest dependence on mask orientation. The same pattern held when the data were analyzed at the finer orientation domain scale, with the effect of the mask in the transient phase running counter to the perceptual effect of the mask and the sustained response correlating with the perceptual effect. The effect of the mask was more pronounced when analyzed at the finer scale.

      Strengths

      The work is on the whole very strong. The experiments are thoughtfully designed, the data collection methods are good, and the results are interesting. The separate analyses of data at a coarse scale that aggregates across orientation domains and a more local scale of orientation domains is a strength and it is reassuring that the effects at the more localized scale are more clearly related to behavior, as one would hope and expect. The results are strengthened by modeling work shown in Figure 8, which provides a sensible account of the population dynamics. The analyses of the relationship between VSDI data and behavior are well thought out and the apparent paradox of the anti-correlation between VSDI and behavior in the initial period of response, followed by a positive correlation in the sustained response period is intriguing.

      We thank the reviewer for their positive comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      None, except perhaps for a more balanced representation of the "surround" possibility in the Discussion. The Petrov et al paper (https://doi.org/10.1523/JNEUROSCI.2871-05.2005) should be considered and cited.

      As discussed above, we believe that our discussion of possible contribution from the surround is balanced. While the paper by Petrov et al is interesting, the stimuli used to study the surround effects are quite different (e.g., gap between center and surround, and the sharp edge of the surround inner boundary) so direct comparison with our results is not possible.

      Reviewer #2 (Recommendations For The Authors):

      The authors have addressed the questions/suggestions I raised in my review.


      The following is the authors’ response to the original reviews.

      We thank the reviewers for their helpful comments and suggestions.

      eLife assessment

      This is an important contribution that extends earlier single-unit work on orientation-specific center-surround interactions to the domain of population responses measured with Voltage Sensitive Dye (VSD) imaging and the first to relate these interactions to orientation-specific perceptual effects of masking. The authors provide convincing evidence of a pattern of results in which the initial effect of the mask seems to run counter to the behavioral effects of the mask, a pattern that reversed in the latter phase of the response. It seems likely that the physiological effects of masking reported here can be attributed to previously described signals from the receptive field surround.

      We thank the reviewers for raising the relation of our results to findings from previous studies of orientation-specific center-surround interactions. In our final manuscript, we added a paragraph discussing this important issue. Briefly, for multiple reasons, we believe that the orientation-dependent behavioral and neural masking effects that we observe are unlikely to depend on previously described center-surround interactions in V1. First, in human subjects, perceptual similarity masking effects are almost entirely accounted for by target-mask interactions at the target location and are recapitulated when the mask has the same size and location as the target (Sebastian et al 2017). Second, in our computational model, the effect of mask orientation on the dynamics of the response is qualitatively the same if the mask is restricted to the size and location of the target while mask contrast is increased (Fig. 8 – figure supplement 3). Third, in our model, the results are qualitatively the same when the spatial pooling region for the normalization signal is the same as that for the excitation signal (Fig. 8 – figure supplement 1). These considerations suggest that center-surround interactions may not be necessary for neural and behavioral similarity masking effects with additive targets.

      We would also like to point out some key differences between the stimuli that we use and the ones used in most previous center-surround studies. First, in our experiments, the target and the mask were additive, while in most previous center-surround studies the target occludes the background. Such studies therefore restrict the mask effect to the surround, while in our study we allow target-mask interactions at the center. Second, most center-surround studies have a sharp-edged target/surround, while in our experiments no sharp edges were present. Unpublished results from our lab suggest that such sharp edges have a large impact on V1 population responses. A third key difference is that our stimuli were flashed for a short interval of 250 ms corresponding to a typical duration of a fixation in natural vision, while most previous center-surround studies used either longer-duration drifting stimuli or very short-duration random-order stimuli for reverse-correlation analysis.

      In addition, we would like to emphasize that our results go beyond previous studies in two important ways. First, we study the effect of similarity masking in behaving animals and quantitatively compare the effect of similarity masking on behavior and physiology in the same subjects and at the same time. Second, VSD imaging allows us to capture the dynamics of superficial V1 population responses over the entire population of millions of neurons activated by the target at two important spatial scales. Such results therefore complement electrophysiological studies that examine the activity of a very small subset of the active neurons.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a clear account of some interesting work. The experiments and analyses seem well done and the data are useful. It is nice to see that VSDI results square well with those from prior extracellular recordings. But the work may be less original than the authors propose, and their overall framing strikes me as odd. Some additional clarifications could make the contribution more clear.

      Please see our reply above regarding the agreement with previous studies and framing.

      My reading is that this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature, and although they engage with some of the literature they do not directly mention surround suppression in the text. Their major effect - what they repeatedly describe as a "paradoxical" result in which the responses initially show a stronger response to matched targets and backgrounds and then reverse - seems to pretty clearly match the expected outcome of a stimulus that initially evokes additional excitation due to increased center contrast followed by slightly delayed surround suppression tuned to the same peak orientation. Their dynamics result seems entirely consistent with previous work, e.g. Henry et al 2020, particularly their Fig. 3 https://elifesciences.org/articles/54264, so it seems like a major oversight to not engage with that work at all, and to explain what exactly is new here.

      We thank the reviewer for pointing out this previous work, which we now cite in the final version of the manuscript. For the reasons discussed above, while this study is interesting and related to our work, we believe that our results are quite distinct.

      • In the discussion (lines 315-316), they state "in order to account for the reduced neural sensitivity with target-background similarity in the second phase of the response, the divisive normalization signal has to be orientation selective." I wonder whether they observed this in their modeling. That is, how robust were the normalization model results to the values of sigma_e and sigma_n? It would be useful to know how critical their various model parameters were for replicating the experimental effects, rather than just showing that a good account is possible.

      Thank you for this suggestion. In the final manuscript we include a supplementary figure that shows how the model’s predictions are affected by the orientation tuning and spatial extent of the normalization signal, and by the size and contrast of the mask (Fig. 8 – figure supplement 1-4).

      • The majority of their target/background contrast conditions were collected only in one animal. This is a minor limitation for work of this kind, but it might be an issue for some.

      We agree that this is a limitation of the current study. These are challenging experiments and we were unable to collect all target/background contrast combinations from both monkeys. However, in the common conditions, the results appear similar in the two animals, and the key results seem to be robust to the contrast combination in the animal in which a wider range of contrast combinations was tested. We added these points to the discussion in the final manuscript.

      • The authors point out (lines 193-195) that "Because the first phase of the response is shorter than the second phase, when V1 response is integrated over both phases, the overall response is positively correlated with the behavioral masking effect." I wonder if this could be explored a bit more at the behavioral level - i.e. does the "similarity masking" they are trying to explain show sensitivity to presentation time?

      We agree that testing the effect of stimulus duration on similarity masking is interesting, but unfortunately, it is beyond the scope of the current study. We would also like to point out that the duration of the presentation was selected to match the typical time of fixation during natural behaviors, so much shorter or much longer stimulus durations would be less relevant for natural vision.

      • From Fig. 3 it looks like the imaging ROI may include some opercular V2. If so, it's plausible that something about the retinotopic or columnar windowing they used in analysis may remove V2 signals, but they don't comment. Maybe they could tell us how they ensured they only included V1?

      We thank the reviewer for this comment. As part of our experiments, we extract a detailed retinotopic map for each chamber, so we were able to ensure that the area used for the decoding analysis lies entirely within V1. We now incorporate this information in the final manuscript (Fig. 3 – figure supplement 1).

      • In the discussion (lines 278-283) they say "The positive correlation between the neural and behavioral masking effects occurred earlier and was more robust at the columnar scale than at the retinotopic scale, suggesting that behavioral performance in our task is dominated by columnar scale signals in the second phase of the response. To the best of our knowledge, this is the first demonstration of such decoupling between V1 responses at the retinotopic and columnar scales, and the first demonstration that columnar scale signals are a better predictor of behavioral performance in a detection task." I am having trouble finding where exactly they demonstrate this in the results. Is this just by comparison of Figs. 4E,K and 5E,K? I may just be missing something here, but the argument needs to be made more clearly since much of their claim to originality rests on it.

      We thank the reviewer for this comment. In the final manuscript we are more explicit when we discuss this point and refer to the relevant panels in Figs. 4, 5 and their figure supplements. To substantiate this key claim, we also report the timing of the transition between the two phases in all temporal correlation panels and report the neural-behavioral correlation for the integration period.

      Reviewer #2 (Public Review):

      Summary

      In this experiment, Voltage Sensitive Dye Imaging (VSDI) was used to measure neural activity in macaque primary visual cortex in monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. Monkeys' ability to detect the target (indicated by a saccade to its location) was impaired by the mask, with the greatest impairment observed when the mask was matched in orientation to the target, as is also the case in human observers. VSDI signals were examined to test the hypothesis that the target-evoked response would be maximally suppressed by the mask when it matched the orientation of the target. In each recording session, fixation trials were used to map out the spatial response profile and orientation domains that would then be used to decode the responses on detection trials. VSDI signals were analyzed at two different scales: a coarse scale of the retinotopic response to the target and a finer scale of orientation domains within the stimulus-evoked response. Responses were recorded in three conditions: target alone, mask alone, and target presented with mask. Analyses were focused on the target-evoked response in the presence of the mask, defined as the difference in response evoked by the mask with target (target present) versus the mask alone (target absent). These were computed across five 50 msec bins (250 msec in total, which was the duration of the mask alone (target-absent trials, 50% of trials) or mask + target (target-present trials, 50% of trials)). Analyses revealed that in an initial (transient) phase the target-evoked response increased with similarity between target and mask orientation. As the authors note, this is surprising given that this was the condition where the mask maximally impaired detection of the target in behavior. Target-evoked responses in a later ('sustained') phase fell off with orientation similarity, consistent with the behavioral effect. When analyzed at the coarser scale, the target-evoked response integrated over the full 250 msec period showed a very modest dependence on mask orientation. The same pattern held when the data were analyzed at the finer orientation domain scale, with the effect of the mask in the transient phase running counter to the perceptual effect of the mask and the sustained response correlating with the perceptual effect. The effect of the mask was more pronounced when analyzed at the finer scale.

      Strengths

      The work is on the whole very strong. The experiments are thoughtfully designed, the data collection methods are good, and the results are interesting. The separate analyses of data at a coarse scale that aggregates across orientation domains and a more local scale of orientation domains is a strength and it is reassuring that the effects at the more localized scale are more clearly related to behavior, as one would hope and expect. The results are strengthened by modeling work shown in Figure 8, which provides a sensible account of the population dynamics. The analyses of the relationship between VSDI data and behavior are well thought out and the apparent paradox of the anti-correlation between VSDI and behavior in the initial period of response, followed by a positive correlation in the sustained response period is intriguing.

      Points to Consider / Possible Improvements

      The biphasic nature of the relationship between neural and behavioral modulation by the mask and the surprising finding that the two are anticorrelated in the initial phase are left as a mystery. The paper would be more impactful if this mystery could be resolved.

      We thank the reviewer for the positive comments. In our view, while our results are surprising, there may not be a remaining mystery that needs to be resolved. As our model shows, the biphasic nature of V1’s response can be explained by a delayed orientation-tuned gain control. Our results are consistent with the hypothesis that perception is based on columnar-scale V1 signals that are integrated over an approximately 200 ms long period that incorporates both the early and the late phase of the response, since such decoded V1 signals are positively correlated with the behavioral similarity masking effect (Fig. 5D, J; Fig. 5 – figure supplement 1). We now explain this more clearly in the discussion of our final manuscript.

      The finding is based on analyses of the correlation between behavior and neural responses. This appears in the main body of the manuscript and is detailed in Figures S1 and S2, which show the correlation over time between behavior and target response for the retinotopic and columnar scale.

      One possible way of thinking of this transition from anti- to positive correlation with behavior is that it might reflect the dynamics of a competitive interaction between mask and target, with the initial phase reflecting predominantly the mask response, with the target emerging, on some trials, in the latter phase. On trials when the mask response is stronger, the probability of the target emerging in the latter phase, and triggering a hit, might be lower, potentially explaining the anticorrelation in the initial phase. The sustained response may be a mixture of trials on which the target response is or is not strong enough to overcome the effect of the mask sufficiently to trigger target detection.

      It would, I think, be worth examining this by testing whether target dynamics may vary, depending on whether the monkey detected the target (hit trials) or failed to detect the target (miss trials). Unless I missed it I do not think this analysis was done. Consistent with this possibility, the authors do note (lines 226-229) that "The trajectories in the target plus mask conditions are more complex. For example, when mask orientation is at +/- 45 deg to the target, the population response is initially dominated by the mask, but then in mid-flight, the population response changes direction and turns toward the direction of the target orientation." This suggests (to this reviewer, at least) that the emergence of a positive correlation between behavioral and neural effects in the latter phase of the response could reflect either a perceptual decision that the target is present or perhaps deployment of attention to the location of the target.

      It may be that this transition reflected detection, in which case it might be more likely on hit trials than miss trials. Given the SNR it would presumably be difficult to do this analysis on a trial-by-trial basis, but the hit and miss trials (which each make up about half of all trials) could be averaged separately to see if the mid-flight transition is more prominent on hit trials. If this is so for the +/- 45 degree case, it would be good to see the same analysis for other combinations of target and mask. It would also be interesting to separate correct reject trials from false alarms, to determine whether the mid-flight transition tends to occur on false alarm trials.

      If these analyses do not reveal the predicted pattern, they might still merit a supplemental figure, for the sake of completeness.

      We thank the reviewer for suggesting this interesting possibility. The original analysis in the manuscript was based on both correct and incorrect trials, raising the possibility that our results reflect some contribution from decision- and/or attention-related signals rather than from the low-level nonlinear encoding mechanisms in V1 that we postulate in our model (Fig. 8). To explore this possibility, we re-examined our results while excluding error trials. We found that our key results from Figs 4 and 5 (namely, that there is an early transient phase in which the neural and behavioral similarity effects are anti-correlated, and a later sustained phase in which they are positively correlated) hold even for the subset of correct trials, reducing the possibility that decision/attention-related signals play a major role in explaining our results. We now include the results of this analysis as a supplementary figure in the final manuscript (Fig. 4 – figure supplement 2). While there may be some interesting differences in the response dynamics between correct and incorrect trials, the current study was not designed to address this question, and the large number of conditions and the small number of repeats that it necessitated make this data set suboptimal for examining these phenomena.

      References

      Sebastian S, Abrams J, Geisler WS. 2017. Constrained sampling experiments reveal principles of detection in natural scenes. Proc Natl Acad Sci U S A 114: E5731-E5740.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to express our sincere appreciation for the invaluable comments provided by the reviewers and their constructive suggestions to enhance the quality of our manuscript. In response to their feedback, we have revised and resubmitted our paper as an article, with five main figures, seven supplementary figures, and two supplementary data files. Importantly, this work presents novel findings that have not been published previously.

      Within the enclosed document, we have provided a comprehensive response to the reviewer comments, addressing each point in a meticulous and specific manner. We extend our sincere gratitude to the reviewers for their diligent examination of our manuscript and for offering insightful recommendations.

      In our latest revision, we have taken great care to respond to every reviewer's comment, ensuring that we clarify the manuscript and provide robust evidence where required. The primary focus of these revisions was to provide additional context regarding the cooperative role between PR-Set-7 and PARP-1 in the repression of metabolic genes, accompanied by a thorough description of the current state of the field. Substantial modifications and new analyses, presented in the supplemental figures, have been included to comprehensively address this concern.

      Another concern raised was regarding the interaction between PARP-1 and mono-methylated active histone marks, which was not adequately described in the previous version of our manuscript. In this revised version, we have updated our Fig. 1 and Supplemental Fig. S1 and introduced Supplemental Fig. S2 to properly demonstrate that PARP-1 binds to all mono-methylated active histone marks tested. Furthermore, we extensively revised the Discussion section of our manuscript to discuss the implications of this discovery and how it fits into the broader context of PARP-1 research.

      Addressing another reviewer's concern about the potential indirect regulation of transcription by PARP1 and PR-SET7, we revised the discussion section and incorporated findings from our recent study. These findings clearly demonstrate PARP1's binding to the loci of misregulated genes, suggesting a direct involvement in their regulation.

      Furthermore, we have improved the description of the reagents and Drosophila lines used in this study to provide a more comprehensive understanding for readers. Finally, we conducted a comprehensive revision of the entire manuscript to rectify the identified typos and grammatical errors.

      Enclosed, you will find a detailed, point-by-point response to each of the reviewers' comments, reflecting our commitment to addressing their concerns precisely.

      We firmly believe that our revisions successfully resolve all the concerns raised by the reviewers, and we are confident that this improved version of our manuscript contributes significantly to the scientific discourse.

      Reviewer #1:

      The study investigates the role of PARP-1 in transcriptional regulation. Biochemical and ChIP-seq analyses demonstrate specific binding of PARP-1 to active histone marks, particularly H4K20me, in polytene chromosomes of Drosophila third instar larvae. Under heat stress conditions, PARP-1's dynamic repositioning from the Hsp70 promoter to its gene body is observed, facilitating gene activation. PARP-1, in conjunction with PR-Set7, plays a crucial role in the activation of Hsp70 and a subset of heat shock genes, coinciding with an increase in H4K20me1 levels at these gene loci. This study proposes that H4K20me1 is a key facilitator of PARP-1 binding and gene regulation. However, there are several critical concerns that are yet to be addressed. The experimental validation and demonstration of results in the main manuscript are scant. Recent developments in the area are omitted, as an important publication hasn't been discussed anywhere in the work (PMID: 36434141). The proposed mechanism operates quite selectively, and any extrapolations require intensive scientific evidence.

      Major Comments:

      (1) PARP1 hypomorphic mutant validation data must be provided at the RNA level, as the authors mention a global reduction in RNA levels.

      We sincerely appreciate Reviewer 1's meticulous review of our manuscript and the valuable insights provided. In response to the raised concern, we would like to highlight that validation data for the PARP1 hypomorphic mutant at the RNA level were previously documented in our study (PMID: 20371698), where we found that PARP1 RNA levels were markedly reduced in parp1C03256. To enhance clarity, we have modified the Materials and Methods section to state this explicitly: parp-1C03256 not only significantly lowers PARP-1 RNA and protein levels (14) but also significantly diminishes the level of pADPr (11).

      We hope these revisions effectively address the reviewer's suggestion and contribute to a more comprehensive understanding of our findings.

      (2) The authors should provide immunoblot data for global Poly (ADP) ribosylation levels in PARP1 hypomorphic mutant condition as compared to the control. They must also provide the complete details of the mouse anti-pADPr antibody used in their immunoblot in Figure 5B.

      We extend our gratitude to Reviewer 1 for drawing attention to aspects requiring further clarification. Regarding global Poly (ADP) ribosylation levels in the PARP1 hypomorphic mutant, we want to emphasize that our previous work (PMID: 21444826) extensively documented the diminished levels of pADPr in this mutant compared to the wild type. To address this, we have added the pertinent details to the Materials and Methods section: parp-1C03256 not only significantly lowers PARP-1 RNA and protein levels (14) but also significantly diminishes the level of pADPr (11).

      Furthermore, in addressing the request for complete details of the mouse anti-pADPr antibody (10H) used in Figure 5B, we have taken steps to enhance transparency. The Materials and Methods section has been revised to incorporate more comprehensive information about the antibody, ensuring a clearer understanding of our experimental procedures. anti-pADPr (Mouse monoclonal, 1:500, 10H - sc-56198, Santa Cruz).

      We appreciate the reviewer's diligence in ensuring the robustness of our methodology, and we believe these modifications strengthen the overall quality and transparency of our study.

      (3) PR-Set7 mutant validation results should be provided in the main manuscript, as done by the authors using qRT-PCR. Also, immunoblot data for the PR-set7 null condition should be supplemented in the main manuscript as the authors have already mentioned their anti-PR-Set7 (Rabbit, 1:1000, Novus Biologicals, 44710002) antibody in the materials and methods section.

      We appreciate Reviewer 1's thorough examination of our manuscript and their constructive feedback. The pr-set7 null mutant has been rigorously characterized in a study conducted by Dr. Ruth Steward's laboratory (PMID: 15681608). Additionally, we used our PR-SET7 antibody to validate the mutant, and the corresponding data can be found in Supplemental Figure 3. To enhance clarity, we have modified both the Results and Materials and Methods sections to provide explicit details on the validation process. Results section: To validate our hypothesis, we initially confirmed that the pr-set720 mutant not only eliminated PR-SET7 RNA and protein but also abrogated H4K20me1 modification (Supplemental Fig.S3).

      Materials and Methods section: The pr-set720 null mutant was validated in (15), and we confirmed that this mutant abolishes not only PR-SET7 RNA and protein but also H4K20me1 (Supplemental Fig. S3).

      We believe these revisions address the reviewer's concerns and contribute to a more comprehensive presentation of our study.

      (4) The authors have probably missed out on a very important recent report (PMID: 36434141), suggesting the antagonistic nature of the PARP1 and PR-SET7 association. In light of these important observations, the authors must check for the levels of PR-SET7 in PARP1 hypomorphic conditions.

      We appreciate the insightful comment from Reviewer 1, drawing our attention to the recent study by Estève et al. (PMID: 36434141) highlighting the potential antagonistic relationship between PARP1 and PR-SET7. To address this important point, we have carefully examined the levels of PR-SET7 in PARP1 hypomorphic conditions.

      In response to this concern, we have added two new supplemental figures, Supplemental Fig. S4 and S5, which specifically address the impact of PARP1 deficiency on PR-Set7 expression. These figures clearly demonstrate that there were no significant changes observed in PR-SET7 RNA (Fig. S4) or protein levels (Fig. S5) in the absence of Parp1. This finding supports the conclusion that Parp1 is not directly involved in the regulation of PR-SET7 in Drosophila.

      Furthermore, we have updated the Results section to explicitly mention this observation:

      Interestingly, in the absence of PARP-1, neither PR-SET7 RNA nor protein levels were affected (Supplemental Fig. S4-5), indicating that PARP-1 is not directly implicated in the regulation of PR-SET7.

      Additionally, we have included information about the anti-H3 antibody used in Supplemental Fig. S4 in the Materials and Methods section: anti-H3 (Rabbit polyclonal, 1/1000, FL-136 sc-10809 Santa Cruz).

      We believe that these modifications effectively address the raised concern and provide a more comprehensive understanding of the relationship between PARP1 and PR-SET7 in our study. We hope these clarifications enhance the overall robustness and clarity of our findings.

      (5) Also, the results of the aforementioned study should be adequately discussed in the present study along with its implications in the same.

      We appreciate Reviewer 1's valuable suggestion to discuss the implications of the study by Estève et al. (PMID: 36434141) in the context of our own findings. Estève et al. reported a potential antagonistic relationship between PARP1 and PR-SET7, showing that a decrease in PARP1 protein leads to an increase in PR-SET7 protein levels. In our investigation, however, we did not observe significant changes in PR-SET7 RNA and protein levels in the parp1C03256 mutant, as shown in the newly added Supplemental Figs. S4 and S5.

      We acknowledge the discrepancy between our results and those of Estève et al., and we propose that this difference may be due to distinct experimental approaches: Estève et al. focused on mammalian cell populations and in vitro experiments, whereas our investigation used Drosophila third-instar larvae as a whole-organism model. It is plausible that the regulatory mechanisms governing PR-SET7 differ between mammals and Drosophila. Another possibility is that PARP-1 may cooperate with PR-SET7 in the context of Drosophila development but could play antagonistic roles against PR-SET7 in specific cell lines and under certain biological or developmental conditions.

      In the Discussion section, we have incorporated this information, stating: A recent study demonstrated that in human cells overexpressing PARP-1, PR-SET7/SET8 is degraded (33). This implies that the absence of PARP-1 might lead to increased levels of PR-SET7. However, in our study of the parp-1 mutant in Drosophila third-instar larvae, we observed a different scenario: we detected a minor, non-significant reduction in both PR-SET7 RNA and protein levels (Supplemental Fig.S4 and S5). This outcome stands in contrast to the previous study's findings. The discrepancy could be due to the distinct experimental approaches used: the previous research focused on mammalian cells and in vitro experiments, whereas our study examined the functions of PARP-1 in whole Drosophila third-instar larvae during development. Consequently, while PARP-1 may cooperate with PR-SET7 in the context of Drosophila development, it could play antagonistic roles against PR-SET7 in specific cell lines and under certain biological or developmental conditions.

      We believe these modifications provide a comprehensive discussion of the observed discrepancies and enhance the overall interpretation of our findings. We hope that these clarifications satisfactorily address the concerns raised by Reviewer 1.

      (6) Gene transcriptional activation requires open chromatin and RNA polymerase II binding to the promoter. Since the differentially expressed genes in both PR-Set7 null and PARP1 hypomorphic mutants that are co-enriched with PARP-1 and H4K20me1 were mainly upregulated, the authors should provide RNA polymerase II occupancy data for these genes via RNA-Pol II ChIP-seq to further support their claims.

      We appreciate the insightful comment from Reviewer 1 regarding the need for RNA polymerase II (PolII) occupancy data to further support our claims on gene transcriptional activation. To address this concern, we analyzed PolII occupancy around genes co-enriched with PARP-1 and H4K20me1 that are upregulated in both pr-set720 and parp-1C03256 mutants during the third instar larval stage. The results of this analysis have been included in the newly added Supplemental Fig. S6.

      Our findings reveal that these upregulated genes exhibit higher PolII occupancy than other genes at both their promoter regions and gene bodies, suggesting heightened activity during the third instar larval stage in wild-type animals (Supplemental Fig. S6). To further validate these results, we cross-referenced publicly available RNA-seq data from the same developmental stage, confirming that, on average, these upregulated genes are expressed about 40% more highly than other genes (Supplemental Fig. S6B).
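      The "40% higher" figure is a relative difference in average expression. As a hypothetical sketch (the response does not state whether means were taken over raw or log-transformed values, and the input numbers below are illustrative only, not the study's data), it could be computed as:

```python
from statistics import mean


def percent_higher(target_expr, other_expr):
    """Percent by which the mean expression of target genes exceeds that of other genes.

    Both arguments are sequences of per-gene expression values; the expression
    unit and the use of the arithmetic mean are assumptions for illustration.
    """
    return (mean(target_expr) / mean(other_expr) - 1.0) * 100.0


# Illustrative values only: a gene set averaging 14 against a background
# averaging 10 corresponds to roughly 40% higher expression.
example = percent_higher([12.0, 16.0], [9.0, 11.0])
```

A ratio of means is sensitive to a few very highly expressed genes; a median-based or log-scale comparison would be a more robust alternative if the distributions are skewed.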

      Moreover, we would like to highlight the consistency of our current findings with our previous study (PMID: 38012002), where we reported the critical involvement of PARP-1 in tempering the expression of active metabolic genes at the end of the third instar larvae. The current data, suggesting a role for PR-SET7 in this regulatory process, adds another layer to our understanding of the nuanced control exerted by PARP-1 on the expression of active metabolic genes during this critical developmental transition.

      In light of these results, we have modified the Results section to emphasize these findings: Intriguingly, under wild-type conditions, these genes displayed expression levels approximately 40% higher than the average and demonstrated increased RNA-Polymerase II occupancy both at their promoter regions and gene bodies compared to other genes (supplemental Fig.S6), indicating their high activity in wild type context.

      Additionally, we have incorporated this information into the Discussion section to underscore the cooperative role of PARP-1 and PR-SET7 in repressing the expression of active metabolic genes: Notably, genes co-enriched with PARP-1 and H4K20me1 that are upregulated in both parp-1C03256 and pr-set720 mutants are predominantly metabolic genes that exhibit high expression levels under wild-type conditions and high RNA polymerase II occupancy at both their promoter regions and gene bodies (Supplemental Fig. S6). In our previous study, we discovered that PARP-1 plays a crucial role in repressing highly active metabolic genes during the development of Drosophila by binding directly to their loci (34). Also, PARP-1 is required for maintaining optimum glucose and ATP levels at the third-instar larval stage (34). During Drosophila development, repression of metabolic genes is crucial for the larval to pupal transition (35, 36). This repression is linked to the reduced energy requirements as the organism prepares for its sedentary pupal stage (35, 37). Notably, we observed that PARP-1 shows a high affinity for binding to the gene bodies of these metabolic genes (34).

      Our data indicate that PARP-1 and H4K20me1 preferentially repress metabolic genes at sites where they are co-bound, as these genes are upregulated in both parp-1 and pr-set7 mutant animals (Fig.3E), even though they are already highly active during the third-instar larval stage (Supplemental Fig.S6). Thus, we propose that the presence of H4K20me1 may be essential for the binding of PARP-1 at these gene bodies, contributing to their repression. Importantly, this mechanism of gene repression has broader developmental implications. As stated earlier, mutant animals lacking functional PARP-1 and PR-SET7 undergo developmental arrest during the larval to pupal transition. This arrest could be directly linked to the disruption of normal metabolic gene repression during development. Without the repressive action of PARP-1 and PR-SET7, key metabolic processes might remain unchecked, leading to metabolic imbalances that are incompatible with normal progression to the pupal stage.

      Finally, we have updated the Materials and Methods section to include information about the RNA-seq and PolII ChIP-seq datasets used: GSE15292 (RNA polymerase II). In addition, we used the developmental time-course RNA-seq dataset (54), SRP001065.

      We believe that these modifications comprehensively address Reviewer 1's concern and provide a more robust foundation for our claims regarding the role of PARP-1 and PR-SET7 in the transcriptional regulation of co-enriched genes during the critical developmental transition.

      (7) As discussed in Figure 4, the authors found transcriptional activation of group B genes even after a significant reduction of H4K20me1 in their gene bodies after heat shock. Given the dynamic equilibrium of epigenetic marks that regulate gene expression and their locus-specific transcriptional regulation, the authors should further examine the enrichment of other epigenetic marks, and even of H4K20me1-specific demethylases such as PHF8 (PMID: 20622854), and their cross-talk with PARP1, to bridge the missing links in this model. This would add more depth to this work.

      We appreciate the thoughtful input provided by Reviewer 1 and acknowledge the importance of exploring additional epigenetic marks and potential cross-talk association with PARP1 to enhance the depth of our study. Our investigation has primarily focused on the interplay between PR-SET7/H4K20me1 and PARP-1, as evidenced by the colocalization and robust binding affinity observed between PARP-1 and H4K20me1 (Fig 1C, 2B, and 3A). This interaction is particularly noteworthy in the context of regulating specific heat shock genes, as highlighted in Figure 4A. While we recognize the potential significance of examining a broader spectrum of epigenetic marks and considering the involvement of specific demethylases, such as PHF8 (PMID: 20622854), in this regulatory network, our research strategy is intentionally tailored to leverage the unique characteristics of the PR-SET7/H4K20me1 and PARP-1 interplay in Drosophila. A key consideration is the technical advantage afforded by the fact that PR-SET7 is the exclusive methylase responsible for H4K20 in Drosophila (PMID: 15681608), allowing for specific depletion of H4K20me1 without the confounding influence of other methyltransferases.

      This specificity is pivotal, especially given the similar developmental arrest patterns observed in both PR-SET7 and PARP-1 mutants. Such parallel phenotypes provide a distinct opportunity to delve deeply into the intricacies of their interaction during organismal development and in response to heat stress. Additionally, the identity of the demethylase for H4K20me1 in Drosophila remains unknown, further underscoring the rationale for our focused approach.

      While we acknowledge the broader implications of exploring additional epigenetic marks, we believe that our deliberate focus on the PR-SET7/H4K20me1 and PARP-1 pathway provides a unique and valuable perspective on the regulation of gene expression in Drosophila. We hope that this clarification addresses the concerns raised by Reviewer 1 and conveys the rationale behind our chosen research strategy.

      Reviewer #2:

      Summary:

      This study from Bamgbose et al. identifies a new and important interaction between H4K20me and Parp1 that regulates inducible genes during development and heat stress. The authors present convincing experiments that form a mostly complete manuscript that significantly contributes to our understanding of how Parp1 associates with target genes to regulate their expression.

      Strengths:

      The authors present 3 compelling experiments to support the interaction between Parp1 and H4K20me, including:

      (1) PR-Set7 mutants remove all H4K20me and phenocopy Parp mutant developmental arrest and defective heat shock protein induction.

      (2) PR-Set7 mutants have dramatically reduced Parp1 association with chromatin and reduced poly-ADP ribosylation.

      (3) Parp1 directly binds H4K20me in vitro.

      Weaknesses:

      (1) The histone array experiment in Fig1 strongly suggests that PARP binds to all mono-methylated histone residues (including H3K27, which is not discussed). Phosphorylation of nearby residues sometimes blocks this binding (S10 and T11 modifications block binding to K9me1, and S28P blocks binding to K27me1). However, H3S3P did not block H3K4me1, which may be worth highlighting. The H3K9me2/3 "blocking effect" is not nearly as strong as some of these other modifications, yet the authors chose to focus on it. Rather than focusing on subtle effects and the possibility that PARP "reads" a "histone code," the authors should consider focusing on the simple but dramatic observation that PARP binds pretty much all mono-methylated histone residues. This result is interesting because nucleosome mono-methylation is normally found on nucleosomes with high turnover rates (Chory et al. Mol Cell 2019)- which mostly occurs at promoters and highly transcribed genes. The author's binding experiments could help to partially explain this correlation because PARP could both bind mono-methylated nucleosomes and then further promote their turnover and lower methylation state.

      We appreciate the comprehensive review and valuable insights provided. In response to the comments, we have made substantial revisions to address the concerns and enhance the clarity of our findings. In Figure 1B, C, D, F, and G, we have expanded our data presentation to demonstrate PARP-1's binding affinity for H3K27me1. This addition is now incorporated into the revised results section. Additionally, we have updated Supplemental Fig.S1 and introduced new supplemental data (Supplemental Fig.S2) to illustrate the inhibition of PARP-1 binding by H3S10P, H3S28P, and H3T11P. The comprehensive exploration of PARP-1's interaction with mono-methylated histones, as suggested by the reviewer, is now more robustly documented in our revised figures and supplementary materials.

      Our Discussion section has been refined to articulate more clearly how PARP-1 may be selectively recruited to active chromatin domains through its interaction with mono-methylated histone marks. We have proposed a model where PARP-1 actively participates in the turnover process, contributing to the maintenance of an active chromatin environment. This proposed mechanism involves PARP-1 selectively binding to mono-methylated active histone marks associated with highly transcribed genes. Upon activation, PARP-1 undergoes automodification, leading to its release from chromatin and facilitating the reassembly of nucleosomes carrying the mono-methylated marks. The enzymatic action of Poly(ADP)-ribose glycohydrolase (PARG) subsequently cleaves pADPr, allowing for the restoration of PARP-1's binding affinity to mono-methylated active histone marks. This proposed hypothesis is consistent with existing research across various model organisms and aligns with the known association of PARP-1 with highly expressed genes, as well as its role in mediating nucleosome dynamics and assembly.

      Our Discussion section has been modified as follows: Finally, highly transcribed genes have been reported to show a high turnover of mono-methylated modifications, maintaining a state of low methylation (50). Our findings therefore suggest that PARP-1 might actively participate in this turnover process to uphold an active chromatin environment. The proposed mechanism unfolds as follows: 1) PARP-1 selectively binds to mono-methylated active histone marks associated with highly transcribed genes. 2) Upon activation, PARP-1 undergoes automodification and is subsequently released from chromatin, facilitating the reassembly of nucleosomes carrying the mono-methylated marks. 3) The enzymatic action of Poly(ADP)-ribose glycohydrolase (PARG) cleaves pADPr, restoring PARP-1's binding affinity for mono-methylated active histone marks. This proposed hypothesis is consistent with existing research conducted across various model organisms, including mice, Drosophila, and humans (7, 23, 29, 51-53). Notably, previous studies have consistently demonstrated that PARP-1 predominantly associates with highly expressed genes and plays a crucial role in mediating nucleosome dynamics and assembly. Thus, our proposed model provides a molecular framework that may contribute to understanding the relationship between PARP-1 and the epigenetic regulation of gene expression. Further experimental validation is warranted to elucidate the precise details of this proposed mechanism and its implications in the broader context of chromatin dynamics and transcriptional control.

      We hope that these revisions address the reviewer's concerns and contribute to the overall strength and clarity of our manuscript.

      (2) The RNAseq analysis of Parp1/PR-Set7 mutants is reasonable, but there is a caveat to the author's conclusion (Line 251): "our results indicate H4K20me1 may be required for PARP-1 binding to preferentially repress metabolic genes and activate genes involved in neuron development at co-enriched genes." An alternative possibility is that many of the gene expression changes are indirect consequences of altered development induced by Parp1 or PR-Set7 mutants. For example, Parp1 could activate a transcription factor that represses the metabolic genes that they mention. The authors should consider discussing this possibility.


      We extend our gratitude to Reviewer 2 for their thoughtful consideration of our manuscript and the insightful suggestion. Regarding the conclusion on Line 251, where we proposed that "our results indicate H4K20me1 may be required for PARP-1 binding to preferentially repress metabolic genes and activate genes involved in neuron development at co-enriched genes," we acknowledge the alternative possibility suggested by the reviewer. It is plausible that many of the observed gene expression changes are indirect consequences of altered development in parp-1 or pr-set7 mutants. For example, PARP-1 could activate a transcription factor that represses the mentioned metabolic genes.

      To address this concern, we have revisited our data and incorporated relevant findings from one of our recent studies that utilized a ChIP-seq approach. The results from this study suggest a direct binding of PARP-1 to the loci of metabolic genes, providing support for the notion that PARP-1 may indeed directly regulate their expression (PMID: 37347109). We have updated the Discussion section to reflect this information, aiming to provide a more comprehensive perspective on the potential mechanisms underlying the observed gene expression changes: In our previous study, we discovered that PARP-1 plays a crucial role in repressing highly active metabolic genes during the development of Drosophila by binding directly to their loci (34). Also, PARP-1 is required for maintaining optimum glucose and ATP levels at the third-instar larval stage (34). During Drosophila development, repression of metabolic genes is crucial for larval to pupal transition (35, 36). This repression is linked to the reduced energy requirements as the organism prepares for its sedentary pupal stage (35, 37). Notably, we observed that PARP-1 shows a high affinity for binding to the gene bodies of these metabolic genes (34).

      We believe these modifications contribute to a more informed interpretation of our findings.

      (3) The section on the inducibility of heat shock genes is interesting but is missing an important control that might significantly alter the authors' conclusions. Hsp23 and Hsp83 (group B genes) are transcribed without heat shock, which likely explains why they carry H4K20me without heat shock. The authors made the reasonable hypothesis that this H4K20me would recruit Parp-1 upon heat shock (line 270). However, they observed a decrease of H4K20me upon heat shock, which led them to conclude that "H4K20me may not be necessary for Parp1 binding/activation" (line 275). Yet their RNA expression data (Fig4A) argue that both Parp1 and H4K20me are important for activation. An alternative possibility is that group B genes do recruit Parp1 (through H4K20me) upon heat shock, but Parp1 then promotes H3/H4 dissociation from group B genes. If Parp1 depletes H4, it will also deplete H4K20me1. To address this possibility, the authors should also perform a ChIP for total H4 and plot the raw signals of H4K20me1 and total H4 as well as the ratio of these signals. The authors could also note that group A genes may similarly recruit Parp1 and deplete H3/H4, but with different kinetics than group B genes because their basal state lacks H4K20me/Parp1. To test this possibility, the authors could measure Parp1 association, H4K20 methylation, and H4 depletion at more time points after heat shock at both classes of genes.
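      The normalization the reviewer requests can be sketched as a per-bin log2 ratio of H4K20me1 ChIP signal to total H4 ChIP signal. This is a minimal illustration, not the authors' pipeline: it assumes both signals have already been computed over the same genomic bins and scaled comparably, the function name is hypothetical, and the pseudocount is an arbitrary choice that keeps empty bins finite.

```python
import numpy as np


def mark_over_total_h4(mark_signal, h4_signal, pseudocount=1.0):
    """Per-bin log2 ratio of a histone-mark ChIP signal to total H4 ChIP signal.

    mark_signal, h4_signal: 1-D arrays over the same genomic bins (assumed
    already normalized for sequencing depth). The pseudocount avoids division
    by zero in empty bins; its value is an arbitrary illustrative choice.
    """
    mark = np.asarray(mark_signal, dtype=float)
    h4 = np.asarray(h4_signal, dtype=float)
    return np.log2((mark + pseudocount) / (h4 + pseudocount))


# Hypothetical numbers: if heat shock halves both total H4 (nucleosome loss)
# and H4K20me1, the raw mark signal drops but the per-nucleosome ratio does not.
before = mark_over_total_h4([20.0, 8.0], [40.0, 16.0], pseudocount=0.0)
after = mark_over_total_h4([10.0, 4.0], [20.0, 8.0], pseudocount=0.0)
```

Under this reading, a drop in the raw H4K20me1 signal with an unchanged ratio would point to H4 depletion (the reviewer's alternative), whereas a drop in the ratio itself would indicate loss of the methyl mark on the remaining nucleosomes.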

      We thank Reviewer 2 for their valuable comment on our manuscript. We acknowledge your hypothesis suggesting that PARP-1 may induce H3/H4 dissociation from group B genes, potentially leading to a reduction in H4K20me1. However, our findings support a different interpretation.

      Our data indicate that while H4K20me1 is present under normal conditions at group B genes, its reduction following heat shock does not appear to hinder PARP-1's role in transcriptional activation (Fig 4A, C and E). We propose that the observed decrease in H4K20me1 might reflect a regulatory shift in chromatin structure that is conducive to transcriptional activation during heat shock, facilitated by PARP-1 independently of sustained H4K20me1 levels at group B genes. Additionally, the literature suggests a dual role for H4K20me1 in gene regulation, from facilitating transcriptional elongation in certain contexts to acting as a repressor in others.

      Unlike group A genes, which had low enrichment of H4K20me1 before heat shock (Fig 4B and D), group B genes showed high enrichment of H4K20me1 (Fig 4C and E), which could imply a repressive role for this mark prior to heat stress. In the context of group B genes, it is therefore conceivable that removal of H4K20me1 might be necessary for their activation during heat stress. PR-SET7 may also possess functions beyond its role as a histone methylase that are crucial for activating group B genes under heat stress conditions. These functions could include methylation of non-histone substrates and non-catalytic activities.

      Furthermore, our analysis of gene expression in pr-set720 and parp-1C03256 mutants indicates that, while PARP-1 and H4K20me1 may have overlapping roles in gene regulation, they also have distinct functions in modulating gene expression (Fig 3E). We therefore propose that the relationship between PR-SET7 and PARP-1 in transcriptional regulation involves a complex regulatory mechanism that extends beyond the presence of H4K20me1.

      We modified the discussion section to address this point: Another plausible explanation could be that the recruitment of PARP-1 to group B gene loci promotes H4 dissociation and then leads to a reduction of H4K20me1. However, our findings suggest an alternative interpretation: the decrease in H4K20me1 at group B genes during heat shock does not seem to impede PARP-1's role in transcriptional activation (Fig. 4A, C and E). Rather than disrupting PARP-1 function, we propose that this reduction in H4K20me1 may signify a regulatory shift in chromatin structure, priming these genes for transcriptional activation during heat shock, with PARP-1 playing an independent facilitating role. Moreover, existing studies have highlighted the dual role of H4K20me1, acting as a promoter of transcription elongation in certain contexts and as a repressor in others (13, 25, 38, 39, 41-45). The elevated enrichment of H4K20me1 in group B genes under normal conditions may indicate a repressive state that requires alleviation for transcriptional activation. Additionally, we cannot discount the possibility of unique regulatory functions associated with PR-SET7, extending beyond its recognized role as a histone methylase. Non-catalytic activities and potential interactions with non-histone substrates might contribute to the nuanced control exerted by PR-SET7 on group B genes during heat stress (46, 47). Furthermore, our exploration of pr-set720 and ParpC03256 mutants reveals distinct roles for PARP-1 and H4K20me1 in modulating gene expression (Fig 3E). This reinforces the notion that the interplay between PR-SET7 and PARP-1 involves a multifaceted regulatory mechanism. Understanding the intricate relationship between these molecular players is crucial for elucidating the complexities of gene expression modulation under heat stress conditions.

      We hope that this modification adequately addresses Reviewer 2's concerns and enhances the clarity of our conclusions.

      Reviewer #1 (Recommendations For The Authors):

      (1) Please check the entire manuscript for grammatical errors and typos. PR-set7 has been wrongly written as PR-ste7 in quite a few places in the manuscript. Poly (ADP)-ribosylation has been written as poly(ADP-ribosyl)ation in the last result heading. There are more such errors. Please rectify them.

      We express our sincere appreciation to Reviewer 1 for their meticulous review of our manuscript, and we acknowledge the importance of ensuring grammatical accuracy and clarity. We have taken your feedback seriously and conducted a comprehensive revision of the entire manuscript to rectify the identified typos and grammatical errors. We hope that these revisions contribute to an improved overall presentation of our research, and we appreciate the reviewer's diligence in ensuring the accuracy of the manuscript.

      (2) The authors can also look up publicly available mammalian ChIP-seq data for H4K20me1 and PARP1, in order to further solidify their findings and increase the breadth of their work.

      We appreciate the suggestion from Reviewer 1 and have taken steps to further validate and broaden the scope of our findings. Specifically, we compared the distribution of PARP1 and H4K20me1 in human K562 cells. The results of this analysis revealed a correlation in their distribution, supporting the idea that the observed correlation between PARP-1 and H4K20me1 is not limited to fruit flies. We have incorporated these findings into the Results section and added a new Supplemental Fig. S6 to visually highlight this correlation: "Finally, to extend the generalizability of our observations beyond Drosophila, we compared the distribution of PARP1 and H4K20me1 in human K562 cells. Strikingly, we observed a correlation in their distribution, suggesting that the interplay between PARP-1 and H4K20me1 is not limited to fruit flies (Supplemental Fig. S6)."

      We believe that this modification addresses Reviewer 1's suggestion by providing additional evidence that supports the broader relevance of our findings beyond the Drosophila model system.

      (3) Please discuss in greater detail how the PARP1-H4K20me1 axis orchestrates the repression program (metabolic pathways in this case) with proper references.

      We appreciate Reviewer 1's continued engagement with our manuscript and have adjusted the discussion section to provide a more detailed insight into how the PARP1-H4K20me1 axis orchestrates the repression program, particularly focusing on metabolic pathways. The modified discussion section now reads: In our previous study, we discovered that PARP-1 plays a crucial role in repressing highly active metabolic genes during the development of Drosophila by binding directly to their loci (34). Also, PARP-1 is required for maintaining optimum glucose and ATP levels at the third-instar larval stage (34). During Drosophila development, repression of metabolic genes is crucial for larval to pupal transition (35, 36). This repression is linked to the reduced energy requirements as the organism prepares for its sedentary pupal stage (35, 37). Notably, we observed that PARP-1 shows a high affinity for binding to the gene bodies of these metabolic genes (34). Our data indicates that in both parp-1 and pr-set7 mutant animals, there was a preferential repression of metabolic genes at sites where PARP-1 and H4K20me1 are co-bound (Fig.3E), while these metabolic genes are highly active during third-instar larval stage (Supplemental Fig.S6). Thus, we propose that the presence of H4K20me1 may be essential for the binding of PARP-1 at these gene bodies, contributing to their repression. Importantly, this mechanism of gene repression has broader developmental implications. As earlier stated, mutant animals lacking functional PARP-1 and PR-SET7 undergo developmental arrest during larval to pupal transition. This arrest could be directly linked to the disruption of the normal metabolic gene repression during development. Without the repressive action of PARP-1 and PR-SET7, key metabolic processes might remain unchecked, leading to metabolic imbalances that are incompatible with the normal progression to the pupal stage.

      We hope these modifications provide a more comprehensive discussion on how the PARP1-H4K20me1 axis influences the repression program, particularly within metabolic pathways, and how this mechanism contributes to the broader context of Drosophila development.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study presents a useful inventory of immune signatures that are correlated with cancer treatment-related pneumonitis. The data were collected and analysed using solid and validated methodology and can be used as a starting point for further functional studies.

      We sincerely thank the editor for their encouraging comments regarding our study. As rightly pointed out, this study indeed serves as a pivotal starting point for subsequent functional studies.

      Reviewer #2 (Recommendations For The Authors):

      I greatly appreciate the authors' diligence in addressing all the suggested points. The paper now presents significantly stronger evidence to support the findings.

      I do have one final question: Could you clarify how the correlation presented in Supplementary Figure 3 was calculated? Is it a Pearson correlation of CTCAE grade directly to marker expression? Additionally, could you explain how the significance was determined? The authors mention a significant correlation for CCR7, but the heatmap displays similarly high values for CD7 and CD57. Finally, I'm curious about the absence of CD16 in the heatmap.

      Thank you for your insightful query. To clarify, the correlation shown in Supplementary Figure 3 was calculated using the Pearson correlation coefficient, correlating the CTCAE grade directly with the mean expression level of each marker. The computations were performed in GraphPad Prism version 9. We defined P < 0.05 as statistically significant. The P-values for CCR7, CD7, and CD57 were 0.009, 0.035, and 0.039, respectively. All three markers therefore fall below this threshold. CCR7 showed the most significant correlation, and, as the reviewer correctly observed, CD7 and CD57 also showed relatively high values. We have added CD7 and CD57 alongside CCR7 in the Discussion section. We mention them only briefly to keep the focus on CD16.
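      A minimal Python sketch of the calculation described above, for readers who want to reproduce this kind of check outside Prism. It assumes one (CTCAE grade, mean marker expression) pair per patient. The function and variable names are illustrative and are not taken from the authors' analysis, which was done in GraphPad Prism 9.

      The sketch computes Pearson's r and the t statistic with n - 2 degrees of freedom, which is the quantity from which a two-sided P-value is derived (for example by scipy.stats.pearsonr). The P-value lookup itself is omitted so that the example needs only the standard library. The significance check applies the stated P < 0.05 threshold to the P-values reported above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-deviation and squared deviations around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, compared against a t distribution
    with n - 2 degrees of freedom. Undefined when |r| == 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


ALPHA = 0.05  # significance threshold stated in the response

# P-values as reported in the response (not recomputed here)
reported_p = {"CCR7": 0.009, "CD7": 0.035, "CD57": 0.039}
significant = {marker: p < ALPHA for marker, p in reported_p.items()}
```

      With a hypothetical per-patient list of grades and mean expression values, pearson_r(grades, expression) gives the coefficient plotted in a heatmap such as Supplementary Figure 3. Because every marker is tested on the same patients (the same n), a smaller P-value corresponds to a larger |r|.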

      CD16 was initially omitted from Supplementary Figure 3 to prevent redundancy and preserve data clarity. Nonetheless, in light of your query, we have included CD16 in the correlation matrix to provide a comprehensive view of its association with other markers.

      We hope this adequately addresses your question and further clarifies our findings.

      Reviewer #3 (Recommendations For The Authors):

      General suggestions for presentation in the future:

      It is essential to concretely define the numbers presented in all figures and plots. For example, in Figure 6 (I), what does it mean by "percentage representation of FCGR3A (CD16)"? Percentage of what? How did you calculate that? It is also important to show more statistics in general, for example, in dot plots like Figure 6 (H), where are the means and p-values? Little things like that completely change the impact of the figures. For the narrative of this paper, it is OK, but in the future, fine-tuning the presentation would massively improve the impact of the work which the contents deserve.

      Thank you for your insightful feedback. Addressing your concerns, I have revised Figure 6H and Figure 6I to provide a more precise and informative presentation of our data. In Figure 6H, the violin plots illustrate the expression intensity of FCGR3A (CD16) on CD4+ and CD8+ T cells. Each dot represents an individual cell within the BALF from both healthy controls (HC) and COVID-19 patients. This data was derived from the single-cell RNA-seq dataset GSE145926. To enhance clarity and statistical robustness, I have now included p-values directly in Figure 6H. Additionally, for a more comprehensive understanding, the means ± standard deviation (SD) have been incorporated into the main text of the manuscript.

      Regarding Figure 6I, it depicts the proportion of FCGR3A (CD16)-positive cells within the CD4+ and CD8+ T cell populations in BALF from HC and COVID-19 patients. The threshold for FCGR3A expression was set at 0.5. Upon further review and in response to your feedback, I realized an error in the calculation of the proportion of FCGR3A-positive cells among CD4+ and CD8+ T cells. Initially, the proportion of FCGR3A-positive CD4+ T cells was calculated in relation to the entire CD4+ T cell population, without differentiation between the groups. This has now been corrected, and the adjusted figures are presented in Figure 6I.
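      The correction described above concerns the denominator used when computing the proportion. The following is a minimal Python sketch contrasting the two calculations, assuming each cell is represented as a hypothetical (group, cell_type, FCGR3A expression) tuple. The response gives the threshold as 0.5; treating a cell as positive only when expression is strictly greater than 0.5 is an assumption. The corrected version divides by the number of cells of that type within the same group (HC or COVID-19). The uncorrected version divides by all cells of that type pooled across groups.

```python
from collections import Counter

THRESHOLD = 0.5  # assumption: "positive" means expression strictly above 0.5


def positive_fraction_by_group(cells, threshold=THRESHOLD):
    """Corrected: positives / all cells of that type *within the same group*.

    cells: iterable of (group, cell_type, fcgr3a_expression) tuples.
    Returns {(group, cell_type): fraction}.
    """
    totals = Counter()
    positives = Counter()
    for group, cell_type, expr in cells:
        key = (group, cell_type)
        totals[key] += 1
        if expr > threshold:
            positives[key] += 1
    return {key: positives[key] / totals[key] for key in totals}


def positive_fraction_pooled_denominator(cells, threshold=THRESHOLD):
    """Uncorrected: positives in each group / all cells of that type across
    *both* groups, so the fractions shrink as the other group grows."""
    type_totals = Counter()
    positives = Counter()
    for group, cell_type, expr in cells:
        type_totals[cell_type] += 1
        if expr > threshold:
            positives[(group, cell_type)] += 1
    return {key: positives[key] / type_totals[key[1]] for key in positives}
```

      With two HC and four COVID-19 CD8+ cells, half of each group positive, the corrected function reports 0.5 for both groups. The pooled-denominator version reports 1/6 and 2/6, which mixes group size into the comparison.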

      I am grateful for the opportunity to refine these figures, as your suggestions have not only helped to correct the error but have also significantly enhanced the impact and clarity of our work. Your guidance has been instrumental in improving the overall quality and presentation of our research, ensuring that the findings are communicated effectively and accurately.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on diabetogenic risk from colorectal cancer (CRC) treatment. The authors claim that postoperative screening for type 2 diabetes should be prioritized in CRC survivors with overweight/obesity, irrespective of the oncological treatment received. The evidence supporting the claims is solid but requires confirmation in different populations. These results have theoretical or practical implications and will be of interest to endocrinologists, oncologists, general practitioners, gastrointestinal surgeons, and policymakers working on CRC and diabetes.

      Author response: We thank you for taking the time to provide constructive feedback on our manuscript and for the useful suggestions. We have provided a point-by-point response to each of the reviewers’ comments with clearly marked changes to the manuscript.

      Public reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors set out to determine whether colorectal cancer surgery site (right, left, rectal) and chemotherapy impact the subsequent risk of developing T2DM in the Danish national health register.

      Strengths:

      • The research question is conceptually interesting

      • The Danish national health register is a comprehensive health database

      • The data analysis was thorough and appropriate

      • The findings are interesting, and it is a little surprising that chemotherapy had no impact on the development of T2DM

      Weaknesses:

      This is not a weakness as such, but in the discussion, I would consider adding some brief comment on the international generalizability of the findings - e.g. demographic make up of the Danish population health register and background rates of DM and obesity in this population with CRC compared to countries on other continents.

      Author response: We agree that this information would be valuable. It has now been added in the Discussion section.

      Changes in manuscript: "In Denmark, the overall T2D prevalence is 6.9% (25), lower than the global average in 2021 (10.5%) and below the estimate for high-income countries (11.1%) (26). Similarly, the obesity rate of 20% aligns with other Scandinavian countries and is below that of most high-income nations (27).” (Page 8, line 256-258)

      A little more information would be helpful regarding how T2DM was diagnosed in the registry.

      Author response: We have now added a more thorough explanation of how T2D was diagnosed in the Methods section.

      Changes in manuscript: “Diabetes is defined as the second occurrence of any event across three types of inclusion events: 1) diabetes diagnosed during hospitalisation, 2) diabetes-specific services received at a podiatrist, and 3) purchases of glucose-lowering drugs. Thus, if a patient developed transient T2D during chemotherapy treatment, it will only be an inclusion event if they purchase glucose-lowering drugs. Individuals were classified as having T1D if they had received prescriptions for insulin combined with a diagnosis of type 1 diabetes from a medical hospital department. Otherwise, diabetes was classified as type 2 (22).” (Page 5, line 154-160)
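      The case definition above can be read as a small rule. Diabetes onset is the date of the second qualifying event of any of the three types. Type 1 requires both insulin prescriptions and a type 1 diagnosis from a hospital department, and every other case is classified as type 2.

      The following is a minimal Python sketch of that rule. The event labels and boolean flags are hypothetical stand-ins for register fields, which the response does not describe, and the sketch is not the register algorithm itself.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical labels for the three inclusion-event types
INCLUSION_TYPES = {
    "hospital_diagnosis",
    "podiatrist_service",
    "glucose_lowering_purchase",
}


@dataclass
class Event:
    when: date
    kind: str


def diabetes_onset(events):
    """Date of the second inclusion event of any type, or None if fewer than two."""
    dates = sorted(e.when for e in events if e.kind in INCLUSION_TYPES)
    return dates[1] if len(dates) >= 2 else None


def classify(events, has_insulin_rx, has_t1d_hospital_diagnosis):
    """Return None (no diabetes), "T1D", or "T2D" following the stated rule."""
    if diabetes_onset(events) is None:
        return None
    if has_insulin_rx and has_t1d_hospital_diagnosis:
        return "T1D"
    return "T2D"
```

      Under this reading, a patient with transient hyperglycaemia during chemotherapy who fills only one glucose-lowering prescription has a single inclusion event and is not counted. This is consistent with the "2 (or more) prescriptions" criterion given in the response below.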

      If someone did develop transient hyperglycemia requiring DM medications during chemotherapy, would the investigators have been able to identify these people?

      Author response: Yes, we have added a sentence in the Methods section.

      Changes in manuscript: “Thus, if a patient developed transient T2D during chemotherapy treatment, it will only be an inclusion event if they purchase glucose lowering drugs.” (Page 5, line 156-158)

      Would they have been classified as T2DM based on filling a prescription for DM meds for a period of time? Also, did the authors have information regarding time to development of T2DM after surgery?

      Author response: Yes, if they have 2 (or more) prescriptions of oral glucose lowering drugs. Yes, we have information regarding time to development of T2DM after surgery and found no difference between the groups.

      Changes in manuscript: Information on mean time to develop T2D post-surgery has now been added to Table 2.

      In the adjusted models, the authors did not adjust for cancer stage, even though cancer stage appears to be very different between the chemo and no-chemo groups. It would be interesting to know whether adjusting for cancer stage affects the results.

      Author response: We agree that adjusting for cancer stage would provide valuable information. We have performed this analysis and added a sentence to the Results section.

      Changes in manuscript: An analysis adjusted for cancer stage now appears in Supplementary Table 1.

      “Moreover, adjusting for cancer stage did not affect the results (Supplementary table 1).” (Page 7, line 219-220)

      It would be worthwhile to report if mortality rates were different between the groups during follow up, and if the authors investigated whether perhaps differences in mortality rates led to specific groups living longer, and therefore having more time to develop DM

      Author response: This situation is accounted for in the analysis by using Cox-regression analysis. This method accounts for the potential competing effect of mortality.

      Changes in manuscript: None.

      Overall, the authors achieved their aims, and the conclusions are supported by their results as reported.

      The results are unlikely to significantly change patient treatment or T2DM screening in this population. With some additional information, as described above, the results would be of interest to the community.

      Reviewer #2 (Public Review):

      Summary:

      The study showed the impact of cancer treatment on new onset of diabetes among patients with colorectal cancer using the national database. Findings reported that individuals with rectal cancer without chemotherapy were less likely to develop diabetes but among other groups, treatment didn't show any impact on the development of diabetes. BMI still played a significant role in developing diabetes regardless of treatment types.

      Strengths:

      One of the strengths of this study is innovative findings about the prognosis of colorectal cancer treatment stratified by treatment types. Especially, as it examined the impact of treatment on the risk of new chronic disease after diagnosis, it became significant evidence that suggests practical insights in developing a proper monitoring system for patients with colorectal cancer and their outcomes after treatment and diagnosis. It is imperative for providers to guide patients and caregivers to prevent adverse outcomes like new onset of chronic disease based on BMI and types of treatment. The next strength is the national database. As the study used the national database, the generalizability is validated.

      Weaknesses:

      Even though the study attempted to examine the impact of each treatment option, the dosage of chemotherapy and the types of chemotherapy were not able to be examined due to the data source.

      Author response: Unfortunately not. We agree that this would have been valuable information. It is stated as a limitation in the original manuscript (page 10, lines 305-306).

      Changes in manuscript: None.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor things:

      There are minor inconsistencies in the methods and results regarding BMI. In the methods, the authors state that BMI <18.5 and >/=40 were excluded, but these groups are included in Table 2.

      Author response: This has been corrected

      Changes in manuscript: BMI groups <18.5 and >/=40 are now excluded in Table 2. (Page 18)

      Line 204, I believe should be BMI 18.5-24.9, not 20-24.9.

      Author response: This has been corrected

      Changes in manuscript: “For each group (type of surgery ± chemotherapy), the HR for developing T2D depending on BMI subgroups was calculated by using Cox regression analysis adjusted for age, sex, year of surgery, and ASA score using normal weight (BMI:18.5-24.9) as the reference group.” (Page 6, line 184-186)

      Rather than showing the BMI mean in Table 1, it would be interesting to see the BMI breakdown by category.

      Author response: Yes, we agree. This analysis has now been added to Table 1

      Changes in manuscript: Please refer to Table 1

      Re line 215, I would consider rewriting to remove the multiple negatives -e.g. Radiation therapy in rectal resected had did not impact the incidence rate of T2D in the Rectal-No-Chemo group or Rectal-Chemo group

      Author response: This has been corrected. Please refer to the Result section.

      Changes in manuscript: “Radiation therapy in the rectal resected groups had no impact on the incidence rate of T2D (Table 2); and the unadjusted/adjusted HR of developing T2D was non-significant when comparing Rectal-No-Radiation patients with Rectal-Radiation patients (Table 3).” (Page 7, 223-225)

      Consider changing some of the "didn't"s in the discussion to "did not"

      Author response: This has been corrected.

      Changes in manuscript: Revised and corrected throughout the discussion.

      Reviewer #2 (Recommendations For The Authors):

      Some points need to be clarified and improved.

      In the Methods, patients with Type 1 Diabetes were excluded at baseline, but some patients were diagnosed with Type 1 Diabetes after treatment and were included in your analysis. It is interesting to identify Type 1 Diabetes after treatment as an outcome; do you think that this diagnosis is caused by the treatment? Also, the incidence rates and other HRs did not seem to include Type 1 Diabetes, as stated in the Methods. Did you exclude every case of Type 1 Diabetes? If not, further explanation of this outcome is needed, since the mechanisms of Type 1 and Type 2 Diabetes are different.

      Author response: This matter has now been clarified in the Methods section.

      Changes in manuscript: “Additionally, individuals diagnosed with Type 1 diabetes (T1D) either before or after surgery were excluded, along with those diagnosed with T2D preoperatively or within the first 2 weeks postoperatively, as the latter group probably represents patients with preoperatively unknown pre-existing prediabetes or diabetes (22).” (Page 4, line: 125-128)

      Despite limited existing findings, some studies actually reported the incidence rates of Type 2 Diabetes among patients with CRC (Singh S, Earle CC, Bae SJ, et al. Incidence of Diabetes in Colorectal Cancer Survivors. J Natl Cancer Inst. 2016;108(6):djv402. Published 2016 Feb 2. doi:10.1093/jnci/djv402; Khan NF, Mant D, Carpenter L, Forman D, Rose PW. Long-term health outcomes in a British cohort of breast, colorectal and prostate cancer survivors: a database study. Br J Cancer. 2011;105 Suppl 1(Suppl 1):S29-S37. doi:10.1038/bjc.2011.420; Jo A, Scarton L, O'Neal LJ, et al. New onset of type 2 diabetes as a complication after cancer diagnosis: A systematic review. Cancer Med. 2021;10(2):439-446. doi:10.1002/cam4.3666) whereas your study examined the impact of the different types of treatments.

      Author response: Our findings of T2D rate among CRC patients are now commented on in discussion section, and the abovementioned studies are included as references.

      Changes in manuscript: “This national cohort study demonstrated an IR of developing T2D after CRC surgery similar to previous studies (5, 11).” (Page 8, line 237-238)

      To strengthen the presentation, some places should be revised.

      • Line 216: it says that Table 1 showed no impact of radiation therapy on the incidence rate of T2D. However, either the interpretation or the table number seems wrong. Table 1 does not have this information. Correct this statement.

      • Line 239: There is a typo and an incomplete sentence. Check and correct the sentence.

      • Line 257-261: It may be a systematic issue to separate these two paragraphs. But two paragraphs seem related so make them one paragraph.

      Author response: These suggested changes have been made. Regarding line 216 the paragraph has been adjusted to the following:

      Changes in manuscript: “Radiation therapy in the rectal resected groups had no impact on the incidence rate of T2D (Table 2); and the unadjusted/adjusted HR of developing T2D was non-significant when comparing Rectal-No-Radiation patients with Rectal-Radiation patients (Table 3).” (Page 7, 223-225)

      Reference

      (1) Araghi M, Soerjomataram I, Jenkins M, et al. Global trends in colorectal cancer mortality: projections to the year 2035. Int J Cancer. 2019;144(12):2992-3000. doi:10.1002/ijc.32055

      (2) Arnold M, Sierra MS, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global patterns and trends in colorectal cancer incidence and mortality. Gut. 2017;66(4):683-691. doi:10.1136/gutjnl-2015-310912

      (3) González N, Prieto I, del Puerto-Nevado L, et al. 2017 Update on the Relationship between Diabetes and Colorectal Cancer: Epidemiology, Potential Molecular Mechanisms and Therapeutic Implications. Vol 8.; 2017. www.impactjournals.com/oncotarget

      (4) Mills KT, Bellows CF, Hoffman AE, Kelly TN, Gagliardi G. Diabetes mellitus and colorectal cancer prognosis: A meta-analysis. Dis Colon Rectum. 2013;56(11):1304-1319. doi:10.1097/DCR.0b013e3182a479f9

      (5) Singh S, Earle CC, Bae SJ, et al. Incidence of Diabetes in Colorectal Cancer Survivors. J Natl Cancer Inst. 2016;108(6). doi:10.1093/jnci/djv402

      (6) Xiao Y, Wang H, Tang Y, et al. Increased risk of diabetes in cancer survivors: a pooled analysis of 13 population-based cohort studies. ESMO Open. 2021;6(4). doi:10.1016/j.esmoop.2021.100218

      (7) Colorectal D, Nordcan 2019. 5-Year Age-Standardised Relative Survival (%), Males and Females. Accessed September 12, 2022. https://nordcan.iarc.fr/en/dataviz/survival?cancers=520&set_scale=0&sexes=1_2&populations=208

      (8) Nano J, Dhana K, Asllanaj E, et al. Trajectories of BMI Before Diagnosis of Type 2 Diabetes: The Rotterdam Study. Obesity. 2020;28(6):1149-1156. doi:10.1002/oby.22802

      (9) Maddatu J, Anderson-Baucum E, Evans-Molina C. Smoking and the risk of type 2 diabetes. Translational Research. 2017;184:101-107. doi:10.1016/j.trsl.2017.02.004

      (10) Lega IC, Lipscombe LL. Review: Diabetes, Obesity, and Cancer-Pathophysiology and Clinical Implications. Endocr Rev. 2020;41(1). doi:10.1210/endrev/bnz014

      (11) Jo A, Scarton L, O’Neal LTJ, et al. New onset of type 2 diabetes as a complication after cancer diagnosis: A systematic review. Cancer Med. 2021;10(2):439-446. doi:10.1002/cam4.3666

      (12) Feng JP, Yuan XL, Li M, et al. Secondary diabetes associated with 5-fluorouracil-based chemotherapy regimens in non-diabetic patients with colorectal cancer: Results from a single-centre cohort study. Colorectal Disease. 2013;15(1):27-33. doi:10.1111/j.1463-1318.2012.03097.x

      (13) Lee EK, Koo B, Hwangbo Y, et al. Incidence and disease course of new-onset diabetes mellitus in breast and colorectal cancer patients undergoing chemotherapy: A prospective multicenter cohort study. Diabetes Res Clin Pract. 2021;174. doi:10.1016/j.diabres.2021.108751

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewer for the constructive comments. We have revised the paper to address the concerns. In summary, the revised version includes the following.

      • Statistical analysis using biological replicate datasets for WT and K40R doublet microtubule.

      • Additional figures for the statistical analysis and MIP decorations in MEC17-KO and K40R.

      • Revised texts and figures to reflect the new changes, cite proper references and fix small errors throughout the text.

      Reviewer #1 (Public Review):

      Summary:

      The study "Effect of alpha-tubulin acetylation on the doublet microtubule structure" by S. Yang et al employs a multi-disciplinary approach, including cryo-electron microscopy (cryo-EM), molecular dynamics, and mass spectrometry, to investigate the impact of α-tubulin acetylation at the lysine 40 residue (αK40) on the structure and stability of doublet microtubules in cilia. The work reveals that αK40 acetylation exerts a small-scale, but significant, effect by influencing the lateral rotational angle of the microtubules, thereby affecting their stability. Additionally, the study provided an explanation of the relationship between αK40 acetylation and phosphorylation within cilia, although the details remain elusive. Overall, these findings contribute to our understanding of how post-translational modifications can influence the structure, composition, stability, and functional properties of important cellular components like cilia.

      Strengths:

      (1) Multi-Disciplinary Approach: The study employs a robust combination of cryo-electron microscopy (cryo-EM), molecular dynamics, and mass spectrometry, providing a comprehensive analysis of the subject matter.

      (2) Significant Findings: The paper successfully demonstrates the impact of αK40 acetylation on the lateral rotational angles between protofilaments (inter-PF angles) of doublet microtubules in cilia, thereby affecting their stability. This adds valuable insights into the role of post-translational modifications in cellular components.

      (3) Exploration of Acetylation-Phosphorylation Relationship: The study also delves into the relationship between αK40 acetylation and phosphorylation within cilia, contributing to a broader understanding of post-translational modifications.

      (4) High-quality data: The authors are cryo-EM experts in the field and the data quality presented in the manuscript is excellent.

      (5) Depth of analysis: The authors analyzed the effects of αK40 acetylation in excellent depth which significantly improved our understanding of this system.

      Thank you for highlighting the strength of our paper.

      Weaknesses:

      I have no major concerns about this paper, but would recommend that a few minor issues be addressed.

      (1) Lack of Statistical Details: The review points out that the paper could benefit from providing more statistical details, such as the number of particles and maps used for analysis, randomization methods, and dataset splitting for statistical analyses.

      To address this, we analyzed true biological replicate datasets (from different cultures, cryo-EM vitrification sessions, and data collections) for WT and K40R. Because the MEC17-KO data were collected as a single dataset, we did not split them by randomization. Such a split does not produce independent sets and, in cryo-EM, tends to yield identical results. The biological replicates allow us to assess how consistent our structural data are for interpretation. Information about the replicate datasets is now included in Table 1. The analysis is highlighted in the manuscript and described in the Materials & Methods and Fig. S4.

      In summary, the inter-PF rotation angles are highly consistent between the two WT biological replicates. In contrast, the inter-PF angles in the B-tubule vary between the two K40R replicates (Fig. S4B).

      Overall, when the data are pooled (6 + 6 measurement points for WT datasets 1 and 2, 6 + 6 measurement points for K40R datasets 1 and 2, and 6 measurement points for MEC17-KO) (Fig. S4), our analysis yields the same statistical significance as the analysis of the averages of all datasets (6 measurement points of the total averages for WT, K40R, and MEC17-KO) (Fig. 3).

      In addition, the variation in inter-PF rotation angles between certain PF pairs within the K40R replicates (B7B8 and B9B10) is similar to the variation in MEC17-KO. This suggests that the loss of acetylation induces variation in inter-PF angles, whereas the inter-PF angles are maintained consistently in WT.

      (2) Questionable Conclusion Regarding MIPs: The reviewer suggests caution in the paper's conclusion that "Acetylation of αK40 does not affect tubulin and MIPs." The reviewer recommends that this conclusion be more specific or supported by additional evidence to exclude all other possibilities.

      We have revised the text to make sure we do not overclaim that “Acetylation of αK40 does not affect tubulin and MIPs.” We now state more specifically that “Lack of acetylation of αK40 does not significantly affect tubulin and MIP interactions”. We also edited the surrounding text to make the statement more specific.

      (3) Need for Additional Visual Data: The reviewer recommends that an enlarged local density map along with fitted PDB models be provided in a supplementary figure, such as Figure 4.

      We now include the density maps and fitted PDB models in Fig. 4 and Fig. S5. We also include more snapshots of the MIP in K40R and MEC17-KO in Figure S3.

      Overall, the paper is strong in its scientific approach and findings but could benefit from additional statistical rigor and clarification of certain conclusions.

      Page 11, Line 226: "cluster consists of only ~ acetylated", lacks the percentage. Please correct this.

      We corrected it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) V2 epitopes exhibit properties of CD4i epitopes in that they are largely absent from the native Env surface, probably by glycan-occlusion, but become more exposed upon CD4 binding. Although the V2-scaffolds were produced in GnTi- cells to produce high-mannose proteins, it appears that no systematic analysis of glycan content or structure was carried out save for enzymatic deglycosylation of the constructs to sharpen bands on SDS-PAGE gels. It would be helpful if the authors could comment on how the lack of this information might impact their conclusions.

      We thank the reviewer for this comment.

      The lack of native glycan structures is a common phenomenon in all HIV studies involving in vitro cell culture-expressed envelope proteins.

      As the reviewer mentioned, it is clear that our V1V2 scaffolds produced in GnTi- cells contain the expected high-mannose glycans, as evident from a significant shift and sharpening of the protein bands on the SDS-PAGE gel upon deglycosylation with the PNGase enzyme.

      In our previously published studies by Chand et al., 2017* (ref. below), the V1V2 scaffolds were shown to bind to the glycan-dependent PG9 antibody, suggesting that the conformation of the PG9 epitope is retained in the high-mannose V1V2 scaffold. This information has also been added to the “Hypothesis and Experimental Design” section of the Results in the revised manuscript.

      Additionally, as shown in Results, the human antibodies elicited in study participants against native glycosylated envelope protein due to natural HIV-1 infection distinguished the H173 and Y173 epitopes in the high-mannose scaffolds, which was also recapitulated in our mouse studies using the high-mannose V1V2 scaffolds expressed in GnTi- cells as antigens.

      Therefore, it does not seem likely that differences in glycans per se majorly affected the binding or the conclusions from our studies.

      *Chand S, Messina EL, AlSalmi W, Ananthaswamy N, Gao G, Uritskiy G, Padilla-Sanchez V, Mahalingam M, Peachman KK, Robb ML, Rao M, Rao VB. Glycosylation and oligomeric state of the envelope protein might influence HIV-1 virion capture by α4β7 integrin. Virology. 2017 Aug;508:199-212. doi: 10.1016/j.virol.2017.05.016. Epub 2017 May 31. PMID: 28577856; PMCID: PMC5526109.

      (2) Similarly, the MD simulations appear to be performed without taking glycan structure/occupancy into account.

      We were unable to perform glycan-dependent MD simulation studies because of the high computational demands and also the technical limitations that existed at the time of the study several years ago. Therefore, we focused on the protein backbone of the short C-strand in the V2 region, which lacks glycan sites and has been shown in previously published studies to be conformationally polymorphic.

      Since this C-strand epitope is the binding site for many V2-directed antibodies identified previously, we hypothesized that it is relevant to explore this small immunogenic epitope for its propensity to change conformation due to an escape mutation discovered at residue 173 in a natural HIV-1 infection. How this epitope might behave in MD simulations in the presence of different glycans requires further investigation.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      Liao et al. leveraged two powerful genomics techniques, CUT&RUN and RNA sequencing, to identify genomic regions bound by and activated or inactivated by SMAD1, SMAD5, and the progesterone receptor during endometrial stromal cell decidualization. Additionally, the authors generated novel knock-in HA-SMAD1 and PA-SMAD5 tagged mice to combat antibody issues facing the field, generating a novel model to advance the study of BMP signaling in the female reproductive tract. During decidualization in a murine model, SMAD1/5 are bound to many genomic sites of genes important in decidualization and pregnancy and coregulate responses with progesterone receptor signaling.

      Strengths

      The authors utilized powerful next generation sequencing and identified important transcriptional mechanisms of SMAD1/5 and PGR during decidualization in vivo.

      Weaknesses

      None.

      Overall, the manuscript and study are well structured and provide critical mechanistic updates on the roles of SMAD1/5 in decidualization and preparation of the maternal endometrium for pregnancy.

      We thank you for the summary and consideration.

      Reviewer #2 (Public Review):

      Summary:

      Liao and colleagues generated tagged SMAD1 and SMAD5 mouse models and identified genome occupancy of these two factors in the uterus of these mice using the CUT&RUN assay. The authors used integrative bioinformatic approaches to identify putative SMAD1/5 direct downstream target genes and to catalog the SMAD1/5 and PGR genome co-localization pattern. The role of SMAD1/5 on stromal decidualization was assayed in vitro on primary human endometrial stromal cells. The new mouse models offer opportunities to further dissect SMAD1 and SMAD5 functions without the limitation from SMAD antibodies, which is significant. The CUT&RUN data further support the usefulness of these mouse models for this purpose.

      Strengths:

      The strength of this study is the novelty of new mouse models and the valuable cistromic data derived from these mice. Overall the present manuscript is an excellent resource paper for the field of reproductive biology.

      Weaknesses:

      The weakness of the present version of the manuscript includes the self-limited data analysis approaches such as the proximal promoter based bioinformatic filter and an outdated method on inferring the cell type composition. Evidence was provided for potential associations between SMAD1/5 and other major transcription factors. However, causal effects of SMAD1/5 on the genome occupancy of other major uterine transcription factors were discussed but not experimentally examined in the present manuscript, which is understandable.

      For data in Figure 2B, the current manuscript fails to elaborate on the common and distinct features of clusters 1 and 3, as well as the biological significance of having two separate clusters for SMAD1. In addition, Figure S1A shows overlapping genome occupancy between SMAD1 and SMAD5, which is not clearly demonstrated in Figure 2B.

      Thank you for the comments. We’ve added additional interpretations in Lines 281-283, addressing the clustering results in Figure 2B as suggested. We do appreciate the overlapping genome occupancy in Cluster 1, although the signal intensities may differ between the two groups.

      Lines 281-283:

      “Peaks in cluster 1 exhibit a shared enrichment for both SMAD1 and SMAD5, whereas clusters 2 and 3 demonstrate preferential enrichment for SMAD5 and SMAD1, respectively.”

      For data in Figure 5A, the result description does not provide adequate information to guide readers to full understanding of the data. The biological meaning behind the three PR clusters is not stated nor speculated. Moreover, Figure 5A and Figure S1B are inherently connected but fail to be adequately described in the main text.

      Thank you for the comments. We’ve added additional interpretations in Lines 415-421 discussing the clustering results mentioned in Figure 5A, together with Supplement Figure 1C (Former Supplement Figure 1B) as suggested.

      Lines 415-421:

      “Based on the k-means clustering results of the peaks, we demonstrated clusters with shared occupancy between SMAD1/5 and PR (cluster 1), preferential deposition in the SMAD1 (cluster 2), SMAD5 (cluster 4) and PR (clusters 3,5), respectively. Interestingly, between clusters 3 and 5, although the primary enrichment is for PR, overall, the signal intensities for SMAD5 are higher in cluster 5. Together with previous analysis on genes uniquely or commonly bound by SMAD1/5 (Supplement Figure 1A), we speculate such observation can be attributed to a subset of the genes that are potentially co-regulated by SMAD5 and PR.”

      Reviewer #3 (Public Review):

      Summary:

      As SMAD1/5 activities have previously been indistinguishable, these studies provide a new mouse model to finally understand unique downstream activation of SMAD1/5 target genes, a model useful for many scientific fields. Using CUT&RUN analyses with gene overlap comparisons and signaling pathway analyses, specific targets for SMAD1 versus SMAD5 were compared, identified, and interpreted. These data validate previous findings showing strong evidence that SMADs directly govern critical genes required for endometrial receptivity and decidualization, including cell adhesion and vascular development. Further, SMAD targets were overlapped with progesterone receptor binding sites to identify regions of potential synergistic regulation of implantation. The authors report strong correlations between progesterone receptor and SMAD1/5 direct targets to cooperatively promote embryo implantation. Finally, the authors validated SMAD1/5 gene regulation in primary human endometrial stromal cells. These studies provide a data-rich survey of SMAD family transcription, defining its role as a governor of early pregnancy.

      Strengths:

      This manuscript provides a valuable survey of SMAD1/5 direct transcriptional events at the time of receptivity. As embryo implantation is controlled by extensive epithelial to stromal molecular crosstalk and hormonal regulation in space and time, the authors state a strong, descriptive narrative defining how SMAD1/5 plays a central role at the site of this molecular orchestration. The implementation of cutting-edge techniques and models and simple comparative analyses provide a straightforward, yet elegant manuscript.

      Although the progesterone receptor exists as a major regulator of early pregnancy, the authors have demonstrated clear evidence that progesterone receptor with SMAD1/5 work in concert to molecularly regulate targets such as Sox17, Id2, Tgfbr2, Runx1, Foxo1 and more at embryo implantation. Additionally, the authors pinpoint other critical transcription factor motifs that work with SMADs and the progesterone receptor to promote early pregnancy transcriptional paradigms.

      Weaknesses:

      Although a wonderful new tool to ascertain SMAD1 versus SMAD5 downstream signaling, the importance of these factors in governing early pregnancy is not novel. Furthermore, functional validation studies are needed to confirm interactions at promoter regions. Additionally, the authors presume that all overlapped genes are shared between progesterone receptor and SMAD1/5, yet some peak representations do not overlap. Although, transcriptional activation can occur at the same time, they may not occur in the same complex. Thus, further confirmation of these transcriptional events is warranted.

      Thank you for the comments. We recognized this limitation and discussed future options regarding this in Lines 578-583.

      Lines 578-583:

      “In this study, we determined the overlapped transcriptional control between SMAD1/5 and PR at the gene level, and functionally validated the regulatory effect at the transcript level in a human stromal cell decidualization model. While we observe a subset of peak representations that do not overlap at the base pair level in the promoter regions, future functional screenings at the promoter level, such as luciferase reporter assays to assess transcriptional co-activation by SMAD1/5 and PR, will advance this study.”

      Since whole murine uterus was used for these studies, the specific functions of SMAD1/5 in the stroma versus the epithelium (versus the myometrium) remain unknown. Further work is needed to delineate binding and transcriptional activation of SMAD1/5 and the progesterone receptor in the uterine compartments.

      We thank the reviewer for the insightful comment. Given the multifaceted roles that SMAD1/5 play in the female reproductive tract, we concur that future studies will benefit from a more compartmentalized approach, as discussed in Lines 526-538.

      Lines 526-538:

      “Published studies have shown that nuclear SMAD1/5 localize to the stroma and epithelium during the decidualization process at 4.5 dpc, during the window of implantation. Conditional deletion of SMAD1/5 exclusively in the uterine epithelium using lactoferrin-icre (Ltf-icre) results in severe subfertility due to impaired implantation and decidual development. Conditional deletion of SMAD1/5/4 exclusively in cells of mesenchymal lineage (including uterine stroma) using anti-Mullerian hormone type 2 receptor cre (Amhr2-cre) results in infertility with defective decidualization. Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that the transcriptional co-regulatory roles of SMAD1/5 and PR reported here using the whole uterus validate a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, because whole uterus was used, this does not rule out potential coregulatory roles of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium. Specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future validation using single-cell sequencing and/or spatial transcriptomic analysis.”

      There are asynchronous gene responses in the SMAD1/5 ablated mouse model compared to the siRNA-treated human endometrial stromal cells. These differences can be confounding. Further investigation is needed to understand the meaning of these differences and as they relate to the entire SMAD transcriptome.

      Thank you for the comments. In the current study, we used human endometrial stromal cells as a model to validate our findings functionally, aiming to mimic the specific time point during decidualization. We acknowledge the similarities and differences between the mouse and human cell models, and this information needs to be considered when evaluating genome-wide effects on the transcriptome. This point is discussed in Lines 589-597.

      Lines 589-597:

      “Since mice only undergo decidualization upon embryo implantation whilst human stromal cells undergo cyclic decidualization in each menstrual cycle in response to rising levels of progesterone, asynchronous gene responses may occur in comparison between mouse models and human cells. However, cellular transformation during decidualization is conserved between mice and humans, which makes findings in the mouse models a valuable and transferable resource to be evaluated in human tissues. Accordingly, our functional validation studies were performed using human endometrial stromal cells induced to decidualize in vitro for four days, which models the early phases of decidualization. Additional transcriptomic studies of the SMAD1/5 perturbations in human endometrial stromal cells will be of great resource in understanding the entire SMAD1/5 regulomes in humans.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The inference on the cell type composition could use updated bioinformatic tools, which are purely computational without costly and time-consuming wet-lab resources. Perhaps this part of the description could be streamlined if the authors chose to use the method in the current version.

      We thank the reviewer for the suggestion. We added the analysis of the cell type composition using the updated tool CIBERSORTx (PMID:31061481) and included the results and discussion regarding the cell type composition changes in Supplement Figure 1B and Lines 392-407.

      Lines 392-407:

      “To explore the major cell types regulated by SMAD1/5, first, we used CIBERSORTx to analyze and depict changes in the cell populations upon SMAD1/5 depletion in the mouse uterus during early pregnancy. By imputing the bulk uterine gene expression profiles to previously published mouse uterine single-cell datasets using CIBERSORTx, we were able to compare changes across both samples and cell types upon the SMAD1/5 perturbation in the mouse uterus. We highlight the proportional increase in the epithelial cells, as well as the decrease in the decidual stromal cells and smooth muscle cells, in mice lacking uterine SMAD1/5 during the peri-implantation phase (Supplement Figure 1B). Such cell population changes are in line with the phenotypic observations of decidualization failure and excessive proliferation in the epithelial compartment. In addition, to explore the expression patterns of SMAD1/5 direct targets in humans, we profiled the expression levels of the key “up-targets” and “down-targets” in the different cell types of the human endometrium. Using previously published single-cell RNA-seq data of human endometrium, we visualized the expression patterns of suppressive targets and activating targets of SMAD1/5 (Figure 4E). Apart from the major epithelial and stromal compartments, SMAD1/5 target genes are also widely expressed in the immune cell populations. Such observations reinforce the importance of the BMP signaling pathways in establishing an immune-privileged environment at the maternal-fetal interface.”

    1. Author Response

      Reviewer #1 (Public Review):

      Reviewer 1: The structural part of this work is interesting, as it is the first structure of Pin1 with a ligand that bridges both domains. They might want to underline this - all other structures in the PDB have a single domain complex, but never both domains by a single longer peptide.

      Done. We have highlighted the novelty of the structure in the abstract, introduction (page 5); and discussion (section “The Pin1-PKC interface is described by a novel bivalent interaction mode”, page 24).

      Reviewer 1: I would, however, question the static representation of this structure: the 90° kink in the peptide when complexed is probably one single snapshot, but I hardly believe the PPIase/WW domain orientation to be static. Unless the authors have additional information to stand by this static structure, this point merits being commented on in the manuscript.

      Done. Following the reviewer’s suggestion and to avoid the impression of “static” structure, we have added sentences that highlight the dynamic aspects of the complex evident from the entire ensemble representation of Figure 5-figure supplement 2:

      Page 15 (Results):

      “Of note, the linker region connecting the two domains retains its flexibility in the complex and confers some variability onto the relative positions of the WW and PPIase domains, as is evident from the ensemble representation of Figure 5-figure supplement 2. The complex exhibits novel structural features that distinguish it from all other structures of Pin1 complexes known to date. These features are highlighted in Fig. 6 using the lowest-energy structure of the ensemble.”

      Page 24 (Discussion): “Moreover, the retention of linker flexibility in the Pin1::pV5bII complex suggests that Pin1 can potentially adopt minor “extended” states that would not be readily detectable by ensemble-averaged methods such as solution NMR.”

      Also, in describing specific interactions in the section “Structural basis of the Pin1-PKCII C-term bivalent recognition mode”, we now note how many structures of the Pin1-pV5bII ensemble have those interactions.

      Reviewer 1: I would like to point out literature that described, for example, the non-canonical binding (Yeh ES, Lew BO & Means AR (2006) The loss of PIN1 deregulates cyclin E and sensitizes mouse embryo fibroblasts to genomic instability. J Biol Chem 281, 241-251. Pin1 recognizes cyclin E via a noncanonical pThr384- Gly385 motif [33] rather than the pThr380-Pro381 motif.). They mention briefly the absence of isomerase activity in similar TPP motifs, but this information might already come in the Results section.

      Done. We have incorporated this information in the Discussion section, page 25 (last paragraph).

      Reviewer 1: The expression levels of Pin1 and PKCa are amazingly linear (Fig 7A), but when they overexpress WT Pin1 in a KO line, with 3-4 times higher overexpression, the PKCa levels are hardly higher than in the original WT cell line.

      We thank Reviewer 1 for raising this interesting point. Our simple interpretation of the data is that physiological expression of Pin1 in the cell model we use is a limiting factor in the stimulated PKCa degradation pathway, but that Pin1 is no longer a limiting factor at higher expression levels. We now include this point in the Discussion, page 26.

      Reviewer 1: Also, the levels in the W34A/R68A/R69A (abolishing both WW and PPIase binding functions) are surprising, why would PKCa levels rise above the level found in the Pin1 KO cells?

      This result remains a puzzle but, as we are including all independent biological replicates in the analysis, the data are the data. Moreover, by assessing the functional complementation data to the KO by two-tailed t-test (see last point below), this effect does not reach statistical significance. Nonetheless, as the result is reproducible, we now comment on this effect in the Results, page 21. One speculation is this triple mutant has dominant negative properties imposed on some limiting factor in PKCa degradation that are revealed in the absence of WT Pin1. Considerably more work needs to be done to settle this issue. However, in light of the fact that this result does not conflict with the structural/biochemical data (rather, it is consistent with it), we hope this positive response satisfies the Reviewer.

      Reviewer 1: Finally, if even slight overexpression of the C113S catalytically inactive mutant leads to more efficient PKCa degradation than overexpression of the WT Pin1 (Figure 7C), it is hard to interpret. The conclusion that Pin1-mediated regulation of PKCa requires a bivalent interaction mode of Pin1 with PKCa independent of its catalytic activity depends on these data, so they merit further analysis.

      We certainly had no intention of concluding that the C113S catalytically inactive mutant is more efficient with regard to promoting PKCa degradation than overexpression of the WT Pin1. That overstates the data. We concede that our organization of the Pin1 rescue data in the original Fig 7C confused the issue, and that the original text also invited conclusions that overstate the result. To correct this problem, we reorganized Fig. 7C to simplify the presentation by comparing the complementation data to the KO. All statistical comparisons are now to the KO cell line (not to WT as before) and we employ the two-tailed t-test to compare the data. Statistical significance is attained only for reconstituted WT and C113S Pin1 expression. The text is also appropriately revised to describe the results clearly. We trust the Reviewer agrees that the C113S data are compelling and are consistent with a noncanonical (noncatalytic) mode of PKCa regulation by Pin1. This is a major point of Fig 7C as it links the structural/biochemical data to a cellular context.

    1. Author Response

      eLife assessment

      This computational study is a valuable empirical investigation into a common trait of neurons in brains and artificial neural networks: responding effectively to both objects and their mirror images. It focuses on uncovering the conditions that lead to mirror symmetry in visual networks, and the evidence convincingly demonstrates that learning contributes to expanding mirror-symmetry tuning, given its presence in the data. Additionally, the paper delves into the transformation of face patches in the primate visual hierarchy, shifting from view specificity to mirror symmetry to view invariance. It empirically analyzes the factors behind similar effects in two network architectures, and its key claims highlight the emergence of invariances in architectures with spatial pooling, driven by learning bilateral symmetry discrimination. Importantly, these effects extend beyond faces, suggesting broader relevance. Despite strong experiments, some interpretations lack explicit support, and the paper overlooks the pre-training emergence of mirror symmetry.

      As detailed above, we have now analyzed several convolutional architectures and made a direct link between the artificial neural networks and neuronal data to further support our claims (see Figures 6 and S10-S13).

      To address the concern about the pre-training emergence of mirror symmetry, we conducted a new analysis inspecting unit-level response profiles, following Baek and colleagues (2021). This analysis is described in detail below (response to R3). In brief, we found that the first fully connected layer in trained networks exhibits twice as many mirror-symmetric units as before training. In addition to our population-level observations (Fig. S2) and explicit training-dataset manipulations (Fig. 4), this finding supports the interpretation that training to discriminate among mirror-symmetric object categories is a major factor behind the emergence of mirror-symmetric viewpoint tuning.

      Reviewer 1 (Public Review):

      By using deep convolutional neural networks (CNNs) as model for the visual system, this study aims at understanding and explaining the emergence of mirror-symmetric viewpoint tuning in the brain.

      Major strengths of the methods and results:

      1) The paper presents comprehensive, insightful and detailed analyses investigating how mirror-symmetric viewpoint tuning emerges in artificial neural networks, providing significant and novel insights into this complex process.

      2) The authors analyze reflection equivariance and invariance in both trained and untrained CNNs’ convolutional layers. This elucidates how object categorization training gives rise to mirror-symmetric invariance in the fully-connected layers.

      3) By training CNNs on small datasets of numbers and a small object set excluding faces, the authors demonstrate mirror-symmetric tuning’s potential to generalize to untrained categories and the necessity of view-invariant category training for its emergence.

      4) A further analysis probes the contribution of local versus global features to mirror-symmetric units in the first fully-connected layer of a network. This innovative analysis convincingly shows that local features alone suffice for the emergence of mirror-symmetric tuning in networks.

      5) The results make a clear prediction that mirror-symmetric tuning should also emerge for other bilaterally symmetric categories, opening avenues for future neural studies.

      We are grateful for your insightful feedback and the positive evaluation of our study on mirror-symmetric viewpoint tuning in neural networks. Your constructive comments considerably improved the manuscript. We eagerly look forward to exploring the future research avenues you have highlighted.

      Major weaknesses of the methods and results:

      Point 1.1) The authors propose a mirror-symmetric viewpoint tuning index, which, although innovative, complicates comparison with previous work and this choice is not well motivated. This index is based on correlating representational dissimilarity matrices (RDMs) with their flipped versions, a method differing from previous approaches.

      We have revised the Methods section to clarify the motivation for the mirror-symmetric viewpoint tuning index we introduced.

      Manuscript changes:

      Previous work quantified mirror-symmetry in RDMs by comparing neural RDMs to an idealized mirror-symmetric RDM (see Fig. 3c-iii in [14]). Although highly interpretable, such an idealized RDM encompasses implicit assumptions about representational geometry that are unrelated to mirror-symmetry. For example, consider a neural RDM reflecting perfect mirror-symmetric viewpoint tuning and wherein for each view, the distances among all of the exemplars are equal. Such a neural RDM would fit an idealized mirror-symmetric RDM better than a neural RDM reflecting perfect mirror-symmetric viewpoint tuning but with non-equidistant exemplars. In contrast, the measure proposed in Eq. 2 equals 1.0 in both cases.
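      To illustrate the idea of correlating an RDM with its flipped version, here is a minimal sketch in Python. Eq. 2 itself is not reproduced in this response, so the details are assumptions: the function names are hypothetical, the RDM covers the views of a single object only (no exemplar dimension), distances are correlation distances, views are assumed to be ordered symmetrically around the frontal view, and "flipping" is taken to mean reversing the view order along one axis of the RDM. Under these assumptions, any perfectly mirror-symmetric response pattern yields an index of 1.0 regardless of how the distances are otherwise arranged, which is the property the paragraph above relies on.

      ```python
      import math

      def pearson(x, y):
          """Pearson correlation between two equal-length sequences."""
          n = len(x)
          mx, my = sum(x) / n, sum(y) / n
          dx = [a - mx for a in x]
          dy = [b - my for b in y]
          sxy = sum(a * b for a, b in zip(dx, dy))
          sxx = sum(a * a for a in dx)
          syy = sum(b * b for b in dy)
          return sxy / math.sqrt(sxx * syy)

      def rdm(responses):
          """Correlation-distance RDM; responses[i] is the pattern evoked by view i."""
          return [[1.0 - pearson(ri, rj) for rj in responses] for ri in responses]

      def mirror_symmetry_index(responses):
          """Hypothetical index: correlate the RDM with a copy whose view axis is reversed.

          Assumes views are ordered symmetrically around the frontal view
          (e.g. -90, -45, 0, +45, +90 degrees), so reversing the order maps
          each view onto its mirror image. A fully view-invariant response
          gives a constant RDM, for which the correlation is undefined.
          """
          d = rdm(responses)
          flipped = d[::-1]  # row i of the copy is row (V - 1 - i) of the RDM
          flat = [v for row in d for v in row]
          flat_flipped = [v for row in flipped for v in row]
          return pearson(flat, flat_flipped)

      # Toy example: responses to views +/-theta are identical (mirror-symmetric tuning).
      symmetric = [[1, 2, 3], [2, 0, 1], [5, 1, 0], [2, 0, 1], [1, 2, 3]]
      print(mirror_symmetry_index(symmetric))  # close to 1.0
      ```

      Reversing only one axis (rather than both) is the design choice that makes the index sensitive to mirror symmetry: reversing both axes of a symmetric distance matrix would compare each view pair with its mirrored pair and would not distinguish mirror-symmetric from other tuning profiles as directly.
      
      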

      Point 1.2) Faces exhibit unique behavior in terms of the progression of mirror-symmetric viewpoint tuning and their training task and dataset dependency. Given that mirror-symmetric tuning has been identified in the brain for faces, it would be beneficial to discuss this observation and provide potential explanations.

      We revised the caption of Figure S1 to explicitly address this point:

      Manuscript changes:

      For face stimuli, there is a unique progression in mirror-symmetric viewpoint tuning: the index is negative for the convolutional layers and it abruptly becomes highly positive when transitioning to the first fully connected layer. The negative indices in the convolutional layers can be attributed to the image-space asymmetry of non-frontal faces; compared to other categories, faces demonstrate pronounced front-back asymmetry, which translates to asymmetric images for all but frontal views (Fig. S8). The features that drive the highly positive mirror-symmetric viewpoint tuning for faces in the fully connected layers are training-dependent (Fig. S2), and hence, may reflect asymmetric image features that do not elicit equivariant maps in low-level representations; for example, consider a profile view of a nose. Note that cars and boats elicit high mirror-symmetric viewpoint tuning indices already in early processing layers. This early mirror-symmetric tuning is independent of training (Fig. S2), and hence, may be driven by low-level features. Both of these object categories show pronounced quadrilateral symmetry, which translates to symmetric images for both frontal and side views (Fig. S8).

      Point 1.3) Previous work reported critical differences between CNNs and neural representations in area AL, indicating that, relative to view invariance, mirror-symmetric viewpoint tuning is less prevalent in CNNs than in area AL. While such findings could potentially limit the usefulness of CNNs as models for mirror-symmetric viewpoint tuning in the brain, they are not addressed in the study.

      This point is now addressed explicitly in the caption of Figure S9:

      Manuscript changes:

      Yildirim and colleagues [14] reported that CNNs trained on faces, notably VGGFace, exhibited lower mirror-symmetric viewpoint tuning than neural representations in area AL. Consistent with their findings, our results demonstrate that VGGFace, trained on face identification, has a low mirror-symmetric viewpoint tuning index. This is especially notable in comparison to ImageNet-trained models such as VGG16. This difference between VGG16 and VGGFace can be attributed to the distinct characteristics of their training datasets and objective functions. The VGGFace training task consists of mapping frontal face images to identities; this task may exclusively emphasize higher-level physiognomic information. In contrast, training on recognizing objects in natural images may result in a more detailed, view-dependent representation. To test this potential explanation, we measured the average correlation distance between the fc6 representations of different views of the same face exemplar in VGGFace and in VGG16 trained on ImageNet. The average correlation distance between views is 0.70±0.04 in VGGFace and 0.93±0.04 in VGG16 trained on ImageNet. Conversely, the correlation distance between different exemplars depicted from the same view is 0.84±0.14 in VGGFace and 0.58±0.06 in VGG16 trained on ImageNet. Therefore, as suggested by Yildirim and colleagues, training on face identification alone may result in representations that cannot explain intermediate levels of face processing.
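The view-versus-exemplar comparison above can be expressed compactly. The sketch below is a minimal illustration, assuming activations are stored as an array of shape (exemplars, views, units) and that each reported value is a mean pairwise correlation distance (1 - Pearson r), computed within each exemplar (across its views) or within each view (across exemplars). How the ± values were aggregated is not specified here, so returning one value per exemplar and per view, and the function names, are assumptions of this sketch.

```python
import numpy as np


def mean_pairwise_correlation_distance(vectors):
    """Mean of 1 - Pearson r over all distinct pairs of rows in `vectors`."""
    r = np.corrcoef(vectors)                      # rows are treated as variables
    upper = np.triu_indices(len(vectors), k=1)    # each unordered pair once
    return float(np.mean(1.0 - r[upper]))


def view_and_exemplar_distances(acts):
    """acts: (n_exemplars, n_views, n_units) activations, e.g. from fc6.

    Returns one distance per exemplar (between its views) and one distance
    per view (between exemplars seen from that view).
    """
    between_views = np.array(
        [mean_pairwise_correlation_distance(acts[e]) for e in range(acts.shape[0])]
    )
    between_exemplars = np.array(
        [mean_pairwise_correlation_distance(acts[:, v]) for v in range(acts.shape[1])]
    )
    return between_views, between_exemplars
```

If the ± in the text denotes a standard deviation across exemplars (an assumption), `between_views.mean()` and `between_views.std()` would yield numbers in the same format.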

      Point 1.4) The study’s results, while informative, are qualitative rather than quantitative, and lack direct comparison with neural data. This obscures the implications for neural mechanisms and their relevance to the broader field.

      We addressed this point by conducting a quantitative comparison between the representations of various network architectures and neural response patterns in monkey face patches (see Figures 6 and S10-S13, appearing above).

      Point 1.5) The study provides compelling evidence that learning to discriminate bilaterally symmetric objects (beyond faces) induces mirror-symmetric viewpoint tuning in the networks, qualitatively similar to the brain. Moreover, the results suggest that this tuning can, in principle, generalize beyond previously trained object categories. Overall, the study provides important conclusions regarding the emergence of mirror-symmetric viewpoint tuning in networks, and potentially the brain. However, the conducted analyses and results do not entirely address the question why mirror-symmetric viewpoint tuning emerges in networks or the brain. Specifically, the results leave open whether mirror-symmetric viewpoint tuning is indeed necessary to achieve view invariance for bilaterally symmetric objects.

      We believe that mirror-symmetric viewpoint tuning is not strictly necessary for achieving view-invariance. However, it is a plausible path from view-dependence to view invariance. We addressed this point in the updated limitations subsection of the discussion.

      Manuscript changes:

      A second consequence of the simulation-based nature of this study is that our findings only establish that mirror-symmetric viewpoint tuning is a viable computational means for achieving view invariance; they do not prove it to be a necessary condition. In fact, previous modeling studies [10, 19, 61] have demonstrated that a direct transition from view-specific processing to view invariance is possible. However, in practice, we observe that both CNNs and the face-patch network adopt solutions that include intermediate representations with mirror-symmetric viewpoint tuning.

      Taken together, this study moves us a step closer to uncovering the origins of mirror-symmetric tuning in networks, and has implications for more comprehensive investigations into this neural phenomenon in the brain. The methods of probing CNNs are innovative and could be applied to other questions in the field. This work will be of broad interest to cognitive neuroscientists, psychologists, and computer scientists.

      We appreciate your acknowledgment of our study’s contribution to understanding mirror-symmetric tuning in networks and its wider implications in the field.

      Reviewer 2 (Public Review):

      Strengths

      1) The statements made in the paper are precise, separating observations from inferences, with claims that are well supported by empirical evidence. Releasing the underlying code repository further bolsters the credibility and reproducibility. I especially appreciate the detailed discussion of limitations and future work.

      2) The main claims with respect to the two convolutional architectures are well supported by thorough analyses. The analyses are well-chosen and overall include good controls, such as changes in the training diet. Going beyond "passive" empirical tests, the paper makes use of the fully accessible nature of computational models and includes more "causal" insertion and deletion tests that support the necessity and sufficiency of local object features.

      3) Based on modeling results, the paper makes a testable prediction: that mirror-symmetric viewpoint tuning is not specific to faces and can also be observed in other bilaterally symmetric objects such as cars and chairs. To test this experimentally in primates (and potentially other model architectures), the stimulus set is available online.

      We express our gratitude for your constructive feedback. Your acknowledgment of the clarity of our statements and the robustness of our empirical evidence is greatly appreciated. We are also thankful for your recognition of our comprehensive analyses and the testable predictions arising from our work.

      Point 2.1: Weaknesses

      My main concern with this paper is in its choice of the two model architectures AlexNet and VGG. In an earlier study, Yildirim et al. (2020) found an inverse graphics network "EIG" to better correspond to neural and behavioral data for face processing than VGG. All claims in the paper thus relate to a weaker model of the biological effects since this work does not analyze the EIG model. Since EIG follows an analysis-by-synthesis approach rather than standard classification training, it is unclear whether the claims in this paper generalize to this other model architecture. It is also unclear if the claims will hold for: 1) transformer architectures, 2) the HMAX architecture by Leibo et al. (2017), which has also been proposed as a computational explanation for mirror-symmetric tuning, and, as the authors note in the Discussion, 3) deeper architectures such as ResNet-50, which tend to better align to neural and behavioral data in general. These architectures include different computational motifs such as skip connections and a much smaller proportion of fully-connected layers, which are a major focus of this work.

      Overall, I thus view the paper’s claims as limited to AlexNet- and VGG-like architectures, both of which fall behind state-of-the-art in their alignment to primates in general and also specifically for mirror-symmetric viewpoint tuning.

      We understand your concern regarding the choice of AlexNet and VGG architectures. The decision to focus on these models was driven by the need for a straightforward macroscopic correspondence between the layer structure of the artificial networks and the ventral visual stream. However, acknowledging this potential limitation of generality, we have expanded our analysis to include the EIG model, a transformer architecture, the HMAX model, and deeper convolutional architectures like ResNet-50 and ConvNeXt. Our revised analysis, detailed in Figures S1, S9, and S10-S13, incorporates these additional models and offers a comprehensive evaluation of their brain alignment and mirror-symmetric viewpoint tuning. We found that while the architectures indeed vary in their computational motifs, the emergence of mirror-symmetric viewpoint tuning is not exclusive to AlexNet and VGG. It occurs for every CNN we tested, exactly at the stage where equivariant feature maps are pooled globally. We believe that the new analyses extend the generality of our findings and remove the concern that our claims apply only to older, shallower networks.

      For details, please refer to Point 1 in the ’Essential Revisions’ section.

      Point 2.2: Minor weaknesses

      1) Figure 1A: since the relevance to primate brains is a major motivator of this work, the results from actual neural recordings should be shown and not just schematics. For instance, the mirror symmetry in AL is not as clean as the illustration (compare with Fig. 3 in Yildirim et al. 2020), and in the paper’s current form, this is not easily accessible to the reader.

      Thank you for your feedback regarding the presentation of neural recordings in Figure 1A. We have updated Figure 1A to include actual neural RDMs instead of the previous schematic representations.

      Point 2.3: 2. Figure 4 L832-845: The claims for the effect of training on mirror-symmetric viewpoint tuning are with respect to the training data only, but there are other differences between the models such as the number of epochs (250 for CIFAR-10 training, 200 for all other datasets), the learning rate (2.5 × 10^-4 for CIFAR-10, 10^-4 for all others), the batch size (128 vs 64), etc. I do not expect these choices to make a major difference for your claims, but it would be much cleaner to keep everything but the training dataset consistent. Especially the different test accuracies worry me a bit (from 81% to 92%, and they appear different from the accuracy numbers in figure S4, e.g. for CIFAR-10 and asymSVHN); at the very least, those should be comparable.

      We addressed this point by retraining the models while holding most of the hyperparameters constant. Specifically, we standardized the number of epochs, batch size, and weight decay. The remaining differences are necessitated by the characteristics of the specific training image sets used (natural images versus digits). Please note that we do not directly contrast models trained on CIFAR-10 and SVHN; the controlled comparisons are conducted while holding the SVHN training images constant, and are not confounded by hyperparameter choice.

      Manuscript changes:

      The networks’ weights and biases were initialized randomly using the uniform He initialization [70]. We trained the models for 250 epochs with a batch size of 256 images. The CIFAR-10 network was trained using the stochastic gradient descent (SGD) optimizer with an initial learning rate of 10^-3 and momentum of 0.9; the learning rate was halved every 20 epochs. The SVHN/symSVHN/asymSVHN networks were trained using the Adam optimizer with an initial learning rate of 10^-5, halved every 50 epochs. The hyper-parameters were determined using the validation data. The models reached around 83% test accuracy (CIFAR-10: 81%, SVHN: 89%, symSVHN: 83%, asymSVHN: 80%). Fig. S4 shows the models’ learning curves.
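The step-decay learning-rate schedules described above (halving every 20 epochs for CIFAR-10 and every 50 epochs for the SVHN variants) reduce to a one-line function. This is a sketch only; it assumes 0-indexed epochs and that the first halving takes effect at epoch 20 (or 50), which the text does not state explicitly. The function name is hypothetical.

```python
def step_lr(epoch, base_lr, step_size, gamma=0.5):
    """Learning rate at a 0-indexed epoch under step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs,
    e.g. base_lr=1e-3, step_size=20 for the CIFAR-10 network.
    """
    return base_lr * gamma ** (epoch // step_size)
```

If the models were trained in PyTorch (an assumption), the same schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)` stepped once per epoch.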

      Point 2.4: 3. L681-685: The general statement made in the paper that "deeper models lose their advantage as models of cortical representations" is not supported by the cited limited comparison on a single dataset. There are many potential confounds here with respect to prior work, e.g. the recording modality (fMRI vs electrodes), the stimulus set (62 images vs thousands), the models that were tested (9 vs hundreds), etc.

      We agree that the recording modality and stimulus set may play a critical role in determining model ranking. Since we generalized the analyses to deeper models, we removed this statement from the paper. While we still believe that shallower networks may prove to be better models of the visual cortex, this empirical question is out of the scope of the current manuscript.

      Reviewer 3

      This study aimed to explore the computational mechanisms of view invariance, driven by the observation that in some regions of monkey visual cortex, neurons show comparable responses to (1) a given face and (2) to the same face but horizontally flipped. Here they study this known phenomenon using AlexNet and other shallow neural networks, using an index for mirror symmetric viewpoint tuning based on representational similarity analyses. They find that this tuning is enhanced at fully connected- or global pooling layers (layers which combine spatial information), and that the invariance is prominent for horizontal- but not vertical- or rotational transformations. The study shows that mirror tuning can be learned when a given set of images are flipped horizontally and given the same label, but not if they are flipped and given different labels. They also show that networks learn this tuning by focusing on local features, not global configurations.
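The index mentioned in this summary is computed on representational similarity matrices. As an illustration only, the sketch below assumes one simple formulation, which may differ from the exact definition used in the manuscript: build the view-by-view Pearson similarity matrix of a layer's responses, reverse its columns so that each view is paired with its mirror view, and correlate the off-diagonal entries of the original and reflected matrices. The function name and the requirement that views be ordered symmetrically (e.g., from -90° to +90°) are assumptions of this sketch.

```python
import numpy as np


def mirror_symmetric_viewpoint_index(responses):
    """responses: (n_views, n_units), rows ordered so that view i and
    view n-1-i are mirror images of each other (hypothetical helper)."""
    r = np.corrcoef(responses)              # view-by-view similarity (Pearson r)
    r_mirror = r[:, ::-1]                   # column j now holds its mirror view n-1-j
    off_diag = ~np.eye(len(r), dtype=bool)  # ignore trivial self-similarity
    return float(np.corrcoef(r[off_diag], r_mirror[off_diag])[0, 1])
```

Under this formulation, a representation in which mirror views evoke identical responses scores 1, whereas a view-specific one scores near zero or below. A perfectly view-invariant representation makes the similarity matrix constant, so the index is undefined (NumPy returns NaN); this is consistent with the point that mirror-symmetric viewpoint tuning requires viewpoint tuning in addition to reflection invariance.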

      We are grateful for your thorough reading, reflected by the comprehensive summary of our study and its main findings.

      Point 3.1) I found the study to be a mixed read. Some analyses were fascinating: for example, it was satisfying to see the use of well-controlled datasets to increase or decrease the rate of mirror-symmetry tuning. The insertion and deletion experiments were elegant tests to probe the mechanisms of mirror symmetry, asking if symmetry could arise from (1) global feature configurations (in a holistic sense) vs. (2) local features, with stronger evidence for the latter. These two sets of results were successful and interpretable. They stand in contrast with the first analysis, which relies on observations that do not seem justified. Specifically, Figure 2D shows mirror-symmetry tuning across 11 stages of image processing, from pixel space to fully connected layers. It shows that images from different object categories evoke considerably different tuning index values. The explanation for this result is that some categories, such as "tools," have "bilaterally symmetric structure," but this is not explicitly measured anywhere. "Boats" are described as having "front-back symmetry," more so than flowers. One imagines flowers being extremely symmetric, but perhaps that depends on the metric. What is the metric? At first I thought it was the mirror-symmetric viewpoint tuning index in the image (pixel) space, but this cannot be, as the index for faces and flowers is negative, cars have no symmetry, and boats are positive. To support these descriptions, one must have an independent variable (for object class symmetry) that can be related to the dependent variable (the mirror-symmetric viewpoint tuning index). If it exists, it is not a part of the Results section. This omission undermines other parts of the Results section: "some car models have an approximate front-back symmetry...however, a flower typically does not..." "Some," "typically": how many in the dataset exactly, and how often?

      We thank you for your insightful observation. You are correct that we did not refer to pixel-space symmetry; our descriptions relate to the 3D structure of the objects used in the study.

      Following this comment, we objectively quantified the symmetry planes of the 3D objects. Unfortunately, we do not have direct access to the proprietary 3D meshes of these objects, only to their renders. Therefore, we devised measures that assess the symmetry of the 3D objects through the symmetry they elicit in the different 2D renders.

      This analysis is described in the new supplemental figure S8. We believe that these measurements support the qualitative claims we made in the previous draft.

      Point 3.2) The description of CIFAR-10 as having bilaterally symmetric categories - are all these categories equally symmetric? If not, would such variability matter in terms of these results?

      When considering their 3D structure, all ten CIFAR-10 categories exhibit pronounced left-right symmetry. These categories encompass vertebrate animals (birds, cats, deer, dogs, frogs, horses); they also include man-made vehicles (airplanes, cars, ships, and trucks), which, at least externally, are nearly perfectly symmetric by design. It is important to note that this symmetry pertains to the photographed 3D objects rather than to the images themselves, which could be highly asymmetric. Other axes of symmetry (e.g., front-back) in CIFAR-10 cannot be measured without 3D representations of the objects.

      Point 3.3) These assessments of object category symmetry values are made before experiments are presented, so they are not interpretations of the results, and it would be circular to write it otherwise.

      We have changed the order so that the explanations follow the experimental results. This includes the relevant main text paragraph, as well as the relevant figure—both the order of panels and the phrasing of the figure caption.

      Point 3.4) Overall, my bigger concern is that the framing is misleading or at best incomplete. The manuscript successfully showed that if one introduces left-right symmetry to a dataset, the network will develop population-level representations that are also bilaterally symmetric. But the study does not explain that the model’s architecture and random weight distribution are sufficient for symmetry tuning to emerge, without training, just to a much more limited degree. Baek et al. showed in 2021 that viewpoint-invariant face-selective units and mirror-symmetric units emerge in untrained networks ("Face detection in untrained deep neural networks"; this current manuscript cites this paper but does not mention that mirror symmetry is a feature of the 2021 study). This current study also used untrained networks as controls (Fig. 3), and while they were useful in showing that learning boosts symmetry tuning, the results also clearly show that horizontal-reflection invariance is far from zero. So, the simple learning-driven explanation for the mirror-symmetric viewpoint tuning for faces is wrong: while (1) network training and (2) pooling are mechanisms that charge the development of mirror-symmetric tuning, the lottery ticket hypothesis is enough for its emergence. Faces and numbers are simple patterns, so the overparameterization of networks is enough to randomly create units that are tuned to these shapes and to wire many of them together. How learning shapes this process is an interesting direction, especially now that this current study has outlined its importance.

      We agree with the reviewer that random initialization may result in units that show mirror-symmetric viewpoint tuning for faces in the absence of training. In the revised manuscript, we quantify in detail the occurrence of such units, first reported by Baek et al., and discuss the relation between Baek et al., 2021 and our work. In brief, our analysis affirms that units with mirror-symmetric viewpoint tuning for faces appear even in untrained CNNs, although we believe their rate is lower than previously reported. Regardless of the exact proportion of such units, we believe it is unequivocal that at the population level, mirror-symmetric viewpoint tuning to faces (and other objects with a single plane of symmetry) is strongly training-dependent.

      First, we refer the reviewer to Figure S2, which directly demonstrates the effect of training on population-level mirror-symmetric viewpoint tuning:

      Note the non-mirror-symmetric reflection invariant tuning profile for faces in the untrained network.

      Second, the above-zero horizontal reflection invariance referred to by the reviewer (Figure 3) is distinct from mirror-symmetric viewpoint tuning; the latter requires both reflection invariance and viewpoint tuning. More importantly, it was measured with respect to all of the object categories grouped together; this includes objects with quadrilateral symmetry, which elicit mirror-symmetric viewpoint tuning even in shallow layers and without training. To clarify the confusion that this grouping might have caused, we repeated the measurement of invariance in fc6 separately for each 3D object category:

      Disentangling the contributions of different categories to the reflection-invariance measurements, this analysis underscores the necessity of training for the emergence of mirror-symmetric viewpoint tuning.

      Last, we refer the reviewer to Figure S5, which shows that the symmetry of untrained convolutional filters has a narrow, zero-centered distribution. Indeed, the upper limit of this distribution includes filters with a certain degree of symmetry. This level of symmetry, however, becomes the lower limit of the filters’ symmetry distribution following training.
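Filter symmetry can be quantified in several ways, and the measure behind Fig. S5 is not described in this response. A minimal sketch of one plausible choice, assumed here for illustration, correlates the left half of a filter with the mirror-reversed right half, excluding the central column of odd-width filters so that unstructured random filters give values centered on zero. It returns +1 for a perfectly mirror-symmetric filter and -1 for a mirror-antisymmetric one; the function name is hypothetical.

```python
import numpy as np


def horizontal_symmetry(weights):
    """weights: (in_channels, height, width) convolutional filter.

    Pearson correlation between the left half and the mirror-reversed right
    half; the central column of an odd-width filter is left out.
    """
    w = np.asarray(weights, dtype=float)
    half = w.shape[-1] // 2
    left = w[..., :half]
    right_mirrored = w[..., ::-1][..., :half]   # right half, reversed to align with left
    return float(np.corrcoef(left.ravel(), right_mirrored.ravel())[0, 1])
```

The same function also accepts a grayscale image of shape (1, height, width), giving a simple pixel-space horizontal symmetry score; such a score says nothing direct about the symmetry of the depicted 3D object.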

      Therefore, we believe that training induces a shift in the tuning of the unit population that is qualitatively distinct from, and not explained by, random-lottery-related mirror-symmetric viewpoint tuned units. In the revised manuscript, we clarify the distinction between mirror-symmetric viewpoint tuning at the population level and the existence of individual units showing mirror-symmetric viewpoint tuning before training, as shown by Baek et al.

      Manuscript changes: (Discussion section)

      Our claim that mirror-symmetric viewpoint tuning is learning-dependent may seem to be in conflict with findings by Baek and colleagues [17]. Their work demonstrated that units with mirror-symmetric viewpoint tuning profile can emerge in randomly initialized networks. Reproducing Baek and colleagues’ analysis, we confirmed that such units occur in untrained networks (Fig. S15). However, we also identified that the original criterion for mirror-symmetric viewpoint tuning employed in [17] was satisfied by many units with asymmetric tuning profiles (Figs. S14 and S15). Once we applied a stricter criterion, we observed a more than twofold increase in mirror-symmetric units in the first fully connected layer of a trained network compared to untrained networks of the same architecture (Fig. S16). This finding highlights the critical role of training in the emergence of mirror-symmetric viewpoint tuning in neural networks also at the level of individual units.

      Point 3.5) Finally, it would help to cite other previous demonstrations of equivariance and mirror symmetry in neural networks. Chris Olah, Nick Cammarata, Chelsea Voss, Ludwig Schubert, and Gabriel Goh of OpenAI wrote of this phenomenon in 2020 (Distill journal).

      We added a reference to the study by Olah and colleagues (2020).

      Manuscript changes: (Discussion section)

      (see Olah and colleagues (2020) [60] for an exploration of emergent equivariance using activation maximization).

      Point 3.6) Some other observations that might help:

      I am enthusiastic about the experiments using different datasets to increase or decrease the rate of mirror-symmetry tuning (sets including CIFAR10, SVHN, symSVHN, asymSVHN); it is worth noting, however, that the lack of a ground truth metric for category symmetry is a problem here too. In the asymSVHN dataset, images are flipped and given different labels. If some categories are naturally symmetric after horizontal flips, such as images containing "0" or "8", then changing the label is likely to disturb training. This would explain why the training loss is larger for this condition (Figure S4D).

      We now acknowledge that the inclusion of digits 0 and 8 reduces the accuracy of asymSVHN:

      Manuscript changes: (Figure S4 caption)

      Note that the accuracy of asymSVHN might be negatively affected by the inclusion of relatively symmetric categories such as 0 and 8.

      Our rationale for retaining these digits in the dataset was to manipulate the symmetry of the learned categories (compared to symSVHN) while keeping the images themselves constant.
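To make the contrast between symSVHN and asymSVHN concrete, the sketch below builds both training sets from the same images, so that only the labels of the mirrored copies differ. It rests on assumptions that are not taken from the released code: that each set contains the original images plus their horizontally flipped copies, that asymSVHN assigns each flipped digit a new class index (original label + 10), and the function name itself, which is hypothetical.

```python
import numpy as np


def add_mirrored_copies(images, labels, mode, n_classes=10):
    """images: (N, H, W, C) array; labels: (N,) ints in [0, n_classes).

    mode="sym":  mirrored copies keep their original label.
    mode="asym": mirrored copies get a new label (original + n_classes).
    """
    mirrored = images[:, :, ::-1, :]          # flip along the width axis
    if mode == "sym":
        mirrored_labels = labels
    elif mode == "asym":
        mirrored_labels = labels + n_classes
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return (np.concatenate([images, mirrored]),
            np.concatenate([labels, mirrored_labels]))
```

Because both modes return identical image arrays, any difference between the resulting networks can only come from the labeling, which is the property the rationale above relies on.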

      Regarding the ground-truth symmetry of these datasets: For CIFAR-10, the relevant measure of symmetry pertains to the 3D structure of the photographed objects, which we believe is unequivocally symmetric (see Point 3.2). Note that 2D, pixel-space image symmetry is not directly indicative of symmetry in 3D.

      For SVHN, which consists of two-dimensional characters, the pixel-space symmetry of the images indeed reflects the objects’ symmetry. However, since we are worried that some readers might confuse our claims that relate to the symmetry of objects with claims (we did not make) about symmetry of 2D images, we prefer to avoid reporting measurements of image-space symmetry. We believe that our interpretation of the experiments with SVHN/symSVHN/asymSVHN holds even in the absence of such measurements.

      For your reference, we include here a quantification of image-space horizontal symmetry for each category of CIFAR-10 and SVHN:

      Point 3.7) It is puzzling why greyscale 3D rendered images are used. By using greyscale 3D renders (at least as shown in the figures), the study proceeds as if the units are invariant under color transformations. Unfortunately, this is not true, and using greyscale images impacts the activations of different layers of AlexNet in a way that is not fully defined. Moreover, many units in shallow networks focus on color, and exactly these units could be invariant to other transformations like mirror symmetry, but greyscaling the images makes them inactive.

      We use grayscale 3D rendered images to align with the setting in other studies investigating mirror-symmetric viewpoint tuning, including Freiwald et al. (2010), Leibo et al. (2017), and Yildirim et al. (2020). The choice of using grayscale images in these studies is motivated by the need to dissociate face processing from lower-level, hue-specific responses.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors perform a very thorough, extensive characterization of the impact of an iron-rich diet on multiple phenotypes in a wide range of inbred mouse strains. While a work of this type does not offer mechanistic insights, the value of the study lies not only in its immediate results but also in what it can offer to future researchers as they explore the genetic basis of iron levels and other related phenotypes in rodent studies. The creation of a web resource and the offer from the authors to share all available samples is particularly laudable, and helps to increase the accessibility of the work to other scientists. There is one shortcoming to the work however. To induce iron overload in mice in the main study in this work, mice were placed on an iron-rich diet that differed in its composition from the baseline diet in more than just iron. This could influence some of the phenotypes observed in this study.

      We thank the reviewer for their comments. We hope that this work can provide insight and/or support for a wide variety of future studies. Regarding the diets, yes, in our initial pilot study with 6 strains, the baseline diet was inadvertently not isocaloric with the high iron diet, and it also used a different source of cellulose and contained individual amino acids in ratios found in casein, instead of casein, which was used as the protein source for the high iron diet. The baseline metal composition however was the same. We included data from the pilot study in this manuscript because it provided some important early insight, but made sure to note this caveat since it could potentially affect some results. We added some additional text to the Methods section to help clarify this further. The other subsequently performed studies in this paper were not affected, for example the Control study performed in C57BL/6J has a baseline diet that matches the high iron diet except for iron. For our HMDP genetic study with 114 strains, we did not have a baseline group, so all mice were on the same high iron diet.

      Reviewer #2 (Public Review):

      Here, the authors tried to identify the genes and biological pathways underlying iron overload and its associated pathologies in mice. Several wet lab experiments and measurements alongside many bioinformatic analyses like GWAS, RNA-seq data analysis (DEG), eQTL analysis, TWAS, and gene-set enrichment analysis have been performed. The study design is good enough and the authors tried to validate the results. The data have been submitted (Accession #: GSE230674) but are not public yet.

      Thank you very much for your detailed and thoughtful review and for helping us to improve our manuscript.

      1) The main issue of this manuscript is its length. It's too long, especially the Results section. It's hard for readers to follow the paper. Moreover, you added results about other minerals, mostly copper, which seems too much (considering the fact that this study is about iron). The text doesn't have the required integrity and focus. You should decide where you want to put the focus of this manuscript, and I strongly recommend shortening the manuscript; try to be short and sweet as much as you can.

      Thank you for this helpful suggestion. We have moved or removed excess discussion from the Results section. We moved the specific GWAS results for copper and related red cell traits to the Supplementary text file “Supplementary File 24” so that only iron and triglyceride GWAS results are described in the main text. We kept the discussion of the copper findings in the Discussion section, since we believe the deficiency is an important phenotype induced by the high iron diet that may impact other studies of dietary iron overload. We also believe that the copper and anemia GWAS loci may be of interest to some readers. We considered putting the copper and anemia findings in a separate manuscript, but ultimately decided to include them here, although we do agree it makes the manuscript longer.

      2) Also, the "Methods" section is long, some parts are over-detailed (mostly wet lab procedures) and some parts are not detailed enough. It seems the "Statistical analyses" part doesn't have extra information. I recommend removing the first paragraph and moving some of the information from the second paragraph to the right place in the Method section.

      We reorganized the first part of the statistical analyses section for clarity, and as mentioned further below, added in more detail regarding the GWAS significance thresholds:

      “Analyses were performed using GraphPad Prism (GraphPad Software, La Jolla, CA) and in R. P < 0.05 was considered significant for these tests and for bicor analyses. All reported P values are based on a two-sided hypothesis. The initial number of mice per group in the pilot (N = 6 per group) and Control studies (N = 8 per group) were determined based on previous studies where similar phenotypes were measured. For the HMDP study, permutation and simulation studies were previously used to test the statistical power of the HMDP using parameters including the variance explained by SNPs, genetic background, random errors, and the number of repeated measurements per strain (Bennett, Farber et al. 2010). Appropriate sample sizes to achieve adequate statistical power were determined based on previous analyses. Differences in sample sizes among the HMDP strains were due to differences in strain availability as determined by breeding success and losses. For GWAS, thresholds for significant (P < 4.1e-6; -log10P > 5.387) loci were defined using permutation as previously described (Bennett, Farber et al. 2010). The suggestive locus threshold (P < 4.1e-5; -log10P > 4.387) was based on reducing the significance threshold by one log unit. The cis eQTL GWAS threshold (P < 1e-4) was based on a calculated 1% FDR threshold of 1.73e-3, adjusted to 1e-4 to be slightly more conservative. The trans-eQTL threshold (P < 1e-6) was based on the 4.1e-6 threshold, adjusted to 1e-6 to be more conservative as well.”
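The two -log10 thresholds quoted above follow directly from the P values; the short check below reproduces the arithmetic. The helper name is ours, not part of the analysis pipeline.

```python
import math


def neg_log10(p):
    """Convert a P value to the -log10(P) scale used in Manhattan plots."""
    return -math.log10(p)


significant_p = 4.1e-6              # permutation-derived genome-wide threshold
suggestive_p = significant_p * 10   # one log unit less stringent, i.e. 4.1e-5

print(round(neg_log10(significant_p), 3), round(neg_log10(suggestive_p), 3))
# prints: 5.387 4.387
```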

      We tried moving the missing values notes in the second paragraph to the various method sections in the paper they apply to, but this led to much repetition and was in some cases not clear, so we decided to keep this information together in the statistical analyses section.

      3) Some parts of your discussion section retell the results. Please discuss your results and compare them with previous findings.

      We have revised the discussion to remove several parts that mostly just summarized the results and agree this improves the text. As mentioned above, we moved some discussion that was in the Results section to the Discussion section as well.

      4) Add detail about your GWAS model. As you had repeated samples from each strain, it's good to mention how you considered this. Also, show how you determined the significance threshold.

      Thank you for this suggestion. The GWAS software we used (FaST-LMM) derives a kinship matrix from the genotypes of the individuals considered in the analysis; this kinship matrix is used to correct for population structure including multiple individuals per strain.
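The sketch below makes the kinship correction concrete. It builds one common form of genotype-derived kinship, a realized relationship matrix K = Z Z^T / M computed from column-standardized genotypes. FaST-LMM's exact construction may differ in SNP selection and scaling, and `kinship_matrix` is a hypothetical helper. The point is that mice from the same inbred strain have identical genotype rows. Their off-diagonal kinship therefore equals their self-kinship, so the mixed model treats them as fully related rather than as independent observations.

```python
# Hypothetical helper: a standard realized relationship matrix, used only to
# illustrate the idea; FaST-LMM's own kinship construction may differ.


def kinship_matrix(genotypes):
    """Realized relationship matrix K = Z Z^T / M.

    genotypes: one row per mouse, each a list of allele counts (0, 1 or 2).
    Each SNP column is centred and scaled to unit variance; monomorphic
    SNPs are skipped because they carry no relatedness information.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        var = sum((g - mean) ** 2 for g in col) / n
        if var == 0:
            continue
        sd = var ** 0.5
        columns.append([(g - mean) / sd for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    m = len(columns)
    return [[sum(c[i] * c[k] for c in columns) / m for k in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    strain_a = [0, 2, 2, 0, 2, 0]
    strain_b = [2, 0, 2, 2, 0, 0]
    strain_c = [0, 0, 2, 2, 2, 2]
    # Two mice from strain A, one each from strains B and C.
    mice = [strain_a, strain_a, strain_b, strain_c]
    for row in kinship_matrix(mice):
        print(" ".join(f"{v:6.2f}" for v in row))
```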

      The trait GWAS significance threshold was determined using permutation analysis (Bennett, Farber et al. 2010). The suggestive GWAS threshold was based on reducing the significance threshold by one log unit. The cis eQTL GWAS threshold was based on a calculated 1% FDR threshold of 1.73e-3, adjusted to 1e-4 to be slightly more conservative. The trans-eQTL threshold was based on the 4.1e-6 threshold, adjusted to 1e-6 to be more conservative as well.
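The P-value cutoffs and their -log10 equivalents can be checked directly. The short sketch below only re-derives the numbers stated in the Methods: the 4.1e-6 trait cutoff, the suggestive cutoff one log unit lower, and the stricter cis and trans eQTL cutoffs. It does not reproduce the permutation analysis itself.

```python
import math

# Values taken from the Methods text; this only re-derives the stated numbers.
SIGNIFICANT_P = 4.1e-6                      # permutation-derived trait GWAS cutoff
significant_logp = -math.log10(SIGNIFICANT_P)
suggestive_logp = significant_logp - 1.0    # "one log unit" less stringent
suggestive_p = 10 ** -suggestive_logp

CIS_FDR_1PCT_P = 1.73e-3                    # calculated 1% FDR cutoff for cis-eQTL
CIS_P = 1e-4                                # cutoff actually used (stricter)
TRANS_P = 1e-6                              # stricter than SIGNIFICANT_P

print(f"significant: P < {SIGNIFICANT_P:.1e}  (-log10P > {significant_logp:.3f})")
print(f"suggestive:  P < {suggestive_p:.1e}  (-log10P > {suggestive_logp:.3f})")
print(f"cis-eQTL cutoff stricter than 1% FDR: {CIS_P < CIS_FDR_1PCT_P}")
print(f"trans-eQTL cutoff stricter than trait cutoff: {TRANS_P < SIGNIFICANT_P}")
```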

      To improve the text, we added to the Methods section under the “Genome-wide association analysis and heritability estimation” header the following:

      “Traits were quantile transformed to normalize the distribution and then GWAS was performed using the FaST-LMM program (Lippert, Listgarten et al. 2011), which corrects for population structure (including multiple samples per strain) by using a kinship matrix derived from the genotypes to be analyzed.”
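One common way to quantile transform a trait before GWAS is a rank-based inverse normal transformation, sketched below. The Methods do not state which variant was used. The rank offset ((rank - 0.5) / n) and the tie handling (average ranks) are therefore assumptions, and `inverse_normal_transform` is a hypothetical name.

```python
from statistics import NormalDist

# Hypothetical helper: one common rank-based inverse normal transform; the
# offset and tie handling are assumptions, not taken from the paper.


def inverse_normal_transform(values):
    """Replace each value by the standard-normal quantile of its rank.

    Tied values share their average rank; ranks r = 1..n are mapped to
    probabilities (r - 0.5) / n, which always lie strictly inside (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1    # 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


if __name__ == "__main__":
    # A hypothetical right-skewed trait series.
    skewed = [1.0, 2.0, 3.0, 100.0, 1000.0]
    print([round(z, 3) for z in inverse_normal_transform(skewed)])
```

Because only ranks are used, extreme values such as the 1000.0 above no longer dominate the distribution, which is the usual motivation for this step.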

      We also revised the GWAS threshold text to include more detail:

      “Analyses were performed using GraphPad Prism (GraphPad Software, La Jolla, CA) and in R. P < 0.05 was considered significant for these tests and for bicor analyses. All reported P values are based on a two-sided hypothesis. For GWAS, thresholds for significant (P < 4.1e-6; -log10P > 5.387) loci were defined using permutation as previously described (Bennett, Farber et al. 2010). The suggestive locus threshold (P < 4.1e-5; -log10P > 4.387) was based on reducing the significance threshold by one log unit. The cis eQTL GWAS threshold (P < 1e-4) was based on a calculated 1% FDR threshold of 1.73e-3, adjusted to 1e-4 to be slightly more conservative. The trans-eQTL threshold (P < 1e-6) was based on the 4.1e-6 threshold, adjusted to 1e-6 to be more conservative as well. “

      5) The abstract could be better. It also doesn't have a conclusion.

      We revised the abstract and added in a conclusion:

      “Tissue iron overload is a frequent pathologic finding in multiple disease states including non-alcoholic fatty liver disease (NAFLD), neurodegenerative disorders, cardiomyopathy, diabetes, and some forms of cancer. The role of iron, as a cause or consequence of disease progression and observed phenotypic manifestations, remains controversial. In addition, the impact of genetic variation on iron overload related phenotypes is unclear, and the identification of genetic modifiers is incomplete. Here, we used the Hybrid Mouse Diversity Panel (HMDP), consisting of over 100 genetically distinct mouse strains optimized for genome-wide association studies (GWAS) and systems genetics, to characterize the genetic architecture of dietary iron overload and pathology. Dietary iron overload was induced by feeding male mice (114 strains, 6-7 mice per strain on average) a high iron diet for six weeks, and then tissues were collected at 10-11 weeks of age. Liver metal levels and gene expression were measured by ICP-MS/ICP-AES and RNA-Seq, and lipids were measured by colorimetric assays. FaST-LMM was used for genetic mapping, and Metascape, WGCNA, and Mergeomics were used for pathway, module, and key driver bioinformatics analyses. Across the HMDP, we identified many traits that exhibited high inter-strain variability on the high iron diet, and we found a substantial contribution of genetics to many traits. Mice on the high iron diet accumulated iron in the liver, with a 6.5-fold difference across strain means. The iron loaded diet also led to a spectrum of copper deficiency and anemia, with liver copper levels highly positively correlated with red blood cell count, hemoglobin, and hematocrit. Hepatic steatosis of varying severity was also observed histologically, with 52.5-fold variation in triglyceride levels across the strains. 
Most clinical traits examined had at least one significant GWAS locus, and notably, liver triglyceride and iron mapped most significantly to an overlapping locus on chromosome 7 that has not been previously associated with either trait. By genetically mapping liver mRNA expression, we identified cis- and trans-eQTL for thousands of genes, and we integrated this with trait correlation data to identify candidate causal genes at many trait loci. Using network modeling, significant key drivers for both iron and triglyceride accumulation were found to be involved in cholesterol biosynthesis and oxidative stress management. To make the full data set accessible and useable by others, we have made our data and analyses available on a resource website. Overall, our study confirms and expands upon the contribution of mouse genetic background to dietary iron overload and associated pathology. The numerous GWAS loci, candidate genes, and biological pathways identified here provide a rich public resource to drive further investigation.”

      6) Page 8, lines 4-7: Please remove these lines or move them to the Method section. The last paragraph of the introduction should clearly explain the goal of the study.

      We removed these lines and revised this paragraph for clarity:

      In order to gain further insight into genetic contributors to iron overload and associated pathology, we measured clinical traits and hepatic mRNA expression in 114 mouse strains fed a high iron diet. The mice are from a genetically diverse cohort known as the Hybrid Mouse Diversity Panel (HMDP), a panel optimized for systems genetics studies that has previously been used to examine numerous complex traits, including obesity, diabetes, atherosclerosis, heart failure, carbon tetrachloride induced liver fibrosis, and fatty liver disease (Lusis, Seldin et al. 2016; Seldin, Yang et al. 2019; Tuominen, Fuqua et al. 2021; Cao, Wang et al. 2022).

      7) Page 68, line 13: Explain the abbreviation (RINe) before use. Also, most probably it is RIN (RNA Integrity Number).

      Thank you for pointing this out. We updated the methods text as follows: “All samples had RNA integrity number equivalents (RINe) values greater than 8 as measured on an Agilent 2200 TapeStation (Agilent, Santa Clara, CA).” We also added RINe to the abbreviations section.

      8) The heritability estimates seem high and the 1% difference between broad- and narrow-sense heritability means there is almost no dominant and epistatic genetic variance between alleles affecting the studied trait (which is hard to accept). I recommend considering a within-group (strain) variance (common environmental effect) component in the model to absorb this source of variation in this component, so the genetic variance and consequently the heritability estimates would be more accurate. You also can consider this source of variance in your GWAS model.

      Thank you for bringing up these points. While we try to minimize environmental effects by keeping these mice and samples in as similar environmental and experimental conditions as feasible, some will remain. Thus, in our analyses, we try to factor in remaining environmental variation by using data from multiple mice per strain. The programs we used for GWAS and heritability calculations take into account within-group (strain) variance. We added the following sentence to the Methods section just after mention of the programs used to calculate heritability:

      “Both of the software packages used for heritability estimation account for environmental variance within strains.”
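The sketch below illustrates how within-strain (environmental) variance enters a heritability estimate. It computes broad-sense heritability from replicate mice per strain using classical one-way ANOVA variance components. This textbook estimator is chosen for illustration only and is not necessarily what the two packages referred to above implement; they may, for example, fit mixed models by REML. `broad_sense_h2` is a hypothetical helper. The within-strain mean square estimates the environmental variance, and only the between-strain excess over it is attributed to genetics.

```python
from statistics import fmean

# Hypothetical helper: classical ANOVA-based H^2 for inbred-strain replicates,
# shown for illustration; not necessarily the estimator used in the study.


def broad_sense_h2(strains):
    """H^2 = Vg / (Vg + Ve) from one-way ANOVA on inbred-strain replicates.

    strains: dict mapping strain name -> list of trait values. Needs at least
    two strains and more mice than strains in total. Unequal group sizes use
    the standard n0 coefficient for the expected between-group mean square.
    """
    groups = list(strains.values())
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)                    # Ve (within strain)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    var_genetic = max(0.0, (ms_between - ms_within) / n0)   # Vg, floored at 0
    total = var_genetic + ms_within
    if total == 0:
        raise ValueError("trait has no variance")
    return var_genetic / total


if __name__ == "__main__":
    # Hypothetical trait values for three strains, 3 mice each.
    data = {"A": [10.0, 11.0, 9.5], "B": [4.0, 5.0, 4.5], "C": [7.0, 6.0, 7.5]}
    print(f"H^2 = {broad_sense_h2(data):.2f}")
```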

      We agree that the broad-sense and narrow-sense estimates are close to each other for many traits and that this suggests low levels of dominance and epistasis. A low level of non-additive genetic variance is not uncommon and theoretically predicted for complex traits, as has been reported previously and discussed in the references below:

      Hill WG, Goddard ME, Visscher PM. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 2008 Feb 29;4(2):e1000008. doi: 10.1371/journal.pgen.1000008. PMID: 18454194

      Hivert V, Sidorenko J, Rohart F, Goddard ME, Yang J, Wray NR, Yengo L, Visscher PM. Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals. Am J Hum Genet. 2021 May 6;108(5):786-798. doi: 10.1016/j.ajhg.2021.02.014. Epub 2021 Apr 2. Erratum in: Am J Hum Genet. 2021 May 6;108(5):962. PMID: 33811805

      It has also been argued that many human GWAS studies, as well as studies using populations of mice designed for complex trait analyses, including the HMDP population, inherently lack the statistical power to detect epistasis:

      Buchner DA, Nadeau JH. Contrasting genetic architectures in different mouse reference populations used for studying complex traits. Genome Res. 2015 Jun;25(6):775-91. doi: 10.1101/gr.187450.114. Epub 2015 May 7. PMID: 25953951

      Taking all this together, we would argue that it is not surprising to see little difference between the narrow and broad heritability estimates for many traits in our study. To provide more context to the reader regarding how to interpret our heritability findings, we added the following text to the discussion section, under limitations:

      “Finally, in our study with the HMDP population, estimated broad and narrow sense heritabilities were similar for many traits, suggesting modest non-additive contributions (e.g., dominance and epistasis) to the variance in these traits. While such results are common and theoretically predicted for complex traits (Hill, Goddard et al. 2008; Hivert, Sidorenko et al. 2021), our study population may also not be optimal for detection of these effects (Buchner and Nadeau 2015).”

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study by Lee et al. is a direct follow-up on their previous study that described an evolutionary conservation among placental mammals of two motifs (a transmembrane motif and a juxtamembrane palmitoylation site) in CD4, an antigen co-receptor, and showed their relevance for T-cell antigen signaling. In this study, they describe the contribution of these two motifs to CD4-mediated antigen signaling in the absence of CD4-LCK binding. Their approach was to compare antigen-induced proximal TCR signaling and distal IL-2 production in 58-/- T-cell hybridomas expressing an exogenous truncated version of CD4 that lacks the interaction with LCK (called T1) with T1 versions carrying mutations in either or both of the conserved motifs. They show that T1 CD4 can support signaling to an extent similar to WT CD4, but that mutation of the conserved motifs substantially reduces signaling. The authors conclude that the role of these motifs is independent of LCK binding.

      Strengths:

      The authors convincingly show that T1 CD4, lacking the interaction with LCK, supports the TCR signaling and also that the two studied motifs have a significant contribution to it.

      Weaknesses:

      The study has several weaknesses.

      (1) The whole study is based on a single experimental system, genetically modified 58-/- hybridoma. It is unclear at this moment how the molecular motifs studied here contribute to the signaling in a real T cell. The evolutionary conservation suggests that these motifs are important for T cell biology. However, the LCK-binding motif is conserved as well (perhaps even more) and it plays a very minor role in their model. Without verifying their results in primary cells, the quantitative, and even qualitative, importance of these motifs for T-cell signaling and biology is unclear. Although the authors discuss this issue in the Discussion, it should be noted in all important parts of the manuscript where conclusions are made (abstract, end of introduction, perhaps also in the title) that the results come from hybridoma cells.

      We appreciate the Reviewer’s thoughtful comments and suggestion. We now state in the abstract and introduction that wet-lab experiments were performed with T cell hybridomas. We have also better highlighted work from Killeen and Littman (PMID: 8355789) wherein they showed that C-terminally truncated CD4, which lacked the motifs that mediate CD4-Lck interactions, can drive CD4+ T cell development, proliferation, and T-helper function, because we now provide mechanistic data to help explain those in vivo results. Also, as noted by the reviewer, we discuss how the sum of our data provides justification for the investment in and use of mouse models to interrogate how the functionally important residues/motifs identified and studied here influence T cell biology.

      We will take the opportunity to reiterate here that, while the study is based on a well characterized, albeit single, wet-lab experimental system, the whole study is based on two lines of investigation. The other approach was a systems biology computational approach that analyzes data from real-world experiments in a variety of jawed vertebrate species over evolution. Specifically, we used a computational reconstruction of the evolutionary history of CD4 by performing multiple analyses of CD4 from 99 jawed vertebrates spanning ~435 million years of evolution. This analysis allowed us to identify residues, and networks of evolutionarily coupled residues, that are predicted to be functionally important in vivo. Like other systems biology approaches, this allowed us to look at the larger picture by evaluating data points that have emerged from constant testing and adjustments of CD4 function in vivo through selection on an evolutionary timescale in more jawed vertebrate species, and under more real-world conditions, than can be tested in the laboratory. Our structure-function analysis provided a second, wet-lab reductionist experimental system to cross-validate that the residues identified by our evolutionary analysis are functionally significant. This experimental validation is critical and elevates the relevance of our studies above ad hoc observations. Our work also provides mechanistic insights for why the residues studied here are functionally significant (i.e., key determinants of pMHCII-specific signaling initiation). In short, using both systems allowed us to cross-validate the functional significance of the residues within the GGXXG and (C/F)CV+C motifs studied here by two independent methods.

      (2) Many of the experiments lack the negative control. I believe that two types of negative controls should be included in all experiments. First, hybridoma cells without CD4 (or with a CD4 mutant unable to bind MHCII). Second, a no-peptide control, i.e., activation of the hybridoma cells with APCs not loaded with the cognate peptide. These controls are required to distinguish the basal levels of phosphorylation and CD4-independent antigen-induced phosphorylation, in order to quantify the contribution of the particular motifs to the CD4-mediated support. Although these controls are included in some of the experiments, they are missing in others. The binding mutant appears in some FC results as a horizontal bar (without any error bar/variability), showing that CD4 does not give a huge advantage in these readouts. Why don't the authors show no-peptide controls here as well? Why are the primary FC data (histograms) not shown? Why is neither of these two controls shown for the % of responders plots? Although the IL-2 production is a very robust and convincing readout, the phosphoflow is much less sensitive. It seems that the signaling is elevated only marginally. Without the mentioned controls and the raw data, a precise interpretation is not possible.

      These comments, and those in point #3, concern our flow cytometry-based analysis of early intracellular signaling events where we asked: how do the motifs under investigation impact phosphorylation of CD3ζ, ZAP-70, and PLCγ1 in response to agonist pMHCII? Thank you for pointing out areas of confusion regarding these analyses. We will try to clarify here and have worked to clarify the text.

      Our approach was to mutate constituent residues within the motifs that our evolutionary analysis predicted to be functionally significant, compare the performance of the mutants to that of controls bearing WT motifs, and then infer the function of the motifs based on the differential phenotype of the mutants relative to their controls. In most cases, the C-terminally truncated CD4-T1 mutant served as the appropriate CD4 control backbone against which to evaluate the phenotypes of the GGXXG and (C/F)CV+C motif mutants. This is a conventional structure-function strategy.

      All experiments included APCs expressing null pMHCII (Hb:I-Ek) as negative controls. These were a necessary component of the data analysis, explained further below, which involved subtracting the background signal of control or mutant T cell hybridomas bound to these negative control APCs from the signal of those bound to the agonist pMHCII (MCC:I-Ek). Doing so allowed us to establish a true signal over background for calculating percent responders and signaling intensity. These negative controls served the same purpose as the APCs expressing I-Ek not loaded with cognate peptide requested by the reviewer. It is important to note that we previously published that TCR-CD3-pMHCII interactions reciprocally increase CD4-pMHCII dwell time, and vice versa, such that dwell times of the 5c.c7 TCR and CD4 on the null Hb:I-Ek are both basal in this system relative to antagonist, weak agonist, and agonist pMHCII (PMID 29386113). A recent study using different techniques also concluded that TCR-CD3 and CD4 cooperatively enhance signaling to pMHCII (PMID 36396644). The use of the null pMHCII, Hb:I-Ek, in each experiment thus serves as a well-characterized negative control for both TCR and CD4 engagement in this experimental system with regard to assembly of the TCR-CD3 and CD4 around pMHCII to drive signaling. In our view, it is the most important negative control for interpreting our results, and it is present in each experiment. In Fig 1B and related supplemental figures we compare the C-terminally truncated CD4-T1 mutant to the full-length WT CD4 to evaluate the contributions of the intracellular domains to early signaling events. We found no significant differences for pCD3ζ, pZAP-70, and pPLCγ1 levels, demonstrating that, in our system, CD4 WT and T1 are statistically indistinguishable.

      In Fig 1C we asked: what is the contribution of CD4-pMHCII interactions made by CD4 T1, which lacks the intracellular domain, using our CD4 T1Δbind mutant. Fig 2C and Table 3 show that pCD3ζ levels for T1Δbind were ~54% of T1, meaning that CD4 binding to pMHCII roughly doubles pCD3ζ levels (even without the intracellular domain). We also showed that the percent of responders was not different between the CD4 T1 and T1Δbind mutant in Fig 2C. The impact on ZAP-70 and PLCγ1 is shown in Figure 2—figure supplement 4. These differences, including the magnitude of the decrease, were observed reproducibly (p<0.001) in three independently generated sets of lines. We believe that this analysis satisfies the request by the reviewer for an analysis of the contributions of CD4 binding to pMHCII. We did not include this as a negative control in experiments evaluating the contributions of the GGXXG and (C/F)CV+C motifs to CD4 T1 signaling because the question being asked in those experiments was how the motifs impact signaling in the absence of the intracellular domain (i.e., within the CD4 T1 backbone, making CD4 T1 the proper comparator for the question we were asking). We showed the average normalized intensity for the T1Δbind mutant, relative to T1, as a dotted line in those figures, marking the lower bound of signaling mediated by TCR-CD3 alone, to provide a reference point for readers to evaluate and put into perspective how the mutants we generated impacted the overall contribution of CD4 to these early signaling events. The T1Δbind mutants were not always measured in the same experiment at the same time as the other mutants, because the cell lines were not always made at the same time, so we did not think it appropriate to graph the results together.

      We do not know how to interpret the comment “Although the IL-2 production is a very robust and convincing readout, the phosphoflow is much less sensitive. It seems that the signaling is elevated only marginally.” We will offer our perspective that we do not know how to equate the sensitivity of the phos-flow to that of the IL-2 readout. Because IL-2 is a signaling output, it results from signaling amplification from the membrane to the nucleus. If CD3ζ phosphorylation is the initiating event for a signaling cascade that leads to IL-2 gene transcription and transduction, as is widely believed, our data strongly suggest that the ~2-fold difference in pCD3ζ levels between CD4 T1 and T1Δbind (Fig 2C/Table 3 data) contributes to the difference between no IL-2 output for T1Δbind and IL-2 output by T1 in this experimental system. Because CD4 WT and T1 have significantly different levels of IL-2 output, but show no significant differences in pCD3ζ, pZAP-70, or pPLCγ1 levels, there are likely to be other differences we did not measure via other pathways that intersect at the nucleus. At many levels, biology works on gradients such that small differences can tip a system in one direction or another. The kinetic discrimination model (PMID 8643643), which is thought to be a reasonable description of the relationship between pMHC engagement and signaling outcomes, suggests that very small differences in molecular interactions at the earliest stages of a response can lead to big differences in signaling outcome. We therefore have no basis at this juncture to think that ~2-fold differences in pCD3ζ levels could not account for bigger differences in signaling output such as IL-2.
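The amplification argument can be illustrated with the classic kinetic proofreading formulation (McKeithan, 1995), one standard way of expressing kinetic discrimination. The rate constants and number of steps below are arbitrary illustrative values, not fitted to these data, and `proofreading_fraction` is a hypothetical helper. Suppose an engaged receptor must complete N sequential modification steps, each at rate k_p, before the ligand dissociates at rate k_off. The fraction reaching the signaling-competent state is then (k_p / (k_p + k_off))^N, so a two-fold change in k_off produces a larger-than-two-fold change downstream.

```python
# Hypothetical helper with illustrative (not fitted) parameters, showing how a
# textbook kinetic proofreading scheme amplifies a modest upstream difference.


def proofreading_fraction(k_off, k_p=1.0, n_steps=5):
    """Fraction of engagements that finish n_steps proofreading steps
    (each at rate k_p) before the ligand dissociates (rate k_off)."""
    return (k_p / (k_p + k_off)) ** n_steps


if __name__ == "__main__":
    longer_dwell = proofreading_fraction(k_off=0.5)    # slower dissociation
    shorter_dwell = proofreading_fraction(k_off=1.0)   # 2-fold faster dissociation
    print(f"2-fold change in k_off -> {longer_dwell / shorter_dwell:.1f}-fold "
          f"change in signaling-competent complexes")
```

This models only the ligand-engagement step. It is not a model of the pCD3ζ-to-IL-2 cascade, but it shows why modest early differences need not map one-to-one onto downstream outputs.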

      (3) The processing of the data is not clear. Some of the figures seem to be overprocessed. For instance, I am not sure what "Normalized % responders of pCD3zeta" means (e.g., Fig. 1C and elsewhere). Why do the authors not show the actual % of pCD3zeta+ cells, including the gating strategy? Why do the authors subtract the two histograms in Fig. 2- Fig.S3? It is very unusual.

      We did develop and implement a novel strategy for measuring the impact of our mutations on CD3ζ, ZAP-70, and PLCγ1 phosphorylation. This was explained in more detail in our prior study. The instructions to authors indicated that we should not repeat methods in the current manuscript. However, we will go through the approach here, and address why we did not show primary FC histograms for all experiments from above. First, we think that a brief explanation as to what motivated us to develop our approach will add to a better understanding:

      (1) For experimental and statistical rigor, our goal was to perform both experimental and biological replicates by measuring and comparing the average of at least three independently generated sets of paired WT/T1 control vs. mutant cell lines generated at different times to determine the statistical significance of the difference, if any, between averages of the control and mutant lines.

      (2) Our questions necessitated that we measure signals generated naturally by the cooperative engagement of cognate pMHCII by TCR-CD3 and CD4 on APCs, rather than through αCD3/αCD4 crosslinking.

      (3) We chose to use flow cytometry rather than bulk cell analysis by Western blotting to analyze signaling occurring in cells that were engaged to the agonist APC, in order to avoid dilution of that signal by cells that are not engaged to APCs and not signaling.

      (4) For each experiment, we wanted to subtract background signals from cells bound to APCs expressing a null pMHCII (Hb:I-Ek) from signals generated by cells bound to APCs expressing agonist pMHCII (MCC:I-Ek). Doing so allowed us to identify cells that are signaling (responders) to agonist over null pMHCII. The goal here was to quantitate the level of signaling in an objective manner, with a method that can be applied to all samples uniformly, rather than setting a flow cytometry gate on positive cells (e.g. pCD3ζ), because gating is subjective and can vary from experiment to experiment. To put that another way, as detailed below, we used our subtraction method to identify signaling responders rather than setting a signaling gate on the positive population.

      Regarding gating schemes, controls, and data processing:

      Figure 2—figure supplement 3 of the current study and Figure 6—figure supplement 1 of our prior study are designed to walk the reader through our experimental design, gating, data processing and thinking. Here we will provide a detailed explanation to complement the figure legend as well as the methods provided in our prior manuscript (see pt #4 below).

      We will refer to Figure 2—figure supplement 3 here:

      Panel A. The dot plots show our approach to identifying 5c.c7+ CD4+ 58α-β- T cell hybridomas (y-axis, GFP positive) coupled to M12 cells (x-axis, TagIt Violet) expressing the null pMHCII Hb:I-Ek (left) or agonist pMHCII MCC:I-Ek (right). The gating shows the frequency of GFP+ T cell hybridomas that are bound to TagIt Violet positive APCs (i.e., cell couples). The histogram on the right then shows the staining intensity for pCD3ζ on the x-axis for the 10,000 coupled events collected wherein the APCs express the null pMHCII (filled cyan) or the agonist pMHCII (black line).

      Panel B. The data presented here is the same as in Panel A, but for CD4 T1 cells.

      Panel C. The data presented here walk through how we identify 5c.c7+ CD4+ 58α-β- T cell hybridomas responding (i.e., signaling) to agonist pMHCII, as well as the mean signaling intensity of the responding population, in a gating-independent manner after background subtraction. For the left graph, we exported the data for the histograms shown in Panel A from FlowJo 10 software and plotted them here using Prism 9 as smoothed lines (500 nearest neighbors). The cyan line is therefore a replicate of the flow cytometry histogram shown in Panel A for pCD3ζ intensity from 5c.c7+ CD4+ 58α-β- T cell hybridomas coupled to M12 cells expressing the null pMHCII (Hb:I-Ek), while the black histogram is a replicate of the pCD3ζ intensity for 5c.c7+ CD4+ 58α-β- T cell hybridomas coupled to M12 cells expressing the agonist pMHCII (MCC:I-Ek). Next, to identify the responding population in a gating-independent manner, we used Excel to subtract the pCD3ζ intensity for the null pMHCII (cyan) negative control population on a bin-by-bin basis from the pCD3ζ intensity for the agonist pMHCII (black) responding population. We then transferred the background-subtracted values to Prism 9 for smoothing and plotting (grey line: MCC:I-Ek minus Hb:I-Ek). The middle graph shows the same data processing for the data from Panel B for the CD4 T1 cells. Please note that the background-subtracted grey line has negative values and positive values. The negative values represent intensity bins where signaling in response to agonist pMHCII leads to fewer cells per bin than in the null pMHCII population that is not signaling, while the positive values represent bins of intensity where signaling cells outnumber non-signaling cells. The right graph in this panel shows the populations after background subtraction for intensity bins that had more cells with pCD3ζ signal in the agonist pMHCII population than the null pMHCII population (grey = WT full-length CD4 and blue = T1). In short, the right graph shows identification of those cells that are signaling in response to agonist pMHCII. This approach obviated the need for subjective gating in FlowJo to identify signaling cells (i.e., pCD3ζ positive) and allowed for background subtraction, which could not be done in FlowJo. We used this approach for all analyses of pCD3ζ, pZAP-70, and pPLCγ1 in this study.

      The number of cells in these background-subtracted populations was divided by 10,000 (the number of events collected and analyzed) to calculate the percent of responding 5c.c7+ CD4+ 58α-β- T cell hybridomas, while the mean fluorescence intensity for the cells within these populations represents the signaling intensity.
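The bin-by-bin subtraction and the two summary values described above can be sketched as follows. This illustration rests on stated assumptions and is not the authors' Excel/Prism workflow. It assumes the agonist and null histograms were exported with identical intensity bins, represents each bin by a hypothetical `bin_centers` value, and omits the Prism smoothing step. It also estimates the responder MFI as the count-weighted mean of bin centers over bins where agonist counts exceed null counts. `background_subtracted_response` is a hypothetical name.

```python
# Hypothetical helper sketching the background-subtraction logic; bin
# definitions and the MFI estimate are assumptions, and smoothing is omitted.


def background_subtracted_response(agonist_counts, null_counts, bin_centers,
                                   n_events=10_000):
    """Return (% responders, MFI of responders) from two binned histograms.

    Counts are per intensity bin for cells coupled to agonist (MCC:I-Ek) or
    null (Hb:I-Ek) APCs; only bins where agonist exceeds null are kept.
    """
    if not (len(agonist_counts) == len(null_counts) == len(bin_centers)):
        raise ValueError("histograms must share the same bins")
    excess = [(a - b, c)
              for a, b, c in zip(agonist_counts, null_counts, bin_centers)
              if a > b]
    responders = sum(d for d, _ in excess)
    percent = 100.0 * responders / n_events
    mfi = (sum(d * c for d, c in excess) / responders) if responders else float("nan")
    return percent, mfi


if __name__ == "__main__":
    centers = [100, 1_000, 5_000, 20_000]          # hypothetical bin centres
    null = [4_000, 3_000, 2_000, 1_000]            # 10,000 coupled events
    agonist = [3_000, 2_500, 2_500, 2_000]         # 10,000 coupled events
    pct, mfi = background_subtracted_response(agonist, null, centers)
    print(f"{pct:.1f}% responders, MFI {mfi:.0f}")
```

Bins with fewer agonist than null cells (the negative part of the grey line) are dropped rather than netted against positive bins. This matches the description of keeping only bins where signaling cells outnumber non-signaling cells.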

      Panel D. The graph on the left shows the mean fluorescence intensity (MFI) ± SEM for the positive signaling population from the right graph of panel C. We see in this example, comparing a WT and T1 cell line generated at the same time from the same parental 58α-β- T cell hybridoma population, that the T1 MFI is significantly greater than the WT. These intensity values represent one of the paired intensity values used in the main Fig 2B (left graph), where we show the paired MFI analysis of responding populations from 5 independently generated sets of cell lines. Please note that these single MFI values are directly derived from the flow cytometry histograms after background subtraction. Figure 2B, and similar figures, therefore equate to a distillation of all of the histograms for the populations tested in a manner that we consider easier to digest than either overlaying all histograms or showing multiple panels individually. It also conserves space. This is why we only showed representative flow cytometry histograms, rather than all histograms.

      The graph on the right shows the % responders for the positive signaling population from the right graph of panel C. Specifically, the total number of cells that were determined to be signaling in response to agonist pMHCII was divided by 10,000 (the number of coupled cells collected by flow cytometry) to determine the percent responders. These values represent one of five sets of values used to determine the average normalized percent responders (all normalized to WT). There was no significant difference between these two populations in terms of percent responders.

      Regarding graphing normalized values for the mean MFI for signaling intensity or the percent responders: in our first manuscript, we presented the individual MFI intensity values for matched pairs of cells as well as the actual percent responders per group. The feedback we received from colleagues on this presentation was that it was confusing, distracting, and otherwise hard to digest. It was suggested to us by multiple individuals that the normalized values would be preferable because they are easier and faster to understand. Upon reflection, we agreed with this feedback because the normalized presentation with statistics allows for the two key relevant questions to be quickly evaluated: 1. Are the mutants different from the control? 2. By how much? We have left the raw intensity values as well as the normalized intensity values in the version of record. Given the Reviewer’s comments, we have now graphed the average % responders instead of normalized values in the figures, and left the normalized values in Table 3.
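The normalized values kept in Table 3 can be read as a per-set ratio to the matched control line, averaged over independently generated sets. The sketch below assumes that reading: each mutant value is divided by its paired control, and the ratios are summarized as mean ± SEM across sets. The example numbers and the helper name `normalize_to_paired_control` are hypothetical, and the sketch does not reproduce the statistical tests used in the paper.

```python
from statistics import fmean, stdev

# Hypothetical helper: one plausible reading of "normalized to WT" (per-set
# mutant/control ratio), not a reproduction of the authors' analysis.


def normalize_to_paired_control(pairs):
    """pairs: (control_value, mutant_value) per independently generated set.

    Returns the per-set ratios plus their mean and standard error, so that
    the control is 1.0 by construction.
    """
    ratios = [mutant / control for control, mutant in pairs]
    mean = fmean(ratios)
    sem = stdev(ratios) / len(ratios) ** 0.5 if len(ratios) > 1 else float("nan")
    return ratios, mean, sem


if __name__ == "__main__":
    # Hypothetical % responders for three paired control/mutant line sets.
    sets = [(20.0, 10.0), (30.0, 18.0), (25.0, 12.5)]
    ratios, mean, sem = normalize_to_paired_control(sets)
    print(ratios, f"mean {mean:.2f} +/- {sem:.2f} (SEM)")
```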

      (4) The manuscript lacks Materials and Methods. It only refers to the previous paper, which is very unusual. Although most of the methods are the same, they sBll should be menBoned here. Moreover, some of the mutants presented here were not generated in the previous study, as far as I understand. Perhaps the authors plan to include Materials and Methods during the revision...

      Because we submitted this as a Research Advances article, we followed the journal instructions to reference the Materials and Methods in our prior publication, upon which this work builds, as the methods used are the same. They are detailed in that study. We have now included a copy of the Materials and Methods for the eLife staff to determine how best to link with this manuscript. We have also included the gene sequences for the novel constructs used in this study. Thank you for pointing out the omission.

      (5) Membrane rafts are a very controversial topic. I recommend the authors stick to the more consensual term "detergent resistant microdomains" in all cases/occurrences.

      We agree this is a controversial topic with a variety of viewpoints. Because we are not experts in the field of membrane composition, we turned to the literature to inform our view of how best to refer to these membrane subdomains. In our reading, we found a 2006 meeting report from a Keystone symposium on lipid rafts and cell function authored by Linda Pike (PMID 16645198). At this meeting, a central focus was reaching a consensus on how best to refer to these domains. The consensus term agreed upon by this group was “membrane rafts”. Specifically, we quote from this report published in the Journal of Lipid Research, ‘Together, the discussions permitted the generation of a definition for “lipid rafts” in an ad hoc session on the final day of the meeting. All participants were invited to contribute to this effort, and the work product reflects the consensus of this broad-based group…… First and foremost, the term “lipid raft” was discarded in favor of the term “membrane raft.”’ We chose to use the term “membrane raft” based on this consensus opinion.

      (6) Last, but not least, the mechanistic explanation (beyond the independence of LCK binding) of the role of these motifs is very unclear at the moment.

      We agree with this comment. One goal in making these results, and those in our prior study, available to the field at large is to provide evidence in support of our view that the dominant paradigm that is thought to explain the earliest events in T cell signaling needs re-evaluating. How T cell signaling is initiated in response to pMHCII is clearly more complex than is currently thought. However, our data are inconsistent with the dominant paradigm in which CD4 recruits Lck to TCR-CD3 to phosphorylate ITAMs to initiate signaling.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kuhn and colleagues follows upon a 2022 eLife paper in which they identified residues in CD4 constrained by evolutionary purifying selection in placental mammals and then performed functional analyses of these conserved sequences. They showed that sequences distinct from the CXC "clasp" involved in recruitment of Lck have critical roles in TCR signaling, and these include a glycine-rich motif in the transmembrane (TM) domain and the cys-containing juxtamembrane (JM) motif that undergoes palmitoylation, both of which promote TCR signaling, and a cytoplasmic domain helical motif, also involved in Lck binding, that constrains signaling. Mutations in the transmembrane and juxtamembrane sequences led to reduced proximal signaling and IL-2 production in a hybridoma's response to antigen presentation, despite retention of abundant CD4 association with Lck in the detergent-soluble membrane fraction, presumably mislocalized outside of lipid rafts and distal to the TCR. A major conclusion of that study was that CD4 sequences required for Lck association, including the CXC "clasp" motif, are not as consequential for CD4 co-receptor function in TCR signaling as the conserved TM and JM motifs. However, the experiments did not determine whether the functions of the TM and JM motifs are dependent on the Lck-binding properties of CD4 - the mutations in those motifs could result in free Lck redistributing to associate with CD4 in signaling-incompetent membrane domains or could function independently of CD4-Lck association. The current study addresses this specific question.

      Using the same model system as in the earlier eLife paper (the entire methods section is a citation to the earlier paper), the authors show that truncation of the Lck-binding intracellular domain resulted in a moderate reduction in IL-2 response, as previously shown, but there was no apparent effect on proximal phosphorylation events (CD3z, Lck, ZAP70, PLCg1). They then evaluated a series of TM and JM motif mutations in the context of the truncated Lck-nonbinding molecule, and showed that these had substantially impaired co-receptor function in the IL-2 assay and reduced proximal signaling. The proximal signaling could be observed at high ligand density even with a MHC non-binding mutation in CD4, although there was still impaired IL-2 production. This result additionally illustrates that phosphorylation of the proximal signaling molecules is not sufficient to activate IL-2 expression in the context of antigen presentation.

      Strengths:

      The strength of the paper is the further clear demonstration that the classical model of CD4 co-receptor function (MHCII-binding CD4 bringing Lck to the TCR complex, for phosphorylation of the CD3 chain ITAMs and of the ZAP70 kinase) is not sufficient to explain TCR activation. The data, combined with the earlier eLife paper, further implicate the gly-rich TM sequence and the palmitoylation targets in the JM region as having critical roles in productive co-receptor-dependent TCR activation.

      Weaknesses:

      The major weakness of the paper is the lack of mechanistic insight into how the TM and JM motifs function. The new results are largely incremental in light of the earlier paper from this group as well as other literature, cited by the authors, that implicates "free" Lck, not associated with co-receptors, as having the major role in TCR activation. It is clear that the two motifs are important for CD4 function at low pMHCII ligand density. The proposal that they modulate interactions of TCR complex with cholesterol or other membrane lipids is an interesting one, and it would be worth further exploring by employing approaches that alter membrane lipid composition. The JM sequence presumably dictates localization within the membrane, by way of palmitoylation, which may be critical to regulate avidity of the TCR:CD4 complex for pMHCII or TCR complex allosteric effects that influence the activation threshold. Experiments that explore the basis of the mutant phenotype could substantially enhance the impact of this study.

      We appreciate these thoughtful comments and suggestions. We will restate what we wrote in our preliminary response to the reviews to explain the scope of the current study:

      To address comments about the limited scope of this study and the referencing of the Methods section to our prior study, we would like to note that we submitted the current study via the Research Advance mechanism. Our goal was to build upon the conclusions of our 2022 eLife publication (PMID: 35861317) and address an unresolved question from that study (as nicely summarized by Reviewer #2). In the current manuscript we present data from reductionist experiments that were designed specifically for this purpose and, as noted by the reviewers, we provide answers to the question being asked. We think that the Research Advance mechanism is an ideal opportunity to make these results available to the field given the stated purpose of such articles (for reference: “A Research Advance might use a new technique or a different experimental design to generate results that build upon the conclusions of the original research by, for example, providing new mechanistic insights or extend the pathway under investigation…”). Now that we have provided evidence that CD4 does not recruit Lck to phosphorylate TCR-CD3 ITAMs in our system, nor do the GGXXG and (C/F)CV+C motifs play a role in enabling CD4 to regulate Lck proximity to TCR-CD3, we agree that it is important to form and test alternative hypotheses for how TCR-CD3 signaling is initiated.

    1. Author Response

      Reviewer #1 (Public Review):

      Combining functional MRI with a decoder, the authors probe the neural substrate of the double drift illusion in visual cortex. Their elegant behavioural paradigm keeps the actual retinal position of the stimulus stable while inducing the illusion with a combination of smooth pursuit and visual motion. The results show that the illusory drift path can be decoded from a signal in extrastriate visual area hMT+ but not other visual areas. Importantly, this can be done in the absence of spatial attention to the stimulus location.

      The particular strengths of this study lie in the elegant paradigm and the clear attentional control. The methodology of the decoder is powerful and at the same time straightforward, well explained, and well accepted in the literature. A potential weakness of the study is the lack of simultaneous eye movement recordings in the scanner. Such data could have provided further clarification of the potential underlying neural mechanism and whether differences in eye movements could contribute to the decoding of the visual illusion path. There are some controls that mitigate this.

      We have addressed the Reviewer's comment by repeating the fMRI experiment in a new group of subjects in which we were able to also obtain concurrent, high-quality eye tracking. When we initially conducted the experiment, it was not possible to perform eye tracking in the 7T scanner at NIH. Because of this limitation, we were forced to depend on careful eye tracking in a pre-scan behavioral experiment. But in the ensuing period of time, we have developed a protocol for obtaining high-quality eye tracking with an Eyelink 1000 mounted in the bore of the scanner. Now that we have the ability to collect concurrent eye tracking, we repeated the fMRI experiment and found that our main fMRI result replicated (i.e., it was possible to decode the direction of the illusion from fMRI responses in hMT+). Additionally, the concurrent eye tracking during fMRI enabled us to make four important observations (see new Fig 4):

      First, subjects maintained stable fixation when the target was stationary during fixation and accurately pursued the vertically moving target during the illusion (Fig 4). This analysis confirms that the drifting Gabor remained at a relatively fixed position on the retina during the illusory period.

      Second, there were no differences in microsaccades between any of the conditions. We quantified the direction, amplitude, and frequency of all saccades for each condition. While we did observe small rightward microsaccades, none of the microsaccade characteristics differed between conditions. The rightward microsaccades may have been due to the sustained eccentric leftward fixation. Or, it may have been due to attention to the right visual field stimulus (despite the foveal attention task). Or it may have reflected the known horizontal microsaccade bias. Regardless, we do not believe our fMRI results are related to microsaccades because these small saccades did not differ across condition.

      Finally, we wondered if small, not-easily-quantified ocular deviations could have differed between conditions and somehow resulted in differences in fMRI activity picked up by the decoding analysis. To test for this possibility, we trained a classifier to discriminate condition based on the raw eye traces (just as we did in the main fMRI data analysis). But unlike the fMRI analysis, we found that it was not possible to decode the direction of the illusion from the eye traces themselves.
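      The logic of this control can be sketched as a leave-one-out decoder applied to per-trial eye-trace feature vectors. The response does not name the classifier, so the nearest-centroid rule below is a stand-in, and flattening each trial's trace into a single feature vector is an assumption; the function names are hypothetical. With two illusion directions, an accuracy near 0.5 corresponds to chance.

```python
import math
from statistics import fmean


def centroid(rows):
    """Element-wise mean of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def loo_nearest_centroid_accuracy(trials, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    trials: equal-length feature vectors (e.g. a flattened x/y eye trace per trial).
    labels: condition label for each trial (e.g. illusion direction).
    """
    correct = 0
    for i, (held_out, true_label) in enumerate(zip(trials, labels)):
        # Build one centroid per class from every trial except the held-out one.
        train = [(t, lab) for j, (t, lab) in enumerate(zip(trials, labels)) if j != i]
        classes = sorted({lab for _, lab in train})
        cents = {c: centroid([t for t, lab in train if lab == c]) for c in classes}
        # Assign the held-out trial to the class with the closest centroid.
        predicted = min(classes, key=lambda c: math.dist(held_out, cents[c]))
        correct += int(predicted == true_label)
    return correct / len(trials)
```

      If the eye traces carried condition information, accuracy would rise well above 0.5; identical traces across conditions leave the decoder at chance, which is the pattern the authors report for the eye data.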

      We conclude that the ability to decode the illusion from fMRI responses was not due to differences in eye movements caused by the illusion.

      The authors provide important evidence for a potential neural substrate in the extrastriate visual cortex for encoding the perceived spatial location of a moving stimulus. This significantly extends previous studies that showed relevant spatiotopic signals outside visual cortex. Understanding the neural substrate and the underlying neural mechanisms for encoding perceived spatiotopic location are of broad importance for our understanding of the neural basis of sensory perception.

      We thank the Editor for this positive assessment of our work.

      Reviewer #3 (Public Review):

      The authors studied the neural basis of the double drift illusion, an illusion in which a Gabor drifting both horizontally within an aperture and moving vertically along a path appears to follow a diagonal trajectory, perceptually displaced off its true vertical path in the direction of the horizontal drift. The illusion is strong and its neural basis is intriguing. The authors suggest it can be used to address the locus of spatiotopic processing in the brain. They find that fMRI BOLD activity in hMT+ can be used to decode the illusory drift direction of the stimulus, even under conditions of withdrawn attention. They internally replicate this result and ensure it is not due to local motion. They interpret the finding to indicate that hMT+ contains spatiotopic information. This was a carefully designed and conducted study, and the manuscript writing and figures are clear.

      Despite the care that went into the study design and control experiments, I see some potential interpretational issues, and I am uncertain about the scientific advance. My main questions are about the interpretation of the findings, the possible confound of smooth pursuit eye movements, and the relation to previous studies, including previous fMRI studies of the same illusion. I also would like to see more thorough reporting of behavior.

      Major comments

      1) The authors motivate the study by saying that there have been conflicting results about which brain areas are involved in spatiotopic coding, but they did not give an indication about why there might be conflicting results or why the current study is suitable to address the previous discrepancies. Is this study simply adding another observation to the existing body of literature, or does it go beyond previous studies in a critical theoretical way?

      There have indeed been conflicting results in the literature. One idea that has received some prior support in the literature is that spatiotopic location information can depend on the task. Our experiment tests this idea by measuring cortical responses during an illusion that involves spatiotopic coding. Previous human fMRI studies reporting spatiotopic coding have not really linked cortical activity with the perception of spatiotopic coordinates. Hence, we feel that our results make a unique contribution to the field.

      2) The authors interpret the finding of illusory drift direction encoding in hMT+ to mean that hMT+ is coding the illusory spatial position of the stimulus. But could an alternative explanation be that hMT+ is coding the illusory global motion direction, and not the spatial position per se? If this is a possible account, then the result would still indicate that an illusory motion percept is reflected in hMT+ but it would seem not to answer the question about spatiotopic coding which motivated the paper.

      Here, the Reviewer suggests an interesting alternative explanation—that responses in MT pertain to the direction of global motion rather than stimulus position. However, this alternative possibility would still involve spatiotopic coding. In order for the brain to compute the direction of global motion of a stimulus that is at a fixed retinal position, some spatiotopic computation must occur. So, we do not agree with the Reviewer's suggestion that this alternative explanation undermines the motivation of this study.

      3) It is good that the authors sought to rule out the possibility that smooth pursuit eye movements were driving the decoding results in hMT+, but I'm not sure they have yet convincingly done so. Decoding based on the pursuit selective voxels alone was very nearly significant (p = 0.052), which was not acknowledged in the text of the paper. Furthermore, because voxels that were both pursuit and stimulus selective were excluded from the pursuit selective ROI, decoding performance in that ROI may have been underestimated.

      To clarify, voxels that were identified by both localizers were NOT excluded from either ROI. When we repeated decoding (from Expt 2, Fig 3B) using disjoint voxel selection (i.e., analyzing voxels that only responded in the stim localizer, or only responded in the pursuit localizer, and excluding voxels that responded to both), we obtained qualitatively similar results, although the magnitude of the effects was smaller. This is not surprising given the much smaller number of voxels remaining in each ROI, and hence the disjoint ROIs only proved marginally significant in MT for the stim localizer (p=0.049).
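      The difference between the two voxel-selection schemes can be made concrete with set operations. In this sketch each localizer is represented, as an assumption, by the set of voxel indices it selects; the function names are hypothetical.

```python
def overlapping_rois(stim, pursuit):
    """Default selection: each ROI keeps every voxel its own localizer selected,
    so voxels selected by both localizers appear in both ROIs."""
    return set(stim), set(pursuit)


def disjoint_rois(stim, pursuit):
    """Disjoint selection: voxels selected by both localizers are dropped from both ROIs."""
    stim, pursuit = set(stim), set(pursuit)
    shared = stim & pursuit
    return stim - shared, pursuit - shared
```

      The disjoint ROIs are necessarily subsets of the overlapping ones, which is why fewer voxels remain for decoding in that analysis.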

      4) A previous fMRI study of the double drift illusion (Liu et al. 2019 Current Biology) also found above chance decoding of illusory drift direction in hMT+. The authors mention this study but do not discuss it, so it was unclear to me what the advance is of the current study over that study. The main differences I see are that in the current study, 1) the observer is also moving their eyes so that the double drift stimulus is theoretically stabilized on the retina, and 2) attention is withdrawn from the stimulus. But in both studies, hMT+ contains information about the illusory drift direction even though retinotopic information is the same, so it's not clear to me that the differences between these studies lead to fundamentally different interpretations.

      The results of Liu et al. are not relevant to the reference frame used to encode the stimulus. Because subjects were fixating in Liu et al., the encoding of the illusion could have been in either retinal or spatiotopic coordinates. In our study, the stimulus must have been encoded in spatiotopic coordinates. One interesting feature of Liu et al. is the issue of cross decoding the illusion and actual percept (training the decoder on veridical motion of different angles, and then testing the decoder on data collected during the illusion). One potentially interesting extension of the cross decoding approach would be to train the decoder on a version of the illusion involving fixation (as in Liu et al), but then testing the decoder on the illusion during pursuit. One would expect cross decoding if spatiotopic coordinates are used in both cases. We now discuss this possibility (Discussion: Relationship to a previous study of the double-drift illusion).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study addresses the fundamentally unresolved question of why many thousands of small-effect loci contribute more to the heritability of a trait than the large-effect lead variants. The authors explore resource competition within the transcriptional machinery as one possible explanation with a simple theoretical model, concluding that the effects of resource competition would be too small to explain the heritability effects. The topic and approximation of the problem are very timely and offer an intuitive way to think about polygenic variation, but the analysis of the simple model appears to be incomplete, leaving the main claims only partially supported.

      We thank eLife for recognizing the importance of our work. We hope the revised manuscript addresses the reviewers’ reservations.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study explores whether the extreme polygenicity of common traits can be explained in part by competition among genes for limiting molecular resources (such as RNA polymerases) involved in gene regulation. The authors hypothesise that such competition would cause the expression levels of all genes that utilise the same molecular resource to be correlated and could thus, in principle, partly explain weak trans-regulatory effects and the observation of highly polygenic architectures of gene expression. They study this hypothesis under a very simple model where the same molecule binds to regulatory elements of a large number m of genes, and conclude that this gives rise to trans-regulatory effects that scale as 1/m, and which may thus be negligible for large m.

      We thank the reviewer for their thorough and thoughtful review of our manuscript.

      The main limitation of this study lies in the details of the mathematical analysis, which does not adequately account for various small effects, whose magnitude scales inversely with the number m of genes that compete for the limiting molecular resource. In particular, the fraction of "free" molecule (which is unbound to any of the genes) also scales as 1/m, but is not accounted for in the analysis, making it difficult to assess whether the quantitative conclusions are indeed correct.

      It is explicitly accounted for in the supplement.

      Second, the questions raised in this study are better analysed in the framework of a sensitivity or perturbation analysis, i.e., by asking how changes in expression level or binding affinity at one gene (rather than the total expression level or total binding affinity) affect expression level at other genes.

      In the context of complex traits, where an increase in gene expression can either increase or decrease the trait, we believe the most important quantity of interest is variation in expression and, therefore, trait variation. Nevertheless, our results do show that the relative change in expression due to competition is also small.

      Thus, while the qualitative conclusion that resource competition in itself is unlikely to mediate trans-regulatory effects and explain highly polygenic architectures of gene expression traits probably holds, the mathematical reasoning used to arrive at this conclusion requires more care.

      In my opinion, the potential impact of this kind of analysis rests at least partly on the plausibility of the initial hypothesis, namely whether most molecular resources involved in gene regulation are indeed "limiting resources". This is not obvious, and may require a careful assessment of existing evidence, e.g., what is the concentration of bound vs. unbound molecular species (such as RNA polymerases) in various cell types?

      We intentionally looked at the most extreme case of resource limitation, and we conclude that since extreme resource limitation produces only a small effect, the same would be true of weak resource limitation, when unbound molecules play an important role. We put more emphasis on this point in our revised text.

      Reviewer #1 (Recommendations For The Authors):

      While the main conclusion that resource competition in itself is unlikely to mediate trans effects and explain high levels of polygenicity may well be correct, I am not convinced that the mathematical reasoning presented in support of this conclusion is entirely correct. I will attempt to outline my concerns mainly in the context of section 2, since the arguments in sections 3 and 4 build upon this.

      (a) The key assumption underlying the approximations in equations 3, 4, and 5 is that there is very little free polymerase, in other words [P]/[P]_0 is a small quantity. However, the second and third terms that emerge in equation 7 are also small quantities and (as far as I can see) of the same order as [P]/[P]_0. Thus, one cannot simply use equation 4 or 5 as a starting point to derive eq. 7 and should instead use the exact x_i = (g_i [G])/ (1+g_tot [G]), in order to make sure that all (and not just some) terms that are similar in order of magnitude are accounted for in the analysis.

      The concentration of free polymerase is marked as [P], and we explicitly assume (just before eq. 2) that [P] << [P]_0, with [P]_0 being the overall concentration of polymerase. This is a conservative assumption: we consider extreme resource competition with little free polymerase, and since we see only a small effect in this extreme scenario, we assume it would be a small effect also in less extreme scenarios. We put more emphasis on this point in our revised text.

      More concretely, the difference between the exact x_i = (g_i [G])/ (1+g_tot [G]) and the approximate x_i = (g_i / g_tot) is precisely 1/m (for large m) in the example considered from line 246 onwards. Thus, I suspect that the conclusion that Var[x_i] = (1-1/m)Var[g_i] in that example is just an artefact of starting with eqs. 4 and 5. As a sanity check, it may be useful to actually simulate resource competition explicitly (maybe using a deterministic simulation) under the explicit model [PG_i] = g_i [G] and [P]_0 = [P] + Sum[[PG]_i, i=1,m] without making any further approximations to see if perturbations in g_i actually produce Order [1/m] effects in the variance of x_i for the example considered from line 246 onwards (this would require simulating with a few different m and plotting Var[x_i] vs. m for example).

      The exact equation the reviewer is alluding to describes a scenario of non-extreme resource competition. If g_tot [G]>>1, i.e. if most polymerase is bound to a gene then x_i is equal to g_i/g_tot and this is the scenario we are considering of extreme competition. If g_tot [G]<<1, then x_i=g_i [G] and competition has no effect. While the intermediate case is interesting, we see no reason for the effects to be larger than in the extreme competition case. We have added the results of simulations in the supplement to validate our arguments.
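      The regimes discussed here can be checked with a small Monte Carlo of the kind the reviewer suggests. This is a sketch, not the simulation added to the supplement: it treats the factor [G] in the exact expression x_i = g_i[G]/(1 + g_tot[G]) as a single positive constant c, and assumes, for illustration only, independent gene activities g_i drawn from Normal(1, sd²). To first order, the normalized variance Var[x_1]/(a²·sd²), with a = c/(1 + m·c), equals 1 - 2a + m·a². This tends to 1 when m·c is small (no competition) and to 1 - 1/m when c is large (extreme competition).

```python
import random
import statistics


def linearized_ratio(m, c):
    """First-order prediction of Var[x_1] / (a**2 * sd**2), where a = c / (1 + m*c)."""
    a = c / (1.0 + m * c)
    return 1.0 - 2.0 * a + m * a * a


def simulated_ratio(m, c, sd=0.01, n_trials=20000, seed=1):
    """Monte Carlo estimate of the same ratio using the exact expression
    x_1 = g_1*c / (1 + g_tot*c), with every g_i drawn independently from Normal(1, sd**2)."""
    rng = random.Random(seed)
    x1 = []
    for _ in range(n_trials):
        g = [rng.gauss(1.0, sd) for _ in range(m)]
        g_tot = sum(g)
        x1.append(g[0] * c / (1.0 + g_tot * c))
    a = c / (1.0 + m * c)  # value of x_1 when every g_i sits at its mean of 1
    return statistics.variance(x1) / (a * a * sd * sd)


if __name__ == "__main__":
    for m in (2, 10):
        for c in (0.01, 1e6):  # weak vs. extreme competition (illustrative values)
            print(m, c, round(simulated_ratio(m, c), 3), round(linearized_ratio(m, c), 3))
```

      Dividing by a²·sd² compares each run with the variance x_1 would have if only g_1 varied and g_tot were held at its mean, so the printed ratio isolates the cross-gene (competition) contribution in each regime.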

      Lines 231-239: Because of the concerns highlighted above and questions about the validity of equation 7, I am not convinced that the interpretations given here and also in section 4 are correct.

      (b) Lines 219-230 (including equations 6 and 7): I think to address the question of whether genetic changes in cis-regulatory elements for a given gene have an effect on other genes (under this model of resource competition), it is better to spell out the argument in terms of Var[ dx_i ] rather than Var[x_i], where dx_i is the change in expression level at gene i due to changes at all m genes, dg_i is the change in gene activity due to (genetic) changes in the relevant regulatory elements associated with gene i etc. Var[ dx_i ] can then be expressed as a sum of Var[dg_i], Var[dg_tot] and Cov[d g_i, dg_tot]. However, I suspect that to do this correctly, one should not start with the approximate x_i=g_i/g_tot : see previous comment.

      The variance of the deviation from the mean is mathematically identical to the overall variance, Var[ dx_i ]= Var[ x_i ]. Our analysis is therefore equivalent to the suggested analysis.
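      For the extreme-competition approximation x_i = g_i/g_tot, the decomposition the reviewer proposes can be written out to first order. The derivation below assumes, for illustration only, that the perturbations dg_j are independent with common variance σ², and that every gene sits near the same mean activity μ, so that g_tot ≈ mμ.

```latex
dx_i \approx \frac{dg_i}{g_{\mathrm{tot}}} - \frac{g_i\,dg_{\mathrm{tot}}}{g_{\mathrm{tot}}^{2}},
\qquad dg_{\mathrm{tot}} = \sum_{j=1}^{m} dg_j

\mathrm{Var}[dx_i] \approx \frac{\mathrm{Var}[dg_i]}{g_{\mathrm{tot}}^{2}}
  - \frac{2\,g_i\,\mathrm{Cov}[dg_i, dg_{\mathrm{tot}}]}{g_{\mathrm{tot}}^{3}}
  + \frac{g_i^{2}\,\mathrm{Var}[dg_{\mathrm{tot}}]}{g_{\mathrm{tot}}^{4}}

% With Var[dg_i] = Cov[dg_i, dg_tot] = \sigma^2, Var[dg_tot] = m\sigma^2, g_i = \mu, g_tot = m\mu:
\mathrm{Var}[dx_i] \approx \frac{\sigma^{2}}{(m\mu)^{2}}\left(1 - \frac{2}{m} + \frac{1}{m}\right)
  = \left(1 - \frac{1}{m}\right)\frac{\sigma^{2}}{(m\mu)^{2}}
```

      Rescaling x_i by the mean total mμ gives Var[mμ·x_i] ≈ (1 - 1/m)σ², the same (1 - 1/m) factor discussed above: under these assumptions, competition changes the variance contributed by gene i's own perturbation by only a relative amount of order 1/m.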

      Somewhere in all of this, there is also an implicit assumption that E[dg_i] is zero, i.e., mutations are as likely to increase as to decrease binding affinities so that one needs to only consider Var[dx_i] and not E[dx_i]; this assumption should be spelled out.

      Our results concern the variation around trait means and therefore we have not included a possible mean effect of mutation, which would not affect the results but just shift the mean.

      Some minor comments (mostly related to the introduction and general context):

      • I think it would be worth connecting more with the literature on molecular competition and gene regulation (see e.g., How Molecular Competition Influences Fluxes in Gene Expression Networks, De Vos et al, Plos One 2011). Even though this literature does not frame questions in terms of "polygenicity of traits", these analyses address the same basic questions: to what extent do perturbations in gene expression at one gene affect other genes, or to what extent is there crosstalk between different genes or pathways?

      We have expanded our introduction to refer to De Vos et al, as well as a few other papers we have recently become aware of. (e.g., Jie Lin & Ariel Amir Nature Communications volume 9, Article number: 4496 (2018))

      • Lines 88-89: "supports the network component of the model" is a vague phrase that does not convey much. It would be useful to clarify and make this more precise.

      We have clarified this phrasing in the text.

      • Lines 113-114: In the context of "selective constraint", it may also be worth discussing previous work by one of the authors: "A population genetic interpretation of GWAS findings for human quantitative traits". What implications would stabilizing selection on multiple traits (as opposed to simple purifying selection) have for the distribution of variances across trait loci and the extent to which trait architectures appear to be polygenic?

      While most definitely of great interest to some of the authors, the distribution of variance across loci does not affect our results.

      References: Barton and Etheridge 2018 in line 54 is not the correct reference; it should be Barton et al 2017 (paper with Amandine Veber). Fisher 1919 in line 52 is actually Fisher 1918. The formatting of references in the next paragraph (and in various other places in the paper) is also a bit unusual, with some authors referred to by their full names and others only by their last. I believe that it may be useful to crosscheck references throughout the paper.

      We have crosschecked the references in the paper.

      Line 164: Some word appears to be missing here. Maybe bound -> bound to ?

      Fixed

      Reviewer #2 (Public Review):

      The question the authors pose is very simple and yet very important. Does the fact that many genes compete for Pol II to be transcribed explain why so many trans-eQTL contribute to the heritability of complex traits? That is, if a gene uses up a proportion of Pol II, does that in turn affect the transcriptional output of other genes relevant or even irrelevant for the trait in a way that their effect will be captured in a genome-wide association study? If yes, then the large number of genetic effects associated with variation in complex traits can be explained by such trans-propagating effects on the transcriptional output of many genes.

      This is a very timely question given that we still don't understand how, mechanistically, so many genes can be involved in complex traits variation. Their approach to this question is very simple and it is framed in classic enzyme-substrate equations. The authors show that the trans-propagating effect is too small to explain the ~70% of heritability of complex traits that are associated with trans-effects. Their conclusion relies on the comparison of the order of magnitude of a) the quantifiable transcriptional effects due to Pol II competition, and b) the observed percentage of variance explained by trans effects (data coming from Liu et al 2019, from the same lab).

      The results shown in this manuscript rule out that competition for limited resources in the cell (not restricted to Pol II, but applicable to any other cellular resource like ribosomes, etc) could explain the heritability of complex traits.

      We thank the Reviewer for his resounding support of our paper!

      Reviewer #2 (Recommendations For The Authors):

      The authors rely on simulated data, and although the conclusions hold in a biologically-realistic scenario given the big difference in effect sizes, I wonder if the authors could provide data from the literature (if available) that give the reader a point of reference for the steady state of cells in terms of free/occupied Pol II molecules and/or free/occupied transcription binding sites. This information won't change the conclusion of the manuscript, but it will put it in the context of real biological data.

      We have scoured the literature, but have not found readily available data with which to validate our results (beyond that which is already referenced).

      Reviewer #3 (Public Review):

      Human complex traits including common diseases are highly polygenic (influenced by thousands of loci). This observation is in need of an explanation. The authors of this manuscript propose a model that competition for a single global resource (such as RNA polymerase II) may lead to a highly polygenic architecture of traits. Following an analytical examination, the authors reject their hypothesis. This work is of clear interest to the field. It remains to be seen if the model covers the variety of possible competition models.

      We thank the Reviewer for his assessment, support and comments.

      Reviewer #3 (Recommendations For The Authors):

      This manuscript provides a straightforward and elegant quantitative argument that the competition for the RNA polymerase is not a significant source of trans-eQTLs and, more generally, of genetic variance of complex polygenic phenotypes. This is an unusual manuscript because the authors propose a hypothesis that they confidently reject based on a calculation. This negative result is intuitive. Still, the manuscript is of interest. Progress in understanding the highly polygenic architecture of complex traits is welcome, and the resource competition hypothesis is quite natural. I have three specific comments/concerns listed below.

      (1) The manuscript states that V(x_i)=V(g_i/g_tot). Unless I am missing something, this seems to result from a very strong implicit assumption that all genetic variance is due to variation in the binding of RNA polymerase, while x_i_max is a constant. I would expect that x_i_max may also be genetically variable due to many effects unrelated to Pol II binding (e.g. transcription rate, bursting, presence of R-loops etc.). I guess that the assumption made by the authors is conservative.

      Indeed. We made conservative assumptions throughout, aiming to consider the most extreme scenario in which resource competition may affect trait variation. Our logic is that if resource competition has a small effect even under the most extreme scenario, then it has a small effect in all scenarios. We have put more emphasis on this point in the revised text.
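      One way to see why treating x_i_max as constant is conservative, sketched under the toy form x_i = x_i_max · g_i / g_tot (our assumption, not necessarily the manuscript's exact parameterization) and assuming x_i_max varies independently of g_i/g_tot, is to work on the log scale:

```latex
\log x_i = \log x_i^{\max} + \log\frac{g_i}{g_{\mathrm{tot}}}
\quad\Longrightarrow\quad
V(\log x_i) = V\!\left(\log x_i^{\max}\right) + V\!\left(\log\frac{g_i}{g_{\mathrm{tot}}}\right)
\;\ge\; V\!\left(\log\frac{g_i}{g_{\mathrm{tot}}}\right).
```

      Under these assumptions, any genetic variance in x_i_max adds to the total, so the share of variance attributable to Pol II competition can only shrink relative to the case in which x_i_max is constant.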

      (2) The manuscript focuses on the competition for RNA polymerase but suggests that the lesson learned is highly generalizable. However, it is an example of a single global limiting resource resulting in first-order kinetics. What happens in a realistic scenario of competition for multiple resources associated with transcription and with downstream processes (free ribonucleotides, spliceosome, polyadenylation machinery, ribosome, post-translational modifications)? It is possible that in most cases a single resource is a limiting factor, but an investigation (or even a brief discussion) of this question would support the claim that the results are generalizable.

      We expect competition for multiple resources to result in similarly weak effects. Because the number of such resources is not large, we do not expect this to change our qualitative result. We have added language to that effect in the main text.

      (3) Alternatively, what happens in a scenario of competition for multiple local resources shared by a few genes (co-factors, substrates, chaperones, micro-RNAs, post-translational modification factors such as kinases, degradation factors, scaffolding proteins)? In this case, each gene would compete for resources with a few other genes increasing polygenicity without a global competition with all other genes. Intuitively, a large set of such local competitions may lead to a highly polygenic architecture.

      This is indeed a scenario in which competition may have a large effect, and we mention it in our Discussion: “the conclusions may differ in contexts where a very small number of genes compete for a highly limited resource, such as access to a particular molecular transporter”.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We greatly appreciate your positive assessment and the suggestions by Reviewer #2 on the previous version of our manuscript, all of which were very helpful and have greatly improved our manuscript. We have added a description of the biomineralized columnar architecture in the Results section, added a discussion of the family Eoobolidae, provided more details in the Material and Methods section, and revised other parts of the manuscript based on her/his comments. We are grateful that these comments have enhanced the overall quality of our manuscript. In this letter, we note and discuss the various changes below.

      Reviewer #2:

      (1) Two early Cambrian taxa of linguliform brachiopods are assigned to the family Eoobolidae. The taxa exhibit a columnar shell structure, and the phylogenetic implications of this shell structure in relation to other early Cambrian families are discussed. This is an interesting idea regarding the evolution of shell structure.

      We thank Reviewer 2 very much for her/his very constructive suggestions. All the comments have been thoroughly considered and incorporated into the revised version of the manuscript.

      (2) The early record of shell structures of linguliform brachiopods is incomplete and partly contradictory. Throughout the article, the authors remain silent about contradictory information, to the extent that some information is cited wrongly.

      We agree with Reviewer #2 that the early record of shell structure of linguliform brachiopods is incomplete and, in some instances, potentially contradictory. This situation is well demonstrated in the Introduction and Systematic Palaeontology sections of this paper. This is also why we think the detailed investigation of early linguliform shell architectures is so important, and we hope this work will be useful for further comparative studies on brachiopod biomineralization. We also recognize that the complexity and diversity of linguliform brachiopod architectures (especially their early fossil representatives) require further investigation.

      (3) The article is written under the assumption that all eoobolids have a columnar shell structure. However, the previously claimed columnar structure of Eoobolus incipiens, which has been re-illustrated in the paper, is not convincing and could be interpreted in other ways.

      Yes, the type specimen of Eoobolus is poorly known and its shell structure is unknown, but the ornamentation, pseudointerarea, etc. are well preserved and support a diagnosis based on these characters. In this paper, we focus on the detailed study of Cambrian eoobolids with exquisitely well-preserved columns from Cambrian Series 2, based on a collection of more than 30,000 early Cambrian brachiopod specimens from China and Australia. Given the widespread preservation of columnar shells in early eoobolid specimens, it is likely that Eoobolus has a columnar shell architecture, although the shell structure has not been documented in every single Eoobolus specimen.

      The secondary columns of Eoobolus incipiens are well demonstrated in Fig. 4a. The size of these columns compares well with that of the columns from other Eoobolus species and acrotretide brachiopods, and they are quite different from criss-cross baculae. As we noted in the manuscript, the columnar structure of Eoobolus incipiens is very simple (short columns and fewer columnar units) and can readily be secondarily phosphatised. This is also why it is hard to find the columnar shell architecture in early eoobolids.

      (4) The article needs a proper results section. The Discussion is mainly a review of published data. Other potential results are hidden in this "discussion".

      I would recommend reorganizing the paper to make it a solid presentation of the new taxa and other new results, i.e., to have a solid Results section. The Discussion should address relevant points that relate to the new results rather than reviewing shell structure in general while skipping relevant parts such as the tertiary shell layer.

      We have reorganised the manuscript based on these comments. A general description of the biomineralized columnar architecture has been added to the Results section. As the Supplementary section (main results) includes 7 figures and 3 tables, moving them to the main text would considerably increase the length of the paper. We would therefore prefer to keep the main results in the Supplementary section, in line with the style and format of eLife.

      As the current information on the shell structures of early linguliform brachiopods is unclear, we need to review most of the previous studies on brachiopod shells in the first part of the Discussion section. This will help readers follow our results and conclusions. We therefore think that some of the review content is necessary and helps build the Discussion. The tertiary shell layer, which is not developed in our studied material, is not discussed in the present study.

      (5) In addition, a more elaborate Methods section is needed in which it is explained how the data for shell thicknesses and numbers of laminae was obtained.

      The potential evolutionary patterns that are discussed towards the end (summarized in Fig 6) are interesting but rather unconvincing as the way the data has been obtained has never been clarified. Shell thicknesses and numbers of laminae that built up the shell of several taxa are compared, but at no point is it stated where these measurements were taken. Shell thicknesses vary within a shell, and the presence of the never-mentioned tertiary layer also modifies shell thickness. Hence, the presented data appears random and is not comparable. The obtained evolutionary patterns must be considered as dubious.

      A proper Methods section would be needed that explains how the data presented in Fig. 6 has been obtained. Plus it needs to be convincingly explained that the obtained data is in fact comparable and represents, e.g., equivalent areas of the shell in all involved taxa.

      All this information has been added to the Material and Methods section. We are aware of the marginal accretionary secretion of brachiopod shells. It is well known that the shell is thicker (usually thickest) at the posterior than at the anterior, but we did not note this in the previous manuscript. We measured all the shell data (shell thickness and number of columnar units) from the posterior part of the adult shell for all the studied taxa. The diameter and height of orthogonal columns were measured on available adult specimens from this study and from previously published literature. Consequently, the obtained data are comparable and represent equivalent areas of the shell in all involved taxa.

      In terms of the tertiary shell layer, we did not find any evidence of it in our studied material. The tertiary shell layer is well developed in some recent and Palaeozoic lingulides (Holmer, 1989), but it has not been recognised in the early eoobolides and acrotretides.

      (6) A critical revision of the family Eoobolidae and Lingulellotretidae including a revision of the type species of Eoobolus and Lingulellotreta is needed.

      Concerning the families Eoobolidae and Lingulellotretidae, we are aware of the current problematic situation of these families, and we have added more remarks regarding the Eoobolidae in the Systematic Palaeontology section of the manuscript. However, the revision of the families Eoobolidae and Lingulellotretidae falls outside the scope of this paper. We prefer to leave it for now, as it will be part of an upcoming publication based on more material from China, Australia, Sweden and Estonia that we are currently working on.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides important insights into the degradation of a host tRNA modification enzyme TRMT1 by SARS-CoV-2 protease nsp5. The data convincingly support the main conclusions of the paper. These results will be of interest to virologists studying alterations in tRNA modifications, host methyltransferases, and viral infections.

      Public Reviews:

      Response to Public Reviews

      We appreciate the reviewers’ assessment that our findings are well supported and provide important insight to the field. We also thank the reviewers for their comments and suggestions that have improved the quality of this manuscript. Through the requested edits and experiments, we provide additional results in this revision that further support and extend our original findings.

      We acknowledge the major questions that remain to be addressed, including the biological relevance of TRMT1 cleavage by Nsp5. We note that elucidating the biological role of host protein cleavage by viral proteases has been a long-standing challenge. For example, several endogenous proteins have been identified as cleavage targets of HIV protease, but the functional relevance in many of these cases took decades to resolve or remains unknown to this day. Nonetheless, we have added additional experiments that suggest a possible role for TRMT1 and TRMT1 cleavage in SARS-CoV-2 pathobiology.

      Key additions in the revised manuscript include:

      • Subcellular localization of full-length TRMT1 and TRMT1 fragments (Supplemental Figure 4).

      • Experiments demonstrating that TRMT1 levels are reduced to near background levels in SARS-CoV-2 infected human cells at higher MOI (Figure 6C and D).

      • Results showing that expression of the non-cleavable TRMT1 mutant can promote virion particle infectivity (Figure 8).

      • The addition of an “Ideas and Speculation” subsection that is now being offered to authors by eLife.

      Reviewer #1 (Public Review):

      Zhang et al. investigate the hypothesis that tRNA methyl transferase 1 (TRMT1) is cleaved by NSP5 (nonstructural protein 5 or MPro), the SARS-CoV-2 main protease, during SARS-CoV-2 infection. They provide solid evidence that TRMT1 is a substrate of Nsp5, revealing an Nsp5 target consensus sequence and evidence of TRMT1 cleavage in cells. Their conclusions are exceptionally strong given the co-submission by D'Oliveira et al showing cleavage of TRMT1 in vitro by Nsp5. Separately, the authors convincingly demonstrate widespread downregulation of RNA modifications during CoV-2 infection, including a requirement for TRMT1 in efficient viral replication. This finding is congruent with the authors' previous work defining the impact of TRMT1 and m2,2g on global translation, which is most likely necessary to support infection and virion production. What still remains unclear is the functional relevance of TRMT1 cleavage by Nsp5 during infection. Based on the data provided here, TRMT1 cleavage may be an act by CoV2 to self-limit replication, as the expression of a non-cleavable TRMT1 (versus wild-type TRMT1) supports enhanced viral RNA expression at certain MOIs. Theoretically, TRMT1 cleavage should inactivate the modification activity of TRMT1, which the authors thoroughly and elegantly investigate with rigorous biochemical assays. However, only a minority of TRMT1 undergoes cleavage during infection in this study and thus whether TRMT1 cleavage serves an important functional role during CoV-2 replication will be an important topic for future work. The authors fairly assess their work in this regard. This study pushes forward the idea that control of tRNA expression and functionality is an important and understudied area of host-pathogen interaction.

      We thank the reviewer for the thoughtful assessment of our study.

      We acknowledge that only a minority of TRMT1 undergoes cleavage during infection at the originally tested MOI. However, the ~40% reduction in TRMT1 levels after infection with SARS-CoV-2 is quite substantial considering that TRMT1 in the nucleus and mitochondria is likely to be inaccessible to Nsp5. Moreover, we detected a reduction in m2,2G modification in the infected human cells, providing evidence for a functional impact on TRMT1 activity (Figure 1C).

      To further test the effects of SARS-CoV-2 infection on endogenous TRMT1, we infected 293T cells at a higher MOI and measured TRMT1 levels. At MOI=5, we found that SARS-CoV-2 infection led to near complete depletion of TRMT1 in human cells. This result suggests that SARS-CoV-2 infection could have a profound impact on TRMT1 levels during pathogenesis. We have added this new experiment as Figures 6C and D.

      Weaknesses noted:

      The detection of the N-terminal TRMT1 fragment by western blot is not robust. The polyclonal antibody used to detect TRMT1 in this work cross-reacts with a non-specific protein product. Unfortunately, this obstructs the visualization of the predicted N-terminal TRMT1 fragment. It is unclear how the authors were able to perform densitometry, given the interference of the nonspecific band. Additionally, the replicates in the source data make it clear that the appearance of the N-terminal fragment "wisp" under the non-specific band is not seen in every replicate. Though the disappearance of this wisp with mutant Nsp5 and uncleavable TRMT1 is reassuring, the detection of the N-terminal fragment with the TRMT1 antibody should be assessed critically. Considering this group has strong research interests in TRMT1, I assume that attempts to make other antibodies have proved unfruitful. Additionally, N-terminal tagging of TRMT1 is predicted to disrupt the mitochondrial targeting signal, eliminating the potential for using alternative antibodies to see the N-terminal fragment.

      We agree that the anti-TRMT1 antibody used here is sub-optimal for detection of the N-terminal TRMT1 fragment. However, as noted by the Reviewer, we provided multiple ways of corroborating that the lower-molecular weight band detected in human cells expressing Nsp5 corresponds to the N-terminal TRMT1 fragment. We have shown that the TRMT1 cleavage band is not detectable in human cells expressing GFP or inactive Nsp5. This indicates that the lower molecular weight TRMT1 band only arises when active Nsp5 protease is expressed. Moreover, the TRMT1 cleavage band is not detectable in TRMT1-KO cell lines, demonstrating that the band arises from TRMT1 cleavage rather than a non-specific protein. We have also detected the C-terminal fragment when TRMT1 is overexpressed with Nsp5. In addition, we have shown that mutation of the predicted Nsp5 cleavage site in TRMT1 abolishes the appearance of the N- and C-terminal cleavage fragments.

      Despite the drawbacks of this antibody, we identified gel running conditions that resolve the non-specific band from the N-terminal TRMT1 cleavage fragment. Thus, for quantification, we measured the total signal of both the cleavage band and the non-specific band in all lanes (Figure 3). After normalization to actin, the total signal from the cleavage band and the non-specific band in the control lane from cells expressing GFP was subtracted from the lanes with cells expressing Nsp5 to calculate the signal arising from the cleavage band. We have updated our Materials and Methods to provide details on how we quantified the TRMT1 cleavage band.
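      The background-subtraction scheme described above can be sketched as follows. This is an illustration only: the Lane structure, the lane names, and the intensity values are hypothetical placeholders, not data from the study, and the function is not the authors' analysis script.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lane:
    combined_band: float  # summed signal of cleavage band + non-specific band
    actin: float          # loading-control (actin) signal in the same lane


def cleavage_signal(lanes: Dict[str, Lane], control: str = "GFP") -> Dict[str, float]:
    """Actin-normalize every lane, then subtract the normalized GFP control lane,
    whose combined signal is assumed to come only from the non-specific band."""
    normalized = {name: lane.combined_band / lane.actin for name, lane in lanes.items()}
    background = normalized[control]
    return {name: value - background
            for name, value in normalized.items() if name != control}


if __name__ == "__main__":
    # Hypothetical intensities (arbitrary units), for illustration only.
    lanes = {
        "GFP": Lane(combined_band=100.0, actin=50.0),
        "Nsp5": Lane(combined_band=150.0, actin=50.0),
        "Nsp5-C145A": Lane(combined_band=102.0, actin=51.0),
    }
    print(cleavage_signal(lanes))  # {'Nsp5': 1.0, 'Nsp5-C145A': 0.0}
```

      Subtracting the control lane assumes that, after actin normalization, the non-specific band is equally abundant in every lane; a clearly negative value for a lane would suggest that this assumption does not hold there.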

      While we did test other antibodies against TRMT1, none of them were sensitive enough to detect TRMT1 cleavage fragments at endogenous levels. For example, we included results with an antibody targeting the C-terminus of TRMT1 that could not detect TRMT1 cleavage products at endogenous levels (Supplemental Figure 3). However, the antibody could detect the C-terminal TRMT1 fragments if TRMT1 was overexpressed with Nsp5 (Supplemental Figure 3).

      These technical issues reiterate the fact that the functional significance of TRMT1 cleavage during CoV-2 infection remains unclear. However, this study demonstrates an important finding that the tRNA modification landscape is altered during CoV-2 infection and that TRMT1 is an important host factor supporting CoV-2 replication.

      We agree that the functional relevance of TRMT1 cleavage by Nsp5 remains an open question. Thus, we have added an experiment to test the functional impact of TRMT1 on virion particle production and infectivity (Figure 8). We find that TRMT1 expression is required for optimal virus production, consistent with our observation that TRMT1-deficient cells exhibit reduced viral RNA replication. In addition, we find that expression of the non-cleavable TRMT1 mutant can promote virion particle infectivity (Figure 8, TRMT1-Q530N). These results are consistent with the Reviewer’s conclusion that “TRMT1 cleavage may be an act by CoV-2 to self-limit replication, as the expression of a non-cleavable TRMT1 (versus wild-type TRMT1) supports enhanced viral RNA expression at certain MOIs”. We discuss the potential implications of this result and their functional relevance in the “Ideas and Speculation” subsection.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript titled 'Proteolytic cleavage and inactivation of the TRMT1 tRNA modification enzyme by SARS-CoV-2 main protease' from K. Zhang et al. demonstrates that several RNA modifications are downregulated during SARS-CoV-2 infection including the widespread m2,2G methylation, which potentially contributes to changes in host translation. To understand the molecular basis behind this global hypomodification of RNA during infection, the authors focused on the human methyltransferase TRMT1 that catalyzes the m2,2G modification. They reveal that TRMT1 not only interacts with the main SARS-CoV-2 protease (Nsp5) in human cells but is also cleaved by Nsp5. To establish if TRMT1 cleavage by Nsp5 contributes to the reduction in m2,2G levels, the authors show compelling evidence that the TRMT1 fragments are incapable of methylating the RNA substrates due to loss of RNA binding by the catalytic domain. They further determine that expression of full-length TRMT1 is required for optimal SARS-CoV-2 replication in 293T cells. Nevertheless, the cleavage of TRMT1 was dispensable for SARS-CoV-2 replication hinting at the possibility that TRMT1 could be an off-target or fortuitous substrate of Nsp5. Overall, this study will be of interest to virologists and biologists studying the role of RNA modification and RNA modifying enzymes in viral infection.

      We thank the reviewer for the thoughtful assessment of our study.

      We agree with the possibility that TRMT1 could be a fortuitous substrate of Nsp5 due to the coincidental presence of an Nsp5 cleavage site in TRMT1. As considered in our Discussion section, TRMT1 cleavage could be a collateral effect of SARS-CoV-2 infection. While TRMT1 could be an off-target substrate during viral infection, the subsequent effect on tRNA modification levels could have physiological consequences on downstream processes that affect cellular health. This information could still be useful for understanding the pathophysiological consequences of SARS-CoV-2 infection in tissues.

      Strengths:

      • The authors use a state-of-the-art mass spectrometry approach to quantify RNA modifications in human cells infected with SARS-CoV-2.

      • The authors go to great lengths to demonstrate that SARS-CoV-2 main protease, Nsp5, interacts with and cleaves TRMT1 in cells and perform important controls when needed. They use a series of overexpression constructs with strategically placed tags on both TRMT1 and Nsp5 to strengthen their observations.

      • The use of an inactive Nsp5 mutant (C145A) strongly supports the claim of the authors that Nsp5 is solely responsible for TRMT1 cleavage in cells.

      • Although the direct cleavage was not experimentally determined, the authors convincingly show that TRMT1 Q530N is not cleaved by Nsp5 suggesting that the predicted cleavage site at this position is most likely the bona fide region processed by Nsp5 in cells.

      • To understand the impact of TRMT1 cleavage on its RNA methylation activity, the authors rigorously test four protein constructs for their capacity not only to bind RNA but also to introduce the m2,2G modification. They demonstrate that the fragments resulting from TRMT1 cleavage are inactive and cannot methylate RNA. They further establish that the C-terminal region of TRMT1 (containing a zinc-finger domain) is the main binding site for RNA.

      • While 293T cells are unlikely to be an ideal model system to study SARS-CoV-2 infection, the authors use two cell lines and well-designed rescue experiments to uncover that TRMT1 is required for optimal SARS-CoV-2 replication.

      Weaknesses:

      • Immunoblotting is extensively used to probe for TRMT1 degradation by Nsp5 in this study. Regretfully, the polyclonal antibody used by the authors shows strong non-specific binding to other epitopes. This complicates the data interpretation and quantification since the cleaved TRMT1 band migrates very closely to a main non-specific band detected by the antibody (for instance Fig 3A). While this reviewer is concerned about the cross-contamination during quantification of the N-TRMT1, the loss of this faint cleaved band with the TRMT1 Q530N mutant is reassuring. Nevertheless, the poor behavior of this antibody for TRMT1 detection was already reported and the authors should have taken better precautions or designed a different strategy to circumvent the limitation of this antibody by relying on additional tags.

      We acknowledge the sub-optimal performance of the commercial anti-TRMT1 antibody used in our study. Nevertheless, we have provided multiple lines of evidence indicating that the lower molecular weight band detected using this antibody corresponds to the N-terminal TRMT1 fragment. As noted by the reviewer, we have shown that the lower molecular weight band disappears using the TRMT1-Q530N non-cleavable mutant. The lower molecular weight signal is also absent in TRMT1-KO cell lines expressing Nsp5. Moreover, we have shown that the TRMT1 cleavage band is undetectable in human cells expressing GFP or inactive Nsp5. We have also detected the C-terminal fragment when TRMT1 is over-expressed with Nsp5.

      As discussed in the response to Reviewer 1, we did consider alternative approaches for detecting the N-terminal fragment. We thought about tagging TRMT1 at the N-terminus so that we could detect the cleavage band using a different antibody. However, as noted by Reviewer 1, the tagging of TRMT1 at the N-terminus is likely to disrupt the mitochondrial targeting signal and alter the localization of TRMT1. In addition, we spent considerable time and effort testing alternative antibodies against TRMT1. However, none of them were effective at detecting the N- or C-terminal TRMT1 fragments. For example, we included results with a different antibody targeting the C-terminus of TRMT1 that could not detect TRMT1 cleavage products at endogenous levels but could detect them when TRMT1 was overexpressed with Nsp5 (Supplemental Figure 3).

      • While 293T cells are convenient to use, they are not a well-suited model system to study SARS-CoV-2 infection and replication. Therefore, some of the conclusions from this study might not apply to better-suited cell systems such as Vero E6 cells or might not be observed in patient-infected cells.

      We acknowledge the potential caveats associated with using 293T human embryonic cells as a system for testing SARS-CoV-2 replication. However, we note that 293T cells have been used as a physiological model for discovering and characterizing key aspects of SARS-CoV-2 biology, including viral replication. For example, SARS-CoV-2 has been shown to exhibit significant replication and virion production in 293T cells expressing ACE2 that can be inhibited by known SARS-CoV-2 antiviral compounds:

      https://www.thelancet.com/journals/lanmic/article/PIIS2666-5247(20)300045/fulltext

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9444585/

      https://www.science.org/doi/10.1126/sciadv.add3867

      https://www.pnas.org/doi/full/10.1073/pnas.2025866118

      293T cells have also been demonstrated to exhibit cytopathic effects upon SARS-CoV-2 infection that are dependent upon the ACE2 receptor and mirror that of infected lung cells in culture and in patient tissues:

      https://www.embopress.org/doi/full/10.15252/embj.2020106267

      https://journals.asm.org/doi/full/10.1128/jvi.00002-22

      https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1009715

      https://www.nature.com/articles/s41559-021-01407-1

      In addition to 293T cells, we have demonstrated that infection of MRC5 human pulmonary fibroblast cells with SARS-CoV-2 results in a decrease in TRMT1 levels and m2,2G modification (Figure 1). The reduction in TRMT1 levels in MRC5 cells after SARS-CoV-2 infection is similar to that observed in 293T cells.

      • The reduction of bulk TRMT1 levels is minor during infection of MRC5 cells with SARS-CoV-2 (Fig 1). This does not seem to agree with the more dramatic reduction in m2,2G modification levels. Cellular localization experiments of TRMT1 would help clarify this. While TRMT1 is found in the cytoplasm and nucleus, it is possible that TRMT1 is more dramatically degraded in the cytoplasm due to easier access by Nsp5.

      We agree that the processing of newly synthesized TRMT1 in the cytoplasm is likely to be the main cause of the reduction of TRMT1 levels in the infected MRC5 cells. Thus, we followed the Reviewer’s suggestion to conduct cellular localization experiments of TRMT1 (Supplemental Figure 4). Through these experiments, we show that full-length TRMT1 exhibits localization to the cytoplasm, mitochondria, and nucleus, consistent with prior findings from our group and others. This result supports the conclusion that cytoplasmic TRMT1 is the likely target of Nsp5 cleavage while TRMT1 in the nucleus and mitochondria is inaccessible to Nsp5. We also note that the decrease in cytoplasmic TRMT1 could account for the reduction in m2,2G modifications if the cytoplasmic pool of TRMT1 is responsible for modifying any exported tRNAs that were not modified in the nucleus.

      • In Fig 6, the authors show that TRMT1 is required for optimal SARS-CoV-2 replication. This can be rescued by expressing TRMT1 (Fig 7). Nevertheless, it is unknown if the methylation activity of TRMT1 is required. The authors could have expressed an inactive TRMT1 mutant (by disrupting the SAM binding site) to establish if the RNA modification by TRMT1 is important for SARS-CoV-2 replication or if it is the protein backbone that might contribute to other processes.

      We agree that it would be interesting to test if the methylation activity of TRMT1 is important for optimal SARS-CoV-2 replication. However, the present study focuses on the cleavage of TRMT1 by Nsp5 and the biological effects of this cleavage. Thus, we feel that generating another human cell line lies outside the scope of this paper and would be an excellent idea for future studies. We thank the reviewer for the proposed experiment.

      • Fig 7, the authors used the Q530N variant to rescue SARS-CoV-2 replication in TRMT1 KO cells. This is an important experiment and unexpectedly reveals that TRMT1 cleavage by Nsp5 is not required for viral replication. To strengthen the claim of the authors that TRMT1 is required to promote viral replication and that its cleavage inhibits RNA methylation, the authors could express the TRMT1 N-terminal construct in the TRMT1 KO cells to assess if viral replication is restored or not to similar levels as WT TRMT1. This will further validate the potential biological importance of TRMT1 cleavage by Nsp5.

      Indeed, we did not expect to find that human cells expressing the TRMT1-Q530N variant exhibit higher levels of viral replication. This suggests that cleavage of TRMT1 is inhibitory for viral replication. To provide further support for this observation, we analyzed the viral titer and infectivity of supernatants derived from human cells expressing wildtype TRMT1 or TRMT1-Q530N. Consistent with our finding that TRMT1-Q530N cells contain more viral RNA, the media supernatants from TRMT1-Q530N expressing cells exhibit higher viral titer and infectivity compared to supernatants from TRMT1-KO cells expressing wildtype TRMT1. These results provide additional evidence that TRMT1 is required to promote viral replication. Moreover, these findings suggest that TRMT1 cleavage and reduced protein synthesis could self-limit viral replication. The additional results have been added as Figure 8.

      • Fig 7 shows that the TRMT1 Q530N variant rescues SARS-CoV-2 replication to greater levels than WT TRMT1. The authors should discuss this in greater detail and its possible implications with their proposed statement. For instance, are m2,2G levels higher in Q530N compared to WT? Does Q530N co-elute with Nsp5 or is the interaction disrupted in cells?

      These are excellent points brought up by the Reviewer. As noted above, we have added an additional experiment that tests the functional relevance of TRMT1 expression and cleavage on virion production and infectivity (Figure 8). Moreover, we have followed the Reviewer’s suggestion and discussed the potential implications of these findings in the “Ideas and Speculation” subsection.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors have used biochemical approaches to provide compelling evidence for the cleavage of TRMT1 by SARS-CoV-2 Nsp5 protease. This work is of wide interest to biochemists, cell biologists, and structural biologists in the coronavirus (CoV) field. Furthermore, it substantially advances the understanding of how CoVs interact with host factors during infection and modify cellular metabolism.

      We thank the reviewer for the thoughtful assessment of our study.

      Strengths:

      The authors provide multiple lines of biochemical evidence to report a TRMT1-Nsp5 interaction during SARS-CoV-2 infection. They show that the host enzyme TRMT1 is cleaved at a specific site and that it generates fragments that are incapable of functioning properly. This is an important result because TRMT1 is a critical player in host protein synthesis. This also advances our understanding of virus-host interactions during SARS-CoV-2 infections.

      Weaknesses:

      The major weakness is the lack of mechanistic insights into TRMT1-Nsp5 interactions. The authors have provided commendable biochemical data on proving the TRMT1-Nsp5 interaction but without clear mechanistic insights into when this interaction takes place in the context of SARS-CoV-2 propagation, what are the functional consequences of this interaction on host biology, and does this somehow benefit the infecting virus? I feel that the authors played it a bit safe despite having access to several reagents and an extremely promising research direction.

      We agree that our findings have prompted questions on the mechanistic and functional relevance of TRMT1 cleavage by Nsp5. To begin addressing the latter point, we have included a new experiment testing the impact of TRMT1 expression and cleavage on SARS-CoV-2 virus production and infectivity (Figure 8). We find that TRMT1-deficient cells infected with SARS-CoV-2 exhibit less virion production and the viruses produced are less infectious. Intriguingly, we find that expression of the non-cleavable TRMT1-Q530N variant in TRMT1-KO cells promotes an increase of viral titer as well as infectivity compared to expression of wildtype TRMT1. These results provide evidence for an unexpected role for TRMT1 expression in virus production and the generation of optimally infectious SARS-CoV-2 particles. We discuss the potential implications of this finding in the “Ideas and Speculation” subsection.

      We agree that understanding the timing and effects of Nsp5-TRMT1 interaction will be an important area of investigation moving forward. We would like to include additional time points beyond 24- and 48-hours post-infection. However, we have found that the MRC5-ACE2 cells exhibited increased levels of cell death at 72 and 96-hours post-infection that could confound results (Raymonda et al 2022). Moreover, we would like to know how the reduction in m2,2G modifications affects host tRNA biology and translation. However, these experiments involve large-scale methods such as tRNA sequencing and ribosome profiling, which are outside the scope of our current studies and will be the subject of future efforts.

      We acknowledge the Reviewer’s assessment that we “played it a bit safe” in discussing the functional consequences of Nsp5-TRMT1 interaction. We aimed for a circumspect interpretation of our results and their biological implications, but might have been too cautious in our conclusions. Thus, we have added an “Ideas and Speculation” subsection that discusses possible reasons for how TRMT1 cleavage and interaction with Nsp5 could benefit the virus. We thank the Reviewer for pointing out this issue in our initial manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Having reviewed an earlier version of this manuscript, I appreciated the recent progress made by the authors. I felt the entire body of work is quite solid and the interpretations are clear and not overstated. One piece of data I thought deserved a sentence or two of discussion was the complementation assay with Q530N TRMT1. This experiment suggests the possibility that cleavage of TRMT1 by Nsp5 may be an act to self-limit replication, although this result could also be due to the elevated levels of Q530N TRMT1 expression compared to WT. I still think it is worthy of discussion. Another thing I would recommend is to include the length of infection by SARS-CoV-2 in the figure legends.

      We thank the reviewer for their positive response and constructive comments.

      We have followed the Reviewer’s suggestion to further discuss how cleavage of TRMT1 may act to self-limit replication in the “Ideas and Speculation” subsection. We have also included the length of infection by SARS-CoV-2 in the figure legends.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the comments mentioned in the public review, this reviewer encourages the authors to address the following points:

      • Please clarify the rationale behind choosing 24 and 48 hours post-infection as time points for the analyses (Fig 1). One would expect even lower levels of TRMT1 and RNA modification after 72 and 96 hours post-infection.

      We chose the 24 and 48-hour time points since we have shown that MRC5 cells exhibit elevated accumulation of viral RNA at these time points (Raymonda et al 2022). However, at 72 and 96-hours post-infection, we have found that the MRC5-ACE2 cells exhibited cytopathic effects indicative of cell death that could confound results. We have included the rationale for these time points in our revised manuscript.

      • In Supplementary Figure 3, please add in the legend the meaning of the asterisk symbol.

      The asterisks denote non-specific bands that are still detectable in the TRMT1-KO cell line. We have updated the Figure Legend and thank the Reviewer for catching this omission.

      • In Supplementary Figure 3B, there is an intermediate band in lane 3 with C145A when using the antibody 609-659. The authors should clarify what that band is.

      The intermediate band in lane 3 (and in lane 6) of Supplemental Figure 3B represents non-specific detection of the Nsp5-C145A variant that exhibits extremely high levels of expression since it cannot self-cleave. We have clarified the identity of the band in the figure legend.

      Reviewer #3 (Recommendations For The Authors):

      I have only minor comments:

      Although the authors have done a commendable job of providing compelling biochemical evidence of TRMT1 cleavage by Nsp5, it is not clear how this enhances viral infection. The discussion presents the experimental findings and prior publications as a series of correlated observations without clearly specifying the mechanistic benefits of TRMT1 hijacking towards CoV propagation, or even proposing a mechanistic hypothesis to this end.

      We agree with the Reviewer that providing a mechanistic hypothesis on how TRMT1 cleavage impacts virus biology will help inform future studies. We have followed the Reviewer’s suggestion and discuss potential mechanisms in the “Ideas and Speculation” subsection.

      How do these experiments inform us about the cell biology of SARS-CoV-2 infections? Does Nsp5-mediated degradation start early in infection? Is the loss of TRMT1 sustained over the course of the infection? Do Nsp5 concentrations or relative amounts correlate with TRMT1 loss during this period? For instance, is there only a modest increase in Nsp5 levels from 24h to 48h? I would suggest adding a few more data points than just 24h and 48h in the cell culture experiments. As the manuscript stands right now, it will be a bit difficult for readers to appreciate the relevance of this study in its present form.

      These are excellent questions raised by the Reviewer. The temporal effects of SARS-CoV-2 infection on TRMT1 levels will be an important area to dissect moving forward.

      As mentioned above, we would like to include additional time points beyond 24- and 48-hours post-infection. However, at 72 and 96-hours post-infection, we have found that the MRC5-ACE2 cells exhibited increased levels of cell death that could confound results.

      However, we do observe a correlation between the level of infection and the amount of TRMT1 depletion. In our newly added Figure 6C and 6D, we show that increasing the MOI leads to a concomitant increase in N-protein production that correlates with the amount of TRMT1 depletion. Moreover, we have added additional experiments to explore the biological relevance of our findings in terms of virion particle production and infectivity. We thank the reviewer for these insightful questions that have improved our manuscript and provide a foundation for future studies.

      Related to this previous comment: how do the authors rationalize their inference that TRMT1 is essential for SARS-CoV-2 infection, yet it is cleaved during the infection? What seems to be the advantage of this seemingly contradictory but possibly quite intriguing inference?

      We acknowledge the paradox that TRMT1 seems to be essential for SARS-CoV-2 replication but is cleaved during the infection. We propose several hypotheses to explain these findings:

      Hypothesis 1: TRMT1 could be a bystander target. The loss of TRMT1 expression leads to a decrease in modifications that impacts translation. This decrease in translation capacity of the infected cells would lead to decreased production of viral proteins and reduced viral replication. This could explain why TRMT1-deficient cells exhibit less virus production. This could also account for why the TRMT1-Q530N mutant might produce more virus. In this case, the cleavage of TRMT1 and biological effects on viral replication and virion production are coincidental. However, even if TRMT1 cleavage and inactivation does not impact viral replication or production, it would still be important to know the cellular impacts that contribute to disease pathogenesis.

      Hypothesis 2: The benefits of shutting down host responses that depend on protein synthesis could outweigh the slight reduction in viral replication caused by host translation inhibition. The decrease in TRMT1-catalyzed tRNA modification caused by Nsp5 cleavage could severely inhibit host translation, while viral translation could still be maintained through a tRNA pool optimized for viral translation, albeit at a slightly lower rate than if TRMT1 were not cleaved.

      Hypothesis 3: The Nsp5-TRMT1 interaction could allow the virus to bind tRNAs that are packaged in viral particles, as suggested previously (Pena et al., 2022). The finding that expression of the non-cleavable TRMT1-Q530N variant enhances viral replication and infectivity supports the hypothesis that TRMT1 could facilitate tRNA uptake into viral particles. The packaging of specific tRNAs in viral particles could enhance viral translation in the subsequent round of infection, thereby enhancing infectivity and perhaps facilitating the species jump of SARS-CoV-2 towards hosts with incompatible codon bias.

      We have included these hypotheses in the new “Ideas and Speculation” subsection.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      After revision, I only have a few remaining remarks:

      l. 180 The authors write: We were able to process all 4 datasets with minimal adjustments to the default parameter values (Methods).

      But they still don't indicate how they vary parameters and how important this is for success or how this affects absolute measurements such as average cell length. Could they give a table of parameter values and some sense of sensitivity for any future user?

      We thank the reviewer for the suggestion. We see how this information is valuable for the user. We've added a table with the parameter values used for processing each dataset in the supplemental information, along with the default parameters for reference (lines 476 - 496). In that section we also discuss which parameters may affect the output measurements of cell size, etc.

      l. 192-193 They write 'The software performed well on BACMMAN, molyso and MoMa datasets.' Naming the datasets after the analysis methods used in the original papers could be confusing, as they analyse data with MM3. Not sure how best to resolve this, maybe using first author names instead.

      We thank the reviewer for pointing this out. We now refer to them with the first author names.

      Related to the request of ref. #1 for a video tutorial, the video currently displayed under the github readme.md section 'Usage guide' is not functional. And the video at the top of the same page is very short with minimal information.

      We thank the reviewer for letting us know the tutorial video was not functional. We’ve tested it on Linux, Mac and Windows machines on both Firefox and Chrome. We were not able to reproduce any problems for the video - could they let us know what browser / OS was used and any other specifics? If it’s easier, we can be reached through the Github page as well.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers:

      We would like to thank all the reviewers and the editors for their thorough and helpful feedback on our work. Before addressing specific questions and points, we would like to make a general comment on a mechanistic aspect of this study. The reviewers correctly pointed out that our study does not reveal the molecular mechanism that leads to centromeric histone depletion specifically from meiotic chromosomes. Identifying this mechanism requires a deep and thorough understanding of how centromeric histones are loaded and centromeres are established each cell cycle, and how they are maintained over time in different cell types. To our knowledge, these mechanisms have not been described in plants. To add a further layer of complexity, it appears that the mechanisms governing CENH3 maintenance may be (partially) different in plant mitotic and meiotic cells, and the mechanistic basis of this difference is unknown. Obviously, these are interesting but also complex questions and their resolution will require considerable resources and effort, which we believe is beyond the scope of this manuscript. Nevertheless, our finding that CENH3 maintenance and centromere function in meiotic cells are sensitive to heat stress is an unexpected discovery with profound implications for plant adaptation, which provides a strong incentive for further exploration of centromere maintenance mechanisms in plants.

      Furthermore, we would like to apologize to the reviewers for the poor quality of the figures in the original submission. Their resolution was reduced by conversion to PDF format during submission.

      eLife assessment

      This important study reports how heat stress affects centromere integrity by compromising the loading of the centromere protein CENH3 and by prolonging the spindle assembly checkpoint during male meiosis in Arabidopsis thaliana. The evidence supporting the claims by live cell imaging is convincing, although deeper mechanistic insight is lacking, making the study overall somewhat preliminary in nature. This work will be of interest to a broad audience of biologists working on how chromatin states are affected by stress conditions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Khaitova and co-workers present here an analysis of centromere composition and function during elevated temperatures in the plant Arabidopsis. The work relates to the ongoing climate change during which spikes in high temperatures will be found. Hence, the paper addresses a timely subject.

      The authors start by confirming earlier studies that high temperatures reduce the fertility of Arabidopsis plants. Interestingly, a hypomorphic mutant of the centromeric histone variant CENH3 (CENP-A), which was previously described by the authors, sensitizes plants to heat and results in a drop in viable pollen and silique length. The drop in fertility coincides with the formation of micronuclei in meiosis and an extension of meiotic progression as revealed by live cell imaging. Based on this finding, the authors then show that at high temperatures, the fluorescence intensity of a YFP:CENH3 declines in meiosis but remarkably not in the surrounding cells (tapetum cells). In addition, the amount of BMF1 (a Bub1 homolog and part of the spindle assembly checkpoint) also appears to decline on the kinetochores of meiocytes as judged by BMF1 reporter line. However, whether this is dependent on a decline of CENH3 or represents a separate pathway is not clear.

      We provide new data in Figure S6 showing that BMF1 loading on centromeres is substantially reduced in cenh3-4 mutants. Thus, efficient tethering of BMF1 to centromeres depends on CENH3.

      Finally, the authors measure the duration of the spindle checkpoint and find that it is extended under high temperatures from which they conclude that the attachment of spindle fibers to kinetochores is compromised under heat.

      Strengths:

      This is an interesting and important paper as it links centromere organization/function to heat stress in plants. A major conclusion of the authors is that weakened centromeres, presumably by heat, may be less effective in establishing productive interactions with spindle microtubules.

      Weaknesses:

      The paper does not explain the molecular reason why CENH3 levels in meiocytes are reduced or why the attachment of spindle fibers to kinetochores is less efficient at high versus low temperatures.

      While we cannot explain the molecular mechanism underlying temperature-dependent depletion of CENH3 in meiocytes, the less efficient attachment of microtubules to the kinetochores at higher temperatures is likely caused by reduced levels of CENH3, which result in smaller centromeres that are less effective in establishing productive microtubule-kinetochore attachments. Here (new Figure S6) and in our previous study (Capitao et al. 2021), we have shown that the amount of centromere/kinetochore proteins is reduced at centromeres in cenh3-4 mutants, and that these plants exhibit prolonged SAC and slower chromosome biorientation.

      Reviewer #2 (Public Review):

      Summary:

      This work investigates how increased temperature affects pollen production and fertility of Arabidopsis thaliana plants grown at selected temperature conditions ranging from 16 °C to 30 °C. They report that pollen production and fertility decline with increasing temperature. To identify the cause of reduced pollen production and fertility, they resort to live cell imaging of male meiotic cells and find that the duration of meiosis increases with temperature. They also show that pollen sterility is associated with the increased presence of micronuclei, likely originating from heat stress-induced impaired meiotic chromosome segregation. They correlate abnormal meiosis to weakened centromeres caused by meiosis-specific defective loading of the centromere-specific histone H3 variant (CENH3) to the meiotic centromeres. The same holds for the kinetochore-associated spindle assembly checkpoint (SAC) protein BMF1. Intriguingly, they observe a reverse trend of strong CENH3 presence in the somatic cells of the tapetum, in contrast to reduced loading of CENH3 in male meiocytes with increasing temperature. In contrast to CENH3 and BMF1, the SAC protein BMF3 persists for longer periods than in the WT control, based on which the authors conclude that heat stress prolongs the duration of SAC at metaphase I, which in turn extends the time of chromosome biorientation during meiosis I. The study provides preliminary insights into the processes that affect plant reproduction with increasing temperatures, which may be relevant to develop climate-resilient cultivars.

      Strengths:

      The authors have mastered live cell imaging of male meiocytes, a technically demanding technique, and have successfully employed it to examine the time course of meiosis in Arabidopsis thaliana plants exposed to different temperature conditions. In addition, they monitor the loading dynamics and residence time of fluorescently tagged centromere/kinetochore proteins and spindle assembly checkpoint proteins to precisely measure how long each protein is present and thus study its dynamics and function in male meiosis.

      Weaknesses:

      Here the authors use only one representative centromere protein CENH3, one kinetochore-associated SAC protein BMF1, and the SAC protein BMF3 to conclude that heat stress impairs centromere function and prolongs SAC with increased temperatures. Centromere and its associated protein complex the kinetochores and the SAC contain a multitude of proteins, some of which are well characterized in Arabidopsis thaliana. Hence the authors could have used additional such tagged proteins to further strengthen their claim.

      Indeed, several other proteins have recently been characterized as centromere/kinetochore components and could have been included in the study to further validate the results presented. To strengthen our argument, we have added new experimental data (Figure S4) showing temperature-induced depletion of CENH3 in wild-type plants by immunocytology. Thus, we convincingly show that temperature stress reduces the amount of CENH3. This is likely to affect the loading of most kinetochore and centromeric proteins. Here (new Figure S6) and in our previous study (Capitao et al., 2021), we have shown that genetic depletion of CENH3 in cenh3-4 mutants results in reduced loading of CENPC, MIS12 and BMF1 at mitotic centromeres and reduced loading of BMF3 and BMF1 at meiotic centromeres. We also attempted to assess the levels of CENPC and MIS12 on meiotic chromosomes by immunocytology, but our antibodies, which work on mitotic spreads, did not stain meiotic chromosomes.

      Though the results presented here are interesting and solid, the study lacks a deeper mechanistic understanding of what causes the defective loading of CenH3 to the centromeres, and why the SAC protein BMF3 persists only at meiotic centromeres to prolong the spindle assembly checkpoint. Also, this observation should be interpreted in light of the fact that SAC is not that robust in plants as several null mutants of plant SAC components are known to grow as healthy as wild-type plants at normal growth conditions without any vegetative and reproductive defects.

      Thank you for raising this point. We are of the opinion that the SAC operates and is important in plants. We have added a citation to a preprint from the Schnittger lab (Lampou et al., 2023, BioRxiv) that was published while this manuscript was under review. We think this is the most comprehensive analysis of the plant SAC to date, clearly showing that the SAC delays progression to anaphase in the presence of spindle inhibitors, although adaptation eventually occurs and the cell cycle progresses. This is very similar to the situation in animals, which also undergo spindle adaptation in similar situations. The difference between plants and animals may be due to subsequent events, where plants are better able to tolerate genome instability and resume cell division in the presence of abnormal chromosome numbers. Robustness and redundancy may be another reason why plant mutants deficient in SAC do not show obvious growth retardation.

      One of the immediate responses to heat stress is the production of heat shock proteins (HSPs), which act as molecular chaperones to safeguard the proteome. It will be interesting to see if the expression levels of known HSPs can be correlated with their role in stabilizing the structure of SAC proteins like BMF1 to prolong its presence at the meiotic kinetochores.

      Indeed, the heat stress response is likely to be involved in this process. We sought to investigate the role of this pathway by analyzing Arabidopsis mutants deficient in HEAT-SHOCK FACTOR BINDING PROTEIN (HSBP), which acts as a negative regulator of the heat shock response. This experiment was prompted by the observation that hsbp mutants have reduced fertility. We expected that an unrestricted heat stress response might affect meiosis and pollen formation. However, our initial experiments did not show altered pollen viability in response to heat stress in hsbp plants and we did not pursue this line of research further.

      Reviewer #3 (Public Review):

      Summary:

      Khaitova et al. report the formation of micronuclei during Arabidopsis meiosis under elevated temperatures. Micronuclei form when chromosomes are not correctly collected to the cellular poles in dividing cells. This happens when whole chromosomes or fragments are not properly attached to the kinetochore microtubules. The incidence of micronuclei formation is shown to increase at elevated temperatures in wild-type and more so in the weak centromere histone mutant cenH3-4. The number of micronuclei formed at high temperatures in the recombination mutant spo11 is like that in wild-type, indicating that the increased sensitivity of cenh3-4 is not related to the putative role of cenh3 in recombination. The abundance of CENH3-GFP at the centromere declines with higher temperature and correlates with a decline in spindle assembly checkpoint factor BMF1-GFP at the centromeres. The reduction in CENH3-GFP under heat is observed in meiocytes whereas CENH3-GFP abundance increases in the tapetum, suggesting there is a differential regulation of centromere loading in these two cell types. These observations are in line with previous reports on haploidization mutants and their hypersensitivity to heat stress.

      Strengths:

      This paper is an important contribution to our insights into the impact of heat stress on sexual reproduction in plants.

      Weaknesses:

      While it is highly significant, I struggled to interpret the results because of the poor quality of the figures and the videos.

      We apologize for the poor quality of the figures. The figure resolution was drastically reduced during the conversion of the manuscript to PDF on the publisher's website.

      Reviewer #1 (Recommendations For The Authors):

      To complete the presented analysis, it would be great to analyze the signal strength of the here-presented BMF3 reporter at high temps, see below for further reasoning.

      Quantification of the BMF3 signal is difficult - it is only transiently associated with kinetochores and its level changes over time. Nevertheless, analysis of our movies taken under the same microscope settings indicates that the amount of BMF3 decreases with increasing temperature. This is illustrated in the new Figure S6C.

      Conversely, how is the BMF1 and BMF3 signal strength in cenh3-4 mutants?

      We performed an analysis of BMF1 and BMF3 signal in cenh3-4 mutants and observed a reduced level of signal from both proteins (Figure S6). In the case of BMF1, no signal was detectable in either somatic or meiotic cells.

      How do the authors explain the reduction in BMF1 signal at 26 and 30 °C versus the extension of the duration of the SAC as measured by the persistence of a BMF3 signal (line 192: "...reduces the amount of CENH3 and the kinetochore protein BMF1 on meiotic centromeres, potentially affecting their functionality..." versus line 213: "...We observed that while the BMF3:GFP signal persisted, on average, for about 22.7 min at 21 and 26 °C, its appearance was prolonged to 40.5 min at 30 °C...")? Is the BMF3 signal also reduced at high temperatures (see question above)?

      This is a very interesting point. While we see reduced levels of both proteins under heat stress or in cenh3-4 plants, the effect on BMF1 is much more pronounced, and BMF1 becomes undetectable under these conditions. This contrasts with BMF3, which appears to be reduced but is still clearly visible. These data suggest that BMF1 is more sensitive to reduced levels of CENH3, and they further corroborate the findings from the Schnittger lab that BMF1 is not a core component of the SAC.

      Line 18-20: The observation that heat stress reduces fertility has been made by several research teams before this study. I propose to write "confirm"/"support" etc. instead of "reveal" to avoid a (presumably not intended) false priority claim in the abstract.

      We apologize, this was unintentional and we cite the relevant literature in the article. We have rewritten the abstract to avoid this impression.

      Figure 2: The panel/legend appears to be a bit mixed up. Panel C is described in legend under A. In addition, I cannot find any blue arrows in panel A (which is described as panel B). Correspondingly, the references to the panels in this figure (lines 134/135 and following) need to be updated. I am also not sure how the meiocytes in this figure were stained. The dots look like centromeres but then their intensity rather increases with increasing temperature. If correct, how can this be reconciled with the authors' statement that centromeres decrease in size at higher temps?

      We apologize for the mix-up. An early version of the Figure was accidentally submitted, and we have now corrected it. Panel B shows DAPI-stained meiocytes at the tetrad stage, and examples of micronuclei are indicated by arrowheads.

      Line 520: Should read "genotype" not "phenotype".

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      (1) It is intriguing that heat stress impairs only the centromeres and segregation of meiotic chromosomes but not the mitotic chromosomes. No analysis of mitotic divisions is provided in the manuscript. As they have generated marker lines, it is reasonable to examine the mitotic time course as well by live monitoring of root tissues exposed to similar temperature conditions as done for meiotic analysis. This will help to address the effect of heat stress on mitotic centromeres and its comparison with meiosis will provide a better picture. There are two likely outcomes during mitosis:

      (a) It is possible that heat stress also slows down mitotic progression, as is the case in meiosis as shown in this paper, and hence it is important to examine mitosis as well to compare and contrast the CENH3/BMF1 dynamics in mitosis and meiosis.

      (b) The second scenario is that there is no effect of heat stress on the centromere integrity of mitotic chromosomes. In fact, the authors show indirect evidence in support of this wherein the eYFP: CENH3 showed a strong signal in the tapetal cells (somatic origin) surrounding the male meiocytes (generative origin). It is interesting that somatic cells of the tapetum show a strong signal whereas the meiocytes lack this. The authors should elaborate on this contrasting result.

      The effect we observed seems to be specific to meiosis. We analyzed the progression of mitosis in root cells and we see a negligible effect of temperature on mitotic progression and no micronuclei formation. Interestingly, in terms of CENH3 loading, root cells show a slight decrease in CENH3 at 30°C, in contrast to the situation in tapetum cells. These and other data suggest a tissue/cell specific behavior of centromere maintenance and deserve further analysis. We plan to publish data on mitosis and tissue-specific aspects of CENH3 loading in a separate manuscript.

      (2) Spindle assembly checkpoint (SAC) comprises several core proteins that are recruited to the kinetochores to correct the errors during the defective cell cycle. Here the authors demonstrate the prolonged presence of BMF3 as the only proof to claim that heat stress prolongs the spindle assembly checkpoint during metaphase I. Have the authors observed the dynamics of any other SAC core components such as MAD1, MAD2, MPS1, BUB3, and the like during heat stress?

      No, we did not. We provide several independent lines of evidence that centromere structure and functionality are affected, and spindle checkpoint analysis is only one of them. At the time we designed these experiments, the only experimentally validated and well-characterized component of the SAC was BMF3, and we used only this protein as a SAC reporter because a general analysis of the SAC was not the primary goal of our study. While this paper was under review, a preprint from the Schnittger lab focusing on the plant SAC was published that comprehensively analyzed these SAC components in Arabidopsis and provided a solid foundation and resources for further research in this direction. This study also uses BMF3 as a reporter for the SAC in meiotic cells. It is noteworthy that despite using different microscopic methods and different plant reporter lines, our labs independently arrived at exactly the same duration of BMF3 association with the kinetochore (i.e. 22 min).

      (3) Is BMF1 a component of the SAC or the kinetochore? I understand that BMF1 is part of the core SAC (Komaki and Schnittger, 2017), although it localizes to the kinetochore. There are well-characterized kinetochore proteins in Arabidopsis, such as Mis12, NUF2, NNF1, and SPC24 (MUN1), which the authors could have used as kinetochore markers. Regardless, here the authors used it as a kinetochore marker. Being part of the SAC, BMF1 would be expected to persist at meiotic kinetochores like BMF3, but the opposite is observed. How can these contrasting results be explained?

      As discussed in the public section of the review, BMF1 does not seem to be a core component of the SAC. Furthermore, this protein localizes to centromeres/kinetochores throughout the cell cycle and therefore cannot be used as a SAC reporter.

      (4) Micronuclei can form as a result of chromosome missegregation as shown for spo11-1 and also due to segregation error caused by DNA repair defects. Here it is not clear what is the origin of micronuclei. It is very hard to decipher from live cell imaging. A simple meiotic spread of anthers of different treatments would address the origin of micronuclei.

      Cytology cannot easily determine the origin of micronuclei in meiotic cells. Acentric fragments produced by aberrant DNA repair become cytologically detectable only after metaphase I, because until then they are tethered to the remaining chromatin via cohesion. Therefore, we took advantage of spo11 mutants, which do not form any meiotic breaks and hence cannot generate acentric fragments by aberrant repair, to discriminate the origin of micronuclei. We reason that all micronuclei produced in spo11 plants originate from chromosome mis-segregation, and their increase at elevated temperature supports the notion that heat stress further impairs chromosome segregation.

      (5) Fig. 1B: The microspores are not clearly visible in the Alexander-stained anthers, and it is not clear which anthers are fertile and which are sterile. A better-quality picture would make this clearer.

      Again, we apologize for the poor quality of the pictures caused by manuscript conversion.

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figure 2, it should be pointed out where the micronuclei are. I see here and there a single bright spot. In Arabidopsis, we have noticed bright spots under stress conditions that are autofluorescent signals. It needs to be shown that these spots are not observed in non-GFP lines. Better image quality may help too.

      The micronuclei in Figure 2 are visualized by DAPI staining, not with GFP. The micronuclei are now indicated by arrowheads.

      (2) It was not possible to see the centromeres in Figure 3, hence I could not verify the fluorescence intensities of CENH3 and BMF1. There is also something wrong with the blue and red color codes in Fig. 3B, C, and D.

      Again, we apologize for the poor quality of the pictures caused by manuscript conversion.

      (3) Also in the videos it would help to point out where the micronuclei are seen. At what stage were these nuclei quantified? Given that meiosis progression in the cenh3-4 mutant is slower, it may be necessary to wait long enough to see established micronuclei. This information is supposed to be presented in Figure 2C. However, the X-axis shows time, not number. So I presume Fig 2C shows the duration of meiosis stages in the mutant. In Fig 2B, it shows the number of micronuclei per lobe. However, to correlate the incidence of micronuclei formation and the frequency of polyad formation (inviable microspores), one needs the quantification of the numbers of meiocytes carrying micronuclei. Then one can correlate the number of pollen per anther (shown in Fig 1c) with the incidence of micronuclei formation. The question of whether the degree of fertility reduction is due to micronuclei formation is a major issue that should be clarified.

      The micronuclei were not quantified from the movies but from DAPI-stained whole anthers at the tetrad stage, as indicated in the main text. We also apologize for the confusion with Figure 2, as we mixed up the panels in the original submission. This has been corrected in the new submission.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study nicely integrates a breadth of experimental and computational data to address fundamental aspects of RNA methylation by RNA methyltransferases (MTases), enzymes that are important for biology and health.

      Strengths:

      The authors offer compelling and strong evidence, based on carefully performed work with appropriate and well-established techniques to shed light on aspects of the methyl transfer mechanism of the methyltransferase-like protein 3 (METTL3), which is part of the methyltransferase-like proteins 3 & 14 (METTL3-14) complex.

      Weaknesses:

      The significance of this foundational work is somewhat diminished, mostly due to inefficient communication of certain aspects of this work. Parts of the manuscript are somewhat uneven and do not quite mesh well with one another. The manuscript could be enhanced by careful revision and significant textual and figure edits. Examples of recommended edits that would improve clarity and make the work accessible to a broader audience are highlighted in some detail below.

      We thank the reviewer for the positive evaluation of our work. We have followed the suggestions and modified the text and figures as detailed further in our answers to the specific recommendations.

      Reviewer #2 (Public Review):

      Summary:

      Caflisch and coworkers investigate the methyltransferase activity of the complex of methyltransferase-like proteins 3 and 14 (METTL3-14). To obtain a high-resolution description of the complete catalytic cycle, they have carefully designed a combination of experiments and simulations. Starting from the identification of bisubstrate analogues (BAs) as binders that stabilise a putative transition state of the reaction, they have determined multiple crystal structures and validated relevant interactions by mutagenesis and enzymatic assays.

      Using the resolved structure and classical MD simulations they obtained a kinetic picture of the binding and release of the substrates. Of note, they accumulated very good statistics on these processes using 16 simulation replicates over a time scale of 500 ns. To compare the time scale of the release of the products with that of the catalytic step they performed state-of-the-art QM/MM free energy calculations (testing multiple levels of theory) and obtained a free energy barrier that indicates how the release of the product is slower than the catalytic step.

      Strengths:

      All the work proceeds through clear hypothesis testing based on a combination of literature and new results. Eventually, this allows them to present in Figure 10 a detailed step-by-step description of the catalytic cycle. The work is very well crafted and executed.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      To fulfill its potential of guiding similar studies for other systems, and to allow researchers to dig into this vast work, the authors should share the results of their simulations (trajectories, key structures, input files, protocols, and analysis) using repositories like Zenodo, PLUMED-NEST, figshare or similar.

      The reviewer is right. We have uploaded the simulation materials to Zenodo: the MD simulation data (trajectories, pdb files, parameter files), and the PLUMED file that was used for the DFTB3/MM metadynamics simulations. We provide the link in the “Data availability” section.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Coberski et al. describes a combined experimental and computational study aimed at shedding light on the catalytic mechanism of a methyltransferase that transfers a methyl group from S-adenosylmethionine (SAM) to a substrate adenosine to form N6-methyladenosine (m6A).

      Strengths:

      The authors determine crystal structures in complex with so-called bi-substrate analogs that can bridge across the SAM and adenosine binding sites and mimic a transition state or intermediate of the methyltransfer reaction. The crystal structures suggest dynamical motions of the substrate(s) that are examined further using classical MD simulations. The authors then use QM/MM calculations to study the methyl-transfer process. Together with biochemical assays of ligand/substrate binding and enzyme turnover, the authors use this information to suggest what the key steps are in the catalytic cycle. The manuscript is in most places easy to read.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      My main suggestion for the authors is that they show better how their conclusions are supported by the data. This includes how the electron density maps for example support the key interactions and water molecules in the active site and a better error analysis of the computational analyses.

      We thank the reviewer for the comments and suggestions. We have followed the suggestions and added error analysis of the computational results as well as additional figures (in the supplementary information) that illustrate key interactions and water molecules in the active site supported by the electron density.

      Reviewer #1 (Recommendations For The Authors):

      • The phrasing of the second sentence in the introduction is difficult to read. I am not sure it is necessary to define the DRACH motif if you are also giving the exact consensus sequence, unless you provide more context for other instances of the DRACH motif. Referring to this motif instead as "consensus sequence GGACU" may be more effective.

      The reviewer is right. We corrected the sentence accordingly.

      • In the second paragraph of the introduction, a further short description of how METTL3-14 is "involved" in diseases would be appreciated.

      We thank the reviewer for the comment. We made that clearer by including “by promoting the translation of genes involved in cell growth, differentiation, and apoptosis” together with a reference.

      • Is there any evidence that inhibiting METTL3-14 doesn't negatively impact healthy cells?

      We thank the reviewer for the question. Yes, there is such evidence and we added to the sentence “but not in normal non-leukaemic haemopoietic cells” together with a reference to make this point clearer.

      • Bringing up the MACOM complex in the third paragraph of the introduction is perhaps not necessary unless further discussing the MACOM complex later.

      The reviewer is right. We removed the mention of the MACOM complex.

      • Figure 1B: Color coding is difficult to distinguish on a screen and print out. More contrasting colors would be helpful.

      We thank the reviewer for the suggestion. We removed the transparency from the protein cartoon representation that was the reason for the low contrast.

      • The level of detail in the "MD simulations for mechanistic studies of RNA MTases" section is not advisable. I would strongly encourage condensing this section to improve clarity and accessibility to a larger audience.

      The reviewer is right. We removed non-essential parts of this paragraph.

      • Confirming the role of the hydroxyl in Y406 would be better supported by a Y406 -> F406 mutant because the A406 mutant could bind differently due to a loss of pi-stacking interactions.

      The purpose of the Y406A mutant was to eliminate the interaction of the aromatic sidechain with adenosine as seen from the structure with BA4. Since there is no involvement of the Y406-OH group with adenosine, mutating to F did not seem sufficient. Furthermore, by mutating Y406 to alanine, we also eliminate the possibility for a water-mediated hydrogen bond to the W398 backbone. Hence, with the alanine mutant we achieve the strongest possible effect on the enzymatic activity while the integrity of the active site is maintained as seen from the thermal shift assay.

      • For Figure 4D, can the authors justify why SAH was used as a metric for SAM binding instead of using SAM directly? Additionally, referring to the RNA as "ligand" instead of "RNA" in the Figure caption is more confusing than simply calling it RNA.

      We thank the reviewer for the comment. With the TSA, we wanted to show that with the adenosine binding mutants, the integrity of the METTL3 active site is still intact. It was shown that SAH is bound with higher affinity than SAM by METTL3 (DOI: 10.1016/j.celrep.2019.02.100). Since the magnitude of the thermal shift depends also on the affinity, we chose the higher-affinity binder SAH. There is no RNA per se shown in this figure. “Ligands” in the figure caption (A) refers to the three bound molecules that are shown and mentioned in the previous sentence: SAM, BA2, and BA4. “Ligand” in the figure caption (D) refers to “SAH” that was used in the experiment described and mentioned just after, but is now removed.

      • Figure 5D is very difficult to interpret. Removing the ribbons representing Y406 movement may make it easier to see. Color coding the Supplementary Movie 1 to match would be also helpful.

      The reviewer is right. We have changed the figure to make the different conformations of METTL3 and its Y406 sidechain clearer. However, we left the coloring of the different conformations as the colors are connected to different time points of the simulation. Following the suggestion of the reviewer we changed the coloring of SAM and AMP to match that of the supplementary movie.

      • Figure 10 is overwhelming as is. Removing the grey area around the binding sites and toning down the color of the substrate binding sites would help with visibility. The size of the chemical structures and illustrations is currently too small to easily be made out. A full page-sized figure may be beneficial for this figure.

      We agree with the reviewer and have changed the figure to make each reaction step clearer and better recognizable.

      Minor edits

      • Change "Despite the growing knowledge on the diverse pathways" to "Despite growing knowledge of the diverse pathways involving METTL3-14".

      We corrected the sentence.

      • Perhaps use "redundant active site" instead of "degenerate active site".

      We changed the word as suggested.

      • Consider moving "The METTL3 MTase domain has the catalytically active SAM binding site and adopts a Rossmann fold that is characteristic of Class I SAM-dependent MTases" to before "METTL14 also has an MTase domain, however, with a degenerate active site of hitherto unknown function, and so-called RGG repeats at its C-terminus essential for RNA binding" to keep information about METTL3 together.

      We shifted the part of the text as suggested.

      • "Molecular dynamics studies have mainly focused on protein and bacterial MTases"? Does this mean bacterial MTases that methylate proteins?

      We thank the reviewer for the comment. This means bacterial MTases in general. The example that we mention is of a bacterial MTase that methylates a chemical precursor. We changed the sentence slightly to make that clearer.

      • In "Bisubstrate analogues bind in the METTL3 active site", please consider the following:

      • Change "and to investigate" to "and investigated".

      • Briefly describe the enzymatic assay in the main text.

      • Either more clearly defining "least potent" or change to "have the highest IC50 values".

      We made all the suggested changes to improve the description of the assay and its outcomes.

      • In Figure 3, remove some of the amino acid labels from panels A, C, and E for clarity, especially since panels B, D, and F more clearly demonstrate the interactions.

      We removed amino acids that were not involved in polar contacts and adapted the figure caption accordingly.

      • In panels 3D, 3F, and 4B, the lightning bolts are too small to make out as lightning bolts. An asterisk or other symbol may be easier to distinguish.

      We made the lightning bolts more than twice as large to make them more recognizable.

      • In Figure 4C, no units are provided on the y-axis. Additionally, I do not believe the arrows indicating "Loss of activity" are necessary.

      These are arbitrary units as it is a ratio which is explained in the materials and methods section. We removed the arrows following the suggestion of the reviewer.

      • While demonstrating mutants with no activity still retain SAM binding is suggestive of the mutant impacting RNA binding, this would still be better supported with RNA binding studies. Electrophoretic mobility shift assays would be sufficient if Tm studies are time-consuming. While these experiments could be informative, we also acknowledge that they may be outside the scope of this current report.

      We thank the reviewer for suggesting these experiments and for acknowledging that they would be outside the scope of the current study. Such RNA binding experiments can turn out to be very time-consuming, both in TSA and EMSA, mainly for the following reason: the RNA substrate must be chosen such that it binds strongly enough to the WT to cause an effect (thermal shift or electrophoretic mobility shift), but also such that a clear difference in binding between WT and mutant proteins can be observed. Since many more residues of METTL3 and METTL14 contribute to RNA binding, the effects of individual mutations on affinity might be too small to be confidently detected by TSA or EMSA. In particular, we only identified the residues that bind the substrate adenosine, and mutating them, and hence preventing adenosine binding alone, might not have a big effect on overall RNA binding affinity. The enzymatic assay that we used, on the other hand, is more sensitive, since the detection is fluorescence-based and quantifies the conversion of A to m6A in an RNA substrate. Moreover, more factors than affinity alone play a role in enzymatic activity, such as the correct orientation and stability of the adenosine in the active site and stabilization of the transition state.

      • A written narrative to accompany Supplementary Movie 1 would make it much more accessible to those unfamiliar with modeling and simulations.

      We thank the reviewer for the comment. We expanded the caption to the movie with a narrative describing different events at different time points in the movie.

      • Table 3 could be made clearer to those without MD experience by defining/indicating the top row as different computational models.

      The reviewer is right. We have added a footnote to Table 3 to clearly indicate the different density functional theory and semi-empirical density functional tight binding methods used in this study. We also added another line to the table.

      • In the conclusion, the authors state "the height of the QM/MM free-energy barrier indicates that the methyl transfer step is not rate-determining." How does this compare to experimental data? Additional kinetic assays to demonstrate this experimentally would go a long way in convincing the reader of this conclusion.

      We thank the reviewer for the question. Kinetic assays have been performed for METTL3-14, and we mention and reference them in the text. We believe that further kinetic experiments would be outside the scope of this study. Furthermore, the METTL3 mutants that we made show no activity in our enzymatic assay, and hence kinetic studies would probably be impossible with them.

      As we show by QM/MM and describe in the text, the methyl cation in the SAM cofactor is transferred directly to the N6 position of the adenosine substrate. DFTB3/MM free energy simulations show that this mechanism has an energetic barrier of 15-16 kcal/mol. The published turnover from an enzymatic assay is 0.2-0.6 min⁻¹ at ambient temperature, which implies a barrier of ~20 kcal/mol. This value is higher than the barrier determined by QM/MM for the methyl transfer alone. Hence, the overall mechanism must include a step that is slower than the methyl transfer, and we conclude that the methyl transfer is not the rate-limiting step.
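      The conversion between turnover and barrier height above follows from the Eyring equation of transition state theory, k = (k_B T/h) exp(-ΔG‡/RT), which uses the k_B T/h prefactor the authors cite in their reply to Reviewer #3. The sketch below is illustrative only: it assumes T = 298.15 K for "ambient temperature" and treats the measured turnover as a single effective first-order rate; the function names are hypothetical.

```python
import math

# Physical constants (exact SI values for k_B and h)
K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol*K)


def eyring_barrier(rate_per_s: float, temp_k: float = 298.15) -> float:
    """Effective free-energy barrier (kcal/mol) implied by a first-order
    rate constant, using the transition-state-theory prefactor k_B*T/h."""
    prefactor = K_B * temp_k / H  # roughly 6e12 s^-1 near room temperature
    return R_KCAL * temp_k * math.log(prefactor / rate_per_s)


def eyring_rate(barrier_kcal: float, temp_k: float = 298.15) -> float:
    """First-order rate constant (s^-1) implied by a barrier in kcal/mol."""
    prefactor = K_B * temp_k / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temp_k))


# Published turnover range of 0.2-0.6 min^-1, converted to s^-1
for kcat_per_min in (0.2, 0.6):
    barrier = eyring_barrier(kcat_per_min / 60.0)
    print(f"kcat = {kcat_per_min} min^-1 -> barrier {barrier:.1f} kcal/mol")

# Rates implied by the 15-16 kcal/mol QM/MM barrier for methyl transfer alone
for barrier in (15.0, 16.0):
    print(f"barrier = {barrier} kcal/mol -> k = {eyring_rate(barrier):.0f} s^-1")
```

      With these assumptions the turnover range maps to barriers of roughly 20 to 21 kcal/mol, while a 15-16 kcal/mol barrier corresponds to rates of the order of tens per second, several orders of magnitude faster than the observed turnover.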

      Reviewer #3 (Recommendations For The Authors):

      I only have a few comments about the work.

      (1) It would be good if the authors could show more of the data that is used as the basis for their conclusions. For example, IC50 values are presented (Table 1) without error estimates or an indication of the quality of the data that is used to estimate the data.

      We thank the reviewer for the suggestion. We included errors for the IC50 values and show the dose-response curves from the enzymatic assay with the BAs as inhibitors in a new Supplementary Figure S1.

      (2) More substantially, it would be good to have a more detailed analysis of the crystal structures in terms of the properties that are mentioned/analysed. While the structures are relatively good (2.1 to 2.5 Å), it is not clear to the reader how these data support the proposed interactions. For example, the authors pinpoint a number of hydrogen bonding interactions and water molecules in the complexes. They might consider showing support for some of these in the electron density maps. Similarly, it would be good to show the densities that support the substantial differences of the Ade in the BA2 and BA4 complexes. These might be supplementary files. I note also that the structures are not yet released or available for analysis (which of course is a valid choice but also means that I cannot inspect the maps myself).

      We have added supplementary figures supporting the conformations of the BAs and their interactions with METTL3 with electron density, for BA1 and BA6 in Supplementary Figure S2, and for BA2 and BA4 in a new Supplementary Figure S3.

      (3) It would be useful to have an error analysis of the off-rates estimated from the MD simulations and a discussion of the accuracy of these estimates. Even the slower dissociation events seem quite fast. What are the rough affinities of these molecules, and how fast would binding need to be to be compatible with the affinity and the estimated off-rates?

      We expanded upon this in the results paragraph concerning the MD simulations. The affinities of METTL3-14 binding to AMP or m6AMP can be expected to be very low, with Kd values in the millimolar range. We have not measured these Kd values, nor have we found any published data, but we have conducted thermal shift assays with A and m6A and did not observe any significant thermal shifts in the melting temperature of METTL3-14 at high micromolar concentrations of these compounds, indicative of a very low binding affinity. This is to be expected because METTL3-14 should not methylate adenosines unspecifically but rather in the GGACU motif of substrate mRNA.
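      The reviewer's question rests on the relation K_d = k_off / k_on for simple 1:1 binding, so an affinity and an off-rate together fix the on-rate they require. The sketch below uses purely hypothetical inputs, not measured values: a K_d of 1 mM, standing in for the "millimolar range" mentioned above, and an off-rate of about one event per 500 ns, assuming dissociation occurs within simulations of that length.

```python
def required_kon(koff_per_s: float, kd_molar: float) -> float:
    """On-rate (M^-1 s^-1) required by a 1:1 binding equilibrium with the
    given off-rate (s^-1) and dissociation constant (M): k_on = k_off / K_d."""
    if koff_per_s <= 0 or kd_molar <= 0:
        raise ValueError("k_off and K_d must be positive")
    return koff_per_s / kd_molar


# Hypothetical inputs for illustration only
koff = 1.0 / 500e-9   # about one dissociation per 500 ns, i.e. 2e6 s^-1
kd = 1e-3             # a millimolar K_d
kon = required_kon(koff, kd)
print(f"required k_on ~ {kon:.1e} M^-1 s^-1")
```

      Under these assumed numbers the required on-rate is of the order of 10^9 M^-1 s^-1, which is the order of magnitude usually quoted for diffusion-limited association; a weaker affinity or a slower off-rate would lower it.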

      (4) The authors use QM/MM simulations with metadynamics to estimate the energy profile of the methyl transfer reaction. They find a barrier of ca. 15 kcal/mol and suggest this to be compatible with the enzymatic turnover rate of ca. 0.3/min. Here it would be good to have a clearer description of the possible sources of error and of the assumptions underlying these statements. First, what is the error on the estimated energy profile from the metadynamics? The authors mention the analysis of the progression of the PMF as a function of time, but that is in itself not a strong test for convergence (the PMF may stay constant if there is little sampling). What does the time series of the CV look like? Second, it seems as if the authors are assuming a large pre-exponential factor (10^9/s?). Is that correct, and how sure are they of this value? Finally, when linking the barrier of the methyl-transfer reaction to the overall turnover rate, it sounds as if they assume that other parts of the reaction do not affect the turnover rate. Is that correctly understood, and what is the evidence for it? It sounds like the authors are saying that step 5 in the cycle (Figure 10) is limiting.

      We thank the reviewer for the questions. Accordingly, we have carried out additional simulations and statistical error analyses.

      (i) We have carried out two additional sets of multi-walker metadynamics simulations with the same setup as the original calculation, except for using different initial random seeds. Using the three independent sets of metadynamics simulations, we can better estimate the statistical uncertainty for the computed potential of mean force (PMF). We have updated the PMF in Fig. 8b, in which the solid curve represents the result averaged over three independent runs, and the shaded area represents the standard error of the mean of the three replicas. The figure caption of Fig. 8b is revised accordingly.
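      The averaging described in (i) can be sketched as a pointwise mean and standard error of the mean (SEM = sample standard deviation / √n) over replica PMFs sampled on a shared CV grid. This is a minimal illustration, not the authors' analysis script: because a PMF is defined only up to an additive constant, the sketch assumes each replica is first shifted to zero at a common reference grid point (for example the reactant minimum); the function name and the `ref_index` parameter are hypothetical.

```python
import math
from statistics import mean, stdev


def average_pmf(replicas: list[list[float]], ref_index: int = 0):
    """Pointwise mean and SEM of PMF curves sampled on the same CV grid.

    Each replica is shifted so that its value at ref_index is zero, which
    removes the arbitrary additive constant of each free-energy profile.
    """
    n = len(replicas)
    if n < 2:
        raise ValueError("need at least two replicas to estimate an SEM")
    aligned = [[g - run[ref_index] for g in run] for run in replicas]
    means, sems = [], []
    for values in zip(*aligned):  # iterate over CV grid points
        means.append(mean(values))
        sems.append(stdev(values) / math.sqrt(n))
    return means, sems


# Hypothetical PMF values (kcal/mol) on a 5-point CV grid, illustrative only
runs = [
    [0.0, 8.0, 15.5, 6.0, -2.0],
    [0.3, 8.4, 15.9, 6.1, -1.8],
    [0.1, 7.9, 15.2, 5.8, -2.3],
]
pmf_mean, pmf_sem = average_pmf(runs)
```

      The mean curve corresponds to the solid line and mean ± SEM to the shaded band of a plot such as the revised Fig. 8b.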

      (ii) To further illustrate the convergence behavior of the metadynamics simulations, we have included the following supplementary files: (1) potentials of mean force computed with different numbers of deposited Gaussians, for comparison; and (2) as suggested by the reviewer, the time series of the collective variable (CV) sampled by the 24 independent walkers during one set of metadynamics simulations. These results clearly indicate that the CV exhibits diffusive behavior between the reactant and product regions, further supporting the adequate sampling and convergence of our metadynamics simulations.

      (iii) Regarding the issue of pre-factor used in the rate estimate, we have indeed used the common approximation of kT/h as in the regular transition state theory. Many studies in the literature support the use of this expression for very localized chemical reactions in enzymes. We have included several representative references along this line: (1) M. Garcia-Viloca, J. Gao, M. Karplus, D. G. Truhlar, How enzymes work: Analysis by modern rate theory and computer simulations, Science, 303, 186-195 (2004) (2) D. R. Glowacki, J. N. Harvey, A. J. Mulholland, Taking Ockham’s razor to enzyme dynamics and catalysis, Nat. Chem. 4, 169-176 (2012)

      (iv) Regarding the nature of the rate-limiting event, please see our response to reviewer 1.

      (5) The authors should ideally make the input files for their simulations available and deposit the plumed files in for example plumed-nest (as indicated in their reference 100).

      We agree with the reviewer. Accordingly, we have uploaded the PLUMED file that we have used for the DFTB3/MM metadynamics simulations (plumed.dat) together with the MD simulation trajectories to Zenodo.

      Minor

      (1) Many of the details in Figure 10 are very small and difficult to read without zooming in. Consider whether some parts could be made larger.

      The reviewer is right. We have changed the figure to make each reaction step clearer and better recognizable.

    1. Author Response:

      We would like to sincerely thank the referees and the editor for their time in considering our manuscript. The electrophysiology of bacteria is a fast-moving, complex field and is proving contentious in places. We believe the peer review process of eLife provides an ideal mechanism to address the issues raised on our manuscript in an open and transparent manner. Hopefully we will encourage more consensus in the field and help to understand some of the inconsistencies in the current literature that are hampering progress.

      The editors stress that the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude that the membrane dye ThT is faulty, when in fact the problem lies with their simple battery model.

      In terms of the previous microbiology literature, the assumption of no voltage-gated ion channels in E. coli suggested by referee 2 is a highly contentious niche ideology. The majority of gene databases for E. coli have a number of ion channels annotated as voltage-sensitive due to comparative genetics studies, e.g. try the https://bacteria.ensembl.org/ database (the search terms ‘voltage-gated coli’ give 2521 hits for genes; similarly, you could check www.uniprot.org or www.biocyc.org) and M.M.Kuo, Y.Saimi, C.Kung, ‘Gain of function mutation indicate that E. coli Kch form a functional K+ conduit in vivo’, EMBO Journal, 2003, 22, 16, 4049. Furthermore, recent microbiology reviews all agree that E. coli has a number of voltage-gated ion channels: S.D.Beagle, S.W.Lockless, ‘Unappreciated roles for K+ channels in bacterial physiology’, Trends in Microbiology, 2021, 29, 10, 942-950. More emphatic experimental evidence comes from spiking potentials that have been observed by many groups for E. coli, both directly using microelectrodes and indirectly using genetically expressed fluorophores: ‘Electrical spiking in bacterial biofilms’, E.Masi et al, Journal of the Royal Society Interface, 2015, 12, 102; ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, J.M.Kralj et al, Science, 2011, 333, 6040, 345; and ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120. The only mechanism currently known to cause spiking potentials in cells is positive feedback from voltage-gated ion channels (a mechanism is needed to induce the oscillations). Indeed, people are starting to investigate the specific voltage-gated ion channels in E. coli, and a role is emerging for calcium in addition to potassium, e.g. ‘Genome-wide functional screen for calcium transients in E. coli identifies increased membrane potential adaptation to persistent DNA damage’, R.Luder et al, J.Bacteriology, 2021, 203, 3, e00509.

      In terms of recent data from our own group, electrical impedance spectroscopy (EIS) experiments with E. coli indicate that there are large conductivity changes associated with the Kch ion channels (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print). EIS experiments probe the electrical phenomena of bacterial biofilms directly and do not depend on fluorophores, i.e. they cannot be affected by ThT.

      Attempts to disprove the use of ThT to measure hyperpolarisation phenomena in E. coli using fluorescence microscopy also seem doomed to failure, based on comparative control experiments. A wide range of other cationic fluorophores behave similarly to ThT, e.g. the potassium-sensitive dye used in our eLife article. Thus the behaviour of ThT appears to be generic for a range of cationic dyes, and it implies a simple physical mechanism, i.e. the positively charged dyes enter cells at low potentials. The elaborate photobleaching mechanism postulated by referee 2 seems most unlikely and is unable to explain our data (see below). ThT is photostable and chemically well-defined, and it is therefore used almost universally in fluorescence assays for amyloids.

      A challenge with trying to use flagellar motility to measure intracellular potentials in live bacteria, as per referee 2’s many publications, is that a clutch is known to occur with E. coli e.g. ‘Flagellar brake protein YcgR interacts with motor proteins MotA and FliG to regulate the flagellar rotation speed and direction’, Q.Han et al, Frontiers in Microbiology, 2023, 14. Thus bacteria with high membrane potentials can have low motility when their clutch is engaged. This makes sense, since otherwise bacterial motility would be enslaved to their membrane potentials, greatly restricting their ability to react to their environmental conditions. Without quantifying the dynamics of the clutch (e.g. the gene circuit) it seems challenging to deduce how the motor reacts to Nernstian potentials in vivo. As a result we are not convinced by any of the Pilizota group articles. The quantitative connection between motility and membrane potential is too tenuous.

      In conclusion, the articles questioning the use of ThT are scientifically flawed and based on a niche ideology that E. coli do not contain voltage-gated ion channels. The current work disproves the simple Nernstian battery (SNB) model expounded by Pilizota et al, which is unpersuasively represented in multiple publications by this one group (see below for critical synopses), and demonstrates that the SNB model needs to be replaced by a model that includes excitability (demonstrating hyperpolarization of the membrane potential).

      In the language of physics, a non-linear oscillator model is needed to explain spiking potentials in bacteria, and the simple battery models presented by Pilizota et al do not have the non-linearities required to oscillate (‘Nonlinear dynamics and chaos’, Steve Strogatz, Westview Press, 2014). Such non-linear models are the foundation for describing eukaryotic electrophysiology, e.g. Hodgkin and Huxley’s Nobel Prize-winning research (1963), but also the vast majority of modern extensions (‘Mathematical physiology’, J.Keener, J.Sneyd, Springer, 2009; ‘Cellular biophysics and modeling: a primer on the computational biology of excitable cells’, G.C.Smith, CUP, 2019; ‘Dynamical systems in neuroscience: the geometry of excitability and bursting’, E.M.Izhikevich, MIT, 2006; and ‘Neuronal dynamics: from single neurons to networks and models of cognition’, W.Gerstner et al, CUP, 2014). The Pilizota group is using modelling tools from the 1930s that were quickly shown to be inadequate to describe eukaryotic cellular electrophysiology, and the same is true for bacterial electrophysiology (see the groundbreaking work of A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 7576, 59 for the use of Hodgkin-Huxley models with bacterial biofilms). Below we give a critical synopsis of the articles cited by referee 2, and we then directly answer the specific points that all the referees raise.
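The contrast drawn above between a battery-like model and a non-linear oscillator can be illustrated with a minimal sketch. It is not the model used in the manuscript: as a stand-in it uses the FitzHugh-Nagumo equations, a textbook two-variable reduction of Hodgkin-Huxley-type excitability, with the commonly quoted illustrative parameters a = 0.7, b = 0.8, tau = 12.5, and compares them with a one-variable linear relaxation dV/dt = -(V - E)/tau. The values of E, tau and the drive I are arbitrary, dimensionless assumptions chosen only for illustration.

```python
def simulate_linear_battery(v0, e_rev=-0.15, tau=1.0, dt=0.01, t_max=50.0):
    """Linear relaxation dV/dt = -(V - E)/tau: a battery-like model with no non-linearity."""
    v = v0
    trace = [v]
    for _ in range(int(round(t_max / dt))):
        v += dt * (-(v - e_rev) / tau)  # forward-Euler step
        trace.append(v)
    return trace


def simulate_fitzhugh_nagumo(i_ext, a=0.7, b=0.8, tau=12.5, dt=0.01, t_max=200.0):
    """FitzHugh-Nagumo model: dv/dt = v - v^3/3 - w + I, dw/dt = (v + a - b*w)/tau."""
    v, w = 0.0, 0.0
    trace = [v]
    for _ in range(int(round(t_max / dt))):
        dv = v - v ** 3 / 3.0 - w + i_ext  # fast variable with cubic non-linearity
        dw = (v + a - b * w) / tau         # slow recovery variable
        v += dt * dv
        w += dt * dw
        trace.append(v)
    return trace


def count_upward_crossings(trace, level, start=0):
    """Count samples where the trace rises through `level`, ignoring indices before `start`."""
    return sum(1 for prev, cur in zip(trace[start:], trace[start + 1:]) if prev < level <= cur)
```

With a drive of I = 0.5 the FitzHugh-Nagumo variable settles onto a repeating spike train, whereas with I = 0 it decays to a stable rest state. The linear relaxation approaches E monotonically for any choice of parameters, because a single linear ODE has no mechanism to oscillate.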

      Critical synopsis of the articles cited by referee 2:

      1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model and, unfortunately, it is wrong when voltage-gated ion channels are present. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead, a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence, and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘workflow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis that uses Na+ instead of H+ to drive its flagellar motor. Its relevance to wild-type E. coli is unclear, given the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range, with all the concomitant errors.

      3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell, e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important than H+ for the membrane potential, and thus for the electrophysiology! Equation (1) in the manuscript assumes that no ions other than H+ are exchanged during the experiments. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around! In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication. In general, an important problem is being researched, i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees:

      Reviewer #1:

      Summary:
      Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:
      - The authors report original data.
      - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.
      - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.
      - The authors developed two rigorous models inspired by 1) the Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) the Fire-Diffuse-Fire model for the propagation of the electric signal.
      - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ channel Kch: enhancing survival under phototoxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:
      - Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and redox potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is induced only by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when the stimulation with blue light begins, i.e. they are too fast to be caused by a build-up of pH changes. We are not sure what the referee means by redox potential, since it is an attribute of all chemicals that are able to donate/receive electrons. The electrical response to stress appears to be caused by ROS, since the electrical response is removed when ROS scavengers are added, i.e. pH plays a very minor role, if any.

      - Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project about using the PROPS system in E. coli, as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues, and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising, ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120.

      - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work, and it is currently under review in a physics journal: ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli et al, 2024, https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
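For readers unfamiliar with the fire-diffuse-fire idea referred to above, a minimal one-dimensional sketch follows. It is not the agent-based model of Martorelli et al: it assumes identical release sites on a line with spacing d, instantaneous release of an amount sigma when a site fires, free diffusion with coefficient D (no uptake or decay), and a fixed firing threshold. All parameter names and values are illustrative assumptions in arbitrary units.

```python
import math


def fire_diffuse_fire(n_sites, spacing=1.0, diffusivity=1.0, sigma=10.0,
                      threshold=1.0, dt=1e-3, t_max=3.0):
    """Return the firing time of each site (None if it never fires).

    Site 0 is triggered at t = 0. A site that fires releases a bolus `sigma`
    that spreads by 1D free diffusion; an unfired site fires once the summed
    concentration from all earlier firings reaches `threshold`.
    """
    fire_times = [None] * n_sites
    fire_times[0] = 0.0
    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        t = step * dt
        newly_fired = []
        for i in range(n_sites):
            if fire_times[i] is not None:
                continue
            conc = 0.0
            for j, t_j in enumerate(fire_times):
                if t_j is None:
                    continue
                elapsed = t - t_j  # always >= dt, because firings are recorded after each sweep
                r = (i - j) * spacing
                # 1D diffusion Green's function for an instantaneous point release
                conc += (sigma / math.sqrt(4.0 * math.pi * diffusivity * elapsed)
                         * math.exp(-r * r / (4.0 * diffusivity * elapsed)))
            if conc >= threshold:
                newly_fired.append(i)
        # Record firings after the sweep so every site sees the same concentration field
        for i in newly_fired:
            fire_times[i] = t
        if all(ft is not None for ft in fire_times):
            break
    return fire_times
```

With sigma = 10 a single triggered site sets off a wave that reaches every site in order; with sigma = 2 the peak concentration arriving at the neighbour from one source (about 0.24 sigma/d) stays below threshold and the signal does not propagate. This all-or-none behaviour is what distinguishes regenerative propagation from passive diffusion.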

      - Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      - The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      - The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      - Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2:

      Summary of what the authors were trying to achieve:
      The authors thought they studied membrane potential dynamics in E.coli biofilms. They thought so because they were unaware that the dye they used to report that membrane potential in E.coli has been previously shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:
      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E.coli. The work is unaware of a publication from 2020 https://www.sciencedirect.com/science/article/pii/S0006349519308793 that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also Figure 4 of the following work, https://www.sciencedirect.com/science/article/pii/S0006349519303923, on light-induced damage to the electrochemical gradient of protons; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B as well as in video S1 is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793) and the corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating, see Figure S12 of that previous work. There, it is also demonstrated, by monitoring the speed of the bacterial flagellar motor, that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But, in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus the decay profile of the electrochemical gradient of protons (Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e. 0.2 M K+ versus 10⁻⁷ M H+. We can comfortably neglect the influx of H+ in our experiments.

      I find Figure S1D interesting. There, the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that, and it has not been cited: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in potassium phosphate buffer with NaCl (and often we used EDTA to permeabilise the membrane). It is not fully clear (to me) whether here TMRM was prepared in rich media (it explicitly says so for ThT in Methods but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage to the membrane done with light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction, as the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses the electrochemical gradient of protons. This means that it is possible, and this will depend on the type of pumps present in the cell, that CCCP collapses the electrochemical gradient of protons, but the membrane potential is equal and opposite in sign to the ΔpH. So using CCCP does not automatically mean membrane potential will collapse (e.g. in some mammalian cells it does not need to be the case, but in E.coli it is https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also been recently found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentrations at which the authors are using CCCP (100 µM) that should not affect the results. However, the authors then state that, because they observed in Figure S1E a fast efflux of ions in all cells and no spiking dynamics, this confirms that the observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E does not appear to show transients; instead, it is visible that after 50 min treatment with 100 µM CCCP, ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential; this needs to be in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of a Nernstian dye.

      Our understanding of the literature is that CCCP poisons the whole metabolism of the bacterial cells. The ATP-driven K+ channels will stop functioning, and this is the dominant contributor to the membrane potential.
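For reference, the defining behaviour of an ideal Nernstian dye mentioned in this exchange can be written down explicitly: at equilibrium, a membrane-permeant ion of valence z distributes so that c_in/c_out = exp(-zFV/(RT)), where V is the inside-minus-outside membrane potential. The sketch below evaluates that relation and its inverse; the -140 mV used in the usage note is an illustrative assumption, not a value measured in this work.

```python
import math

FARADAY = 96485.332      # Faraday constant, C/mol
GAS_CONSTANT = 8.314462  # molar gas constant, J/(mol K)


def nernst_accumulation_ratio(v_membrane, z=1, temperature=298.15):
    """Equilibrium c_in/c_out for an ion of valence z at membrane potential V (volts, inside minus outside)."""
    return math.exp(-z * FARADAY * v_membrane / (GAS_CONSTANT * temperature))


def potential_from_ratio(ratio, z=1, temperature=298.15):
    """Invert the Nernst relation: V = -(RT/zF) ln(c_in/c_out)."""
    return -(GAS_CONSTANT * temperature) / (z * FARADAY) * math.log(ratio)
```

For example, nernst_accumulation_ratio(-0.140) is roughly 230, i.e. an ideal monovalent cationic dye would be concentrated a couple of hundred-fold inside a cell held at -140 mV, and potential_from_ratio recovers -0.140 V from that ratio. Whether ThT follows this relation under illumination is the point in dispute above.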

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower because intuitively in these clusters cells could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which the authors mention later in the paper) is lower. Given that these cells were left in a microfluidic chamber for 2 hours to attach in growth media according to Methods, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what the authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seem tenable.
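The quantity debated in this section, the time taken for a cell's ThT trace to reach its first intensity peak, can be extracted with a simple helper such as the hypothetical time_to_first_peak below. It is a sketch, not the authors' analysis pipeline: it assumes a single-cell trace sampled at known times, treats as the peak the first local maximum that rises more than min_rise above the lowest preceding value, and leaves smoothing of noisy traces to the caller.

```python
def time_to_first_peak(times, intensities, min_rise=0.0):
    """Return the time of the first local maximum rising more than `min_rise`
    above the lowest preceding intensity, or None if there is no such peak."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    running_min = intensities[0] if intensities else 0.0
    for k in range(1, len(intensities) - 1):
        running_min = min(running_min, intensities[k - 1])
        # Strict rise into the sample, non-strict fall out of it (first sample of a plateau counts)
        is_local_max = intensities[k - 1] < intensities[k] >= intensities[k + 1]
        if is_local_max and intensities[k] - running_min > min_rise:
            return times[k]
    return None
```

For instance, with times [0, 1, 2, 3, 4, 5, 6] and intensities [1, 2, 5, 3, 4, 8, 2], the function returns 2 with min_rise=0 and 5 with min_rise=5, showing how the rise threshold screens out small early bumps.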

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as ion-channel-mediated wavefronts moving from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery, and probably even more importantly, there is no light profile shown anywhere in SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2, the collapse of fluorescence was not understood (other than that it is not membrane potential related). It was observed in Figure 5A, C, and F that at the point of peak the electrochemical gradient of protons is already collapsed, and that at the point of peak the cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapsed there is a period of time where ThT does not stain the cells and then it starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could simply be staining of cytoplasmic positively charged content, and the timing of that depends on the dynamics of cytoplasmic content leakage (we observed this to be happening over 2h in individual cells). ThT is also a non-specific amyloid dye, and in starving E. coli cells formation of protein clusters has been observed (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to check whether the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work, for the reasons highlighted before. Differential membrane permeability is a speculative phenomenon that is not well supported by any evidence and has no clear molecular mechanism.
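One simple way to quantify the flatness check described above is to divide a cell-free background image into tiles and compare the dimmest and brightest tile means. The helper below is a hypothetical sketch: it assumes the image is a 2D list of camera counts with the offset already subtracted, and any acceptance threshold (for example, a ratio above 0.9) is the user's choice rather than a standard.

```python
def illumination_flatness(background, tiles=4):
    """Ratio of the dimmest to the brightest tile mean in a background-only image.

    `background` is a 2D list (rows of pixel values); 1.0 means perfectly flat.
    """
    n_rows, n_cols = len(background), len(background[0])
    if tiles > n_rows or tiles > n_cols:
        raise ValueError("image is smaller than the requested tile grid")
    tile_means = []
    for ti in range(tiles):
        for tj in range(tiles):
            r0, r1 = ti * n_rows // tiles, (ti + 1) * n_rows // tiles
            c0, c1 = tj * n_cols // tiles, (tj + 1) * n_cols // tiles
            pixels = [background[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            tile_means.append(sum(pixels) / len(pixels))
    return min(tile_means) / max(tile_means)
```

A perfectly uniform image gives 1.0, whereas a centred, spot-like illumination profile gives a noticeably smaller value because the corner tiles are dimmer than the central ones.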

      Finally, I note that the authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section the authors look at the effect of Kch, a putative voltage-gated potassium channel, on ThT profile in E. coli cells. And they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from the one where TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects the membrane permeability. I do note that in video S4 I seem to see more of, what appear to be, plasmolysed cells. The authors do not see the ThT intensity with this mutant that appears long after the initial peak has disappeared, as they see in WT. It is not clear how long they waited for this, as from Figure S3C it could simply be that the dynamics of this is a lot slower, e.g. Kch deletion changes membrane permeability.

      The work showing that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner, i.e. driven by the membrane voltage.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf especially SI12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel mediated membrane potential dynamics is a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as authors propose. But also, the HH model has been developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and Nernst potential for the given ion. The conductivity in the model is given as gK*n^4 for potassium, gNa*m^3*h sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And, n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44).

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.
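To make the referee's description of the neuronal Hodgkin-Huxley conductances concrete (gK*n^4, gNa*m^3*h and gL, each multiplying a (V - E) driving force), the sketch below integrates the classic squid-axon model with forward Euler. It uses the widely quoted 1952 parameter set in the modern convention, with a resting potential near -65 mV; these are neuronal values used purely for illustration, not the bacterial channels or parameters of the manuscript's model.

```python
import math

# Classic squid giant-axon parameters (modern -65 mV resting convention).
# Units: mV, ms, uF/cm^2, mS/cm^2, uA/cm^2. Neuronal values, not bacterial ones.
C_M = 1.0
G_NA, G_K, G_L = 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387


def _vtrap(x, y):
    """x / (exp(x/y) - 1), using its Taylor limit near the removable singularity x = 0."""
    if abs(x / y) < 1e-6:
        return y * (1.0 - x / y / 2.0)
    return x / (math.exp(x / y) - 1.0)


def rates(v):
    """Opening (alpha) and closing (beta) rates of the n, m, h gates at voltage v, in 1/ms."""
    a_n = 0.01 * _vtrap(-(v + 55.0), 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    a_m = 0.1 * _vtrap(-(v + 40.0), 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    return a_n, b_n, a_m, b_m, a_h, b_h


def ionic_currents(v, n, m, h):
    """I_K = gK n^4 (V - EK), I_Na = gNa m^3 h (V - ENa), I_L = gL (V - EL)."""
    i_k = G_K * n ** 4 * (v - E_K)
    i_na = G_NA * m ** 3 * h * (v - E_NA)
    i_l = G_L * (v - E_L)
    return i_k, i_na, i_l


def simulate_hh(i_ext, t_max=100.0, dt=0.01, v0=-65.0):
    """Forward-Euler integration under a constant injected current; returns V(t) in mV."""
    a_n, b_n, a_m, b_m, a_h, b_h = rates(v0)
    # Start every gate at its steady-state value for v0
    n, m, h = a_n / (a_n + b_n), a_m / (a_m + b_m), a_h / (a_h + b_h)
    v = v0
    trace = [v]
    for _ in range(int(round(t_max / dt))):
        a_n, b_n, a_m, b_m, a_h, b_h = rates(v)
        i_k, i_na, i_l = ionic_currents(v, n, m, h)
        v += dt * (i_ext - i_k - i_na - i_l) / C_M
        n += dt * (a_n * (1.0 - n) - b_n * n)
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        trace.append(v)
    return trace


def count_spikes(trace, threshold=0.0):
    """Count upward crossings of `threshold` (mV)."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a < threshold <= b)
```

With an injected current of 10 µA/cm² the model fires repetitively over 100 ms, while with no injected current it stays at rest. The repetitive firing comes from the voltage-dependent gates n, m and h; the linear (V - E) terms on their own would only relax towards a fixed potential.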

      Thus, in applying the model to describe bacterial electrophysiology one should ensure that the near-equilibrium requirement holds (so that the (V-VQ) etc. terms in the authors' equation in Figure 5B hold), and that potassium and other channels in a given bacterium have similar gating properties to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump-leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144).

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)
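The distinction between a linear (V - VQ)-type driving force and the more general flux expressions the referee mentions can be illustrated with the Goldman-Hodgkin-Katz (GHK) current equation, one of the standard non-linear alternatives covered in texts such as Keener and Sneyd. The sketch below compares the two for a single ion species; the permeability, conductance and concentration values used with it are illustrative assumptions, not measurements from either side of this exchange.

```python
import math

FARADAY = 96485.332      # Faraday constant, C/mol
GAS_CONSTANT = 8.314462  # molar gas constant, J/(mol K)


def nernst_potential(c_in, c_out, z=1, temperature=298.15):
    """Reversal (Nernst) potential in volts, inside minus outside."""
    return GAS_CONSTANT * temperature / (z * FARADAY) * math.log(c_out / c_in)


def ohmic_current(v, g, c_in, c_out, z=1, temperature=298.15):
    """Linear, HH-style current I = g (V - E_Nernst); outward positive."""
    return g * (v - nernst_potential(c_in, c_out, z, temperature))


def ghk_current(v, permeability, c_in, c_out, z=1, temperature=298.15):
    """GHK current density, outward positive:
    I = P z F u (c_in - c_out e^-u) / (1 - e^-u), with u = zFV/(RT)."""
    u = z * FARADAY * v / (GAS_CONSTANT * temperature)
    if abs(u) < 1e-9:
        # Limit as V -> 0, where u / (1 - e^-u) -> 1
        return permeability * z * FARADAY * (c_in - c_out)
    return permeability * z * FARADAY * u * (c_in - c_out * math.exp(-u)) / (1.0 - math.exp(-u))
```

Both expressions vanish at the Nernst potential, so they agree on the reversal point; away from it the GHK current rectifies. With 200 mol/m³ inside and 5 mol/m³ outside, the outward GHK current 50 mV above the reversal potential is more than twice the magnitude of the inward current 50 mV below it, whereas the ohmic form gives equal and opposite currents.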

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood a model is needed that also takes into account the link between maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note is that the authors state that the Msc responds to stress-related voltage changes. I think this is an overstatement. Mscs respond to predominantly membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication https://www.pnas.org/doi/full/10.1073/pnas.1522185113). Authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating but sometimes they gate spontaneously in the presence of certain ions. Other publications cited don't really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT were a membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and their conclusions are inaccurate, in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this manuscript should be published in its current format. It should be revised in light of the previous literature, as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that, while this work studies E. coli, it references papers on other bacteria using ThT. For example, in lines 35-36 the authors state that bacteria (Bacillus subtilis in this case) in biofilms have recently been found to modulate membrane potential, citing the relevant literature from 2015. It is worth noting that the most recent paper (https://journals.asm.org/doi/10.1128/mbio.02220-23) found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and that the recent results are strictly spore-specific, but this should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells by our own group and by A. Prindle et al. (J.A.Blee et al, ‘Spatial propagation of electrical signal in circular biofilms’, Physical Review E, 2019, 100, 052401; J.A.Blee et al, ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001; A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to research on low-metabolism spores seems speculative.

      Reviewer #3:

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases the evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      1. An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      2. The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.
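      The exposure-variation test suggested above can be made quantitative. Fit a single-exponential decay constant to a control trace in which the underlying signal is not expected to change, then compare that constant between imaging regimes (for example, different exposure times or frame intervals). Below is a minimal Python sketch. It assumes equally weighted points, a known constant background and purely exponential bleaching. The function name `bleach_rate` is illustrative and is not part of the analysis in the manuscript.

```python
import math


def bleach_rate(times, intensities, background=0.0):
    """Estimate a bleaching constant k by least squares on ln(I - background) = ln(I0) - k*t.

    Assumes every intensity exceeds the background and that the decay is a
    single exponential; returns k in inverse units of `times`.
    """
    ys = [math.log(i - background) for i in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var          # slope of the log-intensity line
    return -slope              # a decaying signal gives a positive k
```

If the fitted rate scales with the light dose per frame, bleaching contributes to the observed kinetics. A rate indistinguishable from zero at the imaging settings actually used would support the statement that it is negligible.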

      3. It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately, the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (measurements in the eLife article and new potentiometry measurements).

      4. The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minority role in determining the membrane potential in E. coli, for the reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      5. Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves, and both exhibited membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that showed membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (to hyperpolarize) under stress conditions. Cells that showed membrane potential dynamics and later stained red were counted only at the end of the experiment. We believe that the wild type handles the light stress better than the ∆kch mutant, as measured with PI.

      6. Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae after they have been starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential, but reflects PI staining damaged/dead cells without bias. We think that propidium iodide is appropriate for this experiment. Propidium iodide is extensively used in the life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/), and no membrane potential-related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparison of small changes in absolute intensities is problematic in such fluorescence experiments. We mean that the shapes of the traces are similar and that they can be modelled using an HH model with similar parameters.
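      One way to attach a number to "similar shape", as requested above, has two steps. First, min-max normalize each trace to remove absolute intensity and offset. Then compute the root-mean-square difference between the normalized traces. Below is a minimal Python sketch, assuming both traces are sampled at the same time points. The metric and the function names are illustrative choices, not the comparison used in the manuscript.

```python
import math


def minmax_normalize(trace):
    """Rescale a trace to [0, 1] so that only its shape, not its absolute intensity, matters."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        raise ValueError("a flat trace cannot be normalized")
    return [(x - lo) / (hi - lo) for x in trace]


def shape_rmsd(a, b):
    """Root-mean-square difference between two min-max normalized, equally sampled traces."""
    if len(a) != len(b):
        raise ValueError("traces must share the same time points")
    na, nb = minmax_normalize(a), minmax_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)) / len(na))
```

A value of 0 means identical shapes. Reporting this value (or a correlation coefficient) for each pair of curves, together with its spread across replicates, would put "similar" and "dissimilar" on a common scale.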

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the second peak). The most interesting observation is that with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The time periods for these three episodes were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.

      The key point is the comparison of standard errors on the standard deviation.
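      The outlier-robust summary suggested by Reviewer #3 can be sketched in three parts. Tukey fences (1.5 × IQR) flag outliers, the median serves as a robust centre, and the standard error of the mean is reported alongside. Below is a minimal Python sketch using only the standard library (Python 3.8 or later, for `statistics.quantiles` and `statistics.fmean`). The 1.5 × IQR rule is a common convention, not a choice made in the manuscript.

```python
import statistics


def tukey_filter(values, k=1.5):
    """Keep only points inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


def summarize(values):
    """Sample size, mean, median and standard error of the mean of a list of peak times."""
    n = len(values)
    return {
        "n": n,
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sem": statistics.stdev(values) / n ** 0.5,
    }
```

For two groups (e.g. sparse versus clustered cells), a rank-based test such as the Mann-Whitney U test (`scipy.stats.mannwhitneyu`) would then indicate whether the difference in time-to-first-peak is significant without assuming normality.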

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We thank the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at all times; therefore, all parts of the biofilm received the same blue light intensity. The membrane potential dynamics were not influenced by cell density (see Fig 2C).

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (Supplementary Figure 4B). The fluorescence data were carefully obtained using the Imaris software: we created a curved geometry on each slice of the confocal stack and analyzed the surfaces of this curved plane along the z-axis.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
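      The position-resolved analysis requested above could take the following form:

      1. Bin cells by distance from the biofilm centre.
      2. Average the ThT trace within each bin.
      3. Take the time at which each binned trace first crosses a threshold.
      4. Regress arrival time on radius.

      Below is a minimal Python sketch. The threshold-crossing definition of "arrival" and the function names are illustrative assumptions. They are not the method of the manuscript or of the cited preprint.

```python
def arrival_time(times, intensities, threshold):
    """First time at which a (bin-averaged) trace reaches the threshold, or None if it never does."""
    for t, y in zip(times, intensities):
        if y >= threshold:
            return t
    return None


def front_slowness(radii, arrival_times):
    """Least-squares slope of arrival time against radius (time per unit distance).

    The front speed is the reciprocal of this slope; a slope of zero means all
    radii reached threshold simultaneously.
    """
    n = len(radii)
    mean_r = sum(radii) / n
    mean_t = sum(arrival_times) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(radii, arrival_times))
    var_r = sum((r - mean_r) ** 2 for r in radii)
    return cov / var_r
```

A slope near zero indicates a synchronous response with no propagating front. A positive slope indicates an outward-moving front with speed 1/slope, and a negative slope an inward-moving one.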

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      1. In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      2. In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.
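      The no-light control suggested in point 2 could set an objective noise floor. Estimate the baseline mean and standard deviation from an unstimulated trace, then count only excursions above a chosen multiple of that standard deviation. Below is a minimal Python sketch, assuming equally sampled traces with comparable baselines. The threshold factor k = 3 is a conventional but arbitrary choice, and the function names are hypothetical.

```python
import statistics


def noise_threshold(control, k=3.0):
    """Mean plus k standard deviations of a no-light control trace."""
    return statistics.fmean(control) + k * statistics.stdev(control)


def count_events(trace, threshold):
    """Count upward crossings of the threshold; each crossing is one candidate spike."""
    events = 0
    above = False
    for y in trace:
        if y > threshold and not above:
            events += 1            # entered the supra-threshold region
        above = y > threshold
    return events
```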

      3. The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. It should also be noted that while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images (for Fig S4A), the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the second peak in Fig S4B: the first peak, the quiescent period and the slow rise to the second peak are all evident.

    2. Author Response

      We would like to sincerely thank the referees and the editor for their time in considering our manuscript. The electrophysiology of bacteria is a fast-moving complex field and is proving contentious in places. We believe the peer review process of eLife provides an ideal mechanism to address the issues raised on our manuscript in an open and transparent manner. Hopefully we will encourage some more consensus in the field and help understand some of the inconsistencies in the current literature that are hampering progress.

      The editors stress the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude the membrane dye ThT is faulty, when in fact it is a problem with their simple battery model.
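      For readers outside the field, the "Nernstian" description rests on the Nernst equation, E = (RT/zF) ln(c_out/c_in). It gives the equilibrium potential of a single ion species from its concentration ratio across the membrane. Below is a minimal Python sketch. The temperature default and the example concentrations in the usage note are illustrative assumptions, not values from any of the articles discussed.

```python
import math

R = 8.314462618    # molar gas constant, J mol^-1 K^-1
F = 96485.33212    # Faraday constant, C mol^-1


def nernst_potential(c_out, c_in, z, temp_k=298.15):
    """Equilibrium potential (volts, inside relative to outside) for one ion of valence z."""
    return (R * temp_k) / (z * F) * math.log(c_out / c_in)
```

For example, `nernst_potential(5e-3, 0.2, 1)` gives about -0.095 V for a hypothetical K+ gradient of 5 mM outside and 200 mM inside. The relation is static: it has no time dependence, so any dynamics in a model built on it must come from the flux or gating terms added around it.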

      In terms of the previous microbiology literature, the assumption of no voltage-gated ion channels in E. coli suggested by referee 2 is a highly contentious niche ideology. The majority of gene databases for E. coli have a number of ion-channels annotated as voltage sensitive due to comparative genetics studies e.g. try the https://bacteria.ensembl.org/ database (the search terms ‘voltage-gated coli’ give 2521 hits for genes, similarly you could check www.uniprot.org or www.biocyc.org) and M.M.Kuo, Y.Saimi, C.Kung, ‘Gain of function mutation indicate that E. coli Kch form a functional K+ conduit in vivo’, EMBO Journal, 2003, 22, 16, 4049. Furthermore, recent microbiology reviews all agree that E. coli has a number of voltage-gated ion channels, e.g. S.D.Beagle, S.W.Lockless, ‘Unappreciated roles for K+ channels in bacterial physiology’, Trends in microbiology, 2021, 29, 10, 942-950. More emphatic experimental data is seen in spiking potentials that have been observed by many groups for E. coli, both directly using microelectrodes and indirectly using genetically expressed fluorophores, ‘Electrical spiking in bacterial biofilms’ E.Masi et al, Journal of the Royal Society Interface, 2015, 12, 102, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, J.M.Kralj, et al, Science, 2011, 333, 6040, 345 and ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120. The only mechanism currently known to cause spiking potentials in cells is due to positive feedback from voltage-gated ion channels (you need a mechanism to induce the oscillations). Indeed, people are starting to investigate the specific voltage-gated ion channels in E. coli and a role is emerging for calcium in addition to potassium e.g. ‘Genome-wide functional screen for calcium transients in E. coli identifies increased membrane potential adaptation to persistent DNA damage’, R.Luder, et al, J.Bacteriology, 2021, 203, 3, e00509.

      In terms of recent data from our own group, electrical impedance spectroscopy (EIS) experiments from E. coli indicate there are large conductivity changes associated with the Kch ion channels (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print). EIS experiments probe the electrical phenomena of bacterial biofilms directly and do not depend on fluorophores i.e. they can’t be affected by ThT.

      Attempts to disprove the use of ThT to measure hyperpolarisation phenomena in E. coli using fluorescence microscopy also seem doomed to failure based on comparative control experiments. A wide range of other cationic fluorophores show similar behaviour to ThT e.g. the potassium sensitive dye used in our eLife article. Thus the behaviour of ThT appears to be generic for a range of cationic dyes and it implies a simple physical mechanism i.e. the positively charged dyes enter cells at low potentials. The elaborate photobleaching mechanism postulated by referee 2 seems most unlikely and is unable to explain our data (see below). ThT is photostable and chemically well-defined and it is therefore used almost universally in fluorescence assays for amyloids.

      A challenge with trying to use flagellar motility to measure intracellular potentials in live bacteria, as per referee 2’s many publications, is that a clutch is known to occur with E. coli e.g. ‘Flagellar brake protein YcgR interacts with motor proteins MotA and FliG to regulate the flagellar rotation speed and direction’, Q.Han et al, Frontiers in Microbiology, 2023, 14. Thus bacteria with high membrane potentials can have low motility when their clutch is engaged. This makes sense, since otherwise bacterial motility would be enslaved to their membrane potentials, greatly restricting their ability to react to their environmental conditions. Without quantifying the dynamics of the clutch (e.g. the gene circuit) it seems challenging to deduce how the motor reacts to Nernstian potentials in vivo. As a result we are not convinced by any of the Pilizota group articles. The quantitative connection between motility and membrane potential is too tenuous.

      In conclusion, the articles questioning the use of ThT are scientifically flawed and based on a niche ideology that E. coli do not contain voltage-gated ion channels. The current work disproves the simple Nernstian battery (SNB) model expounded by Pilizota et al, unpersuasively represented in multiple publications by this one group in the literature (see below for critical synopses), and demonstrates that the SNB model needs to be replaced by a model that includes excitability (demonstrating hyperpolarization of the membrane potential).

      In the language of physics, a non-linear oscillator model is needed to explain spiking potentials in bacteria and the simple battery models presented by Pilizota et al do not have the required non-linearities to oscillate (‘Nonlinear dynamics and chaos’, Steve Strogatz, Westview Press, 2014). Such non-linear models are the foundation for describing eukaryotic electrophysiology, e.g. Hodgkin and Huxley’s Nobel prize winning research (1963), but also the vast majority of modern extensions (‘Mathematical physiology’, J.Keener, J.Sneyd, Springer, 2009, ‘Cellular biophysics and modelling: a primer on the computational biology of excitable cells’, G.C.Smith, 2019, CUP, ‘Dynamical systems in neuroscience: the geometry of excitability and bursting’, E.M.Izhikevich, 2006, MIT and ‘Neuronal dynamics: from single neurons to networks and models of cognition’, W.Gerstner et al, 2014, CUP). The Pilizota group is using modelling tools from the 1930s that quickly were shown to be inadequate to describe eukaryotic cellular electrophysiology and the same is true for bacterial electrophysiology (see the ground breaking work of A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 7576, 59 for the use of Hodgkin-Huxley models with bacterial biofilms). Below we describe a critical synopsis of the articles cited by referee 2 and we then directly answer the specific points all the referees raise.
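      The mathematical point made here can be illustrated with a short simulation: a stable, self-sustained oscillation (a limit cycle) requires a non-linear model with at least two dynamical variables. The sketch below uses the FitzHugh-Nagumo model, a standard two-variable reduction of Hodgkin-Huxley dynamics (not the model fitted in the manuscript). Alongside it runs a one-variable linear relaxation standing in for a battery-like description. The parameters a = 0.7, b = 0.8, ε = 0.08 are textbook values, and the forward-Euler scheme is chosen for brevity. The example illustrates the dynamics only; it does not by itself show which description applies to E. coli.

```python
def simulate_fhn(current, t_end=200.0, dt=0.01, a=0.7, b=0.8, eps=0.08):
    """Forward-Euler integration of the FitzHugh-Nagumo model; returns the voltage-like trace v."""
    v, w = -1.2, -0.625              # close to the resting state when current = 0
    trace = []
    for _ in range(int(round(t_end / dt))):
        dv = v - v ** 3 / 3 - w + current    # fast, cubic (non-linear) variable
        dw = eps * (v + a - b * w)           # slow recovery variable
        v, w = v + dt * dv, w + dt * dw
        trace.append(v)
    return trace


def simulate_linear(v0=1.0, v_eq=-1.0, tau=5.0, t_end=200.0, dt=0.01):
    """Forward-Euler integration of the linear relaxation dv/dt = -(v - v_eq)/tau."""
    v = v0
    trace = []
    for _ in range(int(round(t_end / dt))):
        v += dt * (-(v - v_eq) / tau)
        trace.append(v)
    return trace


def count_spikes(trace, level=0.0):
    """Count upward crossings of `level`."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev < level <= cur)
```

With no drive (`current = 0`) the FitzHugh-Nagumo system stays at a stable rest state, just as the linear model always relaxes monotonically to `v_eq`. With `current = 0.5` it fires repetitively.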

      Critical synopsis of the articles cited by referee 2:

      (1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model and unfortunately it is wrong when voltage-gated ion channels occur. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead, a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence, and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘work flow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      (2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis that uses Na+ instead of H+ to drive its flagellar motor. Its relevance to wild-type E. coli is not clear, due to the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range, with all the concomitant errors.

      (3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell, e.g. via the membrane potential. There are about 0.2 M of K+ compared with about 0.0000001 M (10^-7 M) of H+ in E. coli, a concentration ratio of roughly 2 × 10^6. So K+ is arguably of the order of a million times more important than H+ for the membrane potential, and thus for the electrophysiology!

      Equation (1) in the manuscript assumes no other ions are exchanged during the experiments other than H+. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around!

      In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      (4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication. In general, an important problem is being researched i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees

      Reviewer #1 (Public Review):

      Summary: Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:

      • The authors report original data.

      • For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.

      • The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.

      • The authors developed two rigorous models inspired by 1) Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) Fire-Diffuse-Fire model for the propagation of the electric signal.

      • Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ channel Kch: enhancing survival under photo-toxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:

      • Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and redox potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when stimulation with blue light begins, i.e., they are too fast to be caused by a build-up of pH changes. We are not sure what the referee means by redox potential, since it is an attribute of all chemicals that are able to donate/receive electrons. The electrical response to stress appears to be caused by ROS, since adding ROS scavengers removes the electrical response, i.e., pH plays at most a very minor role.

      • Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project about using the PROPS system in E. coli, as presented by J.M. Kralj et al., ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues, and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising: ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X. Jin et al., PNAS, 120, 3, e2208348120.

      • Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear whether the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work, and it is currently under review in a physics journal: ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V. Martorelli et al., 2024, https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
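
      The basic fire-diffuse-fire idea can be shown with a deliberately minimal sketch. Cells sit on a 1D line, the first cell is stimulated, and every cell that fires instantly releases a fixed amount of a diffusing signal (standing in here for released K+); a cell fires the first time the summed signal at its position crosses a threshold. Everything below is an assumption for illustration: one dimension, point release, free diffusion with no uptake or decay, and hypothetical dimensionless parameters. It is not the agent-based 3D model in the preprint cited above.

```python
import numpy as np

def fire_diffuse_fire_1d(n_cells=15, spacing=1.0, D=1.0, sigma=1.0,
                         threshold=0.1, dt=1e-3, t_max=5.0):
    """Return the firing time of each cell (np.inf if it never fires).

    Hypothetical toy model: cell 0 is stimulated at t = 0, and every cell
    that fires releases an amount `sigma` of a signal that diffuses with
    coefficient D. A cell fires the first time the summed signal at its
    position reaches `threshold`.
    """
    x = np.arange(n_cells) * spacing
    fire_time = np.full(n_cells, np.inf)
    fire_time[0] = 0.0

    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        fired = np.isfinite(fire_time)
        if fired.all():
            break
        t = step * dt
        elapsed = t - fire_time[fired]          # strictly positive
        dist = x[:, None] - x[fired][None, :]   # cell-to-source distances
        # Sum of 1D diffusion Green's functions, one per fired source
        kernel = (sigma / np.sqrt(4 * np.pi * D * elapsed)
                  * np.exp(-dist**2 / (4 * D * elapsed)))
        conc = kernel.sum(axis=1)
        newly_fired = ~fired & (conc >= threshold)
        fire_time[newly_fired] = t
    return fire_time
```

      Calling fire_diffuse_fire_1d() gives firing times that increase along the line, i.e. a travelling wavefront. Because the threshold is checked only at discrete time steps, neighbouring cells can occasionally fire in the same step, so the times are non-decreasing rather than strictly increasing. Raising the threshold above the peak signal a single neighbour can deliver stops the wave after the stimulated cell, the propagation-failure regime of such models.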

      • Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (‘Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior’, E. Akabuogu et al., Nano Letters, 2024, in press, https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446).

      • The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      • The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      • Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2 (Public Review):

      Summary of what the authors were trying to achieve:

      The authors thought they studied membrane potential dynamics in E. coli biofilms. They thought so because they were unaware that the dye they used to report membrane potential in E. coli has previously been shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:

      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT as a membrane potential dye in E. coli. The authors appear unaware of a publication from 2020 (https://www.sciencedirect.com/science/article/pii/S0006349519308793) that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore, I think the results of this paper are misinterpreted. The same publication presents a protocol for carefully calibrating any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of this results section is correct, for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also Figure 4 of the following work on light-induced damage to the electrochemical gradient of protons, https://www.sciencedirect.com/science/article/pii/S0006349519303923; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect, not the fluorophore.
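
      For reference, "Nernstian" behaviour of a membrane-permeant cation such as ThT means that, at equilibrium, its inside/outside concentration ratio is set by the membrane voltage alone through the Nernst equation. A minimal sketch, assuming a monovalent dye (z = +1), 25 °C, full equilibration, and no binding, efflux or photodamage; the function names and the -150 mV example voltage are illustrative and are not values from either study.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol

def nernst_potential_mV(c_in, c_out, z=1, temp_K=298.15):
    """Nernst potential (inside relative to outside) in mV for valence z."""
    return 1e3 * (R * temp_K / (z * F)) * math.log(c_out / c_in)

def equilibrium_accumulation(v_mV, z=1, temp_K=298.15):
    """Equilibrium [ion]_in / [ion]_out at membrane voltage v_mV.

    This is the Nernst equation solved for the concentration ratio; a
    negative (inside-negative) voltage concentrates cations inside.
    """
    return math.exp(-z * F * (v_mV * 1e-3) / (R * temp_K))
```

      At a hypothetical -150 mV the predicted equilibrium accumulation is roughly 340-fold, and passing that ratio back to nernst_potential_mV recovers the voltage.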

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B, as well as in video S1, is identical to that in Figure 3 of the above-referenced paper (https://www.sciencedirect.com/science/article/pii/S0006349519308793) and the corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is lowering the membrane potential while the ThT is equilibrating; see Figure S12 of that previous work. There, it is also demonstrated, by monitoring the speed of the bacterial flagellar motor, that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But in Figure S1 it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus the decay profile of the electrochemical gradient of protons (Figure 4, https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e., 0.2 M K+ versus 10^-7 M H+. We can comfortably neglect the influx of H+ in our experiments.
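
      Written out with the concentrations quoted in this reply (0.2 M K+, and 10^-7 M free H+, which corresponds to pH 7), the comparison is a simple concentration ratio:

```latex
\frac{[\mathrm{K^+}]}{[\mathrm{H^+}]} \approx \frac{0.2\ \mathrm{M}}{10^{-7}\ \mathrm{M}} = 2 \times 10^{6}
```

      The ratio of about two million is the order of magnitude behind the "million times" estimate.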

      I find Figure S1D interesting. There, the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware, this is the first reference for that, and it has not been cited: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from that TMRM reference, TMRM will only load the cells in potassium phosphate buffer with NaCl (and we often used EDTA to permeabilise the membrane). It is not fully clear (to me) whether TMRM was prepared in rich media here (Methods say so explicitly for ThT but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage done to the membrane by the light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction: the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses the electrochemical gradient of protons. This means that it is possible, depending on the type of pumps present in the cell, that CCCP collapses the electrochemical gradient of protons while the membrane potential is equal and opposite in sign to ΔpH. So using CCCP does not automatically mean that the membrane potential will collapse (e.g. in some mammalian cells this need not be the case, but in E. coli it is: https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also recently been found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentration the authors are using (100 µM) that should not affect the results. However, the authors then state that, because they observed in Figure S1E a fast efflux of ions in all cells and no spiking dynamics, this confirms that the observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E does not appear to show transients; instead, it shows that after 50 min of treatment with 100 µM CCCP, ThT shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential; it needs to be affected in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of a Nernstian dye.

      Our understanding of the literature is that CCCP poisons the whole metabolism of the bacterial cells. The ATP-driven K+ channels will stop functioning, and these are the dominant contributor to membrane potential.

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this section, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters, because the peak is reached faster in microclusters (as opposed to slower, since intuitively cells in these clusters could be shielded from light). However, shielding is only one possibility. The other is that the membrane has changed in composition and/or that the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which the authors mention later in the paper) is lower. Given that, according to Methods, these cells were left in a microfluidic chamber for 2 h to attach in growth media, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf), one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore, I do not think that distance is the explanation for what the authors observe.

      Shielding would produce the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seems tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as ion-channel-mediated wavefronts moving from the center of the biofilm. As above, cells in the middle can have a different membrane permeability from those at the periphery, and, probably even more importantly, no light profile is shown anywhere in the SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope, but no comments on how 'flat' the illumination is across the field of view. This is critical for assessing what they are observing in this results section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned (https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2), the collapse of fluorescence was not understood (other than that it is not membrane potential related). It was observed in Figure 5A, C, and F that at the point of the peak the electrochemical gradient of protons has already collapsed, and that at that point the cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapses there is a period of time during which ThT does not stain the cells, and then staining starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could simply be staining of positively charged cytoplasmic content, and its timing depends on the dynamics of cytoplasmic content leakage (we observed this to happen over 2 h in individual cells). ThT is also a non-specific amyloid dye, and the formation of protein clusters has been observed in starving E. coli cells (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see whether the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work, for the reasons highlighted before. Differential membrane permeability is a speculative phenomenon, not well supported by any evidence and with no clear molecular mechanism.

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First, I note that, given that I disagree that the data presented thus far 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress', as the authors state, it gets harder to comment on the rest of the results.

      In this results section, the authors look at the effect of Kch, a putative voltage-gated potassium channel, on the ThT profile in E. coli cells, and they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from one in which the TolC mutant has a different membrane permeability (and there is a publication suggesting that the latter is happening: https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects membrane permeability. I do note that in video S4 I seem to see more of what appear to be plasmolysed cells. With this mutant, the authors do not see the late ThT signal that appears in WT long after the initial peak has disappeared. It is not clear how long they waited for this, as from Figure S3C it could simply be that these dynamics are much slower, e.g. because Kch deletion changes membrane permeability.

      The finding that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner, i.e., driven by the membrane voltage.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (‘Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior’, E. Akabuogu et al., Nano Letters, 2024, in press, https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this section, the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable) and the extracellular environment with a K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 of https://www.sciencedirect.com/science/article/pii/S0006349519303923, and especially Figure S12 of https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf), but does not add anything new. I think the results presented here can be explained by previously published theory and do not indicate that the ion-channel-mediated membrane potential dynamics are a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin-Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles, as the authors propose. But also, the HH model was developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and the Nernst potential for the given ion. The conductivity in the model is given as gK·n^4 for potassium, gNa·m^3·h for sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72, and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44.)

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.
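
      To make the neuronal form described above concrete, the Hodgkin-Huxley ionic current is the sum of a sodium term gNa·m^3·h·(V - ENa), a potassium term gK·n^4·(V - EK) and a leak term gL·(V - EL). The sketch below uses the widely quoted squid-axon parameter set in the modern sign convention; these are textbook neuronal values assumed for illustration, and the sketch does not reproduce the bacterial model in the manuscript, which uses different channels (Q and Kch) and different parameters.

```python
# Textbook squid-axon parameters (modern convention: mV, mS/cm^2, uA/cm^2).
# Neuronal values used only for illustration, not bacterial ones.
G_NA, G_K, G_L = 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387

def hh_ionic_current(v, m, h, n):
    """Total ionic current density (outward positive) at voltage v in mV.

    m, h and n are the gating variables (each between 0 and 1).
    """
    i_na = G_NA * m**3 * h * (v - E_NA)   # transient sodium current
    i_k = G_K * n**4 * (v - E_K)          # delayed-rectifier potassium current
    i_l = G_L * (v - E_L)                 # voltage-independent leak
    return i_na + i_k + i_l
```

      Each term has the conductance-times-driving-force form (V - E) that the reviewer refers to. In a full model the gating variables m, h and n evolve according to their own voltage-dependent rate equations, which are omitted here.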

      Thus, in applying the model to describe bacterial electrophysiology, one should ensure that the near-equilibrium requirement holds (so that the (V-VQ) etc. terms in the authors' equation in Figure 5B hold), and that potassium and other channels in a given bacterium have gating properties similar to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think that a pump-leak model of bacterial electrophysiology needs to start with more general fluxes (for example, Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer, or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144).

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results showing that Msc channels affect the ThT profile are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood, a model is needed that also takes into account the link between the maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note: the authors state that the Mscs respond to stress-related voltage changes. I think this is an overstatement. Mscs respond predominantly to membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication: https://www.pnas.org/doi/full/10.1073/pnas.1522185113). The authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly gated by membrane tension. Some of the references state that the presence of external ions is important for tension-related gating, and that the channels sometimes gate spontaneously in the presence of certain ions. Other publications cited do not really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this results section, as it would only be applicable if ThT were a membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this work should be published in its current form. It should be revised in light of the previous literature, as discussed in detail above. I believe presenting it in its current form on the eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that, while this work studies E. coli, it references papers on other bacteria using ThT. For example, in lines 35-36 the authors state that bacteria (Bacillus subtilis in this case) in biofilms have recently been found to modulate membrane potential, citing the relevant literature from 2015. It is worth noting that the most recent paper (https://journals.asm.org/doi/10.1128/mbio.02220-23) found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and that the recent results are strictly spore-specific, but this should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells by our own group and by A. Prindle (‘Spatial propagation of electrical signal in circular biofilms’, J.A. Blee et al., Physical Review E, 2019, 100, 052401; ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, J.A. Blee et al., Physical Biology, 2020, 17, 2, 036001; ‘Ion channels enable electrical communication in bacterial communities’, A. Prindle et al., Nature, 2015, 527, 59-63). The connection to low-metabolism spore research seems speculative.

      Reviewer #3 (Public Review):

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. 
Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      (1) An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      (2) The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.

      (3) It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately, the fluorescent baseline is too weak to measure cleanly in this experiment. The collective hyperpolarization of all the bacteria at the same time appears to dominate the signal (measurements in the eLife article and new potentiometry measurements).

      (4) The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minor role in determining the membrane potential in E. coli, for the reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      (5) Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves, and both engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that showed membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were counted only at the end of the experiment. We believe that the wildtype handles the light stress better than the ∆kch mutant, as measured with PI.

      (6) Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka (2017, J Microbial Methods), using fluorescence microscopy, suggested that changes in membrane potential dynamics can bias propidium iodide (PI) viability measurements in the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae after 2 hours of oxygen starvation (via N2 gassing). They attempted to support their findings by using CCCP to stop the membrane potential dynamics, but did not show any images or plotted data for this confirmatory experiment. In our experiments, cell death was not forced on the cells by introducing an extra burden or by anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential but reflects genuinely damaged/dead cells. We think that PI is appropriate for this experiment: it is extensively used in the life sciences, and it has also been used in studies of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/) with no membrane potential-related bias reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.
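      As an illustration of the kind of metric the reviewer asks for (a hypothetical sketch, not the authors' analysis), two traces sampled at the same time points can be min-max normalized so that only their shapes are compared, and then summarized by a Pearson correlation and a root-mean-square difference. The function names below are invented for this example.

```python
import math
from statistics import mean


def min_max_normalize(trace):
    """Rescale a trace to [0, 1] so absolute intensity differences are ignored."""
    lo, hi = min(trace), max(trace)
    span = hi - lo
    if span == 0:
        raise ValueError("trace is constant; cannot normalize")
    return [(x - lo) / span for x in trace]


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("traces must be sampled at the same time points")
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def shape_similarity(a, b):
    """Return (correlation, RMS difference) of the normalized traces."""
    na, nb = min_max_normalize(a), min_max_normalize(b)
    rmsd = math.sqrt(mean([(x - y) ** 2 for x, y in zip(na, nb)]))
    return pearson_r(na, nb), rmsd
```

      A trace and an offset, rescaled copy of it give a correlation of 1 and an RMS difference of 0, whereas a time-reversed copy gives a negative correlation; a threshold on either number would then make "similar" an explicit, reportable criterion.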

      Comparing small changes in absolute intensity is problematic in fluorescence experiments of this kind. By "similar" we mean that the traces have similar shapes and can be modelled using an HH model with similar parameters.

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the second peak). The most interesting observation is that, with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The durations of these three phases were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.
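      The two options the reviewer describes can be sketched as follows (a hypothetical illustration using only the Python standard library, not the authors' pipeline): Tukey's interquartile-range fences as one possible outlier rule, the median as an outlier-robust summary, and a two-sided permutation test on the difference of medians between the sparse-cell and colony conditions. The fence constant k = 1.5 and the number of permutations are conventional choices, assumed here.

```python
import random
from statistics import median, quantiles


def tukey_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


def permutation_test_median(a, b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in medians."""
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)
```

      With a single late peak in an otherwise tight group of times-to-first-peak, the mean is pulled well above the median while the median is unaffected; the permutation test then asks whether the two conditions differ without assuming normally distributed peak times.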

      The key point is the comparison of standard errors on the standard deviation.

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We thank the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at all times, so all parts of the biofilm received the same blue light intensity. The membrane potential dynamics were not influenced by cell density (see Fig. 2C).

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (Fig. S4B). The fluorescence data were carefully obtained using the Imaris software: we created a curved geometry on each slice of the confocal stack and analyzed the surfaces of this curved plane along the z-axis.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.
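      One way to make the requested position-resolved analysis explicit is sketched below (hypothetical helper names; assumes per-cell traces on a common time grid with spacing dt and known cell positions, which are not specified in the text). Each cell's peak time is binned by its distance from the biofilm center, the median peak time per radial bin forms an arrival-time profile, and the least-squares slope of radius against arrival time gives an apparent wavefront speed. The sketch uses the global maximum of each trace; to target the first peak, each trace would first be restricted to a window around it.

```python
import math
from statistics import median


def time_of_peak(trace, dt):
    """Time (in units of dt) of the first index at which the trace is maximal."""
    i_max = max(range(len(trace)), key=trace.__getitem__)
    return i_max * dt


def radial_arrival_profile(cells, center, dt, bin_width):
    """cells: iterable of (x, y, trace). Returns sorted (bin-center radius, median peak time)."""
    cx, cy = center
    bins = {}
    for x, y, trace in cells:
        r = math.hypot(x - cx, y - cy)
        b = int(r // bin_width)
        bins.setdefault(b, []).append(time_of_peak(trace, dt))
    return sorted(((b + 0.5) * bin_width, median(ts)) for b, ts in bins.items())


def apparent_speed(profile):
    """Least-squares slope of radius vs. arrival time (distance per time unit).

    Positive: outward-moving front; negative: inward-moving front.
    """
    rs = [r for r, _ in profile]
    ts = [t for _, t in profile]
    n = len(profile)
    mt, mr = sum(ts) / n, sum(rs) / n
    var_t = sum((t - mt) ** 2 for t in ts)
    if var_t == 0:
        # All radial bins peak at the same time: no propagation is detectable.
        raise ValueError("no time shift between radial bins")
    return sum((t - mt) * (r - mr) for t, r in zip(ts, rs)) / var_t
```

      If the biofilm brightens uniformly, as the reviewer suspects, every bin peaks at the same time and no speed can be estimated, which is exactly the distinction this analysis is meant to expose.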

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      (1) In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      (2) In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      (3) The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. It should also be noted that, while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images for Fig S4A, the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the second peak in Fig S4B: the first peak, the quiescent period and the slow rise to the second peak are all evident.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Fig. 3C needs the "still" for the movie of control C. owczarzaki (in Movie S1).

      We have now added a WT control in this figure panel.

      (2) The elongated cell shape is seen infrequently in control cells, and I wonder whether these events are transient inactivation of coHpo or coWts in these cells. Perhaps the authors could comment on this in the discussion.

      This is an interesting possibility and we have now included it in our discussion (Lines 401-403).

      (3) Does C. owczarzaki normally aggregate or this is a lab-specific phenotype? For example, the slime mold Dictyostelium discoideum forms aggregates during its life cycle. Could some additional information about C. owczarzaki be added to the introduction?

      Unfortunately, little is known about Capsaspora “in the wild”, as it was isolated as an endosymbiont from a laboratory strain of snails. However, some related filasterians isolated from natural environments also show aggregative ability, indicating that aggregation is in fact a physiological process in this group of organisms. We have updated our introduction to include this fact (Lines 78-80).

      Reviewer #2 (Recommendations For The Authors):

      The studies on Hippo signalling in Capsaspora are currently limited to genetic experiments and analysis of Yki/YAP localisation. Biochemical evidence that Co Wts phosphorylates Co Yki/YAP on a conserved serine residue(s) would give important further evidence that this essential signalling step in the animal Hippo pathway is conserved in Capsaspora. However, such experiments require antibodies that detect specific phosphorylation events, which might not be available at present. Is mass spectrometry of the phospho-proteome a potential approach that could be employed to investigate this? The benefit of this approach is it would give information on other Hippo pathway proteins and could be used to probe signalling events under different culture conditions (e.g., aggregate, non-aggregate).

      In response to this recommendation, we attempted to detect phospho-coWts and phospho-coHpo using commercial antibodies against their mammalian homologs, in the hope of cross-species reactivity. However, we could not detect a signal by Western blot. Thus, better reagents or refinement of techniques beyond the scope of this article may be required to examine the phosphorylation of these Capsaspora proteins. A Capsaspora phosphoproteome analysis has been published (Sebé-Pedrós et al., 2016 Dev Cell), although phosphorylation of the conserved sites on coYki, coWts, and coHpo was not reported in this analysis, suggesting that more targeted approaches may be needed to examine phosphorylation of these core Hippo pathway components.

      The following statement that Wts LOF is stronger than Hpo LOF Capsaspora is consistent with overgrowth phenotypes in flies and mammals:

      "Interestingly, we found that coWts-/- cells were significantly more likely to show nuclear mScarlet-coYki localization than coHpo-/- cells (Figure 1D), which is consistent with Hpo/MST independent activity of Wts/LATS previously reported in Drosophila and mammals (Zheng et al., 2015)."

      However, the following statement describes a stronger phenotype in Hpo LOF Capsaspora than Wts LOF:

      "As contractile cells in the coHpo mutant background tended to show a more extreme elongated morphology than the coWts mutant, we focused on the coHpo mutant for further analysis."

      Does this mean that Hpo can regulate actomyosin contractility in both Wts/Yki-dependent and independent manners? A genetic experiment, similar to those that have been performed in Drosophila and mammals could help to address this, e.g., what is the phenotype of Hpo, Yki Capsaspora and Wts, Yki double mutant Capsaspora? Do they phenocopy Yki LOF Capsaspora and are the actomyosin phenotypes associated with Hpo and Wts mutant Capsaspora completely or partially suppressed? The authors indicate that generation of double mutant Capsaspora is not technically possible at present, however.

      Indeed, given available techniques, the generation of such double mutants is not currently possible. With this phenotype (aberrant cytoskeletal dynamics), it is hard to say what a “stronger” phenotype is, and which mutant has the “stronger” phenotype. We have edited this statement to try to reflect this point (Lines 208-209).

      Another outstanding question is whether the Hpo/Wts/Yki-related actomyosin phenotypes are linked to regulation of transcription by Yki, or are regulated non-transcriptionally. Indeed, a non-transcriptional role for Drosophila Yki in promoting actomyosin contractility has been reported (Fehon lab, Dev Cell, 2018). Generation of Scalloped/TEAD mutant Capsaspora would allow this question to be investigated. Alternatively, this could be explored using variant Co Yki transgenes, e.g., a Co Yki transgene that does not form a physical complex with Co Sd/TEAD and a Co Yki transgene that is targeted to the cell cortex.

      To address this point, we tested whether a conserved amino acid residue in coYki (F123) that is required for transcriptional activity of human YAP (in this case, F95) is required for the phenotypic effects of the coYki 4SA mutant. We found that, in contrast to expression of coYki 4SA, expression of a coYki 4SA F123A mutant showed no effect on cell or aggregate morphology. These new results, which support a requirement for transcriptional activity for coYki function, have now been added to Figure 7.

      Reviewer #3 (Recommendations For The Authors):

      Repetition from previous publication:

      (1) For example, compare the last sentences of the abstracts of both works. From Phillips et al. eLife 2022;0:e77598: "Taken together, these findings implicate an ancestral role for the Hippo pathway in cytoskeletal dynamics and multicellular morphogenesis predating the origin of animal multicellularity, which was co-opted during evolution to regulate cell proliferation".

      From this manuscript: "Together, these results implicate cytoskeletal regulation but not proliferation as an ancestral function of the Hippo pathway and uncover a novel role for Hippo signaling in regulating cell density in a proliferation-independent manner "

      Our two papers deal with different components of the Hippo pathway: Yorkie/YAP/coYki in Phillips et al. eLife 2022;0:e77598 and upstream kinases in the current paper. The fact that perturbing different components of the pathway leads to similar conclusions actually strengthens the overall conclusion. Nevertheless, to be more clear about the novelty of the current manuscript, we have now changed the current text from “Hippo pathway” to “Hippo kinase cascade”, to emphasize that the current analysis deals with kinases upstream of Yorkie/YAP/coYki (Lines 35, 368-371).

      (2) The authors claim that the change in localization of coYki in Hpo -/- and Wts -/-, in which coYki can now enter the nucleus, demonstrates that the nuclear regulation of Yki by the Hippo pathway is ancestral to animals. Nevertheless, the authors had already made this claim in their eLife 2022 publication, when they made a mutant version of Yki with the four conserved phosphorylation sites (Sebé-Pedrós 2012) mutated (Figure 5 A to F in Phillips et al. eLife 2022;0:e77598). In their words: "This regulation of coYki nuclear localization, along with the previous finding that coYki can induce the expression of Hippo pathway genes when expressed in Drosophila (Sebé-Pedrós et al., 2012), suggests that the function of coYki as a transcriptional regulator and Hippo pathway effector is conserved between Capsaspora and animals."

      I understand that the localization of Yki in the coHpo-/- and coWts-/- mutants is needed as part of the final proof that Hpo and Wts are the kinases that control Yki phosphorylation in C. owczarzaki, but it does not constitute a completely new message and should be written as such. Figure 1C of the current manuscript leads to the same conclusion as Figure 5 A to F in Phillips et al. eLife 2022;0:e77598.

      We think that demonstrating that Hippo and Warts orthologs specifically are responsible for regulation of coYki localization is a very important finding: Many unicellular organisms encode Hippo, Warts, and/or Yorkie’s transcriptional factor partner Sd, but not Yorkie. Our understanding is that in these earlier-branching unicellular organisms, the Hippo/Warts kinase module and Sd-like proteins functioned in distinct signaling modules. Thus Yorkie has the interesting property of “fusing” these two distinct signaling modules when it emerged. In this framework, it is interesting to show that this “fusion” occurred in Capsaspora, the most distant known relative of animals with a Yorkie ortholog, indicating that this “fusion” event is very ancient. Although fleshing out of this idea is beyond the scope of this manuscript and we plan to write about it elsewhere, we have modified our discussion to point out the importance that Hippo and Warts specifically are upstream regulators of coYki.

      In Drosophila, the genes transcriptionally regulated by Yki include positive regulators of the Hippo pathway, which in turn downregulate Yki.

      (1) The authors don't explain if these upstream regulators of the Hippo pathway are conserved in C. owczarzaki.

      We have now indicated the conservation of some upstream Hippo pathway components (Line 69-71).

      (2) It would also be important to know how active coYki is in the C. owczarzaki coHpo-/- and coWts-/- mutant lines relative to WT and to coYki 4SA, and how this affects the transcription and protein production of coYki downstream genes. I think some transcriptional and proteomic data would be informative, at least for genes related to the cytoskeleton.

      We have now performed RNA-seq on the coHpo and coWts mutants to address the concerns above (See Figure 8 and the final section of Results).

      Related to the above: among the downstream targets of coYki, the authors mentioned in their previous work (Phillips et al. eLife 2022;0:e77598) that β-integrins were upregulated in coYki -/-, suggesting that β-integrins could be behind the stronger cell-substrate attachment observed in the coYki-/- mutant. It would be important to investigate whether the integrin adhesome is now downregulated and how previous and new results relate to the stronger cell-substrate attachment in the coHpo-/- and coWts-/- lines. Previous results on coYki-/-, a mutant line of the same pathway, should be discussed in the context of these two new mutants.

      Two Capsaspora integrin beta genes were previously found to be upregulated in the coYki mutant (CAOG_05058 and CAOG_01283, from Phillips et al., 2022 eLife). In our coWts and coHpo mutant RNA-seq data, CAOG_05058 is upregulated in both coHpo and coWts mutants, whereas CAOG_01283 does not show significantly different expression in either mutant. Because the CAOG_05058 expression data go in the “opposite” direction from what might be expected (i.e. not “down regulated” as the reviewer predicts), and because we see no change in CAOG_01283 expression, these results are difficult to interpret. The role of integrins in Capsaspora Hippo pathway mutant phenotypes therefore remains an open question.

      Some cells from the coHpo-/- and coWts-/- mutant lines show higher attachment to the substrate, which results in an elongated shape while the cell detaches from the substrate. The authors describe this phenotype as a contractile behavior. This behavior could be caused by changes in cytoskeletal regulation, an increased number of microvilli, or a change in the distribution of microvilli.

      (1) In my opinion, this phenotype cannot be considered a behavior per se (the cells become round once they are free from the substrate, so the elongation is temporal and the contractile behavior is a consequence of this attachment to the substrate), so I would not say that the Hippo pathway controls a contractile behavior, as the authors state as one of the main conclusions of the manuscript.

      Many cell behaviors are known to depend on external conditions, such as substrates, growth factors, nutrients, chemokines, etc., and are therefore “temporal” by the reviewer’s criteria. We therefore feel that the phenotype we describe here can be considered a cell behavior.

      (2) On the other hand I think that further efforts on microscopy or immunocytochemistry could be performed in order to discern among the different causes; more microvilli? change in microvilli distribution? change in the acto-myosin cytoskeleton? Moreover these options are not mutually exclusive and very likely the explanation is multifactorial.

      (3) coWts-/- has a different phenotype at the periphery of the aggregates than coHpo-/-. The authors use stable transfected lines with NMM-Venus to visualize microvilli. It would be interesting that further experiments using this tool would be performed in order to visualize putative differences of the cell membrane at the periphery in the two mutant genotypes.

      We have now performed experiments examining filopodia in round vs elongated cells using the NMM-Venus marker, as well as differences in filopodial morphology within aggregates in the different genotypes. Our data and conclusions are included in our updated manuscript (Figure 3- figure supplement 1).

      The authors nicely inspect the consequences of the mutant lines coHpo-/- and coWts-/- in the formation of the aggregates. They find that the aggregates in these cases are more densely packed likely due to the higher attachment from microvilli, which they are able to revert by using myosin inhibitors.

      (1) As mentioned above, it would be interesting that further experiments are performed by using NMM-Venus transfection into the coHpo-/- and coWts-/-genotypes in order to visualize putative differences of the strength and distribution of the microvilli in the aggregates of these two mutant genotypes. These experiments would inform if more or less microvilli contacts are created in these lines and support a mechanical explanation of the denser aggregates in the mutant lines, as they now suggest in the discussion.

      We have now performed these experiments, and our data and conclusions are described in the updated manuscript (Figure 5- figure supplement 1).

      (2) On the other hand, myosin inhibition through blebbistatin increases the number of elongated cells in the mutant lines, demonstrating that myosin is necessary for the cells to resolve their substrate attachment and become round. In my view, it is confusing that myosin is needed for cells to become round again (WT phenotype) while, at the same time, myosin inhibition makes aggregates less dense (WT phenotype). Do the aggregates lose density because more elongated cells are now present in them? These results are confusing to me and I think they should be better discussed. Again, the above transfections of NMM-Venus into the coHpo-/- and coWts-/- genotypes could be informative.

      We have attempted to detect cells with an “elongated” morphology within WT and mutant aggregates but so far have been unable to visualize such cells. More advanced microscopy techniques at extended time scales may allow us to observe such things, but we believe such studies are beyond the scope of this manuscript.

      The authors do not connect and discuss their results with a very relevant study done in Drosophila, Xu J et al. Dev Cell. 2018; 46(3): 271-284.e5, where a transcriptionally independent role of Yki is characterized. In Drosophila, Yki has an important role in a positive feedback loop with myosin at the cortical part of the cell, which is especially relevant for cytoskeleton regulation.

      The results obtained by the authors in their previous study with coYki-/- indicated that coYki was important for proper actin dynamics and cell shape in C. owczarzaki. At that time, they did not ask whether this phenotype could be due to the loss of a possible cortical role of coYki; instead, they argued that the phenotype was caused by the loss of transcriptional regulation of coYki downstream genes, many of which were cytoskeleton-related.

      Because the cortical function of Yki is independent of regulation by Hpo and Wts, the authors could compare these genotypes with WT (where the cortical role of Yki should be the same) and with coYki-/- to investigate whether the cortical role of Yki is conserved in C. owczarzaki. In Drosophila, the cortical role of Yki has been suggested to control tension at the cell surface: Drosophila Yki at the cortex activates myosin II through the N-terminal part of the protein and establishes a positive feedback loop by downregulating the Hippo pathway, thereby increasing the amount of active DmYki in the nucleus. Xu et al. proposed this mechanism as the link between sensing cell tension at the surface and controlling tissue proliferation.

      In my opinion these are relevant results in the field that should be addressed in this study or at least well discussed. Actually, I think they could be a great opportunity for investigating if a putative cortex role of Yki is ancestral to its role linked to the Hippo pathway.

      We have now addressed this study in our manuscript- please see our response to reviewer #2’s last comment above.

      It would be informative to understand how stable expression through hygromycin selection is achieved in the transfection experiments. Is the recombinant plasmid integrated in the genome, or is it maintained as an episome?

      We believe that the plasmids stably integrate, as we never lose the fluorescent signal once it is established in a clonal line, even after extended culturing (>6 months). It may be worthwhile to definitively determine integration vs. episome in future studies.

      The authors do not speculate on or discuss how cell tension and cell proliferation differ between a unicellular organism and a multicellular tissue; I think this should be addressed, since the contexts are different.

      This is an interesting and important point. We think a proper discussion of this idea is beyond the scope of this manuscript, and we plan to discuss it in detail in an upcoming review article.

      Minor point. In line 83 of the manuscript, the study should cite other unicellular holozoans that have also been developed into experimentally tractable organisms, such as Monosiga brevicollis (Woznica A, Kumar A, et al. 2021 eLife 10:e70436) and Abeoforma whisleri (Faktorová, D., Nisbet, R.E.R., Fernández Robledo, J.A. et al. Nat Methods 17, 481-494 (2020)). I am sure the authors appreciate how much effort lies behind every non-model organism put forward as experimentally tractable, and this effort should be properly acknowledged.

      We agree, and we have now included these examples of non-model organism development in our manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank all the reviewers for taking the time to assess and provide valuable feedback on the manuscript. We believe these comments helped clarify the manuscript’s prose, and the suggestions on the functionality and aim of the toolbox were globally incorporated into the following updates of the toolbox. Particularly, we would like to point out some changes that will help all reviewers, independently of their individual comments, to understand the current state of the toolbox and some systematic changes that were made to the manuscript.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and expanding from the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes that choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes made throughout the manuscript, most particularly in Figure 3 and Table 1. A beneficial side-effect of this is a much simpler structure for MotorNet which ought to contribute positively toward its usability by researchers in the neuroscience community.

      We also refactored some terminology to be more in line with current computational neuroscience vocabulary:

      • The term “plant”, which comes from industrial engineering and is more niche in neuroscience, has been replaced by “effector”.

      • The term “task” has been replaced by “environment” to match the terminology of the Gymnasium toolbox, with which MotorNet is now compatible. Task objects essentially performed the same function as environment objects from the Gymnasium toolbox.

      • The term “controller” has been replaced by “policy” throughout, as this term is more general.

      • The term “motor command” is very specific to the motor control subfield of neuroscience, and therefore is replaced by “action”, which is more commonplace for this modelling component in computational neuroscience and machine learning.

      Reviewer #1 (Public Review):

      Summary:

      Codol et al. present a toolbox that allows simulating biomechanically realistic effectors and training Artificial Neural Networks (ANNs) to control them. The paper provides a detailed explanation of how the toolbox is structured and several examples that demonstrate its usefulness.

      Main comments:

      (1) The paper is well written and easy to follow. The schematics help in understanding how the toolbox works and the examples provide an idea of the results that the user can obtain.

      We thank the reviewer for this comment.

      (2) As I understand it, the main purpose of the paper should be to facilitate the usage of the toolbox. For this reason, I missed a more explicit link to the actual code. As I see it, researchers will read this paper to figure out whether they can use MotorNet to simulate their experiments, and how they should proceed if they decide to use it. I'd say the paper provides an answer to the first question and assures the reader that the toolbox is very easy to install and use. Maybe the authors could support this claim by adding "snippets" of code that show the key steps in building an actual example.

      This is an important point, which we also considered when writing this paper. We instead decided to focus on the first question, because the hands-on use of the toolbox is easier to illustrate with code or interactive (Jupyter) notebooks than in a publication format. We find the “how to proceed” aspect of the toolbox can be covered more easily and comprehensively using online, interactive tutorials. Additionally, this allows us to update these tutorials as the toolbox evolves across versions, whereas a scientific article is more difficult to update. Consequently, we explicitly avoided code snippets in the article itself. However, we appreciate that the paper would gain in clarity if this were stated explicitly early on. We have modified the paper to include a pointer to where tutorials can be found online. We added this to the last paragraph of the introduction section:

      “The interested reader may consult the full API documentation, including interactive tutorials on the toolbox website at https://motornet.org.”

      (3) The results provided in Figures 1, 4, 5 and 6 are useful, because they provide examples of the type of things one can do with the toolbox. I have a few comments that might help improving them:

      (a) The examples in Figures 1 and 5 seem a bit redundant (same effector, similar task). Maybe the authors could show an example with a different effector or task? (see point 4).

      The effectors in figures 1 and 5 are indeed very similar. However, the tasks in figures 1 and 5 present some important differences. The training procedure in figure 1 never includes any perturbations, while the one in figure 5 includes a wide range of perturbations of different magnitudes, timings and directions. The evaluation procedure of figure 1 includes center-out reaches with permanent viscous (proportional to velocity) external dynamics, while that of figure 5 includes fixed, transient, square-shaped perturbations orthogonal to the reach direction. Finally, the networks in figure 1 undergo a second training procedure after evaluation, while the networks in figure 5 do not. While we agree that some variation of effectors would be beneficial, we do show examples of a point-mass effector in figure 6. Overall, figure 5 shows a task that is quite different from that of figure 1 with a similar effector, while the opposite is true for figure 6. We have modified the text to clarify this for the reader by adding the following.

      End of 1st paragraph, section 2.4.

      “Therefore, the training protocol used for this task largely differed from section 2.1 in that the networks are exposed to a wide range of mechanical perturbations with varying characteristics.”

      1st paragraph of section 2.5

      […] this asymmetrical representation of PMDs during reaching movements did not occur when RNNs were trained to control an effector that lacked the geometrical properties of an arm such as illustrated in Figure 4c-e and section 2.1.
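      To make the difference between the two evaluation conditions concrete, the following is a minimal sketch, assuming a planar hand with velocity (vx, vy) and a reach direction given as an angle. The function names, the field gain and the pulse magnitude, onset and duration are hypothetical values chosen for illustration; they are not taken from the manuscript or from the MotorNet API.

```python
import math

# Hypothetical parameters for illustration only (not the manuscript's values).
CURL_B = 10.0     # N*s/m, strength of the viscous curl field
PULSE_MAG = 4.0   # N, magnitude of the square pulse

def curl_field_force(vx, vy, b=CURL_B):
    """Viscous curl field F = B @ v with B = [[0, b], [-b, 0]].

    The force is proportional to hand speed, always orthogonal to the hand
    velocity, and present throughout the movement (a "permanent" field).
    """
    return (b * vy, -b * vx)

def square_pulse_force(t, reach_angle, onset=0.1, duration=0.05, mag=PULSE_MAG):
    """Transient square pulse orthogonal to the reach direction.

    The force is constant during [onset, onset + duration) and zero otherwise,
    independent of hand velocity.
    """
    if onset <= t < onset + duration:
        # Unit vector rotated +90 degrees from the reach direction.
        return (-mag * math.sin(reach_angle), mag * math.cos(reach_angle))
    return (0.0, 0.0)
```

      The curl field depends on the state of the effector at every timestep, whereas the pulse depends only on time and on the target, which is why the two conditions probe different aspects of the learned policy.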

      (b) I missed a discussion on the relevance of the results shown in Figure 4. The moment arms are barely mentioned outside section 2.3. Are these results new? How can they help with motor control research?

      We thank the reviewer for this comment. This relates to a point from reviewer 2 indicating that the purpose of each section was sometimes difficult to grasp as one reads. Section 2.3 explains the biomechanical properties that the toolbox implements to improve realism of the effector. They are not new results in the sense that other toolboxes implement these features (though not in differentiable formats) and these properties of biological muscles are empirically well-established. However, they are important to understand what the toolbox provides, and consequently what constraints networks must accommodate to learn efficient control policies. An example of this is the results in figure 6, where a simple effector versus a more biomechanically complex effector will yield different neural representations.

      Regarding the manuscript itself, we agree that more clarity on the goal of every paragraph may improve the reader’s experience. Consequently, we made sure to specify such goals at the start of each section. In particular, we clarify the purpose of section 2.3 by adding several sentences on this at the end of the first paragraph of that section. We also now explicitly relate the purpose of section 2.3 to the results of figure 6, and reference figure 4 in that section.
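      As an illustration of why moment arms matter, here is a minimal sketch of how a moment arm changes with joint angle. It assumes a deliberately simplified geometry (a single hinge joint, a straight muscle path from an origin at distance a on the proximal segment to an insertion at distance b on the distal segment, no wrapping); this is not the arm26 implementation. The moment arm is taken as the derivative of musculotendon length with respect to joint angle (some texts use the opposite sign), and the torque a muscle produces is its force times this angle-dependent moment arm.

```python
import math

def muscle_length(theta, a, b):
    """Origin-to-insertion distance for a straight muscle spanning one joint.

    Assumed geometry: origin at distance `a` from the joint, insertion at
    distance `b`, and `theta` the angle between the two segments (law of cosines).
    """
    return math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(theta))

def moment_arm(theta, a, b):
    """Moment arm r(theta) = dL/dtheta = a * b * sin(theta) / L(theta)."""
    return a * b * math.sin(theta) / muscle_length(theta, a, b)

# The same muscle force yields different joint torques at different angles.
for deg in (30, 90, 150):
    r = moment_arm(math.radians(deg), 0.05, 0.2)
    print(f"{deg:>3} deg: moment arm = {r:.4f} m, torque for 100 N = {100 * r:.2f} N*m")
```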

      (c) The results in Figure 6 are important, since one key asset of ANNs is that they provide access to the activity of the whole population of units that produces a given behavior. For this reason, I think it would be interesting to show the actual "empirical observations" that the results shown in Fig. 6 are replicating, hence allowing a direct comparison between the results obtained for biological and simulated neurons.

      These empirical observations are available from previous electrophysiological and modelling work. Particularly, polar histograms across reaching directions like panel C are displayed in figures 2 and 3 of Scott, Gribble, Graham, Cabel (2001, Nature). Colormaps of modelled unit activity across time and reaching directions like panel F are also displayed in figure 2 of Lillicrap, Scott (2013, Neuron). Electrophysiological recordings of M1 neurons during a similar task in non-human primates can also be seen on “Preserved neural population dynamics across animals performing similar behaviour” figure 2 B (https://doi.org/10.1101/2022.09.26.509498) and “Nonlinear manifolds underlie neural population activity during behaviour” figure 2 B as well (https://doi.org/10.1101/2023.07.18.549575). Note that these two pre-prints use the same dataset.

      We have added these citations to the text and made it explicit that they contain visualizations of similar modelling and empirical data for comparison:

      “This heterogeneous set of responses matches empirical observations in non-human primate primary motor cortex recordings (Churchland & Shenoy, 2007; Michaels et al., 2016) and replicates similar visualizations from previously published work (Fortunato et al., 2023; Lillicrap & Scott, 2013; Safaie et al., 2023).”
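      For readers less familiar with polar histograms of preferred directions, the sketch below shows one standard way to obtain them: fit each unit's activity across reach directions with a cosine-tuning model r = b0 + b1 cos(θ) + b2 sin(θ), take the preferred direction as atan2(b2, b1), then bin preferred directions across units. This is an illustrative assumption about the general approach, not necessarily the exact procedure behind figure 6; it also assumes evenly spaced reach directions, for which the least-squares coefficients reduce to the discrete Fourier sums used here.

```python
import math

def preferred_direction(directions, rates):
    """Cosine-tuning fit r = b0 + b1*cos(theta) + b2*sin(theta); returns PD in [0, 2*pi).

    Assumes `directions` are evenly spaced around the circle (e.g. 8 centre-out
    targets), so the least-squares b1 and b2 equal these Fourier sums.
    """
    n = len(directions)
    b1 = 2.0 / n * sum(r * math.cos(d) for d, r in zip(directions, rates))
    b2 = 2.0 / n * sum(r * math.sin(d) for d, r in zip(directions, rates))
    return math.atan2(b2, b1) % (2 * math.pi)

def pd_histogram(pds, n_bins=8):
    """Count preferred directions per angular bin (the data for a polar histogram)."""
    width = 2 * math.pi / n_bins
    counts = [0] * n_bins
    for pd in pds:
        counts[int(pd // width) % n_bins] += 1
    return counts
```

      An asymmetric (bimodal) histogram across many units corresponds to the kind of non-uniform preferred-direction distribution discussed in section 2.5.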

      (4) All examples in the paper use the arm26 plant as effector. Although the authors say that "users can easily declare their own custom-made effector and task objects if desired by subclassing the base Plant and Task class, respectively", this does not sound straightforward. Table 1 does not really clarify how to do it. Maybe an example that shows the actual code (see point 2) that creates a new plant (e.g. the 3-joint arm in Figure 7) would be useful.

      Subclassing is a general Python feature rather than a MotorNet-specific process, as Python is an object-oriented language. Therefore, there are many general Python tutorials on subclassing that would be beneficial for that purpose. We have amended the main text to make this clearer to the reader.

      Subclassing a MotorNet object specifically requires overriding some methods of the base MotorNet classes (e.g., the Effector or Environment classes, which correspond to the original Plant and Task objects, respectively). Since we decided (as mentioned above) not to include code in the main text, we added tutorials to the online documentation, including dedicated tutorials on subclassing MotorNet classes. For instance, this tutorial shows how to subclass Environment classes:

      https://colab.research.google.com/github/OlivierCodol/MotorNet/blob/master/examples/3-environments.ipynb
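      The general Python pattern is sketched below. The class and method names (BaseEnvironment, CentreOutEnvironment, sample_target) are hypothetical stand-ins, not MotorNet's actual classes or methods; the point is only that a subclass inherits everything from its parent and overrides the single method whose behaviour must change.

```python
import math
import random

class BaseEnvironment:
    """Hypothetical stand-in for a toolbox base class (not MotorNet's API)."""

    def __init__(self, n_targets=8, radius=0.1):
        self.n_targets = n_targets
        self.radius = radius

    def sample_target(self, rng):
        # Default behaviour: a random target in a square workspace.
        return (rng.uniform(-self.radius, self.radius),
                rng.uniform(-self.radius, self.radius))

class CentreOutEnvironment(BaseEnvironment):
    """Inherits __init__ unchanged and overrides only target sampling."""

    def sample_target(self, rng):
        # Targets on a circle, as in a centre-out reaching task.
        k = rng.randrange(self.n_targets)
        angle = 2 * math.pi * k / self.n_targets
        return (self.radius * math.cos(angle), self.radius * math.sin(angle))
```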

      (5) One potential limitation of the toolbox is that it is based on TensorFlow, while the field of Computational Neuroscience seems to be, or at least that's my impression, transitioning to PyTorch. How easy would it be to translate MotorNet to PyTorch? Maybe the authors could comment on this in the discussion.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and expanding from the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes that choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes made throughout the manuscript, notably in Figure 3 and Table 1.

      (6) Supervised learning (SL) is widely used in Systems Neuroscience, especially because it is faster than reinforcement learning (RL). Thus, providing the possibility of training the ANNs with SL is an important asset of the toolbox. However, SL is not always ideal, especially when the optimal strategy is not known or when there are different alternative strategies and we want to know which one is preferred by the subject. For instance, would it be possible to implement a setup in which the ANN has to choose between 2 different paths to reach a target? (e.g. Kaufman et al. 2015 eLife). In such a scenario, RL seems to be a more natural option. Would it be easy to extend MotorNet so that it allows training with RL? Maybe the authors could comment on this in the discussion.

      The new implementation of MotorNet that relies on PyTorch is already standardized to use an API that is compatible with Gymnasium. Gymnasium is a standard and popular interfacing toolbox used to link RL agents to environments. It is very well-documented and widely used, which will ensure that users who wish to employ RL to control MotorNet environments will be able to do so relatively effortlessly. We have added this point to accurately reflect the updated implementation, so users are aware that it is now a feature of the toolbox (new section 3.2.4.).
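      For readers unfamiliar with Gymnasium, the interface contract is sketched below: reset(seed=..., options=...) returns (observation, info), and step(action) returns (observation, reward, terminated, truncated, info). The point-mass environment itself is hypothetical and is not a MotorNet class; a real implementation would subclass gymnasium.Env and declare observation_space and action_space, which are omitted here so the sketch runs without dependencies.

```python
import random

class PointMassEnv:
    """Hypothetical 2D point-mass reaching environment following Gymnasium's
    reset/step signatures (a real one would subclass gymnasium.Env)."""

    def __init__(self, dt=0.01, mass=1.0, max_steps=100):
        self.dt, self.mass, self.max_steps = dt, mass, max_steps

    def reset(self, seed=None, options=None):
        self.rng = random.Random(seed)
        self.pos, self.vel = [0.0, 0.0], [0.0, 0.0]
        self.goal = [self.rng.uniform(-0.1, 0.1), self.rng.uniform(-0.1, 0.1)]
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        # action = 2D force; semi-implicit Euler integration of F = m * a.
        for i in range(2):
            self.vel[i] += self.dt * action[i] / self.mass
            self.pos[i] += self.dt * self.vel[i]
        self.t += 1
        dist = sum((p - g) ** 2 for p, g in zip(self.pos, self.goal)) ** 0.5
        reward = -dist                      # dense reward an RL agent could maximize
        terminated = dist < 0.005           # target reached
        truncated = self.t >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return self.pos + self.vel + self.goal
```

      Any RL library that consumes this contract can drive the environment in the usual loop: call reset once, then call step with the agent's action until terminated or truncated is True.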

      Impact:

      MotorNet aims at simplifying the process of simulating complex experimental setups to rapidly test hypotheses about how the brain produces a specific movement. By providing an end-to-end pipeline to train ANNs on the simulated setup, it can greatly help guide experimenters to decide where to focus their experimental efforts.

      Additional context:

      As the main result is a toolbox, the paper is complemented by a GitHub repository and a documentation webpage. Both the repository and the webpage are well organized and easy to navigate. The webpage walks the user through the installation of the toolbox and the building of the effectors and the ANNs.

      Reviewer #2 (Public Review):

      MotorNet aims to provide a unified interface where the trained RNN controller exists within the same TensorFlow environment as the end effectors being controlled. This architecture provides a much simpler interface for the researcher to develop and iterate through computational hypotheses. In addition, the authors have built a set of biomechanically realistic end effectors (e.g., a two-joint arm model with realistic muscles) within TensorFlow that are fully differentiable.

      MotorNet will prove a highly useful starting point for researchers interested in exploring the challenges of controlling movement with realistic muscle and joint dynamics. The architecture features a conveniently modular design, and the inclusion of simpler arm models provides an approachable learning curve. Other state-of-the-art simulation engines offer realistic models of muscles and multi-joint arms and afford more complex object manipulation and contact dynamics than MotorNet. However, MotorNet's approach allows for direct optimization of the controller network via gradient descent rather than reinforcement learning, which is a compromise currently required when using other simulation engines (as these engines' code cannot be differentiated through).

      The paper could be reorganized to provide clearer signposts as to what role each section plays (e.g., that the explanation of the moment arms of different joint models serves to illustrate the complexity of realistic biomechanics, rather than being a novel discovery/exposition of this manuscript). Also, if possible, it would be valuable if the authors could provide more insight into whether gradient descent finds qualitatively different solutions from RL or other non-gradient-based methods. This would strengthen the argument that a fully differentiable plant is useful beyond reducing training time / computational power required (although this is a sufficiently important rationale per se).

      We thank the reviewer for these comments. We agree that more clarity on the section goals may improve the reader’s experience and ensured this is the case throughout the manuscript. In particular, we added the following to the first paragraph of section 2.3, for which an explicit goal was most clearly missing:

      “In this section we illustrate some of these biomechanical properties displayed by MotorNet effectors using specific examples. These properties are well-characterised in the biology and are often implemented in realistic biomechanical simulation software.”

      Regarding the potential difference in solutions obtained from reinforcement or supervised learning, addressing this conclusively would represent a non-trivial amount of work and so may be beyond the scope of the current article. We do appreciate, however, that in some situations RL may be a more fitting approach to a given task design. In relation to this point, we now specify in the discussion that the new API can interface with reinforcement learning toolboxes, for those who may want to pursue this type of policy training approach when appropriate (new section 3.2.4.).
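      What "direct optimization via gradient descent" through a differentiable plant means can be sketched on a toy problem. The example below is a hypothetical one-dimensional point mass driven by a sequence of forces u_k, with loss (x_N - target)^2. The gradient with respect to every u_k is obtained by backpropagating through the simulated dynamics (written out by hand here; MotorNet relies on the automatic differentiation of its deep learning framework instead) and then used for plain gradient descent. A simulator that cannot be differentiated through would not expose this gradient, which is why RL or other gradient-free methods are needed there.

```python
def simulate(u, dt=0.01, m=1.0):
    """1D point mass, semi-implicit Euler: v += dt*u/m, then x += dt*v."""
    x, v = 0.0, 0.0
    for uk in u:
        v = v + dt * uk / m
        x = x + dt * v
    return x

def loss_and_grad(u, target, dt=0.01, m=1.0):
    """Loss (x_N - target)^2 and its gradient w.r.t. each u_k via a hand-written reverse pass."""
    x = simulate(u, dt, m)
    loss = (x - target) ** 2
    gx = 2.0 * (x - target)   # dL/dx, unchanged going backward since dx_{k+1}/dx_k = 1
    gv = 0.0                  # dL/dv_N: final velocity does not enter the loss
    grad = [0.0] * len(u)
    for k in reversed(range(len(u))):
        gv = gv + dt * gx     # dL/dv_k = dL/dv_{k+1} + dt * dL/dx_{k+1}
        grad[k] = gv * dt / m # u_k enters only through v_{k+1} = v_k + dt*u_k/m
    return loss, grad

def fit(target, n_steps=50, lr=1000.0, iters=20):
    """Plain gradient descent on the whole force sequence."""
    u = [0.0] * n_steps
    for _ in range(iters):
        _, g = loss_and_grad(u, target)
        u = [uk - lr * gk for uk, gk in zip(u, g)]
    return u
```

      The learning rate is tuned to this toy problem only; in a realistic setting the controller is a network whose parameters receive these gradients, and an off-the-shelf optimizer is used.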

      Reviewer #3 (Public Review):

      Artificial neural networks have developed into a new research tool across various disciplines of neuroscience. However, specifically for studying neural control of movement it was extremely difficult to train those models, as they require not only simulating the neural network, but also the body parts one is interested in studying. The authors provide a solution to this problem which is built upon one of the main software packages used for deep learning (Tensorflow). This allows them to make use of state-of-the-art tools for training neural networks.

      They show that their toolbox is able to (re-)produce several commonly studied experiments e.g., planar reaching with and without loads. The toolbox is described in sufficient detail to get an overview of the functionality and the current state of what can be done with it. Although the authors state that only a few lines of code can reproduce such an experiment, they unfortunately don't provide any source code to reproduce their results (nor is it given in the respective repository).

      The possibility of adding code snippets to the article is something we originally considered, and it aligns with comment two from reviewer one (see above). We hope our response to that comment provides a good overview of the motivation behind our choice not to add code to the article.

      The modularity of the presented toolbox makes it easy to exchange or modify single parts of an experiment e.g., the task or the neural network used as a controller. Together with the open-source nature of the toolbox, this will facilitate sharing and reproducibility across research labs.

      I can see how this paper can enable a whole set of new studies on neural control of movement and accelerate the turnover time for new ideas or hypotheses, as stated in the first paragraph of the Discussion section. Having such a low effort to run computational experiments will be definitely beneficial for the field of neural control of movement.

      We thank the reviewer for these comments.

    1. Author Response

      This important work presents a new methodology for the statistical analysis of fiber photometry data, improving statistical power while avoiding the bias inherent in the choices that are necessarily made when summarizing photometry data. The reanalysis of two recent photometry data sets, the simulations, and the mathematical detail provide convincing evidence for the utility of the method and the main conclusions; however, the discussion of the re-analyzed data is incomplete and would be improved by a deeper consideration of the limitations of the original data. In addition, consideration of other data sets and photometry methodologies, including non-linear analysis tools, as well as a discussion of the importance of data normalization, are needed.

      Thank you for the thorough and positive review of our work! We will incorporate this feedback to strengthen the manuscript. Specifically, we plan to revise the Discussion section to include a deeper consideration of the limitations of the original data, a description of the capacities of our method for conducting non-linear analyses, and the role data normalization plays in applicability of our tool.

      Reviewer 1:

      Strengths:

      The framework the authors present is solid and well-explained. By reanalyzing formerly published data, the authors also further increase the significance of the proposed tool opening new avenues for reinterpreting already collected data.

      Weaknesses:

      However, this also leads to several questions. The normalization method employed for raw fiber photometry data differs from lab to lab. This poses a significant challenge to applying a single analysis tool.

      Thank you for the positive feedback; we will address your comments in our revision. We agree that any data pre-processing steps will have downstream consequences on the statistical inference from our method. Note, though, that this would also be the case with standard analysis approaches (e.g., t-tests, correlations) applied to summary measures like AUCs. For that reason, we do not believe that variability in pre-processing is an impediment to widespread adoption of a standard analysis procedure. Rather, we argue that the sensitivity of analysis results to pre-processing choices underscores the need for establishing statistical techniques that reduce the need for pre-processing and properly account for structure in the data arising from experimental designs. The reviewer raises an excellent point, and we can further elaborate on how our methods actually reduce the need for such pre-processing steps. Indeed, our method provides smooth estimation results along the functional domain (i.e., across trial timepoints), has the ability to adjust for between-trial and between-animal heterogeneity, and provides a valid statistical inference framework that quantifies the resulting uncertainty. For example, session-to-session variability in signal magnitudes or dynamics could be accounted for, at least in part, through the inclusion of session-level random effects. This heterogeneity would then influence the width of the confidence intervals. This stands in contrast to “sweeping it under the rug” with a pre-processing step that may have an unknown impact on the final statistical inferences. Similarly, the level of smoothing is at least in part selected as a function of the data, and again is accounted for directly in the equations used to construct confidence intervals.

      In sum, our method provides both a tool to account for challenges in the data and a systematic framework to quantify the additional uncertainty that accompanies accounting for those data characteristics.

      Does the method that the authors propose work equally well when the data are normalized as a running-average dF/F, as described in the cited papers? For example, trace smoothing using running averages (Jeong et al. 2022) may in itself lead to pattern dilution. The same question applies if the z-score is calculated based on various responses or even baselines.

      This is an important question given how common this practice is in the field. Briefly, application of pre-processing steps will change the interpretation of the results from our analysis method. For example, if one subtracts off a pre-trial baseline average from each trial timepoint, then the “definition of 0”, and the interpretation of coefficients and their statistical significance, changes. Similarly, if one scales the signal (e.g., divides the signal magnitude by a trial- or animal-specific baseline), then this changes the interpretation of the FLMM regression coefficients to be in terms of an animal-specific signal unit as opposed to a raw dF/F. This is, however, not specific to our technique, and pre-processing would have a similar influence on, for example, linear regression (and thus t-tests, ANOVAs and Pearson correlations) applied to summary measures. We agree with the reviewer that explicitly discussing this point will strengthen the paper.

      While it is difficult to make general claims about the anticipated performance of the method under all the potential pre-processing steps taken in the field, we believe that most common pre-processing strategies will not negatively influence the method’s performance or validity; they would, instead, change the interpretation of the results. We are releasing a series of vignettes to guide analysts through using our method and, to address your comment, we will add a section on interpretation after pre-processing.
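      The point that pre-processing changes interpretation rather than validity can be illustrated with the simplest case. The sketch below fits an ordinary least-squares regression of a signal value at one fixed trial timepoint on trial number, using hypothetical numbers, before and after two common pre-processing steps: subtracting a constant baseline shifts only the intercept (the "definition of 0"), and dividing by a constant rescales the slope into new units, while the t-statistic is unchanged in both cases. This is a simplified analogue of the reasoning in the response above, not the FLMM estimator itself.

```python
import math

def ols(x, y):
    """Simple linear regression y = b0 + b1*x; returns (b0, b1, t-statistic of b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, b1 / se_b1

# Hypothetical signal at one trial timepoint across six trials.
trials = [0, 1, 2, 3, 4, 5]
signal = [1.0, 1.4, 1.1, 1.9, 2.2, 2.0]

raw = ols(trials, signal)
baseline_subtracted = ols(trials, [s - 0.8 for s in signal])  # intercept shifts by -0.8
scaled = ols(trials, [s / 2.0 for s in signal])               # slope halves, units change
```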

      How reliable is the method if the data are non-stationary and the baselines undergo major changes between separate trials?

      This is an excellent question. We believe the statistical inferences will be valid and will properly quantify the uncertainty from non-stationarities, since our framework does not impose stationarity assumptions on the underlying process. It is worth mentioning that non-stationarity and high trial-to-trial variability may increase variance estimates if the model does not include a rich enough set of covariates to capture the source of the heterogeneity across trial baselines. However, this is a feature of our framework rather than a bug, as it properly conveys to the analyst that high unaccounted-for variability in the signal may result in high model uncertainty. Finally, mixed effects modeling provides a transparent, statistically reasonable, and flexible approach to account for between-session and between-trial variability, a type of non-stationarity. We agree with the reviewer that this should be more explicitly discussed in the paper, and will do so.

      Finally, what is the rationale for not using non-linear analysis methods? Following the paper's logic, non-linear analysis can capture more information that is diluted by linear methods.

      Functional data analysis assumes that the function varies smoothly along the functional domain (i.e., across trial timepoints). It is a type of non-linear modeling technique over the functional domain since we do not assume a linear model (straight line). Therefore, our functional data analysis approach is able to capture more information that is diluted by linear models. While the basic form of our model assumes a linear change in the signal at a fixed trial timepoint, across trials/sessions, our package allows one to easily model changes with non-linear functions of covariates using splines or other basis functions. One must consider, however, the tradeoff between flexibility and interpretability when specifying potentially complex models.
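      The idea of modeling a non-linear covariate effect with basis functions can be sketched generically. Below, a covariate x (e.g., trial number) is expanded into a truncated-power linear spline basis [1, x, (x - k1)+, (x - k2)+, ...], and the expanded columns are fit by ordinary least squares, which yields a piecewise-linear effect. The knot locations and data are hypothetical, the solver uses the normal equations for brevity (not recommended for ill-conditioned problems), and this sketch does not reproduce the fastFMM interface; it only shows that a spline effect enters a regression as additional ordinary covariates.

```python
def spline_basis(x, knots):
    """Truncated-power linear spline basis: [1, x, (x - k1)+, (x - k2)+, ...]."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]

def lstsq(rows, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [arv - f * acv for arv, acv in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def rss(rows, y, beta):
    """Residual sum of squares of a fitted linear model."""
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

# Hypothetical curved covariate effect, fit with and without spline terms.
xs = [i / 10 for i in range(21)]
ys = [x * x for x in xs]
linear_rows = [spline_basis(x, []) for x in xs]
spline_rows = [spline_basis(x, [0.5, 1.0, 1.5]) for x in xs]
```

      Adding knots increases flexibility at the cost of more coefficients to interpret, which is the tradeoff mentioned above.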

      Reviewer 2

      Strengths:

      The open-source R package, which implements this framework for photometry data with a syntax similar to that of the lme4 package, enhances accessibility and usage by other researchers. Moreover, the decreased fitting time of the model compared with a similar package on simulated data means it has the potential to be adopted more easily.

      The reanalysis of two studies using summary statistics on photometry data (Jeong et al., 2022; Coddington et al., 2023) highlights how trial-by-trial analysis at each time-point on the trial can reveal information obscured by averaging across trials. Furthermore, this work also exemplifies how session and subject variability can lead to opposite conclusions when not considered.

      Thank you for the positive assessment of our work!

      Weaknesses:

      Although this work has reanalyzed previous work that used summary statistics, it does not compare with other studies that use trial-by-trial photometry data across time-points in a trial.

      As described by the authors, fitting pointwise linear mixed models and performing t-tests and Benjamini-Hochberg correction, as performed in Lee et al. (2019), has some caveats. Using joint confidence intervals has the potential to improve statistical robustness; however, this is not directly shown with temporal data in this work. Furthermore, it is unclear how FLMM differs from the pointwise linear mixed modeling used in Lee et al. (2019).

      We agree with the reviewers that providing more detail about the drawbacks of the approach applied in Lee et al., 2019 will strengthen the paper. We will add an example analysis applying the method proposed by Lee et al., 2019 to show how the set of timepoints at which coefficient estimates reach statistical significance can vary dramatically depending on the rate at which the data are subsampled, a highly undesirable property of this strategy. Our approach is robust to this, while still providing a multiple comparisons correction through the joint confidence intervals.
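      The mechanism behind this sampling-rate dependence can be seen in the Benjamini-Hochberg step-up procedure itself: the threshold for the k-th smallest p-value is k·q/m, so it depends on how many timepoints m are tested. The sketch below implements the procedure and applies it to a toy vector of per-timepoint p-values (hypothetical numbers, not from any dataset) at the full rate and after keeping every other timepoint. At the full rate, the timepoint with p = 0.02 is not declared significant; after subsampling, it is.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank                    # largest rank passing its threshold
    return sorted(order[:k_max])

# Hypothetical pointwise p-values along a trial (one per timepoint).
full = [0.001, 0.01, 0.02, 0.03, 0.04, 0.6, 0.7, 0.8]
significant_full = benjamini_hochberg(full)
# Keep every other timepoint and map indices back to the original time axis.
significant_half = [2 * i for i in benjamini_hochberg(full[::2])]
```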

      In this work, the FLMM applications included only one or two covariates. However, in complex behavioral experiments where variables are correlated, more than two may be needed (see Simpson et al. (2023), Engelhard et al. (2019); Blanco-Pozo et al. (2024)). It is not clear from this work how computationally feasible it would be to fit such complex models, which would also include more complex random effects.

      This is a good point. In our experience, the code is still quite fast (often taking seconds to tens of seconds) on a standard laptop when fitting complex models that include, for example, 10 covariates or complex random effect specifications, on dataset sizes common in fiber photometry. In the manuscript, we included results from simpler models with few covariates in an attempt to show results from the FLMM versions of the standard analyses (e.g., correlations, t-tests) applied in Jeong et al., 2022. Our goal was to show that our method reveals effects obscured by standard analyses even in simple cases. Some of our models did, however, include complex nested random effects (e.g., the models described in Section 4.5.2).

      Like other mixed-model based analyses, our method becomes slower when the number of observations in the dataset is on the order of tens of thousands of observations. However, we coded the methods to be memory efficient so that even these larger analyses can be run on standard laptops. We thank the reviewer for this point, as we worked extremely hard to scale the method to be able to efficiently fit models commonly applied in neuroscience. Indeed, challenges with scalability were one of the main motivations for applying the estimation procedure that we did; in the appendix we show that the fit time of our approach is much faster than existing FLMM software such as the refund package function pffr(), especially for large sample sizes. While pffr() appears to scale exponentially with the number of clusters (e.g., animals), our method appears to scale linearly. We will more explicitly emphasize the scalability in the revision, since we agree this will strengthen the final manuscript.

      Reviewer #3

      Strengths:

      The statistical framework described provides a powerful way to analyze photometry data and potentially other similar signals. The provided package makes this methodology easy to implement and the extensively worked examples of reanalysis provide a useful guide to others on how to correctly specify models.

      Modeling the entire trial (function regression) removes the need to choose appropriate summary statistics, removing the opportunity to introduce bias, for example in searching for optimal windows in which to calculate the AUC. This is demonstrated in the re-analysis of Jeong et al., 2022, in which the AUC measures presented masked important details about how the photometry signal was changing.
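      The window-choice problem with AUC summaries can be made concrete with a short sketch. The trace below is a hypothetical biphasic transient (one sine period over one second); the trapezoidal AUC is positive in the first half, negative in the second half, and approximately zero over the whole trial, so the summary statistic, and its sign, is determined by the analyst's window rather than by the data alone.

```python
import math

def auc(times, signal, t_start, t_end):
    """Trapezoidal area under `signal` over grid points with t_start <= t <= t_end."""
    pts = [(t, s) for t, s in zip(times, signal) if t_start <= t <= t_end]
    return sum((t1 - t0) * (s0 + s1) / 2.0
               for (t0, s0), (t1, s1) in zip(pts, pts[1:]))

# Hypothetical biphasic response sampled at 100 Hz over one second.
times = [i / 100 for i in range(101)]
trace = [math.sin(2 * math.pi * t) for t in times]

early = auc(times, trace, 0.0, 0.5)   # positive
late = auc(times, trace, 0.5, 1.0)    # negative
whole = auc(times, trace, 0.0, 1.0)   # close to zero
```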

      Meanwhile, using linear mixed models allows for the estimation of random effects, which are an important consideration given the repeated-measures design of most photometry studies.

      Thank you for the positive assessment of our work!

      Weaknesses:

      While the availability of the software package (fastFMM), the provided code, and worked examples used in the paper are undoubtedly helpful to those wanting to use these methods, some concepts could be explained more thoroughly for a general neuroscience audience.

      We appreciate this and, to address your and other reviewers’ comments, we are creating a series of vignettes walking users through how to analyze photometry data with our package. We will include algebraic illustrations to help users gain familiarity with the regression modeling here.

      While the methodology is sound and the discussion of its benefits is good, the interpretation and discussion of the re-analyzed results are poor:

      In section 2.3, the authors use FLMM to identify an instance of Simpson's Paradox in the analysis of Jeong et al. (2022). While this phenomenon is evident in the original authors' metrics (replotted in Figure 5A), FLMM provides a convenient method to identify these effects while illustrating the deficiencies of the original authors' approach of concatenating a different number of sessions for each animal and ignoring potential within-session effects. The discussion of this result is muddled. Having identified the paradox, there is some appropriate speculation as to what is causing these opposing effects, particularly the decrease within sessions. In the discussion and appendices, the authors identify (1) changes in satiation/habituation/motivation, (2) the predictability of the rewards (presumably by the click of a solenoid valve) and (3) photobleaching as potential explanations of the decrease within days. Having identified these effects, but without strong evidence to rule all three out, the discussion of whether RPE or ANCCR matches these results is probably moot. In particular, the hypotheses developed by Jeong et al., were for a random (unpredictable) rewards experiment, whereas the evidence points to the rewards being sometimes predictable. The learning of that predictability (e.g. over sessions) and variation in predictability (e.g. by attention level to sounds of each mouse) significantly complicate the analysis. The FLMM analysis reveals the complexity of analyzing what is apparently a straightforward task design.

      While we are disappointed to hear the reviewer felt our initial interpretations and discussion were poor, the reviewer brings up an excellent point that we had not considered. They have convinced us that acknowledging and elaborating on this alternative perspective will strengthen the paper. We agree that the ANCCR/RPE model predictions were made for unpredictable rewards and, as the reviewer rightly points out, there is evidence that the animals sense the reward delivery. Regardless of the learning theory one adopts (RPE, ANCCR or others), we agree that this (potentially) learned predictability alone could account for the increase in signal magnitude across sessions.

      After reading the reviewer’s comments, we consulted with a number of researchers in this area, and several felt that a CS+ can serve as a reward in itself. From this perspective, the rewards in the Jeong et al., 2022 experiment might still be considered unexpected. After discussing extensively with the authors of Jeong et al., 2022, it is clear that they went to enormous trouble to prevent the inadvertent generation of a CS+, and it is likely changes in pressure from the solenoid (rather than a sound) that served as a cue. This underscores the difficulty of preventing perception of reward delivery in practice. As this paper is focused on analysis approaches, we feel that we can contribute most thoughtfully to the dopamine–learning theory conversation by presenting both sides.

      Overall, we agree with the reviewer that future experiments will be needed for testing the accuracy of the models’ predictions for random (unpredicted) rewards. While we understand that our attempt to document our conversations with the Jeong et al., 2022 authors may have room for improvement, we hope the reviewer can appreciate that this was done with the best of intentions. We wish to emphasize that we also consulted with several other researchers in the field when crafting the discussion. The Jeong et al., 2022 authors could easily have avoided acknowledging the potential incompleteness of their theory, by claiming that our results do not invalidate their predictions for a random reward, as the reward was not unpredicted in the experiment (as a result of the inadvertent solenoid CS+). Instead, they went out of their way to emphasize that their experiment did test a random reward, and that our results do present problems for their theory. We think that engagement with re-analyses of one’s data, even when findings are inconvenient, is a good demonstration of open science practice. For that reason as well, we feel providing readers with a perspective on the entire discussion will contribute to the scientific discourse in this area.

      Finally, we would like to reiterate that this conversation is happening because our method, by analyzing the signal at every trial timepoint, revealed a neural signal that appears to indicate that the animals sense reward delivery. Ultimately, this was what we set out to do: help researchers ask questions of their data that they could not ask before. We believe that having a demonstration that we can indeed do this for a “live” issue is the most appropriate way of demonstrating the usefulness of the method.

      It is clear the reviewer put a lot of time into understanding what we did, and was very thoughtful about the feedback. We would like to thank the reviewer again for taking such care in reviewing our paper.

      If this paper is not trying to arbitrate between RPE and ANCCR, as stated in the text, the post hoc reasoning of the authors of Jeong et al., 2022 provided in the discussion is not germane.

      While we appreciate that the post hoc reasoning of the authors of Jeong et al., 2022 may not seem germane, we would like to provide some context for its inclusion. As statisticians and computer scientists, our role is to create methods, and this often requires using open source data and recreating past analyses. This usually involves extensive conversation with authors about their data and analysis choices because, if we cannot reproduce their findings using their analysis methods, we cannot verify that results from our own methods are valid. As such, we prefer to conduct method development in a collaborative fashion, and we strive to constructively, and respectfully, discuss our results with the original authors. We feel that giving them the opportunity to suggest analyses, and express their point of view if our results conflict with their original conclusions, is important, and we do not want to discourage authors from making their datasets public. As such, we conducted numerous analyses at the suggestion of Jeong et al., 2022 and discussed the results over the course of many months. Indeed the analyses in the Appendix that the reviewer is referring to were conducted at the suggestion of the authors of Jeong et al., 2022, in an attempt to rule out alternative explanations. We nevertheless appreciate that our interpretations of these results can include some of the caveats suggested by the reviewer, and we will strive to improve these sections.

      Arbitrating between the models likely requires new experimental designs (removing the sound of the solenoid, satiety controls) or more complex models (e.g. with session effects, measures of predictability) that address the identified issues.

      We agree with the reviewer that the results suggest that new experimental designs will likely be necessary to adjudicate between models. It is our hope that, by weighing the different issues and interpretations, our paper might provide useful suggestions into what experimental designs would be most beneficial to rule out competing hypotheses in future data collection efforts. We believe that our methodology will strengthen our capacity to design new experiments and analyses. We will make the reviewer’s suggestions more explicit in the discussion by emphasizing the limitations of the original data.

      Of the three potential causes of within-session decreases, the photobleaching arguments advanced in the discussion and expanded greatly in the appendices are not convincing. The data being modeled is a processed signal (ΔF/F) with smoothing and baseline correction and this does not seem to have been considered in the argument.

      We are disappointed to hear that this extensive set of analyses, much of which was conducted at the suggestion of Jeong et al., 2022, was not convincing. We agree that acknowledging any pre-processing would provide useful context for the reader. We do wish to clarify that we analyzed the data that were made available online (raw data were not available). Moreover, for comparison with the authors’ results, we felt it was important to maintain the same pre-processing steps as they did. These conditions were held constant across analysis approaches; therefore, we think that the changes within-trial are likely not influenced substantially by these pre-processing choices. While we cannot speak definitively to the impact any of the processing conducted by the authors had on the results, we believe that it was likely minor, given that the timing of signals at other points in the trial, and in other experiments, were as expected (e.g., the signal rose rapidly after cue onset in Pavlovian tasks).

      Furthermore, the photometry readout is also a convolution of the actual concentration changes over time, influenced by the on-off kinetics of the sensor, which makes the interpretation of timing effects of photobleaching less obvious than presented here and more complex than the dyes considered in the cited reference used as a foundation for this line of reasoning.

      We appreciate the nuance of this point, and we will add it to our discussion. In response to your criticism, we have consulted with more experts in the field regarding the potential for bleaching in this data, and it is not clear to us why photobleaching would be visible in one time-window of a trial, but not at another (less than a second away), despite high dF/F magnitudes in both time-windows. We do wish to point out that, at the request of the authors, we analyzed many experiments from the same animals and in most cases did not observe other indications of photobleaching. Hence, it is not clear to us why this particular set of experiments would garner additional skepticism regarding the potential for photobleaching to invalidate results. While the role of photobleaching may be more complicated with this sensor than others in the references, that citation was included, at the suggestion of Jeong et al., 2022, simply as a way of acknowledging that non-linearities in photobleaching can occur.

      Within this discussion of photobleaching, the characterization of the background reward experiments used in part to consider photobleaching (appendix 7.3.2) is incorrect. In this experiment (Jeong et al., 2022), background rewards were only delivered in the inter-trial-interval (i.e. not between the CS+ and predicted reward as stated in the text). Both in the authors' description and in the data, there is a 6s before cue onset where rewards are not delivered and while not described in the text, the data suggests there is a period after a predicted reward when background rewards are not delivered. This complicates the comparison of this data to the random reward experiment.

      Thank you for pointing this out! We will remove the parenthetical on page 18 of the appendix that incorrectly stated that rewards can occur between the CS+ and the predicted reward.

      The discussion of the lack of evidence for backpropagation, taken as evidence for ANCCR over RPE, is also weak.

      This point was meant to acknowledge that, although our method yields results that conflict with the conclusions described by Jeong et al., 2022 on data from some experiments, on other experiments our method supports their results. Again, we believe that a critical part of open science is acknowledging both areas where analyses support and conflict with those of the original authors. We agree with the reviewer that qualifying our results so as not to emphasize support for/against RPE/ANCCR will strengthen our paper, and we will make these changes.

      A more useful exercise than comparing FLMM to the methods and data of Jeong et al., 2022, would be to compare against the approach of Amo et al., 2022, which identifies backpropagation (data publicly available: DOI: 10.5061/dryad.hhmgqnkjw). The replication of a positive result would be more convincing of the sensitivity of the methodology than the replication of a negative result, which could be a result of many factors in the experimental design. Given that the Amo et al. analysis relies on identifying systematic changes in the timing of a signal over time, this would be particularly useful in understanding if the smoothing steps in FLMM obscure such changes.

      Thank you for this suggestion, and we agree this could be a useful analysis for the field. Your thoughtful review has convinced us that focusing on our statistical contribution will strengthen the paper, and we will make changes to further emphasize that we are not seeking to adjudicate between RPE/ANCCR. We only had space in the manuscript to include a subset of the analyses conducted on Jeong et al., 2022, and had to relegate the results from the Coddington et al. data to an appendix. Realistically, it would be hard for us to justify analyzing a third dataset. As you may surmise from the one we presented, reanalyzing a new dataset is usually very time consuming, and invariably requires extensive communication with the original authors. We did include numerous examples in our manuscript where we already replicated positive results, in a way that we believe demonstrates the sensitivity of the methodology. We have also been working with five groups at NIH and elsewhere using our approach, in experiments targeting different scientific questions. In fact, one paper that extensively applies our method and compares its results with those yielded by standard analysis of AUCs is already accepted and in press. Hence there should soon be additional demonstrations of what the method can do in less controversial settings. Finally, our forthcoming vignettes include additional analyses, not included in the manuscript, that replicate positive results. We take your point that our description of the data supporting one theory or the other should be qualified, and we will correct that. Again, your review was very thorough, and we appreciate your taking so much time to help us improve our work.

      Reviewer #2 (Recommendations For The Authors):

      First, I would like to commend the authors for the clarity of the paper, and for creating an open-source package that will help researchers more easily adopt this type of analysis.

      Thank you!

      I would suggest the authors consider adding to the manuscript either some evidence or some intuition on how feasible it would be to use FLMM for very complex model specifications, in terms of computational cost and model convergence.

      This is an excellent point and we will make this suggested change in the Methods and Discussion sections in the next draft.

      From my understanding, this package might potentially be useful not just for photometry data but also for two-photon recordings for example. If so, I would also suggest the authors add to the discussion this potential use.

      We appreciate your thinking on this point, as it would definitely help expand use of the method. We included a brief point in the Discussion that this package would be useful for other techniques, but we will expand upon this.

      Reviewer #3 (Recommendations For The Authors):

      The authors should define 'function' in context, as well as provide greater detail of the alternate tests that FLMM is compared to in Figure 7. Given the novelty of estimating joint CIs, the authors should be clearer about how this should be reported and how this differs from pointwise CIs (and how this has been done in the past).

      Thank you, this is a very good point and will be critical for helping analysts describe and interpret results. We will add more detail to the Methods section on this point.

      The authors identify that many photometry studies are complex nested longitudinal designs, using the cohort of 8 animals used in five task designs of Jeong et al. 2022 as an example. The authors miss the opportunity to illustrate how FLMM might be useful in identifying the effects of subject characteristics (e.g. sex, CS+ cue identity).

      This is a great suggestion and we will add this important point to the discussion, especially in light of the factorial designs common in neuroscience experiments.

      In discussing the delay-length change experiment, it would be more accurate to say that proposed versions of RPE and ANCCR do not predict the specific change.

      We will make this change and agree this is a better phrasing.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the Authors):

      (1) Within the section on "optimized antigen retrieval", the authors mentioned that weak immunolabelling and strong non-specific labelling may be due to inadequate antigen retrieval. I wonder whether this interpretation is accurate. Could it also be due to inadequate antibody penetration?

      We appreciate the reviewer's comment and have revised our text to improve clarity. Regarding the SDS-electrophoresed sample (Figure S1a right), we acknowledge that the brain-surrounding background noise indicates insufficient antibody penetration. However, in the FLASH-processed sample (Figure S1a left), the background signal is uniformly distributed throughout the entire brain. Therefore, we conclude that incomplete antibody penetration is unlikely under this condition. Below is the revised paragraph:

      Revised manuscript, line 62-66: “We observed that both FLASH-processed and SDS-electrophoresed samples showed weak tyrosine hydroxylase (TH, a marker of dopaminergic neurons) signal (Figure S1a, Supporting Information). Additionally, we noticed that the FLASH-processed samples had almost no signal of NeuN, a marker of neuronal nuclei (Figure S1b left, Supporting Information), and exhibited strong non-specific background noise (Figure S1a left, Supporting Information). The presence of this background noise is considered an indicator of inadequate antigen retrieval.[48]”

      • Also, the authors mentioned the use of FLASH protocol and SDS-based electrophoresis for delipidation which were not described in the methods section.

      We have included the information in the revised Materials and Methods.

      Revised manuscript, line 418-426: “SHIELD processing, SDS-electrophoretic delipidation and FLASH delipidation. PFA-fixed specimens were incubated in SHIELD-OFF solution at 4 °C for 96 hours, followed by incubation for 24 hours in SHIELD-ON solution at 37 °C. All reagents were prepared using SHIELD kits (LifeCanvas Technologies, Seoul, South Korea) according to the manufacturer's instructions. For SDS-electrophoretic delipidation, SHIELD-processed specimens were placed in a stochastic electro-transport machine (SmartClear Pro II, LifeCanvas Technologies, Seoul, South Korea) running at a constant current of 1.2 A for 5-7 days. For FLASH delipidation, the SHIELD-processed specimens were placed in FLASH reagent (4% w/v SDS, 200 mM borate) and then incubated at 54 ℃ for 18 hours.[47] The delipidated specimens were washed with PBST at room temperature for at least 1 day.”

      • In addition, tyrosine hydroxylase (TH) should be a marker of "monoaminergic" neurons rather than specifically "dopaminergic" neurons.

      We appreciate the reviewer's correction. It is true that tyrosine hydroxylase (TH) is a marker for neurons that contain dopamine, norepinephrine, and epinephrine (catecholamines). However, the adrenergic and noradrenergic neurons are relatively few and are mostly located in the medulla and brain stem. Since we only monitored the brain in this study, we wish to keep TH as an indicator of dopaminergic neurons.

      (2) It was mentioned that tissue integrity was retained following heating treatment during the MOCAT protocol. It would be useful to demonstrate any differences in structural distortion, if any, with before and after images with different delipidation agents.

      We have provided an additional supplementary figure (Figure S5 in the revised manuscript) to display the mouse brain at different stages of the MOCAT protocol, including pre-delipidation, post-delipidation, and post-RI-matching, to demonstrate the tissue integrity.

      Revised manuscript, line 135-137: “Figure S5 shows the gross views of the same mouse brain after undergoing 4% PFA fixation, paraffin processing, optimized antigen retrieval, and RI-matching, demonstrating intactness of the brain shape and preservation of tissue integrity.”

      (3) In this study, the authors have demonstrated the protocol could be successfully applied to FFPE specimens up to 15 years old. However, archival brain bank materials often have brain tissues with extended formalin fixation time. It may be useful to demonstrate that this technique can be utilised on FFPE tissues with long formalin fixation times.

      We appreciate the reviewer's suggestions. We have included an additional supplementary figure (Figure S6) to demonstrate the application of MOCAT to 3-month fixed mouse brain hemispheres. Although the long-term fixed specimens exhibited reduced TH intensity and S/N ratio, the major dopaminergic regions were labeled, and magnified images revealed details of cell bodies and neuronal fibers. These results suggest that MOCAT has the potential to be applied to long-term fixed specimens.

      The fluorescence intensity was more affected by fixation with formalin, which is methanol-stabilized and stronger, than with PFA. This indicates that a stronger antigen retrieval method may be a possible solution. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.

      Revised manuscript, line 163 to 167: “We also applied MOCAT to 3-month fixed mouse brain hemispheres (Figure S6). Although the long-term fixed specimens exhibited reduced TH intensity and S/N ratio, the major dopaminergic regions were labeled, and magnified images revealed clear details of cell bodies and neuronal fibers. These results suggest that MOCAT has the potential to be applied to long-term fixed specimens.”

      Revised manuscript, line 346-351: “In the demonstration of MOCAT to 3-month fixed specimens, we observed that the pontine reticular nucleus (Figure S6A, yellow arrowheads) loses TH-positive signals after long-term fixation. The fluorescence intensity was more affected by fixation with formalin, which is methanol-stabilized and stronger, than with PFA. The results indicate that a stronger antigen retrieval method may be a possible solution. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.”

      (4) Whilst it is encouraging to see this protocol enables multi-round immunolabelling, further work is required to demonstrate there is no cross-reactivity in subsequent rounds of immunostaining following bleaching (e.g. Non-specific secondary antibody binding).

      We appreciate the reviewer for noting their concern and providing suggestions. To address this issue, we have examined the results of the second to fourth rounds of multi-round staining, as shown in Figure 3. In all three sequential rounds, we utilized rabbit primary antibodies and the same secondary antibodies. Our observations under a 3.6x objective (NA = 0.2) did not reveal any colocalization with the staining from the previous round. Hence, we conclude that cross-reactivity is not significant. However, we acknowledge the need for more comprehensive testing to completely rule out the possibility of cross-reactivity, such as employing antibodies from different hosts or utilizing different types of secondary antibodies (e.g., IgG, Fab2).

      Revised manuscript line 189-191: “The brain shape and structural integrity remained after 4 rounds of immunolabeling, and there is no cross-reactivity in subsequent rounds of immunostaining following bleaching (Figure S11).”

      • Also, how was the structural integrity maintained for tissues after multiple rounds of heat-induced epitope retrieval?

      We have provided an additional supplementary figure (Figure S11 in the revised manuscript) to demonstrate the structural integrity after 4 rounds of immunolabeling.

      Revised manuscript line 189-191: “The brain shape and structural integrity remained after 4 rounds of immunolabeling, and there is no cross-reactivity in subsequent rounds of immunostaining following bleaching (Figure S11).”

      (5) It may be useful to have a side-by-side comparison in staining quality with equivalent sizes of rodent and human brain tissues as there appeared to be a reduction in clarity and staining quality at greater imaging depth for human tissues.

      We have provided an additional supplementary figure (Figure S12) to show the fluorescent images of TH- and Lectin-labeling in 1 mm-thick human and mouse brain tissues at depths of 100 μm, 500 μm, and 900 μm. For millimeter-sized samples, both human and mouse brains showed comparable levels of transparency, with no noticeable reduction in fluorescence signal at varying depths. In our forthcoming studies, we plan to conduct a more comprehensive comparison of centimeter-sized human and mouse brain tissues.

      (6) Lectin staining is used throughout this study to label vasculature of the brain. How specific is this as compared with other vasculature markers such as CD31?

      We appreciate the reviewer for addressing their concern. Lectins are nonimmune-origin carbohydrate-binding proteins that have been utilized to label the surface of the blood vessel lumen. On the other hand, CD31, CD34, etc. are immunomarkers of vascular endothelial cells. Numerous references have confirmed that lectin staining consistently co-localizes with CD31 immunoreactivity (Battistella et al. 2021; Miyawaki et al. 2020). However, in tumors, blood vessels lacking a lumen may display CD31 positive/Lectin negative conditions (Morikawa et al. 2002).

      (7) When discussing the applicability of MOCAT on the astrocytoma mouse model, there is a bit of confusion with regard to the terminology. As astrocytoma by default will be comprised of astrocytes, it may be useful to describe the tumour astrocytes as ASTS1CI-GFP positive astrocytes and immunolabelled astrocytes as GFAP-positive astrocytes.

      We thank the reviewer for their suggestions. To avoid confusion for readers, we have made modifications to the content and labeling of Figure 6A.

      Revised manuscript, line 213-219: “…we subjected an intact FFPE brain from an astrocytoma mouse model (see Materials and Methods) to the MOCAT pipeline to label tumor cells (ASTS1CI-GFP positive astrocytes) and GFAP-positive astrocytes (Figure 6A, C). Accordingly, we could segment GFAP-positive astrocytes surrounding the tumor (Figure 6B, D, and E) and classify them according to their distances from the tumor cells. Statistical analysis (Figure 6F) revealed that nearly half of the GFAP-positive astrocytes were within the tumor, with 63.9% being located near the tumor surface (±200 μm).”

      (8) Within the methods section, further details of the antibodies such as the clonality and immunogen should be included in the supplementary table.

      We appreciate the reviewer for their suggestions. In the revised version, we have included these details in Supplementary Table 1.

      • Furthermore, there is inadequate detail regarding multi-round immunolabelling and the precise timing of immunolabelling including lectin staining, various imaging parameters including the working distance of the lens and excitation laser used.

      We have added the experimental details of multi-round staining for Figure 3 in Supplementary Table 3. This table now includes information about the amounts and types of chemicals and antibodies used, as well as the laser wavelengths used for each round. The staining conditions (including labeling time, temperature, and buffer used) have been disclosed in Materials and Methods (see MOCAT pipeline/Electrophoretic immunolabeling). Furthermore, we have included the working distance and NA value of the objective lens used in MOCAT pipeline/Volumetric imaging and 3D visualization subsection.

      Revised manuscript, line 464-479: “Electrophoretic immunolabeling (active staining). The procedure was modified from the previously published eFLASH protocol[15] and was conducted in a SmartLabel System (LifeCanvas Technologies, Seoul, South Korea). The specimens were preincubated overnight at room temperature in sample buffer (240 mM Tris, 160 mM CAPS, 20% w/v D-sorbitol, 0.9% w/v sodium deoxycholate). Each preincubated specimen was placed in a sample cup (provided by the manufacturer with the SmartLabel System) containing primary, corresponding secondary antibodies and lectin diluted in 8 mL of sample buffer. Information on antibodies, lectin and their optimized quantities is detailed in Supplementary Table 1. The specimens in the sample cup and 500 mL of labeling buffer (240 mM Tris, 160 mM CAPS, 20% w/v D-sorbitol, 0.2% w/v sodium deoxycholate) were loaded into the SmartLabel System. The device was operated at a constant voltage of 90 V with a current limit of 400 mA. After 18 hours of electrophoresis, 300 mL of booster solution (20% w/v D-sorbitol, 60 mM boric acid) was added, and electrophoresis continued for 4 hours. During the labeling, the temperature inside the device was kept at 25 ℃. Labeled specimens were washed twice (3 hours per wash) with PTwH (1× PBS with 0.2% w/v Tween-20 and 10 μg/mL heparin),[23] and then post-fixed with 4% PFA at room temperature for 1 day. Post-fixed specimens were washed twice (3 hours per wash) with PBST to remove any residual PFA.”

      Revised manuscript, line 483-490: “Volumetric imaging and 3D visualization. For centimeter-scale specimens, images were acquired using a light-sheet microscope (SmartSPIM, LifeCanvas Technologies, Seoul, South Korea) with a 3.6x customized immersion objective (NA = 0.2, working distance = 1.2 cm). For samples <3 mm thick, imaging was performed using a multipoint confocal microscope (Andor Dragonfly 200, Oxford Instruments, UK) with objectives that were UMPLFLN10XW (10x, NA = 0.3, working distance = 3.5 mm), UMPLFLN20XW (20x, NA = 0.5, working distance = 3.5 mm), UMPLFLN40XW (40x, NA = 0.8, working distance = 3.3 mm). 3D visualization was performed using Imaris software (Imaris 9.5.0, Bitplane, Belfast, UK).”

      • Also, since refractive index homogenisation is an important step in tissue-clearing experiments, it may be useful to describe the components of NFC1 and NFC2 solutions used and provide images of the "cleared" tissues.

      We have included the image of a cleared mouse brain in Figure S5. Additionally, we have provided the refractive index of NFC1 and NFC2 in Materials and Methods (see MOCAT pipeline/Refractive index matching). However, the composition of NFC1 and NFC2, being commercialized products from Nebulem (Taiwan), is non-disclosable.

      Reviewer #2 (Public Review):

      Major Weaknesses:

      • There is no evidence of actual transparency of the entire mouse brain across the different treatments. The suggested protocol is very good at removing lipids (as assessed by DiD staining and by the fluorescence recorded deep within the brain). However, since the authors speak of "transparency" in many places in the manuscript, the reader will expect the typical picture in which control and processed brains are placed on top of a white graphical pattern that would evidence transparency (see, for example, Figures 1 and 2 of Wan et al. 2018, Neurophotonics. 2018 Jul;5(3):035007. doi: 10.1117/1.NPh.5.3.035007).

      We thank the reviewer for their suggestions. We have provided an additional supplementary figure (Figure S5 in the revised manuscript) to demonstrate the transparency.

      • The manuscript lacks clarity on the applicability of MOCAT to regular formalin-fixed tissue and tissues other than the brain.

      We appreciate the reviewer's suggestions. We have included an additional supplementary figure (Figure S6) to demonstrate the application of MOCAT to a mouse brain hemisphere fixed in regular formalin for 3 months. We observed that the major dopaminergic regions were still labeled, although with reduced intensity and S/N ratio. We also observed that fluorescence intensity was more strongly affected by fixation in formalin, which is methanol-stabilized and a stronger fixative, than in PFA, implying that a stronger antigen retrieval method might rescue the intensity. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.

      Revised manuscript, line 163 to 167: “We also applied MOCAT to 3-month fixed mouse brain hemispheres (Figure S6). Although the long-term fixed specimens exhibited reduced TH intensity and S/N ratio, the major dopaminergic regions were labeled, and magnified images revealed clear details of cell bodies and neuronal fibers. These results suggest that MOCAT has the potential to be applied to long-term fixed specimens.”

      Revised manuscript, line 346-351: “In applying MOCAT to 3-month fixed specimens, we observed that the pontine reticular nucleus (Figure S6A, yellow arrowheads) loses TH-positive signals after long-term fixation. The fluorescence intensity was more affected by fixation with formalin, which is methanol-stabilized and a stronger fixative, than with PFA. These results indicate that a stronger antigen retrieval method may be a possible solution. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.”


      We agree with the reviewer and plan to investigate the potential use of MOCAT in tissues other than the brain in our subsequent studies.

      • Insufficient information is provided on the "epoxy treatment" or "hydrogel," and a more detailed explanation is warranted.

      We appreciate the reviewer's question. In response, we have included a paragraph in the Discussion section to clarify the appropriate timing for using epoxy or hydrogel in the MOCAT pipeline.

      Epoxy and acrylamide hydrogel can both strengthen tissue structures. However, in this study we used only epoxy treatment, in combination with active electrophoretic staining. To avoid confusion and improve clarity, we have modified Figure 1B and included epoxy processing in the MOCAT pipeline subsection of Materials and Methods.

      Revised manuscript, line 329-340: “In Figure 1B, we propose two staining strategies, passive immunolabeling and active immunolabeling, for samples thinner than 500 µm and thicker than 1 mm, respectively. In passive immunolabeling, antibodies penetrate and reach their targets solely through diffusion, without any additional force. It takes approximately two months to passively stain a whole mouse brain.[26,28] Compared with passive immunolabeling, active immunolabeling uses an external force, such as pressure or electrophoresis, to facilitate antibody penetration and therefore significantly speeds up the staining process, reducing the required staining time for a whole mouse brain to one day. However, the harsh conditions, such as pressure and heat, caused by external forces might damage specimens. To protect specimens from the harsh conditions caused by active staining, specimens could be strengthened by treatment with epoxy or acrylamide monomer to form a tissue-epoxy or tissue-hydrogel hybrid.[29,31] Laboratories that do not have adequate devices or that handle small specimens could use passive immunolabeling instead and skip the epoxy or hydrogel pretreatment step.”

      • The differences between passive and active immunolabeling, as well as photobleaching data, should be addressed for a comprehensive understanding.

      We appreciate the reviewer's question. We have included a paragraph in the Discussion section to explain the differences between passive and active immunolabeling:

      Revised manuscript, line 329-340: “In Figure 1B, we propose two staining strategies, passive immunolabeling and active immunolabeling, for samples thinner than 500 µm and thicker than 1 mm, respectively. In passive immunolabeling, antibodies penetrate and reach their targets solely through diffusion, without any additional force. It takes approximately two months to passively stain a whole mouse brain.[26,28] Compared with passive immunolabeling, active immunolabeling uses an external force, such as pressure or electrophoresis, to facilitate antibody penetration and therefore significantly speeds up the staining process, reducing the required staining time for a whole mouse brain to one day.”

      Regarding photobleaching, we have added Figure S10 to demonstrate the efficiency of our approach.

      Revised manuscript, line 184-185: “After imaging, we photobleached transparent RI-matched samples using a 100W LED white light to quench the previously labeled fluorophores (Figure S10).”

      • The assertion that MOCAT can be rapidly applied in hospital pathology departments seems overstated due to the limited availability of light-sheet microscopes outside research labs.

      We thank the reviewer for this question. Because imaging depth depends primarily on the working distance of the objective lens, samples up to approximately 3.5 mm thick can also be scanned if a long-working-distance objective lens (such as the UMPLFLN10XW from Olympus) is available. However, confocal systems require longer scanning times, and with non-optical-sectioning wide-field fluorescence microscopes such as the Olympus BX series or ZEISS Axio Imager series, deconvolution algorithms must be used to remove out-of-focus signals.

      In addition, epifluorescence systems may yield reduced fluorescence intensity in the deeper regions of the sample. If the fluorescent signal of the target is weak, or the target lies beyond the working distance of the objective lens, an alternative option is to send the sample to a microscopy or imaging core facility for scanning and further analysis.

      • The compatibility of MOCAT with genetically encoded fluorescent proteins remains unclear and warrants further investigation.

      We appreciate the reviewer's question. We have included a paragraph in the Discussion section to address this limitation of MOCAT:

      Revised manuscript, line 354-361: “Fourth, MOCAT is not compatible with endogenous fluorescence because the xylene and alcohol used in paraffin processing reduce fluorescence intensity. Researchers who need to directly observe genetically encoded fluorescent proteins can use tissue-clearing methods such as 3DISCO, X-CLARITY, or CUBIC, which have been shown to minimize the loss of fluorescence intensity. On the other hand, researchers who need to visualize transgenic fluorescent proteins along with other biomarkers can use MOCAT for delipidation and boost-immunolabeling to visualize the transgenic fluorescent proteins.”

      • The control of equivalent depths in cryosections for evaluating the intensity of DiD staining should be elaborated upon.

      We have included this information in the Materials and Methods section:

      Revised manuscript, line 428-430: “Serial 20-µm-thick cryosections were cut from mouse brain slices (2-mm thick) of various treatment conditions for subsequent DiD or Oil red O staining. For DiD staining, cryosections (from a depth of approximately 0-40 µm) were post-fixed with 4% PFA at room temperature for 5 minutes.”

      • The composition of NFC1 and NFC2 solutions for refractive index matching should be provided.

      We have provided the refractive indices of NFC1 and NFC2 in Materials and Methods (see MOCAT pipeline/Refractive index matching). However, the compositions of NFC1 and NFC2 cannot be disclosed because they are commercial products of Nebulem (Taiwan).

      Reviewer #2 (Recommendations for the Authors):

      • A larger readership would benefit from validating imaging depths using fluorescence microscopy techniques commonly found in pathology departments (e.g., confocal, 2-photon, epifluorescence+deconvolution).

      We thank the reviewer for this recommendation. Because imaging depth depends primarily on the working distance of the objective lens, samples up to approximately 3.5 mm thick can also be scanned if a long-working-distance objective lens (such as the UMPLFLN10XW from Olympus) is available. However, confocal systems require longer scanning times, and with non-optical-sectioning wide-field fluorescence microscopes such as the Olympus BX series or ZEISS Axio Imager series, deconvolution algorithms must be used to remove out-of-focus signals.

      In addition, epifluorescence systems may yield reduced fluorescence intensity in the deeper regions of the sample. If the fluorescent signal of the target is weak, or the target lies beyond the working distance of the objective lens, an alternative option is to send the sample to a microscopy or imaging core facility for scanning and further analysis.

      • Investigate the compatibility of MOCAT with genetically encoded fluorescent proteins, a common target in research specimens.

      We appreciate the reviewer's question. We have included a paragraph in the Discussion section to address this limitation of MOCAT:

      Revised manuscript, line 354-361: “Fourth, MOCAT is not compatible with endogenous fluorescence because the xylene and alcohol used in paraffin processing reduce fluorescence intensity. Researchers who need to directly observe genetically encoded fluorescent proteins can use tissue-clearing methods such as 3DISCO, X-CLARITY, or CUBIC, which have been shown to minimize the loss of fluorescence intensity. On the other hand, researchers who need to visualize transgenic fluorescent proteins along with other biomarkers can use MOCAT for delipidation and boost-immunolabeling to visualize the transgenic fluorescent proteins.”

      References:

      Battistella, Roberta et al. 2021. “Not All Lectins Are Equally Suitable for Labeling Rodent Vasculature.” International Journal of Molecular Sciences 22(21): 22. /pmc/articles/PMC8584019/ (accessed January 23, 2024).

      Miyawaki, Takeyuki et al. 2020. “Visualization and Molecular Characterization of Whole-Brain Vascular Networks with Capillary Resolution.” Nature Communications 11(1): 1–11. https://www.nature.com/articles/s41467-020-14786-z (accessed January 23, 2024).

      Morikawa, Shunichi et al. 2002. “Abnormalities in Pericytes on Blood Vessels and Endothelial Sprouts in Tumors.” The American Journal of Pathology 160(3): 985–1000.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) It is not entirely clear why a tumor-free model is chosen to study immune responses, as immune responses can differ significantly with or without tumor-bearing. A more detailed explanation is needed.

      We appreciate the question. As stated in the original submission, tumor-free mouse models are commonly used to assess off-target outcomes of anti-neoplastic therapies. We have expanded on this point and acknowledged this shortcoming in the revised manuscript (lines 264-265).

      (2) Immune responses in isolated macrophages, neutrophils, and bone marrow cells require priming with LPS, while such responses are not observed in vivo. There is no explanation for these differences.

      The reviewer raises an excellent point. The assembly of inflammasomes, such as those nucleated by NLRP3, requires priming signals that increase the levels of this sensor, which are kept low under homeostatic conditions to prevent spontaneous, unwanted inflammation. While LPS is commonly used in vitro to provide priming signals, these cues are triggered in vivo by various molecules, including pro-inflammatory cytokines. We have provided a rationale for the use of LPS in vitro in the revised manuscript (lines 144-145).

      (3) The band intensities on Western blots in Fig. 4 and Fig. 5 are not quantified, and the numbers of repeats are also not provided. This additional information is recommended.

      Caspase-1, caspase-3, GSDMD, and GSDME, but not AIM2 and NLRP3, are activated upon proteolytic cleavage. It is therefore not straightforward to quantify and describe the band intensities of these numerous proteins, which have different fates. We regret not mentioning the number of repeats in the original submission. This information has now been provided in the figure legends where necessary.

      (4) Many abbreviations are used throughout the text, and some of the full names are not provided. Full names are required at the first introduction.

      We agree. We have provided full names at the first introduction (lines 21, 23, 86).

      (5) Fig. 5B needs a label on the X axis.

      We regret the confusion: the X-axis label was shared by Fig. 5B and 5C. We have made the change in the new Fig. 5.

      Reviewer #2:

      The following specific points could be addressed to further improve the quality of the manuscript:

      (1) Concerning data presented in Figure 1, 3D micro-CT reconstructions of the entire femurs could be shown instead of just the trabecular bone. Data on cortical bone loss are important. It would be important to show histological (sagittal) sections of the bones at baseline, treated with Doxorubicin or vehicle, and quantify osteoblasts in addition to osteoclasts. Is there increased bone marrow adiposity in Doxorubicin-treated mice? The data with vehicle should be shown in the main figures not just in the supplemental data.

      We thank the reviewer for the suggestion. We have now provided 3D micro-CT reconstructions of a representative femur containing both trabecular and cortical bone (S1B Fig). Only the metaphyseal area is shown because we did not originally scan the entire femur.

      Because quantification of osteoblast number is not a reliable measurement, we carried out dynamic histomorphometry to assess the effect of doxorubicin on bone formation (original S1D Fig/new S1E Fig).

      Unfortunately, we did not determine the effects of doxorubicin on bone marrow adiposity. However, to address the reviewer’s comment, we now mention the adipogenic effects of doxorubicin reported in the literature in the revised manuscript (lines 264-265).

      (2) Concerning data presented in Figure 2, how long after Doxorubicin injection is leukopenia observed (beyond the 72-hour timepoint)? Does cell-count return to baseline 4 weeks after treatment (when the bone phenotype is characterized)? Why use 12-week-old mice here and 10-week-old animals for the rest of the study?

      We appreciate the question. We did not measure the leukopenic effects of doxorubicin beyond the 72-hour timepoint for the following reasons: i) bones are analyzed in mice injected only once with a single dose of doxorubicin; ii) leukopenia is a side effect of doxorubicin, whose blood levels should be undetectable 4 weeks after administration, although we did not measure them experimentally. Our premise is that the osteopenia observed in doxorubicin-exposed mice results from early events that occur after administration of the drug.

      We apologize for the confusion. We assessed baseline bone mass by VivaCT in 10-week-old mice; doxorubicin was injected 2 weeks later, when the mice were 12 weeks old. We have clarified this point in the revised manuscript (line 301).

      (3) It would be important to evaluate local inflammation in bones collected from wild-type and mutant mice. Are ASC specks, Cit-H3, and MPO present in the bone marrow? The expression of some components of the inflammasomes or relevant pathways could be assessed in bone samples deprived of bone marrow and in the bone marrow.

      This is a good point. Although we were not able to reliably measure Cit-H3 and MPO in bone marrow fluid, the data shown in Figs. 3-6 and 7A-D are from bone marrow cells.

      (4) Data presented in western blots should be quantified. The ratio of the signal obtained for a given protein to the signal obtained for beta-actin should be calculated for each experimental condition (especially in Figure 5, where beta-actin levels fluctuate a lot).

      Please see the response to question #1. Fluctuations in β-actin levels are likely related to the cytotoxic effects of doxorubicin, as mentioned in the original submission (lines 150, 194, 253). Despite this caveat, IL-1β levels are stimulated by this drug.

      (5) In Figure 7, BV/TV of WT and mutant mice at baseline should be quantified and shown. Sagittal histological sections of the femur should be shown. 3D micro-CT reconstructions of the entire femur could be shown instead of just the trabecular bone. Osteoblasts and bone resorption should be quantified. Data obtained with vehicle should be quantified and shown in the main figure. The control and LPS conditions should be better defined. Does it include vehicle?

      Please see the response to reviewer 1’s question #1.

      We have now provided 3D micro-CT reconstructions of a representative femur containing both trabecular and cortical bone (S3A, B Fig).

      LPS was dissolved in PBS (vehicle), which was used as control. We have now replaced vehicle with PBS in Fig. 7.

      (6) For all figures, the number of biological replicates should be mentioned in the legends, as well as the statistical tests used for the analyses.

      We have now included this information in the legends where necessary.

      (7) Some of the scientific rationales are not totally clear and could be better explained in the text. For example, it is written on page 6 "studies mainly on male mice and revolved around innate immune responses" and "we focused on neutrophils because of their high turnover rate and short lifespan", but it is not clear why. The rationale (page 10) for assessing bone mass in "mice globally lacking AIM2 and/or NLRP3" is not totally clear either. The argument is that systemic inflammation leads to bone loss but the effects obtained with the total ablation of AIM2 and NLRP3 do not prove strictly speaking that systemic inflammation really matters (in this current study, although we know from many other studies that it clearly does matter). We could imagine, for example, that bone mass would be preserved in AIM2 KO mice only because the inflammasome is impaired in osteoblasts and/or osteoclasts, but not in any other cell types. Conversely one could imagine that bone would be preserved only because inflammation is preserved in the gut, for example. The use of global knockouts unfortunately does not tell us much about the importance of systemic versus local effects of the inflammasomes. It shows that reducing inflammation, either in specific organs or globally, limits bone loss in doxorubicin-treated mice. This result is important but it was fully expected since doxorubicin has been reported to induce systemic inflammation, and since many studies have shown that systemic inflammation leads to bone loss.

      We appreciate the comments. We have clarified the rationale for focusing on neutrophils (lines 129-130) and on the AIM2 and NLRP3 inflammasomes (lines 209-211). We have also downplayed the concept of inflammasome-mediated systemic inflammation in doxorubicin-induced bone loss.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Satoshi Yamashita et al., investigate the physical mechanisms driving tissue bending using the cellular Potts Model, starting from a planar cellular monolayer. They argue that apical length-independent tension control alone cannot explain bending phenomena in the cellular Potts Model, contrasting with the vertex model. However, the evidence supporting this claim is incomplete. They conclude that an apical elastic term, with zero rest value (due to endocytosis/exocytosis), is necessary in constricting cells and that tissue bending can be enhanced by adding a supracellular myosin cable. Notably, a very high apical elastic constant promotes planar tissue configurations, opposing bending.

      Strengths:

      • The finding of the required mechanisms for tissue bending in the cellular Potts Model provides a more natural alternative for studying bending processes in situations with highly curved cells.

      • Despite viewing cellular delamination as an undesired outcome in this particular manuscript, the model's capability to naturally allow T1 events might prove useful for studying cell mechanics during out-of-plane extrusion.

      We thank the reviewer for the careful comments and insightful suggestions.

      Weaknesses:

      • The authors claim that the cellular Potts Model is unable to obtain the vertex model simulation results, but the lack of a substantial comparison undermines this assertion. No references are provided with vertex model simulations, employing similar setups and rules, and explaining tissue bending solely through an increase in a length-independent apical tension.

      We did not copy the parameters of the vertex models used in the preceding studies because we found that the apical, lateral, and basal surface tensions must be balanced; otherwise, the epithelial cells could not maintain their integrity (Supplementary Figure 1). In those preceding studies, the tension ratio was outside this suitable range.

      • The apparent disparity between the two models is attributed to straight versus curved cellular junctions, with cells with a curved lateral junction achieving lower minimum energies at steady-state. However, a critical discussion on the impact of T1 events, allowing cellular delamination, is absent. Note that some of the cited vertex model works do not allow T1 events while allowing curvature.

      We appreciate the comment and will add this point to the Discussion.

      • The suggested mechanism for inducing tissue bending in the cellular Potts Model, involving an apical elastic term, has been utilized in earlier studies, including a cited vertex model paper (Polyakov 2014). Consequently, the physical concept behind this implementation is not novel and warrants discussion.

      The reviewer is correct, but Polyakov et al. assumed “that the cytoskeletal components lining the inside membrane surfaces of the cells provide these surfaces with springlike elastic properties” without justification. Based on Labouesse et al. (2015), we assumed that myosin activity generates contractility rather than elasticity, and we expected the surface elasticity to correspond to membrane elasticity. In addition, at the level of the physical concept, we clarified how contractility and elasticity deform the cells and tissue differently, and demonstrated why elasticity is important for apical constriction. We will add this to the Discussion.
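      As a generic illustration of the distinction between contractility and elasticity discussed here, the two apical terms can be written in their textbook forms. This is a sketch under stated assumptions, not the exact energy function used in the manuscript; the symbols (apical length L_a, tension Γ, elastic modulus K, rest length L_0) are illustrative.

```latex
% Length-independent apical tension (contractility); illustrative symbols
E_{\mathrm{tension}} = \Gamma\, L_a, \qquad
F_{\mathrm{tension}} = -\frac{\partial E_{\mathrm{tension}}}{\partial L_a} = -\Gamma

% Apical elastic term with zero rest length (L_0 = 0); illustrative symbols
E_{\mathrm{elastic}} = \frac{K}{2}\,(L_a - L_0)^2 \Big|_{L_0 = 0} = \frac{K}{2}\, L_a^2, \qquad
F_{\mathrm{elastic}} = -\frac{\partial E_{\mathrm{elastic}}}{\partial L_a} = -K\, L_a
```

      Under these assumed forms, the contractile term pulls the apical surface inward with the same magnitude Γ regardless of its length, whereas the restoring force of the zero-rest-length elastic term scales with the current apical length L_a.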

      • The absence of information on parameter values, initial condition creation, and boundary conditions in the manuscript hinders reproducibility. Additionally, the explanation for the chosen values and their unit conversion is lacking.

      We agree with the comment and will add this information to the Methods.

      Reviewer #2 (Public Review):

      Summary:

      In their work, the authors study local mechanics in an invaginating epithelial tissue. The mostly computational work relies on the Cellular Potts model. The main result shows that an increased apical "contractility" is not sufficient to properly drive apical constriction and subsequent tissue invagination. The authors propose an alternative model, where they consider an alternative driver, namely the "apical surface elasticity".

      Strengths:

      It is surprising that, although apical constriction and tissue invagination are probably the most studied processes in tissue morphogenesis, the underlying physical mechanisms are still not entirely understood. This work supports this notion by showing that simply increasing apical tension is perhaps not sufficient to locally constrict and invaginate a tissue.

      We thank the reviewer for recognizing the importance and novelty of our work.

      Weaknesses:

      The findings and claims in the manuscript are only partially supported. With the computational methodology for studying tissue mechanics being so well developed in the field, the authors could probably have done a more thorough job of supporting the main findings of their work.

      We thank the reviewer for the careful assessment and suggestions. However, our simulations are computationally expensive, and modeling the epithelium with an analytically tractable expression would require substantial additional work, which is beyond the scope of the present study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      "Expanding the Drosophila toolkit for dual control of gene expression" by Zirin et al. aims to develop resources for simultaneous independent manipulation of multiple genes in Drosophila. The authors use CRISPR knock-ins to establish a collection of T2A-LexA and T2A-QF2 transgenes with expression patterns in a number of commonly studied organs and tissues. In addition to the transgenic lines that are established, the authors describe a number of plasmids that can be used to generate additional transgenes, including a plasmid to generate a dual insert of LexA and QF that can be resolved into a single insert using FLP/FRT-mediated recombination, and plasmids to generate RNAi reagents for the LexA and QF systems. Finally, the authors demonstrate that a subset of the LexA and QF lines that they generated can induce RNAi phenotypes when paired with LexAop or QUAS shRNA lines. In general, the claims of the paper are well supported by the evidence and the authors do a thorough job of validating the transgenic lines and characterizing their expression patterns.

      Strengths:

      • Numerous Gal4 lines allow for highly specific genetic manipulation in a wide range of organs and tissues; however, similar tissue-specific drivers using alternative binary expression systems are not currently well developed. This study provides a large number of tissue and organ-specific LexA and QF2 driver lines that should be broadly useful for the Drosophila community.

      • While a minority of the driver lines do not express the expected pattern (likely due to cryptic regulatory elements in the LexA or QF2 sequences), the ability to generate drivers using two different Gal4 alternatives mitigates this issue (as in nearly all cases at least one of the two systems produces a clean driver line with the expected expression pattern).

      • The use of LexA-GAD provides an additional degree of control as it is subject to Gal80 repression. This could prove to be particularly useful in cases where a researcher wishes to manipulate multiple genes using Gal4 and LexA-GAD drivers as the Gal80(ts) system could be used for simultaneous temporal control of both constructs.

      • The use of Fly Cell Atlas information to generate novel oenocyte-specific driver lines provides a useful proof-of-concept for constructing additional highly tissue-specific drivers.

      Weaknesses:

      • Since these reagents will most commonly be paired with existing Gal4 lines, adding information about corresponding Gal4 lines targeting these tissues and how faithfully the LexA and QF2 lines recapitulate these Gal4 patterns would be highly beneficial.

      It is outside the scope of this paper to analyze the expression patterns of the corresponding publicly available Gal4 lines. It is clear from the tissue specificity of the LexA-GAD and QF2 lines that they are expressed in the expected larval tissues based on the target genes. We have added a sentence in the discussion section noting “Further, we expect that there will also be differences between the expression pattern of corresponding Gal4 and the LexA-GAD/QF lines, as the latter were made by knock-in, while the former are often enhancer traps. However, based on our larval mounts and dissections, the stocks generated in this paper are highly specific to the expression pattern of the targeted genes.”

      • It is not stated in the manuscript if these transgenic lines and plasmids are currently publicly available. Information about how to obtain these reagents through Bloomington, Addgene, or TRiP should be added to the manuscript.

      We have added the following to the Materials section: “All vectors described here that are required to produce new driver lines will be made available at Addgene.” and “All transgenic fly stocks described here will be made available at the Bloomington Drosophila Stock Center.”

      Reviewer #2 (Public Review):

      Zirin, Jusiak, and Lopes et al. presented an efficient pipeline for making LexA-GAD and QF2 drivers. The tools can be combined with a large collection of existing GAL4 drivers for dual genetic control of two cell populations. This is essential when studying inter-organ communication, since most current genetic drivers are biased toward expression in the central nervous system. In this manuscript, the authors described the methodology for efficiently generating T2A-LexA-GAD and T2A-QF2 knock-ins by CRISPR, targeting a number of genes with known tissue-specific expression patterns. The authors then validated and compared the expression of double as well as single drivers and found that the tissue-specific expression results were largely consistent with expectations. Finally, a collection of plasmids for LexA-GAD and QF2, as well as the corresponding LexAop and QUAS plasmids, was generated to facilitate the expansion of these toolkits. In general, this study will be of considerable interest to the fly community, and the resources can be readily generalized to make drivers for other genes. I believe this toolkit will have a significant, immediate impact on the fly community.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Lines 56-57: Janelia Flylight lines are not necessarily brain-specific - this collection has or could be screened in other tissues.

      Correct. We have altered this sentence to read: “However, these lines were developed primarily for brain expression. Although they are often expressed in other tissues, they are not well suited for experiments targeting non-neuronal cell types.”

      • Line 197 - I don't see the referenced Figure S1 in the reviewer materials. It appears this is actually referencing panels LL and MM in Figure 2.

      Correct. We have fixed this error.

      • No information on the injection efficiency to create the CRISPR knock-in lines is presented. I am guessing the efficiency will be similar to that of other reported HDR-based CRISPR knock-ins, but if this information is available it would be useful to include it so that others know what to expect when injecting these vectors.

      We did not systematically assay the injection efficiency. However, we can say that it was in line with previous descriptions of CRISPR-based plasmid and ‘drop-in’ HDR methods. We have added a note in the methods that “Knock-in efficiencies were comparable to previous reports (Kanca et al. 2019; Kanca et al. 2022).”

      • Demonstration of successful multi-manipulation would strengthen the paper.

      We do not feel that this is necessary as there have been many papers showing combinatorial Gal4+LexA/QF experiments. An example from our lab can be seen in PMID: 37582831.

      • Also, are there approaches for efficiently constructing pairs of UAS/LexAOp or UAS/QUAS shRNA lines that would potentially streamline the genetics for multi-manipulation? Otherwise, this could be rather cumbersome to implement as one needs to combine a Gal4 line, a LexA/QF2 line (which will be constrained as to its chromosomal location by the target gene), and separate UAS-shRNA and LexAop/QUAS-shRNA constructs into the same fly.

      There are some recent innovations that are useful in this respect. We have added a sentence to the discussion that says: “There remains an unmet need for a single vector that would allow for UAS/LexAop/QUAS control of different shRNAs. However, recent innovations in multi-module vectors and multiplexed drug-based genetics allow researchers to more efficiently generate UAS/QUAS/LexAop transgenic fly strains (Matinyan et al. 2021; Wendler et al. 2022).”

      • In Figure 5 - is the difference for the hh inserts attributable to the driver line or the GFP/mCherry construct (or differential ability to detect GFP/mCherry)? One could try visualizing hhL(-Q) with the LexAop-GFP line. I guess that the correspondence between the nubbin and hh result suggests that maybe QF2 is suppressed in the wing pouch, but this could also be the difference in the reporter constructs and it would be interesting to know if this difference is truly attributable to the driver constructs from the standpoint of knowing how consistent the QF/LexA patterns are expected to be.

      The difference is not attributable to GFP versus mCherry or to the specific LexAop and QUAS lines that we used in Figure 5. We tested the double knock-in and the derivative single knock-ins with various QUAS and LexAop reporters and always observed the same pattern.

      Reviewer #2 (Recommendations For The Authors):

      There are a few points that should be clarified. A list of these specific points is provided below with the view that this could help the preparations of a stronger, improved paper.

      Lines 50-51: "There have been no systematic studies comparing the two systems, with only anecdotal evidence to support one system over the other." It is unclear to me what anecdotal evidence the authors are referring to. Could the authors elaborate on this point?

      Based on an examination of brains carrying QUAS reporters, Potter et al. 2010 (PMID: 20434990) claim that “The low basal expression of QUAS and UAS reporters provides significant advantage compared to the lexA binary expression system.”

      Shearin et al., 2014 (PMID: 24451596) compared Gal4/UAS, LexA/LexAop, and QF/QUAS reporter strength with the nompC driver and found that the QF system produced the strongest expression.

      While these observations might hold true in the nervous system, it is not clear that they extend to other tissues, nor what effect this would have on gene knockdown experiments.

      There have been some reports that explored swapping a Gal4 insertion for a LexA or QF at the same locus. For example, Gohl et al. 2011 (PMID: 21473015) mention that “the majority of the swaps captured most features of the original GAL4 expression patterns. In some cases, however, either prominent features of the GAL4 pattern were lost or we observed new expression patterns. These changes may have resulted from differences in the strength or responsiveness of reporter lines. Alternately, the swap may have modified some combination of enhancer spacing and sequence composition flanking the promoter.”

      Line 61-62: "On average, each StanEx line expresses LexA activity in five distinct cell types, with only one line showing expression in just one tissue..." What's the evidence to support this claim?

      This observation comes from Figure S3 of Kockel et al. 2016 (PMID: 27527793), where the authors “analyzed a subset of 76 StanEx lines that are unambiguously inserted within, or adjacent to, a single known gene.” We cited this reference in the preceding sentence. To clarify, we have added the citation again at lines 61-62.

      Line 63-65: "These findings are consistent with prior studies indicating that enhancers very rarely produce expression patterns that are limited to a single cell type in a complex organism (Jenett et al. 2012)." It might be worth expanding on the use of the split system to achieve high cell-type-specificity. Especially, there are growing resources using split-intein and T2A-split-GAL4 with the prediction of genes from single-cell RNA sequencing datasets.

      We agree that the split system is currently the premier method to produce the most specific driver lines. Indeed, our group has recently published a paper on the split-intein Gal4 system (see PMID 37276389). However, the tradeoff is that split systems usually require generation of transgenic lines, which becomes impractical for research involving two independent binary transcriptional systems, as the user would need to combine at least three driver components into single stocks, plus the UAS/QUAS/LexAop insertions. The ideal would be to generate complementary split insertions on the same chromosome, but we think a discussion of this is tangential to the thrust of our work here.

      The authors did not fully discuss the rationale for using LexA-GAD rather than LexA-p65 or VP16AD throughout the manuscript. I assume the main reason for choosing LexA-GAD was compatibility with GAL80 suppression. It might be worth stating this explicitly in the Results (e.g., line 123) or in the introduction. Also, did the authors observe weak transcriptional activation using LexA-GAD? It has been shown that transcriptional activation is much weaker with the GAL4AD than with the p65 or VP16AD. This might be worth noting in the manuscript as well.

      We did briefly mention in the introduction that one disadvantage of the Flylight lines is that they “use a p65 transcriptional activation domain and therefore are not compatible with the Gal80 temperature sensitive Gal4 repression system.” We have expanded on this issue in the introduction, which now says: “We chose to use LexA with the Gal4 activation domain, rather than the p65 or VP16 activation domains to allow for temporal control by Gal80 (Lai and Lee 2006; Pfeiffer et al. 2010). We chose to use the QF2 variant over the original QF, to avoid the toxicity reported for the latter (Riabinina et al. 2015).”

      We did not have any problems visualizing gene expression with fluorescent reporters. Nor did we have any difficulty obtaining knock-down phenotypes with ubiquitous drivers.

      Lines 125-127: Is there a specific reason why the authors chose the SV40 terminator for the double driver construct but the Hsp70 terminator for the single driver construct?

      We found that the Hsp70 terminator gave slightly lower expression and decided to use it for the single drivers to avoid toxicity. For the double drivers we chose the SV40 terminator, to compensate for reduced protein expression from the second gene position.

      Line 144-146: "To verify the knock-ins, we PCR-amplified the genomic regions flanking the insertion sites and confirmed that the insertions were seamless and in-frame." Did the authors recover lines with indel introduced, resulting in out-of-frame insertion?

      Yes, we did see indels, which sometimes resulted in out-of-frame insertions; these lines were discarded. This result is in line with what we have observed in other CRISPR HDR knock-in experiments.

      The underlying reason might be outside the scope of this manuscript. However, it would still be helpful for the authors to speculate on why the T2A-LexA-GAD and T2A-QF2 knock-ins targeting the same insertion site showed very distinct expression patterns.

      It is outside the scope of this report to test this issue experimentally. We have a section in the discussion which does speculate as to the reason: “While we had no difficulty obtaining knock-ins for both types of activators, we did observe that for some target genes, the T2A-QF2 was only active in a subset of the expected gene expression pattern. In particular, we found that T2A-QF2 was difficult to express in the wing pouch. It may be that toxicity is an issue, and the weaker QF2w may be a better option for generating drivers in some organs (Riabinina and Potter 2016). Alternatively, differences in the LexA-GAD and QF2 sequences, and sequence length, could impact the function of nearby gene regulatory regions.”

      Regarding the observation that the 3XP3-RFP marker can interfere with T2A-LexA-GAD and T2A-QF2 expression on a case-by-case basis, it might be worth emphasizing in the discussion that proper removal of the 3XP3-RFP marker by Cre/LoxP recombination is important.

      We have added the following to the discussion: “Importantly, our knock-in constructs contain the 3XP3-RFP cassette for screening transformants. Perhaps due to interaction between the 3XP3 promoter and the regulatory regions of the target gene, we occasionally saw misexpression of the LexA-GAD/QF2 in the 3XP3 domain. We have therefore prioritized Cre-Lox removal of the 3XP3-RFP cassette from our knock-in stocks, and advise that users of the plasmids described here likewise remove the marker, following successful knock-in.”

      For Fig. 5B, 5F-G, the authors should elaborate more in the result section. For example, lines 215-217: "We tested this with the hh and dpp lines and observed robust generation of both T2A-QF2 and T2A-LexA-GAD from hs-Flp; T2A-QF2-T2A-LexA-GAD parents (Figure 5B)." It is unclear what the authors mean by "robust generation". Also, there is no description of the results in Fig. 5F-G.

      We have expanded this section for figure 5B, which now reads: “We tested this with the hh and dpp lines and observed robust generation of both T2A-QF2 and T2A-LexA-GAD from hs-Flp; T2A-QF2-T2A-LexA-GAD parents (Figure 5B). In the case of the hh line, 15 out of 36 heat-shocked parents gave rise to at least one T2A-LexA-GAD progeny, with a mean of 14% recombinant offspring per parent. 20 out of 36 gave rise to at least one T2A-QF2 progeny, with a mean of 9% recombinant offspring per parent. In the case of the dpp line, 31 out of 32 heat-shocked parents gave rise to at least one T2A-LexA-GAD progeny, with a mean of 30% recombinant offspring per parent. 17 out of 32 gave rise to at least one T2A-QF2 progeny, with a mean of 9% recombinant offspring per parent.”

      We have also added a description for Figure 5F-G, which reads: “Recombinants were also independently verified by PCR of the insertions (Figure 5F-G), where we observed the expected smaller band sizes in the derivative T2A-QF2 and T2A-LexA-GAD relative to the parental double driver.”

      Line 229, minor error: "Into these vectors, ..."

      We have edited this to read: “We cloned shRNAs targeting forked (f) and ebony (e) genes into these vectors and assayed their phenotypes when crossed to ubiquitous LexA-GAD and QF2 drivers.”

      Line 238-240: "Both Tub-LexA-GAD and Tub-QF2 drivers generated knockdown phenotypes in the thorax when crossed to f and e shRNA lines. However, the Tub-LexA-GAD phenotypes were stronger than those of Tub-QF2 (Figure 6C-D, F-G, I-J)." The stated "stronger phenotypes" are not clear to me. It might be worth elaborating more.

      We have further clarified this by changing it to: “However, the Tub-LexA-GAD phenotypes were stronger than those of Tub-QF2 (Figure 6C-D, F-G, I-J). For example, Tub-LexA-GAD produced a fully penetrant f bristle phenotype (Figure 6F) while some wild-type bristles remained on the thoraces of Tub-QF2 f knockdown (Figure 6G). Neither Tub-LexA-GAD nor Tub-QF2 was able to achieve the strength of phenotype generated by the T2A-LexA-GAD da knock-in line (compare the darkness of the cuticle caused by e knockdown in Figure 6H-J).”

      Line 257-250: "Our collection of T2A-LexA-GAD and T2A-QF2 and double driver vectors can be easily adapted to target any gene for CRISPR knock-in, with a high probability that the resulting line will accurately reflect the expression of the endogenous locus" The authors could refer to the recent gene-specific Trojan GAL4/split-GAL4 work to support the idea that these gene-specific T2A-GAL4/split-GAL4 drivers reflect endogenous expression more faithfully than enhancer-based drivers.

      We have added the following sentence to the discussion: “The specificity achieved with this approach can also be seen in recent efforts to build collections of gene specific T2A-Split-Gal4 and T2A-Gal4 insertions (Kanca et al. 2019; Chen et al. 2023; Ewen-Campen et al. 2023).”

      Line 630: "Removal of 3XP3-RFP eliminated gut and anal pad misexpression and did not affect glial cell expression." It would be helpful to add the annotation on Fig. 3B to show the location of glial cell expression.

      We have added arrowheads on Figure 3 and the legend now reads: “Removal of 3XP3-RFP eliminated gut and anal pad misexpression and did not affect glial cell expression (white arrowheads).”

      Line 650-651: "The fat body mCherry expression is also present in the reporter stock and does not indicate LexA-GAD activity." I did not get what the authors were trying to convey. Where did the fat body mCherry expression come from? Please elaborate more.

      We have changed this section to explain that “The fat body mCherry expression (yellow arrowhead) is from leakiness of the reporter stock and does not indicate LexA-GAD activity.”

      Line 679-680: "forked shRNA produced a forked bristles phenotype." Please add the annotation on the figures to show where the phenotypes were.

      We have added arrowheads and asterisks to the figure. The legend now reads: “(E-G) forked shRNA produced a forked bristles phenotype (white arrowheads). Note that some bristles retain a more elongated wild-type morphology with the Tub-QF2 driven forked knockdown (G, yellow asterisk).”

      Fig 1D-E and 4A-B. There is no description anywhere in the manuscript of QA/QS regulation, and little description of GAL80ts regulation. This will confuse readers with little background in fly genetics. Please introduce how each of these binary expression systems can be regulated.

      We have added a section in the introduction, which states: “We chose to use LexA with the Gal4 activation domain, rather than the p65 or VP16 activation domains to allow for temporal control by the temperature sensitive Gal4 repressor, Gal80 (Lai and Lee 2006; Pfeiffer et al. 2010). We chose to use the QF2 variant over the original QF, to avoid the toxicity reported for the latter (Riabinina et al. 2015). Like Gal80-based modulation of LexA-GAD, QF2 activity can also be regulated temporally by expressing QS, a QF repressor. QS repression of QF can be released by feeding flies quinic acid (Riabinina and Potter 2016).”

      In Fig. 2, there are several NDs without any explanation in the manuscript (e.g., Mef2 and He). In addition, the expression patterns look quite different between T2A-LexA-GAD and T2A-QF2 for some genes (e.g., mex1, Myo31DF), but the authors did not mention any of these differences in the manuscript. Please elaborate.

      We have altered the Figure 2 legend as follows: “(A-KK) T2A-LexA-GAD knock-in lines crossed to a LexAop-GFP reporter and T2A-QF2 knock-in lines crossed to a QUAS-GFP reporter. Panels show 3rd instar larva. GFP shows the driver line expression pattern. RFP shows the 3XP3 transformation marker, which labels the posterior gut and anal pads of the larva. Gene names and tissues are on the left. We failed to obtain LexA-GAD knock-ins for Mef2 (E) and He (DD). (LL-MM) 3rd instar imaginal disc from the insertions in the nubbin (nub) gene. Note that most of the lines are highly tissue-specific and are comparable between the LexA-GAD and QF2 knock-ins. Insertions in the daughterless gene (da) and nub are an exception, as the T2A-LexA-GAD, but not the T2A-QF2, gives the expected expression pattern. Insertions in the gut-specific genes mex1 (X-Y) and Myo31Df (Z-AA) also differed between the LexA-GAD and QF2 drivers.”

      We have also added a note on the inconsistency of mex1 and Myo31Df in the discussion: “While we had no difficulty obtaining knock-ins for both types of activators, we did observe that for some target genes, the T2A-QF2 was only active in a subset of the expected gene expression pattern. In particular, we found that T2A-QF2 was difficult to express in the wing pouch. Additionally, we found that the driver expression in the gut-specific genes, mex1 and Myo31Df differed between the LexA-GAD and QF2 transformants. In both cases the LexA-GAD was more broadly expressed along the length of the gut than the QF2. It may be that toxicity is an issue, and the weaker QF2w may be a better option for generating drivers in some organs (Riabinina and Potter 2016).”

      Fig. 4B, it is unclear why the hsp70 is present downstream of the enhancer of interest (upstream of T2A). Is it the molecular mark resulting from the cloning steps? Does it serve any specific purpose?

      This is the Drosophila hsp70 gene minimal promoter, which is standard in many Drosophila expression constructs. In the Methods section we described how we made versions of pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20 with and without this minimal promoter: “We used pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20 for dpp-blk and pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20-alt (which lacks the hsp70 promoter) for Ilp2, since dpp-blk does not have a basal promoter, but the Ilp2 enhancer does.”

      Fig 5A. The resulting single T2A-QF2 and T2A-LexA-GAD from the double driver parental lines retain the sequence of FRT3 upstream of the QF2 and LexA-GAD. I assume the FRT3 part will be translated and remain attached to QF2 and LexA-GAD. Is that correct? If so, would this cause any adverse effect?

      Correct. The FRT3 sequence is present in both the parental double and single derivatives. We can say that the additional amino acids do not prevent LexA-GAD or QF2 transcriptional activation. We do not know whether there may be other adverse effects, though we did not observe any.

      Fig. 5C-C'. It seems like the images of Fig. 5C-C' were the same as Fig. 4D-D'. If so, the authors should indicate that in the figure legend.

      We have made a note of this in the figure legend.

    1. Author Response:

      Reviewer #1:

      Summary:

      The authors use an innovative behavior assay (chamber preference test) and standard calcium imaging experiments on cultured dorsal root ganglion (DRG) neurons to evaluate the consequences of global knockout of TRPV1 and TRPM2, and overexpression of TRPV1, on warmth detection. They find a profound effect of TRPM2 elimination in the behavioral assay, whereas elimination of TRPV1 has the largest effect on neuronal responses. These findings are of importance, as there is still substantial discussion in the field regarding the contribution of TRP channels to different aspects of thermosensation.

      Strengths:

      The chamber preference test is an important innovation compared to the standard two-plate test, as it depends on thermal information sampled from the entire skin, as opposed to only the plantar side of the paws. With this assay, and the detailed analysis, the authors provide strong supporting evidence for the role of TRPM2 in warmth avoidance. The conceptual framework using the Drift Diffusion Model provides a first glimpse of how this decision of a mouse to change between temperatures can be interpreted and may form the basis for further analysis of thermosensory behavior.

      Weaknesses:

      The authors juxtapose these behavioral data with calcium imaging data using isolated DRG neurons. Here, there are a few aspects that are less convincing.

      (1) The authors study warmth responses using DRG neurons after three days of culturing. They propose that these "more accurately reflect the functional properties and abundance of warm-responsive sensory neurons that are found in behaving animals." However, the only argument to support this notion is that the fraction of neurons responding to warmth is lower after three days of culture. This could have many reasons, including loss of specific subpopulations of neurons, or any other (artificial?) alterations to the neurons' transcriptome due to the culturing. The isolated DRGs are not selected in any way, so also include neurons innervating viscera not involved in thermosensation. If the authors wish to address actual changes in sensory nerves involved in warmth sensing in TRPM2 or TRPV1 KO mice without disturbing the response profile as a result of the isolation procedure, other approaches would be needed (e.g. skin-nerve recordings or in vivo DRG imaging).

      We agree that there could be several reasons why the responses of cultured DRG neurons are reduced compared to those of acute/short-term cultures. It is possible, and likely, that transcriptional changes occur over the course of the culturing period. It is also possible that it is mere coincidence that the 3-day cultures have a response profile more similar to the in vivo situation than the acute cultures. In the revised manuscript, we will therefore tone down the claim that the 3-day cultures mirror native conditions more appropriately.

      Nevertheless, our results clearly show that acute cultures have a response profile that is much more similar to that of damaged/“inflamed” neurons, irrespective of any comparison to the 3-day cultures. Therefore, we believe it is helpful to include these data to make scientists aware that acute cultures, which many researchers use in their experiments, are very different from non-inflamed native/in vivo DRG neurons.

      In some experiments not shown in the first version of our manuscript, we applied the TRP channel agonists menthol, capsaicin and AITC (mustard oil) consecutively to a few 3-day cultures. We also have capsaicin responses from overnight cultures. We will attempt to correlate the percentage of neurons responsive to these TRPM8, TRPV1 and TRPA1 agonists in our cultures with the percentages of neurons found to express the respective TRP ion channels in vivo. While this type of analysis won’t prove that 3-day cultures are similar to the in vivo situation (even if there is good correlation between the in vitro and in vivo results), it might support the use of 3-day cultures as a model.

      (2) The authors state that there is a reduction in warmth-sensitive DRG neurons in the TRPM2 knockout mice based on the data presented in Figure 2D. This is not convincing for the following reasons. First, the authors used t-tests (with FDR correction, yielding borderline significance), whereas three groups are compared here across three repeated stimuli. This would require different statistics (e.g., ANOVA), and I am not convinced (based on a rapid assessment of the data) that such an analysis would yield any significant difference between WT and TRPM2 KO. Second, there seems to be a discrepancy between the plot and the legend regarding the number of FOVs analysed (21, 17, and 18 FOVs according to the legend, compared to 18, 10, and 12 dots in the plot). Therefore, I would urge the authors to critically assess this part of the study and to reconsider whether the statement (and discussion) that "Trpm2 deletion reduces the proportion of warmth responders" should be maintained or abandoned.

      Yes, we agree that the statistical tests indicated by the referee are more appropriate/robust for the data shown in Figures 1F, 2D, and 4G.

      When we perform a two-way repeated measures ANOVA and a subsequent multiple comparison test (with Dunnett’s correction) against wild type for the data shown in Fig. 2D, both the main effect (Genotype) and the interaction term (Stimulus x Genotype) are significant. The multiple comparison yields a result very similar to that in the current manuscript, with the difference that the TRPM2-KO data for the 2nd stimulus (~36°C) are borderline significant (p=0.050).

      Due to the possible dependence of the repeated temperature stimuli and the variability of each stimulus between FOVs (Fig. 2C), it is possible that a mixed-effect model that accounts for these effects is more appropriate.

      Similarly, for plots 1F and 4G, Genotype (either as main effect or as interaction with Time) is significant after a repeated measures two-way ANOVA. The multiple comparisons (with Bonferroni correction) only changed the results marginally at individual timepoints, without affecting the overall conclusions. The exception is Fig. 4G at 38°C, where the interaction of Time and Genotype is significant, but no individual timepoint-comparison is significant after Bonferroni correction.

      The main difference between the results presented above and those presented in the manuscript is the choice of multiple comparison correction. We originally opted for the false discovery rate (FDR) approach, as it is less prone to Type II errors (false negatives) than other methods such as Šidák or Bonferroni, particularly when correcting for a large number of tests. However, we are mainly interested in whether the genotypes differ in their behavior in each temperature combination, and the significant ANOVA tests for Fig. 1F and 4G support that point. The statistical tests and comparisons used in the current version of the manuscript, comparing behavior at individual/distinct timepoints, are interesting but less relevant (and potentially distracting), as we do not go into the details of the behavior at any given timepoint in the assay.
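      To illustrate the trade-off between FDR and Bonferroni correction described above, the following minimal sketch compares adjusted p-values from both procedures. The five p-values are hypothetical and not taken from the manuscript. The Benjamini-Hochberg step-up procedure is assumed here as a representative FDR method; the response does not state which FDR variant was used.

```python
# Hypothetical p-values from five tests (illustrative only, not manuscript data).
PVALS = [0.004, 0.012, 0.024, 0.035, 0.30]
ALPHA = 0.05


def bonferroni(pvals):
    """Family-wise error control: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """False discovery rate control (Benjamini-Hochberg step-up adjustment)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(adjusted, alpha=ALPHA):
    """Number of tests whose adjusted p-value falls below alpha."""
    return sum(p < alpha for p in adjusted)


if __name__ == "__main__":
    bonf = bonferroni(PVALS)
    bh = benjamini_hochberg(PVALS)
    print("Bonferroni:", [round(p, 4) for p in bonf], count_significant(bonf))
    print("BH (FDR):  ", [round(p, 4) for p in bh], count_significant(bh))
```

      With these made-up values, only one test remains below α = 0.05 after Bonferroni correction, whereas four remain after the FDR adjustment. This is the sense in which FDR correction produces fewer false negatives, at the cost of tolerating a controlled proportion of false positives.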

      Therefore, and per suggestion of the reviewer, we will update the statistics in the revised version of the manuscript. Also, we will report the correct number of FOVs in the legend.

      (3) It remains unclear whether the clear behavioral effect seen in the TRPM2 knockout animals is at all related to TRPM2 functioning as a warmth sensor in sensory neurons. As discussed above, the effects of the TRPM2 KO on the proportion of warmth-sensing neurons are at most very subtle, and the authors did not use any pharmacological tool (in contrast to the use of capsaicin to probe for TRPV1 in Figures S3 and S4) to support a direct involvement of TRPM2 in the neuronal warmth responses. Behavioral experiments on sensory-neuron-specific TRPM2 knockout animals will be required to clarify this important point.

      As mentioned above, we will tone down the correlation between the cellular and behavioral data and further stress the possibility that the Trpm2-KO phenotype is related to the function of the ion channel outside of DRGs.

      (4) The authors only use male mice, which is a significant limitation, especially considering known differences in warmth sensing between male and female animals and humans. The authors state "For this study, only male animals were used, as we aimed to compare our results with previous studies which exclusively used male animals (7, 8, 17, 43)." This statement is not correct: all four mentioned papers include behavioral data from both male and female mice! I recommend the authors to either include data from female mice or to clearly state that their study (in comparison with these other studies) only uses male mice.

      In the studies by Tan et al. and Vandewauw et al., only male animals were used for the behavioral experiments. Yarmolinsky et al. and Paricio-Montesinos et al. used both males and females, while, as far as we can tell, only Paricio-Montesinos et al. reported that no difference was observed between the sexes. This is a valid point, though: when our study started 6-7 years ago, we only used male mice (as did many other researchers), and we would now do this differently. Nevertheless, we included some female mice in these experiments and will reevaluate whether the numbers are sufficient to generalize the phenotypes to both sexes or to report differences in the revised manuscript.

      All wild-type mice are C57BL/6N from the provider Janvier. Generally, all lines are backcrossed to C57BL/6 mice, and additionally, inbreeding was interrupted every 4-6 generations by crossing to C57BL/6. We cannot state exactly how many times the Trp channel KOs have been backcrossed to C57BL/6 mice.

      Reviewer #3:

      Summary and strengths:

      In the manuscript, Abd El Hay et al. investigate the role of the thermally sensitive ion channels TRPM2 and TRPV1 in warm preference and their dynamic response features to thermal stimulation. They develop a novel thermal preference task, in which both the floor and air temperature are controlled, and conclude that mice likely integrate floor and air temperature to form a thermal preference. Using knockout mice, they go on to show that TRPM2 plays a role in the avoidance of warmer temperatures. Using a new approach for culturing DRG neurons, they show the involvement of both channels in warmth responsiveness and response dynamics. This is an interesting study with novel methods that generate important new information on the different roles of TRPV1 and TRPM2 in thermal behavior.

      Open questions and weaknesses:

      (1) Differences in the response features of cells expressing TRPM2 and TRPV1 are central and interesting findings but need further validation (Figures 3 and 4). To show differences in the dynamics and the amplitude of responses across different lines and stimulus amplitudes more clearly, the authors should show the grand average population calcium response from all responsive neurons with error bars for all 3 groups for the different amplitudes of stimuli (as has been presented for the thermal stimuli traces). The authors should also provide a population analysis of the amplitude of the responses in all groups to all stimulus amplitudes. Prior work suggests that thermal detection is supported by an enhancement or suppression of the ongoing activity of sensory fibers innervating the skin. The authors should present any data on cells with ongoing activity.

      We will include grand average population analysis of the different groups in the revised version.

      Concerning the point about ongoing activity: we are not sure whether it is possible to faithfully recapitulate ongoing activity in neuronal cultures. Ongoing activity has mostly been recorded in skin-nerve preparations (or, in older studies, in other types of nerve recordings), and only very few studies show ongoing activity in cultured neurons; in those, ongoing activity only starts when sensory neurons are cultured for even longer than 3 days (Ref.: doi: 10.1152/jn.00158.2018). We have very few cells that show some spontaneous activity, but these are too few to draw any conclusions. In any case, nerve fibers, which are absent from our cultures, might be necessary to drive ongoing activity.

      (2) The authors should better place their findings in context with the literature and highlight the novelty of their findings. The introduction builds a story of a 'disconnect' or 'contradictory' findings about the role of TRPV1 and TRPM2 in warm detection. While there are some disparate findings in the literature, Tan and McNaughton (2016) show a role for TRPM2 in the avoidance of warmth in a similar task, Paricio et al. (2020) show a significant reduction in warm perception in TRPM2 and TRPV1 knockout lines, and Yarmolinsky et al. (2016) show a reduction in warm perception with TRPV1 inactivation. All these papers are therefore in agreement with the authors' finding of a role for these channels in warm behavior. The authors should change their introduction and discussion to discuss the findings of these studies more accurately and to better pinpoint the novelty of their own work.

      Paricio-Montesinos et al. argue that TRPM8 is crucial for the detection of warmth, as TRPM8-KO animals are incapable of learning the operant task. TRPM2-KO animals and, to a smaller extent, TRPV1-KO animals have reduced sensitivity in the task but are still capable of learning and performing it. In our chamber preference assay, however, this is reversed: TRPM2-KO animals lose the ability to differentiate warm temperatures, while TRPM8 appears to play no major role. A commonality between the two studies is that while TRPV1 affects the detection of warm temperatures in the different assays, this ion channel does not appear to be crucial.

      Similarly, Yarmolinsky et al. show that Trpv1 inactivation only increases the error rate in their operant assay (from ~10% to ~30%), but they did not test TRPM2. Conversely, Tan et al. show the importance of TRPM2 in the preference task but did not test TRPV1.

      More generally, the choice of assay, either an operant task (Paricio-Montesinos et al. and Yarmolinsky et al.) or a preference assay without training of the mice (Tan et al. and our data here), might be important, and different TRP receptors may be relevant for different types of temperature assays; we will expand on this in the Discussion of the revised manuscript. While our results generally agree with the previous studies, they add a different perspective on the analysis of the behavior (correlated with cellular data). We will adjust the manuscript to highlight the advances more clearly.

      (3) The responses of 60 randomly selected cells are shown in Figure 2B. But, looking at the TRPM2-/- data, warm responses appear more obvious than in WTs and the weaker responders of the WT group appear weaker than the equivalent group in the TRPV1-/- and TRPM2-/- data. This does not necessarily invalidate the results, but it may suggest a problem in the data selection. Because the correct classification of warm-sensitive neurons is central to this part of the study more validation of the classifier should be presented. For example, the authors could state if they trained the classifier using equal amounts of cells, show some randomly selected cells that are warm-insensitive for all genotypes, and show the population average responses of warm-insensitive neurons.

      The classifier was trained on a balanced dataset of 1000 manually labelled traces (500 responders and 500 non-responders) across all 5 temperature stimuli. The prediction accuracy was 98%. In the revised manuscript, we will describe more clearly how the classifier was trained, include example traces, and show the population average responses.

      (4) The interpretation of the main behavioral results and the justification of the last figure are presented as the result of changes in sensing, but differences in this behavior could be due to many factors, and this needs clarification and discussion. (i) The authors mention that 'crucially temperature perception is not static' and suggest that there are fluctuating changes in perception over time and conclude that their modelling approach helps show changes in temperature detection. They imply that the temperature perceptual threshold changes over time, but the mouse could just as easily have had exactly the same threshold throughout the task while its motivation (or some other cognitive variable) varied, causing it to change chambers. The authors should correct this. (ii) Likewise, from their fascinating and high-profile prior work, the authors suggest a model of internal temperature sensing whereby TRPM2 expression in the hypothalamus acts as an internal sensor of body temperature. Given this, and the slow time course of the behavior in chambers with different ambient temperatures, couldn't the behavioral differences be due to central changes in hypothalamic processing rather than detection of skin temperature? If TRPM2 were selectively ablated from the skin or the hypothalamus (these experiments are not necessary for this paper), it might be possible to conclude whether sensation or body temperature is more likely the root cause of these effects, but without further experiments it is tough to conclude either way. (iii) Because the ambient temperature is controlled in this behavior, another hypothesis is that warm avoidance could be due to negative valence associated with breathing warm air, i.e., a result of sensation within the body via internal pathways, rather than sensing from the external skin. Overall, the authors should tone down conclusions about sensation and present a more detailed discussion of these points.

      We are sorry that the statement including the phrase “crucially temperature perception is not static” is ambiguous. What we meant is that, as the mouse moves between the two chambers, the animal experiences different temperatures over time (not that the perceptual threshold of the mouse changes). We will clarify this statement in the revised version of the manuscript.

      But even so, some other variable (motivation, etc.) could make the mouse change chambers; we hypothesize that this variable (whatever it might be) is still modulated by temperature (at least, this is the likeliest explanation we see).

      As for the aspect of internal/hypothalamic temperature sensing: we have included this possibility already in the discussion but will further emphasize this possibility in the revised manuscript.

      As for the point of negative valence mediated by breathing warm air: yes, this is also possible. Valence is an interesting question in itself: would the mice be repelled by the (uncomfortable) hot plate, attracted to the (more comfortable) thermoneutral plate, or both? This would be worth elucidating in a separate study.

      (5) It is an excellent idea to present a more in-depth analysis of the behavioral data collected during the preference task, beyond 'the mouse is on one side or the other'. However, the drift-diffusion approach is difficult to interpret from the text in the results and the figures. The results text is not completely clear on which behavioral parameters are analyzed, and terms like drift, noise, estimate, and evidence are not clearly defined. Currently, this section of the paper is somewhat confusing and takes the paper away from the central findings about dynamics and behavioral differences. It seems like they could come to similar conclusions with simpler analysis and simpler figures.

      We will reassess the description of the drift-diffusion model and explain it more clearly. Additionally, we will assess whether we can introduce the drift-diffusion model and analysis earlier in the study, after Figure 1, so that the model and this type of analysis are presented together with the first behavioral results (instead of introducing the model only at the very end).

      (6) In Figure 2D the % of warm-sensitive neurons are shown for each genotype. Each data point is a field of view, however, reading the figure legend there appear to be more FOVs than data points (eg 10 data points for the TRPV1-/- but 17 FOVs). The authors should check this.

      We will check and ensure that, in the revised manuscript, the number of FOVs stated in the legend agrees with the number shown in Figure 2D.

      (7) Can the authors comment on why animals with overexpression of TRPV1 initially spend more time in the warmest chamber at 38°C but not at 34°C?

      This is an interesting observation that we had not considered before. A closer look at Figure 4H reveals that the majority of the TRPV1-OX animals have a proportionally long first visit to the 38°C room. We can only speculate why this is the case. We cannot rule out that this is a technical shortcoming of the assay and of how we conducted it, but because we do not observe this in the wildtype mice, a technical problem is rather unlikely. It is possible that this is a type of “freezing” (or “startle”) behavior when the animals first encounter the 38°C temperature. Freezing behaviors in mice can be observed when sudden or threatening stimuli are applied. It is possible that, in the TRPV1-overexpressing animals, the initial encounter with 38°C activates a larger proportion of cells (compared to WT controls), possibly signaling a “painful” stimulus and thus leading to this startle effect. It is noteworthy, however, that with the more stringent repeated-measures statistics suggested by the referees, the difference at the first measured time point in Fig. 4G is no longer significant (see comment #2 above). This does not rule out that this might be a true effect, but such a claim would benefit from additional experiments that test this hypothesis more rigorously.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1.1: The distinction of PIGS from nearby OPA, which has also been implicated in navigation and ego-motion, is not as clear as it could be.

      Response 1.1: The main “functional” distinction between TOS/OPA and PIGS is that TOS/OPA responds preferentially to moving vs. stationary stimuli (even concentric rings), likely due to its overlap with the retinotopic motion-selective visual area V3A, for which this is a defining functional property (e.g. Tootell et al., 1997, J Neurosci). In comparison, PIGS does not show such motion selectivity. Instead, PIGS responds preferentially to more complex forms of motion within scenes.

      Moreover, PIGS and TOS/OPA are located differently relative to the retinotopic visual areas. Briefly, PIGS is located adjacent to areas IPS3-4, while TOS/OPA overlaps with areas V3A/B and IPS0 (V7). This point is now highlighted in the new Experiment 3b and the new Figure 6. In this revision, we also tried to better highlight these points in sections 4.3, 4.4 and 4.5 (see also the response to the first comment from Reviewer #2).

      Reviewer 2:

      Comment 2.1: First, the scene-selective region identified appears to overlap with regions that have previously been identified in terms of their retinotopic properties. In particular, it is unclear whether this region overlaps with V7/IPS0 and/or IPS1. This is particularly important since prior work has shown that OPA often overlaps with V7/IPS0 (Silson et al, 2016, Journal of Vision). The findings would be much stronger if the authors could show how the location of PIGS relates to retinotopic areas (other than V6, which they do currently consider). I wonder if the authors have retinotopic mapping data for any of the participants included in this study. If not, the authors could always show atlas-based definitions of these areas (e.g. Wang et al, 2015, Cerebral Cortex).

      Response 2.1: We thank the reviewers for reminding us to more clearly delineate this issue of possible overlap, including the information provided by Silson et al, 2016. The issue of possible overlap between area TOS/OPA and the retinotopic visual areas, both in humans and non-human primates, was also clarified by our team in 2011 (Nasr et al., 2011). As can be seen in the newly generated Figure 6, and consistent with those previous studies, TOS/OPA overlaps with visual areas V3A/B and V7, whereas PIGS is located more dorsally, close to IPS3-4. As shown here, PIGS overlaps neither with TOS/OPA nor with areas V3A/B and V7.

      To more directly address the reviewer’s concern, in this revision, we have added a new experiment (Experiment 3b) in which we have shown the relative position of PIGS and the retinotopic areas in two individual subjects (Figure 6). All the relevant points are also discussed in section 4.3.

      Comment 2.2: Second, recent studies have reported a region anterior to OPA that seems to be involved in scene memory (Steel et al, 2021, Nature Communications; Steel et al, 2023, The Journal of Neuroscience; Steel et al, 2023, bioRxiv). Is this region distinct from PIGS? Based on the figures in those papers, the scene memory-related region is inferior to V7/IPS0, so characterizing the location of PIGS relative to V7/IPS0 as suggested above would be very helpful here as well. If PIGS overlaps with either V7/IPS0 or the scene memory-related area described by Steel and colleagues, then arguably it is not a newly defined region (although the characterization provided here still provides new information).

      Response 2.2: The lateral place memory area (LPMA) is located on the lateral brain surface, anterior to the IPS (see Figure 1 from Steel et al., 2021 and Figure 3 from Steel et al., 2023). In contrast, PIGS is located on the posterior brain surface, posterior to the IPS. In other words, the two areas lie on opposite sides of a major brain sulcus. In this revision, we have clarified this point in section 4.3 and cited the relevant studies by Steel and colleagues.

      Comment 2.3: Another reason that it would be helpful to relate PIGS to this scene memory area is that this scene memory area has been shown to have activity related to the amount of visuospatial context (Steel et al, 2023, The Journal of Neuroscience). The conditions used to show the sensitivity of PIGS to ego-motion also differ in the visuospatial context that can be accessed from the stimuli. Even if PIGS appears distinct from the scene memory area, the degree of visuospatial context is an alternative account of what might be represented in PIGS.

      Response 2.3: The reviewer raises an interesting point. One minor confusion is that we may be inadvertently referring to two slightly different types of “visuospatial context”. Specifically, the stimuli used in the ego-motion experiment here (i.e. coherently vs. incoherently changing scenes) represent the same scenes, and the only difference between the two conditions is the sequence of images across the experimental blocks. In that sense, the two experimental conditions may be considered to have the same visuospatial “context”. However, it could also be argued that the coherently changing scenes provide more information about the environmental layout. In that case, considering previous reports that PPA/TPA and RSC/MPA may also be involved in layout encoding (Epstein and Kanwisher 1998; Wolbers et al. 2011), we would have expected to see more activity within those regions in response to coherently compared to incoherently changing scenes. These issues are now more explicitly discussed in the revised article (section 4.6).

      Reviewer 3:

      Comment 3.1: There are few weaknesses in this work. If pressed, I might say that the stimuli depicting ego-motion do not, strictly speaking, depict motion, but only apparent motion between photographs taken 2 s apart. However, this choice was made to equate frame rates and motion contrast between the 'ego-motion' and a control condition, which is a useful and valid approach to the problem. Some choices for visualization of the results might be made differently; for example, outlines of the regions might be shown in more plots for easier comparison of activation locations, but this is a minor issue.

      Response 3.1: We thank the reviewer for these constructive suggestions, and we agree with their comment that the ego-motion stimuli are not smooth, even though they were refreshed every 100 ms. However, the stimuli were nevertheless coherent enough to activate areas V6 and MT, two major areas known to respond preferentially to coherent compared to incoherent motion.

      Reviewer #1 (Recommendations For The Authors):

      I enjoyed reading this article. I have a few suggestions for improvement:

      (1) Delineation from OPA: The OPA has been described in quite similar terms as PIGS, with its involvement in ego-motion (e.g., crawling, walking) and navigation in general (e.g., Dilks' recent work; Bonner and Epstein). The authors address the distinction in section 4.4. Unlike Kamps et al. (2016) and Jones et al. (2023), the authors found weak or no evidence for ego-motion in OPA. They explain this discrepancy with differences in refresh rates and different levels of spatial smoothing of the fMRI data. It is not clear why these fairly small methodological differences would lead to different findings of ego-motion in the OPA. Arguably, the OPA is the closest of the "established" scene areas to PIGS, both in anatomical location and in function. I would therefore appreciate a more detailed discussion of the differences between these two areas.

      Response: Jones et al. showed ego-motion-related TOS/OPA activity when compared to scrambled scenes. This is fundamentally different from our comparison here, which contrasts coherently vs. incoherently changing scenes (i.e., not a small difference). Also, Kamps et al. used static scenes as a control, which, given the motion selectivity of TOS/OPA, has a large impact on the TOS/OPA response.

      (2) Random effects analysis: The authors mention using a "random effects analysis" for several of their experiments. I would ask them to provide more details on what statistical models were used here. Were they purely random-effects models or actually mixed-effects models? What were the factors that entered into the analysis? Providing more detail would make the analysis techniques more transparent.

      Response: This point is now clarified in the Methods section.

      (3) Data and code availability: The authors write that data and code "are ready to be shared upon request." (section 2.5) In the spirit of transparency and openness, I strongly encourage the authors to make the data publicly available, e.g., on OSF or OpenNeuro. In particular, having probabilistic maps of PIGS available will allow other researchers to include PIGS in their analysis pipelines, making the current work more impactful.

      Response: We have made the probabilistic labels available to the public. This point is now highlighted in section 2.5.

      (4) Minor comments on the writing that caught my eye while reading the article:

      • Line 27: "in the human brain".

      Response: Done.

      • Line 30: I don't agree with the characterization of the previous model of scene perception as "simplistic." Adding one additional ROI makes it no less simplistic. Perhaps the authors can rephrase to make this slightly less antagonistic?

      Response: Done.

      • Line 71: it is not clear why NHPs are relevant here.

      Response: We decided to keep the text intact.

      • Line 138: "were randomized".

      Response: Done.

      • Line 152: "consisting".

      Response: Done.

      • Line 155: "sets" (plural).

      Response: Done.

      • Lines 253-255: Why were the 3T data spatially smoothed but not the 7T data? This seems odd.

      Response: We kept the text intact.

      • Line 481: "we found strong motion selectivity" (remove "a").

      Response: Done.

      • Line 564: a word is missing, probably: "a stronger effect of ego-motion".

      Response: Done.

      • Line 591: "controlling spatial attention" (remove "the").

      Response: Done.

      • Line 591 and 594: Both sentences start with "However". I think the first of these should not because it is setting up the contrast for the second sentence.

      Response: Done.

      • Line 607: "higher-level" (hyphen).

      Response: Done.

      • Throughout the manuscript: adverbial phrases such as "(in)coherently changing" or "probabilistically localized" do not get a hyphen.

      Response: Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors state that "All data, codes and stimuli are ready to be shared upon request". Ideally, these materials should be deposited in appropriate repositories (e.g. OpenMRI, GitHub) and not require readers to contact the authors to obtain such materials.

      Other Comments:

      (a) The title ("A previously undescribed scene-selective site is the key to encoding ego-motion in natural environments") is potentially misleading - the work was not conducted in a natural environment. At best, you could say they are 'naturalistic stimuli'. Also, in what sense is PIGS "key" to encoding ego-motion - the study just shows sensitivity to this factor.

      Response: We changed the title to “naturalistic environments”.

      (b) Figure 1 - I'm not sure what point the authors are trying to make with Figure 1. The comparison is between a highly smoothed, group fixed-effects analysis and a less-smoothed individual subject analysis. The differences between the two could reflect group vs. individual, highly-smoothed (5 mm) versus less-smoothed (2 mm), or differences in thresholding. If the thresholding were lower for the group analysis, it would probably start to look more similar to the individual subject. As it stands, this figure isn't particularly informative, it seems redundant with Figure 2, and Figure 1A is not even referenced in the main text. Further, fixed effects analyses are relatively uncommon in the recent literature, so their inclusion is unusual.

      Response: Figure 1A is a replication of the data/method used in Nasr et al., 2011, and it will help readers see the difference between the “traditional” scene-selectivity maps generated by group averaging vs. data from individual subjects. We therefore decided not to change the figure.

      (c) Figure 3 - why are the two sets of maps shown at different thresholds? For 3B given the larger sample size, it is expected that the extent of the significant activations will increase. Currently the higher threshold for 3B and the smaller range for 3A is making the sets of maps look more comparable.

      Response: As the reviewer noticed, the number of subjects is larger in Figure 3B compared to 3A. The main point of this figure is to show that the PIGS activity center does not vary across populations. Considering this point, we decided not to change this figure.

      (d) Figure 10 - why is the threshold lower than used for other figures? It would be helpful if there was consistent thresholding across figures.

      Response: Experiment 6 and Experiment 1 are based on different stimuli (see Methods). Also, among those subjects who participated in Experiment 1, two subjects did not participate in Experiment 6. These points are already highlighted in the text.

      (e) Figures - how about the AFNI approach of thresholding and showing sub-threshold data at the same time? (Taylor et al, 2023, Neuroimage).

      Response: We highly appreciate the methodology suggested by Taylor and colleagues. However, our main point here is to show the center of PIGS activity. In this context, showing an unthresholded activity map has no advantage over the current maps. Considering these points, we decided not to change the figures.

      (f) Coherent versus incoherent scenes - there are many differences between the coherent and incoherent scenes. Arguing that it must be ego-motion seems a little premature without further investigation. Activity anterior to OPA has been associated with the construction of an internal representation of a spatial environment (Steel et al., 2023, The Journal of Neuroscience). Could it be that this is the key effect, not really the ego-motion?

      Response: In this revision, we discuss the studies by Steel et al. (2021 and 2023) in section 4.3.

      Reviewer #3 (Recommendations For The Authors):

      Overall, I think this is already an excellent contribution. The suggestions I have are minor and may help with the clarity of the results.

      (1) My main request of the authors would be to provide more points of reference in some of the figures with cortical maps. In many cases, the authors use arrows to point to the locations of activations of interest. However, the arrows in adjacent figures are often not placed in exactly the same places on maps that are meant to be compared. It would very much help the viewer to compare activations if the arrows pointing to activations or regions of interest were placed in identical locations for the same brains appearing in different sub-panels (e.g. in panels A and B of Figure 1). The underlying folds of the cortical surface provide some points of reference, but these are often occluded to different extents by data in figures that are meant to be compared.

      Response: To address the reviewer’s concern, we regenerated Figure 8 (Figure 7 in the previous submission) and placed arrowheads in identical locations wherever possible. For PIGS in particular, this point was also considered in Figures 2 and 3.

      (2) Outlines (such as those in Figure 5) are also very useful, and I would encourage broader use of them in other figures (e.g. Figures 7, 10, and 12). Figures 10 and 12 are on the fsaverage surface, so the same outlines could be used for them as for Figure 5.

      To be clear, it's possible to apprehend the results with the figures as they are, but I think a few small changes could help a lot.

      Response: In this revision, we added outlines to Figures 11 and 13 (Figure 10 and 12 in the previous submission). We did not add the outline to Figure 8 because it made it hard to see PIGS. Rather we used arrows (see the previous comment).

      Other minor points:

      In the method for Experiment 4, the authors write: "Other details of the experiment were similar to those in Experiment 1.". Similar or the same? The authors should clarify this statement, e.g. "the number of images per block, the number of blocks, the number of runs were the same as Experiment 1" - with any differences noted.

      Response: This point is now addressed in the Methods section.

      In Figure 8, it would be better to have the panel labels (A, B, C, D) in the upper left of each panel rather than the lower left.

      Response: We tried to keep the panel arrangement consistent across the figures. That is why the letters are positioned like this.

      A final gentle suggestion: pycortex (http://github.com/gallantlab/pycortex) provides a means to visualize the flattened fsaverage surface with outlines for localized regions of interest and overlaid lines for major sulci. Though it is by no means necessary for publication, it would be lovely to see these results on that surface, which is freely available and downloadable via a pycortex command (surface here: https://figshare.com/articles/dataset/fsaverage_subject_for_pycortex/9916166)

      Response: We thank the reviewer for bringing pycortex to our attention. We will consider using it in our future studies.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings characterising the genomic features of E. coli isolated from neonatal meningitis from seven countries, and documents bacterial persistence and reinfection in two case studies. The genomic analyses are solid, although the inclusion of a larger number of isolates from more diverse geographies would have strengthened the generalisability of findings. The work will be of interest to people involved in the management of neonatal meningitis patients, and those studying E. coli epidemiology, diversity, and pathogenesis.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses whole genome sequencing to characterise the population structure and genetic diversity of a collection of 58 isolates of E. coli associated with neonatal meningitis (NMEC) from seven countries, including 52 isolates that the authors sequenced themselves and a further 6 publicly available genome sequences. Additionally, the study used sequencing to investigate three case studies of apparent relapse. The data show that in all three cases, the relapse was caused by the same NMEC strain as the initial infection. In two cases they also found evidence for gut persistence of the NMEC strain, which may act as a reservoir for persistence and reinfection in neonates. This finding is of clinical importance as it suggests that decolonisation of the gut could be helpful in preventing relapse of meningitis in NMEC patients.

      Strengths:

      The study presents complete genome sequences for n=18 diverse isolates, which will serve as useful references for future studies of NMEC. The genomic analyses are high quality, the population genomic analyses are comprehensive and the case study investigations are convincing.

      We agree.

      Weaknesses:

      The NMEC collection described in the study includes isolates from just seven countries. The majority (n=51/58, 88%) are from high-income countries in Europe, Australia, or North America; the rest are from Cambodia (n=7, 12%). Therefore it is not clear how well the results reflect the global diversity of NMEC, nor the populations of NMEC affecting the most populous regions.

      The virulence factors section highlights several potentially interesting genes that are present at apparently high frequency in the NMEC genomes; however, without knowing their frequency in the broader E. coli population it is hard to know the significance of this.

      We acknowledged the limitations of our NMEC collection in the Discussion. We agree the prevalence of virulence factors in our collection is interesting. The limited size of our collection prevented further evaluation of the prevalence of these virulence factors in a broader E. coli population.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a robust genomic dataset profiling 58 isolates of neonatal meningitis-causing E. coli (NMEC), the largest such cohort to be profiled to date. The authors provide genomic information on virulence and antibiotic resistance genomic markers, as well as serotype and capsule information. They go on to probe three cases in which infants presented with recurrent febrile infection and meningitis and provide evidence indicating that the original isolate is likely causing the second infection and that an asymptomatic reservoir exists in the gut. Accompanying these results, the authors demonstrate that gut dysbiosis coincides with the meningitis.

      Strengths:

      The genomics work is meticulously done, utilizing long-read sequencing.

      The cohort of isolates is the largest to be sampled to date.

      The findings are significant, illuminating the presence of a gut reservoir in infants with repeating infection.

      We agree.

      Weaknesses:

      Although the cohort of isolates is large, there is no global representation, entirely omitting Africa and the Americas. This is acknowledged by the group in the discussion, however, it would make the study much more compelling if there was global representation.

      We agree. In the Discussion we state this is likely a reflection of the difficulty in acquiring isolates causing neonatal meningitis, in particular from countries with limited microbiology and pathology resources.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Schembri et al. performed a molecular analysis by WGS of 52 E. coli strains identified as "causing neonatal meningitis" from several countries and isolated from 1974 to 2020. Sequence types, virulence gene content, and antibiotic resistance genes are described. In the second part, the authors describe three cases of relapse and analyse the corresponding strains, as well as the microbiome of the three neonates during relapse. For one patient, the same E. coli strain was found in blood and stool (this patient did not have meningitis). For two patients, microbiome analysis revealed severe dysbiosis.

      Major comments:

      Although the authors announce in their title that they study E. coli that cause neonatal meningitis, and state in the Methods that they had a collection of 52 NMEC, we found in Supplementary Table 1 that 29 strains (i.e., most of the strains) were isolated from blood and not CSF. This is a major limitation, since only strains isolated from CSF can be designated with certainty as NMEC, even if pleocytosis is observed in the CSF. Particularly troubling is the description of patient 2 with a relapsed infection. As stated in the text at line 225, CSF microscopy was normal and culture was negative for this patient! It is therefore clear that a patient without meningitis has been included in this study.

      We have reviewed the clinical data for our 52 NMEC isolates, noting that for some of the older Finnish isolates we relied on previous publications. These data are shown in Table S1. To address the Reviewer’s comment, we have added the following text to the methods section (new text underlined).

      ‘The collection comprised 42 isolates from confirmed meningitis cases (29 cultured from CSF and 13 cultured from blood) and 10 isolates from clinically diagnosed meningitis cases (all cultured from blood).’

      Patient 2 was initially diagnosed with meningitis based on a positive blood culture in the presence of CSF pleocytosis (>300 WBCs, >95% polymorphs). We understand there may be some confusion with reference to a relapsed infection, which we now more accurately describe as recrudescent invasive infection in the revised manuscript.

      Another major limitation (not stated in the Discussion) is the absence of clinical information on the neonates, especially the weeks of gestation. It is well known that the risk of infection is dramatically increased in preterm neonates due to their immature immunity. Therefore, E. coli causing infection in preterm neonates are not comparable to those causing infection in term neonates, notably in their virulence gene content. Indeed, it is mentioned that at least eight strains did not possess a capsule; we can speculate that these neonates were preterm, but this information is lacking. The ages of the neonates are also lacking. The possible source of infection, notably urinary tract infection, is not mentioned. This may also affect the virulence factor (VF) content.

      We agree. In the Discussion we now note the following (new text underlined):

      ‘… we did not have clinical data on the weeks of gestation for all patients, and thus could not compare virulence factors from NMEC isolated from preterm versus term infants.’

      Submission to Medrxiv, a requirement for review of our manuscript at eLife, necessitated the removal of some patient identifying information, including precise age and detailed medical history.

      Sequence analysis reveals the predominance of ST95 and ST1193 in this collection. The high incidence of ST95 is not surprising and has been well described previously; therefore, the concluding sentence at line 132, indicating that ST95 E. coli should exhibit specific virulence features associated with their capacity to cause NM, does not add anything. By contrast, the high incidence of ST1193 is of interest and should be discussed in more detail. Which specific virulence factors do they harbor? Is there any hypothesis explaining their emergence in neonates?

      We compared the virulence factors of ST95 and ST1193 and summarized this information in Figure 4. We also discussed how the K1 polysialic acid capsule in ST95 and ST1193 could contribute to the emergence of these STs in NM. Specifically, we stated the following: ‘We speculate this is due to the prevailing K1 polysialic acid capsule serotype found in ST95 and the newly emerged ST1193 clone [22, 37] in combination with other virulence factors [15, 28, 29] (Figure 4) and the immature immune system of preterm infants.’

      The paragraph describing the VFs only states that the ST95 strains contained significantly more VFs than the ST1193 strains. What is the significance of this? Moreover, "significantly" is not documented: n = ?, p = ?

      We compared the prevalence of known virulence factors between ST95 and ST1193, and showed that ST95 strains in our collection contained significantly more virulence factors than the ST1193 strains. The P-value and the statistical test used were included in Supplementary Figure 3. To address the reviewer's concern, we have now also added this to the main manuscript text as follows (new text underlined):

      ‘Direct comparison of virulence factors between ST95 and ST1193, the two most dominant NMEC STs, revealed that the ST95 isolates (n = 20) contained significantly more virulence factors than the ST1193 isolates (n = 9), p-value < 0.001, Mann-Whitney two-tailed unpaired test (Supplementary Table 1, Supplementary Figure 3).’

      It is not clear which 18 strains were completely sequenced. The results in Supplementary Table 2 are presented in the text but are not discussed.

      NMEC isolates that were completely sequenced in this study are indicated in bold and marked with an asterisk in Figure 1. This information is indicated in the figure legend and was provided in the original submission. All information regarding genomic island composition and location, virulence genes and plasmid and prophage diversity is included in Supplementary Table 2. This information is highly descriptive and thus we elected not to include it as text in the main manuscript.

      46 years is a very long time for such a small number of strains, making it difficult to put forward epidemiological or evolutionary theories. In the analysis of antibiotic resistance, there are no ESBLs. However, Ding's article (reference 34) and other authors showed that ESBLs are emerging in E. coli neonatal infection. These strains are a major threat that should be studied, unfortunately, the authors haven't had the opportunity to characterize such strains in their manuscript.

      We agree 46 years is a long time-span. The study by Ding et al examined 56 isolates comprising 25 different STs isolated in China from 2009-2015, with ST1193 (n=12) and ST95 (n=10) the most common. Our study examined 58 isolates comprising 22 different STs isolated in seven different geographic regions from 1974-2020, with ST1193 (n=9) and ST95 (n=20) the most common. Thus, despite differences in the geographic regions from which isolates in the two studies were sourced, there are similarities in the most common STs identified. The fact that we observed less antibiotic resistance, including a lack of ESBL genes, in ST1193 is likely due to the different regions from which the isolates were sourced. We acknowledged and discussed the potential of ST1193 harbouring multidrug resistance including ESBLs in our manuscript as follows:

      ‘Concerningly, the ST1193 strains examined here carry genes encoding several aminoglycoside-modifying enzymes, generating a resistance profile that may lead to the clinical failure of empiric regimens such as ampicillin and gentamicin, a therapeutic combination used in many settings to treat NM and early-onset sepsis [35, 36]. This, in combination with reports of co-resistance to third-generation cephalosporins for some ST1193 strains [22, 34], would limit the choice of antibiotic treatment.’

      Second part of the manuscript:

      The three patients who relapsed had a late neonatal infection (> 3 days) with respective ages of 6 days, 7 weeks, and 3 weeks. We do not know whether they are former preterm newborns (no term specified) or whether they have received antibiotics in the meantime.

      As noted above, patient ages were not disclosed to comply with submission to Medrxiv, a requirement for review of our manuscript at eLife.

      Patient 1: Although this patient had CSF pleocytosis, the culture was negative, which is surprising, and no explanation is provided. Therefore, the diagnosis of meningitis is not certain. Pleocytosis without meningitis has been previously described in neonates with severe sepsis. Line 215: no immunological abnormalities were identified (no details are given).

      We respectfully disagree with the reviewer. The diagnosis of meningitis is made unequivocally by the presence of a clearly abnormal CSF microscopy (2430 WBCs) and an invasive E. coli from blood culture. This does not seem controversial to the authors. We had believed it unnecessary to include this corroborative evidence, but have added the following to support our assertion:

      ‘The child was diagnosed with meningitis based on a cerebrospinal fluid (CSF) pleocytosis (>2000 white blood cells [WBCs], low glucose, elevated protein), positive CSF E. coli PCR and a positive blood culture for E. coli (MS21522).’

      On the contrary, the authors are surprised by the statement that CSF pleocytosis occurs in neonatal sepsis ‘without meningitis’ and do not know of any definitions of neonatal meningitis that are not tied to the presence of a CSF pleocytosis. Furthermore, the later isolation of E. coli from the CSF during the relapsed infection reinforces the initial diagnosis.

      Patient 2: This patient had a recurrence of bacteremia without meningitis (line 225: CSF microscopy was normal and culture negative!). This case should be deleted.

      In a similar vein to the previous comment, we respectfully assert that this patient has clear evidence of meningitis (330 WBCs in the CSF, taken 24h after initiation of antibiotic treatment). In this case, molecular testing was not performed as, under the principle of diagnostic stewardship, it was not considered necessary by the clinical microbiologists and treating clinicians following the culture of E. coli in the bloodstream. We agree that this is not a case of recurrent meningitis, but our intention was to highlight the recrudescence of an invasive infection (urinary sepsis requiring admission to hospital and intravenous antibiotics) which we hypothesise has arisen from the intestinal reservoir. We did not state that all patients suffered from relapsed meningitis.

      Despite this, to address this reviewer's concern, we have changed all references to ‘relapsed infection’ to now read ‘recrudescent invasive infection’ in the revised manuscript.

      Patient 3: This patient had two relapses which is exceptional and may suggest the existence of a congenital malformation or a neurological complication such as abscess or empyema therefore, "imaging studies" should be detailed.

      This patient underwent extensive imaging investigation to rule out a hidden source. This included repeated MRI imaging of head and spine, CT imaging of head and chest, ultrasound imaging of the abdomen and pelvis, and nuclear medicine imaging to detect a subtle meningeal defect and CSF leak. All tests were normal, and no abscess or empyema was found.

      We have modified the text to include this information:

      Text in original submission: ‘Imaging studies and immunological work-up were normal.’

      New text in revised manuscript (underlined): ‘Extensive imaging studies including repeated MRI imaging of the head and spine, CT imaging of the head and chest, ultrasound imaging of abdomen and pelvis, and nuclear medicine imaging did not show a congenital malformation or abscess. Immunological work-up did not show a known primary immunodeficiency. At two years of age, speech delay is reported but no other developmental abnormality.’

      The authors suggest a link between intestinal dysbiosis and relapse in three patients. However, the fecal microbiomes of patients without relapse were not analysed, so no comparison is possible. Moreover, dysbiosis after several weeks of antibiotic treatment in a patient hospitalized for a long time is not unexpected. Therefore, it is impossible to make any assumption or draw any conclusion. This part of the manuscript is purely descriptive. In addition, the authors should be more cautious when they state in line 289 "we also provide direct evidence to implicate the gut as a reservoir [...] antibiotic treatment". Indeed, gut colonization of the mothers with the same strain may also be a reservoir (as stated in the discussion line 336). Finally, the authors do not discuss the potential role of ceftriaxone vs cefotaxime in the dysbiosis observed. Ceftriaxone may have a major impact on the microbiota due to its biliary (digestive) elimination.

      We addressed the limitations of our study in the Discussion, including that we did not have access to urine or stool samples from the mothers of the infants that suffered recrudescence, and thus cannot rule out mother-to-child transmission as a mechanism of reinfection. We have now added that we did not have clinical data on the weeks of gestation for all patients, and thus could not compare virulence factors from NMEC isolated from preterm versus term infants. The limitations of our study are summarised as follows in the Discussion (new text underlined):

      ‘This study had several limitations. First, our NMEC strain collection was restricted to seven geographic regions, a reflection of the difficulty in acquiring strains causing this disease. Second, we did not have access to a complete set of stool samples spanning pre- and post-treatment in the patients that suffered NM and recrudescent invasive infection. This impacted our capacity to monitor E. coli persistence and evaluate the effect of antibiotic treatment on changes in the microbiome over time. Third, we did not have access to urine or stool samples from the mothers of the infants that suffered recrudescence, and thus cannot rule out mother-to-child transmission as a mechanism of reinfection. Finally, we did not have clinical data on the weeks of gestation for all patients, and thus could not compare virulence factors from NMEC isolated from preterm versus term infants.’

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It would be useful to mention the sample size (number of genomes analysed, n=58) in the abstract to give readers a sense of the scale of the analysis.

      We have added the number of genomes in the abstract as suggested (new text underlined).

      ‘Here we investigated the genomic relatedness of a collection of 58 NMEC strains spanning 1974-2020 and isolated from seven different geographic regions.’

      The term 'strain' is used throughout, it would be clearer to use 'isolates' to describe the biological material and 'genomes' when the unit being referred to is genome sequences. For example, lines 108-111 use 'strain' to mean the collection of 52 isolates but also uses 'strain' to mean the collection of 58 genomes including those of the 52 isolates that the authors sequenced plus a further 6 genomes of isolates that they do not have in their isolate collection.

      We have changed the term ‘strain’ to ‘isolate’ or ‘genome’ as suggested.

      Figure 1 (annotated phylogeny) is hard to read and interpret, as so much data is presented. It would assist readers if the authors could provide an interactive form of the phylogeny and metadata/genomic feature data discussed in the text, e.g. using microreact.org, so that details can be explored more easily.

      This is an excellent suggestion, and we created a project on microreact.org. This information has been added to the Figure 1 legend.

      https://microreact.org/project/oNfA4v16h3tQbqREoYtCXj-high-risk-escherichia-coli-clones-that-cause-neonatal-meningitis-and-association-with-recrudescent-infection.

      It would be useful to provide information on the frequency and/or distribution of the virulence factors in the broader E. coli population, to provide context for readers and to better understand the importance/significance of the high frequency of the reported virulence factors within NMEC.

      As noted above, we agree the prevalence of virulence factors in our collection is interesting. We discussed the prevalence of these virulence factors in our collection, and the detailed data is presented in Table S1. However, we also note a limitation in our study is the number of isolates, and thus we would prefer to avoid evaluation of the prevalence of these virulence factors in the context of a broader E. coli population. There are other studies that have examined NMEC virulence factors in the past; some examples are noted below, and we have now referenced these in our manuscript (note Ref 15 was suggested by Reviewer 3 in a comment below; PMID: 11920295).

      Ref 15: Johnson JR, Oswald E, O'Bryan TT, Kuskowski MA, Spanjaard L. Phylogenetic distribution of virulence-associated genes among Escherichia coli isolates associated with neonatal bacterial meningitis in the Netherlands. J Infect Dis 2002; 185(6): 774-84.

      Ref 28: Wijetunge DS, Gongati S, DebRoy C, et al. Characterizing the pathotype of neonatal meningitis causing Escherichia coli (NMEC). BMC Microbiol 2015; 15: 211.

      Ref 29: Bidet P, Mahjoub-Messai F, Blanco J, et al. Combined Multilocus Sequence Typing and O Serogrouping Distinguishes Escherichia coli Subtypes Associated with Infant Urosepsis and/or Meningitis. J Infect Dis 2007; 196(2): 297-303.

      I suggest avoiding the term 'global' to describe the collection, given that there are only seven countries included in the collection and two of the most populous continents (Africa and South America) are not represented at all.

      We agree, and now refer to our collection as ‘an NMEC strain collection from geographically diverse locations.’

      Reviewer #2 (Recommendations For The Authors):

      This is a suggestion regarding discussion/food for thought: This study sheds information on genomic features and indicates the presence of a reservoir in the infected infant. Previous studies have demonstrated the presence of a reservoir in the vaginas of women with recurrent UTIs. Is there any information as to whether the mothers of these infants, especially the three with recrudescent infection, had a UTI or recurrent UTI in their life? It may be worthwhile discussing the potential of testing for E. coli in expecting mothers, if they have a history of UTI.

      We do not have such data, and as indicated above we note this as a limitation of our study.

      As written, the main text does not make clear whether all three cases of recrudescent infection come from the same geographical location. It would be easier for readers to have this information in the main text, in addition to the supplement.

      The three cases of recrudescent invasive infection were from three different locations. We have added this information as follows (new text underlined):

      ‘These patients were from different regions in Australia.’

      Reviewer #3 (Recommendations For The Authors):

      Lines 48 and 67: change the word "devasting".

      Changed as suggested.

      Line 49 second most in full-term infants.

      Changed as suggested.

      Line 56 delete the sentence "antibiotic resistance genes occurred infrequently".

      We changed the sentence, which now reads (new text underlined):

      ‘Antibiotic resistance genes occurred infrequently in our collection’.

      Line 76 reference 10 is inappropriate.

      Reference 10 reported that 5/24 infants treated for neonatal Gram-negative bacillary meningitis over a 10-year period had a relapse of meningitis after the initial course of treatment. Four of the isolates that caused these relapsed infections were E. coli.

      To address the reviewer's concern, we have altered the text as follows (new text underlined):

      ‘Moreover, NMEC is an important cause of relapsed infections in neonates [10]’.

      Line 83 several references related to serotypes are missing, notably doi.org/10.1086/339343.

      We have added this reference.

      Line 171 significantly? n=?, p=?

      The numbers and P-value were provided in the Supplementary Figure 3 legend. We have now added these to the text as follows:

      ‘Direct comparison of virulence factors between ST95 and ST1193, the two most dominant NMEC STs, revealed that the ST95 isolates (n = 20) contained significantly more virulence factors than the ST1193 isolates (n = 9); P-value < 0.001, Mann-Whitney two-tailed unpaired test (Supplementary Table 1, Supplementary Figure 3).’

      Figure 4 is not necessary.

      We respectfully disagree. Figure 4 provides an illustrative comparison of virulence factors between the two most dominant NMEC sequence types, ST95 and ST1193. We believe this will be informative for many readers.

      Line 311 "We speculate....of preterm infants" This sentence does not add anything to the discussion.

      We respectfully disagree and have kept the sentence. This reflects our opinion.

      Line 320, "clear clinical risk factors to explain...": the gestational term of the neonates is missing.

      Updated as follows (new text underlined):

      ‘Although reported rarely, recrudescent invasive E. coli infection in NM patients, including several infants born pre-term, has been documented in single study reports [39, 40]. In these reports, infants received appropriate antibiotic treatment based on antibiogram profiling and no clear clinical risk factors to explain recrudescence were identified, highlighting our limited understanding of NM aetiology.’

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The emergence of catalytic self-replication of polymers is an important question in the context of the origin of life. Tkachenko and Maslov present a model in which such a catalytic polymer sequence emerges from a random pool of replicating polymers.

      Strengths:

      The model is part of a theme from many previous papers from the same authors and their colleagues. The model is interesting, technically correct, and demonstrates qualitatively new phenomena. It is good that the paper also makes a connection with possible experimental scenarios -- specifically, concrete proposals are made for testing the core ideas of the model. It would indeed be an exciting demonstration when such an experiment does indeed materialize.

      Weaknesses:

      Unlike the rest of the paper, which is very tight in its arguments, I find that the discussion section is not. Specifically, sentences such as "In fact, this can be seen as a special case of the classical error catastrophe" are a bit loose and not well substantiated. Although these are in the discussion section, I find this to be a weakness of an otherwise good paper. Tightening some of the arguments here will make it an excellent paper in my opinion.

      We followed the reviewer's recommendations by streamlining the discussion and removing the potentially confusing comparison to the classic error catastrophe.

      Reviewer #2 (Public Review):

      Summary:

      The replication of information-coding polymers and the emergence of catalytic ribozymes pose significant challenges, both experimentally and theoretically, in the study of the RNA world hypothesis. In this context, Tkachenko et al. put forth a novel hypothesis regarding a replication oligomer system based on a cleavage ribozyme. They initially highlighted that the breakage of oligomers could contribute to self-replication, provided that these fragments function as primers for subsequent replications. Next, they proposed a self-replicating system of oligomers founded on a hammerhead structure that catalyzes cleavage. By a simple dynamical model, they demonstrated that such a system is self-sustainable in certain parameter regimes. Furthermore, they delved into discussions regarding the potential emergence of such a system and the evolution toward further optimized ribozymes.

      Strengths: Although the cleavage (hammerhead) ribozyme has been discussed in the context of the origins of life, to my knowledge the authors are the first to use a mathematical model to discuss how such ribozymes could be selected. The idea is simple: ribozyme activity creates fragments by breakage of an oligomer, which work as primers for the ribozyme itself, resulting in a positive feedback system (i.e., autocatalytic sets in a broader sense). This potentially resolves, at the same time, the problems of (i) the supply of new primers (but note the major concern on this described under 'Weaknesses'), and (ii) the sustaining of the cleavage ribozyme.

      Weaknesses:

      The major weakness of their theory is that the ends of the new primers, formed through the breakage/cleavage of polymers, must be chemically active (as the authors have already emphasized in the last paragraph of their discussion) to enable further elongation. Reactivating the ends of preexisting oligomers without enzymes, to the best of our current knowledge, could be a challenging task. Although their model heavily relies on this aspect, the authors do not elaborate on it.

      We have added a discussion of the need for chemical activation: "It is important to note that in the context of RNA, such bidirectional elongation requires chemical activation of the phosphate group at the 5' end of the primer to provide free energy for the newly formed covalent bond. Like the polymerization process itself, achieving this without enzymes is biochemically challenging. One might speculate that prebiotic evolution relied on inorganic catalysis, such as on mineral surfaces, or involved polymers other than today's RNA."

      We also included in the discussion a comment on a possible combination of our mechanism and the Virtual Circle Genome model that would avoid the need for bidirectional growth: "It may be possible to incorporate the selection mechanism proposed in this paper into the Virtual Circle Genome model. Such a hybrid approach would avoid the need for the biochemically problematic bidirectional growth while explaining the emergence of early catalytic activity unaffected by sequence scrambling."

      Another weakness is in the setup of their discussion on evolutionary dynamics. While they claim that their model is robust against replication errors, their approach to evolutionary dynamics appears unconventional, and it remains unclear under what conditions their assumptions are founded. They treat a whole set of oligos as a subject of evolution, rather than each individual oligo. This may necessitate more complex assumptions, such as the encapsulation of sets of oligos inside a protocell, to be adequately rationalized. Thus, it remains uncertain whether the system is indeed robust against replication errors in a more natural context. For example, if a mutant oligo, denoted as b', arises due to an error in the replication of oligo b, and if b' has lower catalytic activity but replicates more rapidly than b, it may ultimately come to dominate the system.

      We agree with the reviewer that the evolutionary dynamics in multi-species ecosystems are somewhat complicated and potentially confusing. To this end, we have added the following text and citations to our discussion: "Note that this fitness is defined at the level of the ecosystem, comprising all sequences in the chemostat, and is not necessarily attributable to individual members of that population. Over time, similar to microbial ecosystems, this population changes according to the laws of competitive exclusion [34, 35]". However, we would like to point out that we assume that our model operates in a chemostat-like environment, which can be realized, for example, in a prebiotic pool supplied with a constant flux of monomers. Thus, the evolutionary dynamics described by our equations do not require encapsulation of sets of oligos in a protocell followed by selection of these protocells.

      Reviewer #3 (Public Review):

      Summary:

      Non-enzymatic replication of RNA or a similar polymer is likely to be important for the origin of life. The authors present a model of how a functional catalytic sequence could emerge from a mixture of sequences undergoing non-enzymatic replication.

      Strengths:

      Interesting model describing details of the proposed replication mechanism.

      Weaknesses:

      A discussion of the virtual circular genome idea proposed in [33] is included in the discussion section together with the problem of sequence scrambling faced by this mechanism that was raised in [34]. However, the authors state that sequence scrambling is a special case of the classical error catastrophe. This should be reworded, because these phenomena are completely different. The error catastrophe occurs due to single-point mutational errors in a model that assumes that a complete template is being copied in one cycle. Sequence scrambling arises in models that assume cycles of melting and reannealing, in which case only part of a template is copied in one cycle. Scrambling is due to the many alternative ways in which pairs of sequences can reanneal. Many of these alternatives are incorrect and this leads to the disappearance of the original sequence. This problem exists even in the limit where there is zero mutational error rate. Therefore, it cannot be called a special case of the error catastrophe problem.

      We followed the reviewer's recommendations and removed the potentially confusing comparison to the classic error catastrophe.

      The authors seem to believe that their model avoids the scrambling problem. If this is the case, a clear explanation should be added about why this problem is avoided. Two possible points are mentioned.

      (i) Replication is bidirectional in this model. This seems like a small detail to me. I don't think it makes any difference to whether scrambling occurs.

      (ii) The functional activity is located in a short sequence region. I can imagine that if the length of a strand that is synthesized in a single cycle is long enough to cover the complete functional region, then sometimes the complete functional sequence can be copied in one cycle. Is this what is being argued? If so, it depends a lot on rates of primer extension and lengths of melting cycles etc, and some comment on this should be made.

      As we now explain in the text, while the scrambling problem itself is not completely avoided in our model, it does not affect the replication of the functionally relevant regions of the oligomers. Our key observation is that, due to the simplicity of the cleaving enzymes, the length of the functionally relevant region is much smaller than the scrambling-free length. This can be seen from a back-of-the-envelope estimate of the scrambling-free length added to the text: "...assuming the minimal hybridization length $l_0 = 6$ and random statistics of the master sequence, one gets the scrambling-free length $\sqrt{2 \times 4^{l_0}} + l_0 \approx 100$. This is an order of magnitude larger than both $l_0$ and the length of the core region of the hammerhead ribozyme."
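      As a quick arithmetic check of the estimate quoted in the response, the formula and the value $l_0 = 6$ can be evaluated step by step. The symbol $L_{\mathrm{sf}}$ is only an editorial label for the scrambling-free length; the expression itself is taken as stated by the authors.

```latex
L_{\mathrm{sf}} = \sqrt{2 \times 4^{l_0}} + l_0
               = \sqrt{2 \times 4096} + 6
               = \sqrt{8192} + 6
               \approx 90.5 + 6
               \approx 96.5 \approx 100
```

      The result is consistent with the stated order of magnitude: roughly 16 times $l_0$.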

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I have evaluated that the authors have proposed a novel mechanism potentially relevant to the origins of life, and they have explained it with a sufficiently simple model. However, I recommend that they address the following issues, including those I raised in the public review:

      • Title: I believe that the title "Emergence of catalytic activity in ..." is rather broad. Could it be more specific to accurately represent the system described in the paper? For instance, "Selective advantage (or selection) of the hammerhead cleavage ribozyme in..." may better encapsulate the paper's focus.

      We thank the reviewer for this suggestion. However, our mechanism is not unique to hammerhead ribozymes, so we have decided to keep the original title.

      • One theoretically non-trivial aspect is the stability of the cooperative structure. Could the authors provide a more detailed explanation of what drives the instability of the system and what mechanisms restore its stability? For example, in a similar self-reproducing oligomer system with ribozymes and their fragments (Kamimura et al. PLoS Comp. 2019), the symmetry of fragments breaks because they effectively suppress each other's replication. Also, it would be beneficial to clarify the necessary assumptions for stability (for instance, the authors assumed that a_L can serve as a primer only for a, while a_R can serve for both a and b).

      We thank the reviewer for bringing this interesting paper to our attention. The cooperative fixed point in our model is intrinsically dynamically stable. Why the replicase in Kamimura et al. can be dynamically unstable, while the ligase in our model is always stable, is an interesting question; however, it goes beyond the scope of our study. We added the following discussion to the manuscript: "Note that the stability of our cooperative fixed point is a non-trivial result. For example, in a related model by Kamimura et al. [34], the fixed point corresponding to a viable composite replicase is dynamically unstable and requires additional stabilization, e.g., by cell-like compartments."

      • As mentioned in the public review, a critical aspect of the practical applicability of the theory is whether cleaved oligos can be reactivated and further elongated, especially through non-enzymatic pathways. Alternatively, is it possible with the presence of enzymes? While I appreciate the conceptual beauty of their model, I recommend that they at least address the difficulty or feasibility of achieving this.

      We addressed this point in our response to the public review.

      • As also mentioned, in the section on evolutionary dynamics, it's essential to clarify the unit of evolution and the assumptions made. For a system-level evolution (i.e., all the sets of oligos, a and b can be the unit of evolution), more detailed assumptions are required, such as the presence of compartments whose growth is coupled with the replication of oligos inside, and the competition between these compartments. I recommend the authors clarify these points.

      We addressed this point in our response to the public review.

      Reviewer #3 (Recommendations For The Authors):

      Assuming that the above points can be addressed, this reviewer would support publication with minor modifications.

      We addressed all points in our response to the public review.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The paper addresses the important question of how numerical information is represented in the human brain. Experimental findings are interpreted as providing evidence for a sensorimotor mechanism that involves channels, each tuned to a particular numerical range. However, the logic of the channel concept as employed here, as well as the claims regarding a sensorimotor basis for these channels, is incomplete and thus requires clarification and/or modification.

      Reviewer #1 Public Review

      Anobile and colleagues present a manuscript detailing an account of numerosity processing with an appeal to a two-channel model. Specifically, the authors propose that the perception of numerosity relies on (at least) two distinct channels for small and large numerosities, which should be evident in subject reports of perceived numerosity. To do this, the authors had subjects reproduce visual dot arrays of numerosities ranging from 8 to 32 dots, by having subjects repetitively press a response key at a pre-instructed rate (fast or slow) until the number of presses equaled the number of perceived dots. The subjects performed the task remarkably well, yet with a general bias to overestimate the number of presented dots. Further, no difference was observed in the precision of responses across numerosities, providing evidence for a scalar system. No differences between fast and slow tapping were observed. For behavioral analysis, the authors examined correlations between the Weber fractions for all presented numerosities. Here, it was found that the precision at each numerosity was similar to that at neighboring numerosities, but less similar to more distant ones. The authors then went on to conduct PCA and clustering analyses on the Weber fractions, finding that the first two components exhibited an interaction with the presented numerosity, such that each was dominant at distinct lower and upper ranges and further well-fit by a log-Gaussian model consistent with the channel explanation proposed at the beginning.
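      To make the "log-Gaussian" tuning description concrete, the sketch below defines a tuning curve that is Gaussian in log(numerosity) and recovers its peak and width from positive loadings. This is an illustration under stated assumptions, not the authors' fitting procedure (which is not described here): it assumes strictly positive loadings, uses a closed-form quadratic fit in log-log coordinates (NumPy's polyfit) instead of nonlinear least squares, and the parameter values (peak 10, width 0.4) are hypothetical.

```python
import numpy as np

# Target numbers used in the study.
TARGETS = np.array([8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32], dtype=float)


def log_gaussian(n, amplitude, preferred, width):
    """Tuning curve that is Gaussian in log(n), peaking at `preferred`."""
    return amplitude * np.exp(-(np.log(n) - np.log(preferred)) ** 2 / (2 * width ** 2))


def fit_log_gaussian(n, response):
    """Recover (amplitude, preferred, width) from strictly positive responses.

    log(response) is a quadratic in log(n), so a degree-2 polynomial fit in
    log-log coordinates gives the parameters in closed form.
    """
    x = np.log(np.asarray(n, dtype=float))
    y = np.log(np.asarray(response, dtype=float))
    c2, c1, c0 = np.polyfit(x, y, 2)  # y = c2*x**2 + c1*x + c0
    if c2 >= 0:
        raise ValueError("responses are not peaked in log(n)")
    width = np.sqrt(-1.0 / (2.0 * c2))
    log_pref = -c1 / (2.0 * c2)
    amplitude = np.exp(c0 - c2 * log_pref ** 2)
    return amplitude, np.exp(log_pref), width


# Hypothetical noiseless "low-number channel" peaking at 10.
loadings = log_gaussian(TARGETS, amplitude=0.9, preferred=10.0, width=0.4)
print(fit_log_gaussian(TARGETS, loadings))  # recovers roughly (0.9, 10.0, 0.4)
```

      With noisy or partly negative loadings, a nonlinear least-squares fit on the original scale would be the more general choice; the closed form is used here only because it keeps the example dependency-free.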

      Overall, the authors provide compelling evidence for a two-channel system supporting numerosity processing that is instantiated in sensorimotor processes. A strength of the presented work is the principled approach the authors took to identify mechanisms, as well as the controls put in place to ensure adequate data for analysis. Some questions do remain in the data, and there are aspects of the presentation that could be adjusted.

      • The use of a binary colormap for the correlation matrix seems unnecessary. Binary colormaps between two opposing colors (with white in the middle) are best for results spanning positive and negative values (say, correlation values between -1 and +1), but the correlations here are all positive, so a uniform colormap should be applied. I can appreciate that the authors were trying to emphasize that a 2+ channel system would lead to lower correlations at larger ratios, but that's emphasized better in the numerical ratio line plots.

      We agree and have now changed the colour maps accordingly (Figs 1 and 3, pp. 4 and 11). Thank you.

      • The correlation matrices in Figure 1 appear blurred. I am not sure if this was intentional, but suspect it was not, and so they should appear like those presented in Figure 3.

      Sorry about that; it was a rendering problem, which has now been fixed.

      • It's notable that the authors also collected data on a timing task to rule out a duration-based strategy in the numerosity task. If possible, it would be great to have the authors also conduct the rest of the analyses on the duration task; that is, to look at WF correlation matrices/ratios as well as PCA. There is evidence that duration processing is also distinctly sensorimotor, and may also rely on similar channels. Evidence either for or against this would likely be of great interest.

      We agree that investigating the existence of temporal channels would be of great interest, but it goes beyond the scope of the current study. Out of curiosity, however, we analysed the duration data. Interestingly, signatures of sensorimotor channels (a correlation gradient as a function of duration distance) emerge. However, this does not hold when correlating number against duration data. If confirmed, these results would indicate the existence of independent mechanisms for time and numerosity perception. Our research agenda is now proceeding in this direction.

      • For the duration task, there was no fast-tapping condition. Why not? Was this to keep the overall task length short?

      Yes, this was the main reason.

      • The number of subjects/trials seems a bit odd. Why did some subjects perform both and not others? The targets say they were presented "between 25 and 30 times", but why was this variable at all?

      The two experimental conditions were demanding, each lasting around 2 hours. Unfortunately, some participants were available for only one session. To make the two conditions similarly powered, we added some extra participants who performed only one condition. Trials were divided into blocks of 55 trials (5 repetitions for each of the 11 targets). Most participants performed 6 blocks in both conditions; a few (again because of limited availability) performed 5 blocks.
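      The block structure explains the "between 25 and 30" presentations per target that the reviewer asked about. The small sketch below just spells out that arithmetic; the helper names are ours, for illustration only.

```python
# The 11 target numbers; each appears 5 times per block.
TARGETS = [8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32]
REPS_PER_BLOCK = 5


def trials_per_block(n_targets=len(TARGETS), reps=REPS_PER_BLOCK):
    """Trials in one block: every target repeated `reps` times."""
    return n_targets * reps


def presentations_per_target(n_blocks, reps=REPS_PER_BLOCK):
    """How many times each target is shown over a whole session."""
    return n_blocks * reps


for n_blocks in (5, 6):
    print(f"{n_blocks} blocks: {presentations_per_target(n_blocks)} presentations per target, "
          f"{n_blocks * trials_per_block()} trials in total")
```

      Participants who completed 5 blocks therefore saw each target 25 times, and those who completed 6 blocks saw it 30 times.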

      • For the PCA analysis, my read of the methods and results is that this was done on all the data, across subjects. If the data were run on individual subjects and the resulting PCA components averaged, would the same results be found?

      We thank the reviewer for giving us the opportunity to clarify the technique.

      In brief: we measured precision (Weber Fraction) in translating digits (target numbers) into the corresponding action sequences. This creates an m-by-n matrix, with each column (n) representing a participant and each row (m) a target number. This matrix was then submitted to PCA, which yielded two components. Each target number was assigned two loading scores: one on the 1st and one on the 2nd component. These loadings were then displayed as a function of target number to describe the tunings. By its nature, this analysis is across participants and cannot be performed on individual data.
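      The sketch below shows why such an analysis is inherently across participants: the PCA operates on the target-by-target correlation matrix, which only exists once WFs from many participants are available. It is a hedged illustration, not the authors' pipeline. We assume an unrotated PCA of the correlation matrix (the exact choices, e.g. correlation vs covariance or any rotation such as varimax, are not stated here), and the two-channel data are simulated with hypothetical peaks at 10 and 27.

```python
import numpy as np

TARGETS = np.array([8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32], dtype=float)


def pca_loadings(wf, n_components=2):
    """Unrotated PCA of the target-by-target correlation matrix of Weber fractions.

    wf has shape (n_targets, n_participants): one row per target number, one
    column per participant. Returns (loadings, eigenvalues); loadings has shape
    (n_targets, n_components) and eigenvalues are sorted in descending order.
    """
    corr = np.corrcoef(np.asarray(wf, dtype=float))  # rows are variables (targets)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components] * np.sqrt(np.clip(eigvals[:n_components], 0, None))
    # Eigenvector signs are arbitrary: flip each component so its loadings sum to >= 0.
    signs = np.where(loadings.sum(axis=0) < 0, -1.0, 1.0)
    return loadings * signs, eigvals


# Hypothetical data: each participant's WF is driven by two latent channel
# "noise levels", one weighting low targets and one weighting high targets.
rng = np.random.default_rng(0)
n_participants = 60
low, high = rng.normal(size=(2, n_participants))
w_low = np.exp(-np.log(TARGETS / 10) ** 2 / (2 * 0.3 ** 2))
w_high = np.exp(-np.log(TARGETS / 27) ** 2 / (2 * 0.3 ** 2))
wf = (0.15 + 0.02 * (np.outer(w_low, low) + np.outer(w_high, high))
      + 0.005 * rng.normal(size=(TARGETS.size, n_participants)))

loadings, eigvals = pca_loadings(wf)
print(np.round(loadings, 2))  # one row of two loadings per target number
```

      With a single participant there is only one value per target, so no correlation matrix (and hence no PCA) can be formed.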

      • For the data presented in Figure 2, it would be helpful to also see individual subject data underlaid on the plots to get a sense of individual differences. For the reproduced number, these will likely be clustered together given how small the error bars are, but for the WF data it may show how consistently "flat" the data are. Indeed, in other magnitude reproduction tasks, it is not uncommon to see the WF decrease as a function of target magnitude (or even increase). It may be possible that the reason for the observed findings is that some subjects get more variable (higher WFs) with larger target numbers and others get less variable (lower WFs).

      We agree and have now added individual data, which confirm flat WF distributions (Fig 2B and 2D).

      • Regarding the two-channel model, I wonder how much the results would translate to different ranges of numerosities? For example, are the two channels supported here specific to these ranges of low and high numbers, or would there be a re-mapping to a higher range (say, 32 to 64 dots) or to a narrower range (say 16 to 32 dots). It would be helpful to know if there is any evidence for this kind of remapping.

      This is the first study measuring sensorimotor channels for the transformation of numbers into action sequences. Whether these channels are modulated by the numerical context is an interesting open question that we are exploring through specific experimental conditions (now discussed at p. 17, lines 451-460).

      Reviewer #2 Public Review

      The authors wish to apply established psychophysical methods to the study of number. Specifically, they wish to test the hypothesis - supported by their previous work - that human sensorimotor processes are tuned to specific number ranges. In a novel set of tasks, they ask participants to tap a button N times (either fast or slow), where N varies between 8 and 32 across trials. As I understood it, they then computed the Weber fraction (WF) for each participant for each number and correlated those values across participants and numbers. They find stronger correlations for nearby numbers than for distant numbers and interpret this as evidence of sensorimotor tuning functions. Two other analyses - cluster analyses and principal component analyses (PCA) - suggest that participants' performance relied on at least 2 mechanisms, one for encoding low numbers of taps (around 10) and another for encoding larger numbers (around 27).

      Strengths

      Individual differences can be a rich source of scientific insight and I applaud the authors for taking them seriously, and for exploring new avenues in the study of numerical cognition.

      Weaknesses

      Inter-subject-correlation

      The experiment "is based on the idea that interindividual variability conveys information that can reveal common sensory processes (Peterzell & Kennedy, 2016)" but I struggled to understand the logic of this technique. The authors explain it most clearly when they write "Regions of high intercorrelation between neighbouring stimuli intensity can be interpreted to imply that sets of stimuli are processed by the same (shared) underlying channel. This channel, while responding relatively more to its preferred stimulus, will also be activated by neighbouring stimuli that although slightly different from the preferred intensity, are nevertheless included in the same response distribution." As I understood it, the correlations are performed "between participants, for all targets values" - meaning that they are measuring the extent to which different participants' WFs vary together. But why is this a good measure of channels? This analysis seems to assume that if people have channels for numerical estimation, they will have the same channels, tuned to the same numerical ranges. But this is an empirical question - individual participants could have wildly different channels, and perhaps different numbers of channels (even in the tested range). If they do, then this between-subject analysis would mask these individual differences (despite the subtitle).

      Yes, the technique assumes that different individuals have similar channels, and the results confirm this. If everyone had different channels, or different numbers of channels, we would not have found this pattern of results: an ordered scaling of correlations as a function of numerical distance. As specified in the manuscript, however, this technique (at least as we used it) is not sensitive enough to identify the exact number of channels, so it may have smoothed the results, 'masking' the existence of more than two channels. To avoid possible confounds related to accuracy (reproduction biases), we used the Weber Fraction, a standard index of normalized sensory precision (p. 7, lines 182-183).
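      A minimal sketch of a Weber Fraction for one participant and one target, assuming the common definition SD / mean of the reproduced numbers (the manuscript's exact formula is not restated in this response). Normalising by the participant's own mean response means that a proportional over- or underestimation bias leaves the WF unchanged. The tap counts are hypothetical.

```python
import numpy as np


def weber_fraction(reproductions):
    """Weber fraction for one participant and one target: SD / mean of the responses.

    Because the spread is divided by the participant's own mean response, a
    proportional (multiplicative) over- or underestimation bias does not change it.
    """
    r = np.asarray(reproductions, dtype=float)
    return r.std(ddof=1) / r.mean()


# Hypothetical tap counts from one participant for the target 20.
taps = [18, 20, 22, 20, 21, 19]
print(round(weber_fraction(taps), 3))  # 0.071
```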

      Different channels

      I had trouble understanding much of the analyses, and this may account for at least some of my confusion. That said, as I understand it, the results are meant to provide "evidence that tuned mechanisms exist in the human brain, with at least two different tunings" because of the results of the clustering analysis and PCA. However, as the authors acknowledge, "PCA aims to summarize the dataset with the minimal number of components (channels). We can therefore not exclude the possible existence of more than two (perhaps not fully independent) channels." So I believe this technique does not provide more evidence for the existence of 2 channels than for the existence of 4 or 8 or 11 channels, the upper bound for a task testing 11 different numbers. If we can conclude that people may have one channel per number, what does "channel" mean?

      We recognise that the technique is not particularly intuitive, and we apologize for the lack of clarity.

      To clarify: we measured the precision in translating digits into action sequences. This was done for different target numbers (8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32) and with N participants. For each target number, and independently for each participant, we calculated the reproduction precision (Weber Fraction). The dataset comprised a matrix in which each column represents a participant and each row a target number, with each cell containing the corresponding Weber Fraction. This dataset was then analysed with a simple correlation across participants. For example, the WFs provided by the N participants when tested with the target number 8 were correlated with those obtained with the target numbers 10, 11, 13...32. The results show that the correlation between 8 and 10 (small numerical distance) was higher than that between 8 and 32 (large numerical distance). This pattern implies that the variance shared between numbers across participants scales with numerical distance, indicating the existence of channels that aggregate similar numbers (i.e. tuning selectivity). On the same dataset we then ran a PCA, which yielded two main components. Within each component, each target number is assigned a loading score: one for the 1st and one for the 2nd component. These loadings were plotted as a function of target number to describe the shape of the tunings (i.e. channels).
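      The correlation step described above can be sketched in a few lines, assuming NumPy and Pearson correlation. Each cell (i, j) of the resulting matrix is the correlation, over participants, between the WFs for target i and target j. The data below are simulated from a hypothetical two-channel observer (peaks at 10 and 27) purely to show the expected pattern; they are not the study's data.

```python
import numpy as np

TARGETS = np.array([8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32], dtype=float)


def wf_correlation_matrix(wf):
    """Pearson correlation between every pair of target numbers, across participants.

    wf: shape (n_targets, n_participants). Cell (i, j) of the result is the
    correlation, over participants, between the WFs for target i and target j.
    """
    return np.corrcoef(np.asarray(wf, dtype=float))


# Hypothetical two-channel data (low channel near 10, high channel near 27).
rng = np.random.default_rng(1)
n_participants = 60
low, high = rng.normal(size=(2, n_participants))
w_low = np.exp(-np.log(TARGETS / 10) ** 2 / (2 * 0.3 ** 2))
w_high = np.exp(-np.log(TARGETS / 27) ** 2 / (2 * 0.3 ** 2))
wf = (0.15 + 0.02 * (np.outer(w_low, low) + np.outer(w_high, high))
      + 0.005 * rng.normal(size=(TARGETS.size, n_participants)))

corr = wf_correlation_matrix(wf)
row_8 = corr[0]  # correlation of target 8 with every target
print(np.round(row_8, 2))  # high for nearby targets (10, 11), near zero for 32
```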

      As stated above, we cannot say exactly how many channels exist. These results should be interpreted as evidence for the existence of at least two channels for the transformation of numerical symbols into action sequences. This is not at all an obvious result: there is no evidence in the literature for the existence of such a mechanism in humans. In animals (crows), as many channels were found as there were numbers tested. This does not contrast with our two-channel results, but very likely arises from the different resolution of the techniques: single-cell recording surely has a higher resolution than our interindividual covariance approach. In short, we believe that the channels revealed here are likely a coarse summary representation of several underlying channels.

      We have now tried to make these points clearer (p. 7 lines 186-196; p. 15 lines 382-384; p. 16 lines 401-402).

      Several other questions arose for me when thinking through this technique. If people did have two channels (at least in this range), why would they be so broad? Why would they be centered so near the ends of the tested range? Can such effects be explained by binning on the part of the participants, who might have categorized each number (knowingly or not) as either "small" or "large"? Whereas the experiment tested numbers 8-32, numbers are infinite - How could a small number of channels cover an infinite set? Or even the set 8-10,000? More broadly, I was unsure what advantages channels would have - that is - how in principle would having distinct channels for processing similar stimuli improve (rather than impede) discrimination abilities?

      This field of study is completely new, with many questions still open, including whether these channels are modulated by the numerical context, such as the tested range and its extremes. The channels appear broad because, as stated above, they likely represent a coarse summary representation of several (probably sharper) underlying channels. We are now exploring the effect of numerical range and trying to modulate the tuning widths through ad hoc experimental conditions (p. 16 lines 401-402; p. 17 lines 450-459).

      No number perception

      I was uncertain about the analogy to studies of other continuous dimensions like spatial frequency, motion, and color. In those studies, participants view images with different spatial frequency, motion, or color - the analogy would be to see dot arrays containing different numbers of dots. Instead, here participants read written numerals (like "19"), symbols which themselves do not have any numerical properties to perceive. How does that difference change the interpretation of the effects? One disadvantage of using numerals is that they introduce a clear discontinuity: Our base-10 numerical system artificially chunks integers into decades, potentially causing category-boundary effects in people's reproductions.

      We used these sensory analogies to convey the flavour of the technique. The focus of the current study was on individual differences in the transformation of numbers into actions. To this aim, we decided to reduce the noise associated with the encoding of the sensory stimulus per se. Digit encoding, at least in educated adults, is effectively noiseless, eliminating this source of variability. However, we agree that looking at non-symbolic formats would be interesting. We are now collecting data with dot and flash estimation. The results so far are largely in line with those found here, indicating that they are not driven by chunking strategies and confirming previous literature showing sensory numerosity-selective channels in humans and animals (p. 14 lines 351-355).

      Sensorimotor

      The authors wished to test for "sensorimotor mechanisms selective to numerosity" but it's not clear what makes their effects sensorimotor (or selective to numerosity, see below). It's true they found effects using a tapping task (which like all behaviour is sensorimotor), but it's not clear that this effect is specific to sensorimotor number reproduction. They might find similar effects for numerical comparison or estimation tasks. Such findings would suggest the effect may be a general feature of numerical cognition across modalities.

      Related to the above comment, the task here was to transform noiseless symbols (digits) into (noisy) numerical action sequences. Given that the variability therefore arises mainly from the sensory-to-action process, we believe the task can safely be considered sensorimotor in nature (p. 14 lines 351-355).

      Yes, the same pattern of results might be found for numerical comparison or estimation tasks, but only with non-symbolic formats (dots/flashes). Educated adults make no errors in naming or comparing such simple digits, which makes it impossible to perform this covariance analysis with verbal estimation or comparison of digits. However, to anticipate our future results, we have preliminary data for verbal estimation of dots and flashes (“how many?”). The data suggest similar results, consolidating the technique and confirming the large literature showing sensory channels for purely visual numerosity (p. 17 lines 453-455).

      Specific to numbers

      The authors argue that their effects are "number selective" but they do not provide compelling evidence for this selectivity. In principle, their main findings could be explained by the duration of tapping rather than the number of taps. They argue this is unlikely for two reasons. The first reason is that the overall pattern of results was unchanged across the fast and slow tapping conditions, but differences in duration were confounded with numerosity in both conditions, so the comparison is uninformative. (Given this, I am not sure what we stand to learn by comparing the two tapping speeds.) The second reason is that temporal reproduction was less precise in their control condition than numerical reproduction, but this logic is unclear: Participants could still use duration (or some combination of speed and duration) as a helpful cue to numerosity, even if their duration reproductions were imperfect. If the authors wish to test the role of duration, they might consider applying the same analytical techniques they use for numbers to their duration data. Perhaps participants show similar evidence for duration-selective channels, in the absence of number, as they do for other non-numerical domains (like spatial frequency).

      The fast and slow conditions were not meant to control for duration strategies but to test for the generalizability of the results over different tapping temporal dynamics (temporal frequency in this case). The results confirmed this.

      The control for duration strategies is the comparison between precision in reproducing durations or numbers. In the number-to-action task, participants were free to use any cues, including response duration. However, it is safe to assume that the performance is dominated by the most precise feature, number in this case. In other words, in the number task if participants were reproducing the time required to give a certain number of presses, then in the timing task, where they are explicitly reproducing the same durations, they should show no lower precision. The results are opposite to that prediction. (p. 16 lines 418-420)
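      The logic of this control can be illustrated with a toy simulation. This is our own hedged sketch, not an analysis from the manuscript, and all parameter values (target 20, duration WF 0.15, rate variability 0.10, 3 taps per second) are hypothetical. If an observer produced a number of taps by timing a duration, the tap count would inherit the duration noise plus the trial-to-trial variability of the tapping rate, so the number WF should be at least as large as the duration WF. A number WF smaller than the duration WF, as observed, is therefore hard to reconcile with a pure duration strategy.

```python
import numpy as np


def _wf(x):
    """Weber fraction (coefficient of variation) of a sample."""
    return x.std(ddof=1) / x.mean()


def simulate_duration_strategy(target_n=20, wf_duration=0.15, cv_rate=0.10,
                               mean_rate=3.0, n_trials=100_000, seed=0):
    """Toy observer that reproduces a number by timing how long to tap.

    The observer times the duration target_n / mean_rate with Weber fraction
    wf_duration, while its tapping rate varies across trials with coefficient
    of variation cv_rate. The number of taps is duration * rate, rounded.
    Returns (WF of the timed durations, WF of the resulting tap counts).
    """
    rng = np.random.default_rng(seed)
    durations = (target_n / mean_rate) * (1 + wf_duration * rng.standard_normal(n_trials))
    rates = mean_rate * (1 + cv_rate * rng.standard_normal(n_trials))
    counts = np.round(durations * rates)
    return _wf(durations), _wf(counts)


wf_time, wf_number = simulate_duration_strategy()
print(f"duration WF = {wf_time:.3f}, number WF = {wf_number:.3f}")
```

      Under these assumptions the count WF comes out near sqrt(0.15^2 + 0.10^2) ≈ 0.18, above the duration WF of about 0.15.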

      Theories of numerical cognition.

      An expansive literature on numerical cognition suggests that many animals, human children, and adults across cultures have two systems for representing numerosity without counting - one that can represent the exact cardinality of sets smaller than about 4 and another that represents the approximate number of larger sets (but see Cheyette & Piantadosi, 2020). The current paper would benefit from better relating its findings to this long lineage of theories and findings in numerical approximation across cultures, ages, and species.

      The numbers used in this work were well above the subitizing limit (all N > 7). Indeed, the WFs showed no signs of subitizing discontinuities. We believe that discussing the subitizing literature here is too far from the scope of the current work.

      Additional public comments from the Reviewing Editor:

      (1) What, in the present work, makes the case that the operative mechanism is sensorimotor? The authors frame the discussion around a sensorimotor number system but the evidence here could be seen as using a sensorimotor task as one way to get at an amodal number channel. For example, the authors could do the same experiment but have people watch a circle that flashes on and off for n times, with participants reporting the number of flashes (or shown a number and asked to say more or less). They could then apply the same analyses as used here. If they got the same results, it would seem that this would be an argument against the channels being sensorimotor. I suppose if they did NOT get results in the perceptual task, then they would have (much) stronger evidence that the channels are somehow sensorimotor in nature. Either way, an experiment along these lines would be essential for addressing the nature of the channels (tied to sensorimotor or not).

      We chose to use this task because the perception of simple digits (like those used here), at least in educated adults, is noiseless. This ensures that the inter-individual variability remaining on the table is that related to the motor transformation process. For this reason, we believe that the task can be safely considered sensorimotor (see also Kirschhock & Nieder, Number selective sensorimotor neurons in the crow translate perceived numerosity into number of actions, Nature comm, 2022). (p. 14 lines 351-355)

      This is not true for verbal numerosity estimation of non-symbolic stimuli (such as dots and/or series of events). It is well known that the estimation of such stimuli is noisy, and such a task would involve no sensorimotor transformation. The inter-individual variability in estimation precision, and thus the measurable channels, would then reflect sensory numerosity tunings. These have been revealed with various techniques in both humans and animals. Following this idea, we now have preliminary data showing that sensory channels are also detectable with the technique used in the current study. This is not in contrast with the sensorimotor nature of the channels found here, but instead indicates the existence of both sensory and sensorimotor number channels.

      The authors may argue that results from other studies such as the 2016 target article make the case about a sensorimotor basis of these channels. While I don't have a great grasp of this literature, my take on the 2016 target article is that the point was not about sensorimotor channels but about interactions between action and vision. This seems more in line with the idea of amodal number channels and indeed, they speak about a "generalized number sense" in that paper.

      The 2016 paper showed that a short period of hand tapping (adaptation) can distort visual numerosity perception. The results implied the existence of sensorimotor number channels integrating non-symbolic numerosity (dots/flashes) and actions. The current study goes beyond this, describing (for the first time) sensorimotor channels that transform symbolic numbers into action sequences. Whether these channels are also responsible for encoding non-symbolic numerosity is an interesting open question that we are currently investigating with cross-task analyses. If the same channels respond to non-symbolic numerosity (across space and time: dots and sequences of visual/auditory events) as well as translate digits into actions, we could then speak of a generalized sensorimotor number sense. At present, this remains a possibility to be tested (p. 17 lines 450-459).

      (2) There is a need for clarification on the method for creating the correlation matrices. The authors write that they look at correlations between Weber fractions between participants. By "between" do they mean "across"? That is, they calculate the Weber fraction for each individual for each cell. Then for a given cell, you correlate its Weber fraction with every other cell, using the pairs for each individual. I would call this "across" not "between." Is this just a semantic thing or have I misunderstood the process?

      To make this concrete, consider the correlation for cell 10/11. I assume it is something like

                    10     11
      Subj1        .25    .31
      Subj2        .13    .09
      Subj3        .22    .16
      Etc.

      And correlation across participants will be the data point for the 10/11 cell in the matrix.

      It is a semantic error; this is exactly what we did: across participants.

      To clarify further: we measured the precision in transforming numbers into sequences of actions. This was done for different target numbers (8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32) and with N participants. For each target number, and independently for each participant, we then calculated the reproduction precision (Weber Fraction). The dataset therefore consists of a matrix in which each column represents a participant and each row a target number, with each cell containing the corresponding Weber Fraction. This dataset was then analysed with a simple correlation across participants. For example, the WFs of the N participants obtained with the target number 8 were correlated with those obtained with the target numbers 10, 11, 13...32. The results show that the correlation between 8 and 10 (small numerical distance) was higher than that between 8 and 32 (large numerical distance). This pattern implies that the variance shared between numbers (across participants) scales with numerical distance, in line with the existence of channels that aggregate similar numbers (tunings).
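      The "correlation gradient" (the numerical ratio line plots mentioned by Reviewer #1) can be summarised by averaging the off-diagonal cells of such a matrix in bins of numerical distance. The sketch below is an assumption-laden illustration: we measure distance as the log of the larger-to-smaller ratio and use four equal-width bins, which need not match the manuscript's choices, and the toy matrix (correlation = 1 / ratio) is invented only to exercise the function.

```python
import numpy as np

TARGETS = np.array([8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32], dtype=float)


def correlation_by_ratio(corr, targets=TARGETS, n_bins=4):
    """Average the off-diagonal correlations in bins of numerical distance.

    Distance is the log of the ratio between the larger and smaller target of
    each pair. Returns (bin_centres_as_ratios, mean_correlation_per_bin);
    empty bins give NaN.
    """
    targets = np.asarray(targets, dtype=float)
    i, j = np.triu_indices(len(targets), k=1)
    log_ratio = np.abs(np.log(targets[j] / targets[i]))
    r = np.asarray(corr, dtype=float)[i, j]
    edges = np.linspace(log_ratio.min(), log_ratio.max(), n_bins + 1)
    bin_idx = np.digitize(log_ratio, edges[1:-1])  # values 0 .. n_bins - 1
    means = np.array([r[bin_idx == b].mean() if np.any(bin_idx == b) else np.nan
                      for b in range(n_bins)])
    centres = np.exp(0.5 * (edges[:-1] + edges[1:]))
    return centres, means


# Toy correlation matrix that falls off with numerical ratio (illustration only).
ratio = np.maximum.outer(TARGETS, TARGETS) / np.minimum.outer(TARGETS, TARGETS)
toy_corr = 1.0 / ratio
centres, means = correlation_by_ratio(toy_corr)
print(np.round(centres, 2), np.round(means, 2))
```

      A channel account predicts means that decrease with ratio, as in this toy case; a single channel spanning the whole range would instead give a roughly flat profile.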

      (p. 7 lines 186-196)

      (3) The duration data should be analysed. While n is small, can't the authors correlate WFs across tasks? Suppose a similar pattern is observed, suggestive of >1 channel in this between-task correlation.

      One of the strengths of this technique is that it is very general: it can be applied to virtually every stimulus feature. We are currently collecting data to test the existence of generalised sensorimotor channels for continuous magnitudes: space, time, and numerosity. The logic is exactly as suggested. These correlational analyses, however, require (relatively) large samples and ad hoc experimental conditions. We do not feel confident drawing conclusions on this with 9 participants. Out of curiosity, however, we analysed the data as requested, and the results are interesting: signatures of sensorimotor channels emerge in both the number and duration tasks but NOT when the tasks are analysed in conjunction (cross-task). If confirmed, these results would indicate the existence of separate mechanisms for the encoding of time and numerosity (and perhaps also space?).
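      The within- versus cross-task comparison suggested by the editor amounts to stacking the WF matrices of the two tasks and splitting their joint correlation matrix into blocks. This sketch assumes NumPy, Pearson correlation, and that both tasks were run by the same participants in the same column order. The sizes (9 participants, 5 duration targets) and the random data are hypothetical and serve only to show the shapes involved.

```python
import numpy as np


def within_and_cross_task_correlations(wf_number, wf_duration):
    """Split the joint correlation matrix of two tasks into within- and cross-task blocks.

    wf_number:   (n_number_targets, n_participants) Weber fractions, number task
    wf_duration: (n_duration_targets, n_participants) Weber fractions, duration task
    Columns must refer to the same participants, in the same order.
    Returns (number_vs_number, duration_vs_duration, number_vs_duration).
    """
    wf_number = np.asarray(wf_number, dtype=float)
    wf_duration = np.asarray(wf_duration, dtype=float)
    if wf_number.shape[1] != wf_duration.shape[1]:
        raise ValueError("both tasks need the same participants (columns)")
    corr = np.corrcoef(np.vstack([wf_number, wf_duration]))
    k = wf_number.shape[0]
    return corr[:k, :k], corr[k:, k:], corr[:k, k:]


# Hypothetical data: 9 participants, 11 number targets, 5 duration targets.
rng = np.random.default_rng(2)
num_block, dur_block, cross_block = within_and_cross_task_correlations(
    rng.normal(0.15, 0.03, size=(11, 9)), rng.normal(0.20, 0.04, size=(5, 9)))
print(num_block.shape, dur_block.shape, cross_block.shape)
```

      A correlation gradient inside each within-task block, combined with its absence in the cross-task block, is the pattern that would point to separate mechanisms. With only 9 participants each cell is a very noisy estimate.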

      (4) The finding of similar results for fast and slow is quite interesting. And provides good motivation to do the duration control experiment. But two issues related to the control experiment:

      (4a) Why not look at the correlation matrix for the duration task? Was this not done because there were only 9 participants? If so, why the small n here?

      Yes, that is the reason. The aim of this work is not to investigate the existence of duration channels. This experimental condition was designed as a control for the use of non-numerical strategies in the number task, and it worked well. The results were already clear with 9 individuals (confirming Kirschhock & Nieder, Nature comm, 2022), so we did not consider it necessary to continue in this direction. However, related to the previous point, we ran a preliminary analysis on this small data set and (as mentioned above) signatures of sensorimotor channels (correlation gradients) emerge in both the number and duration tasks but NOT when analysed in conjunction (cross-task), indicating different mechanisms. We are now pursuing this issue using different number and duration tasks.

      (4b) I don't follow why greater precision on the tapping task compared to the duration task makes a strong case against the duration hypothesis. Is the argument that, if based on duration, there should be greater precision on the duration task since the tapping task would exhibit the variability from duration PLUS added noise from tapping? If this is the argument, this should be spelled out.

      Yes. The more precise feature dominates behaviour. In other words, if participants in the number task were reproducing the time required to produce a certain number of presses, then in the timing task, where they explicitly reproduce the same durations, their precision should be at least as high. The results are the opposite of that prediction. (p. 18 lines 418-420)
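      Under the duration hypothesis, this prediction can be written out explicitly. The derivation below is an illustrative formalisation rather than one taken from the manuscript, and it rests on stated assumptions: a constant tapping rate r, a reproduced duration d with mean μ_D and standard deviation σ_D, and additional zero-mean tapping noise ε_M with variance σ_M², independent of d. The Weber Fraction is taken as standard deviation divided by mean.

```latex
\[
n = r\,d + \varepsilon_M
\quad\Rightarrow\quad
\mathrm{WF}_N = \frac{\sqrt{r^{2}\sigma_D^{2} + \sigma_M^{2}}}{r\,\mu_D}
\;\ge\; \frac{r\,\sigma_D}{r\,\mu_D} = \frac{\sigma_D}{\mu_D} = \mathrm{WF}_D
\]
```

      A duration-based strategy therefore predicts WF_N ≥ WF_D, i.e. number reproduction should be no more precise than duration reproduction; greater precision in the tapping task contradicts this.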

      (4c) Related to point 3 above, one would expect based on things like Rammsayer's study that duration judgments would also engage channels. Is the idea that these are different channels in the tapping task? There seems a good case to have participants do both tapping and duration tasks and then do the correlation matrices, comparing within and between tasks.

      Please see response to 3 and 4a.

      Recommendations for the authors:

      (1) On the logic of the channel concept as applied in the current context:

      While the authors present the numerical channel idea by analogy to how this concept is used for other features such as spatial frequency or orientation, there is no input to activate the channels, just a written numeral. The channel concept would mean that to respond to, say, "16", you get output from multiple channels, with each weighted by its "tuning" to 16 such that the aggregate results in approximately 16 taps. This seems a bit odd: it would be like saying to draw, I use the output from my spatial frequency channels to create an image with a particular power spectrum. The logic of the channel concept in the current experimental context needs to be reviewed and clarified.

      The channel here (probably) reflects the activity of noisy neurons responsible for translating sensory information into a numerical motor output, such as those shown by Kirschhock & Nieder (Nature comm, 2022) in crows. We used digits because their encoding (at least for such simple digits and educated adults) has no associated noise. The remaining interindividual variability, which is what we analysed, is thus mainly associated with the motor transformation process, revealing sensorimotor channels.

      (2) A more thorough analysis of the duration task would strengthen the paper. The n is small for this interesting control condition and the analyses presented in the current version of the paper are limited. It is recommended to make this a fully powered test with complete analyses. Consider making this a new experiment in which participants do both the tapping and duration tasks to allow cross-modal analyses.

      We ran some exploratory analyses on this, described in our responses to comments 3 and 4a. We prefer to leave this issue to dedicated future experiments (which have just started).

      (3) Expanded discussion of the limitations of the current study. The authors are clear that the methods don't provide a strong test of whether there are two or more than two channels. It would be useful to also comment on whether the estimated locations of the peaks are robust or if there is some sort of statistical bias for them to be at more extreme values. More generally, use the comments on the reviews to elaborate on various issues related to the channel concept.

      We have addressed these issues in the manuscript (p. 17 lines 450-459).

      (4) Clarify the methods used to calculate the correlation matrix (see reviews).

      We have now specified the correlation analyses more clearly (p. 7 lines 186-196).

      (5) What is the basis for arguing that the mechanism under consideration is a "sensorimotor number system?" The data in this paper do not appear to provide evidence that the effects are linked to sensorimotor processes rather than reflect an amodal number system that is being accessed in their task through the motor system. At a minimum, present arguments for what motivates/justifies the sensorimotor claim or modify the paper to be neutral on this point.

      We have now clarified the sensorimotor nature of the task used here (p. 14 lines 351-355; see also comment 1).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      Lujan et al make a significant contribution to the field by elucidating the essential role of TGN46 in cargo sorting and soluble protein secretion. TGN46 is a prominent TGN protein that cycles to the plasma membrane and it has been used as a TGN marker for many years, but its function has been a fundamental mystery.

      In parallel, it remains unclear how most secreted proteins are targeted from the Golgi to the cell surface. Unlike lysosomal hydrolases, these molecules do not contain conserved sequence motifs or post-translational modifications that direct their sorting. Cargo receptors for these secreted proteins have remained elusive.

      Therefore, these investigations are likely to have a significant influence on the field.

      To gain insight into the molecular role of TGN46 in sorting, they systematically test the impact of the luminal, transmembrane, and cytosolic domains. Importantly, and against current thinking, they demonstrate that the luminal domain of TGN46 facilitates sorting. Interestingly, neither the cytosolic domain nor the length of the transmembrane domain of TGN46 plays a role in cargo export. The effects of TGN46 depletion are specific, as membrane-associated VSVG remains unaffected.

      Interestingly, the TGN46 luminal domain also plays an important role in the intracellular and intra-Golgi localization of TGN46, and it contains a positive signal for Golgi export in CARTS. The conclusions are supported by rigorous, well-performed experiments.

      A speculative part of the manuscript, with some accompanying experimental data, proposes that the luminal domain of TGN46 forms biomolecular condensates that help to capture cargo proteins for export.

      One important point to discuss is that the effects of TGN46 KO are partial, suggesting that TGN46 stimulates the Golgi export of PAUF but is not essential for this process. The incomplete block is apparent in Fig 1 and in Fig 5D.

      We thank the reviewers and the editorial team for their assessment and valuable feedback on our manuscript. Their supporting comments reinforce the significance of our findings.

      Regarding the specific point raised about the partial effects observed in the TGN46 KO cell line, we acknowledge the importance of this issue, and we have addressed it in more detail in the revised version of our manuscript. The partial effects observed when using the TGN46 KO cell line are likely caused by several factors:

      (1) It is important to consider the phenomenon of cell adaptation/compensation, which is documented to occur in gene knockout cell lines. Cells often respond to genetic perturbations by adapting to compensate for the loss of a specific gene. These compensatory effects could potentially mitigate the full impact of TGN46 depletion and might explain the partial effects observed.

      (2) Our data indicate that the absence of TGN46 reduces PAUF secretion, but does not completely block its export. These results align with our proposed role of TGN46 in cargo sorting. In its absence, the secretory proteins likely exit the TGN via alternative routes/mechanisms, such as "bulk flow" or by entering other transport carriers in an uncontrolled manner. The partial redistribution of the TGN46-∆lum mutant into VSVG carriers (Figure 4D) supports this possibility. Importantly, similar situations are observed when unrelated sorting factors are depleted from the Golgi membranes. For example, when the cofilin/SPCA1/Cab45 sorting pathway is genetically disrupted, the secretion of this pathway's clients is inhibited but not completely halted (e.g., von Blume et al. Dev. Cell 2011; J. Cell Biol. 2012).

      (3) As suggested by the reviewers, it remains possible that TGN46 is not the sole player for cargo sorting. The existence of redundant or alternative mechanisms cannot be ruled out.

      In our revised manuscript, we have now provided a more in-depth discussion of these factors and their potential contributions to the observed partial effects in TGN46 KO cells (lines 447-463). We believe that a comprehensive exploration of these possibilities will improve our understanding of the role(s) of TGN46 in cargo sorting and TGN export.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      The reviewers were unanimously enthusiastic about your work. They felt that the manuscript could be significantly improved mostly through careful re-wording, additional explanations and some figure modifications.

      We thank the reviewers and the editorial team for their enthusiastic assessment of our findings. Their positive feedback is reassuring.

      We have now addressed the reviewers' suggestions to improve the clarity of our manuscript. Specifically, we have improved various aspects of the text that may have lacked clarity in the initial submission. This includes a thorough re-writing of respective sections to ensure that the content is more accessible and reader-friendly (see detailed answers to the additional points below). Furthermore, we have carefully followed the recommendations related to figure modifications.

      Please mention the species (human) in the title.

      We have changed the title according to the suggestion. The revised title now is: "Sorting of secretory proteins at the trans-Golgi network by human TGN46". In addition, we have also added the word "human" in the abstract ("... we identified the human transmembrane protein TGN46 as a receptor for the export of secretory cargo protein PAUF in CARTS ...").

      Additional points:

      The main Figures only show quantifications that are challenging to understand without fluorescence micrographs. We suggest putting the micrographs of the fluorescence images (Figures S2A and B) into the main Figure 2 (before 2B and 2C); the same applies to Figures 3 and 4.

      Following the reviewers' suggestion, we have incorporated the fluorescence micrographs (included as figure supplements in the initial submission) into the main figures 2–5. Given that these additions have introduced a significant number of extra figure panels, we have carefully re-designed the figure layout to accommodate all the necessary information. This involved moving the FLIP data from old Figs. 2–4 into a new Fig. 3 and splitting old Fig. 4 into the new Figs. 5 and 6. The supporting figures have also been rearranged accordingly. In addition, we have changed the color palette of the micrographs: dual-color images are now presented in color-blind-friendly green and magenta instead of green and red. We believe that in this revised manuscript, all data and micrographs are clearly presented.

      For figures such as Fig. 1B, the mean and SD positions are hard to see for the data plotted as solid black dots. Maybe hollow circles would be better.

      The reviewers are right and we apologize for any difficulty in discerning the mean and SD positions from the figure. In our revised version, we have made the necessary modifications to all the figures where data points were plotted as solid black circles by converting them into empty black circles, as suggested by the reviewers.

      In the right side of Fig. 1A, is the difference in PAUF secretion between WT and KO cells truly significant? The meaning of the number of asterisks should be given in the legend. Only one asterisk is shown, suggesting that the significance is low.

      In our revised manuscript, we have included comprehensive information about the statistical significance, such as the statistical test used, p-values, the meaning of the asterisks, and any other relevant details. In addition, we have added lines connecting the individual data points corresponding to the different replicates of the secretion assays (WT vs KO).

      Experiments such as the one in Fig. 1C may be better described as iFRAP rather than FLIP.

      We appreciate the reviewers' attention to the experimental methods used, e.g., in Figure 1C. We actually performed FLIP experiments rather than iFRAP, and we acknowledge that this might not have been stated clearly in our initial submission. The distinction between iFRAP and FLIP lies in the frequency of photobleaching. In iFRAP, photobleaching occurs only once at the beginning of the experiment, whereas FLIP involves repeated photobleaching (FLIP is sometimes also referred to as "repeated iFRAP"), which was conducted in our experiments. Specifically, in our experiments we performed repeated photobleaching at a relatively slow rate (approximately once per minute; every two imaging frames).

      We understand the potential source of confusion, which may have arisen from the references we provided to introduce our FLIP experiments (Hirschberg et al. 1998; Patterson et al. 2008). In those papers, almost all results were obtained using iFRAP and not FLIP. In light of this feedback, we have made significant efforts in our revised manuscript to clarify the terminology and procedure used in our experiments (lines 148-154). These revisions have improved the understanding of our findings and we appreciate the reviewers' suggestions.

      When using iFRAP to measure the Golgi residence time of a TGN46 construct that has a cytosolic tail, shouldn't recycling from the plasma membrane be taken into account? Unlike a secreted protein, TGN46 will never show complete loss of signal from the Golgi.

      The reviewers are right: for a TGN46 construct that can efficiently recycle back to the TGN from the cell surface, an iFRAP experiment would not report solely the protein residence time at the Golgi. We concur with the reviewers, and we'd like to clarify that the reason we performed FLIP experiments, as opposed to iFRAP, was precisely to address this concern. In an iFRAP experiment, where photobleaching occurs only once at the beginning, the fluorescence decay within the Golgi area would indeed consist of two components: a decay due to the export of the protein and an increase in fluorescence due to the protein that had been exported (after the initial photobleaching) and then recycled back to the Golgi area. In contrast, our choice of conducting FLIP experiments, with repeated photobleaching of the pool of fluorescent protein outside the Golgi area (approximately once per minute), minimizes the influence of recycling. Consequently, the loss of fluorescence in the Golgi area in our FLIP experiments predominantly reflects the protein's export. We acknowledge that this distinction was not adequately communicated in our initial submission and we have emphasized these points in the revised version of the manuscript (lines 230-234).
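      Why repeated bleaching suppresses the contribution of recycling can be illustrated with a toy two-compartment model. This is a hypothetical sketch, not the authors' analysis: it assumes first-order export from the Golgi (rate constant k_out) and first-order return from outside the Golgi (k_rec), and the rate constants, the 30-minute duration, and the one-minute bleach interval are illustrative values.

```python
import math


def golgi_signal(k_out, k_rec, t_end, dt=0.01, bleach_every=None):
    """Toy two-compartment model of a Golgi-bleaching experiment.

    g: fluorescent protein in the Golgi area; p: fluorescent protein outside it.
    Everything outside the Golgi is bleached at t = 0 (g = 1, p = 0).
    bleach_every=None mimics iFRAP (single bleach); a value in minutes mimics
    FLIP, where the non-Golgi pool is bleached again at that interval.
    Returns the remaining fluorescent Golgi signal at t_end (forward Euler).
    """
    n_steps = round(t_end / dt)
    steps_per_bleach = round(bleach_every / dt) if bleach_every else None
    g, p = 1.0, 0.0
    for step in range(1, n_steps + 1):
        export = k_out * g * dt   # Golgi -> outside (e.g. plasma membrane)
        recycle = k_rec * p * dt  # outside -> Golgi (e.g. retrieval from the cell surface)
        g += recycle - export
        p += export - recycle
        if steps_per_bleach and step % steps_per_bleach == 0:
            p = 0.0  # bleach the non-Golgi pool before it can recycle
    return g


# Hypothetical rate constants (per minute) and a 30-minute acquisition.
k_out, k_rec, t_end = 0.1, 0.05, 30.0
g_ifrap = golgi_signal(k_out, k_rec, t_end)
g_flip = golgi_signal(k_out, k_rec, t_end, bleach_every=1.0)
g_export_only = math.exp(-k_out * t_end)  # decay expected from export alone
print(f"iFRAP: {g_ifrap:.3f}  FLIP: {g_flip:.3f}  export only: {g_export_only:.3f}")
```

      In this toy model the single-bleach curve levels off near k_rec/(k_out + k_rec) because recycled fluorescent protein refills the Golgi, whereas the repeated-bleach curve stays close to the export-only decay.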

      Lines 274 to 285 are confusing and controversial. The author argues that the transmembrane domain does not impact TGN localisation and cargo packaging. Later, they state, "These data further support the idea that the slower Golgi export rate of TGN46 mutants with short TMDs is a consequence of their compromised selective sorting into CARTS".

      We appreciate the reviewers' attention to the potential confusion regarding the impact of the TMD on TGN localization and cargo packaging. Our results indicate that the length of the TMD does not seem to affect intra-Golgi protein localization (Fig. 4B,C), but it does play a role in incorporation into CARTS (Fig. 4D,E). We have now clarified this in the text (lines 283-284; 296-297).

      That being said, these results were also surprising to us initially. However, upon closer examination of the amino acid sequence of the cytosolic domain of TGN46, we noticed a possible side effect of shortening its TMD. Shortening the TMD of TGN46 could lead to the partial burial of highly charged residues from the TGN46 cytosolic tail (HHNKRK...) in the membrane, potentially affecting its behavior. For that reason, we constructed the TGN46 ∆cyt ST-TMD mutant, which features a short TMD (ST TMD) and lacks the potential interference from the cytosolic tail (see also lines 307-320). Notably, this mutant showed a phenotype similar to that of TGN46-Δcyt and to that of full-length TGN46, particularly in terms of intra-Golgi localization and CARTS specificity. We acknowledge that the interpretation of these results can be debated, and we have ensured that the revised manuscript captures these nuances. Additionally, we have realized that the organization and presentation of these results may have caused confusion, particularly concerning the placement of the results from the GFP-TGN46 ∆cyt ST-TMD mutant. To address this, we have reorganized old Figures 2 and 3 so that the results of the GFP-TGN46 ∆cyt ST-TMD mutant are presented with the short TMD mutants. These adjustments have greatly improved the overall flow of our manuscript. We thank the reviewers for their valuable feedback.

      In lines 444-446 in the Discussion the argumentation is confusing. The experiment shows that the cytosolic domain of TGN46 has no impact on TGN46 localisation or cargo packaging into a nascent vesicle. At the same time, the authors mention that a cytosolic complex composed of Rab6 and p62 is required to generate CARTS.

      We are grateful for the reviewers' feedback regarding our argumentation in lines 444-446. Indeed, our results indicate that the cytosolic tail of TGN46 does not play a major role in packaging of TGN46 in CARTS and in PAUF secretion. However, it is important to acknowledge that our findings do not rule out the possibility that TGN46 might have a dual function at the TGN. It could potentially play a role in mediating or controlling the export of other cargo proteins by alternative mechanisms/routes, which could, in part, depend on its cytosolic domain.

      This complexity is consistent with the open question regarding the role of the cytosolic Rab6-p62 complex in CARTS biogenesis. Interestingly, in experiments reported in Jones et al. (1993), a Golgi budding assay was used to test the involvement of the cytosolic domains of TGN38 and TGN41 in budding of Golgi-derived carriers that contain the transmembrane cargo protein pIgA-R (polymeric IgA-receptor). The authors showed that the budding of these carriers was blocked upon incubation of the Golgi membranes with peptides against the cytosolic tail of TGN38/41 but not with peptides against their lumenal domain. However, in the latter experiment, they used a peptide formed by the 15 N-terminal residues of TGN46, which might not functionally block the entire lumenal domain (>400 residues). Our results, considered together with earlier findings in the field, will serve as a basis for further exploring the role(s) of TGN46 in cargo export beyond the scope of the present study.

      In summary, these are all very important points (we thank again the reviewers for highlighting them), which we have now carefully addressed in the revised version of our manuscript (lines 476-485).

      The phase separation experiments are exciting. However, they are not necessary. They may be more confusing than helpful for the following reasons:

      • The authors use very high protein concentrations and crowding reagents. Any protein would condense under these conditions.

      • The protein was produced in bacteria, so it lacks post-translational modifications, especially glycosylation, possibly the most critical drivers of phase separation.

      • There was no test of direct binding of PAUF with TGN46.

      We appreciate that the reviewers share our excitement about our preliminary phase separation experiments. Likewise, while we initially included these experiments in the "Ideas and speculation" section due to their exciting nature, we concur with the reviewers that their preliminary nature and the experimental conditions used to obtain them raise valid concerns.

      In light of these considerations and to prevent any potential confusion for the readers, we have decided to follow the advice of the reviewers. We have removed the phase separation experiments and data from the revised manuscript. Instead, we have retained a simplified and concise "Ideas and speculation" section, in which we propose condensate formation as a potential mechanism by which TGN46 functions as a cargo sorter at the TGN (lines 580-620).

      The authors reference S5A as the localisation between TGN46deltaLUM images; however, we believe they are referring to fig. S7.

      We apologize for the oversight in referencing the figure and thank the reviewers for bringing this to our attention. We have amended this in the revised version.

      The authors write "remarkably, the amino acid sequence of rat TGN38 is largely conserved amongst other species, including humans (>80% amino acid identity between rat TGN38 and human TGN46)". To understand if this is remarkable, the authors should use the average identity between rat and human proteins.

      We are grateful for the reviewer's insightful comment. Indeed, as the reviewer hints, the average identity between the rat and human proteomes is of the same order of magnitude as the identity reported between rat TGN38 and human TGN46. We therefore acknowledge that the term "remarkable" may not be suitable in this context and could lead to potential misinterpretation. In the revised version, we have removed the term "remarkably".

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript of Zhao et al. aimed at investigating the relationships between type 2 diabetes, bone mineral density (BMD) and fracture risk using Mendelian Randomization (MR) approach.

      The authors found that genetically predicted T2D was associated with higher BMD and lower risk of fracture, and suggested a mediated effect of RSPO3 level. Moreover, when stratified by the risk factors secondary to T2D, they observed that the effect of T2D on the risk of fracture decreased when the number of risk factors secondary to T2D decreased.

      Strengths:

      • Important question

      • Manuscript is overall clear and well-written

      • MR analyses have been conducted properly, which include the usage of various MR methods and sensitivity analyses, and likely meet the criteria of the MR-strobe checklist to report MR results.

      Response: Thanks.

      Weaknesses:

      • Previous MR studies on that topic have not been discussed

      Response: In the manuscript, we discussed the previous MR findings from Trajanoska et al., BMJ, 2018. This study assessed the effect of 15 clinical risk factors (including type 2 diabetes) on fracture risk. Now we have included the other two studies (Mitchell et al, Diabetologia, 2021; Ahmad et al JBMR, 2016) which took BMD as the exposure in the paragraph when we discussed the effects on BMD.

      • Multivariable MR could have been used to better assess the mediating effect of BMI or RSPO3 on the relationships between T2D and fracture risk.

      Response: In revision, an inverse-variance weighted multivariable MR model was used to estimate the direct effect of T2D on fracture and BMD, adjusted for BMI, with the ‘MVMR’ R package (https://github.com/WSpiller/MVMR). Specifically, we first extracted the overlapping SNPs from the summary data for T2D, BMI and fracture. Then the independent significant SNPs (P<5×10−8 and R2<0.1) for either T2D or BMI were pooled as instruments. Additionally, we performed SNP harmonization to correct the orientation of alleles. The results showed that increased risk of T2D has a direct effect of decreasing fracture risk (OR=0.974, 95%CI=0.953-0.995, P=0.017, adjusted for BMI), and BMI mediated 9.03% of the protective effect. The multivariable MR analysis suggested that T2D also had a direct effect of increasing BMD after adjusting for BMI (β=0.042, 95%CI=0.026-0.057, P=1.92×10-7). We did not observe a direct effect of MRI-derived visceral (β=0.02, P=0.831) or abdominal subcutaneous (β=0.03, P=0.57) adiposity on fracture risk adjusted for RSPO3 expression. We have updated the Methods and Results accordingly.
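      For readers less familiar with multivariable MR, the inverse-variance weighted point estimate can be written as a weighted regression, without intercept, of the SNP-outcome associations on the SNP-exposure associations, with weights 1/se². The sketch below is a minimal NumPy re-expression of that idea under stated assumptions; it is not the ‘MVMR’ R package used here, it omits standard errors and instrument-strength or pleiotropy diagnostics, and the synthetic, already-harmonized summary statistics are hypothetical.

```python
import numpy as np


def ivw_mvmr(beta_x, beta_y, se_y):
    """Multivariable IVW point estimates.

    beta_x: (n_snps, n_exposures) SNP-exposure associations (e.g. T2D, BMI)
    beta_y: (n_snps,) SNP-outcome associations (e.g. log-odds of fracture)
    se_y:   (n_snps,) standard errors of beta_y
    Alleles are assumed to be harmonized to the same effect allele beforehand.
    """
    X = np.asarray(beta_x, dtype=float)
    y = np.asarray(beta_y, dtype=float)
    sqrt_w = 1.0 / np.asarray(se_y, dtype=float)  # square root of the IVW weights 1/se^2
    # Weighted least squares without intercept = OLS on rows scaled by sqrt(weight).
    coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef


# Synthetic example: 60 hypothetical instruments, two exposures (T2D, BMI).
rng = np.random.default_rng(1)
beta_x = rng.normal(0.0, 0.05, size=(60, 2))
true_direct = np.array([-0.03, -0.10])  # hypothetical direct effects on the outcome
beta_y = beta_x @ true_direct            # noise-free for illustration
se_y = rng.uniform(0.005, 0.02, size=60)
print(ivw_mvmr(beta_x, beta_y, se_y))   # recovers true_direct in this noise-free case
```

      For a binary outcome analysed on the log-odds scale, exponentiating a coefficient gives an odds ratio, which is the form in which the BMI-adjusted direct effect of T2D on fracture is reported.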

      Reviewer #2 (Public Review):

      The authors employed the Mendelian Randomization method to analyze the association between type 2 diabetes (T2D) and fracture using the UK Biobank data. They found that "genetically predicted T2D was associated with higher BMD and lower risk of fracture". Additionally, they identified 10 loci that were associated with both T2D and fracture risk, with the SNP rs4580892 showing the highest signal. While the negative relationship between T2D and fracture has been previously observed, the discovery of these 10 loci adds an intriguing dimension to the findings, although the clinical implications remain uncertain.

      Response: We appreciate the reviewer's thoughtful evaluation of our study. The hypothesis of this study is that genetically determined type 2 diabetes might not be associated with a higher risk of fracture, even though an increased fracture risk is observed in patients with T2D. In addition, when stratified by the risk factors secondary to the disease, we observed that the effect of T2D on the risk of fracture decreased as the number of risk factors secondary to T2D decreased, and the association became non-significant when T2D patients carried none of these risk factors. These results suggest that the risk factors secondary to type 2 diabetes might contribute more to the risk of fracture. Therefore, the clinical implications of our study might lie in the health management of type 2 diabetes patients: we suggest that it is important to manage the complications of type 2 diabetes to prevent fracture.

      Reviewer #1 (Recommendations For The Authors):

      • Introduction/discussion: findings from MR previously published on that topic have not been discussed in this manuscript (eg, Mitchell et al, Diabetologia, 2021; Ahmad et al JBMR, 2016);

      Response: In the manuscript, we discussed the previous MR findings from Trajanoska et al., BMJ, 2018. That study assessed the effect of 15 clinical risk factors (including type 2 diabetes) on fracture risk. We apologize for missing the studies you mentioned; these two studies took BMD as the exposure, and we have now included them in the paragraph where we discuss the effect of T2D on BMD (Page 14, Line 320-322).

      • In the one-sample MR analysis: I would suggest looking at whether the association between T2D GRS and fracture risk differs across fracture sites; under the hypothesis that BMI might be protective, performing the analysis separately for weight-bearing vs non-weight-bearing bones would be interesting.

      Response: Following your suggestion, we further categorized fractures into those of weight-bearing bones (neck, vertebrae, pelvis, femur, tibia) and of other bones (detailed codes have been added to Supplementary Table 16). Regressing observed fracture on the wGRS indicated a protective association between T2D wGRS and fractures of both weight-bearing bones (OR=0.9772, 95%CI=0.9552-0.9997, P=0.04737, N of fracture=8,992) and other bones (OR=0.9838, 95%CI=0.9688-0.9991, P=0.0386, N of fracture=20,317) (Figure 1). We have updated the Methods and Results accordingly (Page 6, line 129-134 and Page 18, line 408-412).
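      The weighted genetic risk score (wGRS) used in these one-sample analyses is commonly constructed as a sum of risk-allele dosages weighted by per-SNP effect sizes from an external GWAS. The sketch below assumes that construction; the dosages and weights are hypothetical, and the exact SNP list, weights, and any rescaling used in the study are not specified here.

```python
import numpy as np


def weighted_grs(dosages, weights):
    """wGRS for each individual: sum over SNPs of weight * risk-allele dosage.

    dosages: (n_individuals, n_snps) counts (0, 1 or 2) of the T2D risk allele
    weights: (n_snps,) per-SNP T2D effect sizes, e.g. log odds ratios
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


# Three hypothetical individuals genotyped at four hypothetical T2D SNPs.
dosages = [[0, 1, 2, 1],
           [2, 2, 0, 0],
           [1, 0, 1, 2]]
weights = [0.10, 0.05, 0.20, 0.08]
scores = weighted_grs(dosages, weights)
print(scores)  # approximately 0.53, 0.30 and 0.46
```

      The resulting score is then the regressor in a model for the outcome (e.g. a logistic model for fracture), together with the covariates.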

      In this analysis, I would also suggest verifying the absence of sex interaction with T2D PRS on BMD and fracture risk

      Response: Thank you for your suggestion. We further estimated the effect of the sex interaction on BMD and fracture risk by including a T2D wGRS × sex interaction term in the regression model. As you anticipated, we found no interaction between sex and T2D wGRS on fracture risk (P=0.5576) or BMD (P=0.66). Moreover, we conducted an analysis stratified by sex. When we regressed observed fracture on the wGRS in males, we found that genetically determined type 2 diabetes was also associated with a lower risk of fracture (OR=0.977, P=0.015) (adjusting for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments). In females, the direction of the association was the same but not significant (OR=0.986, P=0.139). We tested the heterogeneity between males and females and found no significant difference (Pheterogeneity=0.457). Similarly, genetically determined type 2 diabetes was associated with higher BMD in males (β=0.023, P=8.23×10-14) and females (β=0.022, P<2.0×10-16), with Pheterogeneity=0.6306 (Supplementary Figure 2). We have updated the Methods and Results accordingly (Page 6, line 134-139 and Page 19, line 425-429).
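      A wGRS × sex interaction test of the kind described above amounts to adding a product term to the regression model and examining its coefficient. The sketch below shows this for the continuous BMD outcome using ordinary least squares in NumPy. It is an illustration under assumptions, not the model used in the study: it omits the covariates (age, BMI, physical activity, fall history, HbA1c, medication), standard errors and p-values, and a binary fracture outcome would need a logistic model instead. The simulated data are hypothetical and contain no true interaction.

```python
import numpy as np


def fit_wgrs_sex_interaction(bmd, wgrs, sex):
    """OLS fit of BMD ~ 1 + wGRS + sex + wGRS*sex.

    sex is coded 0/1. Returns [intercept, b_wgrs, b_sex, b_interaction];
    b_interaction measures how much the wGRS slope differs between sexes.
    """
    wgrs = np.asarray(wgrs, dtype=float)
    sex = np.asarray(sex, dtype=float)
    X = np.column_stack([np.ones_like(wgrs), wgrs, sex, wgrs * sex])
    coef, *_ = np.linalg.lstsq(X, np.asarray(bmd, dtype=float), rcond=None)
    return coef


# Hypothetical noise-free data in which the wGRS effect is identical in both sexes.
rng = np.random.default_rng(2)
n = 500
wgrs = rng.normal(0.0, 1.0, n)
sex = rng.integers(0, 2, n)
bmd = 0.10 + 0.02 * wgrs + 0.05 * sex
coef = fit_wgrs_sex_interaction(bmd, wgrs, sex)
print(np.round(coef, 4))  # the interaction coefficient is ~0 here
```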

      • In the two-sample MR analysis: I would suggest performing a multivariable MR to look at the effect of T2D adjusted for BMI on BMD and fracture risk (see Burgess et al, AJE, 2016)

      Response: Thank you for your suggestion. In revision, an inverse-variance weighted multivariable MR model was used to estimate the direct effect of T2D on fracture and BMD, adjusted for BMI, with the ‘MVMR’ R package (https://github.com/WSpiller/MVMR). Specifically, we first extracted the overlapping SNPs from the summary data for T2D, BMI and fracture. Then the independent significant SNPs (P<5×10−8 and R2<0.1) for either T2D or BMI were pooled as instruments. Additionally, we performed SNP harmonization to correct the orientation of alleles. The final IVs used in MVMR are presented in Supplementary Table 17. The results showed that increased risk of T2D has a direct effect of decreasing fracture risk (OR=0.974, 95%CI=0.953-0.995, P=0.017, adjusted for BMI) and increasing BMD (β=0.042, 95%CI=0.026-0.057, P=1.92×10-7, adjusted for BMI). We have updated the Methods and Results accordingly (Page 7, line 155-158, 162-164, and Page 20, line 456-465).

      • In the section "infer the shared genetics". In addition of using waist circumference and waist-hip ratio, it would have been interesting to use GWAS summary statistics for subcutaneous and visceral adiposity (Agrawal, Nat Comm, 2022), and look at through multivariable MR whether RSPO3 mediate the effect of subcutaneous fat on fracture risk.

      Response: Thank you for your suggestion. We downloaded the genetic summary data from Agrawal, Nat Comm, 2022, and performed the same SMR analysis as before. We found that higher expression of RSPO3 was associated with higher MRI-derived visceral adiposity (β=0.199, P=4.36×10-5). We have updated the Methods and Results accordingly (Page 9, line 206-208 and Page 22, line 494-495).

      We did not observe a direct effect of MRI-derived visceral adiposity (β=0.02, P=0.831) or abdominal subcutaneous adiposity (β=0.03, P=0.57) on fracture risk after adjusting for RSPO3 expression.
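
      SMR combines eQTL and GWAS summary statistics for the same top SNP. The sketch below shows the core calculation only. It assumes a single top cis-eQTL and omits the HEIDI heterogeneity test and LD handling that the SMR software also performs. The effect of expression on the trait is the ratio b_zy/b_zx, and the approximate test statistic is T = z_zx²·z_zy² / (z_zx² + z_zy²), referred to a χ² distribution with 1 df. Function names and inputs are hypothetical.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR estimate and approximate test statistic for one top cis-eQTL.

    b_zx, se_zx: SNP effect on gene expression (eQTL summary data)
    b_zy, se_zy: SNP effect on the trait (GWAS summary data)
    """
    b_xy = b_zy / b_zx  # Wald-ratio effect of expression on the trait
    z_zx2 = (b_zx / se_zx) ** 2
    z_zy2 = (b_zy / se_zy) ** 2
    t_smr = z_zx2 * z_zy2 / (z_zx2 + z_zy2)  # approximately chi-square, 1 df
    p = math.erfc(math.sqrt(t_smr / 2))  # upper tail of chi-square(1)
    return b_xy, t_smr, p


# Hypothetical inputs for illustration only
print(smr_test(0.5, 0.05, 0.1, 0.02))
```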

      Reviewer #2 (Recommendations For The Authors):

      Specific comments

      Several concerns regarding the study's concept and methodology should be addressed before accepting the findings as credible. I would like to invite the authors to comment on the following points.

      (1) I find the authors' assertion that individuals with type 2 diabetes (T2D) exhibit both higher BMD and an increased risk of fracture to be unconvincing. The BMD measurement they refer to is based on areal BMD, which fails to account for the three-dimensional aspect of bone density. Existing evidence suggests that patients with T2D actually have lower trabecular bone scores (a predictor of fracture risk) compared to those without the condition. Furthermore, there is a lack of a clearly stated hypothesis underlying the study.

      Response: Yes, in this study the bone mineral density measurement is areal BMD, and we have made this clear in the Abstract. We agree that other measurements, such as trabecular bone score and chest CT texture analysis, could provide additional valuable information for evaluating fracture risk, especially in type 2 diabetes patients. We have discussed this in the manuscript (Page 13, lines 295-300). Epidemiologic studies from the past decade have provided evidence that increased fracture risk is one of the complications of type 2 diabetes, but areal BMD in type 2 diabetes patients can be normal or even higher (Botella Martinez et al., 2016; Romero-Diaz et al., 2021).

      In this study, we employed the Mendelian randomization (MR) approach to investigate the relationship between type 2 diabetes and fracture/BMD. This method uses genetic variants as instrumental variables, which can alleviate bias from unknown confounding factors. We found that genetically predicted type 2 diabetes was associated with higher BMD and a lower risk of fracture. In other words, once the bias from unknown confounding factors was alleviated through MR analysis, genetically predicted type 2 diabetes did not show the bone paradox.

      We then performed an observational analysis in UK Biobank and found that type 2 diabetes was associated with a higher risk of fracture and increased BMD. Further, we stratified the T2D patients by five secondary risk factors (BMI≤25kg/m2, no physical activity, falls in the last year, HbA1c≥47.5mmol/mol and antidiabetic medication treatment). The effect of type 2 diabetes on fracture risk decreased as the number of risk factors secondary to type 2 diabetes decreased, and the association became non-significant in type 2 diabetes patients who carried none of the risk factors. In other words, the diabetic bone paradox might not exist if the risk factors secondary to type 2 diabetes were eliminated.

      The hypothesis and message we want to convey is that genetically determined type 2 diabetes might not be associated with a higher risk of fracture, although such an association can be observed in observational data. However, when stratified by the risk factors secondary to the disease, the effect of T2D on fracture risk decreased as the number of these risk factors decreased, and the association became non-significant in T2D patients who carried none of them. These results suggest that the risk factors secondary to type 2 diabetes might contribute more to fracture risk. Therefore, it is important to manage the complications of type 2 diabetes to prevent fractures.

      In addition, although type 2 diabetes was observed to be associated with a higher risk of fracture, BMI mediated 30.2% of the protective effect of genetically predicted type 2 diabetes. Moreover, the shared genetic architecture between type 2 diabetes and fracture pointed to a top signal near the RSPO3 gene, and higher expression of RSPO3 was associated with higher waist circumference and higher waist-hip ratio. These results suggest that the relatively higher BMI of type 2 diabetes patients might contribute to their higher BMD. This is consistent with our previous study, which suggested that maintaining a moderately high BMI (overweight) might benefit older people in terms of fracture risk (Zhu et al., 2022).

      (2) It is not a good idea to solely concentrate on overall fracture risk as it may obscure the potential relationship between T2D and specific fracture sites, such as hip and vertebral fractures. By solely considering total fracture incidence, important associations at individual fracture sites could be overlooked. I would like to propose that the authors expand their analysis to include the examination of hip and vertebral fractures. By incorporating these specific fracture types into their study, a more comprehensive understanding of the association between T2D and fractures can be achieved.

      Response: This is a good suggestion. Taking into account the comments from another reviewer and the sample size, we classified fractures into weight-bearing bone fractures (neck, vertebrae, pelvis, femur, tibia) and other bone fractures (skull and facial bones, ribs, sternum, forearm, wrist and hand, foot, and other unspecified body regions). We identified 6,582 (1.87%) participants with weight-bearing bone fractures and 9,586 (2.72%) participants with other bone fractures among the 352,879 UK Biobank participants. Cox proportional hazards regression adjusted for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments showed a higher risk of fracture in type 2 diabetes patients (weight-bearing bone fracture: HR=1.792, 95%CI 1.555-2.065, P=8.25×10⁻¹⁶; other bone fracture: HR=1.337, 95%CI 1.167-1.531, P=2.85×10⁻⁵). The increased risk remained after additionally controlling for BMD (weight-bearing bone fracture: HR=1.850, 95%CI 1.602-2.136, P<2×10⁻¹⁶; other bone fracture: HR=1.377, 95%CI 1.199-1.580, P=5.54×10⁻⁶). We have updated the Results, Methods and Figures accordingly (Page 11, lines 245-250; Page 24, lines 540-547; Figure 4A).

      (3) I consider that there is an issue with combining data from both males and females in the analysis. It is widely recognized that women generally have a higher risk of fracture compared to men. Moreover, the association between BMD and fracture may vary between genders, and the risk of T2D is typically lower in women than in men. Therefore, I strongly recommend that the analysis be stratified by gender to account for these differences and provide a more accurate understanding of the relationships involved.

      Response: Thanks for your suggestion. We have now added results stratified by sex to each analysis. Briefly, in the wGRS analysis, genetically determined type 2 diabetes was associated with a lower risk of fracture in males (OR=0.977, 95%CI=0.958-0.995, P=0.015) (adjusting for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments). The association in females was not significant but was in the same direction as in males (OR=0.986, 95%CI=0.969-1.004, P=0.139). Meanwhile, genetically determined type 2 diabetes was associated with higher BMD in both males (β=0.023, 95%CI=0.017-0.030, P=8.23×10⁻¹⁴) and females (β=0.022, 95%CI=0.017-0.026, P<2×10⁻¹⁶). In the observational analysis, Cox proportional hazards regression adjusted for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments showed a higher risk of fracture in type 2 diabetes patients among both males (HR=1.587, 95%CI 1.379-1.828, P=1.26×10⁻¹⁰) and females (HR=1.530, 95%CI 1.334-1.756, P=1.27×10⁻⁹). When we additionally controlled for BMD (males: HR=1.607, 95%CI 1.393-1.853, P=7.21×10⁻¹¹; females: HR=1.601, 95%CI 1.393-1.841, P=3.59×10⁻¹¹), we still observed an increased risk of fracture in type 2 diabetes (Page 6, lines 136-139; Page 11, lines 241-243).

      (4) My understanding is that "BMD" in UK Biobank refers to estimated BMD derived from ultrasound measurements, rather than being directly measured using dual-energy X-ray absorptiometry (DXA). It would be helpful to clarify whether the BMD mentioned in the manuscript refers to estimated BMD or DXA-based BMD to ensure accurate interpretation of the results.

      Response: Yes, we used BMD estimated from quantitative ultrasound measurement at the heel as the outcome. The device generates two variables: speed of sound (SOS) and broadband ultrasound attenuation (BUA; the slope between the attenuation of the sound signal and its frequency as it travels through bone and soft tissue). Heel BMD was calculated with the formula BMD = 0.002592 × (BUA + SOS) − 3.687. We have made this clear in the Methods (Page 23, lines 526-530).
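
      The heel BMD formula can be written as a one-line function. The units are an assumption here (BUA in dB/MHz, SOS in m/s, BMD in g/cm²), and the measurement values in the example are hypothetical.

```python
def heel_bmd(bua: float, sos: float) -> float:
    """Estimated heel BMD from broadband ultrasound attenuation and speed of sound.

    Assumed units: BUA in dB/MHz, SOS in m/s, result in g/cm^2.
    """
    return 0.002592 * (bua + sos) - 3.687


# Hypothetical measurement: BUA = 100 dB/MHz, SOS = 1550 m/s
print(round(heel_bmd(100, 1550), 4))  # 0.5898
```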

      (5) The clarification regarding the nature of the 13,817 individuals with T2D mentioned in Supplementary Table 9 is needed. It is unclear whether this figure represents incidence or prevalence. If it refers to incidence, it would be informative to specify the duration of the follow-up period for these individuals.

      Response: We used UK Biobank data (application #41376) in a prospective design. We excluded participants who: 1) were of non-European ethnicity (n=30,481); 2) were diagnosed with type 1 diabetes (n=4,455); 3) were diagnosed with diseases associated with bone loss (n=21,560); or 4) were diagnosed with fracture with a known primary disease (n=7,222) (Supplementary Table 15). Among the 439,982 UK Biobank samples, we focused on participants diagnosed with T2D within the 10-year period from 1 January 2006 to 31 December 2015, leaving 425,772 participants (including 14,860 type 2 diabetes patients). Each type 2 diabetes patient had a diagnosis date (i.e., the reference date), from which we calculated the onset age. Then, among participants free of T2D, we selected up to 27 referents per patient (whenever possible) whose age at the reference date matched the onset age (± 3 years). In total, 363,884 non-T2D referents were individually matched within a 6-year age band at the reference date. We prospectively followed these type 2 diabetes patients and referents from the reference date until diagnosis of fracture, death, emigration, or 19 April 2021 (the date of the last fracture diagnosis in the cohort), whichever came first (mean duration of type 2 diabetes, 8.34 years). Survival time was calculated according to whether the participant had a fracture. For participants with a fracture, survival time was the time from the reference date to the first diagnosis of fracture. For participants without a fracture, it was the time from the reference date to the earliest of 19 April 2021, death, or emigration. We excluded 25,865 participants whose fracture diagnosis, death, or emigration occurred before the reference date, leaving 352,879 participants in the final analysis (13,817 type 2 diabetes patients and 339,062 referents). We identified 16,147 (4.6%) participants with fracture among the 352,879 UK Biobank participants. We have made this clear in the Methods and Results (Page 18, lines 400-406; Page 22-23, lines 506-523; Page 10, lines 231-233).
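
      The survival-time definition can be sketched per participant. This is a hypothetical helper for illustration, not the authors' code. It assumes the relevant dates are available for each participant and converts days to years at 365.25 days per year.

```python
from datetime import date
from typing import Optional, Tuple

END_OF_FOLLOW_UP = date(2021, 4, 19)  # last fracture diagnosis in the cohort


def follow_up(reference: date,
              fracture: Optional[date] = None,
              death: Optional[date] = None,
              emigration: Optional[date] = None) -> Tuple[float, bool]:
    """Return (follow-up in years, fracture indicator) for one participant.

    Follow-up ends at the earliest of fracture, death, emigration, or the
    administrative end date. Only a fracture on that earliest date is an event.
    """
    dates = [d for d in (fracture, death, emigration) if d is not None]
    end = min(dates + [END_OF_FOLLOW_UP])
    if end < reference:
        # Such participants were excluded before the final analysis
        raise ValueError("fracture, death or emigration before the reference date")
    event = fracture is not None and fracture == end
    return (end - reference).days / 365.25, event


# Hypothetical participants
print(follow_up(date(2010, 3, 1), fracture=date(2014, 6, 15)))
print(follow_up(date(2010, 3, 1), death=date(2012, 1, 1), fracture=date(2013, 1, 1)))
```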

      (6) I find the selection of participants for the analysis to be highly problematic. Supplementary Figure 1 suggests that individuals with a history of fracture were excluded from the study. However, it is well established that prior fracture history is a significant predictor of future fractures. Therefore, the exclusion of participants with prior fractures likely introduced selection bias into the analysis, potentially compromising the study's findings.

      Response: We apologize for using the misleading term “secondary fracture” in the manuscript and figure. What we meant was “participants diagnosed with fracture due to a known primary disease” (n=7,222). Because we aimed to investigate the effect of diabetes on fracture, we excluded fractures with other known causes. We have changed the term in the manuscript and figure accordingly (Page 18, lines 405-406; Supplementary Figure 1).

      Because this study has a prospective design, none of the participants had a fracture at the reference date. We prospectively followed the type 2 diabetes patients and referents from the reference date until diagnosis of fracture, death, emigration, or 19 April 2021 (the date of the last fracture diagnosis in the cohort), whichever came first. Therefore, each study participant had either one fracture or no fracture during follow-up.

      (7) It is unclear what exactly is meant by "genetically predicted T2D." Could it possibly refer to the polygenic risk score derived from the variants associated with T2D? Clarification is needed regarding the methodology used to determine this "genetically predicted T2D" and its relation to the construction of a polygenic risk score based on T2D-associated variants.

      Response: In this study, we used the weighted genetic risk score (wGRS) method and the two-sample Mendelian randomization (MR) method to estimate the effect of genetically predicted T2D on fracture. We constructed the wGRS for individuals in UK Biobank (294,571 samples with genotypes) as a linear combination of the selected SNPs weighted by their β coefficients for type 2 diabetes: wGRS = β1·SNP1 + β2·SNP2 + … + βn·SNPn, where n is the number of instrumental variables. To validate the wGRS results, we also performed two-sample MR analyses that are independent of the UK Biobank samples, using three approaches: inverse variance weighting (IVW), simple median and MR-PRESSO. Both methods (wGRS and two-sample MR) took genetically predicted type 2 diabetes as the exposure (see Methods, Page 18, lines 419-422; Page 19, lines 439-440).
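
      The wGRS formula, and the IVW estimator used in the two-sample MR validation, can be sketched as below. This assumes each SNP is coded as the count (0 to 2) of the T2D-increasing allele. It shows only the fixed-effect IVW point estimate with its first-order standard error and does not implement the simple median or MR-PRESSO methods. All names and numbers are hypothetical.

```python
import math
from typing import Sequence


def wgrs(t2d_betas: Sequence[float], dosages: Sequence[float]) -> float:
    """wGRS = beta_1*SNP_1 + ... + beta_n*SNP_n for one individual.

    dosages: count (0-2) of the T2D-increasing allele at each instrument SNP.
    """
    if len(t2d_betas) != len(dosages):
        raise ValueError("need one weight per SNP")
    return sum(b * g for b, g in zip(t2d_betas, dosages))


def ivw_estimate(bx: Sequence[float], by: Sequence[float], se_y: Sequence[float]):
    """Fixed-effect IVW causal estimate and its first-order standard error."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_y))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical toy inputs
print(wgrs([0.10, 0.20, 0.05], [1, 2, 0]))
print(ivw_estimate([0.1, 0.2, 0.3], [-0.003, -0.006, -0.009], [0.01, 0.01, 0.02]))
```

      The IVW estimate is the slope of a weighted regression of SNP-outcome on SNP-exposure effects through the origin. When every SNP gives the same ratio, as in the toy input, it returns that ratio (-0.03).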

      (8) My understanding is that the Mendelian Randomization analysis relies on, among others, 2 assumptions: (1) the genetic marker is linked to the exposure (e.g., T2D), and (2) the genetic marker remains independent of the outcome (e.g., fracture) when considering the exposure and all confounding factors. In the authors' study, they identified 10 loci that exhibited associations with both T2D and fracture risk. This finding raises questions about whether the assumptions underlying Mendelian Randomization have been violated?

      Response: You're absolutely right. Because horizontal pleiotropy could bias the MR estimates, we additionally used the MR pleiotropy residual sum and outlier (MR-PRESSO) method. When we excluded pleiotropic variants using the restrictive MR-PRESSO method, a causal association was still detected between type 2 diabetes and fracture (OR=0.967, 95%CI=0.945-0.989, P=0.004) (Page 6, lines 146-149).

      (9) The analysis provided in Supplementary Table 10 appears to have certain limitations. From my understanding, the analysis treated fracture and BMD as outcome variables, with T2D regarded as the predictor variable. However, what is of interest is whether the association between T2D and fracture remains significant even after accounting for well-established risk factors for fractures, including BMD. It is crucial to determine whether the association between T2D and fracture is independent of these established risk factors. Therefore, I suggest the authors consider the following 3 models:

      Model 1: fracture ~ age + T2D

      Model 2: fracture ~ age + T2D + BMD

      Model 3: fracture ~ age + T2D + BMD + fracture history + falls

      Response: In our previous analysis, we had already adjusted for 7 covariates (including fall history) in the basic model for fracture:

      fracture ~ T2D + age + sex + BMI + physical activity + HbA1c + medication treatments + fall history (Model 0)

      We had already included “fall history” in the basic model. Following your suggestion, we further considered an additional model for fracture that includes BMD as a covariate:

      fracture ~ T2D + age + sex + BMI + physical activity + HbA1c + medication treatments + fall history + BMD (Model 1)

      We cannot include fracture history as the covariate because each study subject either had one fracture or no fracture, as we also answered in Question 6.

      In model 0, Cox proportional hazards regression adjusted for the clinical risk factors (reference age, sex, BMI, physical activity, HbA1c, medication treatments and fall history) showed a higher risk of fracture in type 2 diabetes patients (HR=1.527, 95%CI=1.385-1.685, P<2.0×10⁻¹⁶). When we additionally controlled for BMD (model 1), we still observed an increased risk of fracture in type 2 diabetes (HR=1.574, 95%CI=1.425-1.739, P<2.0×10⁻¹⁶) (Supplementary Table 11).

      We thank you for your suggestion and have updated the Methods, Results, and Figures accordingly (Page 11, lines 243-245; Page 24, lines 539-540; Figure 4A).

      (11) The dichotomization of data presented in Figure 4 is not considered ideal, as this approach often leads to a loss of valuable information. It is strongly recommended that the authors reconsider their data analysis strategy and reanalyze the data using continuous variables, such as BMI and HbA1c, to capture a more nuanced understanding of the relationships involved.

      Response: We agree that dichotomizing data can lead to a loss of valuable information. Models 0 and 1 used continuous variables: we analyzed the relationship between type 2 diabetes and fracture with Cox proportional hazards regression adjusted for reference age, sex, BMI (as a continuous variable), physical activity, fall history, HbA1c (as a continuous variable) and medication treatments. We have updated Figure 4 accordingly.

      In the stratified analyses, we used 5 clinical factors secondary to the disease to classify individuals at risk. For example, an individual with BMI≤25kg/m2, no physical activity, falls in the last year, HbA1c≥47.5mmol/mol and antidiabetic medication treatment was classified as having 5 risk factors, and so forth. In total, 2,303 patients carried none of the risk factors, 4,128 patients carried one of the risk factors, and 4,252 patients carried at least two risk factors. We found that the effect of type 2 diabetes on fracture risk decreased as the number of risk factors secondary to type 2 diabetes decreased. We have made this clearer in the Methods and Results (Page 11, lines 255-257; Page 24, lines 548-552).
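
      The risk-factor counting used for stratification can be sketched with the thresholds stated in the response. The data structure, field names, and treatment of boundary values are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class T2DPatient:
    bmi: float  # kg/m^2
    physically_active: bool
    fell_in_last_year: bool
    hba1c: float  # mmol/mol
    on_antidiabetic_medication: bool


def count_secondary_risk_factors(p: T2DPatient) -> int:
    """Number of the five secondary risk factors carried (0-5)."""
    flags = [
        p.bmi <= 25,
        not p.physically_active,
        p.fell_in_last_year,
        p.hba1c >= 47.5,
        p.on_antidiabetic_medication,
    ]
    return sum(flags)  # each True counts as 1


def risk_stratum(p: T2DPatient) -> str:
    """Group patients as in the stratified analysis: none, one, two or more."""
    n = count_secondary_risk_factors(p)
    return "none" if n == 0 else "one" if n == 1 else "two or more"


# Hypothetical patient: BMI 24, inactive, no falls, HbA1c 50, on medication
print(risk_stratum(T2DPatient(24.0, False, False, 50.0, True)))  # two or more
```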

      (12) The conclusion of the study appears to be somewhat confusing. In the Abstract, the authors initially state that "genetically predicted T2D was associated with higher BMD and lower risk of fracture." However, later on, they write that "the genetically determined T2D might not be associated with a higher risk of fracture." This discrepancy raises uncertainty about the clear take-home message of the study.

      Response: The two statements were intended to convey the same message in different words, to avoid repetition. The take-home message is that genetically determined type 2 diabetes might not be associated with a higher risk of fracture, although such an association can be observed. This suggests that the risk factors secondary to type 2 diabetes might contribute more to fracture risk. Therefore, it is important to manage the complications of type 2 diabetes to prevent fractures, especially the 5 factors investigated in this study.

      (13) (Apologies if I offend.) It seems that the authors lack comprehensive knowledge of the osteoporosis literature. In the Introduction, their definition of osteoporosis as "an age-related common disease characterized by low bone mass" is inadequate. It would be advisable for the authors to provide a more widely accepted and standard definition of osteoporosis to ensure accuracy and alignment with established definitions in the field.

      Response: Thanks for your suggestion. We have changed the statement as follows: “Osteoporosis is a common chronic disease characterized by low bone mass and disruption of bone microarchitecture. Fragility fracture is the ultimate outcome of poor bone health.”

      (14) There are several instances in which the authors use non-standard terminologies. For example, the use of the word 'effects' (in "the observed effect of T2D on fracture risk") is inappropriate since this study is observational in nature.

      Response: In statistics, an effect size is a value measuring the strength of the relationship between two variables in a population. Where appropriate, we have changed “effect” to “effect size” when referring to the hazard ratio of T2D for fracture.

      (15) Please provide a reference for "diabetic bone paradox".

      Response: We have cited Botella Martínez et al, Endocrinol Nutr. 2016 and Romero-Díaz et al, Diabetes Ther. 2021 in both Introduction and Discussion (Page 3, line 76-77; Page 13, line 295-297).

      References

      Botella Martinez S, Varo Cenarruzabeitia N, Escalada San Martin J, Calleja Canelas A. The diabetic paradox: Bone mineral density and fracture in type 2 diabetes. Endocrinol Nutr. 2016, 63: 495-501.

      Romero-Diaz C, Duarte-Montero D, Gutierrez-Romero SA, Mendivil CO. Diabetes and bone fragility. Diabetes Ther. 2021, 12: 71-86.

      Zhu XW, Liu KQ, Yuan CD et al. General and abdominal obesity operate differently as influencing factors of fracture risk in old adults. iScience. 2022, 25: 104466.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presented convincing single-cell transcriptomic data of hematopoietic cells and immunocytes in zebrafish kidney marrow and showed that these cells have distinctive responses to viral infection. The findings in this study suggest that zebrafish kidney is a secondary lymphatic organ and hematopoietic stem cells in zebrafish may exhibit trained immunity. This represents a valuable discovery of the unique features of the fish immune system.

      Public Reviews:

      Reviewer #1 (Public Review):

      Hu et al. performed scRNA-seq analyses of kidney cells from pooled adult zebrafish with or without virus infection, vaccination, or vaccination plus virus infection. They compared these experimental groups with one another, and also compared kidney with spleen. Their analyses identified expected populations but also revealed new hematopoietic stem/progenitor cell (HSPC) populations, even in the spleen. The analyses show that HSPCs in the kidney respond differentially to virus infection and can be trained to recognize the same infection, and the authors argue that the zebrafish kidney can serve as a secondary immune organ. The findings are important and interesting. The manuscript is well written and a pleasure to read. However, there are several issues with figure presentation and quality, as well as a lack of clarity in some of the figure legends. Some of the data presentation can be improved for better clarity. It is also important to outline what is conserved and what is unique to fish.

      Major concerns:

      (1) The visualization for several figure panels is very poor. Please provide high-resolution images and larger font sizes for gene lists and X and Y axis labels. This includes Figure 1B, Figure 1-figure supplement 2, Figure 2B-2C, 3A-3D, 4F, 5B, 6G, Figure 6-figure supplement 1B, Figure 6-figure supplement 2, Figure 7B, 8C-8E, Figure 8-figure supplement 1, 10F, 10G-10J, and Figure 10-figure supplement 1.

      Response: We apologize for the issue you have pointed out concerning the inadequate visualization of the graphic panels. It is likely that the formatting of the inserted images was altered during the manuscript upload process, leading to a reduction in resolution. However, the graphics uploaded as separate image files, specifically formatted as vector files in PDF format, preserve their high resolution even when zoomed in. Therefore, we kindly request the reviewer to consult the figures in the submission folder for a more detailed examination. We sincerely apologize for any inconvenience caused.

      (2) What are the figures at the end of the manuscript without any figure legends?

      Response: Thank you for bringing this issue to our attention. The last few figures that lack figure legends are actually supplementary figures included in the text. It is possible that they were automatically and repeatedly generated by the submission system. In the revised manuscript, we will take measures to ensure that this issue is avoided.

      (3) It would be better to use a Table to organize the gene signatures that define each unique population of immune cells such as T, B, NK, etc.

      Response: We greatly appreciate the valuable advice provided by the reviewer. As per the reviewer's recommendation, we have included a comprehensive display of all cell types and corresponding gene signatures in Supplementary File 1 of the revised manuscript.

      (4) What are the similarities for HSPC and immune cell populations between fish and man based on this research? It is better to form a table to compare and discuss.

      Response: Following the valuable suggestion of the reviewer, we have included an additional comparative analysis of HSPC and immune cell populations between zebrafish and humans. This information can be found in Supplementary file 8 and in the "Discussion" section (lines 684-685).

      (5) It is highly likely that sex and age contribute biological variation to how HSPCs respond to virus infection and vaccination. The authors should clearly state the sex and age of the fish in their samples and discuss their results taking these variations into consideration.

      Response: We are grateful for the reviewer's insightful comments. To reduce inter-individual variation, zebrafish were selected randomly, with equal numbers of males and females, at young adult ages ranging from 3 to 12 months. We have added a description of this selection process to the "Materials and Methods" section (lines 798-799).

      (6) The authors claim that the spleen and kidney share HSPCs. However, their data do not demonstrate this clearly in Figure 4A. Perhaps they could use different colors to make the overlay more obvious, or include a table showing which HSPCs are shared between the kidney and spleen? Can they rule out that these are simply HSPCs seeding the spleen to differentiate into B cells or other immune cells?

      Response: We thank the reviewer for raising this issue and would like to explain it in detail. The left and right panels of Figure 4A should be read together. The left panel shows the cellular composition of the spleen (light red) and kidney (blue) across cell types: each point is an individual cell, and the two colors indicate its organ of origin. The right panel shows the same cells colored by cell type. The spatial distribution and proportions of cells in the tSNE plot on the right align with those in the left panel, so the two plots correspond directly, which supports our interpretation. Read together, the panels show that the overlapping HSPCs shared by spleen and kidney reside predominantly in the HSPCs1 group (cluster 5 in the right panel), with a small number also in the HSPCs2 group (cluster 8 in the right panel). These observations indicate the presence of overlapping HSPCs in both the kidney and spleen. However, further work is required to fully understand the relationship between HSPCs in the kidney and spleen.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      (1) Figure 3C: why is 10 listed in between 1 and 2?

      Response: We appreciate the reviewer's comment. The panels in Figure 3C were ordered automatically by the analysis software, and this ordering has no bearing on the results of the analysis.
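
      The response does not name the software. One common cause of an ordering such as 1, 10, 2 is that cluster IDs are stored as text and sorted lexicographically (character by character) rather than numerically. The hypothetical sketch below shows that behavior and a natural-sort key that restores numeric order. The labels are illustrative.

```python
import re


def natural_key(label: str):
    """Split a label into text and integer chunks so '10' sorts after '2'."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", label)]


clusters = ["0", "1", "2", "10", "3"]
print(sorted(clusters))                   # ['0', '1', '10', '2', '3']
print(sorted(clusters, key=natural_key))  # ['0', '1', '2', '3', '10']
```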

      (2) Figure 4A: difficult to assess the overlay between the kidney and spleen.

      Response: As mentioned above, the overlapping HSPCs shared by both the spleen and kidney are mainly distributed in the HSPCs1 group (cluster 5 in the right-side figure), with a small amount also found in the HSPCs2 group (cluster 8 in the right-side figure).

      (3) Figure 4C: What is this sample, kidney or spleen? Please specify.

      Response: Figure 4C is an overlay of the spleen and kidney cells depicted in Figure 4B; it includes all spleen and kidney cells to show the differentiation trajectory. Following the reviewer's suggestion, we have made the corresponding modification to the revised figure.

      (4) The manuscript is very long. Consider to focus on the major findings as the main figures and move the rest to the supplementary figures.

      Response: This article aimed to characterize the hematopoietic and immunological traits of the zebrafish kidney through a systematic study, so the findings are presented comprehensively. Because the figures currently in the main text illustrate the principal outcomes of each section, we kindly request that they remain in the main body of the article to preserve the structural coherence and readability of the manuscript. Thank you for taking our request into consideration.

      Reviewer #2 (Public Review):

      In this manuscript, the authors have meticulously constructed a comprehensive atlas delineating hematopoietic stem/progenitor cell (HSPC) and immune-cell types within the zebrafish kidney, employing single-cell transcriptome profiling analysis. Notably, these cell populations exhibited distinctive responses to viral infection. Intriguingly, the investigation revealed that HSPCs manifest positive reactivities to viral infection, indicating the effective induction of trained immunity in select HSPCs. Furthermore, the study unveiled the capacity for the generation of antigen-stimulated adaptive immunity within the kidney, suggesting a role for the zebrafish kidney as a secondary lymphoid organ. This research elucidates the distinctive features of the fish immune system and underscores the multifaceted biology of the kidney in ancient vertebrates.

      Response: We would like to express our gratitude to the reviewer for the overall positive feedback on our article.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors propose that zebrafish kidney is a dual-functional entity with functionalities of both primary and secondary lymphoid organs. Do the authors have any insights into the coordination of these two functions in the kidneys?

      Response: We are grateful for the valuable comments provided. We believe that the question raised by the reviewer poses an intriguing research topic, as it explores the intricate interaction between the hematopoietic and adaptive immune systems in the renal organ. This exploration holds significant value in understanding the underlying mechanisms. To accomplish this, advanced techniques such as spatiotemporal single-cell transcriptomics and dynamic cell tracking will be utilized to validate the interplay between hematopoietic and immune cell lineages.

      (2) Previous studies have found that fish IgZ/IgT is specific to mucosal immune organs. Is the zebrafish IgZ gene expressed in the kidney? If so, is there any correlation with IgZ in mucosal immune organs?

      Response: Thank you for drawing attention to this matter. In our study, we observed the expression of the IgZ gene (ighz) in the zebrafish kidney, as shown in Figure 6. This discovery aligns with previous research and confirms its presence in B cells. While IgZ is known to function as an antibody in mucosal immunity, it remains unclear whether the development of its secretory cells (IgZ+ B cells) originates from the central immune system, such as the kidney. Our results suggest that IgZ+ B cells may have their origin in the kidney and then migrate through the peripheral circulation to carry out their functions in the local mucosal system. This finding is consistent with our earlier research, which demonstrated that zebrafish IgZ is not limited to mucosal immune organs but is also abundantly present in systemic immunity, including peripheral blood (Immunology. 2021; 162(1): 105-120).

      Reference:

      Ji, J. F. et al. Differential immune responses of immunoglobulin Z subclass members in antibacterial immunity in a zebrafish model. Immunology, 2021;162(1), 105-120.

      (3) Did the authors use the zebrafish genome or transcriptome for gene annotation? If the former, which version is used? Please supplement in the "Materials and methods".

      Response: We appreciate the comments provided by the reviewer. In this study, we utilized the zebrafish genome, specifically the GRCz11 version, to annotate genes. The detailed genome data can be found at http://asia.ensembl.org/Danio_rerio/Info/Index. We have incorporated this information into the "Materials and Methods" section of the revised manuscript (line 873).

      (4) Since the authors performed single-cell sequencing on leukocytes, why were several kidney cell types, such as kidney multicellular cells and kidney mucin cells, present in the samples?

      Response: Thanks for the reviewer’s comments. It is important to acknowledge that inadvertent mixing of kidney cells might have occurred during the preparation of single-cell suspensions in our analyzed sample. However, it is pertinent to emphasize that our primary focus was the analysis of immune cells. Therefore, any minor contamination from kidney cells in the analyzed sample is considered negligible and does not significantly affect the main results of our analysis.

      (5) The application of "trained immunity," although currently popular, appears unsuitable in this context, as the current scenario involves a recall with the cognate antigen.

      Response: To our knowledge, trained immunity is generally recognized as the long-term memory of innate immunity based on transcriptional, epigenetic and metabolic modifications of myeloid cells, which are characterized by elevated pro-inflammatory responses to secondary stimuli, whether they are identical or different (Cell Host Microbe. 2012; 12(2): 223-32; Nat Immunol. 2021; 22(1): 2-6; J Clin Invest. 2022;132(7): e158468). Therefore, stimulation with cognate antigens can be considered a form of trained immunity, and we hope this usage will be accepted in this context.

      References:

      (1) Quintin, J. et al. Candida albicans infection affords protection against reinfection via functional reprogramming of monocytes. Cell host & microbe, 2012;12(2), 223-232.

      (2) Divangahi, M. et al. Trained immunity, tolerance, priming and differentiation: distinct immunological processes. Nature immunology, 2021;22(1), 2-6.

      (3) Pernet, E. et al. Training can’t always lead to Olympic macrophages. Journal of Clinical Investigation, 2022;132(7), e158468.

      (6) The discovery that HSPCs exhibit trained immune characteristics is novel. Do the authors have any insights into the biological significance of trained immunity in HSPCs concerning immune defense?

      Response: We propose that the generation of trained immunity in HSPCs holds significant physiological implications. This process may expedite the differentiation and activation of specific immune cells upon re-infection, thereby bolstering the body's immune defenses and pathogen clearance. Consequently, it may serve as an intelligent strategy for host defense against pathogens. However, additional research is required to confirm this hypothesis.

      (7) In Figure 13I, the authors used CpG and CpG+TNP-KLH to stimulate zebrafish, but no corresponding experimental method was provided in the "Materials and methods". Please supplement.

      Response: Thanks for the reviewer’s careful reading. We have included corresponding supplementary instructions in the “Materials and methods” section (lines 1011-1018).

      (8) At lines 187-190 in "Results", authors state that "It's noteworthy that cluster 11 exhibited high expression of genes ......, resembling a unique serpin-secreting cell population". Noteworthy is the fact that serpins play a role in diverse immunological processes, including coagulation, inflammation, as well as myeloid and lymphoid cell development. Could this renal cell cluster (kidney mucin cells) potentially harbor immunological functions?

      Response: Given the crucial role of serpins in various immunological processes, the serpins secreted by this cell cluster likely have significant immunological functions, suggesting that this cell group has notable immunological capabilities. Our forthcoming research will therefore investigate this cell population in more detail.

      (9) At line 171 in "Results", the number "6" in the "cluster 6" should not be italicized, please correct.

      Response: We have addressed this issue in the revised manuscript (line 170).

      (10) At line 937 in "Materials and methods", the authors isolated T/B lymphocytes through magnetic bead sorting. Please provide information on the source of the antibodies (rabbit anti-TCRα/β or mouse anti-IgM Ab).

      Response: We have included corresponding instructions in the “Materials and methods” section (lines 938-939).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors' goal was to determine the role of O-GlcNAc modification in associative learning in Drosophila using an odor discrimination task. In particular, they sought to determine the population of O-GlcNAc modified proteins in a region of the brain critical for memory, the mushroom body. They provide compelling evidence that there are brain-region-specific populations of O-GlcNAc modified proteins and that in the mushroom body, proteins involved in translation represent a sizable fraction, larger than elsewhere in the central nervous system. Using expression of a bacterial protein that cleaves O-GlcNAc in the mushroom body, they show both reductions in the levels of this modification and effects on associative learning. Further exploration of new protein synthesis in situ supports the hypothesis that O-GlcNAc modification affects the activity of the translational machinery and could provide the basis for learning deficits when O-GlcNAc levels are compromised. Rescue of deficits resulting from reductions in O-GlcNAc was achieved by over-expression of dMyc, a known regulator of ribosome biogenesis and translation. While the critical role of protein synthesis in learning is long established, as is the regulation of protein synthesis by O-GlcNAc modification, this work connects O-GlcNAc modification in a specialized region of the brain to translation regulation and associative learning. The authors also provide a method for identification of O-GlcNAc modified proteins using a tissue-specific and inducible proximity-labelling method. This will provide a useful tool for further functional studies of O-GlcNAc modification.

      Thank you for summarizing our main findings and recognizing the usefulness of the tool reported here.

      Reviewer #2 (Public Review):

      In this report Yu et al. try to demonstrate how O-GlcNAcylation of ribosomal proteins in the mushroom body (MB) is required for protein synthesis and olfactory learning. The authors develop a new method combining the O-GlcNAc binding activity of an O-GlcNAcase (OGN) and TurboID for efficient isolation. This novel method is a useful tool for the identification of O-GlcNAc modified proteins and closely interacting partners. Transgenic expression of this binder allows the authors to perform a profiling that can be time and tissue/region/cell specific. This novel tool is thoroughly tested to show that it works in cultured cells, in whole Drosophila, and in a tissue-specific manner when expressed pan-neuronally or in specific regions of the brain.

      The authors had previously shown that reduced O-GlcNAcylation through transgenic expression of a highly active OGN affected olfactory learning. In this work the same approach is used to reduce O-GlcNAcylation in different brain regions to show that specific reduction in the adult MB reduced olfactory learning performance. As a control, OGN expression in the ellipsoid body has no effect on olfactory learning. Optic and antennal lobes could not be tested, as OGN expression affected olfactory acuity. The most critical aspect of this finding is the time- and tissue-specific expression of OGN in the adult, given the developmental defects induced by earlier expression. The MB has a widely reported role in associative learning; therefore, this finding, while not unexpected, is satisfying.

      Thank you for recognizing the significance of our work.

      Yu et al. use their TurboID-OGA to identify O-GlcNAcylated proteomes in different brain regions. The authors focus on the MB given its role in associative learning and the effect of reduced O-GlcNAcylation in this region. Among other substrates several ribosomal proteins are found to be specifically O-GlcNAcylated to a greater extent in the MB compared to other brain regions.

      To demonstrate the role of MB O-GlcNAcylated ribosomes in protein synthesis, an ex vivo OPP fluorescent assay is used in brains of flies expressing OGN or a mutant form lacking its catalytic and binding activities. The experiment shows reduced protein synthesis in the MB. In addition, the authors can increase protein synthesis by inducing ribosome biogenesis through the expression of dMyc. Flies expressing dMyc and OGN together do not present the learning deficits of flies carrying only OGN. Protein synthesis in the MB has been previously reported to be required for associative learning (for example Wu et al. 2017 or Lin et al. 2022), and the present results bring further support. A link between ribosomal O-GlcNAcylation and protein synthesis could be a really interesting finding, but, unfortunately, the experiments presented in this work are still too preliminary.

      The experiments presented focus only on ribosomal proteins, although these are just some of the O-GlcNAcylation substrates in the MB. While a correlation between ribosomal modification and protein synthesis is shown, a causal demonstration is not provided. Many other mechanisms and O-GlcNAcylation of other substrates could account for the same observations. For example, O-GlcNAcylation has been reported to have a role in protein synthesis affecting different translation initiation factors (Li et al 2018, Shu et al 2022). In vitro experiments in which O-GlcNAcylation of specific ribosomal components is targeted are required. In addition, O-GlcNAcylation is also known to modify ribosome-associated mRNAs. Experiments using specific mutations that prevent O-GlcNAcylation of ribosomes could demonstrate a direct link between such ribosomal modifications and olfactory learning.

      We appreciate you raising the crucial point that our data fall short of establishing a causal connection between O-GlcNAcylation of ribosomes and translational activity. We have made significant changes to the text throughout the manuscript to make our description more accurate.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (recommendations for the authors):

      The following are comments that the authors may wish to address or clarify:

      (1) The claim that respiration and fermentation occur concurrently in the agr mutant during aerobic growth is not strongly supported by the evidence presented…. However, since neither lactate production nor a difference in the NAD+/NADH ratio between the wild type and agr mutant was observed, it is challenging to assert that fermentation is occurring. Relying solely on a gene expression signature indicative of fermentation is, in my view, inadequate to conclusively establish that aerobic fermentation is taking place.

      Lactate production. The data we provide in Figure 5-E of the original manuscript (Figure 5-C in the revised manuscript) indicates that lactate production is lower in the wild-type compared to the Δagr mutant.

      The exact focus of Reviewer 1’s concern is not clearly specified; the reviewer may have been referring to how the result was described in the text:

      “Although the stimulatory effect of the agr deletion on production of the fermentation product lactate was not observed in optimally aerated broth cultures after growth to late exponential growth phase, it was confirmed for organisms grown in broth under more metabolically demanding, suboptimal aeration conditions (Figure 5E). Overall, these results are consistent with transcription-level up-regulation of respiratory and fermentative pathways in agr-deficient strains.”

      The greater sensitivity of suboptimal aeration conditions is unsurprising and relates to a low rate of fermentation during the vigorous aeration (shaking at 250 rpm) conditions commonly used to grow S. aureus. To clarify the point, we modified the text to provide additional context as follows:

      Line 271: “Although the stimulatory effect of the agr deletion on production of the fermentation product lactate was not observed in optimally aerated broth cultures after growth to late exponential growth phase, it was confirmed for organisms grown in broth under more metabolically demanding, suboptimal aeration conditions (limitations in the rate of respiration when oxygen is limiting are expected to increase overall levels of fermentation) (Figure 5C). Overall, these results are consistent with transcription-level up-regulation of respiratory and fermentative pathways in agr-deficient strains.”

      NAD+/NADH ratio. Extended studies of the NAD+/NADH ratio, requested by Reviewer 1 under Comments 12 and 13, document an effect of the Δagr mutant not seen in Figure 5F in the original submission. Our responses to Comments 12 and 13 below address this issue.

      (2) The mechanisms through which the ΔagrΔrot double mutant resists H2O2 are not clearly elucidated. While the authors suggest that the ΔagrΔrot double mutant expresses several genes involved in combating oxidative stress, essential genetic studies that would validate this hypothesis have not been conducted.

      The data we provide indicate 1) that wild-type strains are tolerant to peroxide and 2) that wild-type strains are able to render inducible several known reactive oxygen species (ROS)-protective genes in the presence of peroxide in a rot-dependent manner. Δagr strains, which do not demonstrate this response, are more readily killed by peroxide. Additional data indicate that increased respiration caused by deletion of agr is associated with increased endogenous ROS. Higher levels of endogenous ROS can modulate tolerance to subsequent challenge by ROS (1). Collectively, these observations support a model of Δagr-induced hyper-susceptibility in which elevation of endogenous ROS results in a suboptimal ROS-defense response that plays a role in increased peroxide lethality.

      We prefer to test this model in future studies directed at understanding the complexities of the interaction among agr-mediated tolerance, endogenous ROS levels, and induction of protective responses in S. aureus. Culprit protective genes, alone and in various combinations, will be inactivated in Δagr mutant and wild-type strains, tested in killing assays with and without agents that mitigate endogenous ROS, and subjected to RNAseq, proteomic, and metabolomic analyses, as part of a larger program to identify factors involved in S. aureus tolerance to lethal stress.

      To clarify the issue raised by the reviewer we altered the wording in the following sentences as follows:

      Line 335: “Elevated expression of protective genes suggests that the double mutant survives damage from H2O2 better because protective genes are rendered inducible (loss of Rot-mediated repression).”

      Line 440: “Details of agr-mediated protection are sketched in Figure 10. At low levels of ROS, agr is activated by a redox sensor in AgrA, RNAIII is expressed and represses the Rot repressor, thereby rendering protective genes (e.g., clpB/C, dps) inducible via an unknown mechanism (induction, candidate protective gene(s), and their connection to endogenous ROS levels are being pursued, independent of the current report).”

      (3) The reason behind the agr mutant's low metabolic efficiency, as evidenced by low ATP levels (Fig 5A) despite enhanced respiration and acetate production, is not clearly explained. Could insights from the modeling shed light on why the ATP levels are low in the agr mutant?

      Comparative modeling of central metabolic pathways, in combination with in vitro metabolic analyses of Δagr and wild-type strains, revealed the metabolic inefficiency but cannot explain it. The basis for the metabolic inefficiency conferred by agr inactivation is unknown. The possibility that aberrant sorting of cell wall surface proteins could lead to metabolic inefficiency was raised in the Discussion where we wrote:

      “Our work supports this idea by showing that increased respiration caused by deletion of agr is associated with increased ROS-mediated lethality. The basis for the metabolic inefficiency conferred by agr inactivation is unknown. Given that Δagr mutants are unable to downregulate surface proteins during stationary phase (2, 3), it is possible that deletion of agr perturbs the cytoplasmic membrane or the machinery that sorts proteins across the cell wall. In support of this notion, jamming SecY translocation machinery of E. coli results in downstream events shared with antibiotic lethality, including accelerated respiration and accumulation of ROS (4). In this scenario, the formation of a futile macromolecular cycle may accelerate cellular respiration to meet the metabolic demand of unresolvable problems caused by elevated surface sorting.”

      For clarification, we modified the text as follows:

      Line 461: “Our work supports this idea by showing that increased respiration caused by deletion of agr is associated with increased ROS-mediated lethality. How agr deficiency is connected to the corruption of downstream processes that result in metabolic inefficiency and increased endogenous ROS levels is unknown. Given that Δagr mutants are unable to downregulate surface proteins during stationary phase (2, 3), it is possible that deletion of agr perturbs the cytoplasmic membrane or the machinery that sorts proteins across the cell wall.”

      agr has been linked to defects in peptidoglycan autolysis (5). Cho et al. (2019) found that β-lactam treatment can induce a futile cycle of peptidoglycan synthesis and degradation that has been linked to increased production of endogenous ROS (6). Thus, besides surface protein dysregulation, an alternative, non-mutually exclusive route to a futile cycle and elevated endogenous ROS levels in agr-deficient cells may be decreased cell wall cross-linking. We prefer not to include this and other speculations, because they are not necessary or revealing and would detract from the manuscript by disrupting its narrative and brevity.

      (4) The observation that menadione can protect the agr mutant from H2O2 is perplexing. The authors propose that even though menadione generates superoxide through redox cycling, this superoxide might inhibit the TCA cycle, thereby restricting respiration, which could be advantageous for the agr mutant. To substantiate this hypothesis, it would be imperative to demonstrate that a double mutant ΔagrΔacnA exhibits long-lived protection against H2O2.

      Rowe et al. (2020) definitively showed that a burst of menadione-associated ROS inactivates the TCA cycle in S. aureus, leading to reduced respiration and ATP production (7). Both aconitase activity and ATP levels in menadione-treated cultures were complemented by the antioxidant N-acetyl cysteine. In the present work we demonstrate, using the same experimental conditions as Rowe et al., that menadione protected the Δagr mutant from peroxide killing but had little effect on the wild-type strain. Addition of N-acetyl cysteine in the presence of menadione restored H2O2 susceptibility to the Δagr mutant and had no effect on the wild-type. Collectively, these observations support the idea that menadione inactivates the TCA cycle, leading to reduced respiration and increased protection of the Δagr mutant from peroxide killing.

      As requested, we tested whether the ΔagrΔacnA double mutant exhibits protection against H2O2. The new data we now provide (Figure 8—figure supplement 2A) show that a ΔacnA mutation completely protected the Δagr mutant from peroxide killing after growth to late exponential growth phase, but it had little if any effect on the wild-type strain. To evaluate long-lived protection, we compared survival rates of ΔagrΔacnA mutant and Δagr cells following dilution of overnight cultures and regrowth prior to challenge with H2O2, which revealed partial protection of the Δagr mutant (Figure 8—figure supplement 2B).

      We explained these results with the following:

      Line 351: “Rowe et al. (2020) showed that menadione exerts its effects on endogenous ROS by inactivating the TCA cycle in S. aureus. To determine whether this mechanism can also induce protection in the Δagr mutant, we inactivated the TCA cycle gene acnA in wild-type and Δagr strains (Figure 8—figure supplement 2). We found that ΔacnA mutation completely protected the Δagr mutant from peroxide killing after growth to late exponential growth phase but had little effect on the wild-type strain. This finding supports the idea that TCA cycle activity contributes to an imbalance in endogenous ROS homeostasis in the Δagr mutant, and that this shift is a critical factor for Δagr hyperlethality. When we evaluated long-lived protection by comparing survival rates of ΔagrΔacnA mutant and Δagr cells following dilution of overnight cultures and regrowth prior to challenge with H2O2, ΔacnA remained protective, but less so (Figure 8—figure supplement 2). These partial effects of an ΔacnA deficiency suggest that Δagr stimulates long-lived lethality for peroxide through both TCA-dependent and TCA-independent pathways.”

      (5) Figure 10 presents a model suggesting that Rot-mediated repression of respiration is essential for long-lasting resistance to H2O2 lethality. However, the connection between decreased respiration and long-lived resistance to ROS is not evident, especially considering that the respiration rate varies over the growth phase and does not seem to align with the long-lived and steady protection provided by agr. However, the authors could investigate this by examining whether inactivating qox in the agr mutant restores its resistance to H2O2. The experiments with menadione are not particularly persuasive, as menadione could have additional effects on the cells that are not accounted for.

      As requested, we tested whether the ΔagrΔqoxC double mutant exhibits protection against H2O2. qox deficiency was hyperlethal in wild-type and Δagr strains, even with the lowest concentration of H2O2 used in our assay. Indeed, surviving cells were undetectable, precluding comparison of survival differences between wild-type and Δagr mutant strains. This striking finding can be explained by prior work highlighting the profound and pleiotropic effects of qox deficiency on metabolism that involve not only control of respiration but also participation in other physiological processes such as cell growth and morphological differences. For example, in Bacillus, qox deficiency decreases TCA cycle flux and increases overflow metabolism (8). Additionally, we confirmed prior work in S. aureus showing that qox deficiency decreases growth rate and yield (9, 10), dramatically increases production of pigment that functions as an oxidation shield, and decreases hemolytic activity (11). Moreover, we found that qox deficiency results in a striking increase (~150%) in endogenous ROS in both wild-type and agr mutant cells, likely explaining the hyperlethality phenotype. Thus, interpretation of killing assay results must account for the complex and likely reciprocal interactions among Δqox-mediated metabolic changes, agrA-mediated redox sensing, and Δagr-mediated changes in metabolism. Since killing data are not necessary or revealing without this information, we prefer to address the role of qox in future studies directed at understanding the complexities of the interaction among agr-mediated tolerance, endogenous ROS levels, and induction of protective responses in S. aureus.

      (6) The repeated use of the term 'agr wild type' throughout the text is somewhat distracting. It might be clearer to simply use 'wild type,' as it is implied that this refers to the agr+ genotype.

      We modified the text by replacing “agr wild-type” with “wild-type” as suggested by the Reviewer.

      (7) In the text, the authors imply that the extended lag phase of the agr mutant is observed solely in nutrient-limited CDM. However, Figure 1 and Figure Supplement 3A reveal that the strains were actually cultivated in CDM supplemented with glucose and Casamino acids, which makes the medium rich in both carbon and nitrogen, in addition to other nutrients present in CDM. The authors should clarify the composition of the media used and assess whether the term 'nutrient-limited CDM' is accurate in this context.

      The extended lag phase of the Δagr mutant is observable in TSB, but it is more easily appreciated in CDM, perhaps owing to a larger range of carbohydrates and other nutrient types (TSB is a rich, complex medium of undefined composition) and a higher concentration of glucose (2.5 mM versus 2.2 mM).

      For clarification, we modified line 135 as follows:

      Line 184: “Lag-time differences between strains were more obvious in experiments using less complex, chemically defined medium (CDM)…”

      (8) Figure 1 - Figure Supplement 3C represents the growth rate in terms of [OD/min]. However, it would be more accurate to calculate the growth rate (μ) based on the change in the natural logarithm of optical density (OD) relative to the corresponding change in time, using appropriate units (preferably, h⁻¹). Additionally, the method employed for measuring growth rates should be detailed in the Materials and Methods section.

      Our responses to Reviewer 2 Minor Comment 1 below address this issue.
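      For reference, the specific growth rate described in this comment can be written as below. This is a generic sketch rather than the calculation used in the manuscript: OD1 and OD2 are placeholder optical-density readings taken during exponential growth at times t1 and t2 (in hours), and the doubling time t_d is included only as a common companion quantity.

```latex
\mu = \frac{\ln \mathrm{OD}_2 - \ln \mathrm{OD}_1}{t_2 - t_1} \quad [\mathrm{h}^{-1}],
\qquad
t_d = \frac{\ln 2}{\mu}
```

      For example, a culture whose OD doubles in 1 h has μ = ln 2 ≈ 0.69 h⁻¹ regardless of its starting density, whereas the same growth expressed as OD/min depends on the absolute OD at which it is measured.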

      (9) The resolution of the inset charts in Figure 4B is poor, and the Y-axis lacks labels. The figure legend should also specify whether the flux distribution (represented by thick black arrows in Fig 4B) is predicted for the wild type or the mutant.

      We modified Figure 4B and the legend accordingly.

      (10) On Page 9, the term "RT-PCR" should be corrected to "RT-qPCR."

      We thank the Reviewer for their attention to detail in picking up our error. We modified text accordingly.

      (11) It is ambiguous whether the agr mutant is producing more acetate, based on the information provided in Figure 5B. Since the cells might have entered the post-exponential phase at 5 hours, they could start consuming acetate. Consequently, the elevated acetate concentration in the agr mutant might result from a delay in acetate consumption rather than increased production. To discern between the production and consumption of acetate, it is essential to measure acetate concentrations at earlier time points as well as the corresponding glucose concentrations in the media. This will help ascertain when the agr mutant enters the post-exponential phase. A similar concern also exists in the case of lactate (Fig 5E) since it is not clear when lactate was measured.

      As requested, we measured acetate levels at earlier time points (1, 2, 3, and 4 h of growth). New Figure 5B shows that the Δagr mutant accumulated more acetate than the wild-type strain during exponential growth at 3 h, well before entry into post-exponential phase (see growth curves in Figure 1—figure supplement 1).

      In the original report, lactate levels were measured at 4 h for organisms grown under suboptimal aeration conditions (see Reviewer 1, Comment 1). When we measured lactate accumulation at 3 h, it remained higher in the Δagr mutant compared to the wild-type. Likewise, acetate levels at 3 h under suboptimal aeration conditions remained elevated in the Δagr mutant compared to the wild-type. These results support the idea that inactivation of agr promotes increased production, rather than decreased consumption, of acetate and lactate in the culture medium.

      (12) In Figure 5G-H, presenting the actual NAD+ and NADH values side-by-side would facilitate a more straightforward interpretation of the data by the readers.

      (13) On Page 9, the text states that respiration and fermentation lower the NAD+/NADH ratio. However, this seems contradictory as these processes would typically increase the NAD+/NADH ratio. Furthermore, it would be beneficial for the authors to provide supporting evidence for the statement made at the beginning of Page 10, which claims that there is greater consumption of NADH in the agr mutant.

      Responses to Comments 12 and 13 were grouped together.

      We thank the Reviewer for their attention to detail in picking up our error about the NAD+/NADH ratio. The ratio is expected to be elevated by increases in respiration and fermentation, not lowered, owing to increased consumption of NADH.

      Figure 5I in the submitted manuscript indicated a small, nonsignificant decrease in the NAD+/NADH ratio of the Δagr mutant. Thus, the NAD+/NADH ratio remained tightly bounded but, if anything, was decreased rather than increased.

      We explained this finding as follows:

      Line 284: “Collectively, these observations suggest that a surge in NADH production and reductive stress in the Δagr strain induces a burst in respiration and fermentation.”

      The NAD+/NADH ratio in Figure 5F of the submitted manuscript was calculated from NADH and total (NAD+ + NADH) levels. As requested, we measured individual NAD+ and NADH concentrations. We found that the decrease in the NAD+/NADH ratio of the Δagr mutant was now large, significant, and largely due to a relative increase in NADH.
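      For clarity, when the ratio is derived indirectly from NADH and total-pool measurements, it follows from the relationship below (the bracket notation is introduced here for illustration only):

```latex
\frac{[\mathrm{NAD}^+]}{[\mathrm{NADH}]}
  = \frac{[\mathrm{NAD}]_{\mathrm{total}} - [\mathrm{NADH}]}{[\mathrm{NADH}]},
\qquad
[\mathrm{NAD}]_{\mathrm{total}} = [\mathrm{NAD}^+] + [\mathrm{NADH}]
```

      Because NAD+ is obtained by subtraction in this indirect approach, measurement error in either quantity carries into the ratio; measuring NAD+ and NADH individually avoids that step.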

      We have included these new data in a revised Figure 5 in the revised version of the manuscript and clarify the relationship among the NAD+/NADH ratio, respiration, and fermentation in the Δagr mutant by modifying the wording of the text as follows:

      Line 280: “Since respiration and fermentation generally increase NAD+/NADH ratios and since these activities are increased in Δagr strains (Figure 5C and 5E-F), we expected a higher NAD+/NADH ratio relative to wild-type cells. However, we observed a decrease in the NAD+/NADH ratio due to a large surge in NADH accompanied by a modest drop in NAD+ compared to wild-type. Collectively, these observations suggest that a surge in NADH production and reductive stress in the Δagr strain induces a burst in respiration, but levels of NADH are saturating, thereby driving fermentation in the presence of oxygen.”

      Reviewer #2 (Recommendations For The Authors):

      (1) The RNA-seq analysis revealed that the Δagr strain exhibited increased expression of genes involved in respiration and fermentation, suggesting enhanced energy generation. However, metabolic modeling based on transcriptomic data indicated a decrease in tricarboxylic acid (TCA) cycle and lactate flux per unit of glucose uptake in the Δagr mutant. Additionally, intracellular ATP levels were significantly lower in the Δagr mutant compared to the wild-type strain, despite the carbon being directed into an acetate-generating, ATP-yielding carbon "overflow" pathway. Furthermore, growth analysis in nutrient-constrained medium demonstrated a decrease in the growth rate and yield of the Δagr mutant. Given that S. aureus actively utilizes the electron transport chain (ETC) to replenish NAD pools during aerobic growth on glucose, supporting glycolytic flux and pyruvate dehydrogenase complex (PDHC) activity while restricting TCA cycle activity through carbon catabolite repression (CCR), it is suggested that the authors analyze glucose consumption rates in conjunction with the determination of intracellular levels of pyruvate, AcCoA, and TCA cycle intermediates such as citrate and fumarate. These additional experiments will provide valuable insights into the metabolic fate of glucose and pyruvate and their subsequent impact on cellular respiration and fermentation in the Δagr mutant.

      (2) The authors highlighted the importance of redox balance in Δagr cells by emphasizing the tendency of these cells to prioritize NAD+-generating lactate production over generating additional ATP from acetate. However, the results regarding acetate and lactate production in Δagr cells during aerobic growth suggest that carbon is directed towards acetate generation rather than lactate.

      Responses to Comments 1 and 2 have been combined.

      As requested, we measured glucose consumption and intracellular levels of several different metabolites in the wild-type and Δagr mutant strain. The results are consistent with the idea that increased acetogenesis and fermentation in Δagr mutant cells contribute to increased ATP production and NAD+ recycling, respectively. These two processes appear to be relatively favored over the flux of pyruvate carbon into the TCA cycle of the Δagr mutant.

      We explained our finding as follows:

      Line 288: “To help determine the metabolic fate of glucose, we measured glucose consumption and intracellular levels of pyruvate, acetyl-CoA, and the TCA-cycle metabolites fumarate and citrate in the wild-type and Δagr mutant strains. At 4 h of growth (late-exponential phase), intracellular pyruvate and acetyl-CoA levels were increased in the Δagr mutant compared to the wild-type strain, but levels of fumarate and citrate were similar (Figure 5—figure supplement 1D-E). Glucose was depleted after 4 h of growth, but glucose consumption after 3 h of growth (exponential phase) was increased in the Δagr mutant compared to the wild-type strain (Figure 5—figure supplement 1A). These observations, together with the decrease in the NAD+/NADH ratio and the increase in acetate and lactate production described above, are consistent with a model in which respiration in Δagr mutants is inadequate for 1) energy production, resulting in an increase in acetogenesis, and 2) maintenance of redox balance, resulting in an increase in fermentative metabolism, lactate production, and conversion of NADH to NAD+. Increased levels of acetate relative to lactate under optimal aeration conditions suggest that demand for ATP exceeds demand for NAD+.”

      Future work will compare additional extracellular and intracellular (e.g., formate, ethanol, acetoin) metabolites to test these and other models using a combination of approaches (e.g., mass spectrometry, nuclear magnetic resonance, genetic deletion studies, transcriptomics) and will determine the mechanisms underlying metabolic differences in wild-type and Δagr mutant strains.

      To maintain narrative flow, we added a new subheading after the explanation of our findings:

      Line 311: “Transcriptional changes due to Δagr mutation are long-lived and result in down-regulation of H2O2-stimulated genes relative to those in an agr wild-type.”

      (3) The authors mentioned that respiration and fermentation typically reduce the NAD+/NADH ratios, and since these activities are elevated in Δagr strains (Figure 5F-G), they initially anticipated a lower NAD+/NADH ratio compared to wild-type agr cells. However, the increase in respiration and activation of fermentative pathways leads to a decrease in NADH levels, therefore resulting in an increase in the NAD+/NADH ratio.

      We have clarified the issue with new experiments and by modifying the wording as shown in the response to Reviewer 1 Comment 13.

      (4) To improve the clarity and completeness of this work, it would be advantageous for the authors to provide specific details regarding the glucose concentration in the TSB media and the aeration conditions during growth, including the flask-to-medium ratio. These additional experimental parameters are essential for ensuring the reproducibility and comprehensiveness of the study, allowing for a more precise understanding and interpretation of the observed metabolic changes in the Δagr strain.

      We modified the Methods as suggested.

      Minor comments:

      (1) The growth rate in Figure 1-figure supplement 3 should not be presented as a simple calculation of OD/min and needs to be recalculated.

      We recalculated the growth rate and modified Figure 1 as suggested. The exponential phase was used to determine the growth rate (µ) from two data points, OD1 and OD2, flanking the linear portion of the curve, using the equation µ = (ln OD2 − ln OD1)/(t2 − t1), as described (12).
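      The calculation described above can be sketched as follows. This is a minimal illustration, assuming OD readings taken at times t1 and t2 that both fall within exponential phase; the function names are hypothetical, and the doubling-time helper is an added convenience rather than part of the described method.

```python
import math


def specific_growth_rate(od1: float, od2: float, t1: float, t2: float) -> float:
    """Specific growth rate mu = (ln OD2 - ln OD1) / (t2 - t1).

    Both readings should lie in exponential phase; mu has units of 1/time
    (e.g. per hour when t is in hours).
    """
    if od1 <= 0 or od2 <= 0:
        raise ValueError("OD values must be positive")
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    return (math.log(od2) - math.log(od1)) / (t2 - t1)


def doubling_time(mu: float) -> float:
    """Doubling time implied by a specific growth rate: ln 2 / mu."""
    return math.log(2) / mu


mu = specific_growth_rate(0.1, 0.2, 0.0, 1.0)  # hypothetical readings, t in hours
print(round(mu, 3), round(doubling_time(mu), 3))
```

      For example, an OD that rises from 0.1 to 0.2 over 1 h gives µ = ln 2 ≈ 0.69 h⁻¹, corresponding to a doubling time of 1 h. Unlike a simple ΔOD/Δt slope, µ does not depend on the absolute OD at which the interval starts.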

      (2) Δrot (BS1301) should be removed from Figure 2 (A) legend as it is not presented in the panel A.

      We modified Figure 2 as suggested.

      (3) The authors should specify in the Figure 3 (D) legend that the kinetics of killing by H2O2 was performed for ΔrnaIII and ΔagrBD mixtures.

      We modified Figure 3 as suggested.

      (4) In the Figure 4 legend for (C), the statement "See Supplementary file 2 for supporting information" should be changed to "See Supplementary file 3 for supporting information."

      We corrected the Supplementary file number as suggested.

      References cited in responses

      (1) Brynildsen MP, Winkler JA, Spina CS, MacDonald IC, Collins JJ. 2013. Potentiating antibacterial activity by predictably enhancing endogenous microbial ROS production. Nat Biotechnol 31:160-165.

      (2) Morfeldt E, Taylor D, von Gabain A, Arvidson S. 1995. Activation of alpha-toxin translation in Staphylococcus aureus by the trans-encoded antisense RNA, RNAIII. EMBO J 14:4569-4577.

      (3) Novick RP, Ross HF, Projan SJ, Kornblum J, Kreiswirth B, Moghazeh S. 1993. Synthesis of staphylococcal virulence factors is controlled by a regulatory RNA molecule. EMBO J 12:3967-3975.

      (4) Takahashi N, Gruber CC, Yang JH, Liu X, Braff D, Yashaswini CN, Bhubhanil S, Furuta Y, Andreescu S, Collins JJ, Walker GC. 2017. Lethality of MalE-LacZ hybrid protein shares mechanistic attributes with oxidative component of antibiotic lethality. Proc Natl Acad Sci U S A 114:9164-9169.

      (5) Fujimoto DF, Bayles KW. 1998. Opposing roles of the Staphylococcus aureus virulence regulators, Agr and Sar, in Triton X-100- and penicillin-induced autolysis. J Bacteriol 180:3724-3726.

      (6) Cho H, Uehara T, Bernhardt TG. 2014. Beta-lactam antibiotics induce a lethal malfunctioning of the bacterial cell wall synthesis machinery. Cell 159:1300-1311.

      (7) Rowe SE, Wagner NJ, Li L, Beam JE, Wilkinson AD, Radlinski LC, Zhang Q, Miao EA, Conlon BP. 2020. Reactive oxygen species induce antibiotic tolerance during systemic Staphylococcus aureus infection. Nat Microbiol 5:282-290.

      (8) Zamboni N, Sauer U. 2003. Knockout of the high-coupling cytochrome aa3 oxidase reduces TCA cycle fluxes in Bacillus subtilis. FEMS Microbiol Lett 226:121-126.

      (9) Halsey CR, Lei S, Wax JK, Lehman MK, Nuxoll AS, Steinke L, Sadykov M, Powers R, Fey PD. 2017. Amino acid catabolism in Staphylococcus aureus and the function of carbon catabolite repression. mBio 8.

      (10) Hammer ND, Reniere ML, Cassat JE, Zhang Y, Hirsch AO, Indriati Hood M, Skaar EP. 2013. Two heme-dependent terminal oxidases power Staphylococcus aureus organ-specific colonization of the vertebrate host. mBio 4.

      (11) Lan L, Cheng A, Dunman PM, Missiakas D, He C. 2010. Golden pigment production and virulence gene expression are affected by metabolisms in Staphylococcus aureus. J Bacteriol 192:3068-3077.

      (12) Grosser MR, Weiss A, Shaw LN, Richardson AR. 2016. Regulatory requirements for Staphylococcus aureus nitric oxide resistance. J Bacteriol 198:2043-2055.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study, the discovery and subsequent design of the AF03-NL chimeric antibody yielded a tool for studying filoviruses and provides a possible blueprint for future therapeutics. However, the data are incomplete and not presented clearly, which obscures flaws in the analyses and leaves unexplained phenomena. The work will be of interest to virologists studying antibodies.

      Author response: Thank you for your valuable comments. The manuscript has been revised substantially, and new data have been added to further support the conclusions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary and Strengths:

      Zhang et al. conducted a study in which they isolated and characterized a Marburg virus (MARV) glycoprotein-specific antibody, AF-03. The antibody was obtained from a phage-display library. The study shows that AF-03 competes with the previously characterized MARV-neutralizing antibody MR78, which binds to the virus's receptor binding site. The authors also performed GP mutagenesis experiments to confirm that AF-03 binds near the receptor binding site. In addition, the study confirmed that AF-03, like MR78, can neutralize Ebola viruses with cleaved glycoproteins. Finally, the authors demonstrated that NPC2-fused AF-03 was effective in neutralizing several filovirus species.

      Weaknesses:

      (1) The main premise of this study is unclear. Flyak et al. in 2015 described the isolation and characterization of a large panel of neutralizing antibodies from a Marburg survivor (Flyak et al., Cell, 2015). Based on biochemical and structural characterization, Flyak proposed that the Marburg neutralizing antibodies bind to the NPC1 receptor binding site. In the same study, it has been shown that several MARV-neutralizing antibodies can bind to cleaved Ebola glycoproteins that were enzymatically treated to remove the mucin-like domain and glycan cap. In the following study, it has been shown that the bispecific-antibody strategy can be used to deliver Marburg-specific antibodies into the endosome, where they can neutralize Ebola viruses (Wec et al., Science 2016). Finally, the use of lysosome-resident protein NPC2 to deliver antibody cargos to late endosomes has been previously described (Wirchnianski et al., Front. Immunol, 2021). The above-mentioned studies are not referenced in the introduction. The authors state that "there is no licensed treatment or vaccine for Marburg [virus] infection." While this is true, there are human antibodies that recognize neutralizing epitopes - that information can't be excluded while providing the rationale for the study. Furthermore, the authors use the word "novel" to describe the AF-03 antibody. How novel is AF-03 if multiple Marburg-neutralizing antibodies were previously characterized in multiple studies? Since AF-03 competes with previously characterized MR78, it binds to the same antigenic region as MR78. AF-03 also has comparable neutralization potency as MR78.

      Author response: Thank you for your valuable advice. Regarding the novelty of AF-03, the inhibition assay indicates that Q128, N129, and C226 are key residues for AF-03 neutralization, because the neutralizing capacity of AF-03 against pseudotyped viruses harboring these mutations is impaired (see revised Fig. 2A, left panel). Furthermore, ELISA assays show that the Q128S-N129S or C226Y mutation significantly disrupts the binding of GP to AF-03, whereas the neutralizing and binding capacity of MR78 toward GP and pseudovirus harboring C226Y, unlike Q128S-N129S, is almost unaffected (see revised Fig. 2A, right panel, and 2B). Considering that AF-03 and MR78 compete with each other for binding to MARV GP (Fig. 2D), we conclude that the epitopes of these two mAbs partially overlap. Therefore, AF-03 is not a clone of MR78 but a novel MARV-neutralizing mAb.

      The work of Wirchnianski and colleagues is in fact already cited in the manuscript (Ref. 38). Although our strategy for designing a broad-spectrum neutralizing antibody builds on their work, we further expanded the range of viruses evaluated to include RAVN and mutated EBOV strains. The results show that NPC2-fused AF-03 exhibits neutralizing activity against 10 filovirus species and 17 EBOV mutants (Fig. 6A and B). The work of Flyak et al. (2015), which described the isolation and characterization of a large panel of neutralizing antibodies from a Marburg survivor, is now cited in the Introduction.

      (2) Without the AF-03-MARV GP crystal structure, it's unclear how van der Waals interactions, H-bonds, and polar and electrostatic interactions can be evaluated. While authors use computer-guided homology modeling, this technique can't be used to determine critical interactions. Furthermore, Flyak et al. reported that binding to the NPC1 receptor binding site is the main mechanism of Marburg virus neutralization by human monoclonal antibodies. Since both AF-03 (this study) and MR78 (Flyak study) competed with each other, that information alone was sufficient for GP mutagenesis experiments that identified the NPC1 receptor binding site as the main region for mutagenesis.

      Author response: Computer-guided homology modeling has been used successfully in our laboratory to identify key residues mediating the interaction between antigens and mAbs (Immunol Res. 2015, 62:377; Scand J Immunol. 2019, 90:e12777; Sci Rep. 2022, 12:8469; Front Immunol. 2022, 13:831536). Using the previously reported crystal structures of MARV GP and of the MR78-GP complex (Cell 2015, 160:904), we modeled the complex of MARV GP and AF-03. Although AF-03 and MR78 compete with each other, we show that the epitopes of these two mAbs only partially overlap (Fig. 2A-D).

      (3) The AF-03-GP affinity measurements were performed using bivalent IgG molecules and trimeric GP molecules. This format does not allow accurate measurements of affinity due to the avidity effect. The reported KD value is abnormally low due to avidity effects. The authors need to repeat the affinity experiments by immobilizing trimeric GPs and then adding monovalent AF-03 Fab.

      Author response: As shown in Fig. 1A, the GP protein used in this work is not a trimer but largely a monomer composed of MLD-deleted GP1 and GP2, which may to some extent weaken the engagement between GP and AF-03. Notably, we repeated the SPR assays for the binding of AF-03 to GP and obtained a KD value of 4.71 × 10⁻¹¹ M (see revised Fig. 1C). This GP protein is therefore suitable for evaluating mAb affinity. In addition, it is reasonable to use bivalent IgG to measure the affinity of the mAb for monomeric GP, since the affinity is likely to decrease significantly when monovalent Fab is used.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe the discovery of a filovirus neutralizing antibody, AF03, by phage display, and its subsequent improvements to include NPC2 that resulted in a greater breadth of neutralization. Overall, the manuscript would benefit from considerable grammatical review, which would improve the communication of each point to the reader. The authors do not convincingly map the AF03 epitope, nor do they provide any strong support for their assumption that AF03 targets the NPC1 binding site. However, the authors do show that AF03 competes for MR78 binding to its epitope, and provides good support for the internalization of AF03-NL as the mechanism for improved breadth over the original AF03 antibody.

      Strengths:

      This study shows convincing binding to Marburgvirus GP and neutralization of Marburg viruses by AF03, as well as convincing neutralization of Ebolaviruses by AF03-NL. While there are no distinct populations of PE-stained cells shown by FACS in Figure 5A, the cell staining data in Figure 5C are compelling to a non-expert in endosomal staining like me. The control experiments in Figure 7 are compelling showing neutralization by AF03-NL but not AF03 or NPC2 alone or in combination. Altogether these data support the internalisation and stabilisation mechanism that is proposed for the gain in neutralization breadth observed for Ebolaviruses by AF03-NL over AF03 alone.

      Weaknesses:

      Overall, this reviewer is of the opinion that this paper is constructed haphazardly. For instance, the neutralization of mutant pseudoviruses is shown in Figure 2 before the concept of pseudovirus neutralization by AF03 is introduced in Figure 3. Similarly, the control experiments for AF03+NPC2 are described in Figure 7 after the data for breadth of neutralization are shown in Figure 6. GP quality controls are shown in Figure 2 after GP ELISAs / BLI experiments are done in Figure 1. This is disorienting for the reader.

      Author response: AF-03 production and its binding capacity to GP are determined in Fig. 1. The epitopes of AF-03 are identified in Fig. 2. The neutralizing activity of AF-03 against pseudotyped MARV in vitro and in vivo is shown in Fig. 3. The neutralizing activity of AF-03 against pseudotyped ebolaviruses harboring cleaved GP is shown in Fig. 4. The endosome-delivering ability of AF03-NL is examined in Fig. 5. The neutralization of filovirus species and EBOV mutants by AF03-NL is shown in Fig. 6. The requirement of CI-MPR for the neutralizing activity of AF03-NL is determined in Fig. 7. We think that this arrangement is suitable.

      Figure 1: The visualisation of AF03 modelling and docking endeavours is extremely difficult to interpret. Firstly, there is no effort to orient the non-specialist reader with respect to the Marburgvirus GP model. Secondly, from the figures presented it is impossible to tell if the Fv docks perfectly onto the GP surface, or if there are violent clashes between the deeply penetrating AF03 CDRs and GP. This information would be better presented on a white background, perhaps showing GP in surface view from multiple angles and slices. The authors attempt to label potential interactions, but these are impossible to read, and labels should be added separately to appropriately oriented zoomed-in views.

      Author response: To make the rationale of computer-guided modeling easier to understand, we have refined the descriptions in the Methods and Results sections. In addition, the theoretical structure is now presented on a white background (see revised Fig. 1D-F).

      Figure 2: The neutralization of mutant pseudoviruses cannot be properly assessed using bar graphs. These data should be plotted as neutralization curves as they were done for the wild-type neutralization data in Figure 3. The authors conclude that Q128 & N129 are contact residues, but the neutralization data for this mutant appear odd as the lowest two concentrations of AF03 show higher neutralization than the second highest AF03 concentration. Neutralization of T204/Q205/T206 (green), Y218 (orange), K222 (blue), or C226 (purple) appears to be better than neutralization of the wild-type MARV. The authors do not discuss this oddity. What are the IC50's? The omission of antibody concentrations on the x-axis and missing IC50 values give a sense of obscuring the data, and the manuscript would benefit from greater transparency, and be much easier to interpret if these were included. I am intrigued that the Q128S/N129S mutant is reported as having little effect on the neutralization of MR78. The bar graph appears to show some effect (difficult to interpret without neutralization curves and IC50 data), and indeed PDB:5UQY seems to suggest that these amino acids form a central component of the MR78 epitope (Q128 forms potential hydrogen bonds with CDRH1 Y35 and CDRL3 Y91, while N129 packs against the MR78 CDRH3 and potentially makes additional polar contact with the backbone). Lastly, since neutralization was tested in both HEK293T cells and Huh7 cells in Figure 3, the authors should clarify which cells were used for neutralization in Figure 2.

      Author response: Thank you for your advice. In the revised manuscript, the neutralization curves of AF-03 and MR78 are presented (see revised Fig. 2A). Neutralization by AF-03 of pseudotyped MARV harboring Q128S/N129S or C226Y is significantly impaired compared with WT MARV and with pseudoviruses bearing the other indicated mutations, whereas the Q128S/N129S mutation, but not C226Y, affects the neutralizing capacity of MR78 to some extent. This is consistent with the binding of AF-03 or MR78 to MARV GP assayed by ELISA (see revised Fig. 2B). Overall, these results show that Q128, N129, and C226 are key residues for AF-03 neutralization.

      Figure 3: The first two images in Figure 3C showing bioluminescent intensity from pseudovirus-injected mice pretreated with either 10mg/kg or 3mg/kg AF03 are identical images. This is apparent from the location, shape, and intensity of the bioluminescence, as well as the identical foot placement of each mouse in these two panels. Currently, this figure is incomplete and should be corrected to show the different mice treated with either 10mg/kg or 3mg/kg of AF03.

      Author response: Thank you for spotting this. It was our mistake, and it has been corrected in the revised manuscript, which now shows the correct images (see revised Fig. 3C).

      Figure 4 would benefit from a control experiment without antibodies comparing infection with GP-cleaved and GP-uncleaved pseudoviruses. The paragraph describing these data was also difficult to read and would benefit from additional grammatical review.

      Author response: As suggested, we performed a control experiment comparing infection by GP-cleaved and GP-uncleaved pseudoviruses. The results show that infection of host cells by pseudotyped ebolaviruses harboring cleaved GP is comparable to or stronger than that by pseudoviruses containing intact GP (see revised Fig. S1). Therefore, the data in Fig. 4 reflect inhibition by AF-03 of cell entry of ebolavirus species harboring cleaved GP, rather than any impairment of the cell entry capacity of GPcl-containing ebolaviruses. In addition, the paragraph has been revised for readability.

      Figure 5: The authors should clarify in the methods section that the "mock" experiment included the PE anti-human IgG Fc antibody. Without this clarification, the lack of a distinct negative population in the FACS data could be interpreted as non-specific staining with PE. If the PE antibody was added at an equivalent concentration to all panels, what does the directionality of the arrowheads in Figure 5A (labelled PE) and 5B (labelled pHrodo Red) indicate?

      Author response: Thank you for your advice. In the revised version, the figure legend states that the mock control is a human IgG isotype. The arrowheads indicate the fluorescence intensity of PE or pHrodo along the horizontal axis of the plots, and the percentage of PE- or pHrodo-positive cells is shown.

      Figure 6B: These data would benefit from the inclusion of IC50, transparency of antibody concentrations used, and consistency in the direction of antibody concentrations (increasing to the right or left of the x-axis) when compared to Figure 2.

      Author response: The antibody concentrations used in the titrations are given in the figure legends. The direction of antibody concentrations is now consistent throughout the paper. Although IC50 values are not included, these data clearly show that AF03-NL, but not AF-03, markedly inhibits the cell entry of EBOV mutants.

      Reviewer #1 (Recommendations For The Authors):

      Line 143: anti-human should be anti-human.

      Line 223: From the SDS-PAGE results, it's not clear that the AF-03 was expressed in the eukaryotic cell line. Please, rephrase the sentence.

      Line 263: ELISA experiments can't be used to determine affinity.

      Line 394: Flyak et al. generated human antibodies from PBMC samples of Marburg survivors, not plasma samples.

      Author response: Following the reviewer's advice, we have modified or corrected these sentences to describe the results more accurately. We have also carefully corrected grammatical errors throughout the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major Concerns:

      (1) An important point that the authors should clarify in this study is whether mice are detecting qualitative or quantitative differences between fresh and old cat saliva. Do the environmental conditions in which the old saliva was maintained cause degradation of Fel d 4, the main protein known for inducing a defensive response in rodents? (see Papes et al, 2010 again). If that is the case, one would expect that a lower concentration of Fel d 4 in the old saliva after protein degradation would result in reduced antipredator responses. Alternatively, if the authors believe that different proteins that are absent in the old saliva are contributing to the increased defensive responses observed with the fresh saliva, further protein quantification experiments should be performed. An important experiment to differentiate qualitative versus quantitative differences between the two types of saliva would be diluting the fresh saliva to verify if the amount of protein, rather than the type of protein, is the main factor regulating the behavioral differences.

      We thank the reviewer for their important suggestions. We agree that both the quality and quantity of molecular components in saliva change after the saliva is kept at room temperature for 4 hours. Our findings indicate that mice detect these changes through the VNO and adjust their defensive response patterns accordingly. For instance, freezing behavior is reduced in response to 4-hour-old saliva compared to fresh saliva. On the other hand, the duration of interaction with saliva (investigation behavior) remains low, and the level of the stress hormone ACTH is upregulated in both cases. A future study should identify the specific molecules, most likely proteins or peptides, in cat saliva responsible for these distinct defensive responses in mice. Fel d 4 is one potential candidate, as it has been shown to induce a form of defensive behavior in mice (Papes et al., 2010), but a different molecule or a combination of molecules may also play a role. Once the molecules are identified, it will be important to investigate how their quantity and quality change over time and how these changes correlate with freezing behavior in mice. Such work should answer the ethologically significant question raised by the reviewer. We added a paragraph in the Discussion under the “The VNO as the sensor of predator cues that induce fear-related behavior” section to clarify this.

      (2) The authors claim that fresh saliva is recognized as an immediate danger by rodents, whereas old saliva is recognized as a trace of danger. However, the study lacks empirical tests to support this interpretation. With the current experimental tests, the behavioral differences between animals exposed to fresh vs. old saliva could be uniquely due to the reduced amount of the exact same protein (e.g., Fel d 4) in the two samples of saliva.

      As mentioned in our response to comment 1, we agree that both the quality and quantity of molecules within saliva change after 4 hours. What we would like to emphasize in the current study is that mice detect these time-dependent changes through the VNO and subsequently adjust their defensive response patterns. Identifying the specific molecules responsible for inducing behavioral changes and investigating their time-dependent alterations is a crucial next step. We added a paragraph in the Discussion under the “The VNO as the sensor of predator cues that induce fear-related behavior” section to clarify this.

      (3) In Figure 4H, the authors state that there were no significant differences in the number of cFos-positive cells between the two saliva-exposed groups. However, this result disagrees with the next result section showing that fresh and old saliva differentially activate the VMH. It is unclear why cFos quantification and behavioral correlations were not performed in other upstream areas that connect the VNO to the VMH (e.g., BNST, MeA, and PMCo). That would provide a better understanding of how brain activity correlates with the different types of behaviors reported with the fresh vs. old saliva.

      We greatly appreciate this valuable advice. We added c-Fos immunoreactivity (IR) data in the BNST, MeApv, and PAG, together with the data for the VMH, as shown in new Figure 4G-J. Upon exposure to both fresh and old saliva, we observed a trend toward c-Fos upregulation in the MeApv, VMH, and dPAG, but not in the BNST, compared to the control stimulus.

      Moreover, we conducted correlation analyses between the numbers of c-Fos-positive neurons and the duration of freezing behavior in these neural substrates, which have been added as new Figure 5. The numbers of c-Fos-IR neurons in the BNST and dPAG did not correlate with the duration of freezing behavior in any of the exposure groups (Figure 5C, F). However, in addition to a significant positive correlation in the VMH for the fresh saliva-exposed group (R² = 0.5708, 95% CI [-0.1449, 0.9714], p = 0.0412) (Figure 5E), we observed a similar positive correlation trend in the MeApv (R² = 0.3854, 95% CI [0.3845, 0.9525], p = 0.0942), although it was not statistically significant, possibly because of the small sample size (Figure 5D).

      Based on these results, our current circuit model is as follows: different numbers of VNO sensory neurons activated by fresh and old saliva result in different excitation levels in mitral cells in the AOB. This, in turn, leads to differential activation of downstream neural substrates, possibly the MeApv, resulting in differential activation of VMH neurons. This model is depicted in Figure 7 and discussed in the Discussion under the section “Differential processing of fresh and old saliva signals in the VNO-to-VMH pathway”.

      (4) The interpretation that fresh and old saliva activates different subpopulations of neurons in the VMH based on the observation that cFos positively correlates with freezing responses only with the fresh saliva lacks empirical evidence. To address this question, the authors should use two neuronal activity markers to track the response of the same population of VHM cells within the same animals during exposure to fresh vs. old saliva. Alternatively, they could use single-cell electrophysiology or imaging tools to demonstrate that cat saliva of distinct freshness activates different subpopulations of cells in the VMH. Any interpretation without a direct within-subject comparison or the use of cell-type markers would become merely speculative. Furthermore, the authors assume that differential activations of mitral cells between fresh and old saliva result in the differential activation of VMH subpopulations (page 13, line 3). However, there are intermediate structures between the mitral cells and the VMH, which are completely ignored in this study (e.g., BNST, medial amygdala).

      We appreciate this important feedback. We agree that performing a same-animal comparison for fresh and old saliva exposure would offer direct evidence of the differential activation of a subpopulation of VMH neurons. However, there are technical difficulties. We have stimulated the same animal with the same or different types of swabs (e.g., fresh-control, fresh-fresh, fresh-old, or old-fresh) and observed that once mice were exposed to a saliva-containing swab and exhibited freezing behavior, they no longer made contact with the second swab within the timeframe in which two different types of neuronal activity markers can be analyzed. As shown in Figure 2A, direct contact with the saliva swab is necessary for triggering saliva-elicited freezing behavior. Therefore, we concur that further investigations into real-time neural activation responses to both fresh and old saliva within the same subjects, using an appropriate stimulus delivery method into the VNO as demonstrated in previous studies (Bansal et al., 2021; Ben-Shaul et al., 2010; Bergan et al., 2014), would be useful to strengthen our argument.

      For the second part of the comment regarding the intermediate structures between the mitral cells and the VMH, please refer to our comment above in response to comment 3.

      (5) The authors incorrectly cited the Papes et al., 2010 article on several occasions across the manuscript. In the introduction, the authors cited the Papes et al 2010 study to make reference to the response of rodents to chemical cues, but the Papes et al. study did not use any of the chemical cues listed by the authors (e.g., fox feces, snake skin, cat fur, and cat collars). Instead, the Papes et al. 2010 article used the same chemical cue as the present study: cat saliva. The Papes et al. 2010 article was miscited again in the results section where the authors cited the study to make reference to other sources of cat odor that differ from the cat saliva such as cat fur and cat collars. Because the Papes et al. 2010 article has previously shown the involvement of Trpc2 receptors in the VNO for the detection of cat saliva and the subsequent expression of defensive behaviors by using Trpc2-KO mice, the authors should properly cite this study in the introduction and across the manuscript when making reference to their findings.

      The study by Papes et al. (2010) explored mouse defensive responses triggered by native odors derived from three natural mouse predator species: cat, snake, and rat. These odors were derived from neck fur swabs, shed skin, and urine, respectively. Notably, all three types of samples induced defensive risk assessment and avoidance behaviors in mice. These responses were significantly diminished in Trpc2 knockout (KO) mice, which lack the Trpc2 transduction channel in their vomeronasal sensory neurons and are therefore impaired in transmitting sensory signals to the brain. Moreover, Papes et al. (2010) mentioned that 'we did find cat saliva, a potential source of fur chemosignals, sufficient to induce c-Fos expression in the AOB and initiate defensive behavior.' While Papes et al. reported c-Fos expression in the AOB as well as behavioral responses induced by cat saliva in C57BL/6 mice, they did not provide information regarding c-Fos expression or defensive behavioral responses to cat saliva in Trpc2 KO mice. Overall, we highly value these findings and explicitly state in the Results section of our study that ‘Cat saliva has been considered as a source of predator cues found on cat fur and collars, which induce defensive behaviors in rodents (Engelke et al., 2021; Papes et al., 2010),’ providing the rationale for our use of cat saliva in our experimental design.

      (6) In the introduction, the authors hypothesized that the VNO detects predator cues and sends sensory signals to the VMH to trigger defensive behavioral decisions and stated that direct evidence to support this hypothesis is still missing. However, the evidence that cat saliva activates the VMH and that activity in the VMH is necessary for the expression of antipredator defensive response in rodents has been previously demonstrated in a study by Engelke et al., 2021 (PMID: 33947849), which was entirely omitted by the authors.

      We appreciate this insightful comment. Our original sentence meant that direct evidence was missing for the hypothesis that the mouse VNO detects predator cues and sends sensory signals to the VMH, triggering appropriate defensive behavioral decisions. To clarify this, we altered the sentence (the last sentence of the second-to-last paragraph of the Introduction) to “However, how the sensory signals detected through the VNO-to-VMH circuitry modulate behavioral decisions in specific contexts remains elusive.”

      The study by Engelke et al. (2021) has shown that cat saliva activates the VMH and that activity in the VMH is necessary for the expression of antipredator defensive responses, including freezing behavior, in rats. This important paper is now cited at multiple locations: page 4 line 16, page 9 line 8, and page 14 line 17. Interestingly, the vomeronasal receptor genes expressed in cat saliva-responsive VNO neurons, the V2R-A4 subfamily genes, seem to have expanded independently in mice and rats, such that there are no direct V2R-A4 orthologues between the two species (Rocha et al., submitted). Therefore, exploring the sensory mechanism behind the induction of defensive behavioral responses to cat saliva in rats would be highly intriguing. Comparing the mechanism operating in rats with that observed in mice could offer valuable insights into how divergent sensory signaling pathways lead to VMH-mediated defensive behavioral responses across species.

      (7) In the discussion, the authors stated that their findings suggest that the induction of robust freezing behavior is mediated by a distinct subpopulation of VMH neurons. The authors should cite the study by Kennedy et al., 2020 (PMID: 32939094) that shows the involvement of VMH in the regulation of persistent internal states of fear, which may provide an alternative explanation for why distinct concentrations of saliva could result in different behavioral outcomes.

      We appreciate this valuable advice to cite this important paper. It is now cited at page 14 line 17 in the Discussion under “Differential activation of VMH neurons potentially underlying distinct intensities of freezing behavior.” We agree that it is intriguing to hypothesize that cat saliva of different freshness induces different degrees of persistent neural activity in a subpopulation of VMH neurons, which regulates the intensity of freezing behavior.

      (8) The anatomical connectivity between the olfactory system and the ventromedial hypothalamus (VMH) in the abstract is unclear. The authors should clarify that the VMH does not receive direct inputs from the vomeronasal organ (VNO) nor the accessory olfactory bulb (AOB) as it seems in the current text.

      We apologize for the confusion caused by our statement in the abstract. The reviewer is correct that the VMH does not receive direct inputs from the VNO and AOB. The abstract now states: 'The vomeronasal organ (VNO) is one of the major sensory input channels through which predator cues are detected with ascending inputs to the medial hypothalamic nuclei, especially to the ventromedial hypothalamus (VMH), through the medial amygdala (MeA) and bed nucleus of the stria terminalis (BNST).’

      Reviewer #2 (Public Review):

      Weakness:

      The findings are relatively preliminary. The identities of the receptor and the ligand in the cat saliva that induces the behavior remain unclear. The identity of VMH cells that are activated by the cat saliva remains unclear. There is a lack of targeted functional manipulation to demonstrate the role of V2R-A4 or VMH cells in the behavioral response to cat saliva.

      We concur with the reviewer’s comments and agree on the necessity of exploring the behavioral response to cat saliva in mice with the V2R-A4 receptor(s) knocked out, alongside targeted functional manipulations in the VMH. These future studies will allow us to further elucidate the molecular and neural mechanisms underlying this sensory-to-hypothalamic circuit.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) It is unclear whether fresh and old saliva indeed alter perceived predation imminence, as claimed by the authors. Prior work indicates that lower imminence induces anxiety-related actions, such as reorganization of meal patterns and avoidance of open spaces, while slightly higher imminence produces freezing. Here, the authors show that fresh and old predator saliva only provoke different amounts of freezing, rather than changing the topography of defensive behaviors, as explained above. Another prediction of predatory imminence theory would be that lower imminence induced by old saliva should produce stronger cortical activation, while fresh saliva would activate the amygdala, if these stimuli indeed correspond to significantly different levels of predation imminence.

      We thank the reviewer for this valuable insight. In our current study, we exclusively compared defensive behavioral responses to 15-minute-old and 4-hour-old cat saliva in mice within their home cages. In future studies, it would be intriguing to expand this investigation by examining behavioral changes in response to saliva collected at additional time points across diverse behavioral settings. Additionally, exploring neural activity in various brain regions in future studies would complement our understanding of these responses.

      (2) It is known that predator odors activate and require AOB, VNO, and VMH, thus replications of these findings are not novel, decreasing the impact of this work.

      We acknowledge the previous findings mentioned by the reviewer. Our finding in this paper is that cat saliva samples with different freshness predominantly activate different numbers of VNO sensory neurons expressing the same subfamily of sensory receptors, which results in differential activation of the downstream circuit to modulate behavioral outputs.

      (3) There is a lack of standard circuit dissection methods, such as characterizing the behavioral effects of increasing and decreasing the neural activity of relevant cell bodies and axonal projections, significantly decreasing the mechanistic insights generated by this work.

      We thank the reviewer for the valuable comments. We acknowledge that exploring the behavioral effects through the manipulation of specific cell types within defined neural substrates, along with characterizing circuit connectivity, is crucial to understand this circuit more thoroughly in future studies.

      (4) The correlation shown in Figure 5c may be spurious. It appears that the correlation is primarily driven by a single point (the green square point near the bottom left corner). All correlations should be calculated using Spearman correlation, which is non-parametric and less likely to show a large correlation due to a small number of outliers. Regardless of the correlation method used, there are too few points in Figure 5c to establish a reliable correlation. Please add more points to 5c.

      We thank the reviewer for this important suggestion. We assessed the normality of the data using the Shapiro-Wilk and Kolmogorov-Smirnov tests, which indicated that the data were consistent with a normal distribution and therefore suitable for parametric correlation analysis. We anticipate employing a larger sample size in future studies to examine these correlation patterns more rigorously.

      (5) Some of the findings are disconnected from the story. For example, the authors show that V2R-A4-expressing cells are activated by predator odors. Are these cells more likely to be connected to the rest of the predatory defense circuit than other VNO cells?

      Yes, our hypothesis posits that V2R-A4-expressing VNO sensory neurons serve as receptor neurons for predator cues present in cat saliva. Additionally, we assume that these specific sensory neurons have stronger anatomical connections with the defensive circuit compared to VNO sensory neurons expressing other receptor subfamilies. In our modified Discussion section, we discussed this point under “V2R-A4 subfamily as the receptor for predator cues in cat saliva.”

      (6) Were there other behavioral differences induced by fresh compared to old saliva? Do they provoke differences in stretch-attend risk evaluation postures, number of approaches, the average distance to odor stimulus, the velocity of movements towards and away from the odor stimulus, etc?

      We appreciate the reviewer's valuable comments. We have now incorporated an analysis of stretch-sniff risk assessment behavior, presented in new Figure 1F (graph) and Supplemental Figure 1B (raster plot). Mice exhibited stretch-sniff risk assessment behavior, which remained consistent across control, fresh saliva, and old saliva swabs. Additionally, we have also included a raster plot for direct investigation, previously noted as ‘interaction’ in the original manuscript (Supplemental Figure 1C). Mice exposed to a swab containing either fresh or old saliva significantly avoided directly investigating the swab. In contrast, mice exposed to a clean control swab spent a significant amount of time directly investigating the swab, engaging in behaviors such as sniffing and chewing (Figure 1G). A comparison of temporal behavioral patterns revealed a slightly higher frequency of direct investigation behavior toward old saliva compared to fresh saliva at the beginning of the exposure period (Supplemental Figure 1C).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (A) In the discussion (page 13, line 13), the authors proposed approaches to isolate receptors among the V2R-A4 subfamily that could be responsible for the detection of predator cues in cat saliva such as mRNA profiling from cells isolated from VNO GCaMP imaging. However, the authors argue that this method can lead to false positive results. The authors should clarify what they mean by this exactly.

      We meant that pairing kairomones with their cognate vomeronasal receptors is challenging overall, and subsequent confirmation by loss-of-function as well as gain-of-function studies is necessary to avoid false-positive receptor-ligand pairings. We modified the sentence in the Discussion as follows: “…. as well as receptor mRNA profiling from isolated single cells activated by cat saliva in GCaMP imaging using VNO slices in vitro (Haga-Yamanaka et al., 2014; Wong et al., 2020). Receptor candidates identified using either of the methods can be further confirmed by examining necessity and sufficiency for detecting cat saliva using genetically modified mouse lines.”

      (B) In the discussion, the authors mention that imminent predator cues present in the cat saliva activate a specific population of VMN neurons. However, the authors have not demonstrated that imminent predator cues exist and the differences between fresh and old saliva are not simply a matter of concentration and integrity of the same protein (see a similar concern in item 2 above).

      In line with our responses to the reviewer’s public comments 1 and 2, we acknowledge that both the quality and quantity of molecules in cat saliva change when it is kept at room temperature for 4 hours. Our findings demonstrate that mice detect this time-dependent alteration through the VNO, leading to subsequent adjustments in their defensive response patterns. Identifying the specific molecules responsible for inducing the behavioral changes and exploring their time-dependent alterations are crucial steps in our ongoing research. To provide further clarification, we have added a paragraph to the Discussion under 'The VNO as the sensor of predator cues that induce fear-related behavior.’

      (C) In the introduction, the authors cite several studies and reviews that investigated sensory neural circuits that mediate behavioral responses to chemical predator cues in mice. However, the majority of these studies used rats. Therefore, it is recommended to instead indicate that these studies focus on using rodent models.

      We appreciate this insightful comment. We have now replaced the term 'mice/mouse' with 'rodents' in corresponding parts of the manuscript.

      (D) The description of the extended amygdala is unclear and gives the impression that the posteroventral part of the medial amygdala is also part of the extended amygdala (page 3, line 25).

      We appreciate the reviewer’s important feedback. We have removed the phrase 'the extended amygdala consisting of' from the text.

      (E) The authors should justify why they have focused on the role of V2R-A4 in cat saliva detection. As shown in the Figure 3A schematic, many other receptors within the V2R family could have been evaluated. Additionally, the authors should indicate how many mice were used for calculating the ratio for each receptor in Figure 3C, and a group comparison should be performed.

      As shown in Supplemental Figure 2 and Figure 3C, our initial investigation involved assessing the co-localization of pS6 signals with signals derived from in situ hybridization probes for all V2R subfamilies. Each probe was designed to recognize all the receptor genes within the subfamily under the tested conditions. This examination led to the identification of V2R-A4, whose probe signals overlap with pS6 signals induced by exposure to cat saliva. In Figure 3C, the percentage of total overlap between the in situ probe and pS6 signals in VNO sections was examined from n=3-6 animals, which is now mentioned in the modified figure legend.

      (F) The authors should make it clear to readers at the very beginning of the manuscript that the behavioral differences between fresh and old saliva are not caused by the inefficiency of the old cat saliva to induce defensive responses. Thus, other antipredator behavioral responses should be also quantified (e.g., avoidance time, number and time of investigations to the cat saliva source, risk-assessment, etc.)

      We appreciate this valuable comment from the reviewer. In the original version of our manuscript, we used the term 'interaction' to indicate 'direct interaction with the swab for investigation.' We have now replaced the term 'interaction' with 'direct investigation' and added the temporal patterns of these behavioral episodes in Supplemental Figure 1C. Our observations indicate that mice avoid directly investigating both fresh and old saliva compared to the control (Figure 1G). However, there is a slight increase in investigation behavior toward old saliva at the beginning of exposure compared to fresh saliva (Supplemental Figure 1C). Furthermore, we have included the duration (Figure 1F) and temporal patterns (Supplemental Figure 1B) of stretch-sniff risk assessment behavior. Notably, stretch-sniff behavior did not differ towards control, fresh, and old saliva swabs.

      (G) The selected representative images for Gαo- and pS6-labeled neurons in Figure 2 should have similar levels of DAPI labeling. Further, the plot depicting the duration of freezing as a function of pS6-IR signals in the VNO (Figure 2H) is difficult to follow. The authors should indicate on the graph which data points represent fresh or old cat saliva exposure, similar to the style used in Figure 5 plots.

      We have replaced the representative image in Figure 2E to align the DAPI intensity. Additionally, we updated the data points in Figure 2H and introduced a color code to indicate saliva types.

      (H) The schematic in Figure 4 is misleading because the AOB does not directly project to the VMH. The authors should explain which regions are conveying indirect predator information from AOB to VMH (see a similar concern in item 7 above).

      We thank the reviewer for this important feedback. We modified the image in Figure 4A to show the entire defensive behavior circuit initiated from the VNO.

      Reviewer #2 (Recommendations For The Authors):

      (1) This result suggests that V2R-A4 may be the dominant VR for mice to detect cat saliva.

      Future studies should determine the identity of the receptor and the ligand in the cat saliva. Additionally, the functional importance of V2R-A4 remains unclear. It is important to knockout the receptor and test changes in cat saliva-induced freezing.

      We concur with the reviewer’s comments and recognize the necessity of exploring the behavioral response to cat saliva in mice with V2R-A4 receptor(s) knocked out. Moreover, the identification of the ligand in cat saliva is critical for a deeper understanding of the molecular mechanisms in future studies.

      (2) AOB does not project to VMH directly. Other known important nodes for the predator defense circuit include MeApv, BNST, PMd, AHN, and PAG. It will be helpful to provide c-Fos data in those regions (especially MEA and BNST as they are between AOB and VMH) to provide a complete picture of how the brain processes cat saliva to induce the behavior change.

      We appreciate this important feedback from the reviewer. We have now added c-Fos expression analysis data for the BNST, MeApv, and PAG, in addition to the VMH. Upon exposure to fresh and old saliva, we observed upregulation of cFos in the MeApv, VMH, and dPAG, but not in the BNST, compared to the control stimulus. The data are now shown in Figure 4G-J. Moreover, we added correlation analyses between the numbers of cFos-positive neurons in those neural substrates and the duration of freezing behavior to Figure 5. The numbers of cFos-IR signals in neurons in the BNST and dPAG did not correlate with the duration of freezing behavior in any of the exposure groups (Figure 5C, F). However, in addition to a significant positive correlation in the fresh saliva-exposed group in the VMH (R2 = 0.5708, 95% CI [-0.1449, 0.9714], p = 0.0412) (Figure 5E), we observed a similar positive correlation trend in the MeApv (R2 = 0.3854, 95% CI [-0.3845, 0.9525], p = 0.0942), although it was not statistically significant, possibly due to the small sample size (Figure 5D). Based on these results, our current circuit model is as follows: different numbers of VNO sensory neurons activated by fresh and old saliva result in differential excitation levels in mitral cells in the AOB. Differential excitation of mitral cells leads to the differential activation of target neural substrates, possibly the MeApv, which results in differential activation of VMH neurons. This model is depicted in Figure 7 and discussed under the section “Differential processing of fresh and old saliva signals in the VNO-to-VMH pathway” in the Discussion.

      (3) It is interesting that the difference in activation level in the VNO between old and fresh cat saliva does not transfer to the AOB. It could be informative to examine the correlation between VNO and AOB pS6/c-Fos cell numbers, and between AOB and VMH c-Fos cell numbers, across animals to understand whether the activation levels across those regions are related. If they are not correlated, it could be helpful to add a discussion of potential reasons, e.g., neuromodulatory inputs to the AOB.

      We agree that analyzing the number of pS6/cFos-positive cells in all the regions from the same animals is ideal; however, due to technical difficulties, we were unable to collect the entire set of neural substrates from the same animals.

      (4) Please indicate n in all figure plots and specify what individual dots mean. In Figure 4h, there are 7 dots in the old saliva group, presumably indicating 7 animals. In Figure 6b, there appear to be more than 7 dots for the old cat saliva group. Are there more than 7 animals? If so, why are they not included in Figure 4h? If not, what does each dot mean? Note that each dot should represent an independent sample. One animal should not contribute more than one dot.

      We apologize for the confusion about Figure 6b. Each of these dots indicates the number of cFos signals in a single VMH hemisphere sample. The data used for this analysis were the same as the ones for the VMH used in Figure 4. This is now clarified in the figure legends.

      (5) The identification of a cluster of VMHdm cells uniquely activated by fresh cat saliva urine is interesting. It will be important to identify the molecular handle of the cells to facilitate further investigation. This could be achieved using either activity-dependent RNAseq or double in situ of saliva-induced c-Fos and candidate genes (candidate gene may be identified based on the known gene expression pattern).

      We agree that these experiments are very valuable. We would like to perform those experiments in future studies.

      Reviewer #3 (Recommendations For The Authors):

      (1) Please cite recent relevant papers showing VMH activity induced by predators, such as https://pubmed.ncbi.nlm.nih.gov/33115925/ and https://pubmed.ncbi.nlm.nih.gov/36788059/

      We thank the reviewer for the suggestion to cite these important papers. https://pubmed.ncbi.nlm.nih.gov/33115925/ (Esteban Masferrer et al., 2020) and https://pubmed.ncbi.nlm.nih.gov/36788059/ (Tobias et al., 2023) are now cited at page 14 line 17 in the Discussion under “Differential activation of VMH neurons potentially underlying distinct intensities of freezing behavior.”

      (2) Add complete statistical information in the figure legends of all figures, which should include n, name of test used, and exact p values.

      We included statistical analysis results in figure legends; for Figure 6B, we provided statistical analysis results in Supplemental Table 1.

      (3) Please paste all figure legends directly below their corresponding figure to make the manuscript easier to read.

      We have added figure legends directly below their corresponding figures.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      Statistical analysis results have been included in the figure legends and Supplemental Table 1.

      References

      Bansal R, Nagel M, Stopkova R, Sofer Y, Kimchi T, Stopka P, Spehr M, Ben-Shaul Y. 2021. Do all mice smell the same? Chemosensory cues from inbred and wild mouse strains elicit stereotypic sensory representations in the accessory olfactory bulb. BMC Biol 19:133.

      Ben-Shaul Y, Katz LC, Mooney R, Dulac C. 2010. In vivo vomeronasal stimulation reveals sensory encoding of conspecific and allospecific cues by the mouse accessory olfactory bulb. Proc Natl Acad Sci U S A 107:5172‒5177.

      Bergan JF, Ben-Shaul Y, Dulac C. 2014. Sex-specific processing of social cues in the medial amygdala. Elife 3:e02743.

      Engelke DS, Zhang XO, O'Malley JJ, Fernandez-Leon JA, Li S, Kirouac GJ, Beierlein M, Do-Monte FH. 2021. A hypothalamic-thalamostriatal circuit that controls approach-avoidance conflict in rats. Nat Commun 12:2517.

      Esteban Masferrer M, Silva BA, Nomoto K, Lima SQ, Gross CT. 2020. Differential Encoding of Predator Fear in the Ventromedial Hypothalamus and Periaqueductal Grey. J Neurosci 40:9283‒9292.

      Papes F, Logan DW, Stowers L. 2010. The vomeronasal organ mediates interspecies defensive behaviors through detection of protein pheromone homologs. Cell 141:692‒703.

      Tobias BC, Schuette PJ, Maesta-Pereira S, Torossian A, Wang W, Sethi E, Adhikari A. 2023. Characterization of ventromedial hypothalamus activity during exposure to innate and conditioned threats. Eur J Neurosci 57:1053‒1067.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      PUBLIC REVIEWS

      Reviewer #1 (Public Review):

      In this study, the authors investigate the role of triglycerides in spermatogenesis. This work is based on their previous study (PMID: 31961851) on triglyceride sex differences, in which they showed that somatic testicular cells play a role in whole-body triglyceride homeostasis. In the current study, they show that lipid droplets (LDs) are significantly more abundant in the stem and progenitor cell (pre-meiotic) zone of the adult testis than in the meiotic spermatocyte stages. The distribution of LDs anti-correlates with the expression of the triglyceride lipase Brummer (Bmm), which has higher expression in spermatocytes than in early germline stages. Analysis of a bmm mutant (bmm[1]), a P-element insertion that is likely hypomorphic, and its revertant (bmm[rev]) as a control shows that bmm acts autonomously in the germline to regulate LDs. In particular, the number of LDs is significantly higher in spermatocytes from bmm[1] mutants than from bmm[rev] controls. Testes from males with global loss of bmm (bmm[1]) are shorter than controls and have fewer differentiated spermatids. The zone of bam expression, typically close to the niche/hub in WT, is now many cell diameters away from the hub in bmm[1] mutants. There is an increase in the number of GSCs in bmm[1] homozygotes, but this phenotype is probably due to the enlarged hub. However, clonal analyses of GSCs lacking bmm indicate that a greater percentage of the GSC pool is composed of bmm[1]-mutant clones than of bmm[rev] clones. This suggests that loss of bmm could impart a competitive advantage to GSCs, but this is not explored in greater detail. Despite the increase in the number of GSCs that are bmm[1]-mutant clones, there is a significant reduction in the number of bmm[1]-mutant spermatocyte and post-meiotic clones. This suggests that fewer bmm[1]-mutant germ cells differentiate than controls. To gain insights into triglyceride homeostasis in the absence of bmm, they perform mass spec-based lipidomic profiling. Analyses of these data indicate that triglycerides are the class of lipid most affected by loss of bmm, supporting their model that excess triglycerides are the cause of the spermatogenic defects in bmm[1]. Consistent with their model, a double mutant of bmm[1] and the diacylglycerol O-acyltransferase 1 midway (mdy) reverts the bmm-mutant germline phenotypes.

      There are numerous strengths of this paper. First, the authors report rigorous measurements and statistical analyses throughout the study. Second, the authors utilize robust genetic analyses with loss-of-function mutants and lineage-specific knockdown. Third, they demonstrate the appropriate use of controls and markers. Fourth, they show rigorous lipidomic profiling. Lastly, their conclusions are appropriate for the results; in other words, they don't overstate the results. Overall, the rigorously quantified results support the major aim that appropriate regulation of triglycerides is needed in a germline cell-autonomous manner for spermatogenesis.

      This paper should have a positive impact on the field. First and foremost, there is limited knowledge about the role of lipid metabolism in spermatogenesis. The lipidomic data will be useful to researchers in the field who study various lipid species. Going forward, it will be very interesting to determine what triglycerides regulate in germline biology. In other words, what functions/pathways/processes in germ cells are negatively impacted by elevated triglycerides. And as the authors point out in the discussion, it will be important to determine what regulates bmm expression such that bmm is higher in later stages of germline differentiation.

      We thank the Reviewer for their positive assessment of our revised manuscript!

      Reviewer #2 (Public Review):

      Summary:

      Here, the authors show that neutral lipids play a role in spermatogenesis. Neutral lipids are components of lipid droplets, which are known to maintain lipid homeostasis and to be involved in differentiation, survival, and energy metabolism in non-gonadal tissues. Lipid droplets are present in the testis in mice and Drosophila, but not much is known about the role of lipid droplets during spermatogenesis. The authors show that lipid droplets are present in early differentiating germ cells and absent in spermatocytes. They further show a cell-autonomous role for the lipase brummer in regulating lipid droplets and, in turn, spermatogenesis in the Drosophila testis. The data presented show that the relationship between lipid metabolism and spermatogenesis is similar in mammals and flies, supporting Drosophila spermatogenesis as an effective model to uncover the role lipid droplets play in the testis.

      Strengths and weaknesses:

      The authors do a commendably thorough characterization of where lipid droplets are detected in normal testes: in young somatic cells and early differentiating germ cells. They use multiple control backgrounds in their analysis, including w[1118], Canton S, and Oregon R, which adds rigor to their interpretations. The authors employ markers that identify which lipid droplets are in somatic cells and which are in germ cells, and use these markers to present measured distances of somatic and germ cell-derived lipid droplets from the hub. Because they can also measure the distance from the hub of somatic and germ cells labeled with age-specific markers, these results allow the authors to correlate the position of lipid droplets with the age of the cells in which they are present. This analysis is clearly shown and well quantified.

      The quantification of lipid droplet distance from the hub is applied well in comparing brummer mutant testes to wild type controls. The authors measure the number of lipid droplets of specific diameters, and the spatial distribution of lipid droplets as a function of distance from the hub. These measurements quantitatively support their findings that lipid droplets are present in an expanded population of cells further from the hub in brummer mutants. The authors further quantify lipid droplets in germline clones of specified ages; the quantitative analysis here is displayed clearly and supports a cell autonomous role for brummer in regulating lipid droplets in spermatocytes.

      Data examining testis size and the number of spermatids in brummer mutants clearly indicate the importance of regulating lipid droplets for spermatogenesis. The authors show beautiful images supported by rigorous quantification supporting their findings that brummer mutants have smaller testes with fewer spermatids at both 29°C and 25°C. There are also significant data supporting defects in testis size, but not spermatid number, in 14-day-old brummer mutant animals compared to controls. Their analysis clearly shows an expanded region beyond the testis apex that includes younger germ cells, supporting a role for lipid droplets in influencing germ cell differentiation during spermatogenesis.

      The authors present a series of data exploring a cell autonomous role for brummer in the germline, including clonal analysis and tissue specific manipulations. The clonal data indicating increased lipid droplets in spermatocyte clones, and a higher proportion of brummer mutant GSCs at the hub are convincing and supported by quantitation. The authors also show a tissue specific rescue of the brummer testis size phenotype by knocking down mdy specifically in germ cells, which is also supported by statistically significant quantitation. The authors present data examining the number of spermatocyte and post-meiotic clones 14 days after clonal induction. Their finding is significant with a p-value of 0.0496, which they acknowledge is less robust than their other data reported in this study, and could be a result of a low sample size. They indicate that future studies might validate these results with additional samples.

      The authors do a beautiful job of validating where they detect brummer-GFP by presenting their own pseudotime analysis of publicly available single cell RNA sequencing data. Their data is presented very clearly, and supports expression of brummer in older somatic and germline cells of the age when lipid droplets are normally not detected. The authors also present a thorough lipidomic analysis of animals lacking brummer to identify triglycerides as an important lipid droplet component regulating spermatogenesis.

      Impact:

      The authors present data supporting the broad significance of their findings across phyla. These data represent a key strength of this manuscript. The authors show that loss of a conserved triglyceride lipase impacts testis development and spermatogenesis, and that these impacts can be rescued by supplementing the diet with medium-chain triglycerides. The authors point out that these findings represent a biological similarity between Drosophila and mice, supporting the relevance of the Drosophila testis as a model for understanding the role of lipid droplets in spermatogenesis. The connection buttresses the relevance of these findings and this model to a broad scientific community.

      We thank the Reviewer for their positive assessment of our revised paper!

      RECOMMENDATIONS FOR THE AUTHORS

      Reviewer #2 (Recommendations For The Authors):

      The authors addressed most of my recommendations in a way that is satisfactory to me. I would like a bit more information added to the methods section about how hub area was quantified. For example, did the authors measure area within a defined region in a single Z plane (perhaps the Z plane at the center of the hub, or the Z plane with the largest area)? Alternatively, did the authors measure area in a more three-dimensional way, i.e., volume? Adding this information to the methods would satisfy all of my previous recommendations.

      We thank the Reviewer for pointing out that this information was not clear in the revised manuscript. We changed the methods section to clarify our methods as follows:

      “The hub was identified as the FasIII-positive area of the testis. Hub size was estimated by measuring the FasIII-positive area in a Z-projected image of the hub in each testis. Z-projections were made using the ‘sum slices’ function in Fiji.”
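      The measurement described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' Fiji workflow: it assumes the FasIII channel has been exported as a (z, y, x) array and, as an assumption, delineates the FasIII-positive area with a simple intensity cutoff. The function name `hub_area_from_stack` and the `threshold` and `pixel_area_um2` parameters are hypothetical; the methods text does not state how the FasIII-positive area was outlined.

```python
import numpy as np


def hub_area_from_stack(stack: np.ndarray, threshold: float,
                        pixel_area_um2: float = 1.0) -> float:
    """Estimate hub area from a FasIII-channel Z-stack (hypothetical helper).

    stack: array of shape (z, y, x) holding FasIII intensities.
    threshold: assumed intensity cutoff for calling a projected pixel FasIII-positive.
    pixel_area_um2: area of one pixel in square microns (from image calibration).
    """
    # Analogous to Fiji's 'Sum Slices' Z-projection: add intensities along Z.
    projection = stack.sum(axis=0)
    # Pixels whose summed intensity exceeds the cutoff are counted as hub.
    hub_mask = projection > threshold
    return float(hub_mask.sum()) * pixel_area_um2


# Toy stack: a 2 x 2 pixel "hub" that is bright in all 3 slices.
stack = np.zeros((3, 4, 4))
stack[:, 1:3, 1:3] = 1.0
print(hub_area_from_stack(stack, threshold=2.0, pixel_area_um2=0.25))  # 1.0
```

      Because a sum projection accumulates signal through the whole stack, the chosen cutoff implicitly depends on the number of slices; a fixed cutoff is only comparable across testes imaged with similar Z-ranges.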

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In chicken embryos, the counter-rotating migration of epiblast cells on both sides of the forming primitive streak (PS), a process referred to as polonaise movements, has attracted longstanding interest as a paradigm of morphogenetic cell movements. However, the association between these cell movements and PS development is still controversial. This study investigated PS development and polonaise movements separately at their initial stage, showing that the two can be uncoupled (at least initially) and that both are activated via Vg1 signaling.

      Strengths of this study

      Polonaise movements, i.e., the circular cell migration of epiblast cells on both sides of the forming PS in avian embryos, have been the subject of research through live imaging and have promoted the development of new tools to quantitatively analyze such movements. However, conclusions from previous studies remain controversial, at least partly due to the nature of perturbations to PS development and polonaise movements.

      This study used the challenging technique of electroporation to successfully mark and manipulate Wnt/PCP pathways in unincubated chicken embryo cells at the initiation phase of these two processes. In addition, the authors separately altered PS development and polonaise movements: PS development was perturbed by inhibiting either the Wnt/PCP pathway or DNA synthesis using aphidicolin, while polonaise movements were modified by the development of a second PS after engrafting Vg1-expressing COS cells at the opposite end of the blastoderm. The study concluded that Vg1 elicits both PS development and polonaise movements, which occur in parallel and are not interdependent.

      To support these conclusions, particle image velocimetry (PIV) of cell trajectories captured by live imaging was performed. These tools delineated visually appealing cell movements and gave rise to vorticity profiles, adding more value to this study.
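      For readers less familiar with vorticity profiles, a short sketch may help. The vorticity of a 2D flow is ω = ∂v/∂x − ∂u/∂y, where u and v are the x- and y-velocity components; with y increasing upward, counter-clockwise rotation gives positive ω and clockwise rotation negative ω, so two counter-rotating vortices appear as regions of opposite sign (in image coordinates, where rows increase downward, the sign convention flips). The NumPy code below estimates ω from a PIV velocity field on a regular grid by finite differences. It is a generic illustration under that assumption, not the analysis pipeline used in the study, and the `vorticity` helper is hypothetical.

```python
import numpy as np


def vorticity(u: np.ndarray, v: np.ndarray, dx: float = 1.0, dy: float = 1.0) -> np.ndarray:
    """Out-of-plane vorticity of a 2D velocity field on a regular grid.

    u, v: arrays of shape (ny, nx) with the x- and y-velocity components;
          rows (axis 0) index y and columns (axis 1) index x.
    dx, dy: grid spacing along x and y (e.g., PIV window step in microns).
    """
    dv_dx = np.gradient(v, dx, axis=1)  # derivative of v along x (columns)
    du_dy = np.gradient(u, dy, axis=0)  # derivative of u along y (rows)
    return dv_dx - du_dy


# Solid-body rotation with angular speed 0.5: vorticity equals 2 * 0.5 everywhere.
y, x = np.mgrid[-2:3, -2:3].astype(float)
omega = 0.5
u, v = -omega * y, omega * x
print(vorticity(u, v))  # every entry is 1.0
```

      Reversing the rotation (negating u and v) flips the sign of ω, which is how the two sides of the midline can be distinguished in a vorticity map.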

      Weaknesses of this study

      Engrafted Vg1-expressing COS cells located at the anterior end of the blastoderm elicited both the development of a second PS and marked bilateral polonaise movements, while perturbing these movements along the original PS. How do polonaise movements along the second PS dominate over those along the normal PS? The authors suggested a model in which Vg1 acts in a graded or dose-dependent manner, since engrafted COS cells overexpressed Vg1. This model can be tested by reducing the mass of engrafted COS cells. Although the authors propose performing this analysis in further investigations, it would be preferable to incorporate it into this study for better consistency.

      We would like to express our gratitude to the editors and the reviewers for recognizing the significance of our study and for their thoughtful suggestions. We agree that it would be a logical next step to identify the driving mechanism(s) of the polonaise movements, although this is beyond the scope of the current study. Rather, it is the focus of ongoing studies, in which we are investigating how Vg1 acts in a concentration-dependent manner and its resulting dose-dependent effects on downstream gene expression, in order to provide a comprehensive understanding of this interesting dual role of Vg1. As the reviewer pointed out, the relationship between the intensity of Vg1 signaling and the polonaise movements can be tested by modifying the size of the Vg1/COS graft.

      The authors claim that chicken embryo development is representative of "amniotes," but this does not hold for all groups. Avian and mammalian species are exceptional among amniotes in that they develop a PS (e.g., Coolen et al. 2008). Moreover, in certain mammalian embryos, such as mouse embryos, cells lateral to the PS do not move much (Williams et al. 2012). The authors should avoid the generalization that chicken embryos unequivocally represent amniotes, as opposed to what is observed in non-amniote embryos. The observations in chicken embryos as they stand are significant enough.

      References:

      Coolen M, et al. (2008). Molecular characterization of the gastrula in the turtle Emys orbicularis: an evolutionary perspective on gastrulation. PLoS One. 3(7):e2676. doi: 10.1371/journal.pone.0002676

      Williams M, et al. (2012). Mouse primitive streak forms in situ by initiation of epithelial to mesenchymal transition without migration of a cell population. Dev Dyn. 241(2):270-283. doi: 10.1002/dvdy.23711

      We modified the following sentences in the Summary and Introduction of the revised version as follows:

      In Summary:

      (p.1, Lines 9-11.) “Large-scale cell flow characterizes gastrulation in animal development. In amniote gastrulation, particularly in avian gastrula, a bilateral vortex-like counter-rotating cell flow, called ‘polonaise movements’, appears along the midline.”

      In Introduction:

      (p.2, Lines 43-46.) “In amniotes, particularly in avian gastrula (i.e. embryonic disc), a bilateral vortex-like counter-rotating cell flow, termed ‘polonaise movements’, occurs within the epiblast along the midline axis, prior to and during primitive streak (PS) formation.”

      Reviewer #2 (Public Review):

      Summary:

      The authors are interested in large-scale cell flow during gastrulation, and in particular in the polonaise movement. This movement corresponds to a bilateral vortex-like counter-rotating cell flow that transports the mesendodermal cells, allowing their ingression through the primitive streak and ultimately the formation of the mesoderm and endoderm. The authors specifically wanted to investigate the coupling of the polonaise movement and the primitive streak, to understand whether the polonaise movement is a consequence of the formation of the primitive streak or the other way around. They propose a model in which primitive streak elongation is not required for the initiation of the cell flow but rather for its maintenance, and in which robust cell flow is not required for primitive streak extension.

      Strengths:

      Overall, the manuscript is well written with clear experimental designs. The authors have used live imaging and cell flow analysis in different conditions, where either the formation of the primitive streak or the cell flow was perturbed.

      Their live imaging and PIV-based analyses convincingly support their conclusions that primitive streak deformation or mitotic arrest does not impact the initiation of the polonaise movement but rather the location or maintenance of these rotations. They additionally showed that disruption of the polonaise movement at the authentic primitive streak, by the elegant addition of an ectopic primitive streak, does not impact the elongation of the original primitive streak.

      Weaknesses:

      • When using the delta-DEP-GFP construct, the authors showed that they can manipulate the shape of the primitive streak without affecting the identity and number of primitive streak cells. It is not clear, however, how this affects the shape, volume, or adhesion of the cells. Some mechanistic insights would strengthen the paper.

      We appreciate the reviewer’s invaluable feedback. We agree that it would be informative to know how the ΔDEP-GFP construct led to PS deformation. This approach was previously introduced by Voiculescu et al. (2007) to demonstrate an involvement of Dsh(DEP) in PS shape regulation, as described in the text (please see pp4-5, lines 91-94 in Results and p13, lines 279-281 in Discussion). That study suggested that the Wnt/PCP pathway acting through Dsh(DEP) is a major regulator of cell intercalation, which plays an important role in PS morphogenesis (Voiculescu et al., 2007).

      • Overall, the frequencies of observations are missing; these would give a better view of the phenomenon. For example, do Vg1/COS cells always disrupt the flow at the authentic primitive streak? Can vector fields from replicates be integrated to reflect quantification?

      We agree and have added the numbers of embryos examined. In our experimental system, the original polonaise movements along the authentic PS were disrupted by the induced polonaise movements in all Vg1/COS-implanted embryos (n=4/4 embryos). The replicated vector fields were integrated into the Streamline and Vorticity plots (please see Fig. 1-4, Fig. S1, S4-7).

      • Since myosin cables have been shown to be instrumental for the polonaise movement, it would be interesting to better investigate how the manipulations by the delta-DEP-GFP construct, or Vg1/Cos affect the myosin cables (as shown in preliminary form for the aphidicolin-treated embryos).

      We agree that investigations of cytoskeletons and motor proteins would provide a deeper understanding of how the ΔDEP-GFP construct, and perhaps Wnt/PCP components, work in PS formation and morphogenesis. As the reviewer pointed out, we plan to examine, in a future study, the patterns of the myosin cables in the ΔDEP-GFP-misexpressing or Vg1/COS-implanted embryos to better understand the mechanism(s) of the polonaise movements.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • The authors named the dominant-negative Dsh lacking DEP [dnDsh(deltaDEP)]-fused GFP as deltaDEP-GFP, presumably to distinguish it from the construct dnDsh-deltaPDZ previously reported. However, the prefix "dnDsh" conveys the critical function in the present study. The reviewer recommends spelling out dnDsh(deltaDEP)-GFP to clarify to readers which signal was manipulated.

      We agree that it is necessary to distinguish our construct used in this study from the dnDsh-deltaPDZ construct. We have, therefore, clarified the abbreviation in the main text as follows (please see pp 4-5, lines 91-97): ‘The DEP domain of Dishevelled (Dsh; a transducer protein of Wnt signaling) is responsible for the non-canonical Wnt/PCP pathway (43, 44), and misexpression of dominant-negative Dsh lacking DEP [dnDsh(ΔDEP)] leads to deformation of the midline structures, including the PS (21). Further, the Wnt/PCP pathway is involved in cellular polarity and migration, while the canonical Wnt pathway regulates cell proliferation (45). We refer the dnDsh(ΔDEP)-GFP construct that we generated, as ΔDEP-GFP, and tested its ability to alter cellular polarity, resulting in PS deformation’.

      • The authors described the "Vg1 plasmid DNA" as a gift from Claudio D. Stern and Jane Dodd. However, they should indicate the vector backbone, especially whether the vector carries the SV40 ori sequence. Ori-containing plasmids multiply after transfection as COS cells express the SV40T antigen, leading to protein overexpression.

      We added the name of the plasmid ‘pMT23-Vg1-myc-GDF1’ to the ‘Material and methods’ section (please see p25, line 574). pMT23 expression vector is a derivative of pMT21 (Hume and Dodd, 1993) and contains SV40 ori (Wong et al., 1985).

      Reviewer #2 (Recommendations For The Authors):

      • Most of the comments are indicated in the public review.

      • There are additionally minor modifications that would help readers interpret the figures. In Figure S1B and D, it is not clear to the reader what the asterisks indicate.

      We added the sentence ‘The white asterisks indicate GFP-expressing cells.’ to the figure legend of the Fig. S1 B and D (please see p34, line 874).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      In this manuscript, the authors used a machine learning algorithm to analyze published exosome datasets to find biomarkers that differentiate exosomes of different origins.

      Strengths:

      The performance of the algorithm is generally of good quality.

      Weaknesses:

      The source datasets are heterogeneous, as described in Figures 1 and 2 and Lines 72-75, and the results are therefore questionable.

      Response: We thank the reviewer for this assessment. The commonly used biomarkers of exosomes exhibit heterogeneous presence and abundance within the exosomes derived from different cell lines, tissue, and biological fluids. The primary goal of this study was to identify universal exosomal biomarkers that remain consistent across different sources of exosomes, unaffected by potential isolation and quantification bias. This objective was achieved through an integration of datasets from different sources, which allowed for the subsequent identification of common proteins associated with exosomes. Among the 18 protein markers identified, it is noteworthy that they are universally abundant in all cell lines and their exosomes. We believe that despite the heterogeneity of the datasets used here, the identification of 18 universal protein markers in exosomes from diverse sources is a strength of this analysis.

      (1) Nomenclature: Extracellular vesicles (EVs) are small vesicles released by cells into the extracellular space, exhibiting high heterogeneity in origin across species. Exosomes are typically defined as being of multivesicular body origin. However, the absence of several crucial common exosomal markers, including CD63, suggests that the proteomics analysis may include various other vesicular and non-vesicular materials.

      Response: As we reported previously (Kugeratski et al., Nature Cell Biology, 2021), the commonly used exosomal markers, such as CD9, CD63 and CD81, exhibit heterogeneity with respect to presence and abundance in exosomes derived from different cell types. For example, CD63 showed remarkably lower abundance in exosomes derived from Raji cell lines. In our study, the detection rate of CD63 (< 50%) is quite low in tissue-derived exosomes, which is consistent with the observations made in another proteomics-based study (Hoshino et al., Cell, 2020). Therefore, relying solely on these markers is inadequate for the comprehensive characterization of EVs as exosomes. We thus conducted this study to identify universal protein markers of exosomes by integrating data from multiple sources, thereby circumventing potential confounding effects due to their diverse origins and other technical differences.

      (2) Line 90: IPA is not defined earlier in the manuscript.

      Response: We provided the full definition of IPA (Ingenuity Pathway Analysis) in the revised manuscript.

      (3) Figure 2B: Considering the large number of variables, it is unsurprising that the 2D PCA (Principal Component Analysis) falls short in the classification task. Including a few additional dimensions (principal components) might have the potential to better distinguish the cancer groups from the control group.

      Response: Thank you for this insightful query. The purpose of using PCA here is to illustrate the heterogeneity associated with exosomes from different studies. While we acknowledge that additional dimensions may be more useful in distinguishing between cancer and control exosomes, we believe that the resulting performance would remain inferior to that of the machine learning approach we developed here.

      (4) Figure 2D: Exosomes primarily derive from multivesicular bodies, rather than the plasma membrane. It remains unclear why the authors focus specifically on proteins in the plasma membrane. Is it intended to encompass all membrane proteins? Clarification is needed on this point.

      Response: A good point. This study attempted to identify protein biomarkers of exosomes originating from different sources. Our approach considered plasma membrane proteins as potential biomarkers, in part because many of them have been detected on the surface of exosomes.

      (5) Figure 2F: The 18 identified proteins are also abundantly present in control cells, not solely in cancer-derived "exosomes." The statement in line 104 is misleading in this regard.

      Response: We apologize for the misleading sentence. We have revised the statement to state that “In total, we identified a set of 18 exosome protein markers that are present at a higher abundance in all exosomes examined”.

      (6) Figure 3B: Considering the definition of exosomes, CD63 and TSG101 should be present in all samples, and their absence raises concerns.

      Response: We understand the concern of the reviewer. In this Figure, we analyzed CD63 and TSG101 in tissue-derived exosomes. Our results are consistent with a previous study that also showed the paucity of these markers in tissue-derived exosomes (Hoshino et al., Cell, 2020). Our study highlights that CD63 and TSG101 cannot always identify exosomes from diverse cell lines and tissues. These initial observations motivated us to conduct this study to identify universal biomarkers of exosomes across different sources.

      (7) Figure 6G&H: Achieving an accuracy of 80% cannot be deemed "excellent."

      Response: We employed the word “excellent” in line 225 to describe the sensitivity and specificity associated with AUROC.
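      The distinction at issue here, between accuracy at a fixed decision threshold and AUROC, can be illustrated with a small Python sketch. AUROC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counted as one half), so it summarizes ranking quality over all thresholds, whereas accuracy depends on a single cutoff. The scores, labels, 0.5 threshold, and helper names below are invented for illustration and are not data or code from the study.

```python
def auroc(scores, labels):
    """AUROC via the rank (Mann-Whitney) formulation: P(positive score > negative score)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples classified correctly at one fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical classifier scores: every positive outranks every negative,
# but one positive falls below the 0.5 cutoff.
scores = [0.9, 0.8, 0.45, 0.4, 0.3]
labels = [1, 1, 1, 0, 0]
print(auroc(scores, labels))     # 1.0
print(accuracy(scores, labels))  # 0.8
```

      In this toy case, a perfect ranking (AUROC of 1.0) coexists with 80% accuracy, which is why the two metrics should be reported and described separately.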

      (8) Other comments on methods: The manuscript lacks an explanation of the neural network structure and why it outperforms other methods. Additionally, details about the calculation of MI (mutual information), IPA, and other methods should be provided.

      Response: This is a good suggestion but in this work we did not employ the neural networks for the analysis. We provided additional details and explanations regarding the methodology for mutual information score calculation, as well as insights into the improved use of IPA and other relevant methods in the revised manuscript.
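      As background for the mutual information (MI) scores mentioned above: the MI between a feature X and a class label Y is I(X; Y) = Σ p(x, y) log[p(x, y) / (p(x) p(y))], which is zero when X and Y are independent and grows as X becomes more informative about Y. The Python sketch below computes MI in nats for discrete values using plug-in frequency estimates. It is an illustration under stated assumptions rather than the authors' procedure: continuous protein abundances would first need to be discretized, or handled with a nearest-neighbor estimator such as the one behind scikit-learn's `mutual_info_classif`. The `mutual_information` helper is hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of (x, y) pairs
    px = Counter(x)             # marginal counts of x
    py = Counter(y)             # marginal counts of y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# A feature that perfectly tracks a balanced binary label carries ln 2 nats;
# a feature independent of the label carries none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # ~0.693 (ln 2)
print(mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0
```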

      Reviewer #2:

      Summary:

      This is fine work on the development of computational approaches to detect cancer through exosomes. Exosomes are an emerging biomarker resource and have attracted considerable interest in the biomedical field. Kalluri and co-workers collected a large sample pool and used random forest to identify a group of protein markers that are universal to exosomes and to cancer exosomes. The results are very exciting and not only add new knowledge to cancer research but also provide a new and advanced method to detect cancer. The data were presented very nicely and the manuscript was well written.

      Strengths:

      Identified new biomarkers for cancer diagnosis via exosomes.

      Developed a new method to detect cancer non-invasively.

      Results were presented nicely and the manuscript was well written.

      Weaknesses:

      N/A.

      Response: We appreciate the enthusiastic assessment of our study by the reviewer.

      Reviewer #3:

      In the current study, Li et al. address the difficulty in early non-invasive cancer diagnosis due to the limitations of current diagnostic methods in terms of sensitivity and specificity. The study brings attention to exosomes - membrane-bound nanovesicles secreted by cells, containing DNA, RNA, and proteins reflective of their originating cells. Given the prevalence of exosomes in various biological fluids, they offer potential as reliable biomarkers. Notably, the manuscript introduces a new computational approach, rooted in machine learning, to differentiate cancers by analyzing a set of proteins associated with exosomes. Utilizing exosome protein datasets from diverse sources, including cell lines, tissues, and various biological fluids, the study spotlights five proteins as predominant universal exosome biomarkers. Furthermore, it delineates three distinct panels of proteins that can discern cancer exosomes from non-cancerous ones and assist in cancer subtype classification using random forest models. Impressively, the models based on proteins from plasma, serum, or urine exosomes achieve AUROC scores above 0.91, outperforming other algorithms such as Support Vector Machine, K Nearest Neighbor Classifier, and Gaussian Naive Bayes. Overall, the study presents a promising protein biomarker signature tied to cancer exosomes and proposes a machine learning-driven diagnostic method that could potentially revolutionize non-invasive cancer diagnosis.

      Response: We appreciate this positive assessment of our work.

      (1) The authors should clarify why they focused solely on protein markers. Why weren't RNA transcripts also considered? Do the authors see value in incorporating RNA/microRNA transcripts to enhance diagnostic capabilities?

      Response: This is a very important point for further consideration. The current datasets for exosomal proteins are extensive, and proteins may offer distinct advantages over nucleic acids in cancer diagnostics owing to their stability in exosomes and extended half-life (Schey et al., Methods, 2015). We agree that the power of the analysis can only improve if DNA, RNA, and other constituents are also included, and we hope to pursue such analyses in the future.

      (2) Can the identified exosomal markers also be evaluated as prognostic indicators?

      Response: We appreciate this intriguing question. Indeed, proteins such as apolipoprotein E (APOE) may serve as a potential prognostic marker in various cancers (Ren et al., Cancer Medicine, 2019). APOE is being extensively studied as a prognostic and diagnostic marker for multiple cancer types, including colorectal cancer (Martin et al., BMC Cancer, 2014), gastric cancer (Sakashita et al., Oncology Reports, 2008), pancreatic cancer (Chen et al., Medical Oncology, 2013; Xu et al., Tumor Biology, 2016), and human hepatocellular carcinoma (Yokoyama et al., International Journal of Oncology, 2006). In these studies, APOE levels were found to be elevated in the serum of cancer patients and correlated with survival outcomes.

      (3) The discussion should emphasize if the identified protein markers are tumor-specific or if they indicate, for instance, the patient's immune reaction to the tumor.

      Response: A good point. We believe that the identified biomarkers are tumor-specific and a significant number of these proteins have been previously associated with tumor initiation and progression. Further studies will likely identify immune response-related biomarkers when more in-depth tumor-level analyses are performed.

      References:

      Chen, J., Chen, L. J., Yang, R. B., Xia, Y. L., Zhou, H. C., Wu, W., Lu, Y., Hu, L. W., & Zhao, Y. (2013). Expression and clinical significance of apolipoprotein E in pancreatic ductal adenocarcinoma. Med Oncol, 30(2), 583. https://doi.org/10.1007/s12032-013-0583-y

      Hoshino, A., Kim, H. S., Bojmar, L., Gyan, K. E., Cioffi, M., Hernandez, J., Zambirinis, C. P., Rodrigues, G., Molina, H., Heissel, S., Mark, M. T., Steiner, L., Benito-Martin, A., Lucotti, S., Di Giannatale, A., Offer, K., Nakajima, M., Williams, C., Nogues, L., . . . Lyden, D. (2020). Extracellular Vesicle and Particle Biomarkers Define Multiple Human Cancers. Cell, 182(4), 1044-1061 e1018. https://doi.org/10.1016/j.cell.2020.07.009

      Kugeratski, F. G., Hodge, K., Lilla, S., McAndrews, K. M., Zhou, X., Hwang, R. F., Zanivan, S., & Kalluri, R. (2021). Quantitative proteomics identifies the core proteome of exosomes with syntenin-1 as the highest abundant protein and a putative universal biomarker. Nat Cell Biol, 23(6), 631-641. https://doi.org/10.1038/s41556-021-00693-y

      Martin, P., Noonan, S., Mullen, M. P., Scaife, C., Tosetto, M., Nolan, B., Wynne, K., Hyland, J., Sheahan, K., Elia, G., O'Donoghue, D., Fennelly, D., & O'Sullivan, J. (2014). Predicting response to vascular endothelial growth factor inhibitor and chemotherapy in metastatic colorectal cancer. BMC Cancer, 14, 887. https://doi.org/10.1186/1471-2407-14-887

      Ren, L., Yi, J., Li, W., Zheng, X., Liu, J., Wang, J., & Du, G. (2019). Apolipoproteins and cancer. Cancer Med, 8(16), 7032-7043. https://doi.org/10.1002/cam4.2587

      Sakashita, K., Tanaka, F., Zhang, X., Mimori, K., Kamohara, Y., Inoue, H., Sawada, T., Hirakawa, K., & Mori, M. (2008). Clinical significance of ApoE expression in human gastric cancer. Oncol Rep, 20(6), 1313-1319. https://www.ncbi.nlm.nih.gov/pubmed/19020708

      Schey, K. L., Luther, J. M., & Rose, K. L. (2015). Proteomics characterization of exosome cargo. Methods, 87, 75-82. https://doi.org/10.1016/j.ymeth.2015.03.018

      Xu, X., Wan, J., Yuan, L., Ba, J., Feng, P., Long, W., Huang, H., Liu, P., Cai, Y., Liu, M., Luo, J., & Li, L. (2016). Serum levels of apolipoprotein E correlates with disease progression and poor prognosis in breast cancer. Tumour Biol. https://doi.org/10.1007/s13277-016-5453-8

      Yokoyama, Y., Kuramitsu, Y., Takashima, M., Iizuka, N., Terai, S., Oka, M., Nakamura, K., Okita, K., & Sakaida, I. (2006). Protein level of apolipoprotein E increased in human hepatocellular carcinoma. Int J Oncol, 28(3), 625-631. https://www.ncbi.nlm.nih.gov/pubmed/16465366

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study presents fundamental new insights into vesicular monoamine transport and the binding pose of the clinical drug tetrabenazine (TBZ) to the mammalian VMAT2 transporter. Specifically, this study reports the first structure for the mammalian VMAT (SLC18) family of vesicular monoamine transporters. It provides insights into the mechanism by which this inhibitor traps VMAT2 into a 'dead-end' conformation. The structure also provides some evidence for a novel gating mechanism within VMAT2, which may have wider implications for understanding the mechanism of transport in the wider SLC18 family.

      Strengths:

      The structure is high quality, and the method used to determine the structure via fusing mVenus and the anti-GFP nanobody to the amino and carboxyl termini is novel. The binding and transport data are convincing and provide new insights into the role of conserved side chains within the SLC18 members. The binding position of TBZ is of high value, given its role in treating Huntington's chorea and for being a 'dead-end' inhibitor for VMAT2.

      We thank reviewer #1 for their constructive comments and input, which we feel have greatly improved the manuscript.

      Reviewer #2 (Public Review):

      This public review is the same review that was posted earlier and has not been updated in response to our comments or to the revised manuscript. Please see our earlier response to these comments. We thank reviewer #2 for their input and we have incorporated many of these suggestions into our revised manuscript. With regard to the question of ‘how TBZ got there’, we have revised this sentence in the discussion to be more speculative. As pointed out earlier, our interpretation of the structure is based on a wealth of experimental and structural data which support our interpretations. Thus, our conclusions have not been overstated. This has been explained in our earlier public response and these key studies have been cited throughout the manuscript. We also note that reviewer #3 found the AlphaFold comparisons to be quite meaningful.

      Overview:

      As a report of the first structure of VMAT2, indeed the first structure of any vesicular monoamine transporter, this manuscript represents an important milestone in the field of neurotransmitter transport. VMAT2 belongs to a large family (the major facilitator superfamily, MFS) containing transporters from all living species. There is a wealth of information relating to the way that MFS transporters bind substrates, undergo conformational changes to transport them across the membrane and couple these events to the transmembrane movement of ions. VMAT2 couples the movement of protons out of synaptic vesicles to the vesicular uptake of biogenic amines (serotonin, dopamine and norepinephrine) from the cytoplasm. The new structure presented in this manuscript can be expected to contribute to an understanding of this proton/amine antiport process.

      The structure contains a molecule of the inhibitor TBZ bound in a central cavity, with no access to either luminal or cytoplasmic compartments. The authors carefully analyze which residues interact with bound TBZ and measure TBZ binding to VMAT2 mutated at some of those residues. These measurements allow well-reasoned conclusions about the differences in inhibitor selectivity between VMAT1 and VMAT2 and differences in affinity between TBZ derivatives.

      The structure also reveals polar networks within the protein and hydrophobic residues in positions that may allow them to open and close pathways between the central binding site and the cytoplasm or the vesicle lumen. The authors propose involvement of these networks and hydrophobic residues in coupling of transport to proton translocation and conformational changes. However, these proposals are quite speculative in the absence of supporting structures and experimentation that would test specific mechanistic details.

      Critique:

      Although the structure presented in this MS is clearly important, I feel that the authors have overstated several of the conclusions that can be drawn from it. I don't agree that the structure clearly indicates why TBZ is a non-competitive inhibitor; the proposal that specific hydrophobic residues function as gates will depend on lumen- and cytoplasm-facing structures for verification; the polar networks could have any number of functions - indeed it would be surprising if they were all involved in proton transport. Several of these issues could be resolved by a clearer illustration of the data, but I believe that a more rigorous description of the conclusions and where they fall between firm findings and speculation would help the reader put the results in perspective.

      Non-competitive inhibition occurs when the action of an inhibitor can't be overcome by increasing substrate concentration. The structure shows TBZ sequestered in the central cavity with no access to either cytoplasm or lumen. The explanation of competitive vs non-competitive inhibition depends entirely on how TBZ got there. If it bound from the cytoplasm, cytoplasmic substrate should have been able to compete with TBZ and overcome the inhibition. If it bound from the lumen, or from within the bilayer, cytoplasmic substrate would not be able to compete, and inhibition would be non-competitive. The structure does not tell us how TBZ got there, only that it was eventually occluded from both aqueous compartments and the bilayer.

      The issue of how VMAT2 opens access to the central binding site from luminal and cytoplasmic sides is an important and interesting one, and comparison with other MFS structures in cytoplasmic-open or extracellular/luminal-open is a very reasonable approach. However, any conclusions for VMAT2 should be clearly indicated as speculative in the absence of comparable open structures of VMAT2. As a matter of presentation, I found the illustrations in ED Fig. 6 to be less helpful than they could have been. Specifically, illustrations that focus on the proposed gates, comparing that region of the new structure with the corresponding region of either VGLUT or GLUT4 would better help the reader to compare the position of the proposed gate residues with the corresponding region of the open structure. I realize that is the intended purpose of ED Fig. 6b and 6c, but currently, those show the entire protein and a focus on the gate regions might make the proposed gate movements clearer. I also appreciate the difference between the Alphafold prediction and the new structure, but I'm not convinced that ED Fig. 6a adds anything helpful.

      The polar networks described in the manuscript provide interesting possibilities for interactions with substrates and protons whose binding to VMAT2 must control conformational change. Aside from the description of these networks, there is little evidence presented to assess the role of these networks in transport. Are the networks conserved in other closely related transporters? How could the interaction of the networks with substrate or protons affect conformational change? Of course, any potential role proposed for the networks would be highly speculative at this point, and any discussion of their role should point out their speculative nature and the need for experimental verification. Some speculation, however, can be useful for focusing the field's attention on future directions. However, statements in the abstract (three distinct polar networks... play a role in proton transduction.) and the discussion (...are likely also involved in mediating proton transduction.) should be clearly presented as speculation until they are validated experimentally.

      The strongest aspect of this work (aside from the structure itself) is the analysis of TBZ binding. I will comment on some minor points below, but there is one problematic aspect to this analysis. The discussion on how TBZ stabilizes the occluded conformation of VMAT2 is premature without structures of apo-VMAT2 and possibly structures with other ligands bound. We don't really know at this point whether VMAT2 might be in the same occluded conformation in the absence of TBZ. Any statements regarding the effect of interactions between VMAT2 and TBZ depend on demonstrating that TBZ has a conformational effect. The same applies to the discussion of the role of W318 on conformation and to the loops proposed to "occlude the luminal side of the transporter" (line 131).

      The description of VMAT2 mechanism makes many assumptions that are based on studies with other MFS transporters. Rather than stating these assumptions as fact (VMAT2 functions by alternating access...), it would be preferable to explain why a reader should believe these assumptions. In general, this discussion presents conclusions as established facts rather than proposals that need to be tested experimentally.

      The MD simulations are not described well enough for a general reader. What is the significance of the different runs? ED Fig. 4d is not high enough resolution to see the details.

      Reviewer #3 (Public Review):

      Summary:

      The vesicular monoamine transporter is a key component in neuronal signaling and is implicated in diseases such as Parkinson's. Understanding of monoamine processing and our ability to target that process therapeutically has been to date provided by structural modeling and extensive biochemical studies. However, structural data is required to establish these findings more firmly.

      Strengths:

      Dalton et al. resolved a structure of VMAT2 in the presence of an important inhibitor, tetrabenazine, with the protein in detergent micelles, using cryo-EM and with the aid of protein domains fused to its N- and C-terminal ends, including one fluorescent protein that facilitated protein screening and purification. The resolution of the maps allows clear assignment of the amino acids in the core of the protein. The structure is in good agreement with a wealth of experimental and structural prediction data, and provides important insights into the binding site for tetrabenazine and selectivity relative to analogous compounds. The authors provide additional biochemical analyses that further support their findings. The comparison with AlphaFold models is enlightening.

      We appreciate this summary and thank reviewer #3 for their helpful suggestions to improve the manuscript.

      Weaknesses:

      The authors follow up their structures with molecular dynamics simulations of the tetrabenazine-bound state, and test several protonation states of acidic residues in the binding pocket, but not all possible combinations; thus, it is not clear the extent to which tetrabenazine rearrangements observed in these simulations are meaningful. Additional simulations of the substrate dopamine docked into this structure were also carried out, although it is unclear whether this "dead-end" occluded state is a relevant state for dopamine binding. The authors report release of dopamine during these simulations, but it is notable that this only occurs when all four acidic binding site residues were protonated and when an enhanced sampling approach was applied.

      As an occluded neurotransmitter-bound structure has yet to be solved experimentally, it is not possible to determine whether this state resembles the docked dopamine structure. However, it is reasonable to hypothesize that this is a relevant state for dopamine binding and, if so, these simulations would be of great interest. The MD simulations we performed are logical, based on the calculated pKa values of the residues and the known pH of the vesicle lumen (5.5). Note that we have carried out a total of more than 2 microseconds of simulations, which required significant computing time and memory allocation for the current runs in explicit water and membrane. Investigating all possible combinations would require at least 16 independent simulations (every protonated/deprotonated combination of the four highlighted acidic residues alone), each performed in duplicate, not including proper experimental replicates. We do not believe this to be feasible, nor necessary, given that the selected combinations were based on a rational evaluation of on-path amino acids that were assessed to be potentially protonated.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editor for organizing the review of our manuscript. We have carefully read and analyzed the reviewers’ comments, addressed each criticism point-by-point as outlined below, and modified the manuscript and figures accordingly. In this regard, we would also like to take the opportunity to thank both reviewers for their thoughtful suggestions for improvement of our manuscript. We believe that our manuscript has improved as a result, and hope that it is now suitable for publication.

      Public Reviews:

      Reviewer #1 (Public Review):

      Addressing the problem that Staphylococcus aureus can cause apoptosis of macrophages, the authors found and verified that the drug (R)-DI-87 inhibits mammalian deoxycytidine kinase (dCK), weakens the killing effect of Staphylococcus aureus on macrophages, reduces macrophage apoptosis, and increases the infiltration of macrophages into the abscess, thus reducing the damage caused by Staphylococcus aureus to the host. This work provides new insights and ideas for understanding the effects of Staphylococcus aureus infection on host immunity and for discovering corresponding therapeutic interventions.

      The logic of the study is commendable, and the design is reasonable.

      Some data related to the conclusion of the paper need to be supplemented, and some experimental details need to be described.

      Response: We thank the reviewer for the positive feedback along with the detailed and knowledgeable analysis of this paper. Specific details and comments on all raised concerns can be found below.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Winstel and colleagues test whether the deoxycytidine kinase inhibitor (R)-DI-87 provides therapeutic benefit during infection with Staphylococcus aureus. The premise behind the current work is a series of prior studies that found that S. aureus can disable functional immune clearance by generating NET-derived deoxyribonucleosides to induce macrophage apoptosis via purine salvage. Here, the authors use in vitro and in vivo experiments with (R)-DI-87 to demonstrate that inhibition of deoxycytidine kinase prevents S. aureus-induced deoxyribonucleoside-mediated macrophage cell death, bolstering immune cell function and promoting more effective clearance during infection. The authors conclude that (R)-DI-87 represents a potentially important Host-Directed Therapy (HDT) with good potential to promote natural clearance of infection without targeting the bacterium. Overall, the study represents an important next step in the exploration of purine salvage and deoxyribonucleoside toxicity as a targetable pathway to bolster infection clearance and provides early-stage evidence of the therapeutic potential of (R)-DI-87 during S. aureus infection.

      Response: We thank the reviewer for the thoughtful suggestions for improvement of our manuscript. Specific details and comments on all raised concerns can be found below.

      Strengths:

      The study has several strengths that support its conclusions:

      (1) Well-controlled in vitro studies that firmly establish (R)-DI-87 is capable of blocking deoxyribonucleoside-mediated apoptosis of immune cell lines and primary cells.

      (2) Solid evidence to support that administration of (R)-DI-87 can have therapeutic benefits during infection (reduced number of abscesses and reduced CFU).

      (3) Controls included to ascertain the degree to which (R)-DI-87 might have secondary effects on immune cell distribution.

      (4) Controls included to ascertain whether or not (R)-DI-87 has intrinsic antibacterial properties.

      Weaknesses:

      However, there are several important weaknesses related to the rigor of the research and the conclusions drawn. The most relevant weaknesses noted by this reviewer are:

      (1) Conclusions about the therapeutic potential of (R)-DI-87 are drawn using only S. aureus strain Newman, a methicillin-susceptible S. aureus that, while a clinical isolate, is not clearly representative of the strains of S. aureus causing infection in hospitals and communities. Newman also harbors an unusual mutation in a regulator that dramatically changes virulence factor gene expression. While the data with Newman remain valuable, the absence of consideration of other strains, including MRSA, makes it more difficult to support the relatively broad conclusions about therapeutic potential made by the authors.

      Response: We assume that this is a misunderstanding. S. aureus Newman is a patient-derived isolate and not a regulator mutant and/or laboratory strain (Duthie and Lorenz 1952, J Gen Microbiol 6(1-2):95-107). Its genome is fully sequenced (Baba et al. 2008, J Bacteriol 190(1):300-10) and it is highly virulent in mouse or human ex vivo models (e.g. Alonzo 3rd et al. 2013, Nature 493(7430):51-5; DuMont et al. 2011, Mol Microbiol 79(3):814-25; Skaar et al. 2004, Science 305(5690):1626-8). Moreover, S. aureus Newman has served as a gold standard to study abscess formation in the past (e.g. Thammavongsa et al. 2013, Science 342(6160):863-6; Cheng et al. 2009, FASEB J 23(10):3393-404; Corbin et al. 2008, Science 319(5865):962-5) and has also been used multiple times to test the therapeutic efficacy of antimicrobial or anti-infective agents in various animal models of infectious disease (e.g. Buckley et al. 2023, Cell Host Microbe 31(5):751-765.e11; Zhang et al. 2014, PNAS 111(37):13517-22; Richter et al. 2013, PNAS 110(9):3531-6). Apart from this, it is crucial to note that methicillin-sensitive isolates such as S. aureus Newman are typically isolated in hospitals more frequently than MRSA. Specifically, public health system- and population-based surveillance studies clearly indicate that annual incidence rates for MSSA infections are higher than those for MRSA infections (e.g. Gagliotti et al. 2021, Euro Surveill 26(46):2002094; Jackson et al. 2020, Clin Infect Dis 70(6):1021-1028; Laupland et al. 2013, Clin Microbiol Infect 19(5):465-71), even in groups at elevated risk (e.g. McMullan et al. 2016, JAMA Pediatr 170(10):979-986; Ericson et al. 2015, JAMA Pediatr 169(12):1105-11). Although we understand and agree with the reviewer that certain MRSA clones can be a dominant cause of staphylococcal disease in specific geographic areas, we believe that S. aureus Newman adequately reflects the staphylococcal isolates that cause the majority of infections in humans. In this regard, we would also like to highlight once more that (R)-DI-87 targets host dCK and not the bacterium. Accordingly, the antibiotic resistance status of S. aureus is not expected to affect our main findings and conclusions, as (R)-DI-87 exclusively inhibits dCK, a key element of the mammalian purine salvage pathway.

      (2) In vitro (R)-DI-87 efficacy studies with dAdo and dGuo are strong; however, the authors do not test the in vitro efficacy of (R)-DI-87 using S. aureus. They have done this type of work in prior studies (See doi: 10.1073/pnas.1805622115 - Figure 5). If included, it would greatly strengthen their argument that (R)-DI-87 is directly affecting the S. aureus --> Nuclease --> AdsA macrophage-killing pathway. Without it, the evidence provided remains indirect, and several conclusions may be overstated.

      Response: We highly appreciate this comment and agree with the reviewer that such an experiment would support our main findings. Thus, we have performed additional experiments and took advantage of a previously described approach (Tantawy et al. 2022, Front Immunol 13:847171) to demonstrate that (R)-DI-87-mediated inhibition of host dCK enhances macrophage survival upon treatment with culture media that had been conditioned by incubation with adsA-proficient or adsA-deficient staphylococci in the presence or absence of purine deoxyribonucleoside monophosphates. Our findings are described in the main text and in a new figure (Fig. 2K-L). Based on these new findings, and together with our rAdsA-based approach (Fig. 2I-J), we are confident that (R)-DI-87 represents a suitable small-molecule inhibitor of host dCK that can prevent host immune cell death induced by toxigenic products associated with the S. aureus Nuc/AdsA pathway.

      (3) Caspase-3 immunoblot experiments seem to suggest an alternative conclusion to what was made by the authors. They point out that Caspase-3 cleavage does not occur upon treatment with (R)-DI-87. However, the data seem to argue that there is almost no caspase-3 present in (R)-DI-87 treated cells (cleaved or uncleaved). Might this suggest that caspase-3 is not even produced when cells are not experiencing deoxyribonucleoside toxicity? Perhaps the authors could reconsider the interpretation of this data.

      Response: We believe that this is a misunderstanding. Our immunoblots (Fig. 3E-F) show only the processed forms of caspase-3. The antibody we used recognizes full-length caspase-3 along with the p17 and p19 subunits that can result from cleavage. To clarify this point, we have slightly modified our main figure and provide the full immunoblots (Source data file), which clearly demonstrate that unprocessed caspase-3 (pro-caspase-3) is present in all samples. In this regard, we further note that caspase-3 can also form heterocomplexes with other proteins, presumably explaining some of the unknown bands in samples obtained from cells that have been exposed to death-effector deoxyribonucleosides. Additional bands are probably a result of cross-reactivity of the antibody and/or unspecific degradation of pro-caspase-3 in cellular lysates.

      (4) There are some concerns over experimental rigor and clarity of the experimental design in the methods. The most important points noted by this reviewer are included here. (a.) There is no description of the number of replicates or representation of the Western blots and no uncropped blots are provided. (b.) the methods describing the treatment conditions for in vivo studies are not sufficiently clear. For example, it is hard to tell when (R)-DI-87 is first administered to mice. Is it immediately before the infection, immediately after, or at the same time? This has important implications for interpreting the results in terms of therapeutic potential. (c.) There are several statements made that (R)-DI-87 does not have a negative impact on the mice however, it is not sufficiently clear that the studies conducted are sufficient to make this broader claim that (R)-DI-87 has no impact on the animal, except as it relates to the distribution of immune cells, which is directly tested. (d.) there are no quantitative measures of apoptosis or macrophage infiltration, which impacts the rigor of these imaging experiments. (e.) only female mice are used in the in vivo studies. There is no justification provided for this choice; however, the rigor of the study design and the ability to draw conclusions about therapeutic potential is impacted in the absence of consideration of both sexes.

      Response: Thank you for raising these points. (a) We have modified our figure legend and provide the full immunoblots (Source data file) to clarify this point. (b) We now provide more experimental details on the treatment conditions that were used to administer (R)-DI-87 to mice (methods section). (c) Furthermore, we have conducted new experiments to demonstrate that administration of (R)-DI-87 has no impact on laboratory animals. Specifically, we provide new data along with additional text on organ cellularity following long-term exposure of mice to (R)-DI-87. In this regard, we have also applied our immuno-phenotyping approach to spleen tissue samples derived from mice that received (R)-DI-87 or vehicle. As outlined in our new results, neither developmental errors nor differences in lymphocyte development were observed (new Fig. 4B-C; new supplementary Fig. 3). Together with our data on mouse body weight, our immuno-phenotyping of blood cells (Fig. 4A and 4D), and the fact that (R)-DI-87 is extremely well tolerated in humans (personal communication; Kenneth A. Schultz, Trethera Corporation, Los Angeles, CA, USA), we are very confident that application of (R)-DI-87 is safe and has no detrimental impact on the host. (d) We would also like to point out that, due to the densely packed and extremely sticky cuff of immune cells within staphylococcal abscesses, it is technically not possible to extract enough abscess material for a reliable quantification of apoptotic macrophages within infectious foci. Such an analysis would also not allow us to differentiate between lesion-infiltrating macrophages and macrophages that may reside at the periphery of the abscess. For these reasons, we have established a fluorescence microscopy-based approach to demonstrate increased macrophage infiltration rates into abscesses formed in organs of mice that have been treated with the dCK-specific inhibitor (R)-DI-87 (Fig. 5A-P). Nonetheless, we have slightly modified this figure and its legend to help readers localize S. aureus-derived tissue lesions and the periphery of abscesses in these images. (e) Finally, publicly available databases indicate that dCK is expressed at comparable levels in various tissues in both sexes. Moreover, dCK is not encoded on a sex chromosome in either mice or humans. Thus, we believe that it is justified to test the in vivo efficacy of (R)-DI-87 in female mice. Nonetheless, we have conducted additional in vitro experiments to test whether (R)-DI-87 can protect BMDMs derived from male animals from death-effector deoxyribonucleosides in a manner similar to cells derived from female mice. As expected, we did not observe a sex-specific effect (new supplementary Fig. 5), and we hope that this adequately addresses this point.

      (5) Animal studies show significant disease burden (CFU) even after administration of (R)-DI-87. Given the absence of robust clearance of infection, the authors' claims read as an overstatement of the data. The authors may wish to reframe their conclusions to better highlight the potential benefit of this therapy in reducing severe disease, but also to point out relevant limitations, especially considering that it does not lead to clearance in this model. In general, consideration of the limitations of the proposed therapeutic approach, as uncovered by the data, is missing. A more nuanced consideration of the data and its interpretations, including both strengths and limitations, would greatly help to frame the study.

      Response: Thank you for raising this point. To highlight the limitations of our approach, we have modified several passages in the main text. Moreover, we have adjusted our discussion section accordingly.

      Reviewer #1 (Recommendations For The Authors):

      (1) In the in vivo experiments, the dose given to mice was 75 mg/kg. How did the authors determine the dose of this drug?

      Response: We thank the reviewer for this question, which gives us the chance to clarify this point. The experimental condition used to block host dCK in mice has been adopted from a previous publication (Chen et al. 2023, Immunology 168(1):152-169). To improve the overall quality of our current manuscript, we now included more background information addressing this point. Specifically, we have added additional in vivo and biochemical data along with more conclusive text to our results section to better explain the reason for the dose given to mice (new Fig. 4E).

      (2) The authors established a mouse model of Staphylococcus aureus bloodstream infection in vivo and divided the mice into four groups for the related experiments. It is suggested that the authors supplement the survival rate of mice in each group so that readers can understand the effect of the drug on the survival of mice with bloodstream infection.

      Response: While this is an interesting suggestion by the reviewer, we believe that this is beyond the scope of our study. In particular, the current study focused on analyzing the capacity of the dCK-specific inhibitor (R)-DI-87 to improve macrophage survival during staphylococcal abscess formation in an effort to lower bacterial loads in infected organ tissues. However, we agree with the reviewer that (R)-DI-87 might also help to improve further clinical syndromes of staphylococcal infections, including lethal bloodstream infection. We therefore modified parts of our discussion to address this point.

      (3) In the in vivo experiment, the authors administered the drug intragastrically, but the treatment targeted a Staphylococcus aureus bloodstream infection, so the authors need to determine the actual effective concentration of the drug in the blood of mice.

      Response: We thank the reviewer for this comment and agree that inclusion of more background information and data would be a valuable addition to our manuscript. As outlined above, we have designed our in vivo experiments based on the methodology of a previous publication (Chen et al. 2023, Immunology 168(1):152-169). Similar to Chen and colleagues, we have also used a dose of 75 mg/kg of (R)-DI-87, which allows complete inhibition of host dCK in vivo. We have now performed additional in vivo experiments to address this point. More precisely, we took advantage of a highly sensitive LC-MS/MS-based method to measure accumulation of deoxycytidine, the natural substrate of host dCK, in mouse plasma upon administration of the dCK-specific inhibitor. As shown in our new Fig. 4E, administration of (R)-DI-87 at a dose of 75 mg/kg strongly increased deoxycytidine levels in mouse plasma, indicating that host dCK activity is completely blocked under these experimental conditions.

      (5) This work reduces the apoptosis of macrophages through pharmacological inhibition of dCK but does not directly inhibit the virulence of Staphylococcus aureus. Therefore, it is suggested that the authors modify the title to summarize the whole paper more accurately.

      Response: We agree with the reviewer that our manuscript’s title might be a bit misleading as (R)-DI-87 does not directly target the bacterium or staphylococcal virulence factors. Thus, we have modified the title of our revised manuscript to: “Targeting host deoxycytidine kinase mitigates Staphylococcus aureus abscess formation”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Adelus and colleagues investigates the snRNA sequencing of endothelial cells isolated from deceased heart donor aortic trimmings. From n=6 donors, the authors have identified 5 distinct endothelial cell (EC) populations. The expression levels of a set of genes are different among the different donors and different EC clusters. Furthermore, treatment with IL1B, TGFB, or ERGsi decreased the proportion of some of these clusters and increased others, with some migratory and ECM-producing capacity. Another interesting observation in this study is that IL-1B alone induces a shift in the clusters and that is different from the TGFB-induced cells. However, ex vivo analyses showed most of the TGFB-induced population matched the in vitro observations. Another interesting finding of the work is that the authors detected SNPs linked to chromatin accessibility to the set of genes identified within these EC populations.

      Strengths:

      Overall, the work is intriguing and has some novel aspects to it, especially the link between EC-derived EndMT in culture and ex vivo atherosclerotic samples.

      In summary, we thank Reviewer #1 for raising questions that prompted new speculations and clarifications of our data. We hope this Reviewer will now find our revised manuscript suitable for publication.

      Weaknesses:

      The experiments are lacking in controls, the purity of the isolation, and the use of multiple donors (deceased hearts) to draw conclusions. The lack of validation of the work is a concern.

      We thank Reviewer #1 for raising these concerns. Controls were not available in the public in vivo data, likely due to the systemic nature of coronary artery disease (CAD) and the logistical difficulty in obtaining arterial samples from healthy participants. With respect to our in vitro data, controls were included in the design. We agree that it is critical to validate functions of endothelial cell (EC) populations with functional studies, and this is the subject of ongoing and future work. Regarding asymmetry of donors, we aimed to have at least three replicate donors per condition. In the study design, we had to load genetically different donors per 10x lane, which is why we utilized different donors for each condition. We address the purity of isolation in our response to Reviewer #2 below.

      Reviewer #2 (Public Review):

      This study by Adelus et al. profiled the transcriptome and chromatin accessibility in cultured human aortic endothelial cells (ECs) at single-cell resolution. They also stimulated these cells with EC-activating agents, such as IL1b or TGFB2, or used siERG to knock down this master transcription factor in ECs. The results show a subpopulation, EC3, with the highest plasticity and sensitivity to perturbations. The authors also reviewed and meta-analyzed three independent publicly available scRNA-seq datasets, identifying two distinct EC subpopulations. Additionally, they aligned CAD-related SNPs with open chromatin regions in EC subpopulations. This study provides fundamental evidence to enrich our understanding of vascular ECs and highlights potential subpopulations that may contribute to health and diseases. The work has potential impact in the field. While the manuscript is comprehensive, there are some concerns that should be addressed.

      (1) My major concern is whether EC4 is derived from ECs. It seems that EC4 showed a lesser reaction to those perturbations and had lower expression levels of EC marker genes. Did the authors evaluate the purity of their isolated HAECs? Please discuss the potential cell lineage mapping of EC4.

      We thank Reviewer #2 for raising the question on the purity of isolation. We have now included this in the Discussion:

      “A major question raised by this work is the origin of cells in the mesenchymal cluster EC4. We originally hypothesized this cluster was the result of EndMT, which led to our investigations as to whether we could leverage EndMT-promoting exposures (IL1B, TGFB2, siERG) in vitro to observe an expansion of treated cells in the EC4 population. To our surprise, the EC4 population did not expand. If anything, these exposures reduced the proportion of cells in EC4 (Figure 4). Nonetheless, it remains a possibility that EC4 represents cells that had undergone EndMT in vivo prior to culture and that the exposures we presented in vitro were not sufficient to elicit complete EndMT. Another viable hypothesis is that cells in EC4 are of SMC origin and have persisted in culture alongside their EC counterparts. Cells used in this study were isolated by luminal collagenase digestion of explanted aortic segments and were tested at early passage for EC phenotypic markers including VWF expression, cobblestone morphology, and uptake of acetylated LDL. Notably, these rigorous checks of EC purity were performed prior to our group’s studies. In addition, if some of the isolated cells had undergone EndMT in vivo prior to isolation, it would be nearly impossible to distinguish their cell of origin after isolation, since their molecular phenotype would resemble that of SMCs. Without lineage tracing, which is currently not possible in human tissue explants, cell origin cannot be determined. Nonetheless, this remains an important issue that is the subject of ongoing investigations. What we can confidently discern from these data is that these distinct cell subpopulations respond differently to the disease-relevant exposures of IL1B, TGFB2, and ERG depletion.”

      (2) Although all the donors are de-identified, is there any information about the severity of their vascular impairment, particularly in the case of patient 5, who exhibits the unique EC5?

      All donors are de-identified, and we only have access to their genotypes. We have now clarified this in Methods, “Tissue Procurement and Cell Culture”: “Primary HAECs were isolated from eight de-identified deceased heart donor aortic trimmings (belonging to three females and five males of Admixed American, European, and East Asian ancestries) at the University of California Los Angeles Hospital as described previously (42) (Table S7 in the Data Supplement). The only clinically relevant information collected for each donor was their genotype (Methods, “Genotyping and Multiplexing Cell Barcodes for Donor Identification”).”

      (3) The meta-analysis of the published datasets is comprehensive. The identified EC heterogeneity corresponds to their in vitro data. I am wondering, in terms of transcriptome, is there any similarity between endo1 and EC1/EC2, and also endo2 and EC3/EC4?

      This was addressed in Results, “Ex Vivo-derived Module Score Analysis Reveals Differences among In Vitro EC Subtypes and EndMT Stimuli”: “Cells scoring high for Endo1 are concentrated in the in vitro EC1 cluster, while cells scoring high in Endo2 are concentrated to the in vitro EC3 locale (Figure S7B-E in the Data Supplement).”

      (4) The in vitro data indicates that EC3 shows the highest plasticity and sensitivity to perturbations, which may act as the major subtype of ECs responding to risk factors. It's very interesting that CAD-related SNPs do not seem to be enriched in EC3. Please discuss this discrepancy.

      We thank Reviewer #2 for bringing up this interesting point, which we have now included in our Discussion: “While EC3 was found to be more sensitive to perturbations in our in vitro experiments, we did not expect to see CAD-related SNPs enriched in EC3 because plasticity does not necessarily imply a pathological process. Moreover, while EC3 and EC4 both have mesenchymal phenotypes, EC3 may represent a reversible state that is lacking in EC4. This hypothesis would explain the enrichment of EC4, but not EC3, in CAD-related SNPs.”

      (5) The last sentence in the legend of Figure 1 seems incomplete: 'Module scores are generated for each cell barcode with Seurat function AddModuleScore().'

      We have made changes to this sentence so that it now reads: “Module scores are generated for each cell barcode with the Seurat function AddModuleScore().”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript by Adelus and colleagues investigates the snRNA sequencing of endothelial cells isolated from deceased heart donor aortic trimmings. From n=6 donors, the authors have identified 5 distinct endothelial cell (EC) populations. The expression levels of a set of genes are different among the different donors and different EC clusters. Furthermore, treatment with IL1B, TGFB, or ERGsi decreased the proportion of some of these clusters and increased others, with some migratory and ECM-producing capacity. Another interesting observation in this study is that IL-1B alone induces a shift in the clusters and that is different from the TGFB-induced cells. However ex vivo analyses showed most of the TGFB-induced population are the ones that matched the in vitro observations. Another interesting finding of the work is that the authors detected SNPs linked to chromatin accessibility to the set of genes identified within these EC populations. Overall, the work is intriguing and has some novel aspects to it, especially the link between EC-derived EndMT in culture and comparing that with ex vivo atherosclerotic samples. However, the experiments are lacking in controls, the purity of the isolation, and the use of multiple donors (deceased hearts) to draw conclusions. The lack of validations for the work is a huge concern. Additional major and minor concerns are:

      Major concerns:

      (1) Abstract: line 15: ECs are a major cell type in atherosclerosis progression - That is a bold statement: What about macrophages and VSMCs?

      We have made changes to this sentence so that it now reads: “Endothelial cells (ECs), macrophages, and vascular smooth muscle cells (VSMCs) are major cell types in atherosclerosis progression, and heterogeneity in EC sub-phenotypes is becoming increasingly appreciated.”

      (2) Methods: The cells were isolated from the deceased heart by a device? What kind of device? Is it a standard method? Please show a figure or data demonstrating the purity of the isolates. Also, the authors mentioned that they assessed EC function, but no figure shows this. Why were the cells treated with fibronectin?

      We thank Reviewer #1 for bringing this to our attention. We did not isolate and identify the cells ourselves. This was done in a prior study as described in reference 41. The only function of the device was to hold the explanted aortic tissue in place so that the luminal surface could be digested with collagenase. We have made edits to clarify these points in Methods, “Tissue Procurement and Cell Culture”: “HAECs were isolated from the luminal surface of the aortic trimmings using collagenase, and identified by Navab et al. using their typical cobblestone morphology, presence of Factor VIII-related antigen, and uptake of acetylated LDL labeled with 1,1’-dioctadecyl-3,3,3’,3’-tetramethylindocarbocyanine perchlorate (DiI-Ac-LDL) (42).”

      (3) Why did the authors elect to treat each donor cell with different treatment types and different concentrations, also why 1ng/ml of IL-1B?

      We have addressed the study design asymmetry above. We chose the treatments because we questioned whether HAECs responded heterogeneously to these stimuli. We were interested in using these stimuli, because they have previously been used in vitro to induce EndMT and/or inflammation, two major pathophysiological processes in CAD. This is outlined in the Introduction: “We also quantified single cell responses to three perturbations known to be important in EC biology and atherosclerosis. The first was activation of transforming growth factor beta (TGFB) signaling, which is a hallmark of phenotypic transition and a regulator of EC heterogeneity (20, 30). The second was stimulation with the pro-inflammatory cytokine interleukin-1 beta (IL1B), which has been shown to model inflammation and EndMT in vitro (31-35), and whose inhibition reduced adverse cardiovascular events in a large clinical trial (36). The third perturbation utilized in our study was knock-down of the ETS related gene (ERG), which encodes a transcription factor of critical importance for EC fate specification and homeostasis (37-41).”

      (4) The justification for comparing the EC populations with siERG is unclear. This was detected as the highest in EC2, but EC2 is not the main cell type across the donors.

      We include a justification for comparing the EC populations with siERG in the Introduction:

      “There are notable benefits and limitations for studying heterogeneity using in vitro and in vivo approaches in atherosclerosis research. In vitro approaches provide unique opportunities for interrogating consequences of genetic and chemical perturbations in highly controlled environments and are adept at identifying mechanistic relationships on accelerated timelines.”

      …and…

      “We… quantified single cell responses to three perturbations known to be important in EC biology and atherosclerosis…The third perturbation utilized in our study was knock-down of the ETS related gene (ERG), which encodes a transcription factor of critical importance for EC fate specification and homeostasis (37-41).”

      Notably, we found the highest proportion of cells in EC3 with siERG, not EC2:

      The one cluster exhibiting increased proportions of cells upon EndMT perturbations was EC3, with 3 of 4 IL1B-exposed donors having increased proportions in EC3 (p = 0.08 by 2-sided paired t-test; Figure 3A), 4 of 5 TGFB2-exposed donors having increased proportions (p = 0.04 by 2-sided paired t-test; Figure 3A), and 3 of 3 donors having increased EC3 proportions upon ERG knock-down (Figure 3B).

      (5) The proportions of clusters per donor and their responses differ. These donors are from deceased hearts; could postmortem processes induce changes in the ECs? Could the presence of SMC pathways in their analysis indicate SMC contamination within the isolation rather than EndMT?

      We have now included the possibility of postmortem effects in the Discussion:

      “We cannot exclude the possibility that EC3 is an EndMT cluster, although we would have expected more significant deviation from clusters EC1 and EC2. It is also possible that postmortem processes induced changes in the ECs, or that the duration and doses of perturbations chosen were not sufficient to elicit complete EndMT.”

      As noted above, we address the purity of isolation in the Discussion.

      (6) Figure 4A is confusing, what do the dots indicate and the intersection size mean? What is the difference between Figure 4 C and 4 E?

      We have added a description of rows and columns to the legend for Figure 4A:

      “(A), Upset plots of up- and down-regulated DEGs across EC subtypes with siERG (grey), IL1B (pink), and TGFB2 (blue). Upset plots visualize intersections between sets in a matrix, where the columns of the matrix correspond to the sets, and the rows correspond to the intersections. Intersection size represents the number of genes at each intersection.”

      Figure 4E depicts up- and down-regulated DEGs that are mutually exclusive and shared between IL1B and siERG in EC3, whereas Figure 4C depicts up- and down-regulated DEGs with IL1B alone compared to siSCR in EC2, EC3, and EC4. This is described in the legends for Figure 4C and Figure 4E:

      “C), PEA for EC2-4 up- and down-regulated DEGs with IL1B compared to control media… (E), PEA comparing up- and down-regulated DEGs that are mutually exclusive and shared between IL1B and siERG in EC3.”

      (7) VSMCS 5 in Figure 5 is interesting, but it could be contaminated with SMCs in your EC population and they are SMCs indeed with some mesenchymal transdifferentiation?

      As noted above, we address the purity of isolation in the Discussion.

      Minor concerns:

      (1) All growth supplements, kits, and reagents should be provided with their sources and catalogue numbers.

      Sources and catalogue numbers have now been added to the following Methods sections:

      “Tissue Procurement and Cell Culture”: “Cells were grown in culture in M-199 (ThermoFisher Scientific, Waltham, MA, cat. no. MT-10-060-CV) supplemented with 1.2% sodium pyruvate (ThermoFisher Scientific, cat. no. 11360070), 1% 100X Pen Strep Glutamine (ThermoFisher Scientific, cat. no. 10378016), 20% fetal bovine serum (FBS, GE Healthcare, Hyclone, Pittsburgh, PA), 1.6% Endothelial Cell Growth Serum (Corning, Corning, NY, cat. no. 356006), 1.6% heparin, and 10 µL/50 mL Amphotericin B (ThermoFisher Scientific, cat. no. 15290018). HAECs at low passage (passages 3-6) were treated prior to harvest every 2 days for 7 days with either 10 ng/mL TGFB2 (ThermoFisher Scientific, cat. no. 302B2002CF), IL1B (ThermoFisher Scientific, cat. no. 201LB005CF), or no additional protein, or two doses of small interfering RNA targeting the ERG locus (siERG; Table S18 in the Data Supplement), or randomized siRNA (siSCR; Table S18 in the Data Supplement).”

      …and…

      “siRNA Knock-down, qPCR, and Western Blotting”: “Knockdown of ERG was performed as previously described (40) using 1 nM siRNA oligonucleotides in OptiMEM (ThermoFisher Scientific, cat. no. 11058021) with Lipofectamine 2000 (ThermoFisher Scientific, cat. no. 11668030).”

      (2) How were the western blots quantified?

      Methods, “siRNA Knock-down, qPCR, and Western Blotting” now reads: “Western blots were quantified using ImageJ (76).”

      (3) All the supplemental figures are listed incorrectly in the manuscript. For example, the authors refer to Figure S11B which should be S10. Please review the manuscript throughout to refer to the correct figures.

      We thank Reviewer #1 for bringing this to our attention. Figure S4 was missing, leading to incorrectly listed supplemental figures for Figures S4-S12. Figure S4 has now been included, and Figures S4-S12 are now listed correctly within the manuscript text.

      (4) Please refer to IL-1B as IL-1beta, same with TGFB.

      We have left the terms as is, since it is also routine to refer to IL-1beta as IL1B, and TGFbeta as TGFB.

      (5) There are typos throughout the manuscript, such as 4C, VW Fexpression, VWFand VCAM-1.

      We could not locate the typos “VW Fexpression” or “VWFand VCAM-1”. We do not consider “4C” a typo, as it refers to the temperature to which the centrifuge was set in Methods, “Nuclear Dissociation and Library Preparation”: “Samples were centrifuged at 500 rcf for 5 minutes at 4C…”

      (6) Please define the abbreviations: line 69 and also cite the source of the use of aSMA/PECAM1 as EndMT?

      We have now included abbreviation definitions and the cited source for ECs that co-express aSMA/PECAM-1 in atherosclerotic lesions within the Introduction: “These studies have described an unexpectedly large number of cells co-expressing pairs of endothelial and mesenchymal proteins, including fibroblast activating protein/von Willebrand factor (FAP/VWF), fibroblast-specific protein-1/VWF (FSP-1/VWF), FAP/platelet-endothelial cell adhesion molecule-1 (CD31 or PECAM-1), FSP-1/CD31 (20), phosphorylated TGFB signaling intermediary SMAD2/FGF receptor 1 (p-SMAD2/FGFR1) (22), and α-smooth muscle actin (αSMA)/PECAM-1 (23).”

      (7) The changes in % cells in cluster per donor per condition in Figure 3 are interesting, have the authors observed a change of one cluster at the expense of another i.e. do they transdifferentiate into another with different treatments?

      Figure 3 shows that as the percentage of cells in EC3 increases with TGFB or IL1B, the percentage in EC4 decreases. This has been added to the Discussion: “Moreover, as the percentage of cells in EC3 increases with TGFB or IL1B, the percentage in EC4 decreases, suggesting trans-differentiation from EC4 into EC3 with these perturbations.”

      (8) Functional analysis of these clusters with and without treatment is required to confirm the EndMT.

      We do not claim that the cells underwent EndMT. Rather, we use pro-EndMT perturbations previously described in the literature to test whether ECs respond heterogeneously to stimuli which are relevant to CAD. We found that EC subtype was a greater determinant of cell state than treatment.

      (9) No blank line at 266. The break is in the middle of the sentence, also cytoplasmic cytoplasmic ribosomal proteins (typo?).

      We have revised these sentences to read: “Shared IL1B- and siERG-upregulated genes were enriched in COVID-19 adverse outcome pathway (WP4891; p-value 1.9×10⁻⁹) (52). Shared IL1B- and siERG-attenuated genes are enriched in several processes involving ribosomal proteins, including ribosome, cytoplasmic (CORUM:306; p-value 3.3×10⁻⁷), cytoplasmic ribosomal proteins (WP477; p-value 5.3×10⁻⁷), and peptide chain elongation (R-HSA-156902; p-value 5.9×10⁻⁷) (Figure 4E).”

      (10) The sentence in line 321, "These observations support ... of human", seems incomplete.

      We revised these sentences to read: “Expected pathway enrichments are observed for annotated cell types, including NABA CORE MATRISOME (M5884; p-value 4.8×10⁻⁴¹) for fibroblasts, blood vessel development (GO:0001568; p-value 5.6×10⁻³³) for ECs, and actin cytoskeleton organization (GO:0030036; p-value 1.3×10⁻¹⁵) for VSMCs (Figure S5D-G in the Data Supplement). These observations support the diverse composition of human atherosclerotic lesions.”

      (11) What do the authors mean by (at least partially) line 444?

      We revised this sentence to read: “In fact, the limited correlation with ex vivo data supports this interpretation.”

      (12) Some unrelated data in the paper, like supplemental figure 10B and supplemental figure 11?

      These data are relevant to methods, and have been kept.

      Reviewer #2 (Recommendations For The Authors):

      We need this work to expand our knowledge of endothelial biology. Please address my concerns to further strengthen this work.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      Revised manuscript

      The authors have addressed most of my points, but I still have one outstanding concern about the statistics:

      My Original Question:

      I have a few concerns and questions that I would like to see addressed: 1) Figure 1L - the statistics are a little unusual here as the errors are across visual areas rather than across mice or hemispheres. This isn't ideal as ideally, we want to generalize the results across animals, not areas, and the results seem to be driven mostly by V1/RSC. I would like to see comparisons using mice as the statistical unit either in an ANOVA with areas as factors or post-hoc comparisons per area.

      Author Reply:

      Based on the assumption that visual cortex should respond to visual stimuli, we would have expected to find a difference between closed and open loop locomotion onset responses in all cell types in visual areas of cortex (a closed loop locomotion onset being the combination of locomotion and visual flow onset, while an open loop locomotion onset lacks the visual flow component). Thus, the first surprise was that in most cell types we found very little difference between these two locomotion onset types. Conversely, in Tlx3-positive L5 IT neurons the difference was apparent well outside of the visual areas of cortex (even though the difference was indeed strongest in V1/RSC). To quantify the extent to which closed and open loop locomotion onsets result in different activity patterns across dorsal cortex we performed the analyses shown in Figures 1L and 2. To make the point that the effect was observable on average across cortical areas, we used cortical area as a unit in Figure 1L. We have added the analysis shown in Figure 1L with mice as the statistical unit as Figure S4J and have added the ANOVA information to Table S1, as suggested.

      My revised question:

      The authors have only partially addressed my concerns here. I disagree with the authors that they were making a point about the effect being observable across visual areas. The primary statistical statement they are trying to make is that the similarity between open and closed-loop stimulation is different for Tlx mice, e.g. Line 122: "However, comparing locomotion onsets in mice that expressed GCaMP6 only in Tlx3 positive L5 IT neurons, we found that the activation pattern was strikingly different between closed and open loop conditions" and Line 172-3: "Thus, excitatory neurons of deep cortical layers exhibited the strongest differences between closed and open loop locomotion related activation". These statements are not correctly supported by the statistical analysis as presented in Figure 1L as it is the variability across mice that is relevant to draw this conclusion.

      In the example "However, comparing locomotion onsets in mice that expressed GCaMP6 only in Tlx3 positive L5 IT neurons, we found that the activation pattern was strikingly different between closed and open loop conditions (Figure 1D)" we talk about the example mouse shown. We have not changed phrasing here.

      We have, however, changed the way we talk about Figure 1L and S4J (the second example given by the reviewer), and have rephrased much of this paragraph. Please note, we have also changed Figure S4J to quantify the difference only for V1.

      This is partially addressed by Figure S4J where the authors show standard-errors across mice and report statistics across mice. In Table S1 the statistical test is reported to be a bootstrap test with mice as the statistical unit, however, according to line 985 this was a non-hierarchical bootstrap test. Does this mean that the authors resampled onsets without regard to which mouse they came from to regenerate the response-curves and recalculate the correlation coefficient? Or did they directly resample from the distribution of correlation coefficient values? I suspect the latter, but for some comparisons (e.g. Tlx3 vs PV) there are only two mice in one group, yielding two correlation coefficients, and resampling 2 values 10,000 times would lead to very biased statistics. Either way the approach is far from ideal. There is also no protection against multiple-comparisons in these tests.

      We have adapted Figure S4J to include only V1, where we find the largest effect (the text is adapted to reflect this) and have added individual data points as suggested in the following comment. The reviewer is correct that we created a bootstrap distribution by resampling correlation values. This means we are resampling 2, 3, 4, 6, 7, or 14 values depending on the comparison. This should now be clearer in the text. We agree that this is not ideal, but when using mice as a statistical unit, analysis is almost always underpowered. To the best of our knowledge, bootstrap resampling is the best approach to alleviate this problem. Regarding the concern for multiple comparisons: We have now adjusted the significance threshold in Figures 1L and S4J by dividing it by the number of groups (Bonferroni correction; here: 9 genotypes).
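      To make the small-group issue concrete, the following is a minimal Python sketch of a non-hierarchical bootstrap that resamples per-mouse correlation values directly, together with a Bonferroni threshold (alpha divided by the number of groups). The function names, the two-sided p-value rule (twice the smaller fraction of bootstrap differences on either side of zero), and the correlation values are illustrative assumptions, not the authors' code or data. The sketch also enumerates every possible bootstrap mean for a group of two mice, showing that such a group can only yield three distinct resampled means.

```python
import itertools
import random
import statistics


def bootstrap_diff_p(group_a, group_b, n_boot=10_000, seed=0):
    """Two-sided bootstrap p-value for a difference in mean correlation.

    Resamples the per-mouse correlation values themselves (non-hierarchical),
    with replacement and separately within each group.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
    # Fraction of bootstrap differences at or below / at or above zero.
    frac_le = sum(d <= 0 for d in diffs) / n_boot
    frac_ge = sum(d >= 0 for d in diffs) / n_boot
    return min(1.0, 2 * min(frac_le, frac_ge))


def possible_bootstrap_means(values):
    """All distinct means that a bootstrap resample of `values` can produce."""
    n = len(values)
    return {round(statistics.fmean(combo), 12)
            for combo in itertools.product(values, repeat=n)}


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


tlx3_like = [0.10, 0.20]     # hypothetical per-mouse correlations (2 mice)
other = [0.80, 0.85, 0.90]   # hypothetical per-mouse correlations (3 mice)
p = bootstrap_diff_p(tlx3_like, other)
print(p, p < bonferroni_alpha(0.05, 9))            # 0.0 True: groups never overlap
print(len(possible_bootstrap_means(tlx3_like)))    # 3 possible resampled means
```

      A p-value of exactly 0.0 obtained from resampling only two values illustrates the limitation acknowledged above: with so few mice, the bootstrap distribution has very few distinct values.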

      The ANOVA reported in Table S1 for Figure S4J isn't described in the methods so I can't say what they did and it doesn't seem to be referred to in the text and is non-significant in any case. Figure S4J also only shows summary statistics whereas individual mice should be plotted. The correct statistical test is either a one-way ANOVA with one factor (genotype) with post-hoc tests between the Tlx3 genotype and the others with suitable multiple-comparisons corrections (this may be the non-significant test in table S1). Alternatively, a linear mixed effects model with Genotype as a fixed effect and Mouse as a random intercept term. This approach is more powerful as it would allow them to use data from all locomotion onsets, but it may struggle to fit datasets with only 2 members for certain genotypes. If they wish to make the more extended point that the pattern across visual areas differs between Tlx3 and other mice they could include 'Area' as another (fixed) factor in the design and look for an interaction with Genotype.

      The ANOVA was indeed a one-way ANOVA with one factor. We have added this information to the methods. As suggested, we have added individual data points to Figure S4J.

      I also agree with the other reviewers that the presentation of standard-errors in Figures 1F-K and elsewhere is somewhat misleading as these are s.e.m. across onsets without taking into account the hierarchical nature of the data. Across mice s.e.m. would give a more accurate view of the variability in the data across the population. I also understand that first averaging across onsets within mice before taking a grand-average throws away a lot of data and s.e.m.s will be considerably larger. The authors should consider linear mixed effects models as an optimal solution for estimating s.e.m. If this is not feasible then the authors could consider showing data from individual mice in a supplementary figure or at least reporting the number of onsets that came from each mouse.

      We have now changed all plots in which we show time course data of widefield calcium imaging to show a hierarchical bootstrap estimate of mean and 90% confidence interval of the mean estimate.
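      A hierarchical bootstrap of this kind can be sketched as two-level resampling: draw mice with replacement, then draw onsets with replacement within each drawn mouse, and take the mean of the pooled resample. The sketch below is a minimal Python version for a single time bin, assuming hypothetical per-mouse lists of onset responses; for a time course, the same resampling would be applied per bin (or to whole traces). It is one common variant, not the authors' implementation, which this response does not specify.

```python
import random
import statistics


def _percentile(sorted_vals, q):
    """Linearly interpolated q-th percentile (0-100) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1.0 - frac) + sorted_vals[upper] * frac


def hierarchical_bootstrap(data_by_mouse, n_boot=2_000, ci=90.0, seed=0):
    """Mean and central `ci`% interval of the mean via two-level resampling.

    Level 1: resample mice with replacement.
    Level 2: within each drawn mouse, resample its onsets with replacement.
    """
    rng = random.Random(seed)
    mice = list(data_by_mouse)
    boot_means = []
    for _ in range(n_boot):
        pooled = []
        for mouse in rng.choices(mice, k=len(mice)):
            onsets = data_by_mouse[mouse]
            pooled.extend(rng.choices(onsets, k=len(onsets)))
        boot_means.append(statistics.fmean(pooled))
    boot_means.sort()
    tail = (100.0 - ci) / 2.0
    interval = (_percentile(boot_means, tail), _percentile(boot_means, 100.0 - tail))
    return statistics.fmean(boot_means), interval


# Hypothetical responses at one time bin for three mice.
responses = {"m1": [0.8, 1.1, 0.9], "m2": [1.6, 1.4], "m3": [1.0, 1.2, 1.1, 0.9]}
mean, (lo, hi) = hierarchical_bootstrap(responses)
print(f"mean {mean:.2f}, 90% CI [{lo:.2f}, {hi:.2f}]")
```

      Because mice are resampled first, a dataset with few mice yields a wide interval even when each mouse contributes many onsets, which is the between-mouse variability the reviewers asked the error bands to reflect.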

      Reviewer #2 (Recommendations For The Authors):

      Congratulations to the authors on the revision! The revised article has substantially improved, and I have no further comments. I am particularly reassured by the new hierarchical bootstrap analyses as well as by the new analysis with mouse as a statistical unit that reproduces the key finding from the analyses with region as a statistical unit. Moreover, the authors added a vehicle control condition which does not yield any results. Therefore, I have no further methodological concerns and removed my mention of this previous weakness from my public review. Also, the readability of the manuscript has much improved in the revised version. Congratulations again on this important work!

      We thank the reviewer for the help in improving the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Comments on rebuttal:

      (1) It is greatly appreciated that the authors have improved aspects of their statistics, I have revised my comments accordingly.

      We are happy to hear.

      (2) However, I should clarify my comments regarding statistical concerns were not merely pertaining to a given Figure (e.g. Figure 1) I was only using it as an example. The authors have redone aspects of their analysis using N = number of mice (for statistics/trace figures), but is there a reason they cannot do this for other problematic figures/traces in the manuscript?

      Prompted also by reviewer 1, we have changed all time course plots in the manuscript to show a hierarchical bootstrap estimate of mean and 90% confidence interval of mean.

      Using mice as a statistical unit throughout the manuscript unfortunately is not viable in most cases, as we simply do not have enough mice in our dataset and statistical tests based on mice would be underpowered. The manuscript currently contains data from 77 mice, and we would likely need multiples of that to do statistics over mice.

      For Figure 1 - I do take the point why regions are being used as the independent N (though the authors justification should be made more clearly in the manuscript) making an N of 12 (though I am less clear why the same region across 2 hemispheres is counted as 2 Ns instead of 1; are they really independent?). However, I am less clear as to the choice in N in other figures. Could the authors clarify this more explicitly in the manuscript.

      We use regions as a statistical unit in Figures 6 and 7 and S6-S8. Regarding the independence of hemispheres, this depends on cell type and region. For example, activity in left V1 exhibits a higher correlation with activity in left V2am than with activity in right V2 (see Figure 5). On average, callosal pairs exhibit correlation levels comparable to those of nearby cortical neighbors. See also other work on the topic, for example Calhoun et al. (2023).

      Regarding choice of N in other figures, this is either “recording session” or “pairs of regions”. We have made this clearer in the figure legends. In the case of testing using recording sessions, the idea is that each recording session constitutes a measurement. Measurements in the same mouse are not independent, and hence we use hierarchical bootstrap for all testing on recording sessions. The choice of “pairs of regions” for the correlation analysis follows from the use of regions as a statistical unit.

      (3) Using N = locomotion onsets (or definitions other than N = mice) when deriving trace averages/SEMs (for example, as in Figure 1) is visually misleading for the reader, as it masks the true variability of the data, and it is even more misleading given that the authors do not necessarily use that definition of N in their statistical tests associated with the data (as the authors commented). Whilst the authors have shown some traces with N = mice for some data, is there a reason they cannot do this for all figures in the manuscript? At the very least, the practice of using other definitions of N for the purpose of showing trace averages/SEMs should be justified in the manuscript.

      We have replaced all time course plots that used SEM over events (for example locomotion onsets or visual stimuli) with a hierarchical bootstrap estimate of mean and 90% confidence interval of the mean throughout the manuscript. See also response to comment 2 above, and to reviewer 1, comment 4.
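      The idea behind a hierarchical bootstrap can be illustrated with a minimal two-level sketch: resample mice with replacement, then resample recording sessions within each resampled mouse, average, and repeat to obtain a percentile confidence interval. This is not the authors' exact procedure; the number of levels, iterations and the dict layout (mouse ID mapped to per-session values) are assumptions for illustration. For a time course, the same function would be applied separately to each time bin.

```python
import random
import statistics


def hierarchical_bootstrap_mean(data, n_boot=2000, ci=0.90, seed=0):
    """Two-level bootstrap of a grand mean (illustrative sketch).

    data: dict mapping a (hypothetical) mouse ID to a list of per-session values.
    Returns (point_estimate, (lower, upper)) for the requested confidence level.
    """
    rng = random.Random(seed)
    mice = list(data)
    boot_means = []
    for _ in range(n_boot):
        values = []
        # Level 1: resample mice with replacement.
        for mouse in rng.choices(mice, k=len(mice)):
            sessions = data[mouse]
            # Level 2: resample sessions within the chosen mouse.
            values.extend(rng.choices(sessions, k=len(sessions)))
        boot_means.append(statistics.fmean(values))
    boot_means.sort()
    # Percentile interval, e.g. the 5th and 95th percentiles for ci = 0.90.
    alpha = (1.0 - ci) / 2.0
    lower = boot_means[int(alpha * (n_boot - 1))]
    upper = boot_means[int((1.0 - alpha) * (n_boot - 1))]
    # Point estimate: pooled mean over all sessions.
    point = statistics.fmean(v for sessions in data.values() for v in sessions)
    return point, (lower, upper)
```

      Because whole mice are resampled first, sessions from the same mouse move together, so the interval reflects between-mouse variability rather than treating every session or event as independent.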

      References

      Calhoun, G., Chen, C.-T., Kanold, P.O., 2023. Bilateral widefield calcium imaging reveals circuit asymmetries and lateralized functional activation of the mouse auditory cortex. Proc. Natl. Acad. Sci. U. S. A. 120, e2219340120. https://doi.org/10.1073/pnas.2219340120

    1. Author Response

      The following is the authors’ response to the original reviews.

      To Reviewer #1

      We sincerely appreciate the constructive and insightful comments provided by the reviewer. Their valuable suggestions have been meticulously considered, leading to comprehensive modifications within the article.

      In addition, we want to stress that we have implemented a significant additional modification by introducing a new figure (Fig. 6). This figure highlights the collaborative impact of FMRP and Map1B on the microtubular structure of migrating neurons. We firmly believe that this molecular elucidation of the migration phenotype constitutes a noteworthy addition to our work.

      Public Review

      (1) We have taken the necessary steps to enhance the material and methods section of our neuronal migration analysis. We apologize for any initial lack of detail, including the omission of information on sinuosity index and directionality radar. Regarding the query about speed, we want to clarify that it indeed encompasses the percentage of pausing time. The speed is calculated by dividing the total distance traveled by the cell by the total time it migrated.
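      The speed definition above (total distance travelled divided by total migration time) can be sketched as follows, assuming each track is a list of (x, y) positions sampled at a fixed interval dt. The pause criterion in percent_pausing is hypothetical, since the manuscript's own definition of pausing time is not reproduced here.

```python
import math


def migration_speed(track, dt):
    """Mean speed over the whole recording, pauses included.

    track: sequence of (x, y) positions (e.g. in um) sampled every dt (e.g. minutes).
    """
    if len(track) < 2:
        raise ValueError("need at least two positions")
    # Sum of step lengths between consecutive frames = total distance travelled.
    distance = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    total_time = dt * (len(track) - 1)
    return distance / total_time


def percent_pausing(track, dt, pause_speed=1.0):
    """Hypothetical pause criterion: a step slower than pause_speed counts as pausing."""
    if len(track) < 2:
        raise ValueError("need at least two positions")
    steps = [math.dist(p, q) / dt for p, q in zip(track, track[1:])]
    return 100.0 * sum(s < pause_speed for s in steps) / len(steps)
```

      Since pauses add time but no distance, a cell that pauses more has a lower speed under this definition, which is why the speed measure already incorporates pausing.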

      (2) We would like to clarify the statistical analysis in our figures. The figures now show the median, and the legends give the median along with the interquartile range. This approach is in line with the use of non-parametric analysis for variables that do not follow a normal distribution. Regrettably, in the previous version, the figure legends incorrectly stated the mean and standard error of the mean instead of the intended median. We sincerely apologize for any confusion this may have caused. The corrected legends now accurately reflect the statistical measures used in the analysis.
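      As a small illustration of the summary statistics now reported, the median and interquartile range of a sample can be computed with the Python standard library. Quartile conventions differ between software packages, so the "inclusive" method used here is an assumption and may give slightly different Q1/Q3 values than the software used for the figures.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) for a sample.

    Uses statistics.quantiles with method="inclusive"; other packages may use
    different quartile conventions and give slightly different Q1/Q3.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3
```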

      The global Kruskal-Wallis analysis, followed by Dunn's post hoc test, does indeed indicate that Fmr1 KD globally replicates the Fmr1-null phenotype. However, we concur with the reviewer's point regarding directionality, and we apologize for the lack of precision in the initial version. Upon further analysis, we have identified a significant difference in directionality (Fisher test, p < 0.001). This more pronounced directionality defect in the KD could indicate a lack of the compensation that may occur in the Fmr1-null context. We appreciate the opportunity to address this issue, and our revised version includes the necessary details to accurately convey these findings.

      (3) We appreciate the referee's agreement with our perspective.

      (4) In response to the recommendations from all referees, we have expanded both the introduction and discussion sections of our manuscript. The initial brevity of these sections was due to the short format we had initially chosen. We believe that these expansions contribute to a more comprehensive and nuanced presentation of our work, addressing the concerns raised by the referees.

      Recommendations for the authors

      The time stamp and scale bars were added.

      The median versus mean issue is addressed above.

      Figure numbering has been corrected (sorry for the mistake). The efficiency of CK is defined in the Mat and Met section.

      To Reviewer #2

      Public review

      We express our gratitude to the referee for their positive appreciation of our work. We have carefully considered their suggestions and have modified the article accordingly.

      In addition, as said to Referee #1, we want to stress that we have implemented a significant additional modification by introducing a new figure (Fig. 6). This figure highlights the collaborative impact of FMRP and Map1B on the microtubular structure of migrating neurons. We firmly believe that this molecular elucidation of the migration phenotype constitutes a noteworthy addition to our work.

      Recommendations for the authors

      (1) In light of the referee's recommendation, we performed higher-resolution staining of FMRP in SVZ neurons cultured in Matrigel, providing a more precise depiction of its subcellular localization (see Figure 1). Additionally, we have removed the sentence referring to growth cone staining, as it was not visibly present in cultured neurons. We appreciate the referee's guidance in refining our study.

      (2) We have also added a new Figure 4 with better staining of MAP1B in the RMS, as well as higher-resolution MAP1B staining in cultured neurons.

      With all due respect, we maintain that the western blots, performed in three independent experiments, unequivocally support the conclusion of a 1.6-fold increase in MAP1B in the RMS of Fmr1-null mutants, a trend observed in other systems.

      In accordance with the referee's suggestion, we endeavored to quantify RMS immunostainings. Regrettably, the results proved inconclusive. This outcome is not entirely unexpected, as immunostainings are recognized for their inherent challenges in quantification. The additional complexity introduced by neonate perfusion further contributes to the notable interindividual variability observed.

      (3) The efficiency of the two interfering RNAs is now documented in the text. Regarding the directionality radar, as highlighted for Ref 1 (public review, point #2), we acknowledge that, while Fmr1 KD generally recapitulates the migratory phenotype of the Fmr1 mutants, more precise statistical analysis reveals differences in directionality, which are now documented. We apologize for the previous lack of precision.

      (4) The suggested overexpression experiment is interesting, but we faced challenges in executing it. Attempts to overexpress MAP1B through intraventricular electroporation of a CMV-MAP1B plasmid resulted in the immobilization of transfected cells in the SVZ, hindering further analysis of migration. We hypothesize that this outcome reflects a mismatch between the overexpressed dosage and the actual dosage of MAP1B in the mutants.

      (5) Concerning this point, and as mentioned above, we have incorporated a crucial piece of information into the manuscript, presented in Figure 6. The data reveal a severe disruption in the microtubular cage surrounding the nucleus of migrating neurons in Fmr1 mutants, a phenomenon rescued by MAP1B knock-down. Based on these findings, we believe we can confidently conclude that the microtubule-dependent functions of MAP1B play a role in the migratory phenotype of Fmr1 mutants. We consider this experiment to be a highly valuable addition to our work, shedding light on the underlying molecular mechanisms.

      To Reviewer #3

      We thank the referee for their insightful comments, which we have considered with great care.

      In addition and as said above, we want to stress that we have implemented a significant additional modification by introducing a new figure (Fig. 6). This figure highlights the collaborative impact of FMRP and Map1B on the microtubular structure of migrating neurons. We firmly believe that this molecular elucidation of the migration phenotype constitutes a noteworthy addition to our work.

      Public review

      With regard to the perceived 'incompleteness' of our work, we believe that the addition of Figure 6, illustrating the molecular underpinnings of the Fmr1 mutation on the microtubular cytoskeleton and its rescue in the MAP1B KD, significantly enhances the completeness of our study.

      In response to the comment on the introduction and discussion sections, we acknowledge that their brevity was due to the Short Format initially chosen. We have since expanded these sections, incorporating additional information about FMRP and MAP1B and their influences on migration.

      Regarding the La Fata article, as highlighted in our discussion, it's important to note that while the study did not strongly indicate an impact on radial locomotion per se, drawing conclusive results is challenging due to the relatively low number of analyzed neurons. Consequently, we do not believe that it poses a challenge to our findings.

      With respect to MAP1B overexpression, as previously mentioned in response to Ref #2, point 4, our attempts resulted in the inhibition of migration, potentially due to an overdosage of the protein.

      In terms of anatomical consequences, as highlighted in our discussion, while our neurons experience a delay in migration, they eventually reach their destination. Although a delay in migration may not directly result in significant anatomical anomalies, we acknowledge that the timing of differentiation can be crucial. As noted by Bocchi et al. (2017), a delay in the timing of differentiation for neurons reaching their target could have notable functional consequences. In any case, we have toned down all references to implications for the pathology.

      Recommendation for the authors

      • The size of the figures has been modified

      • The pausing time and sinuosity are now defined

      • The centrin-RFP labeling was indeed too weak in the previous version, which we corrected. We apologize for this.

      • Fig S3 has been revised to address concerns. Notably, the decision to present the two bands for Vinculin and MAP1B separately is intentional. The blot is cut to allow independent development due to the substantial difference in their development times. We believe this approach provides a more accurate representation of the data.

      • The numbering of the figures has been corrected. Sorry for the initial mistake.

      • The Mat and Meth section has been corrected. Please note that we did not use any culture insert in this study.

      • The title has been modified

      • Comments about the Map1B overexpression experiment are expressed above and in replies to ref #2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Lee et al. compared encoding of odor identity and value by calcium signaling from neurons in the ventral pallidum (VP) in comparison to D1 and D2 neurons in the olfactory tubercle (OT).

      Strengths:

      They utilize a strong comparative approach, which allows the comparison of signals in two directly connected regions. First, they demonstrate that both D1 and D2 OT neurons project strongly to the VP, but not the VTA or other examined regions, in contrast to accumbal D1 neurons, which project strongly to the VTA as well as the VP. They examine single-unit calcium activity in a robust olfactory cue conditioning paradigm that allows them to differentiate encoding of olfactory identity versus value, by incorporating two sucrose, two neutral and two air puff cues with different chemical characteristics. They then use multiple analytical approaches to demonstrate strong, low-dimensional encoding of cue value in the VP, and more robust, high-dimensional encoding of odor identity by both D1 and D2 OT neurons, though D1 OT neurons are still somewhat modulated by reward contingency/value. Finally, they utilize a modified conditioning paradigm that dissociates reward probability and lick vigor to demonstrate that VP encoding of cue value is not dependent on encoding of lick vigor during sucrose cues, and that separable populations of VP neurons encode cue value/sucrose probability and lick vigor.

      Weaknesses:

      The conclusions of the data are mostly well supported by the analyses, but the statistical analysis is somewhat limited and needs to be clarified and extended.

      (1) The manuscript includes limited direct statistical comparison of the neural populations, and many of the comparisons between the subregions are descriptive, including descriptions of the percentage of neurons having specific response types, or differences in effect sizes or differing "levels" of significance. An additional direct comparison of data from each subpopulation would help to confirm whether the differences reported are statistically meaningful.

      Response: We thank the reviewer for their helpful suggestions. As the reviewer noted, the first version of our manuscript had limited direct comparisons of single-neuron metrics across subpopulations. These analyses were also limited to the supplementary figures: 1) {SK vs. XK} and {SK vs. ST} decoder auROC (S10F), 2) valence scores (S10G), and 3) S-cue confusion after MNR classification (S11D). We have now included the following statistical comparisons of single-neuron metrics across subpopulations: 1) % of neurons that respond to both S cues (Tables S10, S11), 2) % of neurons that have auROC > 0.75 for {SK vs. XK}, {SK vs. PK}, and {SK vs. ST} (Tables S12-S17), 3) response magnitudes to S cues (Table S38), and 4) valence scores (Tables S44-S46).
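      The auROC > 0.75 criterion can be illustrated with a rank-based area under the ROC curve: the probability that a randomly chosen trial response to one cue exceeds a randomly chosen response to the other, with ties counted as half. The response window, and whether the authors fold auROC around 0.5 so that both response directions count, are not stated here; the folding in is_discriminating is an assumption for illustration.

```python
def auroc(responses_a, responses_b):
    """Area under the ROC curve for separating cue A trials from cue B trials.

    Equals the Mann-Whitney U statistic divided by len(a) * len(b): the fraction
    of (a, b) trial pairs in which a > b, with ties counted as 0.5.
    """
    if not responses_a or not responses_b:
        raise ValueError("both conditions need at least one trial")
    wins = 0.0
    for a in responses_a:
        for b in responses_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(responses_a) * len(responses_b))


def is_discriminating(responses_a, responses_b, threshold=0.75):
    """Hypothetical criterion: auROC folded around 0.5 exceeds the threshold."""
    score = auroc(responses_a, responses_b)
    return max(score, 1.0 - score) > threshold
```

      An auROC of 0.5 means the two cues' response distributions fully overlap; values near 1 (or 0) mean that a single neuron's response on a given trial reliably indicates which cue was presented.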

      (2) When hypothesis tests are conducted between the neural populations, it is not clear whether the authors have accounted for the random effect of the subject, or whether individual units were treated as fully independent. For instance, pairwise differences are reported in Figures 4I, 5G/I/L, and others, but the statistical methods are unclear. Assessment of the statistics is further limited by the lack of reporting of degrees of freedom. If the individual neurons are treated as independent in these analyses, it could increase the likelihood of

      Response: We have clarified when statistical analyses are comparing individual neurons vs. simultaneously recorded populations. Per the reviewer’s recommendation, we have also incorporated linear mixed-effects models when statistically analyzing individual neurons. Lastly, to further clarify the statistical analyses used, we have added multiple supplementary tables that better describe the statistical tests used and the relevant outputs.

      Reviewer #2 (Public Review):

      Summary:

      This work is interesting since the authors provide an in vivo analysis of how odor associations may change as represented at the level of the olfactory tubercle (presynaptic) and next at the level of the ventral pallidum (postsynaptic). The authors start off with a seemingly careful characterization of the anterograde and retrograde connectivity of dopamine 1 receptor (D1) and dopamine 2 receptor (D2) expressing medium spiny neurons in the olfactory tubercle and neurons in the ventral pallidum. From this work they claim that, regardless of D1 or D2 expression, tubercle neurons mainly project to the lateral portion of the ventral pallidum. Next, to compare how odor-associated neuronal activity in the ventral pallidum and the olfactory tubercle (D1 vs D2 MSNs) transforms across association learning, the authors performed two-photon calcium imaging while mice engaged in a lick / no-lick task wherein two odors are associated with reward, two odors are associated with no outcome, and two odors are associated with an air puff.

      This manuscript builds on prior work by several groups indicating that olfactory tubercle neurons form flexible learned associations to odors by looking at outputs into the pallidum (though, I should highlight, without looking specifically at pallidal neurons that truly receive input from the tubercle), and with that, this work is novel. We appreciated the use of a straightforward odor-outcome behavioral paradigm and the careful computational methods and analyses used to disentangle the contributions of single-neuron vs population-level responses to behavior. With one exception from the Murthy lab, 2P imaging in the tubercle is a new frontier and that is appreciated, as is the 2P imaging in the pallidum, which was well supported by the histology. The anatomical work is also well presented.

      Overall, the approach and methods are superb. The issues come when considering how the authors present the story and what conclusions are made from these data. Several key points before going into specifics about each are: 1) the authors cannot conclude that their results are contradictory to prior results, and 2) the authors over-interpret the results and do not discuss several key methodological issues. We were concerned about the ability to make strong claims regarding the circuitry presented, especially given how much the presented claims contradict prior work. There were also issues with the interpretability of neuronal encoding of value vs valence based on the present behavior (in which a distinction between the air puff and neutral trial types was not clear) and the imaging methodology (in which the neuronal populations analyzed were not clearly defined). In addition to toning down and rectifying some of the language and interpretations, we suggest including a study limitations section where these methodological and interpretation issues are discussed. Over-interpreting and playing up the significance of this work is unnecessary, especially given eLife's new review and publication policy. Readers should be given a sufficiently detailed and nuanced presentation of these thought-provoking results, and from there be allowed to interpret the results as they want.

      Strengths:

      State-of-the-art approaches (as detailed above)

      Possible conceptual innovation in terms of looking into output from the olfactory tubercle which has yet to be investigated in this avenue.

      Weaknesses:

      On the first point, regarding the authors' repeated and unsupported claims that their results are contradictory: there are papers by numerous groups, in respected journals including this one, which together used 5 different methods (c-Fos, photometry, 2P, units, fMRI), in animals ranging from humans to mice, and which support that tubercle neurons reflect the emotional association of an odor, whether spontaneous or learned. With that, the onus is on the authors not to claim that their results contradict these studies as if the other papers were suspect; instead, from our standpoint, it is on the authors to explain how and why their results differ from these other papers, rather than simply saying they found something different [which at present is framed in a way that is 'correct' due to primacy if nothing else].

      Response: We acknowledge that the first version of the manuscript contained unnecessary disagreeing language. We do not think that our results are broadly in disagreement with the existing literature, but we do come to different conclusions about what the OT is representing. Namely, our comparison of valence encoding in OT to that in the VP strongly indicates that the anteromedial OT has a less robust representation of valence, and we argue that this reflects either an intermediate form of valence representation or potentially might not be important for valence representation at all. We have toned down our conclusions, made clear that we are only recording from one domain of the OT, limited our speculation to the discussion and added a “speculations” section.

      Second, onto the points of interpretation of results, there are several specific areas where this should be rectified. As is, the authors overinterpret their results and draw too far-reaching conclusions. This needs to be corrected.

      In particular, the claims that D1 and D2 neurons of the olfactory tubercle nearly exclusively send projections to the ventral pallidum must be interpreted with caution given that the authors injected an anterograde AAV into the anteromedial olfactory tubercle, and did not examine the projections from either the posterior or lateral portions of the olfactory tubercle. This is especially significant since the retrograde tracing performed from the ventral pallidum indicates that the lateral olfactory tubercle, not the medial olfactory tubercle, primarily projects to the ventral pallidum (Fig 1D-F), however this may be due to leakage into the nucleus accumbens, as seen in the supplementary figure, S1G.

      Response: We thank the reviewer for the point of caution. We have now made it clear that our conclusions are limited to the anteromedial portion of the OT, and other areas may have other projections.

      The same caution must be advised when interpreting the retrograde tracing performed in Fig 1G-I, since the neuronal tracer used and the laterality and rostral-caudal injection site within the VTA could result in different projection patterns and under- or over-labelling. Additionally, the metric used, %Fiber Density (Figure 1C), as in the percentage of 16-bit pixels within the region of interest with an intensity greater than 200, is semi-quantitative, and is more applicable for examining axonal fibers that pass through a region rather than the synaptic terminals (like with a synaptophysin fusion protein-based tracing paradigm) found within a region (puncta). The statements made in contrast to prior studies should therefore be softened, and these concerns should be addressed in the introduction, discussion, and the limitations section if added.
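      The %Fiber Density metric described above (percentage of 16-bit pixels within the region of interest with an intensity greater than 200) can be sketched in a few lines. The nested-list image and mask representation is an assumption chosen to keep the example dependency-free.

```python
def percent_fiber_density(pixels, roi_mask, threshold=200):
    """Percentage of ROI pixels whose 16-bit intensity exceeds threshold.

    pixels: 2-D list of integer intensities (0-65535).
    roi_mask: 2-D list of booleans of the same shape, True inside the ROI.
    """
    in_roi = 0
    above = 0
    for img_row, mask_row in zip(pixels, roi_mask):
        for value, inside in zip(img_row, mask_row):
            if inside:
                in_roi += 1
                above += value > threshold  # bool counts as 0 or 1
    if in_roi == 0:
        raise ValueError("ROI mask is empty")
    return 100.0 * above / in_roi
```

      With NumPy arrays and a boolean mask, the same quantity is 100 * (img[mask] > 200).mean(). Either way, the sketch makes the reviewer's point concrete: every bright pixel counts equally, whether it belongs to an axon passing through the region or to a synaptic terminal within it.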

      Response: We have added statements to address these limitations.

      The other major concern is whether the behavioral data generated is indicative of the full spectrum of valence. The authors appropriately state that the mice "perceive" the air puff, yet based on their data the mice did not clearly experience the puff-associated odor as emotionally aversive (viz., negative valence). The way the authors describe these results, it seems they agree with this. With that, the authors can't say the puff is aversive without data to show such - that is an assumption which, while seemingly intuitive, is not supported by the data unfortunately. To elaborate more since this is important to the messaging of the paper: The authors utilized a simple behavioral design, wherein two molecular classes of odors were included in either a sucrose rewarded, neutral no outcome, or air puff punished trial type. The odor-outcome pairs were switched after three days, allowing the authors to compare neuronal responses on the basis of odor identity and the later associated outcome. While the mice showed clear learning of the rewarded trial types by an increase in anticipatory licking during the odor, they did not show any significant changes in behavior that indicated learning of the air puff trial type (change in running velocity or % maximal eye size), especially in contrast to the neutral trial type. This brings up the concern that either the odor-air puff aversive associations (to odors) were not learned, or that the neutral trial types, in which a reward was omitted, were just as aversive as the air puff to the rear, despite the lack of startle response - perhaps due to stimulus generalization between neutral and air puff odor. The possibility of lack of learning is addressed in the paragraph starting at line 578, but does not account for the possibility that the lack of reward is also sufficiently punishing. 
      The authors also address the possibility that laterality in the VP contributed to the lack of neural responsivity observed, but should also include a statement regarding laterality in the olfactory tubercle, as described in https://doi.org/10.7554/eLife.25423 and https://doi.org/10.1523/JNEUROSCI.0073-15.2015, since the effects of modulating the lateral portion of the olfactory tubercle are not yet reported. Lastly, use of the term "reward processing" should be avoided/omitted since the authors did not specifically study the processing of reinforcers.

      Response: As the reviewer points out, we tried to be cautious in interpreting the “aversive” odor response, and focused mainly on the reward association. This point is addressed in the Discussion, and we do not see the need to add a redundant statement to a “limitations” section. We have also added a note about the previously identified laterality of the OT, which might account for the lack of aversive-responsive neurons in the OT. The reviewer makes an interesting suggestion that behavioral responses to airpuff-associated odors are not significantly different from those to unassociated odors because the lack of reward in this context is already aversive. We note that walking velocity differs significantly between reward- and puff-associated odors, but not between puff-associated and unassociated odors. This is in agreement with the suggestion, and we have added a statement to reflect this.

      Also, I would appreciate justification of the term "value". How specifically does the assay used assess value versus a more simplistic learned association which influences perceived hedonics or valence of the odors.

      Response: We have removed the term “value” with the exception of areas where we cite the work of others. We acknowledge that the word value is complicated in the incentive learning field and appreciate the suggestion. Our experimental design was meant to investigate learned association for positive and negative stimuli, thus valence is more appropriate and we have used this term.

      More information is needed regarding how neurons are identified day-to-day, both in textual additions to the Methods and also in terms of elaborating more in the results and/or figure legends about what neurons are included:

      (a) The ROI maps for identifying/indicating cells in the FOVs are nice to see and at the same time raise some concerns about how cells are identified and/or how borders for those specific ROIs are drawn. For instance, in Figure 4, A & D, ROI #13 (cell #13) is VERY different in shape/size between those two panels. Also see ROIs 15 and 4. Why was an ROI map not made on day 1 and then that same map applied and registered to frames from consecutive imaging days in that same mouse? As it is, new ROIs are drawn, smaller for some "cells" and larger for others. And at least for ROI #13 above, one ROI is about twice as large as the other. This inconsistency in the workflow and definition of the ROIs needs to be addressed in the Methods. Also, the authors should address if and how this could impact their results.

      Response: We have added details and clarified the Methods section to make this clearer. We note that we extracted calcium transients from the raw data with the widely used Constrained Nonnegative Matrix Factorization (CNMF) algorithm. This processing algorithm simultaneously identifies spatial and temporal components using modeled kinetics of calcium transients and pre-trained CNN classifiers. With 2-photon microscopy the optical section in the z plane is narrow, so we may not always capture components of a neuron that look like “neurons”, but all ROIs were confirmed manually to ensure they were not artifacts.

      (b) Also, more details are needed in the results and/or figure legends regarding the changes in cell numbers over days that are directly compared in the results. On some days there are 10% more or fewer cells. Why? It is not the same population being compared in this case, so some discussion of this is needed.

      Response: The shapes of the spatial components can vary across days due to nonrigid motion in the brain and/or miniscule differences in the imaging angle across days. Although we visually verified that we are imaging approximately the same z plane across days, we cannot (and do not) claim to image identical populations of neurons across days.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes a study of the olfactory tubercle in the context of reward representation in the brain. The authors do so by studying the responses of OT neurons to odors with various reward contingencies and compare systematically to the ventral pallidum. Through careful tracing, they present convincing anatomical evidence that the projection from the olfactory tubercle is restricted to the lateral portion of the ventral pallidum.

      Using a clever behavioral paradigm, the authors then investigate how D1 receptor- vs. D2 receptor-expressing neurons of the OT respond to odors as mice learn different contingencies. The authors find that, while the D1-expressing OT neurons are modulated marginally more by the rewarded odor than the D2-expressing OT neurons as mice learn the contingencies, this modulation is significantly less than is observed for the ventral pallidum. In addition, neither of the OT neuron classes shows significant modulation by the reward itself. In contrast, the OT neurons contained information that could distinguish odor identities. These observations have led the authors to conclude that the primary feature represented in the OT is not reward.

      Strengths:

      The highly localized projection pattern from olfactory tubercle to ventral pallidum is a valuable finding and suggests that studying this connection may give unique insights into the transformation of odor by reward association.

      Comparison of olfactory tubercle vs. ventral pallidum is a good strategy to further clarify the olfactory tubercle's position in value representation in the brain.

      Weaknesses:

      The authors' interpretation of the physiologic results - that a novel framework is needed to interpret the OT's role - requires more careful treatment.

      Response: We thank the reviewer for their recommendation. We have toned down the conclusiveness of our language in the discussion. Additionally, we have removed several speculative sentences from the concluding paragraph.

      Reviewer recommendations for Authors:

      We thank the reviewers for this helpful list of recommended changes to the manuscript. Regrettably, a few of the recommendations were overlooked in the revision, as indicated below. We do agree with the suggestions and plan to add appropriate changes to the version of record.

      Reviewer #1 (Recommendations For The Authors):

      If the comparisons mentioned in point 2 in the public review do not account for the lack of independence of individual neurons, I suggest the authors do so by either running linear mixed effects models with a random effect for subject, or one-way ANOVAs with a random effect of subject, where appropriate. The authors could also run analyses on summarized individual subject data (averages, % of neurons, etc.), though the authors would lose substantial power when assessing whether average changes differ between subjects in each recording group.

      We have clarified when statistical analyses are comparing individual neurons vs. simultaneously recorded populations. Per the reviewer’s recommendation, we have also incorporated linear mixed-effects models when statistically analyzing individual neurons. Lastly, to further clarify the statistical analyses used, we have added supplementary tables for every statistical test that better describe the parameters used and the relevant outputs.

      Reviewer #2 (Recommendations For The Authors):

      Of minor note, there are some symbols/special characters that did not translate in the figure caption for Figure 6C, repeated text between lines 700-705 and 707-712, and some other small grammatical errors. Additionally, the source of the anterograde tracing virus (AAV9-phSyn1FLEX-tdTomato-T2A-SypEGFP-WPRE) needs to be stated.

      Thank you for pointing these out. We have added a description to the figure legend, deleted the repeated lines and fixed the grammatical errors. During the revision, we regrettably overlooked the request to provide the source of the AAV9-phSyn1-FLEX-tdTomato-T2A-SypEGFP-WPRE. We agree that this small detail is important and will add it before publication of the version of record. This viral vector was purchased from The Salk Institute GT3 Core.

      Reviewer #3 (Recommendations For The Authors):

      The authors' interpretation of the physiologic results - that a novel framework is needed to interpret the OT's role - requires more careful treatment. As the authors note, there is reward-contingency modulation in OT, especially when D1 neurons are compared against D2, as shown in Fig. 3D,E, Fig. 4I, and Fig. F,J. Though small in effect size, presumably, these modulations cannot be explained by the odor identity. These observations, to this reviewer, suggest the D1 neurons of OT have a component of cue-reward representation. In other words, rather than developing an entirely new framework, an alternative possibility that D1 neurons of OT occupy an intermediate stage in associating cues with reward (i.e., under the same framework, but occupying a different position in the emergence of value representation) should be considered.

      We thank the reviewer for this thoughtful comment. We have eliminated the statement that a “novel framework is needed” and have been more conservative in our interpretations. We have also acknowledged that our results are not necessarily in conflict with existing literature, although we draw different conclusions, namely that the anteromedial OT is not a robust valence-encoding population in comparison to the VP. We appreciate the suggestion of the term “intermediate stage” in reward association and have now included it in the discussion. Lastly, we have limited broader speculation to a “speculation” section of the discussion.

      Related to the above point, have the authors analyzed if the similarities in the chemical structures correspond to perceptual and neural similarities? In the data presented in Figure S4, there are greater similarities in the population patterns within the same rewarding condition than within chemical groups. A comparison of the reward vs. chemical group (a simpler version of Fig. 5B) may be beneficial and take full advantage of the experimental design.

      This comparison is already shown in Fig. 5B and described in lines 285-289 of the Results. In VP populations, the distribution was structured such that intervalence pairwise distances between sucrose-paired and non-sucrose-paired odors (e.g., ||SK-PK|| and ||SK-XK||) were larger than intravalence pairwise distances (e.g., ||SK-ST|| or ||XK-XT||). OTD1 populations showed an intermediate trend in which most intravalence pairwise distances were smaller than intervalence pairwise distances, with the exception of ||SK-ST||.
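      The split into intervalence and intravalence distances can be illustrated with a short Python sketch. It assumes that each odor is represented by a population response vector (one value per neuron) and that ||·|| is the Euclidean distance. Following the description above, it treats an odor as sucrose-paired when its label begins with "S". The function name and toy vectors are hypothetical and are not data or code from the study.

```python
import math
from itertools import combinations


def pairwise_distances(responses):
    """Split pairwise Euclidean distances into intervalence and intravalence sets.

    responses: dict mapping odor label -> population response vector.
    Assumption: labels starting with "S" are sucrose-paired; all others are not.
    Returns (inter, intra), each a dict like {"||SK-XK||": distance}.
    """
    inter, intra = {}, {}
    for a, b in combinations(sorted(responses), 2):
        distance = math.dist(responses[a], responses[b])
        same_valence = a.startswith("S") == b.startswith("S")
        target = intra if same_valence else inter
        target[f"||{a}-{b}||"] = distance
    return inter, intra


# Hypothetical two-neuron population vectors, for illustration only.
toy = {"SK": [1.0, 0.0], "ST": [1.0, 1.0], "XK": [4.0, 4.0], "XT": [4.0, 5.0]}
inter, intra = pairwise_distances(toy)
```

      In this toy case, every intravalence distance is smaller than every intervalence distance, which is the VP-like pattern described above. An exception such as a large ||SK-ST|| would show up as an intravalence value overlapping the intervalence range.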

      Related to the point about chemical similarities - is the smaller effect size (amount of modulation associated with reward contingency) in this study, compared to the study by Martiros et al, explained by the similarities of odorants used?

      This is an interesting point. Although the odorants we used differ from those in Martiros et al., we think it is unlikely that they account for the smaller effect size of reward modulation. If OT represents odor with a population code, whereby identity is encoded in unique ensembles of activity, then variation in D1R expression between OT neurons could produce different effects in different ensembles. However, there is no evidence for such varied expression, and it does not seem like an ideal mechanism for the OT to broadly associate odor with reward. Moreover, we do not observe any differences in the effect size of reward association between the different odorants used in our study. Rather, we think the difference between our findings is more likely to result from recording in different populations of neurons, which is addressed in lines 522-535.

      Regarding the data presented in Fig. 3I - the rewarded odor responses (Sk) are compared against neutral ones (Xk responses), but an S vs. P comparison may be informative, too. Even though the authors mention that the effect of air puff is subtle, the behavioral data presented in Fig. 2F and G suggest that these serve as aversive stimuli. For example, on day 4, the first day after the reward contingency switch, the licking levels seem the lowest for the P odors.

      We have added the S vs. P comparison. We had originally omitted it because the neural and behavioral responses to puff cues were not robust. This is addressed in the Discussion (lines 563-579), and our conclusions about aversive conditioning are cautious.

      Regarding the data presented in Fig. 4G: it is difficult to interpret the data when the data for day 1 reward period and day 3 reward cue period are combined. Or do the authors mean day 1 S cue and day 3 S cue?

      These data were based on the observation that some neurons in the VP responded only to sucrose (not odor) on day 1 but later became responsive to the associated odor on day 4. To quantify this, Fig. 4G reports the percentage of neurons that were responsive to sucrose (but not odor) on day 1 and also to the rewarded odor on day 3. This is described in lines 260-274.

      Figure 6 presentation would benefit from a revision. For example, it is unclear if the water port becomes available for the "N" odors with 100% or 50% chance of reward delivery, and if so, how that happens. There are some errors e.g., colormap used for panel G; odors listed may be wrong in line 752 etc. It was unfortunately not possible to understand what was presented.

      We have added a schematic (Fig 6B) to better describe the movement of the port and details to the methods. The color scale was indeed inverted in panel G (now H), and it has been corrected. We have verified that the odors listed in the methods are correct. Although not included in the revision, in the version of record we will also add corresponding descriptors (e.g., LHi & Lx) to the odors in the methods for easier comparison.

      Minor comments

      For Figure 2H, an alternative description in the legend may be beneficial, as the phrasing is not intuitive. A suggested alternative is "licks in response to sugar-associated odors expressed as fraction of all odors".

      We appreciate the suggestion and have changed this to “licks during either sucrose cue expressed as a fraction of all licks during any odor.”

      Figure 2H: please explain the color code for crosses in the legend and the statistical comparison shown in the figure.

      We have added a legend to explain the color code and included a statement about the statistics in the legend with a link to a supplemental table for statistical parameters.

      Figure 3D: may contain mislabeling in the legend - the legend for 3D does not match the plot (legend refers to bar graph while plot shows line graphs)

      It is unclear what is meant. The 3D legend reads: “Percentage of total neurons that were significantly excited or inhibited by each odor (Bonferroni-adjusted FDR < 0.05) as a function of time relative to odor. Lines represent the mean across biological replicates and the shaded area reflects the mean ± SEM.” This is not a bar plot and is not referred to as one. Fig. 3E does show bar plots and is correctly described in its legend.

      Figure 3M: uses letters to refer to cell populations that are identical to the roman numerals used in Fig 3 A-C as well as colours similar to the ones in Fig 3C. However, the cell groups are unrelated; splitting the figures or using a different nomenclature might help

      We have adopted a different color code that we think makes this more distinct.

      Figure 4I: statistical comparison shown in figure not explained (neither in main text nor legend)

      We have added a statement about the statistical comparison and referenced a supplementary table.

      Figure 5 D: color code appears to have a different range than the values shown (i.e. lower limit is 0.7 while the plot shows values below 0.7)

      We confirm that this is not a mistake but a stylistic choice. The displayed color scale extends only to a lower limit of 0.7, whereas the lowest value is 0.67. Although the color for 0.67 is not shown on the scale, it is approximately the same as the color at the lower limit. The values are reported for full transparency and accuracy.

      Figure 5 G, I, & L: statistical comparison shown in figure not explained

      The comparisons have been explained in supplemental tables (S22-29) and referenced in the legend.

      Figure 5 I: meaning of symbols overlaid on bars not explained

      “Markers represent the mean across biological replicates” has been added.

      Figure 5 J&K: please state if error bars show SEM or SD; also please describe individual thinner lines in the legend

      This description has been added to the legend for 5I, and the same format applies to 5J&K.

      Figure 5L: please describe the individual crosses overlaid on bars in the legend

      Described in 5I.

      Figure S6A-C: please mention the odors used.

      S6A-C shows kinetics for the odor α-terpinene, which is now indicated in the legend.

      Line 129: mentions a 70 psi airpuff but methods say 75 psi - please clarify

      This has been corrected. 70 psi is the correct value.

      Line 134 typo: SP should be PK

      This has been corrected.

      Line 428: typo; should be cluster 3, not 2

      This has been corrected.

      Line 474 (and figure 6O): please explain what "P" is

      “P” denotes probability, as in P(S), the probability of sucrose. This is defined in line 466.

      Line 692: please describe the staining protocol in the methods (rather than just listing the antibodies and concentrations)

      We have added more details (lines 692-699).

      Line 707-712: duplicate text (identical to Line 700-705)

      This has been deleted.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study investigated transcriptional profiles of midbrain dopamine neurons using single nucleus RNA (snRNA) sequencing. The authors found more nuanced subgroups of dopamine neurons than previous studies, and identified some genes that are preferentially expressed in subpopulations that are more vulnerable to neurochemical lesions using 6-hydroxydopamine (6-OHDA). The reviewers found the results solid, and the study is overall valuable, providing critical information on the heterogeneity and vulnerability of dopamine neurons, although the scope is somewhat limited because the snRNA-seq results are similar to previous results and cell death was induced by 6-OHDA injections.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study by Yaghmaeian Salmani et al., the authors performed single-nuclei RNA sequencing of a large number of cells (>70,000) in the ventral midbrain. The authors focused on cells in the ventral tegmental area (VTA) and substantia nigra (SN), which contain heterogeneous cell populations comprising dopaminergic, GABAergic, and glutamatergic neurons. Dopamine neurons are known to consist of heterogeneous subtypes, and these cells have been implicated in various neuropsychiatric diseases. Thus, identifying specific marker genes across different dopamine subpopulations may allow researchers in future studies to develop dopamine subtype-specific targeting strategies that could have substantial translational implications for developing more specific therapies for neuropsychiatric diseases.

      A strength of the authors' approach compared to previous work is that a large number of cells was sequenced using snRNA-seq, which the authors found superior to scRNA-seq for reducing sampling bias. A weakness of the study is that relatively little new information is provided, as the results are largely consistent with previous studies (e.g., Poulin et al., 2014). Nevertheless, it should be noted that the authors found some more nuanced subdivisions in several genetically identified DA subtypes.

      On this point we respectfully disagree with the reviewer. In this study, over 30,000 mDA neurons have been analyzed at the genome-wide gene expression level, identifying mDA territories and neighborhoods (that some may call “subtypes”), a description of mDA neuron diversity that goes far beyond what has been published previously.

      Although several single-cell RNA sequencing studies of mDA neurons have added to our understanding of mDA diversity, they have been limited by the low numbers of sequenced mDA neurons. As the reviewer specifically referred to the study by Poulin et al., 2014, it should be noted that in this report, 159 mDA neurons were analyzed by qPCR (not by RNA-seq) for 96 previously identified marker genes. Despite those limitations, this was indeed a highly impressive study, suggesting five different mDA neuron subtypes (as compared to the 16 neighborhoods described here), and it was published before single-cell genome-wide gene expression methods and advanced bioinformatic tools were available. On average, subsequent scRNA-seq studies captured a few hundred mDA neurons, compared with over 30,000 in this study. None of the studies mentioned in our manuscript came close to capturing the full diversity, and the information on mDA neuron diversity is, for this reason, somewhat fragmented in the scientific literature. Indeed, the seven mDA “subtypes” described in the excellent reviews by Poulin et al., 2020 in Trends in Neurosciences and Garritsen et al., 2023 in Nature Neuroscience are integrated interpretations of the results from numerous independent studies, each methodologically unique. Several previously identified groups, especially Vglut2+ populations in VTA and SNpc, have been considered poorly defined. As mentioned above, in this study we could reliably identify, by computational analyses and combinatorial marker expression in situ, 16 different neighborhoods within the mDA population and localize them in the tissue (Figure 4, Supplementary figures 4-1 to 4-3, described further in Supplementary Results). To mention three examples: within Sox6+ SNpc, we identified four different variants (neighborhoods) with partly unique anatomical localization; the large group of mDA neurons referred to as the Pcsk6 territory has not been clearly defined in earlier studies; and we identified a novel mDA neuron group that is related to the previously well-described Vip-expressing mDA neurons. These and other novel features are mentioned in the manuscript and in Supplementary Figures 4-1 to 4-3.

      Although, for reasons of space and intelligibility, we have characterized the 16 neighborhoods with only a few selected key marker genes, we have identified numerous additional novel markers, some of which are shown in dot plots in Figure 3 and Supplementary Figure 3, which can be used to characterize these groups further. We also provide all our sequencing data and our Padlock probe ISS data for anyone to download and analyze further, and we have made a web-based tool, CELLxGENE, available on our group’s website to facilitate exploration of the different aspects of our dataset.

      Lastly, the authors performed molecular analysis of ventral midbrain cells in response to 6-OHDA exposure, which leads to the degeneration of SN dopamine neurons, whereas VTA dopamine neurons are mainly unaffected. Based on this analysis, the authors identified several candidate genes that may be linked to neuronal vulnerability or resilience.

      Overall, the authors present a comprehensive mouse brain atlas detailing gene expression profiles of ventral midbrain cell populations, which will be important to guide future studies that focus on understanding dopamine heterogeneity in health and disease.

      We thank the reviewer for pointing this out.

      Reviewer #2 (Public Review):

      In the manuscript by Salmani et al., the authors explore the transcriptomic characterization of dopamine neurons in order to identify which neurons are particularly vulnerable to 6-OHDA-induced toxicity. To do this they perform single nucleus RNA sequencing of a large number of cells in the mouse midbrain in control animals and those exposed to 6-OHDA. This manuscript provides a detailed atlas of the transcriptome of various types of ventral midbrain cells; although the focus here is on dopaminergic cells, the data can be mined by other groups interested in other cell types as well.

      The results in terms of cell type classification are largely consistent with previous studies, though a more nuanced picture of cellular subtypes is portrayed here, a unique advantage of the large dataset obtained. The major advance here is exploring the transcriptional profile in the ventral midbrain of animals treated with 6-OHDA, highlighting potential candidate genes that may influence vulnerability. This approach could be generalizable to investigate how various experiences and insults alter unique cell subtypes in the midbrain, providing valuable information about how these stimuli impact DA cell biology and which cells may be the most strongly affected.

      We appreciate these comments. We want to state that the study not only gives a more nuanced picture but goes far beyond previously published studies and provides a highly resolved and detailed atlas of mDA neurons. Thus, it clarifies poorly described diversity and identifies entirely novel groups of diverse mDA neurons at the genome-wide gene expression level.

      Overall, the manuscript is relatively heavy on characterization and comparatively light on functional interpretation of findings. This limits the impact of the proposed work. It also isn't clear what the vulnerability factors may be in the neurons that die. Beyond the characterization of which neurons die, what is the reason that these neurons are susceptible to lesion? Also, the interpretation of these findings is going to be limited by the fact that 6-OHDA is an injectable, and the effects depend on the accuracy of injection targeting and on equal access of the toxin to all cell populations. Though the site of injection (MFB) should hit most/all of the forebrain-projecting DA cells, the injection sites for each animal were not characterized (and since the cells from animals were pooled, the effects of injection targeting on the group data would be hard to determine in any case).

      We agree that the results are presented to provide a comprehensive and valuable resource rather than to explain molecular mechanisms. The reviewer points out that “what the vulnerability factors may be in the neurons that die” is unclear. However, our study was designed to answer the question: What genes are enriched in clusters of mDA neurons that are particularly likely to die after toxic stress? Using single-cell analysis, we believe this question had higher priority than attempting to identify gene expression changes occurring during the cell death process. We agree that we cannot answer why neurons are susceptible to lesions, only identify genes that correlate with either high or low sensitivity. Thus, the genes we refer to as “vulnerability genes” and “resilience genes” are candidates for influencing differential vulnerability. Hard evidence for such influence will require additional and extensive functional analysis. As for the variability of injection and the characterization of individual animals, we wish to mention the online interactive explorer available at https://perlmannlab.org/resources/. It allows visualization of nuclei distribution per territory and neighborhood for each mouse, making it easy to determine the cell loss ratio and cell distribution per animal. There is indeed variance in the proportions of intact/lesioned total nuclei per animal. This is also evident from the DAT autoradiographs shown for each lesioned animal and presented in Figure Supplement 5-1 A. Importantly, the relative UMAP distribution of nuclei is quite similar between individual animals. To further investigate this, we used Pearson’s chi-square test of independence with a contingency table for the animals, each with two categorical variables given by the proportion of nuclei from intact vs. lesioned parts of the vMB (see added Supplementary figure 5-1 C). This shows that, while there is a difference in the number of nuclei remaining after lesioning, the relative distribution among clusters and neighborhoods is similar between animals. We have clarified this point in the manuscript (see page 12).
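      For readers less familiar with the test, the Pearson chi-square statistic for a contingency table of nuclei counts can be computed as below. This is a hypothetical sketch, not the authors' code. The table layout (rows = animals, columns = categories such as clusters), the function name and the counts are illustrative, and the p-value step is omitted. In practice, SciPy's `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected counts, though it applies Yates' continuity correction by default when there is one degree of freedom.

```python
def pearson_chi_square(table):
    """Pearson chi-square test of independence on a table of counts.

    table: list of rows (e.g. one row per animal), each a list of counts
    (e.g. nuclei per category). Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row (animal) and column (category) are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Two hypothetical animals with very different total nuclei counts but the
# same relative distribution across three categories: the statistic is 0.
same_profile = [[100, 50, 50], [40, 20, 20]]
stat, dof = pearson_chi_square(same_profile)
```

      The example illustrates the point made above. A difference in the total number of surviving nuclei per animal does not by itself inflate the statistic; only differences in the relative distribution across categories do.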

      I am also not clear why the authors don't explore more about what the genes/pathways are that differentiate these conditions and why some cells are particularly vulnerable or resilient. For example, one could run GO analyses, weighted gene co-expression network analysis, or any one of a number of analysis packages to highlight which genes/pathways may give rise to vulnerability or resilience. Since the manuscript is focused on identifying cells and gene expression profiles that define vulnerability and resilience, there is much more that could have been done with this based on the data that the authors collected.

      We performed GO analysis for the genes upregulated and downregulated in the ML clusters (specific to the lesion condition) in the original manuscript (please see figure supplement 7-1 C-E, and the newly added Supplementary file 10), but we agree with the reviewer that we could also have analyzed functional categories of genes correlating with differential vulnerability. Thus, we have used tools recently developed by Morabito et al., Cell Reports Methods (2023), and their hdWGCNA package to address this question. This method is particularly suitable for analyzing high-dimensional transcriptomics data such as single-cell RNA-seq or spatial transcriptomics. We calculated the co-expression network based on the lesioned nuclei of the mDA territories. Of the 9 co-expression modules calculated, one has the highest expression in the Sox6 territory and has genes in common with the vulnerability module. Another co-expression module has genes in common with the resilience module and is most highly expressed in the Otx2 and Ebf1 territories. We also performed GO analysis for these co-expression modules and added additional GO analysis of the ML-enriched genes (see Supplementary Figure 7-1 D,E, the newly added Supplementary Figure 6-3, and the newly added Supplementary file 9). These additional analyses are described on pages 15 and 17.

      In addition, we wish to emphasize our identification of the genes we refer to as vulnerability and resilience modules in the previous version of the manuscript. Several of these genes were discussed in the previous version, but we have now included more information on them, based on previously published studies, and discuss their potential functional roles (see pages 22 & 23 in the Discussion).

      Another limitation of this study as presented is the missed opportunity to integrate it with the rich literature on midbrain dopamine (and non-dopamine) neuron subtypes. Many subtypes have been explored, with divergent functions, and can usually be distinguished by either their projection site, neurotransmitter identity, or both. Unfortunately, the projection site does not seem to track particularly well with transcriptomic identities, aside from a few genes such as DAT or the DRD2 receptor. However, this could have been more thoroughly explored in this manuscript, either by introducing AAVretro barcodes through injection into downstream brain sites, or through existing evidence within their sequencing dataset. There are likely clear interpretations from some of that literature, some of which may be more exciting than others. For example, the authors note that vGluT2-expressing cells were part of the resilient territory. This might be because this is expressed in medially-located DA cells and not laterally-located ones, which tends to track which cells die and which don't.

      The manuscript consists of a comprehensive description of transcriptional diversity. Although additional comprehensive analysis combining snRNA-seq with, e.g., AAVretro barcodes would be of clear value, we believe it must be done in a separate study. It should also be noted that we describe each territory and neighborhood in further detail in the Supplementary Results, which contain references to the relevant literature. In line with the comments, this section has now been expanded with further references to relevant studies (see Supplementary Results related to Figure 4-figure supplements 1-3).

      It is not immediately clear why the authors used a relaxed gate for mCherry fluorescence in Figure 1. This makes it difficult to definitively isolate dopaminergic neurons, or at least, neurons with a DATCre expression history. While the expression of TH/DAT should be able to give a fairly reliable identification of these cells, the reason for this decision is not made clear in the text.

      We used relaxed gating to ensure that we could capture nuclei expressing low levels of RFP, which we believe could be especially relevant for the lesioned dataset (see page 5). We did not find that it would be advantageous to use more stringent gating, which would risk losing all cells expressing no (or very low levels of) RFP. Identifying mDA neurons based on their typical markers is straightforward, as their transcriptional relationship is evident from the expression profile of several markers, including transcription factors such as Nr4a2, Pitx3, and En1. In addition, as pointed out in response to Reviewer #1, point 5, atypical DA neurons expressing Th and other mDA markers with no or low levels of Slc6a3 (DAT) were isolated. We believe the study is more complete for including these cells. Moreover, we included a sufficiently large number of cells, which ensured a comprehensive analysis of mDA neurons in relation to other cell types dissected from the ventral midbrain.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors state that a major advantage of their approach is that it prevents biased datasets when compared to methods that rely on capturing certain cell types. I was wondering if the authors could follow up on this topic with a more detailed description of their methodological advantages regarding potential sampling bias. This is somewhat unclear to me, given that the results of the present study are largely consistent with previous work on this topic.

      As expanded on above (see response to the initial comment in the public review), we strongly disagree that there is little novelty in our study. None of the previous studies come close to describing the mDA neuron population with a similar resolution, which is unsurprising given the differences in the number of analyzed mDA neurons in this versus previous reports. We agree with the reviewer that our data are consistent with previous studies when they are all combined. Thus, we identified mDA neuron groups that correspond (or roughly correspond) to major DA neuron groups identified in previous studies (see pages 8-14 in the Supplementary Results). However, the atlas presented here goes well beyond anything published in scope and resolution. The diversity we define is comparable to findings that, with careful cross-paper analyses, can be stitched together from previous single-cell studies. However, even such a combined analysis does not achieve the resolution and diverse categorization demonstrated herein (16 neighborhoods in midbrain dopaminergic territories). Considering the well-established problems of dissociating and isolating whole neurons from adult brain tissue, this is likely due to sampling bias, resulting in an almost complete exclusion of some subpopulations of neurons. We have added text on page 20 to clarify this point.

      (2) In the abstract, the authors state that their "results showed that differences between mDA neuron group could best be understood as a continuum without sharp differences between subtypes". However, I am not sure whether this is the most appropriate description of the authors' results, particularly when looking at the schematic overview shown in Fig. 4F. To me, it seems more likely that genetically-defined DA subtypes overlap with discrete ventral midbrain subnuclei, particularly in the case of Sox6-expressing cells, which are almost exclusively located in the SNc. In the case of genes that are specific for the VTA, there also seems to be a strong bias toward certain VTA subnuclei, although I agree that arguments can be made that there is some topographic organization along a dorso-ventral and medio-lateral gradient, which seems to be largely consistent with the anatomical location of projection-defined dopamine neurons as described previously by Poulin et al., 2018 (Nature Neuroscience).

      What was meant by continuum must be interpreted in the context of the transcriptional landscape of mDA neurons and not their anatomical localization. As stated in the paper, the dendrogram depiction of mDA neurons’ transcriptome can be misinterpreted as an indication of sharp boundaries and discrete groups in transcriptional profiles. In contrast, we assert that differences between developmentally related mDA neurons are better described as a continuum with areas in the gene expression landscape defined by the expression of shared genes but without sharp borders between them. We decided to name different areas within this continuum “territories” at the higher hierarchical level and “neighborhoods” at the more highly resolved level. Hypothetically, such categorization can be even more fine-grained, but we find it unlikely that a resolution beyond the neighborhood level is biologically relevant. As pointed out, the Sox6 territory is the territory that best qualifies as a distinctive subtype, while mDA neurons in, e.g., the VTA show much greater and more nuanced diversity. Importantly, all mDA neurons are much more closely related to each other than to cell types lacking a common developmental origin, including hypothalamic DA neurons. Thus, our effort to define differences in such a gene expression continuum is, in our opinion, more accurate than conveying the message that the diversity consists of subtypes comparable in difference to other cell types that lack a close developmental relationship with the mDA neuron population. Such distinct neuron types (e.g., hypothalamic DA neurons), despite using the same neurotransmitter as mDA neurons, appear as distinct islands in the UMAP snRNA-seq landscape and typically harbor hundreds of differentially expressed genes. As pointed out in the Discussion, several other studies have noted similar difficulties in defining different subtypes among related neurons in, e.g., the cortex, striatum, and hippocampus (Kozareva et al., 2021; Saunders et al., 2018; Tasic et al., 2018; Yao et al., 2021). For example, Yao et al., 2021, used a similar hierarchical definition to avoid the implication that different groups (“neighborhoods” in this study) should be defined as distinct subtypes of neurons with obvious distinctive functions.

      (3) I recommend that the authors revise the introduction to include more current literature on this topic. The review by Bjoerklund and Dunnett, 2006, is very informative and important, but there is more current literature available that discusses anatomical, molecular, and functional heterogeneity in the ventral midbrain. For example, it would be nice to incorporate recent work from the Awatramani lab on the mapping of the projection of molecularly defined dopamine neurons (Poulin et al., 2018; Nature Neuroscience).

      We deliberately avoided including primary references to previously described diversity in the Introduction since numerous papers are relevant to cite. Instead, we refer to three essential reviews, including the recent articles from Awatramani and Pasterkamp. In the Supplementary Results related to Figure 4 (pages 8-14 in the Supplementary Results), we include many references and the Poulin 2018 paper. We believe that this is the appropriate place for a comprehensive discussion on anatomical, molecular, and functional heterogeneity. In the revised manuscript's main body, we now emphasize that previous literature is discussed in the Supplementary Results (see page 11).

      (4) In Fig. 1C, the authors show a sample image demonstrating overlap between TH and mCherry, but this has not been quantified. Similarly, there seem to be no sample images and quantification for the contralateral side that was exposed to 6-OHDA.

      The mouse lines used here (Dat-Cre and Rpl10a-mCherry) have been characterized before (Toskas et al., Science Advances 2022). The labelling colocalizes almost completely with TH, with some exceptions (see response below to point #5). We have now added an IHC image of the midbrain of a unilaterally lesioned mouse in Figure Supplement 5-1E.

      (5) The authors state that they focused their analysis on 33,052 nuclei expressing above-threshold levels of either Th OR Slc6a3. However, there seem to be cell populations in the ventral midbrain of mice that express TH mRNA but not TH protein, and these cells do not seem to be bona fide dopamine neurons (see work from the Morales lab). Similarly, not all dopamine neurons may express DAT mRNA. I was wondering how these discrepancies may influence the authors' analysis and interpretation.

      Indeed, the presence of cells lacking TH protein despite expressing Th mRNA has been previously described. We also detected these cells across the SNpc and VTA and now show these data in a newly added supplementary figure 2-1. In our dataset, the Gad2 territory, located in the ventromedial VTA, contains cells that express many typical mDA markers, such as Pitx3, but very low levels of TH protein. We have identified these based on Pitx3-EGFP and Gad2 mRNA co-expression (figure supplement 4-3). In other parts of the VTA and SNpc, most cells seem to co-express Th mRNA and protein and are labeled with Dat-Cre. Scattered throughout these areas, we also detected some rare mDA cells that lack TH protein. It should be noted that other typical mDA neuron genes, such as Slc18a2, Ddc, Nr4a2 and Pitx3, were expressed in our mDA territories, and thus these territories were not defined solely by the presence of Th and/or Slc6a3. Cells without a history of DAT expression, and therefore not mCherry-labelled, were also included in the analysis due to the relaxed gating used during FANS isolation.

      (6) The sex and age of the mice that are used for the experiments are not stated in the Materials and Methods section under "Mouse lines and genotyping".

      Thank you for pointing this out. This information has been added to the methods section of the updated manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I think that the manuscript can be significantly improved just by providing deeper analyses of the existing data and linking them to the current state of the art in terms of defining midbrain dopamine neurons (e.g., by projection). The dataset is likely richer than was explored in the manuscript, and more valuable insights could be gleaned with a deeper analysis.

      Please see our response to Reviewer #2 (Public Review) regarding the WGCNA analysis, the comments on ML-based GO analysis, and the comments on the added sections in the supplementary results file.

    1. Author Response

      eLife assessment

      This study, which seeks to identify factors from the glial niche that support and maintain neural stem cells, unveils a novel role for ferritin in this process. Furthermore, the work shows that defects in larval brain development resulting from ferritin knockdown can be attributed to impaired Fe-S cluster activity and ATP production. These findings will be valuable to both oncologists and neurobiologists, though the supporting evidence is currently incomplete.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      This study unveils a novel role for ferritin in Drosophila larval brain development. Furthermore, it pinpoints that the observed defects in larval brain development resulting from ferritin knockdown are attributable to impaired Fe-S cluster activity and ATP production. In addition, knocking down ferritin genes suppressed the formation of brain tumors induced by brat or numb RNAi in Drosophila larval brains. Similarly, iron deficiency suppressed glioma in a mouse model. Overall, this is a well-conducted and novel study.

      Strengths:

      Thorough analyses with the elucidation of molecular mechanisms.

      Weaknesses:

      Some of the conclusions are not well supported by the results presented.

      We really appreciate your review and positive feedback. As for the weaknesses, we will do our best to strengthen the related conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Zhixin and collaborators have investigated whether the molecular pathways present in glia play a role in the proliferation, maintenance, and differentiation of Neural Stem Cells. In this case, Drosophila Neuroblasts are used as models. The authors find that neuronal iron metabolism modulated by glial ferritin is an essential element for Neuroblast proliferation and differentiation. They show that loss of glial ferritin is sufficient to affect the number of neuroblasts. Remarkably, the authors have identified that ferritin produced in the glia is secreted to be used as an iron source by the neurons. Therefore, iron defects in glia have serious consequences in neuroblasts and likely vice versa. Interestingly, preventing iron absorption in the intestine is sufficient to reduce NB number. Furthermore, they have identified Zip13 as another regulator of the process. The evidence presented strongly indicates that loss of neuroblasts is due to premature differentiation rather than cell death.

      Strengths:

      • Comprehensive analysis of the impact of glial iron metabolism on neuroblast behaviour by genetic and drug-based approaches, as well as using a second model (mouse) for some validations.

      • Using cutting-edge methods such as RNAseq as well as very elegant and clean approaches such as RNAi-resistant lines or temperature-sensitive tools.

      • Goes beyond the state of the art, highlighting iron as a key element in neuroblast formation as well as a target in tumor treatments.

      Weaknesses:

      Although the manuscript has clear strengths, there are also some strong weaknesses that need to be addressed.

      • Some literature is missing

      Thank you for the reminder; we will add the missing references.

      • In general, the authors succeeded, but in some cases the authors' claims are not fully supported by the evidence presented, and additional experiments are critical to discriminate among different hypotheses.

      We are very grateful to the reviewer for recognizing our work, and we will support our conclusions with further evidence.

      • Moreover, some potential flaws might be present in the analysis of cell death and mitochondrial iron.

      We used Caspase-3 or TUNEL staining to detect apoptosis. Further, we overexpressed the anti-apoptotic gene p35 to inhibit apoptosis and found no rescue of neuroblast number. The results of these experiments are consistent.

      It is difficult to measure mitochondrial iron in neuroblasts, so we used indirect methods to test for ferroptosis, such as TEM and iron (or iron chelator) supplementation. We will perform more experiments, following the reviewer's recommendations, to address this.

      Reviewer #3 (Public Review):

      In this manuscript, Ma et al. seek to identify stem cell niche factors. They perform an RNAi screen in glial cells and screen for candidates that support and maintain neuroblasts (NBs) in the developing fly brain. Through this, they identify two subunits of ferritin, which is a conserved protein that can store iron in cells in a non-toxic form and release it in a controlled manner when and where required. They present data to support the conclusion that ferritin produced in glia is released and taken up by NBs, where it is utilised by enzymes in the Krebs cycle as well as in the electron transport chain. In its absence from glia, NBs are unable to generate sufficient energy for division and therefore prematurely differentiate via nuclear Prospero, resulting in small brains. The work will be of interest to those interested in neural stem cells and their non-cell-autonomous control by niches.

      The past decade has seen a growing appreciation of how glial cells support and maintain NBs during development.

      The authors' discovery of glial-derived ferritin providing essential iron atoms for energy production is interesting and important. They have employed a variety of genetic tools and assays to uncover how ferritin in glia might support NBs. This is particularly challenging because there are no direct ways of assaying for iron or energy consumption in a cell-specific manner.

      There are however instances where conclusions are drawn to support the story being developed without considering the equally plausible alternative explanations that should ideally be addressed.

      For example, the data supporting the transfer of ferritin from glia to NBs was weak given the misexpression system used; the Shi[ts] experiment was also not convincing (perhaps they have more representative images?).

      Thank you for your comment. We included a negative control that rules out misexpression. For the Shi[ts] experiment, we will substitute more representative images.

      The iron manipulation experiments are in the whole animal and it is likely that this affects general feeding behaviour, which is known to affect NB exit from quiescence and proliferative capacity. The loss of ferritin in the gut and iron chelators enhancing the NB phenotype are used as evidence that glia provide iron to NB to support their number and proliferation. Since the loss of NB is a phenotype that could result from many possible underlying causes (including low nutrition), this specific conclusion is one of many possibilities.

      Iron chelator (or iron salt) feeding is a common method for investigating metal metabolism in Drosophila[1-3]. Moreover, other metal chelators (such as copper and zinc chelators) do not produce a similar phenotype (data not shown), which partially excludes this possibility. Further, blocking iron absorption by knocking down ferritin only in the iron cell region[1], a small part of the midgut, phenocopied iron chelator feeding, implying that iron deficiency is probably the main cause of the phenotype. More importantly, the iron chelator enhances the NB phenotype only in the ferritin knockdown group and not in the control group, suggesting that iron deficiency results in the phenotype and ruling out other possibilities.

      Similarly, knockdown of the FeS protein assembly components phenocopy glial ferritin knock down. Since iron is so important for the TCA and the ETC, this is not surprising, but the similarities in the two phenotypes seem insufficient to say that it's glial ferritin that's causing the lack of iron in the NB and therefore resulting in loss of NBs.

      This conclusion cannot be reached from knockdown of FeS protein assembly components alone, which is why we used “implied” to describe this result. However, we combined several results to address this issue, including iron chelator feeding, ferritin knockdown in the midgut, the enhancement of the phenotype by iron chelators, aconitase activity, GO enrichment, KEGG enrichment, and Zip13. Together, these results point to the interpretation that iron deficiency in NBs caused by glial ferritin defects leads to NB loss.

      Pros RNAi will certainly result in an increase in NB numbers because the loss of pros results in an inability of NB progeny to differentiate. This (despite the slight increase in nuclear pros) is not sufficient to infer that glial ferritin knockdown results in premature differentiation of NBs via nuclear pros.

      First, pros RNAi, brat RNAi, or numb RNAi can each prevent NB progeny from differentiating[4-6]. If the rescue of NB number by pros RNAi relied mainly on the differentiation block of NB progeny, brat RNAi or numb RNAi would be expected to rescue NB number similarly. However, our results showed that only pros RNAi could rescue NB number, while brat RNAi and numb RNAi could not.

      Secondly, nuclear Pros represses genes required for self-renewal and is also required to activate genes for terminal differentiation[7]. Accordingly, in normal NBs Pros is kept in the cytoplasm and remains almost undetectable in the nuclei[8]. However, after glial ferritin knockdown we detected Pros in the nuclei of some NBs, and the number of NBs with detectable nuclear Pros was significantly increased compared to control.

      Altogether, we conclude that NBs tend to undergo premature differentiation after glial ferritin knockdown.

      I recognise these are challenging to prove irrefutably, however, the frequency of such expansive interpretations of data is of concern.

      (1) Tang X, Zhou B. Ferritin is the key to dietary iron absorption and tissue iron detoxification in Drosophila melanogaster. FASEB J, 2013,27(1):288-98

      (2) Xiao G, Liu ZH, Zhao M, et al. Transferrin 1 Functions in Iron Trafficking and Genetically Interacts with Ferritin in Drosophila melanogaster. Cell Rep, 2019,26(3):748-58 e5

      (3) Mukherjee C, Kling T, Russo B, et al. Oligodendrocytes Provide Antioxidant Defense Function for Neurons by Secreting Ferritin Heavy Chain. Cell Metab, 2020,32(2):259-72 e10

      (4) Knoblich JA, Jan LY, Jan YN. Asymmetric Segregation of Numb and Prospero during Cell-Division. Nature, 1995,377(6550):624-7

      (5) Zacharioudaki E, Magadi SS, Delidakis C. bHLH-O proteins are crucial for neuroblast self-renewal and mediate Notch-induced overproliferation. Development, 2012,139(7):1258-69

      (6) Bello B, Reichert H, Hirth F. The brain tumor gene negatively regulates neural progenitor cell proliferation in the larval central brain of Drosophila. Development, 2006,133(14):2639-48

      (7) Choksi SP, Southall TD, Bossing T, et al. Prospero acts as a binary switch between self-renewal and differentiation in Drosophila neural stem cells. Developmental Cell, 2006,11(6):775-89

      (8) Spana EP, Doe CQ. The Prospero Transcription Factor Is Asymmetrically Localized to the Cell Cortex during Neuroblast Mitosis in Drosophila. Development, 1995,121(10):3187-95

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for recognizing the importance of our work on transcription-independent early recovery of proteasome activity. We also thank them for their thoughtful criticisms and suggested improvements, which we addressed in the revised version as described below.

      The reviewers and editors asked for data to support the model that early recovery of proteasome activity is due to accelerated proteasome assembly. This model is backed by published data showing that proteasome assembly intermediates increase dramatically in cells treated with proteasome inhibitors (Fig. 6 in Ref. 46 of the revised manuscript). We expanded the discussion of this paper in the paragraph that describes our model. Another key experiment to confirm this model would be to determine what fraction of nascent polypeptides is degraded within minutes after synthesis. This is not trivial, and Ibtisam ran out of time to conduct these experiments because she had to graduate in spring before the expiration of her visa. This type of experiment usually relies on metabolic labeling with a heavy or radioactive amino acid, which always requires prior depletion of the corresponding unlabeled amino acid. However, the fundamental flaw of this approach, which is not recognized by the scientific community, is that depletion of an amino acid stresses cells and reduces the rate of protein synthesis, especially if this amino acid is methionine. Thus, this model is not easy to test and should be considered speculation. We therefore moved the description of this model, together with Fig. 4, into a separate "Ideas and Speculations" section and removed this model's description from the abstract.

      Reviewer 1 raised the possibility that a background band detected on the western blot of DDI2 KO cells could be the highly homologous protease DDI1. This is highly unlikely because, according to the Protein Atlas, DDI1 is selectively expressed in the testis and is not expressed in the cell lines we used. Reviewer 1 also suggested that we should base our conclusion on Nrf1 KD, which we de facto did, because we confirmed that DDI2 KD blocks Nrf1 activation (Fig. 1d).

      In response to Reviewer 1's critiques regarding the presentation of proteasome subunit stability data in Fig. 4 (Ref. 45 of the revised manuscript), we removed PSMB8 and replaced chaperones with the subunits of the 26S base. We changed color palettes, symbols, and axis scales to improve clarity.

      We acknowledged in the discussion that our work did not exclude a role for DDI2 in the recovery of proteasome activity after repeated pulse treatments, as suggested by Reviewer 1.

      We agree with Reviewer 2 that using “proteasome levels” is inaccurate when describing our activity measurement data. However, in the manuscript, we use "levels" only when discussing data in the literature. We believe measuring activity and not the total levels is more important because not all proteasomes are active, e.g., latent 20S proteasome core particles.

      Reviewer 3 expressed concern that our conclusions were based on data in HAP1 cells, which are haploid and appear not very sensitive to proteasome inhibitors. This is why we used DDI2 KD in MDA-MB-231 and SUM149 cells, which are highly sensitive to proteasome inhibitors (Weyburne et al., Ref. 11). In our experience, the full extent of proteasome inhibitor cytotoxicity is not revealed until 48hr after treatment, and viability determined at 12hr and 24hr, as in Fig. 1c, should not be used to determine sensitivity (it was used for activity assay normalization). We added a new supplementary figure showing that HAP1 cells are as sensitive to proteasome inhibitors as MDA-MB-231 cells when cell viability is assayed 48hr after treatment (new Fig. S2). Another panel in this new figure demonstrates that baseline proteasome activity is very similar in HAP1, MDA-MB-231 and SUM149 cells. We also added data demonstrating that inactivation of DDI2 by mutation does not change the recovery of proteasome activity in HCT-116 cells (new Fig. 1g). Recovery in MDA-MB-231, SUM149, and HCT-116 cells was measured at 18hr, which is still within the 12 – 24hr window in which other investigators observed partially DDI2-dependent recovery.

      We have conducted an experiment in which we followed activity recovery for up to 72hr. We found that activity plateaued at 24hr and opted against repeating the experiment because there were no further changes. We feel that the manuscript should not include data from a single biological replicate. The fact that the recovery is incomplete and that cells seem to survive with lower levels of proteasome activity is interesting; however, investigating the molecular basis for this phenomenon is beyond the scope of the current project.

      We were not disputing the conclusions of previous studies that DDI2/Nrf1 is responsible for enhanced expression of proteasomal mRNA in cells continuously treated with proteasome inhibitors. In fact, we confirmed that pulse treatment causes a similar increase (Fig. 2b). As for papers that measured activity recovery after pulse treatment, we objectively discuss our results in the context of these papers.

      In response to the Reviewers' recommendations and minor points:

      • We reviewed the revised version carefully to eliminate spelling and grammatical errors and typos.

      • We no longer refer to DDI2 as a novel protease, as suggested by Reviewer 1.

      • We agree with Reviewer 2 that our CHX results do not necessarily mean that recovery involves translation of proteasomal mRNAs, and we now conclude that proteasome recovery requires protein synthesis.

      • We revised Fig. 1c, 3a and 4a to improve clarity.

      • We have stated in the caption that data in Fig. 4a comes from Table S4 in Ref. 45.

      • We accepted an excellent suggestion of Reviewer 3 to change "recovery" to "early recovery" in the title.

      • Regarding Reviewer 3's request to assay activity recovery at additional time points before 12h, this was done in the cycloheximide experiment in Fig. 3A.

      • Even if we assume that the differences in the observed recovery activity in MDA-MB-231 cells (Fig. 1f) are statistically significant, which may implicate DDI2 involvement in the activity recovery, the percentage is still small, suggesting that most activity recovery is DDI2-independent.

      • We toned down the statement "the present findings suggest that DDI2 desensitizes cells to PI by a different mechanism," replacing "suggest" with "raise a possibility".

      • We indicated that only Bortezomib is approved for mantle cell lymphoma.

      • We changed the description of clinical dosing as suggested by Reviewer 3. We added a reference on PK of subcutaneous bortezomib (Ref. 9), even though the review we quoted (Ref. 7) discussed subcutaneous dosing.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have addressed most of the points that were made. However, despite some things that may well be beyond the scope, I would like to insist on a few small points:

      Point 1: If the authors have conducted a gross analysis of cardiac morphology by histology already, they should include this data in the manuscript and comment with 1-2 sentences that "cardiac healing"..."is unlikely influenced by developmental defects".

      We agree with the reviewer that this analysis is important. Therefore, we are currently conducting an in-depth analysis of the cardiac phenotype of different mouse strains lacking distinct subpopulations of cardiac macrophages during development and under non-stimulated (baseline) conditions, including functional, metabolic and even electrophysiological aspects. These analyses will also include FIRE mice. Although a gross analysis of this mouse strain did not reveal pathological features, we prefer to complete the detailed tissue characterization before publishing any data from a first basic analysis.

      Point 7: There is still no legend in Figure 6: what is red? What is blue?

      We added the respective legend in the figure.

      Point 8: Please add the information on the background of mice used for the different FIRE mice into the methods part of the paper

      We added the information in the Methods Part (lines 344-347).

      Reviewer #2 (Recommendations For The Authors):

      The authors have responded to all questions. I have no further comments and congratulate the authors on their work.

      We thank the reviewer for their important questions and the constructive feedback.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The delineation of MBOAT function is important with theoretical and practical implications in MAFLD, alcohol-induced hepatic steatosis, and lysosomal diseases. The strength of evidence is convincing using methodology in line with current state-of-the-art, with good support for the claims.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors provide mechanistic insights into how loss of function of MBOAT7 promotes alcohol-associated liver disease. They showed that hepatocyte-specific genetic deletion of Mboat7 enhances ethanol-induced hepatic steatosis and increases ALT levels in a murine model of ethanol-induced liver disease. Through lipidomic profiling, they showed that mice with Mboat7 deletion demonstrated augmented ethanol-induced increases in endosomal and lysosomal lipids, together with impaired transcription factor EB (TFEB)-mediated lysosomal biogenesis and accumulation of autophagosomes.

      Strengths:

      Alcohol-induced liver disease (ALD) and metabolic-associated steatotic liver disease (MASLD) are major global health problems, and a polymorphism near the gene encoding MBOAT7 has been associated with these conditions. This paper is timely, as it is important to gain insights into how loss of MBOAT7 function contributes to liver disease, since this may eventually lead to therapeutic strategies. The conclusions of the paper are mostly well supported by the data.

      We sincerely thank Reviewer #1 for constructive feedback on this work.

      Weaknesses:

      (1) In regards to circulating levels of MBOAT7 products, a comparison of heavy drinkers with ALD versus heavy drinkers without ALD would be more clinically relevant.

      We agree that this would be an important comparison to make in future studies, but given the difficulty of accessing well-matched samples such as these, we see this as beyond the scope of the current work.

      (2) A few typos need to be addressed. For Figure 1 - figure supplement 1, should the second column heading be "Heavy drinkers" instead of "Healthy drinkers"? Also, in the same figure, it is unclear what the "healthy" subcategory under MELD means.

      The typographical error was addressed in the main text and in all associated tables and figures.

      (3) Some of the data in the tables need to be addressed/discussed. For instance, the white blood cell count (WBC) in Figure 1 - figure supplement 1 for "healthy controls" is 34, compared to 13.51 for drinkers. A WBC of 34 is not at all healthy and should be explained. The vast difference between BMI and also between racial distribution within the two cohorts should also be explained. Is it possible that some of these differences contributed to the different levels of circulating MBOAT7 products that were measured?

      Sincere thanks for catching this error. On follow-up, we found that some of our patient recruitment sites were using different units to report WBC counts (percent vs 1000/ml), and at this time we cannot retrospectively correct that difference. Because WBC values for the cohort are therefore incomplete, we elected to exclude that information to avoid confusing readers. A revised table reflecting these changes is provided in the revision. When each site is considered separately, WBC values were in the normal range, so we do not think this is a major limitation of our studies. Regarding BMI and race: the difference in race is not statistically significant, although it is close. For BMI, two very low BMIs among the heavy drinkers bring that average down. We agree with Reviewer #1 that race and/or BMI could impact MBOAT7, but larger cohorts are needed to detect such potential differences.

      (4) The representation of the statistical difference between the bars in the results figures by using letters is a bit confusing. For instance, in figure 2C, does that mean all the bars labelled A are significantly different from B? The solid black bar seems to be very similar to the open red bar; please double check.

      We apologize for this confusing presentation. Under the letter system, groups that do not share a common superscript differ statistically. Given this concern, we reviewed all statistical comparisons and found several mistakes in Figure 2C, Figure 3F and G, Figure 3-Supplementary Figure 1F, and Figure 3-Supplementary Figure 10H. The graphs themselves were not altered, but the denotation of statistical significance was updated with the correct letter superscripts.

      Reviewer #2 (Public Review):

      Summary:

      The work by Varadharajan et al. explored a previously known genetic variant and its pathophysiology in the development of alcohol-associated liver injury. It provides a plausible mechanism for how varying levels of MBOAT7 could impact the lipid metabolomics of the cell, leading to a deleterious phenotype in MBOAT7 knockout. The authors further characterized the impact of the lipidomic changes and raised lysosomal biogenesis and autophagic flux as mechanisms of how MBOAT7 deletion causes the progression of ALD.

      Strengths:

      Connecting the GWAS data on MBOAT7 variants with plausible pathophysiology greatly enhances the translational relevance of these findings. The global lipidomic profiling of ALD mice is also very informative and may lead to other discoveries related to lipid handling pathways.

      We sincerely thank Reviewer #2 for constructive feedback on this work.

      Weaknesses:

      The rationale of why MBOAT7 metabolites are lower in heavy drinkers than in normal individuals is not well explained. MBOAT7 loss of function drives ALD, but unclear if MBOAT7 deletion also drives preference for alcohol or if alcohol inhibits MBOAT7 function. Presuming most individuals studied here were WT and expressed an appropriate level of MBOAT7?

      Although we were unable to genotype for the rs641738 SNP in the human subjects studied here, the original study by Buch et al. published in Nature Genetics performed cis expression quantitative trait locus (cis-eQTL) analyses to demonstrate that the minor disease-associated allele was associated with reduced MBOAT7 expression in subjects with alcohol-related cirrhosis. It is important to note that we did not see any evidence that alcohol preference was altered in either myeloid- or hepatocyte-specific Mboat7-knockout mice, given that ethanol intake was similar in all genotypes. We agree that additional studies are needed to address the possibility that MBOAT7 loss of function may promote alcohol preference.

      Also, the discussion of mechanisms of MBOAT7-induced dysregulation of lysosomal biogenesis/autophagy, while very interesting, seems incomplete. It is not clear how MBOAT7, an enzyme involved in membrane phospholipid remodeling, increases mTOR activity, which leads to decreased TFEB target gene transcription.

      Although we agree with Reviewer #2 that the mechanistic understanding of how MBOAT7 loss of function impacts mTOR activity and TFEB-driven lysosomal biogenesis is still incomplete, we do feel that the results published here will inform downstream investigation linking phosphatidylinositol remodeling to mTOR and TFEB. The MBOAT7 gene encodes an acyltransferase enzyme that specifically esterifies arachidonyl-CoA to lysophosphatidylinositol (LPI) to generate the predominant molecular species of phosphatidylinositol (PI) in cell membranes (38:4). It is well established that PI-related lipids can regulate membrane dynamics and signal transduction pathways. For instance, PI phosphates (PIPs) are dynamically shaped by PI kinases and phosphatases and play crucial roles in the regulation of a wide variety of cellular processes via specific interactions with PIP-binding proteins. Among PI phosphates, PI 3-phosphate (PI3P) regulates vesicular trafficking pathways, including endocytosis, endosome-to-Golgi retrograde transport, autophagy, and mTOR signaling. Although additional work is needed to understand the molecular details of how MBOAT7-driven LPI acylation impacts mTOR and TFEB, it is not particularly surprising that PI lipid remodeling could broadly impact cell signaling.

      Furthermore, given the significant disturbances of global lipidomic profiling in MBOAT7 knockout, many pathways are potentially affected by this deletion. Further in vivo modeling that specifically addresses these pathways (TFEB targeting, mTOR inhibitor) would help strengthen the conclusions of this paper.

      We agree that further in vivo studies are needed that are beyond the scope of the current work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) p values are rather hard to read. For example, Figure 2c, Hepatocyte-specific deletion of Mboat7 resulted in enhanced ethanol-induced increases in liver weight. However, doesn't look like there is a significant difference between the 2 EtOH groups in Figure 2C? Same comment for Figure 2e, not sure if pair-fed groups had a significant difference.

      (2) In Figure 2, Supplementary Figure 1, what is the top band on the MBOAT7 western blot?

      We have addressed these statistical comparison comments as described above. Although we cannot be certain, the top band on the MBOAT7 Western blot is likely a non-specific band produced by the antibody combination used, given that its intensity is equal in the Mboat7flox/flox and the MSKO mice (Mboat7flox/flox+LysM-Cre).

  2. Feb 2024
    1. Author Response

      eLife assessment

      This manuscript provides useful information about the lipid metabolite 15d-PGJ2 as a potential regulator of myoblast senescence. The authors provide experimental evidence that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas. However, the manuscript is incomplete in its current form: the data do not robustly support the main conclusions related to senescence, and there are technical concerns about the senescence models used in this study.

      Authors' response: We are grateful to the editors and the reviewers for their time and for comments that sharpened the science and the writing of the manuscript. We have attached a detailed response to emphasize that the manuscript does include robust evidence for its claims, which may have been missed during the review process. We now provide better context for these points.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show that these genes are upregulated in C2C12 myoblasts treated with conditioned media or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modified Cys184 of HRas, which is required for its activation as indicated by the FRET analysis with RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through the electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not the C181 or C184 mutants). They also showed that expressing WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and this defect is dependent on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarification and additional experiments would help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers of inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (inflammatory cells vs senescent cells). Examining the expression of β-galactosidase and other markers of senescence in tissue sections would help to delineate these.

      As the reviewer points out, Doxo induces systemic inflammation in addition to DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence: the cell cycle and senescence-associated genes p16 and p21 (Fig. 1C). We also observed increased nuclear accumulation of p21 (Fig. 1A) and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B). We will characterize other markers of senescence, including senescence-associated β-galactosidase, in the revised manuscript.

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We only observed cell death at higher concentrations of 15d-PGJ2 (5 µM and 10 µM) (Fig. S2A), and not significantly at the 4 µM concentration used in Figure 2. This is why 4 µM was used, and we should have clarified this. We will include viability data for the lower concentration of 15d-PGJ2 (4 µM) in the revised manuscript.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The effect of Doxorubicin treatment is irreversible: the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment for only 1 hour, at which time no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM) during which we did not observe any cell death.

      Even at such a high concentration of 15dPGJ2, the increase in ERK phosphorylation is minimal.

      We observe a ~30% increase in the phosphorylation of Erk proteins after treatment with 15d-PGJ2 in 0.2% serum medium compared to treatment with vehicle (DMSO). This increase is reproducible and significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi compared with WT. However, this could very well be due to the defect in palmitoylation rather than the modification with 15d-PGJ2.

      Our data does not suggest higher levels of C184S mutant in the Golgi compared with WT (Fig. S4A). We observed that the ratio of HRas levels in the Golgi to the HRas levels in the plasma membrane were similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To reach a meaningful conclusion, the authors need to compare palmitoylation and modification by 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. Inhibition of C181 palmitoylation, either by mutation (C181S) or by treatment with the palmitoyl transferase inhibitor 2-bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could inhibit the palmitoylation of HRas at C181. However, our data do not support this hypothesis: modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, as occurs when cysteine palmitoylation is inhibited by the C181S mutation.

      To test whether the inhibition of myoblast differentiation depends on HRas, they overexpressed HRas and its mutants in C2C12 cells. However, this experiment does not take endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment would be to knock down or knock out HRas (or make knock-in mutations of C184) and show that the effect of 15d-PGJ2 disappears.

      Endogenous wild-type HRas is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs. Therefore, we observe only a partial rescue of differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, because the constructs are expressed from the strong CMV promoter without other regulatory elements, the overexpressed HRas has a dominant effect over the endogenous protein, revealing that inhibition of myoblast differentiation by 15d-PGJ2 depends on these cysteine residues (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing HRas constructs and treated with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the corresponding constructs and treated with vehicle (DMSO). Vehicle-treated values therefore equal 1 and were not shown. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Because the aim was to study the effect of HRas cysteine mutations on myoblast differentiation after treatment with 15d-PGJ2, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.
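      As a minimal illustration of the per-construct normalization described above, the hypothetical helper below divides each 15d-PGJ2-treated value by the vehicle (DMSO) value for the same HRas construct, which is why every vehicle condition equals 1 by definition. The function name and all numbers are made up for illustration and are not data from the study.

```python
def normalize_to_vehicle(treated: dict[str, float], vehicle: dict[str, float]) -> dict[str, float]:
    """Fold change of each treated value relative to the vehicle (DMSO) value
    for the same HRas construct; a construct's vehicle condition maps to 1."""
    return {construct: treated[construct] / vehicle[construct] for construct in treated}


# Hypothetical relative expression values, for illustration only.
vehicle = {"WT": 2.0, "C181S": 1.5, "C184S": 1.8}
treated = {"WT": 1.0, "C181S": 1.2, "C184S": 1.44}

fold = normalize_to_vehicle(treated, vehicle)
print(fold)
print(normalize_to_vehicle(vehicle, vehicle))  # every construct -> 1.0
```

      Because each construct is compared only with its own vehicle control, differences in baseline expression between constructs do not affect the fold changes.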

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to general toxicity of 15d-PGJ2. We show that this inhibition depends on modification of HRas at C184: when HRas cannot be modified at C184 (Fig. 3A) and is therefore not activated by 15d-PGJ2 (Fig. 3B), the inhibition of C2C12 differentiation is rescued (Fig. 4D and E). This dissociates the inhibition of myoblast differentiation by 15d-PGJ2 from its general toxic effects on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On the one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024); on the other hand, 15d-PGJ2 also exerts pro-carcinogenic effects by inducing epithelial-to-mesenchymal transition in MCF7 breast cancer cells and inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      The novelty of the study is compromised as the activation of PGD and 15d-PGJ2, as well as the regulation of HRas and cell proliferation, have been previously reported.

      The literature does not support this statement, and it is important to clarify this misimpression for the field as a whole.

      Let us clarify:

      Covalent modification of HRas by 15d-PGJ2 has been reported only twice in the literature (Luis Oliva et al., 2003; Yamamoto et al., 2011), in fibroblasts and neurons, respectively.

      Interaction between HRas and 15d-PGJ2 in skeletal muscles has not been shown before, even though both HRas and 15d-PGJ2 are shown to be key regulators of muscle homeostasis.

      Activation of HRas by 15d-PGJ2 was first reported by Luis Oliva et al. (Luis Oliva et al., 2003). However, that study did not address the functional implications of activating HRas signaling.

      Recently, our lab contributed to a study where the functional implication of activation of HRas signaling due to covalent modification by 15d-PGJ2 was shown in the maintenance of senescence phenotype (Wiley et al., 2021).

      15d-PGJ2 was shown to inhibit the differentiation of myoblasts by Hunter et al. (Hunter et al., 2001). Although this study hypothesized that the inhibition of myoblast differentiation is mediated by 15d-PGJ2-induced activation of PPARγ signaling, it also showed inhibition of myoblast differentiation independent of PPARγ activity, suggesting the presence of other mechanisms.

      This is the first study to show a molecular mechanism where activation of HRas signaling in skeletal myoblasts due to covalent modification by 15d-PGJ2 at C184 of HRas inhibits the differentiation of skeletal myoblasts.

      Additionally, there are major technical concerns related to the senescence models, limiting data interpretation regarding the relevance to senescent cells.

      Major concerns:

      (1) The C2C12 cell line is not an ideal model for senescence study due to its immortalized nature and lack of normal p16 expression. A more suitable myoblasts model is recommended, with a more comprehensive characterization of senescence features.

      C2C12 cells are a good model for the DNA damage-induced senescence used in this manuscript. They are not a model for replicative senescence, since they are immortalized. In this study, we show that C2C12 cells undergo DNA damage-mediated senescence after treatment with Doxo. We also observe a similar phenotype in MCF7 breast cancer cells and IMR90 lung fibroblasts after treatment with Doxo (these data will be added to Supplementary Figure 1). In addition, several reports in the literature have shown induction of senescence in C2C12 cells: Moiseeva et al. (2023) induced senescence in C2C12 cells through etoposide-mediated DNA damage, and Moustogiannis et al. (2021) induced replicative senescence in C2C12 cells.

      (2) The source of increased PGD or its metabolites in the conditioned medium is unclear. Including other senescence models, such as replicative or oncogene-induced senescence, would strengthen the study.

      Fig. 1E shows a time-dependent increase in the expression of PGD2 biosynthetic enzymes in senescent C2C12 cells. Fig. 1F shows an increase in the levels of 15d-PGJ2 secreted by senescent C2C12 cells into the conditioned medium. These data show that senescent C2C12 cells are the source of PGD2 and its metabolites in the conditioned medium.

      Again, C2C12 is not suitable for replicative senescence due to its immortalized status.

      We and others have shown that C2C12 cells undergo senescence, and this manuscript only used DNA damage induced senescence.

      (3) In the in vivo part, it is unclear whether the increased expression of PTGS1, PTGS2, and PTGDS is due to senescence or other side effects of DOXO.

      We concur that this is a limitation of this study and the subsequent work will demonstrate the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (4) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of a conditioned medium.

      Figure 2A tests the effect of prostaglandin PGD2 and its metabolites secreted by the senescent cells on the differentiation of myoblasts. Therefore, we inhibited the synthesis of PGD2 in senescent cells by treatment with AT-56, and then collected the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as a control for the experiment, whereas differentiation of C2C12 cells without any treatment serves as a positive control.

      There is no explanation of how differentiation was quantified or how the fusion index was calculated.

      The fusion index was calculated using published myotube analyzer software (Noë et al., 2022). Appropriate details will be added to the Materials and Methods section of the revised manuscript.
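      For readers unfamiliar with the metric, the fusion index is commonly defined as the percentage of all nuclei that lie within myotubes containing at least two nuclei. The sketch below assumes that common definition; the myotube analyzer software of Noë et al. (2022) may apply additional criteria (for example staining or size thresholds), and the function name and inputs are hypothetical.

```python
def fusion_index(nuclei_per_myotube: list[int], total_nuclei: int, min_nuclei: int = 2) -> float:
    """Percentage of all nuclei located in myotubes with at least `min_nuclei` nuclei.

    Assumes a common textbook definition; `nuclei_per_myotube` holds the
    nucleus count of each segmented (e.g. MHC-positive) object.
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    fused = sum(n for n in nuclei_per_myotube if n >= min_nuclei)
    if fused > total_nuclei:
        raise ValueError("fused nuclei cannot exceed total nuclei")
    return 100.0 * fused / total_nuclei


# Hypothetical field of view: four segmented objects and 40 nuclei in total.
# The single-nucleus object is not counted as fused.
print(fusion_index([5, 3, 1, 2], total_nuclei=40))  # 25.0
```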

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript offers a commendable exploration into the relationship between plasma omega-6/omega-3 fatty acid ratios and mortality outcomes.

      Strengths:

      The chosen study design and analytical techniques align well with the research objectives, and the results resonate with existing literature.

      Weaknesses:

      Lack of information on the selection criteria for participants; the analysis of individual PUFAs is not appropriate; the definition of comorbidities is vague; the rationale for conducting the mediation analysis of blood biomarkers is not given.

      Thank you for your insightful feedback and for acknowledging the strengths of our manuscript, particularly regarding the alignment of our study design and analytical methods with our research objectives. Your recognition of how our results resonate with existing literature is greatly appreciated.

      Addressing the concerns you've raised:

      Selection Criteria for Participants: In the “Methods-Study population” section, we have outlined the exclusion criteria for participant selection. This information provides comprehensive insight into our methodology for selecting the study cohort.

      Analysis of Individual PUFAs: We acknowledge your concern regarding the analysis of individual PUFAs because of their inter-correlations in plasma. However, the correlations between omega-3% and omega-6% (r = -0.12) and between DHA% and LA% (r = 0.03) are actually low. Because DHA is one of the omega-3 PUFAs, we did not include DHA and total omega-3 in the same model; likewise, LA and total omega-6 were not included together. We believe that exploring the effects of individual fatty acids adds valuable depth to our research. DHA and LA were included in the same model because of their low correlation, with careful adjustment for confounding factors, to provide a nuanced understanding of their individual associations with mortality.

      Definition of Comorbidities: The definition of comorbidities, including hypertension, diabetes, and longstanding illness, is elaborated under the Methods section. These conditions were identified through self-reported data collected via the Assessment Centre Environment (ACE) touchscreen questionnaire, allowing us to capture a broad range of chronic conditions as reported by participants.

      Rationale for Mediation Analysis: Initially, our approach to mediation analysis included various blood biomarkers available in the UK Biobank database to explore the potential underlying pathways. However, upon considering your feedback regarding the overlap of fatty acids with lipid classes or lipid particles in plasma, we have decided to remove these elements from our mediation analysis.

      Reviewer #2 (Public Review):

      Summary:

      This study utilized a large sample from the UK Biobank which enhanced statistical robustness, employed a prospective design to establish clear temporal relationships, used objective biomarkers for assessing plasma omega-6/omega-3 ratio, and investigated various mortality causes including CVD and cancer for a holistic health understanding.

      Strengths:

      The authors used a large sample size, employed a prospective design, and investigated various mortality.

      Weaknesses:

      Analyzing n-3 and n-6 PUFAs separately might be less instructive. It might not be methodologically sound to treat TG, HDL, LDL, and apolipoproteins as mediators. It's imperative to exercise caution when drawing causal conclusions from the observed correlations. The manuscript might propose potential research trajectories.

      We are grateful for your thoughtful analysis of our study's strengths and for your constructive feedback on areas for improvement.

      Response to Weaknesses:

      Analyzing n-3 and n-6 PUFAs Separately: We recognize the challenge in analyzing n-3 and n-6 PUFAs separately due to their correlations. However, the correlation between n-3% and n-6% in UK Biobank was actually relatively low (r = -0.12). We included them in one model to test whether both are associated with the outcomes after controlling for the effect of the other. Indeed, both were negatively associated with the mortality outcomes in our analysis. We believe our supplemental analysis of n-3 and n-6 PUFAs provides useful information to readers, in addition to our findings based on the n-6/n-3 ratio.

      Mediation Analysis of TG, HDL, LDL, and Apolipoproteins: We appreciate your insight on the methodological considerations of treating these biomarkers as mediators. After careful review, and in line with suggestions from another reviewer, we have removed these elements from our mediation analysis. This revision improves the scientific rigor of our work, ensuring that our conclusions are drawn from our most robust and methodologically sound analyses.

      Causal Conclusions from Correlations: We fully agree with the need for caution in interpreting correlations in observational studies. To this end, we have avoided implying causality in our manuscript. Terms suggesting causality, like "protective effects," have been replaced with "inverse associations" to more accurately represent our findings. This adjustment enhances the clarity and accuracy of our conclusions.

      Proposing Future Research Trajectories: Recognizing the importance of advancing causal and mechanistic understanding in this field, we have called for future studies to further examine causality and characterize molecular mechanisms of the observed associations in our study.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find out whether the levels of omega-6 and omega-3 fatty acids in the blood are linked to the likelihood of dying from any cause, from cancer, and from cardiovascular disease. They use a large dataset called UK Biobank, in which fatty acid levels were measured in blood at the start of the study and participants were then followed for an average of 12.7 years. They find that both omega-6 AND omega-3 fatty acids were linked with a lower likelihood of dying from any cause, from cancer, and from cardiovascular disease. The effects of omega-3s were stronger. They then calculated the ratio of omega-6 to omega-3 fatty acids and found that as this ratio increased, the risk of dying also increased. This supports the idea that omega-3s have stronger effects than omega-6s.

      Strengths:

      This is a large study (over 85,000 participants) with a good follow up period (average 12.7 years). Using blood levels of fatty acids is superior to using estimated dietary intakes. The authors take account of many variables that could interfere with the findings (confounding variables) - they do this using statistical methods.

      Weaknesses:

      There are several omega-6 and omega-3 fatty acids; it is not clear which ones were actually measured in this study.

      Thank you for recognizing the strengths of our study, including the large sample size, the duration of follow-up, and our methodological approach to using blood levels of fatty acids and addressing potential confounders. Regarding the weakness you've highlighted, we understand the importance of specifying which omega-6 and omega-3 fatty acids were analyzed in our study. We have revised the Method section to provide detailed information about how the exposures were measured.

      Recommendations for the author:

      Reviewer #1 (Recommendations for the Authors):

      To elevate the manuscript's scholarly rigor, I propose the following refinements:

      (1) The manuscript lacks information on the selection criteria for participants and the representativeness of the UK Biobank cohort. It is important to provide details on how participants were selected and whether it is representative of the general population, which is crucial for assessing the generalizability of the findings.

      We appreciate the opportunity to clarify the participant selection criteria and the representativeness of the UK Biobank cohort within our manuscript. In the “Methods-Study population” section, we delineated the exclusion criteria: "Participants with cancer (n=37,736) or CVD (n=100,972), those who withdrew from the study (n=879), and those with incomplete data on the plasma omega-6/omega-3 ratio (n=277,372) were excluded from this study, leaving 85,425 participants, 6,461 died during follow-up, including 2,794 from cancer and 1,668 from CVD." To further address representativeness, we performed a sensitivity analysis, examining the baseline characteristics of participants included in our study relative to those omitted due to lack of exposure information. This analysis, presented in Additional file 2: Table S13, indicates comparable baseline characteristics across both participant groups, bolstering confidence that our study sample is representative of UK Biobank participants in general.

      Regarding the UK Biobank's representativeness with the general population, we acknowledge that the cohort does not mirror the broader UK demographic in terms of socioeconomic and health profiles. Participants in the UK Biobank generally exhibit better health and higher socioeconomic status than the average UK resident, potentially influencing the disease prevalence and incidence rates. Nonetheless, the UK Biobank's extensive sample size and comprehensive exposure data enable the generation of valid estimates for exposure-disease associations. These estimates have been corroborated by findings from more demographically representative cohorts, as highlighted in the studies by Batty et al. and Fry et al.

      We recognize the importance of this aspect and will incorporate a discussion on the implications of these factors for the generalizability of our findings in the “Discussion-Limitations” section of our manuscript. We are grateful for this insightful comment and believe that this addition will enhance the manuscript's contribution to the field.

      Here is what we added in the “Discussion-Limitations” section of our manuscript: “Third, we acknowledged that the cohort did not mirror the broader UK demographic in terms of socioeconomic and health profiles. Participants in the UK Biobank generally exhibited better health and higher socioeconomic status than the average UK resident, potentially influencing the disease prevalence and incidence rates. Nonetheless, the UK Biobank's extensive sample size and comprehensive exposure data enable the generation of valid estimates for exposure-disease associations. These estimates have been corroborated by findings from more demographically representative cohorts47,48.”

      References:

      Batty GD, Gale CR, Kivimäki M, et al. Comparison of risk factor associations in UK Biobank against representative, general population based studies with conventional response rates: prospective cohort study and individual participant meta-analysis. BMJ. 2020;368:m131.

      Fry A, Littlejohns TJ, Sudlow C, et al. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am J Epidemiol. 2017;186(9):1026–34.

      (2) The study sample included different ancestries which may introduce confounding from genetic background. As over 90% of the participants were of European ancestry, I recommend excluding individuals of non-European ancestry in the main analysis.

      Thank you for raising the concern regarding the inclusion of different ancestries in our study sample and the potential confounding. In our research, we have adhered to the widely accepted practice of including all participants in the study to ensure a comprehensive analysis. Recognizing the predominance of European ancestry within our cohort, which exceeds 90%, we have proactively incorporated ethnicity as a covariate in our statistical models to mitigate confounding influences.

      We also considered the feasibility of conducting a stratified analysis for non-European participants. However, the small sample sizes of non-European subgroups do not provide sufficient statistical power to yield reliable or meaningful separate analyses. Consequently, to maintain the integrity and robustness of our findings, we opted to include all participants in the main analysis, adjusting for ethnicity to account for potential confounders.

      (3) I noted that a large proportion of participants were excluded due to the lack of data on plasma PUFAs. Were the characteristics of these participants similar to the current analysis sample?

      Thank you for raising this very important point. According to UK Biobank, “The EDTA plasma samples were picked randomly and are therefore representative of the 502,543 participants in the full cohort.” (As detailed in Julkunen et al.) Moreover, as noted in our reply to comment #1 above, we performed a sensitivity analysis, examining the baseline characteristics of participants included in our study relative to those omitted due to lack of exposure information.

      The results of this analysis are detailed in Additional file 2: Table S13. They demonstrate that the baseline characteristics—such as age, gender, ethnicity, socioeconomic status, and lifestyle habits—are indeed similar between the two groups. This similarity supports the representativeness of our analysis sample and suggests that the exclusion of participants without plasma PUFA data does not introduce a bias that would undermine the validity of our study's findings.

      References:

      Julkunen H, Cichońska A, Tiainen M, et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat Commun. 2023 Feb 3;14(1):604. doi: 10.1038/s41467-023-36231-7.

      (4) The methods section should include a detailed description of the measurement of plasma omega-6/omega-3 fatty acid ratio. It is important to provide information on the analytical techniques used and any quality control measures implemented to ensure the accuracy and reliability of the measurements. Importantly, were repeated measurements done?

      Thank you for raising this important point. The details of the metabolomic profiling have been described in previous UK Biobank publications. In this revision, we added a brief description of the measurement process and provided references to previous publications.

      Here is what we added in the “Methods- Ascertainment of exposure” section of our manuscript: “Metabolomic profiling of plasma samples was performed with high-throughput nuclear magnetic resonance (NMR) spectroscopy. At the time of this analysis (15 Mar 2023), UK Biobank released the Phase 1 metabolomic dataset, which covered a random selection of 118,461 plasma samples from the baseline recruitment. These samples were collected between 2007 and 2010 and had been stored in −80 °C freezers, while the NMR measurements took place between 2019 and 2020. Detailed descriptions could be found in previous publications about plasma sample preparation, NMR spectroscopy setup, quality control protocols, correction for sample dilution, verification with duplicate samples and internal controls, and comparisons with independent measurements from clinical chemistry assays20-22.”

      (5) The analysis of individual PUFAs is not appropriate because plasma levels of these PUFAs, including n-3 PUFAs, n-6 PUFAs, EPA, DHA, and AA, are usually correlated, which makes it hard to differentiate their effects in a Cox model. By contrast, the n-6/n-3 ratio is more comprehensive, and the current analysis demonstrated that this ratio is a good marker of mortality. Therefore, the analyses of individual PUFAs could be removed to focus only on the n-6/n-3 ratio.

      We agree with the Reviewer regarding the importance of focusing on the n-6/n-3 ratio; indeed, the ratio is the focus of this manuscript. We also acknowledge the Reviewer's concern about including correlated covariates in one statistical model. In that specific analysis, the correlations between omega-3% and omega-6% (r = -0.12) and between DHA% and LA% (r = 0.03) are relatively low. We also checked the models for multicollinearity and found that the variance inflation factors (VIFs) were within acceptable ranges. In the fully adjusted model that included omega-3% and omega-6%, all variables had VIFs below 1.13, with omega-3% at a VIF of 1.06 and omega-6% at a VIF of 1.12. Similarly, in the model including DHA% and LA%, all variables had VIFs below 1.13, with DHA% at a VIF of 1.07 and LA% at a VIF of 1.10. Because DHA is an omega-3 PUFA, we did not include DHA and omega-3 in the same model; for the same reason, we did not include LA and omega-6 in the same model.

      Because the ratio has two components and each component is the sum of multiple individual PUFAs, it is natural to ask which component is more important (e.g., omega-6 or omega-3?) and which specific fatty acid drives the effect of omega-3 PUFAs (e.g., ALA, or the marine omega-3s EPA and DHA?). We frequently received such questions when we presented this research previously, and to address them, we analyzed omega-3, omega-6, DHA, and LA. While we understand the complexities involved in differentiating the effects of individual fatty acids in a Cox model, we believe there is intrinsic value in exploring these relationships further. Specifically, we investigated the associations of individual PUFAs with mortality by including DHA and LA in the same model, given their low correlation, with adjustment for confounding factors (as detailed in Additional file 2: Table S9). Our findings indicate significant inverse associations of both DHA and LA with all-cause, cancer, and cardiovascular disease (CVD) mortality. We agree with the Reviewer that the focus of our manuscript should be the ratio, but we hope the Reviewer will agree that keeping the results for individual PUFAs provides additional useful information to readers.
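      For readers less familiar with the multicollinearity check described above, the VIF of covariate j is 1/(1 − R_j²), where R_j² comes from regressing covariate j on all other covariates in the model. The Python sketch below is a minimal illustration of that definition using NumPy least squares on synthetic data; it is not the code used for the UK Biobank analysis, and the function name `variance_inflation_factors` is hypothetical. By this definition, a VIF of 1.12 corresponds to R_j² ≈ 0.107, i.e., about 11% of that covariate's variance is explained by the other covariates.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_covariates).

    Hypothetical helper for illustration. Each covariate j is regressed by
    ordinary least squares (with an intercept) on all other covariates,
    and VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other covariates
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        ss_res = np.sum((y - design @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs

# Synthetic data (not UK Biobank): two weakly, negatively correlated covariates
rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = -0.12 * a + rng.normal(size=5000)
print(variance_inflation_factors(np.column_stack([a, b])))
```

With only two covariates the VIF reduces to 1/(1 − r²), so a pairwise correlation of r = -0.12 alone gives a VIF of about 1.015; in a fully adjusted model, each VIF also reflects correlation with the other adjustment covariates.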

      (6) The definition of comorbidities (including hypertension, diabetes, and longstanding illness) is vague. Please clarify what diseases longstanding illness includes.

      We appreciate the request for clarification regarding the definition of comorbidities in our study, including the categorization of longstanding illness. The information regarding longstanding illnesses was obtained via the Assessment Centre Environment (ACE) touchscreen questionnaire. Participants were asked, "Do you have any long-standing illness, disability, or infirmity?" with the response options being “Yes,” “No,” “Do not know,” and “Prefer not to answer.” For the purposes of our analysis, participants who selected “Yes” were categorized as having a longstanding illness, while the remaining options were grouped as not having a longstanding illness.

      This method of classification aligns with our detailed explanation in the “Methods-Ascertainment of covariates” section of the manuscript, where we state that “Comorbidities, including hypertension, diabetes, and longstanding illness, were self-reported at baseline. Longstanding illness refers to any long-standing illness, disability, or infirmity, without other specific information.” It is important to note that this approach is consistent with established precedents in the field. Specifically, the paper by Li et al. in the BMJ utilized a similar definition for comorbidities, reinforcing the validity of our methodology.

      References:

      Li ZH, Zhong WF, Liu S, et al. Associations of habitual fish oil supplementation with cardiovascular outcomes and all cause mortality: evidence from a large population based cohort study. BMJ. 2020 Mar 4;368:m456.

      (7) The rationale for conducting the mediation analysis of blood biomarkers is not given. Since plasma fatty acids can be incorporated into TG or bound to apolipoproteins, there is a large overlap between FAs and these biomarkers, and thus it is not appropriate to analyze TG, HDL, LDL, and apolipoproteins as mediators.

      We are grateful for the insightful feedback regarding the mediation analysis of blood biomarkers. Our mediation analysis aimed to explore the possible biomarkers and biological processes that explain the effects of PUFAs on mortality. Upon reflection, we recognize the complexities introduced by the inherent overlap of fatty acids with different lipid particles and lipid classes in plasma. Considering the potential confounding this overlap presents, and in agreement with your recommendation, we have decided to remove the mediation analyses involving cholesterol, TG, HDL-C, LDL-C, Lp(a), ApoA, and ApoB from our study. We appreciate your guidance on this matter and have updated our manuscript accordingly to reflect these changes.

      Reviewer #2 (Recommendations for the Authors):

      (1) Analyzing n-3 and n-6 PUFAs separately might be less instructive given the inherent correlations among plasma levels of n-3 PUFAs and n-6 PUFAs. Also, some important specific PUFAs, such as ALA, AA, and EPA, were not available in the UK Biobank data, although the authors analyzed LA and DHA. The n-6/n-3 ratio, as evidenced by the current analysis, offers a more holistic perspective and might be a superior mortality marker. Thus, I recommend shifting the focus solely to this ratio.

      Thank you for the thoughtful comment. Reviewer #1 raised a similar point (comment #5 above). We are glad that both reviewers recognized the importance of the omega-6/omega-3 ratio and agreed with us that the ratio should be the focus of the paper. Please also see our more detailed response above. Briefly, our manuscript centers on the ratio, while the supplemental analysis of omega-3%, omega-6%, DHA%, and LA% provides additional useful information. We included omega-3% and omega-6% in the same model because their correlation was relatively low (r = -0.12). We also checked the model for multicollinearity and found that the variance inflation factors (VIFs) for n-3 PUFAs and n-6 PUFAs were within acceptable ranges. In the fully adjusted model that included omega-3% and omega-6%, all variables had VIFs below 1.13, with omega-3% at a VIF of 1.06 and omega-6% at a VIF of 1.12. Similarly, in the model including DHA% and LA%, all variables had VIFs below 1.13, with DHA% at a VIF of 1.07 and LA% at a VIF of 1.10. Therefore, we decided to keep the results for omega-3 and omega-6 PUFAs. We hope the Reviewer will agree that this content provides useful additional information to readers.

      (2) It might not be methodologically sound to treat TG, HDL, LDL, and apolipoproteins as mediators. Since the model included comorbidities as covariates, hypercholesterolemia and hypertriglyceridemia seem to have been adjusted for in the analysis. Thus, further analyzing these blood biomarkers, which overlap with the comorbidities, as mediators is redundant.

      We appreciate your critical evaluation of our methodological approach. Your point is well-taken, especially in light of the fact that comorbidities such as hypercholesterolemia and hypertriglyceridemia have been accounted for as covariates in our model. This overlap, as you correctly identified, could indeed render the mediation analysis redundant. In concordance with your recommendation, and incorporating the comments of another reviewer, we have now omitted the mediation analysis involving these blood biomarkers from our study. We believe this adjustment strengthens the methodological soundness of our research and are thankful for your contribution to this refinement. We have updated our manuscript to reflect these changes and ensure our analysis remains robust and free from redundancy.

      (3) It's imperative to exercise caution when drawing causal conclusions from the observed correlations. The inherent constraints of observational studies, coupled with potential residual confounding or reverse causality, should be acknowledged.

      We concur with the caution against implying causality from correlations observed in our study. As such, we have carefully refrained from claiming any causal relationships within our paper. We acknowledge that the term "protective effects" could suggest a causal inference, and we have revised our language to describe these observations as "inverse associations" to more accurately reflect the nature of our findings.

      We have also addressed the inherent limitations of observational research in the Discussion section under 'limitations' of our manuscript. There, we recognize that while we have accounted for many confounders, the possibility of residual confounding cannot be entirely excluded. We also agree that reverse causality is a concern in observational studies. To mitigate this, we performed a sensitivity analysis excluding participants who died within the first year of follow-up. The results from this analysis, which are provided in Additional file 2: Table S12, show consistency with our main findings, suggesting that the observed associations are less likely to be predominantly driven by reverse causation. We are grateful for your insights, which have guided us in strengthening our manuscript and ensuring that our conclusions are presented with the appropriate scientific rigor.
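      The sensitivity analysis described above amounts to dropping participants whose death occurred within one year of baseline before refitting the models. A minimal Python sketch of that filtering step follows; the `Participant` record and the `exclude_early_deaths` helper are hypothetical names for illustration, and the sketch assumes follow-up time is measured in years from baseline and is otherwise left unchanged for the remaining participants.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    pid: str                 # hypothetical participant identifier
    follow_up_years: float   # years from baseline to death or censoring
    died: bool               # True if follow-up ended in death

def exclude_early_deaths(participants, lag_years=1.0):
    """Return participants, dropping those who died within `lag_years` of baseline."""
    return [
        p for p in participants
        if not (p.died and p.follow_up_years < lag_years)
    ]

cohort = [
    Participant("a", 0.5, True),   # died in the first year -> excluded
    Participant("b", 0.5, False),  # censored early but alive -> kept
    Participant("c", 3.0, True),   # died after the first year -> kept
]
print([p.pid for p in exclude_early_deaths(cohort)])
```

Only deaths are used as the exclusion criterion here, mirroring the description of excluding participants who died within the first year of follow-up.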

      (4) To guide subsequent scholarly endeavors, the manuscript might propose potential research trajectories, such as spearheading randomized controlled trials to delve deeper into the causal nexus between plasma omega-6/omega-3 ratios and mortality outcomes or probing the mechanistic underpinnings of the observed correlations.

      We agree that conducting randomized controlled trials could illuminate the potential causal relationships between plasma PUFA biomarkers and mortality outcomes. While the primary focus of our manuscript is to report on associations, we acknowledge the importance of causal analysis in advancing the field. In our secondary analysis, we touched upon mediation effects of blood biomarkers, which could serve as a preliminary step towards establishing causality. Although our current work did not delve deeply into causal mechanisms, the results we have presented may indeed stimulate further exploration. By reporting our mediation analysis results, we aim to provide a foundation that other researchers might build upon. We hope that our work will act as a catalyst for more in-depth studies, such as RCTs or mechanistic investigations, to pursue the questions we have begun to explore.

      Following this recommendation, we have revised our Conclusion paragraph and added: “Our findings support the active management of a high circulating level of omega-3 fatty acids and a low omega-6/omega-3 ratio to prevent premature death. Future research is warranted to further test the causality, such as Mendelian randomization and randomized controlled trials. Mechanistic research, including comprehensive mediation analysis, in-depth experimental characterization in animal models or cell lines, and intervention studies, is also needed to unravel the molecular and physiological underpinnings.”

      Reviewer #3 (Recommendations for the Authors):

      (1) Line 32. Delete "a balanced" because a balanced o6:o3 cannot be defined.

      Thank you for pointing out the issue with the term "a balanced". Most authors agree with your observation that defining what constitutes a 'balanced' ratio can be ambiguous and potentially misleading. One author, JTB, disagrees that “balance” as a concept is unacceptably ambiguous or misleading. In response, we have removed the words from our manuscript.

      (2) In the abstract you should present the findings for omega-6 and omega-3 PUFAs first and then the findings for the ratio.

      We appreciate your suggestion to present the findings for omega-6 and omega-3 PUFAs prior to those for the ratio in the abstract. As laid out in the Background section, the ratio was our primary exposure of interest. So, we organized our manuscript by centering on the ratio. We are glad that both Reviewer #1 and #2 expressed a particular interest in the ratio findings and urged us to keep the ratio as the focus. We believe that this emphasis reflects the novel aspects of our research and aligns with the thematic structure of our manuscript.

      (3) Line 80. controversial should read uncertain.

      Thank you for the suggestion. We have changed “controversial” to “uncertain”.

      (4) It is unclear which fatty acids are included in total PUFAs, omega-6 PUFAs and omega-3 PUFAs. It is vital that this is specified.

      Thank you very much for your suggestion. We agree that it is important to clarify the specific fatty acids included in the analysis. In the revised manuscript, we emphasized that we analyzed “total omega-6 PUFAs” and “total omega-3 PUFAs”, while “LA is one type of omega-6 PUFAs” and “DHA is one type of omega-3 PUFAs”. We also revised the Methods section “Ascertainment of exposure” to provide more information about how the exposures were measured. Here is what we added in the “Methods – Ascertainment of exposure” section of our manuscript: “Five PUFA-related biomarkers were directly measured in absolute concentration units (mmol/L), including total PUFAs, total omega-3 PUFAs, total omega-6 PUFAs, docosahexaenoic acid (DHA), and linoleic acid (LA). Of note, DHA is one type of omega-3 PUFA, and LA is one type of omega-6 PUFA. Our primary exposure of interest, the omega-6/omega-3 ratio, was calculated based on their absolute concentrations. We also performed supplemental analysis for four exposures, the percentages of omega-3 PUFAs, omega-6 PUFAs, DHA, and LA in total fatty acids (omega-3%, omega-6%, DHA%, and LA%), which were calculated by dividing their absolute concentrations by that of total fatty acids.”
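      The derived exposures described in this passage follow directly from the absolute concentrations. The Python sketch below illustrates the arithmetic under the stated definitions (the ratio of absolute omega-6 to omega-3 concentrations, and each percentage as 100 × concentration / total fatty acids); the function name `pufa_exposures` and the input values in the usage line are hypothetical, not UK Biobank data. Because DHA is a subset of omega-3 PUFAs and LA a subset of omega-6 PUFAs, the sketch also rejects inputs where a component exceeds its class total.

```python
def pufa_exposures(total_fa, omega3, omega6, dha, la):
    """Derive ratio and percentage exposures from absolute concentrations (mmol/L).

    Hypothetical helper for illustration only.
    """
    if min(total_fa, omega3, omega6) <= 0:
        raise ValueError("total fatty acids, omega-3 and omega-6 must be positive")
    if dha > omega3 or la > omega6:
        # DHA is one type of omega-3 PUFA and LA is one type of omega-6 PUFA
        raise ValueError("DHA/LA cannot exceed their omega-3/omega-6 totals")
    return {
        "omega6_omega3_ratio": omega6 / omega3,   # dimensionless
        "omega3_pct": 100.0 * omega3 / total_fa,  # % of total fatty acids
        "omega6_pct": 100.0 * omega6 / total_fa,
        "dha_pct": 100.0 * dha / total_fa,
        "la_pct": 100.0 * la / total_fa,
    }

# Hypothetical concentrations (mmol/L), chosen only to show the arithmetic
print(pufa_exposures(total_fa=10.0, omega3=0.5, omega6=4.0, dha=0.2, la=3.0))
```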

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The signaling pathway upstream of Maf1 remains unknown. In eukaryotes, Maf1 is a negative regulator of RNA pol III and is regulated by external signals via the TORC pathway. Since TORC components are absent in the apicomplexan lineage, one central question that remains open is how Maf1 is regulated in P. falciparum. Magnesium is probably not the sole stimulus involved, as suggested by the observation that Ile deprivation also down-regulates RNA pol III activity.

      We agree that there is still much to uncover about the PfMaf1 signaling pathway. While we do not yet know every component, we have been able to link external factors (not limited to magnesium) to increased nuclear occupancy of PfMaf1. Other proteins that potentially interact with and regulate PfMaf1, while not yet confirmed, have been identified in our Co-IP MS data as candidates for future experiments to validate their potential involvement in RNA Pol III inhibition.

      The study does not address why MgCl2 levels vary depending on the clinical state. It is unclear whether plasma magnesium is increased during asymptomatic malaria or decreased during symptomatic infection, as the study does not include control groups with non-infected individuals. Along the same line, MgCl2 supplementation in parasite cultures was done at 3mM, which is higher than the highest concentrations observed in clinical samples.

      This reviewer raised a valid point. The plasma magnesium levels for the wet symptomatic samples (average 0.79 mM) were within the normal range for a healthy individual (0.75–0.95 mM), while the dry asymptomatic levels were above the normal range (average 1.13 mM). Ideally, we would have included plasma samples from uninfected individuals from The Gambia as controls. Unfortunately, field studies and human volunteer studies do not always have all the ideal controls that in vitro studies have. We recognize that 3 mM is higher than the normal range for magnesium levels, which is why we included a revised Supplementary Figure 3A. This figure shows that magnesium concentrations as low as 1 mM (similar to the levels found in dry asymptomatic samples) reduced the expression of RNA Pol III-transcribed genes.

      Although the study provides biochemical evidence of Maf1 accumulation in the parasite nuclear fraction upon magnesium addition, this is not fully supported by the immunofluorescence experiments.

      We agree that the resolution of the IFA images is not sufficient to support the WB data. We believe the value of the IFA Supplementary Figure is that it shows that PfMaf1 clusters in foci, which has not been reported previously.

      Reviewer #2 (Public Review):

      Weaknesses:

      However, most analyses are rather preliminary as only very few (3-5) candidate genes are analyzed by qPCR instead of carrying out comprehensive analyses with a large qPCR panel or RNA-seq experiments with GO term analyses. Data presentation lacks clarity, the number of biological replicates is rather low and the statistical analyses need to be largely revised. Although the in vivo data from wet (mildly symptomatic) and dry (asymptomatic) season parasites with different expression levels of Pol III-regulated genes, var genes, and MgCl2 are interesting, the link between the in vitro data and the in vivo virulence of P. falciparum, which is made in many sections of the manuscript, should be toned down. Especially since (i) the only endothelial receptor studied is CD36, which is associated with parasite binding during mild malaria, and (ii) several studies provide contradictory data on MgCl2 levels during malaria and in different disease states, which is not further discussed, but the authors mainly focused on this external stimulus in their experiments.

      We agree that, ideally, we would have performed full RNA-seq on the samples from The Gambia. However, that was beyond the scope of this project. The amount of RNA was limited, which is why we did not use more primer pairs. We believe that an appropriate number of replicates was performed for the experiments. The wet symptomatic samples in this study were from mildly symptomatic individuals, as stated in the manuscript; therefore, CD36 was a relevant receptor for our studies.

      We agree that the published studies on magnesium levels in infected individuals are not always consistent. These studies do not consider the time of year, i.e., whether the infection occurred during the dry or wet season. They were also conducted in different regions of the world using different technologies. For this reason, we only highlight the difference observed in our field study data from The Gambia.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) The signals upstream of Maf1 remain rather a black box. 4 are tested - heat shock and low-glucose, which seem to suppress ALL transcription; low-Isoleucine and high magnesium, which suppress Pol3. Therefore the authors use Mg supplementation throughout as a 'starvation type' stimulus. They do not discuss why they didn't use amino acid limitation, which could be more easily rationalised physiologically. It may be for experimental simplicity (no need for dropout media) but this should be discussed, and ideally, sample experiments with low-IsoLeu should be done too, to see if the responses (e.g. cytoadhesion) are all the same.

      We agree that isoleucine deprivation would have been another possible experimental condition for our study, but it would not have been as novel as magnesium. While understanding the exact mechanism by which magnesium acts as a stress condition was beyond the scope of this manuscript, we believe that our data will be valuable in demonstrating that external stimuli act on P. falciparum virulence gene expression via RNA Pol III inhibition. Since we had plasma level data for magnesium but not for isoleucine, we considered magnesium the better external factor for our in vitro studies.

      (2) The proteomics, conducted to seek partners of Maf1, is probably the weakest part. From Figure S3: the proteins highlighted in the text are clearly highly selected (as ones that might be relevant, e.g. phosphatases), but many others are more enriched. It would be good to see the whole list, and which GO terms actually came top in enrichment.

      We apologize if the reviewer did not see the attached supplementary Co-IP MS data. The file includes all proteins found in each sample as well as the GO term analysis. For the purpose of this work, we highlight proteins (phosphatases and RNA Pol III subunits) potentially involved in the canonical role of Maf1, which has been shown in model organisms to reversibly inhibit RNA Pol III.

      (3) Figure 3 shows the Maf1-low line has very poor growth after only 5 days but it is stated that no dead parasites are seen even after 8 cycles and the merozoites number is down only ~18 to 15... is this too small to account for such poor growth (~5-fold reduced in a single cycle, day 3-5)? It would additionally be interesting to see a cell-cycle length assessment and invasion assay, to see if Maf1-low parasites have further defects in growth.

      We agree with the reviewer that the observed reduction in merozoite numbers may not be the only cause of the reduced growth rate. Other factors in the PfMaf1 knock-down line may contribute to the observed poor growth.

    1. Author Response

      Our answer to reviewer #1 comments:

      We attempted to perform structural characterization of the ASK1 complex with TRX1, but were unable to prepare a sufficiently stable ASK1:TRX1 complex for cryo-EM analysis, probably due to their relatively weak interactions. Therefore, we subsequently decided to use HDX-MS to characterize the structural changes of ASK1 induced by interactions with TRX1.

      Detailed information about cryo-EM data processing including 2D classification averages, local resolution of the EM map and FSC figure are shown in Supporting Information, Supplementary Table S1 and Figures S1-S3.

      We fully agree with the reviewer that the presence of hydrogen bonding cannot be reliably described at this resolution. However, if there is a sufficient electron density in a given region and a corresponding hydrogen bond donor-acceptor pair in the model, this suggests the possible presence of such an interaction.

      Our answer to reviewer #2 comments:

      We are fully aware that the use of a C-terminally truncated construct limits this study due to the presumed role of the C-terminus in ASK1 dimerization. A C-terminally truncated construct consisting of TBD, CRR, and KD (residues 88-973) was used due to the low expression yield and solubility of full-length human ASK1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you and the two reviewers for the thorough review of our manuscript. We found the reviewers’ comments highly valuable and addressed them with the following additional experiments and changes to the text and figures:

      (1) We measured the effect of ROCK MASOs on ROCK expression by immunostaining and observed a reduction in ROCK signal, supporting the downregulation of ROCK protein level by ROCK MASOs (new Fig. S3).

      (2) We measured the effect of a lower concentration of the ROCK inhibitor Y27632 (10 µM) and observed the same phenotypes of skeletal loss, skeletal reduction, and ectopic branching at this concentration (Figs. 2, S4). Importantly, these phenotypes were not observed when PKA and PKC were directly inhibited in whole sea urchin embryos (1) and in skeletogenic cell cultures (2), further supporting the specificity of the ROCK inhibitor.

      (3) We added a time course of Pl-ROCK expression and immunostaining of ROCK in the fertilized egg, which show that this gene is maternally expressed and that the protein is present in the egg (Fig. S2A-C).

      (4) We imaged F-actin in ROCK MASO-injected embryos and demonstrate that it is still detected around the spicules and their tips, similarly to ROCK-inhibited embryos (new Fig. S3).

      (5) We revised the paper text and figures to provide a better description of our results, distinguish clearly between our data and our interpretations and emphasize the novelty of our findings.

      This paper demonstrates that ROCK, F-actin polymerization, and actomyosin contractility play critical roles in biomineral growth and in shaping biomineral morphology in the sea urchin embryo, and that ROCK activity affects skeletogenic gene expression. Our findings, together with previous reports of the role of actomyosin in eukaryotic biomineralization, suggest that this molecular machinery is part of the common molecular toolkit used in biomineralization. The identification of a common molecular mechanism within the diverse gene regulatory networks, organic scaffolds, and minerals that eukaryotes use to build their biominerals will be of high interest to the fields of biomineralization and evolutionary biology. Furthermore, our paper portrays the interplay between the cellular and genetic machinery that drives morphogenesis. We believe it would be of great interest to the broad readership of eLife, particularly to the fields of biomineralization and cell, developmental, and evolutionary biology.

      Thank you very much for the helpful review of our paper.

      Reviewer #1 (Public Review):

      We thank the reviewer for appreciating our work and for the helpful comments, which guided us to strengthen the experimental evidence for our conclusions and increase the paper’s clarity. Below are our responses to the specific comments:

      Major comments

      One MASO led to reduced skeleton formation while the other one additionally induced ectopic branching. How was the optimum concentration for the MASOs determined? Did the authors perform a dose-response curve? What is the reason for this difference? Which of the two MASOs can be validated by reduced ROCK protein abundance? Since the ROCK antibody works, I would like to see a control experiment on Rock protein abundance in control and ROCK MO injected larvae which is the gold-standard for validating the knock-down.

      We tested several MASO concentrations to identify a concentration where the control embryos injected with Random MASO were overall healthy and ROCK MASO’s showed clear phenotypes.

      To test the effect of ROCK MASOs on ROCK protein levels, we performed immunostaining experiments that are now presented in new Fig. S3. We could not perform Western blots on injected embryos, since Western blotting with the ROCK antibody requires thousands of embryos, which is not feasible for injected embryos. Therefore, we tested the effect of the two translation-blocking ROCK MASOs on ROCK abundance compared to uninjected and Random MASO-injected embryos using immunostaining. We observed a reduction in ROCK signal, supporting the downregulation of ROCK protein level in these genetic perturbations (new Fig. S3).

      L212 "Together, these measurements show that ROCK is not required for the uptake of calcium into cells." But what about trafficking and exocytosis? As mentioned earlier, I think this is a really important point that needs to be confirmed to understand the function of ROCK in controlling calcification. In their previous study (reference 45) the authors demonstrated that they have superior techniques in measuring vesicle dynamics in vivo. Here an acute treatment with the ROCK inhibitor would be sufficient to test if calcein-positive vesicle motion, including the observed reduction in velocity close to the tissue skeleton interface, is affected by the inhibitor.

      We thank the reviewer for the appreciation of our previous work, in which we studied calcium vesicle dynamics in whole embryos (Winter et al., PLoS Comput Biol 2021). We agree with the reviewer that the best way to directly test the effect of ROCK on mineral deposition and vesicle kinetics is to observe it in live skeletogenic cells. However, in Winter et al. 2021, we found that the skeleton (spicules) does not grow when embryos are immobilized, in either control or treated embryos, and we have to immobilize the embryos to record live time-lapses of whole embryos. This means that we cannot determine the role of ROCK, or of any other perturbation, in vesicle trafficking and exocytosis from experiments conducted in immobilized whole embryos, since skeletogenesis is arrested. We believe this can be done in skeletogenic cell cultures, and we are currently developing such a vesicle-tracking assay, but this is beyond the scope of the current work.

      Is there a colocalization of ROCK and f-actin in the tips of the spicules? This would support the mechano-sensing-hypothesis by ROCK.

      Our studies show that F-actin is localized around the spicule cavity and in the cortex of the cells (Figs. 5 and 6), while ROCK is enriched in the skeletogenic cell bodies, with some localization near the skeletogenic cell membranes (Fig. 1). To directly address the reviewer's question, we immunostained ROCK and F-actin in the same embryos and found that their subcellular localizations do not overlap strongly (Fig. S3 Q-T). However, ROCK does not bind F-actin directly: ROCK activates another kinase, LimK, which phosphorylates cofilin, which in turn interacts with F-actin. Therefore, the lack of colocalization between ROCK and F-actin neither supports nor contradicts the possible role of ROCK in mechano-sensing.

      L 283. "F-actin is enriched at the tips of the spicules independently of ROCK activity" The results of this paragraph clearly demonstrate that ROCK inhibition has no effect on the localization of f-actin at the tips of the growing spicules. In addition, the new cell culture experiments underline this observation. Still, the central question that remains is, what is the interaction between ROCK, f-actin, and the mineralization process, that leads to the observed deformations? What does the f-actin signal look like in a branched phenotype or in larvae that failed to develop a skeleton (inhibition from Y20)?

      As we report in Fig. 6, and now in new Fig. S3, under late ROCK inhibition or in ROCK morphants we still detect F-actin around the spicule and enriched at the tips. When ROCK is inhibited and the embryo fails to develop a skeleton, we observe F-actin accumulation in the skeletogenic cells, but the F-actin is not organized (Fig. 5). As the spicule is absent in this condition, it is hard to conclude whether the effect on F-actin organization is direct or due to the absence of the spicule. We state this explicitly in the current version in the Results (lines 324-326) and in the Discussion (lines 405-408).

      Immunohistochemical analyses on f-actin localization and abundance should be additionally performed with ROCK knock-down phenotypes to confirm the pharmacological inhibition.

      We did this in our new Figure S3 and showed that ROCK morphants show the same F-actin localization at the tips as control and ROCK-inhibited embryos.

      L 365 "...supporting its role in mineral deposition..." "...Overall, our studies indicate that ROCK activity....is essential for the formation of the spicule cavity......which could be essential for mineral deposition..." I think the authors need to do a better job in clearly separating between the potential processes impacted by ROCK perturbation. Is it stabilization and mechano-sensing in the spicule tip or the intracellular trafficking and deposition of the ACC? If the dataset does not allow for a definite conclusion, I suggest clearly separating the different possibilities combined with thorough discussion-based findings from other mineralizing systems where the interaction between ROCK and F-actin has been described.

      We thank the reviewer for this important comment. We believe that ROCK and the actomyosin network are involved both in mechano-sensing of the rigid biomineral and in the transport and exocytosis of mineral-bearing vesicles. In the current version we explain these two hypotheses explicitly in the discussion section. The possible role in exocytosis, and the experiments required to assess this role, are described in lines 427-439; the possible mechano-sensing role and its effect on gene expression are described in lines 440-453.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      L185 "These SR-µCT measurements show that the rate of mineral deposition is significantly reduced under ROCK inhibition." To correctly support this statement, I would suggest calculating the actual growth rates (µm³ time⁻¹). For example, an increase in volume from 6,850 µm³ at 48 hpf to 14,673 µm³ at 72 hpf would result in a growth rate of 7,823 µm³ per 24 h.

      We thank the reviewer for this suggestion. We calculated the rate of spicule growth as the reviewer suggested and we added this information in lines 218-221.
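      For reference, the calculation the reviewer suggests can be written out as a mean rate over the 48-72 hpf interval. The volumes below are the reviewer's illustrative example values, not new measurements, and the expression assumes only the two end-point volumes (it says nothing about how growth varies within the interval):

$$
r = \frac{\Delta V}{\Delta t} = \frac{V_{72\,\mathrm{hpf}} - V_{48\,\mathrm{hpf}}}{72\,\mathrm{h} - 48\,\mathrm{h}} = \frac{14{,}673\,\mu\mathrm{m}^3 - 6{,}850\,\mu\mathrm{m}^3}{24\,\mathrm{h}} = \frac{7{,}823\,\mu\mathrm{m}^3}{24\,\mathrm{h}} \approx 326\,\mu\mathrm{m}^3\,\mathrm{h}^{-1}
$$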

      L343: "This implies that....within the skeletogenic lineage." This concluding sentence is very speculative and therefore misplaced in the results section.

      We moved this sentence from the results section to the discussion (lines 443-445).

      L382: "The participation of F-actin and ROCK in polarized tip-growth and vesicle exocytosis has been observed in both, animals and plants." L407-409: "...F-actin could be regulating the localized exocytosis of mineral-bearing vesicles...." I think this is exactly the core question that remains unresolved in this study. To reduce speculation, I strongly recommend addressing the effect of ROCK inhibition on vesicle trafficking and exocytosis (monitoring of calcein-positive vesicles in PMCs).

      We agree with the reviewer that this is a critical question that we would have liked to address, but, as we explained above, it is beyond the scope of this study.

      Figure 5: The values below the scale bars in the newly added panels U and V are extremely small. Also, the legend for this figure appears to be incorrect. It should read: "...and skeletogenic cell cultures that were treated with 30 µM ROCK inhibitor that was added at 48 hpf and recorded at 72 hpf."

      We increased the font near the scale bars and corrected the figure caption. Thanks for this and your other helpful comments!

      Reviewer #2 (Public Review):

      We thank the reviewer for raising the important issue of inhibitor concentration, which led us to perform additional experiments at a lower concentration; these were valuable and strengthened the manuscript. We also thank the reviewer for asking us to be clearer in our interpretation of the results. Below are our responses to the specific comments:

      My concerns are with the interpretation of the experiments. The main overriding concern is a possible over-interpretation of the role of ROCK. It is well established in the literature that ROCK participates in many biological processes, with a major contribution to the actin cytoskeleton. And when a function is attributed to ROCK, it is usually based on the identification of a protein that is phosphorylated by this kinase. Here that is not the case. The observation here is, in most cases, stunted growth of the spicule skeleton with some mis-patterning, or an absence of skeleton if the inhibitor is added prior to the initiation of skeletal growth. They state in the abstract that ROCK inhibition impairs the organization of F-actin around the spicules. The evidence for that as a direct role is absent.

      We agree with the reviewer that, since the spicule does not form under continuous ROCK inhibition, it is unclear whether the absence of F-actin around the spicule in this condition is a direct outcome of the lack of ROCK activation of F-actin polymerization, or an indirect outcome of the lack of a spicule to coat. We therefore deleted this line from the abstract, and we now state explicitly, in the results (lines 324-326) and in the discussion (lines 405-408), that we cannot conclude whether the impaired F-actin organization is a direct effect of ROCK on actin polymerization.

      They use morpholino data and ROCK inhibitor data to draw their conclusion. My main concern is the inhibitor concentration: at the high concentrations used, the chosen inhibitor is known to inhibit other kinases (PKA and PKC) as well as ROCK. They indicate that this inhibition acts specifically in the skeletogenic cells, based on skeletogenic cells isolated in culture that produce spicules under control or ROCK-inhibited conditions, and they observe the same phenotypes: stunting and branching, or absence of skeletons if treated before skeletogenesis commences. Again, however, the high concentrations are known to inhibit the other kinases.

      In the previous version of the paper we used a range of 30-80 µM Y-27632 to block ROCK activity. These concentrations are commonly used in mammalian systems and in Drosophila to block ROCK activity (3-8). The reviewer is correct in stating that at high concentrations this inhibitor can block PKA and PKC. However, the affinity of the inhibitor for these kinases is roughly 100 times lower than its affinity for ROCK (about 110- to 190-fold relative to ROCK1 and about 85-fold relative to ROCK2), as indicated by the biochemical Ki values reported in the manufacturer's datasheet: 0.14-0.22 μM for ROCK1, 0.3 μM for ROCK2, 25 μM for PKA and 26 μM for PKC.

      Importantly, these Ki values are based on biochemical assays in which the activity of the inhibitor is tested in vitro with the purified protein. Therefore, these concentrations cannot be directly applied to cell or embryo cultures, where the inhibitor has to penetrate the cells and affect ROCK activity in vivo. Y-27632 activity was studied both in vitro and in vivo by Narumiya, Ishizaki and Uehata (Methods in Enzymology, 2000) (9). That paper reports in-vitro values similar to those in the manufacturer's datasheet, but shows that concentrations of 10 µM or higher are effective in cell cultures. We therefore tested the effect of 10 µM Y-27632 added at 0 hpf (continuous inhibition) and at 25 hpf (late inhibition), and added this information to Figs. 2 and S3. Continuous inhibition at this concentration resulted in three major phenotypes: skeletal loss, spicule initiations, and small spicules with ectopic branching. This result supports our conclusion that ROCK activity is necessary for spicule formation, elongation, and prevention of branching. Late inhibition at this concentration resulted in the majority of the embryos developing branched spicules, which is very similar to the effect of MyoII inhibition with Blebbistatin. This result again supports the inference that ROCK activity is required for normal skeletal growth and the prevention of ectopic branching. Importantly, two papers report direct inhibition of PKA and PKC in whole sea urchin embryos (1) and in skeletogenic cell cultures (2). In both assays, PKC inhibition resulted in a mild reduction of spicule length, while PKA inhibition did not affect skeletal formation. Neither skeletal loss nor ectopic branching was ever observed under PKC or PKA inhibition, supporting the specific inhibition of ROCK by Y-27632.

      Furthermore, both genetic and pharmacological perturbations of ROCK resulted in a significant reduction of skeletal growth and in enhanced ectopic branching. Therefore, we believe we provide convincing evidence for the role of ROCK in spicule formation, growth, and prevention of branching. We revised Figs. 2 and S3 to include the 10 µM Y-27632 data, and revised the text describing the inhibition to include the explanations and references we provided here.
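      As a quick check of the selectivity argument, the fold differences implied by the datasheet Ki values quoted above can be computed directly. This is a minimal sketch: the dictionary labels are illustrative names, the Ki numbers are the ones quoted in this response, and, as noted above, in-vitro Ki ratios do not translate directly into selectivity in cells or embryos.

```python
# Ki values (µM) for Y-27632 as quoted in this response (manufacturer's datasheet).
# The labels are illustrative; ROCK1 is given as a range, so both ends are listed.
KI_UM = {
    "ROCK1 (low)": 0.14,
    "ROCK1 (high)": 0.22,
    "ROCK2": 0.3,
    "PKA": 25.0,
    "PKC": 26.0,
}


def fold_selectivity(off_target: str, on_target: str) -> float:
    """Return Ki(off-target) / Ki(on-target).

    A larger ratio means the inhibitor binds the on-target kinase more tightly
    (lower Ki) than the off-target kinase, in a purified in-vitro assay.
    """
    return KI_UM[off_target] / KI_UM[on_target]


if __name__ == "__main__":
    for off in ("PKA", "PKC"):
        for on in ("ROCK1 (low)", "ROCK1 (high)", "ROCK2"):
            print(f"{off} vs {on}: {fold_selectivity(off, on):.0f}-fold")
```

      Run as a script, it prints one ratio per kinase pair; the ROCK1 comparisons fall between roughly 110- and 190-fold, while the ROCK2 comparisons come out near 85-fold.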

      They use blebbistatin and latrunculin and show that these known inhibitors of the actin cytoskeleton lead to abnormal spiculogenesis. This coincidence is suggestive but is not proof that ROCK acts on the actomyosin cytoskeleton, given the specificity concerns.

      As stated above, we believe that in the current version we have overcome the specificity concerns and provided solid evidence that ROCK activity is necessary for spicule formation, growth, and prevention of branching. Furthermore, the skeletogenic phenotypes of late 10 µM Y-27632 treatment are highly similar to those of MyoII inhibition (Blebbistatin), while the phenotypes at higher concentrations resemble the inhibition of actin polymerization by Latrunculin. We agree with the reviewer that “This coincidence is suggestive but is not proof that ROCK acts on the actomyosin cytoskeleton”, and we revised the discussion paragraph to differentiate between our solid findings and our speculations (lines 421-426): “These correlative similarities between ROCK and the actomyosin perturbations lead us to the following speculations: the low dosage of late ROCK inhibition is perturbing mostly ROCK activation of MyoII contractility while the higher dosage affects factors that control actin polymerization (Fig. 8F). Further studies in higher temporal and spatial resolution of MyoIIP activity and F-actin structures in control and under ROCK inhibition will enable us to test this.”

      Reviewer #2 (Recommendations For The Authors):

      The following areas require attention:

      (1) You begin and end the abstract with statements on evolution in which the actomyosin cytoskeleton is associated with skeletogenesis despite different GRNs, different contributing proteins, etc. You then move to ROCK and claim to reveal that ROCK is a central player in the process. As above, in the judgement of this reviewer, you fail to establish a direct link between ROCK and the role of the actomyosin network in skeletogenesis. Sure, the ROCK inhibitors suggest that ROCK plays some kind of role in the process, but you also indicate that ROCK could act on many processes, none of which you directly associate with the necessary activity of ROCK.

      We agree that our paper provides correlative similarities between the phenotypes of ROCK perturbation and those of direct perturbations of the actomyosin network, but does not establish a causal relationship. We made this point clear throughout the current version of the manuscript.

      (2) In the abstract you report that ROCK inhibition impairs the actin cytoskeleton around the skeleton. On examining your images in Fig. 5, this does not appear to be the case. Based on Phalloidin staining, actin surrounds both the control and the ROCK-inhibited skeleton. The distribution of actin is the same in both cases. Myosin is also stained in this figure, and it too shows similar staining in experimental and control embryos. So, to this reviewer, there is insufficient evidence to suggest that the actin cytoskeleton is impaired, and there is no evidence directly relating ROCK to that cytoskeleton. I'm not questioning the observation that inhibition of ROCK causes stunting and mispatterning of the skeleton. That you show and quantify well. The issue is the precise target of ROCK. Your data do not establish the specific cause. It could be the actin cytoskeleton, but your experiments do not directly address that.

      Fig. 5 shows a clear difference between F-actin in control embryos and under ROCK inhibition. In control embryos F-actin is enriched around the spicule, while under ROCK inhibition the spicule does not form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results (lines 324-326) and in the discussion (lines 405-408).

      (3) In parts of the manuscript you use the term filopodia and in other parts I think you use pseudopodia to refer to the same structure. Since Ettensohn has provided the most evidence on the organization of the skeletogenic syncytia, I suggest you use the same term he used for those cellular extensions.

      Filopodia and pseudopodia are two distinct structures generated by the skeletogenic cells. Filopodia are the common cellular extensions described in many cell types, while the term “pseudopodial cable” describes the specific structure that forms between the skeletogenic cells and in which the spicule cavity forms, in agreement with Prof. Ettensohn's terminology.

      (4) In trying to find relationships you cite a number of previous papers at the end of the introduction. I went back to those papers and they describe (from your work) calcium exocytosis, plus filopodia formation, plus planar cell polarity, plus CDC42, any one of which could involve an actin cytoskeleton. You even cite a paper saying that perturbations of ROCK prevent spicule formation. I went back to that paper and that isn't the case. You then summarize the Introduction by relating ROCK and the actin cytoskeleton, thereby raising reader expectation that the two will be connected. As above, in reality, your evidence here does not connect the two.

      We thank the reviewer for giving us credit for all these works, but only the paper on vesicle kinetics is from our lab (Winter et al., 2021). As for Croce et al. (2006), to which the reviewer refers: in Fig. 9A, 75 µM Y-27632 is used to inhibit ROCK in the same sea urchin species that we use, and the phenotype is identical to what we observe: the skeletogenic cells are present, but the spicule does not form. As mentioned above, in the current version we clearly distinguish between our solid findings and our interpretations.

      (5) You emphasize in Fig. 1 the inhibition of ROCK in the presence of VEGFR inhibition. However, at no place in the manuscript do you say anything about how VEGFR is inhibited, when it is inhibited, or how you know it is inhibited. That oversight must be corrected. You mention axitinib but don't say anything about what it does. Some readers may know its activity but many will not.

      We now indicate that we use Axitinib to block VEGFR in the results section (line 104) and in the methods section (lines 470-471).

      (6) Fig. 2. The use of Y-27632 as a selective inhibitor of ROCK. According to data sheets from the manufacturer, at the levels used in your experiments (120 µM, 80 µM and 30 µM), the inhibitor also inhibits the activity of PKA and PKC (both inhibited at around 25 µM). This is concerning because of the literature indicating that activation of the VEGFR operates through PKA. Inhibition of PKA, then, would inhibit the activity of VEGF signaling. Thus, the inhibitory effects of Y-27632 may not be attributable specifically to ROCK. Furthermore, the heading of this section states that ROCK activity controls initiation, growth, and morphology of the spicule. Yet, even at high levels of inhibitor, spicule production is initiated. Yes, the growth and the morphology are compromised, but the initiation doesn't seem to be.

      The spicule fails to form under continuous ROCK inhibition at all concentrations (Fig. 2). Also, as we explained in detail above, these Ki values are based on biochemical experiments with purified proteins and do not directly apply to in-vivo use of the inhibitor. Even so, these Ki values show that the affinity of the inhibitor for ROCK is roughly 100-fold higher than its affinity for PKA and PKC. Regarding the reviewer's specific suggestion here: direct inhibition of PKA does not cause skeletogenic phenotypes, either in whole embryos (1) or in skeletogenic cell cultures (2). Since we see the same skeletogenic phenotypes at a low Y-27632 concentration, and the genetic and pharmacological perturbations of ROCK agree, we believe that these phenotypes can be attributed directly to ROCK.

      (7) The synchrotron study is very nice, but two points should be addressed. First, a high concentration of Y-27632 was again used, raising a caveat regarding ROCK specificity. Second, the blue and green calcein pulses are very nice, but the recent paper by the Bradham group should be cited.

      We added a reference to Bradham recent paper on two calcein pulses (10).

      (8) Fig. 5 is where an attempt is made to associate ROCK inhibition to alterations in actomyosin. Again, a high concentration of the inhibitor is used casting doubt on whether it specifically inhibits ROCK. However, even if the inhibition is specific to ROCK the images do not provide convincing evidence that ROCK activity normally is directed toward actomyosin. This is crucial to the manuscript.

      As stated above, we addressed the specificity concern in this version, and we modified the text to emphasize correlation rather than causation. Fig. 5 shows a clear difference between F-actin in control embryos and under ROCK inhibition. In control embryos F-actin is enriched around the spicule, while under ROCK inhibition the spicule does not form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results (lines 324-326) and in the discussion (lines 405-408).

      (9) Again in Fig. 6 the inhibitor is used with the same concern about whether the effects noted are due to ROCK.

      Fig. 6 is now Fig. 7 (the effect of ROCK on gene expression). As explained above, we addressed the specificity concern in this version.

      (10) Lines 350-358. This interpretation falls apart without showing that the inhibitor is specific for ROCK as indicated above. Also, Fig. 5 is unconvincing in showing a difference in actin or myosin distribution in control vs ROCK inhibited embryos. Yes, the spicules are stunted, but whether actin or myosin have anything to do with that as a result of lack of ROCK activity is not demonstrated.

      As stated above, we addressed the specificity concern in the revised version, and we modified the text to emphasize correlation rather than causation. Fig. 5 shows a clear difference between F-actin in control embryos and under ROCK inhibition. In control embryos F-actin is enriched around the spicule, while under ROCK inhibition the spicule does not form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results (lines 324-326) and in the discussion (lines 405-408).

      (11) Throughout the manuscript, spelling, grammar, and sentence structure will require extensive editing. The mistakes are numerous.

      We did our best to correct the spelling and grammar. If we still missed some mistakes, we would be happy to further correct them.

      References

      (1) Mitsunaga K, Shinohara S, Yasumasu I. Probable Contribution of Protein Phosphorylation by Protein Kinase C to Spicule Formation in Sea Urchin Embryos: (sea urchin/protein kinase C/spicule formation/H-7/HA1004). Dev Growth Differ. 1990;32(3):335-42.

      (2) Mitsunaga K, Shinohara S, Yasumasu I. Does Protein Phosphorylation by Protein Kinase C Support Pseudopodial Cable Growth in Cultured Micromere-Derived Cells of the Sea Urchin, Hemicentrotus pulcherrimus?: (sea urchin/protein kinase C/spicule formation/phorbol ester/H-7). Dev Growth Differ. 1990;32(6):647-55.

      (3) Su Y, Huang H, Luo T, Zheng Y, Fan J, Ren H, et al. Cell-in-cell structure mediates in-cell killing suppressed by CD44. Cell Discov. 2022;8(1):35.

      (4) Kagawa H, Javali A, Khoei HH, Sommer TM, Sestini G, Novatchkova M, et al. Human blastoids model blastocyst development and implantation. Nature. 2022;601(7894):600-5.

      (5) Canellas-Socias A, Cortina C, Hernando-Momblona X, Palomo-Ponce S, Mulholland EJ, Turon G, et al. Metastatic recurrence in colorectal cancer arises from residual EMP1(+) cells. Nature. 2022;611(7936):603-13.

      (6) Becker KN, Pettee KM, Sugrue A, Reinard KA, Schroeder JL, Eisenmann KM. The Cytoskeleton Effectors Rho-Kinase (ROCK) and Mammalian Diaphanous-Related (mDia) Formin Have Dynamic Roles in Tumor Microtube Formation in Invasive Glioblastoma Cells. Cells. 2022;11(9).

      (7) Segal D, Zaritsky A, Schejter ED, Shilo BZ. Feedback inhibition of actin on Rho mediates content release from large secretory vesicles. J Cell Biol. 2018;217(5):1815-26.

      (8) Fischer RS, Gardel M, Ma X, Adelstein RS, Waterman CM. Local cortical tension by myosin II guides 3D endothelial cell branching. Curr Biol. 2009;19(3):260-5.

      (9) Narumiya S, Ishizaki T, Uehata M. Use and properties of ROCK-specific inhibitor Y-27632. Methods Enzymol. 2000;325:273-84.

      (10) Descoteaux AE, Zuch DT, Bradham CA. Polychrome labeling reveals skeletal triradiate and elongation dynamics and abnormalities in patterning cue-perturbed embryos. Dev Biol. 2023;498:1-13.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The OSCA/TMEM63 channels have recently been identified as mechanosensitive channels. In a previous study, the authors found that OSCA subtypes (1, 2, and 3) respond differently to stretch and poke stimuli. For example, OSCA1.2 is activated by both poke and stretch, while OSCA3.1 responds strongly to stretch but poorly to poke stimuli. In this study, the authors use cryo-EM, mutagenesis, and electrophysiology to dissect the mechanistic determinants that underlie the channels' ability to respond to poke and stretch stimuli.

      The starting hypothesis of the study is that the mechanical activation of OSCA channels relies on the interactions between the protein and the lipid bilayer and that the differential responses to poke and stretch might stem from variations in the lipid-interacting regions of OSCA proteins. The authors specifically identify the amphipathic helix (AH), the fenestration, and the Beam Like Domain (BLD) as elements that might play a role in mechanosensing.

      The strength of this paper lies in the technically sound data - the structural work and electrophysiology are both very well done. For example, the authors produce a high-resolution OSCA3.1 structure which will be a useful tool for many future studies. Also, the study identifies several interesting mutants that seemingly uncouple the OSCA1.2 poke and stretch responses. These might be valuable in future studies of OSCA mechanosensation.

      However, the experimental approach employed by the authors to dissect the molecular mechanisms of poke and stretch falls short of enabling meaningful mechanistic conclusions. For example, we are left with several unanswered questions surrounding the role of the AH and the fenestration lipids in mechanosensation: Is the AH really important for the poke response if mutating residues conserved between OSCA1.2 and OSCA3.1 disrupts OSCA1.2's ability to respond to poke, but mutating the OSCA1.2 AH to resemble that of OSCA3.1 results in no change to its "pokability"? Similar questions arise in response to the study of the fenestration-lining residues.

      We thank the reviewer for their feedback. We believe that the different OSCA1.2 mutants on their own suggest an involvement of the AH and fenestration-lining residues in its mechanosensitive response. We attribute the inability to restore the poke response of OSCA3.1 with similar mutations to its inherent high threshold to this particular stimulus and perhaps other structural differences, or a combination of them, that we did not probe in this study. We agree more work is required in the field to address these remaining questions and further dissect the difference between poke and stretch responses.

      Reviewer #2 (Public Review):

      Summary:

      Jojoa-Cruz et al. determined a high-resolution cryo-EM structure of the Arabidopsis thaliana (At) OSCA3.1 channel. Based on a structural comparison between OSCA3.1 and OSCA1.2, and on the difference between these two paralogs in their mechanosensitivity to poking and membrane stretch, the authors performed structure-guided mutagenesis and tested the roles of three structural domains, including an amphipathic helix, a beam-like domain, and a lipid fenestration site at the pore domain, in mechanosensation of OSCA channels.

      Strengths:

      The authors successfully determined a structure of the AtOSCA3.1 channel reconstituted in lipid nanodiscs by cryo-EM to a high resolution of 2.6 Å. The high-resolution EM map enabled the authors to observe putative lipid EM densities at various sites where lipid molecules are associated with the channel. Overall, the structural data provides the information for comparison with other OSCA paralogs.

      In addition, the authors identified OSCA1.2 mutants that exhibit differential responses to mechanical stimulation by poking and membrane stretch (i.e., impaired response to poke assay but intact response to membrane stretch). This interesting behavior will be useful for further study on differentiating the mechanisms of OSCA activation by distinct mechanical stimuli.

      Major weakness:

      The major weaknesses of this study are the mutagenesis design and the functional characterization of the three structural domains - an amphipathic helix (AH), a beam-like domain (BLD), and the fenestration site at the pore, in OSCA mechanosensation.

      (1) First of all, it is unclear to the reviewer whether the authors set out to test these structural domains as direct sensors of mechanical stimuli or as coupling domains for downstream channel opening and closing (gating). The data interpretations are vague in this regard, as the authors tend to interpret the effects of mutations in terms of the channel 'sensitivity' to different mechanical stimuli (poking or membrane stretch). For the structural elements they want to study, the authors ought to dissect the molecular bases of sensing mechanical force and of opening/closing (gating) the channel pore domain.

      We agree with the reviewer that our data are unable to distinguish between the transduction of a mechanical stimulus and channel gating. We set out to determine whether these features were involved in the mechanosensitive response. However, as the reviewer points out, evaluating whether they work as direct sensors or as coupling domains would require a more involved experimental design that lies beyond the scope of this work. Thus, our study does not claim that these features act as direct sensors of mechanical stimuli or as coupling domains, only that they are involved.

      Furthermore, the authors relied on the functional discrepancies between OSCA1.2 (sensitive to both membrane poking and stretch) and OSCA3.1 (little or weak sensitivity to poking but sensitive to membrane stretch). But the experimental data presented in the study do not clearly address the mechanisms of channel activation by poking vs. by stretch, or why the channels behave differently.

      We had hoped that, by switching regions between the OSCA1.2 and OSCA3.1 channels, we would abolish poke-induced responses in OSCA1.2 and confer poke-induced sensitivity on OSCA3.1. We agree with the reviewer that we were not able to pinpoint the reason for the higher threshold of OSCA3.1 (or reasons, as it could be a compounded effect of several differences), and thus we could not confer an OSCA1.2-like phenotype on it. Yet, we shed some light on structural differences that appear to contribute to OSCA3.1 behavior, as mutating OSCA1.2 to resemble this channel led to an OSCA3.1-like phenotype.

      (2) The reviewer questions if the "apparent threshold" of poke-induced membrane displacement and the threshold of membrane stretch are good measures of the change in the channel sensitivity to the different mechanical stimuli.

      The best way to determine an accurate measure of sensitivity to mechanical stimuli is stretch applied to a patch of membrane. There are more complicating factors that influence the determination of "apparent threshold" in the whole cell poking assay, including visualizing when the probe first hits the cell (very difficult to see). With that said, the stretch assay has its own issues such as the creep of the membrane into the pipette glass which we try to minimize with positive pressure between tests.

      (3) Overall, the mutagenesis design in the various structural domains lacks logical coherence, and the interpretation of the functional data is not sufficient to support the authors' hypothesis. Essentially, the authors mutated several residues in the hotspot domains, observed some effects on the channel response to poking and membrane stretch, and then concluded that the mutated residues/regions are critical for OSCA mechanosensation. Examples are as follows.

      In the section "Mutation of key residues in the amphipathic helix", the authors mutated W75 and L80, which are located at the N- and C-terminal ends of the AH in OSCA1.2, and mutated a Pro in the OSCA1.2 AH to Arg, the residue at the equivalent position in the OSCA3.1 AH. W75 and L80 are conserved between OSCA1.2 and OSCA3.1. Mutations of W75 and/or L80 impaired OSCA1.2 activation by poking, but not by membrane stretch. In comparison, wild-type OSCA3.1, which contains W and L at the equivalent positions of its AH, exhibits little or weak response to poking. The loss of response to poking in the OSCA1.2 W/L mutants therefore does not indicate a role for these residues in poking-induced activation.

      Besides, the P77R mutation in the OSCA1.2 AH showed no effect on channel activation by poking, suggesting that the Arg in the OSCA3.1 AH is not responsible for its weak response to poking. Together, the mutagenesis of W75, L80, and P77R in the OSCA1.2 AH does not support the hypothesis that the AH is involved in OSCA mechanosensation.

      Mutagenesis of residues W75 and L80 in the OSCA1.2 amphipathic helix suggests a role for the helix in the poke response of OSCA1.2, regardless of OSCA3.1 having the same residues. Furthermore, the lack of alteration in the response of mutant P77R suggests that specific residues of the helix are involved in this response, rather than any mutation in the helix leading to a loss of function.

      OSCA3.1 WT exhibits a high-threshold response (near membrane rupture) in the poke assay without any mutations, and this could be due to other features, for example, the residues lining the membrane fenestration, as well as features not identified/probed in this study. We agree with the reviewer that the differences in the AH do not explain the different response to poke in OSCA1.2 and OSCA3.1, and we have added this statement explicitly in the discussion for clarification (line #251-252).

      In the section "Replacing the OSCA3.1 BLD in OSCA1.2", the authors replaced the BLD in OSCA1.2 with that from OSCA3.1, and observed only slightly stronger displacement in response to poking stimuli. The authors still suggest that the BLD "appears to play a role" in the channel sensitivity to poke, even though the evidence is weak.

      We agree with the reviewer that the experiments carried out show little difference between the response of OSCA1.2 WT and OSCA1.2 with the OSCA3.1 BLD, and we have stated so (line #259: “Substituting the BLD of OSCA1.2 for that of OSCA3.1 had little effect on poke- or stretch-activated responses. Although these results suggest that the BLD may not be involved in modulating the MA response of OSCA1.2…”). However, the section of the discussion that the reviewer points out also considers evidence from recent reports by Zheng et al. (Neuron, 2023) and Jojoa-Cruz et al. (Structure, 2024), and we suggest a hypothesis to reconcile our findings with this new evidence.

      OSCA1.2 has four Lys residues in TM4 and TM6b at the pore fenestration site, which were shown to interact with the lipid phosphate head group, whereas two of the equivalent residues in OSCA3.1 are Ile. In the section "Substitution of potential lipid-interacting lysine residues", the authors made a K435I/K536I double mutant of OSCA1.2 to mimic OSCA3.1 and observed a poor response to poking but an intact response to stretch. Did the authors mutate the Ile residues in OSCA3.1 to Lys, and did the mutation confer channel sensitivity to poking stimuli resembling OSCA1.2? The reviewer thinks such an experiment is necessary to establish the importance of the four Lys residues in lipid interaction for channel mechanoactivation.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are no longer able to perform such experiments.

      Reviewer #3 (Public Review):

      Summary:

      Jojoa-Cruz et al provide a new structure of At-OSCA3.1. The structure of OSCA 3.1 is similar to previous OSCA cryo-em structures of both OSCA3.1 and other homologues validating the new structure. Using the novel structure of OSCA3.1 as a guide they created several point mutations to investigate two different mechanosensitive modalities: poking and stretching. To investigate the ability of OSCA channels to gate in response to poking they created point mutations in OSCA1.2 to reduce sensitivity to poking based on the differences between the OSCA1.2 and 3.1 structures. Their results suggest that two separate regions are responsible for gating in response to poking and stretching.

      Strengths:

      Through a detailed structure-based analysis, the authors identified structural differences between OSCA3.1 and OSCA1.2. These subtle structural changes identify regions in the amphipathic helix and near the pore that are essential for the gating of OSCA1.2 in response to poking and stretching. The use of point mutations to understand how these regions are involved in mechanosensation clearly shows the role of these residues in mechanosensation.

      Weaknesses:

      In general, all of the selected point mutations significantly alter the inherent mechanosensitive regions. This often suggests that any mutation would disrupt the function of the region; additional mutations that retain WT-like function would support the claims in the manuscript. Mutations in the amphipathic helix at W75 and L80 reduce gating in response to poking stimuli. The gating occurs at poking depths similar to those at which cells rupture, which suggests that these mutations could cause a complete loss of function. For example, an L80I or L80Q mutation would show whether the added negative charge is responsible for this disruption, rather than simply a change in the steric bulk of a residue in an essential region.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several questions regarding some of the aspects of your study:

      Mutation of the hydrophobic W75 and L80 in OSCA1.2 to charged residues significantly decreases the poke response in OSCA1.2 without affecting the stretch response. However, W75 and L80 are also present in OSCA3.1, which does not respond efficiently to poke. You conclude that these two residues are important for the poke response, but do not delve into why, if these residues are important, OSCA3.1 is not poke-sensitive.

      In addition, mutation of the OSCA1.2 AH to resemble that of OSCA3.1 does not produce channels that are less poke-sensitive. Given the data presented, if AH were a universal "poke sensor", one could also expect WT OSCA3.1 to exhibit a robust poke response, like OSCA1.2. Here I think it would be important to explain in more detail how this data might fit together.

      We thank the reviewer for bringing up this issue. We decided to test the importance of the AH because similar structures are present in other mechanosensitive channels. Our data showed that single and double mutants of the AH of OSCA1.2 affected its poke response but not its stretch response. This supports the involvement of the AH in the poke response. Yet we agree that the differences in the AH between OSCA1.2 and OSCA3.1 (P77R mutation) do not explain the higher threshold of OSCA3.1, and we have stated this explicitly in line #255. The particular OSCA3.1 phenotype may be due to other structural differences, for example in the membrane fenestration area, or, as we believe is more likely, to the combined effect of several differences.

      I also have some questions about the protein-lipid interactions in the fenestration. A lipid has been observed in this location in both OSCA1.2 and OSCA3.1 structures. Mutation of the two OSCA1.2 lysines to isoleucines results in channels that are resistant to poke which leads to the conclusion that the interactions between the fenestration lysines and lipids are important for the poke response.

      Here, there are several questions that arise but are not answered:

      It is not shown what happens when OSCA3.1 isoleucines are mutated to lysines - do these mutants result in poke-able channels? Is the OSCA3.1 mechanosensing altered?

      We performed a preliminary test on the OSCA3.1 I423K/I525K double mutant (n = 3). However, we did not see an increase in poke sensitivity. We attribute this to other, unexplored differences in OSCA3.1 that affect channel mechanosensitivity.

      It is implied that the poke response is predicated on the lysine-lipid interaction. However, lipid densities are present in both OSCA1.2 and OSCA3.1 structures, indicating that both fenestrations interact with lipids. How can we be certain that the mutation of lysine to isoleucine does not disrupt an inter-protein interaction rather than a protein-lipid one? For example, the K435I mutation might disrupt interactions with D523 or the backbone of G527?

      The reviewer brings up a good point. We believe the phenotype seen is due to a different strength in the interaction between lipids and proteins, however, disrupted interaction with other residues is a valid alternative explanation. We agree that the suggested experiments will further clarify the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      Similarly, the effects of single lysine-to-isoleucine (K435I or K536I) mutations are not explored. The observed effect might be caused by only one of these substitutions.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      I also wanted to take this opportunity to ask a couple of philosophical (?) questions about using a mammalian system to study ion channels that have evolved to function in plants. Your study highlights the intimate relationship between the lipid bilayer and protein function/mechanosensitivity. Plant cells contain high levels of sterols and cerebrosides that would significantly affect both cell stiffness and the specific interactions that can be formed between the protein and the lipid bilayer. I wonder if the properties of the lipid bilayer might shift the thresholds for poke and/or stretch stimuli and if structural elements that do not appear to have a major role in mechanosensation in a mammalian cell (e.g., BLD) might be very influential in a lipid environment that more closely resembles that of a plant?

      Conversely, is it possible that OSCA channels are not poke-sensitive in plant cells? These questions are beyond the scope of your study, but they might be a nice addition to your discussion.

      The reviewer poses a great question. Electrophysiological approaches for studying plant mechanosensitive channels are limited by the inability to fully reconstitute the environment of a plant cell. To patch the cell, the cell wall must be removed, which eliminates the tension this structure generates on the membrane. In that sense, performing these assays in plant cells or another system would not give us a fully accurate picture of the physiological thresholds of these channels either. Given this limitation, we performed our study in mammalian cells, with which we have expertise. Like the reviewer, we are intrigued by the effect of different membrane compositions on the behavior of OSCA channels and by how these channels behave under physiological conditions, but we agree with the reviewer that these questions are beyond the scope of our work. To address this point, in line #294 we have added: “It is also important to note that the membrane of a plant cell contains a different lipid composition than that of HEK293 cells used in our assays, and thus these lipids, or the plant cell wall, may alter how these channels respond to physiological stimuli.”

      Line 313: “For structural studies, human codon-optimized OSCA3.1…”. Could you please clarify what this means?

      We have changed the phrase to “For structural studies, the OSCA3.1 (UniProt ID: Q9C8G5) coding sequence was synthesized using optimized codons for expression in human cells and subsequently cloned into the pcDNA3.1 vector” in line #327 to clarify this sentence.

      As a final comment, in the methods you use references to previously published work. I would strongly encourage you to replace these with experimental details.

      We understand the reviewer’s argument. However, this article falls under eLife’s Research Advances and will be linked to the original published work, to which we refer for the methods. As suggested in the guidelines for this type of article, we only described the methods that differ from those of the original paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) In line 85, provide C-alpha r.m.s.d. values for the structural alignment among OSCA3.1, OSCA1.1, and OSCA1.2 protomers.

      As requested, we have added the C-alpha RMSD in line #86.

      (2) In line 90, should the figure reference to Fig. 1d be Fig. 1e?

      We thank the reviewer for catching this error. We have corrected it in the manuscript.

      (3) In lines 89-94, what putative lipid is it resolved in the OSCA3.1 pore? Can the authors assign the lipid identity? Is this the same or different from the lipids resolved in OSCA1.2, OSCA1.1, and TMEM63?

      In the model, we have built the lipid as palmitic acid to represent a lipid tail, but the resolution in this area makes it difficult to ascertain the identity of said lipid, hence we cannot compare to lipids in other orthologs.

      (4) In lines 115-121, the authors describe the presence of AHs and their functional roles in MscL and TMEM16. It will be more informative if the authors can add figures to show the structure of MscL and highlight the analogous AH. In addition, the current Supplementary Fig. 6 is not informative so it should be improved. It is not clear to the reviewer why that stretch of helix in TMEM16 is equivalent or analogous to the AH in OSCAs, either sequence alignment or a detailed structural alignment is helpful to address this point. Also, in lines 120-121, it says this helix in TMEM16 "does not present amphipathic properties", please show the sequence or amphipathicity of the helix.

      We thank the reviewer for the feedback on this figure. Supplementary Fig. 6 has been thoroughly modified to address the reviewer’s concerns. We now include a panel showing the structure of MscL and its amphipathic helix. We have modified the alignment of OSCA3.1 to a TMEM16 homolog to make clearer the homologous positioning of the helices in question and zoom in to show their sequences.

      (5) In discussion, lines 249-257, the authors referred to a recent study that suggested three evolutionarily coupled residue pairs located on BLD and TM6b. The authors speculate that the reason they did not observe a significant effect of channel response to poke/stretch stimuli in the BLD swapping between OSCA1.2 and 3.1 is due to the 2 of 3 salt bridges remaining for the residue pairs. To test the importance of these residue pairs and their coupling for channel gating, instead of swapping the entire BLD, can the authors systematically mutate the residue pairs, disrupt the salt-bridge interactions, and analyze the effect on channel response to mechanical force?

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      (6) The reviewer suggests the authors tone down the elaboration of polymodal activation of OSCA by membrane poking and stretch.

      We believe the idea of polymodal activation is sufficiently toned down, as we only postulate it as a possibility and then give an alternative explanation based on methodological limitations: “Nonetheless, the discrepancy could be due to inherent methodological differences between these two assays, as whole-cell recordings during poking involve channels in inaccessible membranes (at the cell-substrate interface) and channel interactions with extracellular and intracellular components, while the stretch assay is limited to recording channels inside the patch.”

      (7) In lines 81-83, the authors described the BLD as showing increased flexibility, and the EM map at this region is less well resolved for registry assignment. In the method for cryo-EM image processing and Supplementary Fig. 1, the authors only carried out 3D refinement and classification at the full channel level. Have the authors attempted to do focus refinement or classification at the BLD domain in order to improve the local resolution or to sort out conformational heterogeneity? The reviewer suggests doing so because the BLD domain is a hot spot that the authors have proposed to play an important role in OSCA mechanosensation. Conformational changes identified in this region might provide insights into its role in the channel function.

      We thank the reviewer for this suggestion. We have performed focused classification on the BLD with and without surrounding regions and, in our hands, it did not improve the resolution or provide further insights.

      Reviewer #3 (Recommendations For The Authors):

      Here are a few specific minor corrections that should be addressed

      (1) In lines 117-135, in the discussion of Figure 2, the data shows an apparent increase in the poking threshold to gate W75K/L80E. The substantial increase in the depth required to gate the channel suggests that these channels are less sensitive to poking. Would it be possible to compare the depth at which these two patches show activity and the depth at which the other 22 cells ruptured? Line 161 mentions that the rupture threshold of HEK cells is close to the gating of OSCA3.1 at 13.8 µm.

      In the 22 cells with no response, the distance just before the cell ruptured was 12.5 ± 2.5 µm. The distance at which the cells ruptured was 0.5 µm greater (13 ± 2.5 µm, n = 22). We have added this last value in line #137.

      (2) Would it be possible in Figures 2 panels b and c, 3, and figure 4 to label the WT as WT OSCA1.2?

      We thank the reviewer for pointing this out. We agree this modification will improve the clarity of the figures and have changed the figures to follow the reviewer’s suggestion.

      (3) Can you provide a western blot of the mutants described in Figure 2? This would provide insight into the amount of protein at the cell surface that is available to respond to poking. The stretch data show that these channels are in the membrane but do not show whether they are present in similar quantities.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      (4) The functional differences between the two channels are projected to be tied to several distinct point mutations, however, the data could be strengthened by additional point mutations at all sites to show that the phenotypes are due to the mutations specifically not just any mutation in the region.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This manuscript from Mukherjee et al examines potential connections between telomere length and tumor immune responses. This examination is based on the premise that telomeres and tumor immunity have each been shown to play separate, but important, roles in cancer progression and prognosis as well as prior correlative findings between telomere length and immunity. In keeping with a potential connection between telomere length and tumor immunity, the authors find that long telomere length is associated with reduced expression of the cytokine receptor IL1R1. Long telomere length is also associated with reduced TRF2 occupancy at the putative IL1R1 promoter. These observations lead the authors towards a model in which reduced telomere occupancy of TRF2 - due to telomere shortening - promotes IL1R1 transcription via recruitment of the p300 histone acetyltransferase. This model is based on earlier studies from this group (i.e. Mukherjee et al., 2019) which first proposed that telomere length can influence gene expression by enabling TRF2 binding and gene transactivation at telomere-distal sites. Further mechanistic work suggests that G-quadruplexes are important for TRF2 binding to IL1R1 promoter and that TRF2 acetylation is necessary for p300 recruitment. Complementary studies in human triple-negative breast cancer cells add potential clinical relevance but do not possess a direct connection to the proposed model. Overall, the article presents several interesting observations, but disconnection across central elements of the model and the marginal degree of the data leave open significant uncertainty regarding the conclusions.

      Strengths:

      Many of the key results are examined across multiple cell models.

      The authors propose a highly innovative model to explain their results.

      Weaknesses:

      Although the authors attempt to replicate most key results across multiple models, the results are often marginal or appear to lack statistical significance. For example, the reduction in IL1R1 protein levels observed in HT1080 cells that possess long telomeres relative to HT1080 short telomere cells appears to be modest (Supplementary Figure 1I). Associated changes in IL1R1 mRNA levels are similarly modest.

      Related to the point above, a lack of strong functional studies leaves an open question as to whether observed changes in IL1R1 expression across telomere short/long cancer cells are biologically meaningful.

      Statistical significance is described sporadically throughout the paper. Most major trends hold, but the statistical significance of the results is often unclear. For example, Figure 1A uses a statistical test to show statistically significant increases in TRF2 occupancy at the IL1R1 promoter in short telomere HT1080 relative to long telomere HT1080. However, similar experiments (i.e. Figure 2B, Figure 4A - D) lack statistical tests.

      TRF2 overexpression resulted in a ~5-fold or greater change in IL1R1 expression. Compared to this, telomere length-dependent alterations in IL1R1 expression (about 2-fold, i.e. a ~50% reduction in cells with long telomeres across the different model systems used) appear modest. Notably, this change was consistent and significant across cell-based model systems and xenograft tumors (see Figure 1). Unlike TRF2 induction, telomere elongation or shortening varies within the permissible physiological limits of cells, which is likely to produce the observed variation in IL1R1 levels. For biological relevance, we further demonstrated that IL1 signalling in TNBC tissue and tumor organoids, and M2 macrophage infiltration, were significantly dependent on telomere length. Details of the tests of significance were included in the individual figure legends. Based on this comment, we will expand on them in a dedicated paragraph in the Methods section to make the information clearer for readers. We noticed that the stars (*) denoting statistical significance were omitted in some ChIP figures, likely an error during figure assembly for PDF conversion. We thank the reviewer for bringing this up; the necessary changes will be made in the revised manuscript.

      Reviewer #2 (Public Review):

      This study highlights the role of telomeres in modulating IL-1 signaling and tumor immunity. The authors demonstrate a strong correlation between telomere length and IL-1 signaling by analyzing TNBC patient samples and tumor-derived organoids. Mechanistic insights revealed non-telomeric TRF2 binding at the IL-1R1 promoter. The observed effects on NF-kB signaling and the subsequent alterations in cytokine expression contribute significantly to our understanding of the complex interplay between telomeres and the tumor microenvironment. Furthermore, the study reports that telomere length and IL-1R1 expression are associated with TAM enrichment. However, the manuscript lacks in-depth mechanistic insights into how telomere length affects IL-1R1 expression. Overall, this work broadens our understanding of telomere biology.

      The mechanism of how telomere length affects IL1R1 expression involves sequestration and reallocation of TRF2 between telomeres and gene promoters (in this case, the IL1R1 promoter). We have previously shown this across multiple genomic sites (Mukherjee et al, 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). We have described this in the manuscript along with references citing the previous works. A scheme explaining the model was provided as Additional Supplementary Figure 1, along with a description of the mechanistic model.

      Figures 1-4 in the main text describe the molecular mechanism of telomere-dependent IL1R1 activation. This includes ChIP data for TRF2 on the IL1R1 promoter in cells with long or short telomeres, as well as TRF2-mediated histone/p300 recruitment and IL1R1 gene expression. We further show how a specific acetylation on TRF2 is crucial for TRF2-mediated IL1R1 regulation (Figure 5).

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, entitled "Telomere length sensitive regulation of Interleukin Receptor 1 type 1 (IL1R1) by the shelterin protein TRF2 modulates immune signalling in the tumour microenvironment", Dr. Mukherjee and colleagues aimed to clarify the extra-telomeric role of TRF2 in regulating IL1R1 expression and its consequent impact on TAM tumor infiltration.

      Strengths:

      Upon careful manuscript evaluation, I feel that the presented story is undoubtedly well conceived. At the technical level, experiments have been properly performed and the obtained results support the authors' conclusions.

      Weaknesses:

      Unfortunately, the covered topic is not particularly novel. In detail, the TRF2 capability of binding extratelomeric foci in cells with short telomeres has been well demonstrated in a previous work published by the same research group. The capability of TRF2 to regulate gene expression is well-known, the capability of TRF2 to interact with p300 has been already demonstrated and, finally, the capability of TRF2 to regulate TAMs infiltration (that is the effective novelty of the manuscript) appears as an obvious consequence of IL1R1 modulation (this is probably due to the current manuscript organization).

      Here we studied the TRF2-IL1R1 regulatory axis (not reported earlier by us or others) as a case of the telomere sequestration model that we described earlier (Mukherjee et al., 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). This manuscript demonstrates the effect of TRF2-IL1R1 regulation on telomere-sensitive tumor macrophage recruitment. To the best of our knowledge, no previous study has mechanistically connected the telomeres of tumor cells to the tumor immune microenvironment. Here we focused on the IL1R1 promoter and provided mechanistic evidence that acetylated TRF2 engages the HAT p300 to epigenetically alter the promoter. This mechanism of TRF2-mediated activation has not been previously reported. Further, the function of a specific post-translational modification of TRF2 (acetylation of lysine 293, K293) in IL1R1 regulation is described for the first time. Additional experiments showed that TRF2 acetylation mutants, when targeted to the IL1R1 promoter, significantly alter the transcriptional state of the promoter. To our knowledge, the function of any TRF2 residue in transcriptional activation had not been previously described. Taken together, these findings provide novel insights into a telomere-sensitive mechanism of TRF2-mediated gene regulation that affects the tumor-immune microenvironment. We are considering the suggestion to reorganize the manuscript to highlight the novel aspects of our work more convincingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      (1) Please expand methods with additional details related to cell co-culture, such as cell numbers and duration.

      We thank the reviewer for the careful reading and constructive suggestions, and we apologize for the confusion. We have added the experimental details related to co-culture (manuscript lines 551-553) in the revised manuscript.

      (2) Please unify the writing of the abbreviation of small extracellular vesicles in the text, figure, and caption.

      Thank you for your comments. We have unified the abbreviation of extracellular vesicles to sEVs in the revised manuscript.

      (3) The effects of components other than sEVs in mechanically stimulated osteocyte CM on the proliferation of NSCLC cells should be evaluated.

      We evaluated the effects of SF, lEVs and sEVs in osteocyte CM on NSCLC cell proliferation under mechanical stimulation, and found that sEVs had the most obvious inhibition on NSCLC cell proliferation, as shown in the revised Supplemental Figure 4c, d.

      (4) In addition to osteocytes and osteoblasts, the effects of other types of cells on the proliferation of NSCLC cells should be detected. It is recommended to add at least one type of cell from an infrequent metastatic site of NSCLC as a negative control.

      We thank the reviewer for the suggestion. We added NCM460 cell line (derived from intestinal epithelium) as a negative control and found that NCM460 had no significant effect on NSCLC cell proliferation, as shown in Figure 1d. These experiments were conducted before our last submission.

      (5) The bone microenvironment is complex. It is recommended to evaluate the effect of bone marrow-derived sEVs on NSCLC to validate whether the tumor suppressive effect of osteocyte sEVs is unique.

      We thank the reviewer for the suggestion. We agree with the reviewer’s comments that the bone microenvironment is complex. We explored the effect of bone marrow-derived sEVs on NSCLC cell proliferation and found that bone marrow-derived sEVs promoted NSCLC cell proliferation, as shown in Supplemental Figure 2g, h in the revised manuscript.

      (6) The description of exercise preconditioning is not clear enough. It is recommended to supplement the pattern diagram to improve readability. Exercise preconditioning should be further discussed by the Authors.

      Thank you for your comments, and we apologize for the confusion. We have added a pattern diagram of the exercise preconditioning in Supplemental Figure 6a.

      Reviewer #2 (Recommendations For The Authors):

      (1) The histological images are analyzed in a qualitative manner, with no description of the methodology used. A quantitative assessment of the distance and level of Ki-67+ NSCLC cells needs to be performed in human and murine tissues. Because in bone metastases cancer cells are frequently mixed with bone marrow cells, the inclusion of a cell marker to identify NSCLC cells is needed for proper interpretation of the imaging data.

      We thank the reviewer for the careful reading and constructive suggestions. We conducted the suggested quantitative assessment and described the methodology in the revised manuscript. The results showed that Ki-67 expression was lower in tumor cells adjacent to bone tissue than in the surrounding tumor cells (Figure 1a, b).

      In order to effectively identify NSCLC cells in bone metastases, GFP-expressing NSCLC cells were used in the animal model. We have added the immunofluorescence analysis of GFP and CCND3 in Supplemental Figure 4e, 4g, 5 and 6b.

      (2) The authors rely on KI-67 as a marker of proliferation. Yet, it is intriguing that some osteocytes, non-proliferating cells by definition, are often positive for this marker, which questions the specificity of the staining. The authors should provide the proper immunostaining controls to check for specificity and use additional markers of proliferation to confirm these results.

      We thank the reviewer for the suggestions. Ki-67 staining has been widely used to determine the dormancy of tumor cells in previous studies [1-4]. To confirm the results of Ki-67 staining, we used cyclin D3 (CCND3) as an additional marker of proliferation, as suggested by the reviewer. We added the immunofluorescence analysis of CCND3 in Supplemental Figure 4e, 4g, 5 and 6b, which is consistent with the quantitative immunofluorescence analysis of Ki-67.

      (3) The lack of proper controls in the in vivo experiments makes the interpretation of the data difficult. For instance, in the preconditioning experiment, bone mass likely increases; thus, these mice start with higher bone mass than the control mice. Without a proper control (naive mice exposed to moderate exercise), it cannot be tested whether the presence of cancer cells still promotes bone loss in this group. The authors need to include naive mice or analyze the bones from the non-injected contralateral legs.

      We thank the reviewer for the thoughtful comments, and we apologize for the confusion. We absolutely agree with the reviewer that bone mass increases after exercise preconditioning. Exercise affects multiple tissues and organ systems, initiating diverse homeostatic responses. Although exercise preconditioning effectively suppressed the progression of NSCLC bone metastasis, as described in the previous manuscript, we cannot conclude that this effect depends entirely on osteocytes. The mechanism by which exercise preconditioning suppresses bone metastasis progression is complex and needs further exploration. The revised manuscript expands the discussion on this point (manuscript lines 326-328).

      (4) Further, validating the in vivo work with other osteocyte-like cells or primary osteocytes would have strengthened the results.

      We thank the reviewer for the suggestion. We have conducted the experiments of co-culture of MLO-A5 (another type of osteogenic cell line) and NSCLC cells as shown in Supplemental Figure 1g. Not surprisingly, MLO-A5 cells also had an inhibitory effect on proliferation of NSCLC cells.

      (5) The data on miRNA99b-3p in NSCLC in Supplementary Figure 3 are not convincing. The positive cells are difficult to see, and most of the osteocytes lack nuclei. Better data, in humans and in the mouse model, are needed to confirm that osteocytes produce miRNA99b-3p.

      We thank the reviewer for the comments, and we apologize for the confusion. In this study, we used miRCURY LNA miRNA detection probes for ISH without staining the nuclei in the tissues, a method that has been used in our previous studies and by others [5-7]. Detailed experimental procedures for ISH of miRNA have been added to the revised manuscript (manuscript lines 461-474).

      (6) The authors do not provide any data showing that osteocytes are responsible for the effects of the interventions in the in vivo models. Osteocytes, as well as other bone cells, respond to mechanical stimulation, and thus any of these cells could be responsible for the protective effects of mechanical loading or moderate exercise. In vivo experiments demonstrating a direct role of osteocyte-produced miRNA99b-3p are needed to support the notion that osteocytes maintain tumor dormancy in NSCLC bone metastasis.

      We thank the reviewer for the thoughtful comments and suggestion. We established an in vivo model in which mice subjected to mechanical loading were injected with antagomir-NC or antagomir-99b-3p [8]. Injection of antagomir-99b-3p partially rescued the inhibitory effect of mechanical loading on NSCLC cell proliferation (Figure 4i-k).

      (7) Further, the authors solely rely on Ki-67 as a marker of dormancy. Completing this analysis with an assessment of a dormant gene expression signature or in vivo studies assessing tumor dormancy directly would be needed to confirm this notion.

      We thank the reviewer for the suggestion. We used CCND3 as an additional dormancy marker. Immunofluorescence analysis of CCND3 (Supplemental Figures 4e, 4g, 5 and 6b) is consistent with the quantitative immunofluorescence analysis of Ki-67.

      References

      [1] Guba M, Cernaianu G, Koehl G et al. A primary tumor promotes dormancy of solitary tumor cells before inhibiting angiogenesis. Cancer Res, 2001, 61: 5575-9.

      [2] Bliss SA, Sinha G, Sandiford OA et al. Mesenchymal Stem Cell-Derived Exosomes Stimulate Cycling Quiescence and Early Breast Cancer Dormancy in Bone Marrow. Cancer Res, 2016, 76: 5832-5844.

      [3] Correia AL, Guimaraes JC, Auf der Maur P et al. Hepatic stellate cells suppress NK cell-sustained breast cancer dormancy. Nature, 2021, 594: 566-571.

      [4] Hu J, Sánchez-Rivera FJ, Wang Z et al. STING inhibits the reactivation of dormant metastasis in lung adenocarcinoma. Nature, 2023, 616: 806-813.

      [5] Song Q, Xu Y, Yang C et al. miR-483-5p promotes invasion and metastasis of lung adenocarcinoma by targeting RhoGDI1 and ALCAM. Cancer Res, 2014, 74: 3031-42.

      [6] Carotenuto P, Hedayat S, Fassan M et al. Modulation of Biliary Cancer Chemo-Resistance Through MicroRNA-Mediated Rewiring of the Expansion of CD133+ Cells. Hepatology, 2020, 72: 982-996.

      [7] Lv Y, Wang Y, Song Y et al. LncRNA PINK1-AS promotes Gαi1-driven gastric cancer tumorigenesis by sponging microRNA-200a. Oncogene, 2021, 40: 3826-3844.

      [8] Zhang Y, Li S, Jin P et al. Dual functions of microRNA-17 in maintaining cartilage homeostasis and protection against osteoarthritis. Nat Commun, 2022, 13: 2447.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans. In this manuscript, the authors generated TRIP13 null mice and Flag-tagged TRIP13 knock-in mice to study its role in meiosis. They demonstrate that TRIP13 regulates HORMA domain proteins and is essential for meiotic completion and fertility. The main impact of this manuscript is its clarification of the in vivo function of TRIP13 during mouse meiosis and its previously unrecognized role as a dose-sensitive regulator of meiosis.

      Strengths:

      Two previously reported Trip13 mutations in mice are both hypomorphic alleles with distinct phenotypes, precluding a conclusion on its function. This study is the first to generate TRIP13-null mice, definitively revealing the function of TRIP13 in meiosis. The authors also show the novel localization of TRIP13 at the SC and its independence from the axial element components. The finding of dose-sensitive regulation of meiosis by TRIP13 has implications for understanding human meiosis and disease phenotypes.

      Weaknesses:

      This manuscript would be more impactful if more mechanistic advancements could be made. For example, the authors could follow up with one of the new interactors identified by MS to offer new insight into the molecular function of TRIP13.

      We agree that it would be interesting to follow up on the new candidate interactors, but we think this is more feasible in future studies.

      Reviewer #2 (Public Review):

      Summary and Strengths:

      In this manuscript, Chotiner and colleagues demonstrated the localization of TRIP13 and clarified the phenotypes of Trip13-null mice in mouse meiosis. The meiotic phenotypes of Trip13 have been well characterized using hypomorphic alleles in the literature. However, the null phenotypes have not been examined, and the localization of TRIP13 was not clearly demonstrated. The study fills these important knowledge gaps in the field. The demonstration of TRIP13 localization to the SC in mice provides an explanation of how HORMA domain proteins are evicted from the SC in diverse organisms. This conclusion was confirmed by both IF and TRIP13-tagged Tg mice. Further, the phenotypes of Trip13-null mice are very clear. The manuscript is well crafted, and the discussion section is well organized and covers the topic in the field comprehensively. All in all, the manuscript will provide important knowledge in the field of meiosis.

      Weaknesses:

      The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. However, the authors did not examine meiotic recombination in the Trip13-null mice.

      Meiotic recombination was extensively characterized in Trip13 severe hypomorph mutants in two previous studies (Li and Schimenti, 2007; Roig et al., 2010), using gamma-H2AX, BLM, BRCA1, ATR, RPA, RAD51, DMC1 and MLH1 as markers. All the meiotic defects in our Trip13-null mice were also present in Trip13 severe hypomorph mutants: meiotic arrest, defects in chromosomal synapsis, asynapsis at chromosomal ends, and accumulation of HORMAD1/2 on the SC axis. Therefore, the defects in meiotic recombination in Trip13-null mice are expected to be similar to those in Trip13 severe hypomorph mutants, and thus we did not examine the proteins involved in meiotic recombination in the Trip13-null mutant.

      Reviewer #3 (Public Review):

      Summary:

      The authors perform a thorough examination of the phenotypes of a newly generated Trip13 null allele in mice, noting defects in chromosome synapsis and impact on localization of other key proteins (namely HORMADs) on meiotic chromosomes. The vast majority of data confirms observations of several prior studies of Trip13 alleles (moderate and severe hypomorphs). The original or primary aims of the study aren't clear, but it can be assumed that the authors wanted to better study the role of this protein in evicting HORMADs upon synapsis by studying phenotypes of mutants and better characterizing TRIP13 localization data (which they find localizes to the central element of synapsed chromosomes using a new epitope-tagged allele). Their data confirm prior reports and are consistent with localization data of the orthologous Pch2 protein in many other organisms.

      Strengths:

      The quality of data is high. Probably the most important data the authors find is that TRIP13 is localized along the CE of synapsed chromosomes. However, this was not unexpected because PCH2 is also similarly localized. Also, the authors use a clear null (deletion allele), whereas prior studies used hypomorphs.

      Weaknesses:

      There is limited new data; most are confirmatory or expected (i.e., SC localization), and thus the impact of this report is not high. The claim that TRIP13 "functions as a dosage-sensitive regulator of meiosis" is exaggerated in my opinion. Indeed, the authors make the observation that hets have a phenotype, but numerous genes have haploinsufficient phenotypes. In my opinion, it is a leap to extrapolate this to infer that TRIP13 is a "regulator" of meiosis. What is the definition of a meiosis regulator? Is it at the apex of the meiosis process, or is it a crucial cog of any aspect of meiosis?

      TRIP13 is not haploinsufficient, as Trip13 heterozygotes were still viable and fertile (albeit with defects in meiosis). TRIP13 is an ATPase and changes the conformation of meiosis-specific proteins such as HORMAD proteins. TRIP13 is essential for meiosis and its mutations cause defects in both meiotic recombination and chromosomal synapsis. Reviewer 1 stated that “TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans”. Therefore, we feel that TRIP13 can be called a regulator of meiosis.

      Reviewer #1 (Recommendations For The Authors):

      A schematic illustration of SC structure, the components involved, and the main finding, would be helpful for readers to better understand the advancement made by this study.

      We have now added a schematic illustration in a new panel - Figure 7C.

      Fig. 1B, the stage with diplotene cells should be XII.

      The pachytene cells (Pac) were mis-labelled as diplotene cells. Corrected.

      Fig. 1C, color mislabeled.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript will provide important knowledge in the field of meiosis. I support the publication of this study. I have some suggestions to improve and polish the manuscript.

      Major points:

      (1) The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. Given the function of HORMAD1 in meiotic recombination, it would be informative if the authors could examine how major markers of meiotic recombination behave in Trip13-null meiosis.

      Please see our response to Weaknesses from Reviewer #2.

      (2) Relating to the above point, the complete lack of synapsis on the sex chromosomes in the Trip13-null meiosis is impressive. This result raises a question as to whether the pathway to designate XY-obligatory crossover (which can be detected with large foci of ANKRD31 and MEI4/REC114 at PAR) is affected or not. It would be interesting to examine whether the ANKRD31 and MEI4/REC114 foci are present on PAR in Trip13-null meiosis.

      We have performed immunofluorescence analysis of REC114 in spermatocytes. In Trip13-null pachytene-like spermatocytes, the X and Y chromosomes are not synapsed. REC114 still formed one focus on each of the unsynapsed X and Y chromosomes. We have added these new data to the Results as a new supplementary figure (Figure 4-figure supplement 1).

      (3) Figure 4 can be improved if there are quantified data for each phenotype. These phenotypes look nearly complete, but it would be informative to show the penetrance of these phenotypes.

      Because some chromosomes have unsynapsed ends, resulting in two centromere or telomere foci, the total number of centromere or telomere foci is always higher in Trip13-null pachytene-like spermatocytes than in wild type pachytene spermatocytes. Therefore, we did not count the foci of centromeres and telomeres. Nevertheless, the centromere and telomere markers localized as expected in both wild type and Trip13-null spermatocytes.

      (4) I am not fully convinced by these photos: "synapsed sister chromatids (Figure 6B)" and "Sycp2-/- spermatocytes formed short stretches of synapsis (Figure 6C)". The authors may try confocal microscopy with super-resolution deconvolution as they did for other data.

      These have been previously demonstrated. The “synapsed sister chromatids (Figure 6B)” were previously demonstrated by confocal microscopy with super-resolution deconvolution (Guan et al., 2020). The short stretches of synapsis in Sycp2-/- spermatocytes were previously demonstrated by electron microscopy (tripartite SC structure) and SYCP1 immunofluorescence (Yang et al., 2006). We have revised the text to cite this previous evidence and the publications.

      Minor points:

      (1) Lines 19-21: "Loss of TRIP13 leads to meiotic arrest and thus sterility in both sexes. Trip13-null meiocytes exhibit abnormal persistence of HORMAD1 and HORMAD2 on synapsed SC". These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles. This information can be added to the abstract. Otherwise, as written, it sounds like these are entirely new findings.

      This information is now added to the abstract: “These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles.”

      (2) The introduction section seems too long and contains unnecessary information. Some molecular details that are not touched in the result section can be deleted (e.g., Line 65-73).

      We would like to keep the molecular details on the two conformation states, as it provides biochemical background on TRIP13-HORMAD interactions.

      (3) Introduction, Line 92. A rationale can be added as to why the authors characterized the Trip13-null allele.

      A rationale has been added as follows: “To determine the effect of complete loss of TRIP13, we characterized Trip13-null mice.”

      (4) Line 205: Typo "TRRIP13".

      Corrected.

      Reviewer #3 (Recommendations For The Authors):

      Just a few recommendations:

      (1) In my opinion, the title is an overreach. "Regulator" invokes other concepts such as transcription factors.

      Please see our explanation in response to weaknesses from Reviewer #3.

      (2) The first sentence of the results deals with TRIP13 expression in only 3 tissues. The authors might look at more comprehensive RNA-seq data from mice and humans.

      We examined TRIP13 protein expression in 8 mouse tissues by WB and found that TRIP13 protein was abundant in testis but present at a very low level in ovary and liver (Figure 1A). We feel that readers can easily look up the relative transcript levels of Trip13 in more mouse and human tissues in the NCBI Gene database.

      (3) The null allele is semi-lethal. Is body size affected? Were the mice abnormal in any other ways, given that TRIP13 has been implicated in other diseases and processes, and is expressed in other tissues (TRIP13 stands for Thyroid receptor interacting protein).

      The body weight of 2-3-month-old males was not significantly different between wild type (24.3±2.8 g, n=5) and Trip13 KO mice (22.8±1.7 g, n=5, p=0.3, Student’s t-test). We have included the body weight information in the revised manuscript. We did not observe any somatic abnormalities in the viable Trip13-null mice, nor did the authors of two previous studies report any in the Trip13 hypomorph mutants (Li and Schimenti, 2007; Roig et al., 2010).
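      The body-weight comparison above can be checked from the summary statistics alone. The Python sketch below assumes that the ± values are standard deviations (the response does not state whether they are SD or SEM) and that the test was a two-sided, equal-variance Student's t-test; the function names are illustrative, and the p-value is obtained by numerically integrating the t density with the standard library rather than by calling a statistics package.

```python
import math


def pooled_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Two-sample Student's t statistic (pooled variance) from means, SDs and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area under the density between 0 and |t|
    return 1 - 2 * central


# Summary statistics reported above (assumed mean ± SD, n = 5 per group).
t, df = pooled_t_from_summary(24.3, 2.8, 5, 22.8, 1.7, 5)
p = two_sided_p(t, df)
print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.2f}")
```

      With these inputs the sketch gives t ≈ 1.02 on 8 degrees of freedom and a two-sided p of about 0.34, in line with the reported p = 0.3. If the ± values were SEMs instead, the SDs would be √5 times larger and the p-value higher still.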

      (4) Line 276 : It would be nice to elaborate on the "spatial explanation."

      We meant that TRIP13 localizes to SC while HORMAD proteins are removed from SC upon chromosomal synapsis, thus providing a spatial explanation. However, we have now deleted “spatial”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      However, there are several concerns to be explained more in this study. In addition, some results should be revised and updated.

      Thank you for your comments. We have addressed these concerns through revised text and additional experiments.

      Some results were revised and updated accordingly.

      Reviewer #2 (Public Review):

      The minor weakness of the study is inconsistent use of terminology throughout the manuscript, occasional logic-jump in their flow, and missing detailed description in methodologies used either in the text or Materials and Methods section, which can be easily rectified.

      Thank you for your review. We have revised the manuscript and corrected errors according to your comments.

      Reviewer #3 (Public Review):

      Importantly, besides the Miwi ubiquitination experiment, which is performed in a heterologous system and therefore may not be ideal for drawing conclusions, the possible involvement of ubiquitination was not shown for any of the other proteins that the authors found to interact with FBXO24. Could histones and transition proteins be targets of the proposed ubiquitin ligase activity of FBXO24, such that histone replacement is abrogated in its absence?

      Thank you for your comments. The histones and transition proteins were not found in the FBXO24 immunoprecipitates (Figure S3G), suggesting that they are not direct targets of FBXO24.

      Miwi should be immunoprecipitated and Miwi ubiquitination should be detected (with WB or mass spec) in WT testis.

      We agree with this suggestion. In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      Therefore, the claim that FBXO24 is essential for piRNA biogenesis/production (lines 308, 314) is not appropriately supported.

      We appreciate the comment. We have revised the description and modified the claim on page 11.

      Reviewing Editor's note for revision

      (1) As noted by all three reviewers, as currently written the rationale to focus on MIWI is not entirely clear. A transitional narrative to focus on MIWI needs to be provided as well as an explanation for how the absence of FBXO24 as an E3 ubiquitin ligase is responsible for the observed mRNA and protein differential expression.

      We appreciate your comments. We have added a transitional narrative explaining our focus on MIWI (Page 7) and an explanation of the differential mRNA and protein expression upon FBXO24 deletion (Page 13).

      (2) As it can be indirect, mass spec detection of MIWI in testis co-IP and MIWI ubiquitination should be detected (with WB or mass spec) in WT testis.

      In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      (3) Please tone down the claim that FBXO24 is essential for piRNA biogenesis/production as it requires further evidence.

      We have revised the description and modified the claim on page 11.

      (4) Ontology analysis of the genes with abnormally spliced mRNAs to provide an explanation for developmental defects.

      In the revision, we have performed the ontology analysis and provided new data regarding the abnormally spliced genes, as shown in Figure S4D.

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      (1) The authors performed mainly with the WT (or knock-in) and Fbxo24-knockout mouse model. Do the heterozygous males and their sperm have any physiological defects like FBXO24-deficient mice?

      This is a good question. We performed a phenotypic analysis and found that heterozygous males are all fertile, and their sperm do not have any physiological defects.

      (2) Fbxo24-KO sperm carries swollen mitochondria. How do the mitochondria affect sperm function?

      Thank you for raising this interesting question. Based on our data and published literature, the defective mitochondria were associated with energetic disturbances and reduced sperm motility, as shown on Page 12.

      (3) TEM images show that Fbxo24-KO spermatids carry swollen mitochondria and enlarged chromatoid bodies. How the swollen mitochondria and enlarged chromatoid bodies lead to defects in sperm motility and flagellar development requires more explanation. In addition, it is unclear how the enlarged diameter of the chromatoid body is critical for normal sperm development.

      Thank you for your comments. The chromatoid bodies are considered to be involved in mitochondrial sheath morphogenesis. Analysis of the RNA content of chromatoid bodies reveals enrichment of PIWI-interacting RNAs (piRNAs), further emphasizing the role of chromatoid bodies in post-transcriptional regulation of spermatogenic genes. We added this explanation on Pages 12-13.

      (4) The authors only show band images to compare the protein amounts between WT and KO sperm and round spermatids. As the blots for loading controls are not clear, the authors should quantify the protein levels and perform a statistical comparison.

      We quantified the protein levels and performed a statistical comparison, as shown in Figure S3B.

      (5) The authors show the defective sperm head structure from Fbxo24-KO sperm in Figure 5. However, the Fbxo24-KO sperm heads seem quite normal in Figure 3. How many sperm show defective sperm head structure? In addition, the authors observed altered histone-to-protamine conversion in sperm, but it is unclear whether the altered nuclear protein conversion causes morphological defects in the sperm head.

      We appreciate the comments. In our study, we found over 80% of Fbxo24 KO sperm showed defective structure in the sperm head. Altered histone-to-protamine conversion caused the decondensed nucleus of Fbxo24 KO sperm. Notably, in many knockout mice studies, impaired chromatin condensation is frequently associated with abnormal sperm head morphology, as shown in reference 15 of Page 8.

      (6) The authors compare the protein levels of RNF8, PHF7, TSSK6, which participate in nuclear protein replacement in sperm. However, considering the sperm is the endpoint for the nuclear protein conversion, it is unclear to compare the protein levels in mature sperm. The authors might want to compare the protein levels in developing germ cells.

      Thank you for your comment. Yes, we actually detected the protein levels of RNF8, PHF7, and TSSK6 in the testes, not in sperm. We have corrected it in the Figure 5E. We apologize for our carelessness.

      (7) This reviewer suggests describing more rationale for how the authors focus on the MIWI protein. Also, it is wondered whether MIWI is also detected by testis co-IP mass spectrometry.

      We agree with this suggestion. Since MIWI is a core component of the chromatoid body (CB) and was also identified as an FBXO24-interacting partner by our immunoprecipitation-mass spectrometry (IP-MS) (Table S1), we focused on examining MIWI expression in WT and Fbxo24 KO testes. We have added this description in the revision (see lines 191-193 on page 7).

      (8) The authors need to provide a more detailed explanation for how the altered piRNA production affects physiological defects in germ cell development. In addition, it will be good to describe more how the piRNAs affect a broad range of mRNA levels.

      Thank you for your comments. The previously published studies have demonstrated that piRNAs could act as siRNAs to degrade specific mRNAs during male germ cell development and maturation. We have cited these studies on lines 369-372 of Page 13.

      (9) The authors observed an altered splicing process in the absence of FBXO24. However, it is a little bit confusing how the altered splicing events affect developmental defects. Therefore, the authors should state which mRNAs have undergone abnormal splicing processes and provide ontology analysis for the genes.

      We have performed the ontology analysis and showed the new data in Figure S4D.

      Minor comments

      (1) Figure 1A-C - Statistical comparison is missed. Numbers for biological replication should be described in corresponding legends.

      Thank you for your careful review. We have provided the statistical comparison and the numbers for biological replication in the legends of Figure 1A-C.

      (2) Figure 1E, F - The current images cannot clearly resolve the nuclear localization of FBXO24 in testicular germ cells. To clarify its intracellular localization, the authors should provide images with higher resolution.

      The resolution of Figure 1E, F was improved, as suggested. Thank you!

      (3) Figure 1E, F - Scale bar information is missing.

      The scale bars of Figure 1E, F were provided.

      (4) It will be much better to show the predicted frameshift and early termination of the protein translation in Fbxo24-knockout mice.

      The predicted frameshift of Fbxo24-knockout mice was added and shown in Figure S1B.

      (5) It is required to provide primer information for qPCR.

      The primer information for qPCR was provided, as shown in Table S7.

      (6) The authors describe that Fbxo24-KO sperm show abrupt bending of the tail. However, the description is unclear and the sperm shown in Figure 3C seems quite normal. The authors should clarify the abnormal bending pattern of the tail and show quantified results.

      Thank you for pointing out this issue. In Fbxo24 KO sperm, abnormal bending of the sperm tails mainly included neck bending and midpiece bending. We have shown them in Figure S3A.

      (7) The authors mention that Fbxo24-KO sperm have swollen mitochondria at the midpiece, but this is also unclear. How many mitochondria are swollen in Fbxo24-KO sperm?

      This is a good question. However, since it is very difficult to observe all of the mitochondria in each sperm by electron microscopy, we could not quantify the swollen mitochondria in Fbxo24 KO sperm.

      (8) Scale bar information is missed - Fig 3C insets, Fig 3D, Fig 3F insets, 4A insets, Figure 4C insets.

      All the scale bars have been added.

      (9) How many sperm have annulus defects? In Figure 3F, WT sperm does not have an annulus, which could be damaged during sample preparation. Is the annulus defects in Fbxo24-KO sperm consistent?

      Thank you for asking these questions. Based on our results, about 30% of Fbxo24 KO sperm showed a defective annulus structure. Since both TEM (Figure 3F) and SEM (Figure 3G) results clearly showed the defective annulus structure of Fbxo24 KO sperm, we believe the annulus defects are consistent and highly unlikely to be caused by sample preparation.

      (10) The cross-section image for the endpiece of Fbxo24-KO sperm is not suitable. It shows a longitudinal column structure of the principal piece.

      Thank you for your comments. It is difficult to observe a completely longitudinal structure of the sperm tail under TEM. The cross-sections of the endpiece and principal piece allowed us to examine the structure of the axoneme, ODFs and fibrous sheath (FS).

      (11) The endpiece of Fbxo24-KO sperm seems to have a normal axoneme. Do all endpieces of Fbxo24KO sperm have normal axoneme? Also, the authors need to describe whether an axonemal structure is damaged and disrupted in all Fbxo24-KO sperm.

      Our TEM data showed that the axonemal structure was impaired in the endpiece of Fbxo24 KO sperm (see right panels of Figure 3H). Moreover, based on the ultrastructural analysis by TEM, we found that over 90% of Fbxo24 KO sperm had a damaged axonemal structure.

      (12) Reference blots in Fig 3I, 3J, 4E (left), 5C and 5E are quite faint. The authors should replace the blot images.

      Thank you for pointing this out. We reran the Western blots multiple times but could not obtain better images due to limited antibody sensitivity. However, we quantified the protein levels and performed a statistical comparison (Figure S3B) to provide readers with a reliable readout from these images.

      (13) Loading controls are required - 7D-H.

      Done as suggested. Thanks!

      (14) How do the authors measure the midpiece length? From where to where? This should be clarified.

      Good question. We measured the midpiece length from the sperm neck to the sperm annulus by MitoTracker staining. We have clarified this on Page 16.

      (15) How are the bands for Fbxo24 shifted during IP in Fig 7A?

      Protein modification associated with the interaction may cause the band shift.

      (16) There are several typos throughout the manuscript. Please check carefully and fix them.

      Thank you for your careful review. We have corrected and fixed all the typos as far as we can.

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      (1) Please provide a schematic of HA-Fbxo24 knock-in construct and strategy together with knockout (Figure S2) or even separately early in Figure S1. The description of using the transgenic mouse is mentioned even earlier than the knockout but there are no citations or methods provided in the text other than that listed in Materials and Methods.

      Thank you for your suggestion. As suggested, the schematic of the HA-Fbxo24 knock-in strategy has been added to Figure S2A. The description of the transgenic mouse has been added to the Results (page 4, lines 102-103).

      Also, it is not clear to what extent the phenotypic and molecular characterization of HA-transgenic mice is performed. For example, Lines 134-139: The use of Fbxo24-HA labeled transgenic mice results in the rescue of spermatogenesis and fertility as shown in Figure 2F by measuring the litter size. It is not clear how this observation leads the author to state that this rescues defects in spermiogenesis. Please clarify how and what other measures are taken to support this conclusion. Is the observed infertility due to defects in spermatogenesis or spermiogenesis?

      Thank you for your question. We crossed FBXO24-HATag males with FBXO24−/− females to obtain FBXO24−/−; FBXO24-HATag males. We examined the testes volume and histological morphology of FBXO24−/−; FBXO24-HATag males and found that they were similar to FBXO24+/−; FBXO24-HATag littermates, indicating that spermatogenesis was restored, as shown in Figure S2H.

      (2) Line 107 vs Line 114: Please use the terms spermatogenesis and spermiogenesis consistently throughout the text. In the introduction, the authors clearly defined that spermatogenesis involves three phases, with the third phase referred to as spermiogenesis. However, the authors conclude in the first line that "FBXO24 plays a role during spermatogenesis" while summarizing at the end of the paragraph that this protein is "expressed in haploid spermatids specifically during spermiogenesis". Therefore, it is not clear whether the authors conclude that FBXO24 is important for all of spermatogenesis (line 107) or only for part of spermiogenesis (line 114). Another example is line 219 vs. 238: at this point in the manuscript, it is again unclear whether the authors want to study molecular changes during spermatogenesis or spermiogenesis upon FBXO24 depletion. There are many such cases throughout the text, and we recommend consistently using the more restrictive term whenever applicable for a clear interpretation.

      We thank you for your careful review. We have double-checked the terminology of spermatogenesis and spermiogenesis and made it consistent throughout the text of the revised manuscript.

      (3) It is not clear how frequently Fbxo24-knockout sperm show defects in head morphology based on Figures 3C, 3F, and 5A, since some sperm appear to have relatively normal-looking heads. Please provide quantification.

      We have performed the quantification and found that over 80% of Fbxo24 KO sperm showed defective structures in the sperm head.

      (4) Figure 3B: The authors describe in the figure legend that 3 mice were analyzed in each group. The standard deviation for the WT analysis is missing, or if the author wanted to set the WT value to 100%, the bar and scale shown on the y-axis do not fit. The value for WT looks more like 95%.

      We did normalize sperm motility to the WT value (set at 100%), and we have corrected Figure 3B in the revised manuscript. We apologize for this oversight.

      (5) Figure 3B and C: It is not clear how motility was measured. Was CASA used? (This is not described in the Methods.) Abnormal flagellar bending in KO spermatozoa cannot be concluded from static microscopic images alone. Please provide more details of the motility analysis, together with live-cell imaging videos.

      Sperm motility was measured manually using a hemocytometer, following the cited reference.

      We have provided details of the sperm motility analysis in the Materials and Methods section on Page 16.

      (6) Figure 3I and J: These are among the few figures not supported by statistical analysis. In particular, in 3I, the GAPDH controls for WT and KO protein do not show equal loading, which could explain the lower expression in the KO. Please show normalized bar graphs with multiple biological replicates, or at least a representative technical replicate with equal GAPDH loading, to better support the conclusion.

      Thank you for your suggestion. We have added a statistical comparison of relative protein expression, shown in the new Figure S3B.

      (7) Line 184: It is not clear how the authors define a swollen mitochondrion. Are there any measurable size (or roundness) criteria that distinguish a swollen from a non-swollen mitochondrion? We recommend using other terminology, as "swollen" often implies a difference in osmolarity, and no experiment supports this implication.

      Thank you for your comment. We have changed “swollen” to “vacuolar” in the revision (Page 7).

      (8) Figure S4: Without a bright-field image, it is hard to assess the purity and morphology of the isolated preparation. Please provide bright-field images, either alongside or as overlays.

      We agree with your comment. We have provided the overlaid images in new Figure S4A.

      (9) In both the Introduction and Results, there is a large logical jump in what prompted the authors to examine MIWI protein levels and link the observation to the MIWI/piRNA pathway, even though this is one of the main findings. We recommend providing a better rationale and logical flow in the text.

      Thank you for your suggestion. We have added a sentence explaining why we focused on MIWI expression (see lines 190-193 on page 7).

      Minor comments

      (1) Please follow the gene vs. protein nomenclature conventions throughout. For example, gene names in the figures should be italicized with the first letter capitalized, as in the main text. Protein names should be in all capitals, e.g., FBXO24.

      Gene and protein names have been revised as suggested.

      (2) In the MM section, the name of the manufacturer and the location of the materials used are missing in several sections. Please go back through the MM section and add this information in the appropriate places.

      Done as suggested. Thank you!

      (3) On page 4, the authors mentioned that "Further qPCR analysis of developmental testes and purified testicular cells showed that FBXO24 mRNA was highly expressed in the round spermatids and elongating spermatids (Fig 1B-C)". Please include statistical analyses for Fig 1B-C as well as for Fig 1A to support the written statements.

      We have added statistical comparisons to Figure 1. Significance is denoted in the figures by * (p < 0.05).

      (4) Figure 3E: Please describe in more detail how the length of the midpiece was measured. Was it based on TEM images or based on fluorescent images using MitoTracker?

      As noted in our response to Reviewer #1, we measured midpiece length from the sperm neck to the annulus using MitoTracker staining. We have clarified this in the Materials and Methods section on Page 16.

      (5) Line 431: In the "Electron Microscopy" section of the MM part, the authors should indicate the ascending ethanol series (%) used.

      Done as suggested. Thank you!

      (6) Line 432: The thickness of the sections prepared is missing, as well as an indication of the microtome used.

      We have added the section thickness and the microtome used to the Materials and Methods section on Page 16.

      (7) Line 433: If the TIFF files were processed with Adobe Photoshop, this information is missing.

      We have added information on the use of Adobe Photoshop to generate the TIFF files on Page 17.

      (8) Lines 445, 452, 467: In some places in the paper, temperatures are written with a space between the number and °C, and in others without. Please go through the paper and make this consistent. The usual style is 4°C.

      We have gone through the manuscript and made the temperature notation consistent throughout. Thank you for your careful review.

      (9) Line 469: The gel documentation system used is not mentioned.

      Done as suggested. Thank you!

      (10) Line 469: The 'TM' should be superscripted.

      Done as suggested.

      (11) Line 489: A space is missing between the changes and the parenthesis.

      Done as suggested.

      (12) Lines 495-496: The authors write that the fractions enriched with round spermatids after sedimentation were collected manually. Was the cell concentration (e.g., 2 × 10⁶ cells/ml) determined after collection? How were the cells stored until use? Please add the sedimentation time and temperature.

      The cells were stored in 1× Krebs buffer on ice. Cells were sedimented through a BSA density gradient for 1.5 h at 4°C. The cell concentration was determined after collection, as described on Page 18.

      (13) Line 505: Spelling error. "Manufactures' instructions" is written instead of "manufacturer's procedure".

      The spelling error was corrected.

      (14) Line 520: Please write a short sentence on how the purification of the 16-40 nt long RNA was performed.

      RNAs 16–40 nt in length were enriched by polyacrylamide gel electrophoresis. We have added this information on Page 19 (line 531).

      (15) Line 528: The version of the used GraphPad software is missing.

      The GraphPad software version has been added on Page 19.

      (16) Line 677: For qPCR analyses, the number of mice analyzed (N) and a statistical evaluation are missing.

      The statistical comparison and the number of biological replicates have been added on Page 26.

      (17) Figure 3D: Please add a scale bar.

      Done as suggested. Thanks!

      (18) Lines 371 and 377: "In summary" appears twice. Please provide a single summary for the whole paper.

      This sentence was revised, as shown on Page 13.

      (19) Line 382: To be consistent in the whole paper, please write Figure 10 in bold letters.

      Done as suggested.

      (20) Please make the size and font of the references consistent with the main text.

      Done as suggested. Thanks again for your careful review.

      Reviewer #3 (Recommendations For The Authors):

      I would like to see a description of the FBXO24 immunoprecipitation experiment performed in HEK293T cells. This somatic cell line does not normally express Miwi, so how was Miwi detected on the FBXO24-mCherry IP beads? It is not mentioned whether Miwi was expressed from a recombinant vector in this experiment. Similarly, I would like to see a clearer description of the ubiquitin peptide experiment described toward the end of the same paragraph; it is not clear.

      Thank you for your comments. FBXO24-mCherry was expressed in HEK293T cells, and the immunoprecipitates were incubated with testis protein lysate (see lines 268-272 on Page 10). We have also added a description of the ubiquitin experiment (lines 283-286 on Page 10).

      Line 263: I think the term ectopic here is not appropriate, a correction is needed.

      We have changed “ectopic” to “increased” in the revision (see line 268 on Page 10).

      I would like the authors to provide a tentative explanation, or evidence, for why FBXO24 KO males are completely sterile even though they still produce mature sperm with some motility. Since there are defects in nuclear condensation, it would be very relevant to check for DNA damage/fragmentation, which could contribute to the sterility phenotype.

      This is a good suggestion. We reanalyzed sperm DNA damage by TUNEL staining and present the new data in Figure S3E-F.

      Line 213: There have been some conflicting reports about the role of RNF8 in spermiogenesis, but a recent report has shown that RNF8 is not involved in histone PTMs that mediate histone to protamine transition (Abe et al Biol Reprod 2021 https://doi.org/10.1093%2Fbiolre%2Fioab132).

      Thank you for your comment. We now cite this important reference and discuss it in the Discussion section on Page 12.

      Figure 7: I would like to see zoomed-out views of the affected exons, so that flanking unaffected exons can be used as a reference for unaffected splicing. Most of the genome browser views in this image only show affected exons and it is impossible to see if these alone are affected or if the reduced RNAseq coverage in those exons is a result of overall reduced mapped reads in these genes. Also, a fixed Y axis with the same max value should be shown for these genome browser snapshots so that the expression level is comparable between the two genotypes.

      Thank you for your comments. We have added an RT-PCR loading control and the Y-axis scale ranges to the new Figure 7.

      Minor corrections:

      Line 70: correct "..functions as protein-protein interaction..".

      Thank you for your careful review. We have corrected this sentence (see line 69 on Page 3).

      Line 101: correct "..qPCR analysis of developmental testis..".

      We have corrected this sentence (see line 100 on Page 4). Thanks again.

      Line 116: correct "..results in detective..".

      Corrected.

      Line 186: correct ".. explored..".

      Corrected.

      Line 218: correct ".. gene expressions.

      Corrected.

      Line 221: correct "..genes significantly differentiated expressed".

      Corrected.

      Line 241: FBXO24 was shown earlier in both cytoplasm and nucleus.

      We have changed “FBXO24 is mainly confined to the nucleus” to “FBXO24 expressed in the nucleus”, as shown in line 247 on Page 9.

      Line 501-502: correct "..reverse transcriptional".

      We changed “reverse transcriptional” to “reverse transcription” (Page 18).

      Line 686: correct ".. deficiency male..".

      Corrected.

      Line 769: correct "..Western blots were adopted..".

      Corrected.

      Line 784: correct "..WT tesis..".

      Corrected.

      I cannot understand exactly what is shown in Figure 9B. Some elements on the X-axis are single base positions (-2K, TSS, +2K) while others are stretches of sequence, so they cannot be equivalent. Why is only an intron shown? There should be a measure of normalized expression on the Y-axis.

      Thank you for your questions. On the X-axis, genomic segments were scaled to the same size, and the signal abundance over each was calculated using computeMatrix. To determine the source of the piRNAs, they were mapped to gene bodies, including introns, CDSs, and UTRs. The Y-axis shows normalized counts.

      Figure 6F is not needed.

      Figure 6F illustrates the number of each type of mRNA splicing event upon FBXO24 deletion in round spermatids. Because it helps readers understand the splicing changes, we have decided to keep it.

      The last two paragraphs of the discussion seem to be redundant.

      Thank you for pointing this out. We have revised the last two paragraphs of the discussion.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

      The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is currently used routinely to delimit operational taxonomic units in RNA viruses and to reconstruct their evolutionary history (Edgar et al. 2022; see also the Serratus project, https://serratus.io/); we therefore made the deliberate decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of coronavirus origin range from around 10 thousand to 293 million years ago, depending on modeling assumptions). Only future genomic work across the Coronaviridae, characterizing multiple genetic regions with different evolutionary rates, will allow us to precisely elucidate the timescale of coronavirus evolution alongside their hosts. In the meantime, we show here that, although the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, when combined with cophylogenetic approaches it strongly suggests a recent evolutionary origin in bats.

      R. C. Edgar, et al., Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147 (2022).

      Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that challenges this traditional view. Their approach, which adapts gene tree-species tree reconciliation, is designed to robustly test two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree, the authors were only able to use the "undated" model in ALE; the dated method (which only allows time-consistent transfers) failed on their dataset, possibly due to dataset size. Further exploration of this question would be valuable.

      Our intuition is that the “dated” version of ALE did not necessarily fail on our dataset because of its size (ALE ran, but produced unrealistic parameter estimates and could not output possible reconciliations, as mentioned in our Materials and Methods section). We think it most likely failed because there is no pattern of codiversification: the coronavirus and mammal trees are so different that finding a reconciliation scenario between them with time-consistent transfers is very difficult, and ALE fails to estimate an amalgamated likelihood for such an unlikely scenario. Following a suggestion from reviewer #3, we are going to try running the dated version of ALE separately on the alpha- and betacoronaviruses, which yields smaller datasets. This will help us determine whether the dated version of ALE fails because of data size or because of the absence of a codiversification pattern.

      Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from cophylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are relatively common (compared with transmissions from bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases into the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we examined sampling biases in the mammalian virome and suggested that this is a fairly prominent issue, one that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point: even for betacoronaviruses, many new hosts have been confirmed in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in the data is required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

      We fully agree that sampling biases in the mammalian virome are a prominent issue, which is why we conducted a series of sensitivity analyses to test their effect on our main conclusions. We thoroughly tested the effects of (i) the unequal sampling effort across the mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life, by subsampling the data to correct for unequal sampling effort (see Supporting Information Text). In both cases, we still found low support for a codiversification scenario, along with an origin in bats in East Asia, preferential host switches within mammalian orders, and rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host transfers involve unsampled intermediate hosts. To address the reviewer's comment, we will better emphasize the importance of sampling biases in the main text and include the suggested references. We will also make our sensitivity analyses more prominent by moving them from the Supporting Information Text to the main text.

      We agree that distinguishing between alpha and beta coronaviruses will provide useful additional insights; we are going to run separate cophylogenetic analyses for these two sub-clades. We will report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.

      We were not aware of the work of Geoghegan et al. (2017, PLOS Pathogens); thank you for providing this reference, which we will now discuss.

    1. Author Response

      Reviewer #1:

      This manuscript presents an extremely exciting and very timely analysis of the role that the nucleosome acidic patch plays in SWR1-catalyzed histone exchange. Intriguingly, SWR1 loses activity almost completely if either of the acidic patches is absent. To my knowledge, this makes SWR1 the first remodeler with such a unique and pronounced requirement for the acidic patch. The authors demonstrate that SWR1 affinity is dramatically reduced if at least one of the acidic patches is absent, pointing to a key role of the acidic patch in SWR1 binding to the nucleosome. The authors also pinpoint a specific subunit, Swc5, that can bind nucleosomes and engage the acidic patch, and they obtain a cryo-EM structure of Swc5 bound to a nucleosome. They also identify a conserved arginine-rich motif in this subunit that is critical for nucleosome binding and histone exchange in vitro and for SWR1 function in vivo. The authors provide evidence suggesting a direct interaction between this motif and the acidic patch.

      Strengths:

      The manuscript is well-written and the experimental data are of outstanding quality and importance for the field. This manuscript significantly expands our understanding of the fundamentally important and complex process of H2A.Z deposition by SWR1 and would be of great interest to a broad readership.

      We thank the reviewer for their enthusiastic and positive comments on our work.

      Reviewer #2:

      Summary:

      In this study, Baier et al. investigated the mechanism by which SWR1C recognizes nucleosomal substrates for the deposition of H2A.Z. Their data convincingly demonstrate that the nucleosome's acidic patch plays a crucial role in the substrate recognition by SWR1C. The authors presented clear evidence showing that Swc5 is a pivotal subunit involved in the interaction between SWR1C and the acidic patch. They pared down the specific region within Swc5 responsible for this interaction. However, two central assertions of the paper are less convincing. First, the data supporting the claim that the insertion of one Z-B dimer into the canonical nucleosome can stimulate SWR1C to insert the second Z-B dimer is somewhat questionable (see below). Given that this claim contradicts previous observations made by other groups, this hypothesis needs further testing to eliminate potential artifacts. Secondly, the claim that SWR1C simultaneously recognizes the acidic patch on both sides of the nucleosome also needs further investigation, as the assay used to establish this claim lacks the sensitivity necessary to distinguish any difference between nucleosomal substrates containing one or two intact acidic patches.

      Strengths:

      As mentioned in the summary, the authors presented clear evidence demonstrating the role of Swc5 in recognition of the nucleosome acidic patch. The identification of the specific region in Swc5 responsible for this interaction is important.

      We thank the reviewer for their careful critique of our work. Below we address each major concern.

      Major comments:

      (1) Figure 1B: It is unclear how much of the decrease in FRET is caused by the bleaching of fluorophores. The authors should include a negative control in which Z-B dimers are omitted from the reaction. In the absence of ZB dimers, SWR1C will not exchange histones. Therefore, any decrease in FRET should represent the bleaching of fluorophores on the nucleosomal substrate, allowing normalization of the FRET signal related to A-B eviction.

      In this manuscript, as well as in our two previous publications (Singh et al., 2019; Fan et al., 2022), we have presented the results of no-enzyme controls, +/- ZB dimers, no-ATP controls, or AMP-PNP controls for our FRET-based H2A.Z deposition assay (see also Figure S3). We do not observe significant levels of photobleaching in this assay, either during ensemble measurements or in smFRET experiments. To aid the reader, we have added the AMP-PNP data for the experiment shown in Figure 1B. The results show a less than 10% decrease in FRET over 30 min, and the signal from the double acidic patch disrupted nucleosome is identical to this negative control.

      (2) Figure S3: The authors use the decrease in FRET signal as a metric of histone eviction. However, Figure S3 suggests that the FRET signal decrease could be due to DNA unwrapping. Histone exchange should not occur when SWR1C is incubated with AMP-PNP, as histone exchange requires ATP hydrolysis (10.7554/eLife.77352). And since the insertion of Z-B dimer and the eviction of A-B dimer are coupled, the decrease of FRET in the presence of AMP-PNP is unlikely due to histone eviction or exchange. Instead, the FRET decrease is likely due to DNA unwrapping (10.7554/eLife.77352). The authors should explicitly state what the loss of FRET means.

      We agree with the reviewer that loss of FRET can be due to DNA unwrapping from the nucleosome. We previously demonstrated this activity of SWR1C in our smFRET study (Fan et al., 2022). However, DNA unwrapping is highly reversible, lasting only 1-3 seconds. Neither we nor others have observed stable unwrapping of nucleosomes by SWR1C; rather, the stable loss of FRET reports on dimer eviction. We assume the reviewer is concerned about the rather large decrease in FRET signal shown in the AMP-PNP controls in Figure S3, panels A and D. For the other 7 panels, the decrease in FRET with AMP-PNP is minimal. In fact, if we average all of the AMP-PNP data points, the rate of FRET loss is not statistically different from that of the no-enzyme control reactions (nucleosome plus ZB dimers).

      Data for panels A and D were obtained with a 77N0 nucleosomal substrate, with Cy3 labeling the linker-distal dimer. This is our standard DNA fragment, and it was used in Figure 1B. The only difference between the data sets is that the data in Fig 1B were obtained with nucleosomes reconstituted with a Cy5-labelled histone octamer, rather than by the hexasome assembly method used for Fig S3. Three points are important. First, for all of these substrates, we assembled 3 independent nucleosome preparations, and the results are highly reproducible. Second, we performed a total of 6 experiments for the 77N0-Cy5 substrates to ensure that the rates were accurate (+/-ATP). Third, and most important, we do not see this decrease in FRET signal in the absence of SWR1C (no-enzyme control); these data were included in the source data file. Thus, there appears to be significant SWR1C-induced nucleosome instability for these two hexasome-assembled substrates. We now note this in the legend to Figure S3. The key point for this work, however, is that the rate of FRET loss increases markedly in the presence of ATP, and this rate is faster when a ZB dimer is present at the linker-proximal location. In response to the reviewer's last point, we state in the first paragraph of the results: “The dimer exchange activity of SWR1C is monitored by following the decrease in the 670 nm FRET signal due to eviction of the Cy5-labeled AB-Cy5 dimer (Figure 1A).”

      (3) Related to point 2. One way to distinguish nucleosomal DNA unwrapping from histone dimer eviction is that unwrapping is reversible, whereas A-B eviction is not. Therefore, if the authors remove AMP-PNP from the reaction chamber and a FRET signal reappears, then the initial loss of FRET was due to reversible DNA unwrapping. However, if the removal of AMP-PNP did not regain FRET, it means that the loss of FRET was likely due to A-B eviction. The authors should perform an AMP-PNP and/or ATP removal experiment to make sure the interpretation of the data is correct.

      See our response to item 2 above.

      (4) The nature of the error bars in Figure 1C is undefined; therefore, the statistical significance of the data is not interpretable.

      We apologize for not making this explicit for each figure. The error bars represent 95% confidence intervals from at least 3 sets of experiments. This statement has been added to the legend.

      (5) The authors claim that the SWR1C requires intact acidic patches on both sides of the nucleosomes to exchange histone. This claim was based on the experiment in Figure 1C where they showed mutation of one of two acidic patches in the nucleosomal substrate is sufficient to inhibit SWR1C-mediated histone exchange activity. However, one could argue that the sensitivity of this assay is too low to distinguish any difference between nucleosomes with one (i.e., AB/AB-apm) versus two mutated acidic patches (i.e., AB-apm/AB-apm). The lack of sensitivity of the eviction assay can be seen when Figure 1B is taken into consideration. In the gel-shift assay, the AB-apm/AB-apm nucleosome exhibited a 10% SWR1C-mediated histone exchange activity compared to WT. However, in the eviction assay, the single AB/AB-apm mutant has no detectable activity. Therefore, to test their hypothesis, the authors should use the more sensitive in-gel histone exchange assay to see if the single AB/AB-apm mutant is more or equally active compared to the double AB-apm/AB-apm mutant.

      Our pincher model is based on three, independent sets of data, not just Figure 1C. First, as noted by the reviewer, we find that disruption of either acidic patch cripples the dimer exchange activity of SWR1C in the FRET-based assay. Whether the defect is identical to that of the double APM mutant nucleosome does not seem pertinent to the model. In a second set of assays, we used fluorescence polarization to quantify the binding affinity of SWR1C for wildtype nucleosomes, a double APM nucleosome, or each single APM nucleosome. Consistent with the pincher model, each single APM disruption decreases binding affinity at least 10-fold (below the sensitivity of the assay). Finally, we monitored the ability of different nucleosomes to stimulate the ATPase activity of SWR1C. Consistent with the pincher model, a single APM disruption was sufficient to eliminate nucleosome stimulation.

      (6) The authors claim that the AZ nucleosome is a better substrate than the AA nucleosome. This is a surprising result as previous studies showed that the two insertion steps of the two Z-B dimers are not cooperative (10.7554/eLife.77352 and 10.1016/J.CELREP.2019.12.006). The authors' claim was based on the eviction assay shown in Fig 1C. However, I am not sure how much variation in the eviction assay is contributed by different preparations of nucleosomes. The authors should use the in-gel assay to independently test this hypothesis.

      For all data shown in our manuscript, at least three different nucleosome preparations were used. The impact of a ZB dimer on the rates of dimer exchange was highly reproducible among different nucleosome preparations and experiments. We also see reproducible ZB stimulation for three different substrates: with ZB on the linker-proximal side, on the linker-distal side, and on one side of a core particle. We do not believe that our data are inconsistent with previous studies. First, the previous work referenced by the reviewer performed dimer exchange reactions with a large excess of nucleosomes over SWR1C (catalytic conditions), whereas we used single-turnover reactions. Second, our study is the first to use a homogeneous, ZA heterotypic nucleosome as a substrate for SWR1C. All previous studies used a standard AA nucleosome and followed the first and second rounds of dimer exchange, which occur sequentially. Finally, we observe only a 20-30% increase in rate conferred by a ZB dimer (e.g., 77N0 substrates), and such an increase would likely not have been detected by previous gel-based assays.

      Minor comments:

      (1) Abstract line 4: Stating that 'numerous' studies have shown that the acidic patch affects the activity of chromatin remodeling enzymes may be too strong.

      Removed

      (2) Page 15, line 15: The authors claim that swc5∆ was inviable on formamide media. However, the data in Figure 8 shows cell growth in column 1 of swc5∆.

      The term ‘inviable’ has been replaced with ‘poor’ or ‘slow growth’

      (3) The authors should use standard yeast nomenclature when describing yeast genes and proteins. For example, in Figure 8 and its legend, Swc5∆ was used to describe the yeast strain BY4741; MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0; YBR231c::kanMX4. Instead, the authors should describe the swc5∆ mutant strain as BY4741 MAT a his3∆1 leu2∆0 met15∆0 ura3∆0 swc5∆::kanMX4. Exogenous plasmids should also be indicated in italics and inside brackets, such as [SWC5-URA3] or [swc5(R219A)-URA3].

      We apologize for missing this mistake in the Figure 8 legend. We had inadvertently copied this from the EUROSCARF entry and forgot to edit it. We decided not to add all the plasmid names to the figure, as doing so made it too cluttered. We state in the figure legend that the panels show growth of swc5 deletion strains harboring the indicated swc5 alleles on CEN/ARS plasmids.

      (4) According to Lin et al. 2017 NAR (doi: 10.1093/nar/gkx414), there is only one Swc5 subunit per SWR1C. Therefore, the pincher model proposed by the authors would suggest that there is a missing subunit that recognizes the second acidic patch. The authors should point out this fact in the discussion. However, as mentioned in Major comment 6, I am not sure if the pincer model is substantiated.

      In our discussion, we had noted that the published cryoEM structure suggested that the Swc2 subunit likely interacts with the acidic patch on the dimer that is not targeted for replacement, and we proposed that Swc5 interacts with the acidic patch on the exchanging H2A/H2B dimer. We have now made this clearer in the text.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We wish to thank the reviewers for their helpful and insightful comments. Their concerns related mainly to the interpretation of the data, the clarity of our statements, and the discussion.

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting study. It involves the use of hippocampal neuronal cultures from syntaxin 1 knock-out mice. These cultures serve as a platform for monitoring changes in synaptic transmission, through electrophysiological recording of postsynaptic currents, upon lentiviral infection with various isoforms, chimeras, and point mutations of syntaxins.

      The authors observe the following:

      (1) Syntaxin2 restores neuronal viability and can partially rescue Ca2+-evoked release in syntaxin1 knock-out neurons, but this release is much slower (as shown by differences in cumulative charge transfer) and draws on a clearly smaller RRP than when rescued with syntaxin1. In contrast, syntaxin2-mediated rescue leads to a large increase in spontaneous release (Figure 1). Convincingly, the authors conclude that, compared with syntaxin2, syntaxin 1 is optimized for fast phasic release and for clamping of spontaneous release.

      (2) The replacement of the SNARE domain (or its C-terminal part) of syntaxin1 by the SNARE domain of syntaxin2 (or its C-terminal part) rescues the fast kinetics, but not the amplitude, of Ca2+-evoked release. This is associated with a decrease in the size of the RRP and an increase in spontaneous release. The probability of vesicular release (PVR) is slightly increased, which is intriguing because a slight decrease would be expected given the reduced RRP, indicating that an enhancement of Ca2+-dependent fusion is occurring at the same time by unknown mechanisms, as the authors properly point out. Analogous experiments, in which the SNARE domain of syntaxin1 is inserted into syntaxin2, reveal the existence of differential regulatory elements outside the SNARE domain.

      (3) Different constructs of syntaxin 1 and syntaxin 2 display different expression levels. In addition, the expression levels of Munc-18 vary with the specific syntaxin construct transfected. In any case, the electrophysiological phenotypes cannot be consistently explained by changes in Munc-18.

      (4) Mutations in several residues of the outer surface of the C-terminal half of the syntaxin1 SNARE domain lead to alterations in the RRP and the frequency of spontaneous release, but the changes cannot be attributed to a change in the net surface charge, because the alterations occur even with paired mutations in which charge neutrality is conserved.

      Comments:

      (1) This is a comment regarding the interpretation of the results. In general, the decrease in RRP size is associated with an increased frequency of spontaneous release due to unclamping. The authors claim that the two phenomena seem to be independent of each other. In any case, how can the authors rule out the possibility that the unclamping of spontaneous release leads to a decrease in RRP size?

      The main argument against the reduction of the RRP being caused by the observed increase in mEPSC frequency is based on the kinetics of refilling and depletion. The average time before a primed vesicle fuses spontaneously is 500 to 1000 seconds (spontaneous vesicle release rate for STX1 in Figures 1, 2 and 3). The time it takes to refill the RRP after depletion is on the order of 3 seconds (Rosenmund and Stevens, 1996). Therefore, refilling of the RRP is more than 100 times faster than its spontaneous depletion. Even a 5-fold increase in spontaneous release would deplete less than 5% of the RRP at steady state.
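      The steady-state argument above can be made explicit with a minimal sketch. It assumes a simple two-state model in which each RRP slot is either occupied or empty, with first-order spontaneous fusion and first-order refilling; this model and the function name are illustrative assumptions, and only the time constants (500 to 1000 s for spontaneous fusion, about 3 s for refilling) come from the response.

```python
def steady_state_depletion(tau_spont_s, tau_refill_s):
    """Fraction of RRP slots empty at steady state in a two-state model.

    Occupied slots empty by spontaneous fusion at rate k_s = 1 / tau_spont_s;
    empty slots refill at rate k_r = 1 / tau_refill_s. Setting the net flux
    to zero gives an empty fraction of k_s / (k_s + k_r).
    """
    k_s = 1.0 / tau_spont_s
    k_r = 1.0 / tau_refill_s
    return k_s / (k_s + k_r)


TAU_REFILL_S = 3.0  # RRP refilling time constant quoted in the response

for tau_spont in (500.0, 1000.0):   # baseline mean time to spontaneous fusion
    for fold in (1, 5):              # baseline vs. 5-fold more spontaneous release
        empty = steady_state_depletion(tau_spont / fold, TAU_REFILL_S)
        print(f"tau_spont = {tau_spont:.0f} s, {fold}x: {100 * empty:.2f}% of RRP empty")
```

      With the shortest baseline (500 s) and a 5-fold increase in spontaneous release, about 2.9% of the RRP is empty at steady state under this model, in line with the estimate of less than 5%.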

      (2) The authors have analyzed the kinetics of mEPSCs and found differences (Fig2-Supp. Fig1; Fig2-Supp. Fig1). It would be interesting and pertinent to discuss these data in the context of potential phenotypes in fusion pore kinetics involving syntaxin1 and syntaxin2 and their SNARE domains. The figure would also be improved by including averaged mEPSC traces.

      We thank the reviewer for the idea. Upon closer examination of the changes in mEPSC rise time and mEPSC decay time, we noticed a minor slowing of the mEPSC rise time from 0.443 ms (SEM 0.0067) for STX1A to 0.535 ms (SEM 0.0151) for STX1A-2(SNARE) or 0.507 ms (SEM 0.01251) for STX1A-2(Cter), while the mEPSC half-widths did not change significantly. It is possible that the measured change is related to the detection algorithm, as mEPSC detection at elevated frequencies becomes more difficult due to increased overlap of events; we therefore prefer to refrain from making any mechanistic claims.

      Minor comments:

      (1) Fig2 J; Fig 3 J. The different colors are difficult to distinguish, and a legend within the graph would be very helpful.

      (2) Fig3 H. Please change the color of the box plot for Stx1 A to improve the contrast with the individual data points.

      (3) Page 6. Line 225. "Figure 2D and E" should be corrected to "Figure 2C and D"

      (1) Colors were changed for clearer visualization. (2) Unfortunately, changing the color did not improve the contrast with the individual data points. However, all numerical data are included in the data sheets of the corresponding figure. (3) The mistake was corrected.

      Reviewer #2 (Recommendations For The Authors):

      Line 135-136: Are the numbers cited in the text mean and SEM? Please indicate.

      Line 139 and Figure 1G: The difference between purple and blue was very hard to see on my hard copy.

      Line 152: Reference to Figure 1L should probably be 1K.

      Line 183: Reference to Figure 2C should probably be Figure 2F.

      Line 225: Reference to Figure 2D and 2E should probably be 2C and 2D.

      Line 239: Reference to Figure 3I should probably be 3H.

      All typos were addressed and colors were changed for better visualization.

      Line 210-211: Sentence ("One of the benefits..") is hard to understand.

      Thank you for noticing this mistake. We agree that the sentence did not add any important or new information, so it was deleted. Additionally, the message of the sentence was already clearly stated in lines 209-211.

      Figure 4E-H lacks data for STX2, which would be needed for the figure to be arranged like Figure 5.

      Given that STX1 is the endogenous syntaxin in hippocampal neurons, we use it as a control for all the analyses done in the STX2 and STX2-chimera experimental groups; thus, it is included in Figures 3 and 5.

      It appears that the authors do not present or discuss the Western Blot in Fig. 4D. Are the quantitative results of the Western Blot consistent with or different from the quantification of the immunostainings (Fig. 4B-C)? A similar question for Figure 5D, which also seems not to be presented.

      In terms of quantification, we have relied mainly on the ICC experiments because they also test for putative impairments in transport to the presynaptic compartment. Our WB data are overall consistent with these results but were not used to quantitate expression of our syntaxin chimeras and mutations in the STX1-null hippocampal neuron model.

      Figure 6F-G: The normalization of spontaneous vesicular release rates is not clear, because the vesicular release rates already contain a normalization (mEPSC rate divided by RRP size). Is a further normalization of the STX1A condition informative? The authors should consider presenting the release rates themselves. In any case, the normalization should be presented/explained, at least in the legends.

      The reviewer is in principle correct. Due to the large number of experimental groups, we had to perform recordings from multiple cultures in which not all experimental groups were present, while WT STX1 was present as a consistent control. To reduce culture-to-culture variability, we performed an additional normalization to the WT control group. However, we also included the raw numerical values in the data-source sheets (normalized and absolute), which produce a similar overall outcome.
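      The per-culture normalization described above can be sketched as follows. This is a minimal illustration, assuming that each value is divided by the mean of the STX1A control recorded in the same culture; the function name, record layout, and numbers are hypothetical and are not taken from the data-source sheets.

```python
from collections import defaultdict
from statistics import mean


def normalize_to_control(records, control="STX1A"):
    """Express each value relative to the control mean of the same culture.

    records: list of (culture_id, group, value) tuples. Every culture is
    assumed to contain the control group; a culture without it raises KeyError.
    Returns a list of (culture_id, group, normalized_value) tuples.
    """
    control_values = defaultdict(list)
    for culture, group, value in records:
        if group == control:
            control_values[culture].append(value)
    control_mean = {c: mean(vals) for c, vals in control_values.items()}
    return [(culture, group, value / control_mean[culture])
            for culture, group, value in records]


# Hypothetical spontaneous release rates (arbitrary units) from two cultures.
records = [
    ("culture1", "STX1A", 2.0),
    ("culture1", "STX1A", 4.0),
    ("culture1", "STX1A-2(Cter)", 9.0),
    ("culture2", "STX1A", 1.0),
    ("culture2", "STX1A-2(SNARE)", 6.0),
]
for row in normalize_to_control(records):
    print(row)
```

      Normalizing within each culture removes a shared culture-level scale factor, which is why the absolute and normalized values can still lead to a similar overall outcome.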

      References to Figure 7 subpanels (A, B, and C) are missing.

      Thank you for the comment. We have integrated all panels into one for better representation and understanding since they are representative of one another.

      Lines 330-339 and Figure 7 in Discussion: the authors discuss that adding the non-cognate STX2 SNARE-domain to syntaxin-1 might destabilize the primed state and decrease the fusion energy barrier (as indicated in Figure 7C). What is the evidence that the decrease in RRP size is not caused solely by the depletion of the pool due to the increased spontaneous fusion?

      Please see the comments to major point 2 of reviewer 1.

      Statistics: Missing is the number of observations (n) for all data. Even if all data points are displayed, this should be stated.

      N numbers are included in the data sheets attached to each figure.

      The statement (start of Discussion,) that the SNARE-domain of STX1 'plays a minimal role in the regulation for Ca2+-evoked release' is somewhat puzzling, since without the SNARE-domain in STX1 there would be no Ca2+-evoked release. I guess these statements (similar statements are found elsewhere) are due to the interesting finding that STX2 leads to a decrease in release kinetics, compared to STX1, and this is not (entirely) due to differences in the SNARE-domain. I would suggest rephrasing the finding in terms of release kinetics. Also, the statement in the last sentence of the Abstract is not clear.

      Thank you for pointing this out. We agree that our experiments showed a strong impact of the syntaxin isoform exchange on release kinetics and overall release output. A similar comment also came from reviewer #3, so we have addressed both comments together.

      Our confusing statement resulted from the order of the presented results and our summarizing remarks for each section. Our statement reflected our finding that mutating residues in the C-terminal part of the STX1 SNARE motif affected only spontaneous release and RRP size but not release efficacy. We now state (pg. 6, lines 231-233) that “the results obtained from the Ca2+-evoked release between STX1 and STX2 support major regulatory differences of the domains outside of the SNARE domain between isoforms”.

      We have changed the abstract pg. 2 lines 55-56

      We have changed the introduction pg. 3 lines 102-105 for a better contextualization.

      We have changed the start of the discussion pg. 9 lines 250-252 for better contextualization.

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, Salazar-Lázaro et al. presented interesting data showing that the C-terminal half of the Syx1 SNARE domain is responsible for clamping of spontaneous release, stabilizing the RRP, and also Ca2+-evoked release. The authors routinely utilized the chimeric approach to replace the SNARE domain of Syx1 with its paralogue Syx2 and analyzed the neuronal activity through electrophysiology. The data are straightforward and fruitful. The conclusions are partly reasonable. One obvious drawback is that they did not explore the underlying mechanism. I think it would be easy for the authors to carry out some simple assays to verify their hypothesis for the mechanism, instead of only addressing it in the discussion section. Overall, I appreciate the data presented in the manuscript. If the authors could supply more data on the mechanisms, this would be important research in the field. Some critical comments are listed below:

      We thank the reviewer for his/her comments and suggestions.

      Major comments:

      (1) On pg.3, lines 102-104, the authors stated that 'We found that the C-terminal half of the SNARE domain of STX1.. ..while it is minimally involved in the regulation of Ca2+-evoked release.' But on pg.5, lines 174-176, they wrote that 'Replacement of the full-SNARE domain (STX1A-2(SNARE)) or the C-terminal half (STX1A-2(Cter)) of the SNARE domain of STX1A with the same domain from STX2 resulted in a reduction in the EPSC amplitude (Figure 2B).' and on pg.5-6, lines 197-199, they wrote that 'Taken together our results suggest that the C-terminal half of the SNARE domain of STX1A is involved in the regulation of the efficacy of Ca2+-evoked release, the formation of the RRP and in the clamping of spontaneous release.' It is unclear to me what the authors are really trying to express about the relationship between the C-terminal half of the SNARE complex and Ca2+-evoked release (i.e., is it minimally involved, or does it significantly participate in the process?). Please clarify and reorganize the text.

      Please see our reply to the last comment of reviewer 2.

      (2) Figure 1-figure supplement 1, the authors should analyze Syx1/VGlut1 level additionally. And, if possible, compare the difference between Syx1/VGlut1 and Syx2/VGlut1.

      The levels of STX1/VGlut1 and STX2/VGlut1 were analyzed in detail in Figures 4 and 5.

      A direct comparison between the expression levels of these two proteins is not possible, since the antibodies have different affinities for their target proteins, which can introduce biases. While this could be overcome by adding a FLAG-tag to the syntaxin proteins, we have not utilized this approach in this publication. In addition, we inferred sufficient and comparable expression of both syntaxins from their ability to rescue some of the syntaxin1 loss-of-function phenotypes.

      (3) Figure 2D only analyzed the EPSC half-width; could the authors also analyze the rise/decay time? Also, does Figure 3-figure supplement 1 refer to the kinetic parameters of Syx2-1A in Figure 3? This is very confusing.

      We have changed the text accordingly and each parameter is referenced to its corresponding figure for clarity. As for the decay and rise time of STX1 and STX1-chimeras, they are in Figure 2-figure supplement 1A and B.

      (4) On pg.4, lines 151-152, 'Finally, no change was observed in the paired-pulse ratio (PPR) between STX1A and STX2 groups (Figure 1L).' does not contain any explanations and comments for this observation in the texts.

      The small EPSC amplitudes and altered kinetics of the STX2 constructs (Figure 1 and Figure 3) made it more difficult to quantitate paired-pulse experiments. Therefore, we preferred not to overinterpret these measurements. The finding that the paired-pulse data were not significantly different fits with the vesicular release probability measurements, which showed no major changes. We have made our statement on this basis.

      (5) On pg.6, lines 235-236, the authors wrote that 'Additionally, we found that only STX2-1A(SNARE) and STX2-1A(Cter) could rescue the RRP to around double of what we measured from STX2 and STX2-1A(Nter) (figure 3F)'. However, in Figure 3F, the authors indicated 'n.s.' (p>0.05) for the differences between STX2 and STX2-1A(SNARE)/STX2-1A(Cter). It is perplexing how the authors interpret their data. Certainly, the p-value should not be used arbitrarily as a criterion of difference. A simpler approach is to indicate the exact p-value for each comparison (in the figure legends or in tables).

      We apologize for any confusion and hope the modification brings more clarity to our interpretation. The calculated p-values are included in the attached data source tables, and we hope this will clarify our comparative analysis. We have changed the text on pg. 7, lines 238-241; we are cautious not to overinterpret these results and rely more on the data observed in the STX1A chimeras, which show significant changes in the RRP.

      (6) I noticed that the authors preferred using 'xx% increase/decrease' or 'xx-fold increase/decrease' to interpret their inter-group data. I doubt whether these interpretations are appropriate. First, it seems that most of the individual data sets were not Gaussian-distributed; the authors also used non-parametric tests to compare the differences. Second, the authors did not explicitly indicate the method used to calculate the % or fold change, e.g., by comparing mean or median values. I think it is a bad choice to use the median to calculate fold changes; meanwhile, the mean value would also be biased, given that the data were not Gaussian-distributed. The authors should be cautious in interpreting their data.

      We thank the reviewer for pointing out the inaccuracy of our descriptions and have stated the parameter used to calculate the percentage and fold increase/decrease, namely the mean, in the materials and methods section. Our intention is to plainly state the amount of change in a parameter based on the observed change in its mean value. We agree with the reviewer that interpreting these values could be problematic if we speculate on possible mechanisms. Further tests should be conducted to determine whether similar increases/decreases in a parameter are due to disturbance of the same mechanism or of different ones. For example, we discussed whether the regulation of SYT1 might or might not be the mechanism affected in some of the chimeras that show an increase in the spontaneous release rate, since the release rate observed in some of them is massively higher than that seen in SYT1-KO (Bouazza-Arostegui et al., 2022). It is tempting to speculate, based on the differences in the changes, that other mechanisms are responsible. For this reason, we have given an array of possible mechanisms affected when we manipulate the SNARE domain of STX1.
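      To illustrate the reviewer's concern, here is a minimal sketch of how a percent or fold change can be computed from group means (the parameter named in the response) and how the result shifts if the median is used instead. The function names and the right-skewed data are invented for illustration only.

```python
from statistics import mean, median


def percent_change(control, treatment, center=mean):
    """Percent change of `treatment` relative to `control`, using `center` (mean or median)."""
    ref = center(control)
    return 100.0 * (center(treatment) - ref) / ref


def fold_change(control, treatment, center=mean):
    """Ratio of the treatment's central value to the control's."""
    return center(treatment) / center(control)


# Hypothetical right-skewed data (arbitrary units), invented for illustration.
control = [1.0, 1.0, 2.0, 2.0, 4.0]
treatment = [2.0, 2.0, 3.0, 3.0, 15.0]

print(percent_change(control, treatment))                 # based on means
print(percent_change(control, treatment, center=median))  # based on medians
print(fold_change(control, treatment), fold_change(control, treatment, median))
```

      For these skewed values the mean-based change is 150% (2.5-fold) while the median-based change is 50% (1.5-fold), which is why stating the summary statistic used matters when the data are not Gaussian-distributed.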

      (7) The authors routinely analyzed the levels of Munc18-1 in neuronal lysates by WB and Munc18-1/VGlut1 by immunofluorescence in various Syx1 mutants. However, in my view, these assays were somewhat indirect. It is evident from the atomic structures (pdb entries: 3C98 and 7UDB) that the SNARE domain of Syx1 participates in binding to Munc18-1. Meanwhile, Han et al. reported that the K46E mutation (located in domain 1 of Munc18-1) strongly impairs Syx1 expression, Syx1 interaction, vesicle docking and secretion (Han et al., 2011, PMID: 21900502). Intriguingly, residue K46 of Munc18-1, which is close to D231/R232 of Syx1, may make electrostatic contacts with D231 and R232 of Syx1. This raises the possibility that Syx1 D231/R232 and some Syx1-2 chimeras lost their normal function through defective binding to Munc18-1. To better understand the underlying mechanism, the authors may need to carry out in vivo and/or in vitro binding analyses between the syntaxin mutants/chimeras and Munc18-1. They should also discuss this issue further.

      We are grateful for the identification of a previously overlooked aspect of our investigation of the interplay between Munc18-1 and STX1. In response, we have added further discussion of this matter on pg. 11, lines 419-431.

      Additionally, we appreciate the thoughtful suggestion regarding additional experiments to further explore the molecular relationship between Munc18-1 and STX1. We agree that co-immunoprecipitation experiments (using an antibody against either Munc18-1 or STX1 and STX2) would offer greater insight into whether the binding of these proteins is affected in the isoform or the mutants. Notably, we performed immunoprecipitation experiments using neuronal lysates of the corresponding groups and STX1A and STX2 antibodies for the pull-downs. However, we were unable to co-IP Munc18-1 in these experiments. Changing the experimental conditions did not yield better results, so these experiments remain inconclusive for the moment. For this reason, we included this as an open question and a potential concluding hypothesis of the molecular mechanism. However, Shi et al., 2021, performed co-IP assays in HeLa cells using Munc18-1-wt and a Munc18-1 mutant that affects binding to the C-terminal half of the SNARE domain of STX, together with STX1-wt and STX mutants targeting some of our residues of interest, and showed a decrease in the pulled-down levels of Munc18-1. We have made sure to mention the conclusion of this important publication in our discussion.

      (8) The third possible mechanism (i.e., interaction with Syt1) proposed by the authors seems more reasonable. However, the authors' discussion of it is insufficient. For instance, plenty of literature has indicated that Syt1 may participate in synaptic vesicle priming by stabilizing partially or fully assembled SNARE complexes (Li et al., 2017, PMID: 28860966; Bacaj et al., 2015, PMID: 26437117; Mohrmann et al., 2013, PMID: 24005294; Wang et al., 2011; PMID: 22184197; Liu et al., 2009, PMID: 19515907); complexins are also SNARE-binding modules that regulate synaptic exocytosis. Lack of complexins could lead to unclamping of spontaneous fusion of synaptic vesicles, although it severely impairs Ca2+-triggered release at the same time (Maximov et al., 2009, PMID: 19164751). Meanwhile, different domains of complexin may accomplish different steps of SV fusion; earlier research indicated that the C-terminal sequence of complexin is selectively required for clamping of spontaneous fusion and priming but not for Ca2+-triggered release (Kaeser-Woo et al., 2012, PMID: 22357870). Likewise, if possible, the authors may need to carry out in vivo and/or in vitro binding analyses to confirm their hypothesis.

      The exploration of complexin's involvement was limited in our study primarily because our methodological focus was on the molecular mechanisms underlying the sequence differences between STX1 and STX2. Our laboratory has studied the role of complexin extensively, and we certainly considered a possible involvement. However, since the sites identified on syntaxin are either conserved between STX1 and STX2 or not close to the central or accessory helical domains of complexin, we did not perform experiments to test putative interactions, and we refrained from discussing complexin in this paper.

      (9) Lastly, from a point of view different from the hypothesis raised by the authors, I wonder whether the defects of the Syx2 and Syx1 chimeras were caused by the SNARE complex itself. Changing the outward residues (i.e., the solvent-accessible residues) of the SNARE complex may affect its stability, assembly kinetics, and energetics (Wang and Ma, 2022, PMID: 35810329; Zorman et al., 2014, PMID: 25180101), especially for the C-terminal halves. Is this another possible mechanism through which the C-terminus of Syx1 might contribute to SV priming and clamping of spontaneous release? The authors should at least discuss this point.

      Thank you for this suggestion. We indeed assumed that, since the hydrophobic layers of the SNARE domains that form the hydrophobic pocket of STX2 and STX1 are mainly conserved, the intrinsic stability of the SNARE complex is largely unchanged. Additionally, Li et al. (2022) PMID: 35810329 examined the stability of the alpha-helix structure of the SNARE domain of SNAP25. While they found no changes in the stability and formation of the alpha-helix when mutating outward-facing residues for methodological purposes (bimane-tryptophan quenching), their study did not selectively explore the effect of mutations of outer-surface residues on the stability of the alpha-helix.

      Zorman et al. (2014) PMID: 25180101, as noted by the reviewer, observed that changes in the sequence of the SNARE domain (using SNARE proteins from different trafficking systems: neuron, GLUT4, yeast…) correlated with changes in step-wise SNARE complex assembly. However, they also did not selectively mutate the outer solvent-accessible residues, which precludes firm conclusions about the contribution of these residues to the kinetics and energetics of assembly and the intrinsic stability of the SNARE complex.

      At the reviewer's request, we have added this paragraph to discuss an additional mechanism:

      “As a final remark, it is possible that the changes in the spontaneous release rate and the priming stability stem from a reduced stability of the SNARE complex itself through putative interactions between outer surface residues. Studies of SNARE complex assembly kinetics in which solvent-accessible residues in the C-terminal half of the SNARE domain of SYB2 were mutated have shown reduced stability of SNARE complex assembly, correlated with impaired fusion (Jiao et al., 2018). However, STX1 mutations of outward residues were inconclusive and were always accompanied by hydrophobic layer mutations (Jiao et al., 2018), which affect the assembly kinetics and energetics of the SNARE complex (Ma et al., 2015). Single-molecule optical-tweezer studies have focused on the impact of regulatory molecules such as Munc18-1 (Ma et al., 2015; Jiao et al., 2018) and complexin (Hao et al., 2023) on the stability of assembly, or on the intrinsic stability of the hydrophobic layers in the step-wise assembly of the SNARE complex (Gao et al., 2012; Ma et al., 2015; Zhang et al., 2017). Although the conserved hydrophobic layers in the SNARE domains of STX1A and STX2 (Figure 1) suggest unchanged zippering and intrinsic stability of the complex, further studies addressing the contribution of surface residues to the stability of the alpha-helix structure of the SNARE domain of STX1 (Li et al., 2022) or to the stability of the SNARE complex should be conducted.”

      Minor comments:

      (1) In pg.6, line 236, 'figure 3F', the initial 'f' should be uppercased.

      (3) On pg.11, line 396, the section title 'The interaction of the C-terminus of de SNARE domain of STX1A with Munc18-1 in the stabilization of the primed pool of vesicles.' The word 'de' is confusing, please check.

      (4) In pg.12, line 446, the section title, should 'though' be 'through'?

      These comments have been acknowledged and the text changed accordingly. Thank you.

      (2) In pg.7, line 239, '..had an increased PVR (Figure 3G), no change in the release rate (Figure 3I)', should Figure 3I be Figure 3H? and line 240, 'and an increase in short-term depression during 10Hz train stimulation (Figure 3I)', should Figure 3I be Figure 3J? If so, Figure 3I will not be cited in the texts and lack adequate interpretations. Please check.

      We apologize for the oversight in not referencing this specific subpanel of the figure and have incorporated the reference in the text. Additionally, our interpretation of these data is connected to the mechanisms that govern the efficacy of the Ca2+-evoked response and its dependence on the integrity of the entire SNARE domain. We wish to highlight the modifications made to the discussion on the regulation of the Ca2+-evoked response based on the reviewer's previous comment #1 and a similar comment from reviewer #2 (as stated previously).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Visual Perceptual Learning (VPL) results in varying degrees of generalization to tasks or stimuli not seen during training. The question of which stimulus or task features predict whether learning will transfer to a different perceptual task has long been central in the field of perceptual learning, with numerous theories proposed to address it. This paper introduces a novel framework for understanding generalization in VPL, focusing on the form invariants of the training stimulus. Contrary to a previously proposed theory that task difficulty predicts the extent of generalization - suggesting that more challenging tasks yield less transfer to other tasks or stimuli - this paper offers an alternative perspective. It introduces the concept of task invariants and investigates how the structural stability of these invariants affects VPL and its generalization. The study finds that tasks with high-stability invariants are learned more quickly. However, training with low-stability invariants leads to greater generalization to tasks with higher stability, but not the reverse. This indicates that, at least based on the experiments in this paper, an easier training task results in less generalization, challenging previous theories that focus on task difficulty (or precision). Instead, this paper posits that the structural stability of stimulus or task invariants is the key factor in explaining VPL generalization across different tasks.

      Strengths:

      • The paper effectively demonstrates that the difficulty of a perceptual task does not necessarily correlate with its learning generalization to other tasks, challenging previous theories in the field of Visual Perceptual Learning. Instead, it proposes a significant and novel approach, suggesting that the form invariants of training stimuli are more reliable predictors of learning generalization. The results consistently bolster this theory, underlining the role of invariant stability in forecasting the extent of VPL generalization across different tasks.

      • The experiments conducted in the study are thoughtfully designed and provide robust support for the central claim about the significance of form invariants in VPL generalization.

      Weaknesses:

      • The paper assumes a considerable familiarity with the Erlangen program and the definitions of invariants and their structural stability, potentially alienating readers who are not versed in these concepts. This assumption may hinder the understanding of the paper's theoretical rationale and the selection of stimuli for the experiments, particularly for those unfamiliar with the Erlangen program's application in psychophysics. A brief introduction to these key concepts would greatly enhance the paper's accessibility. The justification for the chosen stimuli and the design of the three experiments could be more thoroughly articulated.

      Response: We appreciate the reviewer's feedback regarding the accessibility of our paper. In response, we plan to enhance the introduction section of our paper to provide a concise yet comprehensive overview of the key concepts of the Erlangen program. Additionally, we will provide a more thorough justification for the selection of stimuli and the experimental design in our revised version, ensuring that readers understand the rationale behind our choices.

      • The paper does not clearly articulate how its proposed theory can be integrated with existing observations in the field of VPL. While it acknowledges previous theories on VPL generalization, the paper falls short in explaining how its framework might apply to classical tasks and stimuli that have been widely used in the VPL literature, such as orientation or motion discrimination with Gabors, vernier acuity, etc. It also does not provide insight into the application of this framework to more naturalistic tasks or stimuli. If the stability of invariants is a key factor in predicting a task's generalization potential, the paper should elucidate how to define the stability of new stimuli or tasks. This issue ties back to the earlier mentioned weakness: namely, the absence of a clear explanation of the Erlangen program and its relevant concepts.

      Response: Thank you for highlighting the need for better integration of our proposed theory with existing observations in the field of VPL. Unfortunately, the theoretical framework proposed in our study is based on Klein’s Erlangen program and applies only to geometric shape stimuli. For VPL studies using stimuli and paradigms that are unrelated to geometric transformations (such as motion discrimination with Gabors or random dots, vernier acuity, spatial frequency discrimination, and contrast detection or discrimination), our proposed theory does not apply. Some stimuli employed in VPL studies can be classified according to the geometric invariants they involve. For instance, orientation discrimination with Gabors (Dosher & Lu, 2005) and the texture discrimination task (F. Wang et al., 2016) both involve Euclidean invariants, and circle versus square discrimination (Kraft et al., 2010) involves affine invariants. However, none of these studies simultaneously involves multiple geometric invariants of varying stability, so they cannot be directly compared with our research. It is worth noting that although Klein’s hierarchy of geometries, on which our study focuses, is rarely mentioned in the field of VPL, it does have connections with dichotomies such as ‘global/local’, ‘coarse/fine’, ‘easy/difficult’, and ‘complex/simple’: more stable invariants are closer to ‘global’, ‘coarse’, ‘easy’, and ‘complex’, while less stable invariants are closer to ‘local’, ‘fine’, ‘difficult’, and ‘simple’. Importantly, several VPL studies have found ‘fine-to-coarse’ or ‘local-to-global’ asymmetric transfer (Chang et al., 2014; N. Chen et al., 2016; Dosher & Lu, 2005), which appears consistent with the results of our study.

      In the introduction section of our revised version and in the subsequent full author response, we will provide a clear explanation of the Erlangen program and explain how to define the stability of new stimuli or tasks. In the discussion section of our revised version, we will compare our results with other studies of the generalization of perceptual learning and speculate on how our proposed theory fits with existing observations in the field of VPL.
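      To illustrate the nested structure of Klein’s hierarchy for readers unfamiliar with it, the following minimal sketch (for illustration only; it does not reproduce our experimental stimuli) checks which of three example properties survive one example transformation from each group: collinearity (a projective invariant), parallelism (an affine invariant), and angle size (a Euclidean invariant). The specific matrices and test figures are arbitrary, hypothetical choices.

```python
import math

def apply(H, p):
    """Map a 2-D point through a 3x3 matrix H using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def cross(a, b):
    return a[0] * b[1] - a[1] * b[0]

def collinear(p, q, r, tol=1e-9):
    """Three points are collinear if the cross product of q-p and r-p vanishes."""
    return abs(cross(sub(q, p), sub(r, p))) < tol

def parallel(seg1, seg2, tol=1e-9):
    """Two segments are parallel if their direction vectors have zero cross product."""
    return abs(cross(sub(seg1[1], seg1[0]), sub(seg2[1], seg2[0]))) < tol

def angle(p, vertex, q):
    """Signed angle (radians) at `vertex` from ray vertex->p to ray vertex->q."""
    a, b = sub(p, vertex), sub(q, vertex)
    return math.atan2(cross(a, b), a[0] * b[0] + a[1] * b[1])

# Hypothetical example transformations, one per group (parameters are arbitrary).
T = 0.7
EUCLIDEAN = [[math.cos(T), -math.sin(T), 2.0],   # rotation + translation
             [math.sin(T),  math.cos(T), -1.0],
             [0.0, 0.0, 1.0]]
AFFINE = [[1.0, 0.8, 0.5],                       # shear + scaling + translation
          [0.0, 1.5, 0.0],
          [0.0, 0.0, 1.0]]
PROJECTIVE = [[1.0, 0.2, 0.0],                   # non-zero bottom row: perspective map
              [0.1, 1.0, 0.0],
              [0.3, 0.2, 1.0]]

# Test figures: three points on a line, two parallel segments, a right angle.
LINE = [(0, 0), (1, 1), (2, 2)]
SEGMENTS = [((0, 0), (1, 0)), ((0, 1), (1, 1))]
CORNER = ((1, 0), (0, 0), (0, 1))

def invariants_preserved(H):
    """Return (collinearity kept, parallelism kept, angle kept) after applying H."""
    line = [apply(H, p) for p in LINE]
    segs = [tuple(apply(H, p) for p in seg) for seg in SEGMENTS]
    corner = [apply(H, p) for p in CORNER]
    return (collinear(*line),
            parallel(*segs),
            abs(angle(*corner) - angle(*CORNER)) < 1e-9)

for name, H in [("Euclidean", EUCLIDEAN), ("affine", AFFINE), ("projective", PROJECTIVE)]:
    print(name, invariants_preserved(H))
```

      Running it prints (True, True, True) for the Euclidean map, (True, True, False) for the affine map, and (True, False, False) for the projective map: the more general the transformation group, the fewer properties it preserves, and the properties that it does preserve are the more structurally stable ones in the sense used here.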

      • The paper does not convincingly establish the necessity of its introduced concept of invariant stability for interpreting the presented data. For instance, consider an alternative explanation: performing the collinearity task requires orientation invariance. Therefore, it's straightforward that learning the collinearity task doesn't aid in performing the other two tasks (parallelism and orientation), which do require orientation estimation. Interestingly, orientation invariance is more characteristic of higher visual areas, which, consistent with the Reverse Hierarchy Theory, are engaged more rapidly in learning compared to lower visual areas. This simpler explanation, grounded in established concepts of VPL and the tuning properties of neurons across the visual cortex, can account for the observed effects, at least in one scenario. This approach has previously been used/proposed to explain VPL generalization, as seen in (Chowdhury and DeAngelis, Neuron, 2008), (Liu and Pack, Neuron, 2017), and (Bakhtiari et al., JoV, 2020). The question then is: how does the concept of invariant stability provide additional insights beyond this simpler explanation?

      Response: We appreciate the alternative explanation proposed by the reviewer and agree that it presents a valid perspective grounded in established concepts of VPL and neural tuning properties. However, both the collinearity and the parallelism tasks require orientation invariance. Orientation invariance, as proposed by the reviewer, can therefore explain the lack of transfer from collinearity or parallelism to the orientation task, but it cannot explain why learning of collinearity does not transfer to parallelism.

      As stated in our response to the previous comment, in the revised discussion section we will compare our study with other studies (including the three papers mentioned by the reviewer), aiming to clarify why the concept of invariant stability is necessary for interpreting the observed data and for understanding the mechanisms underlying VPL generalization.

      • While the paper discusses the transfer of learning between tasks with varying levels of invariant stability, the mechanism of this transfer within each invariant condition remains unclear. A more detailed analysis would involve keeping the invariant's stability constant while altering a feature of the stimulus in the test condition. For example, in the VPL literature, one of the primary methods for testing generalization is examining transfer to a new stimulus location. The paper does not address the expected outcomes of location transfer in relation to the stability of the invariant. Moreover, in the affine and Euclidean conditions one could maintain consistent orientations for the distractors and targets during training, then switch them in the testing phase to assess transfer within the same level of invariant structural stability.

      Response: Thank you for raising the issue of the mechanism of transfer within each invariant condition. We plan to design an additional experiment, similar in paradigm to Experiment 2, to examine how VPL generalizes to a new test location within a single level of invariant stability.

      • In the section detailing the modeling experiment using deep neural networks (DNN), the takeaway was unclear. While it was interesting to observe that the DNN exhibited a generalization pattern across conditions similar to that seen in the human experiments, the claim made in the abstract and introduction that the model provides a 'mechanistic' explanation for the phenomenon seems overstated. The pattern of weight changes across layers, as depicted in Figure 7, does not conclusively explain the observed variability in generalizations. Furthermore, the substantial weight change observed in the first two layers during the orientation discrimination task is somewhat counterintuitive. Given that neurons in early layers typically have smaller receptive fields and narrower tunings, one would expect this to result in less transfer, not more.

      Response: We appreciate the reviewer's feedback regarding the clarity of our DNN modeling experiment. We acknowledge that, although DNNs have been shown to serve as models of the visual system as well as of VPL, the claim that our model provides a ‘mechanistic’ explanation for the phenomenon is overstated. In our revised version, we will attempt a more detailed analysis of the DNN model and provide a more explicit explanation of the findings from the DNN modeling experiment, emphasizing their implications for understanding the observed variability in generalization.

      Additionally, the substantial weight change observed in the first two layers during the orientation discrimination task does not contradict our proposed theoretical framework; instead, it aligns with our speculation regarding the neural mechanisms of VPL for geometric invariants. Specifically, it suggests that learning of invariants with lower stability relies more on the plasticity of lower-level brain areas and would therefore generalize more poorly to new locations or stimulus features within the same invariant condition. However, this does not imply that such learning effects cannot transfer to invariants with higher stability.

      Reviewer #2 (Public Review):

      The strengths of this paper are clear: The authors are asking a novel question about geometric representation that would be relevant to a broad audience. Their question has a clear grounding in pre-existing mathematical concepts, that, to my knowledge, have been only minimally explored in cognitive science. Moreover, the data themselves are quite striking, such that my only concern would be that the data seem almost too clean. It is hard to know what to make of that, however. From one perspective, this is even more reason the results should be publicly available. Yet I am of the (perhaps unorthodox) opinion that reviewers should voice these gut reactions, even if it does not influence the evaluation otherwise. Below I offer some more concrete comments:

      (1) The justification for the designs is not well explained. The authors simply tell the audience in a single sentence that they test projective, affine, and Euclidean geometry. But despite my familiarity with these terms -- familiarity that many readers may not have -- I still had to pause for a very long time to make sense of how these considerations led to the stimuli that were created. I think the authors must, for a point that is so central to the paper, thoroughly explain exactly why the stimuli were designed the way that they were and how these designs map onto the theoretical constructs being tested.

      (2) I wondered if the design in Experiment 1 was flawed in one small but critical way. The goal of the parallelism stimuli, I gathered, was to have a set of items that is not parallel to the other set of items. But in doing that, isn't the manipulation effectively the same as the manipulation in the orientation stimuli? Both functionally involve just rotating one set by a fixed amount. (Note: This does not seem to be a problem in Experiment 2, in which the conditions are more clearly delineated.)

      (3) I wondered if the results would hold up for stimuli that were more diverse. It seems that a determined experimenter could easily design an "adversarial" version of these experiments for which the results would be unlikely to replicate. For instance: In the orientation group in Experiment 1, what if the odd-one-out was rotated 90 degrees instead of 180 degrees? Intuitively, it seems like this trial type would now be much easier, and the pattern observed here would not hold up. If it did hold up, that would provide stronger support for the authors' theory.

      It is not enough, in my opinion, to simply have some confirmatory evidence of this theory. One would have to have thoroughly tested many possible ways that theory could fail. I'm unsure that enough has been done here to convince me that these ideas would hold up across a more diverse set of stimuli.

      Response: (1) We appreciate the reviewer’s feedback regarding the justification for our experimental designs. We recognize the importance of thoroughly explaining how our stimuli were designed and how these designs correspond to the theoretical constructs being tested. In our revised version, we will expand the introduction of the Erlangen program and provide a more detailed explanation of the rationale behind our stimulus designs, to improve the clarity and transparency of our experimental approach for readers who may not be familiar with these concepts.

      (2) We appreciate the reviewer’s insight into the design of Experiment 1 and the concern regarding the potential similarity between the parallelism and orientation stimuli manipulations.

      The parallelism and orientation stimuli in Experiment 1 were first used by Olson & Attneave (1970) to support line-based models of shape coding and were later adapted to measure the relative salience of different geometric properties (Chen, 1986). In the parallelism stimuli, the odd quadrant differs from the rest in line slope, whereas in the orientation stimuli the odd quadrant contains exactly the same line segments as the rest but differs in the direction in which the angles point. The finding that the odd quadrant was detected much faster in the parallelism stimuli than in the orientation stimuli can serve as evidence for line-based models of shape coding. However, according to Chen (1986, 2005), the idea of invariants over transformations suggests a different analysis of the data: in the parallelism stimuli, line segments that share the same slope are, in essence, parallel, so the discrimination may actually be based on parallelism. Thus, the faster discrimination of the parallelism stimuli than of the orientation stimuli may be explained by the relative superiority of parallelism over the orientation of angles, which is a Euclidean property.

      This set of stimuli from Experiment 1 has been employed by several studies to investigate questions related to Klein’s hierarchy of geometries (L. Chen, 2005; Meng et al., 2019; B. Wang et al., n.d.). For consistency with this earlier work, we adopted this set of stimuli and the corresponding paradigm, despite their imperfect design.

      (3) Thank you for raising the important issue of stimulus diversity and the potential for "adversarial" versions of the experiments to challenge our findings. We acknowledge the validity of this concern and recognize the need to demonstrate the robustness of our results across a range of stimuli. We plan to design additional experiments to investigate how varying stimulus characteristics, such as the different rotation angles proposed by the reviewer, affect the observed patterns of performance.

    1. Author Response

      We would like to thank the editors and reviewers who took their valuable time to evaluate the manuscript from various perspectives. We are delighted that our technique was found appealing to biologists and imaging technologists. However, we received several comments noting that the principles and effectiveness of our techniques were often described vaguely and were difficult to understand. The reviewers also pointed out that the explanations and representations for several figures were not appropriate. We will revise the manuscript to address these issues and make it clearer and more rigorous.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Comment 1.1: “Did the UKB or HCHS datasets have information on accurate markers of insulin resistance, such as HbA1c or HOMA-IR (if fasting glucose was not available)? Looking at that data would allow us to determine the contribution of insulin resistance to the observed cortical phenotype.”

      Reply 1.1: We appreciate the insightful suggestion from the reviewer. In response, we incorporated HbA1c into our analysis, making it more sensitive to potential effects of insulin resistance. We then repeated the analysis, including HbA1c alongside non-fasting blood glucose in the PLS. This addition did not alter our main results, i.e., those of the PLS, virtual histology, and network contextualization analyses. Notably, with the inclusion of HbA1c, the second latent variable now explained more of the shared variance (22.13%), with HbA1c showing the highest loading among the MetS component variables. The manuscript has been thoroughly revised to incorporate these results.

      Comment 1.2: “(Results, p.13, ll. 291-292) "A correlation matrix relating all considered MetS component measures is displayed in supplementary figure S12." Please clarify in the figure labels whether this was non-fasting glucose. If this is non-fasting glucose, it is not a MetS-related risk factor. The reader might be misled into thinking that fasting glucose has a weak correlation, while its contribution (and the effect of insulin resistance) was not studied here.”

      “Table S8 and Table S9: Is the glucose metric here measured following fasting? If not, this should not be listed as a metabolic syndrome criterion. Or it should be specified that it isn't fasted glucose, otherwise, it sounds misleading.”

      Reply 1.2: We thank the reviewer for bringing this ambiguity to our attention. The initial analysis included only non-fasting plasma glucose in the PLS, as fasting plasma glucose data were unavailable for UKB and HCHS participants. Following your suggestion (see Reply 1.1), we have now incorporated HbA1c, a more informative marker of insulin resistance. We retained non-fasting blood glucose in our analysis, recognizing its relevance as a diagnostic variable for type 2 diabetes mellitus, although it is less informative than fasting plasma glucose, HbA1c, or HOMA-IR. This decision is supported by the significant correlation between non-fasting plasma glucose and HbA1c in our sample (r = .49).

      To enhance clarity, we have revised the methods section to explicitly mention that the study investigates non-fasting blood glucose. The revised sentence reads: “Here, we related regional cortical thickness and subcortical volumes to clinical measurements of MetS components, i.e., obesity (waist circumference, hip circumference, waist-hip ratio, body mass index), arterial hypertension (systolic blood pressure, diastolic blood pressure), dyslipidemia (high density lipoprotein, low density lipoprotein, total cholesterol, triglycerides) and insulin resistance (HbA1c, non-fasting blood glucose).”

      Additionally, we have updated the caption of supplementary figure S13 (formerly supplementary figure S12) to clearly indicate the investigation of non-fasting plasma glucose. The table detailing diagnostic MetS criteria (supplementary table S2) has also been amended to clarify the absence of fasting plasma glucose data in our study and to indicate that only data on antidiabetic therapy and diagnosis of type 2 diabetes mellitus were used as criteria for insulin resistance in the case-control analysis.

      Comment 1.3: “I do not understand how the authors can claim there is a deterministic relationship there if all the results are only correlational or comparative. Can the differences in functional connectivity and white matter fiber tracts observed not be caused by the changes in cortices they relate to? How can the authors be sure the network organisation is shaping the cortical effects and not the opposite (the cortical changes influence the network organisation)? This should be further discussed or explained.”

      Reply 1.3: We agree with the reviewer's comment on the correlational nature of our data and have accordingly revised the discussion section to reflect a more cautious interpretation of our findings. We have carefully reframed our language to avoid any implications of causality, ensuring the narrative aligns with the correlational nature of our data. Nevertheless, we believe that exploring causal interpretations can offer valuable clinical insights. Therefore, while moderating our language, we have retained some speculative discussion of potential causal pathomechanistic pathways.

      Comment 1.4: “The hippocampus is also an area where changes have consistently been observed. Why did the authors limit their analysis to the cortex?”

      Reply 1.4: We appreciate this comment. In response, we have added the volumes of the Melbourne Subcortical Atlas parcels (including the hippocampus) to the analysis. The corresponding results are now shown in figure 2. The subcortical bootstrap ratios indicated that higher MetS severity was related to lower volumes across all investigated subcortical structures.

      Comment 1.5: “Which field ID of the UK biobank are the measures referring to? If possible, please specify the Field ID for each of the UKB metrics used in the study.”

      Reply 1.5: We thank the reviewer for the recommendation. The Field IDs used in our study are now listed in supplementary figure S1.

      Comment 1.6: “Several Figures were wrongly annotated, making it hard to follow the text.”

      Reply 1.6: Thank you for bringing the annotation issues to our attention. We have thoroughly edited all annotations, which should now correctly reference the figure content.

      Reviewer 2

      Comment 2.1: “Do the authors have the chance to see how the pattern relates to changes in cognitive function in the UKBB and possibly HCHS? This could help to provide some evidence about the directionality of the effect.”

      Reply 2.1: Thank you for your suggestion. We acknowledge the potential value of investigating gray matter morphometric data alongside longitudinal information on cognitive function. Although we agree that this approach is important, we are constrained by the ongoing processing of the UKB's imaging follow-up data and the pending release of the HCHS follow-up data. Consequently, we cannot incorporate this aspect into the current analysis. We plan to explore the relationship between MetS, cognition and brain morphology using longitudinal data as soon as it becomes available.

      Comment 2.2: “Also, you could project new data onto the component and establish a link with cognition in a third sample which would be even more convincing. I can offer LIFE-Adult study for this aim.”

      Reply 2.2: We are grateful for your recommendation to enhance our study's robustness by including a third sample to establish a cognitive link. While we recognize the merit of such a sensitivity analysis, we believe that our current dataset, derived from two large, independent cohorts, is sufficiently comprehensive for the scope of our current analysis. However, we are open to considering this approach in future studies and appreciate your offer of the LIFE-Adult study. We would welcome further conversation with you regarding future joint projects.

      Comment 2.3: “The sentences (p.17, ll.435 ff) seem to repeat: "Interestingly, we also observed a positive relationship between cortical thickness and MetS in the superior frontal, parietal and occipital lobe. Interpretation of this result is, however, less intuitive. We also noted a positive MetS-cortical thickness association in superior frontal, parietal and occipital lobes, a less intuitive finding that has been previously reported [60,61]."”

      Reply 2.3: Thank you for making us aware of this duplication. We have deleted the first part of the section. It now reads “We also noted a positive MetS-cortical thickness association in superior frontal, parietal and occipital lobes, a less intuitive finding that has been previously reported.”

      Comment 2.4: “I would highly appreciate empirical evidence for the claim in ll. 442 "In support of this hypothesis, the determined cortical thickness abnormality pattern is consistent with the atrophy pattern found in vascular mild cognitive impairment and vascular dementia" Considering the previous reports about the co-localization of obesity-associated atrophy and AD neurodegeneration (Morys et al. 2023, DOI: 10.3233/JAD-220535), that most dementias are mixed and that MetS probably increases dementia risk through both AD and vascular mechanisms, I feel such "binary" claims on VaD/AD-related atrophy patterns should be backed up empirically.”

      Reply 2.4: Thank you for highlighting the need for clarity in differentiating between vascular and Alzheimer's dementia. We recognize the intricate overlap in dementia pathologies. Acknowledging the prevalence of mixed dementia and the influence of MetS on both AD and vascular mechanisms, we realize our original statement might have implied a specificity to vascular dementia, which was not intended.

      To address your concern, we have revised our statement to avoid an exclusive focus on vascular pathology, ensuring a more balanced representation of dementia types. Additionally, we have included Morys et al. 2023 as a reference. The section now reads: “In support of this hypothesis, the determined brain morphological abnormality pattern is consistent with the atrophy pattern found in vascular mild cognitive impairment, vascular dementia and Alzheimer’s dementia.”

      Comment 2.5: “I wonder how specific the cell-type results are to this covariance pattern. Maybe patterns of CT (independent of MetS) show similar associations with one or more of the reported cell types? Would it be possible to additionally show the association of the first three components of general cortical thickness variation with the cell type densities?”

      Reply 2.5: Thank you for your query regarding the specificity of the cell-type results to the observed covariance pattern. To address this, we have conducted a virtual histology analysis of the first three latent variables of the main analysis PLS. The findings of this extended analysis are detailed in supplementary figure S21. The imaging covariance profile of latent variable 2 was significantly associated with the density of excitatory neurons of subtype 3. The imaging covariance profile of latent variable 3 showed no significant association with cell type densities; latent variable 3 may represent only a noise component, as it explained only 2.12% of the shared variance. We hope this addition clarifies the specificity of our main results.

      Comment 2.6: “I agree that this multivariate approach can contribute to a more holistic understanding, yet I would like to see the discussion expanded on how to move on from here. Should we target the MetS more comprehensively or would it be best to focus on obesity (being the strongest contributor and risk factor for other "downstream" conditions such as T2DM)? A holistic approach is somewhat at odds with the in-depth investigation of specific mechanisms.”

      Reply 2.6: We value your suggestion to elaborate on the implications of our findings. Our study indicates that obesity may have the most pronounced impact on brain morphology among MetS components, suggesting it as a key contributor to the clinical-anatomical covariance pattern observed in our analysis. This highlights obesity as a primary target for future research and preventive strategies. However, we believe that our results warrant further validation, ideally through longitudinal studies, before drawing definitive clinical conclusions.

      Additionally, our study endorses a comprehensive approach to MetS, highlighting the importance of considering the syndrome as a whole to gain broader insights. We want to clarify, however, that such an approach is meant to complement, rather than replace, the study of individual cardiometabolic risk factors. The broad perspective our study adopts is facilitated by its epidemiological nature, which may not be as applicable in experimental settings that are vital for deriving mechanistic disease insights.

      To reflect these points, we have expanded the discussion in our manuscript to include a more detailed consideration of these implications and future research directions.

      Comment 2.7: “Please report the number of missing variables.”

      Reply 2.7: Thank you for your request to report the number of missing values. We would like to direct your attention to table 1, where we have listed the number of available values for each variable in parentheses. The number of missing values for each variable is the total sample size minus this number.

      Comment 2.8: “Was the pattern similar in pre-clinical (pre-diabetes, pre-hypertension) vs. clinical conditions?”

      Reply 2.8: Thank you for your interest in the applicability of our findings across different MetS severity levels. Our analysis employs a continuous framework to encompass the entire range of vascular and cardiometabolic risks, including those only mildly affected by MetS. The linear relationship we observed between MetS severity and gray matter morphology patterns, as illustrated in Figure 2d, supports the interpretation that our findings apply to the entire spectrum of MetS severities.

      Comment 2.9: “How did you deal with medication (anti-hypertensive, anti-diabetic, statins..)?”

      Reply 2.9: Information on medication was considered for defining MetS for the case-control sensitivity analysis but was not included in the PLS. Detailed information can be found in table 1.

      Comment 2.10: “It would be really interesting to determine the genetic variations associated with the latent component. Have you considered doing a GWAS on this, potentially in the CHARGE consortium or with UKBB as discovery and HCHS as replication sample?”

      Reply 2.10: Thank you for your valuable suggestion regarding the implementation of a GWAS. We agree that incorporating a GWAS would provide significant insights, but we also recognize that it extends beyond the scope of our current analysis. However, we are actively planning a follow-up analysis. This subsequent analysis will encompass a comprehensive examination of both genetic variation and imaging findings in the context of MetS.

      Comment 2.11: “Please provide more information on which data fields from UKBB were used exactly (e.g. in github repository).”

      Reply 2.11: We appreciate your recommendation. The details regarding the Field IDs used in our study have been included as supplementary table S1.

      Reviewer 3

      Comment 3.1: “After a thorough review of the methods and results sections, I found no direct or strong evidence supporting the authors' claim that the identified latent variables were related to more severe MetS to worse cognitive performance. While a sub-group comparison was conducted, it did not adequately account for confounding factors such as educational level.”

      “Page 18-19 lines 431-446: the fifth paragraph in the discussion section. - As previously mentioned in the "Weaknesses" section, this study did not conduct a direct association analysis between MetS and cognitive levels without considering subgroup comparisons. Hence, I recommend the content of this paragraph warrants careful reconsideration.”

      Reply 3.1: We acknowledge the reviewer's constructive feedback regarding our analysis of cognitive data. We have performed a mediation analysis relating the subject-specific clinical PLS score of latent variable 1, which represents MetS severity, to cognitive test performance, and testing whether the imaging PLS score, which captures MetS-related brain morphological abnormalities, mediates this relationship. The imaging score statistically mediated the relationship between the clinical PLS score and performance on tests of executive function and processing speed, memory, and reasoning. These findings highlight brain structural differences as a relevant pathomechanistic correlate of the relationship between MetS and cognition. Corresponding information can now be found in figure 3, methods section 2.6.2, results section 3.3 and discussion section 4.2.
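      A single-mediator model of the kind described above (clinical PLS score → imaging PLS score → cognitive score) can be sketched with the product-of-coefficients approach and a percentile bootstrap. This is a minimal illustration under stated assumptions, not the authors' implementation: the reply does not specify the software, covariates, or inference procedure used, and the function names below are hypothetical.

```python
import numpy as np


def ols_coefs(y, *predictors):
    """Ordinary least squares with an intercept; returns the slopes only."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b for the path X -> M -> Y."""
    (a,) = ols_coefs(m, x)            # path a: predictor -> mediator
    b, _c_prime = ols_coefs(y, m, x)  # path b: mediator -> outcome, controlling for X
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    return np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
```

      Here path a is the regression of the mediator on the predictor and path b the regression of the outcome on the mediator while controlling for the predictor; an indirect effect is usually considered statistically significant when its bootstrap interval excludes zero.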

      Moreover, we apologize for any confusion caused by our previously unclear presentation. Our study further includes association analyses between MetS, brain structure, and cognition, in which MetS components, regional brain morphological measures, and cognitive performance data were entered into a PLS to investigate whether cognitive measures contribute to the latent variable. These analyses were performed separately on the UK Biobank and HCHS datasets because the two cohorts used different cognitive assessments. In these subgroup analyses, we adjusted for age, sex, and education by removing their effects from the input variables. The results are detailed in supplementary figures S16b and S17b; loadings close to zero for age, sex, and education confirm effective deconfounding.
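      Removing the effects of age, sex, and education from the input variables is commonly done by regressing each variable on the confounds and keeping the residuals. The sketch below assumes linear confound effects and numeric coding of sex (0/1); `regress_out` is a hypothetical helper, not necessarily the procedure used in the manuscript. Because least-squares residuals are orthogonal to the regressors, their correlation with each confound is zero up to numerical precision, which is what near-zero confound loadings would be expected to reflect.

```python
import numpy as np


def regress_out(data, confounds):
    """Remove the linear effect of confounds from every column of data.

    data:      (n_subjects, n_variables) input variables
    confounds: (n_subjects, n_confounds), e.g. age, sex (0/1), education
    """
    # design matrix with an intercept so residuals are also mean-centered
    C = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(C, data, rcond=None)
    return data - C @ beta  # residuals are orthogonal to each confound
```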

      In sum, we greatly appreciate the suggestion to conduct a mediation analysis, which has substantially enhanced the strength and relevance of our analysis.

      Comment 3.2: “I would suggest the authors provide a more comprehensive description of the metrics used to assess each MetS component, such as obesity (incorporating parameters like waist circumference, hip circumference, waist-hip ratio, and body mass index) and arterial hypertension (detailing metrics like systolic and diastolic blood pressure), etc.”

      Reply 3.2: Thank you for your suggestion regarding a more detailed description of the metrics for assessing each component of MetS. We would like to point out that the specific metrics used, including those for obesity (such as waist circumference, hip circumference, waist-hip ratio, and body mass index) and arterial hypertension (including systolic and diastolic blood pressure), are comprehensively detailed in table 1 of our manuscript. We hope this table provides the clarity and specificity you are seeking regarding the MetS assessment metrics in our study.

      Comment 3.3: “I recommend the inclusion of an additional, detailed flowchart to further illustrate the procedure of virtual histology analysis. This would enhance the clarity of the methodological approach and assist readers in better comprehending the analysis method.”

      Reply 3.3: Thank you for your suggestion. Recognizing the challenges in visually representing many of our analysis steps, we have instead supplemented our manuscript with additional references. These references provide a clearer understanding of our virtual histology approach, particularly focusing on the processing of regional microarray expression data.

      The corresponding sentence reads: “Further details on the processing steps covered by ABAnnotate can be found elsewhere (https://osf.io/gcxun) [42]”

      Comment 3.4: “Why were both brain hemispheres used instead of solely utilizing the left hemisphere as the atlas, especially considering that the Allen Human Brain Atlas (AHBA) only includes gene data for the right hemisphere for two subjects?”

      Reply 3.4: Thank you for your query regarding our decision to use both brain hemispheres instead of solely the left hemisphere, especially given that the Allen Human Brain Atlas (AHBA) predominantly contains gene expression data from the left hemisphere. Because of the AHBA's limited spatial coverage of expression data in the right hemisphere, we mirrored the existing tissue samples across the left-right hemisphere boundary using the abagen toolbox (1), a practice supported by findings suggesting minimal lateralization of microarray expression (2,3). Further details are provided in previous work employing ABAnnotate (4). These studies are now referenced in our methods section.
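      The mirroring step can be illustrated as follows: negating a sample's MNI x-coordinate creates a copy in the opposite hemisphere that carries the same expression values. This is a simplified sketch assuming MNI coordinates with x < 0 in the left hemisphere; `mirror_samples` is hypothetical, and in practice the abagen toolbox's own mirroring option (see its documentation) should be used rather than this code.

```python
import numpy as np


def mirror_samples(coords, expression, source="left"):
    """Copy tissue samples into the other hemisphere by negating MNI x.

    coords:     (n_samples, 3) MNI coordinates; x < 0 is the left hemisphere
    expression: (n_samples, n_genes) expression values for the same samples
    source:     "left" mirrors left samples into the right hemisphere,
                "right" does the reverse, "both" mirrors every sample
    """
    x = coords[:, 0]
    if source == "left":
        mask = x < 0
    elif source == "right":
        mask = x > 0
    elif source == "both":
        mask = np.ones(len(x), dtype=bool)
    else:
        raise ValueError(f"unknown source: {source!r}")
    flipped = coords[mask].copy()
    flipped[:, 0] *= -1  # reflect across the midline (x = 0)
    return (np.vstack([coords, flipped]),
            np.vstack([expression, expression[mask]]))
```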

      Comment 3.5: “The second latent variable was not further discussed. If this result is deemed significant, it warrants a more detailed discussion. "

      Reply 3.5: Thank you for the suggestion. We have added a paragraph to the discussion that addresses the second latent variable in greater detail. It reads: “The second latent variable accounted for 22.33% of shared variance and linked higher insulin resistance and lower dyslipidemia to lower thickness and volume in lateral frontal, posterior temporal, parietal and occipital regions. The distinct covariance profile of this latent variable, compared to the first, likely indicates a separate pathomechanistic connection between MetS components and brain morphology. Given that HbA1c and blood glucose were the most significant contributors to this variable, insulin resistance might drive the observed clinical-anatomical relationship.”

      Comment 3.6: “I suggest appending positive MetS effects after "..., insular, cingulate and temporal cortices;" for two reasons: a). The "positive MetS effects" might represent crucial findings that should not be omitted. b). Including both negative and positive effects ensures that subsequent references to "this pattern" are more precise.”

      Reply 3.6: We agree that the positive MetS effects should be highlighted as well and have modified the first paragraph of the discussion to mention them.

      Comment 3.7: “I would appreciate further clarification on this sentence and the use of the term "uniform" in this context. Does this suggest that despite the heterogeneity in the physiological and pathological characteristics of the various MetS components (e.g., obesity, hypertension), their impacts on cortical thickness manifest similarly? How is it that these diverse components lead to "uniform" effects on cortical thickness? Does this observation align with or deviate from previous findings in the literature?”

      Reply 3.7: Thank you for highlighting the ambiguity in our previous explanation. We agree that the complexity of the relationship between MetS components and brain morphology requires clearer articulation. To address this, we have revised the relevant sentence for better clarity. It now reads: “This finding indicates a relatively uniform connection between MetS and brain morphology, implying that the associative effects of various MetS components on brain structure are comparatively similar, despite the distinct pathomechanisms each component entails.”

      Comment 3.8: “Figure 1 does not have the labels "c)" and "d)". ”

      Reply 3.8: Thank you. We have modified figure 1 and made sure that the caption correctly references its content.

      Comment 3.10: “Incorrect figure/table citation:

      • Page 18 line 418: "(figure 2b and 1c)" → (figure 2b and 2c).

      • Page 18 line 419: "(supplementary figures S8 and S12-13)" → (supplementary figures S11 and S15-16).

      • In the supplementary material, "Text S5 - Case-control analysis" section contains several figure or table citation errors. Please take a moment to review and correct them.”

      Reply 3.10: Thank you for bringing this to our attention. We have corrected the figure and table citation errors.

      Comment 3.11: “Page 8 line 184: The more commonly used term is "insulin resistance" rather than "insuline resistance.”

      Reply 3.11: We now use “insulin resistance” throughout the manuscript.

      Comment 3.12: “Nevertheless, variations in gene sets may introduce a degree of heterogeneity in the results (Seidlitz, et al., 2020; Martins et al., 2021). Consequently, further validation or exploratory analyses utilizing different gene sets can yield more compelling results and conclusions.”

      Reply 3.12: Thank you for your insightful comment regarding the potential heterogeneity introduced by variations in gene sets. We agree that exploring different gene sets could enhance the robustness and generalizability of our findings. However, a comprehensive methodological analysis of the available cell-type-specific gene sets would require substantial effort and warrants a dedicated investigation to implement it thoroughly and assess its implications. We would also like to highlight that our analysis setup adheres to previous practice (4,5).

      References

      (1) Markello RD, Arnatkeviciute A, Poline JB, Fulcher BD, Fornito A, Misic B. Standardizing workflows in imaging transcriptomics with the abagen toolbox. eLife. 2021;10:e72129. doi:10.7554/eLife.72129

      (2) Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 2012;489(7416):391-399. doi:10.1038/nature11405

      (3) Hawrylycz M, Miller JA, Menon V, et al. Canonical genetic signatures of the adult human brain. Nat Neurosci. 2015;18(12):1832-1844. doi:10.1038/nn.4171

      (4) Lotter LD, Saberi A, Hansen JY, et al. Human cortex development is shaped by molecular and cellular brain systems. bioRxiv. Published online May 5, 2023:2023.05.05.539537. doi:10.1101/2023.05.05.539537

      (5) Lotter LD, Kohl SH, Gerloff C, et al. Revealing the neurobiology underlying interpersonal neural synchronization with multimodal data fusion. Neuroscience & Biobehavioral Reviews. 2023;146:105042. doi:10.1016/j.neubiorev.2023.105042

    1. Author Response

      Reviewer #2 (Public Review):

      This study aims to test the role of awake replay in short-term memory, a type of memory that operates on the timescale of seconds and minutes. Replay refers to a time-compressed burst of neuronal population activity during a particular oscillatory local field potential event in the hippocampus, called the sharp-wave ripple (SWR). SWRs are found during sleep and in the awake state and are always associated with the animal being quiescent. The paper compares results from three different behavioral tasks ranging in memory requirements and memory timescales. First, rats were trained on either a spatial match-to-sample task (MTS), a non-match-to-sample task (NMTS), or a task requiring the memorization of sequences (maze arms to be visited in a specific temporal order). In this initial training phase, the animals were allowed to learn the maze structure and the rules governing these tasks for all these behavioral paradigms. Then, awake sharp-SWRs were disrupted as the animal performed these tasks (both during instruction and test phases) via an online detection system combined with closed-loop electrical stimulation of the ventral hippocampal commissure. Notably, this manipulation appeared not to affect performance in all three tasks, as determined using various behavioral parameters. Trials with no stimulation or delayed stimulation serve as controls. Thus, the authors conclude that awake SWRs are not involved in these short-term memory-guided behaviors. I do have a few comments that the authors should discuss or address:

      (1) This study adds to a large number of studies investigating the role of awake SWRs in spatial learning and memory tasks. The results of these previous studies are quite contradictory and range from awake SWRs are not crucial in guiding decisions at all to SWRs are only essential during task rule learning to SWRs do guide behavior. Could the authors comment on these seemingly contradictory results? Why are these experiments now the right ones?

      The reviewer is correct that there is a large body of literature investigating awake SWRs. Most commonly, interpretations about the role of SWRs and associated replay are based on correlations between their occurrence and behavior. These correlations do not, however, necessarily indicate that SWRs contribute to a particular cognitive process. That is why interventional studies like ours are important to clarify the contribution of SWRs.

      The acquisition of a novel task involves a number of cognitive processes, including short- and long-term memory, building a map of the environment, exploration of the solution space and incorporating (non-)rewarding feedback. Based on available evidence, SWRs could contribute to many of these processes. Our experiments were designed to exclude the long-term memory aspect and focus on the memorization of locations on a short timescale, which, as we now demonstrate, does not depend on SWRs. Since the use of short-term spatial memory is one of the possible explanations for the learning deficit seen by Jadhav et al. (2012) following SWR disruption in an alternation task, our results may also help narrow down the exact contribution of SWRs in these studies.

      (2) None of the experiments presented here test the role of replay. I suggest making this distinction in the paper and the title clear. As the results are presented now, is it possible that the SWR content is not affected sufficiently to have a behavioral effect or that there is a bias towards detecting specific SWRs, e.g., longer SWRs?

      The reviewer is right that our experiments do not say anything about replay directly. We adapted the text to make this distinction clear.

      We address the possibility that SWR content may not be disrupted sufficiently to cause a behavioral effect in response to recommendation 1.

      Reviewer #3 (Public Review):

      In this manuscript, the authors seek to shed light on the role of awake hippocampal replay during memory tasks that are claimed to be short-term memory. For this, they make use of a real-time detection and disruption system of awake hippocampal ripples, which are used as a proxy for awake neuronal replay. The manuscript describes extensively the tasks as well as the disruption system and controls used during the experiments. The authors present numerous and solid analyses of the behavioral data acquired during the tasks. Nonetheless, the current version of the manuscript is lacking a more complete discussion in which the results are contrasted to previous similar findings, as well as mentioning the role of the awake ripple in the stabilization of hippocampal maps. Some extra analyses are also suggested below. The manuscript would also be enriched if the authors suggested alternative mechanisms for memory rehearsal. Finally, some claims of "we are first" seem inappropriate when compared to the previous literature.

      Major comments:

      How does one define short-term memory (STM) in rodents? The examples and papers cited in the first paragraphs refer mostly to human working memory tasks, from which it is known that a non-rehearsed STM lasts typically 20-30 seconds. Could the authors mention how this concept is translated to rodents? Could you clarify until what point memory is considered STM and what is the criteria to consider it has turned into long-term memory or when is it simply working memory or habit/skill?

      We agree with the reviewer that the definition of short-term memory is fluid and may differ between researchers and model systems. To avoid confusion, we reframed our study in a different context and hope that this makes the timeframes we are talking about clearer.

      Further, why should these tasks be classified as testing STM while Jadhav et al. tasks are working memory or as they now mention in this article rule learning?

      Note that short-term memory and working memory are closely related, but not identical, concepts. Whereas short-term memory refers to the retaining of information for a short period of time, working memory is generally considered to also include some manipulation of that information. Unfortunately, in the rodent literature, (spatial) working memory and short-term memory are often used interchangeably.

      Many (animal) spatial memory tasks do not test a single cognitive faculty, but likely involve a combination of short-term memory, working memory, and rule learning (among other abilities) to acquire or solve the task. As such, an unequivocal classification of behavioral tasks is generally not possible. For example, in the continuous version of the spatial alternation task used in Jadhav et al., animals may learn the rule “if I am in the center arm and I came from the left goal arm, then I will next find reward in the right goal arm”. The execution of this rule would require maintaining in (short-term) memory the most recently visited goal arm. Alternatively, animals may learn the rule to turn left twice and right twice to successfully perform the task.

      One goal of our study was to isolate rule learning components and short-term memory components in our tasks (to be clear: we are not claiming that our tasks are pure short-term memory tasks).

      We have rewritten the introduction to reframe our study, which hopefully clarifies the points above.

      In humans, the retention of memory after a certain time is achieved by retrieving a long-term memory. How do we know if the considerable training the rats received has not allowed the use of a long-term memory strategy which allows the rats to perform well even in the absence of rehearsal (replay)? These are conceptual explanations that would help understand the key concept of STM in greater detail.

      Our experiments aimed to distinguish between the process of learning general task rules through training and the need to retain information specific to each trial or session. For example, in the NMTS task, the animals may have a long-term memory of the overall task design, but they cannot anticipate or recall in advance which specific arms will be baited in the instruction phase since they vary from one trial to another. Therefore, to complete a trial successfully, the animals must have formed some type of (short-term) memory of the instruction arms and/or of the arms that still need to be visited in the test phase. Although extended training may have resulted in a more optimized and less demanding strategy to memorize the necessary information, evidence in the literature indicates that even then (for this particular task), a functional hippocampus is required (Sasaki 2021). The question we address in our experiments is whether hippocampal SWRs (and by association, replay) are instrumental in the formation or maintenance of this memory, whether through rehearsal or other mechanisms. The rewritten introduction explains these concepts more clearly.

      Further, claims of "first" should be adjusted, since I do not see a large difference between the w (m) maze of Jadhav and these tasks. The main difference between the two projects would rather be that Jadhav tests when animals are still newer to the task while here overtrained animals are used. In Jadhav, it's unlikely that just rule learning is affected since the inbound component is not affected by disruption, which also tests rule learning. Therefore, it is still likely that the effect seen in Jadhav et al is a deficit in working memory/short-term memory. And here it is more likely, that no effect was seen since with overtrained animals other strategies (cortical, striatal, etc) were used. The authors should compare in more detail how overtrained animals were in these different projects as well as in the articles they cite for replay analysis.

      The training of the animals on the general task rules prior to SWR disruption manipulations is by design, as it better isolates the short-term memory demands required to solve the task in each trial/session. In our tasks, the rats are required to memorize a randomly chosen combination of goal arms on each day (MTS & SEQ task) or in every trial (NMTS task). Unlike the continuous alternation paradigm used by Jadhav et al. (2012), our tasks cannot be solved using a stereotypical or habitual (striatal) strategy that is acquired through extended training. We cannot exclude that the rats acquired an optimized and less cognitively demanding strategy that mainly depends on cortical structures outside the hippocampus; however, evidence in the literature still indicates the requirement for a functional hippocampus (Sasaki, 2021; Okaichi and Oshima, 1990; Blokland, Honig, and Raaijmakers, 1992).

      The reviewer is correct that the inbound component of the continuous alternation task in Jadhav et al. (2012) can be considered rule learning and was not affected by SWR disruption. However, we do not believe that this should be generalized to all rule learning, and it is quite conceivable that SWRs contribute to the learning of more complex rules that also feature ambiguity (such as the outbound component in the continuous alternation task). We elaborate on these points in the discussion (lines 425-455).

      The main conclusion of the authors is that hippocampal replay is not the rehearsal mechanism expected in STM given that its disruption doesn't lead to behavioral changes. Could the authors hypothesize in their discussion what other neural mechanisms different from hippocampal replay may be involved in this rehearsal?

      Thank you for this suggestion. We added an extra paragraph speculating on this aspect (lines 499-518).

      The discussion also lacks closure with respect to how the findings fit in the study of STM in human memory. This would make the article more interesting to a larger audience and highlight its translational aspect.

      We agree with the reviewer and added our insight to the discussion.

      The results describe deeply the behavioral performance of the rats and the validation of the ripple detection/disruption system. However, one important aspect missing is how the hippocampal activity and its encoding of space may be affected by the awake ripple disruption. The authors don't cite the work by Roux et al., Nature Neuroscience. 2017 where optogenetic stimulation of hippocampal neurons provided evidence that neuronal activity associated with awake hippocampal ripples during goal-directed behavior is required for both stabilizing and refining hippocampal place fields, while memory performance was not affected during ripple-locked stimulations compared to a ripple-delayed stimulation control (See supplementary Figure 7 of the mentioned article). I would like the authors to comment on their own findings and contrast them with those of Roux et al.

      We agree that it is interesting to include the results of Roux et al. in our discussion (lines 470 and 463-466).

      Line 64: Could the authors clarify what they mean by "indirect" causal evidence when discussing the contribution of papers by Jadhav, Igata, and Fernandez? Is it the fact that rodents' learning speed changed instead of showing a complete absence of learning? Or is it the fact that the disruption/prolongation is done on the hippocampal ripple and not strictly in the replay sequence?

      We apologize for the confusion and rewrote large parts of the introduction to clarify the contributions of the papers by Jadhav, Igata, and Fernandez and the difference with what our manipulations contribute. In the process, we removed the phrase ‘indirect causal evidence’.

      I would also highlight this latter difference, given that the above-mentioned authors describe their methodological approaches in terms of ripples and not in terms of replay content. For example, the use of "replay" instead of "ripple" in Line 61 results in methodological inaccurate terms such as replay disruption and replay prolongation.

      Thank you for pointing this out. We adapted the manuscript to always use ‘ripple’ or ‘sharp-wave ripple’ (SWR) when describing our results.

      Despite its apparent lack of statistical significance, the reported mean ripple detection rate during the trial and non-trial periods tend to be always higher in the disruption condition of all tasks by observing the median of the boxplots in Figure 1J, Figure 2H, and Figure 3J. It is worth investigating this further using the same linear regression method as Girardeau et al. Journal of Neuroscience, 2014 which may reduce the variability and allow comparing slopes of a cumulative number of ripples over time. This may reveal a compensatory homeostatic-like increase in the rate of ripples during the disrupted sessions, which may suggest a need for the ripple/replay occurrence in spite of it not having an effect on the rats' performance during the task.

      The reviewer makes an interesting observation and we appreciate the suggestion for further investigation. However, note that a clear trend for higher ripple rates in disruption trials/sessions is not present when comparing to non-stimulated control trials/session. Part of the variability in the observed ripple rates is likely due to the variability in the animals’ behavioral state (e.g., moving, pausing but alert, grooming, consuming reward) and the corresponding varying propensity for SWRs to occur. The behavioral variability makes application of the linear regression approach of Girardeau et al. (2014) not straightforward (note that Girardeau et al. looked at SWRs during sleep). For these reasons, we have decided to not further look into the potential disruption-induced increase of the SWR rate.
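      For reference, the approach suggested by the reviewer (fitting a line to the cumulative number of ripples over time, where the slope estimates the ripple rate) can be sketched as below. This illustrates the suggested analysis only; it was not performed in the study, and `ripple_rate_slope` is a hypothetical helper that takes ripple event times in seconds.

```python
import numpy as np


def ripple_rate_slope(event_times):
    """Slope of a linear fit to the cumulative ripple count (events per second)."""
    t = np.sort(np.asarray(event_times, dtype=float))
    cumulative = np.arange(1, len(t) + 1)  # 1st, 2nd, ... ripple
    slope, _intercept = np.polyfit(t, cumulative, deg=1)
    return slope
```

      Comparing the slopes for disruption and control sessions would then test for a rate difference; as noted above, variability in behavioral state complicates this for awake data.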

      In line 425, the authors report a median relative delay of 52.9 of their disruption system. Such a value would indicate that only around 47% of the ripple is being blocked. Is there any data from the authors or others that could reassure the reader that the 52.9% of the ripple that "leaks" is not enough for the replay phenomenon to occur? Considering the findings of Fernandez-Ruiz et al. 2019 on large-duration ripples, could the authors report the relative delay for both short and long ripples (>100 ms) separately?

      The reviewer is correct that the initial part (~35 ms) of SWRs remains intact, which is inherent to the online detection and disruption approach. In relative terms, a larger fraction of long SWRs is disrupted. As requested, we have adapted figure 4c to separately show the distribution of relative detection delays for long (duration >100ms) and short SWRs.

      As we and others have shown, the electrical stimulation temporarily suppresses spiking activity in CA1 and thus abruptly interrupts any ongoing replay, but any part of a replay sequence that precedes the stimulation is unaffected. Previous studies that used the same methodology to disrupt SWRs reported a behavioral performance deficit despite the detection delays (Michon et al. 2019; Girardeau et al. 2009; Jadhav et al. 2012). This suggests that the initial part of SWRs (and replay) is not sufficient to support the behavior. The delays in the current study are quantitatively similar to what we have reported before in Michon et al. (2019), and thus we are confident that we would have been able to observe a behavioral effect if one were present. We now elaborate on this topic in the Discussion (lines 489-498).
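      The relative detection delay discussed here can be computed per SWR as the time from SWR onset to online detection divided by SWR duration, and then summarized separately for short and long events using the 100 ms threshold of Fernandez-Ruiz et al. (2019). The sketch below assumes onset, offset, and detection times (in seconds) taken from offline and online detection; the function name and inputs are hypothetical.

```python
import numpy as np


def relative_delays(onsets, offsets, detections, long_threshold=0.100):
    """Median relative detection delay for short and long SWRs.

    Relative delay = (detection - onset) / (offset - onset), i.e. the
    fraction of each SWR that elapsed before the online detector fired.
    """
    onsets, offsets, detections = (
        np.asarray(a, dtype=float) for a in (onsets, offsets, detections)
    )
    duration = offsets - onsets
    rel = (detections - onsets) / duration
    is_long = duration > long_threshold  # long SWRs: duration > 100 ms
    return {"short": float(np.median(rel[~is_long])),
            "long": float(np.median(rel[is_long]))}
```

      With a roughly constant absolute delay, the relative delay shrinks as SWR duration grows, which is why a larger fraction of long SWRs is disrupted.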

      Line 494: The authors define long ripples as (>120 ms) but this doesn't coincide with the 100ms threshold from Fernandez Ruiz et al. 2019.

      Thank you for pointing this out; this has been corrected in both the Results (line 389) and the Discussion (line 486).

      The online ripple detector used filtered the traces in the 135-255 Hz range. This is a narrower frequency range compared to online detectors used by Jadhav et al. 2012 (100-400 Hz) and Fernandez-Ruiz et al. 2019 (80-300 Hz). What motivated the use of this narrow range? Would the omittance of ripples below 135 Hz have implications in the results? Could the authors add to the supplement a figure similar to Figure 4B (FDR vs TPR) using a wider frequency range similar to the authors above in the offline detection of ripples?

      The frequency of hippocampal ripple oscillations in rats generally lies in the range of 160-225 Hz (Buzsaki, 1992). We have added a power spectrum in Figure 1d that confirms this frequency range in our experiments. Filters that include frequencies below this range (as in the studies referenced by the reviewer) likely also pass high-frequency gamma oscillations, and filters that include frequencies above this range likely also pass multi-unit spiking activity. The challenge for a real-time ripple detection system is to design a filter that strikes an acceptable trade-off between selectivity for a specific (narrow) frequency band and the delay it introduces. In our study, we specifically designed a filter that is selective for the ripple frequency band and still has an acceptably low delay.
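      The trade-off between band selectivity and delay can be made concrete for a linear-phase FIR band-pass filter, whose group delay is (N - 1)/2 samples for N taps; sharper band edges require more taps and therefore a longer delay. The sketch below is an illustration under assumptions: the reply does not describe the filter type, the 2 kHz sampling rate in the example is hypothetical, and the tap count uses the common Hamming-window rule of thumb N ≈ 3.3·fs/Δf, where Δf is the transition width.

```python
import numpy as np


def fir_bandpass(low_hz, high_hz, transition_hz, fs):
    """Hamming-windowed sinc band-pass filter; returns (taps, group delay in ms)."""
    # rule of thumb for a Hamming window: transition width ~ 3.3 * fs / N
    n_taps = int(np.ceil(3.3 * fs / transition_hz))
    if n_taps % 2 == 0:
        n_taps += 1  # odd length gives a symmetric (type I) linear-phase filter
    n = np.arange(n_taps) - (n_taps - 1) / 2
    # ideal band-pass = difference of two ideal low-pass (sinc) kernels
    lowpass_high = (2 * high_hz / fs) * np.sinc(2 * high_hz / fs * n)
    lowpass_low = (2 * low_hz / fs) * np.sinc(2 * low_hz / fs * n)
    h = (lowpass_high - lowpass_low) * np.hamming(n_taps)
    delay_ms = (n_taps - 1) / 2 / fs * 1000.0
    return h, delay_ms
```

      With fs = 2000 Hz and a 135-255 Hz band, 20 Hz transition bands need 331 taps (82.5 ms group delay), whereas 60 Hz transition bands need 111 taps (27.5 ms). Because N scales with fs, the FIR delay in milliseconds depends mainly on the transition width; IIR designs typically reduce the delay at the cost of nonlinear phase.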

      It is unclear what criterion was used to train the rats in the NMTS task. Line 216 specifies a learning criterion of 80% fully correct trials in one session for three days in a row, while the methods in line 852 mention an average performance below 50% for at least three days in a row.

      Thank you for pointing this out. We corrected the description of the learning criterion in the Results section (lines 108-110) to match the description in the Methods section.

      In the methods section, it is not mentioned if there was a specific region in the cortex where the tetrode was placed (Line 908).

      The detections in this tetrode were used to mark events as "false positives". The authors should be careful in line 933 when they make the statement "ripples are not present in the cortex". There have been recent publications that challenge this affirmation. See Khodagholy, Science. 2017, Nitzan, Nature Comm. 2020.

      Thank you for pointing this out. We have added the cortical region to the methods (line 882) and clarified that, as far as we know, no ripples synchronous with hippocampal ripples have been described in that part of the cortex (parietal association cortex).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a useful characterization of the biochemical consequences of a disease-associated point mutation in a nonmuscle actin. The study uses solid and well-characterized in vitro assays to explore function. In some cases the statistical analyses are inadequate and several important in vitro assays are not employed.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      The authors first perform several important controls to show that the expressed mutant actin is properly folded, and then show that the Arp2/3 complex behaves similarly with WT and mutant actin via a TIRF microscopy assay as well as a bulk pyrene-actin assay. A TIRF assay showed a small but significant reduction in the rate of elongation of the mutant actin suggesting only a mild polymerization defect.

      Based on in silico analysis of the close location of the actin point mutation and bound cofilin, cofilin was chosen for further investigation. Faster de novo nucleation by cofilin was observed with mutant actin. In contrast, the mutant actin was more slowly severed. Both effects favor the retention of filamentous mutant actin. In solution, the effect of cofilin concentration and pH was assessed for both WT and mutant actin filaments, with a more limited repertoire of conditions in a TIRF assay that directly showed slower severing of mutant actin.

      Lastly, the mutated residue in actin is predicted to interact with the cardiomyopathy loop in myosin and thus a standard in vitro motility assay with immobilized motors was used to show that non-muscle myosin 2A moved mutant actin more slowly, explained in part by a reduced affinity for the filament deduced from transient kinetic assays. By the same motility assay, myosin 5A also showed impaired interaction with the mutant filaments.

      The Discussion is interesting and concludes that the mutant actin will co-exist with WT actin in filaments, and will contribute to altered actin dynamics and poor interaction with relevant myosin motors in the cellular context. While not an exhaustive list of possible defects, this is a solid start to understanding how this mutation might trigger a disease phenotype.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      • Potential assembly defects of the mutant actin could be more thoroughly investigated if the same experiment shown in Fig. 2 was repeated as a function of actin concentration, which would allow the rate of disassembly and the critical concentration to also be determined.

      The polymerization rate of individual filaments observed in TIRFM experiments showed only minor changes, as did the bulk-polymerization rate of 2 µM actin in pyrene-actin-based experiments. Therefore, we decided not to perform additional pyrene-actin-based experiments titrating the actin concentration, as we expect only very small changes in the critical concentration. Instead, we focused on the disturbed interaction with ABPs, as we assume these defects to be more relevant in an in vivo context. Using pyrene-based bulk experiments, we did determine the rate of dilution-induced depolymerization of mutant filaments and compared it with the values determined for WT (Figure 5A, Table 1).

      • The more direct TIRF assay for cofilin severing was only performed at high cofilin concentration (100 nM). Lower concentrations of cofilin would also be informative, as well as directly examining by the TIRF assay the effect of cofilin on filaments composed of a 50:50 mixture of WT:mutant actin, the more relevant case for the cell.

      The TIRF assay for cofilin severing was performed initially over the cofilin concentration range from 20 to 250 nM. The results obtained in the presence of 100 nM cofilin allow a particularly informative depiction of the differences observed with mutant and WT actin. This applies to the image series showing the changes in filament length, cofilin clusters, and filament number as well as to the graphs showing time dependent changes in the number of filaments and total actin fluorescence. We have not included the results for a 50:50 mixture of WT:mutant actin because its attenuating effect is documented in several other experiments in the manuscript.

      • The more appropriate assay to determine the effect of the actin point mutation on class 5 myosin would be the inverted assay where myosin walks along single actin filaments adhered to a coverslip. This would allow an evaluation of class 5 myosin processivity on WT versus mutant actin that more closely reflects how Myo5 acts in cells, instead of the ensemble assay used appropriately for myosin 2.

      Our results with Myo5A show a less productive interaction with mutant actin filaments, as indicated by a 1.7-fold reduction in the average sliding velocity and an increase in the optimal Myo5A-HMM surface density from 770 to 3100 molecules per µm². These results indicate a reduction in binding affinity and coupling efficiency, with a likely impact on processivity. We expect only a small incremental gain in knowledge about the extent of changes by performing additional experiments with an inverted assay geometry, given that under physiological conditions the motor properties of Myo5A and other cytoskeletal myosins are modulated by other factors such as the presence of tropomyosin isoforms and other actin binding proteins.

      Reviewer #2 (Public Review):

      Greve et al. investigated the effects of a disease-associated gamma-actin mutation (E334Q) on actin filament polymerization, association of selected actin-binding proteins, and myosin activity. Recombinant wildtype and mutant proteins expressed in Sf-9 cells were found to be folded and stable, and the presence of the mutation altered a number of activities. Given the location of the mutation, it is not surprising that there are changes in polymerization and interactions with actin binding proteins. Nevertheless, it is important to quantify the effects of the mutation to better understand disease etiology.

      We thank the reviewer for the positive evaluation of our work.

      Some weaknesses were identified in the paper as discussed below.

      • Throughout the paper, the authors report average values and the standard-error-of-the-mean (SEM) for groups of three experiments. Reporting the SEM is not appropriate or useful for so few points, as it does not reflect the distribution of the data points. When only three points are available, it would be better to just show the three different points. Otherwise, plot the average and the range of the three points.

      We have gone through the manuscript carefully to correct any errors in the statistics, as explained below.

      Figure 1B, 5B, 5C, 5D, 8D, 9B, and 8 – figure supplement 2 all show the mean ± SD, as also correctly reported for Figure 8E and 8F in the figure legend. The statement that these figures show the mean ± SEM was inaccurate. We corrected this mistake for all the listed figures. Furthermore, we now give the exact N for every experiment in the figure legend.

      Figure 2C, 2E, 2F, 4B, 5A, 6B-E showed the mean ± SEM. As suggested by the reviewer, we corrected the figures to show the mean ± SD.

      We still refer to the mean ± SEM in Figure 2B, where elongation rates for more than 100 filaments were recorded, and in Figure 8B, where sliding velocities for several thousand actin filaments were measured.
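      The reason the SEM is a poor descriptor for N = 3 can be shown with a minimal Python sketch. The three replicate values below are hypothetical and are not data from the manuscript; the helper name `summarize` is likewise illustrative.

```python
import math
import statistics


def summarize(values):
    """Return the mean, sample standard deviation (SD) and standard error of the mean (SEM)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, uses the n - 1 denominator
    sem = sd / math.sqrt(n)        # SEM describes the precision of the mean, not the spread of the data
    return mean, sd, sem


# Three hypothetical replicate values (illustrative only, not data from the manuscript).
replicates = [1.0, 2.0, 3.0]
mean, sd, sem = summarize(replicates)
print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")  # mean = 2.00, SD = 1.00, SEM = 0.58
```

      With N = 3 the SEM is smaller than the SD by a factor of √3 ≈ 1.73, so error bars drawn as SEM visually understate the spread of the individual measurements. For the large-N datasets (more than 100 filaments in Figure 2B, several thousand in Figure 8B) the SEM is the appropriate measure of how precisely the mean is known.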

      • The description and characterization of the recombinant actin is incomplete. Please show gels of purified proteins. This is especially important with this preparation since the chymotrypsin step could result in internally cleaved proteins and altered properties, as shown by Ceron et al (2022). The authors should also comment on N-terminal acetylation of actin.

      We added an additional figure showing the purification strategy for the recombinant cytoskeletal γ-actin WT and p.E334Q protein with exemplary SDS-gels from different stages of purification (Figure 1 – figure supplement 1).

      In a previous paper, we reported the mass spectrometric analysis of the post-translational modifications of recombinant human β- and γ-cytoskeletal actin produced in Sf-9 cells (Müller et al., 2013, PLoS One). In the analyzed preparations, the predominant species (> 95 %) was recombinant actin with complete N-terminal processing, i.e., cleavage of the initial methionine and acetylation of the following aspartate (β-actin) or glutamate (γ-actin). While the recombinant actin in the 2013 study was produced tag-free and purified by affinity chromatography using the column-immobilized actin-binding domain of gelsolin (G4-G6), we have no reason to assume that the purification strategy using the actin-thymosin-β4 fusion changes the efficiency of N-terminal processing in Sf-9 cells. This is supported by our as yet unpublished mass-spectrometric studies on recombinant human α-cardiac actin purified using the actin-thymosin-β4 fusion construct, which revealed actin species with an acetylated aspartate-3. This N-terminal modification of α-cardiac actin is catalyzed by the same actin-specific acetyltransferase (NAA80) as the acetylation of aspartate-2 or glutamate-2 in cytoskeletal actin isoforms (Varland et al., 2019, Trends in Biochemical Sciences). Furthermore, other studies that used the actin-thymosin-β4 fusion construct to produce recombinant human cytoskeletal actin isoforms in Pichia pastoris reported robust N-terminal acetylation when the actin was co-produced with NAA80, which, in contrast to Sf-9 cells, is not endogenously expressed in Pichia pastoris (Hatano et al., 2020, Journal of Cell Science).

      We therefore added the following statement to the manuscript:

      “Purification of the fusion protein by immobilized metal affinity chromatography, followed by chymotrypsin-mediated cleavage of C-terminal linker and tag sequences, results in homogeneous protein without non-native residues and with native N-terminal processing, which includes cleavage of the initial methionine and acetylation of the following glutamate.”

      • The authors do not use the best technique to assess actin polymerization parameters. Although the TIRF assay is excellent for some measurements, it is not as good as the standard pyrene-actin assays that provide critical concentration, nucleation, and polymerization parameters. The authors use pyrene-actin in other parts of the paper, so it is not clear why they don't do the assays that are the standard in the actin field.

      The polymerization rate of individual filaments observed in TIRFM experiments showed only minor changes, as did the bulk-polymerization rate of 2 µM actin in pyrene-actin-based experiments. Therefore, we decided not to perform additional pyrene-actin-based experiments titrating the actin concentration, as we expect only very small changes in the critical concentration. Instead, we focused on the disturbed interaction with ABPs, as we assume these defects to be more relevant in an in vivo context. Using pyrene-based bulk experiments, we did determine the rate of dilution-induced depolymerization of mutant filaments and compared it with the values determined for WT (Figure 5A, Table 1).

      • The authors' data suggest that, while the binding of cofilin-1 to both the WT and mutant actins remains similar, the major defect of the E334Q actin is that it is not as readily severed/disassembled by cofilin. What is missing is a direct measurement of the severing rate (number of breaks per second) as measured in TIRF.

      The severing rate as measured in TIRF is dependent on a number of parameters in a nonlinear manner. Therefore, we opted to show the combination of images directly showing the progress of the reaction and graphs summarizing the concomitant changes in cofilin clusters, actin filaments, actin-related fluorescence intensity and cofilin-related fluorescence intensity.

      • Figure 4 shows that the E334Q mutation increases rather than decreases the number of filaments that spontaneously assemble in the TIRF assay, but it is unclear how reduced severing would lead to increased filament numbers, rather, the opposite would be expected. A more straightforward approach would be to perform experiments where severing leads to more nuclei and therefore enhances the net bulk assembly rate.

      Figure 4 shows polymerization experiments that were started from ATP-G-actin in the presence of cofilin-1. These experiments clearly show that, especially at the higher cofilin-1 concentration (100 nM), the filament number is strongly increased in experiments performed with mutant actin. Inspection of the corresponding TIRFM videos suggests that the increased number of filaments must result from an increased number of de novo nucleation events and not primarily from a mutation-induced change in severing susceptibility. The cofilin-stimulated increase in the de novo nucleation efficiency of actin was initially described by Andrianantoandro & Pollard (2006, Molecular Cell) using TIRFM-based experiments and is thought to arise from the stabilization of thermodynamically unfavorable actin dimers and trimers by cofilin. While the exact role of this cofilin-mediated effect in vivo is not completely clear, it is thought to contribute to cofilin-mediated actin dynamics synergistically with cofilin-mediated severing. It is therefore necessary to clearly distinguish between the two effects of cofilin in vitro: stimulation of de novo nucleation and stimulation of filament disassembly. Our data indicate that the E334Q mutation affects these two effects differentially, as we state in the abstract and in the discussion.

      Abstract: “E334Q differentially affects cofilin-mediated actin dynamics by increasing the rate of cofilin-mediated de novo nucleation of actin filaments and decreasing the efficiency of cofilin-mediated filament severing.”

      Discussion: “Cofilin-mediated severing and nucleation were previously proposed to synergistically contribute to global actin turnover in cells (Andrianantoandro & Pollard, 2006; Du & Frieden, 1998). Our results show that the mutation affects these different cofilin functions in actin dynamics in opposite ways. Cofilin-mediated filament nucleation is more efficient for p.E334Q monomers, while cofilin-mediated severing of filaments containing p.E334Q is significantly reduced. The interaction of both actin monomers and actin filaments with ADF/cofilin proteins involves several distinct overlapping reactions. In the case of actin filaments, cofilin binding is followed by structural modification of the filament, severing and depolymerizing the filament (De La Cruz & Sept, 2010). Cofilin binding to monomeric actin is followed by the closure of the nucleotide cleft and the formation of stabilized “long-pitch” actin dimers, which stimulate nucleation (Andrianantoandro & Pollard, 2006)”.

      We interpret the reviewer's suggestion to mean that additional pyrene-actin-based bulk polymerization experiments should be performed to investigate the bulk-polymerization rate of ATP-G-actin in the presence of cofilin-1. In our understanding, these experiments would not provide additional value because 1) an observed increase in the bulk-polymerization rate cannot be directly attributed to a change in the efficiency of de novo nucleation or of severing, and 2) the effect of the mutation on cofilin-mediated filament disassembly was extensively analyzed in other experiments starting from preformed actin filaments. Moreover, our results are consistent with in silico modelling and normal mode analysis of the WT and mutant actin-cofilin complex.

      • Figure 5 A: in the pyrene disassembly assay, where actin is diluted below its critical concentration, cofilin enhances the rate of depolymerization by generating more free ends. The E334Q mutation leads to decreased cofilin-induced severing and therefore lower depolymerization. While these data seem convincing, it would be better to present them as an XY plot and fit the data to lines for comparison of the slopes.

      We now present the data as suggested by the reviewer. Furthermore, we determined the apparent second-order rate constant for cofilin-induced F-actin depolymerization (kC) to quantify the observed differences between WT, mutant and heterofilaments.

      The paragraph describing these results was changed accordingly:

      “The observed rate constant values are linearly dependent on the concentration of cofilin-1 in the range 0–40 nM, with the slope corresponding to the apparent second-order rate constant (kC) for the cofilin-1-induced depolymerization of F-actin. In experiments performed with p.E334Q filaments, the value obtained for kC was 4.2-fold lower (0.81 × 10⁻⁴ ± 0.08 × 10⁻⁴ nM⁻¹ s⁻¹) than in experiments with WT filaments (3.42 × 10⁻⁴ ± 0.22 × 10⁻⁴ nM⁻¹ s⁻¹). When heterofilaments were used, the effect of the mutation was reduced to a 2.2-fold difference compared to WT filaments (1.54 × 10⁻⁴ ± 0.11 × 10⁻⁴ nM⁻¹ s⁻¹).”
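      Determining kC amounts to an ordinary least-squares line through the observed rate constants plotted against cofilin-1 concentration. The Python sketch below shows this under stated assumptions. The kobs values are hypothetical and generated from an exact line with an assumed intercept, standing in for the cofilin-independent depolymerization rate. The function name `linear_fit` is illustrative. The fold differences, by contrast, use the kC values reported above.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical k_obs values (s^-1) over 0-40 nM cofilin-1, generated from an exact line
# with an assumed intercept of 1e-3 s^-1 (illustrative only, not measured data).
cofilin_nM = [0.0, 10.0, 20.0, 30.0, 40.0]
k_obs = [1.0e-3 + 3.42e-4 * c for c in cofilin_nM]
k_c, k_0 = linear_fit(cofilin_nM, k_obs)
print(f"k_C = {k_c:.3e} nM^-1 s^-1, intercept = {k_0:.3e} s^-1")

# Fold differences from the reported k_C values (in units of 1e-4 nM^-1 s^-1).
k_c_reported = {"WT": 3.42, "p.E334Q": 0.81, "heterofilaments": 1.54}
fold_mutant = k_c_reported["WT"] / k_c_reported["p.E334Q"]
fold_hetero = k_c_reported["WT"] / k_c_reported["heterofilaments"]
print(f"WT / p.E334Q = {fold_mutant:.1f}-fold, WT / heterofilaments = {fold_hetero:.1f}-fold")
```

      Because kC is a slope, it is independent of any cofilin-independent baseline depolymerization. This is why comparing slopes, rather than single time courses, isolates the effect of the mutation on cofilin-induced disassembly.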

      • Figure 5 B and C: the cosedimentation data do not seem to help elucidate the underlying mechanism. While the authors report statistical significance, differences are small, especially for gel densitometry measurements where the error is high, which suggests that there may be little biological significance. Importantly, example gels from these experiments should be shown, if not the complete set included in the supplement. In B, the higher cofilin concentrations would be expected to stabilize the filaments and thus the curve should be U-shaped.

      We do not completely agree with the reviewer on this point. We think the co-sedimentation experiments are useful, as they show that cofilin-1 efficiently binds to mutant filaments, but is less efficient in stimulating disassembly in these endpoint-experiments. This information is not provided by the analysis of the effect of cofilin-1 on the bulk-depolymerization rate and adds to our understanding of the defect of the actin-cofilin interaction for the mutant.

      While we agree with the reviewer on the point that co-sedimentation experiments must be repeated several times to produce reliable data, we cannot fully grasp the reasoning behind the statement “While the authors report statistical significance, differences are small, especially for gel densitometry measurements where the error is high, which suggests that there may be little biological significance.”. We interpret this statement as advice to be cautious when extrapolating the observed perturbances of cofilin-mediated actin dynamics in vitro to the in vivo context. We think we are cautious about this throughout the manuscript.

      The reviewer expects a U-shaped curve, as high cofilin concentrations are reported to stabilize actin filaments by completely decorating them before severing-prone boundaries between cofilin-decorated and undecorated regions are generated. We have also performed these experiments with cytoskeletal β-actin and human cofilin-1 and never observed this U shape. This indicates that significant filament disassembly also happens at high cofilin concentrations, most likely directly after mixing of F-actin and cofilin. We cannot rule out that the incubation time plays an important role and that the U shape only appears after longer incubation times. We also want to direct the reviewer to the publication “A Mechanism for Actin Filament Severing by Malaria Parasite Actin Depolymerizing Factor 1 via a Low Affinity Binding Interface” (Wong et al. 2013, JBC), in which comparable co-sedimentation experiments were performed (Figure 5E-G) with rabbit skeletal α-actin and human cofilin-1. No U-shaped curves were observed there either, even at a higher molar excess of cofilin-1 than in our experiments and with longer incubation times (1 hour vs. 10 minutes).

      We have now included an exemplary gel showing co-sedimentation experiments performed with WT and mutant actin and different concentrations of cofilin at pH 7.8 in the manuscript (Figure 5 – figure supplement 2).

      • Figure 5 D: these data show that the binding of cofilin to WT and E334Q actin is approximately the same, with the mutant binding slightly more weakly. It would be clearer if the two plots were normalized to their respective plateaus since the difference in arbitrary units distracts from the conclusion of the figure. If the difference in the plateaus is meaningful, please explain.

      As suggested by the reviewer, we normalized the data for a better understanding of the message conveyed.
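      Normalizing each binding curve to its own plateau removes the difference in arbitrary units so that only the shape of the curves is compared. A minimal Python sketch follows. It assumes, as a simplification, that the plateau can be estimated from the mean of the last few points. The signals and the helper name `normalize_to_plateau` are hypothetical. Using the plateau of a fitted binding isotherm would be more rigorous.

```python
def normalize_to_plateau(signal, n_plateau_points=3):
    """Scale a binding curve so that its plateau equals 1.

    The plateau is estimated, as a simplifying assumption, from the mean of the
    last n_plateau_points values (the highest cofilin concentrations).
    """
    tail = signal[-n_plateau_points:]
    plateau = sum(tail) / len(tail)
    return [value / plateau for value in signal]


# Hypothetical binding signals in arbitrary units (illustrative only).
wt_au = [0.0, 40.0, 80.0, 100.0, 100.0, 100.0]
mutant_au = [0.0, 25.0, 55.0, 72.0, 75.0, 78.0]
print(normalize_to_plateau(wt_au))
print([round(v, 3) for v in normalize_to_plateau(mutant_au)])
```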

      • Figure 6: It is assumed that the authors are trying to show in this figure that cofilin binds both actins approximately the same but does not sever as readily for E334Q actin. The numerous parameters measured do not directly address what the authors are actually trying to show, which presumably is that the rate of severing is lower for E334Q than WT. It is therefore puzzling why no measurement of severing events per second per micron of actin in TIRF is made, which would give a more precise account of the underlying mechanism.

      The severing rate as measured in TIRF is dependent on a number of parameters in a nonlinear manner. Therefore, we opted to show the combination of images directly showing the progress of the reaction and graphs summarizing the concomitant changes in cofilin clusters, actin filaments, actin-related fluorescence intensity and cofilin-related fluorescence intensity.

      • Actin-activated steady-state ATPase data of the NM2A with mutant and WT actin would have been extremely useful and informative. The authors show the ability to make these types of measurements in the paper (NADH assay), and it is surprising that they are not included for assessing the myosin activity. It may be because of limited actin quantities. If this is the case, it should be indicated.

      Indeed, the measurement of the steady-state actin-activated ATPase with recombinant cytoskeletal actin is very material-intensive and therefore costly, as a complete titration of actin is required for the generation of meaningful data. Since the vast majority of our assays involving a myosin family member were performed with NM2A-HMM, we decided to perform a full actin titration of the steady-state actin-activated ATPase of NM2A-HMM with WT and mutant filaments. The results of these experiments are now shown in Figure 8C. The panel showing the results used for determining the dissociation rate constants (k-A) for the interaction of NM2C-2R with p.E334Q or WT γ-actin in the absence of nucleotide was moved to the supplement (Figure 8 – figure supplement 2).

      We added the following paragraph to the Material and Methods section concerning the Steady-State ATPase assay:

      “For measurements of the basal and actin-activated NM2A-HMM ATPase, 0.5 µM MLCK-treated HMM was used. Phalloidin-stabilized WT or mutant F-actin was added over the range of 0–25 µM. The change in absorbance at 340 nm due to oxidation of NADH was recorded in a Multiskan FC Microplate Photometer (Thermo Fisher Scientific, Waltham, MA, USA). The data were fitted to the Michaelis-Menten equation to obtain values for the actin concentration at half-maximal activation of ATP turnover (Kapp) and for the maximum ATP turnover at saturating actin concentration (kcat).”
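      For reference, the Michaelis-Menten relation used for these fits can be sketched in Python and evaluated with the reported Kapp and kcat values. The optional basal term `v_basal` is an assumption added for illustration, since the text does not state whether the basal rate was fitted separately. The function name `atpase_rate` is hypothetical.

```python
def atpase_rate(actin_uM, k_cat, k_app, v_basal=0.0):
    """Michaelis-Menten dependence of ATP turnover (s^-1) on F-actin concentration (µM).

    v_basal is an optional, assumed term for the basal (actin-free) ATPase.
    """
    return v_basal + k_cat * actin_uM / (k_app + actin_uM)


# Reported fit parameters: (kcat in s^-1, Kapp in µM).
params = {"WT": (0.097, 3.20), "p.E334Q": (0.076, 2.89)}

for name, (k_cat, k_app) in params.items():
    # At [actin] = Kapp the actin-activated turnover is half of kcat.
    half = atpase_rate(k_app, k_cat, k_app)
    at_25 = atpase_rate(25.0, k_cat, k_app)
    print(f"{name}: v(Kapp) = {half:.4f} s^-1, v(25 µM) = {at_25:.4f} s^-1")

k_cat_wt, k_cat_mut = params["WT"][0], params["p.E334Q"][0]
reduction = (k_cat_wt - k_cat_mut) / k_cat_wt
print(f"kcat(p.E334Q) is {reduction:.0%} lower; kcat(WT) is {k_cat_wt / k_cat_mut:.2f}-fold higher")
```

      With the reported values, kcat for p.E334Q filaments is about 22% lower than for WT filaments. Equivalently, kcat for WT is about 1.28-fold higher, so the direction of the comparison determines which percentage is quoted.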

      Furthermore, we added a description of the results of the experiments to the Results section of the manuscript:

      “Using an NADH-coupled enzymatic assay, we determined the ability of p.E334Q and WT filaments to activate the ATPase of NM2A-HMM over the range of 0–25 µM F-actin (Figure 8C). While we observed no significant difference in Kapp, the actin concentration at half-maximal activation, between experiments with p.E334Q filaments (2.89 ± 0.49 µM) and WT filaments (3.20 ± 0.74 µM), we observed a 22% lower maximal ATP turnover at saturating actin concentration (kcat) with p.E334Q filaments (0.076 ± 0.005 s⁻¹ vs. 0.097 ± 0.002 s⁻¹).”

      • (line 310) The authors state that they "noticed increased rapid dissociation and association events for E334Q filaments" in the motility assay. This observation motivates the authors to assess actin affinities of NM2A-HMM. Although differences in rigor and AM.ADP affinities are found between mutant and WT actins, the actin attachment lifetimes (many minutes) are unlikely to be related to the rapid association and dissociation event seen in the motility assay. Rather, this jiggling is more likely to be related to a lower duty ratio of the myosins, which appears to be the conclusion reached for the myosin-V data. These points should be clarified in the text.

      We changed the text in accordance with the reviewer’s suggestion. It now reads: Cytoskeletal γ-actin filaments move with an average sliding velocity of 195.3 ± 5.0 nm s⁻¹ on lawns of surface-immobilized NM2A-HMM molecules (Figure 8A, B). For NM2A-HMM densities below about 10,000 molecules per μm², the average sliding speed of cytoskeletal actin filaments drops steeply (Hundt et al., 2016). Filaments formed by p.E334Q actin move 5-fold slower, resulting in an observed average sliding velocity of 39.1 ± 3.2 nm s⁻¹. Filaments copolymerized from a 1:1 mixture of WT and p.E334Q actin move with an average sliding velocity of 131.2 ± 10 nm s⁻¹ (Figure 8A, B). When equal densities of surface-attached WT and mutant filaments were used, we observed that the number of rapid dissociation and association events increased markedly for p.E334Q filaments (Figure 8 – video supplements 7–9).

      Using an NADH-coupled enzymatic assay, we determined the ability of p.E334Q and WT filaments to activate the ATPase of NM2A-HMM over the range of 0–25 µM F-actin (Figure 8C). While we observed no significant difference in Kapp, the actin concentration at half-maximal activation, between experiments with p.E334Q filaments (2.89 ± 0.49 µM) and WT filaments (3.20 ± 0.74 µM), we observed a 22% lower maximal ATP turnover at saturating actin concentration (kcat) with p.E334Q filaments (0.076 ± 0.005 s⁻¹ vs. 0.097 ± 0.002 s⁻¹). To investigate the impact of the mutation on actomyosin affinity using transient-kinetic approaches, we determined the dissociation rate constants using a single-headed NM2A-2R construct (Figure 8D). …..
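      The fold differences in sliding velocity quoted in the revised text can be checked directly against the reported averages. The short Python sketch below uses only the values given above.

```python
# Reported average sliding velocities on NM2A-HMM lawns (nm s^-1).
velocities = {"WT": 195.3, "1:1 WT/p.E334Q": 131.2, "p.E334Q": 39.1}

for name, v in velocities.items():
    fold = velocities["WT"] / v  # how many times slower than WT filaments
    print(f"{name}: {v} nm/s, {fold:.2f}-fold slower than WT")
```

      The ratio for p.E334Q filaments is close to 5, consistent with the "5-fold slower" statement. Heterofilaments move about 1.5-fold slower than WT, which is intermediate between the two homofilament cases.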

      • (line 327) The authors report that the 1/K1 value is unchanged. There are no descriptions of this experiment in the paper. I am assuming the authors measured the ATP-induced dissociation of actomyosin and determined ATP affinity (K1) from this experiment. If this is the case, they should describe the experiment and show the data, provide a second-order rate constant for ATP binding, and report the max rate of dissociation (k2). This is a kinetic experiment done frequently by this group, so the absence of these details is surprising.

      In the previous version of the manuscript, the method used to determine 1/K1 (ATP-induced dissociation of the actomyosin complex) was described in the Material and Methods paragraph “Transient kinetic analysis of the actomyosin complex” and the values obtained for 1/K1 were given in Table 1. We now included the experimental data as an additional figure in the manuscript (Figure 8 – figure supplement 3). Furthermore, we also give the maximal dissociation rate k+2 and the apparent second-order rate constant for ATP-binding (K1k+2) for the WT and mutant actomyosin complex in Table 1. Therefore, we changed the paragraph in the Results section concerning this experiment to:

      “The apparent ATP–affinity (1/K1), the maximal dissociation rate of NM2A from F-actin in the presence of ATP (k+2), and the apparent second-order rate constant of ATP binding (K1k+2) showed no significant differences for complexes formed between NM2A and WT or p.E334Q filaments (Table 1, Figure 8 – figure supplement 3).”

      and the section in the Material and Methods to:

      “The apparent ATP affinity of the actomyosin complex was determined by mixing the apyrase-treated, pyrene-labeled, phalloidin-stabilized actomyosin complex with increasing concentrations of ATP in the stopped-flow system. Fitting an exponential function to the individual transients yields the ATP-dependent dissociation rate of NM2A-2R from F-actin (kobs). The kobs values were plotted against the corresponding ATP concentrations and a hyperbola was fitted to the data. The fit yields the apparent ATP affinity (1/K1) of the actomyosin complex and the maximal dissociation rate k+2.

      The apparent second-order rate constant for ATP binding (K1k+2) was determined by applying a linear fit to the data obtained at low ATP concentrations (0–25 µM).”
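      The relation between the hyperbolic fit and the low-ATP linear fit can be sketched in Python. For kobs = k+2[ATP]/(1/K1 + [ATP]), the initial slope tends to K1k+2 when [ATP] is much smaller than 1/K1. The parameter values below are hypothetical, not the values in Table 1. The assumption that the linear fit passes through the origin is also ours, since the text does not specify it.

```python
def k_obs_atp(atp_uM, k2_max, one_over_k1):
    """Hyperbolic ATP dependence of the dissociation rate: k_obs = k+2 [ATP] / (1/K1 + [ATP])."""
    return k2_max * atp_uM / (one_over_k1 + atp_uM)


def slope_through_origin(xs, ys):
    """Least-squares slope m of y = m * x (assumes k_obs -> 0 as [ATP] -> 0)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical parameters (illustrative only, not the values in Table 1).
k2_max = 500.0        # maximal dissociation rate k+2, s^-1
one_over_k1 = 1000.0  # apparent ATP affinity 1/K1, µM

low_atp = [5.0, 10.0, 15.0, 20.0, 25.0]  # µM, within the 0-25 µM window used for the linear fit
k_obs = [k_obs_atp(a, k2_max, one_over_k1) for a in low_atp]

k1_k2_fit = slope_through_origin(low_atp, k_obs)
k1_k2_true = k2_max / one_over_k1  # K1 * k+2 in µM^-1 s^-1
print(f"initial slope = {k1_k2_fit:.3f}, K1*k+2 = {k1_k2_true:.3f} µM^-1 s^-1")
```

      Because the hyperbola already curves within the fitting window, the linear slope slightly underestimates K1k+2. The bias is small only when the highest ATP concentration used is well below 1/K1.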

      For a better understanding of the numerous rate and equilibrium constants, we have now included a figure showing the kinetic reaction scheme of the myosin ATPase cycle (Figure 8 – figure supplement 1).

      Recommendations for the authors:

      Reviewer #1:

      • The subdomains of actin are mislabeled in Fig. 1A.

      The labeling of the subdomains has been corrected.

      • Additional experimental data addressing the 3 weaknesses noted in the public review would be informative but are not essential in my opinion. Examining the effect of cofilin on severing by the TIRF assay in more detail and using a processivity assay for myosin V (immobilized actin) would be the two aspects I would most value.

      The TIRF assay for cofilin severing was performed initially over the cofilin concentration range from 20 to 250 nM. The results obtained in the presence of 100 nM cofilin allow a particularly informative depiction of the differences observed with mutant and WT actin. This applies to the image series showing the changes in filament length, cofilin clusters, and filament number as well as to the graphs showing time dependent changes in the number of filaments and total actin fluorescence. We have not included the results for a 50:50 mixture of WT:mutant actin because its attenuating effect is documented in several other experiments in the manuscript.

      Our results with Myo5A show a less productive interaction with mutant actin filaments, as indicated by a 1.7-fold reduction in the average sliding velocity and an increase in the optimal Myo5A-HMM surface density from 770 to 3100 molecules per µm². These results indicate a reduction in binding affinity and coupling efficiency, with a likely impact on processivity. Given that Myo5A is only one of many cytoskeletal myosin motors and that the motor properties of all myosins are modulated by the presence of tropomyosin isoforms and other actin binding proteins, we expect only a small incremental gain in knowledge by performing additional experiments with an inverted assay geometry.

      Reviewer #2:

      • The authors should address the concerns regarding the statistical methodologies.

      We have gone through the manuscript carefully to correct any errors in the statistics, as explained below.

      Figure 1B, 5B, 5C, 5D, 8D, 9B, and 8 – figure supplement 2 all show the mean ± SD, as also correctly reported for Figure 8E and 8F in the figure legend. The statement that these figures show the mean ± SEM was wrong, and we corrected this mistake for all the listed figures. Furthermore, we now give the exact N for every experiment in the figure legend.

      Figure 2C, 2E, 2F, 4B, 5A, 6B-E indeed showed the mean ± SEM. As the reviewer rightly points out, this is not the appropriate way to deal with such sample sizes. We therefore corrected the figures to show the mean ± SD.

      We still refer to the mean ± SEM in Figure 2B, where elongation rates for more than 100 filaments were recorded, and in Figure 8B, where sliding velocities for several thousand actin filaments were measured.

      • The authors should present the actin titration of the steady state ATPase activity for at least one of the myosins, or preferably all of them.

      An actin titration of the steady-state ATPase activity of NM2A-HMM has been included in the revised version of the manuscript (Fig 8C).

      • The authors should consider the use of pyrene-actin in measuring the assembly/disassembly of actin.

      Values for the rate of actin assembly/disassembly measured with pyrene-actin are given in Table 1. Based on the small changes observed, we did not determine the critical actin concentration for the mutant construct.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      We thank reviewer #1 for identifying the major caveats of the paper, and have split them out into separate comments below to address them.

      Comment 1) The caveats are that ecosystem processes beyond water availability are not investigated although they are brought into play in the title and in the paper

      Author response: We disagree that water availability is the only ecosystem process investigated in this study, as herbivory, plant mortality, and the maintenance of diversity in higher trophic levels are important processes within ecosystems. We have added text to the abstract and introduction clarifying that we consider these response measures to be ecosystem processes. Further language to this effect already exists in the abstract, methods, and discussion.

      Comment 2) That herbivory beyond leaf damage was not reported (there might be none, the reader needs to be shown the evidence for this)

      Author response: This is typically how herbivory is assessed in ecological studies, and our focus is on folivores. There may be additional herbivory in the form of fluid-sucking insects, shoot/root herbivory, etc., but these were not assessed. It would be interesting to assess these other forms of herbivory to see if they respond similarly with additional studies.

      Comment 3) That herbivore diversity is defined by leaf damage (authors need to give evidence that this is a valid inference)

      Author response: We thank reviewer #1 for pointing out the lack of written support for this claim. We have modified the methods (lines 138-139; 214-217) to clarify that this is a useful proxy for insect richness in the Piper system, and have added citations demonstrating it has been found to correlate well with insect richness in tropical forests.

      Comment 4) That the plots were isolated from herbivores beyond their borders

      Author response: This was not an assumption of the study. We have modified the methods (line 200) to make this clearer to the reader.

      Comment 5) That the effects of extreme climate events were isolated to Peru

      Author response: This was not an assumption of the study, rather it is an observation. While we consider it important to include observed climate differences between sites in the interpretation of our results, it was not necessary for there to be extreme climate events at other sites as we consider manipulated water availability to represent changes in precipitation that are expected to occur at these sites with climate change.

      Comment 6) That intraspecific variation in the host plants needs to be explained and interpreted in more detail

      Author response: We thank reviewer #1 for identifying that our current explanations needed development. We have modified the introduction to explore potential mechanisms relating intraspecific diversity to ecosystem function based on recent studies, and have modified the discussion to focus on why the effects of intraspecific richness differ from those of interspecific richness.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1) Pare this material down to simpler results. The most significant to me is the intraspecific variation in damage. Were this broken out and reported in some detail it could be quite interesting. I find the results to be a confusing blizzard of multiple factors that differ among sites; after reading the paper twice I could not recall the takeaway lesson beyond that drought wrecks the diversity of herbivores and sometimes even kills the host plant.

      Author response: We agree that the results are complicated given the variation in effects among sites, but this variation and complexity is important and is in itself one of the takeaway points. Unfortunately, nature is not simple. We have made several large edits to the results section, including the removal of methodological and otherwise redundant information, to bring the major takeaways into focus.

      Reviewer #2 (Public Review):

      Comment 1) This is an important and large experimental study examining the effects of plant species richness, plant genotypic richness, and soil water availability on herbivory patterns on Piper species in tropical forests.

      A major strength is the size of the study and the fact that it tackled so many potentially important factors simultaneously. The authors examined both interspecific plant diversity and intraspecific plant diversity. They crossed that with a water availability treatment. And they repeated the experiment across five geographically separated sites.

      The authors find that both water availability and plant diversity, intraspecific and interspecific, influence herbivore diversity and herbivory, but that the effects differ in important ways across sites. I found the study to be solid and the results to be very convincing. The results will help the field grapple with the importance of environmental change and biodiversity loss and how they structure communities and alter species interactions.

      Author response: We thank reviewer #2 for their kind words.

      Reviewer #2 (Recommendations For The Authors):

      Comment 1) I was confused about why the authors measured species diversity/richness as a proportion of the species pool. This means that the metric of richness decreases if species are added to the species pool but not the plot/experiment. I think I understand it, but I suggest the authors explain this choice.

      Author response: We thank reviewer #2 for pointing out that this was confusing. We have clarified the methods (lines 228-232) to explain that this choice was made to allow easier comparison between intra- and interspecific richness.

      Comment 2) One of the stronger estimated relationships was a positive effect of plant species richness on insect richness. I found it a little hard to interpret this relationship. Is this just because there are host species specialists? So, with more host species there are more herbivore species? Or does insect richness increase multiplicatively with increasing plant species richness? One way to look for this would be for the authors to examine the relationship between plant species richness and the average number of herbivore damage types per plant species.

      Author response: We agree that this is important for the reader to understand and have added text to the introduction and discussion sections explaining that this is the expectation based on theory and other empirical studies. We have additionally added text to the discussion (lines 386-388) pointing out that this pattern was not observed at all sites. While we agree that it would be interesting to explore if this effect was additive or multiplicative, we do not believe this is in the scope of the paper due to the methods used to measure insect richness.

      Comment 3) Unless I missed it, some important information about the models was missing. E.g., what distributions were assumed for each of the variables? Any transformations?

      Author response: We thank reviewer #2 for pointing this out. This information has been added to the methods (lines 272-274).

      Comment 4) Why is there no model with water addition affecting insect richness directly but not percent herbivory directly?

      Author response: While we originally decided to not include this model due to lack of theoretical support and low statistical performance, we have added references to this model (now model II) in the methods and results for consistency and to make model performance clearer to the reader. We have additionally moved supplemental table S1 to the main text to make the models and hypotheses tested by each model more accessible.

      Comment 5) Fig. 2. What are the percentages above the figures? Maybe PD values?

      Author response: These values are now clarified in the figure caption.

      Comment 6) L364 "can differ dramatically" This is vague and confusing. Differ in what way? From each other? Did the authors really expect plant richness to have the same effect on herbivory and plant survival? What would it mean anyway for plant richness to have the same effect on herbivory and plant survival?

      Author response: We agree that the language here is confusing and thank reviewer #2 for drawing our attention to it. We have modified the discussion (lines 363-365) to clarify that the direction of the effect of intraspecific richness can differ from the direction of the effect of interspecific richness, rather than that the effects on different response variables differ from each other.

      Comment 7) L 375 "only meaningful differences" This statement feels a little overly strong. It seems like there is a good argument for this, but there could be other things going on.

      Author response: We agree that the language here was unnecessarily strong, and have modified the discussion (lines 398-403) to focus on the lack of difference between methodologies at these two sites, and the observed differences in climate and community structure at each site.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors aimed to investigate how cells respond to dynamic combinations of two stresses compared to dynamic inputs of a single stress. They applied the two stresses - carbon stress and hyperosmotic stress - either in or out of phase, adding and removing glucose and sorbitol.

      Both a strength and a weakness, as well as the main discovery, is that the cells' hyperosmotic response strongly requires glucose. For in-phase stress, cells are exposed to hyperosmotic shock without glucose, limiting their ability to respond with the well-studied HOG pathway; for anti-phase stress, cells do have glucose when hyperosmotically shocked, but experience a hypo-osmotic shock when both glucose and sorbitol are simultaneously removed. Responding with the HOG pathway and so amassing intracellular glycerol amplifies the impact of this hypo-osmotic shock. Counterintuitively then, it is the presence of glucose rather than the stress of its absence that is deleterious for the cells.

      The bulk of the paper supports these conclusions with clean, compelling time-lapse microscopy, including extensive analysis of gene deletions in the HOG network and measurements of both division and death rates. The methodology the authors develop is powerful and widely applicable.

      Some discussion of the value of applying periodic inputs would be helpful. Cells are unlikely to have previously seen such inputs, and periodic stimuli may reveal behaviours that are rarely relevant to selection.

      We thank the referee for his review. To answer the reviewer’s last comment, our main objective was not to study conditions that are ecologically relevant, but rather to perturb the system in an original way to reveal new mechanisms and properties of the system. The main advantage of periodic inputs over more complex or unpredictable types of temporal fluctuations is that they can be defined with few parameters that are easy to interpret and to integrate in biophysical models. For instance, by using periodic inputs we were able to investigate how changing the phasing of two stresses impacted fitness while keeping other parameters constant (the duration of each stress was kept constant). We added two sentences at the beginning of the discussion to highlight the value of using periodic inputs.

      We do not fully agree with the reviewer’s statement that periodic stimuli may reveal behaviours that are rarely relevant to selection. Indeed, many parameters of natural environments are known to vary periodically, such as light, temperature, predation, tides. Even if the periodic stimuli we use are artificial, they can still be a valuable tool to reveal new molecular processes. For instance, null mutants have been invaluable to understand biological systems despite being unlikely to reveal behaviours relevant to selection.

      The authors' findings demonstrate the tight links that can exist between metabolism and the ability to respond to stress. Their study appears to have parted somewhat from their original aim because of the HOG pathway's reliance on glucose. It would be interesting to see whether the cells' behaviour is simpler in periodically varying sorbitol and a stress with little known connection to the HOG network, such as nitrogen stress.

      The use of periodic nitrogen stress is a very interesting suggestion from both reviewers. However, we think it represents a large amount of work that deserves its own study. In particular, it would require first identifying a relevant period at which nitrogen fluctuations have an impact on division rate similar to what we observed for glucose fluctuations before performing experiments in AS and IPS conditions.

      Nitrogen starvation is known to induce filamentous growth via activation of components of the HOG pathway (Cullen and Sprague, 2012), with potential cross-talk between filamentous growth and hyperosmotic stress response. Therefore, periodic osmotic stress and periodic nitrogen starvation may interact in a complex way.

      Reviewer #2 (Public Review):

      The authors have used microfluidic channels to study the response of budding yeast to variable environments. Namely, they tested the ability of the cells to divide when the medium was repeatedly switched between two different conditions at various frequencies. They first characterized the response to changes in glucose availability or to hyper-osmotic stress induced by adding sorbitol to the medium. Subsequently, the two stresses were combined by applying them alternately or simultaneously (in phase). Interestingly, they observed that the in-phase stress pattern allowed more divisions and low levels of cell mortality compared to the alternating stresses, where cells divided slowly and many cells died. A number of mutants in the HOG pathway were tested in these conditions to evaluate their responses. Moreover, the activation of the MAPK Hog1 and the transcriptional induction of the hyper-osmotic stress promoter STL1 were quantified by fluorescence microscopy.

      Overall, the manuscript is well structured and data are presented in a clear way. The time-lapse experiments were analyzed with high precision. The experiments confirm the importance of performing dynamic analysis of signal transduction pathways. While the experiments reveal some unexpected behavior, I find that the biological insights gained on this system remain relatively modest.

      In the discussion section, the authors mention two important behaviors that their data unveil: resource allocation (between glycolysis and HOG-driven adaptation) and regulation of the HOG pathway based on the presence of glucose. These behaviors had already been observed in other reports (Sharifian et al. 2015 or Shen et al. 2023, for instance). I find that this manuscript does not provide many additional insights into these processes.

      We thank the referee for his review. We agree with the reviewer that the interaction between glucose availability and osmotic stress response has been investigated in previous studies. However, this interaction was investigated using experimental procedures that differed from our approach in critical ways, and therefore the behaviors observed were not the same. In Sharifian et al. (2015), the authors identified a new negative feedback loop regulating Hog1 basal activity and described the underlying molecular mechanisms. This feedback loop is unlikely to explain the differences in cell fitness we observed in IPS and AS conditions, because 1) differences in division rate were still observed in hog1 mutant cells and 2) differences in death rate involve glycerol synthesis, which is independent of the feedback loop described in Sharifian et al. (2015). In Shen et al. (2023), the authors observed a stronger expression of Hog-responsive genes at lower glucose concentrations, which seems to contradict our observation of very low pSTL1-GFP expression in the absence of glucose. However, they did not use fluctuating conditions, and they did not report expression of stress-response genes when glucose was totally depleted (the lowest glucose concentration they used was 0.02%) as we did, which may explain the different outcomes. We added three sentences in the discussion to compare our findings to those of Shen et al. (2023).

      One clear piece of evidence presented, however, is the link between glycerol accumulation during the sorbitol treatment and the cell death phenotype upon starvation in the alternating stress condition. However, no explanation or hypothesis is formulated for the mechanism of resource allocation between glycolysis and the HOG response that could explain the poor growth under alternating stresses or the lack of adaptation of Hog1 activity in the absence of glucose.

      In the revised version of the manuscript, we included a new results section and a supplementary figure (Figure 4 – figure supplement 2) where we tested three hypotheses to explain the lower division rate observed in the AS condition relative to the IPS condition. We found no evidence supporting these hypotheses, and the mechanisms responsible for the reduced growth in the AS condition therefore remain elusive.

      Another key question is to what extent the findings presented here can be extended to other types of perturbations. Would the use of alternative C-source or nitrogen starvation change the observed behaviors in dynamic stresses? If other types of stresses are used, can we expect a similar growth pattern between alternating versus in-phase stresses?

      As mentioned above in our response to the other reviewer, these are very interesting questions that we think go beyond the scope of our study due to the amount of work involved.

      Recommendations for the authors:

      Reviewer #1

      My comments are only minor.

      • More paragraphs would improve legibility.

      To improve legibility, we split the longer section of the Results into three paragraphs (page 12, section entitled “Osmoregulation is impaired under in-phase stresses but not under alternating stresses.”). However, we kept it as one section with a single title for global coherence: each section of the Results corresponds to one main figure and has one main conclusion.

      • I found AS and IPS confusing because what becomes important is whether sorbitol appears with glucose or not. For me, an acronym that makes that co-occurrence clear would be better or even better still no acronyms at all.

      We tried several alternative names for the two conditions in previous drafts of the manuscript. Based on colleagues' feedback, the AS and IPS acronyms appeared to be a good compromise between concision and clarity. To avoid confusion, the two acronyms are precisely defined when they are first used in the Results section. We think it is more important to emphasize the co-occurrence (or not) of the two stresses than the co-occurrence of glucose and sorbitol. Indeed, standard yeast medium contains glucose but no sorbitol, and therefore we defined the two periodic conditions based on differences from standard medium. Even though we avoided using acronyms as much as possible in the manuscript, the use of these two acronyms to refer to the dual fluctuations of the environment seemed essential for concision. Indeed, the IPS and AS acronyms are used many times in the Results (16 occurrences on page 12 alone), figures and figure legends.

      • I would consider moving some of Fig S2 to the main text: it helps clarify where Fig 2 is coming from and is referenced multiple times.

      We fully agree with the reviewer and we moved panels A-D from Figure S2 to the main Figure 2.

      • On page 10, "constantly facing a single stress that changes over time" is confusing. Perhaps "repetitively facing a single stress" instead?

      We agree this sentence could be wrongly interpreted the way it was written. We changed it to: “cells grow more slowly when facing periodic alternation of the two stresses (AS) than when facing periodic co-occurrence of these stresses (IPS)”.

      • Is there any knowledge on how cells resist hyperosmotic stress in the absence of glucose? That would help explain the IPS results.

      Based on comments from both reviewers, we surveyed the literature to flesh out the discussion of hypotheses that would help explain observed differences between AS and IPS conditions. We found few studies that investigated cell responses in the absence of glucose, and because of significant differences in the experimental approaches it remains difficult to explain our results from conclusions of these previous studies. For instance, Shen et al., 2023 described and modeled the hyperosmotic stress response at various glucose concentrations. They found that Hog1p relocation to the nucleus after hyperosmotic shock lasted longer at lower glucose concentration, which is consistent with our finding in absence of glucose. However, they did not include the absence of glucose in their experiments or periodic fluctuations of glucose concentration. In addition, their model ignores the impact of cell signaling processes involved in growth arrest in response to hyperosmotic stress or glucose depletion. It is therefore difficult to relate their conclusions to our results. We have developed the discussion of our study to include these hypotheses and to clarify what is explained or not in our IPS and AS results.

      There is knowledge on activation of the hyperosmotic stress pathway in response to glucose fluctuations, but not about the response to hyperosmotic stress in absence of glucose.

      • On page 11, Figure 5a should be Figure 4a.

      Correct.

      • I would explain the components of the HOG pathway in the caption of Fig 1 or in the text when you cite Fig 1a. They are described later, but an early overview would be useful.

      To give more context, we added the following sentences to the caption of Figure 1: “Yeast cells maintain osmotic equilibrium by regulating the intracellular concentration of glycerol. Glycerol synthesis is regulated by the activity of the HOG MAP kinase cascade that acts both in the cytoplasm (fast response) and on the transcription of target genes in the nucleus (long-term response). For simplicity, we only represented on the figure genes and proteins involved in this study.”

      • On page 16, I wasn't sure what "redirect metabolic fluxes against glycerol synthesis" meant.

      For more clarity, we modified this sentence to: “Since glucose is a metabolic precursor of glycerol, the absence of glucose may prevent glycerol synthesis and thereby fast osmoregulation."

      • For Fig 2, having a dot-dash and dash-dash lines rather than both dash-dash would be better.

      We made the proposed change, assuming the reviewer was referring to the gray dashed lines and not the colored ones.

      • In the caption of Fig 3, 2% glucose is 20 g/L.

      We thank the reviewer for catching this typo.

      • In the Materials and Methods Summary, adding how you estimated death rates would be helpful: they are not often reported.

      The calculation of death rates was explained in the Methods section. For more clarity, we modified the names of the parameters in the equation to make more explicit which ones refer to cell death.

      Reviewer #2 (Recommendations For The Authors):

      In Figure 2, it would be interesting to show individual growth rates of the perturbations at various frequencies as shown in Figures 3 c and d.

      We thank the reviewer for this suggestion. We added a new supplementary figure (Figure 2 – figure supplement 2) showing the temporal dynamics of division rates at three different frequencies of osmostress and glucose depletion. We did not include high frequencies (periods below 48 minutes) because the temporal resolution of image acquisition in our experiments (1 image every 6 minutes) was too low. Very interestingly, this new analysis suggests that the positive relationship between the frequency of glucose depletion and division rate is explained by a delay between glucose removal and growth arrest rather than a delay between glucose addition and growth recovery. We therefore added the following conclusion:

      “Under periodic fluctuations of 2% glucose, the division rate was lower during half-periods without glucose than during half-periods with glucose (Figure 2 – figure supplement 2d-f), as expected. However, this difference depended on the frequency of glucose fluctuations: the average division rate during half-periods without glucose was higher at high frequency (small period) than at low frequency (large period) of fluctuations (Figure 2 – figure supplement 2d-f). Therefore, the effect of the frequency of glucose availability on the division rate in 2% glucose is likely due to a delay between glucose removal and growth arrest: cell proliferation never stops when the frequency of glucose depletion is too fast.”

      According to Sharifian et al. 2015, I would have expected that Hog1 would not relocate to the nucleus in 0% glucose. I wonder if this is due to the use of sorbitol as a stressor or the presence of low levels of glucose in the medium. I would suggest performing some control experiments with NaCl as the hyperosmotic agent and testing the addition of 2-deoxy-glucose to completely block glycolysis.

      After careful reading of Sharifian et al. 2015, we fail to understand why the reviewer thinks Hog1 would be expected not to relocate to the nucleus after hyperosmotic stress in 0% glucose. In this previous study, the authors never combined glucose depletion with a strong hyperosmotic stress as we did in our study. They report the results of independent experiments where cells were exposed either to a single pulse of hyperosmotic stress (0.4 M NaCl) or to transient glucose starvation, but they did not combine these two stimuli. In this context, it is difficult to compare their results with ours. The fact that Sharifian et al. 2015 did not observe Hog1 nuclear relocation in 0% glucose (consistent with our result in Figure 6 – figure supplement 1a, yellow curve) is not inconsistent with our observation of Hog1 nuclear enrichment in 0% glucose + 1M sorbitol. One potential discrepancy between the two studies is that they observed a small transient peak of Hog1 nuclear localization just after glucose was added back to the medium, while we failed to observe this peak in similar conditions (yellow curve in Figure 6 – figure supplement 1a). However, this could simply be explained by the temporal resolution of our experimental system: we image cells once every 6 minutes, and the peak lasts less than 2 minutes in Sharifian et al. 2015. We added a sentence to discuss this minor point in the Results: “Although previous studies observed small transient (less than two minutes) peaks of Hog1-GFP nuclear localization after glucose was added back to the medium following glucose depletion (Sharifian et al., 2015, Piao et al., 2013), the temporal resolution in our experiments (one image every 6 minutes) may have been too low to detect these peaks.”

      While we agree many additional experiments would be interesting, such as testing the effects of different stress factors or the non-metabolizable glucose analog 2-deoxy-D-glucose, we think this is beyond the scope of this study because such experiments are likely to open broad perspectives and to not be conclusive in a reasonable amount of time.

      When discussing Figure 7, the authors write that the HOG pathway is "overactivated" or "hyperactivated". I would refrain from using these terms because, as seen in Figure 6, the Hog1 activity pattern, if anything, decreases as the number of alternating pulses increases. The high level of pSTL1-mCitrine measured is mostly due to the long half-life of the fluorescent protein.

      We used the formulation “hyper-activation” of the HOG pathway because Mitchell et al. 2015 used it to refer to the same phenomenon in their seminal study. This "hyper-activation" refers to the fact that both the integral activation of Hog1p (sum of areas under Hog1 nuclear peaks) and the global activation of transcriptional targets is much higher during fast periodic hyperosmotic stress than during constant hyperosmotic stress. That being said, we understand the point made by the reviewer about the decreasing size of Hog1 peaks over time during repeated pulses of osmotic stress. Therefore, we slightly modified the text to refer to hyper-activation of pSTL1-mCitrine transcription or expression instead of hyper-activation of the HOG pathway. For coherency, we replaced all instances of “overactivation” by “hyper-activation”.

      Last but not least, the high level of pSTL1-mCitrine is both due to the long half-life of the protein and to the fact that pSTL1 transcription is never turned off due to high Hog1p activity under fast periodic osmostress.

      Minor comments:

      In the main text, I think it might be more intuitive to refer to doubling time in hours instead of division rates in 1/min which are harder to interpret.

      In an early draft of the manuscript, we made figures with either division rates or doubling times (ln(2)/division rate), and we received mixed opinions from colleagues on which measure was more intuitive to interpret. Both measures are widely used in the literature, and we decided to use division rates in the final version of the figures because they are more directly related to population growth rate and to fitness. For instance, the population growth rate shown in Figure 5 is simply calculated by subtracting the death rate from the division rate. For coherence, we therefore reported division rates instead of doubling times in figures and results. However, to address the reviewer’s comment, we included the doubling times (in addition to the division rates) when mentioning the most important results. For instance, page 12: “Strikingly, cells divided about twice as fast under IPS condition (1.67 × 10⁻³ division/min, corresponding to an average doubling time of 415 minutes) than under AS condition (9.4 × 10⁻⁴ division/min, corresponding to an average doubling time of 737 minutes)”.
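      The two conversions described above (doubling time = ln(2)/division rate, and population growth rate = division rate minus death rate) can be checked with a short Python sketch. Only the two division rates are taken from the quoted passage; the function names are illustrative and are not the authors' R scripts from Supplementary File 1.

```python
import math

def doubling_time(division_rate_per_min):
    """Doubling time in minutes from a division rate in divisions/min: ln(2) / rate."""
    return math.log(2) / division_rate_per_min

def population_growth_rate(division_rate, death_rate):
    """Net population growth rate: division rate minus death rate (same units)."""
    return division_rate - death_rate

rate_ips = 1.67e-3  # division/min under IPS, as quoted above
rate_as = 9.4e-4    # division/min under AS, as quoted above

print(round(doubling_time(rate_ips)))  # 415
print(round(doubling_time(rate_as)))   # 737
```

The rounded outputs match the 415 and 737 minutes quoted from page 12. The ratio of the two rates (about 1.8) corresponds to the "about twice as fast" statement.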

      I found variously capitalized versions of "HOG /Hog pathway".

      We corrected this incoherency and used “HOG pathway” everywhere.

      Page 11. Figure 5a should refer to Figure 4a I believe.

      Correct.

      The methods are generally very thorough and precise. The explanation about the calculation of the division rate seems incomplete. For completeness, it would be good to mention the brand and model of valves used. In addition, it would be interesting to have an idea of the number of cells and microcolonies tracked in the various growth experiments.

      We are not sure why the reviewer found the explanation of the calculation of division rate incomplete. For more clarity, we modified the names of parameters in the equations to make them more explicit. We also added a reference to Supplementary File 1 that contains all R scripts used to calculate division rates and death rates. We included the brand and model of valves used, as requested. As for the number of cells tracked in the various experiments, we mentioned in the Methods: “we selected 25 positions (25 fields of view) of the motorized stage (Prior Scientific ProScan III) that captured 10 to 50 cells in each of the 25 growth chambers of the chip and were focused slightly below the median cell plane based on cell wall contrast.” To address the reviewer’s comment, we also included the range of number of tracked cells for each experiment in corresponding figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First, we would like to thank you and all the reviewers for acknowledging the meaningful contribution of our manuscript to the field. Your useful comments helped us improve the manuscript's quality. We understood the key issues of the manuscript were the quantification of inference accuracy and applicability to methylome data. We here therefore present a revised version of the manuscript addressing all major comments.

      For each demographic inference we have added the root mean square error as demanded by the reviewers. These results confirm the previous interpretation of the graphs especially in recent times. We also added TMRCA inference analysis as requested by one reviewer as a proof of principle that integrating multiple markers can improve ARG inference.
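      For readers unfamiliar with the metric, the root mean square error between inferred and true (simulated) population sizes is √(mean of squared differences). The minimal Python sketch below assumes both curves have already been evaluated at the same time points. Whether the authors compute it on raw or log-transformed sizes, and over which time window, is not stated here, so those choices are left out.

```python
import math

def rmse(inferred, true):
    """Root mean square error between two equal-length sequences of values."""
    if len(inferred) != len(true) or not inferred:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(inferred, true)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical inferred vs simulated values at three time points (illustrative only)
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

A lower RMSE in recent time windows would quantify the visual impression that adding fast-evolving markers improves recent inference.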

      The discussion was rewritten to further address the challenges of applying the method to empirical methylation data. We clarify that when epimutations are well understood and modelled, they can be integrated into an SMC framework to improve the accuracy of the approach. When epimutations are not well understood, our approach can help to understand the epimutation process through generations, at the evolutionary time scale and along the genome. Hence, in both cases our approach can be used to unveil the evolutionary processes of markers through generations and/or to deepen our understanding of the past history of the population. We hope our discussion now better highlights how our approach is designed and can be used.

      eLife assessment

      This important study advances existing approaches for demographic inference by incorporating rapidly mutating markers such as switches in methylation state. The authors provide a solid comparison of their approach to existing methods, although the work would benefit from some additional consideration of the challenges in the empirical use of methylation data. The work will be of broad interest to population geneticists, both in terms of the novel approach and the statistical inference proposed.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markovian coalescent model that allows one to simultaneously analyse multiple types of polymorphism data. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

      Answer: We thank Reviewer 1 for the positive comments and for acknowledging the future promise of our method as better and more reliable data become available for different species. We appreciate that the reviewer noticed the complete body of work undertaken here to integrate local and regional effects of methylation into a model that contains as much knowledge of the epigenetic mutational processes as possible. Note that in Figure 2 of the manuscript we observed a gain in accuracy even when the rates are unknown. Our results thus suggest that additional markers can improve accuracy even when their rates are unknown, although this gain is most likely scenario- and rate-dependent.

      Finally, as highlighted by the very recent work of the Johannes lab using phylogenetic methods (Yao et al. Science 2023), knowing the epimutation rate is essential at short time scales to avoid the confounding effects of homoplasy. The same applies to our estimation of the coalescent trees, although our model considers finite-site markers. We now provide additional evidence for the potential gain in power to infer the TMRCA (Supplementary Table S7) when the epimutation rates are known or unknown, and we revised the discussion to clarify the potential shortcomings and caveats for the analysis of real data.

      Reviewer #2 (Public Review):

      A limitation in using SNPs to understand recent histories of genomes is their low mutation frequency. Tellier et al. explore the possibility of adding hypermutable markers to SNP-based methods for better resolution over short time frames. In particular, they hypothesize that epimutations (CG methylation and demethylation) could provide a useful marker for this purpose. Individual CGs in Arabidopsis tend to be either close to 100% methylated or close to 0%, and are inherited stably enough across generations that they can be treated as genetic markers. Small regions containing multiple CGs can also be treated as genetic markers based on their cumulative methylation level. In this manuscript, Tellier et al. develop computational methods to use CG methylation as a hypermutable genetic marker and test them on theoretical and real data sets. They do this both for individual CGs and for small regions. My review is limited to the simple question of whether using CG methylation for this purpose makes sense at a conceptual level, not to evaluating specific details of the methods. I have a small concern in that it is not clear that CG methylation measurements are nearly as binary in other plants and other eukaryotes as they are in Arabidopsis. However, I see no reason why this work would not be conceptually sound. Especially in the future, as new sequencing technologies provide both base-calling and methylation-calling capabilities, using CG methylation in addition to SNPs could become a useful and feasible tool for population genetics in situations where SNPs are insufficient.

      Answer: We thank Reviewer 2 for the positive comments. Indeed, surveys of CG methylation in other plant species show that its distribution is clearly bimodal (i.e. binary). This is not the case for non-CG methylation, such as CHG and CHH (where H = A, C or T). However, these latter methylation contexts are also not heritable across generations and therefore cannot be used as heritable molecular markers.

      Reviewer #3 (Public Review):

      I very much like this approach and the idea of incorporating hypervariable markers. The method is intriguing, and the ability to e.g. estimate recombination rates, the size of DMRs, etc. is a really nice plus. I am not able to comment on the details of the statistical inference, but from what I can evaluate it seems sound and reasonable. This is an exciting new avenue for thinking about inference from genomic data. I have a few concerns about the presentation and then also questions about the use of empirical methylation data sets.

      I think a more detailed description of demographic accuracy is warranted. For example, in L245 MSMC2 identifies the bottleneck (albeit smoothed) and only slightly overestimates recent size. In the same analysis the authors' approach with unknown mu infers a nonexistent population increase by an order of magnitude that is not mentioned.

      Answer: We thank Reviewer 3 for the positive comments and refer to our answer to Reviewer 1 above. We added RMSE (root mean square error) analyses to quantify the inference accuracy. We apologize for not mentioning this last point; thank you for pointing it out. We have now fixed it (lines 245-253).

      Similarly, it seems problematic that (L556) the approach requiring estimation of site and region parameters (as would presumably be needed in most empirical systems like endangered nonmodel species mentioned in the introduction) does no better than using only SNPs. Overall, I think a more objective and perhaps quantitative comparison of approaches is warranted.

      Answer: See the answer to Reviewer 1 above and the more detailed answers below. We now provide new RMSE analyses to quantify the accuracy of our demographic inference (Supplementary Tables 1, 6, 7, 8, 9, 10). We also discuss the validity and usefulness of our approach when the epimutation rates are unknown. In short, the discussion was rewritten to further address the challenges of applying the method to empirical methylation data. We clarify that when epimutations are well known and modelled (as is largely the case in A. thaliana, for example), they can be integrated into an SMC framework to improve the accuracy of the approach. When epimutations are not well understood and their rates are unknown, our approach can help to understand the epimutational process through generations at the evolutionary time scale. Hence, whether or not markers are well understood, our approach can be used to study their evolutionary processes through generations and/or to deepen our understanding of the past history of the population. We hope our discussion now better highlights how our approach is designed and can be used.

      The authors simulate methylated markers at 2% (and in some places up to 20%). In many plant genomes a large proportion of cytosines are methylated (e.g. 70% in maize: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8496265/). I don't know what % of these may be polymorphic, but this leads to an order of magnitude more methylated cytosines than there are SNPs. Couldn't this mean that any appreciable error in estimating methylation threatens to be of a similar order of magnitude to the SNP data? I would welcome the authors' thoughts here.

      Answer: The reviewer is correct, and this is an interesting question. First, studies show that heritable epimutations in plants are restricted to CG dinucleotides located well outside the target regions of de novo methylation pathways. Most of these CGs tend to fall within so-called gene-body-methylated regions. While plant species can differ substantially in their genome-wide proportion of methylation, the number of gene-body-methylated genes (i.e. genic CG methylation) is relatively similar, and at least well within the same order of magnitude (Takuno et al. Nature Plants 2016; reviewed in Muyle et al. Genome Biol Evol 2022). Moreover, spontaneous CG epimutations in gene-body-methylated regions have been shown to be neutral (van der Graaf et al. 2015, Vidalis et al. 2016, Yao et al. 2023), which is an ideal property for phylogenetic and demographic inference.

      Second, CG methylation calls are sometimes affected by limited coverage or call uncertainty. Stringent filtering for reliable SMP calls typically reduces the proportion of CG sites that can be used as input for demographic inference. Here we only kept CG sites where the methylation information could be fully trusted after SMP calling (i.e. >99.9% posterior certainty). Overall, this explains why the percentage of sites with methylation information is so small, and why we decided to use simulations with 2% of reliable methylated markers.

      Nevertheless, for the sake of generality, in some species such as maize a higher percentage of polymorphic methylated sites may be usable, and the number of SMPs could exceed that of SNPs when the effective population size is very small (due to past demographic history and/or life-history traits). In this case, errors in the epimutation rate and the variance due to finite-site model estimation (and homoplasy) are not compensated for by SNP information, which is scarce, and can lead to mis-inference.

      A few points of discussion about the biology of methylation might be worth including. For example, methylation can differ among cell types or cells within a tissue, yet sequencing approaches evaluate a pool of cells. This results in a reasonable fraction of sites having methylation rates not clearly 0 or 1. How does this variation affect the method? Similarly, while the authors cite literature about the stable inheritance of methylation, a sentence or so more about the time scale over which this occurs would be helpful.

      Answer: We thank Reviewer 3 for these very interesting questions, which we develop further below and mention in the discussion (lines 716-722).

      For Arabidopsis thaliana:

      Following up on our previous comment above, the majority of the CG sites that serve as input to our approach are located in gene-body-methylated genes. Previous work has shown that CG methylation in these regions shows essentially no tissue or cellular heterogeneity (e.g. Horvath et al. 2019). This means that bulk methylation measurements are only weakly susceptible to measurement error. That said, to guard against spurious SMP calls arising from residual measurement variation, we applied stringent filtering of CG methylation: we kept only sites where the methylation percentage is close to either 0% or 100% and removed the rest from the analysis. We have used similar filtering strategies in previous studies of epimutational processes in mutation accumulation lines and long-lived perennials (work of the Johannes lab). In these latter studies, we found the SMP calls sufficiently accurate for inference of phylogenetic parameters in experimental settings (Shahryary et al. Genome Biology 2021, Yao et al. Science 2023).
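
      To make the filtering step concrete, the sketch below shows one way a binary SMP call could be made from bulk methylation levels. It is a minimal illustration, not the pipeline used in the manuscript: the cut-offs LOW = 0.1 and HIGH = 0.9 and the function names are hypothetical, and the manuscript's own criterion (>99.9% posterior certainty) is not reproduced.

```python
from typing import Dict, List, Optional

# Hypothetical cut-offs for illustration only; the manuscript's filter
# (posterior certainty > 99.9%) is not reproduced here.
LOW, HIGH = 0.1, 0.9


def call_smp(level: float, low: float = LOW, high: float = HIGH) -> Optional[int]:
    """Binary call for one CG site from its bulk methylation level in [0, 1].

    Returns 0 (unmethylated), 1 (methylated) or None (ambiguous: site dropped).
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("methylation level must be a fraction in [0, 1]")
    if level <= low:
        return 0
    if level >= high:
        return 1
    return None


def filter_sites(levels: List[float]) -> Dict[int, int]:
    """Keep only confidently called sites, as {site index: state}."""
    calls = {}
    for i, level in enumerate(levels):
        state = call_smp(level)
        if state is not None:
            calls[i] = state
    return calls
```

      Sites that fall between the two cut-offs are simply dropped, which is why stringent filtering reduces the number of CG sites available for inference.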

      For other species:

      It is true that, currently, evaluating the methylation state of a site from a pool of cells may be problematic in some species for two main reasons: 1) it adds noise to the signal, so SMP calls could be erroneous, and 2) the methylation states used in the analysis might originate from different tissues at different locations of the genome/methylome. Overall, this leads to spurious SMPs and can render the inference inaccurate (see Sellinger et al. 2021 for the effect of spurious SNPs). Hence, caution is advised when calling SMPs in other species and for different tissues.

      Finally, in some species methylated cytosines have mutation rates an order of magnitude higher than other nucleotides. The authors mention they assume independence, but how would violation of this assumption affect their inference?

      Answer: Indeed, we assume the mutation and epimutation processes to be independent, so the probability of a SNP occurring does not depend on the local methylation state. If this assumption were violated, the mutation rate used would be wrong to a degree that depends on the strength of the dependency between the two processes. We suggest that ignoring this dependence puts us in the same situation as ignoring the variation of mutation rate along the genome. We have previously documented the effect of ignoring this biological feature of genomes (Strütt et al. 2023, Sellinger et al. 2021): if too extreme and not accounted for, variation in mutation rate along the genome can lead to erroneous inference. However, this dependency could easily be modelled by adapting the emission matrix. To model it correctly, additional knowledge is needed: either the mutation and epimutation rates must be known to quantify the dependency, or the dependency must be known to quantify the resulting rates. As far as we know, these data are not currently available, but they could perhaps be obtained using the MA lines of A. thaliana (used in Yao et al. 2023).

      Recommendations for the authors:

      All three reviewers liked this approach and found it a valuable contribution. I think it is important to address reviewer 1/3 concerns about quantifying the accuracy of inference (the TMRCA approach from reviewer 1 sounds pretty reasonable), and reviewer 1 also highlights an intriguing point about model accuracy being worse when the mutation rate is known. Additionally, I think some discussion is warranted about challenges dealing with empirical methylation data (points from Rev 2 and 3 as well as Rev 1's question about inferred vs published rates of epigenetic mutation).

      Answer: We have added tables containing the root mean square error (RMSE) of every demographic inference in the manuscript to better quantify accuracy. Below, we explain why accuracy in the presence of site and region epimutations can in some cases decrease when the true rates are known (because the methylation state at the region level must first be inferred). We added evidence that accounting for methylation can improve the accuracy of TMRCA recovery along the genome when the rates are known. We have also expanded the discussion of the challenges of using epimutation data for inference. As suggested, we hope this study will generate interest in tackling these challenges by applying the methods to various methylome datasets from different species.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      • For all of the simulated demographic inference results, only plots are presented. This allows for qualitative but not quantitative comparisons to be made across different methods. It is not easy to tell which result is actually better. For example, in Supp. Fig. 5, eSMC2 seems slightly better in the ancient past, and times the trough more effectively, while SMCm seems a bit better in the very recent past. For a more rigorous approach, it would be useful to have accompanying tables that measure e.g. mean-squared error (along with confidence intervals) for each of the different scenarios, similar to what is already done in Tables 1 and 2 for estimating $r$.

      Answer: We understand Reviewer 1's concern regarding a more quantitative comparison of the inference results. We agree that plots are not sufficient to fully grasp a method's performance. To better support the comparison of performance across approaches, we added Supplementary Tables 1, 6, 8, 9 and 10 containing the RMSE (in log10 for visibility) for all figures. The root mean square error is calculated as in Sellinger 2021, and a description of the calculation can now be found in the Methods section (lines 886-893).
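
      For readers who want to reproduce this kind of score, a minimal sketch is given below. It assumes that the true and inferred histories are piecewise-constant, that both are evaluated on a shared grid of time points, and that the error is computed on log10-transformed population sizes; the exact definition in Sellinger 2021 and in the Methods (lines 886-893) may differ, for example in the choice of time grid or in averaging over replicates. The function names are illustrative.

```python
import math
from bisect import bisect_right
from typing import Sequence


def size_at(breakpoints: Sequence[float], sizes: Sequence[float], t: float) -> float:
    """Population size at time t for a piecewise-constant history.

    breakpoints[i] is the (ascending) start time of epoch i, with
    breakpoints[0] == 0; sizes[i] is the population size during epoch i.
    """
    return sizes[bisect_right(breakpoints, t) - 1]


def log10_rmse(true_bp, true_sizes, inf_bp, inf_sizes, grid) -> float:
    """RMSE between log10 true and log10 inferred sizes over a time grid."""
    sq_errors = [
        (math.log10(size_at(inf_bp, inf_sizes, t))
         - math.log10(size_at(true_bp, true_sizes, t))) ** 2
        for t in grid
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

      Because the error is measured on a log10 scale, an inferred size that is off by one order of magnitude at every grid point gives an RMSE of exactly 1.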

      • 434: The discussion downplays the really odd result that inputting the true value of the mutation rate, in some cases, produces much worse estimates than when they are learned from data (SFig. 6)! I can't think of any reason why this should happen other than some sort of mathematical error or software bug. I strongly encourage the authors to pin down the cause of this puzzling behaviour.

      Answer: There is no error in this plot: these results are expected and coherent, but we understand that they can be confusing at first.

      As described in the Methods section and in the appendix, when accounting for region-level epimutations, our algorithm requires the regional methylation status, which must first be inferred from the data (real or simulated). Because region and single-site epimutation events occur at similar rates in our simulated scenario, the methylation state of the region is very hard to recover correctly (e.g. there will be unmethylated sites in methylated regions and methylated sites in unmethylated regions). In other words, the accuracy of the HMM procedure for region estimation is decreased by the joint action of site and region epimutation processes.
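
      The effect described above can be illustrated with a toy two-state HMM that segments binary site-level calls into unmethylated (0) and methylated (1) regions. This is a hypothetical sketch and not the region-calling HMM used in the manuscript or in Denkena et al. 2022: eps stands in for site-level epimutations (the probability that a site disagrees with its region) and switch for region-level changes between adjacent sites.

```python
import math
from typing import List


def viterbi_regions(calls: List[int], eps: float, switch: float) -> List[int]:
    """Most likely region state (0 = unmethylated, 1 = methylated) at each site.

    calls  : non-empty list of binary site-level calls (0/1)
    eps    : probability that a site's call disagrees with its region state
    switch : probability that the region state changes between adjacent sites
    """
    log_emit = {True: math.log(1.0 - eps), False: math.log(eps)}
    log_trans = [[math.log(1.0 - switch), math.log(switch)],
                 [math.log(switch), math.log(1.0 - switch)]]
    # score[s]: best log-probability of any state path ending in state s
    score = [math.log(0.5) + log_emit[calls[0] == s] for s in (0, 1)]
    backpointers = []
    for obs in calls[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[prev] + log_trans[prev][s] + log_emit[obs == s])
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path back from the most likely final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

      With calls [1, 1, 1, 0, 1, 1, 0, 0, 0], eps = 0.1 and switch = 0.05, the single discordant site is absorbed into a methylated region and the boundary before the final run of zeros is recovered. With eps = 0.45 (frequent site-level changes), the whole sequence is labelled as one methylated region and the boundary is lost.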

      When the HMM is subsequently applied for inference, as described in the appendix, the probability that two CG sites are in the same or different methylation states depends on the methylation state of the "region". Hence, mislabelling the region methylation state is (to some extent) equivalent to spurious SMPs (or inaccurate SMP calling).

      If the true rates for site and region epimutations are given as input, the model forces the demography (and other inferred parameters) to fit the observed distribution of SMPs given these rates, resulting in the poor accuracy observed in the figure (now Supplementary Figure 7).

      Note: The rates estimated from real data in A. thaliana suffer from the same issue, as the region and site epimutation rates are estimated independently, and the regions are first delineated using an independent HMM method (Denkena et al. 2022).

      However, when the rates are freely inferred, they are inferred in accordance with the estimated methylation status of regions and with the SNPs. Therefore, even if the inferred rates are wrong, the SMC uses them in a more consistent way.

      Note: When epimutation rates are high enough to violate the infinite-site assumption, as here, we first estimate the tree sequence along the genome using SNPs (i.e. DNA mutations). The algorithm then infers the epimutation rates given the inferred coalescent times and the observed methylation diversity.
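
      The second step, inferring an epimutation rate once coalescent times are given, can be sketched as a one-parameter maximum-likelihood problem. The sketch assumes a symmetric two-state model with a single rate mu for methylation and demethylation, independent sites, and coalescent times treated as known; the manuscript's model, with separate gain and loss rates and region-level processes, is richer. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def p_differ(mu: float, t: float) -> float:
    """P(two sampled lineages carry different states at one site).

    Symmetric two-state model: each lineage switches state at rate mu per
    generation, and the two tips are separated by 2t generations of branch,
    so P(differ) = (1 - exp(-4 mu t)) / 2. This saturates at 1/2 for large t
    (the finite-site, homoplasy effect).
    """
    return -0.5 * math.expm1(-4.0 * mu * t)


def log_likelihood(mu: float, data: List[Tuple[float, int, int]]) -> float:
    """data: (coalescent time t, differing sites k, total sites n) per segment."""
    ll = 0.0
    for t, k, n in data:
        p = p_differ(mu, t)
        ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return ll


def mle_rate(data, lo: float = 1e-12, hi: float = 1e-2, iters: int = 100) -> float:
    """Golden-section search for the maximum-likelihood mu on a log10 scale.

    Assumes the likelihood is unimodal in [lo, hi]; if half or more of the
    sites differ, the estimate runs off to the upper bound (saturation).
    """
    neg_ll = lambda x: -log_likelihood(10.0 ** x, data)
    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if neg_ll(c) < neg_ll(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 10.0 ** ((a + b) / 2.0)
```

      For a single segment, the estimate can be checked against the closed form mu = -ln(1 - 2k/n) / (4t). The saturation of p_differ at 1/2 is the homoplasy effect mentioned above: once t is large compared with 1/mu, additional epimutations no longer increase the observed diversity.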

      To summarise: when the rates are given as input and the model fails to correctly recover the region methylation status, there is conflicting information between SNPs and SMPs, leading to a loss of accuracy. However, if the rates are inferred, this is done with the help of SNPs, leading to less conflicting information and potentially a smaller loss of accuracy. We apologize that these explanations were missing from the manuscript and have added them at lines 449-460 and 702-716.

      A further argument is that if region and site epimutation rates differ by at least two orders of magnitude, the inference results are better (and accurate) when the true rates are given. The reason is that one epimutational process then overrides the other (see Supplementary Table 2). In that case, one epimutation process is almost negligible, and we fall back to the results of Figure 5 or Supplementary Figure 6.

      • As noted at 580, all of the added power from integrating SMPs/DMRs should come from improved estimation of recent TMRCAs. So, another way to study how much improvement there is would be to look at the true vs. estimated/posterior TMRCAs. Although I agree that demographic inference is ultimately the most relevant task, comparing TMRCA inference would eliminate other sources of differences between the methods (different optimization schemes, algorithmic/numerical quirks, and so forth). This could be a useful addition, and may also give you more insight into why the augmented SMC methods do worse in some cases.

      Answer: We fully agree with Reviewer 1. As a proof of principle, we have added a comparison of TMRCA inference with and without methylation sites. The results are given in Supplementary Table 7, and the methodology, inspired by Schiffels 2014, is described at the end of the Methods section (lines 894-907). These results demonstrate the potential gain in accuracy when using polymorphic methylation sites. However, TMRCA (or ARG) inference is a vast and complex subject in its own right. We are therefore developing a complete investigation of TMRCA/ARG inference, with an improved methodology compared with the one presented here, in a separate manuscript focusing specifically on this topic. We hence consider further investigation of TMRCA/ARG inference beyond the scope of the current study.

      • A general remark on the derivations in Section 2 of the supplement: I checked these formulas as best I could. But a cleaner, less tedious way of calculating these probabilities would be to express the mutation processes as continuous time Markov chains. Then all that is needed is to specify the rate matrices; computing the emission probabilities needed for the SMC methods reduces to manipulating the results of some matrix exponentials. In fact, because the processes are noninteracting, the rate matrix decomposes into a Kronecker sum of the individual rate matrices for each process, which is very easy to code up. And this structure can be exploited when computing the matrix exponential, if speed is an issue.
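
      The reviewer's suggestion can be illustrated with a short pure-Python sketch. It assumes, for illustration, that both the SNP process and the epimutation process at a site are two-state chains with hypothetical rates; the helper names are ours and this is not the authors' implementation (in practice a library routine such as scipy.linalg.expm would replace the hand-written matrix exponential). It checks the property the reviewer describes: for noninteracting processes, exp((A ⊕ B)t) = exp(At) ⊗ exp(Bt).

```python
import math
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product A (x) B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


def kron_sum(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker sum A (+) B = A (x) I + I (x) B: generator of two independent chains."""
    left, right = kron(A, identity(len(B))), kron(identity(len(A)), B)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]


def expm(Q: Matrix, t: float, squarings: int = 10, terms: int = 18) -> Matrix:
    """exp(Q t) by scaling and squaring with a truncated Taylor series."""
    M = [[q * t / 2 ** squarings for q in row] for row in Q]
    result, term = identity(len(Q)), identity(len(Q))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]  # M^k / k!
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def two_state(gain: float, loss: float) -> Matrix:
    """Generator for states (0, 1): rate `gain` for 0 -> 1, `loss` for 1 -> 0."""
    return [[-gain, gain], [loss, -loss]]


def two_state_closed_form(gain: float, loss: float, t: float) -> Matrix:
    """Analytic exp(Q t) for the two-state chain, used to check expm."""
    s, e = gain + loss, math.exp(-(gain + loss) * t)
    return [[(loss + gain * e) / s, gain * (1 - e) / s],
            [loss * (1 - e) / s, (gain + loss * e) / s]]


# Hypothetical rates: a symmetric SNP process and an asymmetric epimutation process.
Q_snp = two_state(0.05, 0.05)
Q_epi = two_state(0.3, 0.1)
Q_joint = kron_sum(Q_snp, Q_epi)  # 4 joint states: (allele, methylation)
t = 2.0
P_joint = expm(Q_joint, t)
P_factored = kron(expm(Q_snp, t), expm(Q_epi, t))
```

      If the mutation rate depended on the local methylation state (the dependency discussed in the answer to Reviewer 3), the joint generator would no longer be a Kronecker sum and the factorisation above would in general no longer hold; the emission probabilities would then have to be computed from the full joint matrix.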

      Answer: We thank the reviewer for this very interesting suggestion! Unfortunately, it is too late to re-implement the algorithm and reshape the manuscript according to this suggestion. Speed is not yet an issue but will most likely become one in the future when integrating many different rates or using a more complex SMC model. Hence, we added Reviewer 1's suggestion to the discussion (line 648) and hope to use it in our future projects.

      • Most (all?) of the SNP-only SMC methods allow for binning together consecutive observations to cut down on computation time. I did not see binning mentioned anywhere, did you consider it? If the method really processes every site, how long does it take to run?

      Answer: This is a very good question. We bin observations exactly as described in Mailund 2013 and Terhorst 2017, and we added this information to the Methods section (lines 801-809). However, as described in Terhorst 2017, only observations of the same "type" can be binned (to compute the Baum-Welch algorithm). Therefore, the gain in computation time from binning is reduced when different markers are spread along the genome in high proportion. This is the approach we used throughout the study when dealing with multiple markers, as it had the best speed performance. For example, when the proportion of sites with methylation information is 1% or less, computation time is only slightly affected (i.e. it remains within the same order of magnitude).
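
      The limitation described above can be seen with a simple run-length binning sketch. The per-site encoding and the function name are hypothetical; the sketch only illustrates why merging identical consecutive observations compresses long stretches without polymorphism well, but yields more bins once a second marker type is interspersed. Actual binning schemes in SMC software differ in detail.

```python
from itertools import groupby
from typing import List, Tuple


def bin_observations(obs: str) -> List[Tuple[str, int]]:
    """Collapse runs of identical consecutive observations into (symbol, run length).

    Hypothetical per-site encoding:
      '.' homozygous site (no SNP)      'S' heterozygous SNP
      '=' CG site, same methylation     'x' CG site, different methylation
    Only identical symbols are merged, so each change of observation type
    starts a new bin.
    """
    return [(symbol, sum(1 for _ in run)) for symbol, run in groupby(obs)]
```

      For instance, 1,000 sites without any information collapse into a single bin, whereas the same 1,000 sites with one methylation site every 100 positions give 20 bins.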

      However, the binning method presented in Mailund 2013 can be extended to observations of different types, but the parameters then need to be estimated through a full-likelihood approach (as presented in Figure 2). In our study, this approach did not have the best speed performance. Moreover, as our study is the first of its kind, the implementation remains sub-optimal for now. Hence, we did not further investigate the performance of our approach in the presence of many different genomic markers (e.g. 5 different markers, each representing ~20% of the genome). Currently, with SMC approaches a high proportion of sites contain the information "no SNP", making the Baum-Welch algorithm described in Terhorst 2017 very efficient. But as our theoretical approach is developed further, we expect that most sites in a genome analysis will contain some "information", which could make the full-likelihood approach computationally more tractable.

      • 486: The assumed site and region (de)methylation rates listed here are several OOM different from what your method estimated (Supp. Tables 5-6). Yet, on simulated data your method is usually correct to within an order of magnitude (Supp. Table 4). How are we to interpret this much larger difference between the published estimates and yours? If the published estimates are not reliable, doesn't that call into question your interpretation of the blue line in Fig. 7 at 533?

      Answer: We thank the reviewer for asking this question. We believe that answering it is indeed the most interesting aspect of our study. Beyond demographic inference, our study has unveiled a discrepancy between the rates inferred from biological experiments and those inferred in our study using SNPs and branch lengths. Several reasons could explain the discrepancy between the two approaches:

      • Firstly, our underlying HMM hypotheses are certainly violated. We ignored population structure, variation of mutation and recombination rates along the genome, as well as the effect of selection. Hence, the branch lengths used for estimating methylation rates are to some extent inaccurate. We note that this is especially likely for the short branches of coalescent trees originating from background selection in coding regions, which are especially observable when using methylation sites in gene-body-methylated genes, as these have a higher mutation rate than SNPs (Yao et al. 2023).

      • Secondly, calling single methylation polymorphisms (SMPs) is not 100% reliable. If the error rate is 0.1%, then, as the study was conducted over ~10 generations, a minimum epimutation rate of 10⁻⁴ is to be expected. However, because our approach works at the evolutionary time scale, we expect it to suffer less from this bias, as the proportion of diversity originating from actual epimutations, rather than from SMP calling errors, should be greater.

      • Thirdly, as mentioned above, recovering the methylation status of a region is very hard. Hence, false inference of region status could affect our inference accuracy, as shown in Supplementary Figure 4.

      • Lastly and most importantly, the discrepancy stems from how epimutation and methylation are modelled at sites and regions. As we discuss, the current combination of rates and models is still limited in its ability to describe the observed diversity along the genome (as is our aim in SMC methods). This contrasts with the recent study by Yao et al., in which very few regions with polymorphic SMPs are chosen, which implicitly avoids the influence of the methylation region effect. A study just published by Biffra et al. (Cell Reports 2023) also uses a functional model of methylation that mixes region and site epimutations, albeit one not tuned for evolutionary analyses. We thus suggest, in line with functional studies, that epimutations are not independent of the local methylation context and may tend to stabilize the methylation state of a region. This explains why the estimated methylation rates differ from the previously measured ones. Indeed, biological experiments would reveal a fast epimutation rate because epimutations can actually be tracked at sites that can mutate, while the region mutation rate is much slower. However, because the methylation state of a region is rather stable through time, it would reduce methylation diversity over long time scales, and these rates would differ between methylated and unmethylated regions (i.e. the methylation rate is higher in methylated regions). Our results thus agree with the observation by Biffra et al. that region-level methylation modelling is needed to explain patterns of methylation across the genome.
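
      The error-rate argument in the second point above can be written as a back-of-the-envelope bound. Assuming, for illustration, that a per-site calling error rate ε behaves like apparent epimutations accumulated over the g generations separating the compared samples, the calling error alone contributes an apparent rate of roughly ε/g, whereas over an evolutionary time scale of T ≫ g generations it contributes only about ε/T:

```latex
\hat{\mu}_{\min} \approx \frac{\varepsilon}{g} = \frac{10^{-3}}{10} = 10^{-4}
\quad \text{per site per generation,}
\qquad
\frac{\varepsilon}{T} \ll \frac{\varepsilon}{g} \quad \text{for } T \gg g.
```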

      To resolve the discrepancy, one would need to develop a theoretical region + site epimutation model capable of describing the observed diversity at the evolutionary time scale (possibly based on the Biffra et al. model embedded in an underlying population evolution model), and then use this model to reanalyse the sequence data from the biological experiments (i.e. van der Graaf et al. 2015 and Denkena et al. 2022) to re-estimate the methylation region sizes and epimutation rates.

      Minor comments:

      • 189: "SMCtheo" first occurs here, but it's not mentioned until 247 that this is the new method being presented.

      Answer: Fixed.

      • 199: Are the estimates in this section from a single diploid sequence? Or is it n=5 (diploid) as mentioned in the earlier section?

      Answer: Yes, those results were obtained with 5 diploid individuals. We added this to the Table 1 description.

      • 336: I'm confused by the wording: it sounds like the test rejects the null if there is positive correlation in the methylation status across sites. But then, shouldn't 339 read "if the test is significant" (not non-significant)?

      Answer: We apologize for the confusion; the choice of words was indeed misleading, and we rewrote the sentence (lines 339-348).

      • Fig. 6: for some reason fewer simulations were run for 10Mb (panels C and D) than for 100Mb (A and B). Since it's very difficult to tell what's happening on average in the 10Mb case, I suggest running the same number of simulations.

      Answer: We understand your concern. In fact, the same number of simulations was run, but we plotted only the first 3 runs as this was less visually confusing. We have now added the missing lines to panels C and D.

      Typos:

      • 104: "or or"

      • 292: build => built

      • 388: fulfil

      • 683: sample => samples

      Answer: Many thanks to Reviewer 1 for pointing out the typos. They are now all fixed.

      Reviewer #2 (Recommendations For The Authors):

      The authors may find some valuable information in Pisupati et al (2023) "On the causes of gene-body methylation variation in Arabidopsis thaliana" on interpreting epimutation rates.

      Answer: Many thanks for the recommended manuscript. We have added it to the cited literature, as it strongly supports our use of the heritability of methylation. We have also added the recent Biffra et al. paper.

      Reviewer #3 (Recommendations For The Authors):

      There are many places throughout the manuscript with minor grammatical errors. Please review these. A few are noted below as I read:

      L104: extra "or"

      L123: built not build

      L 160 "relies" instead of "do rely"

      L161 "events"

      L 336 "from methylation data"

      L 378 "exists"

      L 379 "regions are on average shorter" instead of "there are shorter"

      L 338 "a regional-level"

      L 349 "," instead of "but"

      L 394 DMRs

      Table 1 legend: parentheses not brackets?

      Answer : Many thanks to reviewer #3 for finding those mistakes. They are all now fixed.

      I think a paragraph in the discussion of considerations of when to use this approach might be helpful to readers. Comparison to e.g. increased sample size in MSMC2, while not necessary, might be helpful here. It may often be the case that doubling the number of haplotypes with SNP data is easier and cheaper than estimating methylation accurately.

      Answer : We discuss (lines 691-698) that our approach is always useful by design, but cannot always be used for the same purpose. If the evolutionary properties of the marker used are not understood, we suggest that our approach can be used to investigate how the marker is inherited across generations. This could help in correctly designing experiments that aim to study marker heritability through lineages. If the properties of the marker are well understood and modelled, it can be integrated into the SMC framework to improve inference accuracy.

      Other minor notes:

      L 486 "known" is a stretch. empirically estimated seems appropriate.

      Answer : Fixed

      L 573 ARG? You are not estimating the full ARG here.

      Answer : We apologize for the wrong choice of word and have rephrased the sentence.

      Fig. 2 is not super useful and could be supplemental.

      Answer : We moved Figure 2 to the appendix (now Supplementary Figure 1).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses: The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      We agree with the reviewer's observation about the evidence for a seasonal shift in the host use pattern of Cx. quinquefasciatus populations from southern latitudes, and we have included a paragraph on this in the Introduction section. Unfortunately, studies conducted in South America on host use by Culex mosquitoes are very limited, and there are virtually no studies on the seasonal feeding pattern. In Argentina, there is some evidence (Stein et al., 2013; Beranek, 2019) of a seasonal change in host use by Culex species, including Cx. quinquefasciatus, where the inclusion of mammals during the autumn has been observed. As part of a comprehensive study characterising bridge vectors for SLE and WN viruses, our research group is currently working on the molecular identification of blood meals from engorged females to gain deeper insight into the seasonal feeding pattern of Culex mosquitoes. Although the seasonal change in host use by Cx. quinquefasciatus has not been reported in Argentina so far, an increase in reported human cases of SLE virus has been observed between summer and fall (Spinsanti et al., 2008). Based on this evidence, we hypothesise that there is a seasonal change in host use by Cx. quinquefasciatus, similar to what occurs in the United States, also considering that both countries (Argentina and the United States) have regions with similar climatic conditions (temperate climates with thermal and hydrological seasonality). Since we work on the same species and under a similar temperate climate regime, we assumed that there is a seasonal shift in host use by this mosquito species.

      Reviewer #1 (Recommendations for the authors):

      Abstract

      Line 23: fed on two different hosts.

      Accepted as suggested.

      I think the concluding statement should be rewritten to say that immediate reproductive outcomes do not explain the shift in host use pattern of Cx. quinquefasciatus mosquitoes from birds to mammals towards autumn.

      Accepted as suggested.

      Introduction

      No comments.

      Materials and Methods

      Please mention sample sizes in the text as well (n = ?) for each treatment.

      Accepted as suggested.

      Page 99: ......C. quinquefasciatus, since C. pipiens and its hybrids are present as well in Cordoba.

      Accepted as suggested.

      Results – Line 146: subsequently instead of posteriorly

      Accepted all changes as suggested.

      Line 148: were counted instead of was counted.

      Accepted all changes as suggested.

      Line 160: Subsequently instead of posteriorly

      Accepted all changes as suggested.

      Line 171: on fertility

      Accepted all changes as suggested.

      Line 174: there was an interaction effect on…

      Accepted all changes as suggested.

      Line 175: there were no differences in the number of eggs

      Accepted all changes as suggested.

      Discussion

      I think the first paragraph in the discussion section is redundant and should be deleted.

      The whole discussion was rewritten to be focused on our aims and results.

      Line 282: this sentence needs to be rewritten.

      Accepted as suggested.

      Line 299: at 28°C

      Line 300: at 30°C

      Sorry, but we are not sure we understand this comment. We checked, and the temperatures are written as stated: 28°C and 30°C.

      Line 363: I think the authors need to discuss more about the bigger question they were addressing. I think that the discussion section can be strengthened greatly by elaborating on whether there is evidence for a seasonal shift in host use pattern in Cx. quinquefasciatus in the southern latitudes. If yes, what alternate mechanisms they believe could be driving the seasonal change in host use in this species in the southern latitudes now that they show the 'deriving reproductive advantages' hypothesis to be not true for those populations.

      Thanks for this observation. We agree and so the Discussion section was restructured to align it with our results, as suggested.

      Reviewer #2 (Public Review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if fitness by blood source differs between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive the observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used a generalized linear mixed model to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity and fertility and a null model analysis via data randomization for hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite of that hypothesized. While this finding is interesting and worth reporting, there are significant issues with the experimental design and the conclusions that are drawn from the results, which are described below. These issues should be addressed to make the findings trustworthy.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) There is no replication built into this study. Egg lay is a highly variable trait, even within treatments, so it is important to see replication of the effects of treatment across multiple discrete replicates. It is standard practice to replicate mosquito fitness experiments for this reason. Furthermore, the sample size was particularly small for some groups (e.g. 15 egg rafts for the second gonotrophic cycle of mice in the autumn, which was the only group for which a decrease in fecundity and fertility was detected between 1st and 2nd gonotrophic cycles). Replicates also allow investigators to change around other variables that might impact the results for unknown reasons; for example, the incubators used for fall/summer conditions can be swapped, ensuring that the observed effects are not artefacts of other differences between treatments. While most groups had robust sample sizes, I do not trust the replicability of the results without experimental replication within the study.

      We agree that egg laying is a variable trait, and we therefore used high numbers of mosquitoes and egg rafts in our experiments compared with other studies on the same topic. Evaluating fecundity, fertility, or other such variables (collectively referred to as "life tables") is challenging and depends on several intrinsic and extrinsic factors. Because of this, sample sizes in some experiments may not be very large, and lower sample sizes can be found in several articles. For instance, in Richards et al. (2012), for Culex quinquefasciatus, some experiments had 13 or even 6 egg rafts during the second gonotrophic cycle. For species like Aedes aegypti, sample sizes for life table analysis are also usually small; for example, Muttis et al. (2018) reported between 1 and 4 engorged females (without replicates). In addition, a small sample size would mainly be a problem if we had not detected any effect, which is not the case; we were interested in finding an effect, regardless of the effect size. For these reasons, we consider our sample sizes robust for our results.

      We also agree that repeating the experiments would give the study more robustness. However, our review of the literature (articles cited in the original manuscript) shows that similar experiments are rarely repeated as such. Examples are the studies of Richards et al. (2012), Demirci et al. (2014) and Telang & Skinner (2019): although they handle several cages at a time as "replicates", these are not true replicates, because all data are pooled and analysed together and the experiment is not repeated several times. We see these "replicates" as a way of obtaining a larger N.

      As the reviewer notes, replication is a resource- and time-consuming activity, and it is not one we are able to undertake. The original experiment took over three months to complete, and a similar timeframe would be needed for each replicate (6 months in total for two more replicates). Given our existing commitments and obligations, dedicating such an extensive period solely to this would impede progress on other crucial projects and responsibilities.

      Given these limitations of resources and time, and the infrequent use of experimental replication in this type of study, we performed a simulation-based analysis using a Monte Carlo approach. This involved generating synthetic data that mimic the expected characteristics of the original experiment and then subjecting them to the same analysis routine. The main goal of this simulation was to evaluate whether our results could be spurious or random outcomes of the experimental conditions, and thereby to assess the robustness of, and our confidence in, our results and data.

      (2) Considering the hypothesis is driven by the host switching observed in the field, this phenomenon is discussed very little. I do not believe Cx. quinquefasciatus host switching has been observed in Argentina, only in the northern hemisphere, so it is possible that the species could have an entirely different ecology in Argentina. It would have been helpful to conduct a blood meal analysis prior to this experiment to determine whether using an Argentinian population was appropriate to assess this question. If the Argentinian populations don't experience host switching, then an Argentinian colony would not be the appropriate colony to use to assess this question. Given that this experiment has already been conducted with this population, this possibility should at least be acknowledged in the discussion. Or if a study showing host switching in Argentina has been conducted, it would be helpful to highlight this in the introduction and discussion.

      Thanks for this observation; we agree. However, we conducted the experiment in the absence of extensive host use data from Argentina because we used the same mosquito species, and central Argentina (Córdoba) has a temperate climate regime similar to that observed on the east coast of the US.

      We are aware that few studies on host shifting in South America are available; some, such as those conducted by Stein et al. (2013) and Beranek (2019), reported a moderate host switch for Culex quinquefasciatus in Argentina. We have already performed a study of seasonal host feeding patterns in this species. However, because studies on host shifting are scarce, our hypothesis is based mainly on the seasonality of human cases of WNV and SLEV, a pattern that has been demonstrated for our region (see, for example, Spinsanti et al., 2008).

      We have included a new paragraph in the Introduction and Discussion sections. Please see our answers to Reviewer #1.

      (3) The impacts of certain experimental design decisions are not acknowledged in the manuscript and warrant discussion. For example, the larvae were reared under the same conditions to ensure adults of similar sizes and development timing, but this also prevents mechanisms of action that could occur as a result of seasonality experienced by mothers, eggs, and larvae.

      We understand the confusion that may have arisen from the lack of methodological detail. If we understand correctly, you are referring to our not considering carry-over effects of larval rearing that could potentially affect reproductive traits. When investigating the effects of temperature or other environmental factors on reproductive traits, either larvae or adults can be acclimated, because mosquitoes exhibit significant phenotypic plasticity throughout their entire ontogenetic cycle. In our study, we followed an approach similar to that of other authors, in which the adults are exposed to the experimental conditions (temperature and photoperiod). For similar approaches, see the studies by Ferguson et al. (2018) on Cx. pipiens, Garcia Garcia & Londoño Benavides (2007) on Cx. quinquefasciatus, and Christiansen-Jucht et al. (2014, 2015) on Anopheles gambiae.

      (4) There are aspects of the data analysis that are not fully explained and should be further clarified. For example, there is no explanation of how the levels of categorical variables were compared.

      The methodology and statistical analysis were expanded for a better understanding.

      (5) The results show the opposite trend as was predicted by the authors based on observed feeding switches from birds to mammals in the autumn. However, they only state this once at the end of the discussion and never address why they might have observed the opposite trend as was hypothesized.

      The discussion was restructured to focus on our results and our model.

      (6) Generally speaking, the discussion has information that isn't directly related to the results and/or is too detailed in certain parts. Meanwhile, it doesn't dig into the meaning of the results or the ways in which the experimental design could have influenced results.

      As mentioned above, the discussion was restructured to reflect our findings, and it now also considers how our design might have influenced our results. However, as stated above, we do not fully agree that the design is inadequate for our analysis: we followed standard protocols used by other researchers and studies in this research field.

      (7) Beyond the issue of lack of replication limiting trust in the conclusions in general, there is one conclusion reached at the end of the discussion that would not be supported, even if additional replicates are conducted. The results do not show that physiological changes in mosquitoes trigger the selection of new hosts. Host selection is never measured, so this claim cannot be made. The results don't even suggest that fitness might trigger selection because the results show that physiological changes are in the opposite direction as what would be hypothesized to produce observed host switches. Similarly, the last sentence of the abstract is not supported by the results.

      We agree with this observation. Indeed, we did not evaluate the impact of fitness on host selection in this study. Instead, we aimed to investigate whether the influence of seasonality on mosquito fitness could be a potential trigger for a shift in host selection. We agree that we incorrectly used the term "host selection" when we should have referred to "host use change". Our results indicate a seasonal alteration in mosquito fitness in response to changes in temperature and photoperiod. Building on this observation, we revised the discussion of our hypothesis and theoretical model to explain this seasonal shift in host use.

      (8) Throughout the manuscript, there are grammatical errors that make it difficult to understand certain sentences, especially for the results.

      The English grammar and writing of the entire manuscript were revised and corrected to make the text easier to understand.

      This study is driven by an interesting question and has the potential to be a valuable contribution to the literature.

      Reviewer #2 (Recommendations for The Authors):

      I hope that the authors will consider the suggested revisions and experimental replication to improve the quality of the study and paper.

      This study tests a very interesting hypothesis. I understand that additional replicates are difficult to conduct, but I do believe that fitness studies absolutely require experimental replicates. Unless you are able to replicate the observed effects, I personally would not trust the results of this study. I hope that you will consider conducting replicates so that this important question can be answered in a more robust manner. Below, I expand upon some additional points in the public review and also provide more specific suggestions. I provided some copy-editing feedback, but was not able to point out all grammatical mistakes. I suggest that you use ChatGPT to help you edit the English. For example, you can feed ChatGPT your MS and ask it to bold the grammatical errors or you can ask it to edit grammatical errors and bold the sections that were edited. I understand that writing in a second language is very difficult (from personal experience!), so I view ChatGPT as a great tool to help even the playing field for publishing. Below are line item suggestions. Apologies that wording is curt, I was trying to be efficient in writing.

      20-21: I suggest that you emphasize that you are investigating the interactive effect.

      Accepted as suggested.

      22: they weren't "reared" (from larvae) in different conditions, they were "maintained" as adults

      Accepted as suggested.

      26-27: increased/decreased is a bit misleading since you did not evaluate these groups sequentially in time. It might be more accurate to describe it as less than/greater than. Also, if you say increased/decreased or less than/greater than, you should always say what you are comparing to. The same applies throughout the MS.

      Accepted as suggested.

      29-30: "finding the" is not correct here; could be "with the lowest..."

      Accepted as suggested.

      34-36: I do not think that your results suggest this, even if you were to replicate the results of this experiment. You haven't shown metabolic changes.

      We understand the point. Accepted as suggested.

      42-44: "one of the main responsible" should be "one of the main species responsible..."

      Accepted as suggested.

      48: I think that "host preference" is better than selection here; -philic denotes preference

      Accepted as suggested.

      50: "Moreover" isn't the correct transition word here

      Accepted as suggested.

      57: "could" isn't correct here; consider saying "... species sometimes feed primarily on mammal hosts, including humans, in certain situations."

      Accepted as suggested.

      58: Different isn't correct word here

      Accepted as suggested.

      60: delete "feeding"

      Accepted as suggested.

      66-68: I am not familiar with any blood meal analysis studies in the southern hemisphere that show host switching for Culex species between summer and autumn. If this hasn't been shown, then this critique of the host migration hypothesis doesn't make sense.

      There are some studies pointing this out (Stein et al., 2013; Beranek, 2019; and unpublished data from our group). However, our hypothesis is supported mainly by epidemiological data from the human population, which indicate a seasonal activity pattern. This is now explained in depth in the Introduction section.

      68: ensures is not the right word; I suggest "suggests"

      Accepted as suggested.

      68-70: this explanation isn't clear to me; please revise

      It will be revised. Accepted as suggested.

      70: change cares to care

      Accepted as suggested.

      76-77: can you explain how they were not supported by the data for the benefit of those who are not familiar with these papers please?

      Accepted as suggested.

      87-89: I suggest the following wording: "In the autumn, we expect a greater number of eggs (fecundity) and larvae (fertility) in mosquitoes after feeding on a mammal host compared to an avian host, and the opposite relationship in the summer."

      Accepted as suggested.

      99: edit for grammar

      Accepted as suggested.

      102: suggest: "...offered a blood meal from a restrained chicken twice a month"

      Accepted as suggested.

      107: powder

      Accepted as suggested.

      108: inbred? Is this the term you meant to use?

      Changed as suggested.

      109: "several" cannot be used to describe 20 generations; suggest using "over twenty generations"; also, it would be good to acknowledge in your discussion that lab adaptation could force evolution, especially since mosquitoes are kept at constant temperatures and fed with certain hosts (with easy access) in the lab. Also, it would be good to know when the experiments were conducted to know the lapse of time between the creation of the colony and the experiments.

      Accepted as suggested.

      110-111: Does humidity vary between summer and fall in Córdoba? If so, I suggest acknowledging in the discussion that if humidity differences are involved in a potential interaction between host species and seasonality, then this would not have been captured by your experimental design.

      Several variables change across seasons. We focused on capturing the effects of temperature and photoperiod, since humidity is a difficult variable to control.

      113-116: I suggest combining into one sentence to make more concise.

      Accepted as suggested.

      135: You might be obscuring the true impact of seasonality by rearing the larvae under the same conditions. There may be signals that mothers/eggs/larvae receive that influence their behavior (e.g. I believe this is the case for diapause), so this limitation should also be acknowledged. I understand why you decided to do this to control for development time and size, but it is something that should be considered in the discussion.

      As explained above, Cx. quinquefasciatus does not undergo diapause in our country. Exposing mosquitoes to the experimental conditions only as adults was an approach we selected based on other studies.

      138: edit: "with cotton pads soaked in... on plastic..."; what is plastic glass? Do you mean plastic dishes?

      Accepted as suggested.

      141: here and throughout paragraph, full should be "fully"

      Accepted as suggested.

      144: located should be "placed"

      Accepted as suggested.

      147: suggest editing to "at which point, they were fixed with 1 mL of 96% ethanol and the number of L1 larvae per raft was counted."

      Accepted as suggested.

      154-155: edit for grammar

      Accepted as suggested.

      157: Your GLM explanation doesn't say anything about how you made pairwise comparisons between your levels; did you use emmeans?

      This revised version includes a more detailed methodology and statistical analysis. Accepted as suggested.

      158-160: I don't understand why you took this approach - it seems strange to me to use this analysis, but I am not familiar with it, so it might be that I lack the knowledge to be able to adequately evaluate. Please provide more explanation so that readers can better understand this analysis. A citation for this kind of application of the analysis would be helpful.

      This analysis was changed to be consistent with the remaining analyses.

      173: replace neither with either

      Accepted as suggested.

      174: this applies throughout; edit to : "An interaction effect was observed..."

      Accepted as suggested.

      175: "it was not found" is grammatically incorrect; instead : "We did not find ..." or "no differences in... were detected", etc

      Accepted as suggested.

      183: "it was detected" is grammatically incorrect

      Accepted as suggested.

      185-186: "being this treatment... in terms of fitness": I do not understand what this means. Please rephrase

      Accepted as suggested.

      170-199: you should provide the effect sizes and p values in text and/or in the figure for the pairwise comparisons

      Accepted as suggested.

      193-196. These two sentences are confusing and I am not sure what you mean, especially in the first sentence.

      It was rewritten. Accepted as suggested.

      Figure 1: This figure is great and easy to read and interpret!

      Thank you for the comment!

      218-219: it is important to state which mosquito species you are referring to here.

      Accepted as suggested.

      226-227: you definitely should acknowledge the small sample size here.

      Considered.

      227: "it was observed" should be "We observed" or "A greater hatching rate.... was observed."

      Accepted as suggested.

      228-229: is the result really comparable even though you took very different approaches to the analysis for these outcomes?

      Changed to be comparable.

      230-278: the discussion of these hypotheses is too long and detailed, especially since the comparison of mouse vs chicken wasn't your main question; you really wanted to understand this in the context of seasonality. I suggest cutting this down a lot and making room to dig into your results more, and also to discuss the potential impacts of your experimental design/limitations on the results.

      Discussion was changed to focus on our results and model. Accepted as suggested.

      281: Hoffman is an old citation; I suggest you cite a modern review.

      Accepted as suggested. We deleted it when rewriting the manuscript.

      282: "It can be recognise".. I am not sure what you are trying to say here

      Accepted as suggested.

      1. After the first time you write a species name, you can abbreviate the genus in all future mentions unless it is at the beginning of a sentence.

      Accepted as suggested.

      303-305: Revise this sentence. E.g "Fewer studies are available regarding photoperiod and show mixed results; Mogi (1992) found that mid and long day lengths induced greater fecundity while Costanzo et al. (2015) did not find differences in fecundity by day length."

      Accepted as suggested.

      315-316: typically, unpublished data shouldn't be referenced; I'm not sure if eLife has a policy on this.

      We will check this against the eLife guidelines. However, given the lack of evidence on this pattern, we consider it important to include these unpublished data.

      316: Aegypti should be lowercase

      Accepted as suggested.

      328-330: This sentence is redundant with the first sentence of the paragraph

      Accepted as suggested.

      321-336: You never reintroduced your hypothesis in your discussion. I suggest that you center your whole discussion more directly around the hypothesis that motivated the study. If you decide not to restructure your discussion, you should at least reintroduce your hypothesis here and discuss how your results do not support the hypothesis.

      Accepted as suggested.

      337-348: This paragraph is a bit confusing as you jump between fertility and hatchability

      Accepted as suggested.

      353: is viral transmission the right word to use here? I think you might mean bridge vector transmission to humans specifically?

      Accepted as suggested.

      357: you say "neither" but never define which traits you are referring to

      Accepted as suggested.

      361: I suggest "two variables previously analyzed separately..."

      Accepted as suggested.

      General: There is no statement about the availability of data; it is eLife policy to require all data to be publicly available. Also, it would be helpful to share your code to help understand how you conducted pairwise comparisons, etc.

      The submission did not include a data availability statement. However, all data and scripts will be uploaded with the VOR if required.

      Recommendations for the authors:

      I found your study interesting and potentially promising. However, there are some fundamental problems with the study design and the hypothesis, including:

      (1) Seasonality simulation - Seasonality is strongly associated with time, so it is unusual to simulate seasonal factors without accounting for time. The actual factors associated with seasonal change in reproductive output may be neither a difference in host blood meal nor temperature and photoperiod. It is therefore odd to reduce seasonality to a difference in photoperiod and temperature between summer and autumn without even mentioning the time of year when the experiment was carried out (except for the mention of February as the time the stock samples were collected from the wild).

      The temperature and photoperiod settings were established to represent a typical day in autumn and in summer. To determine these settings, we used climate data spanning a 3-year period (2020-2022) and selected the most frequently occurring temperatures and day lengths. Weather conditions remained notably consistent throughout this period, which is why the specific year was not mentioned. Moreover, reporting the year is uncommon in descriptions of laboratory experiments, as can be seen in multiple published studies (cited in the original manuscript). We now mention this in the revised version.

      (2) Hypothesis - While the hypothesis alludes to the 'reason' for seasonal host shift, the prediction is on the outcome of the interaction between blood meal type and season.

      It might be nicer to frame your hypothesis to be consistent with the aim, which is, testing the partial contributions of blood meal type, versus photoperiod and temperature to seasonal change in the reproductive output of Culex quinquefasciatus. A hypothesis like that can be accompanied by alternative predictions according to the expected individual and interactive effects of both factors.

      It was rewritten in the revised version to be consistent with our predictions and findings.

      Blood meal type, temperature, and photoperiod are all components of seasonality, so the strength of the study is its potential to decouple the effect of blood meal type from that of temperature and photoperiod on the seasonal reproductive output of Culex quinquefasciatus by comparing the two blood meal types under simulated summer and winter conditions. Ideally, this should have been over a natural summer and winter because a natural time difference captures the effect of other seasonal factors other than temperature and photoperiod.

      Furthermore, the hypothesis stemmed from field observations, while the study itself was conducted under laboratory conditions using a local population of Culex quinquefasciatus from Argentina. It remains uncertain whether there is supporting evidence for a seasonal shift in host usage in Culex quinquefasciatus from the stock population. Discussing the field observations within the stock population would provide valuable insights.

      It was considered in the new version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study seeks to disentangle the different selective forces shaping the evolutionary dynamics of transposable elements (TEs) in the wild grass Brachypodium distachyon. Using haplotype-length metrics and genetic and environmental differentiation tests, the authors present evidence, convincing in large part, that positive selection on TE polymorphisms is rare and that the distribution of TE ages points to purifying selection being the main force acting on TE evolution in this species. A caveat of this study, as of other studies that seek to assess TE insertion polymorphisms with short reads, is that the rates of false negatives and false positives are difficult to estimate, which may have major effects on the interpretation. This study will be relevant for anyone interested in the role of TEs in evolution and adaptation.

      Thank you for considering our manuscript for publication in eLife. We appreciate the constructive comments and suggestions of the reviewers and have addressed the issues they raised. Below, we provide a more detailed response to each of the reviewer comments.

      Public Reviews:

      Reviewer #1:

      The study presented in this manuscript provides very convincing evidence that purifying selection is the main force shaping the landscape of TE polymorphisms in B. distachyon, with only a few putatively adaptive variants detected, even though most conclusions are based on the 10% of polymorphisms contributed by retrotransposons. That first conclusion is not novel, however, as it had already been clearly established in natural A. thaliana strains (Baduel et al. Genome Biol 2021) and in experimental D. simulans lines (Langmüller et al. NAR 2023), two studies that the authors either do not mention or mention improperly. In contrast to the conclusions reached in A. thaliana, however, Horvath et al. report here a seemingly deleterious effect of TE insertions even very far away from genes (>5kb), a striking observation for a genome of relatively similar size. A caveat of this study is the lack of benchmarking of TE polymorphism calls made by a pipeline known for a high rate of false positives (see detailed Private Recommendations #1). If confirmed, however, this set of observations would be an important addition to our knowledge of TE dynamics in the wild and would question our understanding of the main molecular mechanisms through which TEs can impact fitness.

      Thank you for your positive evaluation of our paper. We have now adjusted the manuscript to include the mentioned studies (Line 330-333) and to address the issue of false positive and false negative calls. The detailed responses to all the raised points are below.

      Reviewer #2:

      Summary:

      Transposable elements are known to have a strong potential to generate diversity and impact gene regulation, and they are thought to play an important role in plant adaptation to changing environments. Nevertheless, very few studies have performed genome-wide analyses to understand the global effect of selection on TEs in natural populations. Horvath et al. used available whole-genome re-sequencing data from a representative panel of B. distachyon accessions to detect TE insertion polymorphisms (TIPs) and estimate their time of origin. Using a thorough combination of population genomics approaches, the authors demonstrate that only a small fraction of TE polymorphisms are targeted by positive selection or potentially involved in adaptation. By comparing the age-adjusted population frequencies of TE polymorphisms and neutral SNPs, the authors found that retrotransposons are affected by purifying selection independently of their distance to genes. Finally, using forward simulations, they were able to quantify the strength of selection acting on TE polymorphisms, finding that retrotransposons are mainly under moderate purifying selection, with only a minority of the insertions evolving neutrally.

      Strengths:

      Horvath et al. use a convincing set of strategies, and their conclusions are well supported by the data. I think that incorporating polymorphism age into the analysis of purifying selection is an interesting way to reduce the possible bias introduced by the fact that SNPs and TE polymorphisms do not arise at the same rate. The fact that TE polymorphisms far from genes are also under purifying selection is an interesting result that reinforces the idea that the trans-regulatory effect of TE insertions might not be a rare phenomenon, a matter that may be demonstrated in future studies.

      Weaknesses:

      TEs from different classes and orders differ strongly in multiple features such as size, potential impact on nearby genes upon insertion, insertion/elimination ratio (e.g., MITE/TIR excision, solo-LTR formation), or insertion preference. Given such diversity, their survival rates in the genome and the strength of selection acting on them are expected to differ. Although the authors differentiate DNA transposons and retrotransposons in some of the analyses, the specificities of the most abundant plant TE types (i.e., LTR/Gypsy, LTR/Copia, MITE DNA transposons) are not considered.

      The authors used a short-read-based approach to detect TIPs and TAPs. It is known that detecting TE polymorphisms is challenging and can lead to false negatives, depending on the method used and the sequencing coverage. The methodology used here (TEPID) has been previously applied to other species, but it is unclear if the sensitivity of the TIP/TAP caller is equivalent to that of the SNP caller and how these potential differences may affect the results.

      Thank you for your positive evaluation of our paper. We have now adjusted the manuscript and the discussion to include the mentioned points on the different TE superfamilies and the reliability of the TE calls. The detailed responses to all the raised points are below.

      Private Recommendations:

      Reviewer #1:

      (1) TE polymorphisms (presence and absence variants) were called from short-read sequencing data using a pipeline (TEPID, Stuart et al. eLife 2016) that is known to have low specificity as well as low sensitivity in its detection of presence variants (Baduel et al. MIMB 2021). An assessment of the rate of false positives and false negatives in the data presented in this study, and of how it varies across TE superfamilies, is therefore of crucial importance, as it may bias all downstream analyses, especially if it affects the identification of polymorphisms contributed by retrotransposons, which are the basis of most conclusions of the manuscript. The fact that the PCA of the polymorphisms contributed by DNA transposons is less able to distinguish genetic clades than that of the polymorphisms contributed by retrotransposons suggests that the issue of false positives is most prominent for DNA transposons. However, high rates of false positives may explain why no significant increase in TE frequency is detected within selective sweep regions, a result that runs against the expectation of hitch-hiking of neutral or weakly deleterious polymorphisms, the category to which the authors claim many TE polymorphisms belong. Furthermore, given that the reference genome belongs to the B_east clade, and that TEPID is better at calling absences than presences, it may bias analyses in this clade (where clade-specific insertions will take the form of absences in other clades, which are well detected) compared to other clades (where clade-specific insertions will be presence polymorphisms and may be missed). A benchmark of TE polymorphism calls could be done by de novo assembling one genome from each clade or by cross-checking at least the presence variant calls from TEPID with those made with another of the many TE calling pipelines available.

      We agree with this issue, raised by both reviewers, regarding the effects of false negative and false positive TE calls. We also think that some reasonable follow-up analyses should be done to check the potential impact of false negative and false positive TE calls on the presented results, without turning the manuscript into a method-comparison paper, as this is not the main goal of this study. Therefore, we generated a subsample of our dataset that included only accessions with an average genome-wide mapping coverage of at least 20x, as the false negative TE call rate is correlated with mapping coverage and high mapping coverage is expected to reduce the false negative TE call rate. We then used this subsample to check whether our results would change if our dataset had a lower false negative TE call rate. Reducing the rate of false negative calls by using only higher-coverage samples did not change our results or interpretations.

      Re-running the ANCOVA analyses revealed similar results regarding the accumulation of TEs in selective sweep regions. This was added to the main text Line 143-148: “Similar results were obtained when investigating the number of fixed TE polymorphisms (Additional file 2: Table S1) and the allele frequency of TE polymorphisms (Additional file 2: Table S2) in high iHS regions using a subset of our dataset with an expected lower false negative TE call rate, that only included samples with a genome-wide mapping coverage of at least 20x (see Discussion and Materials and Methods for more details).” and in Additional file 2: Table S1 and S2.

      Further, we re-ran the age-adjusted SFS based on this subset of our dataset and found that the results and conclusions from the age-adjusted SFS were not driven solely by false negative TE calls. This was also included in the text (Lines 338-349): “One caveat of the approach used in this study is that TE calling pipelines based on short-reads tend to have higher false positive and false negative call rates than SNP calling pipelines, which is also the case for the TEPID TE calling pipeline used here [57, 59]. A high false negative TE calling rate however might bias our TE frequency estimates toward lower frequencies, which could drive the observed patterns in the age-adjusted SFS. To assess if the false negative TE calling rate in our study substantially affected our results, we re-run the age-adjusted SFS on a subset of our dataset only including samples with a genome-wide mapping coverage of at least 20x, as higher mapping coverages are expected to reduce the false negative call rate [27, 59]. Using the TE allele frequencies estimated based on this subset of our data to estimate  frequency revealed similar results of the age-adjusted SFS based on the whole dataset (Additional file 1: Fig. S9), indicating that our observation of retrotransposons evolving under purifying selection is not solely driven by a high false negative TE calling rate.” and in Additional file 1: Fig. S9.

      The details of these analyses have been added to the Materials and Methods (Lines 493-498): “Mapping coverage is known to influence false discovery rate [27, 59]. To investigate the impact of false positive and false negative TE calls on our results, we down sampled the TE dataset to only include TEs that have been called in samples that had at least an average mapping coverage of 20x. The allele frequencies of TEs present in our high coverage dataset was recalculated only considering samples with at least an average mapping coverage of 20x. This second TE dataset was then used to check if using a dataset with a higher mapping coverage and presumably a lower false TE calling rate impacted our results.”

      (2) If confirmed, the observation that retrotransposons located more than 5kb away from genes appear to be also affected by purifying selection (L209) is indeed surprising. The authors should add a comparison with SNPs at the same distance from genes to strengthen the claim and make sure it is not the result of mapping artifacts, such as alignment quality dropping far away from genes.

      We added a comparison of the age-adjusted SFS of SNPs and retrotransposons more than 5 kb away from genes to evaluate if the observed shape of the age-adjusted SFS of retrotransposons more than 5 kb away from genes were due to artefacts. The results are included on line 383-389: “Finally, we tested whether TE polymorphisms located more than 5 kb away from genes are evolving under purifying selection could be due to mapping or other artefacts by comparing the shape of the age-adjusted SFS of retrotransposons and SNPs more than 5 kb away from genes. However, the age-adjusted SFS of SNPs 5 kb away from genes differs from the one of retrotransposons (Additional file 1: Fig. S10), indicating that the shape of the age-adjusted SFS of retrotransposons more than 5 kb away from genes is not likely to be the result of artefacts in regions of the genome far away from genes.” and Additional file 1: Fig. S10.

      (3) The authors' claim that most TE polymorphisms are under weak to moderate purifying selection (L273) relies on the comparison of the age of polymorphisms in the oldest age bin with forward simulations. However, the conclusions from these comparisons cannot be extrapolated to the fitness effects of all TE polymorphisms as variants in the oldest age bin are de facto a biased sample of the variants of a category, a point the authors highlight.

      We adjusted the mentioned paragraph to better highlight this point. Line 390-397: “To further ascertain the strength of purifying selection, we used forward simulation and showed that simulations assuming a moderately weak selection pressure (S = -5 or S = -8) against TE polymorphisms best fitted our observed data. In theory, no TE polymorphisms under strong purifying selection should be present in a natural population, as such mutations are expected to be quickly lost, especially in a predominantly selfing species where most loci are expected to be homozygous. Therefore, it is not surprising that TE polymorphisms which persist in B. distachyon are under weak to moderate selection, as also shown, for example, for the L1 retrotransposons in humans [27] or the BS retrotransposon family in Drosophila melanogaster [62].”

      The authors make this point at L220-228 for high-effect SNPs. Indeed, the most deleterious TE polymorphisms would be purged very quickly and never contribute to variants in the oldest age bin. Unless new arguments can be made to support this claim, this conclusion should be rephrased to state instead that even the oldest TE polymorphisms are still mostly non-neutral and under weak to moderate purifying selection.

      This has been adjusted (Lines 231-232): “Hence, even the oldest retrotransposon polymorphisms seem to be mostly non-neutral and are affected by purifying selection.”

      L214: replace smaller with more negative for clarity.

      Done.

      L233: Given the discussion L220-228, the oldest age bin seems to be biased in its composition and thus not useful for comparisons. The sentence should therefore be rephrased to reflect that DNA transposon polymorphisms appear to be actually less deleterious than high-effect SNPs in S9A and B based on the penultimate age bin.

      This has been fixed.

      Reviewer #2:

      • I wonder if false negative detection could artificially increase the evidence for purifying selection by increasing the amount of low-frequency variants. This could be easily checked if long-read data or genome assembly is available for any of the samples in the collection, by comparing the TIP/TAP prediction with the actual sequence.

      We agree with the reviewer that false negative calls can lead to misinterpretation of the observed low frequencies of TEs (see also our response to the first comment of Reviewer #1). Unfortunately, long-read data for the samples used here are not available to estimate false negative call rates. However, to check whether the observed results are mainly driven by high false negative rates, we re-ran the age-adjusted SFS based on samples with at least 20x mapping coverage, which should reduce the false negative TE calling rate. The results and conclusions from this second analysis were included in the text (Lines 338-349): “One caveat of the approach used in this study is that TE calling pipelines based on short-reads tend to have higher false positive and false negative call rates than SNP calling pipelines, which is also the case for the TEPID TE calling pipeline used here [57, 59]. A high false negative TE calling rate however might bias our TE frequency estimates toward lower frequencies, which could drive the observed patterns in the age-adjusted SFS. To assess if the false negative TE calling rate in our study substantially affected our results, we re-run the age-adjusted SFS on a subset of our dataset only including samples with a genome-wide mapping coverage of at least 20x, as higher mapping coverages are expected to reduce the false negative call rate [27, 59]. Using the TE allele frequencies estimated based on this subset of our data to estimate  frequency revealed similar results of the age-adjusted SFS based on the whole dataset (Additional file 1: Fig. S9), indicating that our observation of retrotransposons evolving under purifying selection is not solely driven by a high false negative TE calling rate.” and in Additional file 1: Fig. S9.

      • Supplementary Figure S1. DNA transposons are much worse at separating the samples than LTR retrotransposons. Doesn't this suggest that these two classes have very different dynamics in the population and maybe different intensities of the selective forces acting on them? Could this profile be explained by DNA transposons being older and likely more fixed in all the clades, whereas retrotransposons are more recent and more specific to some populations? Another possibility might be that some B. distachyon DNA transposons have an unusually high excision rate. In any case, in my opinion, this reinforces the need to study the different TE orders in more detail.

      Indeed, different TE orders and superfamilies can have different excision rates and age distributions and can be under different selective regimes. To investigate the possibility that different TE orders are affected by very different selective regimes, we split our TE dataset into the four different TE types: Copia, Ty3, Helitron and MITE. We then re-ran the age-adjusted SFS analyses and added our results to the text (Lines 422-430): “To further examine our conclusion on purifying selection, we investigated the selective regime affecting different retrotransposons and DNA-transposons superfamilies. Thereby, we generated age-adjusted SFS for the four most common TE superfamilies Copia, Ty3 (also known under the name Gypsy, but we will avoid using this name because of its problematic nature see [71]), Helitron and MITE and found similar deviations of the  frequency from 0 in the four investigated TE superfamilies (Additional file 1: Fig. S12–S15). These results indicate that our conclusion on the broad effect of purifying selection is not driven by a single TE superfamily but is at least common among the four most numerous TE superfamilies.” and in Additional file 1: Fig. S12-S15.

      • Line 112: "most TE polymorphisms in our dataset were young and only a few were very old". Does this change substantially among TE orders/superfamilies?

      Indeed, there are some differences in the age distribution of TEs among superfamilies. However, the differences are not substantial, as the age bins in the age-adjusted SFS of the different TE superfamilies are fairly similar (see Additional file 1: Fig. S12-S15).

      • Figure 2 is difficult to read, especially the lower panels. I think the grey borders of the boxplots make visualization difficult.

      The gray borders have been removed.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      Many of my specific issues have been addressed in the revision. However, the data shown in Reviewer Fig. 1 and 2 are not sufficiently described to assess their reliability, and these new data do not appear to have been integrated into the paper. A response that more clearly states how the manuscript has been revised to address the comments is necessary.

      We appreciate the opportunity to respond to your updated comments on our manuscript. We carefully considered the feedback and made changes to address the specific issues raised.

      In response to your concern about the insufficient description of the data shown in Reviewer Fig. 1 and 2, we confirm that we have taken this feedback seriously. Supplementary data, including the information provided in Reviewer Figures 1 and 2, have been fully described and integrated into the body of the manuscript as requested. We have ensured that the reliability and significance of the new data are clearly presented to strengthen the overall manuscript.

      We are grateful for your valuable feedback, which has undoubtedly contributed to the refinement of our manuscript. We hope that the revised version meets the standards of the journal and look forward to the opportunity for further deliberation.

      Reviewer #2 (Recommendations For The Authors):

      Additional feedback from the reviewer:

      "I think the authors have been responsive to my previous comments. However, I cannot find this new data in the main text but rather only in the response to reviewers. New data should be incorporated into the main text not the supplement as the controls are important to consider alongside the treatment groups. Lastly, while the authors include BODIPY in their approaches, their results are not quantitative. My suggestion was to include this data in a quantitative manner not just the images. Lastly, I am still somewhat puzzled about the connection with GABA. The rationale for its selection other than it was significantly changed is not strong."

      Thank you for providing us with the latest feedback. We appreciate the opportunity to address the specific concerns raised and provide a detailed response to each point.

      (1) Incorporation of New Data into the Main Text:

      We acknowledge the reviewer's comment regarding the incorporation of new data into the main text rather than solely in the response to reviewers. In response to this feedback, we have diligently revised the manuscript to ensure that the new data, including controls, is now seamlessly integrated into the main body of the text. This modification allows for a more comprehensive and contextual presentation of the data, as recommended by the reviewer.

      (2) Quantitative Presentation of BODIPY Results:

      We understand the importance of presenting quantitative data for the BODIPY results, and we appreciate the reviewer's suggestion to include this information in a quantitative manner, not just as images. In line with this valuable feedback, we have revised the relevant sections to incorporate quantitative data alongside the images, providing a more robust and comprehensive presentation of the results.

      (3) Rationale for the Selection of GABA:

      In the present study, to elucidate the pathways through which metformin acts on IR injury, we analysed gene expression profiles of mice in each group and found that similar mRNA changes were mainly concentrated in three top pathways: lipid metabolism, carbohydrate metabolism, and amino acid metabolism. Given the close relationship between lipid metabolism and ferroptosis, and the fact that carbohydrate metabolism is a primary route of amino acid metabolism, 22 amino acids were measured in liver tissue using HPLC-MS/MS to identify key metabolites involved in the protective role of metformin against HIRI-induced ferroptosis. Only GABA levels were significantly increased by both metformin treatment and FMT treatment, which was further verified by ELISA. Consequently, we identified GABA as the main metabolite through which metformin protects against HIRI and focused on the source of GABA generation.

      We would like to express our gratitude for your thorough evaluation and constructive feedback, which have undoubtedly contributed to the improvement of our manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study that provides new insights into the development and function of medullary thymus epithelial cells (mTEC). The authors provide compelling evidence to support their claims as to the differentiation and lineage outcomes of CCL21+ mTEC progenitors, which further our understanding of how central tolerance of T cells is enforced within the thymus.

      Public Reviews:

      Reviewer #1 (Public Review):

      The work by Ohigashi and colleagues addresses the developmental and lineage relationship of a newly characterized thymus epithelial cell (TEC) progenitor subset. The authors take advantage of an elegant and powerful set of experimental approaches to demonstrate that CCL21-expressing TECs appear early in thymus organogenesis and that these cells, which are centrally located, go on to give rise to medullary (m)TECs. What makes the findings intriguing is that these CCL21-expressing mTECs are a distinct subset, which do not express RANK or AIRE, and transcriptomic and lineage tracing approaches point to these cells as potential mTEC progenitor-like cells. Of note, using in vitro and in vivo precursor-product cell transfer experiments, the authors show that this subset has a developmental potential to give rise to AIRE+ self-antigen-displaying mTECs, revealing that CCL21-expressing mTECs can give rise to distinct mTEC subsets. This functional duality provides an attractive rationale for the necessary function of mTECs, which is to attract CCR7+ thymocytes that have just undergone positive selection in the thymus cortex to enter the medulla to undergo tolerance-induction against self-antigen-displaying mTECs. Overall, the work is well supported and offers new insights into the diverse functions of the medullary compartment, and how two distinct subsets of mTECs can achieve it.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to discover a developmental pathway leading to functionally diverse mTEC subsets. They show that Ccl21 is expressed early during thymus ontogeny in the medullary area. Fate mapping provides evidence for a Ccl21-positive history of Aire-positive mTECs, of thymic tuft cells and, postnatally, of a certain percentage of cTECs. The differentiation potential of Ccl21+ TECs is therefore tested in reaggregate thymus experiments using embryonic or postnatal Ccl21+ TECs. From these experiments, the authors conclude that, at least in the embryo, mTECs in large part pass through a Ccl21-positive stage before differentiating towards an Aire-expressing or tuft cell stage.

      The authors are using Ccl21a as a marker for a bipotent progenitor that is detectable in the embryonic thymus and is still present at the adult stage mainly giving rise to mTECs. The choice of this marker gene is very interesting since Ccl21 expression can directly be linked to an important aspect in thymus biology: the expression of Ccl21 by cells in the thymic medulla allows trafficking of T cells into the medulla in order to undergo T cell selection.

      Making use of Ccl21 detection, the authors nicely show that cells actively expressing Ccl21 are localized throughout the medulla at an embryonic stage and also in adult thymus tissue. This suggests that this progenitor does not accumulate in a specific area of the medulla. This is a new finding.

      Moreover, the finding that a Ccl21+ progenitor population plays a functional role in thymocyte trafficking towards the medulla has not been described. Thus, Ccl21 expression may be used to localize a late bipotent progenitor in the thymic lobes.

      In addition, in Fig.8, the authors provide evidence that these progenitor cells have the potential to self-maintain as well as to differentiate in reaggregate experiments at E17 (not at 4 weeks of age). The first point is of great interest and importance since these cells in theory can be of therapeutic use.

      Overall assessment:

      The authors highlight a developmental pathway starting from a Ccl21-expressing TEC progenitor that contributes to a functionally diverse mTEC repertoire. This is a welcome addition to current knowledge of TEC differentiation.

      Reviewer #3 (Public Review):

      In this manuscript, the authors define the developmental trajectory resulting in a diverse mTEC compartment. Using a variety of approaches, including a novel CCL21 fate-mapping model, data are presented to argue that embryonic CCL21-expressing, thymocyte-attracting mTECs naturally convert into self-antigen-displaying mTEC subsets, including Aire+ mTECs and thymic tuft cells. Perhaps somewhat surprisingly, a large fraction of cTECs were also marked as having expressed CCL21, suggesting that there exists some conversion of mTECs (or mTEC progenitors) into cTECs, a developmentally interesting observation that could be followed up later. Overall, the experimental setup, writing, and conclusions are all outstanding.

      Provisional author response

      We thank the editors and the reviewers for their supportive comments on our manuscript. We will revise the manuscript according to their helpful recommendations.

      Author response to recommendations

      We thank the editors and the reviewers for their supportive comments on our manuscript. We also thank the three reviewers for their helpful recommendations. We have revised the manuscript accordingly, as detailed below.

      Reviewer #1 (Recommendations For The Authors):

      There are several unanswered questions, which the authors themselves acknowledge, a principal one being whether CCL21+ mTECs represent a progenitor for yet another distinct subset of cortical (c)TECs, or whether they represent an intermediary or unique population of mTECs derived from a bipotent (cTEC/mTEC) progenitor. These questions will need to be addressed in future work as they go beyond the initial characterization of this intriguing mTEC subset.

      Indeed, our findings reported in this manuscript have stimulated many interesting questions, including those pointed out by the reviewer. We would like to address them one by one in our future work.

      The presence of GFP+ cTECs, which are lineage-traced as having expressed CCL21, raises the question of whether these cells are generated as a consequence of later steps in mTEC differentiation or are derived from earlier bipotent cells, which again the authors point out. The authors could discuss this further or perhaps address it experimentally by using a model system in which mTEC differentiation is absent or halted (e.g., Relb KO, or TCRa/TCRd KO) and testing whether GFP+ cTECs are still present.

      Following this suggestion, we have revised the manuscript by adding a statement that it would be interesting to examine whether GFP+ cTEC development in Ccl21a-Cre x CAG-loxP-EGFP mice is mediated through RelB-dependent mTEC developmental progression or through developing thymocyte-dependent mTEC-nurturing ‘crosstalk’ signals.

      Reviewer #2 (Recommendations For The Authors):

      Even though the manuscript highlights the functional aspect of a postnatal bipotent progenitor, there are several aspects that need further discussion.

      (1) The title is somewhat misleading since the identified TEC subset can be detected not only in the embryonic but also in the postnatal thymus. Only the RTOC experiments indicate a higher developmental potential of TECs isolated from embryos, but this might as well be due to experimental difficulties, as discussed in the text. Furthermore, Ccl21+ TECs are shown to differentiate postnatally into mTECs and cTECs; therefore, this subset presumably belongs to a bipotent progenitor population described earlier (their ref. 22, 39).

      We are fully aware of previous studies showing that mTEC progenitors include cells that transcribe Ccl21a, and we have cited them in the manuscript. The manuscript title describes our finding that thymocyte-attracting CCL21-expressing functional mTECs isolated from embryonic thymus are capable of giving rise to self-antigen-displaying mTECs. We thank the reviewer for further pointing out the possibility that postnatal CCL21+ TECs include cells that retain the capability to differentiate into mTECs and cTECs.

      (2) In the introduction, the authors claim that the "developmental progression of the self-antigen-displaying mTEC subset occurs in a single stream as mTEClow progenitors -> mTEChigh Aire-expressing cells -> mTEClow mimetic cells" (line 79). So far it has only been shown that some mimetic cell types pass through an Aire+ stage; whether this is true for all mimetic cells remains to be shown. Therefore, this statement should be toned down.

      Following the suggestion, this sentence has been toned down in the revised manuscript.

      (3) In line 86, a reference to another paper describing Ccl21a expression in a postnatal mTEC-biased progenitor should be added: Nusser et al., Nature 2022 (PMID: 35614226), in which the developmental potential of the Ccl21-positive, so-called postnatal progenitor is analysed by barcoding, with results providing evidence for differentiation into mature mTECs (see lines 94-96).

      As suggested, the Introduction of the revised manuscript now cites the study by Nusser et al. showing that postnatal mTEC-biased progenitors include cells that transcribe Ccl21a.

      (4) See Extended Data Figure 2b of PMID: 35614226, in which the population-specific gene expression pattern of the progenitor population at different time points is depicted. Ccl21a belongs to a group of genes that identifies the postnatal progenitor, which indicates that its functionality and/or developmental potential is age-dependent. Therefore, it would be important to specify the age of the analysed mice throughout the Results section instead of describing them only as "postnatal".

      As recommended, mouse age has been added to the revised manuscript and figures.

      (5) Line 113: "embryonic" needs to be replaced since the results of Fig. 1 are referring to 5-week-old mice.

      The manuscript has been revised per the reviewer’s suggestion.

      (6) Referring to Fig. 3g, line 173: It is interesting to see that, at 3 weeks of age, 95% of mTECs have a Ccl21 history but only approx. 70% of cTECs. Therefore, the earliest progenitor giving rise to the first cTECs might still be productive and feed into the cTEC lineage. This reporter would allow the analysis of progenitor activity over time. The same could be done for mTECs, to detect when the Ccl21-expressing progenitor becomes the main source of mTECs, since at E15 the tdTomato signal is still low compared with the assigned medullary area in Fig. 2c. The finding in Fig. 4e (line 196) also argues for the timed replacement of cTECs by a progenitor located in the medulla, leading to a decline in the Ccl21-history signal towards the subcapsular region at 2 weeks of age. This should be better explained or discussed.

      We appreciate the work of Nusser et al. showing that postnatal mTEC-biased, but not embryonic cTEC-biased, TEC progenitors include cells that transcribe a detectable amount of Ccl21a (cited in the Introduction as ref. 23). It is important to clarify whether and how those postnatal TEC progenitors (23) overlap with the embryonic and postnatal CCL21-protein-expressing mTECs reported in this study. It would also be interesting to shed light on how Ccl21a+ progenitors contribute to cTECs and mTECs over ontogeny, and on whether the enrichment of Ccl21a+ progenitor-derived cTECs in the perimedullary area reflects a temporal replacement of cTECs by cells derived from Ccl21a+ progenitors localized in the medulla. We would like to clarify these issues in our future work. The revised manuscript includes a discussion of these issues.

      (7) Lines 304 and 355: Note that the "unstable" age-dependent gene expression profiles were already reported in Nusser et al., Nature 2022. Not only Ccl21 expression, but also other progenitor-specific genes change their expression levels with age. The entirety of changes in gene expression during aging likely impacts the developmental potential of progenitor populations. These changes might be reflected in the negative results of the RTOC experiment using TECs of 4-week-old mice. The manuscript would benefit from a discussion in light of this "unstable" age-dependent gene expression.

      It is interesting that the age-dependent difference in gene expression profiles, which was reported in TEC progenitors by Nusser et al. (23), is also detected in CCL21-expressing mTECs in this study. As with recommendation no. 6 by Reviewer 2, and as described in the revised manuscript, it will be interesting to clarify whether and how embryonic and postnatal CCL21-expressing mTECs overlap with the previously reported TEC progenitors.

      (8) Line 321: As discussed above, the exact time point should be added to the text since the proportion of cTECs derived from a Ccl21+ progenitor is associated with a certain time point; "2/3 of cTECs" refers to 3 weeks of age.

      The manuscript has been revised following the reviewer’s suggestion.

      Reviewer #3 (Recommendations For The Authors):

      The one question I have, which may be more of a curiosity of this reviewer than a requirement for the manuscript, is whether thymocytes themselves are required for the conversion/maturation of attracting TECs to mTECs. For example, in CD3e-/- (or Rag-/-) mice, are mTECs arrested at the thymocyte-attracting stage, or is the conversion process 'pre-programmed'? In the same vein, do cTECs (or immature cTECs) maintain CCL21 expression in the absence of mature thymocytes? These are not critical studies, but they are fairly straightforward (effort- and time-wise) and would aid in placing this process in the overall scope of thymus development.

      We previously showed that Aire+ mTECs are detectable in the thymus of RAG2-deficient mice, in which thymocyte development cannot progress beyond the CD4/CD8 double-negative 3 stage (Hikosaka, et al. 2006; PMID: 18799150). In another study, we also showed that Aire+ mTECs and CCL21+ mTECs are detectable in the thymus of TCR-alpha-KO mice, which lack mature CD4/CD8 single-positive TCR-alpha/beta-expressing thymocytes (Lkhagvasuren, et al. 2013; PMID: 23585674). These results indicate that thymocyte maturation beyond the Rag-dependent stage is not essential for the development of Aire+ mTECs. Nonetheless, we agree with the reviewer that it is important to clarify how developing thymocytes contribute to the growth and differentiation of diverse TEC subpopulations, including GFP+ cTEC development in Ccl21a-Cre x CAG-loxP-EGFP mice. The revised manuscript includes a discussion of these issues.

    1. Author Response

      We thank the eLife Senior Editor and reviewers for their comprehensive evaluation and constructive comments on our manuscript. We are grateful that all 3 reviewers recognize the value of the large pharmacological and proteomics screen of 51 cancer cell lines in relation to vitamin C IC50 values. As reviewer 1 points out, our findings are of interest because high-dose vitamin C is in clinical trials. Most importantly, we show that all 51 cell lines tested can be killed at a dose range that is achievable by intravenous administration in the clinic. These pharmacological findings underscore high-dose vitamin C as a potent anti-cancer agent. Moreover, we provide an elaborate description of functional terms associated with the vitamin C IC50 values in the different cell panels (Figs 1-5) and the common denominators across panels (Figs 6, 7 and 8), thereby enhancing our biological insight into sensitivity to vitamin C treatment. This study is indeed descriptive in nature, and our large-scale pharmacological and proteomics dataset should be seen as a resource for further research. The raw and processed data will be available in the ProteomeXchange repository (accession number and reviewer password were provided previously), and the resubmission will include all processed proteome and phosphoproteome data as a supplementary file.

      It is beyond the scope of our study to do mechanistic studies with knock-downs to see if we can further sensitize cancer cell lines that are less sensitive. We do not call these cell lines resistant as cell growth can be inhibited at a clinically achievable dose.

      In our detailed rebuttal, we will follow up on the suggestion of reviewer 1 to also put our data in the context of NCI-60 growth inhibition data for other cytotoxic agents. This will build on our comparative analysis of cisplatin in the lung cancer panel (Fig 5A), where we show that vitamin C IC50 values and cisplatin IC50 values are not correlated one-to-one, as one of the most cisplatin-resistant NSCLC cell lines in our panel was very sensitive to high-dose vitamin C. Furthermore, we will clarify method details, annotate mutational status in our panels, and explore potential genomic associations with high-dose vitamin C sensitivity as presented in previous studies (e.g. mutant BRAF and/or KRAS tumors, https://doi.org/10.1126/science.aaa5004).

      Finally, we will critically read the manuscript and add references where needed.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Heer and Sheffield used two-photon imaging to dissect the functional contributions of convergent dopamine and noradrenaline inputs to dorsal hippocampal CA1 in head-restrained mice running down a virtual linear path. Mice were trained to collect water rewards at the end of the track, and on test days, calcium activity was recorded from dopamine (DA) axons originating in the ventral tegmental area (VTA, n=7) and noradrenaline axons from the locus coeruleus (LC, n=87) under several conditions. When mice ran laps in a familiar environment, VTA DA axons exhibited ramping activity along the track that correlated with distance to reward and, to some extent, with velocity, while LC input activity remained constant across the track but correlated invariantly with velocity and time to motion onset. A subset of recordings taken when the reward was removed showed diminished ramping activity in VTA DA axons, but no changes in the LC axons, confirming that DA axon activity is locked to reward availability. When mice were subsequently introduced to a new environment, the ramping-to-reward activity in the DA axons disappeared, while LC axons showed a dramatic increase in activity lasting 90 s (6 laps) following the environment switch. In the final analysis, the authors sought to disentangle LC axon activity induced by novelty from that induced by novelty-related behavioral changes by removing periods in which animals were immobile, and established that the activity observed in the first 2 laps reflected a novelty-induced signal in LC axons.

      Strengths:

      The results presented in this manuscript provide insights into the specific contributions of catecholaminergic input to dorsal hippocampal CA1 during spatial navigation in a rewarded virtual environment, offering a detailed analysis at the resolution of single axons. The data analysis is thorough, and possible confounding variables and data interpretation are carefully considered.

      Weaknesses:

      Aspects of the methodology, data analysis, and interpretation diminish the overall significance of the findings, as detailed below.

      The LC axonal recordings are well-powered, but the DA axonal recordings are severely underpowered, with recordings taken from a mere 7 axons (compared to 87 LC axons). Additionally, 2 different calcium indicators with different kinetics and sensitivity to calcium changes (GCaMP6s and GCaMP7b) were used (n=3 and n=4, respectively) and the data pooled. This makes it very challenging to draw valid conclusions from the data, particularly in the novelty experiment. The surprising lack of novelty-induced DA axon activity may be a false negative. Indeed, at least 1 axon (axon 2) appears to show a novelty-induced rise in activity in Figure 3C. Changes in activity in 4/7 axons are also referred to as a 'majority' occurrence in the manuscript, which again is not an accurate representation of the observed data.

      The reviewer points out a weakness in the analysis of VTA axons in our dataset. The relatively low n (currently 7) reflects the fact that VTA axons in the CA1 region of the hippocampus are very sparse and very difficult to record from (due to their sparsity and the low baseline fluorescence inherent in long-range axon segments). This is the reason they have not been recorded from in any lab other than ours. LC axons, on the other hand, are more abundant in CA1. In the paper, when comparing VTA versus LC axons, we deal with the mismatch in n by downsampling the LC axons to match the VTA axons, repeating this 1000 times to create a distribution. However, because the VTA axon n is relatively low, it is possible that we have not sampled the VTA axon population sufficiently and therefore have a biased population in our dataset. The issue is that it takes months for baseline GCaMP expression to reach levels sufficient for recording from VTA axons, and it is typical to find only a single axon in a FOV per animal. There are additional reasons why mice and/or axon recordings do not reach criteria and cannot be included in the dataset (these exclusion criteria are reported in the Methods section). For instance, out of the 54 DAT-Cre mice injected, imaging was never conducted in 36 because of a lack of expression or failure to reach behavioral criteria. Another 11 mice were excluded for heat bubbles that developed during imaging, z-drift of the FOV, or bleaching of the GCaMP signal.

      However, we do have n=2 additional VTA axon recordings that we will add to the dataset, bringing the n up from 7 to 9. We plan to re-analyze the data with n=9 VTA axons and to make comparisons with down-sampled LC axons as described above. This increase in n will increase the power of our VTA axon analysis. To test more formally whether this is sufficient for statistical tests, we plan to use the G*Power power-analysis tool to compute statistical power for each of the different tests we use. We will report this in the next version of the paper. However, the n=2 additional axons were not recorded in the novel environment, so the next version will remain at n=7 for the novel environment analysis. We agree with the reviewer that the lack of novelty-induced DA axon activity may be a false negative, and we will adjust the description of our results and discussion accordingly.
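      To give a feel for how sample size enters such a power calculation, the following is a minimal Python sketch of the power of a two-sided, two-sample comparison under a normal approximation. It is not G*Power: G*Power uses the noncentral t distribution for t-tests, and the normal approximation used here tends to overstate power at small n. The function name `approx_power_two_sample` and the effect size used in the usage note are hypothetical and for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test for effect size d.

    Normal approximation (an assumption, not G*Power's exact method):
    the test statistic is treated as N(ncp, 1) with
    ncp = d * sqrt(n1 * n2 / (n1 + n2)).
    """
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # two-sided critical value
    ncp = d * sqrt(n1 * n2 / (n1 + n2))   # expected shift of the statistic
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)
```

      For instance, `approx_power_two_sample(1.0, 9, 9)` evaluates the hypothetical case of a large standardized effect (d = 1) with 9 axons per group; the effect size is illustrative and is not a value measured in this study. With d = 0 the function returns alpha, as expected for a test with no true effect.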

      During the data collection of VTA axon activity we tried two variants of GCaMP: 6s and 7b, to see if one would increase the success rate of finding and recording from VTA axons. Given the long time-course of these experiments and the low yield in success, we pooled the GCaMP variants together to increase statistical power. Because the 2 additional VTA DA axons that were recorded from expressed GCaMP6s, the next version of the paper will have n=5 GCaMP6s, and n=4 GCaMP7b VTA DA axons, which will allow us to compare the activity of the two sensors in the familiar environment. The reviewer correctly pointed out that the sensors themselves could confound our results, and so they should not be pooled unless we can show they do not produce different signals in the axons. We will make this comparison and report the findings in the next version of the paper. If we find no significant differences, we will pool the data. If differences are detected, we will keep these axons separate for subsequent analysis and comparisons to LC axons.

      The authors conducted analysis on recording data exclusively from periods of running in the novelty experiment to isolate the effects of novelty from novelty-induced changes in behavior. However, if the goal is to distinguish between changes in locus coeruleus (LC) axon activity induced by novelty and those induced by motion, analyzing LC axon activity during periods of immobility would enhance the robustness of the results.

      This is indeed true, and this suggested analysis could further support our conclusions regarding the LC novelty signal. For the next version of the paper, we will use the periods of immobility to analyze and isolate any novelty induced activity in LC axons. However, following exposure to the novel environment, mice spend much less time immobile, therefore there may not be sufficient periods of immobility close in time to the exposure to the novel environment (which is when the novelty signal occurs). We plan to analyze mouse behavior during the early exposure to the novel environment for immobility and check whether we have enough of this behavior to perform the suggested analysis.

      The authors attribute the ramping activity of the DA axons to the encoding of the animals' position relative to reward. However, given the extensive data implicating dorsal CA1 in timing, and the remarkable periodicity of the behavior, the possibility that DA axons are signalling temporal information should be considered.

      This is a very good point. We agree that the VTA DA axons could be signaling temporal information, as we have previously shown that these axons also exhibit ramping activity when their activity is averaged by time to reward (Krishnan et al., 2022). We will conduct this analysis on this dataset. We have not, however, conducted any experiments designed to separate time from distance, such as the experiments conducted in Kim et al., 2020. Therefore, we cannot determine whether this is due to proximity in space to reward or time to reward. We will clarify in our text that by proximity we mean either place or time, and that we cannot conclude which feature of the experience drives the VTA axon signal.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Kim, H.R., Malik, A.N., Mikhael, J.G., Bech, P., Tsutsui-Kimura, I., Sun, F., Zhang, Y., et al. A unified framework for dopamine signals across timescales. Cell 183(6) (2020).

      The authors should explain and justify the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments.

      LC axon activity was recorded on a 3m track to match the track length from an experiment we recently published (Dong et al., 2021) in which mice were exposed to a novel 3m track while populations of CA1 pyramidal cells were recorded. In that paper we described the time course of place field formation on the novel track. We wanted to test if LC axons signaled novelty (as we hypothesized) and whether the time course of LC axon activity matched the time course of place field formation. We briefly discuss this in the Discussion section of this paper and hypothesize that LC axons in CA1 could open a window of plasticity in which new place fields can form.

      VTA axons were recorded on a 2m track (the same VR tracks on which LC axons were recorded) to match another recent paper from our lab in which reward expectation was manipulated (Krishnan et al., 2022). In that study, CA1 populations of pyramidal cells were recorded during the reward expectation experiment. To match that experience, and to test how reward expectation may influence axon signaling along the track, we also used a 2m track for the VTA axon recordings in CA1. The idea was to check how VTA dopaminergic inputs to CA1 may influence CA1 population dynamics along the track.

      Although the tracks for LC and VTA recordings were identical in visual cues and design for both the familiar and novel environments, the track lengths differ (set simply by the gain of the rotary encoder). To account for this, we normalized track lengths for our comparison analysis. This normalization allows a direct comparison of the patterns of activity across the two types of axons, controlling for the potential confound introduced by the different track lengths. By adjusting the data to a common scale, we could assess the relative changes in activity at matched spatial bins, ensuring that any observed differences or similarities reflect the intrinsic properties of the axons rather than differences in track length. However, the different lengths do make the animal’s experience slightly different. This is partly offset by our observation that most of the LC and VTA axon signals would not be expected to be strongly influenced by variations in track length. For instance, LC axons are associated with velocity and a pre-motion-initiation signal, neither of which would be influenced by track length. VTA axons are also associated with velocity, which would not affect a direct comparison with LC axon velocity signals, because mice reach maximal velocity very rapidly along the track. VTA axon activity does ramp up as mice approach the reward zone, and this signal could be modulated by track length (or might not be, if the signal encodes time to reward rather than distance). However, LC axons show no ramping-to-reward signals, so a comparison across axons recorded on different track lengths is justified for this analysis.
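      The length normalization described above can be illustrated with a minimal Python sketch: positions on each track are divided by that track's length, so 2m and 3m tracks both map onto the interval from start (0) to end (1), and activity is then averaged within equal fractional bins. The function name `normalized_bin_means`, the default of 50 bins, and the input layout (paired position and activity samples) are hypothetical choices for illustration, not the analysis code used in the paper.

```python
from typing import List, Optional, Sequence


def normalized_bin_means(
    positions_cm: Sequence[float],
    activity: Sequence[float],
    track_length_cm: float,
    n_bins: int = 50,
) -> List[Optional[float]]:
    """Average activity in n_bins equal fractions of the track.

    Positions are divided by the track length, so a 200 cm and a 300 cm
    track are both mapped onto [0, 1] before binning.
    Bins with no samples are returned as None.
    """
    if len(positions_cm) != len(activity):
        raise ValueError("positions and activity must have the same length")
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, value in zip(positions_cm, activity):
        frac = pos / track_length_cm               # 0.0 = track start, 1.0 = track end
        if not 0.0 <= frac <= 1.0:
            continue                               # ignore samples outside the track
        idx = min(int(frac * n_bins), n_bins - 1)  # frac == 1.0 goes in the last bin
        sums[idx] += value
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

      With this scheme, a sample at 100 cm on the 2m track and a sample at 150 cm on the 3m track fall into the same bin (the track midpoint), which is what allows activity patterns to be compared bin by bin. As noted above, this does not equalize elapsed time, so signals that may track time to reward would still need the separate time-based analysis.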

      However, to add rigor to comparisons of axon dynamics recorded along 2m and 3m tracks, we plan to plot axon activity of both sets of axons by time to reward, and actual (un-normalized) distance from reward.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Dong, C., Madar, A. D. & Sheffield, M.E. Distinct place cell dynamics in CA1 and CA3 encode experience in new environments. Nat Commun 12, 2977 (2021).

      Reviewer #2 (Public Review):

      Summary:

      The authors used 2-photon Ca2+-imaging to study the activity of ventral tegmental area (VTA) and locus coeruleus (LC) axons in the CA1 region of the dorsal hippocampus in head-fixed male mice moving on linear paths in virtual reality (VR) environments.

      The main findings were as follows:

      • In a familiar environment, the activity of both VTA axons and LC axons increased with the mice's running speed on the Styrofoam wheel, with which they could move along a linear track through a VR environment.
      • VTA, but not LC, axons showed marked reward position-related activity, showing a ramping-up of activity when mice approached a learned reward position.
      • In contrast, the activity of LC axons ramped up before the initiation of movement on the Styrofoam wheel.
      • In addition, exposure to a novel VR environment increased LC axon activity, but not VTA axon activity.

      Overall, the study shows that the activity of catecholaminergic axons from VTA and LC to dorsal hippocampal CA1 can partly reflect distinct environmental, behavioral, and cognitive factors. Whereas both VTA and LC activity reflected running speed, VTA, but not LC axon activity reflected the approach of a learned reward, and LC, but not VTA, axon activity reflected initiation of running and novelty of the VR environment.

      I have no specific expertise with respect to 2-photon imaging, so cannot evaluate the validity of the specific methods used to collect and analyse 2-photon calcium imaging data of axonal activity.

      Strengths:

      (1) Using a state-of-the-art approach to record separately the activity of VTA and LC axons with high temporal resolution in awake mice moving through virtual environments, the authors provide convincing evidence that the activity of VTA and LC axons projecting to dorsal CA1 reflect partly distinct environmental, behavioral and cognitive factors.

      (2) The study will help a) to interpret previous findings on how hippocampal dopamine and norepinephrine or selective manipulations of hippocampal LC or VTA inputs modulate behavior and b) to generate specific hypotheses on the impact of selective manipulations of hippocampal LC or VTA inputs on behavior.

      Weaknesses:

      (1) The findings are correlational and do not allow strong conclusions on how VTA or LC inputs to dorsal CA1 affect cognition and behavior. However, as indicated above under Strengths, the findings will aid the interpretation of previous findings and help to generate new hypotheses as to how VTA or LC inputs to dorsal CA1 affect distinct cognitive and behavioral functions.

      (2) Some aspects of the methodology would benefit from clarification.

      First, to help others to better scrutinize, evaluate, and potentially reproduce the research, the authors may wish to check whether their reporting follows the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines for the full and transparent reporting of research involving animals (https://arriveguidelines.org/). For example, I think it would be important to include a sample size justification (e.g., based on previous studies, considerations of statistical power, practical considerations, or a combination of these factors). The authors should also include the provenance of the mice. Moreover, although I am not an expert in 2-photon imaging, I think it would be useful to provide a clearer description of exclusion criteria for imaging data.

      We thank the reviewer for helping us formalize the scientific rigor of our study. There are ten ARRIVE Guidelines and we have addressed most of them in our study already. However, there is an opportunity to add detail. We have listed below all ten points and how we have or will address each one.

      (1) Experimental design - we go into great depth explaining the experimental set-up, how we used the autofluorescent blebs as imaging controls, how we controlled for different sample sizes between the two populations, and the statistical tests used for comparisons. We also carefully accounted for animal behavior when quantifying and describing axon dynamics both in the familiar and novel environments.

      (2) Sample size - We state both the number of ROIs and mice for each analysis. Wherever we state how many axons had a certain kind of activity, we will also state the number of mice in which we saw this activity. For the next version of the paper, we plan to conduct a power analysis using G*Power to assess the power of our sample sizes for statistical analysis.

      (3) Inclusion/exclusion criteria - Out of the 36 NET-Cre mice injected, 15 were never recorded, either because they failed to reach behavioral criteria or because axonal expression was not visible. Out of the 54 DAT-Cre mice injected, imaging was never conducted in 36 because of a lack of expression or failure to reach behavioral criteria. Of the remaining 21 NET-Cre mice, 5 were excluded for heat bubbles, z-drift, or bleaching, while 11 of the remaining 18 DAT-Cre mice were excluded for the same reasons. This was determined by visually assessing imaging sessions and then using the registration metrics output by suite2p. This registration metric ran a PCA on the motion-corrected ROIs and plotted the first PC. If the PC drifted substantially, to the point where no activity was apparent, the video was excluded from analysis.

      (4) Randomization - The paper already includes a description of the random downsampling of LC axons used to make statistical comparisons with VTA axons. LC axons were selected pseudo-randomly (only one axon per imaging session) to match VTA sampling statistics. This randomization was repeated 1000 times, and comparisons were made against the resulting random distribution.

      (5) Blinding/masking - No blinding/masking was conducted, as no treatments were given that would require it. We will include this statement in the next version.

      (6) Outcomes - We defined all outcomes measured, such as those related to animal behavior and related axon signaling.

      (7) Statistical methods - None of the reviewers had any issues regarding our description of statistical methods, which we described in detail in this version of the paper.

      (8) Experimental animals - We described that DAT-Cre mice were obtained from The Jackson Laboratory (JAX) and NET-Cre mice from the Tonegawa lab (Wagatsuma et al., 2018).

      (9) Experimental procedure - Already described in detail in the Methods section.

      (10) Results - Described in detail for the behaviors and the related axon dynamics.

      Wagatsuma, Akiko, Teruhiro Okuyama, Chen Sun, Lillian M. Smith, Kuniya Abe, and Susumu Tonegawa. “Locus Coeruleus Input to Hippocampal CA3 Drives Single-Trial Learning of a Novel Context.” Proceedings of the National Academy of Sciences 115, no. 2 (January 9, 2018): E310–16. https://doi.org/10.1073/pnas.1714082115.
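As a rough cross-check of the G*Power analysis planned in point (2), the per-group sample size for a two-sided, two-sample comparison can be approximated with the normal formula n = 2 * ((z(1 - alpha/2) + z(power)) / d)^2, where d is Cohen's d. The sketch below is a minimal illustration under that assumption; the function name `n_per_group` and the effect sizes are hypothetical placeholders, not values from this study, and G*Power's exact t-based calculation typically returns slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z(1 - alpha/2) + z(power)) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Hypothetical effect sizes, not estimates from this study.
for d in (0.5, 0.8, 1.2):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Because ROI counts and mouse counts differ, such a calculation would need to be run at whichever level the statistical test treats as the independent unit.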

      Second, why were different linear tracks used for studies of VTA and LC axon activity (from line 362)? Could this potentially contribute to the partly distinct activity correlates that were found for VTA and LC axons?

      We address this point in detail above, in our response to a similar comment from Reviewer 1.

      Third, the authors seem to have used two different criteria for defining immobility. Immobility was defined as moving at <5 cm/s for the behavioral analysis in Figure 3a, but as <0.2 cm/s for the imaging data analysis in Figure 4 (see the legends to these figures and the Methods, from line 447, line 469, and line 498). I do not understand why, and it would be good if the authors explained this.

      This is an error left over from before we converted velocity from the rotational units of the treadmill to cm/s. It will be corrected in the next version of the paper.

      (3) In the Results section (from line 182), the authors convincingly addressed the possibility that less time spent immobile in the novel environment may have contributed to the novelty-induced increase of LC axon activity in dorsal CA1 (Figure 4). In addition, initially (for the first 2-4 laps), the mice also ran more slowly in the novel environment (Figure 3aIII, top panel). Given that LC and VTA axon activity both increased with velocity (Figure 1F), reduced velocity in the novel environment may have reduced LC and VTA axon activity, but this possibility was not addressed. Reduced LC axon activity in the novel environment could have blunted the novelty-induced increase. More importantly, any potential novelty-induced increase in VTA axon activity could have been masked by decreases in VTA axon activity due to reduced velocity. The latter may help to explain the discrepancy between the present study and previous findings that VTA neuron firing was increased by novelty (see Discussion, from line 243). It may be useful for the authors to address these possibilities based on their data in the Results section, or to consider them in their Discussion.

      This is a great point. The decreased velocity in the novel environment could lead to a diminished novelty response in LC axons, and we will add a discussion point on this in the next version. The same could apply to VTA axons, so we will also note that the lack of novelty signaling in VTA axons could be due to reduced velocity masking this signal.

      (4) Sensory properties of the water reward, which the mice may be able to detect, could account for reward-related activity of VTA axons (instead of an expectation of reward). Do the authors have evidence that this is not the case? Occasional probe trials, intermixed with rewarded trials, could be used to test for this possibility.

      Mice receive their water reward through a waterspout that is immobile and positioned directly in front of their mouth (which is also immobile as they are head fixed) and water delivery is triggered by a solenoid when the mice reach the end of the virtual track. Therefore, because the waterspout remains in the same place relative to the mouse, and the water reward is not delivered until they reach the end of the virtual track, there is nothing for the mice to detect. We will update the paper to make this clearer.

      Additionally, on the initial laps with no reward, the ramping activity is still present (Krishnan et al., 2022), indicating that this activity is not directly related to the presence or absence of water but is instead caused by reward expectation.

      Reviewer #3 (Public Review):

      Summary:

      Heer and Sheffield provide a well-written manuscript that clearly articulates the theoretical motivation to investigate specific catecholaminergic projections to dorsal CA1 of the hippocampus during a reward-based behavior. Using 2-photon calcium imaging in two groups of cre transgenic mice, the authors examine the activity of VTA-CA1 dopamine and LC-CA1 noradrenergic axons during reward seeking in a linear track virtual reality (VR) task. The authors provide a descriptive account of VTA and LC activities during walking, approach to reward, and environment change. Their results demonstrate LC-CA1 axons are activated by walking onset, modulated by walking velocity, and heighten their activity during environment change. In contrast, VTA-CA1 axons were most activated during the approach to reward locations. Together the authors provide a functional dissociation between these catecholamine projections to CA1. A major strength of their approach is the methodological rigor of 2-photon recording, data processing, and analysis approaches. These important systems neuroscience studies provide solid evidence that will contribute to the broader field of learning and memory. The conclusions of this manuscript are mostly well supported by the data, but some additional analysis and/or experiments may be required to fully support the authors' conclusions.

      Weaknesses:

      (1) During teleportation from familiar to novel environments, the authors report a decrease in the freezing ratio when combining the mice in the two experimental groups (Figure 3aiii). A major conclusion of the manuscript is the difference in VTA and LC activity following environment change. Given that VTA and LC activity were recorded in separate groups of mice, did the authors observe a similar significant reduction in freezing ratio when analyzing the behavior in the LC and VTA groups separately?

      In response to this comment, we will analyze the freezing ratios in DAT-Cre and NET-Cre mice separately. However, other members of the lab have seen the same result in other mouse strains (see Dong et al., 2021), so we do not expect to see a difference (but it is certainly worth checking).

      (2) The authors satisfactorily apply control analyses to account for the unequal axon numbers recorded in the LC and VTA groups (e.g., Figure 1). However, given the heterogeneity of responses observed in Figures 3c and 4b and the relatively low number of VTA axons recorded (compared to LC), there are some possible limitations to the authors' conclusions. A conclusion that LC-CA1 axons, as a general principle, heighten their activity during novel environment presentation would require this activity profile to be observed in some of the axons recorded in almost all LC-CA1 mice.

      We agree with the reviewer’s point here. To help avoid this problem, when downsampling LC axons to compare to VTA axons, we matched the sampling statistics of the VTA axons/mice (i.e. only one LC axon was taken from each mouse to match the VTA dataset).
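The downsampling described here can be sketched as follows. This is a minimal illustration under stated assumptions: the per-axon values in `lc_by_session`, the grouping unit (imaging session or mouse), the mean as summary statistic, the optional `n_sessions` argument for matching the number of VTA sessions, and the one-sided empirical p-value are all hypothetical choices, not the exact analysis in the paper.

```python
import random
import statistics


def lc_null_distribution(lc_by_session, n_sessions=None, n_draws=1000, seed=0):
    """Distribution of an LC summary statistic under VTA-like sampling.

    lc_by_session maps a session (or mouse) ID to a list of per-axon values.
    On each draw, sessions are optionally subsampled to n_sessions, one axon
    is picked at random from each chosen session, and the mean of the picked
    values is recorded.
    """
    rng = random.Random(seed)  # fixed seed makes the draws reproducible
    sessions = sorted(lc_by_session)
    k = len(sessions) if n_sessions is None else n_sessions
    draws = []
    for _ in range(n_draws):
        chosen = rng.sample(sessions, k)  # sessions drawn without replacement
        picks = [rng.choice(lc_by_session[s]) for s in chosen]  # one axon per session
        draws.append(statistics.mean(picks))
    return draws


def empirical_p(observed, null_draws):
    """One-sided p-value: share of draws >= observed, with a +1 correction."""
    exceed = sum(1 for v in null_draws if v >= observed)
    return (exceed + 1) / (len(null_draws) + 1)


# Hypothetical per-axon values for three LC sessions and a hypothetical VTA mean of 0.35.
lc = {"s1": [0.10, 0.30], "s2": [0.25], "s3": [0.05, 0.15, 0.40]}
null = lc_null_distribution(lc, n_sessions=2)
print(empirical_p(0.35, null))
```

Taking a single axon per session prevents sessions with many LC axons from dominating each draw, which mirrors the sampling constraint of the VTA dataset.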

      However, in the next version of the paper we will also report the number of mice in which we observed a significant novelty response. We will also add the number of mice with significant activity for each of the measures in the familiar environment (e.g., how many mice had axons positively correlated with velocity).

      Additionally, if the general conclusion is that VTA-CA1 axons ramp activity during the approach to reward, it would be expected that this activity profile was recorded in the axons of almost all VTA-CA1 mice. Can the authors include an analysis to demonstrate that each LC-CA1 mouse contained axons that were activated during novel environments and that each VTA-CA1 mouse contained axons that ramped during the approach to reward?

      As stated above, we will add the number of mice that had each activity type we reported here.

      (3) A primary claim is that LC axons projecting to CA1 become activated during novel VR environment presentation. However, the experimental design did not control for the presentation of a familiar environment. As I understand, the presentation order of environments was always familiar, then novel. For this reason, it is unknown whether LC axons are responding to novel environments or environmental change. Did the authors re-present the familiar environment after the novel environment while recording LC-CA1 activity?

      This is an important point to address. While we never varied the presentation order of the familiar vs novel environments, we did record the activity of LC axons in some of the mice in a dark environment (no VR cues) prior to exposure to the familiar environment. We will look at these axons to address whether they respond to initial exposure to the familiar environment. This will allow us to check whether they are responding to environmental change or novelty. We will add this analysis to the next version of the paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study assesses anatomical, behavioral, physiological, and neurochemical effects of early-life seizures in rats, describing a striking astrogliosis and deficits in cognition and electrophysiological parameters. The convincing aspects of the paper are the wide range of convergent techniques used to understand the effects of early-life seizures on behavior as well as hippocampal prefrontal cortical dynamics. While reviewers thought that the scope was impressive, there was criticism of the statistical robustness and number of animals used per study arm, as well as the lack of causal manipulations to determine cause-and-effect relationships. This paper will be of interest to neurobiologists, epileptologists, and behavioral scientists.

      We thank Joseph Gleeson as the Reviewing Editor and Laura Colgin as the Senior Editor for considering this revision of our manuscript for publication in eLife. We appreciate the positive acknowledgment of the study and the critical points raised by the reviewers. We have addressed all the excellent comments of the two reviewers, providing a detailed response for each comment. We believe that these revisions have significantly improved the quality and rigor of our study.

      We want to assure you that our experimental design was meticulously crafted, incorporated adequate control groups, and is grounded in prominent systems neurophysiology studies focusing on the effects of early-life seizures, especially studies capturing mild effects. We conducted statistical tests adhering to established norms and recommendations, ensuring a thorough and transparent description of the employed statistical methods. We welcome any specific suggestions to further improve this aspect.

      In fact, the concerns raised by the reviewers regarding statistical robustness may stem from a misunderstanding of the rat cohorts used in each experiment. Criticism was directed at the use of only 5 animals without a control group for acute electrophysiological recording. It is essential to clarify that this group served the sole purpose of confirming that the injection of lithium-pilocarpine would induce both behavioral and electrographic seizures. Importantly, this was a descriptive result, and no statistical test or further analysis was conducted with these data. In the revised manuscript, we have made adjustments to this description, aiming to eliminate any ambiguity, particularly addressing the issue of sample size in each experiment.

      Regarding the lack of causal manipulations, we fully agree that this approach would provide a deeper mechanistic understanding of our findings and is an essential next step. Still, developmental brain disturbances are linked to manifold intricate outcomes, so an initial observational exploration offers insights into particular and nuanced relationships for subsequent studies aimed at targeted interventions. In this context, our objective was to provide a comprehensive characterization of ELS effects to serve as a foundation for future research. While we recognize the relevance of causal manipulations, only more sophisticated data analyses could reveal complex aspects such as specific multivariate associations and non-linear relationships, which would not have been revealed by causally perturbing one factor or another at first. In the revised manuscript, we emphasized the limitation of lacking causal manipulations as well as the advantages of our approach. We also mentioned some possible targets for subsequent perturbational investigations based on our findings.

      For a more detailed discussion on these matters, we invite you to review our response to reviewers.

      Reviewer 1

      In this paper, Ruggiero, Leite, and colleagues assess the effects of early-life seizures on a large number of anatomical, physiological, behavioral, and neurochemical measures. They find that prolonged early-life seizures do not lead to obvious cell loss, but lead to astrogliosis, working memory deficits on the radial arm maze, an increased startle response, decreased prepulse inhibition, and increased hippocampal-PFC LTP. There was a U-shaped relationship between LTP and cognitive deficits. There is increased theta power during the awake state in ELS animals, but reduced PFC theta-gamma coupling and reduced HPC-PFC theta coherence. Theta coherence seems to be similar in the ACT and REM states in ELS animals, while it decreases in the active state relative to REM in controls.

      Strengths:

      The main strength of the paper is the number of convergent techniques used to understand how hippocampal PFC neural dynamics and behavior change after early-life seizures. The sheer scale, breadth, and reach of the experiments are praiseworthy. It is clear that the paper is a major contribution to the field as far as understanding the impact of early-life seizures. The LTP findings are robust and provide an important avenue for future study. The experiments are performed carefully and the analysis is appropriate. The paper is well-written and the figures are clear.

      We express our gratitude to Reviewer #1 for conducting a thoughtful and comprehensive review of our manuscript. We sincerely value both the constructive criticisms provided and your acknowledgment of the manuscript's strengths.

      Weaknesses:

      The main weakness of the paper is the lack of causal manipulations to determine whether prevention or augmentation of any of the findings has any impact on behavior or cognition. Alternatively, if other manipulations would enhance working memory in ELS animals, it would be interesting to see the effects on any of these parameters measured in the paper.

      We sincerely appreciate the insightful comments from Reviewer #1 regarding the potential benefits of including causal manipulations in our study. We wholeheartedly agree that such manipulations can provide a deeper understanding of the mechanistic underpinnings of the observed relationships and represent a crucial next step in our research trajectory.

      Our primary objective in this study was to establish a comprehensive framework through observational examinations, exploring intricate relationships across various neurobiological and behavioral variables in the aftermath of early-life seizures (ELS). By identifying these associations, our work aims to provide a foundation for future investigations that can delve into targeted interventions.

      While we acknowledge the importance of causal manipulations, we would like to underscore the advantages of our initial multivariate correlational study. Importantly, developmental brain disturbances have lasting impacts on multiple biological outcomes that may be intricately related to one another. First, although some neurobiological variables stood out in comparisons of group means, these comparisons did not reveal some nuanced relationships within the data. The complexity of the relationships we uncovered, involving behavior, cognition, immunohistochemistry, plasticity, neurochemistry, and network dynamics, required a more elaborate analytical approach. Only through sophisticated data analysis techniques were we able to dissect important peculiarities, such as the robust multivariate association between brain-wide astrogliosis and sensorimotor impairments, as well as non-linear relationships, such as the inverted-U relationship between plasticity and working memory. These nuances might not have been fully revealed through causal manipulations, since several variables are strongly related and can consequently affect several outcomes, which could lead to a false conclusion of direct causality.

      Nevertheless, we acknowledge that our manuscript understated the limitation of lacking causal manipulations. To address this, we have included a dedicated section in the discussion highlighting this limitation. We emphasize the advantages of this exploratory phase, supported by a review of the literature on cause-and-effect studies that align with our findings. Additionally, we speculate on promising targets for future cause-and-effect studies based on our findings. For instance, we hypothesize that enhancing plasticity may improve working memory in control subjects, while attenuating plasticity might have a similar effect in ELS subjects. Furthermore, we propose that reactive astrogliosis and concurrent neuroinflammatory processes likely underlie sensorimotor changes in the ELS group. Lastly, we suggest that dopaminergic antagonism in the ELS group could normalize behavioral deficits, prevent the exaggerated LTP induction of the HPC-PFC pathway, reestablish the state-dependent network dynamics, and desensitize the dopaminergic response.

      [...] Also, I find the sections where correlations and dimensionality reduction techniques are used to compare all possible variables to each other less compelling than the rest of the paper (with the exception of the findings of U-shaped relationship of cognition to LTP). In fact, I think these sections take away from the impact of the actual findings.

      We appreciate the reviewer's feedback and would like to emphasize the significance of the multivariate analysis conducted in our study. Multivariate analysis extends beyond bivariate correlations and is the only type of analysis capable of capturing the relationships among data in a multidimensional way, offering a comprehensive approach to understanding complex relationships among multiple variables. By employing techniques such as principal component analysis (PCA), generalized linear models (GLM), and canonical correlation analysis (CCA), we aimed to unravel intricate patterns of covariance, exploring how different variables collectively contribute to the observed outcomes and assessing the impact of each independent variable (predictor) on the dependent variable (the variable to be predicted or explained). Importantly, this approach enables us to control for potential confounding factors by holding all other variables constant.

      While we acknowledge that these sections may appear intricate, their inclusion is indispensable for a comprehensive understanding of the diverse variables associated with SE outcomes. We believe that these analyses offer valuable insights into the intricate dynamics of our study, providing a more holistic perspective on the altered spectrum induced by early-life seizures (ELS).

      Regarding the reviewer's observations about the impact of the U-shaped relationship between cognition and LTP, we have made graphical and textual adjustments to emphasize the significance of these findings, aiming to enhance their clarity and impact within the broader context of our research. We trust that these modifications contribute to a more compelling presentation of our results.

      […] Finally, the apomorphine section seemed to hang separately from the rest of the paper and did not seem to fit well.

      We appreciate Reviewer #1's feedback on the apomorphine section. To address this point, we carefully rewrote the rationale preceding the results to clarify our hypothesis and chosen methodology. In our work, the apomorphine experiment was a logical next step following our previous data. We showed that ELS rats display REM-like oscillatory dynamics during active behavior, similar to genetically and pharmacologically hyperdopaminergic mice (Dzirasa et al., 2006). Furthermore, other results also indicated possible alterations in dopamine neurotransmission, such as working memory deficits, hyperlocomotion, PPI deficits, aberrant HPC-PFC LTP, and abnormal PFC gamma coordination. Therefore, we hypothesized that ELS animals would present a hyperdopaminergic state. Among the possible methods to investigate the hyperdopaminergic state, we chose the apomorphine sensitivity test, which is classically used and induces unambiguous behavioral and neurochemical alterations in hyperdopaminergic rodents (Duval, 2023; Ellenbroek & Cools, 2002).

      Reviewer 1 (Recommendations For The Authors):

      (1) It would be useful to stain for other GABAergic interneuron markers such as somatostatin, VIP, CCK.

      (2) The authors refer to neuroinflammation but they are really referring to reactive astrogliosis. I would also suggest staining for microglial markers.

      (3) The duration of chronic electrographic seizures in ELS animals should also be calculated and presented.

      (4) Word usage: the authors frequently use the word "presents" when "demonstrates" would be more appropriate.

      (1) We appreciate your insight into staining for other GABAergic interneuron markers such as somatostatin, VIP, CCK. While investigating additional interneuron types is indeed relevant, it was not the primary focus of this study for several reasons: 1) The overall neuron density, assessed through NeuN immunostaining, revealed no differences between controls and early life seizure (ELS) groups, even in brain regions susceptible to neuron death after SE (i.e., CA1). Therefore, differences in interneurons, which are more resistant to death in SE and constitute approximately 20% of the cells, are unlikely. 2) Among all interneuron subtypes, Parvalbumin-positive (PV+) interneurons represent a substantial population and are susceptible to various stressors. In the hippocampus, 24% of GABAergic neurons are PV+, whereas 14% are SST+, 10% are CCK+, and VIP+ are less than 10% (Freund and Buzsaki, 1996). Consequently, we considered PV+ interneurons to be a more sensitive subpopulation for evaluating the effects of SE. As they showed no significant difference, we do not believe that assessing smaller subtypes, such as VIP+ or CCK+ cells, would yield significant differences.

      (2) While we often see activated microglia in hippocampal sclerosis, these cells are only slightly increased in cases without hippocampal sclerosis (which are similar to our animals), as we previously published (Peixoto-Santos et al., 2012). Astrocytes are a better marker for the epileptogenic zone, as they are increased in epileptogenic zones without neuron loss and are also important for controlling neuronal activity through neurotransmitter recycling and ion buffering. In fact, our present model is very similar to mesial temporal lobe epilepsy in patients with gliosis only, which is characterized by increased reactive astrogliosis in the hippocampus without cell loss and by changes in the innate inflammatory response related to the presence of reactive astrocytes (Grote et al., 2023).

      (3) We have performed these calculations and added this information to the revised manuscript.

      (4) We thank the reviewer for the word usage recommendation. Indeed, we frequently used “present” throughout the manuscript to describe the observations and patterns the groups “exhibited” or “showed”. However, we believe this is truly not the most appropriate usage in the Discussion when we describe the multivariate latent factors, as we did not “present” them, but rather, we “demonstrated” their existence and significance through our analysis. We rewrote these sentences and hope this is the point the reviewer was referring to.

      References:

      Duval F. Systematic review of the apomorphine challenge test in the assessment of dopaminergic activity in schizophrenia. Healthcare. 2023 11 (1487): 1-11. doi: 10.3390/healthcare11101487.

      Dzirasa K, Ribeiro S, Costa R, Santos LM, Lin SC, Grosmark A, Sotnikova TD, Gainetdinov RR, Caron MG, Nicolelis MAL. Dopaminergic control of sleep-wake states. Journal of Neuroscience. 2006 26:10577–10589. doi:10.1523/JNEUROSCI.1767-06.2006.

      Freund TF, Buzsáki G. Interneurons of the hippocampus. Hippocampus. 1996;6(4):347-470. doi: 10.1002/(SICI)1098-1063(1996)6:4<347::AID-HIPO1>3.0.CO;2-I. PMID: 8915675.

      Ellenbroek BA & Cools AR. Apomorphine susceptibility and animal models for psychopathology: genes and environment. Behavior Genetics. 2002 32 (5): 349-361. doi: 10.1023/a:1020214322065.

      Grote A, Heiland DH, Taube J, Helmstaedter C, Ravi VM, Will P, Hattingen E, Schüre JR, Witt JA, Reimers A, Elger C, Schramm J, Becker AJ, Delev D. 'Hippocampal innate inflammatory gliosis only' in pharmacoresistant temporal lobe epilepsy. Brain. 2023 Feb 13;146(2):549-560. doi: 10.1093/brain/awac293. PMID: 35978480; PMCID: PMC9924906.

      Peixoto-Santos JE, Galvis-Alonso OY, Velasco TR, Kandratavicius L, Assirati JA, Carlotti CG, Scandiuzzi RC, Serafini LN, Leite JP. Increased metallothionein I/II expression in patients with temporal lobe epilepsy. PLoS One. 2012;7(9):e44709. doi: 10.1371/journal.pone.0044709. Epub 2012 Sep 18. Erratum in: PLoS One. 2016;11(7):e0159122. PMID: 23028585; PMCID: PMC3445538.

      Reviewer 2

      In this manuscript, the authors employ a multilevel approach to investigate the relationship between the hippocampal-prefrontal (HPC-PFC) network and long-term phenotypes resulting from early-life seizures (ELS). Their research begins by establishing an ELS rat model and conducting behavioral and neuropathological studies in adulthood. Subsequently, the manuscript delves into testing hypotheses concerning HPC-PFC network dysfunction. While the results are intriguing, my enthusiasm is tempered by concerns related to the logical flow.

      We thank the reviewer for drawing attention to the logical flow of the manuscript. Given the diverse array of behavioral and neurobiological variables examined in our study, obtained through various methods and measures, we fully recognize the importance of a clear and coherent logical flow for conveying the overall narrative.

      Our goal was to articulate the neurobiological findings in a manner that underscores the convergence of their mechanisms, revealing a cohesive relationship among early-life seizures, cognitive deficits, sensorimotor impairments, abnormal network dynamics, aberrant plasticity, neuroinflammation, and dysfunctional dopaminergic transmission.

      Briefly, our narrative can be outlined in the following highlights:

      (1) ELS induces sensorimotor alterations and working memory deficits.

      (2) ELS does not induce neuronal loss, so neurobiological underpinnings may be molecular and functional.

      (3) ELS induces brain-wide astrogliosis and exaggerated HPC-PFC long-term plasticity.

      (4) Sensorimotor alterations are more correlated with astrogliosis, while cognitive deficits are more correlated with altered HPC-PFC plasticity.

      (5) ELS-induced functional alterations may also be observable in freely moving subjects. ELS induces state-dependent alterations in HPC-PFC network dynamics, such as increased hippocampal theta and abnormal PFC gamma coordination during behavioral activity.

      (6) ELS leads to REM-ACT similarity, previously reported in hyperdopaminergic mice, indicating dopaminergic dysfunction.

      (7) ELS rats exhibit altered dopaminergic transmission and behavioral sensitivity that mirror the initial sensorimotor findings.

      (8) The literature establishes an inverted-U relationship between dopamine and both cognition and PFC plasticity, which may explain our finding of an inverted-U relationship between working memory and HPC-PFC LTP across CTRL and ELS rats.
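The response does not state which test was used to establish the inverted-U relationship between working memory and HPC-PFC LTP. One common first check, shown here as a hypothetical sketch, is to fit a quadratic y = b0 + b1*x + b2*x^2 (for example, x = LTP magnitude, y = working-memory score) and look for a negative b2 whose peak lies within the observed range of x. The function names and data are illustrative assumptions. A negative quadratic term is necessary but not sufficient evidence for an inverted U; stricter tests, such as fitting separate slopes on either side of the peak, are often recommended.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns [b0, b1, b2].

    Solves the 3x3 normal equations by Gaussian elimination with partial pivoting.
    """
    sx = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    sxy = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[sx[i + j] for j in range(3)] + [sxy[i]] for i in range(3)]  # augmented matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back-substitution
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    return b


def vertex(b):
    """x-position of the peak (b2 < 0) or trough (b2 > 0) of the fitted parabola."""
    return -b[1] / (2 * b[2])


# Hypothetical data with an exact inverted-U shape.
coeffs = fit_quadratic([0, 1, 2, 3, 4], [0, 3, 4, 3, 0])
print(coeffs[2] < 0, vertex(coeffs))
```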

      To address this concern, we have made revisions to enhance the logical flow, ensuring a more seamless transition between the different sections of the Results by presenting clearer links between observations and following investigations. We hope these changes contribute to a more straightforward rationale and easily understandable presentation of our hypotheses and results.

      Focus on Correlations: The manuscript primarily highlights correlations as the most significant findings. For instance, it demonstrates that ELS induces cognitive and sensorimotor impairments. However, it falls short of elucidating why these deficits are specifically linked to HPC-PFC synaptic plasticity/network. Furthermore, the manuscript mentions the involvement of other brain regions like the thalamus in the long-term outcomes of ELS based on immunohistochemistry data.

      Thank you for your insightful comments, which allowed us to provide further clarification on our study's focus and findings. Our primary goal was to delve into the electrophysiological alterations within the HPC-PFC pathway. The rationale behind this choice lies in the hypothesis that, even in the absence of significant neuronal loss, functional changes in circuits closely linked to the cognitive and behavioral aspects under investigation could be identified.

      While we concentrated our electrophysiological investigation on the HPC-PFC pathway due to its well-established functional correlates in existing literature, it is essential to highlight that our data reveal broader alterations in neural circuitry. Notably, we observed an increase in GFAP in the entorhinal cortex and thalamic reticular nucleus, along with changes in the dopaminergic release within the VTA-NAc pathway. These findings suggest that the impact of early-life seizures extends beyond the HPC-PFC circuit.

      While we recognize the relevance of other brain circuits to the outcomes of ELS, we argue for a specific role of the HPC-PFC circuit. We will detail the supporting evidence and arguments that specifically link HPC-PFC function to our ELS-related observations in a later response regarding the "overinterpretation" of the HPC-PFC role. To better convey these important nuances, we have made specific modifications to the Results and Discussion sections to underscore the broader implications of our findings, providing a more comprehensive understanding of the study's scope and outcomes.

      […] This raises questions about the subjective nature and persuasiveness of the statistical studies presented.

      All statistical analyses were carefully applied based on the literature, following well-established precepts and precautions. Specifically, we constructed the experimental design for univariate inferential statistics for the data related to behavioral tests, synaptic plasticity, immunohistochemistry, oscillatory activity, and dopaminergic sensitization. However, we also submitted our data to multivariate statistical analysis, which is recommended for large datasets and is intended to uncover possible hidden effects. In this situation, multivariate analyses are inherently exploratory because multiple measurements can be used for each phenomenon investigated. Nevertheless, their application is not subjective and follows the same statistical rigor as univariate analyses. We firmly believe that abstaining from exploring these data would fail to realize the full potential of this analytical method in dissecting the multidimensional associations within our dataset. To eliminate any doubt regarding the objectivity of the choice and application of statistics, we carefully rewrote the Methods, further highlighting the details of statistical rigor.

      Sample Size Concerns: The manuscript raises concerns about the adequacy of sample sizes in the study. The initial cohort for acute electrophysiology during ELS induction comprised only 5 rats, without a control group. Moreover, the behavioral tests involved 11 control and 14 ELS rats, but these same cohorts were used for over four different experiments. Subsequent electrophysiology and immunohistochemistry experiments used varying numbers of rats (7 to 11). Clarification is needed regarding whether these experiments utilized the same cohort and why the sample sizes differed. A power analysis should have been performed to justify sample sizes, especially given the complexity of the statistical analyses conducted.

      We appreciate the reviewer's thoroughness and considerations regarding the sample sizes used in our study. The concerns raised about statistical robustness seem to stem from a lack of clarity in delineating the rat cohorts used in each experiment. It is encouraging to note that several neurophysiology studies employing similar analyses use sample sizes similar to ours. The choice of sample size was based on a thorough analysis of the existing literature, considering specific experimental demands, the complexity of the techniques employed, and the need to achieve statistically robust results. In response to these concerns and to clarify the sample sizes, we have made several modifications to the text (highlighted in red). Below, we detail each animal cohort:

      Cohort 1 - Acute Electrophysiology

      The decision to use only 5 animals without a control group for acute electrophysiological recording was intended specifically to confirm that the lithium-pilocarpine injection induces both behavioral and electrographic seizures. It is crucial to note that this was a descriptive result and a methodological control of the ELS model; no statistical test or further analysis was conducted on these data. We maintain that a group of 5 animals is sufficient to demonstrate that the protocol induces electrographic seizures, and we considered a control group unnecessary to show that saline injection does not induce electrographic seizures.

      Cohort 2 - Behavior, LTP Recording, and Immunohistochemistry

      Initially, 14 (ELS) and 11 (CTRL) rats were used for behavior assessment. The reduction in sample size for LTP and immunohistochemistry experiments was influenced by practical challenges, including mortality during LTP surgery and issues with immunohistochemical staining that hindered a proper analysis for some animals.

      Cohort 3 - Chronic Freely-Moving Electrophysiology

      A new cohort of animals (n=6 and 9 for CTRL and ELS, respectively) was used specifically for freely-moving electrophysiological data.

      Cohort 4 - Behavioral Sensitization to Psychostimulants

      A fourth cohort was used to assess behavioral sensitization to psychostimulants (CTRL n=15 and ELS n=14). The smaller sample for neurotransmitter analysis (CTRL n=8 and ELS n=9) was a deliberately selected subsample, large enough for quantification while maintaining statistical validity.
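      The power analysis raised by the reviewer can be made concrete with a short calculation. The sketch below is not the procedure used in the study: it assumes a two-sided, two-sample comparison of means at alpha = 0.05 and uses the normal approximation n = 2((z_(1-alpha/2) + z_(power)) / d)^2, where the standardized effect size d (Cohen's d) is a hypothetical input. The normal approximation slightly underestimates the n required by an exact t-test.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate animals per group for a two-sided two-sample comparison
    with hypothetical standardized effect size d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def achieved_power(n: int, d: float, alpha: float = 0.05) -> float:
    """Approximate power with n animals per group at effect size d,
    ignoring the negligible contribution of the opposite tail."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    shift = d * (n / 2) ** 0.5  # standardized mean difference scaled by n
    return z.cdf(shift - z_alpha)
```

      For example, n_per_group(1.0) returns 16, and achieved_power(11, 1.0) gives the approximate power of an 11-per-group comparison at the same hypothetical effect size; smaller assumed effects require considerably larger cohorts.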

      Overinterpretation of HPC-PFC Network Dysfunction: The manuscript potentially overinterprets the role of HPC-PFC network dysfunction based on the results.

      We appreciate the insight from Reviewer #2 regarding the potential overinterpretation of the role of the hippocampal-prefrontal cortex (HPC-PFC) network dysfunction in the various alterations observed after ELS.

      The significance of HPC-PFC plasticity and network function has been extensively documented for cognitive, affective, and sensorimotor functions, as well as in models of neuropsychiatric diseases. Our recent review (Ruggiero et al., 2021) compiles these findings. Specifically, the HPC-PFC network has been linked to spatial working memory through a series of causal and correlational studies conducted by Floresco et al. and Gordon et al. These findings make the HPC-PFC pathway a plausible candidate for underlying alterations associated with working memory, consistent with our observation of exaggerated HPC-PFC LTP associated with poorer performance in the ELS group. Regarding the immunohistochemical observations, we concur with Reviewer #2 that these findings suggest broader-scale brain alterations related to sensorimotor dysfunction beyond the HPC-PFC circuitry. We acknowledge that these large-scale alterations may underlie brain-wide functional network changes.

      In our network dynamics study arm, we investigated HPC-PFC oscillatory activity, which allowed us to discuss potential relationships between abnormal plasticity (verified in the second study arm) and network dynamics. It is important to note that, although the LFPs recorded in the HPC and PFC have some anatomical specificity, these activities may represent larger-scale limbic-cortical dynamics. The intermediate HPC is strongly influenced by both the dorsal and ventral HPC, and the prelimbic PFC is intricately related to both hippocampal and thalamic oscillations, exhibiting task-demand- and state-dependent synchrony. Additionally, the state maps used in our study were originally described to distinguish states at the level of the global forebrain network. In our past studies, too, we described HPC-PFC patterns of network activity (Marques et al., 2022a) that were later found to be part of a brain-wide synchrony pattern (Marques et al., 2022b). However, most of our findings on oscillatory dynamics centered on theta oscillations, a well-established brain-wide activity that originates in and spreads from the hippocampus and is present in the HPC-PFC circuit during active states.

      In conclusion, we believe the correlations between HPC-PFC LTP and working memory, as well as the specific alterations in theta-coordinated activity, support a particular role of HPC-PFC network dysfunction in the effects of ELS. However, the brain-wide immunohistochemical alterations are plausible indications of larger-scale network dysfunction. To address this issue, we emphasized in the discussion of the network findings that the immunohistochemical and neurochemical results underscore the need to investigate ELS effects on larger networks.

      Notably, cognitive deficits are described as subtle, with no evidence of learning deficits and only faint working memory impairments. However, sensorimotor deficits show promise. Consequently, it's essential to justify the emphasis on the HPC-PFC network as the primary mechanism underlying ELS-associated outcomes, especially when enhanced LTP is observed. Additionally, the manuscript seems to sideline neuropathological changes in the thalamus and the thalamus-to-PFC connection. The analysis lacks a direct assessment of the causal relationship between HPC-PFC dysfunction and ELS-associated outcomes, leaving a multitude of multilevel analyses yielding potential correlations without easily interpretable results.

      We thank Reviewer #2 for the thorough review and insightful comments. To better grasp the context, it is crucial to consider this characterization within the scope of our experimental design and expected outcomes. Unlike epilepsy models involving adult animals or interventions causing pronounced neuronal loss and structural modifications, our study was intentionally designed to explore moderate behavioral alterations. In fact, the mild behavioral alterations observed in ELS models and the lack of neuronal loss guided our focus on investigating changes in HPC-PFC communication.

      While our observed cognitive deficits may be milder than those in certain models, it is imperative to underscore their robustness and clinical relevance. These findings have been consistently replicated by research groups worldwide across various experimental models, encompassing ELS induced by hyperthermia (Chang et al., 2003; Kloc et al., 2022), kainic acid (Stafstrom et al., 1993), flurothyl (Karnam et al., 2009a; 2009b), and hypoxia (Najafian et al., 2021; Hajipour et al., 2023). Mild cognitive deficits were also reported by other research groups using the pilocarpine model at P12 (Mikulecká et al., 2019; Kubová et al., 2013; Kubová et al., 2002). Furthermore, our group replicated the working memory deficit using an alternative paradigm (the T-maze) and a different rat strain (Sprague Dawley), enhancing the reliability of our observations (D’Agosta et al., 2023).

      The clinical perspective gains importance considering that the cognitive effects of ELS may be less severe than those in patients with long-term epilepsy. In fact, mild cognitive impairment is the most common grade of severity in patients with childhood epilepsy, occurring at more than twice the rate of severe cognitive impairment (Sorg et al., 2022). Investigating the mechanisms underlying these mild cognitive changes is crucial for shedding light on neurobiological aspects not yet fully understood, thereby expanding our comprehension of the consequences of ELS.

      We recognize the challenges associated with conducting causal experiments in neuroscience, especially in long-term and chronic alterations as seen in our model. Isolating modifications of specific activities is indeed intricate. However, it's essential to acknowledge that neuroscience progress has not solely relied on causal experiments but has significantly advanced through correlational observations. Our findings serve as a foundational step in comprehending the repercussions of ELS, proposing mechanisms and circuits that necessitate further in-depth dissection and study in the future. We have integrated these considerations into the discussion section of the manuscript to enhance clarity.

      Overall, while the manuscript presents intriguing findings related to the HPC-PFC network and ELS outcomes, it requires a more rigorous experimental design[…]

      We thank the reviewer for acknowledging our intriguing findings. Regarding the experimental design, we are confident that the hypotheses, design, and execution of all experiments in the manuscript were rigorously grounded in the literature and carried out with all necessary controls. As stated earlier, we constructed the experimental design for univariate inferential statistics and explored associations between variables using multivariate statistics. Specifically, we ensured a rigorous experimental design by following a series of guidelines. First, the sample size of each experiment and its respective controls was planned based on the mild effects reported in the ELS literature. As previously indicated, the only single-group experiment was the description of the behavioral effects and electrographic seizures after the acute injection of lithium-pilocarpine. Given the extensive replication of these data in the ELS literature, this result was presented descriptively as a methodological control. Second, statistics were described in detail in both the Methods and Results, always reporting positive and negative results. Notably, the experimental designs used in this work do not depart from established practice; they strictly follow the literature of the field. Nevertheless, we added further details and references on experimental rigor to the manuscript to resolve any doubts regarding objectivity.

      References:

      Chang YC, Huang AM, Kuo YM, Wang ST, Chang YY, Huang CC. Febrile seizures impair memory and cAMP response-element binding protein activation. Ann Neurol. 2003 Dec;54(6):706-18. doi: 10.1002/ana.10789. PMID: 14681880.

      D'Agosta R, Prizon T, Zacharias LR, Marques DB, Leite JP, Ruggiero RN. Alterations in hippocampal-prefrontal cortex connectivity are associated with working memory impairments in rats subjected to early-life status epilepticus. In: NEWROSCIENCE INTERNATIONAL SYMPOSIUM, 2023, Ribeirão Preto. Poster.

      Hajipour S, Khombi Shooshtari M, Farbood Y, Ali Mard S, Sarkaki A, Moradi Chameh H, Sistani Karampour N, Ghafouri S. Fingolimod Administration Following Hypoxia Induced Neonatal Seizure Can Restore Impaired Long-term Potentiation and Memory Performance in Adult Rats. Neuroscience. 2023 May 21;519:107-119. doi: 10.1016/j.neuroscience.2023.03.023. Epub 2023 Mar 28. PMID: 36990271.

      Karnam HB, Zhou JL, Huang LT, Zhao Q, Shatskikh T, Holmes GL. Early life seizures cause long-standing impairment of the hippocampal map. Exp Neurol. 2009 Jun;217(2):378-87. doi: 10.1016/j.expneurol.2009.03.028. Epub 2009 Apr 2. PMID: 19345685; PMCID: PMC2791529.

      Karnam HB, Zhao Q, Shatskikh T, Holmes GL. Effect of age on cognitive sequelae following early life seizures in rats. Epilepsy Res. 2009 Aug;85(2-3):221-30. doi: 10.1016/j.eplepsyres.2009.03.008. Epub 2009 Apr 22. PMID: 19395239; PMCID: PMC2795326.

      Kubová H, Mareš P. Are morphologic and functional consequences of status epilepticus in infant rats progressive? Neuroscience. 2013 Apr 3;235:232-49. doi: 10.1016/j.neuroscience.2012.12.055. Epub 2013 Jan 7. PMID: 23305765.

      Kloc ML, Marchand DH, Holmes GL, Pressman RD, Barry JM. Cognitive impairment following experimental febrile seizures is determined by sex and seizure duration. Epilepsy Behav. 2022 Jan;126:108430. doi: 10.1016/j.yebeh.2021.108430. Epub 2021 Dec 10. PMID: 34902661; PMCID: PMC8748413.

      Kubová H, Mares P, Suchomelová L, Brozek G, Druga R, Pitkänen A. Status epilepticus in immature rats leads to behavioural and cognitive impairment and epileptogenesis. Eur J Neurosci. 2004 Jun;19(12):3255-65. doi: 10.1111/j.0953-816X.2004.03410.x. PMID: 15217382.

      Marques DB, Ruggiero RN, Bueno-Junior LS, Rossignoli MT, and Leite JP. Prediction of Learned Resistance or Helplessness by Hippocampal-Prefrontal Cortical Network Activity during Stress. The Journal of Neuroscience. 2022a 42 (1): 81-96. https://doi.org/10.1523/jneurosci.0128-21.2021.

      Marques DB, Rossignoli MT, Mesquita BDA, Prizon T, Zacharias LR, Ruggiero RN and Leite JP. Decoding fear or safety and approach or avoidance by brain-wide network dynamics abbreviated. bioRxiv. 2022b https://doi.org/10.1101/2022.10.13.511989.

      Mikulecká A, Druga R, Stuchlík A, Mareš P, Kubová H. Comorbidities of early-onset temporal epilepsy: Cognitive, social, emotional, and morphologic dimensions. Exp Neurol. 2019 Oct;320:113005. doi: 10.1016/j.expneurol.2019.113005. Epub 2019 Jul 3. PMID: 31278943.

      Najafian SA, Farbood Y, Sarkaki A, Ghafouri S. FTY720 administration following hypoxia-induced neonatal seizure reverse cognitive impairments and severity of seizures in male and female adult rats: The role of inflammation. Neurosci Lett. 2021 Mar 23;748:135675. doi: 10.1016/j.neulet.2021.135675. Epub 2021 Jan 28. PMID: 33516800.

      Ruggiero RN, Rossignoli MT, Marques DB, de Sousa BM, Romcy-Pereira RN, Lopes-Aguiar C and Leite JP. Neuromodulation of Hippocampal-Prefrontal Cortical Synaptic Plasticity and Functional Connectivity: Implications for Neuropsychiatric Disorders. Frontiers in Cellular Neuroscience. 2021 15 (October): 1–23. https://doi.org/10.3389/fncel.2021.732360.

      Sorg AL, von Kries R, Borggraefe I. Cognitive disorders in childhood epilepsy: a comparative longitudinal study using administrative healthcare data. J Neurol. 2022 Jul;269(7):3789-3799. doi: 10.1007/s00415-022-11008-y. Epub 2022 Feb 15. PMID: 35166927; PMCID: PMC9217877.

      Stafstrom CE, Chronopoulos A, Thurber S, Thompson JL, Holmes GL. Age-dependent cognitive and behavioral deficits after kainic acid seizures. Epilepsia. 1993 May-Jun;34(3):420-32. doi: 10.1111/j.1528-1157.1993.tb02582.x. PMID: 8504777.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      This is a short but important study. Basically, the authors show that α-synuclein overexpression's negative impact on synaptic vesicle recycling is mediated by its interaction with E-domain containing synapsins. This finding is highly relevant for synuclein function as well as for the pathophysiology of synucleinopathies. While the data is clear, functional analysis is somewhat incomplete.

      (1) The authors should present a clearer dissociation of endocytosis and exocytosis under the various conditions they study. They should quantify the rate of rise and decay of pHluorin signals. (2) In addition, I strongly recommend a few additional experiments with and without a vATPase inhibitor such as bafilomycin to estimate the relative effects on exo- vs. endocytosis. As the authors are aware, bafilomycin will mask the re-acidification/endocytosis component, thus revealing pure exocytosis and thus enabling quantification of endocytosis with minimal contamination from exocytosis.

      In the revised version, we analyzed and quantified exocytosis and endocytosis separately, using bafilomycin experiments as the reviewer suggested (new data, Fig. 1-Fig. Supp. 1A-B). Overexpression of human alpha-synuclein attenuated exocytosis only in neurons that also expressed synapsins (WT neurons and synapsin TKO neurons transduced with synapsin Ia). In parallel, we examined endocytosis by calculating the time constant of the decay in sypHy fluorescence during the endocytic phase (Fig. 1-Fig. Supp. 1C-E). Previous studies have shown that after brief stimulus trains, like those used in our study (20 Hz/300 AP), most endocytosis occurs after the cessation of stimulation (1). Expression of human alpha-synuclein did not alter the endocytosis time constant in any of our experiments. In summary, the interaction of alpha-synuclein with the synapsin E domain was required for alpha-synuclein-induced attenuation of exocytosis, but not of endocytosis.
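      The endocytic time constant described above can be estimated in several ways. The sketch below assumes a single-exponential decay after stimulation ends, F(t) = F_base + A * exp(-t / tau), and fits log(F - F_base) against time by ordinary least squares. This log-linear fit is a simple stand-in, not necessarily the fitting procedure used in the study; nonlinear least squares is usually preferred for noisy traces because the log transform amplifies noise near baseline. The baseline value and any trace passed in are hypothetical.

```python
import math


def fit_decay_tau(times, signal, baseline=0.0):
    """Estimate tau of F(t) = baseline + A * exp(-t / tau) by a linear
    least-squares fit of log(F - baseline) versus t. Only samples above
    baseline are used, since the log is undefined otherwise."""
    xs, ys = [], []
    for t, f in zip(times, signal):
        if f - baseline > 0:
            xs.append(t)
            ys.append(math.log(f - baseline))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples above baseline")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # slope of log(F - baseline) equals -1 / tau
    if slope >= 0:
        raise ValueError("trace is not decaying")
    return -1.0 / slope
```

      Applied to a synthetic trace such as 1.0 + 2.0 * exp(-t / 12) sampled every 0.5 s with baseline=1.0, the function recovers tau = 12 s; with bafilomycin present, the decay phase would be absent and the rising phase alone would reflect exocytosis.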

      Reviewer #2

      ...The paper will be improved significantly if additional experiments are added to expand and provide a more mechanistic understanding of the effect of α-syn and the intricate interplay between synapsin, α-syn, and the SV. For an enthusiastic reader, the manuscript as it looks now with only 3 figures, ends prematurely. Some of the experiments above or others could complement, expand and strengthen the current manuscript, moving it from a short communication describing the phenomenon to a coherent textbook topic. Nevertheless, this work provides new and exciting evidence for the regulation of neurotransmitter release and its regulation by synapsin and α-syn.

      (1) Did the authors try to attach E-domain for example to synapsin Ib and restore α-syn inhibition with synapsin Ib-E?

      This is an interesting idea, but in previous studies we found that synapsin Ib does not associate with synaptic vesicles (2), so it would not be present at the right location to restore alpha-synuclein-induced synaptic attenuation. We have also observed that this mislocalization alters synaptic properties (unpublished).

      (2) Was the expression level of Synapsin-IaScrE examined and compared to WT Synapsin-Ia in Fig 3?

      Yes, this data is now shown in Fig. 3-Fig. Supp. 1.

      (3) Were SVs dispersed in α-syn overexpression as predicted?

      We interpret the reviewer’s question and reasoning as follows. If alpha-synuclein binds to the E-domain of synapsin, one prediction for the alpha-synuclein overexpression scenario is that the excess alpha-synuclein molecules would bind to and sequester E-domain synapsins away from synaptic vesicles. Without E-domain synapsins, the synaptic-vesicle clustering effect of synapsins would be lost, and synaptic vesicles would disperse. We tested this prediction, and the results are now shown in an additional figure (new data, Fig. 4). Indeed, AAV-mediated overexpression of alpha-synuclein leads to dispersion of synaptic vesicles, and this dispersion depends on synapsins Ia and Ib, but not IIa and IIb (please see Fig. 4D-E in the revised manuscript). We have also added text, beginning with “Previous studies have shown that loss of all synapsins...”, that presents and interprets these data.

      (4) How does this study coincide with the effects of α-syn on fusion pore and endocytosis? This should be at least discussed. It is also possible that the effects of α-syn on endocytosis might affect the results as if endocytosis is affected, SVs number and distribution will be also affected.

      It is difficult to reconcile our data with the idea that alpha-synuclein facilitates fusion-pore opening, as proposed by the Edwards lab (3). In fact, it is difficult to reconcile this concept with their own previous data showing that alpha-synuclein overexpression attenuates SV recycling (4). As mentioned above, modulation of endocytosis does not seem to be a major factor in our experiments, though this does not rule out a physiologic role for alpha-synuclein in endocytosis, since all our experiments are based on overexpression paradigms. Future experiments examining phenotypes after acute alpha-synuclein knockdown may provide more clarity. In any case, many roles have been proposed for alpha-synuclein, and this is now mentioned in the last paragraph (starting with “Additionally, α-syn has been implicated…”).

      (5) What happened after stimulation when synapsin is detached from SV, does α-syn continues to be linked to it?

      The fate of alpha-synuclein after stimulation is unclear in our experiments. Previous experiments suggest that, while both synapsin and alpha-synuclein detach from the SV cluster during stimulation, synapsin returns to synapses whereas alpha-synuclein does not (5). However, our more recent (unpublished) experiments suggest that the activity-induced dispersion of alpha-synuclein might be phosphorylation-dependent, and that overexpression of alpha-synuclein may not be the best setting in which to evaluate protein dispersion. We hope to answer this question more rigorously using alpha-synuclein knock-in constructs.

      (6) The experiment with E-domain fused to syPhy assumes that α-syn will still be bound to the SV. So how does α-syn inhibit ST?

      The goal of this experiment was to force the synapsin E-domain into a location where it would normally be present, i.e., the surface of the synaptic vesicle, by tagging it to sypHy (sypHy-E), and to ask whether this forced retention would be sufficient to reinstate the alpha-synuclein-mediated attenuation of SV recycling (as shown in Fig. 3F, it was). Please note that sypHy-E does target to synapses in these experiments (new data, Fig. 3-Fig. Supp. 2D). In this context, we are not sure what the reviewer means by “So how does α-syn inhibit synaptic transmission?” We do not think that alpha-synuclein needs to unbind from SVs in order to inhibit synaptic transmission. Overall, we think that alpha-synuclein needs to cooperate with synapsins to perform its function, but as mentioned above and in the manuscript, the precise role of alpha-synuclein in this process is still unclear.

      (7) An interesting experiment will be the expression of the isolated E-domain and examining blockage of α-syn inhibition and disruption of synapsin- α-syn interaction. Have the authors examined it as was done in other models?

      We did perform the experiment in which we over-expressed only the isolated synapsin E-domain in neurons. We thought that the E-domain might have a dominant-negative effect on SV clustering, as it did in the lamprey and other model systems, where the E-peptide was injected directly into the axon. However, we found that in cultured hippocampal neurons, the over-expressed E-domain behaves like a soluble protein and is not enriched in synapses (new data, Fig. 3-Fig. Supp. 2B). Also, the over-expressed E-domain cannot reinstate the synaptic attenuation induced by alpha-synuclein (new data, Fig. 3-Fig. Supp. 2C), likely because it does not target to synapses. In fact, this is why we performed the sypHy-E experiment in the first place: to ensure that the E-domain was in the right location to have an effect.

      (8) A schematic model/scheme providing a mechanistic view of the interplay between the proteins is essential and can improve the paper.

      The only model we can confidently propose right now would be a stick figure showing the site where the alpha-synuclein C-terminus binds to synapsin, which is obviously not very insightful. As noted above (and in the revised version), several different functions have been attributed to alpha-synuclein, and the precise role of alpha-synuclein/synapsin interactions in regulating the SV cycle is unclear. We hope to create a better model once we and our colleagues working on this challenging problem have obtained more data.

      References

      (1) Kononenko NL & Haucke V. (2015) Molecular mechanisms of presynaptic membrane retrieval and synaptic vesicle reformation. Neuron 85, 484-496.

      (2) Gitler D, Xu Y, Kao H-T, Lin D, Lim S, Feng J, Greengard P & Augustine GJ. (2004) Molecular Determinants of Synapsin Targeting to Presynaptic Terminals. J. Neurosci. 24, 3711-3720.

      (3) Logan T, Bendor J, Toupin C, Thorn K & Edwards RH. (2017) α-Synuclein promotes dilation of the exocytotic fusion pore. Nat Neurosci 20, 681-689.

      (4) Nemani VM, Lu W, Berge V, Nakamura K, Onoa B, Lee MK, Chaudhry FA, Nicoll RA & Edwards RH. (2010) Increased expression of alpha-synuclein reduces neurotransmitter release by inhibiting synaptic vesicle reclustering after endocytosis. Neuron 65, 66-79.

      (5) Fortin DL, Nemani VM, Voglmaier SM, Anthony MD, Ryan TA & Edwards RH. (2005) Neural activity controls the synaptic accumulation of alpha-synuclein. J Neurosci 25, 10913-10921.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1: I would have preferred to see more figures with brain images showing the cellular abundance maps and the atrophy maps. Without being able to see these figures, it's difficult for the reader to 1) validate the atrophy patterns or 2) gain intuition about how the cellular abundance maps vary across the brain. The images in Figure 1C give a small preview, but I'd like to see these maps in their entirety on the brain surface or axial image slices.

      (1) We added brain surface visualization plots of the voxel-wise cellular abundance maps to Figure 1 (lateral, dorsal, and ventral views of both hemispheres). To illustrate how their spatial distributions are associated with brain tissue damage, in Figure 2, we have also added brain surface visualizations of regional values from the atrophy t-statistic maps for the thirteen neurodegenerative conditions and the cell-type map most strongly associated with each condition. These plots allow us to observe variability across the cell-type density and atrophy maps, as well as to visually validate and compare how the patterns vary across the brain.

      Reviewer 1: FTD is an umbrella category for a family of distinct clinical syndromes with different atrophy patterns. It doesn't seem a good idea to take the average of all subjects in this group to form a single atrophy map. Instead, different average maps for each syndrome should be provided.

      (2) Considering the heterogeneity of clinical FTD syndromes, we addressed the reviewers' concerns about using an atrophy map averaged across all patients with an FTD diagnosis. As suggested, we obtained separate atrophy maps for each major variant of clinical FTD, including behavioral FTD (n = 70), as well as the semantic (n = 36) and nonfluent variants of primary progressive aphasia (n = 30). These maps are based on data from participants in the same Frontotemporal Lobar Degeneration Neuroimaging Initiative (FTLDNI) dataset that we originally used. Consistent with our previous results using the atrophy map averaged over all FTD patients, the analysis showed significant associations of atrophy patterns with cell-type densities in all three major variants (see Figure 3A). Notably, these new findings offer insights into specific differences in the spatial vulnerability of different cell types across the variants of FTD, each characterized by unique symptoms, clinical manifestations, and atrophy patterns. In response to these additions, we have updated all figures, results, and interpretations accordingly.

      Reviewer 2: In the abstract, the list of neurodegenerative disorders should be edited: frontotemporal dementia is an umbrella clinical syndrome, not a neurodegenerative disorder. Frontotemporal lobar degeneration (FTLD) is a neurodegenerative disorder, and many tauopathies are FTLDs. While the authors grab their definitional classes from various sources (i.e., published cohort, and other studies), the reader fatigues to understand the population that is being assessed.

      (3) To address potential confusion arising from the inclusion of atrophy maps from FTLD patients across two different studies, stratified based on both clinical and pathological criteria, we added clarifications regarding the population assessed and the definitions used. We use the term FTD when addressing the clinical syndromes and the term FTLD when referring to the histologically confirmed neurodegenerative pathologies. In addition, we added details on the diagnostic criteria used for participant recruitment in the FTLDNI cohort, whose data we used to derive atrophy maps for the clinical subtypes of FTD. Lastly, throughout the text and figures, we systematically refined the nomenclature for FTLD pathological types, categorizing them according to their established definitions in the literature and the type of proteinaceous inclusion (FTLD 3-repeat and 4-repeat tauopathies and FTLD-TDP types A and C).

      Reviewer 1: The results section contains perhaps too much interpretation. While the information that's provided serves as an interesting review (e.g., the discussion of the blood-brain barrier), the discussion may be a better place for this.

      (4) We removed sentences with excessive interpretation but retained, in the Results section, those outlining the fundamental functions of cell types and their literature-based relevance to neurodegenerative diseases, as they clarify the significance of our findings for readers.

      Reviewer 2: The authors based their methodology on the use of a deconvolutional cell classifier; however, do not extensively recognize that their data on gene expression are based on normal brain levels rather than on diseased ones.

      (5) We now acknowledge, in figure titles and in all sections of the paper (Introduction, Results, Discussion, Methods), that the gene expression data are based on normal human brain levels, to remind readers that the analysis shows how changes in gray matter tissue in diseased brains correlate with healthy reference levels of cellular density.

      Reviewer 2: More information in the text needs to be provided regarding the method used to infer gene expression levels at non-sampled brain locations. The reader should not be forced to read reference 40 or investigate the methods section. Figure 1 schematics do not sufficiently explain the used method.

      (6) We added clarifications and references regarding the Gaussian process regression used to impute gene expression (Results and figure titles).
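      To make the imputation step concrete, the sketch below shows Gaussian process regression in its simplest form: the predicted value at an unsampled coordinate is the prior mean plus a kernel-weighted combination of the sampled values. The squared-exponential kernel, the length scale, the noise term, and the use of the sample mean as the prior mean are illustrative assumptions, not the settings of the method cited in the manuscript; coordinates are hypothetical 3D positions.

```python
import math


def rbf(a, b, length_scale):
    """Squared-exponential kernel between two 3D coordinates."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(train_xyz, train_y, query_xyz, length_scale=10.0, noise=1e-6):
    """Posterior mean of a zero-centered GP (prior mean = sample mean)."""
    mean_y = sum(train_y) / len(train_y)
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(train_xyz)]
         for i, a in enumerate(train_xyz)]
    weights = solve(K, [y - mean_y for y in train_y])  # K^-1 (y - mean)
    preds = []
    for q in query_xyz:
        k_star = [rbf(q, a, length_scale) for a in train_xyz]
        preds.append(mean_y + sum(k * w for k, w in zip(k_star, weights)))
    return preds
```

      Near a sampled location the prediction approaches the measured value; far from all samples the kernel weights vanish and the prediction falls back to the prior mean, which is why imputed values in sparsely sampled regions are smoother and less certain.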

      Reviewer 2: Also, while predicted levels are uniquely based on patterns of brain atrophy, it is not possible to know whether this strategy is generalizable to all diseases (for instance, it is known that pure DLB, PD and ALS are not associated with extensive brain atrophy), or even adequately comparable between subtypes of diseases within the same class (e.g., different forms of FTLD). The authors do not acknowledge that only data based on true neuropathological assessment may prove whether their findings are true.

      (7) Although diagnoses of most dementia conditions used in our study were histologically confirmed, we added acknowledgement about the importance of neuropathological assessment (Discussion section).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      One criticism the authors have made of previous studies was that they have not distinguished between 'tonic' and 'phasic' LC activity and could not demonstrate 'time-locked phasic firing'. This has not been achieved in the present report, as an examination of Fig 1C,D and 2C,D shows. Previous reports using unit recording in rats and monkeys clearly show that the latency of LC 'phasic' responses to salient or behaviorally relevant stimuli is in the range of tens of milliseconds, with a very short duration, often followed by a long-lasting inhibition. This kind of temporal precision concerning the phasic response cannot be gleaned from the time scale shown in the Figures (assuming the time scale is in seconds). We can discern a long-lasting increase in tonic firing level for the more salient stimuli (Fig 1C) (although the authors state in the discussion that "we did not observe obvious changes in tonic LC-HPC activity"). This calcium imaging methodology as used in the present experiments can give us a general idea of the temporal relation of the LC response to the stimulus, but apparently does not afford the millisecond resolution necessary to capture a phasic response, at least as the data are presented in the Figures.

      While we understand the reviewer’s concern with our use of the terms phasic and tonic, we believe we have represented them as accurately as possible given our data. Unfortunately, the distinction between tonic and phasic activity is somewhat arbitrary, in that there is no strict definition, to our knowledge, of the exact parameters that activity must fall into to be categorized as tonic or phasic. While it is true that phasic LC activity has typically been studied with electrophysiological approaches that afford millisecond resolution and that observed phasic responses are often extremely short, there are numerous differences between those studies and this one. Most prominently, the stimuli used to elicit a phasic response are generally extremely short (often 1 ms or less) and therefore generate extremely short phasic responses (Aston-Jones and Bloom, 1981a; Aston-Jones and Cohen, 2005), but this does not rule out longer phasic responses to longer-lasting stimuli. Moreover, tonic activity is reported to track behavioral state over tens of seconds to minutes and is not reported in response to specific stimuli (Aston-Jones and Bloom, 1981b). The “phasic” responses we report generally decay in less than 5 seconds in our fluorescence signals. Given the slow decay of GCaMP6s (a single action potential can generate a response lasting 3 seconds or more; Chen et al., 2013) and of the GRAB sensors (GRAB-DA2h τoff = 7.2 s; Sun et al., 2020), the underlying neural responses would have lasted for a considerably shorter period. Therefore, we believe the responses we observed are much more consistent with phasic responses to long-lasting sensory stimuli (20-second tone, 1-2 second shock) than with increases in tonic activity associated with a change in behavioral state.

      Finally, regardless of whether these responses are exactly the same as previously reported phasic responses, our photometry and optogenetics studies provide insight into a form of LC activity that cannot be gleaned from much slower dialysis, lesion, and pharmacology studies. Nonetheless, we added the following to the discussion section to clarify the limitations of our interpretation:

      “…given their relatively short duration and the fact that they are elicited specifically by salient sensory stimuli, we refer to these responses as “phasic responses.” However, because of the comparatively slow dynamics of fluorescent sensors relative to electrophysiology, we cannot rule out the possibility that these responses are somehow different in nature from previously reported phasic LC responses. Thus, some care must be taken when conflating the characteristics and/or function of the relatively short-lasting responses presented here and the extremely fast phasic responses to very brief (μs to ms) sensory stimuli reported previously.”
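      A minimal sketch, under simplifying assumptions, illustrates the kinetic argument above: how long a fluorescence transient lasts depends mostly on the sensor's off-kinetics rather than on how long the neurons fire. It assumes, hypothetically, that the sensor binds instantly and unbinds with a single exponential time constant, and uses the GRAB-DA2h τoff of 7.2 s quoted above. The function names, burst durations, and the half-maximum width measure are illustrative choices, not part of the study's analysis. Under these assumptions, a 0.1 s burst of activity produces a transient that stays above half its peak for roughly 5 s.

```python
import math


def sensor_trace(burst_s, tau_off_s, dt=0.001, t_end=30.0):
    """Idealized fluorescence trace for a rectangular burst of neural activity.

    Assumptions (illustrative only): the sensor responds instantly to activity
    and unbinds with a single exponential time constant tau_off_s, so the trace
    is the burst convolved with exp(-t / tau_off_s). Units are arbitrary.
    """
    n_steps = int(round(t_end / dt))
    decay_per_step = math.exp(-dt / tau_off_s)
    trace = []
    signal = 0.0
    for i in range(n_steps):
        t = i * dt
        drive = 1.0 if t < burst_s else 0.0
        # Recursive form of the convolution: the existing signal decays by one
        # time step and any ongoing activity adds newly bound sensor.
        signal = signal * decay_per_step + drive * dt
        trace.append(signal)
    return trace


def width_above_half_max(trace, dt=0.001):
    """Total time (s) the trace spends at or above half of its peak."""
    half_max = max(trace) / 2.0
    return sum(1 for value in trace if value >= half_max) * dt


if __name__ == "__main__":
    tau_off = 7.2  # GRAB-DA2h off time constant (s), as quoted in the response
    for burst in (0.1, 1.0):
        width = width_above_half_max(sensor_trace(burst, tau_off))
        print(f"{burst:.1f} s burst -> {width:.2f} s above half maximum")
```

      With these illustrative parameters, making the burst ten times longer widens the transient by only about half a second. Real GCaMP6s and GRAB sensors have finite rise times and are not strictly single-exponential, so the numbers should not be read as estimates for the present data.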

      Much of the data presented here can be regarded as 'proof of concept', i.e., demonstrating that photometric imaging of calcium signalling yields results concerning LC responses to salient or behaviorally relevant stimuli similar to those previously reported using electrophysiological unit recording. The role of dopamine as the principal player in hippocampal-dependent learning also corroborates previous reports.

      Although some of the data presented in this study could be seen as “proof of concept” or “confirmatory” of previous results, we believe this work extends previous results by showing 1) the importance of hippocampal dopamine to aversive hippocampus-dependent learning and trace fear conditioning specifically, 2) that LC responses are important at the specific times of learning (i.e. CS/US onset/termination), and 3) that dopamine in the hippocampus is likely important for learning in a way that is not tied to prediction error or memory consolidation.

      No attempt was made to address the important current question of the modular organisation of Locus Coeruleus, although the authors recognize the importance of this question and propose future experiments using their methodology to record simultaneously in several LC projection sites.

      While we do recognize the importance of this modular organization, which is addressed in the discussion as the reviewer mentions, experiments addressing this organization are beyond the scope of the present study. Future work will address the possibility that LC projections to different regions show differential responses during learning.

      The phasic-tonic issue has not been resolved by these experiments. Phasic responses of LC single units are short-latency, short-lived (just 3-4 action potentials), and followed by a relatively long refractory period. Multiunit responses will have a more jittery latency and longer-lasting response (but still only tens to hundreds of milliseconds). Your figures clearly show long-lasting increases in tonic firing levels, even though you state the contrary in the discussion. Therefore, I strongly recommend removing the word 'phasic' from the title.

      Addressed above.

      Yohimbine, the α2 antagonist, administered systemically, induces a massive increase in the rate of firing of LC cells (through blockade of autoinhibition at the cell body level and at terminals). I guess its effect on the receptor 'backbones' overrides the massive release of NE and/or DA, but you might want to mention this; also include the dose of all drug treatments.

      Yes, yohimbine’s effect on the GRAB-NE signal is somewhat counter-intuitive given the known effect of yohimbine on norepinephrine levels. However, our result is consistent with previous reports (Feng et al., 2019). We have added the following to the results section to clarify:

      “Thus, even though yohimbine is known to increase NE levels in the hippocampus (Abercrombie et al., 1988), its blockade effect on the GRAB-NE sensor should result in a decrease in fluorescence after administration.”

      Include time scale units on all figures (I assume it is seconds in Figs 1 &2).

      Thank you for pointing out this issue; we have added units to all figures.

      • Is it possible to have a better quality example of staining? Fig 1B in particular is very blurry. Is the yellow double staining? Please indicate. Most of the GCaMP seems to be outside the main area of TH staining. Fig 4B is much nicer, and morphologically it looks like LC.

      Unfortunately, the GCaMP6s staining was very dim in our hands and resulted in relatively blurry images. Yes, in this case, yellow is double staining. Regarding the morphology, the GCaMP image is taken from a sagittal section and the shape of expression is consistent with images of LC in the sagittal plane. However, given the quality of our ChR2 images, we are confident in the specificity of expression in these mice.

      Reviewer 2

      The claim that dopamine release in dHPC is caused by LC neurons is not directly tested. Unfortunately, the most critical experiment for the claims that dopamine release comes from LC during conditioning is not tested. A lack of dopamine signal in dHPC caused by inhibition of LC during TFC would show this. It is indeed an interesting observation that chemogenetic activation of LC causes dopamine release in the dHPC. However, in the absence of concurrent VTA inhibition or lesion, it remains a possibility that the dopamine release is mediated through indirect actions on other dopamine-expressing neurons. The authors do a good job of arguing against this interpretation in the discussion, and the literature seems appropriate for this. However, the title is still an overstatement of the data presented in this study.

      We agree with the reviewer’s comments. As indicated in the discussion, it is possible that hippocampal dopamine is increased indirectly via LC projections to dopaminergic midbrain regions. We believe that our title is consistent with this possibility. When phasic stimulation was delivered to the LC, dopamine levels increased in the hippocampus and trace fear conditioning was enhanced. The observed increase in dopamine could be direct or indirect. As the reviewer notes, we argue for the former in the discussion section. A number of experiments would be needed to show this directly (record dopamine while: inhibiting the LC, inhibiting the VTA, stimulating LC while simultaneously inhibiting the VTA etc.) and we are planning to do these in the future.

      The primary alternative interpretations of the phasic activation experiment are that the effect is driven only by stimulation during the cue events (both onset and offset) or only by stimulation during the shock. Thus this experiment would benefit from additional data showing either a no-shock control, to show that enhanced activity of the LC to the tone is not inherently aversive, or manipulations during the tone but not during the shock.

      Future work will explore whether the contribution of LC to learning is primarily due to its activation during the CS or the US. However, this is beyond the scope of this manuscript.

      Specificity of the GRAB-NE and GRAB-DA sensors should be either justified through additional experiments testing the alternative antagonist (i.e. GRAB-NE CNO+eticlopride / GRAB-DA CNO+yohimbine) or additional citations that have tested this already. It is critical for the claims of the paper to show that these sensors are specific to dopamine or norepinephrine.

      Although specificity is a potential concern, these sensors have been thoroughly vetted and used by many groups since their generation. In particular, the creators of these sensors provided extensive data showing their specificity. The GRAB-DA sensor is ~10-fold more sensitive to DA than to NE (Sun et al., 2020, cited 239 times) and the GRAB-NE sensor is ~37-fold more sensitive to NE than to DA (Feng et al., 2019, cited 371 times).

      The role of dopamine in prediction error was tested through a series of conditions whereby the shock was presented either signaled (i.e. predicted), or not. However, another way that prediction error is signaled is through the absence of an expected outcome. Admittedly it might not be possible to observe a decrease in dopamine signaling with this methodology.

      Although this is a strong point, given that the study is not primarily focused on prediction error and the low likelihood of observing the typically small decrease in signaling when an expected outcome is omitted, we feel that additional prediction error studies are beyond the scope of this manuscript. However, further experiments as suggested by the reviewer could prove interesting in future studies.

      The difference between Fig. 6E and 6H needs to be clarified. What is shown in Fig. 6E is that the response to the shock decreases through experience (i.e. by the 10th trial). However, in Fig 6H, there is no difference between signaled and unsignaled shock, but this is during conditioning, and not after learning (based on my understanding of the methods, line 482).

      We are not sure we fully understand what point of clarification the reviewer is asking for. However, we have clarified in the methods that the signaled vs unsignaled shock experiment took place in animals that had already been trained on TFC. Thus, all of the trials took place after the animals had learned the tone-shock association. Therefore, although the drop in shock-response could be taken as an indicator of a prediction-error like signal, all the other data points to this not being the case (no change in tone response over training, no difference in signaled vs. unsignaled responses after training).

      Unless I missed it, at no point in the manuscript is the number of subjects described. Please add the n per experiment within each section describing each experiment in the methods (Behavioral procedures). Some more details in the photometry statistical analysis would be helpful. For example, what is the n per group for every data set that is presented? How many trials per analysis?

      We thank the reviewer for pointing this out. Animal numbers have been added in the methods section in the Behavioral Procedures, Optogenetics, and Drugs sub-sections and in the figure legends. Trial numbers are included in these sections and all trials were used for analysis.

      What is the difference in experimental procedure between Fig. 2D and Fig. 3B? It seems that they are the same, and yet the LC response to the conditioned CS is not.

      Fig. 3B is simply the Day 1 data from Fig 2D presented at a different scale because the shock response is included in Fig. 3B, which necessitates a larger scale on both axes. Close inspection of the figures will show that the shapes of these two curves and the error around them are the same, but the different scaling obscures this slightly.

      Typo in the legend of Figure 2 - D should be E.

      Thank you, we have corrected this.

      • Anatomical localization of the virus injections, and more importantly the fiber placements, is not shown. Including this information helps with replication and understanding where exactly the observations were made in dHPC to contrast with prior studies.

      Representative examples are included in the manuscript in Figures 1B, 3F, 4B, and 5B.

      Reviewer 3

      While the optogenetic study was lovely, a control using the same stimulation but delivered at different time points would have been a good addition to show how critical the neural signal at tone onset, tone offset, and shock is.

      We agree that it would be interesting in future studies to delineate the specific times when LC stimulation produces a learning enhancement. It could be that LC activity is most important during one specific time period (e.g., just during the shock) or that all three periods of activation are required. It would also be useful to know whether stimulation at other times during learning can produce an enhancement given the potentially long-lasting effects of dopamine on HPC plasticity and learning.

      Justification for the focus on D1 receptors was lacking.

      We chose to focus on D1 receptors because previous studies have shown that these receptors are critical for memory formation or consolidation in the hippocampus. We have added a sentence justifying this in the results section.

      “To test whether dopamine is required for trace fear memory formation, we administered the dopamine D1 receptor antagonist SCH23390 (0.1 mg/kg) 30 minutes before training, as D1/D5 receptors have previously been shown to be critical for other types of hippocampus-dependent memory and plasticity (Frey et al., 1990; Huang and Kandel, 1995; O’Carroll et al., 2006; Wagatsuma et al., 2018).”

      The manuscript provides convincing evidence that the neural signal is not an error-correcting one by including a predicted (by a tone) and unpredicted shock. One possibility is that perhaps the unpredicted shock could be predicted by the context. Some clarification on the behavioural procedures would help understand if indeed the unsignaled shock could be predicted by the context or not.

      Mice always exhibit freezing in the training environment, so the context is definitely a predictor of shock. However, the tone is a much better predictor because it is always followed by shock while the mice spend a large amount of time in the context without being shocked. This is demonstrated by the fact that the same procedure used in the current experiments consistently produces more tone fear than context fear (Wilmot et al., 2019). While we did not do long-term memory tests here, we assume the same dissociation occurred as it has been observed very consistently across studies (Chowdhury et al., 2005; Kitamura et al., 2014; Wilmot et al., 2019). Nonetheless, it is possible that a difference between signaled and unsignaled groups was obscured by the context. We should note, however, that differences between dopaminergic responses to cued and uncued rewards and aversive outcomes have been observed, and these animals were also trained in the same context (Eshel et al., 2016; Matsumoto and Hikosaka, 2009; Pan et al., 2005; Schultz, 1998). Therefore, we believe this experiment does differentiate the observed dopamine response in the hippocampus from previously reported VTA dopamine prediction error signaling.

      Figure 2 - tone termination in Tone only group - no change? Stats?

      Thank you for pointing out this omission. We have added the stats to the figure legend. Although the response to tone termination decreased numerically, it did not change significantly across days. This is one point we may seek to clarify in future studies, as the difference between tone onset and termination responses is unexpected. Given the relatively small responses, it is possible that future studies with a stronger signal (e.g., GCaMP8) may find differences in the tone termination response across training days. This is one of the reasons we focused primarily on the responses to tone onset and shock in the rest of the manuscript.

      Fig 4 data - stimulation at time incongruent with the signal as a control for the timing of stim.

      This is addressed above.

      Fig 5 - GRAB-NE - yohimbine seems to suppress the signal below the vehicle. Not the case for GRAB-DA. Is this sig? post-hoc stats?

      Yes, this does appear to be the case for GRAB-NE, and would not be entirely surprising given that there is likely a baseline level of NE (and dopamine) in the hippocampus that produces some degree of baseline fluorescence in the vehicle group. This signal could be reduced/abolished by blocking the sensor and preventing this baseline level of NE from binding and producing fluorescence. This may not be the same for the GRAB-DA for a variety of reasons – different sensor binding affinities, different baseline neurotransmitter levels, potentially non-equivalent drug doses, etc. Because of the large number of pairwise comparisons in this data (18), we did not make post-hoc pairwise comparisons.

      Shock response curve - lines 466-474 - some explanation of what the pseudorandom order of shock presentation means.

      We have added the following explanation to this section:

      “…pseudorandom order, such that the shocks did not occur in ascending or descending order or follow the same pattern in each block,…”

      Line 126 - the extinction came out of the blue, it needs some introduction such as a statement that the animals were exposed to extinction training following conditioning.

      We have added the following earlier in that same paragraph:

      “On the second and third days, mice underwent extinction trials in which no shocks were administered.”

      References in Response

      Abercrombie ED, Keller RW, Zigmond MJ. 1988. Characterization of hippocampal norepinephrine release as measured by microdialysis perfusion: Pharmacological and behavioral studies. Neuroscience 27:897–904. doi:10.1016/0306-4522(88)90192-3

      Aston-Jones G, Bloom FE. 1981a. Norepinephrine-containing locus coeruleus neurons in behaving rats exhibit pronounced responses to non-noxious environmental stimuli. Journal of Neuroscience 1:887–900. doi:10.1523/JNEUROSCI.01-08-00887.1981

      Aston-Jones G, Bloom FE. 1981b. Activity of norepinephrine-containing locus coeruleus neurons in behaving rats anticipates fluctuations in the sleep-waking cycle. J Neurosci 1:876–886. doi:10.1523/JNEUROSCI.01-08-00876.1981

      Aston-Jones G, Cohen JD. 2005. An integrative theory of locus coeruleus-norepinephrine function: Adaptive gain and optimal performance. Annual Review of Neuroscience 28:403–450. doi:10.1146/annurev.neuro.28.061604.135709

      Chen T-W, Wardill TJ, Sun Y, Pulver SR, Renninger SL, Baohan A, Schreiter ER, Kerr RA, Orger MB, Jayaraman V, Looger LL, Svoboda K, Kim DS. 2013. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499:295–300. doi:10.1038/nature12354

      Chowdhury N, Quinn JJ, Fanselow MS. 2005. Dorsal hippocampus involvement in trace fear conditioning with long, but not short, trace intervals in mice. Behavioral Neuroscience 119:1396–1402. doi:10.1037/0735-7044.119.5.1396

      Eshel N, Tian J, Bukwich M, Uchida N. 2016. Dopamine neurons share common response function for reward prediction error. Nat Neurosci 19:479–486. doi:10.1038/nn.4239

      Feng J, Zhang C, Lischinsky JE, Jing M, Zhou J, Wang H, Zhang Y, Dong A, Wu Z, Wu H, Chen W, Zhang P, Zou J, Hires SA, Zhu JJ, Cui G, Lin D, Du J, Li Y. 2019. A Genetically Encoded Fluorescent Sensor for Rapid and Specific In Vivo Detection of Norepinephrine. Neuron 102:745-761.e8. doi:10.1016/j.neuron.2019.02.037

      Frey U, Schroeder H, Matthies H. 1990. Dopaminergic antagonists prevent long-term maintenance of posttetanic LTP in the CA1 region of rat hippocampal slices. Brain Research 522:69–75. doi:10.1016/0006-8993(90)91578-5

      Huang YY, Kandel ER. 1995. D1/D5 receptor agonists induce a protein synthesis-dependent late potentiation in the CA1 region of the hippocampus. Proceedings of the National Academy of Sciences 92:2446–2450. doi:10.1073/pnas.92.7.2446

      Kitamura T, Pignatelli M, Suh J, Kohara K, Yoshiki A, Abe K, Tonegawa S. 2014. Island Cells Control Temporal Association Memory. Science 343:896–901. doi:10.1126/science.1244634

      Matsumoto M, Hikosaka O. 2009. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459:837–841. doi:10.1038/nature08028

      O’Carroll CM, Martin SJ, Sandin J, Frenguelli BG, Morris RGM. 2006. Dopaminergic modulation of the persistence of one-trial hippocampus-dependent memory. Learning & memory 13:760–769.

      Pan W-X, Schmidt R, Wickens JR, Hyland BI. 2005. Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network. J Neurosci 25:6235–6242. doi:10.1523/JNEUROSCI.1478-05.2005

      Schultz W. 1998. Predictive Reward Signal of Dopamine Neurons. Journal of Neurophysiology 80:1–27. doi:10.1152/jn.1998.80.1.1

      Sun F, Zhou J, Dai B, Qian T, Zeng J, Li X, Zhuo Y, Zhang Y, Wang Y, Qian C, Tan K, Feng J, Dong H, Lin D, Cui G, Li Y. 2020. Next-generation GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 17:1156–1166. doi:10.1038/s41592-020-00981-9

      Wagatsuma A, Okuyama T, Sun C, Smith LM, Abe K, Tonegawa S. 2018. Locus coeruleus input to hippocampal CA3 drives single-trial learning of a novel context. Proceedings of the National Academy of Sciences 115:E310–E316. doi:10.1073/pnas.1714082115

      Wilmot JH, Puhger K, Wiltgen BJ. 2019. Acute Disruption of the Dorsal Hippocampus Impairs the Encoding and Retrieval of Trace Fear Memories. Frontiers in Behavioral Neuroscience 13. doi:10.3389/fnbeh.2019.00116

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors conducted two tasks 300 days apart. First, a social perception task, where Ps responded whether a pictured person either deserved or needed help. Second, an altruism task, where Ps are offered monetary allocations for themselves and a partner. Ps decide whether to accept each offer or a default allocation of 20 dollars each. The partners differed in perceived merit, such that they were highly deserving, undeserving, or unknown. This categorisation was decided on the basis of a prisoner's dilemma game the partner played beforehand. "Need" was also manipulated, by altering the probability that the partner must keep their hand in cold water at the end of the experiment, which the partner can use the money to buy themselves out of. These two tasks were conducted to assess the perception of need/merit in the first instance, and how this relates to social behaviour in the second. fMRI data were collected alongside behavioural data.

      The authors present many analyses of behaviour (including DDM results) and fMRI. E.g., they demonstrate that they could decode across the mentalising network whether someone was making a need or deserving judgement vs control judgement but couldn't decode need vs deserving. And that brain responses during merit inferences (merit - control) systematically covaried with participants' merit sensitivity scores in the rTPJ. They also found relationships between behaviour and rTPJ in the altruism task. And that merit sensitivity in the perception task predicted the influence of merit on social behaviour in the altruism task.

      Strengths:

      This manuscript represents a sensible model to predict social perceptions and behaviours, and a tidy study design with interesting findings. The introduction introduced the field especially brilliantly for a general audience.

      Response: We are pleased that the reviewer found the model sensible and the findings interesting! Below, we respond to each of the reviewer’s comments/critiques.

      Weaknesses: (1) The authors do acknowledge right at the end that these are small samples. This is especially the case for the correlational questions. While the limitation is acknowledged at the end, it is not truly acknowledged in the way that the data are interpreted. I.e. much is concluded from absent relationships, where the likelihood of Type II error is high in this scenario. I suggest that throughout the manuscript, authors play down their conclusions about absence of effects.

      Response: We agree with the reviewer that the limitation of small samples should be adequately reflected in the interpretation of the data. We have therefore added cautionary language to the interpretation of the correlational effects in several places of the revised manuscript. For example, we now state: “However, this absence of effects for need ought to be interpreted with caution, given the comparatively small sample size.” (pg. 33) and “As mentioned above, we cannot rule out the possibility that null findings may be due to the comparatively small sample size and should be interpreted cautiously (also see discussion)” (pg. 34-35).

      (2) I found the results section quite a marathon, and due to its length I started to lose the thread concerning the overarching aims - which had been established so neatly in the introduction. I am unsure whether all of these analyses were necessary for addressing the key questions or whether some were more exploratory. E.g. it's unclear to me what one would have predicted upfront about the decoding analyses.

      Response: We acknowledge and share the reviewer’s concern about the length of the results section and potential loss of clarity. Regarding the decoding analyses, we want to clarify that they were conducted as a sanity check to compare against the results of the univariate analysis. We did not have a priori hypotheses regarding these supplemental decoding analyses. We have clarified this issue in the revised version of the manuscript and moved the decoding analyses fully to the supplemental material to streamline the main text. The remaining results reported in the manuscript are indeed all based on a priori key questions (unless specified otherwise, for example, supplemental analyses for other regions of interest for the sake of completeness). The only exception is the final set of results (Neural markers of merit sensitivity predict merit-related behavioral changes during altruistic choice), which represent post hoc tests to clarify the role of activation in the right temporoparietal junction (rTPJ) in merit-related changes in other-regard in altruistic decisions. While we acknowledge that this is a complex paper, after careful consideration we could not identify any other parts of the results section to remove or report in the supplemental material.

      (3) More specifically, the decoding analyses were intriguing to me. If I understand the authors, they are decoding need vs merit, and need+merit vs control, not the content of these inferences. Do they consider that there is a distributed representation of merit that does not relate to its content but is an abstracted version that applies to all merit judgements? I certainly would not have predicted this and think the analyses raise many questions.

      Response: We thank the reviewer for sharing their thoughts on the decoding analyses and agree that this set of analyses is intriguing, yet raises additional questions, such as the neural computations required to assess content. However, we wish to clarify that the way we view our current results is very much analogous to results obtained from studies of perception in other fields. For example, in the face perception literature, it is often observed that the fusiform face area (FFA) is uniformly more active not only when a face (as opposed to an object) is on the screen, but also when a compound stimulus consisting of features of a face and other features (e.g. of objects) is on the screen and participants are instructed to attend to and identify solely the face. Moreover, multivariate activity in the FFA (but not univariate activity) is sufficient to decode the identity of the face. We view the results we report in the manuscript as more akin to the former types of analyses, where any region that is involved in the computation is uniformly more active when attention is directed to judgment-specific features. Unfortunately, the present data are not sufficient to properly answer the latter questions, about which areas enable decoding of the specific intensity or identity of merit-related content. Follow-up experiments with a more optimized design are needed. Although these analyses are interesting, we thus refrain from further discussing them in the manuscript to avoid distracting from the main findings based on the univariate comparison of brain responses observed while participants make merit or need inferences in the social perception task.

      Reviewer #2 (Public Review):

      When people help others is an important psychological and neuroscientific question. It has received much attention from the psychological side, but comparatively less from neuroscience. The paper translates some ideas from a social Psychology domain to neuroscience using a neuroeconomically oriented computational approach. In particular, the paper is concerned with the idea that people help others based on perceptions of merit/deservingness, but also because they require/need help. To this end, the authors conduct two experiments with an overlapping participant pool:

      (1) A social perception task in which people see images of people that have previously been rated on merit and need scales by other participants. In a blockwise fashion, people decide whether the depicted person a) deserves help, b) needs help, or c) uses both hands (the control condition).

      (2) In an altruism task, people make costly helping decisions by deciding between giving a certain amount of money to themselves or another person. How much the other person needs and deserves the money is manipulated.

      The authors use a sound and robust computational modelling approach for both tasks using evidence accumulation models. They analyse behavioural data for both tasks, showing that the behaviour is indeed influenced, as expected, by the deservingness and the need of the depicted people. Neurally, the authors use a block-wise analysis approach to find differences in activity levels across conditions of the social perception task (there is no fMRI data for the other task). The authors do find large activation clusters in areas related to theory of mind. Interestingly, they also find that TPJ activity related to the deservingness condition correlates not only with people's deservingness ratings during the task, but also with computational parameters related to helping others in the second task, which was conducted many months later. Some behavioural parameters also correlate across the two tasks, suggesting that the perception of how deserving others are of help reflects a relatively stable feature that translates into concrete helping decisions later on.

      The conclusions of the paper are overall well supported by the data.

      Response: We thank the reviewer for the positive evaluation of our study and the comprehensive summary of our main findings. We would like to clarify, though, that we did originally collect fMRI data for the independent altruism task. Unfortunately, due to COVID-19-related interruptions, only 25 participants from the sample that performed the social perception task also completed the fMRI altruism task (see pg. 18). Given the limited sample size and noise level of fMRI data, we moved anything related to the neuroimaging data of the altruism task to the supplemental material (see Note S7) and decided to focus solely on the behavior of the altruism task to address our research objectives. We apologize for any confusion.

      (1) I found that the modelling was done very thoroughly for both tasks. Overall, I had the impression that the methods are very solid with many supplementary analyses. The computational modelling is done very well.

      Response: We are pleased that the reviewer found the computational model sensible.

      (2) A slight caveat, however, regarding this aspect, is that, in my view, the tasks are relatively simplistic, so even the complex computational models do not do as much as they can in the case of more complex paradigms. For example, the bias term in the model seems to correspond to the mean response rate in a very direct way (please correct me if I am wrong).

      Response: We agree that the Bias term relates to mean responding (although it is not the sole possibility: thresholds and starting-point biases can also produce changes in mean levels of responding that cannot be dissociated without the computational model). However, we think that the primary value of this parameter comes not from the analysis of the social judgment task (where the reviewer is correct that the bias relates in a quite straightforward way to the mean response rate), but from the relationship of this parameter to the context-independent generosity response in the altruism task. Here, we find that this general bias term relates not to overall generosity, but rather to the overall weight given to others’ outcomes, a finding that makes sense if the tendency to perceive others as deserving overall increases attention to, or valuation of, their outcomes. Thus, a simple finding in one task relates to a more nuanced finding in another. However, we agree it is important to acknowledge the point raised by the reviewer, and now do so on pg. 20: “It is worth noting that the Bias parameters are strongly associated with (though not the sole determinant of) the mean response rate.”
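      The point that different mechanisms can produce a similar shift in mean responding can be illustrated with a generic two-boundary drift-diffusion simulation. This is a minimal sketch, not the model fitted in the paper: the ±1 evidence coding, the sensitivity and bias values, the boundary separation, and the starting points are all hypothetical. A drift bias and a starting-point bias both raise the proportion of "yes" responses well above the unbiased baseline, which is why the mean response rate alone cannot identify which parameter changed.

```python
import numpy as np


def simulate_ddm(drift, threshold=1.0, start=0.5, ndt=0.3, dt=0.001,
                 max_time=5.0, rng=None):
    """Euler simulation of a two-boundary drift-diffusion process.

    drift: per-trial drift rates (hypothetical units).
    start: starting point as a fraction of `threshold` (0.5 = unbiased).
    Returns (choices, rts): 1 = upper boundary ("yes"), 0 = lower ("no"),
    -1 = no boundary reached within max_time.
    """
    rng = np.random.default_rng() if rng is None else rng
    drift = np.asarray(drift, dtype=float)
    x = np.full(drift.shape, start * threshold)
    choices = np.full(drift.shape, -1)
    rts = np.full(drift.shape, np.nan)
    active = np.ones(drift.shape, dtype=bool)
    for step in range(1, int(max_time / dt) + 1):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        # Drift plus unit-variance diffusion noise scaled by sqrt(dt).
        x[idx] += drift[idx] * dt + np.sqrt(dt) * rng.standard_normal(idx.size)
        up = idx[x[idx] >= threshold]
        low = idx[x[idx] <= 0.0]
        choices[up] = 1
        choices[low] = 0
        done = np.concatenate([up, low])
        rts[done] = step * dt + ndt
        active[done] = False
    return choices, rts


def p_yes(sensitivity, bias=0.0, start=0.5, n=2000, seed=0):
    """Proportion of "yes" choices for hypothetical balanced evidence (+1/-1)."""
    rng = np.random.default_rng(seed)
    evidence = rng.choice([-1.0, 1.0], size=n)
    drift = sensitivity * evidence + bias  # a drift bias shifts every trial
    choices, _ = simulate_ddm(drift, start=start, rng=rng)
    return np.mean(choices == 1)


baseline = p_yes(1.0)                 # unbiased observer
drift_biased = p_yes(1.0, bias=1.0)   # bias enters the drift rate
start_biased = p_yes(1.0, start=0.7)  # bias enters the starting point
print(baseline, drift_biased, start_biased)
```

      Both biased variants end up with similar "yes" rates even though the underlying mechanisms differ; separating them requires fitting the full choice and response-time distributions, which is what the computational model does.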

      (3) Related to the simple tasks: The fMRI data is analysed in a simple block-fashion. This is in my view not appropriate to discern the more subtle neural substrates of merit/need-based decision-making or person perception. Correspondingly, the neural activation patterns (merit > control, need > control) are relatively broad and unspecific. They do not seem to differ in the classic theory of mind regions, which are the focus of the analyses.

      Response: The social perception task is modified from a well-established social inference task (Spunt & Adolphs, 2014; 2015) designed to reliably localize the mentalizing network in the brain. As such, we acknowledge that it is not optimally designed to discern the intrinsic complexities of social perception, or the specific appraisals or computations that yield more or less perception (of need or merit) in a given context. Instead, it was designed to highlight regions that are more generally recruited for performing these social perceptions/inferences.

      We heartily agree with the reviewer that it would be interesting and informative to analyze this task in a trial-wise way, with parametric variation in evidence for each image predicting parametric variation in brain activity. Unfortunately, the timing of this task is not optimal for this kind of an analysis, since trials were presented in rapid and blocked fashion. We were also limited in the amount of time we could devote to this task, since it was collected in conjunction with a number of other tasks as part of a larger effort to detail the neural correlates of social inference (reported elsewhere). Thus, we were not able to introduce the kind of jittered spacing between trials that would have enabled such analysis, despite our own wish to do so. We hope that this work will thus be a motivator for future work designed more specifically to address this interesting question, and now include a statement to this effect on pgs. 22-23: “Future research may reveal additional distinctions between merit and need appraisals in trial-wise (compared to our block-wise) fMRI designs.”

      References:

      Spunt, R. P. & Adolphs, R. Validating the Why/How contrast for functional MRI studies of Theory of Mind. Neuroimage 99, 301-311, doi:10.1016/j.neuroimage.2014.05.023 (2014).

      Spunt, R. P. & Adolphs, R. Folk explanations of behavior: a specialized use of a domain-general mechanism. Psychological Science 26, 724-736, doi:10.1177/0956797615569002 (2015).

      (4) However, the relationship between neural signal and behavioural merit sensitivity in TPJ is noteworthy.

      Response: We thank the reviewer for this positive assessment; we feel that linking individual differences in merit sensitivity with variance in TPJ activity during merit judgments is one of the key findings of the study.

      (5) This is all the more noteworthy because the neural signal and aspects of the behaviour are correlated across subjects with the second task, which was conducted much later. Such a correlation is very impressive and suggests that the tasks are sensitive to important individual differences in helping perception/behaviour.

      Response: Again, we share the reviewer’s impression that this finding is more noteworthy for appearing in tasks separated both by considerable conceptual/paradigmatic differences, and by such a long temporal distance. These findings make us particularly excited to follow up on these results in future research.

      (6) That being said, the number of participants in the latter analyses is at the lower end of the numbers typically used these days for across-participant correlations.

      Response: We fully agree with this assessment. Unfortunately, COVID-related disruptions in data collection, as well as the expiration of grant funds due to the delay, severely limited our ability to complete assessments in a larger sample. Future research needs to replicate these results in a larger sample. We comment on this issue in the discussion on pg. 40. If the editor or reviewer has suggestions for other ways in which we could more fully acknowledge this, we would be happy to include them.

      Reviewer #3 (Public Review):

      Summary:

      The paper aims to provide a neurocomputational account of how social perception translates into prosocial behaviors. Participants first completed a novel social perception task during fMRI scanning, in which they were asked to judge the merit or need of people depicted in different situations. Secondly, a separate altruistic choice task was used to examine how the perception of merit and need influences the weights people place on themselves, others, and fairness when deciding to provide help. Finally, a link between perception and action was drawn in those participants who completed both tasks.

      Strengths:

      The paper is overall very well written and presented, leaving the reader at ease when describing complex methods and results. The approach used by the authors is very compelling, as it combines computational modeling of behavior and neuroimaging data analyses. Despite not being able to comment on the computational model, I find the approach used (to disentangle sensitivity and biases, for merit and need) very well described and derived from previous theoretical work. Results are also clearly described and interpreted.

      Response: We thank the reviewer for their positive comments regarding presentation, approach, and content.

      Weaknesses:

      My main concern relates to the selection of the social perception task, which to me is the weakest point. The authors also address this weakness in the limitations section: merit and need are evaluated by means of very different cues that rely on different cognitive processes (more abstract thinking for merit than need). I wonder whether and how such a difference can bias the overall computational model and the interpretation of the results (e.g., ideally one would vary merit and need while leaving all other aspects invariant).

      Response: We agree with the reviewer on the importance of future research to more fully unpack the differences in this task, and develop better ways to manipulate need and merit in more comparable fashion. However, we point out that the issue of differences in abstractness of cues for need and merit does not actually seem to have a strong influence on the parameters retrieved by the computational model. Participants seem to be equally sensitive to BOTH merit and need information, despite that information deriving from different sources, as evidenced by the fact that the magnitude of the sensitivity parameters for need and merit in the social judgment task were nearly identical, and not statistically distinguishable. Nor were other parameters related to non-decision time or threshold statistically different (see Supplemental Table S2). If our results were driven purely by differences in the difficulty or abstractness of these judgments, we would have expected to see some evidence of this in the computational model, in the form of longer non-decision times, higher thresholds, or both. We do not. Likewise, the neural underpinnings evoked by both need and merit perceptions in this task (in the mentalizing brain network) were comparable. This is not to say that there aren’t real differences in the cues that might signal these quantities in our social perception task - just that there is little direct evidence for this difference in computational parameters or evoked brain responses, and thus it is unlikely that our results (which rely on an analysis of computational parameters) are driven solely by computational model biases, or the inability of the model to adequately assess participant sensitivity to need as opposed to merit.

      A second weakness is related to the sample size, which is quite small for study 2. Given that the study 2 fMRI data are not analyzed, I wonder whether it is possible to recover some of the participants' behavioral results, at least for those excluded because of poor MR image quality.

      Response: We fully agree with the reviewer that increasing the sample size for the cross-task correlations would be desirable. Unfortunately, the current sample size already represents the maximum of ‘usable’ data; the approach suggested by the reviewer would not increase the sample size. We used all participants whose behavioral data in the altruism task suggested they were performing the task in good faith and conscientiously.

      Finally, on a theoretical note, I would elaborate more on the distinction of merit and need. These concepts tap into very specific aspects of morality, which I suspect have been widely explored. At the moment I am missing a more elaborate account of this.

      Response: Need and merit are predominantly studied in separate lines of research (Molouki & Bartels, 2020) so there is relatively little theoretical research on the distinction between the two. Consequently, Siemoneit (2023) states that the relation between the concepts of need and merit in allocative distributions remains diffuse. To emphasize the distinct concepts of morality in the introduction we have now added to pg. 3: “Need and deservingness (merit) are two distinct principles of morality. The need principle involves distributing resources to those who require them, irrespective of whether they have earned them, while the "merit principle" focuses on allocating resources based on individuals' deservingness, regardless of their actual need (Wilson, 2003).”

      One of the added values of our paper to the research literature is in adding to the clarification of computational and neural underpinnings of broad concepts like merit and need. To highlight the latter point, we have added the following statement on pg. 5 to the manuscript: “Examining need and merit concurrently in this task will also help clarify the computational and neural underpinnings of related, but distinct concepts, distinguishing between them more effectively.”

      References:

      Molouki, S., & Bartels, D. M. (2020). Are future selves treated like others? Comparing determinants and levels of intrapersonal and interpersonal allocations. Cognition, 196, 104150.

      Siemoneit, A. (2023). Merit first, need and equality second: hierarchies of justice. International Review of Economics, 70(4), 537-567.

      Wilson, C. (2003). The role of a merit principle in distributive justice. The Journal of ethics, 7, 277-314.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I acknowledge the difficulty with respect to recruitment, especially in the age of COVID, but is it possible for the authors to collect larger samples for their behavioural questions via online testing? Admittedly, I'm sure they don't want to wait 300 days to have the complete dataset, but I would be in favour of collecting a sample in the hundreds on these behavioural tasks, completed at a much shorter separation (if any). I believe this would strengthen the authors' conclusions considerably if they could both replicate the effects they have and check these null effects in a sample large enough to draw conclusions from them. Indeed, Bayesian statistics providing evidence for the null would also help here.

      Response: We share the reviewer’s desire to see these results replicated (ideally in a sample of hundreds of participants). We have seriously considered the possibility of trying to replicate our results online, even before submitting the first version of the paper. However, it is difficult to fully replicate this paradigm online, given the elaborate story and context we engaged in to convince participants that they were playing with real others, as well as the usage of physical pain (Cold Pressor Task) for the need manipulation in the altruism task. Moreover, given comments by this reviewer that the results are already a little long, adding a new, behavioral replication would likely only add to the memory burden for the reader. We have thus opted not to include a replication study in the current work. However, we are actively working on a replication that can be completed online, using a modified experimental paradigm and different ways to manipulate need and merit. Because of the differences between that paradigm and the one described here, which would require considerable additional exposition, we have opted not to include the results of this work in the current paper. We hope to be able to publish this work as a separate, replication attempt in the future.

      Given the difficulty of wading through the results section while keeping track of the key question being answered, I would suggest moving any analyses that are less central to the supplementary. And perhaps adding some more guiding sentences at the start and end of each section to remind the reader how each informs the core question.

      Response: We deliberated for quite some time about what results could be removed, but in the end, felt that nearly all results that we already described need to be included in the paper, since each piece of the puzzle contributes to the central finding (relating parameters and behavior to neural and choice data across two separate tasks). However, we did move the decoding analysis results to the supplemental materials (see point below). We also take the reviewer’s point that the results can be made clearer. We have thus added guiding sentences at the start and end of sections to remind readers how each analysis informs the core questions.

      I think it needs unpacking more for the reader what they should conclude from the significant need+merit vs control decoding analyses, and what they would have expected in terms of cortical representation from the decoding analyses in general.

      Response: We agree with the reviewer that, given their position in the main manuscript, the decoding results would need unpacking. After considering the reviewer's prior suggestion, we have reevaluated the placement of these results. Consequently, we have relocated them to the supplemental materials, as they were deemed less relevant to directly addressing the core research questions. On pg. 23, the main manuscript now only states “We also employed supplemental multivariate decoding analyses (searchlight analysis 85-87), as commonly used in social perception and neuroscience research 7,58,82,88,89, corroborating our univariate findings (see Supplemental Note S6, Supplemental Table S10).”

      Reviewer #2 (Recommendations For The Authors):

      (1) I would suggest moving information on how the computational models were fitted to the main text.

      Response: The computational models are a key element of the paper, and we deliberated about moving the description of how the models were fitted into the main manuscript. However, we are concerned about the complexity and length of the article, which requires quite a lot from readers to keep in mind (as also commented on by reviewer 1). Those readers who are particularly interested in details of model fitting can still find an extensive discussion of the procedures we followed in the supplements. We thus have opted to retain the streamlined presentation in the main manuscript. However, if the editor feels that including the full and extensive description of model fitting in the main paper would significantly improve the flow and exposition of ideas, we are happy to do so.

      (2) For the fMRI analyses: Could it be worth analysing the choices in the different conditions? They could be modelled as a binary regressor (yes/no) and this one might be different across conditions (merit/need/hands). Maybe this won't work because of the tight trial timeline, but it could be another avenue to discern differences across fMRI conditions.

      Response: We thank the reviewer for this interesting suggestion! Unfortunately, the block design and rapid presentation of stimuli within each condition make it challenging to distinguish the different choices (within or across conditions). While we see the merit in the suggested analytical approach (in fact, we discussed it before the initial submission of the article), it would require some modifications of the task structure (e.g., longer inter-trial-intervals between individual stimuli) and an independent replication fMRI study. We were not able to have such a long inter-trial interval in the original design due to practical constraints on the inclusion of this paradigm in a larger effort to examine a wide variety of social judgment and inference tasks. We hope to investigate this kind of question in greater detail in future fMRI work.

      (3) The merit effects seem to be more stable across time than the need conditions. Would it be worthwhile to test if the tasks entailed a similar amount of merit and need variation? Maybe one variable varied more than the other in the task design, and that is why one type of effect might be stronger than the other?

      Response: We thank the reviewer for drawing attention to this important point. We used extensive pilot testing to select the stimuli for the social perception task, ensuring an overall similar amount of need and merit variation. For example, the social perception ratings of the independent, normative sample suggest that the social perception task entails a similar amount of need and merit variation (normative participant-specific percentage of yes responses for merit (mean ± standard deviation: 53.95 ± 13.87) and need (45.65 ± 11.07)). The results of a supplemental paired t-test (p = 0.122) indicate comparable SD for need and merit judgments. Moreover, regarding the actual fMRI participant sample, Figure S3 illustrates comparable levels of variations in need and merit perceptions (participant-specific percentage of yes responses for merit (56.70 ± 11.91) and need (48.69 ± 10.81) in the social perception task). Matching the results for the normative sample, the results of a paired t-test (p = 0.705) suggest no significant difference in variation between need and merit judgments. With respect to the altruism task, we manipulated the levels of merit and need externally (high vs. low).

      Reviewer #3 (Recommendations For The Authors):

      (1) It would be good to provide the demographics of each remaining sample.

      Response: We appreciate the attention to detail and agree with the reviewer’s suggestion. We have now added the demographics for each remaining sample to the revised manuscript.

      (2) The time range from study 1 to study 2 is quite diverse. Did you use it as a regressor of no interest?

      Response: We thank the reviewer for this interesting suggestion. We have examined this in detail in the context of our cross-task analyses (i.e., via regressions and partial correlations). Interestingly, variance in the temporal delay between both tasks does not account for any meaningful variation, and results don’t qualitatively change controlling for this factor.

      For example, when we controlled for the delay between both separate tasks (partial correlation analysis), we confirmed that variance in merit sensitivity (social perception task) still reflected merit-induced changes in overall generosity (altruism task; p = 0.020). Moreover, we confirmed that variance in merit sensitivity reflected individuals’ other-regard (p = 0.035) and self-regard (p = 0.040), but not fairness considerations (p = 0.764) guiding altruistic choices. Regarding people’s general tendency to perceive others as deserving, we found that the link between merit bias (social perception task) and overall other-regard (p = 0.008) and fairness consideration (p = 0.014) (altruism task) holds when controlling for the time range (no significant relationship between merit bias and self-regard, p = 0.191, matching results of the main paper).

      We refer to these supplemental analyses in the revised manuscript on pgs. 33 and 35: “Results were qualitatively similar when statistically controlling for the delay between both tasks (partial correlations).”
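      For readers less familiar with this control analysis: a partial correlation removes the linear contribution of a covariate (here, the delay between tasks) from both variables before correlating them. The sketch below uses hypothetical variable names and simulated data, not the study's data, and shows that residualizing both variables on the covariate and correlating the residuals gives the same value as the standard first-order formula.

```python
import numpy as np


def partial_corr(x, y, z):
    """Pearson correlation of x and y after regressing out covariate z.

    x and y are each residualized on [1, z] by least squares; the residuals
    are then correlated.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


def partial_corr_formula(x, y, z):
    """Same quantity via the first-order partial correlation formula."""
    r = np.corrcoef(np.vstack([x, y, z]))
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Simulated, hypothetical data for 25 participants.
rng = np.random.default_rng(1)
delay = rng.uniform(100, 400, size=25)     # days between the two tasks
sensitivity = rng.normal(size=25)          # merit sensitivity (task 1)
generosity = 0.6 * sensitivity + rng.normal(scale=0.8, size=25)
print(partial_corr(sensitivity, generosity, delay))
```

      A significance test for a first-order partial correlation uses n - 3 degrees of freedom, one fewer than the zero-order correlation, because one covariate has been removed.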

      (3) Why in study 1 a dichotomous answer has been used? Would not have been better (also for modeling) a continuous variable (VAS)?

      Response: We appreciate the reviewer's thoughtful feedback. In Study 1, opting for a dichotomous response format in the social perception task (Figure 1a) was a deliberate methodological choice. This decision, driven by the study's modeling requirements, aligns with the common use of computational models that treat two-alternative forced choices ("yes" and "no") as decision boundaries. While drift-diffusion models for multiple-alternative forced-choice designs exist, our study's novel research questions were effectively addressed without their added complexity. Finally, our model cannot accept continuous response variables as input unless they are transformed into categorical variables.

      (4) In the fMRI analyses, when you assess changes in brain activity as a function of merit, I would control for need (and the other way round), to see whether such association is specific.

      Response: Regarding the reviewer’s suggestion on controlling for need when assessing changes in brain activity as a function of merit, and vice versa, we would like to clarify the nature of our fMRI analyses in the social perception task. Our focus is on block-wise assessments (need vs. control, merit vs. control, need vs. merit blocks, following the fMRI task design from which our social perception task was modified). We don’t assess changes in brain activity as a function of the level of perceived merit or need (i.e., “yes” vs. “no” trials within or across task blocks). Blocks are clearly defined by the task instruction given to participants prior to each block (i.e., need, merit, or control judgments). Thus, unfortunately, given the short inter-stimulus intervals within each block, the task design is not optimal for implementing the suggested approach.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1, in both the public review and recommendations to authors, raises the important question of generalizability of the new technique to other brain areas, to analysis with sorters other than Kilosort, and in the absence of reference data. Specifically, how can experimenters working in brain areas other than visual cortex understand if the tracking is functioning, and set the parameters in the tracking pipeline.

      We agree that generalizability of the tracking procedure is a serious issue, especially with respect to other brain areas with varying degrees of measured waveform preservation over time. As the potential recording conditions are too numerous in combination to test experimentally, we instead address these issues in the manuscript by providing a general prescription for interpreting the distribution of vertical distances of matched pairs that can be used for data from any recording using any spike-sorter (Methods section 4.2, Supplement section 8.4, figure S9, paragraphs 7-10 of the Discussion section). This extension of the method allows users to estimate the matching success in the context of their own data, even in the absence of reference data. To address the concern of overfitting, we have also added discussion covering adjustment of the two parameters in the procedure (the relative weight of waveform distance vs. physical distance, and the threshold for accepting matches as real) to the Discussion section.

      Reviewer #2 suggested clarification of the following points in the public review. We answer those here and have also clarified these points in the main text where appropriate.

      (1) What is the purpose of testing the drift correction with imposed drift (Figure 2, page 6 in the original manuscript), and how the value was chosen?

      To test the ability of EMD to detect substantial drift, we need examples that resemble experimental data, including error in fit unit positions and units with no correct matches. We chose to create these examples by taking waveform and position sets from real data with modest drift, and adding a fixed shift to one dataset. The value of 12 µm in the figure is arbitrary, simply an example in the range of real drift. These tests allow us to demonstrate the success of EMD in detecting drift in real data.

      (2) How is performance affected by using a different weighting of the 2 measures (physical distance and waveform distance) in the EMD?

      Recovery rate (number of reference units successfully matched in EMD) vs. weighting of the waveform distance is shown in Supplement section 8.10. Recovery rate increases over low values of the waveform weight and levels off at a value of 1500. We selected that inflection point for the analysis in this paper, to avoid coincidental matching of physically distant units with similar waveforms.
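      The role of the weighting parameter can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes two equal-sized unit sets with uniform weights, in which case the EMD between them reduces to an optimal one-to-one assignment, solved here with SciPy's `linear_sum_assignment` as a stand-in. The cost is physical distance plus `wf_weight` times waveform distance. The synthetic units, noise levels, and the weight used below are hypothetical (the value of 1500 quoted above is tied to the authors' waveform-distance units and does not transfer to this toy data). As in the imposed-drift test, a fixed 12 µm vertical shift is added to the second session, and the median signed vertical distance of matched pairs recovers it.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_units(pos_a, pos_b, wf_a, wf_b, wf_weight):
    """Optimal one-to-one matching of units between two sessions.

    pos_*: (n, 2) arrays of unit positions (x, z) in µm.
    wf_*:  (n, k) arrays of unit-norm waveform vectors.
    Returns matched index arrays and the signed vertical (z) distance of each pair.
    """
    d_phys = np.linalg.norm(pos_a[:, None, :] - pos_b[None, :, :], axis=-1)
    d_wave = np.linalg.norm(wf_a[:, None, :] - wf_b[None, :, :], axis=-1)
    cost = d_phys + wf_weight * d_wave  # weighted combination of both distances
    rows, cols = linear_sum_assignment(cost)
    dz = pos_b[cols, 1] - pos_a[rows, 1]
    return rows, cols, dz


def unit_norm(w):
    return w / np.linalg.norm(w, axis=1, keepdims=True)


# Hypothetical session A: 40 units on a 40 µm x 2000 µm shank.
rng = np.random.default_rng(0)
n, k = 40, 30
pos_a = np.column_stack([rng.uniform(0, 40, n), rng.uniform(0, 2000, n)])
wf_a = unit_norm(rng.normal(size=(n, k)))
# Session B: same units, imposed 12 µm vertical drift plus small noise.
pos_b = pos_a + np.array([0.0, 12.0]) + rng.normal(scale=2.0, size=(n, 2))
wf_b = unit_norm(wf_a + rng.normal(scale=0.01, size=(n, k)))

rows, cols, dz = match_units(pos_a, pos_b, wf_a, wf_b, wf_weight=100.0)
print(np.mean(cols == rows), np.median(dz))
```

      With a very small `wf_weight`, the cost is dominated by physical distance, so nearby units with different waveforms can be swapped; with a large weight, physically distant units with coincidentally similar waveforms can be matched. In the actual procedure, a threshold on the vertical distances of matched pairs then decides which matches are accepted.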

      (3) Should the intervals measured in the survival plot in Figure 5 be identical for the three different classes of tracked neurons?

      The plot includes all chains of tracked neurons, which can start on arbitrary days in the set of all recordings (see the definition of chains in section 2.4). As a result, the gaps between days, which determine where there is a point on the plot, can be different for the three sets of neurons (reference, putative, and mixed). We have added a comment to the Figure 5 caption to ensure this is clear.

      (4) Would other metrics of the similarity of visual responses work better?

      The similarity metric we use was adopted from the original paper using this data (reference 7). We chose to use the same metric both to take advantage of the original authors’ expertise about the data and allow for reasonable comparison of the new technique to theirs. It is correct that this similarity metric alone does not allow for unique matching (see Discussion and Supplement section 8.2). However, the agreement of EMD with reference pairs determined from the combination of position and visual response similarity is very high, suggesting there are few incorrect reference pairs. Any incorrect reference pairs cause an underestimate of the tracking accuracy.

      (5) Add a definition of ROC.

      Added this definition to the text.

      Reviewer #1 Recommendation to authors:

      The main text needs proofreading.

      We agree that the manuscript needed more thorough proofreading, and we have made corrections of typos and minor language errors throughout.

      Additional comment from the authors:

      Since the posting of this manuscript, another method for tracking neurons has been introduced:

      Enny H. van Beest, Célian Bimbard, Julie M. J. Fabre, Flóra Takács, Philip Coen, Anna Lebedeva, Kenneth Harris, Matteo Carandini, Tracking neurons across days with high-density probes, bioRxiv 2023.10.12.562040; doi: https://doi.org/10.1101/2023.10.12.562040

    1. Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors compared four types of hiPSCs and four types of hESCs at the proteome level to elucidate the differences between hiPSCs and hESCs. Semi-quantitative calculations of protein copy numbers revealed increased protein content in iPSCs. Particularly in iPSCs, mitochondrial and cytoplasmic proteins were suggested to reflect the state of the original differentiated cells to some extent. However, the most important result of this study is the calculation of the protein copy numbers per cell, and the validity of this result is problematic. In addition, several experiments need to be improved, such as the use of cells of different genders (iPSC: female, ESC: male) in mitochondrial metabolism experiments.

      Strengths:

      The focus on protein copy numbers is exciting and would be valuable if the estimated values are correct and biologically reproducible.

      Weaknesses:

      The proteome results in this study were likely obtained by simply looking at differences between clones, and the proteome data need to be validated. First, there were only a few clones for comparison, and the gender and number of cells did not match between ESCs and iPSCs. Second, no data show the accuracy of the protein copy number per cell obtained by the proteome data.

      We agree with the reviewer in their assessment that more independent stem cell clones and an equal gender balance would be preferable. We will mention these considerations as limitations of our study and encourage a larger-scale follow-up.

      Regarding the estimated copy numbers, we would like to highlight that they have been used extensively in the field, with direct validation of the differences in copy numbers by orthogonal methods such as FACS (refs 2–4, 7, 10). Furthermore, the original paper directly compared the copy numbers estimated using the “proteomic ruler” with spike-in protein epitope signature tags and found remarkable concordance. This comparison was performed on a much older generation of mass spectrometer with reduced peptide coverage, and the authors predicted that higher coverage would improve quantitative performance.
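      As generally described, the “proteomic ruler” avoids spike-in standards by using the summed histone signal as an internal mass standard: histone mass per cell is taken to be approximately equal to DNA mass per cell, so a protein's MS intensity relative to total histone intensity converts to a protein mass, and then to a copy number via its molecular weight and Avogadro's constant. The minimal sketch below shows only that arithmetic; the ~650 Da average base-pair mass, the genome size and the intensities are illustrative assumptions, and the function names are hypothetical rather than taken from any software package.

```python
AVOGADRO = 6.02214076e23  # molecules per mol


def dna_mass_per_cell(genome_bp, ploidy, bp_mass_da=650.0):
    """DNA mass per cell in grams; ~650 Da per base pair is an approximation."""
    return genome_bp * ploidy * bp_mass_da / AVOGADRO


def ruler_copy_number(protein_intensity, protein_mw_da,
                      histone_intensity_total, dna_mass_g):
    """Copies per cell, assuming total histone mass ~= DNA mass per cell."""
    # Histone signal calibrates MS intensity to mass: dna_mass_g per histone unit.
    protein_mass_g = protein_intensity * dna_mass_g / histone_intensity_total
    # Da equals g/mol, so grams / (g/mol) * Avogadro gives molecules.
    return protein_mass_g * AVOGADRO / protein_mw_da


# Hypothetical 50 kDa protein whose intensity is 1% of total histone intensity
# in a diploid cell with a ~3.1e9 bp haploid genome.
dna_g = dna_mass_per_cell(3.1e9, ploidy=2)
print(f"{ruler_copy_number(0.01, 50_000, 1.0, dna_g):.3g} copies per cell")
```

Because Avogadro's constant cancels, the estimate rests on the relative intensities, the molecular weight and the assumed DNA (histone) mass per cell.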

      Reviewer #2 (Public Review):

      Summary:

      Pluripotent stem cells are powerful tools for understanding development, differentiation, and disease modeling. The capacity of stem cells to differentiate into various cell types holds great promise for therapeutic applications. However, ethical concerns restrict the use of human embryonic stem cells (hESCs). Consequently, induced human pluripotent stem cells (ihPSCs) offer an attractive alternative for modeling rare diseases, drug screening, and regenerative medicine.

      A comprehensive understanding of ihPSCs is crucial to establish their similarities and differences compared to hESCs.

      This work demonstrates systematic differences in the reprogramming of nuclear and non-nuclear proteomes in ihPSCs.

      We thank the reviewer for the positive assessment.

      Strengths:

      The authors employed quantitative mass spectrometry to compare protein expression differences between independently derived ihPSC and hESC cell lines. Qualitatively, protein expression profiles in ihPSC and hESC were found to be very similar. However, when comparing protein concentration at a cellular level, it became evident that ihPSCs express higher levels of proteins in the cytoplasm, mitochondria, and plasma membrane, while the expression of nuclear proteins is similar between ihPSCs and hESCs. A higher expression of proteins in ihPSCs was verified by an independent approach, and flow cytometry confirmed that ihPSCs had larger cell sizes than hESCs. The differences in protein expression were reflected in functional distinctions. For instance, the higher expression of mitochondrial metabolic enzymes, glutamine transporters, and lipid biosynthesis enzymes in ihPSCs was associated with enhanced mitochondrial potential, increased ability to uptake glutamine, and increased ability to form lipid droplets.

      Weaknesses:

      While this finding is intriguing and interesting, the study falls short of explaining the mechanistic reasons for the observed quantitative proteome differences. It remains unclear whether the increased expression of proteins in ihPSCs is due to enhanced transcription of the genes encoding this group of proteins or due to other reasons, for example, differences in mRNA translation efficiency. Another unresolved question pertains to how the cell type origin influences ihPSC proteomes. For instance, whether ihPSCs derived from fibroblasts, lymphocytes, and other cell types all exhibit differences in their cell size and increased expression of cytoplasmic and mitochondrial proteins. Analyzing ihPSCs derived from different cell types and by different investigators would be necessary to address these questions.

      We agree with the Reviewer that our study does not provide a mechanistic reason for the quantitative differences between the two cell types. However, we will include an expanded section in the Discussion on the potential causes.

      We also agree that studying hiPSCs reprogrammed from different cell types, such as blood lymphocytes, would be of great interest, and we will include a section on this in the Discussion to encourage further research in this area.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Brenes and colleagues carried out proteomic analysis of several human induced pluripotent (hiPSC) and human embryonic stem cell (hESC) lines. The authors found quantitative differences in the expression of several groups of cytoplasmic and mitochondrial proteins. Overall, hiPSC expressed higher levels of proteins such as glutamine transporters, mitochondrial metabolism proteins, and proteins related to lipid synthesis. Based on the protein expression differences, the authors propose that hiPSC lines differ from hESC in their growth and metabolism.

      Strengths:

      The number of generated hiPSC and hESC lines continues to grow, but potential differences between hiPSC and hESC lines remain to be quantified and explained. This study is a promising step towards understanding the differences between hiPSC and hESC lines.

      Weaknesses:

      It is unclear whether changes in protein levels relate to any phenotypic features of cell lines used. For example, the authors highlight that increased protein expression in hiPSC lines is consistent with the requirement to sustain high growth rates, but there is no data to demonstrate whether hiPSC lines used indeed have higher growth rates.

      We respectfully disagree with the reviewer on this point. Our data show that hESCs and hiPSCs differ significantly in protein mass and cell size, as validated by the EZQ assay and FACS, while showing no significant differences in their cell cycle profiles. Thus, the increased size and protein content would require higher growth rates to sustain the increased mass, which is what we show.
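      The growth-rate argument can be made explicit with a short derivation. Assuming balanced exponential growth with the same doubling time T in both cell types (an assumption the response bases on their similar cell cycle profiles), the absolute protein synthesis rate per cell, k, scales with the mean protein mass per cell, M:

```latex
% Balanced exponential growth with doubling time T: protein mass m(t) per cell
\frac{dm}{dt} = \frac{\ln 2}{T}\, m(t)
% Evaluated at the mean protein mass per cell M, with equal T in both cell types:
\frac{k_{\mathrm{hiPSC}}}{k_{\mathrm{hESC}}}
  = \frac{(\ln 2 / T)\, M_{\mathrm{hiPSC}}}{(\ln 2 / T)\, M_{\mathrm{hESC}}}
  = \frac{M_{\mathrm{hiPSC}}}{M_{\mathrm{hESC}}}
```

      For example, a hypothetical 20% larger M at the same T would require a 20% higher absolute protein synthesis rate per cell. Here "growth rate" means mass accumulation per cell, not proliferation rate.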

      The authors claim that the cell cycle of the lines is unchanged. However, no details of the method for assessing the cell cycle were included, so it is difficult to judge whether this assessment was appropriately carried out and controlled.

      We apologise for this omission; the details will be included in the revised version of the manuscript.

      Details and characterisation of the iPSC and ESC lines used in this study were generally lacking. The lines used are merely listed in Methods, with no references for published lines and no information on how the lines were obtained, the passage at which they were used, their karyotype status, etc. For details of basic characterisation, the authors should refer to the ISSCR Standards for the use of human stem cells in research. In particular, the authors should consider whether any of the changes they see may be attributed to copy number variants in different lines.

      We agree with the reviewer on this. The hiPSC lines were generated by the HipSci consortium at the Wellcome Sanger Institute, as described in the flagship HipSci paper (ref. 13). We cite the flagship paper, which specifies in great detail the reprogramming protocols and quality control measures, including the assessment of copy number variation (ref. 13). However, we agree that we did not make this information easily accessible to readers, and we believe it is relevant to include it explicitly in our manuscript rather than expecting readers to consult the flagship paper. These details will be added to the revised version.

      The expression data for markers of undifferentiated state in Figure 1a would ideally be shown by immunocytochemistry or flow cytometry as it is impossible to tell whether cultures are heterogeneous for marker expression.

      We agree with the reviewer on this. FACS is indeed much more quantitative and a better method to study heterogeneity. However, we did not have protocols to study these markers using FACS.

      TEM analysis should ideally be quantified.

      We agree with the reviewer that it would be nice to have a quantitative measure.

      All figure legends should explicitly state what the graphs represent (e.g., which summary statistic is shown, such as the mean; how many replicates, and whether biological or technical; and which lines). Some of this information is included in Methods (e.g., glutamine uptake), but not for all of the data (e.g., TEM).

      We agree with the reviewer completely. These points will be addressed in the revised version of the manuscript.

      Validation experiments were performed typically on one or two cell lines, but the lines used were not consistent (e.g. wibj_2 versus H1 for respirometry and wibj_2, oaqd_3 versus SA121 and SA181 for glutamine uptake). Can the authors explain how the lines were chosen?

      We will include these details within the updated manuscript.

      The authors should acknowledge the need for further functional validation of the results related to immunosuppressive proteins.

      We agree with the reviewer and will add a clear sentence in the discussion making this point explicitly.

      Differences in H1 histone abundance were highlighted. Can the authors speculate as to the meaning of these differences?

      Regarding H1 histones, neither our study of the literature nor discussions with chromatin and histone experts, both within our institute and externally, have shed light on what the differences could imply. We think this is an interesting result that merits further study, but we do not have a clear hypothesis about its consequences.

      In summary, we thank the reviewers for their comments and will prepare a revised version that addresses their suggestions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study uses a multi-pronged empirical and theoretical approach to advance our understanding of how differences in learning relate to differences in the ways that male versus female animals cope with urban environments, and more generally how reversal learning may benefit animals in urban habitats. The work makes an important contribution and parts of the data and analyses are solid, although several of the main claims are only partially supported or overstated and require additional support.

      Public Reviews:

      We thank the Editor and both Reviewers for their time and for their constructive evaluation of our manuscript. We worked to address each comment and suggestion offered by the Reviewers in our revision—please see our point-by-point responses below.

      Reviewer #1 (Public Review):

      Summary:

      In this highly ambitious paper, Breen and Deffner used a multi-pronged approach to generate novel insights on how differences between male and female birds in their learning strategies might relate to patterns of invasion and spread into new geographic and urban areas.

      The empirical results, drawn from data available in online archives, showed that males and females are similar in their initial efficiency of learning a standard color-food association (e.g., color X = food; color Y = no food). However, when the associations are switched (now, color Y = food, X = no food), males are more efficient than females at adjusting to the new situation (i.e., faster at 'reversal learning'). Clearly, if animals live in an unstable world, where associations between cues (e.g., color) and what is good versus bad might change unpredictably, it is important to be good at reversal learning. In these grackles, males tend to disperse into new areas before females. It is thus fascinating that males appear to be better than females at reversal learning. Importantly, to gain a better understanding of underlying learning mechanisms, the authors use a Bayesian learning model to assess the relative role of two mechanisms (each governed by a single parameter) that might contribute to differences in learning. They find that what they term 'risk-sensitive' learning is the key to explaining the differences in reversal learning. Males tend to exhibit higher risk sensitivity, which explains their faster reversal learning. The authors then tested the validity of their empirical results by running agent-based simulations where 10,000 computer-simulated 'birds' were asked to make feeding choices using the learning parameters estimated from real birds. Perhaps not surprisingly, the computer birds exhibited learning patterns that were strikingly similar to those of the real birds. Finally, the authors ran evolutionary algorithms that simulate evolution by natural selection, where the key traits that can evolve are the two learning parameters. They find that under conditions that might be common in urban environments, high risk sensitivity is indeed favored.

      Strengths:

      The paper addresses a critically important issue in the modern world. Clearly, some organisms (some species, some individuals) are adjusting well and thriving in the modern, human-altered world, while others are doing poorly. Understanding how organisms cope with human-induced environmental change, and why some are particularly good at adjusting to change is thus an important question.

      The comparison of male versus female reversal learning across three populations that differ in years since they were first invaded by grackles is one of few, perhaps the first in any species, to address this important issue experimentally.

      Using a combination of experimental results, statistical simulations, and evolutionary modeling is a powerful method for elucidating novel insights.

      Thank you—we are delighted to receive this positive feedback, especially regarding the inferential power of our analytical approach.

      Weaknesses:

      The match between the broader conceptual background involving range expansion, urbanization, and sex-biased dispersal and learning, and the actual comparison of three urban populations along a range expansion gradient was somewhat confusing. The fact that three populations were compared along a range expansion gradient implies an expectation that they might differ because they are at very different points in a range expansion. Indeed, the predicted differences between males and females are largely couched in terms of population differences based on their 'location' along the range-expansion gradient. However, the fact that they are all urban areas suggests that one might not expect the populations to differ. In addition, the evolutionary model suggests that all animals, male or female, living in urban environments (which the authors suggest are stable but unpredictable) should exhibit high risk sensitivity. Given that all grackles, male and female, in all populations, both live in urban environments and likely come from an urban background, should males and females differ in their learning behavior? Clarification would be useful.

      Thank you for highlighting a gap in clarity in our conceptual framework. To answer the Reviewer’s question: yes, even with this shared urban ‘history’, it seems plausible that males and females could differ in their learning. For example, irrespective of population membership, such sex differences could come about via differential reliance on learning strategies mediated by an interaction between grackles’ polygynous mating system and male-biased dispersal system, as we discuss in L254–265 (now L295–306). Population membership might, in turn, differentially moderate the magnitude of any such sex effect, since an edge population, even though urban, could still pose novel challenges, for example by requiring grackles to learn novel daily temporal foraging patterns such as when and where garbage is collected (grackles appear to track this food resource: Rodrigo et al. 2021 [DOI: 10.1101/2021.06.14.448443]). We now introduce this important conceptual information (please see L89–96).

      Reinforcement learning mechanisms:

      Although the authors' title, abstract, and conclusions emphasize the importance of variation in 'risk sensitivity', most readers in this field will very possibly misunderstand what this means biologically. Both the authors' use of the term 'risk sensitivity' and their statistical methods for measuring this concept have potential problems.

      Please see our below responses concerning our risk-sensitivity term.

      First, most behavioral ecologists think of risk as predation risk which is not considered in this paper. Secondarily, some might think of risk as uncertainty. Here, as discussed in more detail below, the 'risk sensitivity' parameter basically influences how strongly an option's attractiveness affects the animal's choice of that option. They say that this is in line with foraging theory (Stephens and Krebs 2019) where sensitivity means seeking higher expected payoffs based on prior experience. To me, this sounds like 'reward sensitivity', but not what most think of as 'risk sensitivity'. This problem can be easily fixed by changing the name of the term.

      We apologise for not clearly introducing the field of risk-sensitive foraging, which focuses on how animals evaluate and choose between distinct food options, and how such foraging decisions are influenced by pay-off variance, i.e., the risk associated with alternative foraging options (seminal reviews: Bateson 2002 [DOI: 10.1079/PNS2002181]; Kacelnik & Bateson 1996 [DOI: 10.1093/ICB/36.4.402]). We have added this information to our manuscript in L494–497. We further apologise for not clearly explaining how our lambda parameter estimates such risk-sensitive foraging. To do so here, we need to consider our Bayesian reinforcement learning model in full. This model uses observed choice behaviour during reinforcement learning to infer our phi (information-updating) and lambda (risk-sensitivity) learning parameters. Thus, payoffs incurred through choice simultaneously influence the estimation of each learning parameter; that is, in a sense, both are sensitive to rewards. But phi and lambda differentially direct any reward sensitivity back onto choice behaviour because of their distinct definitions. Glossing over the mathematics, for phi, stronger reward sensitivity (larger phi values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For lambda, stronger reward sensitivity (larger lambda values) means stronger internal determinism about seeking the non-risk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching, i.e., ‘playing it safe’. We hope this information, which we have incorporated into our revised manuscript (please see L153–161), clarifies the rationale and mechanics of our reinforcement learning model, and why lambda measures risk sensitivity.
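      To make the distinction between the two parameters concrete, the minimal sketch below assumes the standard attraction-score forms of such models: a convex-combination update governed by phi and a softmax choice rule scaled by lambda. Function names and numbers are hypothetical. After a single rewarded choice of option 0, phi sets how far its attraction score moves, while lambda sets how strongly that difference in attraction is converted into a preference for option 0.

```python
import math


def update_attraction(attraction, payoff, phi):
    # Information updating: convex combination of the old attraction score
    # and the latest payoff; larger phi weights recent payoffs more heavily.
    return (1 - phi) * attraction + phi * payoff


def choice_probs(attractions, lam):
    # Softmax choice rule: larger lam makes choice more deterministic in
    # favour of the option with the higher attraction score.
    m = max(attractions)  # subtract the max for numerical stability
    weights = [math.exp(lam * (a - m)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-option case: both options start at 0, option 0 pays 1.
for phi in (0.1, 0.5):
    a0 = update_attraction(0.0, 1.0, phi)
    for lam in (1.0, 5.0):
        p0 = choice_probs([a0, 0.0], lam)[0]
        print(f"phi={phi}, lambda={lam}: P(choose option 0) = {p0:.3f}")
```

With the same attraction scores, raising lambda alone increases the probability of repeating the better-valued choice, i.e., less option switching.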

      In addition, however, the parameter does not measure sensitivity to rewards per se - rewards are not in equation 2. As noted above, instead, equation 2 addresses the sensitivity of choice to the attraction score which can be sensitive to rewards, though in complex ways depending on the updating parameter. Second, equations 1 and 2 involve one specific assumption about how sensitivity to rewards vs. to attraction influences the probability of choosing an option. In essence, the authors split the translation from rewards to behavioral choices into 2 steps. Step 1 is how strongly rewards influence an option's attractiveness and step 2 is how strongly attractiveness influences the actual choice to use that option. The equation for step 1 is linear whereas the equation for step 2 has an exponential component. Whether a relationship is linear or exponential can clearly have a major effect on how parameter values influence outcomes. Is there a justification for the form of these equations? The analyses suggest that the exponential component provides a better explanation than the linear component for the difference between males and females in the sequence of choices made by birds, but translating that to the concepts of information updating versus reward sensitivity is unclear. As noted above, the authors' equation for reward sensitivity does not actually include rewards explicitly, but instead only responds to rewards if the rewards influence attraction scores. The more strongly recent rewards drive an update of attraction scores, the more strongly they also influence food choices. While this is intuitively reasonable, I am skeptical about the authors' biological/cognitive conclusions that are couched in terms of words (updating rate and risk sensitivity) that readers will likely interpret as concepts that, in my view, do not actually concur with what the models and analyses address.

      To answer the Reviewer’s question: yes, these equations are standard and are the canonical way of analysing individual reinforcement learning (see: Ch. 15.2 in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky 2018 [DOI: 10.1017/CBO9781316272503]; McElreath et al. 2008 [DOI: 10.1098/rstb.2008.0131]; Reinforcement Learning by Sutton & Barto 2018). As a “justification for the form of these equations”: equation 1 describes a convex combination of previous values and recent payoffs. Latent values are updated as a linear combination of both factors, so there is no simple linear mapping between payoffs and behaviour, as suggested by the Reviewer. Equation 2 describes the standard softmax link function. It converts a vector of real numbers (here, latent values) into a simplex vector (i.e., a vector summing to 1), which represents the probabilities of different outcomes. Similar to the logit link in logistic regression, the softmax simply maps the model space of latent values onto the outcome space of choice probabilities, which enter the categorical likelihood distribution. We appreciate that we did not make this clear in our manuscript because we did not highlight the standard nature of our analytical approach; we now do so in our revised manuscript (please see L148–149). As for what our reinforcement learning model measures, and how it relates cognition and behaviour, please see our previous response.
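      For reference, the standard forms these two equations usually take in this literature are shown below, in assumed notation that may differ from the manuscript's own symbols: individual j, option i, trial t, attraction score A, payoff pi, and (as an assumption here) updating of the chosen option only.

```latex
% Equation 1 (information updating): convex combination, weights (1 - \phi_j) and \phi_j, \phi_j \in [0, 1]
A_{j,i,t+1} = (1 - \phi_j)\, A_{j,i,t} + \phi_j\, \pi_{j,i,t}

% Equation 2 (softmax link): maps latent attraction scores onto choice probabilities
\Pr(\text{$j$ chooses $i$ at } t+1)
  = \frac{\exp\!\left(\lambda_j A_{j,i,t+1}\right)}{\sum_{m} \exp\!\left(\lambda_j A_{j,m,t+1}\right)}
```

      Because the softmax outputs sum to 1 across options, they can enter a categorical likelihood directly. As lambda_j approaches 0, choice becomes uniform across options; as lambda_j grows, choice concentrates on the option with the highest attraction score.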

      To emphasize, while the authors imply that their analyses separate the updating rate from 'risk sensitivity', both the 'updating parameter' and the 'risk sensitivity' parameter influence both the strength of updating and the sensitivity to reward payoffs in the sense of altering the tendency to prefer an option based on recent experience with payoffs. As noted in the previous paragraph, the main difference between the two parameters is whether they relate to behaviour linearly versus with an exponential component.

      Please see our two earlier responses on the mechanics of our reinforcement learning model.

      Overall, while the statistical analyses based on equations (1) and (2) seem to have identified something interesting about two steps underlying learning patterns, to maximize the valuable conceptual impact that these analyses have for the field, more thinking is required to better understand the biological meaning of how these two parameters relate to observed behaviours, and the 'risk sensitivity' parameter needs to be re-named.

      Please see our earlier response to these suggestions.

      Agent-based simulations:

      The authors estimated two learning parameters based on the behaviour of real birds, and then ran simulations to see whether computer 'birds' that base their choices on those learning parameters return behaviours that, on average, mirror the behaviour of the real birds. This exercise is clearly circular. In old-style, statistical terms, I suppose this means that the R-square of the statistical model is good. A more insightful use of the simulations would be to identify situations where the simulation does not do as well in mirroring behaviour that it is designed to mirror.

      Based on the Reviewer’s summary of agent-based forward simulation, we can see we did a poor job of explaining the inferential value of this method; we apologise. Agent-based forward simulations are posterior predictions, and they provide insight into the implied model dynamics and overall usefulness of our reinforcement learning model. R-squared calculations are retrodictive, and they say nothing about the causal dynamics of a model. Specifically, agent-based forward simulation allows us to ask: what would a ‘new’ grackle ‘do’, given our reinforcement learning model parameter estimates? It is important to ask this question because, in parameterising our model, we may have overlooked a critical contributing mechanism to grackles’ reinforcement learning. Such an omission is invisible in the raw parameter estimates; it is only betrayed by the parameters in actu. Agent-based forward simulation is ‘designed’ to answer this question, not to mirror behavioural results. The simulation has no a priori ‘opinion’ about the computer ‘birds’’ behavioural outcomes; rather, it simply assigns these agents random phi and lambda draws (whilst maintaining their correlation structure) and tracks their reinforcement learning. The exercise only appears circular if no critical contributing mechanism was overlooked; in this case, computer ‘birds’ should behave similarly to real birds. A disparate mapping between computer ‘birds’ and real birds, however, would mean more work is needed on a model parameterisation that captures the causal, mechanistic dynamics behind real birds’ reinforcement learning (for an example of this happening in the human reinforcement learning literature, see Deffner et al. 2020 [DOI: 10.1098/rsos.200734]).

      In sum, agent-based forward simulation does not assess goodness-of-fit (we assessed the fit of our model a priori in our preregistration: https://osf.io/v3wxb), but it does assess whether one did a comprehensive job of uncovering the mechanistic basis of the target behaviour(s). We have worked to make the above points on the method, and the insight afforded by agent-based forward simulation, explicitly clear in our revision (please see L192–207 and L534–537).
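      A minimal sketch of what such a forward simulation can look like is given below, assuming the standard update and softmax rules. In practice the phi and lambda draws would come from the fitted model's posterior. Here they are drawn from a hypothetical bivariate normal on the logit(phi) and log(lambda) scales with correlation rho. The 17-of-20 learning criterion, the trial cap, and all parameter values are illustrative assumptions, not the study's settings. Each agent learns an initial association and then a reversal, and its trials to criterion can be compared with those of real birds.

```python
import math
import random


def softmax_choice(attractions, lam, rng):
    # Sample an option index with probabilities exp(lam*A) / sum(exp(lam*A)).
    m = max(attractions)
    weights = [math.exp(lam * (a - m)) for a in attractions]
    r = rng.random() * sum(weights)
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def trials_to_criterion(attractions, phi, lam, rewarded, rng,
                        window=20, needed=17, max_trials=1000):
    # Run one learning phase, updating `attractions` in place; return the trial
    # on which at least `needed` of the last `window` choices hit `rewarded`.
    recent = []
    for t in range(1, max_trials + 1):
        choice = softmax_choice(attractions, lam, rng)
        payoff = 1.0 if choice == rewarded else 0.0
        # Assumption: only the chosen option's attraction score is updated.
        attractions[choice] = (1 - phi) * attractions[choice] + phi * payoff
        recent.append(choice == rewarded)
        if len(recent) > window:
            recent.pop(0)
        if len(recent) == window and sum(recent) >= needed:
            return t
    return max_trials  # criterion not reached within the cap


def simulate_agents(n, mu_logit_phi, mu_log_lam, sd_phi, sd_lam, rho, seed=1):
    # Draw correlated (phi, lambda) per agent, then simulate initial and
    # reversal learning; returns (phi, lambda, initial, reversal) tuples.
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z_phi = mu_logit_phi + sd_phi * g1
        z_lam = mu_log_lam + sd_lam * (rho * g1 + math.sqrt(1 - rho ** 2) * g2)
        phi = 1 / (1 + math.exp(-z_phi))  # inverse logit keeps phi in (0, 1)
        lam = math.exp(z_lam)             # exponential keeps lambda positive
        attractions = [0.0, 0.0]
        initial = trials_to_criterion(attractions, phi, lam, rewarded=0, rng=rng)
        # Reversal: attraction scores carry over, but option 1 is now rewarded.
        reversal = trials_to_criterion(attractions, phi, lam, rewarded=1, rng=rng)
        results.append((phi, lam, initial, reversal))
    return results


if __name__ == "__main__":
    sims = simulate_agents(200, mu_logit_phi=-2.0, mu_log_lam=1.5,
                           sd_phi=0.5, sd_lam=0.5, rho=-0.3)
    mean_initial = sum(r[2] for r in sims) / len(sims)
    mean_reversal = sum(r[3] for r in sims) / len(sims)
    print(f"mean trials to criterion: initial {mean_initial:.1f}, "
          f"reversal {mean_reversal:.1f}")
```

If simulated agents systematically failed to reproduce the real birds' initial and reversal patterns, that mismatch would point to a missing mechanism in the model rather than to a poor fit of the two parameters.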

      Reviewer #2 (Public Review):

      Summary:

      The study is titled "Leading an urban invasion: risk-sensitive learning is a winning strategy", and consists of three different parts. First, the authors analyse data on initial and reversal learning in Grackles confronted with a foraging task, derived from three populations labeled as "core", "middle" and "edge" in relation to the invasion front. The suggested difference between study populations does not surface, but the authors do find moderate support for a difference between male and female individuals. Secondly, the authors confirm that the proposed mechanism can actually generate patterns such as those observed in the Grackle data. In the third part, the authors present an evolutionary model, in which they show that learning strategies as observed in male Grackles do evolve in what they regard as conditions present in urban environments.

      Strengths:

      The manuscript's strength is that it combines real learning data collected across different populations of the Great-tailed grackle (Quiscalus mexicanus) with theoretical approaches to better understand the processes with which grackles learn and how such learning processes might be advantageous during range expansion. Furthermore, the authors also take sex into account revealing that males, the dispersing sex, show moderately better reversal learning through higher reward-payoff sensitivity. I also find it refreshing to see that the authors took the time to preregister their study to improve transparency, especially regarding data analysis.

      Thank you—we are pleased to receive this positive evaluation, particularly concerning our efforts to improve scientific transparency via our study’s preregistration (https://osf.io/v3wxb).

      Weaknesses:

      One major weakness of this manuscript is that the authors are working with quite low sample sizes when we look at the different populations at the edge (11 males & 8 females), middle (4 males & 4 females), and core (17 males & 5 females) of the expansion range. When all populations are pooled together, I think the sample size is sufficient to answer the questions regarding sex differences in learning performance and which learning processes might be used by grackles, but it is insufficient when the different populations are taken into account.

      In Bayesian statistics, there is no strict lower limit on the required sample size, as the inferences do not rely on asymptotic assumptions. While inferences remain valid in principle, a low sample size will of course be reflected in rather uncertain posterior estimates. We note that all of our multilevel models use partial pooling on individuals (the random-effects structure), which is a regularisation technique that generally reduces the inference constraint imposed by a low sample size (see Ch. 13 in Statistical Rethinking by Richard McElreath [PDF: https://bit.ly/3RXCy8c]). We further note that, in our study preregistration (https://osf.io/v3wxb), we formally tested our reinforcement learning model for different effect sizes of sex on learning for both target parameters (phi and lambda) across populations, using an N similarly modest to our actual final N (edge: 10 M, 5 F; middle: 22 M, 5 F; core: 3 M, 4 F), which at that time we anticipated would be our final N. This a priori analysis shows that our reinforcement learning model: (i) detects sex differences in phi values >= 0.03 and lambda values >= 1; and (ii) infers a null effect for phi values < 0.03 and lambda values < 1, i.e., very weak simulated sex differences (see Figure 4 in https://osf.io/v3wxb). Together, these points show that our reinforcement learning model allows us to say that across-population null results are not just due to small sample size. Nevertheless, the Reviewer is right to wonder whether a bigger N might change our population-level results (it might; so might much-needed population replicates, see L310), but our Bayesian models still allow us to learn a lot from our current data. We now explain this in our revised manuscript (please see L452–457).
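      The regularising effect of partial pooling can be illustrated with the textbook normal-normal model, in which a unit's posterior mean is a precision-weighted average of its own sample mean and the grand mean. This closed form assumes known within-unit (sigma) and between-unit (tau) standard deviations and is only an illustration; the models discussed here are fitted by full Bayesian inference. The sample sizes 4 and 17 echo the smallest and largest male counts quoted above; all other values are hypothetical.

```python
def partially_pooled_mean(unit_mean, n, grand_mean, sigma, tau):
    """Posterior mean of a unit's effect in a normal-normal hierarchical model.

    unit_mean:  sample mean of the unit's n observations
    sigma:      within-unit standard deviation (assumed known)
    tau:        between-unit standard deviation (assumed known)
    """
    precision_data = n / sigma ** 2   # information from the unit's own data
    precision_prior = 1 / tau ** 2    # information from the population
    w = precision_data / (precision_data + precision_prior)
    # Small n -> small w -> the estimate is pulled towards the grand mean.
    return w * unit_mean + (1 - w) * grand_mean


# Same raw mean, different sample sizes (hypothetical sigma = 1, tau = 0.5).
for n in (4, 17):
    est = partially_pooled_mean(1.0, n, grand_mean=0.0, sigma=1.0, tau=0.5)
    print(f"n={n}: pooled estimate {est:.3f}")
```

Smaller units are shrunk more strongly towards the population mean, which stabilises their estimates at the cost of some bias, while their posterior uncertainty remains visible.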

      Another weakness of this manuscript is that it does not set up the background well in the introduction. Firstly, are grackles urban dwellers in their natural range and expand by colonising urban habitats because they are adapted to it? The introduction also fails to mention why urban habitats are special and why we expect them to be more challenging for animals to inhabit. If we consider that one of their main questions is related to how learning processes might help individuals deal with a challenging urban habitat, then this should be properly introduced.

      In L74–75 (previously L53–56) we introduce that the estimated historical niche of grackles is urban environments, and that shifts in habitat breadth (e.g., moving into more arid, agricultural environments) are the estimated driver of their rapid North American colonisation. We hope this information sufficiently answers the Reviewer’s question. We have worked on fleshing out how urban-imposed challenges faced by grackles, such as the wildlife management efforts introduced in L64–65 (now L85–86), may apply to animals inhabiting urban environments more broadly; for example, we now include an entire paragraph in our Introduction detailing how urban environments may be characterised differently from non-urban environments, and thus why they are perhaps more challenging for animals to inhabit (please see L56–71).

      Also, the authors provide a single example of how learning can differ between populations from more urban and more natural habitats. The authors also label the urban dwellers as the invaders, which might be the case for grackles but is not necessarily true for other species, such as the Indian rock agama in the example which are native to the area of study. Also, the authors need to be aware that only male lizards were tested in this study. I suggest being a bit more clear about what has been found across different studies looking at: (1) differences across individuals from invasive and native populations of invasive species and (2) differences across individuals from natural and urban populations.

      We apologise for not including more examples of such learning differences. We now include three examples (please see L43–49), and we are careful to call attention to the fact that these data cover both resident urban and non-urban species as well as urban invasive species (please see L49–50). We also revised our labelling of the lizard species (please see L44). We are aware only male lizards were tested, but this information is not relevant to substantiating our use of this study; that is, to highlight that learning can differ between urban-dwelling and non-urban counterparts. We hope the changes we did make to our manuscript satisfy the Reviewer’s general suggestion to add biological clarity.

      Finally, the introduction is very much written with regard to the interaction between learning and dispersal, i.e. the 'invasion front' theme. The authors lay out four predictions, the most important of which is No. 4: "Such sex-mediated differences in learning to be more pronounced in grackles living at the edge, rather than the intermediate and/or core region of their range." The authors, however, never return to this prediction, at least not in a transparent way that clearly pronounces this pattern not being found. The model looking at the evolution of risk-sensitive learning in urban environments is based on the assumption that urban and natural environments "differ along two key ecological axes: environmental stability 𝑢 (How often does optimal behaviour change?) and environmental stochasticity 𝑠 (How often does optimal behaviour fail to pay off?). Urban environments are generally characterised as both stable (lower 𝑢) and stochastic (higher 𝑠)". Even though it is generally assumed that urban environments differ from natural environments the authors' assumption is just one way of looking at the differences which have generally not been confirmed and are highly debated. Additionally, it is not clear how this result relates to the rest of the paper: The three populations are distinguished according to their relation to the invasion front, not with respect to a gradient of urbanization, and further do not show a meaningful difference in learning behaviour possibly due to low sample sizes as mentioned above.

      Thank you for highlighting a gap in our reporting clarity. We now take care to transparently report our null result regarding our fourth prediction; more specifically, that we did not detect credible population-level differences in grackles’ learning (please see L130). Regarding our evolutionary model, we agree with the Reviewer that this analysis is only one way of looking at the interaction between learning phenotype and apparent urban environmental characteristics. Indeed, in L282–288 (now L325–329) we state: “Admittedly, our evolutionary model is not a complete representation of urban ecology dynamics. Relevant factors—e.g., spatial dynamics and realistic life histories—are missed out. These omissions are tactical ones. Our evolutionary model solely focuses on the response of reinforcement learning parameters to two core urban-like (or not) environmental statistics, providing a baseline for future study to build on”. But we can see now that ‘core’ is too strong a word, and instead ‘supposed’, ‘purported’ or ‘theorised’ would be more accurate—we have revised our wording throughout our manuscript to say as much (please see, for example, L24; L56; L328). We also further highlight the preliminary nature of our evolutionary model, in terms of allowing a narrow but useful first look at urban eco-evolutionary dynamics—please see L228–232. Finally, we now detail the theorised characteristics of urban environments in our Introduction (rather than in our Results; please see L56–71), and we hope that by doing so, how our evolutionary results relate to the rest of our paper is now better set up and clear.

      In conclusion, the manuscript was well written and for the most part easy to follow. The format of eLife having the results before the methods makes it a bit harder to follow because the reader is not fully aware of the methods at the time the results are presented. It would, therefore, be important to more clearly delineate the different parts and purposes. Is this article about the interaction between urban invasion, dispersal, and learning? Or about the correct identification of learning mechanisms? Or about how learning mechanisms evolve in urban and natural environments? Maybe this article can harbor all three, but the borders need to be clear. The authors need to be transparent about what has and especially what has not been found, and be careful to not overstate their case.

      Thank you, we are pleased to read that the Reviewer found our manuscript to be generally digestible. We have worked to add further clarity, and to temper our tone (please see our above and below responses).

      Reviewer #1 (Recommendations For The Authors):

      Several of the results are based on CIs that overlap zero. Tone these down somewhat.

      We apologise for overstating our results, which we have worked to tone down in our revision. For instance, in L185–186 we now differentiate between estimates that did or did not overlap zero (please also see our response to Reviewer 2 on this tonal change). We note we do not report confidence intervals (i.e., intervals produced by a procedure that would contain the true parameter value in a stated proportion of repeated studies/analyses). Rather, we report 89% highest posterior density intervals (i.e., the narrowest range of parameter values containing 89% of the posterior probability). We have added this definition in L459, to improve clarity.
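      As a concrete illustration of the interval definition above, the following is a minimal sketch of how an 89% highest posterior density interval can be computed from posterior samples: sort the draws and keep the narrowest window that contains at least 89% of them. The function name `hpdi` and the exponential example draws are assumptions for illustration only; this is not the code used in our analyses.

```python
import math
import random
from typing import Sequence, Tuple


def hpdi(samples: Sequence[float], prob: float = 0.89) -> Tuple[float, float]:
    """Narrowest interval containing at least `prob` of the posterior samples."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(prob * n))  # number of samples the interval must contain
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best_start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best_start], ordered[best_start + k - 1]


# Illustrative skewed "posterior": draws from an exponential distribution.
rng = random.Random(1)
draws = [rng.expovariate(1.0) for _ in range(10_000)]
print(hpdi(draws))  # for Exp(1), expect roughly (0, -ln(0.11)), i.e. about (0, 2.21)
```

      For a skewed posterior such as this one, the interval hugs the densest region near zero rather than cutting equal tails, which is what distinguishes it from a percentile interval.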

      The literature review suggesting that urban environments are more unpredictable is not convincing. Yes, they have more noise and light pollution and more cars and planes, but does this actually relate to the unpredictability of getting a food reward when you choose an option that usually yields rewards?

      To answer the Reviewer’s question—yes. But we can see that by not including empirical examples from the literature, we did a poor job of arguing such links. In L43–49 we now give three empirical examples; more specifically, we state: “[...] experimental data show the more variable are traffic noise and pedestrian presence, the more negative are such human-driven effects on birds' sleep (Grunst et al., 2021), mating (Blickley et al., 2012), and foraging behaviour (Fernández-Juricic, 2000).” We note we now detail such apparently stable but stochastic urban environmental characteristics in our Introduction rather than our Results section, to hopefully improve the clarity of our manuscript (please see L56–71). We further note that we cite three literature reviews—not one—suggesting urban environments are stable in certain characteristics and more unpredictable in others (please see L59–60). Finally, we appreciate such characterisation is not certain, and so in our revision we have qualified all writing about this potential dynamic with words such as “apparent”, “supposed”, “theorised”, “hypothesised” etc.

      It would be interesting to see if other individual traits besides sex affect their learning/reversal learning ability and/or their learning parameters. Do you have data on age, size, condition, or personality? Or, the habitat where they were captured?

      We do not have these data. But we agree with the Reviewer that examining the potential influence of such covariates on grackles’ reinforcement learning would be interesting in future study, especially habitat characteristics (please see L306–309).

      For most levels of environmental noise, there appears to be an intermediate maximum for the relationship between environmental stability and the risk sensitivity parameter. What does this mean?

      There is indeed an intermediate maximum for certain values of environmental stochasticity (although the differences are rather small). The most plausible reason for this is that in very stable environments, simulated birds essentially always “know” the rewarded solution and never need to “relearn” behaviour. In this case, differences in latent values will tend to be large (because birds are consistently rewarded for the same option), and different lambda values (in the upper range) will produce the same choice behaviour, which results in very weak selection. In very unstable environments, by contrast, optimal choice behaviour should be more exploratory, allowing learners to track frequently changing environments. Together, these two effects produce the peak at intermediate stability. We now note this pattern in L240–248.
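      The "very weak selection" argument can be made concrete with a small sketch. It assumes a two-option softmax choice rule, p = 1 / (1 + exp(-λ ΔQ)), where ΔQ is the difference in latent values between the better and worse option; the ΔQ values and the function name `prob_choose_better` are illustrative assumptions, not quantities taken from our simulations.

```python
import math


def prob_choose_better(delta_q: float, lam: float) -> float:
    """Two-option softmax: probability of choosing the option with the higher latent value.

    delta_q: latent value of the better option minus that of the worse option (assumed >= 0).
    lam: softmax (risk-sensitivity) parameter; larger values give more deterministic choice.
    """
    return 1.0 / (1.0 + math.exp(-lam * delta_q))


for delta_q in (0.1, 1.0):  # small vs large difference in latent values (illustrative)
    p_low, p_high = prob_choose_better(delta_q, 5.0), prob_choose_better(delta_q, 10.0)
    print(f"delta_q={delta_q}: lambda=5 -> {p_low:.4f}, lambda=10 -> {p_high:.4f}")
```

      With a large gap in latent values, doubling λ from 5 to 10 changes the choice probability by well under one percentage point, whereas with a small gap it changes it by roughly eleven points. Under this assumed choice rule, this is why upper-range λ values are nearly indistinguishable to selection in very stable environments.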

      Reviewer #2 (Recommendations For The Authors):

      L2: I'd encourage the authors to reconsider the term "risk-sensitive learning", at least in the title. It's not apparent to me how 'risk' relates to the investigated foraging behaviour. Elsewhere, risk-reward sensitivity is used which may be a better term.

      We apologise for not clearly introducing the field of risk-sensitive foraging, which focuses on how animals evaluate and choose between distinct food options, and how such foraging decisions are influenced by pay-off variance, i.e., risk associated with alternative foraging options (seminal reviews: Bateson 2002 [DOI: 10.1079/PNS2002181]; Kacelnik & Bateson 1996 [DOI: 10.1093/ICB/36.4.402]). We have added this information to our manuscript in L494–497. In explaining our reinforcement learning model, we also now detail how risk relates to foraging behaviour. Specifically, in L153–161 we now state: “Both learning parameters capture individual-level internal response to incurred reward-payoffs, but they differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions (full mathematical details in Materials and methods). For 𝜙, stronger reward sensitivity (bigger values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For 𝜆, stronger reward sensitivity (bigger values) means stronger internal determinism about seeking the non-risk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching i.e., ‘playing it safe’.” We hope this information clarifies why lambda measures risk sensitivity, and why we continue to use this term.
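      To show how the two parameters act at different points in the choice process, here is a minimal sketch of a standard reinforcement learning scheme: Rescorla-Wagner-style updating of the chosen option's latent value at rate 𝜙, followed by a softmax choice rule with parameter 𝜆. The exact form used in our Materials and methods (e.g., payoff coding, treatment of the unchosen option) may differ; the function names here are hypothetical.

```python
import math
from typing import List


def update_values(values: List[float], choice: int, payoff: float, phi: float) -> List[float]:
    """Move the chosen option's latent value a fraction `phi` towards the observed payoff."""
    updated = list(values)
    updated[choice] = (1.0 - phi) * values[choice] + phi * payoff
    return updated


def choice_probabilities(values: List[float], lam: float) -> List[float]:
    """Softmax choice rule; larger `lam` concentrates choice on the higher-valued option."""
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(lam * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# One rewarded choice of option 0, starting from equal latent values.
values = update_values([0.0, 0.0], choice=0, payoff=1.0, phi=0.3)
print(values)  # [0.3, 0.0]
print(choice_probabilities(values, lam=1.0))
print(choice_probabilities(values, lam=5.0))
```

      In this sketch, a larger 𝜙 moves latent values further after each payoff (faster learning about ‘what to choose’), while a larger 𝜆 turns the same value difference into a stronger preference for the better-rewarded option, and hence fewer choice-option switches.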

      L1-3: The title is a bit misleading with regard to the empirical data. From the data, all that can be said is that male grackles relearn faster than females. Any difference between populations actually runs the other way, with the core population exhibiting a larger difference between males and females than the mid and edge populations.

      It is customary for a manuscript title to describe the full scope of the study. In our study, we have empirical data, cognitive modelling, and evolutionary simulations of the background theory all together. And together these analytical approaches show: (1) across three populations, male grackles—the dispersing sex in this historically urban-dwelling and currently urban-invading species—outperform female counterparts in reversal learning; (2) they do this via risk-sensitive learning, so they’re more sensitive to relative differences in reward payoffs and choose to stick with the ‘safe’ i.e., rewarding option, rather than continuing to ‘gamble’ on an alternative option; and (3) risk-sensitive learning should be favoured in statistical environments characterised by purported urban dynamics. So, we do not feel our title “Leading an urban invasion: risk-sensitive learning is a winning strategy” is misleading with regard to our empirical data; it just doesn’t summarise only our empirical data. Finally, as we now state in L312–313, we caution against speculating about any between-population variation, as we did not infer any meaningful behavioural or mechanistic population-level differences.

      L13: "Assayed", is that correctly put, given that the authors did not collect the data?

      Merriam-Webster defines assay as “to analyse” or “examination or determination as to characteristics”, and so to answer the Reviewer’s question—yes, we feel this is correctly put. We note we explicitly introduce in L102–103 that we did not collect the data, and we have an explicit “Data provenance” section in our methods (please see L342–347).

      L42-46: The authors provide a single example of how learning can differ between populations from more urban and more natural habitats. I would like to point out that many of these studies do not directly confirm that the ability in question has indeed led to the success of the species tested (e.g. show fitness consequences). Then the authors could combine these insights to form a solid prediction for the grackles. As of now, this looks like cherry-picking supportive literature without considering negative results.

      Here are some references that might be helpful in identifying relevant literature to cite:

      Szabo, B., Damas-Moreira, I., & Whiting, M. J. (2020). Can cognitive ability give invasive species the means to succeed? A review of the evidence. Frontiers in Ecology and Evolution, 8, 187.

      Griffin, A. S., Tebbich, S., & Bugnyar, T. (2017). Animal cognition in a human-dominated world. Animal Cognition, 20(1), 1-6.

      Kark, S., Iwaniuk, A., Schalimtzek, A., & Banker, E. (2007). Living in the city: Can anyone become an "urban exploiter"? Journal of Biogeography, 34(4), 638-651.

      We apologise for not including more examples of such learning differences. We now include three examples (please see L43–49). We are aware that direct evidence of fitness consequences is entirely lacking in the scientific literature on cognition and successful urban invasion; hence why such data are not present in our paper. But we now explicitly point out a role for likely fitness-affecting anthropogenic disturbances on sleep, mating, and foraging behaviour in animals inhabiting urban environments (please see L63–68). We hope these new data bolster our predictions for our grackles. Finally, the Reviewer paints what is, in our view, an inaccurate picture of our use of available literature. Nevertheless, to address their comment, we now highlight a recent meta-analysis advocating for further research to confirm apparent ‘positive’ trends between animal ‘smarts’ and successful ‘city living’ (please see L43).

      L64: Is their niche historically urban, or have they recently moved into urban areas?

      In L74–75 (previously L53–56) we state that the estimated historical niche of grackles is urban environments, and that shifts in habitat breadth—e.g., moving into more arid, agricultural environments—are the estimated driver of their rapid North American colonisation. We hope this information sufficiently answers the Reviewer’s question.

      L66-67: This is an important point that is however altogether missing from the discussion.

      We thank the Reviewer for highlighting a gap in our discussion regarding population-level differences in grackles’ reinforcement learning. In L310–312 we now state: “The lack of spatial replicates in the existing data set used herein inherently poses limitations on inference. Nevertheless, the currently available data do not show meaningful population-level behavioural or mechanistic differences in grackles’ reinforcement learning, and we should thus be cautious about speculating on between-population variation”.

      L68-71: The paper focuses on cognitive ability. The whole paragraph sets up the prediction of why male grackles should be better learners due to their dispersal behaviour. This example, however, focuses on aggression, not cognition. Here is a study showing differences in learning in male and female mynas that might be better suited:

      Federspiel, I. G., Garland, A., Guez, D., Bugnyar, T., Healy, S. D., Güntürkün, O., & Griffin, A. S. (2017). Adjusting foraging strategies: a comparison of rural and urban common mynas (Acridotheres tristis). Animal Cognition, 20(1), 65-74.

      We thank the Reviewer for suggesting this paper. We feel it is better suited to substantiating our point in the Discussion about reversal learning not being indicative of cognitive ability—please see L276–277.

      L73: Generally, I suggest not writing "for the first time" as this is not a valid argument for why a study should be conducted. Furthermore, except for replication studies, most studies investigate questions that are novel and have not been investigated before.

      The Reviewer makes a fair point—we have removed this statement.

      L80-81: Here again, this is left undiscussed later on.

      By ‘this’ we assume the Reviewer is referring to our hypothesis, which is that sex differences in dispersal are related to sex differences in learning in an urban invader, the grackle. At the beginning of our Discussion, we state how we found support for this hypothesis (please see L250–261); and in our ‘Ideas and speculation’ section, we discuss how these hypothesis-supporting data fit into the literature more broadly (please see L294–331). We feel this is therefore sufficiently discussed.

      L77-81: This sentence is very long and therefore hard to read. I suggest trying to split it into at least 2 separate sentences which would improve readability.

      Per the Reviewer’s useful suggestion, we have split this sentence into two separate sentences—please see L97–115.

      L83: Please explain choice-option switches. I am not aware of what that is and it should be explained at first mention.

      We apologise for this operational oversight. We now include a working definition of speed and choice-option switches at first mention. Specifically, in L107–108 we state: “[...] we expect male and female grackles to differ across at least two reinforcement learning behaviours: speed (trials to criterion) and choice-option switches (times alternating between available stimuli)”.

      L83-87: Again, a very long sentence. Please split.

      We thank the Reviewer for their suggestion. In this case we feel it is important to not change our sentence structure because we want our prediction statements to match between our manuscript and our preregistration.

      L96-97: Important to not overstate this. It merely demonstrates the potential of the proposed (not detected) mechanism to generate the observed data.

      As in any empirical analysis, our drawn conclusions depend on causal assumptions about the mechanisms generating behaviour (Pearl, J. (2009). Causality). Therefore, we “detected” specific learning mechanisms assuming a certain generative model, namely reinforcement learning. As there is overwhelming evidence for the widespread importance of value-based decision making and Rescorla-Wagner updating rules across numerous different animals (Sutton & Barto (2018) Reinforcement Learning), we would argue that this assumed model is highly plausible in our case. Still, we changed the text to “inferred” instead of “detected” learning mechanisms to account for this concern—please see L123–124.

      L99: "urban-like settings" again a bit confusing. The authors talk about invasion fronts, but now also about an urbanisation gradient. Is the main difference between the size and the date of establishment, or is there additionally a gradient in urbanisation to be considered?

      We now include a paragraph in our Introduction detailing apparent urban environmental characteristics (please see L56–71), and we now refer to this dynamic specifically when we define urban-like settings (please see L126–127). To answer the Reviewer’s question—we consider both differences. Specifically, we consider the time since population establishment in our paper (with respect to our behavioural and mechanistic modelling), as well as how statistical environments that vary in how similar they are to apparently characteristically urban-like environments might favour particular learning phenotypes (with respect to our evolutionary modelling). We hope the edits to our Introduction as a whole now make both of the aims clear.

      L111-112: Above the authors talk about a comparable number of switches (10.5/15=0.7), and here of fewer number of switches (25/35=0.71), even though the magnitude of the difference is almost identical and actually runs the other way. The authors are probably misled by their conservative priors, which makes the difference appear greater in the second case than in the first. Using flat priors would avoid this particular issue.

      Mathematically, the number of trials-to-finish and the number of choice-option switches are both Poisson-distributed outcomes with rate λ (we note lambda here is not our risk-sensitivity parameter; it is just standard notation). As such, our Poisson models infer the rate of these outcomes by sex and phase, not the ratio of these outcomes by sex and phase. So comparing the ratio of median choice-option switches between the sexes by phase, as the Reviewer does above, is not a meaningful metric with respect to the distribution of our data. For perspective, 1 vs. 2 switches provides much less information about the difference in the rates of a Poisson distribution than 50 vs. 100 (for the former, no difference would be inferred; for the latter, it would), yet both exhibit a 1:2 ratio. To prevent any further such confusion, and to focus on the fact that our Poisson models estimate the expected value, i.e., the mean, we now report and graph (please see Fig. 2) mean rather than median trials-to-finish and total-switch-counts. Finally, we can see that our use of the word “conservative” to describe our weakly informative priors is confusing, because conservative could mean either strong priors with respect to expected effect size (not our parameterisation) or weak priors with respect to such assumptions (our parameterisation). To address this lack of clarity, we now state that we use “weakly informative priors” in L457–458.
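      The 1 vs. 2 and 50 vs. 100 comparison can be checked with a minimal Bayesian sketch. It assumes each count is Poisson with equal exposure and an independent exponential, i.e. Gamma(1, β), prior on each rate; these priors are an illustrative assumption and not the weakly informative priors used in our manuscript. The posteriors are then Gamma(count + 1, β + 1), and the posterior probability that one rate exceeds the other has a closed form through the standard Beta–Binomial identity. The function name is hypothetical.

```python
from math import comb


def prob_second_rate_higher(count_a: int, count_b: int) -> float:
    """Posterior P(rate_b > rate_a) for two Poisson counts with equal exposure.

    Assumes independent Gamma(1, beta) (exponential) priors, so the posteriors are
    Gamma(count + 1, beta + 1) with a shared rate parameter. Then
    rate_a / (rate_a + rate_b) ~ Beta(count_a + 1, count_b + 1), and
    P(rate_b > rate_a) = P(Beta < 1/2) = P(Binomial(n, 1/2) >= count_a + 1),
    with n = count_a + count_b + 1.
    """
    a = count_a + 1
    n = count_a + count_b + 1
    return sum(comb(n, j) for j in range(a, n + 1)) / 2**n


print(prob_second_rate_higher(1, 2))     # 0.6875: weak evidence despite a 1:2 ratio
print(prob_second_rate_higher(50, 100))  # very close to 1 for the same 1:2 ratio
```

      Under these assumptions, the same 1:2 ratio yields only about a 69% posterior probability of a higher rate for the small counts, but near-certainty for the large counts, which is why ratios of counts alone are not informative about differences in Poisson rates.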

      L126: It is not clear what risk sensitivity means in the context of these experiments.

      Thank you for pointing out our lack of clarity. In L153–161 we now state: “Both learning parameters capture individual-level internal response to incurred reward-payoffs, but they differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions (full mathematical details in Materials and methods). For 𝜙, stronger reward sensitivity (bigger values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For 𝜆, stronger reward sensitivity (bigger values) means stronger internal determinism about seeking the non-risk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching i.e., ‘playing it safe’.” We hope this information clarifies what risk sensitivity means and measures, with respect to our behavioural experiments.

      L128-129: I find this statement too strong. A plethora of other mechanisms could produce similar patterns, and you cannot exclude these by way of your method. All you can show is whether the mechanism is capable of producing broadly similar outcomes as observed.

      In describing the inferential value of our reinforcement learning model, we now qualify that the insight provided is of course conditional on the model, which is tonally accurate. Please see L161.

      L144: As I have already mentioned above, here is the first time we hear about unpredictability related to urban environments. I suggest clearly explaining in the introduction how urban and natural environments are assumed to be different which leads to animals needing different cognitive abilities to survive in them which should explain why some species thrive and some species die out in urbanised habitats.

      Thank you for this suggestion. We now include a paragraph in our Introduction detailing as much—please see L56–71.

      L162: "almost entirely above zero" again, this is worded too strongly.

      In reporting our lambda across-population 89% HPDI contrasts in L185–186, we now state: “[...] across-population contrasts that lie mostly above zero in initial learning, and entirely above zero in reversal learning”. Our previous wording stated: “[...] across-population contrasts that lie almost entirely above zero”. The Reviewer was correct to point out that this previous wording was too strong if we considered the contrasts together, as, indeed, we find the range of the contrast in initial learning does minimally overlap zero (L: -0.77; U: 5.61), while the range of the contrast in reversal learning does not (L: 0.14; U: 4.26). This rephrasing is thus tonally accurate.

      L178-179: I think it should be said instead that the model accounts well for the observed data.

      We have rephrased in line with the Reviewer’s suggestion, now stating in L217–218 that “Such quantitative replication confirms our reinforcement learning model results sufficiently explain our behavioural sex-difference data.”

      L188-190: I am not convinced this is a general pattern. It is quite a bold claim that I don't find to be supported by the citations. Why should biotic and abiotic factors differ in how they affect behavioural outcomes? Also, events in urban environments such as weekend/weekday could lead to highly regular optimal behaviour changes.

      Please see our response to Reviewer 1 on this point. We note we now touch on such regular events in L94–96.

      L209-211: The first sentence is misleading. The authors have found that males and females differ in 'risk sensitivity', that their learning model can fit the data rather well, and that under certain, not necessarily realistic assumptions, the male learning type is favoured by natural selection in urban environments. A difference between core, middle, and edge habitats however is barely found, and in fact seems to run the other way than expected.

      In our study, we found: (1) across three populations, male grackles—the dispersing sex in this historically urban-dwelling and currently urban-invading species—outperform female counterparts in reversal learning; (2) they do this via risk-sensitive learning, so they’re more sensitive to relative differences in reward payoffs and choose to stick with the ‘safe’ i.e., rewarding option, rather than continuing to ‘gamble’ on an alternative option; (3) we are sufficiently certain risk-sensitive learning generates our sex-difference data, as our agent-based forward simulations replicate our behavioural results (not because our model ‘fits’ the data, but because we inferred meaningful mechanistic differences—see our response to Reviewer 1 on this point); and (4) under theorised dynamics of urban environments, natural selection should favour risk-sensitive learning. We therefore do not feel it is misleading to say that we mapped a full pathway from behaviour to mechanisms through to selection and adaptation. Again, as we now state in L311–313, we caution against speculating about any between-population variation, as we did not infer any meaningful behavioural or mechanistic population-level differences. And we note the Reviewer is wrong to assume an interaction between learning, dispersal, and sex requires population-level differences on the outcome scale—please see our discussion on phenotypic plasticity and inherent species trait(s) in L313–324.

      L216: "indeed explain" again worded too strongly.

      We have tempered our wording. Specifically, we now state in L218: “sufficiently explain”. This wording is tonally accurate with respect to the inferential value of agent-based forward simulations—please see L192–207 on this point.

      L234: "reward-payoff sensitivity" might be a better term than risk-sensitivity?

      Please see our earlier response to this suggestion. We note we have changed this text to state “risk-sensitive learning” rather than “reward-payoff sensitivity”, to hopefully prevent the reader from concluding only our lambda term is sensitive to rewards—a point we now include in L153–154.

      L234-237: I think these points may be valuable, but come too much out of the blue. Many readers will not have a detailed knowledge of the experimental assays. It therefore also does not become clear how they measure the wrong thing, what this study does to demonstrate this, or whether a better alternative is presented herein. It almost seems like this should be a separate paper by itself.

      We apologise for this lack of context. We now explicitly state in L275 that we are discussing reversal learning assays, to give all readers this knowledge. In doing so, we hope the logic of our argument is now clear: reversal learning assays do not measure behavioural flexibility, whatever that even is. The Reviewer’s suggestion of a separate paper focused on what reversal learning assays actually measure, in terms of mechanism(s), is an interesting one, and we would welcome this discussion. But any such paper should build on the points we make here.

      L270-288: Somewhere here the authors have to explain how they have not found differences between populations, or that in so far as they found them, they run against the originally stated hypothesis.

      We thank the Reviewer for these suggestions. In L310–313 we now state: “The lack of spatial replicates in the existing data set used herein inherently poses limitations on inference. Nevertheless, the currently available data do not show meaningful population-level behavioural or mechanistic differences in grackles’ reinforcement learning, and we should thus be cautious about speculating on between-population variation”.

      L284: should be "missing" not "missed out"

      We have made this change.

      L290-291: It is unclear what "robust interactive links" were found. A pattern of sex-biased learning was found, which can potentially be attributed to evolutionary pressures in urban environments. An interaction e.g. between learning, dispersal, and sex can only be tentatively suggested (no differences between populations). Also "fully replicable" is a bit misleading. The analysis may be replicable, but the more relevant question of whether the findings are replicable we cannot presently answer.

      We apologise for our lack of clarity. By “robust” we mean “across population”, which we now state in L333. We again note the Reviewer is wrong to assume an interaction between learning, dispersal, and sex requires population-level differences on the outcome scale—please see our discussion on phenotypic plasticity and inherent species trait(s) in L313–324. Finally, the Reviewer makes a good point about our analyses but not our findings being replicable. In L334 we now make this distinction by stating “analytically replicable”.

      L306-315: I think you have a bit of a sample size issue not so much when populations are pooled but when separated. This might also factor in the fact that you do not really find differences across the populations in your analysis. When we look at the results presented in Figure 2 (and table d), we can see a trend towards males having better risk sensitivity in core (HPDI above 0) and middle populations (HPDI barely crossing 0) but the difference is very small. Especially the results on females are based on the performance of only 8 and 4 females respectively. I suggest making this clear in the manuscript.

      In Bayesian statistics, there is no strict lower limit on the required sample size, as the inferences do not rely on asymptotic assumptions. Although inferences remain valid in principle, a low sample size will of course be reflected in rather uncertain posterior estimates. We note that all of our multilevel models use partial pooling on individuals (the random-effects structure), a regularisation technique that generally reduces the inferential constraint imposed by a low sample size (see Ch. 13 in Statistical Rethinking by Richard McElreath [PDF: https://bit.ly/3RXCy8c]). We further note that, in our study preregistration (https://osf.io/v3wxb), we formally tested our reinforcement learning model for different effect sizes of sex on learning for both target parameters (phi and lambda) across populations, using a modest N (edge: 10 M, 5 F; middle: 22 M, 5 F; core: 3 M, 4 F), similar to our actual final N, which at that time we anticipated would be our final N. This a priori analysis shows that our reinforcement learning model: (i) detects sex differences in phi values ≥ 0.03 and lambda values ≥ 1; and (ii) infers a null effect for phi values < 0.03 and lambda values < 1, i.e., very weak simulated sex differences (see Figure 4 in https://osf.io/v3wxb). Together, these two points show that our reinforcement learning model allows us to say that across-population null results are not just due to small sample size. Nevertheless, the Reviewer is not wrong to wonder whether a bigger N might change our population-level results; it might, and so might much-needed population replicates (see L310). But our Bayesian models still allow us to learn a lot from our current data, and, at present, we infer no meaningful population-level behavioural or mechanistic differences in grackles’ behaviour. To make clear the inferential sufficiency of our analytical approach, we now include some of the above points in our Statistical analyses section in L452–457. Finally, we caution against speculating on any between-population variation, as we now highlight in L311–313 of our Discussion.
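      The shrinkage effect of partial pooling mentioned above can be illustrated with the simplest hierarchical case. This is a normal model in which the within-group (sigma) and between-group (tau) standard deviations are treated as known. The sketch below is a minimal illustration under that assumption only. It is not the RL_Comp_Full.stan model, which estimates such variances from the data, and the function name and example numbers are hypothetical.

```python
import numpy as np


def partial_pool(group_means, group_sizes, sigma, tau, grand_mean):
    """Partially pooled group estimates in a normal-normal model with known variances.

    Each group's estimate is a precision-weighted compromise between its own
    sample mean and the grand mean, so groups with few observations are shrunk
    more strongly toward the grand mean than groups with many observations.
    """
    means = np.asarray(group_means, dtype=float)
    n = np.asarray(group_sizes, dtype=float)
    data_precision = n / sigma**2        # precision of each group's sample mean
    prior_precision = 1.0 / tau**2       # precision of the between-group distribution
    weight = data_precision / (data_precision + prior_precision)
    pooled = weight * means + (1.0 - weight) * grand_mean
    return pooled, weight


# Hypothetical example: two groups with the same raw mean but different sizes.
pooled, weight = partial_pool([0.9, 0.9], [4, 36], sigma=1.0, tau=0.5, grand_mean=0.5)
print(pooled, weight)  # the 4-observation group is pulled further toward 0.5
```

      With sigma = 1 and tau = 0.5, the small group keeps only half of its deviation from the grand mean (estimate 0.7). The larger group keeps 90% of its deviation (estimate 0.86).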

      Figure 2: I think the authors should rethink their usage of colour in this graph. It is not colour-blind friendly or well-readable when printed in black and white.

      We used the yellow (hex code: #fde725) and green (hex code: #5ec962) colours from the viridis package. As outlined in the viridis package vignette (https://cran.r-project.org/web/packages/viridis/index.html), this colour package is “designed to improve graph readability for readers with common forms of color blindness and/or color vision deficiency. The color maps are also perceptually-uniform, both in regular form and also when converted to black-and-white for printing”.
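      One quick way to check the black-and-white claim for these two specific colours is to compare their relative luminance, which is roughly what survives a greyscale conversion. The sketch below is an illustration under that assumption, using standard sRGB linearisation and Rec. 709 luminance weights. It is not how the viridis colour maps themselves were designed or validated.

```python
def srgb_to_linear(c8):
    """Convert one 8-bit sRGB channel value (0-255) to linear light (0-1)."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_code):
    """Relative luminance Y (0 = black, 1 = white) with Rec. 709 / sRGB weights."""
    h = hex_code.lstrip("#")
    r, g, b = (srgb_to_linear(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


yellow = relative_luminance("#fde725")
green = relative_luminance("#5ec962")
print(f"yellow Y = {yellow:.2f}, green Y = {green:.2f}")  # yellow Y = 0.78, green Y = 0.45
```

      A luminance gap of about 0.78 versus 0.45 suggests the two colours should remain distinguishable as light versus mid grey when printed, although printer settings will vary.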

      Figure 3B: Could the authors turn around the x-axis and the colour code? It would be easier to read this way.

      We appreciate that aesthetic preferences may vary. In this case, we prefer to have the numbers on the x-axis run the standard way i.e., from small to large. We note we did remove the word ‘Key’ from this Figure, in line with the Reviewer’s point about these characteristics not being totally certain.

      I also had a look at the preregistration. I do think that there are parts in the preregistration that would be worth adding to the manuscript:

      L36-40: This is much easier to read here than in the manuscript.

      We changed this text generally in the Introduction in our revision, so we hope the Reviewer will again find this easier to read.

      L49-56: This is important information that I would also like to see in the manuscript.

      We no longer have confidence in these findings, as our cleaning of only one part of these data revealed considerable experimenter oversight (see ‘Learning criterion’).

      L176: Why did you remove the random effect study site from the model? It is not part of the model in the manuscript anymore.

      The population variable is part of the RL_Comp_Full.stan model that we used in our manuscript to assess population differences in grackles’ reinforcement learning, and we report its estimates in Tables C and D (please note we never coded this variable as “study site”). However, rather than specifying population as a random effect, our RL_Comp_Full.stan model indexes phi and lambda by population as a predictor variable, to explicitly model population-level effects. Please see our code:

      https://github.com/alexisbreen/Sex-differences-in-grackles-learning/blob/main/Models/Reinforcement%20learning/RL_Comp_Full.stan
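      The difference between indexing a parameter by population and treating population as a random effect can be sketched as follows. The sketch assumes a common form of reinforcement learning model. In that form, phi is the rate at which the chosen option's attraction is updated toward the observed payoff, and lambda scales how strongly choices follow attraction differences through a softmax rule. The parameter values, function names and population coding below are hypothetical and are not taken from RL_Comp_Full.stan. With indexing, each population gets its own phi and lambda rather than a deviation drawn from a shared distribution.

```python
import math

# Hypothetical per-population parameters (0 = edge, 1 = middle, 2 = core).
# Placeholder values for illustration only, not estimates from the study.
PHI = [0.05, 0.04, 0.06]    # attraction-updating rate
LAMBDA = [3.0, 4.0, 2.5]    # sensitivity of choice to attraction differences


def choice_probs(attractions, lam):
    """Softmax choice rule: P(option k) is proportional to exp(lam * A_k)."""
    top = max(attractions)  # subtract the max for numerical stability
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


def update(attractions, chosen, payoff, phi):
    """Move the chosen option's attraction a fraction phi toward the payoff."""
    new = list(attractions)
    new[chosen] = (1.0 - phi) * new[chosen] + phi * payoff
    return new


def trial(attractions, chosen, payoff, population):
    """One trial: look up phi and lambda by population index (one free value per
    population), then return the choice probabilities before learning and the
    updated attractions after observing the payoff."""
    phi, lam = PHI[population], LAMBDA[population]
    probs = choice_probs(attractions, lam)
    return probs, update(attractions, chosen, payoff, phi)


probs, attractions = trial([0.0, 0.0], chosen=0, payoff=1.0, population=2)
print(probs, attractions)  # [0.5, 0.5] [0.06, 0.0]
```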

      L190-228: I am wondering if the model validation should also be part of the manuscript as well, rather than just being in the preregistration?

      We are not sure how the files were presented to the Reviewer for review, but our study preregistration, which includes our model validation, should be part of our manuscript as a supplementary file.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1.1) This work introduces a new method of imaging the reaction forces generated by small crawling organisms and applies this method to understanding locomotion of Drosophila larva, an important model organism. The force and displacement data generated by this method are a qualitative improvement on what was previously available for studying the larva, improving simultaneously the spatial, temporal, and force resolution, in many cases by an order of magnitude. The resulting images and movies are quite impressive.

      We thank the reviewer for their recognition of the achievements our work presents and for their feedback with regard to what they consider our most important findings and the points raised in their review. We will address these points individually below.

      (1.2) As it shows the novel application of recent technological innovations, the work would benefit from more detail in the explanation of the new technologies, of the rationales underlying the choice of technology and certain idiosyncratic experimental details, and of the limitations of the various techniques. In the methods, the authors need to be sure to provide sufficient detail that the work can be understood and replicated. The description of the results and the theory of motion developed here focus only on forces generated when the larva pushes against the substrate and ignores the equally strong adhesive forces pulling the larva onto the substrate.

      As the reviewer correctly points out, our present work adapts a recently developed set of methods (namely, ERISM and WARP) for use with small soft-bodied animals. The foundational methods have been described in detail in previous publications (refs 23 and 26). However, upon reflection, we agree that more information can be provided to ensure our work is more accessible and reproducible. We also agree that some additional clarifying information on our approach could be helpful. We have addressed this in the following ways:

      (1) We have included a detailed Key Resources table in the methods section to allow for maximum transparency on equipment and reagent sourcing. This can now be found on Pages 16-19.

      (2) We have modified the ‘Freely behaving animals force imaging’ section of the Materials and Methods section to include more detailed information on practical aspects of conducting experiments. These changes can be found on pages 23–24 (lines 566–567, 571–577).

      (3) We have re-ordered the Materials and Methods section, such that microcavity fabrication and microcavity characterisation occur prior to the description of ERISM and WARP experiments - this change should hopefully aid replication. Details regarding the application of a silicone well to the surface of microcavities have also been added (lines 472-474).

      (4) We have added additional text in the Introduction and Results (Pages 3-4 and 7, lines 56-86, and 152-153) to explain our rationale for using ERISM/WARP and additional text in the discussion that discusses the potential role(s) of adhesive forces in larval locomotion (Page 12, lines 301–307).

      (1.3) The substrate applies upward, downward, and horizontal forces on the larva, but only upward and downward forces are measured, and only upward forces are considered in the discussions of "Ground Reactive Forces." An apparent weakness of the WARP technique for the study of locomotion is that it only measures forces perpendicular to the substrate surface ("vertical forces" in Meek et al.), while locomotion requires the generation of forces parallel to the substrate ("horizontal forces"). It should be clarified that only vertical forces are studied and that no direct information is provided about the forces that actually move the larva forward (or about the forces which impede this motion and are also generated by the substrate). Along with this clarification, it would be helpful to include a discussion of other techniques, especially micropillar arrays and traction force microscopy, that directly measure horizontal forces and of why these techniques are inappropriate for the motions studied here.

      We attempted to provide a streamlined Introduction in our initial submission and then compared ERISM/WARP to other methods in our discussion. We are happy to provide a brief overview of substrate force measurement methods in the introduction to help set the stage for readers. The Introduction section of our revised manuscript now contains the following comparison of different mechanobiological imaging techniques on pages 3-4 lines 56-86:

      ‘However, in the field of cellular mechanobiology, many new force measuring techniques have been developed which allow measurement of comparatively small forces from soft structures exhibiting low inertia (15–17) often with relatively high spatial-resolution. Early methods such as atomic force microscopy required the use of laser-entrained silicon probes to make contact with a cell of interest (15). This approach is problematic for studying animal behaviour due to the risk of the laser and probe influencing behaviour. Subsequently, techniques have been developed which allow indirect measurement of substrate interactions. One such approach is Traction Force Microscopy (TFM) in which the displacement of fluorescent markers suspended in a material with known mechanical properties relative to a zero-force reference allows for indirect measurement of horizontally aligned traction forces (17–19). This technique allows for probe-free measurement of forces, but the need to obtain a precise zero-force reference would make time-lapse measurements on behaving animals challenging; further, depending on the version used, it has insufficient temporal resolution for the measurement of forces produced by many behaving animals, despite recent improvements (20). A second approach revolves around the use of micropillar arrays; in this technique, horizontally-aligned traction forces are measured by observing the deflection of pillars made of an elastic material with known mechanical properties. This approach can be limited in spatial resolution and introduces a non-physiological substrate that may influence animal behavior (21,22).

      Recently we have introduced a technique named Elastic Resonator Interference Stress Microscopy (ERISM) which allows for the optical mapping of vertically aligned GRFs in the pico and nanonewton ranges with micrometre spatial resolution by monitoring local changes in optical resonances of soft and deformable microcavities. This technique allows reference-free mapping of substrate deformations and calculation of vertically directed GRFs; it has been used to study a range of questions related to exertion of cellular forces (23–25). Until recently, this technique was limited by its low temporal resolution (~10s), making it unsuitable for recording substrate interaction during fast animal movements, but a further development of ERISM known as wavelength alternating resonance pressure microscopy (WARP), has been demonstrated to achieve down to 10 ms temporal resolution (26). Given ERISM/WARP allows for probe-free measurement of vertical ground reaction forces with high spatial and temporal resolution, it becomes an attractive method for animal-scale mechanobiology.’

      (1.4) The larvae studied are about 1 mm long and 0.1 mm in cross-section. Their volumes are therefore on the order of 0.01 microliter, their masses about 0.01 mg, and their weights in the range of 0.1 micronewton. This contrasts with the force reported for a single protopodium of 1–7 micronewtons. This is not to say that the force measurements are incorrect. Larvae crawl easily on an inverted surface, showing gravitational forces are smaller than other forces binding the larva to the substrate. The forces measured in this work are also of the same magnitude as the horizontal forces reported by Khare et al. (ref 32) using micropillar arrays.

      I suspect that the forces adhering the larva to the substrate are due to the surface tension of a water layer. This would be consistent with the ring of upward stress around the perimeter of the larva visible in S4D, E and in video SV3. The authors remark that upward deflection of the substrate may be due to the Poisson's ratio of the elastomer, but the calibration figure S5 shows that these upward deflections and forces are much smaller than the applied downward force. In any case, there must be a downward force on the larva to balance the measured upward forces and this force must be due to interaction with the substrate. It should be verified that the sum of downward minus upward forces on the gel equals the larva's weight (given the weight is negligible compared to the forces involved, this implies that the upward and downward forces should sum to 0).

      We have carefully calculated the forces exerted by protopodia and are confident in the accuracy of our measurements as reported. We further agree with the reviewer’s suggestion that gravitational forces can be largely neglected.
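      The order-of-magnitude estimate in comment (1.4) can be reproduced in a few lines. Like the Reviewer's estimate, the sketch assumes a body of roughly 1 mm × 0.1 mm × 0.1 mm with the density of water. These are rough assumptions, not measurements.

```python
# Rough larval weight estimate (assumed box-shaped body with the density of water).
length_m = 1e-3              # ~1 mm body length (assumed)
cross_section_m = 0.1e-3     # ~0.1 mm width and height (assumed)
density_kg_per_m3 = 1000.0   # density of water (assumed)
g = 9.81                     # gravitational acceleration, m/s^2

volume_m3 = length_m * cross_section_m**2     # 1e-11 m^3 = 0.01 microlitre
mass_kg = density_kg_per_m3 * volume_m3       # 1e-8 kg = 0.01 mg
weight_N = mass_kg * g                        # ~9.8e-8 N, i.e. ~0.1 micronewton

protopodium_force_N = (1e-6, 7e-6)            # reported range for a single protopodium
ratios = [f / weight_N for f in protopodium_force_N]
print(f"weight = {weight_N * 1e6:.3f} uN; single protopodium force = "
      f"{ratios[0]:.0f}-{ratios[1]:.0f} x body weight")
# prints: weight = 0.098 uN; single protopodium force = 10-71 x body weight
```

      Under these assumptions, the reported force of a single protopodium is roughly 10 to 70 times the body weight.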

      As the reviewer points out, one would expect forces due to upward and downward deflections to cancel when considering the entire system. However, we see indications that the counteracting (balancing) force often acts over a much larger area than the acting force. For example, a sharp indentation by a protopodium might be counteracted by an upward deflection over a 10- to 20-fold larger radius, and hence a 100- to 400-fold larger area, thereby reducing the absolute value of the upward deflection at any given pixel surrounding the indentation. This in turn increases the error in determining the integrated upward deformation, making it difficult to perform an absolute comparison of acting and counteracting force. Further, recording the entire deformation induced by the counteracting force would require acquiring data with a prohibitively large field of view.
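      The area argument above can be illustrated with an idealised stress map. A downward push acts over a small disc and is balanced by a uniform upward stress spread over a disc with 10 to 20 times the radius. The net force integrates to zero, but the upward stress per pixel is smaller by roughly the area ratio. The geometry, pixel size, force value and sign convention (negative = downward) are assumptions for illustration only. Real counteracting deformations are not uniform.

```python
import numpy as np


def net_vertical_force(stress_pa, pixel_m):
    """Integrate a vertical stress map (Pa) over the imaged area to give net force (N)."""
    return float(np.sum(stress_pa) * pixel_m**2)


def spread_example(k, r_in_m=10e-6, push_n=5e-6, pixel_m=1e-6, n_px=801):
    """Idealised map: push_n applied downward over a disc of radius r_in_m, balanced
    by uniform upward stress over the annulus out to k * r_in_m. Returns the net
    force and the ratio of downward to upward stress magnitude per pixel."""
    offsets = np.arange(n_px) - n_px // 2
    x, y = np.meshgrid(offsets, offsets)
    r = np.hypot(x, y) * pixel_m
    inner = r <= r_in_m
    ring = (r > r_in_m) & (r <= k * r_in_m)
    stress = np.zeros_like(r)
    stress[inner] = -push_n / (inner.sum() * pixel_m**2)   # downward push (negative)
    stress[ring] = push_n / (ring.sum() * pixel_m**2)      # balancing upward stress
    ratio = -stress[inner][0] / stress[ring][0]
    return net_vertical_force(stress, pixel_m), ratio


for k in (10, 20):
    net, ratio = spread_example(k)
    print(f"k = {k}: net force = {net:.1e} N, down/up stress ratio = {ratio:.0f}")
```

      For k = 10 and k = 20, the upward stress per pixel is roughly k² − 1 times smaller than the downward stress (about 100- and 400-fold). The net force stays at zero to within numerical precision.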

      We agree that in some situations, water surface tension may be adhering animals to the substrate. Importantly, this is a challenge that the animal faces outside the lab in its natural environment of moist rotting fruit and yeast. The intricate force patterns seen in our study in the presence of water surface tension are therefore ecologically relevant. In other situations (e.g. preparing for pupation), larvae are able to stick to dry surfaces, suggesting that other adhesive forces such as mucoid adhesion can also come into play in certain behavioural contexts. A full characterization of the effects of water tension and mucoid adhesion is beyond the scope of this study. However, we have now added text on pages 8 and 12 commenting on these other biomechanical forces at play:

      ‘We also observed that the animals travel surrounded by a relatively large water droplet (lines 189-190).’

      ‘We observed that larvae travel surrounded by moisture from a water droplet, which produces a relatively large upwardly directed force in a ring around the animal. The surface tension produced by such a water droplet likely serves a role in adhering the animal to the substrate. However, during forward waves, we found that protopodia detached completely during SwP, suggesting this surface-tension-related adhesion force can be easily overcome by the behaving animal (lines 301–307).’

      (1.5) Much of the discussion and the model imply that the sites where the larva exerts downward force on the gel are the sites where horizontal propulsion is generated. This assumption should be justified. Can the authors rule out that the larva 'pulls' itself forward using surface tension instead of 'pushing' itself forward using protopodia?

      Determining the exact ‘sites’ where horizontal propulsion is generated is challenging. In our conceptual model, movement is not initiated by protopodia per se, but rather by a constellation of muscle contractions, which act upon the hydrostatic skeleton, which in turn causes visceral pistoning that heaves larvae forward. This is based on previous findings in Ref 31. While there are indeed downward protopodial ‘vaulting’ forces prior to initiation of swing, we propose that the main function of protopodia is not to push the larvae forward, but rather to provide anchoring to counteract opposing forces generated by muscles. We agree that water surface tension could also be sculpting biomechanical interactions; however, a full characterization of how water surface tension shapes larval locomotion is beyond the scope of this study.

      Since we have observed larvae move over dry terrain (e.g. glass) without an encasing water bubble, we do not believe that an encasing water bubble is strictly required for locomotion. We have also seen no obvious locomotion related modulations in the pulling forces created by water bubbles encasing larva, which would be expected if animals were somehow using water tension to pull themselves forward. Overall, the most likely explanation is that larvae use a mixture of biomechanical tactics to suit the moment in a given environment. This represents a challenge but also an opportunity for future research.

      We have now added additional text in the ‘Functional subdivisions within protopodia’ subsection to discuss these nuances (page 14, lines 382-387):

      ‘This increased force transmitted into the substrate is unexpected as the forces generated for the initiation of movement should arise from the contraction of the somatic muscles. We propose that the contraction of the musculature responsible for sequestration acts to move haemolymph into the protopodia thus exerting an increased pressure onto the substrate while the contact area decreases as a consequence of the initiation of sequestration.’

      and (page 15, lines 398-399):

      ‘Water surface films appear to facilitate larval locomotion in general but the biomechanical mechanisms by which they do this remain unclear.’

      (1.6) More detail should be provided about the methods, their limitations, and the rationale behind certain experimental choices.

      We thank the reviewer for this comment. As this significantly overlaps with a point raised earlier, we kindly direct them to our answer to comment #1.2 above.

      (1.7) Three techniques are introduced here to study how a crawling larva interacts with the substrate: standard brightfield microscopy of a larva crawling in an agarose capillary, ERISM imaging of an immobilized larva, and WARP imaging of a crawling larva. The authors should make clear why each technique was chosen for a particular study - e.g. could the measurements using brightfield microscopy also be accomplished using WARP? They should also clarify how these techniques relate to and possibly improve on existing techniques for measuring forces organisms exert on a substrate, particularly micropillar arrays and Traction Force Microscopy.

      Indeed, each of the three methods used has a specific merit. The brightfield microscopy was selected to track features on the animal’s body and to provide a basic control for the later measurements. However, this technique cannot directly measure the substrate interaction; it only allows inferences to be made from tracked features at the substrate interface. ERISM provides high resolution maps of the indentation induced by the larva; it is also extensively validated for mapping cell forces and the data analysis is robust against defects on the substrate (refs 23, 24 and 25). However, as we explain in the manuscript, ERISM lacks the temporal resolution needed to monitor mechanical activity of behaving larvae. Its use was therefore limited to the study of anaesthetised animals. For mapping forces exerted by behaving larvae, we used WARP, which is a further development of ERISM that offers higher frame rates but at the cost of requiring more extensive calibration (Supplementary Figure S4). The streamlined introduction of the different methods in our original manuscript originates from our attempt to be as concise as possible. However, as stated in response to comment #1.2, we agree that additional explanation and discussion will be helpful for readers and that it will be helpful to briefly refer to other methods for force mapping. We have now added references to a variety of techniques in the Introduction (Pages 3-4, lines 56-86) as stated in a prior response.

      (1.8) As written, "(ERISM) (19) and a variant, Wavelength Alternating Resonance Pressure microscopy (WARP) (20) enable optical mapping of GRFs in the nanonewton range with micrometre and millisecond precision..." (lines 53-55) may generate confusion. ERISM as described in this work has a much lower temporal resolution (requires the animal to be still for 5 seconds - lines 474-5); In this work, WARP does not appear to have nanonewton precision (judging by noise on calibration figures) and it is not clear that it has millisecond precision (the camera used and its frame rate should be specified in the methods).

      Previous studies have demonstrated the capabilities and limitations of ERISM and WARP. Upon reflection, we agree that our wording here could be more precise. To clarify our claim, we now separate the statements on ERISM and WARP in the introduction as follows (page 4, lines 78-83):

      “Until recently, this technique was limited by its low temporal resolution (~10s) making it unsuitable for use in recording substrate interaction during fast animal movements, but a further development of ERISM known as wavelength alternating resonance pressure microscopy (WARP), has been demonstrated to achieve down to 10 ms temporal resolution (26)”

      While WARP can achieve force resolution comparable to ERISM when used in a cellular context (cf. Ref 26), we agree that for the present study, the resolution was in the range of tens of nanonewtons, due to the need to use stiffer substrates and larger fields of view.

      The camera used in our work was specified in the appropriate subsection of the Materials and Methods (“All WARP and ERISM images were acquired using an Andor Zyla 4.2 sCMOS camera (Andor Technology, Belfast, UK)”). We apologise that the exact frame rate used in our current work was not mentioned in our original manuscript; this has now been added to the ‘Freely behaving animals force imaging’ section of the Materials and Methods (page 23, lines 574-577).

      (1.9) It would be helpful to have a discussion of the limits of the techniques presented and tradeoffs that might be involved in overcoming them. For instance, what is the field of view of the WARP microscope, and could it be increased by choosing a lower power objective? What would be required to allow WARP microscopy to measure horizontal forces? Can a crawling larva be imaged over many strides by recentering it in the field of view, or are there only particular regions of the elastomer where a measurement may be made?

      We agree with the reviewer that some discussion of the limitations of our technique will allow readers to have a more informed appreciation of what we are capable of measuring using WARP. However, as this is the first work to ever demonstrate such measurements, the limitations and tradeoffs cannot all be known with certainty at the present stage.

      To answer your individual questions:

      (1) There is a trade-off between numerical aperture and the ability to resolve individual interference fringes. Since our approach to calculate displacement from reflection maps relies upon counting of individual fringe transitions, going to a lower powered objective risks having these fringes blend and thus the identification of the individual transitions becoming impossible. The minimum numerical aperture of the objective will therefore generally depend on the steepness of indentations produced by the animals; the steeper an indentation, the closer the neighbouring fringes and thus the higher the required magnification to resolve them.

      (2) From WARP and ERISM data, one can make inferences about horizontal forces, as is described in detail in our earlier publications about ERISM (ref 23). However, quantitation of horizontal forces at sufficient temporal resolution to allow the investigation of behaving Drosophila larvae is currently not possible.

      (3) Many strides can indeed be imaged using our technique; however, this comes with additional technical challenges. Whether or not the animal itself can be recentred is an ongoing challenge. We have found that the animals are amenable to recentring themselves within the field of view if chasing an attractive odorant. However, manual recentring using a paintbrush risks destroying the top surface of the soft elastic resonator, and recentring the microscope stage would require real-time object tracking. This has been outside the scope of this original work, given the other challenging requirements on hardware and optics for obtaining high-quality force maps.
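      The trade-off in point (1) can be made concrete with a simple thin-film estimate. Assume normal incidence and neglect mirror phase shifts. Adjacent resonance orders of a cavity with refractive index n then differ in thickness by λ/(2n), so on a surface sloping at s the fringes are about λ/(2ns) apart laterally. The sketch below checks two conditions: whether that spacing is sampled by the camera (at least two object-space pixels per fringe), and whether it is above the Abbe limit λ/(2NA). The wavelength, refractive index, pixel size and objective values are hypothetical. This is not the fringe-counting analysis used for ERISM/WARP.

```python
def fringe_spacing_um(slope, wavelength_nm=650.0, n_cavity=1.4):
    """Lateral distance (um) between adjacent interference fringes on a surface
    sloping at `slope` (height change per unit lateral distance). Assumes one
    fringe order corresponds to a thickness change of wavelength / (2 * n_cavity)."""
    thickness_per_fringe_um = wavelength_nm * 1e-3 / (2.0 * n_cavity)
    return thickness_per_fringe_um / slope


def fringes_resolvable(slope, camera_pixel_um, magnification, numerical_aperture,
                       wavelength_nm=650.0, n_cavity=1.4):
    """Rough check that adjacent fringes are sampled by at least 2 object-space
    pixels and are wider apart than the Abbe limit wavelength / (2 * NA)."""
    spacing = fringe_spacing_um(slope, wavelength_nm, n_cavity)
    object_pixel_um = camera_pixel_um / magnification
    abbe_um = wavelength_nm * 1e-3 / (2.0 * numerical_aperture)
    return spacing >= 2.0 * object_pixel_um and spacing >= abbe_um


# A steep indentation (slope 0.1) imaged with a 6.5 um camera pixel (assumed values):
print(fringes_resolvable(0.1, camera_pixel_um=6.5, magnification=10, numerical_aperture=0.3))  # True
print(fringes_resolvable(0.1, camera_pixel_um=6.5, magnification=4, numerical_aperture=0.13))  # False
```

      The steeper the indentation (larger slope), the smaller the fringe spacing. The same check then fails at lower magnification, which matches the reasoning in point (1).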

      To provide more information on limitations of our technique, we have added the following text into the discussion (pages 13-14, lines 356-370).

      ‘Despite the substantial advances they have provided, the use of WARP and ERISM also brings challenges and has several technical limitations. For example, fabrication of resonators is much more challenging than preparation of the agarose substrates conventionally used for studying locomotion of Drosophila. This problem is compounded by the fragility of the devices owing to the fragility of the thin gold top mirror. This becomes problematic when placing animals onto the microcavities, as often the area local to the initial placement of the animal is damaged by the paintbrush used to move the animals. Further, as a result of the combining of the two wavelengths, the effective framerate of the resultant displacement and stress maps is equal to half of the recorded framerate of the interference maps. To be able to monitor fast movements, recording at very high framerates is therefore necessary which, depending on hardware, might require imaging at reduced image size, but this in turn reduces the number of peristaltic waves that can be recorded before the animal escapes the field of view. A further limitation is that WARP and ERISM are sensitive mainly to forces in the vertical direction; this is complementary to TFM, which is sensitive to forces in horizontal directions. Using WARP in conjunction with high speed TFM (possibly using the tuneable elastomers presented here) could provide a fully integrated picture of underlying vertical and horizontal traction forces during larval locomotion.’ And further on page 13, lines 337-341:

      ‘More detailed characterisation of this behaviour remains a challenge owing to the changing position of the mouth hooks. Due to their rigid structure and the relatively large forces produced in planting, mouth hooks produce substrate interaction patterns which our technique struggles to map accurately due to overlapping interference fringes ambiguating the fringe transitions.’
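      The halving of the effective frame rate mentioned in the quoted Discussion text follows from pairing frames. If the camera alternates between the two wavelengths, each displacement or stress map needs one frame at each wavelength. The sketch below assumes non-overlapping pairs (A, B), (A, B), ... timestamped at their midpoints. The recording rate in the example is hypothetical.

```python
def map_times(frame_times_s):
    """Pair alternating-wavelength frames into non-overlapping (A, B) pairs and
    return one timestamp (the pair midpoint) per reconstructed map; a trailing
    unpaired frame is dropped."""
    n_pairs = len(frame_times_s) // 2
    return [(frame_times_s[2 * i] + frame_times_s[2 * i + 1]) / 2.0 for i in range(n_pairs)]


def effective_map_rate(camera_fps):
    """Maps per second when every map consumes two camera frames."""
    return camera_fps / 2.0


camera_fps = 200.0                             # hypothetical recording rate
frames = [i / camera_fps for i in range(7)]    # 7 frames -> 3 complete pairs
print(effective_map_rate(camera_fps), map_times(frames))
```

      At 200 frames per second this gives 100 maps per second, i.e. one map every 10 ms. Reaching a given interval between maps therefore requires recording at twice that map rate.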

      We trust that the above discussion and our modifications to our manuscript resulting from these will address the reviewer’s concerns.

      Reviewer #2 (Public Review):

      (2.1) With a much higher spatiotemporal resolution of ground dynamics than any previous study, the authors uncover new "rules" of locomotory motor sequences during peristalsis and turning behaviors. These new motor sequences will interest the broad neuroscience community that is interested in the mechanisms of locomotion in this highly tractable model. The authors uncover new and intricate patterns of denticle movements and planting that seem to solve the problem of net motion under conditions of force-balance. Simply put, the denticulated "feet" or tail of the Drosophila larva are able to form transient and dynamic anchors that allow other movements to occur.

      We thank the reviewer for their feedback and the information regarding which of our results is likely to resonate most impactfully with readers from a biological background.

      The biology and dynamics are well-described. The physics is elementary and becomes distracting when occasionally overblown. For example, one doesn't need to invoke Newton's third law, per se, to understand why anchors are needed so that peristalsis can generate forward displacements. This is intuitively obvious.

      We are sorry to hear that the reviewer found some of the physics details distracting. To address this concern, we have simplified some of the language while still attempting to keep the core arguments intact. For context and analogy, we still believe that including a brief reference to the laws of motion is helpful for some readers to explain some of our results and highlight their general implications, especially with regard to anchoring against reaction forces.

      One of our objectives is to make this article accessible and interesting for biologists and physicists at all levels. We feel it is important to reach out to both communities and try to be as inclusive as possible in our writing. Newton’s 3rd law is clearly relevant for our study and it is a common point of reference for anyone with a high-school education, and so we feel it is appropriate to mention it as a way to help readers across disciplines understand the biophysical challenges faced by the animals we study.

      (2.2) Another distracting allusion to "physics" is correlating deformation areas with displaced volume, finding that "volume is a consequence of mass in a 2nd order polynomial relationship". I have no idea what this "physics" means or what relevance this relationship has to the biology of locomotion.

      Upon reflection, we agree that this language may be overly complex and distracts from what is, at its core, a simple but important principle governing how Drosophila larvae interact with their substrates. The point we are trying to make is that our data show that the forces exerted by an animal scale non-linearly with contact area. This suggests that to increase the force exerted on the substrate, an animal must increase its contact area. We do not observe contact area remaining constant while force increases, or vice versa. To make this result clearer, we have made several changes in our revised manuscript. Figure 5B no longer shows the relationship between the protopodial contact area and the displaced volume of the elastic resonator, but instead now shows the protopodial contact area and recorded force transmitted into the substrate. This then shows that in order to increase force transmitted into the substrate, these animals must increase their contact area. We have made changes to the figure legend of Figure 5 and the statements in the Results section accordingly (Page 9, lines 220-222).
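      A non-linear dependence of force on contact area of the kind described here can be quantified with a least-squares polynomial fit. The sketch below uses a second-order polynomial as one simple non-linear form. The form used for the revised Figure 5B is not specified here. The sketch is checked on noiseless values generated from a known quadratic. These values are made up for illustration and are not data from the study.

```python
import numpy as np


def fit_force_vs_area(area, force, degree=2):
    """Least-squares polynomial fit of force against contact area.

    Returns the coefficients (highest power first, as in numpy.polyfit) and the
    coefficient of determination R^2 of the fit.
    """
    area = np.asarray(area, dtype=float)
    force = np.asarray(force, dtype=float)
    coeffs = np.polyfit(area, force, degree)
    predicted = np.polyval(coeffs, area)
    ss_res = float(np.sum((force - predicted) ** 2))
    ss_tot = float(np.sum((force - force.mean()) ** 2))
    return coeffs, 1.0 - ss_res / ss_tot


# Made-up, noiseless check values from force = 2 a^2 + 0.5 a + 0.1 (arbitrary units).
a = np.linspace(0.0, 3.0, 20)
coeffs, r2 = fit_force_vs_area(a, 2 * a**2 + 0.5 * a + 0.1)
print(np.round(coeffs, 6), round(r2, 6))
```

      With real measurements, comparing this fit against a linear one (degree=1) would indicate whether the added curvature is warranted.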

      (2.3) The ERISM and WARP methods are state-of-the-art, but aside from generally estimating force magnitudes, the detailed force maps are not used. The most important new information is the highly accurate and detailed maps of displacement itself, not their estimates of applied force using finite element calculations. In fact, comparing displacements to stress maps, they are pretty similar (e.g., Fig 4), suggesting that all experiments are performed in a largely linear regime. It should also be noted that the stress maps are assumed to be normal stresses (perpendicular to the plane), not the horizontal stresses that are the ones that actually balance forces in the plane of animal locomotion.

      We largely agree with the reviewer here. However, we have found that in many contexts audiences appreciate having the absolute magnitudes of the forces and stresses reported. Therefore, where possible, we have used stress maps rather than displacement maps. We also observe that while stress and displacement maps show similar patterns, features sometimes appear sharper in the stress map, because the finite element algorithm can attribute a broad indentation to a somewhat more localised downward force. We have thus opted to keep the original stress maps. Throughout the revised manuscript, we now state more explicitly that WARP and ERISM are best suited to recording vertically directed forces (lines 75, 78, 86, 162, 301, 305, 336).

      We have also modified our Discussion section to encourage further investigation of our proposed model using a technique better suited to measuring horizontal stresses (pages 12-13, lines 324-328):

      ‘However, WARP microscopy is best suited to measurements of forces in the vertical direction, and though we can make inferences such as this as they are a consequence of fundamental laws of physics, we present this conclusion as a testable prediction which could be confirmed using a force measurement technique more tuned to horizontally directed forces relative to the substrate.’

      (2.4) But none of this matters. The real achievements are the new locomotory dynamics uncovered with these amazing displacement measurements. I'm only asking the authors to be precise and down-to-earth about the nature of their measurements.

      We thank the reviewer for their perceptiveness in finding that though the forces are interesting, the interactions themselves are the most noteworthy result here. We trust that with the changes made in our revised manuscript, the description is now more “down-to-earth”, more concise where appropriate, and accurate as to which results are particularly important and novel.

      (2.5) It would be good to highlight the strength of the paper -- the discovery of new locomotion dynamics with high-resolution microscopy -- by describing it in simple qualitative language. One key discovery is the broad but shallow anchoring of the posterior body when the anterior body undertakes a "head sweep". Another discovery is the tripod indentation at the tail at the beginning of peristalsis cycles.

      We thank the reviewer for this recommendation. We agree that stating some of our findings more explicitly, especially the new posterior tripod structures and the whole-abdomen preparatory anchoring prior to head sweeps, would make the paper more impactful. Accordingly, we have added a statement of each new result to the Discussion section and amended our Abstract (lines 407-416):

      “Here we have provided new insights into the behaviour of Drosophila larval locomotion. We have provided new quantitative details regarding the GRFs produced by locomoting larvae with high spatiotemporal resolution. This mapping allowed the first detailed observations of how these animals mitigate friction at the substrate interface and thus provide new rules by which locomotion is achieved. Further, we have ascribed new locomotor function to appendages not previously implicated in locomotion in the form of tripod papillae, providing a new working hypothesis of how these animals initiate movement. These new principles underlying the locomotion outlined here may serve as useful biomechanical constraints as called for by the wider modelling community (39).”

      (2.6) As far as I know, these anchoring behaviors are new. It is intuitively obvious that anchoring has to occur, but this paper describes the detailed dynamics of anchoring for the first time. Anchoring behavior now has to be included in the motor sequence for Drosophila larva locomotion in any comprehensive biomechanical or neural model.

      We agree with the reviewer on this. We think it is best to let our colleagues reflect on our findings and then decide how best to include them in future models.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please be sure to describe in a figure caption or in the methods the details of the optical setup, especially the focal lengths of all the lenses, including the objective, and part numbers of the LEDs and filters. It would be helpful to have a figure in the main paper explaining the principles of ERISM/WARP microscopy along with the calibration measurements and computational pipeline (this would mainly combine elements already in the supplement). Such a figure should also include details of the setup that are alluded to in the methods but not fully explained (for instance, a "silicone well" is referred to in the methods but never described). The calibration of elastomer stiffness that now appears in the main text could be made a supplementary figure, unless there is some new art in the fabrication of the elastomers that should be highlighted as an advance in the main text.

      We appreciate the importance of explaining our methods to readers.

      In response to the public comments, we have added further details in our methods section to clarify practical aspects and ensure that readers will be able to reproduce our work.

      In Supplemental Figure 2, we show the full optical light path for ERISM and WARP along with named components. In addition, the principles of ERISM and WARP microscopy have already been extensively described in previous publications (See Refs 23-26). In light of this, we feel that the best approach in this paper is to direct readers to those publications.

      We feel that it is appropriate to present the calibration of elastomer stiffness in the main text because this is a genuine innovation: it is not just about making the elastomers but about making force sensors based on these different materials. This matters because it shows how researchers can tune the stiffness of an ERISM/WARP elastomer to match the type of tissue or organism under study. This is the key technical advance that enables whole-animal biomechanics across a range of animal sizes, so we think it belongs in the main text.

      We do not want to oversell this point, and we feel that the main text of our manuscript already makes sufficiently clear that elastomer-based force sensors of appropriate stiffness are important, when we state

      “First, we developed optical microcavities with mechanical stiffnesses in the range found in hydrogel substrates commonly used for studying Drosophila larval behaviour, i.e. Young’s modulus (E) of 10-30kPa (36–38).” (p. 5, ll. 124) and later

      “Here we used Drosophila larvae as a test case, but our methods now allow elastic optical resonators to be tuned to a wide range of animal sizes and thus create new possibilities for studying principles of neuro-biomechanics across an array of animals.” (p. 12, ll. 337)

      I would appreciate a description of the "why" behind some experimental choices, as understanding the motivation would be helpful for other researchers looking to adopt these techniques.

      We have now added text to the Introduction and Discussion that explains the rationale behind our experimental choices in more detail. Please see our response to Reviewer 1’s public comments on the same point.

      (1) The WARP and ERISM experiments were conducted on a collagen-coated gold surface rather than agarose. Why? E.g., does agarose not adhere to the gold, or would its thickness interfere with the measurement?

      The gold layer is applied above the elastomer, and the collagen on top of the gold layer makes the gold a more natural biological surface for the animals. Agarose is unsuitable as an elastomer because it would dry during the vacuum-based deposition of the gold. It is also unsuitable as a surface coating on top of the gold, as the coating on the gold needs to be very thin to preserve the spatial and mechanical resolution of our sensors. Further, processing of agarose generally requires temperatures of 60°C and higher, which we find can damage the elastomer / gold films.

      (2) The ERISM measurements are made on a cold anesthetized animal right as it starts to wake up (visible mouth-hooks movement), which presents some difficulty. Why not start imaging while the animal is still completely immobile? Or why not use a dead larva?

      This approach allowed us to get measurements of forces exerted by denticles that are physiologically and biomechanically accurate. In dead or fully anesthetized animals, one cannot be sure that the forces exerted by denticles and denticle bands are representative of the forces exerted by an animal with active hydrostatic control.

      (3) In the ERISM setup the monochromator is spatially filtered by focusing through pinhole, while in the WARP setup, the LEDs are not.

      Yes, that is correct. The LED light sources used in WARP have better spatial homogeneity than the tungsten filament used in ERISM, so a pinhole is not required in WARP.

      (4) SV4 shows the interference image of a turning larva (presumably from one illumination wavelength) rather than a reconstruction of the displacement or stresses. Why?

      We felt that in this particular case the interference images provided a clearer representation of the behavioural sequence, showing both the small indentations generated by individual denticles and the larger indentations of the animal overall.

      Lines 49-50 "a lack of methods with sufficient spatiotemporal resolution for measuring GRFs in freely behaving animals has limited progress." This needs a discussion of what sufficient spatial and temporal resolutions would be and how existing methods fall short of these goals.

      We have now rewritten the introduction to include an overview of other alternative approaches and of what we see as the requirements here. See our response to the public comments.

      Figure caption 1B (line 789) refers to "concave areas of naked cuticle (black line) which generally do not interact with the substrate" While I think this might be supported by later WARP images, it's not clear how the technique of figure 1 measures interaction, which could e.g. be mediated by surface tension of a transparent fluid.

      The technique used in Figure 1 provides qualitative information which, as the reviewer points out, is validated by the later WARP measurements.

      Lines 184-189 "However, unexpectedly, we observed an additional force on the substrate when protopodia leave the substrate (SI) and when they are replanted (ST). To investigate whether this force was due to an active behaviour or due to shifting body mass, we plotted integrated displacement (i.e. displaced volume) against the contact area for each protopodium, combining data from multiple forwards waves (Figure 5B). Area is correlated with displaced volume for most time points, indicating that volume is a consequence of mass in a 2nd order polynomial relationship." I couldn't follow this argument at all.

      We have now reworded this section and explained our rationale. Also see our response to a similar critique in Reviewer 2’s public comments.

      Generally the authors might reconsider their use of acronyms. e.g. (244-246) "SI latencies were much more strongly correlated with wave duration across most segments than ST latencies. SIs scale with SwP and this could be mediated by proprioceptor activity in the periphery" is made more difficult to parse by the abbreviations.

      As we need to refer to these terms multiple times throughout the manuscript, we feel the use of acronyms is appropriate here.

      The video captions are inadequate. Please expand on them to explain clearly what is shown, and also describe in the methods how the data were acquired and processed. For instance, it seems that in SV3 a motion correction algorithm is applied so that the larva appears stationary even as it crawls forward. I think "fourier filtered" means that the images were processed with a spatial high pass filter - this should be explained and the parameters noted.

      We have revisited the video captions provided in the supplementary information document and conclude that they contain the important information. The acquisition procedures are described in the Methods: for Videos 1 and 2, see the section “Denticle band kinematic imaging”, and for Videos 3 and 4, see the section on WARP. Supplementary Video 3 does not make use of motion correction; indeed, one can see the larva moving upwards/forwards in the field of view. We apologize for not explaining the Fourier filtering process for Video 3 and have now modified the video caption to read as follows:

      Video SV3. WARP imaging during forwards peristalses.

      Video showing high frame rate displacement maps produced by a freely behaving Drosophila larva. Displacement maps were Fourier filtered to make denticulated cuticle more readily visible and projected in 3D to show the effects of substrate interaction. Details of the Fourier filtering procedure were described elsewhere [Kronenberg et al, Nat Cell Biol 19, 864–872 (2017)].
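      The reviewer's reading of "Fourier filtered" as a spatial high-pass filter can be illustrated with the sketch below. It is an assumption-laden example, not the procedure of Kronenberg et al. (2017): it applies an ideal circular mask in the 2D Fourier domain, and the function name, the `cutoff_px` parameter and the synthetic map are all hypothetical.

```python
import numpy as np


def fourier_highpass(image, cutoff_px):
    """Zero all spatial-frequency components with a period longer than cutoff_px.

    Hypothetical helper: an ideal (hard-edged) circular high-pass mask in the
    2D Fourier domain. It removes the mean and broad, slowly varying
    indentation while keeping fine features such as denticle imprints.
    """
    ny, nx = image.shape
    fy = np.fft.fftfreq(ny)[:, None]  # spatial frequency, cycles per pixel
    fx = np.fft.fftfreq(nx)[None, :]
    radial_freq = np.sqrt(fx**2 + fy**2)
    keep = radial_freq >= 1.0 / cutoff_px  # pass only short-period components
    return np.real(np.fft.ifft2(np.fft.fft2(image) * keep))


# Hypothetical 64 x 64 displacement map (arbitrary units): a broad body-scale
# dip plus a fine 4-pixel-period pattern standing in for denticle imprints.
y, x = np.mgrid[0:64, 0:64]
broad = -2.0 + 1.5 * np.cos(2 * np.pi * x / 64)
fine = 0.1 * np.sin(2 * np.pi * x / 4)
filtered = fourier_highpass(broad + fine, cutoff_px=20)
print(np.allclose(filtered, fine, atol=1e-9))
```

      Components with a period longer than `cutoff_px` pixels (here the mean offset and the 64-pixel undulation) are removed, while the 4-pixel pattern passes unchanged. A practical filter would usually taper the mask edge to limit ringing artefacts.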

      What were the reflectances of the bottom (10 nm Au/Cr) and top (15nm Au) metal layers at the wavelengths used? I imagine the bottom layer should be less than 38%, the top layer higher, and the product of the square of the bottom transmission and the top reflectance coefficients equal to the bottom reflectance (to make the two paths of the interferometer contribute equal intensity), but none of this is stated.

      The reflectance of the gold mirrors was studied in detail in prior work on ERISM. See Kronenberg et al, Nat Cell Biol 19, 864–872 (2017). We therefore refrained from adding a complete optical characterization of the ERISM sensors again here. In brief, we found that a reflectance >13% at each Au mirror is required for reliable ERISM measurements.
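      The 38% figure in the reviewer's comment follows from the stated equal-intensity condition under idealised assumptions: a lossless bottom mirror (so its transmission is T_b = 1 − R_b) and a perfectly reflecting top mirror (R_t = 1). Real Au/Cr films absorb, so this is only a sketch of an upper bound, not a characterisation of the sensors used here.

```latex
\begin{aligned}
R_b &= T_b^{2}\,R_t, \qquad T_b = 1 - R_b
  \;\Longrightarrow\; R_b = (1 - R_b)^{2}\,R_t ,\\
R_t = 1:\quad R_b^{2} - 3R_b + 1 &= 0
  \;\Longrightarrow\; R_b = \frac{3 - \sqrt{5}}{2} \approx 0.382 .
\end{aligned}
```

      Because the right-hand side shrinks as R_t decreases, any top-mirror reflectance below 1 yields a smaller balanced R_b, which is why the bottom reflectance should stay below about 38% under these assumptions.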

      The description of the gold-coated elastomer as a microcavity is confusing to me. Does the light really make multiple round trips between the plates before returning to the detector? The loss of light on each round trip would depend on the reflectance and parallelism of the top and bottom mirrors. From the WARP calculation, it appears that there is only one round trip: a π/2 phase shift results from the calculation for one round trip, 2π × 2nL × 5 nm / (630 nm)², with n = 1.4 and L = 8 microns; if there were two round trips, the phase shift would be π, etc. Would this better be described as a mostly common-path interferometer?

      The physics of our devices is best described within the framework of thin film interference and (weak) microcavity optics. Indeed, light can make multiple roundtrips, though it gets attenuated with each reflection. The complete calculation of the multiple roundtrips is only required to obtain quantitative information on the amount of light that is reflected. The spectral position of minima in reflectance can also be obtained from assuming one roundtrip which is what is done in the description of the WARP calculations.
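      The single-round-trip estimate in the reviewer's comment can be checked numerically. This sketch assumes that the 5 nm in that expression is the wavelength separation Δλ between the two illumination wavelengths (the comment does not say so explicitly) and uses the first-order relation Δφ ≈ 2π · 2nL · Δλ / λ²; the function name is hypothetical.

```python
import math


def round_trip_phase_shift(n, thickness_nm, delta_lambda_nm, wavelength_nm, round_trips=1):
    """First-order phase difference between two wavelengths delta_lambda apart.

    One round trip accumulates phase 2*pi*(2*n*L)/lambda; differentiating with
    respect to lambda gives |d(phase)| = 2*pi*2*n*L*d(lambda)/lambda**2 per trip.
    """
    optical_path_nm = 2 * n * thickness_nm * round_trips
    return 2 * math.pi * optical_path_nm * delta_lambda_nm / wavelength_nm**2


one = round_trip_phase_shift(1.4, 8000, 5, 630)
two = round_trip_phase_shift(1.4, 8000, 5, 630, round_trips=2)
print(f"one round trip:  {one:.2f} rad ({one / math.pi:.2f} pi)")
print(f"two round trips: {two:.2f} rad ({two / math.pi:.2f} pi)")
```

      With n = 1.4, L = 8 µm and λ = 630 nm, one round trip gives about 1.77 rad (roughly 0.56π, close to the π/2 quoted), and two round trips double this, which matches the spectral-minimum reasoning in the response rather than a full multiple-reflection calculation.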

      Figure 2 e,f: the line fits appear to be dominated by the data points at 2 s. If these are removed, do the fits change? To support the argument that 2e shows a correlation and 2f does not, some kind of statistical test, ideally a hierarchical bootstrap, should be conducted to compare between the two measurements.

      If we remove the data points at 2 s, the R^2 values for swing initiation latencies change as follows: A2: 0.35 to 0.005; A4: 0.78 to 0.31; A6: 0.61 to 0.01. The data in Figure 2e,f are averages of 3 waves per animal, so the data points at 2 s are not simply the result of single ‘rogue’ waves but rather averages of several trials. Further, if all individual waves are plotted, the overall trends remain visible.

      We do not think it is appropriate to remove the data at 2 s from our analysis, but we take the point about formal statements regarding the presence or absence of a correlation. We have therefore reworded the description of Figure 2e,f to state simply that wave duration can ‘largely determine’ latencies in some instances but less so in others, as suggested by the R^2 (coefficient of determination) values. We have also adjusted the corresponding wording in the Discussion.
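      The hierarchical bootstrap the reviewer suggests for Figure 2e,f (and Figure 5e) could be sketched as follows. This is a minimal illustration, not the analysis used in the manuscript: it assumes each animal contributes a list of waves, each with a wave duration, an SI latency and an ST latency, and it compares R^2 for SI against R^2 for ST within each resample. All function names and the numbers in `animals` are hypothetical placeholders, not data from the study.

```python
import random


def r_squared(xs, ys):
    """R^2 of an ordinary least-squares line (the squared Pearson correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample: no variation to explain
    return sxy * sxy / (sxx * syy)


def hierarchical_bootstrap_r2_diff(animals, n_boot=2000, seed=1):
    """Sorted bootstrap distribution of R^2(SI) - R^2(ST) against wave duration.

    animals: list of per-animal lists of (wave_duration, si_latency, st_latency).
    Level 1 resamples animals with replacement; level 2 resamples waves within
    each drawn animal, so waves from one animal are not treated as independent.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        dur, si, st = [], [], []
        for animal in rng.choices(animals, k=len(animals)):
            for d, a, b in rng.choices(animal, k=len(animal)):
                dur.append(d)
                si.append(a)
                st.append(b)
        diffs.append(r_squared(dur, si) - r_squared(dur, st))
    diffs.sort()
    return diffs


def percentile_interval(sorted_vals, level=0.95):
    """Simple percentile interval from an already sorted bootstrap sample."""
    tail = (1 - level) / 2
    last = len(sorted_vals) - 1
    return sorted_vals[round(tail * last)], sorted_vals[round((1 - tail) * last)]


# Hypothetical placeholder data: three animals, three waves each (durations in s).
animals = [
    [(0.9, 0.20, 0.15), (1.1, 0.25, 0.05), (1.3, 0.30, 0.12)],
    [(1.0, 0.22, 0.09), (1.4, 0.33, 0.14), (1.6, 0.37, 0.06)],
    [(1.2, 0.27, 0.11), (1.8, 0.41, 0.08), (2.0, 0.46, 0.13)],
]
diffs = hierarchical_bootstrap_r2_diff(animals)
lo, hi = percentile_interval(diffs)
print(f"95% interval for R^2(SI) - R^2(ST): [{lo:.2f}, {hi:.2f}]")
```

      If the interval for the difference excluded zero, that would support the claim that wave duration explains SI latencies better than ST latencies; resampling animals before waves is what keeps the averaged-per-animal structure of the data from inflating confidence.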

      Figure 4 - please provide in the main figure or as a supplement the full images (i.e. not cropped to the assumed shape of the larva)

      We do not feel that it is necessary or helpful to provide the full images given that the focus of the analysis is on dynamics of protopodia movements.

      Figure 5e top: single data points around wave duration 0.6s appear to dominate fit lines. Does removing these points alter the fits? To support the argument that 5e top shows a correlation and 5e bottom does not, some kind of statistical test, ideally a hierarchical bootstrap, should be conducted to compare between the two measurements.

      In Figure 5e, we show all waves analysed across animals. If we remove the data points at 0.6 s, the A2 R^2 changes from 0.24 to 0.05, the A4 R^2 from 0.48 to 0.11, and the A6 R^2 from 0.69 to 0.34; however, we do not feel it is appropriate to remove these data from our analysis. We take the point about needing to be cautious in claims about correlation versus no correlation and have now reworded the description of these results along the same lines as for Figure 4.

      It appears from the methods (467-489) that animals were kept wet for warp imaging but not for ERISM imaging. Please confirm or explain further the presence or absence of a water layer in these two sets of measurements, as this could affect the adhesion forces.

      In each case, the animals were transferred onto experimental substrates with a moistened paintbrush. We have added text explicitly stating this in the methods section.

      Kim et al. Nature Methods 2017 (10.1038/nmeth.4429) describes recording two images separated by less than 60 microseconds using a scientific CMOS camera with a frame rate of 200 Hz. This is accomplished by triggering a pulsed LED once at the end of one frame's capture window and then a second time at the beginning of the next frame's window (see Supplementary Figure 10). I'm not sure if this trick is widely known, but it's worth considering if the authors are running into a problem with movement between the two wavelength exposures in their WARP setup.

      Thank you for this tip. We will take this under consideration for future work.

      Is the setup compatible with optogenetics? (E.g., is the red light dim enough that it would not activate CsChrimson, or could a longer-wavelength LED be used for interferometry?) If so, optogenetic (or thermogenetic) activation of the moonwalker descending neuron (MDN) could be used to study backward crawling, e.g. to contrast the sites and order of "anchoring" between the two directions of crawling.

      The set-up is potentially compatible with optogenetics. We are in the process of exploring this in current ongoing work.

      Reviewer #2 (Recommendations For The Authors):

      Simplify/reduce the commentary about force measurements, and highlight the clear, qualitative descriptions of the novel locomotion patterns that they have observed. The microscopy and movements seem to matter more than the ground force estimations.

      We have addressed these issues in our responses to Reviewer 2’s public comments.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank the reviewers for their valuable feedback, which has improved this work greatly from its original form, and are elated to have such glowing reviews of the revised work published alongside the revised preprint. Reviewer 3 raises some final salient points, which we briefly address here.

      Teeth: We thank the reviewer for clarifying their points. We do make the assumption that the ecological parameter space of toothed and beaked organisms will be comparable. Both are governed by the same set of physical principles and have the jaw bone as the most likely point of failure (teeth are harder than bone, and keratinous rhamphothecae are malleable and can be regrown with relative ease when deformed). Differences in stress/strain distribution between toothed and beaked organisms will occur but are already accounted for in our methods as we model both the teeth and rhamphotheca and will observe these different effects. We have added an explicit statement of this hypothesis to the Methods section of the manuscript.

      Cranial kinesis: In our opinion, it is a safe assumption that the lower jaws of extant birds and enantiornithines are comparable. We do not see why the acquisition of kinesis in the upper jaw would generally affect the functional role of or constraints on the lower jaw. One possibility we discussed is that a quickly-moving kinetic premaxilla could let the lower jaw move a shorter distance during effective prey capture and lower the selection for speed (i.e. allow jaw-closing MA to remain higher). While we have added this possibility to our call for the investigation of cranial kinesis, we consider it too speculative to begin altering interpretations of fossil taxa. All raw measurement data remains available so that, if evidence is found for cranial kinesis having predictable effects on our measured parameters, future researchers can re-analyse our data and update any ecological predictions accordingly.

      Organization: To our knowledge, the eLife format incorporates what one would think of as a Conclusions section into the Discussion. Our Discussion section currently contains 18 subheadings, which should guide a reader to any specific topic of interest. The Discussion also progresses from a narrower to a broader focus, which we and several colleagues find intuitive.

      We thank all three reviewers once again for their feedback that has improved this work and their kind words throughout the process.


      The following is the authors’ response to the original reviews.

      We thank all three reviewers for their detailed reviews, and generally agree with their feedback. To accompany the reviewed preprint of this manuscript, we wished to respond to comments from the reviewers so that they (and the public) will know what we are planning to incorporate in the revised manuscript we are currently preparing. If there are any comments on our plans in the meantime, please let us know.

      • Reviewer 1, on concerns regarding identification of ontogenetic stage and comparison of taxa from different ontogenetic stages: It is fair to say that enantiornithine ontogeny is still poorly understood, though we believe all current evidence indicates that each specimen used in this study is adequately mature for comparison to the extant birds used in the study. Stages of skeletal fusion are the standard method of assessing enantiornithine ontogeny (Hu and O'Connor 2017), and our comparison of histological work (Atterholt, Poust et al. 2021) to skeletal stages in Table S4 suggests a transition from juvenile to subadult in stage 0 or 1 and from subadult to adult within stage 3. Thus, the specimens we quantitatively examine in this study, all at stages 2 or 3 (Figure S10), are advanced subadults or adults. It is well known that many living animals considered “adults” would be considered subadults or even juveniles by a palaeontologist (Hone, Farke et al. 2016). So, even if some individuals in this study are not fully skeletally mature, they should have obtained the morphology which they would possess for most of their lives and thus the morphology which undergoes selective pressure. We will add this context to the “Bohaiornithid Ontogeny” section and thank the reviewer for seeking more detail on this point.

      • Reviewer 2, on need of a context figure: We have an artistic life reconstruction of a bohaiornithid in preparation, and can include that in the revised manuscript as a figure.

      • Reviewer 2, on raptor claw categories: We explain these categories in depth in a previous work (Miller, Pittman et al. 2023). However, we will now add a short summary of that explanation to this work so that this manuscript will become self-contained in this regard. In short, the “large raptor” category includes extant birds with records of regularly taking prey which cannot be encircled with the pes, while birds in the “small raptor” category have no such records. As Reviewer 2 points out, this does often follow phylogenetic lines, but not always. E.g. most owls specialise in taking small prey, but the great horned owl Bubo virginianus regularly takes mammals and birds larger than its pes (Artuso, Houston et al. 2020); and conversely we can only find reports of the common black hawk Buteogallus anthracinus taking prey small enough for the pes to encircle (Schnell 2020), despite other accipiters frequently taking large prey. In both cases these taxa plot in PCA nearer to other large or small raptors (respectively) than to their phylogenetic relatives.

      • Reviewer 3, on teeth vs beaks: We are not aware of any foods which are exclusive to toothed or beaked animals. Some aspects of extant bird biology may affect how a given diet must be accommodated, and we do comment on these, e.g. the discussion of alternatives to the crop and ventriculus for processing plant matter in the Bohaiornithid Ecology and Evolution section. For functional studies, e.g. FEA, we have included the rhamphotheca in toothless models; like teeth, it serves as the feeding surface. In theory, it should not matter whether the feeding surface is hard or soft, as mechanical failure occurs in high stress/strain states regardless of the medium. If having teeth necessarily increased or decreased overall stress/strain relative to a beak (and our work suggests it does not), this would in turn necessarily limit dietary options. So, all models in our work should be directly comparable.

      As an additional note on this topic, we address tooth shape in bohaiornithids at the end of the Bohaiornithid Ecology and Evolution section. In the current version we specifically note that their tooth shape is likely controlled by phylogeny, though we will add a note in the upcoming version that the morphospace of bohaiornithid teeth overlaps that of many other clades with purportedly diverse diets, which is consistent with a hypothesis of diverse diets within the clade.

      • Reviewer 3, on cranial kinesis: Our FE models should be unaffected by cranial kinesis, as these are two-dimensional and model the akinetic lower jaw only. Some mediolateral kinesis may be relevant in the mandible in the form of “wishboning” in different taxa, but its prevalence in extant birds is currently unknown. The preservation of enantiornithines (two-dimensionally and typically in lateral view) limits the ability to capture any mediolateral function regardless.

      Our models of mechanical advantage do not account for any cranial kinesis. This is a necessary simplification. The nature of cranial kinesis in extant birds, and the role that it plays in feeding, is poorly understood. Cranial kinesis will increase gape, but we do not yet know whether or how it affects jaw-closing force and speed (moreover, given the variation in quadrate and hinge morphology present in extant birds, this is also something that is likely to be highly diverse). We have therefore modelled the extant birds’ jaw-closing systems as having a single, akinetic out-lever (the jaw joint to the bite point), to match the situation in our fossil taxa. This is a common simplification that has been used previously with success (Corbin, Lowenberger et al. 2015, Olsen 2017). However, we acknowledge that this simplification may introduce some error. Unfortunately, until the mechanics of cranial kinesis – and the variation in the anatomy and performance of kinetic structures in extant birds – are better understood, we cannot determine exactly what that error looks like. We therefore have greater confidence in the inter-species comparability of this conservative, akinetic approach (in other words, we may not be making assumptions that are 100% accurate, but we are at least making the same assumption across all taxa, so it should be comparable in its error). We will add a section in the Mechanical Advantage and Functional Indices discussion calling for further research into the mechanics of cranial kinesis so that future mechanical advantage work in birds can take this matter into account.
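      The single, akinetic out-lever simplification described above can be sketched as a static lever. This is an illustration under stated assumptions, not the authors' measurement protocol: the in-lever is taken as the distance from the jaw joint to the jaw-closing muscle insertion (the response defines only the out-lever), muscle force is treated as a single input acting along the lever's lever arm, and all function names and values are hypothetical.

```python
def mechanical_advantage(in_lever, out_lever):
    """Jaw-closing MA: in-lever (joint to muscle insertion) / out-lever (joint to bite point)."""
    return in_lever / out_lever


def bite_force(muscle_force, ma):
    """Static lever balance: force delivered at the bite point."""
    return muscle_force * ma


def bite_point_speed(muscle_speed, ma):
    """Kinematic trade-off: a lower MA gives faster closing at the bite point."""
    return muscle_speed / ma


# Hypothetical lengths (mm) and inputs, for illustration only.
ma = mechanical_advantage(12.0, 48.0)
print(ma, bite_force(100.0, ma), bite_point_speed(10.0, ma))  # 0.25 25.0 40.0
```

      The force-speed trade-off in the last two functions is why selection for fast jaw closure would be expected to push MA down, and why a kinetic upper jaw that shortens the required lower-jaw excursion could, as speculated in the response to the current reviews, let MA remain higher.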

      • Reviewer 3, on skull reconstruction: This issue is partly addressed in the Bohaiornithid Skull Reconstruction section, though we agree that mentioning it more often in the MA and FEA Discussion sections and the Bohaiornithid Ecology and Evolution section will benefit the manuscript. Most notably, Shenqiornis and Sulcavis have similar ecological interpretations, but much of the Shenqiornis skull reconstruction uses Sulcavis bones. Longusunguis is the only other taxon which takes more than two bones from a different taxon, and in this case only the quadrate is used in any quantitative measurements. We have ensured that the skull reconstructions presented in Figure 2 show which portions of the skull come from which specimen, so that as new material is discovered and phylogenetic relationships are updated, it will be clear to future readers which parts of the reconstructions will need to be updated.

      • Reviewer 3, on data availability: All data, including FEA models and raw measurement data, are included in the same repository as the scripts, which we will make clear in the manuscript. Thank you for noting that the data link was dead; we will publish it now.

      As a final note, another colleague brought to our attention that the original manuscript’s ancestral state reconstruction lacked an outgroup. An updated reconstruction using Sapeornis as an outgroup will be included in the revised manuscript. The addition of the outgroup does not change any of the manuscript’s conclusions.

      We once again thank our reviewers for their valuable feedback and will submit a revised version of this manuscript for publication shortly. Please let us know if you have any additional comments after reading our response that we can take on board in our revision.

      References

      Artuso, C., C. S. Houston, D. G. Smith and C. Rohner (2020). Great Horned Owl (Bubo virginianus), version 1.0. Birds of the World. A. F. Poole. Ithaca, NY, USA, Cornell Lab of Ornithology.

      Atterholt, J., A. W. Poust, G. M. Erickson and J. K. O'Connor (2021). "Intraskeletal osteohistovariability reveals complex growth strategies in a Late Cretaceous enantiornithine." Frontiers in Earth Science 9: 640220.

      Corbin, C. E., L. K. Lowenberger and B. L. Gray (2015). "Linkage and trade‐off in trophic morphology and behavioural performance of birds." Functional Ecology 29(6): 808-815.

      Hone, D. W. E., A. A. Farke and M. J. Wedel (2016). "Ontogeny and the fossil record: what, if anything, is an adult dinosaur?" Biology Letters 12(2): 20150947.

      Hu, H. and J. K. O'Connor (2017). "First species of Enantiornithes from Sihedang elucidates skeletal development in Early Cretaceous enantiornithines." Journal of Systematic Palaeontology 15(11): 909-926.

      Miller, C. V., M. Pittman, X. Wang, X. Zheng and J. A. Bright (2023). "Quantitative investigation of Mesozoic toothed birds (Pengornithidae) diet reveals earliest evidence of macrocarnivory in birds." iScience 26(3): 106211.

      Olsen, A. M. (2017). "Feeding ecology is the primary driver of beak shape diversification in waterfowl." Functional Ecology 31(10): 1985-1995.

      Schnell, J. H. (2020). Common Black Hawk (Buteogallus anthracinus), version 1.0. Birds of the World. A. F. Poole and F. B. Gill. Ithaca, NY, USA, Cornell Lab of Ornithology.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The current work by Kulich et al. examines the dynamic relocalization of NGR1 (LAZY2), a member of the LAZY protein family that is key for auxin redistribution during gravitropic responses. After gravistimulation of the triple mutant ngr123 (lazy234), the PIN3-activating kinase D6PK is not polarized in the columella cells.

      Strengths:

      The authors show a thorough characterization of NGR1 relocalization dynamics after gravistimulation.

      Weaknesses:

      Genetically, the relocalization of D6PK depends on the LAZY protein family, but some essential details are missing in this study. On the one hand, NGR1-GFP does not associate with the BFA compartments and maintains its association with the PM and amyloplasts. On the other hand, D6PK relies on GNOM-dependent, BFA-sensitive vesicle trafficking, suggesting that D6PK follows a different relocalization route from NGR1, which is BFA-insensitive. Based on these observations, D6PK relocalization requires the LAZY proteins, but D6PK and NGR1 relocalize through independent routes. How can this be interpreted or reconciled?

      Response: Since we demonstrated that D6PK does not relocalize in the absence of NGR proteins, we conclude that NGR1 acts upstream of D6PK. The molecular mechanism driving this interaction is not fully understood; however, it is evident that NGR1 triggers the mobilization of D6PK. Despite previous investigations into D6PK mobility, the underlying mechanisms remain elusive. Notably, despite its sensitivity to BFA, D6PK does not localize to BFA bodies and does not undergo conventional endocytosis (https://doi.org/10.1016/j.devcel.2014.05.006). We fully acknowledge the importance and interest in gaining a better understanding of these processes, and it will be a focal point of our future research.

      Two other works (now published) provide valuable and fundamental findings related to the mechanism examined in the current manuscript and report results that are similar and complementary to those shown here. Given the similarities in the examined mechanisms, these preprints should be referenced, recognized, and discussed in the manuscript under review. It is assumed that the three projects were developed independently, but the results of these previous works should be addressed and taken into account, at least in the discussion and when drawing conclusions. This does not make this work less relevant. On the contrary, some of the observations that seem redundant are now more solid, and firm conclusions can be drawn from them.

      Response: We have cited and discussed these works in the revised Discussion.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript addresses which rapid molecular events underlie the earliest responses after gravity sensing via the sedimentation of starch-enriched amyloplasts in columella cells of the plant root cap. The LAZY or NEGATIVE GRAVITROPIC RESPONSE OF ROOTS (NGR) protein family is involved in this process and localizes to both the amyloplast and the plasma membrane (PM) of columella cells.

      The current manuscript complements and extends Nishimura et al., Science, 2023. Kulich and colleagues describe the role of the LZY2 protein, also called NGR1, during this process, imaging its fast relocation and addressing additional novel points, such as the molecular mechanisms underlying NGR1 plasma membrane association. They also reveal that NGR1/LZY2, LZY3, and LZY4 are required for the polar localization of the AGCVIII D6 protein kinase at the PM of columella cells, with NGR1/LZY2 acting redundantly with LZY3 and LZY4.

      The authors initially monitored relocalization of functional NGR1-GFP in columella cells of the ngr1 ngr2 ngr3 triple mutant after 180-degree reorientation of the roots. Within 10-15 min of reorientation, the NGR1-GFP signal disappeared from the upper PM and reappeared at the lower PM of the reoriented cells, in close proximity to the sedimented amyloplasts. Reorientation of NGR1-GFP occurred substantially faster than PIN3-GFP reorientation, at about the same time as or slightly later than a rise in the signal of a calcium sensor (GCaMP3), and just preceding changes in the DII-VENUS auxin sensor. Reorientation of NGR1-GFP proved to be fast and independent of brefeldin A-sensitive, ARF GEF-mediated vesicle trafficking, unlike the trafficking of PIN proteins, such as PIN3, or of the AGCVIII D6 protein kinase. Strikingly, the PM association of NGR1-GFP was highly sensitive to pharmacological interference with sterol composition or concentration and to phosphatidylinositol 4-kinase inhibition, as well as to dithiothreitol (DTT) treatment, which interferes with thioester bond formation, e.g. during S-acylation. Indeed, combined mutation of a palmitoylation site and polybasic regions of NGR1 abolished its PM but not its amyloplast localization and rendered the protein non-functional during the gravitropic response, suggesting that NGR1 PM localization is essential for the gravitropic response. Targeting the protein to the PM via an artificially introduced N-terminal myristoylation site and an ROP2-derived polybasic region and geranylgeranylation site partially restored its functionality in the gravitropic response.

      Strengths:

      This timely work should be of broad interest to plant, cell, and developmental biologists, as gravity sensing and signaling may well be of general interest. The finding that NGR1 responds rapidly to gravistimulation, polarizes at the PM in the vicinity of amyloplasts, and is required for repolarization of the D6 protein kinase prior to PIN relocation is really compelling. The manuscript is generally well written and accessible to a general readership. The figures are clear and of high quality, and the methods are sufficiently explained for reproduction of the experiments.

      Weaknesses:

      Statistical analysis has been performed for some figures but is lacking for most of the quantitative analyses in the figure legends.

      Response: We added this information to the figure legends.

      The title claims a bit more than what is actually shown in the manuscript: While auxin response reporter alterations are monitored, "rapid redirection of auxin fluxes" are not really directly addressed and, while D6PK can activate PIN proteins in other contexts, it is not explicitly shown in the manuscript that PIN3 is a target in the context of columella cells in vivo. A title such as "Rapid redirection of D6 protein kinase during Arabidopsis root gravitropism relies on plasma membrane translocation of NGR proteins" would reflect the results better.

      Response: We modified the title to "Rapid translocation of NGR proteins driving polarization of PIN-activating D6 protein kinase during root gravitropism".

      Fig. 4: The point that D6PK is transcytosed cannot be made here based on the data of these authors. They should have used a photoswitchable version of NGR1 to show that the same molecules observed at the upper PM are translocated to the lower PM. Nishimura and colleagues actually did that for NGR4. However, this is a lot of work, and for NGR1 such a fusion may have too low a fluorescence intensity (as was the case for NGR3). So, I think a rewording such as "NGR-dependent reorientation of D6PK plasma membrane localization" would be sufficient, as this does not say where the protein at the lower PM comes from. Theoretically, the signal could also be amyloplast-derived or newly synthesized (or just folded) NGR1-GFP.

      Response: We fully agree and have rephrased the text using "translocation" instead of "transcytosis".

      The authors propose a model in which the AGCVIII kinase D6PK, acting in an NGR-dependent manner, activates PIN3 to drive auxin fluxes. However, alterations in auxin responses are observed prior to PIN3 reorientation. They should explain this discrepancy better and clearly state that this is a working hypothesis for the future rather than something already explicitly proven.

      Reviewer #3 (Public Review):

      The mechanism controlling plant gravity sensing has fascinated researchers for centuries. It has been clear for at least the past decade that starch-filled plastids (termed statoliths) in specialised gravity-sensing columella cells sense changes in root orientation, triggering an asymmetric auxin gradient that alters root growth direction. Nevertheless, exactly how statolith movement triggers PIN auxin efflux carrier activation and auxin gradient formation remained unclear until very recently. A series of new papers (in Science and Cell) and this manuscript report how LAZY proteins (also referred to as NEGATIVE GRAVITROPIC RESPONSE OF ROOTS; NGR) play a pivotal role in regulating root gravitropism. In terms of their overall significance, their collective findings provide seminal insights into the very earliest steps of how plant roots sense gravity, and they are arguably the most important papers about root gravitropism in the past decade.

      In the current manuscript, Kulich et al. initially report (through creating a functional NGR1-GFP reporter) that "NGR1-GFP displayed a highly specific columella expression, which was most prominent at the PM and the statolith periphery." Is NGR1-GFP expressed in shoot tissues? If yes, is it in the starch sheath (the gravity-sensing equivalent of root columella cells)? The authors also note "NGR1-GFP signal from the PM was not evenly distributed, but rather polarized to the lower side of the columella cells in the vicinity of the sedimented statoliths (Fig. 1A)." and (when overexpressing NGR1-GFP) "chloroplasts in the vicinity of the PM strongly correlated with NGR1 accumulating at the PM nearby, similar to the scenario in columella", suggesting that NGR1 does not require additional tissue-specific factors (i.e. trafficking proteins or lipids) to assist in its intracellular movement from plastid to PM.

      Response: Yes, NGR1, also called LAZY2, is expressed in the inner hypocotyl tissues, according to https://doi.org/10.1104/pp.17.00942. Unfortunately, we saw very little signal with our NGR1-GFP construct, possibly because the NGR1-GFP signal is weak and/or because NGR1 is expressed exclusively in the inner tissues.

      Next, the authors study the spatiotemporal dynamics of NGR1-GFP relocalisation relative to other early gravitropic signals and/or components: calcium, auxin, and PIN3. The temporal data presented in Figure 1 illustrate how the GCaMP calcium reporter (in panel E) revealed that "the first signaling event in the root gravitropic bending is the statolith removal from the top membrane, rather than its arrival at the bottom". It appeared that the auxin DII-VENUS reporter was also changing rapidly (panel G). Was this detectable BEFORE statolith re-sedimentation?

      Response: In our data (Figure 1G), we observe that the increase in signal at the top side begins prior to starch sedimentation, in contrast to the bottom side, where the decrease starts only after starch grains land on the bottom membrane. While this observation aligns with our hypothesis and other data, we refrained from commenting on it because the differences between the first 2-3 time points are small and obscured by noise. This arises because the DII response relies on protein degradation and is relatively slow. Hence, for rapid tracking of the auxin response, we used auxin-induced calcium as a proxy, with NPA treatment serving as a negative control.

      Please can the authors explain their NPA result in Fig 1E? Why would treatment with the auxin transport inhibitor NPA block Ca signalling (unless the latter was dependent on the former)?

      Response: Auxin induces rapid calcium transients (e.g., http://dx.doi.org/10.1016/j.cub.2015.10.025). Consequently, when auxin reaches the bottom elongation zone approximately 5-6 minutes after rotation, we observe an increased GCaMP signal at this location. Notably, when we inhibit PIN function using NPA, the GCaMP signal persists, but the difference between the top and bottom sides diminishes. This validates that the calcium transients at the bottom side can be interpreted as reporting an increase in auxin accumulation resulting from auxin transport.

      They go on to note "This initial auxin asymmetry is mediated by PIN-dependent auxin transport, even though visible polarization of PIN3 can be detected only later", which suggests that PIN activity was being modified prior to PIN polarisation.

      In contrast to other proteins involved in gravity response like RLDs and PINs, NGR1 localization and gravity-induced polarization does not undergo BFA-sensitive endocytic recycling by ARF-GEF GNOM. This makes sense given NGR1 is initially targeted to plastids, THEN the PM. Does NGR1 contain a cleavable plastid targeting signal? The authors go on to elegantly demonstrate that NGR1 PM targeting relies on palmitoylation through imaging and mutagenesis-based transgenic ngr rescue assays.

      Response: Yes, NGR1 contains a weakly conserved plastid targeting signal. Although we also started researching in this direction, we quickly realized that two other groups had already shown very comprehensive data regarding NGR plastid localization.

      Finally, the authors demonstrate that gravitropic-induced auxin gradient formation is initially dependent on PIN3 auxin efflux activation (prior to PIN3 re-localisation). This early PIN3 activation process is dependent on NGR1 re-targeting D6PK (a PIN3 activating kinase). This elegant molecular mechanism integrates all the regulatory components described in the paper into a comprehensive root gravity sensing model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Line 83: This construct fully rescued the agravitropic bending phenotype of the ngr1/2/3 triple mutant (see further).

      What does "see further" mean in this context?

      Response: It is a reference to the second part of the manuscript (Fig. 3, Supplementary Figs. S3 and S4), where we extensively address complementation with wild-type and point-mutated versions of NGR. There we show that the construct we are using is functional. This does not prove, but strongly implies, that the GFP signal we obtain is relevant. We updated the text to point this out.

      Line 101: Timing of events during the gravitropic response

      The description of the equipment employed and the rotation applied to the samples reads: "the vertical stage microscope and minimized the time required for rotating the sample. 180° rotation..."

      The authors mention a travel time of 5 minutes first and later of 15 minutes for the relocalization of NGR1. Are these two different experiments? Were two different rotation angles applied? Could the authors please rephrase this part of the description to answer these questions and help the reader understand how the assay was performed?

      Response: We added this explanation to the text.

      Figure 1 E, F, and G.

      Could the authors please provide pictures and/or videos for the PIN3 localization dynamics, intracellular calcium transients, and auxin reporter DII-Venus? In other words, show the complementing images for Figure 1E, 1F, and 1G as the authors did for Figure 2D where authors presented the pictures and the corresponding quantification plots.

      Response: We wanted to avoid overcrowding the figure, but we would also love to show the videos. Therefore, we prepared an additional Supplementary Movie 3, which contains all of these additional observations.

      Line 194: This implies the existence of posttranslational modifications such as S-acylation to associate with PM.

      Why is this specific modification suggested and examined, and not another one? What are the criteria for selecting this kind of modification? Based on what premises? Could the authors elaborate on that and include references?

      Response: Thank you for this comment. We first checked prediction tools, which indicated a very strongly conserved S-acylation site. We have now clarified this in the text and added other modifications as examples. Later on, we rule out myristoylation (which occurs on N-terminal glycines) and prenylation (which occurs only at the C-terminal CAAX box).

      Line 255: NGR1 PM localization is synergistically mediated by polybasic regions and a palmitoylation site

      Similarly to the previous comment, how and why are these regions examined and analyzed? Likewise, why is this palmitoylation site selected? Please provide some background, criteria, and references.

      Response: Here, we clearly state that the prediction of the palmitoylation site is made based on the GPS lipid prediction tool.

      As for the polybasic regions, these can be seen upon manual inspection of the primary protein sequence. We simply looked at the protein and saw them there. We rephrased the text to make this clearer.

      Reviewer #2 (Recommendations For The Authors):

      Please, proofread the manuscript for style and minor language errors.

      Statistical analysis has been performed for some figures but is lacking for most of the quantitative analyses in the figure legends. Where it has been performed, the number (n) of roots, cells, or plasma membranes analyzed is not given, and no information is provided on whether the data derive from a representative experiment or are pooled from several experiments. This certainly requires revision in Fig. 1D-G, Fig. 2B-D, Fig. S2B,E, Fig. 3B,D,F-H, Fig. S3B,D, Fig. S4E-H, and Fig. 4D.

      Response: Thank you, we added this information to the figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This fascinating paper by M. Alfatah et al. describes work to uncover novel genes affecting lifespan in the budding yeast S. cerevisiae, eventually identifying and further characterizing a gene, YBR238C, now named AAG1 by the authors. The authors began by considering published gene sets pulled from the Saccharomyces Genome Database that described increases or decreases in either chronological lifespan or replicative lifespan in yeast. They also considered gene sets known to be downregulated upon treatment with the lifespan-extending TOR inhibitor rapamycin.

      YBR238C was unique in being largely uncharacterized, downregulated upon rapamycin treatment, and linked to both increased replicative lifespan and increased chronological lifespan upon deletion.

      The authors show that YBR238C may act to negatively regulate mitochondrial function, in ways that are both dependent on and independent of the stress-responsive transcription factor Hap4, largely by looking at relative expression levels of relevant mitochondrial genes.

      In a well-documented but hard-to-interpret series of experiments, the authors note that the two paralogues YBR238C and RMD9 (which have ~66% similarity) (a) have opposite effects when acting alone, and (b) appear to interact, in that some phenotypes of ybr238c are dependent on RMD9.

      A particularly interesting finding, in light of the current literature and of the authors' strategy in identifying YBR238C, is that changes in electron transport chain genes upon rapamycin treatment appear to be mediated via YBR238C.

      Based on a series of experiments the authors move to conclude the existence of "a feedback loop between TORC1 and mitochondria (the TORC1-Mitochondria-TORC1 (TOMITO) signaling process) that regulates cellular aging processes."

      Strengths

      Overall, this study describes a great deal of new data from a large number of experiments that shed light on the potential specific roles of YBR238C and its paralog RMD9 in aging in yeast, and also underscore the potential of an approach looking for "dark matter", such as uncharacterized genes, when seining the increasing deluge of published datasets for new hypotheses to test. This work, when revised, will become a valuable addition to the field.

      Weaknesses

      A paralog of YBR238C, RMD9, also exists in the yeast genome. While the authors indicate that part of their interest in YBR238C lies in its uncharacterized nature, its paralogue, RMD9, is not uncharacterized but is named for its phenotype (Required for Meiotic nuclear Division), which is not mentioned or discussed anywhere in the manuscript currently.

      In the context of the current work, in addition to the cited Hillen, H.S. et al. and Nouet, C. et al., the authors might be very interested in the 2007 Genetics paper "Translation initiation in Saccharomyces cerevisiae mitochondria: functional interactions among mitochondrial ribosomal protein Rsm28p, initiation factor 2, methionyl-tRNA-formyltransferase and novel protein Rmd9p" (PMID: 17194786), which does not appear to be cited or discussed in the current version of the manuscript.

      Thank you for your thorough and insightful review of our manuscript. We value your positive feedback and recognition of the strengths in our study. Your constructive comments have been carefully considered, leading to the inclusion of RMD9, identified as 'Required for Meiotic Nuclear Division,' and the addition of the relevant reference (PMID: 12586695) in the revised manuscript. This information has been incorporated into the second paragraph of the "The YBR238C paralogue RMD9 deletion decreases the lifespan of cells" results section.

      Furthermore, we appreciate the reviewer's suggestion to include the 2007 Genetics paper on translation initiation in Saccharomyces cerevisiae mitochondria (PMID: 17194786). This citation has been integrated into our revised manuscript.

      We believe that these revisions significantly strengthen the manuscript and address the concerns raised by Reviewer #1. We thank the reviewer for their time and valuable input.

      Reviewer #2 (Public Review):

      The effectors of cellular aging in yeast have not been fully elucidated. To address this, the authors curated gene expression studies to link genes influenced by rapamycin, a well-known mediator of longevity across model systems, to genes known to affect chronological lifespan (CLS) and replicative lifespan (RLS) in yeast. Through their analyses, they find one gene, ybr238c, whose deletion increases both CLS and RLS and that is downregulated by rapamycin. Curiously, despite these selection criteria, the authors only use CLS as a proxy for cellular aging throughout their study and do not explore the effects of ybr238c deletion on RLS. This does not diminish their conclusions, but given the importance of this phenotype in their selection criteria, it is surprising that the authors did not choose to test both types of aging throughout their study.

      Nonetheless, the authors demonstrate that deletion of ybr238c increases CLS across multiple yeast strains and through multiple assays. The authors also test the effects of YBR238C overexpression on lifespan and find the opposite effect, with overexpression yeast showing decreased survival relative to wild-type cells, consistent with "accelerated aging" as the authors propose. The authors also note that ybr238c has a paralog, rmd9, whose deletion decreases CLS and seems to be epistatic to ybr238c, as a double ybr238c/rmd9 mutant has decreased CLS relative to a wild-type strain.

      Collectively, the data presented by the authors convincingly demonstrate that ybr238c influences lifespan in a manner that is distinct from (and likely opposite to) rmd9. However, the authors then link the increased CLS in Δybr238c yeast to mitochondrial function using only a handful of assays that do not directly test mitochondrial function. These include total cellular ATP levels, levels of reactive oxygen species, and the transcript levels of select nuclear-encoded mitochondrial genes. Yeast is well established to generate ATP through non-mitochondrial pathways such as glycolysis in fermentative conditions. While it is possible that the ATP levels assayed in the manuscript were tested in stationary phase, which would more likely reflect "mitochondrial function," neither the methods nor the figure legends contain these details, which are critical for the interpretation of these data. Similarly, ROS can be generated through non-mitochondrial pathways, and the transcription of nuclear-encoded mitochondrial genes is an indirect measure of mitochondrial function at best. Thus, the authors' proposed connection of ybr238c to mitochondrial function is correlative and should be substantiated with assays that more closely reflect organellar function, such as respirometry or assays of the activity of oxidative phosphorylation complexes. Finally, the authors attempt to tie the phenotypes of mitochondrial dysfunction caused by the deletion of ybr238c to TORC1 signaling, as the gene is influenced by rapamycin. However, the presentation of the data, such as reporting ATP levels as relative percentages or failing to perform appropriate statistical comparisons between the conditions from which the authors derive conclusions, renders the data difficult to interpret. As such, this manuscript establishes that ybr238c is rapamycin-responsive and influences CLS, but its influence on mitochondrial activity and its ties to TORC1 signaling remain speculative.

      We would like to express our gratitude to Reviewer #2 for the thoughtful feedback on our manuscript. We have carefully considered your comments and have made comprehensive revisions to address the concerns raised.

      We appreciate the suggestion to investigate the role of YBR238C in replicative lifespan (RLS). However, we want to bring to your attention that four previous studies (references 7, 39, 40, and 41) have already identified the involvement of YBR238C in the RLS phenotype. Given the existing body of literature on this aspect, we chose not to duplicate these efforts in our study.

      Instead, we focused our efforts on validating the role of YBR238C in the chronological lifespan (CLS) phenotype, a finding reported in only one genome-wide study (reference 38). To make our study more comprehensive, we analyzed different phenotypes, including mitochondrial activity and oxidative stress, under both logarithmic-phase (relevant to RLS) and stationary-phase (relevant to CLS) conditions. We now clearly indicate the logarithmic-phase/stationary-phase conditions in the figure legends of the manuscript, specifying whether the conditions are relevant to RLS or CLS. The results of the new experiments have been included in the revised manuscript as supplementary figures (S3E-S3I).

      To address concerns about the indirect nature of our mitochondrial function assays, we have measured relative mitochondrial content (S3F), quantified ROS levels from fermentative to stationary-phase conditions (S3G), and performed assessments in respiratory glycerol medium (S3H), which provide more direct insight into mitochondrial biology. Additionally, we have investigated the resistance of ybr238c∆ cells to H2O2 toxicity and found them to be more resistant than wild-type cells.

      We believe these revisions strengthen the scientific rigor and clarity of our study. We sincerely appreciate the guidance from Reviewer #2, and we hope these modifications address the concerns raised effectively.

      Reviewer #3 (Public Review):

      Summary: The study by Alfatah et al. presented a role for YBR238C in mediating lifespan through improved mitochondrial function in a TOR1-dependent metabolic pathway. The authors used a dataset comparison approach to identify genes that positively modulate yeast chronological (CLS) and replicative (RLS) lifespan when deleted and whose expression is reduced under rapamycin treatment. This approach revealed an unknown, mitochondria-localized yeast gene, YBR238C, and through mechanistic studies, they identified its paralogous gene RMD9, which regulates lifespan in an antagonistic manner.

      Strengths:

      The findings have valuable implications for understanding YBR238C-mediated, mitochondria-dependent yeast lifespan regulation, and the interplay between two paralogous genes in the regulation of mitochondrial function represents an interesting case for gene evolution.

      Weaknesses:

      Overall, the implications and findings of this study are restricted to the yeast model, since these two genes do not have any homologs in higher eukaryotes. The primary methods must be carefully designed to consider two different metabolic states, respiration associated with CLS and fermentation associated with RLS, in a single comparative approach. Yeast CLS and RLS are two completely different processes. It is already known that most genes regulating CLS are not associated with RLS, or vice versa. The method section is poorly written and missing important information. The experimental approaches are poorly designed, and the variability across the datasets (e.g., media conditions "YPD", "SC", etc.) and their experimental conditions are not well described or considered; thus, the presented data are not conclusive, which decreases the overall rigor of the study.

      We sincerely appreciate your thorough review of our manuscript and your insightful comments. We acknowledge the limitation of our study being yeast-specific due to the absence of homologous genes in higher eukaryotes. However, we would like to highlight the significance of our findings in revealing a feedback loop between mitochondrial function and TORC1 signaling (TORC1-Mitochondria-TORC1 or TOMITO signaling process) in cellular lifespan regulation.

      Our interpretation of the experimental results is grounded in recent literature. Two studies (references 62 and 63) support our findings by demonstrating TORC1 activation after mitochondrial electron transport chain dysfunction and the delay in brain pathology progression upon TORC1 inhibition, respectively. These studies, discussed in our manuscript, reinforce the relevance of our work in a broader biological context.

      We recognize the importance of carefully designing our primary methods to account for the different metabolic states associated with cellular processes, such as respiration in chronological lifespan (CLS) and fermentation in replicative lifespan (RLS). We want to bring to your attention that four previous studies (references 7, 39, 40, and 41) have already identified the involvement of YBR238C in the RLS phenotype. To avoid duplicating these efforts, we have chosen not to reiterate these findings in our study. However, we have clarified the logarithmic-phase/stationary-phase conditions in the figure legends, specifying the relevance of their metabolic states to RLS or CLS. Additionally, we have included new supplementary figures (S3E-S3I) to provide further details on the new experiments conducted.

      We appreciate your feedback regarding the clarity and completeness of our method section. In the revised manuscript, we have invested additional effort to enhance the clarity of the method section, providing a more detailed account of the experimental procedures, including the missing information you identified.

      We believe these revisions strengthen the scientific rigor and clarity of our study. We sincerely appreciate the guidance from Reviewer #3, and we hope these modifications address the concerns raised effectively.

      Reviewer #1 (Recommendations For The Authors):

      Thank you for your detailed review and valuable recommendations. We have carefully addressed each of your comments in the revised manuscript. The specific changes made include:

      (1) "TORC1 positively regulates aging, and its inhibition increases lifespan in various eukaryotic organisms including yeast and mammalian 13,26,27,29,30." Here I would suggest replacing "mammalian" with "mammals".

      We have amended the sentence as recommended.

      (2) "Next, we experimentally tested whether the transcriptome longevity signatures are associated with enhanced mitochondrial metabolism, whether the cellular energy level has gone up and cellular stress responses are induced with a switch to oxidative metabolism 47,48." Here I would replace "transcriptome longevity signatures is" with "transcriptome longevity signatures are".

      We have amended the sentence as recommended.

      (3) "Thus, HAP4-independent mechanism does exist through which YBR238C also affects cellular aging (Figure 3I)." I would replace "Thus, HAP4-independent" with "Thus, a HAP4-independent".

      We have amended the sentence as recommended.

      (4) "We examined other mitochondrial dysfunctional conditions to confirm that suppressive effect of rapamycin is not only specific to YBR238C-OE." I would change "that suppressive effect" to "that the suppressive effect".

      We have amended the sentence as recommended.

      (5) "Understanding the mechanism of aging will also require to understand the role of many genes of yet unknown function as YBR238C at the beginning of this work." I would switch "require to understand" to "require understanding".

      We have amended the sentence as recommended.

      (6) "The gene lists that modulate cellular lifespan in aging model organism yeast Saccharomyces cerevisiae were extracted from database SGD 22 and GenAge 23 (as of 8th November 2022)" "yeast" should not be italicized.

      Corrected.

      (7) Figure 1, panels C and D, ybr238c should be italicized.

      Corrected.

      (8) Figure 2B, top left-most (oxidative phosphorylation) network. I might consider repositioning some labels to make them more readable if possible.

      Thank you for your feedback. The figure labels in Figure 2B are default from Metascape analysis, so repositioning isn't feasible. However, we have indicated in the figure legends that the full set of genes for functional enrichment analysis and the MCODE complex is available in Additional File 3.

      (9) Figure 4E, rmd9, pet100, and cox6 should be italicized.

      Corrected.

      (10) Figure 5C, rmd9 and rmd9 ybr238c should be italicized.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for your detailed review and valuable recommendations. We have carefully addressed each of your comments in the revised manuscript. The specific changes made include:

      (1) The presentation of data as heatmaps (Figures 1F, 3D, 4C, 4G, 5B, 5H, 5L, 6K) obfuscates the quantitative nature of the data. These data would be much stronger if presented as bar graphs with appropriate statistical analysis. If the authors prefer the visual of the heat map, there should be some statistical analysis performed to accompany these figures. This is particularly important for Figure 3D, in which the authors state "We found that HAP4 deletion significantly decrease the ETC complex I-V genes' expression" (bottom of page 8). As no statistical analyses were performed, the authors should refrain from using such language as it is unsupported by the data as analyzed.

      Thank you for your insightful comments and suggestions regarding the presentation of our data. We appreciate the attention you have given to Figures 1F, 3D, 4C, 4G, 5B, 5H, 5L, and 6K.

      In response to your feedback, we have carefully re-evaluated our approach. Considering the large volume of data associated with our lifespan analysis at different time points, we initially chose to visualize it using heatmaps to comprehensively capture the complexity of the results. However, we have now incorporated quantification information into the heatmaps.

      For Figure 3D, which addresses the impact of HAP4 deletion on the expression of ETC complex I-V genes, we have replaced the heatmap with a bar graph. This modification allows for a clearer representation of the quantitative nature of the data. Moreover, we have conducted thorough statistical analyses comparing data between ybr238c∆ and ybr238c∆ hap4∆ to support the statements made in the text. The results of these analyses are now included in the revised figure. We also replaced the heatmap in Figure 6K with a bar graph.

      We believe that these changes enhance the interpretability and robustness of our findings. We are grateful for your guidance, and we are confident that these adjustments will strengthen the overall quality of our manuscript.

      (2) The presentation of ATP data, given its importance in supporting the core conclusions of this manuscript, is poor. The conditions under which yeast was collected are not reported, making these data impossible to interpret; total cellular ATP levels would be significantly altered and influenced by separate pathways in fermentative versus stationary phases. Minimally, the authors should describe the conditions of yeast growth (e.g., age, culture media) in which these measurements were made. The presentation of relative ATP percentages is problematic, particularly with measurements that deviate so far from wild-type ATP levels in conditions such as those in Figure 6A, in which the authors report that rapamycin induces a 1200% increase in cellular ATP. Previous papers have established that ATP levels in yeast hover around 4 mM and are stable through the cell cycle and across nutrient conditions (PMID: 30858198, 35438635). Given this, the reported ATP levels would be expected to be near 48 mM, which is strongly outside of the typically accepted values of 1-10 mM for this metabolite. Without understanding the contexts in which these measurements are made, as well as the absolute values for these measurements (which would be easily achievable through the use of a standard curve of ATP), these data are uninterpretable. Furthermore, it seems unlikely that yeast would be able to accommodate shifts of ATP levels that span an order of magnitude without dire cellular consequences, particularly during rapamycin treatment.

      We appreciate the valuable feedback from the reviewer regarding the importance of providing detailed information on yeast growth conditions for interpreting ATP data. In response to this suggestion, we have enhanced the figure legends associated with the relevant figures to include a comprehensive description of the yeast growth conditions. This now specifies the age of the culture, culture media composition, and other pertinent parameters.

      In addressing the concern raised about the rapamycin-induced ATP increase, we have carefully re-examined our experimental procedures. We performed additional experiments and confirmed the consistency of our findings in rapamycin-treated logarithmic-phase cultures. The results remain in alignment with our initial observations, reinforcing the reliability and reproducibility of our data.

      (3) As stated above, the inference of mitochondrial function from cellular ATP levels, cellular ROS levels, and gene expression of a handful of nuclear-encoded genes is not sound. The authors should include further experimentation as evidence of mitochondrial functionality, such as respirometry or metabolic flux experiments.

      Thank you for your constructive feedback on our manuscript. We appreciate your careful consideration of our work. In response to your concerns regarding the indirect nature of our mitochondrial function assays, we have implemented the following changes: We have incorporated additional assays to provide a more direct insight into mitochondrial biology. Specifically, we performed relative mitochondrial content analysis (Figure S3F) and quantified ROS levels from fermentative to stationary-phase conditions (Figure S3G). These assays offer a more direct and comprehensive assessment of mitochondrial function. Furthermore, we conducted experiments in respiratory glycerol medium (Figure S3H) to complement our previous findings.

      To further support our claims, we investigated the resistance of ybr238c∆ cells to H2O2 toxicity. Our results demonstrate that these cells exhibit increased resistance compared to wild-type cells. This additional evidence strengthens the link between mitochondrial function and cellular response to oxidative stress.

      We believe these adjustments address your concerns and significantly enhance the robustness of our study. We hope you find these modifications satisfactory. We are grateful for your valuable input, which has undoubtedly improved the clarity and reliability of our findings.

      (4) Multiple gene expression analyses are performed on n=2 measurements, and this should be bolstered by further replicates. Many bar graphs do not have accompanying statistics; these should be added. Some statistical tests are performed across inappropriate comparisons, such as Figure 3G, in which expression levels of mitochondrial genes in both deletion and overexpression strains should be compared to a wild-type control rather than to each other.

      Thank you for your thorough review and constructive feedback on our manuscript. We appreciate your careful examination of our work. In response to your comments, we have made the following revisions to address your concerns: The multiple gene expression analysis in our study focused specifically on ETC genes. It is important to note that ETC genes themselves represent multiple replicates within the ybr238c deletion and overexpression cells, as illustrated in Figures 4D, 4G, and 6B.

      We acknowledge and appreciate your observation regarding Figure 3G. To address this concern, we have revised the statistical comparisons. The expression levels of mitochondrial genes in the overexpression strain are now appropriately compared to a wild-type control. This correction has been applied to the figure, which now corresponds to the text of the manuscript.

      (5) Figure 2B is uninterpretable as it stands, as most gene symbols are obscured.

      We appreciate the reviewer's attention to Figure 2B and the feedback provided. Regarding the gene labels in Figure 2B, we would like to clarify that these labels are default outputs from the Metascape analysis, and unfortunately, repositioning them within the current figure layout isn't feasible without compromising the integrity of the information.

      However, we have taken the reviewer's concern seriously and have made efforts to address the interpretability issue. To provide readers with access to the full set of genes for functional enrichment analysis and the MCODE complex, we have included this information in Additional File 3. The figure legends have been updated accordingly to guide readers to refer to Additional File 3 for a more detailed examination of the gene symbols and their annotations.

      We hope that this solution addresses the concern raised by the reviewer.

      (6) The conclusions to be drawn from Figure 3A are not clear, and this figure is cited only once in the text along with two other figures (page 8).

      Thank you for your valuable feedback. We have carefully considered your comments and made revisions to improve the clarity of the conclusions drawn from Figure 3A.

      (7) Figure 6K reports a range of 100-200% cell survival - how does a cell have 200% survival? Isn't survival binary (i.e., you survive or you are dead)? Perhaps this is meant to be relative to another condition; this should be more clearly stated in the figure, or the axis should be normalized to a maximum of 100% survival.

      Thank you for your guidance and valuable feedback. Based on your recommendation, we have made significant changes to Figure 6K in the revised manuscript. Specifically, we replaced the heatmap with a bar graph to enhance clarity. Additionally, we would like to highlight that cell survival of combined treated cells is measured relative to the control treatment, which is considered 100% survival. This aims to provide a more accurate and comprehensible representation of the data. We believe these modifications contribute to a clearer presentation of our findings.

      (8) The authors state that "TORC1 inhibition in yeast and human cells with mitochondrial dysfunction suppresses their accelerated aging." No studies of aging were done in human cells; survival in response to mitochondrial toxins does not reveal aging phenotypes. To state such is a substantial overstatement and should be amended to perhaps "cellular survival" rather than directly linked to aging.

      We appreciate the careful review of our manuscript and the constructive feedback provided by the reviewer. In response to the concern raised regarding the statement about TORC1 inhibition and accelerated aging in human cells, we have revised the relevant passage as follows: "In turn, TORC1 inhibition in yeast and human cells with mitochondrial dysfunction enhances their cellular survival." We believe that this modification accurately reflects the outcomes of our experiments and addresses the concern raised by the reviewer. We would like to express our gratitude for the valuable feedback, which has contributed to the improvement of our manuscript. Thank you for your thoughtful consideration.

      Reviewer #3 (Recommendations For The Authors):

      Thank you for your detailed review and valuable recommendations. We have carefully addressed each of your comments in the revised manuscript. The specific changes made include:

      The authors should have attempted to fully characterize the RLS and CLS phenotype of strains lacking the YBR238C and RMD9 gene, the single most important gene identified in this study. Before further characterization, its association with aging must be tested to replicate findings from the literature. Although Figure 3 shows partially characterized CLS in SC medium, different media conditions could be tested, and the full spectrum of CLS lifespan curves should be represented. RLS phenotypes of these cells were not analyzed throughout the study.

      We appreciate the suggestion to investigate the role of YBR238C in both Replicative Lifespan (RLS) and Chronological Lifespan (CLS). However, it's essential to note that the involvement of YBR238C in the RLS phenotype has been previously documented in four studies (references 7, 39, 40, and 41). Considering the established literature on this matter, we chose not to duplicate these efforts in our study.

      Our primary focus was on confirming the role of YBR238C in the chronological lifespan (CLS) phenotype, as indicated by a genome-wide study (reference 43). Accordingly, we also conducted an analysis of the role of RMD9 in CLS. The methods and figure legends explicitly state that CLS experiments for prototrophic CEN.PK113-7D strains were conducted in synthetic defined (SD) medium containing 6.7 g/L yeast nitrogen base with ammonium sulfate without amino acids and 2% glucose. For auxotrophic BY4743 strains, SD medium was supplemented with histidine (40 mg/L), leucine (160 mg/L), and uracil (40 mg/L).

      It is important to clarify that SC medium was not used for CLS analysis. Instead, we employed SD medium, recommended for CLS analysis (reference 15; PMID: 22768836). The CLS experiments were conducted using three different methods, providing a comprehensive representation of the entire CLS lifespan (Figures 1C, 1D, 1E, and 1F).

      While we did not present the Replicative Lifespan (RLS) phenotype explicitly, we performed experiments such as mitochondrial activity and ROS production under both CLS and RLS conditions. These additional analyses contribute valuable insights into the broader implications of YBR238C and RMD9 on cellular function.

      We believe that these clarifications and the inclusion of additional experimental details enhance the robustness and validity of our findings. We hope these explanations address the concerns raised by the reviewer and contribute to the overall improvement of our manuscript.

      In addition, the authors include RNAseq data from Rapamycin-treated cells to identify differentially expressed genes. Notably, genes with decreased expression were used to compare KO strains' lifespan phenotype. Additional RNAseq analyses were performed on individual KO cells. The methodology section needs to be better written, with information on the media and metabolic state in which these cells were collected after treatment with rapamycin. If the cells are collected during logarithmic growth, the data can be compared with RLS aging gene sets only. A separate experiment has to be performed on stationary (respiratory) cells to collect RNAseq data after rapamycin treatment, which can then be compared to the CLS aging gene set.

      Thank you for your insightful comments and considerations regarding our methodology for obtaining Rapamycin response genes (RRGs). We appreciate the opportunity to address your concerns and provide further clarification on our experimental approach.

      As mentioned in our manuscript, we obtained RRGs by treating logarithmic cells with 50 nM Rapamycin for 1 hour, and the details have been included in supplementary Figure S1C legends. Our primary objective was to compare these RRGs with aging-associated genes that modulate both Replicative Lifespan (RLS) and Chronological Lifespan (CLS). We acknowledge the significance of this comparison and believe that our approach, treating logarithmic cells, is suitable for achieving this goal.

      It is important to note that the use of a higher concentration of Rapamycin for treatment renders the cells less efficient in terms of growth, resulting in a very low optical density (OD) at 72 hours, as illustrated in Figure 6H. Unfortunately, due to this limitation in growth efficiency, obtaining Rapamycin response genes at the stationary phase was not feasible in our experimental setup.

      As the experimental conditions vary among the reports and the gene expression signature significantly changes under different metabolic conditions, the media conditions under which samples are collected for RNAseq analyses should match those under which the lifespans of the KO strains are tested. However, more information needs to be detailed on these methodologies. For example, the transcriptomic signature of the YBR238C KO strain should be obtained under both fermentative and respiratory conditions to understand the true gene expression signature associated with CLS and RLS. Throughout the manuscript, these two metabolic conditions and associated lifespan types (CLS vs. RLS) are not differentiated and are treated as the same, probably causing the biggest confounding effect that resulted in the identification of a single yeast-specific gene.

      We obtained the transcriptomic signature of the YBR238C KO strain from logarithmic phase cultures. This consistency was maintained to align with the Rapamycin Response Genes (RRGs) obtained from logarithmic cells treated with rapamycin. Detailed methodology and metabolic status information are provided in the method section and relevant figure legends.

      To broaden the scope of our study, we conducted analyses on various phenotypes, including mitochondrial activity and oxidative stress, under both logarithmic phase (relevant to Replicative Lifespan, RLS) and stationary phase (relevant to Chronological Lifespan, CLS). We have now explicitly indicated the logarithmic phase/stationary phase conditions in the figure legends of the manuscript, specifying their relevance to RLS or CLS.

      Results from these additional experiments have been incorporated into the revised manuscript as supplementary figures (S3E-S3I). We believe that these clarifications and the inclusion of additional experimental details enhance the robustness and validity of our findings. We trust that these explanations effectively address the concerns raised by the reviewer and contribute to the overall improvement of our manuscript.

      The effect of YBR238C knockout on mitochondrial function lacks comprehensive characterization. Whether the improved mitochondrial function is caused by increased mtDNA copy number and/or increased mitochondrial number could be easily tested by normalizing RNAseq reads from mtDNA genes to reads from nucDNA genes. Data could be further combined with western blots specific to mitochondrial membrane proteins to analyze mitochondrial copy number.

      Thank you for your insightful comments and suggestions. Following your recommendation, we assessed relative mitochondrial content and observed significantly higher mtDNA content in ybr238c∆ cells than in the wild type (see Figure S3F). Additionally, we have incorporated the methodology for mitochondrial DNA copy number analysis in the methods section.

      The interaction between the two paralogous genes is an interesting observation. However, in yeast, it is known that deletion of one of two paralogous genes can cause copy number amplification of the chromosome on which the other paralog is located, resulting in aneuploidy. Many of the observed phenotypes could be associated with increased chromosome copy number and should be carefully tested. However, the authors did not consider this important point. Simply, RNAseq reads normalized per chromosome could be plotted to analyze the karyotype of YBR238C and RMD9 KO cells.

      We appreciate your thoughtful consideration of our work and the suggestion to investigate chromosome copy number variations. While we did not directly test chromosome copy number, we want to highlight that our study extensively explores the impact of YBR238C on cellular lifespan through an RMD9-dependent mechanism (Figure 5). Deletion of YBR238C increases, whereas overexpression of YBR238C decreases, the expression of its paralog, RMD9 (Figure 5F). Furthermore, this phenotype is associated with the lifespan of YBR238C-deleted and overexpressed cells. In our study, we have thoroughly investigated this aspect.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We appreciate the care and the detail shown by the Reviewers. Their comments have made our article more focused and more accessible to a general audience.

      We would like to begin with a comment about the last sentence of the “eLife assessment”. The evolution of metamorphosis in insects was a major triumph in animal evolution that subsequently impacted almost every aspect of plant and animal evolution in the terrestrial and freshwater aquatic biospheres. Unlike the metamorphoses of most other groups, whose evolutions are lost in time, insect metamorphosis arose relatively recently (~400 mya), and insect orders have branched off at various points in this evolution and have persisted to modern times. Although these “relic” groups also have undergone millions of years of evolution and specialization, they still provide us with windows into how this progression may have come about. The study of these groups provides a unique opportunity to explore the mechanisms that underlie major life history shifts and should be of interest to anyone interested in evolution, not just entomologists.

      Reviewer #1 (Public Review):

      Summary:

      This paper provides strong evidence for the roles of JH in an ametabolous insect species. In particular, it demonstrates that:

      • JH shifts embryogenesis from a growth mode to a differentiation mode and is responsible for terminal differentiation during embryogenesis. This, and other JH roles, are first suggested as correlations, based on the timing of JH peaks, but then experimentally demonstrated using JH antagonists and rescue thereof with JH mimic. This is a robust approach and the experimental results are very convincing.

      • JH redirects ecdysone-induced molting to direct formation of a more mature cuticle

      • Kr-h1 is downstream of JH in Thermobia, as it is in other insects, and is a likely mediator of many JH effects

      • The results support the proposed model that an ancestral role of JH in promoting and maintaining differentiation was coopted during insect radiations to drive the evolution of metamorphosis. However, alternate evolutionary scenarios should also be considered.

      Strengths:

      Overall, this is a beautiful, in-depth study. The paper is well-written and clear. The background places the work in a broad context and shows its importance in understanding fundamental questions about insect biology. The researchers are leaders in the field, and a strength of this manuscript is their use of a variety of different approaches (enzymatic assays, gene expression, agonists & antagonists, analysis of morphology using different types of microscopy and detection, and more) to attack their research questions. The experimental data are clearly presented and carefully executed with appropriate controls and attention to detail. The 'multi-pronged' approach provides support for the conclusions from different angles, strengthening conclusions. In sum, the data presented are convincing and the conclusions about experimental outcomes are well-justified based on the results obtained.

      Weaknesses:

      This paper provides more detail than is likely needed for readers outside the field but also provides sufficient depth for those in the field. This is both a strength and a weakness. I would suggest the authors shorten some aspects of their text to make it more accessible to a broader audience. In particular, the discussion is very long and accompanied by two model figures. The discussion could be tightened up and much of the text used for a separate review article (perhaps along with Figure 11) that would bring more attention to the proposed evolution of JH roles.

      We appreciate the comments about the strengths and weaknesses of the paper. To deal with the weaknesses, we have condensed some of the Results to make them less cumbersome and the Discussion has been completely revised, keeping a sharp focus on the actions of JH in Thermobia embryos and how these actions relate to the status quo functions of JH in insects with metamorphosis. As part of the revision of the Discussion, we have replaced Figures 10 and 11.

      Reviewer #1 (Recommendations For The Authors):

      In keeping with my public review, this paper is very strong and I have very few suggestions for improvement. They are:

      (1) Thermobia are extant insects and are not ancestral insects. It is likely that they retain features found in an insect ancestor. However, these insects have been evolving for a very long time, and for any one feature, many changes may have occurred, both gain and loss of gene function and morphology. Further, even for morphological features present in an extant species that are the same as an ancestor, genetic pathways regulating this feature may have changed over time (see for examples papers from the Haag and Pick labs). Although I realize this is a small, possibly almost semantic point, I feel it is important to be precise here. For example, in the title, "before" is speculative as there could have been a different role in the ancestor with the role in embryogenesis arising in lineages leading to Thermobia; similarly in the abstract, "this ancestral role of JH' is an overstatement since we cannot actually measure the ancestral role.

      Since the title has already been cited in a Perspectives review, we decided to keep the title as is.

      (2) I don't understand the results in Met and myo in Fig. 3B. Perhaps include them in the explanation of Fig.3 and not after the description of Fig. 4 and explain them in more detail (or perhaps not include them at all?). I don't really understand the statistical analysis of these panels either.

      We have revised the figure legends to explain the statistics.

      (3) Another point regarding language - talking about the embryo being "able" to go through a developmental stage implies decision-making. I would suggest dropping that wording (e.g, in the description of Fig. 5C). Similarly, in explaining Fig. 6B, it would be more correct to say "JH treatment no longer inhibited" than as written "could no longer inhibit" (implying 'no matter how hard it tried, it still couldn't do it')

      We have removed the “can’t” wording. Figure 6 has been revised.

      Reviewer #2 (Public Review):

      The authors have studied in detail the embryogenesis of the ametabolan insect Thermobia domestica. They have also measured the levels of the two most important hormones in insect development: juvenile hormone (JH) and ecdysteroids. The work then focuses on JH, whose occurrence concentrates in the final part (between 70 and 100%) of embryo development. Then, the authors used a precocene compound (7-ethoxyprecocene, or 7EP) to destroy the JH-producing tissues in the embryo of the firebrat T. domestica, which revealed that this hormone is critically involved in the last steps of embryogenesis. The 7EP-treated embryos failed to resorb the extraembryonic fluid and did not hatch. More detailed observations showed that processes like the maturational growth of the eye, the lengthening of the foregut and posterior displacement of the midgut, and the detachment of the E2 cuticle were impaired after the 7EP treatment. Importantly, a treatment with a JH mimic subsequent to the 7EP treatment restored the correct maturation of both the eye and the gut. It is worth noting that the timing of JH mimic application was essential for correcting the defects triggered by the treatment with 7EP.

      This is a relevant result in itself since the role of JH in insect embryogenesis is a controversial topic. It seems to have an important role in hemimetabolan embryogenesis, but not so much in holometabolans. Intriguingly, it appears important for hatching, an observation made in hemimetabolan and in holometabolan embryos. Knowing that this role was already present in ametabolans is relevant from an evolutionary point of view, and knowing exactly why embryos do not hatch in the absence of JH, is relevant from the point of view of developmental biology.

      The unique and intriguing aspect of juvenile hormone is its status quo action in the control of metamorphosis. Our reason for dealing with an insect group that branched off from the line of insects that eventually evolved metamorphosis, was to gain insight into the ancestral functions of this hormone. Our data from Thermobia as well as that from grasshoppers and crickets indicate that the developmental actions of JH were originally confined to embryogenesis where it promoted the terminal differentiation of the embryo. Its actions in promoting differentiation also included suppressing morphogenesis. This latter function was not pronounced during embryogenesis because JH only appeared after morphogenesis was essentially completed. However, it was a preadaptation that proved useful in more derived insects that delayed aspects of morphogenesis into the postembryonic realm. JH was then used postembryonically to inhibit morphogenesis until late in juvenile growth when JH disappears, and this inhibition is released.

      Then, the authors describe a series of experiments applying the JH mimic in early embryogenesis, before the natural peak of JH occurs, and its effects on embryo development. Observations were made under different doses of JHm, and under different temporal windows of treatment. Higher doses triggered more severe effects, as expected, and different windows of application produced different effects. The most used combination was 1 ng JHm applied 1.5 days AEL, checking the effects 3 days later. Of note, 1.5 days AEL is about 15% embryonic development, whereas the natural peak of JH occurs around 85% embryonic development. In general, the ectopic application of JHm triggered a diversity of effects, generally leading to an arrest of development. Intriguingly, however, a number of embryos treated with 1 ng of JHm at 1.5 days AEL showed a precocious formation of myofibrils in the longitudinal muscles. Also, a number of embryos treated in the same way showed enhanced chitin deposition in the E1 procuticle and showed an advancement of at least a day in the deposition of the E2 cuticle.

      While the experiments and observations are done with great care and are very exhaustive, I am not sure that the results reveal genuine JH functions. The effects triggered by a significant pulse of ectopic JHm when the embryo is at 15% of development will depend on the context: the transcriptome existing at that time, especially the cocktail of transcription factors. This explains why different application times produce different effects. This also explains why the timing of JHm application was essential for correcting the effects of 7EP treatment. Following this reasoning, we must consider that the context at 85% development, when JH peaks in natural conditions and plays its genuine functions, must be very different from the context at 15% development, when the JHm was applied in most of the experiments. In summary, I believe that the observations after the application of JHm reveal effects of the ectopic JHm, but not necessarily functions of JH. If so, then the subsequent inferences made from the premise that these ectopic treatments with JHm revealed JH functions are uncertain and should be interpreted with caution.

      We disagree with the reviewer. An analogous situation would be in exploring gene function in which both gain-of-function and loss-of-function experiments often provide complementary insights into how a gene functions. We see JH effects only when its receptor, Met, is present and JH can induce its main effector protein, Kr-h1. The latter gives us confidence that we are looking at bona fide JH effects. We have also kept in mind, though, that the nature of the responding tissues is changing through time. Nevertheless, we see a consistent pattern of responses in the embryo and these can be related to its postembryonic effects in metamorphic insects.

      Those inferences affect not only the "JH and the progressive nature of embryonic molts" section, but also the "Modifications in JH function during the evolution of hemimetabolous and holometabolous life histories" section, and the entire "Discussion". In addition to inferences built on uncertain functions, the sections mentioned, especially the Discussion, suffer, I think, from too many poorly justified speculations. I love speculation in science; it is necessary and fruitful. But it must be practiced within limits of reasonableness, especially when expressed in a formal journal.

      We have tried to dial back the speculation.

      Finally, in the section "Modifications in JH function during the evolution of hemimetabolous and holometabolous life histories", the bridge connecting the observations on the Thermobia embryo to the evolution of modified life cycles, hemimetabolan and holometabolan, is not clear.

      Our Figure 12 should put this into perspective.

      Reviewer #2 (Recommendations For The Authors):

      Main points

      (1) Please reduce the level of overinterpretation of ectopic treatment experiments with JHm, since the resulting observations represent effects, but not necessarily functions, of JH.

      We have revised this section to indicate that the “effects” of ectopic treatments provide insights into the function of JH. Using a genetic analogy, both “loss-of-function” and “gain-of-function” experiments provide insights into a given gene. (see response to Public Comments)

      (2) Especially in the sections "JH and the progressive nature of embryonic molts" and "Modifications in JH function during the evolution of hemimetabolous and holometabolous life histories", and the entire "Discussion", please keep the level of speculation within reasonable limits, avoiding especially the inference of conclusions on the basis of speculation, itself based on previous speculation.

      We have toned down some of the speculation and provided reasons why it is worth suggesting.

      (3) Please revisit the argued roles of myoglianin in the story, in light of its effects as an inhibitor of JH production, repressing the expression of JHAMT, as has been reliably demonstrated in hemimetabolan species (DOI: 10.1073/pnas.1600612113 and DOI: 10.1096/fj.201801511R).

      Our thanks to the reviewer. We are now more explicit about the relationship between JH and myo.

      Minor points

      (4) Please keep the scientific binomial nomenclature consistent for the species mentioned. For example, read "Manduca sexta" (in italics) at the first mention, and then "M. sexta" (in italics) in successive mentions (instead of reading "Manduca" on page 17, and then "Manduca sexta" on page 18, for example). The same for "Drosophila" ("Drosophila melanogaster" first, and then "D. melanogaster"), "Thermobia" ("Thermobia domestica" first, and then "T. domestica"), etc. In the figure legends, I recommend using the complete name, Thermobia domestica, in the main heading.

      Where there is no possibility of confusion, we intend to use Thermobia, rather than T. domestica, etc. We think that it is easier for a non-specialist to read and it is commonly done in endocrine papers.

      (5) There is no purpose in evolution and biological processes. Thus, I suggest avoiding expressions that have a teleological aftertaste. For example (capitals are mine), on p. 3 "appears to have been extended into postembryonic life where it acts TO antagonize morphogenic and allow the maintenance of a juvenile state".

      We have tried to avoid teleological wording.

      (6) The title "The embryonic role of juvenile hormone in the firebrat, Thermobia domestica, reveals its function before its involvement in metamorphosis" contains a redundancy ("role" and "function"), and an apparent obviousness ("before its involvement in metamorphosis"). I suggest a more straightforward title. Something like "Juvenile hormone plays developmental functions in the embryo of the firebrat Thermobia domestica, which predate its status quo action in metamorphosis".

      As noted above, we are retaining the title since it has already been cited.

      (7) Page 2. "The transition from larva to adult then occurred through a transitional stage, the pupa, thereby providing the three-part life history diagnostic of the "complete metamorphosis" exhibited by holometabolous insects (reviews: Jindra, 2019; Truman & Riddiford, 2002, 2019)". I suggest adding the reference ISBN: 978-0-12-813020-9, as the most comprehensive and recent review on complete metamorphosis.

      Done

      (8) Page 3. "These severe developmental effects suggest that the developmental role of JH in insects was initially CONFINED to the embryonic domain" (capitals are mine). This appears to contradict the observations of Watson, 1967, on the relationship between the appearance of scales and JH, mentioned shortly before by the authors.

      This is explained in the Discussion. Although JH can suppress scale appearance in the J4 stage, we have not been able to show that scale appearance is caused by changes in the juvenile JH titer.

      (9) Page 4. "we measured JH III levels during Thermobia embryogenesis at daily intervals starting at 5 d AEL". Why not before, as in the case of ecdysteroids? The authors might perhaps argue that the levels of Kr-h1 expression are consistently low from the very beginning, according to Fernandez-Nicolas et al., 2022 (reference cited later in the manuscript).

      (10) Page 4. "Ecdysteroid titers through embryogenesis and the early juvenile instars were measured using the enzyme immunoassay method (Porcheron et al., 1989) that is optimized for detecting 20-hydroxyecdysone (20E)". The antibody generated by Porcheron (and now sold by Cayman) recognizes ecdysone and 20-hydroxyecdysone alike. But that's not relevant here. I would refer to "ecdysteroids" when mentioning measurements. Also in figure 2B (and "juvenile hormone III" without the formula, in Panel A, for harmonization). And I would not expand on specifications, like those at the beginning of page 5, or towards the end of page 12 ("the fact that we detected only a slight rise in ecdysteroids at this time (Fig 2B) is likely due to the assay that we used being designed to detect 20E rather than ecdysone").

      We thank the reviewer for this important correction.

      Omitted.

      (11) Page 5. "Low levels of Kr-h1 transcripts were present at 12 hr after egg deposition, but then were not detected until about 6 d AEL when JH-III first appeared". There is a very precise Kr-h1 pattern in Fernandez-Nicolas et al. 2023 (reference mentioned later in the manuscript).

      (12) Page 5. "notably myoglianin (myo), have become prominent as agents that promote the competence and execution of metamorphosis in holometabolous and hemimetabolous insects (He et al., 2020; Awasaki et al., 2011)". See my note 3 above.

      The myoglianin issue has been revised.

      (13) Page 5. "a drug that suppresses JH production". Rather, "a drug that destroys the JH producing tissues". By the way, do the authors know when the CA are formed in T. domestica embryo development?

      We prefer to keep our original wording. There have been some cases in which precocene has blocked JH production but did not kill the CA cells. We do not have observations that show that 7EP kills the CA cells in Thermobia embryos.

      (14) Page 5. "subsequent treatment with a JHm". I would state here, rather than on page 6 or page 7, that the JHm is pyriproxyfen. Thus, to be consistent, after the first mention of "pyriproxyfen (JHm)" on page 5, I'd consistently use the abbreviation "JHm".

      (15) Page 9. "Limb loss in such embryos was often STOCHASTIC, i.e., in a given embryo some limbs were completely lost while others were maintained in a reduced state" (capitals are mine). The meaning of "stochastic" is random, involving a random variable; it is a concept usually associated with probability theory and related fields. I suggest using the less specialized word "variable", since ascertaining that the values are really stochastic would require specific mathematical approaches.

      We are still using stochastic because the loss is random.

      (16) Page 10. "9E). Indeed, the JH treatment redirects the molt to be more like that to the J2 stage, rather than to the E2 (= J1) stage". Probably too assertive given the evidence available (see my points 1 and 2 above).

      We do not see a problem with our conclusion. In response to the JHm treatment, the embryo produced a smooth, rather than a “pebbly”, cuticle, failed to make the J1-specific egg tooth, and attempted to make cuticular lenses (a J2 feature). This ability of premature JH exposure to cause embryos to “skip” a stage is also seen in locusts (Truman & Riddiford, 1999) and crickets (Erezyilmaz et al., 2004).

      (17) Page 11. "early JHM treatment", read "early JHm treatment".

      Corrected

      (18) Page 11. "likely. A target of JH, and likely Kr-h1, in Thermobia is myoglianin...". Please see my notes 1, 2, and especially 3, above.

      This has been revised

      (19) Page 13. "the locust, Locusta americana (Aboulafia-Baginshy et al.,1984)". Please read "the locust, Locusta migratoria (Aboulafia-Baginshy et al.,1984)".

      Corrected

      (20) Page 13 "Acheta domesticus" three times. The correct name now is "Acheta domestica", after harmonizing the declension of the specific name with the generic one. See additionally my note 4 above.

      Acheta domesticus has been used in hundreds (thousands?) of papers since it was originally named by Linnaeus. We will continue to use it.

      (21) Page 15, "(also called the vermiform larva (Bernays, 1971) redirects embryonic development to form an embryo with proportions, cuticular pigmentation, cuticular sculpturing and bristles characteristic of a nymph, while pronymph modifications, such as the cuticular surface sculpturing (Bernays, 1971)". The reference "Bernays, 1971" is actually "Bergot et al., 1971".

      There was a mistake in the references. The Bernays reference was omitted from the revised Discussion.

      (22) Page 16. "Since JH also induces Kr-h1 in embryos of many insects, including Thermobia". I'm not sure that this has been studied in many insects. In any case, any reference would be useful.

      (23) Page 17. "Tribolium casteneum". Please read "Tribolium castaneum".

      Changed

      (24) Page 17. "...results in a permanent larva that continues to molt well after it has surpassed its critical weight (He et al., 2019)". The paper of He et al., 2019 is preceded by two key papers that previously demonstrated (and in hemimetabolan insects) that myoglianin is a determining factor in the preparation for metamorphosis: DOI: 10.1073/pnas.1600612113 and DOI: 10.1096/fj.201801511R. See my note 3 above.

      Corrected in revision

      (25) Page 18. "These persisting embryonic primordia join the wing primordia in delaying their morphogenesis into postembryonic life". This reader does not understand this sentence.

      Made clearer in the revision.

      (26) Page 18. "is first possible in the commercial silkworm (Daimon et al., 2015)". Please mention the scientific Latin name of the species, Bombyx mori.

      (27) Page 19. "The functioning of farnesol derivatives in growth versus differentiation control extends deep into the eukaryotes.../... this capacity was eventually exploited by the insects to provide the hormonal system that regulates their metamorphosis". This information appears quite out of place.

      We have retained this point.

      (28) Page 21. Heading "Hormones". I suggest using the heading "Bioactive compounds", as neither pyriproxyfen nor 7-ethoxyprecocene are hormones.

      Done

      (29) Page 29, legend of figure 1. "Photomicrographs" is somewhat redundant. The technical word is "micrographs". "Thermobia domestica" appears in the explanation of panel C, but this is not necessary, as the name appears in the main heading of the legend.

      Done

      (30) Page 30, legend of figure 2. Panel B, see my comment 10 above. Why is embryonic age expressed in % embryo development in panel C (and in days in panels A and B)?

      All have been converted to days AEL

      (31) Page 35, legend of figure 5. "Photomicrograph": see my note 29 above.

      Done

      (32) Page 40, figure 10. In panel A, the indication of the properties of JH is misleading. The arrow going to promoting differentiation and maturation is OK, but the repression sign that indicates suppression of morphogenetic growth and cell determination seems to suggest that JH has retroactive effects. In panel B, I suggest labeling "Flies" instead of "Higher Diptera", which is an old-fashioned term. In any case, see my general comments 1 and 2, above, about speculation.

      The figure has been completely revised.

      (33) Figure 11. See my general comments 1 and 2, above, about speculation.

      The figure has been revised.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors use inhibitors and mimetics of juvenile hormone (JH) to demonstrate that JH has a key role in late embryonic development in Thermobia, specifically in gut and eye development but also resorption of the extraembryonic fluid and hatching. They then exogenously apply JH early in development (when it is not normally present) to examine the biological effects of JH at these stages. This causes a plethora of defects including developmental arrest, deposition of chitin, limb development, and enhanced muscle differentiation. The authors interpret these early effects on development as JH being important for the shift from morphogenetic growth to differentiation - a role that they speculate may have facilitated the evolution of metamorphosis (hemi- and holo-metaboly). This paper will be of interest to insect evo-devo researchers, particularly those with interests in the evolution of metamorphosis.

      Strengths:

      The experiments are generally conducted very well with appropriate controls and the authors have included a very detailed analysis of the phenotypes.

      The manuscript significantly advances our understanding of Thermobia development and the role of JH in Thermobia development.

      The authors interpret this data to present some hypotheses regarding the role of JH in the evolution of metamorphosis, some aspects of which can be addressed by future studies.

      Weaknesses:

      The results are based on using inhibitors and mimetics of JH and there was no attempt to discern immediate effects of JH from downstream effects. The authors show, for instance, that the transcription of myoglianin is responsive to JH levels, it would have been interesting to see if any of the phenotypic effects are due to myoglianin upregulation/suppression (using RNAi for example). These kinds of experiments will be necessary to fully work out if and how the JH regulatory network has been co-opted into metamorphosis.

      We agree completely; this should be a feature of future work.

      The results generally support the authors' conclusions. However, the discussion contains a lot of speculation and some far-reaching conclusions are made about the role of JH and how it became co-opted into controlling metamorphosis. There are some interesting hypotheses presented and the authors' speculations are consistent with the data presented. However, it is difficult to make evolutionary inferences from a single data point: although Thermobia is a basally branching insect, the lineage giving rise to Thermobia diverged from the lineages giving rise to the holo- and hemimetabolous insects approx. 400 mya, and it is possible that the effects of JH seen in Thermobia reflect lineage-specific effects rather than the 'ancestral state'. The authors ignore the possibility that there has been substantial rewiring of the JH-responsive networks across these 400 my. I would encourage the authors to temper some of the discussion of these hypotheses and include some of the limitations of their inferences regarding the role of JH in the evolution of metamorphosis in their discussion.

      We have tried to be less all-encompassing in the Discussion. The strongest comparisons can be made between ametabolous and hemimetabolous insects and we have focused most of the Discussion on the role of JH in that transition. We still include some discussion of holometabolous insects because the ancestral embryonic functions of JH may be somehow related to the unusual reappearance of JH in the prepupal period. We have reduced this discussion to only a few sentences.

      Reviewer #3 (Recommendations For The Authors):

      (1) The overall manuscript is very long (especially the discussion), and the main messages of the manuscript get lost in some of the details. I would suggest that the authors move some of the results to the supplementary material (e.g. it might be possible to put a lot of the detail of Thermobia embryogenesis into the supplementary text if the authors feel it is appropriate). The discussion contains a lot of speculation and I suggest the authors make this more concise. One example: At the moment there is a large section on the modification in JH function during the evolution of holo and hemi-metabolous life history strategies. There are some interesting ideas in this section and the authors do a good job of integrating their findings with the literature - but I would encourage the authors to limit the bulk of their discussion to the specific things that their results demonstrate. E.g. The first half of p17 contains too much detail, and the focus should be on the relationship with Thermobia (as at the bottom of p17).

      The section has been revised and is more focused.

      (2) I would also suggest a thorough proofread of the manuscript. I have highlighted some of the errors/points of confusion that I found in the list below, but this list is unlikely to be exhaustive.

      We appreciate the reviewer catching the errors. Hopefully the final version is better proofread.

      (3) It might be me, but I found the wording in the second half of the abstract a bit confusing. Particularly the statement about the redeployment of morphogen systems - could this be stated more clearly?

      Abstract has been revised.

      (4) Introduction

      a. "powered flight" rather than 'power flight'

      Done

      b. 'brought about a hemimetabolous lifecycle' implies causality which hasn't been shown and directionality to evolution - suggest 'facilitated the evolution of a hemi...". Similar comment for 'subsequent step to complete metamorphosis'.

      c. Bottom of p2 - unclear whether you are referring to hemi- holo- or both

      d. Suggest removing sentence beginning "besides its effects..." as the relevance of the role of JH in caste isn't clear.

      Kept sentence but removed initial clause

      e. State that Thermobia is a Zygentoma.

      Done

      f. Throughout - full species names on first usage only, T. domestica on subsequent usages.

      We will continue to use genus names for the reason given above.

      Gene names, e.g. kr-h1, should be in italics.

      g. 'antagonise morphogens"? rather than 'antagonise morphoentic'.

      Done

      (5) Results

      a. Unclear why drawings are provided rather than embryonic images in Fig. 1A

      We think that the points can be made better with diagrams.

      b. Top of p4, is 'slot' the correct word?

      Corrected

      c. Unclear why JHIII levels weren't measured before 5 days AEL, especially given that many of the manipulative experiments are at earlier time points than this. I appreciate that, based on kr-h1 levels, JHIII is also likely to be low.

      d. Reference for the late embryonic peak of 20E being responsible for the J2 cuticle?

      Clarified that this is an assumption

      e. Clarify "some endocrine related transcripts" why were these ones in particular picked? Kr-h1 is a good transcriptional proxy for JH and Met is the JH-receptor, why myoglianin and not some of the other transcriptional proxies of neuroendocrine signalling?

      Hopefully, the choice is clearer.

      f. Fig 2C: rather than % embryo development, please represent the gene expression data in days (to be consistent with your other figures).

      It is now consistent with other parts of figure.

      g. In Fig. 3 the authors perform t-tests; because there are three groups, there needs to be some correction for multiple testing (e.g. Bonferroni). Can the authors add this to the relevant methods section?

      We think that pair-wise comparisons are appropriate.
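      To make the reviewer's request concrete, note that three groups give three pairwise t-tests. A Bonferroni correction multiplies each raw p-value by the number of comparisons, which is equivalent to testing each at α/3 ≈ 0.0167 for a family-wise α of 0.05. The minimal Python sketch below shows this. The group labels and p-values are hypothetical illustrations, not values from Fig. 3.

```python
from itertools import combinations

def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust p-values from a family of m tests.

    Returns (adjusted p-values capped at 1.0, list of reject decisions at alpha).
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject

# Hypothetical group labels; three groups yield three pairwise comparisons.
groups = ["control", "7EP", "7EP + JHm"]
comparisons = list(combinations(groups, 2))

# Hypothetical raw t-test p-values, one per comparison (same order).
raw_p = [0.01, 0.03, 0.20]
adjusted_p, significant = bonferroni(raw_p)
for (a, b), p, p_adj, sig in zip(comparisons, raw_p, adjusted_p, significant):
    print(f"{a} vs {b}: raw p = {p:.3f}, adjusted p = {p_adj:.3f}, significant = {sig}")
```

      With these hypothetical numbers, the second comparison (raw p = 0.03) would pass an uncorrected 0.05 threshold but not the corrected one. This is the situation the reviewer's comment is guarding against.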

      h. Fig. 3 legend: you note that you treat stage 2 juveniles with 7EP - I couldn't tell what AEL this corresponded to.

      This is after hatching so AEL does not apply.

      i. Top of p7 'deformities' rather than 'derangements'?

      Done

      j. Regarding the dosage effects of embryonic abnormalities - it would be good to include these in the supp material, as it convinces the reader that the effects you have seen aren't just due to toxicity.

      It is not clear what the objection is.

      k. Bottom of p7 'problematic' not 'problematical'

      Done

      l. P8 Why are the clusters of Its important? - provide a bit more interpretation for the reader here.

      This is clear in the revised version.

      m. P9 Why is the modulation of transcription of kr-h1, met, and myo important in this context

      Explained

      n. P9 'fig. 7F'? there is no Fig. 5F

      Thanks for catching the typo.

      o. Fig. 7B add to the legend which treatment the dark and light points correspond to.

      We think it is obvious from the labeling on Fig 7B.

      (6) Discussion:

      a. What do we know about how terminal differentiation is controlled in non-insect arthropods? Most of the discussion is focused on insects (which makes sense as JH is an insect-specific molecule), but if the authors are arguing the ancestral role of JH it would be useful to know how their findings relate to non-insect arthropods.

      We have not been able to find any information about systemic signals being involved in non-insect arthropods.

      b. There is no Fig. 5E (are they referring to 7E?)

      Yes, it should have been Fig. 7E.

      c. Is myoglianin a direct target of JH in other species?

      Other reports are in postembryonic stages and show that myoglianin suppresses JH production. Our paper is the first examination in embryos and we find that the opposite is true – i.e., that JH treatment suppresses myoglianin production. We suspect that these two signaling systems are mutually inhibitory. It would be interesting to see whether treatment of a post-critical weight larva with JH (which would induce a supernumerary larval molt) would also suppress myoglianin production (as we see in Thermobia embryos).

      d. P12 What is the evidence that JH interacts with the first 20E peak to alter the embryonic cuticle?

      We are not sure what the issue is. The experimental fact is that treatment with JH before the E1 ecdysteroid peak causes the production of an altered E1 cuticle. We are faced with the question of why this molt is sensitive to JH when the latter will not appear until 3 or 4 days later. A possible answer is that the ecdysone response pathway has a component that has inherent JH sensitivity. The mosquito data suggest that Taiman provides another link between JH and ecdysone action.

      e. Top of p13 - this paragraph can be cut down substantially. Although this is evidence that JH can alter ecdysteroids, it is in a species that is 400 my derived from the target species. Is it likely to be the exact same mechanism? I would encourage the authors to distil and retain the most important points.

      This paragraph has been shortened and focused.

      f. Bottom of p13 - what does this study add to this knowledge?

      The response of Thermobia embryos to JH treatment is qualitatively the same as seen in other short germband embryos. This similarity supports the assumption that the same responses would have been seen in their last common ancestor.

      g. P19: the last paragraph in the conclusions is only peripherally relevant to the paper and is a bit of a stretch; I would encourage the authors to leave this section out.

      We agree that it is a stretch. JH and its precursor MF are the only sesquiterpene hormones. How did they come to acquire this function? We think it is worth pointing out that farnesol metabolites have been associated with promoting differentiation in various eukaryotes. An ancient feature of these molecules in promoting (maintaining?) differentiation may have been exploited by the insects to develop a unique class of hormones. It is worth putting the idea out to be considered.

      h. P19 "conclusions" rather than 'concluding speculations'.

      Changed as suggested.

      Methods:

      It is standard practice to include at least two genes as reference genes for RT-qPCR analysis (https://doi.org/10.1186/gb-2002-3-7-research0034, https://doi.org/10.1373/clinchem.2008.112797) If there are large-scale differences in the tissues being compared (e.g. as there are here during development) then more than two reference genes may be required and a reference gene study (such as https://doi.org/10.3390%2Fgenes12010021) is appropriate. Have the authors confirmed that rp49 is stably expressed during the stages of Thermobia development that they assay here?

      We have explained our choice in the Methods.
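      The following is for illustration only. The normalization the reviewer refers to, as proposed in the first paper they cite, divides target expression by the geometric mean of several reference genes. Assuming roughly 100% amplification efficiency, this is equivalent to subtracting the mean reference-gene Ct in the comparative Ct (2^-ΔΔCt) calculation. The Python sketch below shows this. The Ct values and the second reference gene are hypothetical. The sketch also presumes the reference genes are stably expressed across the stages compared, which is exactly what the reviewer asks to be verified.

```python
import math
from statistics import mean

def delta_ct(ct_target, ct_refs):
    """Target Ct minus the mean Ct of the reference genes.

    Assuming ~100% PCR efficiency (quantity proportional to 2**-Ct), averaging
    the reference Ct values is the same as normalizing to the geometric mean of
    the reference genes' relative quantities.
    """
    return ct_target - mean(ct_refs)

def relative_expression(sample, calibrator):
    """Fold change of `sample` relative to `calibrator` via 2**-ddCt.

    Each argument is a (ct_target, [ct_ref_1, ct_ref_2, ...]) tuple.
    """
    ddct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** -ddct

# Hypothetical Ct values for a target transcript and two reference genes
# (e.g. rp49 plus a second, hypothetical reference gene) at two stages.
early_stage = (30.0, [18.0, 20.0])
late_stage = (26.0, [18.5, 19.5])

# Check the geometric-mean equivalence on the early-stage reference genes.
ref_quantities = [2.0 ** -ct for ct in early_stage[1]]
geo_mean = math.prod(ref_quantities) ** (1 / len(ref_quantities))
assert math.isclose(geo_mean, 2.0 ** -mean(early_stage[1]))

print(relative_expression(late_stage, early_stage))  # 16.0
```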

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work describes a new method for sequence-based remote homology detection. Such methods are essential for the annotation of uncharacterized proteins and for studies of protein evolution.

      Strengths:

      The main strength and novelty of the proposed approach lies in the idea of combining state-of-the-art sequence-based (HHpred and HMMER) and structure-based (Foldseek) homology detection methods with recent developments in the field of protein language models (the ESM2 model was used). The authors show that features extracted from high-dimensional, information-rich ESM2 sequence embeddings can be suitable for efficient use with the aforementioned tools.

      The reduced features take the form of amino acid occurrence probability matrices estimated from ESM2 masked-token predictions, or structural descriptors predicted by a modified variant of the ESM2 model. However, we believe that these should not be called "embeddings" or "representations". This is because they don't come directly from any layer of these networks, but rather from their final predictions.

      We agree that there is some room for discussion about whether the amino acid probabilities returned by pre-trained ESM-2 and the 3Di sequences returned by ESM-2 3B 3Di can be properly referred to as “embeddings”. The term “embedding” doesn’t have a formal definition, other than some kind of alternative vector representation of the input data which, preferably, makes the input data more suitable for some downstream task. In that simple sense of the word “embedding”, amino acid probabilities and 3Di sequences output by our models are, indeed, types of embeddings. We posed the question on Twitter (https://twitter.com/TrichomeDoctor/status/1715051012162220340) and nobody responded, so we are left to conclude that the community is largely ambivalent about the precise definition of “embedding”.

      We’ve added language in our introduction to make it more clear that this is our working definition of an “embedding”, and why that definition can apply to profile HMMs and 3Di sequences.

      The benchmarks presented suggest that the approach improves sensitivity even at very low sequence identities <20%. The method is also expected to be faster because it does not require the computation of multiple sequence alignments (MSAs) for profile calculation or structure prediction.

      Weaknesses:

      The benchmarking of the method is very limited and lacks comparison with other methods. Without additional benchmarks, it is impossible to say whether the proposed approach really allows remote homology detection and how much improvement the discussed method brings over tools that are currently considered state-of-the-art.

      We thank the reviewer for the comment. To address the question, we’ve expanded the results by adding a new benchmark and added a new figure, Figure 4. In this new content, we use the SCOPe40 benchmark, originally proposed in the Foldseek paper (van Kempen et al., 2023), to compare our best method, ESM-2 3B 3Di coupled to Foldseek, with several other recent methods. We find our method to be competitive with the other methods.

      We are hesitant to claim that any of our proposed methods are state-of-the-art because of the lack of a widely accepted standard benchmark for remote homology detection, and because of the rapid pace of advancement of the field in recent years, with many groups finding innovative uses of pLMs and other neural-network models for protein annotation and homology detection.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a number of exploratory applications of current protein representations for remote homology search. They first fine-tune a language model to predict structural alphabets from sequence and demonstrate using these predicted structural alphabets for fast remote homology search both on their own and by building HMM profiles from them. They also demonstrate the use of residue-level language model amino acid predicted probabilities to build HMM profiles. These three implementations are compared to traditional profile-based remote homology search.

      Strengths:

      • Predicting structural alphabets from a sequence is novel and valuable, with another approach (ProstT5) also released in the same time frame further demonstrating its application for the remote homology search task.

      • Using these new representations in established and battle-tested workflows such as MMSeqs, HMMER, and HHBlits is a great way to allow researchers to have access to the state-of-the-art methods for their task.

      • Given the exponential growth of data in a number of protein resources, approaches that allow for the preparation of searchable datasets and enable fast search are highly relevant.

      Weaknesses:

      • The authors fine-tuned ESM-2 3B to predict 3Di sequences and presented the fine-tuned model, ESM-2 3B 3Di, with a claimed accuracy of 64% on a test set of 3Di sequences derived from AlphaFold2-predicted structures. However, the description of this test set is missing, and I would expect the authors to repeat some of the benchmarking efforts described in the Foldseek manuscript, as this accuracy value is hard to interpret on its own.

      The preparation of the training and test sets is described in the Methods under the heading “Fine tuning ESM-2 3B to convert amino acid sequences into 3Di sequences”. Furthermore, our GitHub repository contains code to reproduce the splits and the entire model training process: https://github.com/seanrjohnson/esmologs#train-esm-2-3b-3di-starting-from-the-esm-2-3bpre-trained-weights

      We didn’t include the training/validation/test splits in the Zenodo repository because they are very large: 33,924,764 training, 1,884,709 validation, and 1,884,710 test sequences, each stored twice (as an amino acid sequence and as a 3Di sequence). This comes to about 30 GB in total, and the splits are easily rebuilt from the same sources we used.
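      As a quick check on these counts (our own arithmetic on the numbers above, not part of the esmologs code), they correspond to an approximately 90/5/5 train/validation/test split:

```python
# Sequence counts per split, as reported above. Each entry is stored twice
# (once as an amino acid sequence and once as a 3Di sequence).
splits = {"train": 33_924_764, "validation": 1_884_709, "test": 1_884_710}

total = sum(splits.values())
fractions = {name: n / total for name, n in splits.items()}

for name, n in splits.items():
    print(f"{name}: {n:,} sequences ({100 * fractions[name]:.1f}%)")
print(f"total: {total:,} entries, {2 * total:,} sequences on disk")
```

      This prints 90.0%, 5.0%, and 5.0% for the three splits and 75,388,366 stored sequences in total.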

      We’ve added the following sentence to the main text to clarify:

      “Training and test sets were derived from a random split of the Foldseek AlphaFold2 UniProt50 dataset (Jumper et al., 2021; van Kempen et al., 2023; Varadi et al., 2022), a reduced-redundancy subset of the UniProt AlphaFold2 structures (see Methods for details).”

      To address the concern about comparing to Foldseek using the same benchmark, we’ve expanded the Results section and added a new figure, Figure 4, which uses the SCOPe40 benchmark (originally presented in the Foldseek paper and subsequently in the ProstT5 paper) to compare Foldseek with ESM-2 3B 3Di to Foldseek with ProstT5, AlphaFold2, and experimental structures.

      • Given the availability of predicted structure data in AFDB, I would expect to see a comparison between the searches of predicted 3Di sequences and the "true" 3Di sequences derived from these predicted structures. This comparison would substantiate the innovation claimed in the manuscript, demonstrating the potential of conducting new searches solely based on sequence data on a structural database.

      See response above. We’ve now benchmarked against both ProstT5 and AF2.

      • The profile HMMs built from predicted 3Di appear to perform sub-optimally, and those from the ESM-2 3B predicted probabilities also don't seem to improve traditional HMM results significantly. The HHBlits results depicted in lines 5 and 6 in the figure are not discussed at all, and a comparison with traditional HHBlits is missing. With these results and presentation, the advantages of pLM profile-based searches are not clear, and more justification over traditional methods is needed.

      We thank the reviewer for pointing out the lack of clarity in the discussion of lines 5 and 6.

      We’ve re-written that section of the discussion, and reformatted Figure 3 to enhance clarity.

      We agree that a comparison to traditional HHBlits could be interesting, but we don’t expect the pLM-predicted profiles to outperform traditional HHBlits, just as we don’t see stronger performance from pLM-hmmscan or pLM-Foldseek than from the traditional variants. We think the main advantage of pLM-based amino acid HMM searches is speed. Many variables influence how long it takes to generate an MSA and HMM profile, but in general we expect it to be much slower than generating an HMM profile from a pLM.

      We don’t know why making profiles of 3Di sequences doesn’t improve search sensitivity, we just think it’s an interesting result that is worth presenting to the community. Perhaps someone can figure out how to make it work better.

      • Figure 3 and its associated text are hard to follow due to the abundance of colors and abbreviations used. One figure attempting to explain multiple distinct points adds to the confusion. Suggestion: Splitting the figure into two panels comparing (A) Foldseek-derived searches (lines 7-10) and (B) language-model derived searches (line 3-6) to traditional methods could enhance clarity. Different scatter markers could also help follow the plots more easily.

      We thank the reviewer for this helpful comment. We’ve reformatted Figure 3 as suggested, and we think it is much easier to read now.

      • The justification for using Foldseek without amino acids (3Di-only mode) is not clear. Its utility should be described, or it should be omitted for clarity.

      To us, the use of 3Di-only mode is of great theoretical interest. From our perspective, this is one of our most significant results. Previous methods, such as pLM-BLAST and related methods, have made use of very large positional embeddings to achieve sensitive remote homology search. We show that with the right embedding, you don’t need very many bits per position to get dramatically improved search sensitivity from Smith-Waterman, compared to amino acid searches. We also doubt that predicted 3Di sequences are the optimal small encoding for remote homology detection. This observation opens up an exciting avenue for future research in developing small, learned positional embeddings that are optimal for remote homology detection and amenable to SIMD-optimized pre-filtering and Smith-Waterman alignment steps.

      We’ve expanded the discussion, explaining why we are excited about this result.
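      To make the "few bits per position" point concrete: the Foldseek 3Di alphabet, like the amino acid alphabet, has 20 letters, i.e. about 4.3 bits per position, so 3Di strings can be fed to ordinary Smith-Waterman local alignment. The sketch below is a minimal local-alignment scorer with a linear gap penalty; the match, mismatch, and gap values are illustrative placeholders, not the 3Di substitution matrix or gap costs that Foldseek actually uses, and the example strings are arbitrary.

```python
import math


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between two sequences (linear gap penalty).

    The scores are toy placeholders; a real 3Di search would use a 20x20
    3Di substitution matrix instead of a single match/mismatch value.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Information content of one position in a 20-letter alphabet.
BITS_PER_POSITION = math.log2(20)  # about 4.32 bits

print(smith_waterman("DVVCVVP", "QDVVCVVPL"))  # arbitrary example strings
```

      Here the first string occurs intact inside the second, so the call prints 14 (seven matching positions, 2 points each).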

      • Figure 2 is not described, unclear what to read from it.

      It's just showing that ESM-2-derived amino acid probabilities closely resemble amino acid frequencies in MSAs. We think it gives readers some visual intuition about why predicted profile HMMs perform as well as they do. We’ve added some additional explanation of it in the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The paper would mainly benefit from a more comprehensive benchmark:

      We suggest that the authors extend the benchmark by including the reference methods (HHpred and Foldseek) run with their original representations, i.e., MSAs obtained with 2-3 iterations of hhblits (for HHpred) and experimental or predicted structures (for Foldseek). HHpred profile-profile comparisons and Foldseek structure-structure comparisons would be important reference points for assessing the applicability of the proposed approach in distant homology detection. It is also essential to compare the method with other emerging tools such as EBA (DOI: 10.1101/2022.12.13.520313), pLM-BLAST (DOI: 10.1101/2022.11.24.517862), DEDAL (DOI: 10.1038/s41592-022-01700-2), etc.

      We also suggest using an evolutionary-oriented database for the benchmark, such as ECOD or CATH (these databases classify protein domains with known structures, which is important in the context of including Foldseek in the benchmark). We ran a cursory benchmark using the ECOD database and generated HH-suite .hhm files (using the single_seq_to_hmm.py and hhsearch_multiple.py scripts). Precision and recall appear to be significantly lower compared to "vanilla" hhsearch runs with MSA-derived profiles. It would also be interesting to see benchmarks for speed and alignment quality.

      The pLM-based methods for homology detection are an emerging field, and it would be important to evaluate them in the context of distinguishing between homology and analogy. In particular, the predicted Foldseek representations may be more likely to capture structural similarity than homology. This could be investigated, for example, using the ECOD classification (do structurally similar proteins from different homology groups produce significant matches?) and/or resources such as MALISAM that catalog examples of analogy.

      We’ve added the SCOPe40 benchmark, which we think at least partially addresses these comments, including comparisons to pLM-BLAST, ProstT5, and AF2 followed by Foldseek. The question of analogy vs homology is an interesting one. It could be argued that the SCOPe40 benchmark addresses this through the difference between the Superfamily (distant homology) and Fold (analogy, or very distant homology) levels.

      Our focus is on remote homology detection applications rather than alignment quality, so we don’t benchmark alignment quality, although we agree that those benchmarks would be interesting.

      Page 2, lines 60-67. This paragraph would benefit from additional citations and explanations to support the superiority of the proposed approach. The fact that flattened embeddings are not suitable for annotating multidomain proteins seems obvious. Also, the claim that "current search implementations are slow compared to other methods" should be supported (tools such as EBA or pLM-BLAST have been shown to be faster than standard MSA-based methods). Also, as we mentioned in the main review, we believe that the generated pseudo-profiles and fine-tuned ESM2 predictions should not be called "smaller positional embeddings".

      Discriminating subdomains was a major limitation of the influential and widely cited PfamN paper (Bileschi et al., 2022); we’ve added a citation to that paper in that paragraph for readers interested in diving deeper.

      To address the question of speed, we’ve included data preparation and search benchmarks as part of our presentation of the SCOPe40 benchmark.

      Finally, we were not sure why exactly every 7th residue is masked in a single forward pass. Traditionally, pseudo-log likelihoods are generated by masking every single token and predicting probabilities from logits given the full context - e.g. https://arxiv.org/pdf/1910.14659.pdf. Since this procedure is crucial in the next steps of the pipeline, it would be important to either experiment with this hyperparameter or explain the logic used to choose the mask spacing.

      We’ve added discussion of the masking distance to the Methods section.
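      For readers comparing the two masking strategies, the sketch below lists which positions are masked when every 7th residue is masked per forward pass. It is an illustration, not the authors' code, and it assumes the mask is shifted by one position on each successive pass. After 7 passes every position has been masked exactly once, whereas the one-token-at-a-time pseudo-log-likelihood procedure the reviewer describes needs one pass per residue.

```python
def mask_schedule(length: int, spacing: int = 7) -> list[list[int]]:
    """Positions masked in each forward pass when every `spacing`-th residue is masked.

    Pass r masks positions r, r + spacing, r + 2 * spacing, ..., so after
    `spacing` passes every position has been masked exactly once.
    """
    return [list(range(offset, length, spacing)) for offset in range(min(spacing, length))]


schedule = mask_schedule(20)
print(len(schedule))  # 7 forward passes (masking one token at a time would need 20)
print(schedule[0])    # [0, 7, 14]
```

      The spacing trades cost against context: a smaller spacing needs fewer passes but leaves more masked neighbours around each predicted position.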

      Reviewer #2 (Recommendations For The Authors):

      • While the code and data for the benchmark are available, the generation of searchable databases using the methods described for a popular resource such as Pfam, AFDB, SCOP/CATH which can be used by the community would greatly boost the impact of this work.

      3Di sequences predicted by ESM-2 3B 3Di can easily be used as queries against any Foldseek database, such as PDB, AFDB, etc. We’ve added Figure 4E to demonstrate this possibility, and added some related discussion.

      • Minor: In line 114, the text should likely read "compare lines 7 and 8" instead of "compare lines 6 and 7."

      We’ve clarified the discussion of Figure 3.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for their tremendously helpful comments. We outline below changes we have made to the manuscript in response to each point. These include new analyses and a substantial rewrite to address the concerns about lack of clarity.

      We believe the revisions strengthen the evidence for our conclusion that grid fields can be either anchored to or independent from a task reference frame, and that anchoring is selectively associated with successful path integration-dependent behaviour. Our additional analyses of non-grid cells indicate that while some are coherent with the grid population, many are not, suggesting cell populations within the MEC may implement grid-dependent and grid-independent computations in parallel.

      We hope the reviewers will agree that our novel experimental strategy complements and avoids limitations of perturbation-based approaches, and that, by providing evidence that dissociates the two major hypotheses for whether and when grid cells contribute to behaviour, our results are likely to have a substantial impact on the field.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Clark et al. uncovered an association between the positional encoding of grid cell activity and good performance in spatial navigation tasks that require path integration, highlighting the contribution of grid firing to behaviour… The conclusions of this paper are mostly well supported by data, and the finding about the association between grid cell encoding and behaviour in spatial memory tasks is important. However, some aspects of the analysis need to be clarified or extended.

      Thank you for the overview and constructive comments.

      (1) While the current dataset aims to demonstrate a "correlation" between grid cell encoding and task performance, the other variables that could confound this correlation should be carefully examined.

      (1.1) The exact breakdown of the fraction of beaconed/non-beaconed/probe trials is never shown. If the session makeup has a significant effect on the coding scheme or other results, this variable should be accounted for.

      The lack of information about the trial organisation was a substantial oversight in our preparation of the first version of the manuscript. Session make-up cannot account for effects on grid stability and its relationship to behavioural outcome, but this was not made at all clear.

      In all sessions, trial types were varied in a fixed repeating sequence. Therefore, continuous blocks of trials on which grid firing is anchored to (or independent from) the track cannot be explained by the mouse experiencing a particular trial type. We have revised the manuscript to make this clearer, e.g. p 5, ‘These switches could not be explained by variation between trials in the availability of cues or rewards, as these were interleaved in blocks that repeated throughout a session (see Methods), whereas periods in which grid cell activity was in a given mode extended across the repeating blocks (e.g. Figures 3D,E, 4A, 5E,F).’ and methods p 12, ‘Trials were delivered in repeating blocks throughout a recording session…’

      (1.2) The manuscript did not provide information about whether individual mice experienced sessions with different combinations of the three trial types, and whether they show different preferences in position or distance encoding even in comparable sessions. This leads to the question of whether different behaviour and activity encoding were dominated by experimental or natural differences between individual mice. Presenting the data per mouse will be helpful.

      As we note above, because trial types were interleaved in a fixed sequence, experience of a particular trial type cannot account for switching between task-anchored and task-independent firing modes. This was insufficiently clear in the first version of the manuscript.

      We varied the proportions of trials of a particular type between sessions with the aim of maximising the number of non-beaconed and probe trials. This was necessary because we find that if we introduce too high a proportion of these trials early in training then mice appear to ‘lose interest’ in the task and their performance drops off. We therefore used an approach in which we increased the proportions of non-beaconed and probe trials over training days as mice became familiar with the task. This is now described in the methods (p 12).

      Because the decision for when to vary the proportion of trial types was based on the previous day’s performance, the experimental design was not optimised for addressing the reviewer’s question about dissociating experimental from natural differences in mice. To provide some initial insight we have analysed the relationship between task anchored coding and proportion of beaconed trials in a session (Figure 3, Figure Supplement 7). While on average there is a higher proportion of trials in which grid fields are task-anchored in sessions with more beaconed trials, this effect is small and most of the variance is independent from the proportion of beaconed trials.

      (1.3) Related to the above point, in Figure 5, the mice appeared to behave worse in probe trials than non-beaconed trials. If the mouse did not know if a trial is a probe or a non-beacon trial, they should behave equivalently until the reward location and thus should stop an equal amount. If this difference is because multiple probe trials are placed consecutively, did the mouse learn that it will not get a reward and then stop trying to get rewards? Did this affect switching between position and distance coding?

      Thank you for flagging this. This reflected an inconsistency arising from the way we detected stops that we have now corrected. Briefly, the temporal resolution of the processed location data against which the stop detection threshold was applied was insufficiently high. As a result, stops in the non-beaconed group were picked up, as they tended to be longer because mice remained still to consume rewards, whereas some stops in the probe group were missed because they were relatively short. We have corrected this by repeating the analyses on raw position data at the highest temporal resolution available. This analysis is now clearly described in the Methods (see p13 “A stop was registered in Blender3D if the speed of the mouse dropped below 4.7 cm/s. Speed was calculated on a rolling basis from the previous 100 ms at a rate of 60 Hz.”).
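      The stop rule quoted above can be sketched offline as follows. This is a hypothetical reimplementation in Python/NumPy, not the Blender3D code: it assumes position is recorded in centimetres, sampled uniformly at 60 Hz, and it ignores the teleport at the end of each trial.

```python
import numpy as np

SAMPLE_RATE_HZ = 60        # speed updated at 60 Hz
WINDOW_S = 0.1             # speed computed over the previous 100 ms
STOP_THRESHOLD_CM_S = 4.7  # a stop is registered below this speed


def detect_stops(position_cm) -> np.ndarray:
    """Boolean mask marking samples at which a stop would be registered."""
    position_cm = np.asarray(position_cm, dtype=float)
    lag = round(SAMPLE_RATE_HZ * WINDOW_S)  # 6 samples span 100 ms at 60 Hz
    # Samples without a full 100 ms history get infinite speed, so they never count as stops.
    speed = np.full(position_cm.shape, np.inf)
    speed[lag:] = np.abs(position_cm[lag:] - position_cm[:-lag]) / WINDOW_S
    return speed < STOP_THRESHOLD_CM_S


# Simulated mouse: runs at 30 cm/s for 1 s, then stands still for 1 s.
run = np.arange(60) * 0.5  # 0.5 cm per sample = 30 cm/s at 60 Hz
still = np.full(60, run[-1])
stops = detect_stops(np.concatenate([run, still]))
print(stops.sum(), np.argmax(stops))
```

      In this simulation the first stop is registered 6 samples (100 ms) after the mouse halts, at sample 65, and the remaining 55 stationary samples are all flagged; a longer averaging window would delay detection further and could miss brief stops.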

      (1.4) It is not shown how the behaviours (e.g., running speed away from the reward zone, licking for reward) in beaconed/non-beaconed/probe trials were different and whether the difference in behaviours led to the different encoding schemes.

      Because trial types were interleaved and repeated with a period less than the length of typical trial sequences during which grid cell activity remained either task-anchored or task-independent, differences between trial types are unlikely to explain use of the different coding schemes. Hopefully, this is clarified by the comments above.

      To further describe the relationship between behavioural outcomes, trial types and grid anchoring, we now also show running speed as a function of location for each combination of trial types and trial outcomes (Figure 6, Figure Supplement 1). This illustrates and replicates our previous findings (Tennant et al. 2018) that running speed profiles are similar for a given trial outcome regardless of trial type (Figure 6, Figure Supplement 1A), and further shows that the behavioural profile for a given trial outcome and trial type does not differ when grid cells are in task-anchored and task-independent modes (Figure 6, Figure Supplement 1B). This further argues against the possibility that differences in behaviour lead to the different encoding schemes.

      (2) Regarding the behaviour and activity encoding on a trial-by-trial basis, did the behavioural change occur first, or did the encoding switch occur first, or did they happen within the same trial? This analysis will potentially determine whether the encoding is causal for the behaviour, or the other way around.

      This is a good question but our experimental design lacks sufficient statistical power to address the timing of mode switches within a trial. This is because mode switching is relatively infrequent (so the n for switching is low) and only a subset of trials are uncued (making the relevant n even lower), while at a trial level the behavioural outcome is variable (increasing the required n for adequate power).

      (3) The authors determined that the grid cell coding schemes were limited to distance encoding and position encoding. However, there could be other schemes, such as switching between different position encodings (with clear spatial fields but at different locations), as indicated by Low et al., 2021, and switching between different distance encodings (with different distance periods). If these other schemes indeed existed in the data, they might contribute to the variation of the behaviours.

      Switching between position encoding schemes appears to be rare within our dataset and unlikely to contribute to variation in behaviour. In most sessions we did not observe switching between grid phases / position encodings (e.g. Figures 2A-B, 3B-E, 4A, 5C-D, F). In one session we found switching between different phases when grid cells were task-anchored. Because the grid period was unchanged, the spatial periodograms remained similar. We report this example in the revised manuscript (Figure 5E).

      (4) The percentage of neurons categorised in each coding scheme was similar between non-grid and grid cells. This implies that non-grid cells might switch coding schemes in sync with grid cells, which would mean the whole MEC network was switching between distance and position coding. This raises the question of whether the grid cell coding scheme was important per se, or just the MEC network coding scheme.

      We very much appreciate this suggestion. We note first that while the proportion of task-anchored grid and non-grid cells is similar, task-independent periodic firing of non-grid cells is much rarer than for grid cells (Figure 2E), suggesting a dissociation between the populations. To further address the question we have included additional analyses of non-grid cells (Figure 3, Figure Supplement 5). This shows that while some non-grid cells have anchoring that switches coherently with simultaneously recorded grid cells, others do not. Figures 4 and 5 now show examples of non-grid cell activity recorded simultaneously with grid cells.

      Together, our data suggest that the MEC implements multiple coding schemes: one that is associated with the grid network and includes some non-grid cells; and one (or more) that can be independent from the grid network. This dissociation adds to the insights into MEC function that are provided by our study and is now highlighted in the abstract and discussion.

      (5) In Figure 2 there are several cell examples that are categorised as distance or position coding but have a high fraction of the other coding scheme on a per-trial basis. Given this variation, the full session data in F should be interpreted carefully, since this included all cells and not just "stable" coding cells. It will be cleaner to show the activity comparison only between the stable cells.

      We have now included examples in Figure 2A-C where the grid mode is stable throughout a session. As the view of activity at a session level is important, we have not updated Figure 2F, but have clarified the terminology to now clearly refer to classification at either session or trial levels. In addition, we have repeated the analyses shown in Figure 2F but after grouping cells according to whether their firing has a single mode on >85% of the trials (Figure 3 Figure Supplement 4). This analysis supports similar conclusions to those of Figure 2F.

      (6) The manuscript is not well written. Throughout the manuscript, there are many unexplained concepts (especially in the introduction) and methods, mis-referenced figures, and unclear labels.

      We very much appreciate the feedback and have substantially rewritten the manuscript. We have paid particular attention to explaining key concepts in the introduction and have carefully checked the figures. We welcome further feedback on whether this is now clearer.

      Reviewer #2 (Public Review):

      Clark and Nolan's study aims to test whether the stability of grid cell firing fields is associated with better spatial behaviour performance on a virtual task… This study is very timely as there is a pressing need to identify/delimitate the contribution of grid cells to spatial behaviours. More studies in which grid cell activity can be associated with navigational abilities are needed.

      Thank you for the supportive comments and highlighting the importance of the question.

      The link proposed by Clark and Nolan between "virtual position" coding by grid cells and navigational performance is a significant step toward better understanding how grid cell activity might support behaviour. It should be noted that the study by Clark and Nolan is correlative. Therefore, the effect of selective manipulations of grid cell activity on the virtual task will be needed to evaluate whether the activity of grid cells is causally linked to the behavioural performance on this task. In a previous study by the same research group, it was shown that inactivating the synaptic output of stellate cells of the medial entorhinal cortex affected mice's performance of the same virtual task (Tennant et al., 2018). Although this manipulation likely affects non-grid cells, it is still one of the most selective manipulations of grid cells that are currently available.

      Again, thank you for the supportive comments. We recognise the previous version of the manuscript did not sufficiently clarify the motivation for our approach, or the benefits of capitalising on behavioural variability as a complementary strategy to perturbation approaches. We now make this clearer in the revised introduction (p 2, paragraphs 2 and 3).

      When interpreting the "position" and "distance" firing mode of grid cells, it is important to appreciate that the "position" code likely involves estimating distance. The visual cues on the virtual track appear to provide mainly optic flow to the animal. Thus, the animal has to estimate its position on the virtual track by estimating the distance run from the beginning of the track (or any other point in the virtual world).

      We appreciate that the ambiguity here was confusing. We have renamed the groups ‘task-anchored’, corresponding to when grid cells encode position on the track (as well as distance, as the reviewer correctly points out), and ‘task-independent’, corresponding to the group we previously referred to as distance encoding.

      It is also interesting to consider how grid cells could remain anchored to virtual cues. Recent work shows that grid cell activity spans the surface of a torus (Gardner et al., 2022). A run on the track can be mapped to a trajectory on the torus. Assuming that grid cell activity is updated primarily from self-motion cues on the track and that the virtual track length is unlikely to be an integer multiple of the grid period, having stable firing fields on the virtual track likely requires a resetting mechanism taking place on each trial. The resetting means that a specific virtual track position is mapped to a constant position on the torus. Thus, the "virtual position" mode of grid cells may involve 1) a trial-by-trial resetting process anchoring the grid pattern to the virtual cues and 2) a path integration mechanism. Just like the "virtual position" mode of grid cell activity, successful behavioural performance on non-beaconed trials requires the animal to anchor its spatial behaviour to VR cues.
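      The reviewer's resetting argument can be illustrated with a deliberately simplified one-dimensional model. This is an illustration only: it assumes noise-free path integration that continues across the teleport between trials, and the track length and grid periods below are hypothetical values. Unless the track length is an integer multiple of the grid period, the grid phase at the start of the track drifts from trial to trial, so fields stay aligned to the track only if the phase is reset on every trial.

```python
def start_phase(trial: int, track_length_cm: float, grid_period_cm: float, reset: bool = False) -> float:
    """Grid phase (fraction of one period, in [0, 1)) at the start of a 0-indexed trial.

    Without a reset, the phase advances by track_length / grid_period cycles per lap;
    with a reset, every trial starts at the same phase (taken here as 0).
    """
    if reset:
        return 0.0
    return (trial * track_length_cm / grid_period_cm) % 1.0


# Hypothetical 200 cm track with a 70 cm grid period: the start phase shifts by 6/7 of a cycle per lap.
print([round(start_phase(t, 200, 70), 3) for t in range(4)])
# With a reset, or a track length that is an integer multiple of the period, the phase is constant.
print([start_phase(t, 200, 70, reset=True) for t in range(4)])
print([start_phase(t, 210, 70) for t in range(4)])
```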

      Reviewer #3 (Public Review):

      This study addresses the major question of 'whether and when grid cells contribute to behaviour'. There is no doubt that this is a very important question. My major concern is that I'm not convinced that this study gives a significant contribution to this question, although this study is well-performed and potentially interesting. This is mainly due to the fact that the relation between grid cell properties and behaviour is exclusively correlative and entirely based on single cell activity, although the introduction mentions quite often the grid cell network properties and dynamics. In general, this study gives the impression that grid cells exclusively support the cognitive processes involved in this task. This problem is in part related to the text.

      Thank you for the comments. We recognise now that the previous text was insufficiently clear. We have modified the introduction to clarify the value of an approach that takes advantage of behavioural variability. Importantly, this approach is complementary to perturbation strategies we and others have used previously. In particular, it addresses critical limitations of perturbation strategies, which can be confounded by off-target effects and possible adaptation, both of which are extremely difficult to fully rule out. We hope that with this additional clarification it is now clear, first, that as for any important question, multiple and complementary testing strategies are required to make progress, and second, that our study makes a new and important contribution by introducing a novel experimental approach and by following this up with careful analyses that clearly distinguish competing hypotheses.

      However, it would be interesting to look at the population level (even beyond grid cells) to test whether at the network level, the link between behavioural performance and neural activity is more straightforward compared to the single-cell level. This approach could reconcile the present results with those obtained in their previous study following MEC inactivation.

      We’re unclear here about what the reviewer means by ‘more straightforward’ as clear relationships between activity of single grid cells and populations of grid cells are well established (Gardner et al., 2021; Waaga et al., 2021; Yoon et al., 2013).

      To give a clearer indication of the corresponding population level representations, as mentioned in response to Reviewer #1, we now include additional data showing many simultaneously recorded neurons, and analyses of non-grid as well as grid cells (Figures 4, 5, Figure 5 Figure Supplement 2).

      To reconcile results with our previous study of MEC inactivation we have paid additional attention to the roles of non-grid cells (following suggestions by Reviewer #1). We show that while some non-grid cells show transitions between task-anchored and task-independent firing that are coherent with the grid population, many others have more stable firing that is independent of grid representations. This is consistent with the idea that the MEC supports localised behaviour in the cued and uncued versions of the task (Tennant et al., 2018), and suggests that while grid cells preferentially contribute when cues are absent, non-grid cells could also support the cued version. We make this additional implication clear in the revised abstract and discussion.

      The authors used a statistical method based on the computation of the frequency spectrum of the spatial periodicity of the neural firing to classify grid cells as 'position-coding' (with fields anchored to the virtual track) and 'distance-coding' (with fields repeating at regular intervals across trials). This is an interesting approach, but it has the drawback of being based exclusively on autocorrelograms. It would be interesting to compare it with a different method based on the similarities between raw maps.

      While our main analyses use a periodogram-based method to identify when grid cells are / are not anchored to the task environment, we validate these analyses by examination of the rate maps in each condition (Figures 2-4). For example, when grid cells are task-anchored, according to the periodogram analysis, the rate maps clearly show spatially aligned peaks, whereas when grid cells are not anchored the peaks in their rate maps are not aligned (Figure 2A vs 2B; Figure 3B-E; Figure 4C). We provide further validation by showing that spatial information (in the track reference frame) is substantially higher when grid cell activity is task-anchored vs task-independent (Figures 2F, 3G, 4F and Figure 3 Figure Supplement 4).
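      To illustrate the logic of a periodogram-based classification, the sketch below takes a plain FFT of the firing rate concatenated across trials and expresses the peak frequency in cycles per track length. Fields that recur at the same track positions on every trial give a peak at an integer number of cycles per track, whereas periodic firing that is not anchored to the track generally does not. This is a simplified, hypothetical version: the authors' actual periodogram, binning, and significance criteria are not reproduced here, and the 0.05 tolerance and the simulated cells are illustrative assumptions.

```python
import numpy as np


def peak_cycles_per_track(rate: np.ndarray, bin_cm: float, track_length_cm: float) -> float:
    """Frequency (cycles per track length) with the most power in a rate vector
    concatenated across consecutive trials."""
    power = np.abs(np.fft.rfft(rate - rate.mean())) ** 2
    freqs_per_cm = np.fft.rfftfreq(rate.size, d=bin_cm)
    return float(freqs_per_cm[np.argmax(power)] * track_length_cm)


def looks_task_anchored(peak: float, tolerance: float = 0.05) -> bool:
    """Hypothetical rule: anchored if the peak lies close to an integer number of cycles per track."""
    return abs(peak - round(peak)) < tolerance


# Simulated cells: 10 trials on a hypothetical 200 cm track in 1 cm bins.
track_cm, n_trials = 200, 10
x = np.arange(track_cm * n_trials, dtype=float)
anchored = 1 + np.cos(2 * np.pi * x / (track_cm / 3))  # 3 fields per trial, same positions every trial
independent = 1 + np.cos(2 * np.pi * x / 70)           # 70 cm period, drifts relative to the track

for name, rate in [("anchored", anchored), ("independent", independent)]:
    peak = peak_cycles_per_track(rate, bin_cm=1.0, track_length_cm=track_cm)
    print(name, round(peak, 2), looks_task_anchored(peak))
```

      The anchored cell peaks at 3 cycles per track, while the 70 cm cell peaks near 2.9 cycles per track and is not classified as anchored. With only 10 trials the frequency resolution is 0.1 cycles per track, which is why the tolerance must be smaller than that spacing.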

      To further address this point, we have carried out additional complementary analyses in which we identify task-anchored vs task-independent modes using a template matching method applied to the raw rate maps (Figure 6, Figure Supplement 2). These analyses lead to conclusions similar to those of our periodogram-based analyses.
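      The response does not spell out the template-matching procedure. One plausible version, sketched here purely as an assumption, correlates each trial's raw rate map with a leave-one-out template built from the remaining trials and calls a trial task-anchored when the Pearson correlation reaches a hypothetical `threshold`. The names `template_match` and `field` are illustrative.

```python
import numpy as np


def template_match(rate_by_trial, threshold=0.5):
    """Hypothetical sketch: label each trial by correlating its rate map with a
    leave-one-out template (the mean rate map of all other trials)."""
    rates = np.asarray(rate_by_trial, dtype=float)
    labels = []
    for t in range(rates.shape[0]):
        template = np.delete(rates, t, axis=0).mean(axis=0)
        r = np.corrcoef(rates[t], template)[0, 1]   # Pearson correlation
        # A trial whose fields sit where they usually do matches the template.
        labels.append("task-anchored" if r >= threshold else "task-independent")
    return labels


def field(centre, x=np.arange(200) / 200, width=0.03):
    """Gaussian firing field centred at `centre` (in track lengths)."""
    return np.exp(-((x - centre) ** 2) / (2 * width ** 2))


# 20 trials: the field sits at 0.3 track lengths except on trial 7, where it
# is displaced to 0.8 (as if the grid had lost its anchoring on that trial).
rate_maps = np.array([field(0.8) if t == 7 else field(0.3) for t in range(20)])
print(template_match(rate_maps)[6:9])
# ['task-anchored', 'task-independent', 'task-anchored']
```

Leaving the current trial out of its own template avoids inflating its correlation. A trial with no spikes gives an undefined (NaN) correlation, which this sketch labels task-independent.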

      Beyond this minor point, cell categorization is performed using all trial types.

      Each trial type (i.e. beacon or non-beacon) is supposed to force mice to use different strategies and should induce different spatial representations within the entorhinal-hippocampal circuit (and not only in the grid cell system). In that context, since all trials are mixed, it is difficult to extrapolate general information.

      We recognise that the description of the task design was insufficiently clear but are unsure why ‘it is difficult to extrapolate general information’. Before addressing this point, we should first be clear that mice are not ‘forced’ to adopt any particular strategy. Rather, on uncued trials a path integration strategy is the most efficient way to solve the task. However, mice could instead use a less efficient strategy; for example, by stopping at short intervals they could still obtain rewards. Detailed behavioural analyses indicate that such random stopping strategies are used by naive mice, while with training mice learn to use spatial stopping strategies (Tennant et al. 2018).

      In terms of ‘extracting general information’ from the task, the following findings lead to general predictions: 1) Grid cells can exist in either task-anchored or task-independent periodic firing modes; 2) These modes can be stable across a session, but mode switching often occurs within a session; 3) While some non-grid cells show task-independent periodic firing, this is much less common than for grid cells, which suggests a model in which many non-grid MEC neurons operate independently from the grid network; 4) When a marker cue is available, mice locate a reward equally well when grid cells are in task-anchored versus task-independent modes, which argues against theories in which grid cells are a key part of a general system for localisation; 5) When marker cues are absent, task-anchored grid firing is associated with successful reward localisation, which corroborates a key prediction of theories in which grid cells contribute to path integration.

      In revising the manuscript we have attempted to improve the writing to make these advances clearer, and have clarified methodological details that made interpretation more challenging than it should have been. For example, as noted in our response to Reviewer #1, we have included additional details to clarify the organisation of trials and relationships between trials, behavioural outcomes and neural codes observed.

      On page 5 the authors state that 'Since only position representations should reliably predict the reward location, ..., we reasoned that the presence of positional coding could be used to assess whether grid firing contributes to the ongoing behaviour'. I do not agree with this statement. First of all, position coding should be more informative only in a cue-guided trial. Second, distance coding could be as informative as position coding, since at the network level it may provide information relevant to the task (such as distance from the reward).

      Again, this point perhaps reflects a lack of clarity on our part in writing the manuscript. When grid cells are anchored to the track reference frame (now called ‘task-anchored’, previously ‘position encoding’), the locations of the rate peaks in grid firing are reliable from trial to trial. This is the case whether or not the trial is cued. When grid cells are independent of the track reference frame (now called ‘task-independent’, previously ‘distance encoding’), the locations of the firing rate peaks vary from trial to trial. In the latter case, position cannot be read out directly from trial to trial.

      In principle, in the task-independent mode track position could be calculated by storing the grid network configuration at the start of the track, which would differ on each trial, and then implementing a mechanism to read out relative distance as mice move along the track. However, if mice did use this computation, we would expect them to do so equally well on cued and uncued trials. By contrast, our results clearly show a dissociation between trial types in the relationship between grid firing and behavioural outcome. We highlight and discuss this possibility in the revised manuscript (p 10, ‘Alternatively, mice could in principle estimate track location with a system that utilises information about distance travelled obtained from task-independent grid representations’).

      Third, position-coding is interpreted as more relevant because it predominates in correct trials. However, this does not imply that this coding scheme is indeed used to perform correct trials.

      We have revised the manuscript to clarify our goal of distinguishing between major hypotheses for the roles of grid cells in behaviour (Introduction, ‘On the one hand, theoretical arguments that grid cell populations can generate high capacity codes imply that they could in principle contribute to all spatial behaviours (Fiete et al., 2008; Mathis et al., 2012; Sreenivasan and Fiete, 2011). On the other hand, if the behavioural importance of grid cells follows from their hypothesised ability to generate position representations by integrating self-motion signals (McNaughton et al., 2006), then their behavioural roles may be restricted to tasks that involve path integration strategies.’).

      By showing that performance on cued trials is similar regardless of whether grid cells are task-anchored or not, we provide strong evidence against the idea that grid firing is in general necessary for location-based behaviours. By showing that task anchoring is associated with successful localisation when cues are absent we corroborate a key prediction of hypothesised roles for grid cells in path integration-dependent behaviour. Therefore, we substantially reduce the space of behaviours to which grid cells might contribute. Importantly, this space is much larger for the MEC, which is required for cued and uncued versions of the task. We have revised the introduction and discussion to make these points clearer.

      While we believe our results add a key piece of evidence to the puzzle of when and where grid cells contribute to behaviour, we agree that further work will be required to develop and test more refined hypotheses. Alternative models also remain plausible, for example perhaps the behaviourally relevant computations are implemented elsewhere in the brain with grid anchoring to the track as an indirect consequence. Nevertheless, explanations of this kind are more difficult to reconcile with evidence that inactivation of stellate cells in the MEC impairs learning of the task, and other manipulations that modify grid firing impair performance on similar tasks. We now discuss these possibilities (discussion p 10, ‘mice could in principle estimate track location with a system that utilises information about distance travelled obtained from task-independent grid representations’).

      It could be more informative to push forward the correlative analysis by looking at whether behavioural performance can be predicted by the coding scheme on a trial-by-trial basis.

      The previous version of the manuscript included these analyses (now in Figure 6). They show that task-anchored grid firing predicts more successful performance on uncued trials at the session level (Figure 6A-B) and at the trial level (Figure 6C-D).

      Reviewer #1 (Recommendations For The Authors):

      (1) The author particularly mentioned that the 1D tracks are different from the "cue-rich environments that are typically used to study grid cells". It is not clear what conclusions would hold for a cue-rich environment or a track, which may require relatively less path integration compared to the cue-sparse environment. This point should be discussed.

      This is an important point that we did not pay sufficient attention to in the previous version of the manuscript. Our finding of successful localisation in the cued environment when grid cells are not task-anchored implies that grid anchoring is not required to solve cued tasks. It follows that cue-rich environments may not be the most suitable for investigating the roles of grid cells in behaviour, as non-grid mechanisms may suffice, although this does not rule out the possibility that anchored grid codes play important roles in learning about cue-rich environments. We now address this point in the discussion (p 10, ‘An implication of this result is that cue rich tracks often used to investigate grid activity patterns may not engage behaviours that require anchored grid firing.’).

      (2) It would be good to see the statistics for the number of different cells (stable position or distance encoding, and unstable cells) identified per mouse/session and the number of grid cells per session.

      These are now added to Supplemental Data 2 and will also be accessible through code and datasets that we will make available alongside the version of record.

      (3) Figure 2F: any explanation about why AG cells had high spatial information?

      Previously, the calculation used bits per spike; because aperiodic cells have low firing rates, their spatial information was high. We have replaced this with bits per second, which provides a more intuitive measure and no longer implies high spatial information for these cells. We have amended this in the methods (p 15, ‘Spatial information was calculated in bits per second…’).
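      The effect of this change can be illustrated with the widely used Skaggs spatial information measure, which we assume here is the form used; the exact definition is in the revised methods (p 15). With occupancy probabilities p_i, bin firing rates λ_i and mean rate λ̄ = Σ p_i λ_i, the information rate is I = Σ p_i λ_i log2(λ_i / λ̄) bits per second, and bits per spike is I / λ̄. Because bits per spike divides by the mean rate, a sparsely firing cell can score highly while conveying little information per unit time. The function name and the two toy rate maps below are hypothetical.

```python
import numpy as np


def spatial_information(rate_map, occupancy):
    """Skaggs-style spatial information; returns (bits per second, bits per spike).

    occupancy is the time spent in each bin and is normalised to probabilities.
    """
    rate = np.asarray(rate_map, dtype=float)
    p = np.asarray(occupancy, dtype=float)
    p = p / p.sum()
    mean_rate = np.sum(p * rate)                  # overall mean firing rate
    fires = rate > 0                              # bins with zero rate contribute 0
    bits_per_sec = np.sum(p[fires] * rate[fires] * np.log2(rate[fires] / mean_rate))
    return bits_per_sec, bits_per_sec / mean_rate


occupancy = np.ones(10)                           # uniform occupancy over 10 bins
sparse = np.r_[1.0, np.zeros(9)]                  # 1 Hz in one bin, silent elsewhere
dense = np.r_[20.0, np.full(9, 10.0)]             # 20 Hz in one bin, 10 Hz elsewhere
for name, rate_map in [("sparse", sparse), ("dense", dense)]:
    bps, bpspk = spatial_information(rate_map, occupancy)
    print(f"{name}: {bps:.3f} bits/s, {bpspk:.3f} bits/spike")
# sparse: 0.332 bits/s, 3.322 bits/spike
# dense: 0.487 bits/s, 0.044 bits/spike
```

The sparse cell wins on bits per spike but loses on bits per second, which mirrors the behaviour of the aperiodic cells described above.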

      (4) The following methods sections should provide additional details:

      (4.1) Details of the training protocol are largely left to reference papers. The reference papers give a general outline of the training protocol, but the details are not completely comparable given the single experiment performed on these mice. More details should be given on training stages and experience at the time of the experiment.

      The task is more clearly described in the introduction (p 3), and additional details of the training protocol are now provided in the methods (p 12-13).

      (4.2) The methods reference mean speed across sessions, but it is not clear where this was used.

      This was very poor wording. We have now changed this to ‘For each session the mean speed was calculated for each trial outcome’.

      (4.3) The calculation of the spatial autocorrelogram on a per-trial basis should be more explicitly stated. Is it the average of each 10 cm increment with the centre trial?

      We have added additional information to the methods (p 16-18).

      (4.4) 1D field detection is not sufficiently explained in Figure 1/S2. This information should also appear in the methods section.

      This is now clarified on page 16 in section ‘Analysis of neural activity and behaviour during the location memory task’.

      (5) The data in Figure 4A and B only shows speed vs. location for one example mouse. The combined per mouse or per session data should also be shown.

      This is now shown in Figure 5A and Figure 5, Figure Supplement 2.

      (6) Figure 5 is somewhat confusing. Why are A/B by session and C/D by trial? The methods imply that A/B are originally averaged by cell, but that duplicate cells in the same session are excluded because behaviour versus session type is identical. This method should be valid if all grid cells within a session are all "stable". This is likely given the synchrony of code-switching between grid cells, but not all co-active grid cells behaved identically.

      It is understandable that C/D are performed by trial, but it should be made clear that it is not a comparable analysis to A/B. It is unclear what N refers to in C. The figure says by trial, but the legend says the error bar is by cell. If data is calculated by trial and then averaged by cell, this should be more clearly stated.

      In Figure 6A/B (previously Figure 5A/B) we focus our analysis on sessions in which the mode of grid firing, either task-anchored or task-independent, was relatively stable on a trial-to-trial basis (see Figure 3F for definitions). This enables us to compare behaviour averaged across each session, with sessions categorised as task-anchored or task-independent. This analysis has the advantage that it focuses on large blocks of time (whole sessions) in which the mode of grid firing is unambiguous, but the disadvantage that it excludes many sessions in which grid firing switches between task-anchored and task-independent modes.

      Figure 6C/D (previously Figure 5C/D) addresses this limitation by carrying out similar analyses with behaviour sorted into task-anchored versus task-independent groups at the level of trials. A potential limitation of this analysis is that grid firing is somewhat variable on a trial-by-trial basis, so some trials may be misclassified. We don’t expect this to lead to systematic bias, but it may make the data noisier. Nevertheless, these analyses are important to include as they allow assessment of whether conclusions from 6A/B hold when all sessions are considered.

      We have added further clarification of the rationale for these analyses to the main text (p 7-8, ‘We addressed this by using additional trial-level comparisons’). We have also clarified in the methods section how trials were categorised as task-anchored or task-independent when multiple grid cells were recorded simultaneously (p 17, ‘When assigning a common classification across a group of cells recorded simultaneously...’), and added an explanation of N to the figure legend. We also clarify that the analyses use a nested random effects design to account for dependencies at the levels of sessions and mice (methods, p 20, ‘Random effects had a nested structure to account for animals and sessions…’).

      (7) Panels E and F of Figure 5 are not explained in the main text.

      This is now corrected (see p8, ‘Additional analyses…’).

      (8) Figure 5: Since stable grid cells and all grid cells are shown, it will be better to show unstable cells, which can be compared with grid cells.

      Given that the rationale for differences between Figure 6A/B and C/D (previously Figure 5A-D) was not previously clear, the reason for focussing on stable grid cells here was likely also unclear (see point 6 above). We don’t show unstable grid cells in Figure 6A-B because behaviour averaged at the level of a session would mix trials in which they are task-anchored with trials in which they are task-independent. Therefore, the analysis would not test predictions about the relationship between task-anchored vs task-independent modes and behaviour. We hope this is now clear in the manuscript given the revisions introduced to address point 6 above.

      (9) The methods describing the statistics for these experiments are also confusing. The methods section should be written more clearly, and it should be made clear in the text or figure legend whether this data is the "original" data or is processed in relation to the model, such as excluding duplicate grid cells within a session. The figure legend should also state that a GLMM was used to calculate the statistics.

      We have revised the methods section with the goal of improving clarity, adding detail and removing ambiguity. This includes updates of the methods for the GLMM analysis, which are referred to within the Figure 6 legend. A clear definition of a stable session is now also added to the Figure 6 legend.

      Reviewer #2 (Recommendations For The Authors):

      When grid fields are anchored to the virtual world (position mode), there is probably small trial-to-trial variability in the location of the firing fields. Is this trial-to-trial variability related to the variability in the stop location? This would provide a more direct link between path integration in grid cell networks and behaviour that depends on path integration.

      When attempting to address this we find that the firing of individual grid cells is too variable to allow sufficiently precise decoding of their fields at a single trial level. This is expected given the Poisson statistics of spike generation and previous evaluations of grid coding (e.g. Stemmler et al., 2015).

      The conclusion of the abstract is: "Our results suggest that positional anchoring of grid firing enhances the performance of tasks that require path integration." This statement is slightly confusing. The task requires 1) anchoring the behaviour to the visual cues presented at the start of the trial and 2) path integration from thereon to identify the rewarded location. The performance is higher when grid cells anchor to the visual cues presented at the start of the trial. What the results show is that the anchoring of grid firing fields to visual landmarks enhances the performance of tasks that require path integration from visual landmarks (i.e. grid cells being anchored to the reference frame that is behaviorally relevant).

      To try to more clearly explain the logic and conclusion we have rewritten the abstract, including the final sentence.

      Similar comment for the title of Figure 5: "Positional grid coding is not required for cued spatial localisation but promotes path integration-dependent localisation." Positional coding means that grid cells are anchored to the behaviorally relevant reference frame.

      To address the lack of clarity we have modified the title of Figure 6 (previously Figure 5) to read ‘Anchoring of grid firing to the task reference frame promotes localisation by path integration but is not required for cued localisation’.

      In Figure 1, there is a wide range of beaconed (40-80%) and non-beaconed (10-60%) trials given. It is not 100% clear whether these refer to the percentage of trials of a given type within the recording sessions. Was the proportion of non-beaconed trials manipulated? If so, did the likelihood of position and distance coding change according to the percentage of non-beaconed trials?

      The ranges given refer to proportions across different behavioural sessions. Within any given behavioural session the proportion was constant. We now make this clear in the figure legend and in the results and methods sections.

      We did not manipulate proportions of trial types within a session. Manipulations between sessions were carried out with the goal of maximising the number of uncued trials that the mice would carry out (see response to public comments above). While the effect of trial type at the session level is not relevant to the hypotheses we aim to test here, we have included an additional analysis of the relationship between task anchoring and the proportions of trial types in a session (Figure 3, Figure Supplement 7; also discussed above). As disentangling the effects of learning and motivation will be complex and likely require new experimental designs, we have not drawn strong conclusions or pursued the analysis further.

      I was not convinced that the labels "position" and "distance" were appropriate for the two grid cell firing modes. My understanding is that the "position" code also requires the grid cell network to estimate distance. It seems that the main difference between the "position" and "distance" modes is that when in the "position" mode, the activity on the torus is reset to a constant toroidal location when the animal reaches a clearly identifiable location on the virtual track. In the "distance" mode, this resetting does not take place.

      As previously mentioned, we agree these terms weren’t the best and have since relabelled these as “task-anchored” and “task-independent”.

      There are a few sections in the manuscript that implicitly suggest that a causal link between grid cell activity and behaviour was demonstrated. For instance: "It has been challenging to directly test whether and when grid cells contribute to behaviour.": The assumption here is that the manuscript overcomes this challenge, but the study is correlative.

      We have modified the wording to be clear that we are introducing new tests of predictions made by hypotheses about causal relationships between grid coding and behaviour (introduction, p 1-2). We also clarify that our results argue against the hypothesis that grid cells provide a general code for behaviour, but corroborate predictions of hypotheses in which they are specifically important for path integration (discussion, p 10).

      We have modified the title, abstract and main text to try to treat claims about causality with care. We now more thoroughly introduce and contrast the approach we report here with previous experiments that use perturbations (introduction, p 2). While it is tempting to make stronger claims for causality with these approaches, there are also logical limitations with perturbation-based approaches, for example the challenges of fully excluding off-target effects and adaptation. We now explain how these strategies are complementary. Our view is that both strategies will be required to develop strong arguments for whether and when grid cells contribute to behaviour. From this perspective, it is encouraging that our conclusions are in agreement with what are probably the most specific perturbations of grid cells reported to date (Gil et al. 2017), while perturbations that more generally affect MEC function appear to impair cued and path integration-dependent behaviours (Tennant et al. 2018). We now discuss these points more clearly (introduction, p 2).

      I am slightly confused by the references to the panels in Figure 4.

      "In some sessions, localization of the reward occurred almost exclusively when grid cells were anchored to position and not when they encoded distance (Figure 4C). Figure 4C only shows position coding.

      "In other sessions, animals localised the reward when grid firing was anchored to position or distance, but overall performance was improved on positional trials (Figure 4D-E)." The reference should probably point to Figure 4E-F or just to 4E.

      "In a few sessions, we observed spatial stopping behaviour comparable to cued trials, even when grid firing almost exclusively encoded distance rather than position (Figure 4F)." From Figure 4F, it seems that the performance on non-beaconed trials is better during "position" coding.

      We have now updated Figure 5 (Figure 4 in the original manuscript) and references to the Figure in the text. Now Figure 5 shows the activity of cells recorded in stable and unstable task-anchored and task-independent sessions (see Figure 5C-F).

      Minor issues:

      Is this correct: (Figure 4A and Figure 4, Figure Supplement 1).

      This has been corrected.

      Figure 4B: There could be an additional label for position and distance.

      Figure 4B from the original manuscript has now been removed.

      Figure 4C-F. The panels on the right side should be explained in the Figure Legend.

      Legends for Figure 5C-F (previously Figure 4C-F) have now been updated.

      Reviewer #3 (Recommendations For The Authors):

      Specific questions:

      (1) Position coding reflects a coding scheme in which fields are spaced by a fixed distance; previous studies have shown that a virtual track grid map is a slice of the 2D classic grid. In that case, the fields are still anchored to the track but would produce a completely different map. Did the authors check whether it is the case at least for some cells? If not, what could explain such a major difference?

      To avoid confusion we now use the term ‘task-anchored’ rather than ‘position coding’ (see comments above). We should further clarify that our conclusions rest on whether or not the grid fields are anchored to the track. Task-anchored firing does not require that grid fields maintain their spacing from 2D environments, only that fields are at the same track position on each trial. Thus, whether the spacing of the fields corresponds to a slice through a 2D grid makes no difference to the hypotheses we test here.

      We agree that the relationship between 1D and 2D field organisation could be an interesting future direction, for example anchoring could involve resetting the grid phase while maintaining a stable period, or it could be achieved through local distortions in the grid period. However, since these outcomes would not help distinguish the hypotheses we test here we have not included analyses to address them.

      (2) Previous studies have highlighted the role of grid cells in goal coding. Here there is an explicit reward in a particular area. Are there any grid modifications around this area? This question is not addressed in this study.

      Again, we note that the hypotheses we test here relate to the firing mode of grid cells (task-anchored or task-independent), and interpretation of our results is independent of the specific pattern of grid fields on the track. This question nevertheless leads to an interesting prediction: if grid fields cluster in the goal area, then this clustering should be apparent in the task-anchored but not the task-independent firing mode.

      We test this by considering the average distribution of firing fields across all grid cells in each firing mode (Reviewer Figure 1). We find that when grid firing is task-anchored there is a clear peak around the reward zone, which is consistent with previous work by Butler et al. and Boccara et al. Consistent with our other prediction, this peak is reduced when grid cells are in the task-independent mode.

      Author response image 1.

      Plot shows the grid field distribution during stable grid cell sessions (>85% task-anchored or task-independent) (A) or during task-anchored and task-independent trials (B). Shaded regions in A and B represent the standard error of the mean measured across sessions and epochs, respectively.

      (3) The behavioural procedure during recording is not fully explained. Do trial types alternate within the same session by blocks? How many trials are within a block? Is there any relation between trial alternation and the switch in the coding scheme observed in a large subset of the grid cells?

      We agree this wasn’t sufficiently clear in the previous version of the manuscript. Trial types were interleaved in a fixed order within each session. We have updated the results and methods sections to provide details (see responses above).

      (4) From the examples in Figure 2 it seems that firing fields tend to shift toward the start position. Is it the case in all cells? Could this reflect some reorganisation at the network level with cells signalling the starting as time progresses?

      This is not consistent across cells. To make this variability clear, we have included additional examples of spiking profiles from different grid cells (Figures 2-5). Because, so far as we can tell, quantifying this phenomenon would not help distinguish our core hypotheses, we have not included further analyses here.

      (5) Are grid cells with different coding properties recorded in different parts of the MEC? Are there any differences between these cell categories in the 2D map?

      The recordings we made are from the dorsal region of the MEC (stated at the start of the results section). We don’t have data to speak to other parts of the MEC.

      Minor:

      There are very few grid cell examples that repeat in the different figures. I would suggest showing more examples both in the main text and supplementary material.

      We have now provided multiple additional examples in Figures 2, 4 and 5. Grid cell examples repeat twice in the main figures, in both cases only when additional examples are shown from the same recording session (Figure 2A example #1 with Figure 5C; Figure 3E with Figure 4A). Similar repeats are found in the supplemental figures (Figure 3D with Figure 5, Figure Supplement 2A; Figure 3C with Figure 5, Figure Supplement 2F).

      Fig1 A-B shows the predictions in a 1D track based on distance or position coding. The A inset represents the modification of field distribution from a 2D arena to a 1D track, as performed in this study. The inset B is misleading since it represents the modifications expected from a circular track to a 1D track as in Jacob et al 2019, that is not what the authors studied. It would be better to present either the predictions based on the present study or the prediction based on previous studies. In that case, they should mention the possibility that the 1D map is a slice of the 2D map.

      The goal of Figure 1A-B is to illustrate predictions (right) based on conclusions from previous studies (left). Figure 1A shows predicted 1D track firing given anchoring to the environment typically observed in grid cell studies in 2D arenas. Figure 1B shows predicted 1D track firing given the shifting firing patterns observed by Jacob et al. in a circular 2D track. To improve clarity, we have modified the legend to make clear that the schematics to the right are predictions given the previous evidence summarised to the left. As we outline above, the critical prediction relates to whether the representations anchor to the track. Whether the 1D representation is a perfect slice isn’t relevant to the hypotheses tested and so isn’t included in the schematic (see comments above).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study is valuable as it sheds light on the pivotal role played by alterations in glycan metabolism within chondrocytes in the onset of cartilage degeneration and early onset of osteoarthritis (OA) through the process of hypertrophic differentiation of chondrocytes, giving insights into the identification of nascent markers for early-stage OA. Although the methods, data, and analyses broadly support the claims, the data shown by the authors are incomplete because the mechanism by which cartilage degeneration induced by changes in glycometabolism occurs has not been fully elucidated. The authors' deductions stand to gain further credence through undertaking additional experiments aimed at analyzing the mechanisms underlying the changes in glycometabolism in cartilage, such as the meticulous identification of the target glycan molecules bearing core fucose and analysis of endochondral ossification in cartilage-specific Fut8 KO mice.

      We wish to express our strong appreciation to the Reviewer for his or her insightful comments on our paper. We feel the comments have helped us significantly improve the paper. In particular, we wish to acknowledge the Reviewer’s highly valuable comments on the effect of Fut8 on endochondral ossification.

      Reviewer #1 (Public Review):

      Summary:

      This study is valuable in that it may lead to the discovery of future OA markers, etc., in that changes in glycan metabolism in chondrocytes are involved in the initiation of cartilage degeneration and early OA via hypertrophic differentiation of chondrocytes. However, more robust results would be obtained by analyzing the mechanisms and pathways by which changes in glycosylation lead to cartilage degeneration.

      Strengths:

      This study is important because it indicates that glycan metabolism may be associated with pre-OA and may lead to the elucidation of the cause and diagnosis of pre-OA.

      We thank reviewer #1 for their interest in our work and their overall positive report.

      Weaknesses:

      More robust results would be obtained by analyzing the mechanism by which cartilage degeneration induced by changes in glycometabolism occurs.

      To understand the mechanisms of cartilage degeneration induced by changes in glycometabolism, we performed additional rescue experiments with exogenous TGF-β. We had shown that adding mannosidase to an organ culture of normal wild-type mouse cartilage increased TGF-β gene expression from 6 hours onward (Fig. 3E) and that TGF-β expression was suppressed in chondrocytes from Fut8 cKO mice (Fig. 4D). Building on these results, we tested the effect of exogenous TGF-β in an early OA model in which mannosidase is added to the cartilage. Under TGF-β-treated conditions, trimming of high-mannose type N-glycans caused no degenerative changes, and proteoglycan leakage during the recovery period was significantly reduced. Because we consider this a very useful finding, we included these results in the main text as Figure 4F rather than as supplementary data.

      Reviewer #2 (Public Review):

      Summary:

      This paper consists mostly of descriptive data, judged from alpha-mannosidase-treated samples, in which they found an increase in core fucose, a product of Fut8.

      Strengths:

      This paper is interesting in the clinical field, but unfortunately, the data is mostly descriptive and does not have a significant impact on the scientific community in general.

      We thank reviewer #2 for their interest in our work and their overall positive report. Regarding the comment on our attempt to show that glycan changes occur at the precursor stage of cartilage matrix degeneration and that this glycosylation triggers the degeneration, we note that reversing cartilage matrix degeneration is a very ambitious challenge. We are currently preparing to characterize the glycan-matrix relationships needed to 'rescue' cartilage tissue from degeneration, and we hope this approach will provide information on the stages preceding OA onset.

      Weaknesses:

      If core fucose is increased, at least the target glycan molecules of core fucose should be evaluated. They also found an increase in NO, suggesting that inflammatory processes also play an important role in OA in addition to glycan changes.

      As the increase in NO was observed in the organ culture system and cartilage is a tissue without vascular invasion, we thought that the involvement of immune cells could be excluded. On the other hand, our research group has reported that chondrocytes themselves have inflammatory circuits (Ota et al., Arthritis Rheum. 2019. DOI:10.1002/art.41182), but as we did not find increased expression of NF-κB, an indicator of inflammatory amplifier activation, we concluded that inflammation was not involved in this study.

      It has already been reported that core fucose is decreased by administration of alpha-mannosidase inhibitors. Therefore, it is expected that alpha-mannosidase administration increases core fucose.

      The report by Toegel et al., which predicted the synthesis of complex-type N-glycans (Man2a1, Mgat2) together with Fut8 expression in human OA chondrocytes, also led to the expectation that administration of α-mannosidase would increase core fucose. However, there was no conclusive evidence that it does. In 1987, Vignon et al. performed enzyme assays on experimental OA cartilage (rabbit ACLT model) and showed that mannosidase activity was very high in operated joints and that it increased and decreased with the severity of fibrosis in the cartilage. These results suggest that glycoprotein hexose degradation is an early, transient event in the enzymatic process of cartilage destruction. These findings led us to conceive a novel 'pre-OA model' in which mannosidase is added to the joint. The present study is valuable in demonstrating that glycometabolism is a driver of degeneration.

      (see manuscript refs. 25 and 9)

      Toegel et al., Arthritis Res. Ther. 2013. DOI:10.1186/ar4330

      Vignon et al., Clin Rheumatol. 1987. DOI:10.1007/BF02201026

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript "Articular cartilage corefucosylation regulates tissue resilience in osteoarthritis", the authors investigate the glycan structural changes in the context of pre-OA conditions. By mainly conducting animal experiments and glycomic analysis, this study clarified the molecular mechanism of N-glycan core fucosylation and Fut8 expression in the extracellular matrix resilience and unrecoverable cartilage degeneration. Lastly, a comprehensive glycan analysis of human OA cartilage verified the hypothesis.

      Strengths:

      Generally, this manuscript is well structured with rigorous logic and clear language. This study is valuable and important in the early diagnosis of OA patients in the clinic, which is a great challenge nowadays.

      We thank reviewer #3 for their interest in our work and their mainly positive report. This is precisely the purpose of our study, as we are primarily interested in the detection of conditions prior to the onset of OA.

      Weaknesses:

      I recommend minor revisions:

      (1) I would suggest the authors prepare an illustrative scheme for the whole study, to explain the complex mechanism and also to summarize the results.

      We would like to thank the reviewer for this comment and have created a new Figure 7 for the overall study scheme.

      We included the following statement in the opening discussion part:

      "The objective of this work was to provide novel and translational insights into pathogenesis of OA associated with changes in glycan structure. A graphical abstract summarizing our findings is shown in Fig. 7." (line199-201, p9)

      (2) Including but not limited to Figures 2A-C, Figures 3A and C, Figure 4B, and Figures 5A and D. The texts in the above images are too small to read, I would suggest the authors remake these images.

      The font size of the figures has been reviewed and revised throughout.

      (3) The paper is generally readable, but the language could be polished a bit. Several writing errors should be caught on careful proofreading.

      Thank you for this suggestion; on rereading, we found and corrected several writing errors. In addition, the manuscript has been edited by an experienced scientific editor, who improved its grammar and style.

      (4) As several species and OA models were conducted in this study, it would be better if the authors could note the reason behind their choice for it.

      The authors agree with the reviewer's argument that since several species and OA models were performed in this study, it would be better to note the reason for their choice.

      We first attempted to inject mannosidase into rabbits, matching the animal species to a previous paper showing that N-glycans are altered prior to degeneration of the cartilage matrix. Next, we checked whether similar changes occur in mouse cartilage after mannosidase treatment, assuming that we would verify this in genetically engineered mice. We then used the integrated glycome in human cartilage to see if the corefucosylation phenomenon detected was conserved across species.

      For the modeling of OA in Fut8 cKO mice, the instability-induced OA model and the age-associated OA model were adopted. The former emphasizes mechanical stress factors in OA, the latter aging factors. OA is a multifactorial disease. Therefore, we thought it was appropriate to validate both aspects of OA.

      We included the following statements in each Methods part:

      "We injected mannosidase into rabbit knee joints in accordance with a previous paper showing that N-type glycans are altered prior to cartilage matrix degeneration." (line289-290, p12)

      "Organ culture experiments in mice were established to study the effects of mannosidase on articular cartilage without immunoreaction and in anticipation of later candidate gene research using transgenic mice." (line326-328, p14)

      "To determine whether the glycosylation detected is conserved across species, we analyzed the total glycome in human cartilage." (line407-408, p17)

      We included the following statements in the Discussion part:

      "For the modeling of OA in Fut8 cKO mice, the instability-induced OA model and the age-associated OA model were adopted. The former emphasizes mechanical stress factors in OA, the latter aging factors. OA is a multifactorial disease. Therefore, we thought it was appropriate to validate both aspects of OA." (line254-257, p11)

      Reviewer #1 (Recommendations For The Authors):

      (1) The cited literature states that core fucosylation by FUT8 has a chondroprotective effect via the TGF-β pathway and that the loss of these chondroprotective effects in Fut8 led to cartilage degeneration, but these need to be proven by experiment.

      We agree that corefucosylation and the TGF-β signaling pathway are important lines of investigation. In the revised manuscript, we have added experiments showing that exogenous administration of TGF-β restores the protective effect lost in Fut8 cKO cartilage.

      We included the following statements in the Results part:

      "To evaluate whether TGF-β1 decreases cartilage degeneration after mannosidase stimulation, TGF-β1 was exogenously added to Col2-Fut8−/− cartilage in the presence of α-mannosidase stimulation for 24 h. The samples treated with TGF-β1 leaked significantly less PG following mannosidase stimulation compared to samples not treated with TGF-β1 (Fig. 4F)." (line143-147, p6-7)

      We included the following statements in the Discussion part:

      "Here, the exogenous addition of TGF-β1 rescued them from cartilage degeneration." (line274-275, p12)

      (2) There are skeletal differences in cartilage-specific Fut8 KO mice compared to WT, and the effect of Fut8 on endochondral ossification should also be analyzed.

      We agree that Fut8 is associated with various endochondral ossification processes (for example by the TGF-β signaling pathway). Moreover, we would like to thank the reviewer for the proposed experiment.

      The growth curve was normal at birth, with differences appearing around weaning (~3 weeks in mice). We therefore evaluated the epiphyseal plates of 4-week-old mice by toluidine blue staining and immunostaining for type X collagen and proliferating cell nuclear antigen. The resulting phenotype resembled the epiphyseal growth plate phenotype of Smad3ex8/ex8 mice reported by Yang et al., consistent with the finding that Smad3 deficiency does not affect chondrogenesis during development but enlarges the hypertrophic zone in 3- to 4-week-old Smad3 KO mice. Because Tgf-β expression was suppressed in chondrocytes from Fut8 cKO mice (Fig. 4D), we suggest that reduced TGF-β signaling, which normally suppresses late hypertrophic chondrocyte differentiation, led to the increased height of the hypertrophic zone.

      The results suggested that the growth plate of Fut8 cKO mice had an enlarged hypertrophic layer and decreased primary trabecular bone. Because these results have important implications for the content of the paper, we have included the staining results in Figure 5 and added a graph quantitatively assessing the extent of the hypertrophic zone as supplementary Figure S6.

      We included the following statement in the Results part:

      "To assess the role of FUT8 in endochondral ossification, we performed an epiphyseal plate analysis of 4-week-old Col2-Fut8−/− mice. This uncovered a significant enlargement of the zone of hypertrophic chondrocytes in the growth plates of the long bones of Col2-Fut8−/− mice compared to controls (Fig. 5C, S6 Figure)." (line154-158, p7)

      We included the following statement in the Discussion part:

      "The high-mannose/corefucosylation relationship is thought to function in maintaining formed cartilage. In endochondral ossification, the Fut8 cKO growth plate had an enlarged hypertrophic zone and reduced primary spongiosa because FUT8 is involved in the subsequent replacement of cartilage by bone rather than in cartilage formation itself." (line214-217, p9)

      Literature mentioned above (not included in manuscript):

      Yang X, et al. TGF-beta/Smad3 signals repress chondrocyte hypertrophic differentiation and are required for maintaining articular cartilage. J Cell Biol. 2001;153(1):35–46.

      (3) The DMM model analysis is performed with n=5 for each group. Please consider if the sample size is sufficient.

      In the literature, sample sizes for DMM models have varied (Doyran et al., n=5; Liao et al., n=6-7; Ouhaddi et al., n=8). We therefore performed a preliminary DMM experiment in WT and Flox mice with n=3 each, followed by a power analysis with the OARSI score at 8 weeks as the outcome. This indicated a required sample size of n=4 per group, which we increased to n=5 to account for attrition. The summed OARSI score of the WT mice in this study was comparable to that reported by Ouhaddi et al., and the model was judged to be working accurately.
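      For readers who want to see how such a calculation works, the sketch below shows one common way to estimate a per-group sample size for a two-group comparison and then inflate it for expected attrition. It is an illustration, not the authors' actual power analysis: it assumes a two-sided, two-sample t-test on OARSI scores treated as approximately continuous, uses a normal approximation with Guenther's small-sample correction, and the effect size (Cohen's d = 2.0) and 20% attrition rate are hypothetical placeholders rather than the pilot estimates.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample t-test.

    Normal approximation 2 * ((z_{alpha/2} + z_beta) / d)^2 plus
    Guenther's small-sample correction term z_{alpha/2}^2 / 4.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) / effect_size_d) ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


def inflate_for_attrition(n, attrition):
    """Enlarge n so that roughly n animals remain after the expected loss."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(n / (1 - attrition), 9))


# Hypothetical inputs: d = 2.0 and 20% attrition are placeholders only.
n = n_per_group(2.0)
print(n, inflate_for_attrition(n, 0.20))
```

      With these placeholder inputs the sketch prints `5 7`; substituting the effect size estimated from the pilot animals would give the study-specific figure.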

      Literature mentioned above (not included in manuscript):

      (1) Doyran B, Tong W, Li Q, Jia H, Zhang X, Chen C, et al. Nanoindentation modulus of murine cartilage: a sensitive indicator of the initiation and progression of post-traumatic osteoarthritis. Osteoarthr Cartil. 2017;25(1):108–17.

      (2) Liao L, Zhang S, Gu J, Takarada T, Yoneda Y, Huang J, et al. Deletion of Runx2 in Articular Chondrocytes Decelerates the Progression of DMM-Induced Osteoarthritis in Adult Mice. Sci Rep. 2017 24;7(1):2371.

      (3) Ouhaddi Y, Nebbaki SS, Habouri L, Afif H, Lussier B, Kapoor M, et al. Exacerbation of Aging-Associated and Instability-Induced Murine Osteoarthritis With Deletion of D Prostanoid Receptor 1, a Prostaglandin D2 Receptor. Arthritis Rheum. 2017;69(9):1784–95.

      Reviewer #2 (Recommendations For The Authors):

      This paper is suitable for publication in clinical Journals related to osteoarthritis and cartilage.

      Identification of core fucosylated glycans from chondrocytes is essential for this type of paper.

      We mentioned that we had identified similar corefucosylated glycans in isolated mouse chondrocytes from the cartilage (line117-118, p5), but we have now also added the following to the subtitle of the Results section to avoid any potential confusion: "Corefucosylated N-glycan was formed in resilient cartilage and its isolated chondrocyte" (line109, p5)

      Thank you again for your comments on our paper. We trust that the revised manuscript is suitable for publication.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript describes fundamental single-molecule correlative force and fluorescence microscopy experiments to visualize the 1D diffusion dynamics and long-range nucleosome sliding activity of the yeast chromatin remodelers, RSC and ISW2. Compelling evidence shows that both remodelers exhibit 1D diffusion on bare DNA but utilize different mechanisms, with RSC primarily hopping and ISW2 mainly sliding on DNA. These results will be of interest to researchers working on chromatin remodeling.

      Reviewer #1 (Public Review):

      Single-molecule visualization of chromatin remodelers on long chromatin templates-a long sought-after goal-is still in its infancy. This work describes the behaviors of two remodelers RSC and ISW2, from SWI/SNF and ISWI families respectively, with well-conducted experiments and rigorous quantitative analysis, thus representing a significant advance in the field of chromatin biology and biophysics. Overall, the conclusions are supported by the data and the manuscript is clearly written. However, there are a few occasions where the strength of the conclusion suffers from low statistics. Some of the statements are too strong given the evidence presented.

      We thank the reviewer for the thorough and considerate review of our manuscript. We have increased the statistics when possible and have toned down the conclusions wherever further experimentation to improve statistics could not be done expeditiously.

      Specific Comments:

      (1) It is unclear how the "non-diffusive" behavior of the remodeler upon nucleosome encounter differs from the nucleosome-translocating behavior in the presence of ATP. For example, in Figure 3F, readers can see a bit of nucleosome translocation in the first segment. Is the lower half-life of "non-diffusive" ISW2 with ATP on a nucleosome array because it is spending more time translocating nucleosomes? The solid and dashed green lines in Figure 3F and 3G are not explained. It is also not explained why Figure 3H and 3I are fit by double exponentials.

      We thank the reviewer for calling upon us to clarify these points. In both the case of translocation and stable non-translocating colocalization, the chromatin remodeler is marked as “non-diffusive” because the molecule is not moving quickly enough to be detected by our rolling-window (20 frames considered) diffusion coefficient analysis. We have updated the text to point out the translocation that is occurring in the panels indicated and noted that this type of motion is not detected by our automated analysis. Thus, translocation events were manually segmented for analysis from kymographs; a note of this was added to the results section (Results section # 1; Paragraph # 2).
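      The rolling-window classification described above can be illustrated with a minimal sketch. It assumes a 1D trajectory along the DNA in micrometres sampled at a fixed frame interval, estimates D in each 20-frame window from the mean squared single-frame displacement (for 1D Brownian motion, MSD = 2DΔt), and flags windows whose D falls below a threshold. The function names and the threshold are hypothetical, the estimator does not correct for localization error (which adds an offset to the MSD), and the authors' actual pipeline may differ.

```python
def rolling_diffusion_coefficients(positions_um, dt_s, window=20):
    """Estimate a 1D diffusion coefficient (um^2/s) in each rolling window.

    For 1D Brownian motion the one-frame mean squared displacement is
    MSD = 2 * D * dt, so D = MSD / (2 * dt). A window of `window` frames
    contains `window - 1` single-frame steps. Returns one value per
    window start, i.e. len(positions_um) - window + 1 values.
    """
    if len(positions_um) < window:
        raise ValueError("trace is shorter than the analysis window")
    # Squared single-frame displacements along the DNA axis.
    steps_sq = [(b - a) ** 2 for a, b in zip(positions_um, positions_um[1:])]
    n_steps = window - 1
    estimates = []
    for start in range(len(steps_sq) - n_steps + 1):
        msd_lag1 = sum(steps_sq[start:start + n_steps]) / n_steps
        estimates.append(msd_lag1 / (2.0 * dt_s))
    return estimates


def classify_non_diffusive(d_values, d_threshold):
    """True for windows whose D is below a (hypothetical) detection threshold."""
    return [d < d_threshold for d in d_values]
```

      In a scheme like this, slow and steady directed motion produces very small per-frame steps and can fall below the threshold, which is one way translocation could end up labeled "non-diffusive" and need separate, manual segmentation.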

      To address the question of whether the half-life of “non-diffusive” ISW2 with ATP on the nucleosome array is because of increased time spent in translocation, we have computed the percentage of “non-diffusive” time spent translocating in the presence of ATP for both remodelers; for ISW2, 14% of “non-diffusive” times are translocation whereas for RSC, 28% of “non-diffusive” times are translocation. Given that these percentages are not negligible, the reviewer helped identify an important parameter that better describes the effects of ATP hydrolysis on nucleosome binding for ISW2. In addition, we computed and compared the half-life of translocation times for both remodelers to the “non-diffusive” times and found that RSC translocates with a half-life of 20 s (similar to the half-life of “non-diffusion”) whereas ISW2 translocates with a half-life of 17 s (longer than the half-life of “non-diffusion”). We believe that this new information improves understanding of the role of ATP hydrolysis in turning over ISW2-nucleosome binding interactions, which result in the shorter “non-diffusive” lifetime as well as the shorter and more rarely observed ISW2 translocation events. We have updated the text to include these observations and our interpretation (Results section # 3; Paragraph # 3). As was already included in the text (Results section # 3; Final Paragraph), we speculate that this behavior may be due to a hydrolysis-dependent turnover of the ISW2-nucleosome bound state and refer the reader to Tim Richmond’s 2004 EMBO paper titled “Reaction cycle of the yeast Isw2 chromatin remodeling complex” in which bulk experiments show that ATP hydrolysis affects ISW2-nucleosome bound lifetimes.
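      As an illustration of how a half-life and a time fraction like those quoted above can be computed from segmented events, the sketch below uses the maximum-likelihood estimate for a single-exponential dwell-time distribution (rate k = 1/mean dwell time, so t1/2 = ln 2 × mean dwell time). This is an assumption made for illustration: it ignores censoring by photobleaching or the end of acquisition, and it is not necessarily how the half-lives reported here were fitted. The function names are hypothetical.

```python
import math


def half_life_from_dwell_times(dwell_times_s):
    """Half-life (s) of a single-exponential dwell-time distribution.

    The maximum-likelihood rate for exponential data is k = 1 / mean(t),
    so t_1/2 = ln(2) / k = ln(2) * mean(t). Censored events (e.g. ended by
    photobleaching) are treated as complete, which biases the estimate low.
    """
    if not dwell_times_s:
        raise ValueError("need at least one dwell time")
    mean_t = sum(dwell_times_s) / len(dwell_times_s)
    return math.log(2) * mean_t


def fraction_translocating(translocation_times_s, non_diffusive_total_s):
    """Share of the total "non-diffusive" time spent translocating."""
    return sum(translocation_times_s) / non_diffusive_total_s
```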

      We thank the reviewer for also pointing out where details were missing from the figure legend and results section regarding Figure 3. We have added a description of the dashed and solid lines to the figure legend (Figure 3; Legend). We have also described why Figures 3H and I are fit to double exponentials to the results section (Results section # 3; Paragraph # 2).

      (2) What is the fraction of 1D vs. 3D nucleosome encountered by the remodelers? This is an important parameter to compare between RSC and ISW2.

      We thank the reviewer for raising this point. We agree that this is an important parameter to compare between RSC and ISW2; knowledge of this parameter would enable quantitative predictions to be made from our data regarding target localization efficiency increases owed to 1D scanning for each remodeler. We regretfully could not quantify this due to technical limitations of our measurements. A note about this limitation along with an explanation for why we were unable to quantify this parameter have been added to the main text (Results section # 3; end of Paragraph # 1).

      (3) A major conclusion stated repeatedly in the manuscript is that nucleosome translocation by a remodeler is terminated by a downstream nucleosome. But this is based on a total of 4 events. The problem of dye photobleaching was mentioned, which is a bit surprising considering that the green excitation was already pulsed. The authors should try to get more events by lowering the laser power or toning down the conclusion that translocation termination is prominently due to blockage by a downstream nucleosome. Quantifying the translocation distances before termination, in addition to the durations (Figure 4G and 4H), would also be helpful.

      We thank the reviewer for these observations and feedback. We agree that only 4 observations of direct visualization of remodeler translocation termination by a downstream nucleosome is a small n-value, and have chosen to omit presentation of these rare events in the manuscript.

      (4) The claim on nucleosome translocation directionality is also based on a small number of events, particularly for RSC. 6/9 is hardly over 50% if one considers the Poisson counting error (RSC was also found to switch directions). If the authors would like to make a firm statement to support the "push-pull" model, they should obtain more events.

      We thank the reviewer for this critique and agree with the reviewer's concern. In addition to adding data from two additional experimental replicates of RSC nucleosome translocation (which had the smaller n-value), we have re-evaluated all events containing translocation for additional evidence in support of or against the "push-pull" model. Previously, we considered only events where 1D diffusion on DNA leads immediately to translocation. Now we add the following categories to the count: (1) events where translocation terminates with the remodeler dissociating from the nucleosome and performing a 1D diffusive search, (2) events where 1D diffusion on DNA leads to association with a nucleosome and after a paused colocalization we observe translocation, and (3) the inverse scenario of (2) (see schematics in Figure 5 – figure supplement 1). These new results, detailed below, now replace the older results in (Results Section # 5; Paragraph # 2). Furthermore, we toned down our argument and clarified that a larger n-value would be needed to be definitive, especially since we observe RSC switching directions, as the reviewer points out.

      By aggregating in new RSC data and using only events where 1D diffusion leads immediately to translocation, we observe 10/12 events in support of the “push” model. If we include these other categories in addition to aggregating the previous data with the new data, a total of 20/25 events are in support of the “push” model. For RSC, the breakdown in the other categories was as follows: (1) 7/10 events, (2) 1/1 events with a paused time of 5 seconds, and (3) 2/2 events with a paused time of 36 and 50 seconds.

      For ISW2, we had previously reported 12/13 events in which 1D search led immediately to translocation. After reviewing the data a second time, we decided to omit two less clear-cut events; we now report 10/11 events in support of the "pull" model from this initial category. Including the other categories, a total of 19/21 events support the "pull" model. For ISW2, the breakdown in the other categories was as follows: (1) 4/4 events, (2) 4/4 events with pause times of 44, 27, 29, and 8 seconds, and (3) 1/2 events with pause times of 5 and 19 seconds.
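      To make the reviewer's point about counting error concrete, one simple check is an exact one-sided binomial test of each count against a 50% null (no directional preference). The sketch below is illustrative only and is not an analysis reported in the manuscript. It assumes that each event is an independent trial, which may not hold when several events come from the same molecule or when categories are pooled.

```python
from math import comb


def binomial_tail_p(successes, trials, p_null=0.5):
    """Exact one-sided binomial p-value: P(X >= successes) under p_null."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Directional counts quoted in this response (events supporting the model / total).
counts = [
    ("RSC, original submission", 6, 9),
    ("RSC, diffusion -> translocation", 10, 12),
    ("RSC, all categories", 20, 25),
    ("ISW2, diffusion -> translocation", 10, 11),
    ("ISW2, all categories", 19, 21),
]
for label, k, n in counts:
    print(f"{label}: {k}/{n}, one-sided p = {binomial_tail_p(k, n):.4f}")
```

      A one-sided test fits the question of whether the preferred direction exceeds 50%. Because pooling categories or molecules can violate the independence assumption, these p-values are best read as rough guides rather than definitive statistics.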

      (5) At 5 pN of tether tension, the outer wrap of nucleosomes is destabilized, which could impact nucleosome translocation dynamics. Additionally, a low buffer flow was kept on during data acquisition, which could bias remodeler diffusion behavior. The authors should rule out or at a minimum discuss these possibilities.

      We thank the reviewer for raising the important point regarding outer wrap destabilization of the nucleosome occurring at 5pN of tension. We have added a section to the Discussion that reviews the literature on tension effects on nucleosome stability as well as what is currently known about the effects of tension on remodeler translocation on DNA (Discussion Paragraph # 3). While we cannot exclude the possibility that the 5pN of tension used in this study contributes to the fast, highly processive nucleosome translocation that we report, we believe that the revised text makes the reader aware of these possibilities and allows informed conclusions about the significance of our findings. The effect of force on remodeling outcomes is an interesting subject for future work.

      We apologize that the experimental details on buffer flow during imaging were unclear in our initial submission. No buffer flows during imaging; rather, the buffer containing protein is flowed over the DNA at low pressure just prior to imaging. The flow is completely stopped before the DNA or nucleosome array is stretched to 5pN of tension for imaging (see Methods section: Single Molecule Tracking and Analysis).

      Reviewer #1 (Recommendations For The Authors):

      (1) The figure panels could be better arranged to focus on the main messages of the paper.

      (i) Figure 3C-E should go to a supplemental figure.

      We thank the reviewer for this helpful suggestion. As recommended, we moved Figure 3C to the supplemental figure as this panel did not pertain to the main message of the paper.

      (ii) Figure 4 could be split into two figures, one characterizing processive nucleosome translocation (4C, D, G, H, I, J, K, and relevant panels in S4), and the other showing the differential directionality of each remodeler (4E, F, L, and relevant panels in S4).

      We thank the reviewer for their suggestions that help better organize our presentation of the data. As the reviewer suggests, we split figure 4 into two figures: figure 4 which now focuses on translocation characterization and figure 5 which now focuses on the differential directionality of each remodeler.

      (iii) The nucleotide condition should be clearly indicated in the figures or legends. For example, it is unclear if the data in Figure 2 were generated with or without ATP.

      We thank the reviewer for taking note of this. We have added clear indications of the nucleotide condition to figures where this is relevant, including in Figure 2 as indicated.

      (iv) There are many cartoon panels, and some are redundant (e.g., Figure 1A and 1B, Figure 3A and 3B).

      We thank the reviewer for bringing up this point. We agree that some cartoons are redundant. We have eliminated Figure panel 1B and Figure panel 3A of the original figures from the new figures.

      (2) The last paragraph of the Results section should be moved to Discussion. This paper did not directly address the effects of RSC/ISW2 on NDR length.

      We thank the reviewer for this suggestion. We agree and have moved the last paragraph of the Results section to the Discussion.

      (3) There are some typos in the text. For example, "Of the two main types of 1D diffusion, hopping and sliding" is not a complete sentence.

      We thank the reviewer for catching this typo and bringing our attention to others. Upon a more careful proofreading of the text and figures we have caught and amended this and other typos.

      (4) What are the green lines in Figure S1F?

      We thank the reviewer for asking this question. The green lines were meant to emphasize how the percentage of traces in the majority high-diffusion category increases for RSC but not for ISW2 as the KCl concentration increases. Since this was confusing, we removed these green lines.

      Reviewer #2 (Public Review):

      Summary:

      The authors use a dual optical trap instrument combined with 2-color fluorescence imaging to analyze the diffusion of RSC and ISW2 on DNA, both in the presence and absence of nucleosomes, as well as long-range nucleosome sliding by these remodelers. This allowed them to demonstrate that both enzymes can participate in 1D diffusion along DNA for rather long ranges, with ISW2 predominantly tracking the DNA strand, while RSC diffusion involves hopping. In an elegant two-color assay, the authors were able to analyze interactions of diffusing remodeler molecules, both of the same or different types, observing their collisions, co-diffusion, and bypassing. The authors demonstrate that nucleosomes act as barriers for remodeler diffusion, either repelling or sequestering them upon collision. In the presence of ATP, they observed surprisingly processive unidirectional nucleosome sliding with a strong bias in the direction opposite to where the remodeler approached the nucleosome from for ISW2. These results have fundamentally important implications for the mechanism of nucleosome positioning at promoters in vivo, will be of great interest to the scientific community, and will undoubtedly spark exciting future research.

      Strengths:

      The mechanism of target search for chromatin-interacting protein machines is a 'hot' topic, and this manuscript provides extremely important and timely new information about how RSC and ISW2 find the nucleosomes they slide. Intriguingly, although both remodelers analyzed in this study can diffuse along DNA, the diffusion mechanisms are substantially different, with extremely interesting mechanistic implications.

      The strong directional preference in nucleosome sliding by ISW2 dictated by the direction it approaches the nucleosomes from during 1D sliding on DNA is a very intriguing result with interesting implications for the regulation of nucleosome organization around promoters. It will be of great interest to the scientific community and will undoubtedly inspire future research.

      Relatively little is known about nucleosome sliding at longer ranges (>100bp), and this manuscript provides a unique view into such sliding and also establishes a versatile methodology for future studies.

      Weaknesses:

      All measurements were conducted at 5pN tension, which induces unwrapping of the outer DNA gyre from nucleosomes. This could potentially represent a limitation for experiments involving nucleosomes, since partial nucleosome unwrapping could affect the behavior of remodelers, especially their sliding of nucleosomes.

      We thank the reviewer for succinctly summarizing the strengths and weaknesses of our study. We have revised the Discussion to better review the literature on the effects of 5pN of tension on nucleosome wrapping and have more clearly presented the limitations of our study that arise from conducting measurements at 5pN of tension. In doing so, we have tried to emphasize the strengths of our study identified by the reviewer and better inform the reader of its weaknesses.

      Reviewer #2 (Recommendations For The Authors):

      Although not required, nucleosome sliding data under lower tensions (e.g., <=2pN) could be a valuable addition to the manuscript. Indeed, to my knowledge, there is no data on force-dependent rates of nucleosome sliding, so a conclusive demonstration of changes in remodeling rate with tension would be an exciting new result and might be discussed in the context of a potential tension in chromatin. If such experiments cannot readily be added, the authors could alternatively discuss this potential limitation in more detail.

      We thank the reviewer for this suggestion. We agree that adding data at lower tensions (≤2 pN) would have been valuable. Due to time constraints, this will be the subject of future work. We agree that knowledge of the effects of tension would be especially interesting in light of the possibility that tension on chromatin in cells may affect remodeler function. We have added a discussion of the potential significance of this future work to the Discussion (Discussion Section; Paragraph # 3). As the reviewer recommends, we have also elaborated in the Discussion on the potential limitation of conducting measurements only at 5 pN (Discussion Section; Paragraph # 3).

      The quantitative implications of the proposed mechanism for targeting ISW2 and RSC towards +1 and -1 nucleosomes are highly interesting. To further strengthen the mechanistic implications, the authors could consider quantitatively analyzing how the observed 1D diffusion would affect the probabilities of binding to +1 and -1 versus to other nucleosomes.

      We thank the reviewer for their thoughtful suggestion. While we would have liked to present a final quantitative model that integrates the experimental 1D diffusion parameters presented in this study with parameters extracted from live-cell single-particle tracking studies, key parameters for model building are missing from our study due to technical limitations. Namely, we were not able to quantify the fraction of 1D vs 3D nucleosome encounters by remodelers, because the majority of the protein that we image had already bound before the start of imaging; very few proteins bind the nucleosome arrays after the start of imaging, as the protein concentration in the imaging chamber is very low. This makes direct binding to a nucleosome a very rare event, especially given the sparse density of nucleosomes (~10) on the array (~50 kb).

      The low-diffusion state is intriguing - could the authors speculate about the nature of this state?

      We thank the reviewer for the question. We have added some speculation about the nature of the low-diffusion state to the Results section (Results Section # 1; Paragraph 2). One possibility is that this state reflects more stable interactions between remodelers and free DNA when they become trapped in a conformational state that binds DNA more tightly. Conformational changes may result in different scanning speeds for chromatin remodelers; e.g. SWR1 was shown to scan DNA more quickly when bound to ATP (Carcamo, C. et al. eLife 2022). Another possibility is that certain sequences, owing to their intrinsic curvature or their AT content, for instance, may trap the remodeler, which may make more contacts with the DNA at these sites.
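      One common way to quantify a "low-diffusion" versus a faster state is through the 1D diffusion coefficient D, estimated from the mean squared displacement (MSD) of a remodeler's position along the DNA, with MSD = 2Dt for free 1D diffusion. The sketch below illustrates that estimate under stated assumptions: positions are sampled at a fixed frame interval dt, each trajectory is in a single diffusive state, and the function names are hypothetical rather than taken from the authors' analysis.

```python
import numpy as np


def msd_1d(positions, max_lag):
    """Time-averaged mean squared displacement (MSD) of a 1D trajectory.

    positions: positions along the DNA, sampled at a fixed frame interval.
    Returns an array whose entry k - 1 is the MSD at a lag of k frames,
    for k = 1 .. max_lag.
    """
    x = np.asarray(positions, dtype=float)
    if not 1 <= max_lag < x.size:
        raise ValueError("max_lag must be between 1 and len(positions) - 1")
    return np.array([np.mean((x[k:] - x[:-k]) ** 2) for k in range(1, max_lag + 1)])


def diffusion_coefficient_1d(positions, dt, max_lag=5):
    """Estimate D for free 1D diffusion from MSD(t) = 2 D t.

    A line (slope and intercept) is fitted to the first max_lag MSD points,
    so a constant offset such as localization error does not bias the slope.
    """
    lags = dt * np.arange(1, max_lag + 1)
    slope, _intercept = np.polyfit(lags, msd_1d(positions, max_lag), 1)
    return slope / 2.0


if __name__ == "__main__":
    # Synthetic check: a random walk with a known D (arbitrary units).
    rng = np.random.default_rng(0)
    true_d, dt = 0.05, 0.1
    track = np.cumsum(rng.normal(0.0, np.sqrt(2 * true_d * dt), size=100_000))
    print(diffusion_coefficient_1d(track, dt))  # should be close to 0.05
```

      Trajectories that switch between a fast state and a low-diffusion state would first need to be segmented, which this sketch does not attempt.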

      Minor points:

      Information on the labeling efficiencies for the remodelers would be helpful.

      We thank the reviewer for pointing this out. We assessed labeling saturation by running gels of remodeler labeling reactions with increasing molar ratios of dye to protein and did not observe increased labeling above the molar ratio used for the proteins imaged in our study (see added Figure 1 – figure supplement 1, panel A). From this, we concluded that our protein labeling efficiency is high. We could not assess the labeling efficiency using the standard absorbance method, because the extinction coefficient for JFX650 was measured in 1% v/v TFA (PMCID: PMC8154212), which is not compatible with assessing our protein labeling efficiency in an aqueous buffer.
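      For reference, the standard absorbance-based estimate mentioned above computes the degree of labeling (DOL) as the molar ratio of dye to protein, with the protein absorbance at 280 nm corrected for the dye's own contribution. In the usual form (the symbols are generic, not taken from the manuscript), the dye's extinction coefficient must be known under the buffer conditions of the measurement, which is the requirement that the TFA-based JFX650 value does not meet:

```latex
\mathrm{DOL} = \frac{A_{\max}/\varepsilon_{\mathrm{dye}}}
                    {\left(A_{280} - \mathrm{CF}\cdot A_{\max}\right)/\varepsilon_{\mathrm{protein}}}
```

      Here A_max is the absorbance at the dye's absorption maximum, A_280 the absorbance at 280 nm, and CF the dye's correction factor (its ratio of absorbance at 280 nm to absorbance at its maximum); the path length cancels between numerator and denominator.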

      How were the experimental conditions adjusted for two-color diffusion experiments in order to optimize the probability of observing two remodeler molecules with different labels at the same time?

      We thank the reviewer for this clarifying question. To image both remodelers on the same DNA, we combined the remodelers using the same concentrations that produced single molecule densities when the remodelers were imaged separately. We have clarified this point in the Methods section: “Bimolecular Remodeler-Remodeler Imaging and Interaction Analysis”.

      The authors should check the figures for consistency of labeling and provide definitions for abbreviations used in them (e.g. CDF and PDF).

      We thank the reviewer for catching inconsistencies in labeling in our figures. We have updated the figures such that there is consistent labeling throughout. We have also provided definitions for abbreviations such as Cumulative Distribution Function (CDF) and Probability Distribution Function (PDF) in the figure legends where applicable.

      In the section "Remodeler-remodeler collisions during 1D search" (4th line from the end) reference to Fig3D seems to be out of place.

      We thank the reviewer for catching this typo. We have reworded this section such that each figure panel can be discussed sequentially, eliminating this out of place reference to Fig 3D.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thorough reading and helpful comments, which have allowed us to further improve the manuscript. Following the suggestions of the reviewers, we have run a number of new simulations, including mutations of the PIP binding residues and an elastic network allowing more mobility of the linker. Together these excellent ideas have allowed us to strengthen the conclusions of the study. Below, we provide point-by-point responses to their suggestions.

      Reviewer #1 (Public Review):

      Summary:

      Here, the authors were attempting to use molecular simulation to probe how lipids, especially PIP lipids, bind to a medically important ion channel. In particular, they look at how this binding impacts the function of the channel.

      Strengths:

      The study is very well written and composed. The techniques are used appropriately, with plenty of sampling and analysis. The findings are compelling and provide clear insights into the biology of the system.

      Weaknesses:

      A few of the analyses are hard to understand/follow, and rely on "in house" scripts. This is particularly the case for the lipid binding events, which can be difficult to compute accurately. Additionally, a lack of experimental validation, or coupling to existing experimental data, limits the study.

      Our analysis scripts have now been made publicly accessible as a Jupyter notebook on Github https://github.com/etaoster/etaoster.github.io/tree/main/nav_pip_project

      It is my view that the authors have achieved their aims, and their findings are compelling and believable. Their findings should have impacts on how researchers understand the functioning of the Nav1.4 channel, as well as on the study of other ion channels and how they interact with membrane lipids.

      Reviewer #2 (Public Review):

      Summary:

      Lin Y., Tao E., et al. used multiscale MD simulations to show that PI(4,5)P2 binds stably to an inactivated state of Nav channels at a conserved site within the DIV S4-S5 linker, which couples the voltage sensing domain (VSD) to the pore. The authors hypothesized that PI(4,5)P2 prolongs inactivation by binding to the same site where the C-terminal tail is proposed to bind during recovery from inactivation. They convincingly showed that PI(4,5)P2 reduces the mobility of both the DIV S4-S5 linker and the DIII-IV linker, thus slowing the conformational changes required for the channel to recover to the resting state. They also conducted MD simulations to show that phosphoinositides bind to VSD gating charges in the resting state of Nav channels. These interactions may anchor the VSD in the resting state and impede its activation. Their results provide a mechanism by which phosphoinositides alter the voltage dependence of activation and the recovery rate from inactivation, an important step for developing novel therapies to treat Nav-related diseases. However, the study is incomplete and lacks the expected confirmatory studies which are relevant to such proposals.

      Strengths:

      The authors identified a novel binding between phosphoinositides and the VSD of Nav and showed that the strength of this interaction is state-dependent. Based on their work, the affinity of PIPs to the inactivated state is higher than the resting state. This work will help pave the way for designing novel therapeutics that may help relieve pain or treat diseases like arrhythmia, which may result from a leftward shift of the channel's activation.

      Weaknesses:

      However, the study lacks the expected confirmatory studies which are relevant to such proposals. For example, one would expect that the authors would mutate the positive residues that they claim to make interactions with phosphoinositides to show that there are much fewer interactions once they make these mutations. Another point is that the authors found that the main interaction site of PIPs with Nav1.4 is the VSD-DIV and DIII-DIV linker, an interaction that is expected to delay fast inactivation if it happens at the resting state. The authors should make a resting state model of the Nav1.4 channel to explain the recent experimental data showing that PIP2 delays the activation of Nav1.4, with almost no effect on the voltage dependence of fast inactivation.

      Following the reviewer’s suggestion, we have conducted new simulations demonstrating that there are many fewer protein-PIP interactions after mutating the positive residues, as shown in the new Supplementary Fig S6.

      The reviewer mentions that if PIPs interact with the VSD-DIV and DIII-DIV linker in the resting state, they could delay fast inactivation. However, as described in the original manuscript and depicted in the schematic (Fig 7), the C-terminal domain impedes PIP binding at this position in the resting state (but not in the inactivated state), meaning that PIP does not bind there in the resting state to delay fast inactivation. We have clarified this statement in the text on page 14 lines 1-2.

      Following the reviewer’s suggestion we have examined PIP binding to a model of the resting state of Nav1.4 (in addition to the resting state of Nav1.7 described in the original manuscript) as described on page 12 lines 28-30 (and in Fig S12). Similar to what we saw for Nav1.7, PIP binding to VSDI-III can impair activation of the channel.

      Major concern:

      (1) Lack of confirmatory experiments, e.g., mutating the positive residues that show a high affinity towards PIPs to a neutral and negative residue and assessing the effect of mutagenesis on binding.

      Done as described above

      (2) Nav1.4 is the only channel that has been studied in terms of the effect of PIPs on it, therefore the authors should build a resting state model of Nav1.4 and study the effect of PIPs on it.

      Done as described above

      Minor points:

      There are a lot of wrong statements in many areas, e.g., "These diseases are associated with accelerated rates of channel recovery from inactivation, consistent with our observations that an interaction between PI(4,5)P2 and the residue corresponding to R1469 in other Nav subtypes could be important for prolonging the fast-inactivated state." Prolonging the fast inactivated state would actually reduce recovery from inactivation and not accelerate it.

      We disagree with this statement from the reviewer, which may have arisen from a misreading of the sentence in question. Our statement in the original manuscript is consistent with the original experiments showing that the presence of PIP prolongs the time spent in the fast inactivated state. Mutations at the PIP binding site are likely to reduce PIP binding, and with less PIP bound, the channel is expected to recover from inactivation more quickly. We have reworded this sentence for clarity on page 13 lines 27-30.

      Reviewer #3 (Public Review):

      Summary:

      This work uses multiscale molecular dynamics simulations to demonstrate molecular mechanism(s) for phosphatidylinositol regulation of voltage-gated sodium channel (Nav1.4) gating. Recent experimental work by Gada et al. JGP 2023 showed altered Nav1.4 gating when Nav1.4 current was recorded with simultaneous PI(4,5)P2 dephosphorylation. Here the authors revealed a probable molecular mechanism that can explain PI(4,5)P2 modulation of Nav1.4 gating. They found PIP lipids interacting with the gating charges, potentially making it harder to move the voltage sensor domain and altering the channel's voltage sensitivity. They also found a stable PIP binding site that reaches the D_IV S4-S5 linker, reducing the mobility of the linker and potentially competing with the C-terminal domain.

      Strengths:

      Using multiscale simulations, with coarse-grained simulations to capture lipid-protein interactions and the overall protein-lipid fingerprint and then all-atom simulations to verify atomistic details of specific lipid-protein interactions, is extremely appropriate for the question at hand. Overall, the types and lengths of the simulations are suitable for the questions the authors pose, and a thorough set of analyses was done that illustrates the observed PIP-protein interactions.

      Weaknesses:

      Although the current set of simulations and analyses nicely supports the conclusions drawn, the authors imposed some limitations on the coarse-grained simulations. Had these not been imposed, an even richer and more thorough exploration of the protein-lipid interactions would have been possible. The Martini 2 force field indeed cannot change secondary structure, but if run with a properly tuned elastic network instead of backbone restraints, changes in protein conformation can be sampled and/or some adaptation of the protein to its specific membrane environment can be observed. Additionally, with the 4-to-1 mapping of heavy atoms to beads, some detailed chemical specificity is averaged out, but parameters for different PIP family members do exist, including specific PI(4,5)P2 vs PI(3,4)P2, and could have been explored.

      We thank the reviewer for their excellent suggestions and have run new simulations with an elastic network instead of backbone restraints which have generated new insights. Indeed, as shown in the new panel Fig 4E, the new data allows us to demonstrate that the presence of PIP in the proposed binding site stabilises binding of the DIII-DIV linker to the inactivation receptor site, strengthening the conclusions of the paper.

      We thank the reviewer for pointing out that there do exist parameters for different PIP sub-species and have corrected our statement on page 14 line 16 to reflect this. We have not run additional CG simulations with each of these parameters but use the all-atom simulations to examine the interactions of phosphates at specific positions.

      In our atomistic simulations, we backmapped both PI(4,5)P2 and PI(4)P in the binding site to study their specific interactions. We chose to focus on PI(4,5)P2 given its physiological significance. However, we agree that differences in binding with PI(3,4)P2 would be interesting and warrants future investigation. We also note that the newer Martini3 forcefield would be useful in further work to differentiate between PIP subspecies interactions.

      Detailed Comments

      We thank the reviewers for their thorough reading and helpful comments, which have allowed us to further strengthen the manuscript. Below, we provide point-by-point responses to their suggestions.

      Reviewer #1 (Recommendations For The Authors):

      I don't have many suggestions for the manuscript, just a few text edits. Of course, experimental analysis would bolster the claims made in the text, but I don't believe that this is necessary, given the quality of the data.

      I understand the focus on the PIP lipids, but it's a shame that the high binding likelihood of glycosphingolipids isn't considered or analysed in any way. This is an especially interesting lipid from the point of view of raft-like membrane domains. Given the potential role of raft-like domains in sodium channel function, I feel this would be worth a paragraph or two in the discussion.

      We thank the reviewer for bringing our attention to this interesting point. Glycolipids accumulate around Nav1.4 in our complex membrane simulations; however, given reports that carbohydrates tend to interact too strongly in the Martini2.2 forcefield (Grünewald et al. 2022, Schmalhorst et al. 2017) and that no specific residues on Nav1.4 interact preferentially with glycolipid species, we chose not to focus on this. We have, however, noted in our revised discussion that interactions with other lipids deserve further attention.

      The analyses have been run using Martini 2. I don't suggest the authors repeat using the Martini 3 force field, but some mention of this in the discussion would be good.

      We have added the following statement to the discussion: “Our coarse grain simulations were carried out using the Martini2.2 forcefield, for which lipid parameters for many plasma membrane lipids have been developed. We expect that future investigations of lipid-protein interactions will benefit from use of the newer, refined Martini 3 forcefield (Souza et al. 2021) as parameters become available for more lipid types.”

      This might just be an oversight, but no mention is made of an elastic network applied to the backbone beads.

      Lack of a network has been known to cause the protein to collapse, so if this is missing, I'd like to see an RMSD to show that the protein dynamics are not compromised.

      While no elastic network was used in our original CG simulations, the weak protein backbone restraints (10 kJ mol-1 nm-2) used in our simulations allowed us to maintain the structure while permitting some protein movement. However, following the suggestion of reviewer 3, we conducted additional simulations with an elastic network instead of backbone restraints, as described in the results on page 9 lines 30-37 (and in Fig 4E) of the revised manuscript.
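      To give a sense of scale for "weak", the following worked estimate assumes the standard harmonic position-restraint potential acting on each backbone bead independently and an assumed temperature of 310 K (the simulation temperature is not stated in this response), so that k_BT = RT ≈ 2.58 kJ mol-1:

```latex
\begin{aligned}
V(\mathbf{r}) &= \tfrac{1}{2}\,k\,\lVert \mathbf{r}-\mathbf{r}_0 \rVert^{2},
  \qquad k = 10\ \mathrm{kJ\,mol^{-1}\,nm^{-2}} \\
V\big|_{\lVert \mathbf{r}-\mathbf{r}_0 \rVert = 0.5\,\mathrm{nm}}
  &= \tfrac{1}{2}\,(10)\,(0.5)^{2} = 1.25\ \mathrm{kJ\,mol^{-1}} \approx 0.49\,k_{\mathrm{B}}T \\
\langle \Delta x^{2} \rangle &= \frac{k_{\mathrm{B}}T}{k} \approx \frac{2.58}{10}\ \mathrm{nm^{2}}
  \approx 0.26\ \mathrm{nm^{2}},
  \qquad \sqrt{\langle \Delta x^{2} \rangle} \approx 0.51\ \mathrm{nm}
\end{aligned}
```

      The last line is the equipartition result per Cartesian component for the restraint acting alone. It indicates how much motion the restraint by itself permits, not the fluctuations actually observed, which also depend on each bead's bonded and non-bonded interactions with its neighbors.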

      Minor

      • In Fig 3B, are these lipids binding to the channel at the same time? And therefore do the authors see cooperativity?

      The Fig 3B caption has been amended in the revised manuscript to read “Representative snapshots from the five longest binding events from different replicates, showing the three different PIP species (PIP1 in blue, PIP2 in purple and PIP3 in pink) binding to VSD-IV and the DIII-IV linker.” We cannot comment on PIP cooperativity based on the simulations shown in Fig 3, due to the artificially high concentrations used there; however, in model complex membrane simulations we see co-binding of PIPs at the binding site. This is likely due to the tendency of PIPs to cluster together and the high density of positively charged residues in the region, which attracts and supports the binding of multiple PIPs.

      • What charges were used for the atomistic PIP lipids? Do these match the CG lipids?

      We used the CHARMM-GUI PIP parameters for the atomistic simulations. SAPI24 (PIP2) has a headgroup charge of –4e which is one less negative charge than the CG PIP2; whereas SAPI14 (PIP1) has a charge of –3e which is the same as the CG PIP1. We have explicitly included this charge information in the updated Methods of the manuscript (on page 15-16).

      • Lines 259-260: "we performed embedded three structures"

      Corrected in the revised manuscript.

      • Line 272: "us" should be "µs"

      Corrected in the revised manuscript.

      • Line 434: kJ/mol should probably also have 'nm-2' included

      Corrected in the revised manuscript.

      • What charge states were titratable residues set to, and were pKa analyses done to decide this?

      Charge states were assigned to default values at neutral pH. We appreciate that future studies could examine this more carefully using constant pH simulations or similar.

      • It's stated that anisotropic scaling is used in the AT sims. Is this correct? If so, is there a reason this was chosen over semi-isotropic scaling?

      Anisotropic scaling was used for the atomistic simulations allowing all box dimensions to change independently.

      • I would recommend in-house analysis scripts are made available on GitHub or similar, just so the details can be seen.

      Per the reviewer’s request, the Jupyter notebooks used for analysis have been made available on GitHub (https://github.com/etaoster/etaoster.github.io/tree/main/nav_pip_project).

      • One coarse-grained notebook:

      • Lipid DE

      • Contact occupancy + outlier plots

      • Binding duration plots

      • Minimum distance plots

      • Number of ARG/LYS plots

      • PIP Occupancy, binding duration, gating charge residues

      • One atomistic notebook:

      • RMSD, RMSF and distance between IFM and its binding pocket (using MDAnalysis)

      • Atomistic PIP headgroup interaction analyses and plots (using ProLIF)
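      The binding-duration and occupancy analyses listed above reduce to run-length bookkeeping on a per-frame contact series. The following is a minimal sketch of that bookkeeping, not the authors' notebook: it assumes a single distance cutoff has already been applied to produce a boolean contact array for one lipid and one site, and the function names are hypothetical.

```python
import numpy as np


def contact_runs(contact):
    """Return (start, end) frame indices, end exclusive, of each contiguous
    run of True values in a per-frame boolean contact series."""
    c = np.asarray(contact, dtype=bool).astype(int)
    # Pad with zeros so runs touching either end of the trajectory are closed.
    edges = np.diff(np.concatenate(([0], c, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


def binding_summary(contact, dt):
    """Occupancy, per-event durations and longest event for one lipid-site pair.

    contact: boolean per-frame series, True when the lipid is within a chosen
             distance cutoff of the binding-site residues.
    dt: time between saved frames (e.g. in microseconds).
    """
    c = np.asarray(contact, dtype=bool)
    durations = [(end - start) * dt for start, end in contact_runs(c)]
    return {
        "occupancy": float(c.mean()) if c.size else 0.0,  # fraction of frames in contact
        "durations": durations,
        "longest": max(durations, default=0.0),
    }


if __name__ == "__main__":
    print(binding_summary([0, 1, 1, 0, 0, 1, 1, 1, 0, 1], dt=0.5))
```

      With a single cutoff, a lipid that briefly flickers out of contact is counted as two separate events; dual-cutoff schemes, such as the one used in PyLipID, are one way to reduce this.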

      As a final note, I am NOT saying this needs to be done for the current study, but I recommend the authors try the PyLipID package (https://github.com/wlsong/PyLipID) if they haven't yet, as it might be useful for similar projects they run in the future (i.e. for binding site identification, accurate binding kinetics calculations, lipid pose generation etc.).

      We thank the reviewer for this suggestion and will keep this in mind for future projects.

      Reviewer #2 (Recommendations For The Authors):

      Lin Y., Tao E., et al. used multiscale MD simulations to show that PI(4,5)P2 binds stably to an inactivated state of Nav channels at a conserved site within the DIV S4-S5 linker, which couples the voltage sensing domain (VSD) to the pore. The authors hypothesized that PI(4,5)P2 prolongs inactivation by binding to the same site where the C-terminal tail is proposed to bind during recovery from inactivation. They convincingly showed that PI(4,5)P2 reduces the mobility of both the DIV S4-S5 linker and the DIII-IV linker, thus slowing the conformational changes required for the channel to recover to the resting state. They also conducted MD simulations to show that phosphoinositides bind to VSD gating charges in the resting state of Nav channels. These interactions may anchor the VSD in the resting state and impede its activation. Their results provide a mechanism by which phosphoinositides alter the voltage dependence of activation and the recovery rate from inactivation, an important step for developing novel therapies to treat Nav-related diseases. However, the study is incomplete and lacks the expected confirmatory studies which are relevant to such proposals.

      The authors identified a novel binding between phosphoinositides and the VSD of Nav and showed that the strength of this interaction is state-dependent. Based on their work, the affinity of PIPs to the inactivated state is higher than the resting state. This work will help pave the way for designing novel therapeutics that may help relieve pain or treat diseases like arrhythmia, which may result from a leftward shift of the channel's activation. However, the study lacks the expected confirmatory studies which are relevant to such proposals. For example, one would expect that the authors would mutate the positive residues that they claim to make interactions with phosphoinositides to show that there are much fewer interactions once they make these mutations. Another point is that the authors found that the main interaction site of PIPs with Nav1.4 is the VSD-DIV and DIII-DIV linker, an interaction that is expected to delay fast inactivation if it happens at the resting state. The authors should make a resting state model of the Nav1.4 channel to explain the recent experimental data showing that PIP2 delays the activation of Nav1.4, with almost no effect on the voltage dependence of fast inactivation.

      Major concern:

      (1) Lack of confirmatory experiments, e.g., mutating the positive residues that show a high affinity towards PIPs to a neutral and negative residue and assessing the effect of mutagenesis on binding.

      (2) Nav1.4 is the only channel that has been studied in terms of the effect of PIPs on it, therefore the authors should build a resting state model of Nav1.4 and study the effect of PIPs on it.

      Minor points:

      Following the reviewer’s suggestion we have conducted new simulations demonstrating that there are notably fewer protein-PIP interactions after performing charge neutralizing and charge reversal mutations to the positive residues as shown in the new Fig S6.

      The reviewer mentions that if PIPs interact with the VSD-DIV and DIII-DIV linker in the resting state, they could delay fast inactivation. However, as described in the original manuscript and depicted in the schematic (Fig 7), the C-terminal domain impedes PIP binding at this position in the resting state (but not in the inactivated state), meaning that PIP does not bind there in the resting state to delay fast inactivation. We have clarified this statement in the text on page 14 lines 1-2.

      Following the reviewer’s suggestion, we have examined PIP binding to a model of the resting state of Nav1.4 (in addition to the resting state of Nav1.7 described in the original manuscript), as described on page 12 lines 28-30 (and in Fig S12). Similar to what we saw for Nav1.7, PIP binding to VSDI-III can impair activation of the channel.

      There are a lot of wrong statements in many areas, e.g., "These diseases are associated with accelerated rates of channel recovery from inactivation, consistent with our observations that an interaction between PI(4,5)P2 and the residue corresponding to R1469 in other Nav subtypes could be important for prolonging the fast-inactivated state." Prolonging the fast inactivated state would actually reduce recovery from inactivation and not accelerate it.

      We disagree with this statement from the reviewer, which may have arisen from a misreading of the sentence in question. Our statement in the original manuscript is consistent with the original experiments showing that the presence of PIP prolongs the time spent in the fast inactivated state. Mutations at the PIP binding site are likely to reduce PIP binding, and with less PIP present, the channel will recover from inactivation more quickly. We have reworded this sentence for clarity on page 13 lines 27-30.

      Reviewer #3 (Recommendations For The Authors):

      As mentioned in the public review, overall, I am impressed with the manuscript and do think the conclusions are supported. There are, however, quite a few mistakes, mostly minor (listed below). Additionally, I have a few questions and suggest several extensions, while fully realizing that many of these could be outside the scope of the current manuscript.

      We greatly appreciate the time taken by Reviewer 3 to carefully review our manuscript and provide detailed comments. We believe their suggestions have helped to improve our manuscript.

      My first comments concern the PIP subtype in general.

      • In the paper you claim:

      L196, "However, this loss of resolution prevents distinction between phosphate positions on the inositol group and does not permit analysis of protein conformational changes induced by PIP binding"

      L367, "it does not distinguish between phosphate positions within each charge state (e.g. PI(3,4)P2 vs PI(4,5)P2)."

      This is not true: the PIP2 most commonly used in Martini 2 is from dx.doi.org/10.1021/ct3009655 and is a PI(3,4)P2 subtype. Other extensions and alternative parameters also exist for PIPs in Martini 2, e.g. the Martini lipid .itp generator (http://cgmartini.nl/index.php/tools2/other-tools) has all three main variants of both PIP1 and PIP2.

      As described in the response to the public review, we are grateful to the reviewer for pointing out that parameters do exist for different PIP sub-species and have corrected our statement on page 14 to reflect this, and clarified the parameters chosen in the methods section (page 16 lines 2-3). We have not run additional CG simulations with each of these parameters in the current work, but we use the all-atom simulations to examine the interactions of phosphates at specific positions.

      • One detail that is missing in the manuscript is some mention of the charge state of the PIPs, e.g. Fig.1D does not specify it, and in Fig.4D PIP2 looks like -2 on position 5 and -1 on position 4, which I think fits the SAPI24 parameters used; please specify. Also, would using SAPI25, with the flipped charges, significantly alter the results?

      The charge state of PIP2 is -2e on the 5’ phosphate and -1e on the 4’ phosphate, using the SAPI24 CHARMM lipid parameters. We have ensured that this charge information is stated clearly in the methods section of the revised manuscript on page 16 (line 21). We considered looking at SAPI25; however, we expected that it would behave quite similarly, given that the PIP headgroup can adopt slightly different poses and orientations within the binding site across replicates and does fluctuate over the simulations (Fig S8). We have noted this in the revised discussion on page 14 lines 15-17.

      • I was very intrigued and puzzled by the lower binding of PIP3 vs PIP2 in the Martini simulations. Could it be that PIP3 has a harder time fully entering the binding site, or is this just a sampling issue, i.e. is its lower number of binding events a consequence of limited sampling?

      We agree with the reviewer that PIP3 is less able to access the binding site than PIP2, likely because of its larger size. This might also be why we see PIP1 binding at this location via a more buried route (since it has the smallest headgroup). However, PIP1 does not have enough negative charge to keep it in the binding site. It seems to be a Goldilocks-like situation in which PIP2 has the optimal size and charge to allow access and stable binding at the site. We also see that when PIP3 enters the binding site, it leaves before the end of the simulations. While it is hard to establish statistical significance given the limited number of binding and dissociation events, even with the high and equal concentrations of all three PIP species in the PIP-enriched membrane CG simulations, the data strongly suggest preferential binding of PIP2 over PIP3.

      Also, regarding the same L196 sentence as above, "However, this loss of resolution prevents distinction between phosphate positions on the inositol group and does not permit analysis of protein conformational changes induced by PIP binding": the latter part is also wrong. There are no conformational changes because of the restraints on the protein backbone; from the methods, "backbone beads were weakly restrained to their starting coordinates using a force constant of 10 kJ mol−1nm−2". Martini in general might have a hard time with some conformational changes and definitely cannot sample changes in secondary structure, but conformational changes can be, and on many occasions have been, successfully sampled (even full ion channel opening and closing).

      On a similar note, in L179 you mention "owing to the flexibility of the linker." How does this fit with simulations with position restraints on all backbone atoms?

      We applied fairly weak restraints to the backbone only – therefore we still observe some flexibility in the highly flexible loop portion of the linker, where sidechains are able to flip between membrane-facing and cytosol-facing orientations.

      However, after reading the comments from the reviewer, we have run additional simulations with an elastic network rather than backbone restraints on the DIII-DIV linker, which have given further insight. As seen in Fig 4E and described in the results paragraph on page 9 lines 30-37 of the revised manuscript, the presence of PIP does stabilise the linker in its receptor site. To accentuate this effect, we also ran simulations of the ‘IQM’ mutant, known to have a less stable fast inactivated state due to weaker binding to the receptor. Without backbone restraints, we see partial dissociation of the DIII-DIV linker from the receptor that is partially rescued by the presence of PIP.

      I know the paper focuses on PIPs, also very nicely in Fig.2B and Fig. S1-2 the lipid enrichment is shown for other lipids, but why show all lipid classes except cholesterol? And, for the left-hand panels in Fig. S1-2 those really should be leaflet specific - as both the membrane and protein are asymmetric.

      The depletion/enrichment of cholesterol is shown in Fig 2B, as are the lipid Z-density maps and contact occupancy structures (in row 5 of Fig S2, labeled CL in yellow). The Z-density maps are meant to provide an overall summary of lipid distribution. The contact occupancy structures showing the transverse views and intracellular/extracellular views provide a better indication of the occupancy across the different leaflets.

      In L237 for the comparison of Cav2.2 and Kv7.1 bound to PI(4,5)P2 structures: They do agree well with the PIP1 simulations but not as much for the main PIP2 binding site. If you look in the CG simulations, is there another (not the main) PIP2 binding site at that same location (which might also be stable in AA simulations)?

      In some replicates of the CG simulations, we identify stable PIP1 binding via the other orientation (i.e. the one that overlaps with the Cav2.2 and Kv7.1 structures). Since we did not directly observe any PIP2 binding events from the other orientation, we did not run any backmapped atomistic simulations with PIP2 at this position. However, the binding site residues that the PIP1/2 headgroup binds to are the same regardless of which side PIP1/2 approaches from. We would expect that PIP2 bound from the alternative position is also stable.

      Two references I want to put for consideration to the authors, for potential inclusion if the authors find their inclusion would strengthen the manuscript. This one gives a good demonstration of using the same PM mixture to define lipid protein fingerprints with Martini:

      https://pubs.acs.org/doi/10.1021/acscentsci.8b00143.

      And this one https://pubmed.ncbi.nlm.nih.gov/33836525/ shows how Nav1.4 function could also be affected by general changes in bilayer properties (in addition to the specific lipid interactions explored here).

      We thank the reviewer for bringing to our attention these two relevant references, which help, respectively, to substantiate the use of Martini to study membrane protein-lipid interactions and to explain why Nav channels are interesting to study in the context of their membrane environment (as well as the potential implications for drugs that can bind from within the membrane). We have added these citations to the introduction and discussion.

      Minor comments and fixes:

      L2, Title: A binding site for phosphoinositide modulation of voltage-gated sodium channels described by multiscale simulations

      The title reads very strangely to me, should it be "A binding site for phosphoinositide" ; "modulation".

      We thank the reviewer for this comment - title has been updated to: A binding site for phosphoinositides described by multiscale simulations explains their modulation of voltage gated sodium channels.

      L25, Abstract, "The phosphoinositide PI(4,5)P2 decreases Nav1.4 activity by increasing the difficulty of channel opening, accelerating fast activation and slowing recovery from fast inactivation." Assuming this is referring to results from Gada et al JGP, 2023 should this not be "accelerating fast inactivation"?

      Corrected in the revised manuscript.

      L71 maybe good to write the longer version of IFM on first use e.g. Ile-Phe-Met (IFM), as to not mistake it for some random three letter acronym.

      Corrected in the revised manuscript.

      L109, Fig.2. Maybe change the upper and lower leaflet to intracellular and cytoplasmic leaflets (or outer / inner). In D, "(D) Distribution of PIP binding occupancies (left)", something is missing; can I assume this is for/over all lipid-exposed residues? Also, for D I am a little confused about how occupancy is defined, as the total occupancy per residue does not add up to 100.

      The figure has been updated with intracellular and cytoplasmic leaflet labels. The binding occupancy distribution boxplot shows binding occupancies for all lipid-exposed residues. In our analysis, we define contact occupancy as the proportion of simulation time in which a lipid type is within 0.7 nm of a given residue. More than one lipid can be within this cutoff in any given frame; that is, a PIP and a PE can be bound simultaneously.
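      The occupancy definition above can be made concrete with a short Python sketch. It assumes, hypothetically, that per-frame minimum residue-lipid distances have already been extracted from the trajectory, and it treats "within 0.7 nm" as inclusive; the function name and input format are illustrative, not the authors' analysis code. It shows why occupancies summed over lipid types for one residue need not add up to 100%: a PIP and a PE can both be within the cutoff in the same frame.

```python
from typing import Dict, List

CUTOFF_NM = 0.7  # contact cutoff (nm) quoted in the response


def contact_occupancy(frames: List[Dict[str, float]], cutoff: float = CUTOFF_NM) -> Dict[str, float]:
    """Per-lipid-type contact occupancy for ONE residue.

    `frames` holds, for each trajectory frame, the minimum distance (nm)
    between the residue and any molecule of each lipid type (hypothetical,
    precomputed input). A lipid type absent from a frame counts as no contact.
    """
    if not frames:
        raise ValueError("need at least one frame")
    lipid_types = sorted({lipid for frame in frames for lipid in frame})
    return {
        lipid: sum(1 for frame in frames if frame.get(lipid, float("inf")) <= cutoff) / len(frames)
        for lipid in lipid_types
    }


if __name__ == "__main__":
    # Four toy frames for one residue; the distances are made up for illustration.
    frames = [
        {"PIP2": 0.5, "PE": 0.6},   # both in contact
        {"PIP2": 0.5, "PE": 1.2},   # only PIP2 in contact
        {"PIP2": 0.9, "PE": 0.4},   # only PE in contact
        {"PIP2": 0.6, "PE": 0.65},  # both in contact
    ]
    occ = contact_occupancy(frames)
    print(occ)                # prints {'PE': 0.75, 'PIP2': 0.75}
    print(sum(occ.values()))  # prints 1.5: occupancies are not mutually exclusive
```

      Counting each lipid type independently per frame, rather than assigning each frame to a single "winning" lipid, is what allows the per-residue total to exceed 1.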

      L160 "occurring the identified site": "in the" is missing ("occurring in the identified site").

      Corrected in the revised manuscript.

      L170 "PIP3 (headgroup charge: -7e) has interacts similarly to PIP1," - remove "has".

      Corrected in the revised manuscript.

      L194, "reducing system size" the size does not change, I am assuming you want to say reducing the number of particles?

      Corrected in the revised manuscript.

      L252, Fig.6 "(B) Occupancy of all PIPs (PIP1, PIP2, PIP3) at binding site residues in the three systems" A little confusing, initially was expecting 3x3 data points per residue, maybe change to, Combined occupancy of all PIPs...

      Corrected in the revised manuscript.

      L253, Fig.6 D, I don't really have a good suggestion for improvement here, so this is just a FYI that this panel was very confusing for me and took some time to figure out what is shown.

      We have added to the caption of Fig. 6D to try to clarify this panel.

      L257, Fig.6 (F) not in bold

      Corrected in the revised manuscript.

      L259 "PIP binding, we performed embedded three structures of Nav1.7" something missing?

      Corrected in the revised manuscript.

      L272, "In triplicate 50 us coarse-grained simulations": "us" instead of μs.

      Corrected in the revised manuscript.

      L272, in that paragraph the number and length of simulations are only reported for the inactivated Nav1.7 system, not for the Nav1.7-NavPas chimera, which I am assuming is the same?

      Corrected in the revised manuscript.

      L297, "marked by both shortened inactivation times", can I assume this is: shortened times to inactivation (i.e. to get inactivated not times in the inactivated states)?

      Corrected in the revised manuscript.

      L331, "are conserved in Nav1.1-1.9 (Fig. 5D)," should be Fig. 5C.

      Corrected in the revised manuscript.

      L353, "channel opening []" [] maybe a missing reference?

      Thank you for pointing out this oversight - Goldschen-Ohm et al. has been cited here.

      L394, "The composition of the complex mammalian membrane is as reported in Ingólfsson, et al. (38)." Ref 38 is the "Computational lipidomics of the neuronal plasma membrane" which indeed uses the 63 component PM but the original reference for the average 63 lipid mixture PM is dx.doi.org/10.1021/ja507832e.

      Corrected in the revised manuscript.

      L404, "Additionally, a model Nav1.7 with all four VSDs in the deactivated state using Modeller (40)." Something missing, e.g. was also built and simulated for ...

      Corrected in the revised manuscript.

      Table S1 "Disease information": I am guessing this should be "Disease information; mechanism"? Of the five entries, two give a mechanism, one has "; unknown significance" and one has "; unknown". Maybe clarify this in the column title and use the same wording when the mechanism is unknown.

      Corrected in the revised manuscript.

      Table S1 and S2 have different styles.

      The tables have been amended to have the same style.

      Fig. S3 "for all 12 lipid types in the mammalian membrane " there are many more lipid types in a typical PM (hundreds) and 63 in the PM mixture simulated here, so maybe write: 12 lipid classes?

      Corrected in the revised manuscript.

      Fig.S6 PIP headgroup, can I assume that is for the bound PIP only, please specify.

      In all atomistic simulations, only a single PIP, the one at the identified binding site, was backmapped. We have now clarified this point in the Methods, Results and the Fig. S6 caption.

      Writing of PI(4,5)P2 and PI(4)P1 most of the time use 1 and 2 as subscripts but not always (at least not in SI), also the same with Nav vs Na_v (v subscript) and even NAV (in Table S1).

      Subscripts have been implemented in the updated Supplementary Information (as well as within various figures and throughout the manuscript).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This landmark study sheds light on a long-standing puzzle of Protein kinase A activation in Trypanosoma. Extensive experimental work provides compelling evidence for the conclusions of the manuscript. It represents a significant advancement in our understanding of the molecular mechanism of Cyclic Nucleotide Binding domains and will be of interest to researchers with interest in kinases and mechanistic studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cyclic Nucleotide Binding (CNB) domains are pervasive structural components involved in signaling pathways across eukaryotes and prokaryotes. Despite their similar structures, CNB domains exhibit distinct ligand-sensing capabilities. The manuscript offers a thorough and convincing investigation that clarifies numerous puzzling aspects of nucleotide binding in Trypanosoma.

      Strengths:

      One of the strengths of this study is its multifaceted methodology, which includes a range of techniques including crystallography, ITC (Isothermal Titration Calorimetry), fluorimetry, CD (Circular Dichroism) spectroscopy, mass spectrometry, and computational analysis. This interdisciplinary approach not only enhances the depth of the investigation but also offers a robust cross-validation of the results.

      Weaknesses:

      None noticed.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript clearly shows that Trypanosoma PKA is controlled by nucleoside analogues rather than cyclic nucleotides, which are the primary allosteric effectors of human PKA and PKG. The authors demonstrate that the inosine, guanosine, and adenosine nucleosides bind with high affinity and activate PKA in the tropical pathogens T. brucei, T. cruzi and Leishmania. The underlying determinants of nucleoside binding and selectivity are dissected by solving the crystal structure of T. cruzi PKAR(200-503) and T. brucei PKAR(199-499) bound to inosine at 1.4 Å and 2.1 Å resolution and through comparative mutational analyses. Of particular interest is the identification of a minimal subset of 2-3 residues that controls nucleoside vs. cyclic nucleotide specificity.

      Strengths:

      The significance of this study lies not only in the structure-activity relationships revealed for important targets in several parasite pathogens but also in the understanding of CNB's evolutionary role.

      Weaknesses:

      The main missing piece is the model for activation of the kinetoplastid PKA which remains speculative in the absence of a structure for the trypanosomatid PKA holoenzyme complex. However, this appears to be beyond the scope of this manuscript, which is already quite dense.

      We fully agree that insight into the activation mechanism and its possible deviation from the mammalian paradigm requires a holoenzyme structure revealing the details of R-C interaction. We have attempted Cryo-EM of LEXSY-produced holoenzyme, yet scaling up the purification procedures described in this manuscript has repeatedly failed despite numerous protocol changes and optimizations. Much more work is required to achieve this.

      Reviewer #2 (Recommendations For The Authors):

      Some minor points to consider for enhancing the impact of this interesting manuscript:

      (1) The nucleoside affinities measured are mainly for the regulatory subunits unbound to the kinase domain. How would nucleoside affinities change when the regulatory subunits are bound to the kinase domain, which is presumably the case under resting conditions? An estimation of this change in affinity is important because it more closely relates to the variations in cellular nucleoside concentrations needed for activation.

      This is an important question, and we have given an indirect answer in the manuscript, though not a very explicit one. The EC50 values for kinase activation of the purified holoenzyme complexes are very similar or almost identical to the KD values measured by ITC with free regulatory subunits. By inference, the binding KD for the holoenzyme and for the free R-subunit cannot be very different. In addition, we have recently determined the EC50 for PKA activation in vivo in trypanosomes using a bioluminescence complementation reporter assay. The values fit perfectly with those obtained with purified holoenzyme (Wu et al. in preparation). A sentence has been added to Results (lines 201-203).
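      The inference above rests on a simple occupancy argument: if activation tracks the fractional occupancy of the nucleoside site, the half-maximal activation concentration equals the dissociation constant. The Python sketch below illustrates this under deliberately simplified, assumed conditions (one non-cooperative site, activation proportional to occupancy, no ligand depletion). It ignores the tandem CNB domains of the R subunit and any R-C coupling, so it illustrates the reasoning rather than modelling trypanosomatid PKA; the function names are hypothetical.

```python
def fraction_bound(ligand: float, kd: float) -> float:
    """Fractional occupancy of a single, non-cooperative site: L / (KD + L)."""
    if kd <= 0:
        raise ValueError("kd must be positive")
    return ligand / (kd + ligand)


def ec50_by_bisection(response, lo: float, hi: float, iterations: int = 200) -> float:
    """Concentration at which a monotonically rising 0-to-1 response reaches 0.5.

    A fixed iteration count (rather than an absolute tolerance) guarantees
    termination whatever the concentration units are.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if response(mid) < 0.5:
            lo = mid  # half-maximal point lies above mid
        else:
            hi = mid  # half-maximal point lies at or below mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    kd = 0.5  # arbitrary illustrative KD in arbitrary concentration units
    ec50 = ec50_by_bisection(lambda c: fraction_bound(c, kd), 0.0, 1000.0 * kd)
    print(round(ec50, 6))  # prints 0.5: EC50 equals KD under this simple model
```

      Under this toy model, an EC50 that matches the free-subunit KD is what one would expect if holoenzyme formation barely shifts nucleoside affinity; cooperativity between CNB domains or strong R-C coupling would be expected to move the EC50 away from the free-subunit KD.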

      (2) The authors should point out that a major implication of nucleoside vs. cyclic nucleotide activation is in terms of signal termination. If phosphodiesterases (PDEs) are responsible for cAMP/cGMP signal termination, what terminates nucleoside-dependent signaling? Although the answer to this question may not be known at this stage, it is important to highlight this critical implication of the authors' study.

      The mechanism of signal termination is indeed unknown so far. We speculate that some enzymes of the purine salvage pathways are differentially localized in subcellular compartments and thereby able to establish microdomains that enable nucleoside signaling. In addition, PKA subunit phosphorylations/dephosphorylations and/or protein turnover may also regulate signal termination. As an example, free PKAC1 is rapidly degraded upon depletion of the PKAR subunit by RNAi. We have now mentioned signal termination in Discussion and have revised the last part of Discussion (lines 567-602). A possible approach to monitor compartmentalized signaling would be using the FluoSTEPs technology (Tenner et al., Sci. Adv. 2021; 7: eabe4091), but adapting this to the trypanosome system will not be a short-term task.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely thank the editor and reviewers for their time and insightful comments and suggestions. We have revised the manuscript by performing additional experiments and analyses, and we have clarified the items raised in the suggestions.

      Reviewer #1 (Public Review):

      Summary of Author's Objectives:

      The authors aimed to explore JMJD6's role in MYC-driven neuroblastoma, particularly in the interplay between pre-mRNA splicing and cancer metabolism, and to investigate the potential for targeting this pathway.

      Strengths:

      (1) The study employs a diverse range of experimental techniques, including molecular biology assays, next-generation sequencing, interactome profiling, and metabolic analysis. Moreover, the authors specifically focused on gained chromosome 17q in neuroblastoma, combining this with an analysis of cancer dependency genes screened with a CRISPR/Cas9 library and of the association between gene expression and prognosis in a large clinical cohort of neuroblastoma patients. This comprehensive approach strengthens the credibility of the findings. The identification of the link between JMJD6-mediated pre-mRNA splicing and metabolic reprogramming in MYC-driven cancer cells is innovative.

      (2) The authors effectively integrate data from multiple sources, such as gene expression analysis, RNA splicing analysis, JMJD6 interactome assay, and metabolic profiling. This holistic approach provides a more complete understanding of JMJD6's role.

      (3) The identification of JMJD6 as a potential therapeutic target and its correlation with the response to indisulam have significant clinical implications, addressing an unmet need in cancer treatment.

      Weaknesses:

      (1) The manuscript contains complex technical details and terminology that may pose challenges for readers without a deep background in molecular biology and cancer research. Providing simplified explanations or additional context would enhance accessibility.

      We have provided simplified explanations for some terminology.

      (2) It would be beneficial to explore whether treatment with JMJD6 inhibitors, both in vitro and in vivo, can effectively target the enhanced pre-mRNA splicing of metabolic genes in MYC-driven cancer cells.

      Unfortunately, there are no potent and selective JMJD6 inhibitors available.

      Reviewer #3 (Public Review):

      Summary:

      Jablonowski and colleagues studied key characteristics of MYC-driven cancers: dysregulated pre-mRNA splicing and altered metabolism. This is an important field of study as it remains largely unclear as to how these processes are coordinated in response to malignant transformation and how they are exploitable for future treatments. In the present study, the authors attempt to show that Jumonji Domain Containing 6, Arginine Demethylase And Lysine Hydroxylase (JMJD6) plays a central role in connecting pre-mRNA splicing and metabolism in MYC-driven neuroblastoma. JMJD6 collaborates with the MYC protein in driving cellular transformation by physically interacting with RNA-binding proteins involved in pre-mRNA splicing and protein regulation. In cell line experiments, JMJD6 affected the alternative splicing of two forms of glutaminase (GLS), an essential enzyme in the glutaminolysis process within the central carbon metabolism of neuroblastoma cells. Additionally, the study provides in vitro (and in silico) evidence for JMJD6 being associated with the anti-proliferation effects of a compound called indisulam, which degrades the splicing factor RBM39, known to interact with JMJD6.

      Overall, the findings presented by Jablonowski et al. begin to illuminate a cancer-promoting metabolic, and potentially, a protein synthesis suppression program that may be linked to alternative pre-mRNA splicing through the action of JMJD6 - downstream of MYC. This discovery can provide further evidence for considering JMJD6 as a potential therapeutic target for the treatment of MYC-driven cancers.

      Strengths:

      Alternative Splicing Induced by JMJD6 Knockdown: the study presents evidence for the role of JMJD6 in alternative splicing in neuroblastoma cells. Specifically, the RNA immunoprecipitation experiments demonstrated a significant shift from the GAC to the KGA GLS isoform upon JMJD6 knockdown. Moreover, a significant correlation between JMJD6 levels and GAC/KGA isoform expression was identified in two distinct neuroblastoma cohorts. This suggests a causative link between JMJD6 activity and isoform prevalence.

      Physical Interaction of JMJD6 in Neuroblastoma Cells: The paper provides preliminary insight into the physical interactome of JMJD6 in neuroblastoma cells. This offers a potential mechanistic avenue for the observed effects on metabolism and protein synthesis and could be exploited for a deeper investigation into the exact nature, and implications of neuroblastoma-specific JMJD6 protein-protein interactions.

      Weaknesses:

      There are several areas that would benefit from improvements with regard to the current data supporting the claims of the paper (i.e., the conclusion presented in Figure 8).

      Neuroblastoma Modelling Strategy: The study heavily relies on cell lines without incorporating patient-derived cells/biomaterials. Using databases to fill gaps in the experimental design can only fortify the observations to a certain extent. A critical oversight is the absence of non-cancerous control cells in many figures, and the rationale for selecting specific cell lines for assays/approaches remains somewhat unclear. A foundational control for such experiments should involve the non-transformed neural crest cell line, which the authors have readily available. Are the observed splicing and metabolic effects of JMJD6 specific to neuroblastoma? Is there a neuroblastoma-specific JMJD6 interactome? Is MYC function essential?

      In Vivo Modelling: The inclusion of a genetic mouse model combined with an inducible JMJD6 knockdown, would enhance the study by allowing examination of JMJD6's role during both tumor initiation and growth in vivo. For instance, the TH-MYCN mice overexpressing MYCN in neural crest cells, could be a promising choice.

      Dependence on Colony Formation Assay: The study leans on 2D and semi-quantitative colony formation assays to assess malignant growth. To validate the link between the mechanistic insights discussed (e.g., reduced protein synthesis) and JMJD6-mediated malignant growth as a potential therapeutic target, evidence from in vivo or representative 3D models would be crucial.

      Data Presentation and Rigor: The presented data is predominantly qualitative and necessitates quantification. For instance, Western blots should be quantified. The RNAseq, metabolism, and pulldown data should be transparently and numerically presented. The figure legends seem elusive and their lack of transparency (often with regards to biological repeats, error bars, cell line used etc.) is concerning. Adequate citation and identification of all data sources, including online resources, are imperative. The manuscript would also benefit from a more rigorous depiction and quantification of RNA interference of both stable and transient knockdowns with quantitative validation at mRNA and protein levels.

      Novelty Concerns: The emphasis on JMJD6 as a novel neuroblastoma target is contingent on the new mechanistic revelations about the JMJD6-centered link between splicing, metabolism, and protein synthesis. Given that JMJD6 has been previously linked to neuroblastoma biology, the rationale (particularly in Figure 1) for concentrating on JMJD6 may stem more from bias rather than data-driven reasoning.

      Depth of Mechanistic Investigation: Current evidence lacks depth in key areas such as JMJD6-RNA binding. A more thorough approach would involve pinpointing specific JMJD6 binding sites on endogenous RNAs using techniques such as cross-linking and immunoprecipitation, paired with complementary proximity-based methodologies. Regarding the presented metabolism data, diving deeper into metabolic flux via isotope labeling experiments could shed light on dynamic processes like TCA and glutaminolysis. As it stands, the 'pathway cartoon' in Figure 6d appears overly qualitative.

      Response: We agree with this reviewer that more in-depth studies are needed to understand the biological functions of JMJD6 in neuroblastoma. We have included one paragraph “limitation of the study” to point out that additional work needs to be done to address the comments from this reviewer.

      We have also added details in figure legend to increase rigor.

      Reviewer #1 (Recommendations For The Authors):

      In this study, Jablonowski and colleagues identify the link between JMJD6-mediated pre-mRNA splicing and metabolic reprogramming in cancer cells, with implications for therapeutic response to splicing inhibitors. I have reviewed your manuscript and found it quite promising. However, there are some specific points that require further clarification and additional experiments. Please consider the following comments:

      Major concerns:

      (1) Regarding Figure 1d and e: to enhance the robustness of your findings, it would be beneficial to include additional datasets, such as the Kocak-649 dataset. It is important to narrow down the analysis to high-risk patient groups when examining survival rates, specifically to investigate whether the elevated expression of the 114 gene signature correlates with poor survival within this subgroup. Additionally, please consider conducting a more detailed breakdown of the subsets depicted in Fig. 1b to explore the association between their expression levels and patient survival rates.

      Response: We have included the Kocak-649 datasets as Supplemental Figure 1. We have further analyzed the 114 gene signature in low-risk and high-risk patients, respectively, as Supplemental Figure 2.

      (2) Fig. 2b: Similar to the previous comment, it would strengthen your findings to include survival rate analysis in more datasets, particularly in high-risk patient groups.

      Response: We have further analyzed the association of JMJD6 with survival in low-risk and high-risk patients, respectively, as Supplemental Figure 3. Regardless of the risk factors, high expression of JMJD6 was associated with a poor outcome.

      (3) In reference to Fig. S1D, please clarify the time point under investigation. It looks like siRNAs were utilized in this study. Ensure consistency between the siRNA # mentioned in the methods section and what is presented in Fig. S1d.

      Response: We have clarified the time point under investigation in Fig. S1D (now as Fig. S4D). We have corrected the siRNA# on the method section.

      Additionally, it would be beneficial to include data on knockdown efficacy and consider incorporating western blot results, similar to those presented in Fig. 2c.

      Response: These experiments were performed as shown in Figure 4C. We assumed the knockdown efficiency was comparable.

      Furthermore, I recommend analyzing the RNA-seq data from JMJD6-depleted BE(2)C cells to identify any alterations in the expression of neuronal differentiation signature genes, with the aim of exploring potential associations with changes in cell morphology showed in Fig. S1D.

      Response: We have analyzed the data and, as this reviewer expected, we do see upregulation of neuronal differentiation pathways. We have included the data as Fig. S7B.

      (4) Fig. 4g: Confirm whether the data is related to GAC, and if so, where is the data for KGA?

      Response: We apologize for this. KGA data was missed when we assembled the figure. We have added back as Figure 4H.

      (5) In relation to Fig. 4, I suggest conducting experiments to individually silence GAC and KGA, if feasible (for instance, by targeting their 3'-UTRs). This would allow for a more in-depth investigation into whether GAC and KGA play essential roles in NB cell proliferation.

      Response: As this reviewer suggested, we have performed the experiments to knock down GAC and KGA in BE2C cells, and we found that both isoforms seemed to be important for cell survival. We have included the data as Figure 5G-I. Additionally, we have also performed RNA-seq to understand the differential functions of GAC and KGA in neuroblastoma cells when they were overexpressed separately. We have included the data as Figure 5E,F, and Supplemental Figure 9.

      (6) Fig. 5c: Could this protein synthesis reduction be attributed to an artificial overexpression of JMJD6? It would be interesting to investigate whether the genetic silencing of JMJD6 has an impact on total protein synthesis.

      Response: This is a great question but could be very challenging to have a definitive answer. Since cells are not happy with knockdown of JMJD6, we may have a secondary effect resulting from activation of cell death. While we have successfully generated single cell JMJD6 CRISPR KO clones, the cells are not happy either. In the future, we may generate dTAG knockin cell line which will allow us to induce an acute protein degradation, and then we can assess if JMJD6 loss will consequently impact total protein synthesis.

      (7) Fig. S7: the authors have shown that knockdown of JMJD6 in NB cells reduced cell proliferation (Fig. 2c-e). Please clarify how you obtained sufficient cells after CRISPR knockout of JMJD6 clones and whether the cells remained healthy. It would be helpful to provide cell images.

      Response: We harvested cells at different time points in Fig 2C-E, and we have added this information to the figure legends. Cells were not happy after JMJD6 KD or KO. We therefore harvested cells for Western blot at an early time point and stained cells for the survival effect at a late time point.

      (8) Fig. 7f: Address the paradox where JMJD-knockdown cells grow slower (Fig. 2c-e), but these JMJD-KO4E5 cells grow at a similar rate compared to SKNAS-WT in the DMSO treatment group. Clarify whether this aligns with the results observed with shRNA results shown in Fig. 2c-e.

      Response: The JMJD6 KO cells grew much slower than the wild-type cells. In these experiments, we intentionally seeded many more cells for the JMJD6 KO clone so that cell numbers in the DMSO-treated groups were comparable.

      Minor concerns:

      (1) Fig. 2c: Please specify the time point for Fig. 2c to provide a clearer context for readers.

      We have added the information.

      (2) In Line 204, it is stated that 'Supplementary Table 3,' which describes the 'Correlation of JMJD6 KO and its co-dependency genes,' can actually be found in 'Supplementary Table 4.' Please clarify this discrepancy.

      We apologize for this. We probably accidentally uploaded the duplicates. We have uploaded the new table in our revision.

      (3) Line 207: The order of figures should be clarified. Fig. 3c should be mentioned before Fig. 3b in the text.

      Yes, we did.

      (4) In Line 216, it is mentioned that 'Supplementary Table 4,' which describes 'Differentially expressed genes by JMJD6 KD,' can actually be found in 'Supplementary Table 3.' Please provide clarification for this discrepancy.

      We have corrected this.

      (5) Line 244-247: Please provide clarification of this section to ensure readers can fully understand your point.

      We have rephrased the sentence.

      (6) Line 1048: Confirm whether Fig. 2c represents siRNA or shRNA, as the label in the graph does not match the figure legends.

      Sorry for this. We have corrected.

      (7) Line 1161: Provide clarification regarding the use of Image J from k, and in Line 1162, specify the source of Image J from l.

      We apologize for the confusion in our description. We meant the "Image J" software. We have corrected this in the figure legend.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions to authors:

      Line 39 - suggest introducing JMJD6.

      Response: We have added the full name of JMJD6.

      Line 47 - suggest slightly rephrasing 'metabolic program that is coupled with...'.

      We have made a slight change by changing “coupled” to “associate”.

      Line 85 - please delete/replace 'exceptional'; proofread for inadequate use of ambiguous wording.

      We have changed it as “significant”.

      Line 141 - please concisely define 'high risk'.

      We have defined it with a citation (line 142-146).

      Line 143 - please concisely define 'event free'.

      We have defined the event free and overall survival precisely (line 149, 150).

      Line 153 - provide an adequate citation for 'cBioportal'.

      We have added the citation (line 166).

      Line 161 - please state the utilized cell lines.

      We have referenced to Materials and Methods (line 175).

      Line 166 - please note that 'morphological changes' of a cell do not suffice to determine 'stemness', please rephrase.

      We agreed and changed it to “regulate cellular differentiation” (line 181).

      Line 182 - provide a quantifiable measure for color change and or remove observation from the narrative.

      We have removed “indicative of acidic pH change” (line 198).

      Line 185 - the statement commencing with 'It is believed...' requires referencing.

      We have added references (line 200).

      Line 187 - please provide an adequate citation for the 'JoMa1' neural crest-derived cells (J. Maurer and colleagues?).

      We have added the reference (line 201).

      Line 203 - please provide an adequate citation for 'DepMap'.

      There is no citation specifically for DepMap and that’s why we can only provide the DepMap link.

      Line 234 - please provide an adequate citation for 'two algorithms'.

      We have provided the reference (line 265).

      Line 265 - please provide a rationale for the choice of the three tested cell lines.

      We have added the rationale: SKNAS overexpresses C-MYC, whereas BE2C and SIMA carry MYCN amplification (line 302, 303).

      Line 279 - suggest rephrasing 'gaining more ATPs'.

      We have removed these words as we do not have direct evidence to show ATP production (line 320).

      Line 342 - suggest rephrasing 'are in the only gene signature'.

      We have rephrased by saying “lysine demethylase (HDM) genes, including JMJD6, are present in the most significantly enriched gene signature in indisulam-sensitive cells” (line 416-416).

      Line 424 - please state the source or all cell lines (commercial provider?).

      We have added the source of cell lines.

      Lines 438 to 442 - are STR and mycoplasma profiling data adequately presented in the manuscript?

      We routinely test all cell lines cultured in the hoods of our Department for STR and mycoplasma every month.

      Lines 520 onwards - is the JMJD6 knockout generation data (e.g., cell viability upon knockout) adequately presented in the manuscript? Why does the study depend on transient transfection of siRNAs for obtaining mechanistic results?

      We created stable JMJD6 KO clones by selecting single cells with complete knockout. The cells grow poorly after KO. siRNA knockdown, by contrast, depletes JMJD6 relatively acutely; it is easy and fast, and may more reliably capture the direct effects of JMJD6.

      Figures: please provide adequate axis labeling for all graphs (e.g., Fig. 2b and 2e).

      We have added the axis labeling.

      Discussion line 370 - what is meant by 'too harsh' - please use unambiguous phrasing to highlight limitations.

      We have changed to “stringent”.

      Please provide a study limitation paragraph.

      We have added one limitation paragraph.

      Limitation of the study

      Our study focused on understanding JMJD6 function in neuroblastoma cell lines. In the future, we will consolidate our study by expanding our models to patient-derived xenografts, organoids, and neuroblastoma genetic models, in comparison with non-cancerous cells. Although we have identified a conserved interactome of JMJD6 in neuroblastoma cells, it remains to be determined whether it is neuroblastoma-specific and essential to MYC-driven cancers. Mapping genome-wide RNA binding by JMJD6 in cancer cells and normal cells, coupled with isotope labeling to dissect the metabolic effect of JMJD6, will enhance our understanding of the biological functions of JMJD6 and awaits future studies. Another limitation is that we could not target the enhanced pre-mRNA splicing of metabolic genes in MYC-driven cancer cells by pharmacologic inhibition of JMJD6, owing to the lack of selective and potent JMJD6 inhibitors.

      Additional editing and proof-reading of the manuscript's narrative, figures, legends, and methods is highly recommended.

      We have proofread the whole manuscript.

    1. Author Response

      We are grateful for the reviewers' appreciation of our work and for their constructive feedback. We will address their comments through a revised version of the manuscript.

      Reviewer #1 (Public Review):

      This study by Paoli et al. used a resonant scanning multiphoton microscope to examine olfactory representation in the projection neurons (PNs) of the honeybee with improved temporal resolution. PNs were classified into 9 groups based on their response patterns. The authors found that excitatory responses in the PNs precede the inhibitory responses by ~40 ms, and ~50% of PN responses contain inhibitory components. They built a neural circuit model of the mushroom body (MB) with evolutionarily conserved features such as sparse representation, global inhibition, and a plasticity rule. This MB model fed with the experimental data could reproduce a number of phenomena observed in experiments using bees and other insects, including dynamical representations of odor onset and offset by different populations of Kenyon cells, prolonged representations of after-smell, different levels of odor specificity for early/delay conditioning, and shift of behavioral timing in delay conditioning. The trace conditioning was not modeled and tested experimentally. Also, the experimental result itself is largely confirmatory of preceding studies using other organisms. Nonetheless, the experimental data and the model provide a solid basis for future studies.

      We thank the reviewer for summarizing the value of our study and recognizing its generality and significance. As suggested, in a revised version of the manuscript, we will discuss the implication of our approach for the context of trace conditioning. The model we presented hinges on the learning-induced plasticity of KC-to-MBON synapses recruited during the learning window (i.e., the simulated US arrival). In the case of trace conditioning, the model predicts that the behavioral response time should match the expected US arrival. Contrary to this prediction, preliminary analyses of empirical measurements of PER latency upon trace conditioning indicate that this is not the case. In a revised version of the manuscript, we will discuss the differences between the predictions of the model and the experimental observations in a trace conditioning paradigm.

      Reviewer #2 (Public Review):

      The study presented by Paoli et al. explores temporal aspects of neuronal encoding of odors and their perception, using bees as a general model for insects. The neuronal encoding of the presence of an odor is not a static representation; rather, its neuronal representation is partly encoded by the temporal order in which parallel olfactory pathways participate and are combined. This aspect is not novel, and its relevance in odor encoding and recognition has been discussed for more than 20 years.

      The temporal richness of the olfactory code and its significance have traditionally been driven by results obtained with electrophysiological methods of high temporal resolution, allowing the identification and timing of the action potentials in the different populations of neurons whose combination encodes the identity of an odor. On the other hand, optophysiological methods that enable spatial resolution and cell identification in odor coding lack the temporal resolution to appreciate the intricacies of olfactory code dynamics.

      (1) In this context, the main merit of Paoli et al.'s work is achieving an optical recording that allows for spatial registration of olfactory codes with greater temporal detail than the classical method and, at the same time, with greater sensitivity to measure inhibitions as part of the olfactory code.

      The work clearly demonstrates how the onset and offset of odor stimulation triggers a dynamic code at the level of the first interneurons of the olfactory system that changes at every moment as a natural consequence of the local inhibitory interactions within the first olfactory neuropil, the antennal lobe. This gives rise to the interesting theory that each combination of activated neurons along this temporal sequence corresponds to the perception of a different odor. The extent to which the corresponding postsynaptic layers integrate this temporal information to drive the perception of an odor, or whether this sequence is, in a sense, a journey through different perceptions, is challenging to address experimentally.

      In their work, the authors propose a computational approach and olfactory learning experiments in bees to address these questions and evaluate whether the sequence of combinations drives a sequence of different perceptions. In my view, it is a highly inspiring piece of work that still leaves several questions unanswered.

      We thank the reviewer for considering that our work has an inspiring nature. Below we have tried to answer the questions raised by the following comments, and we will include part of these answers in the revised version of our manuscript.

      (2) In my opinion, the detailed temporal profile of the response of projection neurons and their respective probabilities of occurrence provide valuable information for understanding odor coding at the level of neurons transferring information from the antennal lobes to the mushroom bodies. An analysis of these probabilities in each animal, rather than in the population of animals that were measured, would aid in better comprehending the encoding function of such temporal profiles. Being able to identify the involved glomeruli and understanding the extent to which the sequence of patterns and inhibitions is conserved for each odor across different animals, as it is well known for the initial excitatory burst of activity observed in previous studies without the fine temporal detail, would also be highly significant.

      We thank the reviewer for recognizing the relevance of the findings in understanding the logic of olfactory coding. We agree about the importance of establishing if the different glomerular response profiles are evenly distributed across individuals or have individual biases. In the revised version of the manuscript, we will provide data on the distribution of response profiles for each animal and for different olfactory stimuli. Also, we fully agree on the importance of assessing to what extent such response profiles - largely determined by the local network of AL interneurons - are glomerulus-specific and conserved across individuals.

      In my view, the computational approach serves as a useful tool to inspire future experiments; however, it appears somewhat simplistic in tackling the complexity of the subject. One question that I believe the researchers do not address is to what extent the inhibitions recorded in the projection neurons are integrated by the Kenyon cells and are functional for generating odor-specific patterns at that level.

      The model we proposed represents, indeed, a simplification of olfactory signal processing throughout the honey bee olfactory circuit. Still, it shows that simple but realistic rules can be sufficient to grasp some fundamental aspects of olfactory coding. However, we agree with the reviewer and believe that such a minimalistic model can provide a basis for designing future experiments in which complexity can be increased by adding relevant features, such as the learning-induced plasticity of PN-to-KC synapses or the divergence of multiple PNs from the same glomerulus to different KCs.

      Concerning the reviewer's question on the involvement of inhibitory inputs in generating odor-specific patterns at the level of the KCs, the short answer is yes: they contribute to the summed input of a target KC, and thus to the odor representation. In designing the model, we considered that a given glomerulus provides maximal input at maximal excitation and minimal input (=0 input) at maximal inhibition. For this reason, an inhibited glomerulus contributes less (to KC action potential probability) than a glomerulus showing baseline activity, which in turn contributes less than an excited glomerulus. From the modeling point of view, normalizing the signal between 0 and 1 (i.e., setting maximal inhibition to 0 and maximal excitation to 1) would yield a similar result to the current approach, where values range from -25% to +30% ΔF/F. We will revise the model's description to clarify this point.
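      Why normalization changes little can be checked with a minimal sketch. It assumes a KC receives a plain weighted sum of PN inputs and that the normalization bounds are fixed at the -25% and +30% ΔF/F values quoted above; the function names, weights, and response values are hypothetical, not the authors' model code.

```python
# Normalization bounds from the range quoted in the text, as fractions of ΔF/F.
DFF_MIN, DFF_MAX = -0.25, 0.30


def to_unit_input(dff):
    """Map a glomerular ΔF/F value onto [0, 1]: maximal inhibition -> 0, maximal excitation -> 1."""
    return (dff - DFF_MIN) / (DFF_MAX - DFF_MIN)


def kc_summed_input(dffs, weights):
    """Hypothetical weighted sum of normalized PN inputs onto a single KC."""
    return sum(w * to_unit_input(d) for w, d in zip(weights, dffs))


def kc_summed_raw(dffs, weights):
    """The same weighted sum computed on the raw ΔF/F values."""
    return sum(w * d for w, d in zip(weights, dffs))


# Hypothetical PN-to-KC weights and glomerular responses to two odors.
weights = [0.5, 1.0, 0.2]
odor_a = [-0.20, 0.10, 0.30]
odor_b = [0.05, -0.10, 0.25]
print(kc_summed_raw(odor_a, weights) > kc_summed_raw(odor_b, weights))
print(kc_summed_input(odor_a, weights) > kc_summed_input(odor_b, weights))
```

      Because the rescaling is affine with a positive slope, it preserves the order inhibited < baseline < excited. For a fixed set of weights it also preserves the ranking of summed KC inputs across stimuli, so a thresholded KC would only see a correspondingly rescaled and shifted threshold.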

      Lastly, the behavioral result indicating a difference in conditioned response latency after early or delayed learning protocol is interesting. However, it does not align with the expected time for the neuronal representation that was theoretically rewarded in the delayed protocol. This final result does not support the authors' interpretation regarding the existence of a smell and an after-smell as separate percepts that can serve as conditioned stimuli.

      Considering that our odor stimulus lasted 5 seconds, glomerular activity is highly variable at odor onset (i.e., within the first 1s) because of short excitatory response profiles and the delayed and slower onset of inhibitory responses. After the initial phase, the neural representation of the stimulus becomes more stable. Consequently, a neural signature learned in the case of delay conditioning, i.e., with the US appearing towards the end of the olfactory stimulation (t = 4 - 5s), may present itself much earlier (t = 1.5s), triggering a behavioral response that largely anticipates the expected US arrival time.

      In the model, we observe an early decrease in action potential probability even in the case of delay conditioning. This occurs because the synapses recruited during the last second of olfactory stimulation (within the learning window during which CS and US overlap) become inactive. Because odorant-induced activity recruits highly overlapping synaptic populations between 1.5 and 5 s from the onset, a learning-induced inactivation of part of these synapses will result in a reduced action-potential probability in the modeled MBON. Importantly, this event will not be governed by time but by the appearance of the learned synaptic configuration.
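      The argument that the response is driven by the learned synaptic configuration rather than by time can be illustrated with a toy sketch. It assumes MBON drive is a plain sum of KC-to-MBON weights over the active KCs, and that synapses of KCs active during the learning window are fully depressed (set to zero). The KC ensembles and time points are hypothetical, chosen only to mimic a variable onset phase followed by a stable, overlapping representation.

```python
def mbon_drive(active_kcs, weights):
    """Summed KC-to-MBON input: the weights of all currently active KCs."""
    return sum(weights[k] for k in active_kcs)


def depress_synapses(weights, kcs_in_window, factor=0.0):
    """Scale the synapses of KCs active during the CS-US overlap by `factor`."""
    updated = dict(weights)
    for k in kcs_in_window:
        updated[k] *= factor
    return updated


# Hypothetical KC ensembles (integer IDs) keyed by time after odor onset (s); not data.
ensembles = {
    0.5: {0, 1, 2, 3, 4},           # variable onset phase
    1.5: {10, 11, 12, 13, 14, 15},  # stable phase
    4.5: {10, 11, 12, 13, 14, 16},  # learning window in delay conditioning
}

naive = {k: 1.0 for k in range(20)}
trained = depress_synapses(naive, ensembles[4.5])

for t, kcs in sorted(ensembles.items()):
    print(f"t = {t} s: naive drive {mbon_drive(kcs, naive):.0f}, "
          f"trained drive {mbon_drive(kcs, trained):.0f}")
```

      In this toy case the drop in MBON drive already appears at t = 1.5 s, because the stable ensemble shares most of its KCs with the ensemble present during the learning window. The onset ensemble, which shares none, is unaffected.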

      We will add a new section to the revised version of the manuscript to clarify this concept and perform further analyses to characterize the contribution of different response types to the modeled response latency.

    1. Author Response

      Reviewer #1 (Public Review):

      Strengths:

      • The paper is clearly written, and all the conclusions stem from a set of 3 principles: circular topology, rotational symmetry, and noise minimization. The derivations are sound and such rigor by itself is commendable.

      • The authors provide a compelling argument on why evolution might have picked an eight-column circuit for path-integration, which is a great example of how theory can inform our thinking about the organization of neural systems for a specific purpose.

      • The authors provide a self-consistency argument on how cosine-like activity supports cosine-like connectivity with a simple Hebbian rule. However, their framework doesn't answer the question of how this system integrates angular velocity with the correct gain in the absence of allothetic cues to produce a heading estimate (more on that on point 3 below).

      Weaknesses:

      • The authors make simplifying assumptions to arrive at the cosine activity/cosine connectivity circuit. Among those are the linear activation function, and cosine driving activity u. The authors provide justification for the linearization in methods 3.1, however, this ignores the well-established fact that bump amplitude is modulated by angular velocity in the fly head direction system (Turner-Evans et al 2017). In such a case, nonlinearities in the activation function cannot be ignored and would introduce harmonics in the activity.

      We thank the reviewer for pointing out this omission. We added a paragraph at the end of section 4.1 clarifying that transient non-linearity, for instance when the circuit is actively receiving external input, is compatible with our work because we only need linearity in the line attractor, but not outside (lines 407-419).

      “In more intuitive terms, the neurons have a saturating nonlinear activation function where they modulate their gain based on the total activity in the network. If the activity in the network is above the desired level, r, the gain is reduced and the activity decreases, and when the activity of the network is less than desired level, both the gain and the activity increase. Note that in this scenario transient deviations from the line attractor, which would induce nonlinear behaviour in the circuit dynamics, are tolerable. External inputs, u(t), could transiently modify the shape of the activity, producing activity shapes deviating from what the linear model can accommodate. For example, the shape of the bump attractor could be modified through nonlinearities while the insect attains high angular velocity (Turner-Evans et al., 2017).

      Such nonlinear dynamics do not conflict with the theory developed here, which only requires linearity when the activity is projected onto the circular line attractor. In our framework, the linearity of integration at the circular line attractor is not a computational assumption, but rather it emerges from the principle of symmetry.”

      Furthermore, even though activity has been reported to be cosine-like, in fact in the fruit fly it takes the form of a somewhat concentrated activity bump (~80-100 degrees, Seelig & Jayaraman 2015; Turner-Evans et al 2017), and one has to take into account the smoothing effect of calcium dynamics too which might make the bump appear more cosine-like. So in general, it would be nice to see how the conclusions extend if the driving activity is more square-like, which would also introduce further harmonics.

      We added a cautionary comment on the sinusoidal activity (lines 222-226).

      “We note, however, that data from the fruit fly shows a more concentrated activity bump than what would be expected from a perfect sinusoidal profile (Seelig and Jayaraman, 2015; Turner-Evans et al., 2017), and that calcium imaging (which was used to measure the activity) can introduce biases in the activity measurements (Siegle et al., 2021; Huang et al., 2021). Thus the sinusoidal activity we model is an approximation of the true biological process rather than a perfect description.”

      Overall, it would be interesting to see whether, despite the harmonics introduced by these two factors interacting in the learning rule, Oja's rule can still pick up the "base" frequency and produce sinusoidal weights (as mentioned in methods 3.8). At this point, the examples shown in Figure 5 (tabula rasa and slightly perturbed weights) are quite simple. Such a demonstration would greatly enhance the generality of the results.

      We also extended the self-consistency framework from Oja’s rule to the non-linear case, and found that while Oja’s rule with non-linear neurons would not give pure harmonics, the secondary harmonics will remain small. We added a sentence explaining this in the main text (section 2.4, lines 309-312) and a methods section to develop the self-consistency framework for the case of non-linear activations (section 4.7.2).

      “For neurons with a nonlinear activation function, secondary harmonics would emerge, but would remain small under mild assumptions, as shown in Section 4.7.2. Oja’s rule will still cause the weights to converge to approximately sinusoidal connectivity.”
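      How a Hebbian rule with Oja's normalization extracts a sinusoidal weight profile can be sketched in its simplest feedforward form. The sketch assumes a single linear readout neuron driven by eight ring inputs with pure cosine tuning, cos(θ - φ_i), for headings θ drawn uniformly. This is an idealized analogue, not the recurrent self-consistency derivation of Section 4.7, and the function names are hypothetical. Every input vector lies in the span of cos φ and sin φ, so Oja's rule should converge to a unit-norm weight vector of the form a cos φ + b sin φ.

```python
import math
import random


def oja_ring(n=8, eta=0.005, steps=20000, seed=0):
    """Train one linear readout neuron with Oja's rule on ring inputs.

    Input neuron i has preferred heading phi_i = 2*pi*i/n and fires at rate
    cos(theta - phi_i) for a heading theta drawn uniformly at each step.
    """
    rng = random.Random(seed)
    phis = [2.0 * math.pi * i / n for i in range(n)]
    w = [rng.gauss(0.0, 0.1) for _ in range(n)]  # small random initial weights
    for _ in range(steps):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = [math.cos(theta - p) for p in phis]
        y = sum(wi * xi for wi, xi in zip(w, x))  # linear output
        # Oja's rule: Hebbian term y*x minus the decay y^2*w that bounds |w|.
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return phis, w


def sinusoid_residual(phis, w):
    """Fraction of |w| not captured by a*cos(phi) + b*sin(phi) (needs n >= 3)."""
    n = len(phis)
    c = [math.cos(p) for p in phis]
    s = [math.sin(p) for p in phis]
    # cos and sin over n equally spaced angles are orthogonal with squared norm n/2.
    a = sum(wi * ci for wi, ci in zip(w, c)) / (n / 2.0)
    b = sum(wi * si for wi, si in zip(w, s)) / (n / 2.0)
    resid = [wi - a * ci - b * si for wi, ci, si in zip(w, c, s)]
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return math.sqrt(sum(r * r for r in resid)) / norm_w


phis, w = oja_ring()
print(f"residual = {sinusoid_residual(phis, w):.2e}")
print(f"|w| = {math.sqrt(sum(v * v for v in w)):.3f}")
```

      With pure cosine inputs the non-sinusoidal residual decays to numerical precision. Adding a nonlinearity to the inputs would introduce the small secondary harmonics mentioned in the quoted text.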

      • The match of the theoretical prediction of cosine-like connectivity profiles with the connectivity data is somewhat lacking. In the locust the fit is almost perfect, however, the low net path count combined with the lack of knowledge about synaptic strengths makes this a motivating example in my opinion. In the fruit fly, the fit is not as good, and the function-fitting comparison (Methods Figure 6) is not as convincing. First, some function choices clearly are not a good fit (f1+2, f2). Second, the profile seems to be better fit by a Gaussian or other localized function, however the extra parameter of the Gaussian results in the worst AIC and AICc. To better get at the question of whether the shape of the connectivity profile matches a cosine or a Gaussian, the authors could try for example to fix the width of the Gaussian (e.g. to the variance of the best-fit cosine, which seems to match the data very well even though it wasn't itself fit), and then fit the two other parameters to the data. In that case, no AIC or AICc is needed. And then do the same for a circular distribution, e.g. von Mises.

      We also included the fit with von Mises and Gaussian with the width parameters fixed to match the cosine as the reviewer suggested. We found that even though these two distributions fit the data better, the difference is very small (2%), probably due to the high variability of the fruit fly connectome data. We also changed the wording and state that the theory is compatible with experimental data.

      In the Methods 4.6 (lines 568-585), we wrote

      “As a complementary approach to evaluate the shape of the distribution, we first fit the Gaussian and von Mises distributions to the best fit f = 1 curve. We then freeze the width parameters of the distributions (σ_g for the Gaussian and κ_v for the von Mises) and only optimise the amplitude and vertical offset parameters (β and γ) to fit the data. This approach limits the number of free parameters for the Gaussian and von Mises distributions to two, to match the sinusoid. The results are shown in Methods Fig. 6 and Table 5. Both the fixed-width Gaussian and von Mises distributions are a slightly better fit to the data than the sinusoid, but the differences between the three curves are very small.

      In simplifying the fruit fly connectome data, we assumed all synapses of different types were of equal weight, as no data to the contrary were available. Different synapse types having different strengths could introduce nonlinear distortions between our net synaptic path count and the true synaptic strength, which could in turn make the data a better or worse fit for a sinusoidal compared to a Gaussian profile. As such, we do not consider the relative differences of only 2% between the f = 1 sinusoid and the fixed-width Gaussian and von Mises distributions to be conclusive.

      Overall, we find that the cosine weights that emerge from our derivations are a very close match for the locust, but less precise for the fly, where other functions fit slightly better. Given the limitations in using the currently available data to provide an exact estimate of synaptic strength (for the locust), and due to the high variability of the synaptic count (for the fruit fly), we consider that our theory is compatible with the observed data.”
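      Freezing the width leaves only parameters that enter linearly, so each fixed-width fit reduces to ordinary least squares. The sketch below assumes the model β·f(x) + γ, a wrapped Gaussian on the angular difference, and a von Mises shape scaled to peak at 1. These shape conventions and all function names are assumptions, not the authors' fitting code. The example data are synthetic, not the connectome counts.

```python
import math


def cosine_shape(x):
    return math.cos(x)


def gaussian_shape(x, sigma):
    # Wrap the angle into [-pi, pi) so the Gaussian is defined on the circle.
    d = (x + math.pi) % (2.0 * math.pi) - math.pi
    return math.exp(-d * d / (2.0 * sigma * sigma))


def von_mises_shape(x, kappa):
    # Von Mises profile scaled so that its peak (x = 0) equals 1.
    return math.exp(kappa * (math.cos(x) - 1.0))


def fit_amplitude_offset(xs, ys, shape):
    """Least-squares fit of y ~ beta * shape(x) + gamma with the shape held fixed."""
    f = [shape(x) for x in xs]
    n = len(xs)
    mean_f, mean_y = sum(f) / n, sum(ys) / n
    s_ff = sum((fi - mean_f) ** 2 for fi in f)
    s_fy = sum((fi - mean_f) * (yi - mean_y) for fi, yi in zip(f, ys))
    beta = s_fy / s_ff
    gamma = mean_y - beta * mean_f
    sse = sum((yi - beta * fi - gamma) ** 2 for fi, yi in zip(f, ys))
    return beta, gamma, sse


# Synthetic example: eight columns, profile generated from a von Mises shape.
xs = [-math.pi + 2.0 * math.pi * i / 8 for i in range(8)]
ys = [3.0 * von_mises_shape(x, 2.0) + 0.5 for x in xs]
for name, shape in [("cosine", cosine_shape),
                    ("gaussian", lambda x: gaussian_shape(x, 1.0)),
                    ("von Mises", lambda x: von_mises_shape(x, 2.0))]:
    beta, gamma, sse = fit_amplitude_offset(xs, ys, shape)
    print(f"{name}: beta={beta:.3f}, gamma={gamma:.3f}, SSE={sse:.3g}")
```

      Since all three candidates then have the same two free parameters, their residual errors can be compared directly without an AIC-style penalty.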

      In addition, the theoretical prediction of cosine-like connectivity is not clearly stated in the abstract, introduction, or discussion. As a prediction, I believe it should be front and center, as it might be revisited in the future in light of, e.g., new experimental data.

      We added the explicit prediction in the abstract and the introduction (lines 52-53).

      • I find the authors' claim that Oja's rule suffices to learn the insect head direction circuit (l. 273-5) somewhat misleading/vague. The authors seem to not be learning angular integration here at all. First, it is unclear to me what is the form of u(t). Is it the desired activity in the network at time t given angular velocity? This is different than modelling a population of PEN neurons jointly tuned to head direction and angular velocity, and learning weights so as to integrate angular velocity with the correct gain (Vafidis et al 2022). The learning rule here establishes a self-consistency between sinusoidal weights and activity, however, it does not learn the weights from PEN to EPG neurons so as to perform angular integration. Similar simple Hebbian rules have been used before to learn angular integration (Stringer et al 2002), however, they failed to learn the correct gain. Therefore, the authors should limit the statement that their simpler learning rule is enough to learn the circuit (l. 273-5), making sure to outline differences with the current literature (Vafidis et al 2022).

      We agree and we clarified that we focus only on the self-sustained activity condition. We appended the following text to the first and last paragraphs of section 2.4.

      For the first (lines 279-284): “Our approach follows from previous research which has shown that simple Hebbian learning rules can lead to the emergence of circular line attractors in large neural populations (Stringer et al., 2002), and that a head direction circuit can emerge from a predictive rule (Vafidis et al., 2022). In contrast to this work, we focus only on the self-sustaining nature of the heading integration circuit in insects and show that our proposed sinusoidal connectivity profile can emerge naturally.”

      For the last (lines 317-321): “However, this learning rule only applies to the weights that ensure stable, self-sustaining activity in the network. The network connectivity responsible for correctly integrating angular velocity inputs (given by the PEN to EPG connections in the fly) might require more elements than a purely Hebbian rule (Stringer et al., 2002), such as the addition of a predictive component (Vafidis et al., 2022).”

    1. Author Response

      Public reviews:

      Reviewer 1:

      Weaknesses:

      While I generally agree with the authors' interpretations, the idea of Saccorhytida as a divergent, simplified offshoot is slightly contradictory with a probably non-vermiform ecdysozoan ancestor. The authors' analyses do not discard the possibility of a vermiform ecdysozoan ancestor (importantly, Supplementary Table 4 does not reconstruct that character),

      Reply: Thanks for the comments. Saccorhytids are only known from the early Cambrian and their unique morphology has no equivalent among any extinct or extant ecdysozoan groups. This prompted us to consider them as a possible dead-end evolutionary offshoot. The nature of the last common ancestor of ecdysozoans (i.e., a vermiform or non-vermiform animal with the capacity to renew its cuticle by molting) remains hypothetical. At present, palaeontological data do not allow us to resolve this question. The animal in Fig. 4b at the base of the tree is supposed to represent an ancestral soft-bodied form with no cuticle from which ecdysozoans evolved via major innovations (cuticular secretion and ecdysis). Its shape is hypothetical, as indicated by a question mark. Our evolutionary model is clearly intended to be tested by further studies and, hopefully, new fossil discoveries.

      and outgroup comparison with Spiralia (and even Deuterostomia for Protostomia as a whole) indicates that a more or less anteroposteriorly elongated (i.e., vermiform) body is likely common and ancestral to all major bilaterian groups, including Ecdysozoa. Indeed, Figure 4b depicts the potential ancestor as a "worm". The authors argue that the simplification of Saccorhytida from a vermiform ancestor is unlikely "because it would involve considerable anatomical transformations such as the loss of vermiform organization, introvert, and pharynx in addition to that of the digestive system". However, their data support the introvert as a specialisation of Scalidophora (Figure 4a and Supplementary Table 4), and a pharyngeal structure cannot be ruled out in Saccorhytida. Likewise, loss of an anus is not uncommon in Bilateria. Moreover, this can easily become a semantics discussion (to what extent can an animal be defined as "vermiform"? Where is the limit?).

      Reply: We agree with you that “vermiform” is an ill-defined term that should be avoided. “Elongated” might be a better term to designate the elongation of the body along the antero-posterior axis. Changes have been made in the text to solve this semantic problem. Priapulid worms or annelids are examples of extremely elongated, tubular animals. In saccorhytids, the antero-posterior elongation is present (as it is in the vast majority of bilaterians) but extremely reduced, Saccorhytus and Beretella having a sac-like or beret-shape, respectively. That such forms may have derived from elongated, tubular ancestors (e.g. comparable with scalidophoran worms) would require major anatomical transformations that have no equivalent among modern animals. We agree that further speculation about the nature of these transformations is unnecessary and should be deleted simply because the nature of these ancestors is purely hypothetical. We also agree that the loss of anus and the extreme simplification of the digestive system is common among extant bilaterians. The single opening seen in Saccorhytus and possibly Beretella may result from a comparable simplification process. In Figure 4b, the hypothetical pre-ecdysozoan animal is slightly elongated (antero-posterior axis and polarity) but in no way comparable with a very elongated and cylindrical ecdysozoan worm (e.g. extant or extinct priapulid).

      Therefore, I suggest to leave the evolutionary scenario more open. Supporting Saccorhytida as a true group at the early steps of Ecdysozoa evolution is important and demonstrates that animal body plans are more plastic than previously appreciated. However, with the current data, it is unlikely that Saccorhytida represents the ancestral state for Ecdysozoa (as the authors admit), and a vermiform nature is not ruled out (and even likely) in this animal group. Suggesting that the ancestral Ecdysozoan might have been small and meiobenthic is perhaps more interesting and supported by the current data (phylogeny and outgroup comparison with Spiralia).

      Reply: We agree the evolutionary scenario should be more open, especially the evolutionary process that gave rise to Saccorhytida. Again, we know nothing about the morphology of the ancestral ecdysozoan (typically the degree of body elongation, whether it had a differentiated introvert or not, whether it had a through gut or not). Simplification appears as one possible option, but which assumes that the ancestral ecdysozoan was an elongated animal with a through gut. Changes will be made in Fig.4A accordingly. Alternatively, the ancestral ecdysozoan might have been small and meiobenthic.

      Reviewer 2:

      Weaknesses:

      The preservations of the specimens, in particular on the putative ventral side, are not good, and the interpretation of the anatomical features needs to be tested with additional specimens in the future. The monophyly of Cycloneuralia (Nematoida + Scalidophora) was not necessarily well-supported by cladistic analyses, and the evolutionary scenario (Figure 4) also needs to be tested in future works.

      Reply: Yes, we agree that our MS is the first report on an enigmatic ecdysozoan. Whereas the dorsal side of the animal is well documented (sclerites), uncertainties remain concerning its ventral anatomy (typically the mouth location and shape). Additional better-preserved specimens will hopefully provide the missing information. Concerning Cycloneuralia, their monophyly is generally better supported by analyses based on morphological characters than in molecular phylogenies. I

      Reviewer 3:

      Weaknesses: I, as a paleontology non-expert, experienced several difficulties in reading the manuscript. This should be taken into consideration when assuming a wide range of readers including non-experts.

      Reply: We have ensured that the text is comprehensible to biologists. Our main results are summarized in relatively simple diagrams (e.g. Fig. 4). We are aware that technical descriptive terms may appear obscure to non-specialists. However, we think that our text-figures help the reader to understand the morphology of these ancient animals.

    1. Author Response

      eLife assessment

      This study presents a useful comparison of the dynamic properties of two RNA-binding domains. The data collection and analysis are solid, making excellent use of a suite of NMR methods. However, evidence to support the proposed model linking dynamic behavior to RNA recognition and binding by the tandem domains remains incomplete. The work will be of interest to biophysicists working on RNA-binding proteins.

      Response: We thank eLife for taking the time and effort to review our manuscript. Evidence from the literature and our study shows a close correspondence between the dynamic behavior of dsRBDs and their dsRNA recognition and binding, which led us to propose a plausible model. As mentioned in the manuscript, we have been working on the suggested experiments to further support our proposed model.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Differential conformational dynamics in two type-A RNA-binding domains drive the double-stranded RNA recognition and binding," Chugh and co-workers utilize a suite of NMR relaxation methods to probe the dynamic landscape of the TAR RNA binding protein (TRBP) double-stranded RNA-binding domain 2 (dsRBD2) and compare these to their previously published results on TRBP dsRBD1. The authors show that, unlike dsRBD1, dsRBD2 is a rigid protein with minimal ps-ns or µs-ms time scale dynamics in the absence of RNA. They then show that dsRBD2 binds to canonical A-form dsRNA with a higher affinity compared to dsRBD1 and does so without much alteration in protein dynamics. Using their previously published data, the authors propose a model whereby dsRBD2 recognizes dsRNA first and brings dsRBD1 into proximity to search for RNA bulge and internal loop structures.

      Response: We thank the Reviewer for the encouraging review. We have combined findings reported in the literature with new ones, which led us to propose the dsRNA-binding model for tandem A-form dsRBDs.

      We propose that dsRBD1 can first recognize a variety of sequentially and structurally different dsRNAs. dsRBD2 then assists the interaction through its higher affinity, thus strengthening the interaction between TRBP and a potential substrate. This may enable other associated proteins, such as Dicer and Ago2, to perform critical biological functions.

      However, the following statements made in the comment above are factually incorrect.

      (1) They then show that dsRBD2 binds to canonical A-form dsRNA with a higher affinity compared to dsRBD1 and does so without much alteration in protein dynamics.

      In fact, we have explicitly shown perturbations in dsRBD2 dynamics upon RNA binding.

      (2) Using their previously published data, the authors propose a model whereby dsRBD2 recognizes dsRNA first and brings dsRBD1 into proximity to search for RNA bulge and internal loop structures.

      Our previously published data suggest that dsRBD1, owing to its high conformational dynamics in solution, is able to recognize a variety of structurally and sequentially different dsRNAs (PMID: 35134335). dsRBDs preferentially bind to the double-stranded region (minor-major-minor groove) of an A-form RNA (PMID: 24801449; PMID: 27332119) and do not search for bulge and internal loop structures as part of the binding event. Although dsRBDs preferentially bind to the double-stranded region, they can still accommodate perturbations in the A-form helix caused by mismatches and bulges, albeit with decreased binding affinity (PMID: 25608000). However, how large a deviation from the A-form structure the dsRBDs can accommodate remains a question for future research. The diffusion event reported in the literature (PMID: 23251028) also does not directly implicate a search for bulge and internal loop structures.

      Strengths:

      The authors expertly use a variety of NMR techniques to probe protein motions over six orders of magnitude in time. Other NMR titration experiments and ITC data support the RNA-binding model.

      Weaknesses:

      The data collection and analysis are sound. The only weakness in the manuscript is the lack of context with the much broader field of RNA-binding proteins. For example, many studies have shown that RNA recognition motif (RRM) domains have similar dynamic characteristics when binding diverse RNA substrates. Furthermore, there was no discussion about the entropy of binding derived from ITC. It might be interesting to compare with dynamics from NMR.

      Response: We understand the reviewer’s point; this study is focused on a dsRNA-binding mechanism rather than addressing the much broader field of RNA binding. There are multiple challenges in finding a single mechanism that works for all RNA-binding proteins. For instance, the RRM is a single-stranded RNA-binding domain that is able to read out the base sequence of its substrate, so it behaves entirely differently from the dsRBD in terms of sequence specificity. Besides, several other RNA-binding domains, such as the KH domain, Puf domains, and zinc finger domains, each show a distinct RNA-binding behavior. Thus, it would be very difficult to derive a single rule of thumb for RNA recognition across all these diverse domains.

      Thank you for pointing out the entropy of binding from ITC. We shall include the discussion about the entropy of binding in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Proteins that bind to double-stranded RNA regulate various cellular processes, including gene expression and viral recognition. Such proteins often contain multiple double-stranded RNA-binding domains (dsRBDs) that play an important role in target search and recognition. In this work, Chugh and colleagues have characterized the backbone dynamics of one of the dsRBDs of a protein called TRBP2, which carries two tandem dsRBDs. Using solution NMR spectroscopy, the authors characterize the backbone motions of dsRBD2 in the absence and presence of dsRNA and compare these with their previously published results on dsRBD1. The authors show that dsRBD2 is comparatively more rigid than dsRBD1 and claim that these differences in backbone motions are important for target recognition.

      Strengths:

      The strengths of this study are the multiple solution NMR measurements used to characterize the backbone motions of dsRBD2. These include 15N R1, R2, and HetNOE experiments in the absence and presence of RNA and the analysis of these data using an extended model-free approach, as well as 15N HARD experiments and their analysis to characterize kex. The authors also report differences in the binding affinities of dsRBD1 and dsRBD2 using ITC and have performed MD simulations to probe the differential flexibility of these two domains.

      Weaknesses:

      While it may be true that dsRBD2 is more rigid than dsRBD1, the manuscript lacks conclusive and decisive proof that such changes in backbone dynamics are responsible for target search and recognition and the diffusion of TRBP2 along the RNA molecule. To conclusively prove the central claim of this manuscript, the authors could have considered a larger construct that carries both RBDs. With such a construct, authors can probe the characteristics of these two tandem domains (e.g., semi-independent tumbling) and their interactions with the RNA. Additionally, mutational experiments may be carried out where specific residues are altered to change the conformational dynamics of these two domains. The corresponding changes in interactions with RNA will provide additional evidence for the model presented in Figure 8 of the manuscript. Finally, there are inconsistencies in the reported data between different figures and tables.

      Response: We thank the reviewer for the comprehensive and insightful review. Based on our previous experience (PMID: 35134335), a larger construct carrying both RBDs was not used because of multiple challenges for an NMR dynamics study: the intrinsic R2 rates of a dsRBD1-dsRBD2 construct would be high, resulting in broadened peaks. That construct would also show additional dynamics arising from relative domain-domain motions, making the dynamics information difficult to deconvolute. Further, the dsRNA needed to bind this construct would be longer, causing further line broadening in NMR.

      Regarding mutational studies, designing domain mutants remains challenging because the conformational dynamics in both domains are distributed throughout the backbone rather than confined to the RNA-binding residues. Drawing a parallel between binding and dynamics would require an exhaustive set of mutations in both the protein and the RNA. That said, we are working on introducing such mutations in the protein (at several locations, to freeze the dynamics site-specifically) and in the RNA (to change the shape of the dsRNA) to systematically study this mechanism; this work is beyond the scope of the present manuscript.

      The reviewer has rightly pointed out some subtle, superficial differences. These arise from the context in which we describe the data. For example, in Figure S4 we report the average relaxation rates and nOe values only for the residues that could be analyzed at both magnetic field strengths (600 and 800 MHz), whereas in Figure 6 we compare the averages of the core dsRBD residues at 600 MHz in the presence and absence of D12RNA. The differences, however, are minute and fall well within the error range.

    1. Author Response

      eLife assessment

      The manuscript explores the ways in which the genetic code evolves, specifically how stop codons are reassigned to become sense codons. The authors present phylogenetic data showing that mutations at position 67 of the termination factor are present in organisms that nevertheless use the UGA codon as a stop codon, thereby questioning the importance of this position in the reassignment of stop codons. Alternative models on the role of eRF1 would reflect a more balanced view of the data. Overall, the data are solid and these findings will be valuable to the genomic/evolution fields.

      Public Reviews:

      Reviewer #1 (Public Review):

      The issue:

      The ciliates are a zoo of genetic codes, where there have been many reassignments of stop codons, sometimes with conditional meanings which include retention of termination function, and thus > 1 meaning. Thus ciliate coding provides a hotspot for the study of genetic code reassignments.

      The particular issue here is the suggestion that translation of a stop codon (UGA) in Blastocrithidia has been attributed to a joint change in the protein release factor that reads UGAs and the breaking of a base pair at the top of the anticodon stem of tRNATrp (Nature 613, 751, 2023).

      The work:

      However, Swart et al. have looked into this suggestion and find that the recently proposed mechanism is overly complicated.

      The broken pairing at the top of the anticodon stem of tRNATrp indeed accompanies the reading of UGA as Trp, as previously suggested. It changes the codon translated even though the anticodon remains CCA, complementary to UGG. A compelling point is that this misreading matches previous mutational studies of E. coli tRNAs, in which breaking the same base pair in a mutant tRNATrp suppressor tRNA stimulated the same kind of miscoding.

      This is a fair characterization, and we would also note an additional positive aspect: we observed that ciliates which translate UGA as tryptophan consistently have 4 bp tRNA-Trp anticodon stems, whereas those that do not (including Euplotes, with UGA=Cys) generally have 5 bp anticodon stems.

      But the amino acid change in release factor eRF1, the protein that catalyzes termination of protein biosynthesis at UGA, is broadly distributed. There are about 9 organisms in which this mutation can be compared with the meaning of UGA, and the changes are not highly correlated with a change in the meaning of the codon. Therefore, because UGA can be translated as Trp with or without the eRF1 mutation, Swart et al. suggest that the tRNA anticodon stem change is the principal cause of the coding change.

      We do think multiple lines of evidence support the shorter tRNA anticodon stem promoting UGA translation, but also think other changes in the translation system may be important. For instance, structural studies suggest interaction of ribosomal RNA with extended stop codons (particularly the base downstream of the triplet) during translation termination (Brown et al. 2015, Nature). As we noted, previous studies have sought to correlate individual eRF1 substitutions with genetic code changes, but the proposed correlations have invariably disappeared once new tranches of eRF1 sequences and alternative genetic codes for different species became available. This is why we concluded that there needs to be more focus on obtaining and understanding molecular structures during translation termination, particularly in the organisms with alternative codes.

      The review:

      Swart et al have a good argument. I would only add that eRF1 participation is not ruled out, because finding that UGA encodes Trp does not distinguish between encoding Trp 90% of the time and encoding it 99% of the time. The release factor could still play a measurable quantitative role, but the major inference here seems convincing.

      We agree that eRF1 may participate and compete with the tRNA, but we question the hypothesis that the particular amino acid position/substitution proposed by Kachale et al. 2023 is the key. There is experimental evidence in the form of Ribo-seq for the ciliate Condylostoma magnum (A67), which does appear to efficiently translate UGA sense codons (Swart et al. 2016, Figure S3: https://doi.org/10.1016/j.cell.2016.06.020): we observed no dip in ribosome footprints downstream of these codons, as there would be in the case of classical translational readthrough in standard genetic code organisms (which is usually relatively inefficient - certainly well below 50% of upstream translation from our reading of the literature). Ribo-seq also supports efficient termination at those Condylostoma UGA codons that are stops.

      Of course, the entire translation system may have evolved to be as efficient as what we currently observe, and it is not unreasonable to consider that it may have been less efficient in the past. However, not so inefficient that the error rate incurred would have been strongly deleterious. Importantly also, we believe the role of multiple eRF1 paralogs in translation termination in the ciliates really needs to be investigated, given that translation is inherently probabilistic with any of these proteins potentially being incorporated into the ribosome.

      Reviewer #2 (Public Review):

      The manuscript raises interesting observations about the potential evolution of release factors and tRNAs to reassign the meaning of stop codons. The manuscript is divided into two parts. The first reveals that the presence of a tRNA-Trp with a 5 bp anticodon stem (AS) in Condylostoma magnum is probably due to contamination of the databases by bacterial sequences. This is an interesting point which seems to be well supported by the data provided. It highlights the difficulty of identifying active tRNA genes from poorly annotated or incompletely assembled genomes.

      We will consider adding subheadings in revising the manuscript to make the structure more explicit, as it really has three parts to it, with the third largely in the supplement. The “good” was that there is a range of support for the 4 bp AS stem, with new evidence we supplied from ciliates and older studies with E. coli tRNAs. The “bad” is that scrutiny of eRF1 sequences, with the addition of ones we provided, contradicts the hypothesis by Kachale et al. that a S67A/G substitution is necessary for genetic code evolution in Blastocrithidia and certain ciliates. The “ugly” is that a tRNA shown in a main figure in Kachale et al. 2023, and which was investigated in a number of subsequent experiments, is almost certainly a bacterial contaminant.

      Proper scrutiny of the bacterial tRNA should have led to its immediate recognition and rejection, as one of us did years ago in searches for tRNAs in a preliminary Condylostoma genome assembly (only predicted 4 bp AS tRNA secondary structures were shown in Swart et al. 2016, Fig S4B and C). Evidence for the bacterial nature of this tRNA was placed in the supplement of the present manuscript, as the core of the critique was the consideration of the evidence for and against its good and bad aspects. The bacterial tRNA secondary structure has since been removed from the main figure of Kachale et al. 2023, and downstream experiments based on synthetic constructs for this tRNA have also been revised (https://www.nature.com/articles/s41586-024-07065-0).

      Much of the rest of the supplement served to correct multiple errors in genetic codes in public sequence databases that led to additional errors and difficulties in interpreting the eRF1 substitutions in Kachale et al. 2023. It is important that these codes get corrected. If not they create multiple headaches for users besides those investigating genetic codes, as we found out in communications with authors and a colleague of Kachale et al. 2023 (in particular, leading to thousands of missing genes in the macronuclear genome of the standard code ciliate Stentor coeruleus that were removed in automated GenBank processing due to incorrectly having an alternative genetic code specified).

      Recently, the NCBI Genetic Codes curators reinstated a genetic code incorrectly attributed to the ciliate Blepharisma (“Blepharisma nuclear genetic code”) (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG15), despite our request for a reasonable fix years ago. This is likely to be very confusing for those unfamiliar with the issue, and we have explained the confusion in our supplement too. We therefore also hope that this paper will aid communication with the genetic code database curators and help correct such issues.

      The second part criticises the fact that a mutation at position S67 of eRF1 is required to allow the UGA codon to be reassigned as a sense codon. As supporting evidence, they provide a phylogenetic study of the eRF1 factor showing that there are numerous ciliates in which this position is mutated, whereas the organism shows no trace of the reassignment of the UGA codon into a sense codon. While this criticism seems valid at first glance, it suffers from the lack of information on the level of translation of UGA codons in the organisms considered.

      Firstly, we not only showed that there are organisms with the S67 substitution but no UGA reassignment, but also provided evidence for the converse: organisms with a UGA=Trp reassignment but without the S67 substitution (both ciliates and a non-ciliate). Thus, two related lines of evidence are inconsistent with the proposed eRF1 substitution hypothesis.

      Secondly, we disagree that there is a “lack of information about UGA translation in the organisms considered”. Evolution has already supplied information as to whether UGA codons are translated at an appreciable level in the organisms of interest, in the form of codon frequencies within their protein-coding sequences and at the ends of those sequences. If UGA were translated at appreciable levels, it would be found at a corresponding frequency within coding sequences. In genomes with thousands of genes, UGA codons that are not predicted to encode amino acids most likely serve primarily as stops. Low levels of potential readthrough of actual stops would not change these arguments. With the exception of selenocysteine translation (which is restricted to a limited number of genes because it requires a specific mRNA secondary structure), there is no expectation of meaningful levels of UGA translation when this codon is missing from the bulk of coding sequences (CDSs).

      This is well illustrated by the heterotrichs, a clade of ciliates that use a variety of genetic codes. In heterotrichs that use the standard code, UGA is virtually absent from coding sequences, only appearing at the 3’ end of transcripts in the predicted stop codon and 3’-UTR (Seah et al. 2022, Figure 5). This contrasts notably with other genera like Blepharisma, where appreciable levels of UGA codons occur throughout coding sequences, upstream of the predicted UAA and UAG stops (Seah et al. 2022, Figure 5: https://www.biorxiv.org/content/biorxiv/early/2022/07/12/2022.04.12.488043/F5.large.jpg). In standard genetic code heterotrichs, the difference between the UGA, UAG and UAA codon frequencies in 3’ UTRs and the upstream frequencies in CDSs is stark. Frequencies of all three codons are elevated in the 3’ UTRs of all heterotrich ciliates, irrespective of their genetic codes (Seah et al. 2022, Figure 5), consistent with these codons not being deleterious in this region while being strongly selected against upstream, within CDSs.

      The reviewer raises the possibility that UGA may appear to be a stop codon but still have biologically significant translational readthrough. We think that this is unlikely in the heterotrich ciliate species discussed here, which have extremely short (median 21-26 bp) and AU-rich 3’-UTRs compared to yeast and animals (Seah et al. 2022). Therefore, in heterotrichs where UGA is predicted to be a stop, translational readthrough would lead to extensions of only a few amino acids and be relatively inconsequential, as there are plenty of secondary UAA, UAG and UGA codons downstream of the typical stop.

      If one were to consistently pursue the reviewer’s line of argumentation, one would also have to argue against the very reasoning used in Kachale et al. 2023 about all the stop codon predictions/reassignments in protists for which experiments were not conducted in S. cerevisiae or other translation systems, as well as decades of prior work using sequence conservation in multiple sequence alignments to infer alternative genetic codes.

      Furthermore, experimental information on UGA translation levels is available for the ciliate Condylostoma magnum, predominantly in the form of Ribo-seq (Swart et al. 2016). As for Condylostoma’s UAA and UAG codons, Ribo-seq shows that its UGA codons are generally either efficiently translated when present in the bodies of CDSs or terminate translation as actual stops close to mRNA 3’ termini/poly(A) tails (Swart et al. 2016). Thus, irrespective of the presence of the hypothesized eRF1 substitution, there is an example in ciliates of relatively discrete reading of UGA codons as either stops or amino acids. This contrasts with the experiments of Kachale et al. 2023 in yeast, using either yeast eRF1 S67G or Blastocrithidia eRF1 (which also has glycine at the equivalent position), which appear to lead to modest readthrough. In addition, efficient reading of codons in either of two ways also occurs in the ciliate genus Euplotes, in which “stop” codons can either serve as frameshift sites during translation within coding sequences or be actual stops when they are close to 3’ mRNA termini (Lobanov et al. 2017), as verified by Ribo-seq and protein mass spectrometry.

      It has been clearly shown that S67G or S67A mutations allow a strong increase in the reading of UGA codons by tRNAs, so this point is not in doubt. However, this has been demonstrated in model organisms, and we now need to determine whether other changes in the translational apparatus could accompany this mutation by modifying its impact on the UGA codon. This is a point partly raised at the end of the manuscript.

      There is no doubt that S67G or S67A mutations lead to increased translational readthrough, but this evidence is restricted to experiments with or in baker’s yeast or other standard genetic code surrogate model organisms. Experiments introducing eRF1 sequences from alternative genetic code eukaryotes into translation systems of standard genetic code eukaryotes are not compelling, because the rest of the associated translation system has also evolved tremendously. As far as we are aware, no in vivo experiments with ciliate eRF1s have been conducted to determine whether position 67 or other substitutions have any effect. These considerations are critical given the vast evolutionary distances between yeasts, Blastocrithidia, the ciliates and Amoebophrya sp. ex Karlodinium veneficum. On the other hand, the evolutionary information presented contradicts the importance of this substitution in the Amoebophrya species and ciliates. We will consider how to incorporate these ideas in the revised version of the manuscript.

      Indeed, it is quite possible that in these organisms the UGA codon is both used to complete translation and is subject to a high level of readthrough. Actually, in the presence of a mutation at position 67 (or elsewhere), the reading of the UGA can be tolerated under specific stress conditions (nutrient deficiency, oxidative stress, etc.), so the presence of this mutation could allow translational control of the expression of certain genes.

      As explained in our earlier replies, it is not constructive to invoke the additional complexity of conditional translation or any other factors that lead to enhanced readthrough, because the translation of UGA sense codons in the ciliate Condylostoma, for which we have supporting experimental evidence, does not resemble translational readthrough. These codons occur in constitutively expressed single-copy genes, such as a tryptophan tRNA synthetase and an eRF1 protein (Swart et al. 2016), not in genes that might be expected to be conditionally translated.

      On the other hand, it seems obvious to me that there are other ways of reading through a stop codon without mutating eRF1 at position S67. So the absence of a mutation at this position is not really indicative of a level of reading of the UGA codon.

      It may seem obvious to the reviewer, but that is neither what Kachale et al. originally proposed nor what we questioned. Kachale et al. hypothesized that mutation of S67 to A or G is necessary for UGA=Trp translation, but we provided evidence that it is not: multiple organisms with S67 or C67 that translate UGA as tryptophan. Kachale et al. also originally suggested that the S67 to A/G substitution is also necessary in Condylostoma for UGA translation as tryptophan by weakening its recognition of this codon as a stop (from their abstract: “Virtually the same strategy has been adopted by the ciliate Condylostoma magnum.”). However, as we have stated, Condylostoma (A67) is both able to efficiently terminate at UGA stop codons and to efficiently translate (other) UGA sense codons, which does not fit this hypothesis.

      Before writing such a strong assertion as that found on page 3, experiments should be carried out. The authors should therefore moderate their assertion.

      Experiments should be carried out in the organisms in which stop codon reassignments have readily occurred and their close relatives that have not, not distantly related ones where they rarely, if ever, occur, like yeasts. We made this point in the conclusion. There is too much emphasis on models for investigation of genetic code evolution via stop codon reassignments in questionable models and too little investigation in the really good ones, particularly the ciliates. This clade has genera that are amenable to molecular experiments including Paramecium, Tetrahymena and Oxytricha. We plan to add some text about these considerations in revision.

      To make a definitive conclusion, we would need to be able to measure the level of termination and readthrough in these organisms. So, from my point of view, all the arguments seem rather weak.

      We reiterate: there is experimental information about translation and termination in two ciliate species worth considering, including one that translates UGA codons depending on their context. If one chooses to ignore the evolutionary information presented, this not only ignores all prior approaches to infer genetic codes, but also the fact that there is experimental verification and other lines of evidence supporting these approaches.

      Moreover, the authors themselves indicate that the conjunction between a Trp tRNA that is efficient at reading the UGA codon and an eRF1 factor that is not efficient at recognising this stop codon could be the key to reassignment.

      This does not convey well what we wrote, since the main consideration was overall eRF1 structure, rather than individual amino acid substitutions. Here are the key sentences:

      “Instead, in a transitional evolutionary phase, codons may be interpreted in two ways, with potential eRF1-tRNA competition. With time, beneficial mutations or modifications in either the tRNA or eRF1 (or other components of translation) that reduce competition may be selected.

      Instead of focusing on individual eRF1 substitutions, we propose future investigations should more generally explore the structure of non-standard genetic code eRF1’s captured in translation termination in the context of their own ribosomes.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the distinct subpopulation of adipocytes during brown-to-white conversion in perirenal adipose tissue (PRAT) at different ages. The evidence supporting the claims of the authors is convincing, although specific lineage tracing of this subpopulation of cells and mechanistic studies would expand the work. The work will be of interest to scientists working on adipose and kidney biology.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors performed single nucleus RNA-seq for perirenal adipose tissue (PRAT) at different ages. They concluded a distinct subpopulation of adipocytes arises through brown-to-white conversion and can convert to a thermogenic phenotype upon cold exposure.

      Strengths:

      PRAT has been reported as an adipose tissue that undergoes browning. This study confirms that brown-to-white and white-to-beige conversions also exist in PRAT, as previously reported in subcutaneous adipose tissue.

      Response: We thank the reviewer for summarizing the strengths of our manuscript. However, we would like to clarify two points. First, PRAT has been reported as a visceral adipose depot that contains brown adipocytes, and a process of continuous replacement of brown adipocytes by white adipocytes has previously been suggested based on histological assessment. There is no evidence that PRAT undergoes browning unless cold exposure is involved. Second, unlike the brown-to-white conversion, a white-to-beige conversion in PRAT was not observed under normal conditions. The adipocyte population that arises from brown-to-white conversion (mPRAT-ad2) can respond to cold and restore its UCP1 expression. However, the adipocytes that arise from the mPRAT-ad2 subpopulation after cold exposure have a transcriptome distinct from that of cold-induced beige adipocytes in iWAT (Figure S7K) and are more closely related to iBAT brown adipocytes (Figure 6E). Therefore, cold exposure induces a white-to-brown rather than a white-to-beige conversion in PRAT, and the underlying mechanism is likely different from the white-to-beige conversion in subcutaneous adipose tissue.

      Weaknesses:

      (1) There is overall a disconnection between single nucleus RNA-seq data and the lineage chasing data. No specific markers of this population have been validated by staining.

      Response: We are not sure what “this population” refers to; we assume it is the Ucp1-/Cidea+ mPRAT-ad2 adipocyte subpopulation. If so, as shown in Figure 1H and stated in the Discussion section, we did not identify specific markers for these adipocytes. mPRAT-ad2 is negative for Ucp1 and Cyp2e1, which are markers for mPRAT-ad1 and mPRAT-ad3&4, respectively. To visualize mPRAT-ad2 adipocytes on tissue sections, we collected pvPRAT and puPRAT from Ucp1CreERT2;Ai14 mice one day after tamoxifen injection and stained them with CYP2E1 antibody and BODIPY. The Tomato-/CYP2E1-/BODIPY+ cells represent the mPRAT-ad2 adipocytes. Using this strategy, we found a significantly higher percentage of mPRAT-ad2 cells in puPRAT than in pvPRAT (presented as Figure S3E in the revised manuscript).

      (2) It would be nice to provide more evidence to support the conclusion shown in lines 243 to 245 "These results indicated that new BAs induced by cold exposure were mainly derived from UCP1- adipocytes rather than de novo ASPC differentiation in puPRAT". Pdgfra-negative progenitor cells may also contribute to these new beige adipocytes.

      Response: We stained pvPRAT and puPRAT of PdgfraCre;Ai14 mice with the adipocyte marker Plin1 and observed a 100% overlap between the tdTomato signal and the Plin1 staining, after examining a total of 832 and 628 adipocytes in pvPRAT and puPRAT, respectively, from two animals (Figure S4). Plin1 stains all adipocytes, while the endogenous tdTomato labels both adipocytes and blood vessels. This result suggests that all adipocytes in mPRAT are derived from Pdgfra-expressing cells, which is in line with a previous study that integrated several single-cell RNA sequencing data sets and showed that Pdgfra is expressed by virtually all ASPCs (Ferrero et al., 2020).

      Also, we would like to point out that the cold-induced adipocytes in mPRAT more closely resemble the brown adipocytes of iBAT than the beige adipocytes of iWAT (Figures 6E and S7K).

      Ferrero, R., Rainer, P., and Deplancke, B. (2020). Toward a Consensus View of Mammalian Adipocyte Stem and Progenitor Cell Heterogeneity. Trends Cell Biol 30, 937-950.

      (3) The UCP1Cre-ERT2; Ai14 system should be validated by showing Tomato and UCP1 co-staining right after the Tamoxifen treatment.

      Response: We collected pvPRAT and puPRAT of 1- and 6-month-old Ucp1CreERT2;Ai14 mice one day after the last tamoxifen injection and stained them with UCP1 antibody to check the overlap between the Tomato and UCP1 signals. All Tomato+ cells were UCP1+, indicating 100% specificity of Ucp1CreERT2, and the labelling efficiency was over 93% at both time points in both regions (Figure S3C-D).

      Reviewer #2 (Public Review):

      Summary:

      In the present manuscript, Zhang et al. utilize single-nuclei RNA-Seq to investigate the heterogeneity of perirenal adipose tissue. The perirenal depot is interesting because it contains both brown and white adipocytes, a subset of which undergo functional "whitening" during early development. While adipocyte thermogenic transdifferentiation has been previously reported, there remain many unanswered questions regarding this phenomenon and the mechanisms by which it is regulated.

      Strengths:

      The combination of UCP1-lineage tracing with the single nuclei analysis allowed the authors to identify four populations of adipocytes with differing thermogenic potential, including a "whitened" adipocyte (mPRAT-ad2) that retains the capacity to rapidly revert to a brown phenotype upon cold exposure. They also identify two populations of white adipocytes that do not undergo browning with acute cold exposure.

      Anatomically distinct adipose depots display interesting functional differences, and this work contributes to our understanding of one of the few brown depots present in humans.

      Weaknesses:

      The most interesting aspect of this work is the identification of a highly plastic mature adipocyte population with the capacity to switch between a white and brown phenotype. The authors attempt to identify the transcriptional signature of this ad2 subpopulation; however, the limited sequencing depth of single nuclei somewhat lessens the impact of these findings. Furthermore, the lack of any form of mechanistic investigation into the regulation of mPRAT whitening limits the utility of this manuscript. However, the combination of well-executed lineage tracing with the comprehensive cross-depot single-nuclei analysis presented in this manuscript could still serve as a useful reference for the field.

      Response: The sequencing depth of our data is comparable to, if not better than, that of previously published snRNA-seq studies on adipose tissue (Burl et al., 2022; Sarvari et al., 2021; Sun et al., 2020). Therefore, our data have essentially reached the depth limit of 3' sequencing methods. Unfortunately, the large size of adipocytes makes it challenging to sort them for Smart-seq. We suspect that the lack of specific markers for mPRAT-ad2 is partly due to its intermediate and plastic phenotype. Regarding the mechanistic regulation of mPRAT whitening, we believe such investigations are better suited to a separate, more in-depth follow-up study.

      Burl, R.B., Rondini, E.A., Wei, H., Pique-Regi, R., and Granneman, J.G. (2022). Deconstructing cold-induced brown adipocyte neogenesis in mice. Elife 11. 10.7554/eLife.80167.

      Sarvari, A.K., Van Hauwaert, E.L., Markussen, L.K., Gammelmark, E., Marcher, A.B., Ebbesen, M.F., Nielsen, R., Brewer, J.R., Madsen, J.G.S., and Mandrup, S. (2021). Plasticity of Epididymal Adipose Tissue in Response to Diet-Induced Obesity at Single-Nucleus Resolution. Cell Metab 33, 437-453 e435. 10.1016/j.cmet.2020.12.004.

      Sun, W., Dong, H., Balaz, M., Slyper, M., Drokhlyansky, E., Colleluori, G., Giordano, A., Kovanicova, Z., Stefanicka, P., Balazova, L., et al. (2020). snRNA-seq reveals a subpopulation of adipocytes that regulates thermogenesis. Nature 587, 98-102. 10.1038/s41586-020-2856-x.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) There is overall a disconnection between single nucleus RNA-seq data and the lineage chasing data. No specific markers of this population have been validated by staining.

      (2) It would be nice to provide more evidence to support the conclusion shown in lines 243 to 245: "These results indicated that new BAs induced by cold exposure were mainly derived from UCP1- adipocytes rather than de novo ASPC differentiation in puPRAT". Pdgfra-negative progenitor cells may also contribute to these new beige adipocytes.

      (3) The UCP1Cre-ERT2; Ai14 system should be validated by showing Tomato and UCP1 co-staining right after the Tamoxifen treatment.

      Please see above for the responses.

      Reviewer #2 (Recommendations For The Authors):

      • Without specific lineage tracing it is not possible to conclude that the mPRAT-ad2 population converted to beige with CE. The authors should change this wording from "likely" to "possible".

      Response: We have changed the word "likely" to "possible" in the text. Also, we would like to point out that the cold-induced adipocytes in mPRAT more closely resemble the brown adipocytes of iBAT than the beige adipocytes of iWAT (Figures 6E and S7K).

      • The sentence "precursor cells may be less sensitive to environmental temperature and have a limited contribution to mature adipocyte phenotypes through de novo adipogenesis after cold exposure." and others like it should be changed to indicate the acute timeframe of this experiment. It has been shown that the precursors make a more significant contribution to de novo beige adipogenesis with chronic cold exposure.

      Response: We have modified the sentence as follows: "precursor cells may be less sensitive to acute environmental temperature drop and have a limited contribution to mature adipocyte phenotypes through de novo adipogenesis after cold exposure". As mentioned above, the cold-induced adipocytes in mPRAT more closely resemble the brown adipocytes of iBAT and may therefore arise through a mechanism different from de novo beige adipogenesis during chronic cold exposure.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have addressed the specific comments made upon the initial submission. In particular, they have now provided an explanation of why their GSDM tree looks different from previously published trees. The authors have also followed my initial suggestion to consider the highly conserved residue following the cleavage site in bird GSDMA forms. Some of the more general weaknesses remain, since they cannot easily be addressed. I agree with the suggestions made by reviewer #2 to further improve the manuscript.

      We thank the reviewer for their insight which we think has improved our manuscript. We have additionally made the changes requested by this reviewer and reviewer #2 in the next section.

      Reviewer #2 (Recommendations For The Authors):

      The authors responded sincerely to our reviewers' questions in the revised manuscript and I sufficiently understand. After re-reading it, however, I found two issues that need to be revised, so please consider doing them.

      (1) The new sentences (Page 5, lines 209-212) that the authors have added would fit better, after some modification, in the subsection "Bird GSDMA is activated ..", because they feel abrupt in their present position.

      We agree with this evaluation and have moved these sentences to a more natural position in the following section.

      (2) Regarding the chromosomal location of the GSDMA gene, the authors state that the genes of mammals, birds, and reptiles localize to the same genetic locus, but no data are presented. To support this claim, the data should be presented as a supplementary figure.

      We agree with this evaluation and have generated Figure 1 – Supplemental 4 to show the synteny of the GSDMA locus from humans to GSDMEc in sharks.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewing Editor:

      Comment: Bladder dysfunction following spinal cord injury (SCI) represents a severe and disabling complication and we lack effective therapies. Following evidence that AMPA receptors play a key role in bladder function the authors show convincingly that AMPA allosteric activators can ameliorate many of the subacute defects in bladder and sphincter function following SCI, including prolonged voiding intervals and high bladder pressure thresholds for voiding. These valuable results in rodents may help in the development of these agents as therapeutics for humans with SCI-induced bladder dysfunction.

      Response: We thank the reviewing editor for their assessment of this manuscript and positive comments. We also appreciate the opportunity to revise this manuscript for publication in eLife. We have addressed the excellent comments of the three reviewers. We have included detailed response-to-reviewer comments below to address each specific point. Based on the reviewers’ critiques, we feel our re-working of the manuscript has made for a greatly improved study.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Spinal cord injury (SCI) causes immediate and prolonged bladder dysfunction, for which there are poor treatments. Following up on evidence that AMPA glutamatergic receptors play a key role in bladder function, the authors induced spinal cord injury and its attendant bladder dysfunction and examined the effects of graded doses of allosteric AMPA receptor activators (ampakines). They show that ampakines ameliorate several prominent derangements in bladder function resulting from SCI, improving voiding intervals and pressure thresholds for voiding and sphincter function.

      Strengths:

      Well-performed studies on a relevant model system. The authors induced SCI reproducibly and showed that they had achieved their model. The drugs revealed clear and striking effects. Notably, in some rats whose SCI was so severe that they could not void, the drug appeared to restore voiding function.

      Weaknesses:

      The studies are well conducted, but it would be helpful to include information on the kinetics of the drugs used, their half-life, and how long they are present in rats after administration. What blood levels of the drugs are achieved after infusion? How do these compare with blood levels achieved when these drugs are used in humans?

      Response: We thank Reviewer #1 for the positive comments and their helpful critique. We address each of the specific comments below (in the “Recommendations for the Authors” section of this Response to Reviewer Comments document), and have made changes to the manuscript based on these excellent points.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Rana and colleagues present interesting findings demonstrating the potential beneficial effects of modulating AMPA receptors with ampakines in the context of neurogenic bladder following acute spinal cord injury. Neurogenic bladder dysfunction is characterized by urinary retention and/or incontinence, with limited treatments available. Based on recent observations showing that ampakines improved respiratory function in rats with SCI, the authors explored the use of ampakine CX1739 on bladder and external urethral sphincter (EUS) function and coordination early after mid-thoracic contusion injury. Using continuous flow cystometry and EUS myography, the authors showed that ampakine treatment led to decreased peak pressures, threshold pressure, intercontraction interval, and voided volume in SCI rats versus vehicle-treated controls. Although CX1739 did not alter EUS EMG burst duration, treatment did lead to EUS EMG bursting at lower bladder pressure compared to baseline. In a subset of rats that did not show regular cystometric voiding, CX1739 treatment diminished non-voiding contractions and improved coordinated EUS EMG bursting. Based on these findings, the authors conclude that ampakines may have utility in recovery of bladder function following SCI.

      Strengths:

      The experimental design is thoughtful and rigorous, providing an evaluation of both bladder and external urethral sphincter function in the absence and presence of ampakine treatment. The data in support of a role for CX1739 treatment in the context of neurogenic bladder are presented clearly, and the conclusions are adequately supported by the findings.

      Weaknesses:

      Since CX1739 was administered in the context of cystometry and urethral sphincter EMG, a brief discussion of how ampakines could be used in a therapeutic context in humans would help readers understand the translational significance of the work. The study lacks information on the half-life of CX1739 and how this might affect the implementation of CX1739 for clinical use. In addition, the study was limited to female rats; given the male bias of traumatic SCI in humans, a brief discussion of this limitation is warranted.

      Response: We thank Reviewer #2 for their positive comments and their helpful critique. We address each of the specific comments below (in the “Recommendations for the Authors” section of this Response to Reviewer Comments document). We have also made changes to the manuscript based on the three excellent discussion points brought up by the reviewer.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rana and colleagues examined the effect of a "low impact" ampakine, an AMPA receptor allosteric modulator, on the voiding function of rats subjected to midline T9 spinal cord contusion injury. Previous studies have shown that the micturition reflex fully depends on AMPA glutamatergic signaling and that the glutamatergic circuits are reorganized after spinal cord injury. In chronic paraplegic rats, other (non-glutamatergic) circuits become engaged in the spinal reflex mechanism controlling micturition. The authors employed continuous flow cystometry and external urethral sphincter electromyography to assess bladder function and bladder-urethral sphincter coordination in naïve rats (control) and rats subjected to spinal cord injury (SCI). In the acute phase after SCI, rats exhibit larger voids with lower frequency than naïve rats. This study shows that CX1739 improves, in a dose-dependent manner, bladder function in rats with SCI. The interval between voids and the voided volume were reduced in rats with SCI when compared to controls. In summary, this is an interesting study that describes a potential treatment for patients with SCI.

      Strengths:

      The findings described in this manuscript are significant because neurogenic bladder predisposes patients with SCI to urinary tract infections, hydronephrosis, and kidney failure. The manuscript is clearly written. The study is technically outstanding, and the conclusions are well justified by the data.

      Weaknesses:

      The study was conducted 5 days after spinal cord contusion, when the bladder is underactive. In rats with chronic SCI, the bladder is overactive. Therefore, the therapeutic approach described here is expected to be effective only in the underactive bladder phase of SCI. The mechanism and site of action of CX1739 are not defined.

      Response: We thank Reviewer #3 for the positive comments and their helpful critique. We address each of the specific comments below (in the “Recommendations for the Authors” section of this Response to Reviewer Comments document), and have made changes to the manuscript based on the excellent point mentioned in the weakness section.

      Comment: Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Response: We have addressed all comments of the three reviewers. We detail our responses in this Response to Reviewer Comments document and have made the associated modifications to the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Comment: These are well-performed studies.

      Response: We thank the reviewer for their positive comment.

      Comment: It would be useful to know the blood levels of the drug that are achieved by the infusions, and how long the drugs remain after infusion. Is the 45-minute interval between doses appropriate for the drug's kinetics?

      Response: While blood levels of ampakine were not tested in this study, pharmacokinetic parameters for CX1739 in Sprague Dawley rats have previously been determined following intravenous administration of CX1739. The mean plasma half-life of CX1739 was 1.25 ± 0.03 h, with a Tmax of 30 minutes (information provided through personal communication with RespireRx). Although the first CX1739 dose would not be fully cleared within the 45-minute interval between doses, plasma levels would be considerably lower by 45 minutes after administration. A limitation of terminal cystometry preparations is the length of time for which a single animal can be maintained, and this was also part of our rationale for dosing every 45 minutes. In our experience, longer recordings can increase variability, and a 45-minute window allowed the anesthetized procedure to remain under ~6 hours. Further, in our studies investigating the impact of ampakines in rats following SCI, acute effects of intravenous ampakine administration were observed for up to 30 minutes (Rana et al., 2021). Together with the half-life, these respiratory data informed our choice of dosing interval. We have added this rationale to the Methods section and in part to the Discussion section (pages 11 and 29-30).

      Comment: Since a major plus of these studies is their potential applicability to humans with SCI, it would be helpful to know whether the drug levels achieved here resemble those that were achieved in human trials to date.

      Response: Since blood/plasma levels were not measured in the current study, we cannot compare our levels with those achieved in human trials. However, we have expanded upon this point in the Discussion section (pages 29-30).

      Comment: The authors could also provide us with a bit more description of the different classes of ampakines, and why they chose the one they used.

      Response: Thank you for this suggestion. We would like to highlight a section of our Discussion (pages 28-29) that provides an in-depth description of the two classes of ampakines and the rationale for selecting the low-impact drug CX1739.

      Comment: Lastly, the first reference is cited twice in the bibliography.

      Response: The duplicate reference has been removed.

      Reviewer #2 (Recommendations For The Authors):

      Comment: Overall, the findings support the potential for ampakine administration in the setting of neurogenic bladder dysfunction following SCI. The manuscript was well written, the experimental design was rigorous, the data were of excellent quality, and the conclusions were adequately supported by the findings. Weaknesses are considered minor and can be addressed mostly by clarification as noted below.

      Response: We thank the reviewer for their positive comments.

      Comment: Since CX1739 was provided in the context of cystometry and EUS EMG, a brief discussion of how ampakines could be used in a therapeutic context in humans would help readers understand the translational significance of the work.

      Response: Thank you for this important suggestion to discuss the translational significance of CX1739. We have added this discussion to the last paragraph of the Discussion section (page 34).

      Comment: No information is provided on the half-life of CX1739 and how this might affect the implementation of CX1739 for clinical use. The inclusion of this information would help the reader to appreciate the potential for and limitations of clinical implementation.

      Response: Although pharmacokinetic analyses were not conducted as part of this study, we have included details of CX1739 plasma pharmacokinetics examined in Sprague-Dawley rats (pages 11 and 29-30). This information was provided through personal communication with RespireRx.

      Comment: The study was limited to female rats. Would the authors anticipate different efficacy of CX1739 in male rats? A comment on the choice of animal sex and implications for interpretation of the findings would strengthen the discussion and potential clinical implementation, given the male bias of traumatic SCI in humans.

      Response: Thank you for your important comment. In this study, females were chosen primarily because they have better recovery outcomes after spinal cord injury. During initial preliminary data gathering, we used both male and female rats and found that the male rats often did not recover cystometric voiding at this time point, so we continued only with female rats in the current study. It is well established that female rats have better urogenic recovery after SCI, perhaps owing in part to easier postoperative care. It is critical that we complete future studies in both male and female rats; however, we will have to change our experimental paradigm (time after injury and/or severity of injury) to make comparisons between SCI and intact male rats. We have now included the rationale for our sex selection in the Methods section (page 6) of the manuscript and have also expanded on this point in the Discussion section (page 30).

      Reviewer #3 (Recommendations For The Authors):

      Comment: The impact of ampakine treatment on EUS EMG activity is not obvious from the data presented in Fig. 5C-F. I do see in the magnified area of the SCI rat tracing some clear EUS activity with 15 mg/kg of CX1739. However, statistically, there is not a significant improvement in bladder-urethral sphincter coordination in rats treated with ampakine. Authors should discuss how or why ampakine treatment improves bladder function without affecting bladder-urethral sphincter coordination. The background noise of the EUS EMG in Fig. 5B changes dramatically between conditions. Are these tracings from the same experiment? If yes, please explain why the background noise changes during the course of the experiment. Was this change in background noise observed only in SCI rats?

      Response: Thank you for such an interesting comment. Our analysis showed no statistically significant difference in the duration or amplitude of EUS EMG bursting between vehicle and ampakine treatment; however, we did observe a difference in the bladder pressure threshold at which bursting occurred (Fig 5C-F). In rats that had completely lost coordination after injury (Figure 6), ampakines elicited EUS EMG bursting and coordinated voiding, providing further support for this effect.

      Therefore, these results suggest that ampakines have some positive modulatory effects on EUS EMG bursting events. Overall, we did not see any significant differences in the background noise of the EUS EMG between conditions during experiments in either spinal intact or SCI rats. The background noise of the EUS EMG in Fig. 5B decreases after baseline and HPCD because of a change in experimental conditions (a slightly higher dose of urethane was needed when the animal showed signs of regaining consciousness). We confirm that these tracings are from the same experiment. Accordingly, we have made further clarifications in the manuscript.

      Comment: Tables 1 and 2 show the same data as figures 3 and 4. I suggest removing the tables. In addition, table 2 includes letters (A, B, C, D) to indicate statistical significance. However, no indication of the meaning of these letters is provided. What does "levels not connected by same letter are significantly different" mean? Please clarify. I suggest including the statistical comparisons in Fig. 4.

      Response: While we did consider adding statistical bars to the graphs themselves, the number of comparisons being conducted reduced the readability of the graphs. Thus, we would like to preserve the current format of the tables and provide readers with all statistical comparisons being made. The statement "levels not connected by the same letter are significantly different" indicates that, for a given outcome, two treatment groups differ significantly only if they share no letter. For example, in SCI rats, threshold pressures at baseline (A) and after HPCD (A) differ from those in the 5 mg/kg (B,C,D), 10 mg/kg (C,D), and 15 mg/kg (D) groups, whereas the 5 mg/kg, 10 mg/kg, and 15 mg/kg groups do not differ significantly from each other. These results are also described in detail in the Results section. Lastly, we acknowledge the redundancy of the data presented in Tables 1 and 2; these two tables have been moved to the supplemental section.
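      To illustrate how this letter notation is read, the rule can be written as a short check. This is only a sketch of reading the letter display, not of the underlying post hoc test: it assumes each group's letters are stored as a string, uses the SCI threshold-pressure letters listed above, and the names `letters` and `significantly_different` are hypothetical rather than part of any statistics package.

```python
# Letters assigned to each SCI group for threshold pressure (from the response above).
letters = {
    "baseline": "A",
    "HPCD": "A",
    "5 mg/kg": "BCD",
    "10 mg/kg": "CD",
    "15 mg/kg": "D",
}


def significantly_different(group_a: str, group_b: str) -> bool:
    """Two groups differ significantly only if their letter sets share no letter."""
    return not (set(letters[group_a]) & set(letters[group_b]))


pairs = [
    ("baseline", "5 mg/kg"),
    ("HPCD", "15 mg/kg"),
    ("5 mg/kg", "10 mg/kg"),
    ("10 mg/kg", "15 mg/kg"),
]
for a, b in pairs:
    verdict = "different" if significantly_different(a, b) else "not different"
    print(f"{a} vs {b}: {verdict}")
```

      Reading the display this way, baseline and HPCD (letter A only) differ from every dose group, while the dose groups share C and/or D and therefore do not differ from one another.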

      Comment: A study by Yoshiyama and colleagues previously showed that the AMPA antagonists LY215490 completely abolished the reflex bladder contractions and EMG activity of the EUS muscle during a continuous filling in naïve rats (JPET 1997). Surprisingly, CX1739, a low-impact AMPA receptor activator, does not affect bladder contractions or EMG activity in naïve rats. Authors should discuss the reason for this discrepancy.

      Response: Thank you for this comment. We believe the different pharmacokinetics of the drugs can explain these effects. We have included this critical point in the discussion (page 31-32).

      Comment: The conclusion that CX1739 is acting on sensory pathways is highly speculative and needs additional support. The functional status of the afferent pathways is uncertain following SCI. Please revise.

      Response: Thank you for this comment. We agree that, in retrospect, this speculative statement overreached, and we have removed it from the Discussion. We have modified the Discussion to shift the focus away from the sensory nervous system and instead discuss, more generally, the location of AMPA receptors in the voiding neurocircuitry (page 31).

      Comment: Figure 3. It's difficult to see the asterisks that indicate statistical significance. Please use a line or a bigger symbol to indicate statistical differences between groups.

      Response: Thank you for the suggestion. We have modified the figure to make the asterisks bigger and have added a line.

      Comment: Data for peak pressure should be included in Figures 3 and 4.

      Response: Thank you for pointing out peak pressure, one of the important parameters of cystometry. As we did not see significant changes in peak bladder contraction pressure between spinal intact and SCI rats, we prefer not to show a graph of peak pressure in Fig 3, in order to highlight other parameters that showed significant injury effects, such as baseline pressure, ICI, threshold, and voided volume. However, ampakine reduced peak pressure similarly in both spinal intact and SCI rats, suggesting a treatment effect on peak pressure, which we prefer to include in Fig 4. We have also added a description of peak pressures to the Results section.

      Comment: The peak pressure was reduced in both naïve and SCI rats treated with ampakine. Therefore, the peak pressure is not one of the parameters that improves by ampakine in SCI rats.

      Response: Yes, we agree that peak pressures were comparable between spinal intact and SCI rats. Some treatment effects of ampakine on peak pressure were observed in both spinal intact and SCI rats. We have amended the manuscript to make this clearer.

      Comment: The reference from Yoshiyama et al (1999) is duplicated.

      Response: Thank you for catching this error. The references have been combined in the revised version.

      Comment: Page 15, the authors state that "Coordinated bladder contractions and associated EUS EMG activity were readily demonstrated in all 7 naïve animals". In other sections, they referred to 8 naïve rats. What is the actual number of naïve rats?

      Response: Thanks for pointing out this error. The actual number of naïve rats is 8. We have rectified this error.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the Editors and the Reviewers for their comments on the importance of our work “showing a new role of caveolin-1 as an individual protein instead of the main molecular component of caveolae” in contributing to membrane bending rigidity and for constructive and thoughtful remarks that have allowed us to improve the manuscript.

      Indeed, we here establish the contributing role of caveolin-1 in membrane mechanics through a molecular mechanism that remains to be addressed. In that respect, we thank the reviewers for suggesting avenues to improve the presentation and discussion of our hypotheses, based on the results of the theoretical model and of independent biophysical measurements of membrane mechanics by tube pulling from plasma membrane spheres, which together support the key role of caveolin-1 in building membrane bending rigidity.

      To fulfill the recommendations of the reviewers we have modified the manuscript, as discussed below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Because of the role of membrane tension in the process, and because caveolae regulate membrane tension, the authors looked at the formation of TEMs in cells depleted of Caveolin1 and Cavin1 (PTRF): they found a higher propensity to form TEMs, both spontaneously (a rare event) and after toxin treatment, in both Caveolin1- and Cavin1-depleted cells. They show that in both siRNA-Caveolin1 and siRNA-Cavin1 cells, the cytoplasm is thinner. They show that in siCaveolin1 cells only, the dynamics of opening are different, with notably much larger TEMs. From the dynamic model of opening, they predict that this should be due to a lower bending rigidity of the membrane. They measure the bending rigidity from cell-generated giant liposomes and find that it is reduced by approx. 50%.

      Strengths:

      They also nicely show that caveolin1 KO mice are more susceptible to death from infections with pathogens that create TEMs.

      Overall, the paper is well-conducted and nicely written. There are however a few details that should be addressed.

      The modifications made to the manuscript in response to the Reviewer’s remarks are described below.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Morel et al. aims to identify potential mechano-regulators of transendothelial cell macroapertures (TEMs). Guided by the recognized role of caveolar invaginations in buffering the membrane tension of cells, the authors focused on caveolin-1 and its associated regulator PTRF. They report comprehensive in vitro work based on siRNA knockdown and optical imaging, complemented with in vivo work in mice, a biophysical assay allowing measurement of the mechanical properties of membranes, and a theoretical analysis inspired by soft matter physics.

      Strengths:

      The authors should be complimented for this multi-faceted and rigorous work. The accumulation of evidence collected from each type of approach makes the authors' conclusion regarding the new role of caveolin-1 as an individual protein, instead of as the main molecular component of caveolae, very convincing. On a personal note, I was very impressed by the quality of the STORM images (Fig. 2), which are very illuminating and useful, in particular for validating some hypotheses of the theoretical analysis.

      Weaknesses:

      While this work pins down the key role of caveolin-1, its mechanism remains to be further investigated. The hypotheses proposed by the authors in the discussions about the link between caveolin and lipids/cholesterol are very plausible though challenging. Even though we may feel slightly frustrated by the absence of data in this direction, the quality and merit of this paper remain.

      We thank the reviewer for acknowledging the merit of our work, which lays the foundations for more mechanistic molecular studies of a possible role of lipids/cholesterol in the building of membrane bending rigidity by caveolin-1. Such studies are currently being carried out by some of the authors and show that the question is indeed challenging, as indicated by the reviewer. This is now stated in the results section, as suggested (Page 12):

      "To test these predictions, we have treated cells with methyl-beta-cyclodextrin to deplete cholesterol from the plasma membrane and reduce its bending rigidity (47); unfortunately, this treatment affected the cell morphology, which precluded further analysis."

      The analogy with dewetting processes drawn to derive the theoretical model is very attractive. However, although part of the model has already been published several times by the same group of authors, the definition of the effective rigidity of a plasma membrane including the underlying actin cortex was very vague and confusing.

      We thank the reviewer for pointing out the importance of defining the terms “membrane bending rigidity” and “effective membrane bending rigidity”. The latter is now used and defined in the Physical modelling description of the materials and methods section (see considerations below). For the sake of simplicity, we use the term “membrane bending rigidity” in the main text, where it is now defined in the introduction: “membrane bending rigidity, i.e. the energy required to locally bend the membrane surface”.

      Indeed, in a liposome, a rigorous derivation leads to a relationship between the membrane tension and the variation of the projected area, which are related through the bending rigidity; this relationship is known as Helfrich’s law. This statistical physics approach is only rigorously valid for a liposome, and its application to a cell is questionable because of the cytoskeletal forces acting on the membrane. Nevertheless, applying Helfrich’s law to cell membranes may be justified on short time scales, before active cell tension regulation takes place (Sens P and Plastino J, 2015 J Phys Condens Matter), especially in cases where cytoskeletal forces play a modest role, such as red blood cells (Helfrich W 1973 Z Naturforsch C). The fact that the cytoskeletal structure and actomyosin contraction are significantly disrupted upon intoxication-driven inhibition of the small GTPase RhoA, as shown here for the first time by STORM analysis, supports the applicability of Helfrich’s law to describe TEM opening. Because of the presence of proteins and carbohydrates, and the adhesion of the remaining actin meshwork after toxin treatment, we expect the Helfrich relationship to differ somewhat from that of a pure lipid membrane. We account for these effects via an “effective bending rigidity”, a term used in the detailed discussion of the model hypotheses, which corresponds to an effective value describing the relationship between membrane tension and projected area variation in our cells.

      The following discussion has been extended and improved in the Physical modeling part of the materials & methods section (Pages 23-24): “κ is the effective bending rigidity of the cell membrane, which quantifies the energy required to bend the membrane. (…). While rigorously derived for a pure lipid membrane, we assumed that Helfrich’s law is applicable to describe the relationship between the effective membrane tension acting on TEMs and the observed projected surface in our cells. We expect Helfrich’s law to be applicable on short time scales, before active cell tension regulation takes place (73), especially in cases where cytoskeletal forces play a modest role, such as for red blood cells (74) or for the highly disrupted cytoskeletal structure of our intoxicated cells. Thus, the parameter κ in Eq. 2 is an effective bending rigidity, whose value may somewhat differ from that of a pure lipid membrane to account for the role played by protein inclusions and the mechanical contribution of the remaining cytoskeletal elements after cell treatment with the toxin”
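      The Helfrich relation invoked above can be illustrated with a minimal Python sketch. It assumes the standard entropic form σ = σ0 exp(−8πκΔα/k_BT), takes Δα as the summed area of simultaneous TEMs divided by a cell area (the additive treatment described in this response), and expresses κ in units of k_BT. The function names, this definition of Δα and the numerical inputs are illustrative assumptions; the sketch is not the manuscript's Eq. 2.

```python
import math


def relaxed_tension(sigma0, kappa_kT, tem_radii, cell_area):
    """Membrane tension after TEMs of the given radii have opened.

    Assumes the entropic Helfrich relation
        sigma = sigma0 * exp(-8 * pi * kappa / kT * delta_alpha),
    where delta_alpha is the fractional projected area released by the TEMs.
    Simultaneous TEMs are treated additively (their areas are summed).
    kappa_kT is the effective bending rigidity in units of k_B*T; radii and
    cell_area must use the same length unit.
    """
    delta_alpha = sum(math.pi * r ** 2 for r in tem_radii) / cell_area
    return sigma0 * math.exp(-8.0 * math.pi * kappa_kT * delta_alpha)


def effective_kappa_kT(sigma0, sigma, delta_alpha):
    """Invert the same relation: kappa / kT = ln(sigma0 / sigma) / (8 * pi * delta_alpha)."""
    return math.log(sigma0 / sigma) / (8.0 * math.pi * delta_alpha)


# Hypothetical inputs: one TEM of radius 1 in a cell of area 1000 (same units).
for kappa in (20.0, 10.0):
    print(kappa, relaxed_tension(1.0, kappa, [1.0], 1000.0))
```

      With these hypothetical inputs, the residual tension is roughly 0.21 σ0 for κ = 20 k_BT and 0.45 σ0 for κ = 10 k_BT. A softer membrane relaxes less tension per unit of opened area, so, qualitatively, a TEM must open wider before tension is relieved, in line with the larger TEMs observed upon caveolin-1 depletion.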

      Here, for the first time, thanks to the STORM analysis, the authors show that HUVECs intoxicated by ExoC3 exhibit a loose and defective cortex with a significantly increased mesh size. This argues in favor of the validity of Helfrich formalism in this context. Nonetheless, there remains a puzzle. Experimentally, several TEMs are visible within one cell. Theoretically, the authors consider a simultaneous opening of several pores and treat them in an additive manner. However, when one pore opens, the tension relaxes and should prevent the opening of subsequent pores. Yet, experimentally, as seen from the beautiful supplementary videos, several pores open one after the other. This would suggest that the tension is not homogeneous within an intoxicated cell or that equilibration times are long. One possibility is that some undegraded actin pieces of the actin cortex may form a barrier that somehow isolates one TEM from a neighboring one.

      As pointed out by the Reviewer, we expect membrane tension to be neither a purely global nor a purely local parameter. Opening of a TEM will relax membrane tension over a certain distance, not over the whole cell. Moreover, once the TEM closes back, membrane tension will increase again. This spatial and temporal localization of membrane tension relaxation explains why the opening of a first TEM does not preclude the opening of a second one, or the enlargement of a TEM when the actin cable is cut by laser ablation (20). On the other hand, membrane tension is not a purely local property either. Indeed, we observe that when two TEMs enlarge next to each other, their shape becomes anisotropic, as their enlargement is mutually hampered in the region separating them. We account for this interaction by treating TEM membrane relaxation in an additive fashion. We emphasize that this simplified description is used to predict the maximum TEM size, corresponding to the time at which TEM interaction is strongest. As the reviewer points out, it would be more questionable to use this additive treatment to predict the likelihood of nucleation of a new TEM, which is not done here.

      Accordingly, the Physical modelling part of the materials and methods section has been modified as follows: “Eq. 2 treats the effect of several simultaneous TEMs in an additive manner. This approximation is used here to predict TEM size, because at maximum opening of simultaneous TEMs their respective membrane relaxation is felt by each other, as it can be inferred from the shape that neighboring TEMs adopt in experiments. This additive treatment would appear less appropriate to describe the likelihood of nucleating a second TEM in the presence of a first one (a calculation that is not performed here), since membrane relaxation by a TEM may not be felt at membrane regions distant from it.”

      Could the authors look back at their STORM data and check whether intoxicated cells do not exhibit a bimodal population of mesh sizes and possibly provide a mapping of mesh size at the scale of a cell?

      To address the question raised by the Reviewer, we plotted the whole distribution of mesh sizes in addition to the average value per cell. We did not observe a bimodal distribution but rather a very heterogeneous distribution of mesh sizes, reaching up to a few square microns in all siRNA conditions. Moreover, we did not observe a specific pattern in the distribution of mesh size at the scale of the cell, with very large meshes being surrounded by small ones. We also did not observe any specific pattern in the localization of TEM opening, as described in the paper, which makes it difficult to correlate mesh size with TEM opening.

      The following sentence has been added in the results section (Pages 8-9): “Indeed, we observed in cells treated with ExoC3 no specific cellular pattern or bimodal distribution of mesh size between the different siRNA conditions but a rather very heterogeneous distribution of mesh size values that could reach a few square microns in all conditions.”

      In particular, it is quite striking that while bending rigidity of the lipid membrane is expected to set the maximal size of the aperture, most TEMs are well delimited with actin rings before closing. Is it because the surrounding loose actin is pushed back by the rim of the aperture? Could the authors better explain why they do not consider actin as a player in TEM opening?

      Actin ring assembly and stiffening is indeed a player in TEM opening, as investigated by Stefani et al., 2017 Nat Commun. The contribution of actin ring assembly and stiffening is included in our differential equation describing TEM opening dynamics (second term on the left-hand side of Eq. 3). In some cases, actin ring assembly is the dominant player, such as in TEM opening after laser ablation (ex novo TEM opening/widening). In contrast, here we investigate de novo TEM opening, for which we expect that bending rigidity can be estimated without accounting for actin assembly, as we previously reported (19). Such a bending rigidity estimate (Eq. 5) is obtained by considering two different time scales: the time scale of membrane tension relaxation, governed by bending rigidity, and the time scale of cable assembly, governed by actin dynamics. We expect the first time scale to be shorter, and thus the maximum size of de novo TEMs to be mainly constrained by membrane tension relaxation. Two paragraphs discussing these different time scales have been added to 1) the discussion section and 2) the physical modelling part of the materials and methods section of the revised manuscript (see below).

      The following paragraph has been added in the discussion (Pages 14-15): “Our study shows that membrane rigidity sets the maximal size of TEM aperture, although an actin ring appears before TEM closure (20). Actin ring assembly and stiffening is indeed a player in TEM opening, and it is included in our differential equation describing TEM opening dynamics (Eq. 3). In some configurations, actin ring assembly is the dominant player, such as in TEM opening after laser ablation (ex novo TEM opening), as we previously reported (20). In contrast, here we investigate de novo TEM opening, for which we expect that bending rigidity can be estimated without accounting for actin assembly (19). Such a bending rigidity estimate (Eq. 5) is obtained by considering two different time scales: the time scale of membrane tension relaxation, governed by bending rigidity, and the time scale of cable assembly, governed by actin dynamics. We expect the first-time scale to be shorter, and thus the maximum size of de novo TEMs to be mainly constrained by membrane tension relaxation. However, we cannot rule out that the formation of an actin cable around the TEM before it reaches its maximum size may limit the correct estimation of the bending rigidity.”

      The following paragraphs have been added in the physical modelling part of the materials and methods section (Pages 24-25): “A limitation of our theoretical description arises from the use of spatially uniform changes in parameter values to describe differences between experimental conditions, thus assuming spatially uniform effects. However, we cannot exclude the existence of non-uniform effects, such as changes in the size and organization of the remaining actin mesh, which could set local, non-uniform barriers to TEM enlargement in a manner not accounted for by our model.” And “We note that the estimate of κ provided by Eq. 5 is independent of α and thus of actin cable assembly. This simplification arises from membrane tension relaxing over a shorter time scale than actin assembly. Thus, we expect the maximum size of de novo TEMs to be mainly constrained by membrane tension relaxation (19), unlike ex novo TEM enlargement upon laser ablation, for which the dynamics of actin cable assembly control TEM opening (20)”

      Instead of delegating to the discussion the possible link between caveolin and lipids as a mechanism for the enhanced bending rigidity provided by caveolin-1, it could be of interest for the readership to insert the attempted (and failed) experiments in the result section. For instance, did the authors try treatment with methyl-beta-cyclodextrin that extracts cholesterol (and disrupts caveolar and clathrin pits) but supposedly keeps the majority of the pool of individual caveolins at the membrane?

      As recommended by the reviewer, we have added the following sentence (Page 12): “We have treated cells with methyl-beta-cyclodextrin to deplete cholesterol from the plasma membrane and reduce its bending rigidity (47); unfortunately, this treatment affected the cell morphology, which precluded further analysis.”

      Tether pulling experiments on plasma membrane spheres (PMS) are real tours de force and the results are quite convincing: a clear difference in bending rigidity is observed between control and caveolin knock-out PMS. However, one recurrent concern in these tether-pulling experiments is to be sure that the membrane pulled in the tether has the same composition as the one in the PMS body. The presence of the highly curved neck may impede or slow down membrane proteins from reaching the tether by convective or diffusive motion.

      We thank the Reviewer for acknowledging the dedicated work involved in the tether pulling experiments on PMS, and for noting that the results are convincing and align well with the hypotheses drawn from the theoretical model, thereby allowing us to propose a direct or indirect role of caveolin-1 in the building of membrane rigidity. As pointed out by the reviewer, a concern with tube pulling experiments is the dynamics of equilibration of membrane composition between the nanotube and the rest of the membrane. In our experiments, we waited about 30 seconds after tube pulling and after changing membrane tension. We checked that, after this time, the force remained constant, indicating that tube pulling from PMS was performed under equilibrium conditions in which lipids and membrane proteins had enough time to reach the tether by convective or diffusive motion.

      The revised version of the manuscript now includes the following sentence and a representative force-versus-time plot (Page 12): “We waited about 30 seconds after tube pulling and changing membrane tension and checked that we reached a steady state (Fig. S5), where lipids and membrane proteins had enough time to equilibrate.”
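      For readers less familiar with tube pulling, the minimal Python sketch below illustrates how a steady-state tether force relates to bending rigidity, which is why equilibration of the force matters. It assumes the standard static relations for a thin membrane tube, f = 2π√(2σκ) and R = √(κ/2σ), and estimates κ by fitting f² against σ through the origin. The function names, the fitting choice and the numerical values are illustrative assumptions, not necessarily the analysis used in the manuscript.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def tube_force(sigma, kappa):
    """Static force (N) holding a membrane tube: f = 2*pi*sqrt(2*sigma*kappa)."""
    return 2.0 * math.pi * math.sqrt(2.0 * sigma * kappa)


def tube_radius(sigma, kappa):
    """Equilibrium tube radius (m): R = sqrt(kappa / (2*sigma))."""
    return math.sqrt(kappa / (2.0 * sigma))


def fit_kappa(tensions, forces):
    """Estimate kappa (J) from paired steady-state (sigma, f) measurements.

    Since f**2 = 8*pi**2*kappa*sigma, a least-squares line through the origin
    of f**2 versus sigma has slope 8*pi**2*kappa.
    """
    num = sum(s * f ** 2 for s, f in zip(tensions, forces))
    den = sum(s ** 2 for s in tensions)
    return num / den / (8.0 * math.pi ** 2)


kT = K_B * 310.0        # thermal energy at 37 degrees C
kappa = 20.0 * kT       # hypothetical bending rigidity
sigma = 1e-5            # hypothetical membrane tension (N/m)
print(tube_force(sigma, kappa), tube_radius(sigma, kappa))
```

      With κ = 20 k_BT and σ = 10^-5 N/m, this gives a force of about 8 pN and a tube radius of about 65 nm. Because f scales as √κ at fixed tension, a 50% reduction of κ lowers the tube force by about 29%.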

      Could the authors propose an experiment to demonstrate that caveolin-1 proteins are not restricted to the body of the PMS and can access the nanometric tether?

      In principle, this could be checked by generating PMS from cells expressing GFP-caveolin-1, as done in Sinha et al., 2011, and by verifying that a steady protein signal is reached in the tube. This would confirm equilibration, provided that caveolin-1 is recruited into the nanotube, which depends on mechanical considerations that are now addressed in the discussion section (Pages 13-14): “Our tube pulling experiments can be discussed along 2 lines. Indeed, since caveolin-1 is inserted in the cytosolic leaflet of the plasma membrane, when a nanotube is pulled towards the exterior of the PMS, we can expect 2 situations depending on the ability of caveolin-1 to deform membranes, which remains to be addressed (24). i) If Cav1 does not bend membranes, it could be recruited in the nanotube at a density similar to the PMS and our force measurement would reflect the bending rigidity of the PMS membrane. Cav1 could then stiffen membrane either as a stiff inclusion at high density or/and by affecting lipid composition. ii) If Cav1 bends the membrane, it is expected from caveolae geometry that the curvature in the tube would favor Cav1 exclusion. The force would then reflect the bending rigidity of the membrane depleted of Cav1, which should be the same in both types of experiments (WT and Cav1-depleted conditions) if the lipid composition remains unchanged upon Cav1 depletion. Note that the presence of a very reduced concentration of Cav1 as compared to the plasma membrane has been reported in tunneling nanotubes (TNT) connecting two neighboring cells (51). These TNTs have typical diameters of similar scale than diameters of tubes pulled from PMS. At this stage, we cannot decipher between both properties for Cav1. Considering a direct mechanical role of Cav1, previous studies showed that inclusion of integral proteins in membranes had no impact on bending rigidity, as shown in the bacteriorhodopsin experiment (52), or even decreased membrane rigidity as reported for the Ca2+-ATPase SERCA (53).
Previous simulations have also confirmed the softening effect of protein inclusions (54). Nevertheless, our observations could be explained by a high density of stiff inclusions in the plasma membrane (>>10%), which is generally not achievable with the reconstituted membranes. Considering an impact on lipid composition, it is well established that caveolae are enriched with cholesterol, sphingomyelin, and glycosphingolipids, including gangliosides (55,56), which are known to rigidify membranes (57,47). Thus, caveolin-1 might contribute to the enrichment of the plasma membrane with these lipid species. We did not establish experimental conditions allowing us to deplete cholesterol without compromising the shape of HUVECs, which prevented a proper analysis of TEM dynamics. Moreover, a previous attempt to increase TEMs width by softening the membrane through the incorporation of poly-unsaturated acyl chains into phospholipids failed, likely due to homeostatic adaptation of the membrane’s mechanical properties (18). Further studies are now required to establish whether and how caveolin-1 oligomers control membrane mechanical parameters through modulation of lipids organization or content. Caveolin-1 expression may also contribute to plasma membrane stiffening by interacting with membrane-associated components of the cortical cytoskeletal or by structuring ordered lipid domains. Nevertheless, it has been reported that the Young’s modulus of the cell cortex dramatically decreases in ExoC3-treated cells (17) suggesting a small additional contribution of caveolin-1 depletion to membrane softening. This is supported by 2D STORM data showing a dramatic reorganization of actin cytoskeleton in ExoC3-treated cells into a loose F-actin meshwork that is not significantly exacerbated by caveolin-1 depletion. Altogether, our results suggest that the presence of Cav1 stiffens plasma membranes, and that the exact origin of this effect must be further investigated.”

      Author recommendations

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improvements:

      (1) Depletion of both Cavin1 and Caveolin1 increases the density of TEMs. Membrane tension is a critical parameter of the initiation phase of TEMs, i.e. their nucleation and initial enlargement. From the TEM dynamics, the authors should be able to measure membrane tension. The expectation is that in both Caveolin1- and Cavin1-depleted cells, tension is higher (because there are no caveolae), explaining why there are more TEMs.

      While we cannot directly measure membrane tension, we can estimate its variations using our theoretical model. As reported in the article, we predict that depleting Caveolin-1 leads to a significant 2-fold increase in membrane tension, which can explain the concomitant increase in TEM nucleation, as the reviewer points out. In contrast, the model predicts no significant increase in membrane tension upon Cavin-1/PTRF depletion, whereas TEM nucleation also increases significantly (though less than upon Caveolin-1 depletion). Altogether, we can explain these results by considering that membrane tension is an important player in TEM nucleation, but not the only one. Notably, we expect cell height to be another important player, as it sets an energy barrier for the basal and apical membranes to meet and fuse. Indeed, we report that cell height is reduced upon depletion of Cavin-1, thus explaining the observed increase in TEM nucleation. The importance of reduced cell thickness in increasing the likelihood of TEM opening is also supported by previous data showing that pushing forces applied on the apical membrane induce the opening of TEMs (Ng et al., 2017 MBoC).

      An improved discussion of the parameters controlling TEM nucleation has been included in the discussion section of the revised manuscript, as follows (Page 15): “Our study points to underlying mechanisms by which caveolae regulate the frequency of TEM nucleation. Nucleation of TEMs requires the apposition of the basal and apical cell membranes, which is hindered by the intermembrane distance, set by the cell height. Meeting of the two membranes may create an initial precursor tunnel, which needs to be sufficiently big to enlarge into an observable TEM, instead of simply closing back. The size of the minimal precursor tunnel required to give rise to a TEM increases with membrane bending rigidity and decreases with membrane tension (19). Silencing cavin-1 or caveolin-1 both lead to a decrease in cell height, thus favoring the likelihood of precursor tunnel nucleation. While silencing cavin-1 has no significant impact on either membrane tension or bending rigidity, silencing caveolin results in both an increase of membrane tension and a decrease of bending rigidity, which results in a decrease in the required minimal radius of the precursor tunnel, thus further favoring TEM nucleation. Overall, our results offer a consistent picture of the physical mechanisms by which caveolae modulate TEM nucleation.”

      (2) In Figure 2B, the authors state that there is no significant difference in the actin mesh size, while I see a clearly higher average value and distribution in siCAV1+ cells. This seems to correlate with the differences in maximal TEM size. How can the authors completely exclude that actin organisation is in part responsible for the larger TEMs observed in siCAV1 cells?

      In our theoretical modeling of TEM opening dynamics, all differences between conditions are described by changes in what we consider as “effective” parameter values. Thus, changes in actin organization may induce a change in the "effective bending rigidity" parameter controlling membrane tension relaxation. A limitation of such a description is that all changes are assumed to be spatially uniform. However, it is possible that changes in actin mesh size and organization set local barriers to TEM enlargement in a way that would not be appropriately described by our model. While our current modeling appears to provide a consistent interpretation of our observations, we cannot completely exclude the existence of such local effects.

      This limitation of our current interpretation is now mentioned in the following paragraph, which has been added to the physical modelling part of the materials and methods section (Page 24): “A limitation of our theoretical description arises from the use of spatially uniform changes in parameter values to describe differences between experimental conditions, thus assuming spatially uniform effects. However, we cannot exclude the existence of non-uniform effects, such as changes in the size and organization of the remaining actin mesh, which could set local, non-uniform barriers to TEM enlargement in a manner not accounted for by our model.”

      (3) It would be nice to see the results of Table 1 (in particular the thickness of cells) in a Bar plot.

      The experimental values of cell volumes and areas are reported in the bar plots of Fig. 3C and 3D. In contrast, we chose not to show cell height values as bar plots, because these values were calculated from the mean cell areas and volumes reported in Fig. 3C and 3D, i.e. by dividing the mean volume by the mean area, with error propagation. Since volumes and areas were not measured on the same set of cells, it is not possible to divide the values repeat by repeat or to provide cell numbers, which are required to perform statistical tests.
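      The height estimate described above (mean volume divided by mean area, with error propagation) can be sketched in Python as follows. It assumes first-order propagation with independent volume and area errors, which seems reasonable since the two were measured on different cells; the function name and the input values are hypothetical.

```python
import math


def height_from_means(volume, volume_err, area, area_err):
    """Mean cell height h = V / A with first-order error propagation.

    Assumes the volume and area errors are independent (they come from
    different sets of cells), so their relative errors add in quadrature.
    Returns (height, height_error) in the units of volume / area.
    """
    height = volume / area
    rel_err = math.sqrt((volume_err / volume) ** 2 + (area_err / area) ** 2)
    return height, height * rel_err


# Hypothetical values: V = 2000 +/- 100 um^3, A = 1000 +/- 50 um^2.
print(height_from_means(2000.0, 100.0, 1000.0, 50.0))
```

      With these hypothetical inputs, the height is 2.0 µm with an uncertainty of about 0.14 µm. The propagated value carries no per-cell sample size, which is why a standard statistical test cannot be run on it.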

      (4) There are two reasons why Caveolin1 could change the bending rigidity: first, because it makes the membrane stiffer, or second, because the presence of caveolin1 (which binds cholesterol) in the plasma membrane changes the lipid composition. It would be nice if the authors could provide some lipidomics analysis to see whether there is a lipid change in siCAV1 cells.

      We thank the reviewer for pointing out the importance of clarifying the hypotheses regarding a direct or indirect role of caveolin-1 in membrane bending rigidity, which might be related to changes in membrane lipid composition, especially in cholesterol and sphingomyelin. We have modified the discussion section to integrate this point. A lipidomic approach is certainly interesting to address the role of caveolin-1 in building membrane bending rigidity. Indeed, some of the authors have addressed the specific questions of Cav-1 spontaneous curvature and of its effect on the lipid composition of the plasma membrane in two separate manuscripts (in preparation). These are comprehensive studies in their own right, which will provide mechanistic insights into how caveolin-1 builds membrane bending rigidity, as a follow-up to the present manuscript, which reports the importance of the regulation of membrane rigidity in cell biology and during infectious processes.

      Reviewer #2 (Recommendations For The Authors):

      The paper is nicely written and the results are convincing. The three main comments and questions from the Public Review do not necessarily call for new experiments. However, clarifications are required. This work can be very useful; it is better not to sweep any difficulty or weakly justified hypothesis under the carpet.

      To address the reviewer's comments, we have improved the discussion of the hypotheses that can be drawn about a direct versus indirect mechanistic role of caveolin-1 in the regulation of effective membrane bending rigidity, which might involve changes in membrane lipid composition or regulation of the cytoskeleton, a possibility we cannot exclude.

      • Minor correction: in the abstract: replace "the enhanced nucleation" with "the enhanced occurrence of nucleation events".

      The abstract has been changed accordingly : “The enhanced occurrence of TEM nucleation events correlates with a reduction of cell height, …”

    1. Author Response

      The authors' responses to the public reviews can be found here


      The following is the authors’ response to the most recent recommendations.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I appreciate the effort that the authors have put into this revised version of the manuscript. Before going into details, I would suggest that, in the future, the authors include enough information in their response to allow reviewers to follow the changes made. Not simply "Fixed", but instead "we have modified the description of these results and now state on lines XXX to XXX (revised text)".

      We apologize; we certainly did not wish to make it harder for the reviewer to find the necessary changes. We list the line numbers and our changes in the responses below.

      The authors' response to my comments was confined to the minor points, with no attention to more important questions regarding speculations about mechanism which were (and still are) presented as factual conclusions. I do not consider the responses adequate.

      We responded to each of your comments and where we disagree, we have explained in detail.

      With respect to the meaning of "above" and "below" in the context of an intracellular organelle, I think that referring to up and down in a figure is fine, provided that the cytoplasmic and luminal sides are indicated in that figure. I think that labeling to that effect in each figure would be immensely helpful for the reader.

      We agree with this point and have updated all the figures to include these labels.

      The statement on lines 333-335 about non-competitive inhibition is a bit naïve. The only thing ruled out by this type of inhibition is that substrate and TBZ binding share the same binding process, in which case they would compete. It doesn't show that TBZ gets to its binding site from the lumen or from the bilayer, or by any other process that isn't shared with substrate. It also doesn't rule out kinetic effects, such as slow inhibitor dissociation, that result in non-competitive kinetics. Please rewrite this sentence to indicate that one explanation of the non-competitive nature of TBZ inhibition would be that TBZ diffuses into the vesicle and binds from the lumen. It's not the only explanation.

      We have changed this sentence (lines 334-336) to be more speculative and to omit any statement about non-competitive inhibition. It now reads: “Studies have proposed that TBZ first enters VMAT2 from the lumenal side, binding to a lumenal-open conformation.”

      The revised version integrates the MD simulations into a plausible mechanism for luminal release of substrate. A key element in this mechanism is the protonation of D33, E312 and D399, which allows substrate to leave following water entry into the binding site. The acidic interior of synaptic vesicles should facilitate such protonation, but the fate of those protons needs to be considered. Are any of them predicted to dissociate prior to the return to a cytoplasm-facing conformation? If so, are all 3 released in that conformation? Postulating protonation events at one point in the reaction cycle requires some accounting for those protons - or at least recognition of the problem of reconciling their binding with the known stoichiometry of VMAT.

      We completely agree with this point. While we cannot account for all protons with a single structure and simulation of neurotransmitter release, some discussion of the fate of the protons is warranted. We have therefore included a highly speculative statement in the Discussion (lines 462-465): “Given the known transport stoichiometry of two protons per neurotransmitter, we speculate that two protons may dissociate back into the lumen, perhaps driven by the formation of salt bridges between D33 and K138 or R189 and E312, for example, in a cytosol-facing state.”

      Reviewer #3 (Recommendations For The Authors):

      On page 13, line 238, the statement "The protonation states of titratable residues D33, E312, D399, D426, K138 and R189, which are in close proximity to TBZ, also impact its binding stability (Table 4)" is misleading. Table 4 only shows that D426 is charged and what the pKa values are. This should be rephrased to separate out which residues are in close proximity from what is known about how their protonation states affect TBZ stability.

      We agree with this point and have rephrased it on lines 290-294 on page 13 to read, “Several titratable residues, including D33, E312, D399, D426, K138, and R189, line the central cavity of VMAT2 and impact TBZ binding stability (Table 4). We found that maintaining an overall neutral charge within the TBZ binding pocket, as observed in system TBZ_1, most effectively preserves the TBZ-bound occluded state of VMAT2. Residues R189 and E312 in particular are within close proximity of TBZ and participate directly in binding.” We note that, given the acidic pH of the vesicle lumen (5.5), all four acidic residues are likely to be protonated to a significant degree in this state.

      Typos:

      • Luminal is a trade name for the drug generically known as phenobarbital; lumenal means in the lumen. (This typo seems to have crept into the published literature now too.)

      Thank you for pointing this out. Indeed, we had considered carefully whether to use ‘lumenal’ or ‘luminal’ in our revised text. Both are used interchangeably throughout the scientific literature, and ‘luminal’ is the more commonly used term (see https://www.merriam-webster.com/medical/luminal). We do agree, however, that there may be confusion because ‘Luminal’ is a trademark name for phenobarbital. Therefore, we have changed the text to read ‘lumenal’ throughout.


      The following is the authors’ response to the original recommendations.

      Reviewer #1 (Recommendations For The Authors):

      I congratulate the authors on this study, which I enjoyed reading. Overall, the study reports a novel and exciting new structure for a member of the SLC18 family of vesicular monoamine transporters. Associated MD, binding and transport assays provide support for the hypothesis and firm up the modelled pose for the TBZ drug. The main strengths of the study largely sit with the structure, which, as the authors say, provides additional and essential insights above those available from AF2. The structures also reveal several potentially interesting observations concerning the mechanism of gating and proton-driven transport. The main weakness lies in the limited mutational data and studies into the role of pH in regulating ligand binding. As detailed below, my main comment would be to spend a little extra time expanding the mutational data (perhaps already done during the review?) to enable more evidence-based conclusions to be drawn.

      We thank reviewer #1 for their helpful comments and suggestions. We agree that mutational analysis specifically of neurotransmitter transport would strengthen the mechanistic conclusions of the work. We also agree with reviewers #1 and #3 that the role of pH and the protonation state of charged residues was a weakness in the first version of the manuscript. Therefore, we have expanded our mutational and computational data as detailed below, and we believe that this has further solidified our findings.

      Specific comments & suggestions:

      It is an interesting strategy to fuse the mVenus and anti-GFP nanobody to the N-/C-termini. The authors should also include in SI Fig. 1 a full model for the features observed in these maps and deposit this in the PDB.

      Great point. We have made a main text panel describing the construct, and Figure S1 includes a full description of the construct. The reviewer will note that the PDB entry contains the entire amino acid sequence of the construct, and while the GFP and GFP-Nb cannot be modeled well into the density, we have included all of the relevant information for the reader.

      It is difficult to make out the ligand in Fig. 2b; I would suggest changing the color of the carbon atoms.

      Fixed.

      It is difficult to make out the side chains in ED Fig. 5d.

      This is now its own supplemental figure and is presented larger.

      ED Figures are called out of order in the manuscript. For example, in line 143 ED Fig. 6 is called before ED Fig. 5d (line 152), and then ED 5d is called before ED 5a. This makes it rather confusing to follow the description, analysis, and data when reading the paper, and there are other examples. I would suggest ordering the figure callouts to flow with the narrative of the study.

      Agreed. Fixed.

      It wasn't clear to me what result was produced by imaging the ligand-free chimaera protein alone. It would be useful to say whether this resulted in low-resolution maps and whether the presence of the TBZ compound was essential for high-resolution structure determination.

      The ligand is likely required for structure determination. We have not, however, made such a statement largely because we have yet to determine an apo reconstruction.

      The role of E127 and W318 on EL1 in gating the luminal side of the transporter is very intriguing. As the authors suggest, this may represent an atypical gating mechanism for the MFS (line 182). I did wonder if the authors had considered providing more insight into this potentially novel mechanism. Additional experiments would be further mutations of W318 to F, Y, V, and I to see if they can identify a non-dead variant that could be analysed kinetically. They may have more luck with variants of E127, as they suggest this stabilises W318. If these side chains are important for gating and transport regulation, one might expect to see interesting effects on the transport kinetics.

      This is a fantastic suggestion. We have done this, and we think that the reviewer will find the results to be quite interesting. Some VMAT2 sequences have an R or an H at position 318, while VPAT has an F at the equivalent position. We have made these mutants, as well as the E127A mutant, and analyzed them using TBZ binding and transport experiments. Interestingly, the W318R, W318H, and W318F mutants preserve activity to varying degrees, with the R mutant closely resembling wild type. W318A has no transport activity. Only the W318F mutant retains some TBZ binding. The E127A mutant also has little transport activity but nearly wild-type TBZ binding, which we believe suggests that this residue also plays a role in stabilizing W318.

      The authors identify an interesting polar network, which is described in detail and shown in Fig. 2d. However, the authors present no experimental data to shed further mechanistic insight into how these side chains contribute to monoamine transport or ligand binding. Additional experiments that would be helpful here might include repeating the binding and competition assays shown in Fig. 1c under different pH conditions for the WT and different mutations of this polar network. At present, this section of the manuscript is very descriptive without providing much novel insight into the mechanism of VMAT transport. I did wonder whether a similar analysis of pH effects on DTBZ binding might also provide insight into the role of E312 and the role of protons in the mechanism.

      Thank you; we have addressed this point in several ways. First, many of these residues have already been characterized in earlier studies (refs 31, 32, and 42), and we have incorporated this into our discussion where appropriate. With respect to E312, the reviewer’s comments are again very appropriate. We have addressed this using computational experiments exploring the protonation status of E312 and other residues, as well as of TBZ. Our simulations and Propka calculations clearly show that E312 must be protonated and TBZ must be deprotonated to maintain TBZ binding. We have also extended these computational studies toward understanding the protonation status of residues that orchestrate dopamine binding and release.

      The authors then describe the binding pose for TBZ. This section also provides some biochemical characterisation of the binding site, in the form of the binding assay introduced in Fig. 1. However, the insights are again somewhat reduced as the mutants were chosen to show reduced binding. Could the authors return to this assay and try more conservative mutations of the key side chains to illuminate more detail? For example, does an R189K mutant still show binding but not transport? Similarly, what properties does an E312D have? The authors speculate that K138 might play a role in coupling ligand binding/transport to the protonation, possibly through an interaction with D426 and D33 (line 236). Given the presence of D33 in the polar network described previously, I was left wondering how this might occur. I feel that some of the experiments with pH and conservative mutants might shed some light on this important aspect. Please label the data points in Fig. 3d.

      Indeed, alanine mutants at these positions, while valuable, do not provide the level of detailed mechanistic insight that we would have liked to obtain. We have therefore made more conservative and targeted mutants, such as R189K and various mutants at N34, and tested them in both transport and binding assays. We have also made a mutant at K138 and found that it is neither transport competent nor able to bind TBZ to a significant degree. With respect to labels and color codes, we have made the color codes consistent between the bar graphs and the curves. We have also labeled the data points in the figure legends.

      The manuscript currently doesn't present a hypothesis for how TBZ induces the 'dead-end' complex compared to physiological ligands. Does the MD shed any light on this aspect of the study? If the authors place the physiological ligand in the same location as the TBZ and run the simulation for 500ns, what do they observe? 100ns is also a very short time window. I appreciate the comment about N34 in line 303, but is this really the answer? It would be very interesting to provide more evidence on this important aspect of VMAT pharmacology.

      MD with a natural ligand (dopamine) provides substantial insight into why TBZ forms a dead-end complex. In the TBZ-bound complex, water cannot penetrate into the binding site, which prevents substantial luminal release. In contrast, simulations of DA bound to occluded VMAT2 show the propensity of this structure to accommodate an influx of water molecules that promote the release of DA to the lumen. The new results are illustrated in Figure 5 (main text) and in Supplemental Figure 8, panels d-h. The new simulations further emphasize the importance of the protonation state of acidic residues near the substrate-binding pocket.

      Reviewer #2 (Recommendations For The Authors):

      Line 68, "both sides of the membrane" -> "alternately to either side of the membrane".

      Fixed. Thanks.

      Transmembrane proteins in intracellular organelles present unique issues of nomenclature. I suggest the authors refer to cytoplasmic and luminal faces of the protein (not intracellular or extracellular (line 124)) and adhere to these names to avoid confusion. This creates problems for loops called IL and EL, but they could be defined on first use.

      We agree with this point and had initially gone with the conventional definitions used in the literature. We have now changed this throughout the text to be luminal and cytosolic.

      Lines 135-6, are these residue numbers correct? The pdb file lists 126 as Asp and 333 as Ala.

      Thank you. This is fixed.

      ED Fig. 6 is not clear. A higher-resolution figure is needed.

      We have updated this figure and hope that the reviewer will find it to be much clearer.

      Lines 158-9, Is there any data to support effects on dynamics or folding? If not, please indicate that this is speculation.

      Fixed.

      Line 174, Should "I315" be "L315"?

      Fixed.

      Line 179, Please indicate what is meant by "inner" and "below" (also lines 183 and 258).

      We have added Figure calls here where needed.

      Line 192, S197 is listed as part of polar network 1, but not discussed further. Is it actually involved, or just in the neighborhood?

      It is part of the network, but we did not discuss it in further detail because we do not have data indicating its precise function; we have therefore left this as a description.

      Line 199, E312 and N388 are fairly distant from each other. Do you want to clarify why they represent a network?

      While they are not within hydrogen bonding distance, we nevertheless include them as part of the same network because they may come into closer proximity in a different conformational state.

      Line 206, Protonation of all 3? VMAT2 doesn't transport 3 protons per cycle. Please clarify.

      We believe that these residues may be protonated, but they may not necessarily all be involved in proton transport.

      Line 219, Do you mean the aspartate unique to DAT, NET, and SERT? This is Gly in all the amino acid transporters in the NSS family. Please be specific.

      Fixed. Thank you.

      Line 224, "mutation of E312 to Q" or "mutation of Glu312 to Gln".

      Fixed. Thank you.

      Fig. 3d, Normally, one would expect full saturation curves for each mutant. How can a reader distinguish between low affinity or a decrease in the number of binding sites? Would full binding curves be prohibitive for the mutants because of the cost or availability of the ligand? These points should be addressed. A couple of the curves are not visible. Would an expanded scale inset show them more clearly? Also, would it be possible to include chemical structures for all ligands discussed?

      Many, if not most, of these mutants bind TBZ with such low affinity that it is not possible to measure a full saturation curve, either because of ligand availability (the radioactive ligand concentration is only in the µM range) or because of technical difficulties in measuring such low-affinity binding. We have changed the presentation of the curves and have split the gating and binding site mutants into separate figures, which we feel improves their readability. We have also included a table with the Kd values determined for each mutant where possible.

      Line 235, The distances are long for a direct interaction between K138 and the TBZ methoxy groups. The unusual distances should be mentioned if an interaction is being proposed.

      We do not think that K138 is directly involved in TBZ binding; however, this was written in a confusing way and has now been changed.

      Line 243, Please give a quantitative estimate of the affinity difference. "modestly" is vague.

      It is an approximately 2-fold difference. Fixed in the text.

      Line 248, 150 nM is, at best, a Kd, not an affinity.

      Agreed, this is changed.

      Reviewer #3 (Recommendations For The Authors):

      The (3 x ~100 ns-long) molecular dynamics simulations provided suggest some instability of the pose identified by cryo-EM. While it is not unreasonable that ligands shift around and adopt multiple conformations within a single binding site (in a reversible manner), the present results do raise questions about the assumptions made when starting the simulations, in particular (1) the protonation states of charged residues in the TBZ binding sites; (2) the parameters used for tetrabenazine; (3) the conformations of acidic side chains that are notoriously difficult to resolve in cryoEM maps; and (4) any contributions of the regions truncated in the simulated structure, namely the cysteine cross-linked loop and the terminal domains. The authors should examine and/or discuss these contributions before attributing mechanistic insights into the newly observed binding orientation.

      To estimate the effects of protonation states on TBZ binding, we have now added three new systems with altered protonation states of TBZ and of residues lining the binding pocket (see Table 3 in the revised version). For each system, we performed multiple MD runs to address the questions and concerns raised by the reviewer.

      Regarding the protonation states: Propka3.0 was used to determine the protonation states, finding that E312 and D399 should be protonated. If I am not mistaken, this version of ProPka cannot account for non-protein ligands (https://github.com/jensengroup/propka). Given their proximity to the binding site, these protonation states will be critical factors for the stability of the simulations. The authors could test their assumption by repeating the calculations with Propka 3.1 or higher, to establish sensitivity to the ligand. Beyond this, showing the resultant hydrogen bond networks will help to reassure the reader that the dynamics in the lumenal gates do not arise from an artifact.

      We thank the reviewer for suggesting a more recent version of Propka. We used the most recent version, Propka 3.5, and carried out protonation calculations in the presence and absence of TBZ. The new calculations are presented in Table 4 and SI Figure 8c of the revised version.

      It should be possible to assess whether waters penetrate the ligand binding site during the simulations if that is of concern.

      We have now added the number of water molecules within the ligand-binding pocket for all MD simulations we performed; these are presented in Tables 3 and 5 of the revised version.

      Finally, I didn't fully understand the conclusion based on the simulations and the "binding affinity" calculations: do they imply that the pose identified in the EM map is not stable? What is the value of the binding affinity histogram?

      We apologize for this confusion. For each MD snapshot, we calculated the TBZ binding affinity using PRODIGY-LIG (Vangone et al., Bioinformatics 2019), a contact-based tool for estimating ligand binding affinity. The binding affinity histogram shown in the original submission was the histogram of these values across the MD snapshots. In the revision, we have replaced the histogram with the time evolution of the binding affinity (SI Fig. 6c). The simulations confirm that the pose identified in the EM map is stable, with the binding affinity leveling off at -9.4 ± 0.3 kcal/mol in all three runs.

      Recommendations regarding writing/presentation:

      The authors use active-voice terminology in attributing forces to elements of structure (cinching, packing tightly, locking). While appealing and commonplace in structural biology, this style frequently overstates the understanding obtained from a static structure and can give a rather misleading picture, so I encourage rephrasing.

      We appreciate this point; the use of these words is not meant to overstate or provide a misleading picture but rather to aid the reader in mechanistic understanding of the proposed processes.

      I would also recommend replacing the terms "above" and "below" for identifying aspects of the structure; the protein's location in the vesicular membrane makes these terms particularly difficult to follow.

      These terms refer specifically to the figures themselves, which we have always oriented with the luminal side at the top of the page and the cytosolic side at the bottom. We have indicated the orientation of VMAT2 in Figure 1. The figures are our point of reference, and the terms ‘above’ and ‘below’ are used to make the manuscript easier to follow for a more casual or non-expert reader.

      Minor corrections:

      • the legend in Figure 2 lacks details, e.g. how many simulation frames are shown, how were the electrostatic maps calculated?

      We revised Figure 2 and moved simulation frames to SI figure 6e. A total of 503 simulation frames are shown.

      • how were the TBZ RMSDs calculated? using all atoms or just the non-hydrogen atoms?

      For TBZ RMSDs, we used non-hydrogen atoms. This information is presented in the Methods section.

      MD simulation snapshots and input files can be provided via zenodo or another website.

      We will upload snapshots and input files to Zenodo upon acceptance of the manuscript.

      Reviewing editor specific points:

      Specific points

      L.97: Remove "readily available"

      Fixed.

      L.99: The authors are not measuring competition binding. It is well known that reserpine and substrates inhibit TBZ binding only at concentrations 100 times higher than their respective KD and KM values. It is, therefore, surprising that the authors use this isotherm and refrain from commenting on the significance of the finding. Moreover, the presentation of results as "Normalized Counts" does not provide any information about the fraction of VMAT molecules binding the ligand. At a minimum, the authors should provide the specific activity of the ligand and calculate the number of moles bound per mole of protein.

      The point was not to infer any details about the conformations that TBZ and reserpine bind, but merely to show that both constructs behave similarly with respect to their Ki for reserpine. We have added a sentence stating that reserpine binding stabilizes the cytoplasmic-open conformation, so that the reader is aware of the significance of this competition experiment.

      L.102: The characterization of serotonin transport activity needs to be more satisfactory. The Km in rVMAT2 is 100-200 nM, so why are the experiments done at 1 and 10 micromolar? Is the Km of this construct very different? The results provided (counts per minute at the steady state) need to give more information.

      The Km of human VMAT2 for serotonin varies somewhat according to the source, but it has generally been reported to be between 0.6 and 1.4 µM (see the references below).

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3019297/
      https://www.cell.com/cell/pdf/0092-8674(92)90425-C.pdf
      https://www.pnas.org/doi/abs/10.1073/pnas.93.10.5166

      Fig 1B could be more informative. I suggest adding a cartoon model with TMs labeled, similar to ED Fig6a.

      This panel is meant to aid the reader in assessing the overall map quality, and thus we do not wish to add labels or fits that would distract from that point. Instead, we have added overall views of the model in Figs. 2 and 3.

      L.179: The authors claim that the inner gate is located "below" (whatever this could mean) the TBZ ligand. In L.214, they claim that TBZ adopts a pose.....just "below" the location of the luminal gating residues. Please clarify and use appropriate terminology.

      This refers to the position of these residues in the Figures themselves. We have added figure calls where appropriate here.

      Fig. 4: The cartoon could be more informative.

      We have added more information to the mechanism cartoon which is now Figure 6. This incorporates some of our new data and we believe it will be more informative.

      L. 213: The paragraph describes residues involved in TBZ binding. Mutagenesis is used to validate the structural information. However, the results (ED fig. 5B) must be corrected for protein expression levels. In the Methods section, the authors state (L.444), "Mutants were evaluated similarly from cell lysates of transfected cells." Without normalization of protein expression levels, the results are meaningless even if they agree with predictions.

      In fact, we normalized the protein concentrations in our binding experiments, as noted in the Methods section. To account for differences in expression, experiments were conducted using 2.5 nM VMAT2 protein, as assessed by FSEC.

      L.220: The referral to ED Fig.7 is not appropriate here. The figure shows docking-predicted poses of dopamine and serotonin.

      Figure call has been changed.

      L.226: The referral to Fig. 3b needs to be corrected. The figure shows TBZ and not the neurotransmitter.

      This has been corrected.

      L. 337: "The neurotransmitter substrate is bound at the central site." What do the authors mean in this cartoon? Do they have evidence for this? Tetrabenazine is not a substrate.

      This cartoon drawing is meant to illustrate the elements of structure. Similar drawings are presented throughout the literature such as here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5940252/ Figure 3 and here: https://pubs.acs.org/doi/10.1021/acs.chemrev.0c00983 Figure 2.

      The same compound is mentioned with different names: 3H-dihydrotetrabenazine and 3H-labeled DTBZ.

      Fixed.

      ED fig 1d is illegible.

      The high-resolution figure is completely legible. We will provide this to the journal upon publication.

      Figure 2d: A side view would be more visual.

      We have updated this figure and believe that it is much easier to understand now.

      L. 179: The inner gate is located 'below' the TBZ ligand

      Please see above response, this refers to the figures themselves. The figures are our point of reference.

      L. 213-215: Tetrabenazine binding site just 'below' the location of the luminal gating residues.

      See above.

      Throughout the paper, results are given as cpm or counts. The reader can only estimate the magnitude of the binding/transport by knowing the specific activity of the radiolabel. I recommend switching to nano/picomoles or supplying enough information to understand what the given cpm values could mean.

      Binding experiments were done using scintillation proximity assays, and therefore converting the CPMs to pmol of bound ligand is simply not possible. For the transport experiments (now Fig. 1d), the point was to show that the wild type was similar in activity to the chimera. In the new transport experiments presented for the mutants, many experiments were combined, and we have therefore normalized the counts to the activity of wild-type VMAT2.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This manuscript introduces an exciting way to measure SARS-CoV-2 aerosolized shedding using a disposable exhaled breath condensate collection device (EBCD). The paper draws the conclusion that the contagious shedding of the virus via aerosol route persists at a high level 8 days after symptoms.

      Strengths:

      The methodology is potentially of high importance and the paper is clearly written. The study design is clever. If aerosolized viral load kinetics truly differed from those of nasal swabs, then this would be a very important finding.

      Thank you for your encouraging remarks. We agree that a comparison between aerosolized viral load and nasal swabs would strengthen our findings, and we have collected new specimens that will enable this comparison: in each session, we collected both nasal swabs and exhaled breath samples, and we are in the process of analyzing these data. These data will be included in our revised manuscript.

      Weaknesses:

      The study conclusions are not entirely supported by the data for several reasons:

      (1) Most data points in the study are relatively late during infection, when viral loads from other compartments (nasal and oral swabs) are typically much lower than peak viral loads, which often occur in the pre-symptomatic or early symptomatic phase of infection. Moreover, the generation time for SARS-CoV-2 has been estimated to be 3-4 days on average, meaning that most infections occur before or very early during symptoms. Therefore, the available epidemiologic data do not support day 12 of infection (day 8 of symptoms) as important for most transmissions, and many of the measurement timepoints in this study may not be relevant for transmission.

      Thank you for your comment. Notably, our new data set includes a small number of specimens that were collected prior to symptom onset, so we may be able to partially address this concern with those data. That said, we agree that a limitation of our study is that we were largely unable to collect specimens prior to symptom onset, and that this pre-symptomatic period represents a fruitful area for future work. Nonetheless, significant questions remain open regarding the transmission dynamics of SARS-CoV-2, including the extent of transmission after symptom onset; therefore, despite this limitation of our data, we feel that our method may contribute to a better understanding of those dynamics. We will include a more prominent discussion of this limitation in the revised manuscript.

      (2) Fig. 1A would be more powerful as a correlation plot between viral load from nasal samples (x-axis) and aerosol (y-axis). One would expect at least a rough correlation (as has been seen between viral loads in oral and nasal samples), and deviations from this correlation would provide crucial information about how and when aerosol shedding is discordant from nasal samples (i.e., early vs. late time points, low vs. high viral loads, etc.). It is too strong to state that correspondence is 100% when viral load is only measured in one compartment and nasal swabs are reduced to the oversimplified "positive or negative".

      Thank you for this suggestion; we agree that the figure would be more powerful as a correlation plot between viral load from nasal samples and aerosol. Unfortunately, at the time these samples were collected, the ER at Northwestern Hospital was diagnosing SARS-CoV-2 patients using the Abbott ID NOW rapid diagnostic platform, which, despite being a nucleic acid amplification-based system, does not provide quantitative information about viral loads and instead provides a binary positive/negative result. Because we were looking for a direct comparison between the clinical diagnostic test and our test, we considered the binary aspect of our data (detected/undetected) and found 100% correspondence, meaning that whenever the clinical test detected SARS-CoV-2, our test did too. We have collected additional data that include quantitative PCR values from nasal swabs collected at the same time as breath samples, and once these are analyzed, we will include them in the format you suggest in our revised manuscript.

      (3) Results are reported in RNA copies, which is fine, but plaque-forming units (pfu, or quantitative culture) are likely a more accurate surrogate of infectivity. It is quite possible that all of these samples would have been negative for pfu, given that the RNA:pfu ratio is often >1000 (though also dynamic over time during infection). This could be another indicator that most samples in the study were collected too late during infection to represent contagious time points.

      We agree that culturing exhaled breath samples would be an important addition to our understanding of the transmission dynamics of SARS-CoV-2, and we consider this an important next step for our method. Because we did not culture our breath samples in this study, we avoided making claims about the infectivity of our samples in this manuscript and instead only speculate about the future utility of our method for understanding transmission dynamics, once an appropriate surrogate of infectivity is measured. We will make sure this is clearer in the revised manuscript. That said, other groups have successfully cultured breath samples with CT values that fall well within the range we found in our study and that are sufficient for transmission (for example, Alsved et al., 2023, CT range ~33-38). These studies support the idea that a significant portion of the viral RNA measured in our samples may come from viable virus. Therefore, quantifying the ratio of viable to nonviable virus in our samples is an important next step. We appreciate this comment and will add a clearer discussion of this point to the revised manuscript.

      (4) Individual kinetic curves should be shown for participants with more than three time points to demonstrate whether there are clear kinetic trends within individuals that would help further validate this approach. The inclusion of single samples from individuals is less informative.

      We will add individual kinetic curves to the revised manuscript.

      (5) The S-shaped model in 2A is somewhat misleading as it is fit to means but there is tremendous variability within the data. Therefore the 8-day threshold should be listed clearly as a mean but not a rule for all individuals. The statement that viral RNA copies do not decrease until 8 days from symptom onset is unlikely to be true for all infected people and can't be made based on the available data in this study given that many people contributed only one datapoint.

      We will clarify the language in the manuscript and make limitations of the 8-day interpretation clearer.

      (6) The incubation period for SARS-CoV-2 is highly variable. Therefore duration of symptoms is a rather poor correlate of the duration of infection. This further diminishes the interpretive value of positive samples from individuals who were only sampled once.

      We will add a discussion of this point to the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Lane and colleagues measured the abundance of SARS-CoV-2 on breath in 60 outpatients after the development of COVID-19 symptoms using a novel breath collection apparatus. They found that, overall, viral abundance remains high for approximately eight days following the development of symptoms, after which viral abundance on breath drops to a low level that may persist for approximately 20 days or more. They did not identify significant differences in viral shedding on breath by vaccination status or viral variant. They also noted substantial variation in the degree and duration of shedding across individuals.

      Strengths:

      The primary strengths of this study are (1) the focus on breath, rather than the more traditional nasal/oropharyngeal swabs, and (2) the fact that the data were collected at multiple time points for each infection. This allows the authors to characterize not only mean viral abundance across individuals but also how that abundance changes over time, allowing for a better understanding of the potential duration of infectiousness of SARS-CoV-2.

      Weaknesses:

      The sample size is moderate (60) and focuses only on outpatients. While these are minor weaknesses (as the authors note, the majority of SARS-CoV-2 transmission likely occurs among those with symptoms below the threshold of hospitalization), it would nevertheless be useful to have a fuller understanding of variation in viral shedding across clinical groups.

      We agree this would be very interesting and feel our method, which is straightforward to perform in clinical settings, lends itself to future studies across clinical groups. We have added discussion of this to the discussion section of the manuscript.

      Furthermore, the study lacks information on viral shedding prior to the development of symptoms, which may be a critical period for transmission. Since the samples were collected at home by study participants using a novel apparatus, it is difficult to disentangle actual variation in viral abundance from user variability and from measurement variation inherent to the apparatus.

      This is a great point, which we will discuss in our revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For the Authors):

      (1) While not absolutely necessary, it would be nice to see, at least at the in situ level, what happens to the handful of other transcription factors important for HCs (IKZF2, Barhl1, RFX) in the Rbm24 KO, as the authors did look at Insm1.

      Reply: Thank you for the suggested experiments. We agree that knowing whether genes known to be involved in regulating cell survival are changed will provide insights into the mechanisms underlying the death of Rbm24-/- HCs. Our data showed that Ikzf2 appeared to be upregulated in Rbm24-/- HCs relative to Rbm24+/+ HCs at P5. We also tested Barhl1 and RFX, but we did not obtain data reliable enough to present. Nonetheless, following the reviewer's logic, we further tested Gata3, another gene involved in HC survival, and found that Gata3 was downregulated in Rbm24-/- HCs compared to Rbm24+/+ HCs. Please refer to lines 12-22 on page 12, lines 1-10 on page 13, and Figure 3-figure supplement 1.

      (2) Major comments: The nomenclature for mouse gene vs. mouse protein needs to be addressed throughout the manuscript. The nomenclature when referring to a mouse gene: gene symbols are italicized, with only the first letter in upper-case (e.g. Rbm24).

      The nomenclature when referring to a mouse protein: Protein symbols are not italicized, and all letters are in upper-case (e.g. RBM24).

      Reply: Thanks for pointing this out. Throughout the manuscript, we have followed the reviewer's comments and now use the correct nomenclature for genes and proteins.

      (3) Supplemental Figure 2D: Individual data points should be displayed on the bar graph via dots. SEM is not appropriate for this graph as SEM precision with only 3 samples is low. Furthermore, readers are more interested in knowing the variability within samples and not proximity of mean to the population mean, therefore standard deviation (SD) should be used instead.

      Reply: We have edited Figure 1-figure supplement 2D as suggested. The Figure 1-figure supplement 2 legend was updated, too. Please refer to lines 21-22 on page 32.
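      The reviewer's point about SD versus SEM follows from SEM = SD / sqrt(n): the SEM describes how precisely the mean is estimated, not how variable the samples are, and with n = 3 it shrinks the apparent spread by a factor of about 1.7. The minimal Python sketch below illustrates this; the three values are hypothetical and are not data from the study.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)        # sample SD, uses the n - 1 denominator
    sem = sd / math.sqrt(len(values))    # SEM = SD / sqrt(n)
    return sd, sem


# Hypothetical triplicate measurements, for illustration only
values = [1.0, 2.0, 3.0]
sd, sem = sd_and_sem(values)
print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

      Plotting the individual points together with the SD, as the reviewer suggests, shows the actual spread of the three replicates rather than this narrower SEM.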

      (4) Red/Green should be avoided, especially when both are on the same image (merged immunofluorescence images that are found throughout the manuscript). I highly recommend changing to a color-blind friendly color scheme (such as cyan/green/magenta, cyan/magenta/yellow, etc.) for inclusivity.

      Reply: Thanks for pointing it out. We have changed the red to magenta in all our Figures and figure supplements.

      (5) Minor comments: As CRISPR-stop is a major method used throughout the paper, a brief explanation is needed for readers to understand what this methodology entails and why it was used. Something along the lines of," The CRISPR-stop technique allows for the introduction of early stop codons without the induction of DNA damage via Cas9 which can cause deleterious effects".

      Reply: We have further elaborated how CRISPR-stop works and its advantages. Please refer to lines 8-13 on page 5.

      (6) Page 5; line 5 - "Phenotypes occur earlier..." Grammar

      Reply: The grammar error was corrected. Please refer to line 4, page 5.

      (7) Page 5; line 5 - "Given Pou4f3 is the upstream regulator..." Not proven, rephrase

      Reply: We have rephrased this sentence. Please refer to lines 5-6 on page 5.

      (8) Supplemental 1A: Fine, Proof of knockout, I wouldn't mention INSM1 being "irregular"

      Reply: We have rephrased this sentence. Please refer to lines 2-3 on page 6.

      (9) Page 5; line21 - "Alignment of Insm1+ OHCs was not as regular..." Not a good description

      Reply: We have rephrased this sentence. Please refer to lines 2-3 on page 6.

      (10) Page 6; line11 - "Rbm24 was completely absent.." Redundancy with line 9

      Reply: Thanks for pointing it out, and we have removed the redundant sentence.

      (11) Page 7 - HA tag should be indicated originally as: Hemagglutinin (HA)

      Reply: We have switched "HA" to "Hemagglutinin (HA)". Please refer to line 15, page 7.

      (12) Page 9, line 11- "Determine if autonomous/noncell autonomous." Disagree, cells still clustered in supplemental fig 4.

      Reply: We have removed this sentence.

      Reviewer #2 (Recommendations For The Authors):

      The writing of the manuscript is adequate, but it would certainly be improved by professional editing.

      Reply: Thanks for the reviewer’s encouraging comments. The revised version of our manuscript has been edited by a native English speaker.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript from Richter et al. is a very thorough anatomical description of the external sensory organs in Drosophila larvae. It represents an important tool for investigating the relationship between the structure and function of sensory organs. Using improved electron microscopy analysis and digital modeling, the authors provide compelling evidence offering the basis for molecular and functional studies to decipher the sensory strategies of larvae to navigate through their environment.

      Public Reviews:

      Summary

      This is a very meticulous and precise anatomical description of the external sensory organs (sensilla) in Drosophila larvae. Extending their previous study (Rist and Thum 2017), which analyzed the anatomy of the terminal organ, a major external taste organ of the fruit fly larva, the authors examined the anatomy of the remaining head sensory organs (the dorsal organ, the ventral organ, and the labial organ) and also described the sensory organs of the thoracic and abdominal segments. Improved serial electron microscopy and digital modeling are used to the fullest to provide a definitive and clear picture of the sensory organs, the sensilla, and adjacent ganglia, providing an integral and accurate map, which is dearly needed in the field. The authors revise all the data for the abdominal and thoracic segments, describe in detail, for the first time, the head and tail segments, and construct a complete structural and neuronal map of the external larval sensilla.

      Strengths

      It is a very thorough anatomical description of the external sensory organs of the genetically amenable fruit fly. This study represents a very useful tool for the research community, which will definitely use it as a reference paper. In addition to the classification and nomenclature of the different types of sensilla throughout the larval body, the wealth of data presented here will be valuable to the scientific community and will allow sensory processing to be investigated in depth. Serial electron microscopy and digital modeling are used to the fullest to provide a comprehensive, definitive, and clear picture of the sensory organs. The discussion places the anatomical data into a functional and developmental frame. The study offers fundamental anatomical insights, which will be helpful for future functional studies and for understanding the sensory strategies of Drosophila larvae in response to the external environment. By analyzing different larval stages (L1 and L3), this work offers some insights into the developmental aspects of the larval sense organs and their corresponding sensory cells.

      Weaknesses

      There are no apparent weaknesses, although it is not a completely novel anatomical study: it revisits much existing data while adding new information. However, repetition of data from prior studies could be reduced for easier reading.

      We would like to thank the reviewers for their respective reviews. The detailed comments and efforts have helped us to improve our manuscript. In the following, we have listed the comments one by one and provide the respective information on how we addressed the concerns.

      Recommendations for the authors:

      We have tried to address every single comment as far as possible. In order to structure our response a little better, we have listed the relevant page number and the original comments once again. Directly following this you will find our response and a description of what we have changed in the manuscript.

      REVIEWER #1 (Recommendations For The Authors):

      I have a few comments that will help the reader navigate this long and detailed paper.

      REVIEWER 1.1. page 4

      The final section of "the Structural organization of Drosophila larvae" needs some reorganization.

      Specifically:

      "The DO and the TO are prominently located on the tip of the head lobes" Can the authors rewrite the sentence in a way that it is clear that there is one DO and one VO on each side of the head? Check at the beginning of each section, please. There is a mention about hemi-segments but it is still confusing.

      Done – replaced with “The largest sense organs of Drosophila larvae are arranged in pairs on the right and left side of the head.”

      REVIEWER 1.2. page 5

      "The sequence of sensilla is always similar for and different between T1, T2-T3, and A1-A7" This sentence is not clear, please break it into two sentences.

      Done – replaced with: “We noticed varying arrangements for T1, T2-T3, and A1-A7, with a consistent sequence of sensilla in each configuration.”

      REVIEWER 1.3. figures page 4

      Double hair can't be found in Figure 1B or C (is it h3, h4?) - please clarify.

      Done - changed to double hair organ on page 11 and included a double hair sketch in the legend of Figure 1B. We changed the name of the structure to double hair organ to clarify that this is a compound sensillum consisting of two individual sensilla.

      REVIEWER 1.4. page 5

      The authors go back and forth in their descriptions of the different sensory organs. Knob sensilla and then papilla sensilla are discussed and then a few lines later a further description is done. Please unify the description of each separately.

      Done – we restructured the whole section.

      REVIEWER 1.5. figures page 6

      "We found three hair sensilla on T1-T3, and "two" on A1-A7" - in the figure there seem to be "four" on A1-A7.

      Done – we included the two hair sensilla of the double hair organ

      REVIEWER 1.6. figures page 6

      DORSAL ORGAN:

      Can the authors explain the colour map meaning in Figure 2A? It is explained in 2C but the image already has colours. Add your sentence "Color code in A applies to all micrographs in this Figure".

      Done – we added a sentence to explain that the color code in A applies to the whole figure.

      REVIEWER 1.7. page 6

      Page 10: which comprises seven olfactory sensilla "composing" three dendrites each: replace this with"with". At the end, we want to think 7 X 3= 21 ORNs.

      Done – replaced.

      REVIEWER 1.8. page 9

      CHORDOTONAL ORGANS:

      "We find these these DO associated ChO (doChO).. .". Please remove one "these"

      Done – removed.

      REVIEWER 1.9. page 8

      Is the DO associated ChO part of the dorsal ganglion???? It does not look like it. Could you clarify?

      Done – we added a sentence that clarifies that the ChO neuron is not inside the DOG.

      REVIEWER 1.10. page 9

      VENTRAL ORGAN:

      A figures page 12

      Please add to the Figure 8 legend the description of 8c' and 8c'?

      Done – added description in figure legend.

      B page 9

      8H, what are the *, arrows? Please clarify - it is hard to interpret the figure.

      Done – we added parentheses in the figure legend that state which structures the asterisks and arrows indicate.

      C page 9

      "Three of them are innervated by a single neuron () and one by two neurons () (Figure 8F-I). Please add which are innervated by 1 (VO1, VO2-VO4) and which by 2 (VO3).

      Done – we added parentheses that clarify which sensilla are innervated by 1 or 2 neurons.

      REVIEWER 1.11. page 9

      Can you add something (or speculate) about the difference in sensory processing of the different types of sensilla?

      Done – new sentence in discussion:

      ‘Their different size and microtubule organization likely correlate with processing of different stimulus intensities applied to the mechanotransduction apparatus (Bechstedt et al. 2010).’

      REVIEWER 1.12. figures page 16

      PAPILLA AND HAIR SENSILLA:

      FIGURE 10a, please add the name of each sensillum from p1, p2, px py, etc... (if not we have to go back to figure 1 when you describe specific ps.)

      Thanks for the comment; this really makes it a lot easier for the reader.

      REVIEWER 1.13. figures page 18

      Figure 11: can you add the name of each hair, please?

      Done – updated figure.

      REVIEWER 1.14. figures pages 16, 18, 20

      In Figures 10, 11, and 12 you clearly draw an area on the internal side that I assume is what you call the "electron-dense sheath". It is wider in papilla sensilla than in hair sensilla, most likely due to the difference in stimuli sensed that you explain in detail in the discussion. Can you say in the figure what this "internal" thing is? Can you add this difference to your list "Apart from the difference in outer appearance and structure of the tubular body"?

      This is the basal septum, but it is not certain that it is wider in the papilla sensilla; at least, we could not observe this in our data sets. The impression could have been created by different scales in the 3D reconstructions and by the perspective view. Therefore, we do not want to list this as a difference here, as we are not sure.

      However, we have now specified the socket septum in the figure legends and in Figures 10A, 11A and 12A.

      REVIEWER 1.15. page 11

      KNOB SENSILLA:

      Page 25;" Knob sensilla have been described under "vaious" names such as": add various.

      Done

      REVIEWER 1.16. page 12

      "reveals that the three hair and the two papilla sensilla are associated with a single dendrite." Can you write that "reveals THAT EACH OF the three hair and the two papilla sensilla" if not it seems that there is only one dendrite.

      Done

      REVIEWER 1.17. figures page 25

      TERMINAL SENSORY CONES:

      Please name the t1-t7 cones in Figure 15A.

      Done – we updated the figure.

      REVIEWER 1.18. page 13

      The spiracle sense organ deserves a new paragraph. As does the papilla sensillum of the anal plate.

      Done – we added subtitles before the paragraphs.

      Discussion:

      REVIEWER 1.19. page 15

      Page 38: "v'entral" correct typo

      Done – we have updated the nomenclature  ventral 1 (v), ventral 2 (v’) and ventral 3 (v’’)

      REVIEWER #2 (Recommendations For The Authors):

      I have only a few comments:

      REVIEWER 2.1. page 5

      p.5, right column, middle: the use of trichoid, campaniform, and basiconical (sensilla) in previous works was based on even older papers and reviews that attempted to link EM architecture to function (e.g., KEIL, T. A. & STEINBRECHT, R. A. (1986). Mechanosensitive and olfactory sensilla of insects. In Insect Ultrastructure, vol. 2. (ed. R. C. King & H. Akai), pp. 477-516. New York/London: Plenum Press). Trichoid sensilla can be mechanosensitive, olfactory, or gustatory; trichoid simply refers to the shape (hair). The same applies to basiconical sensilla. The use of "campaniform", which Ghysen et al. called "papilla sensilla", was the only really problematic case, because these (Drosophila larval) sensilla did not closely resemble the classical campaniform sensilla (e.g., adult haltere). The only reason we called them campaniform is that they were not more similar to any other type of (previously named) sensillum.

      Thank you for the explanation. The nomenclature of structures is generally always a complex topic with often different approaches and principles. We are aware of this and have therefore tried to be as careful as possible. We were not sure from this comment whether you were suggesting to change the text or whether you wanted to explain how these names were assigned to the sensilla in the past. However, we hope that the current version is in line with your understanding, but could of course make changes if necessary (see also comments of reviewer 1).

      REVIEWER 2.2. page 9

      p.21, Labial Organ: the ventral lip is the labium; the dorsal one is the labrum.

      Done – replaced labrum with labium.

      REVIEWER 2.3. page 9

      p.20/21, Ventral organ and labial organ: here, the projection of the axons could be mentioned as an ordering principle. In the previous literature, for larva and embryo, a labial organ (lbo) was described that most likely corresponds to the labial organ presented here. This (previously mentioned) lbo characteristically projects along the labial nerve to the labial segment (hence the name). It fasciculates with axons of another sensory complex, also generated by the labial segment, namely the ventral pharyngeal sensory organ (VPS). Does the labial organ described here share this axonal path?

      Yes, it has the same axonal pathway and is the same organ as the lbo. We have tried to standardise the nomenclature for all important external head organs (DO, TO, VO, LO) and have therefore used abbreviations with two letters. However, to avoid confusion, we have now added that the LO was also called lbo in the past.

      For the ventral organ, the segmental origin (to my knowledge) was never clarified. The axons of the ventral organ project along the maxillary nerve (which carries axons of the terminal=maxillary organ). This nerve, closely before entering the VNC, splits into a main branch to the maxillary segment (TO axons) and a thinner branch that appears to target the mandibular segment. This branch could contain the axons of the ventral organ (as described previously and in this paper). Could the authors confirm this axonal projection of the VO?

      In this work, we did not focus on the axonal projections into the SEZ. This is also not a simple and fast process, as in the entire larval dataset, the large head nerves unfortunately exhibit a highly variable quality of representation. Therefore, the reconstruction of nerves and individual neurons within it is often challenging and very time-consuming. The research question is, of course, very intriguing, and one could also attempt to match each sensory neuron of the periphery with the existing map of the brain connectome. However, this is a project in itself, exceeding the scope of this work, and is therefore more feasible as a subsequent project.

      REVIEWER #3 (Recommendations For The Authors):

      Minor suggestions that the authors might consider:

      REVIEWER 3.1. figures all

      Recheck the scale bar in figures and figure legends. Missing in a few places.

      Done – we replaced or added some (missing) scale bars in figures and figure legends (see annotated figure document).

      REVIEWER 3.2. figures page 4

      The color schematic in Figure 1 can be improved for readability.

      Done – we changed the color schematic, especially for the head region to improve readability.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript titled "Coevolution due to physical interactions is not a major driving force behind evolutionary rate covariation" by Little et al. explores the potential contribution of physical interaction to correlated evolutionary rates among gene pairs. The authors find that physical interaction is not the main driver of evolutionary rate covariation (ERC). This finding is similar to a previous report by Clark et al. (2012), Genome Research, wherein the authors stated that "direct physical interaction is not required to produce ERC." The previous study used 18 Saccharomycotina yeast species, whereas the present study used 332 Saccharomycotina yeast species and 11 outgroup taxa. As a result, the present study is better positioned to evaluate the interplay between physical interaction and ERC more robustly.

      Strengths & Weaknesses:

      Various analyses nicely support the authors' claims. Accordingly, I have only one significant comment and several minor comments that focus on wordsmithing - e.g., clarifying the interpretation of statistical results and requesting additional citations to support claims in the introduction.

      We are pleased the reviewer found the analyses to support the claims. We have addressed comments related to clarifying interpretations as suggested in the Recommendations to the Authors. For example, we have added discussion and clarification on the other parameters that could affect the strength of ERC correlations.

      Reviewer #2 (Public Review):

      Summary:

      The authors address an important outstanding question: what forces are the primary drivers of evolutionary rate covariation? Exploration of this topic is important because it is currently difficult to interpret the functional/mechanistic implications of evolutionary covariation. These analyses also speak to the predictive power (and limits) of evolutionary rate covariation. This study reinforces the existing paradigm that covariation is driven by a varied/mixed set of interaction types that all fall under the umbrella explanation of 'co-functional interactions'.

      Strengths:

      Very smart experimental design that leverages individual protein domains for increased resolution.

      Weaknesses:

      Nuanced and sometimes inconclusive results that are difficult to capture in a short title/abstract statement.

      We appreciate the reviewer’s acknowledgement of the experimental design. We have addressed the nuance of the results by changing the title and clarifying other statements throughout the manuscript as suggested in the reviewer’s recommendations. We have also addressed reviewer comments asking for further explanation on using Fisher transformations when normalizing the Pearson correlations for branch counts.

      Reviewer #3 (Public Review):

      Summary:

      The paper makes a convincing argument that physical interactions of proteins do not cause substantial evolutionary co-variation.

      Strengths:

      The presented analyses are reasonable and look correct and the conclusions make sense.

      Weaknesses:

      The overall problem of the analysis is that nobody who has followed the literature on evolutionary rate variation over the last 20 years would think that physical interactions are a major cause of evolutionary rate variation. First, there have been probably hundreds of studies showing that gene expression level is the primary driver of evolutionary rate variation (see, for example, [1]). The present study doesn't mention this once. People can argue the causes or the strength of the effect, but entirely ignoring this body of literature is a serious lack of scholarship. Second, interacting proteins will likely be co-expressed, so the obvious null hypothesis would be to ask whether their observed rates are higher or lower than expected given their respective gene expression levels. Third, protein-protein interfaces exert a relatively weak selection pressure so I wouldn't expect them to play much role in the overall evolutionary rate of a protein.

      We thank the reviewer for their comments and suggestions. A point to immediately clarify is that the methods studied in this manuscript deal with rate variation of individual proteins over time, and whether that variation correlates with that of another protein. The numerous studies the reviewer refers to deal with explaining the differences in average rate between proteins. These are different sources of variation. It has not, to our knowledge, been shown that variation in the expression level of a single protein over time is responsible for its variation in evolutionary rate over time, let alone to a degree that allows its variation to correlate with that of a functionally related protein. That question interests us, but it is not the focus of this study.

      In our study, we sought to test for a contribution of physical interaction to the correlation of evolutionary rate changes as they vary over time, i.e. between branches. We made many changes to clarify this distinction in our revisions.

      We agree that the manuscript would be clearer if it defined the forces proposed to lead to differences in rate in general, which include expression levels. We had generally considered expression level as one of the many potential non-physical forces, but failed to make that explicit and instead focused on selection pressure. In our revision, we describe expression level as another potential driver of evolutionary rate variation over time, and we cite the previous literature in the introduction. We also added a more explicit explanation that contrasts the rate covariation over time that we measure with the association between expression level and rate differences between proteins studied in previous literature.

      On point 3, the authors seem confused though, as they claim a co-evolving interface would evolve faster than the rest of the protein (Figure 1, caption). Instead, the observation is they evolve slower (see, for example, [2]). This makes sense: A binding interface adds additional constraint that reduces the rate at which mutations accumulate. However, the effect is rather weak.

      The values in Fig 1B are a measure of correlation, specifically a Fisher transformed correlation coefficient. They are not evolutionary rates, so they are not reflecting faster or slower evolution, rather more or less covariation of evolutionary rates over time. We are not predicting that physically interacting interfaces evolve faster than the rest of the protein, but rather that if physical interaction drives covariation in evolutionary rates over time, their correlation would be stronger between pairs of physically interacting domains. In response, we have used clearer language in the figure caption and reorganized labels in Figure 1B to clearly show that the values are correlations. Revised Figure 1 Legend:

      “Overview of experimental schema and hypotheses. Proteins that share functional/physical relationships have similar relative rates of evolution across the phylogeny, as shown in (A) with SMC5 and SMC6. The color scale along the bottom indicates the relative evolutionary rate (RER) of the specific protein for that species compared to the genome-wide average. A higher (red) RER indicates that the protein is evolving at a faster rate than the genome average for that branch. Conversely, a lower (blue) RER indicates that the protein is evolving at a slower rate than the genome average. The ERC (right) is a Pearson correlation of the RERs for each shared branch of the gene pair. (B) Suppose the correlation in relative evolutionary rates between two proteins is due to compensatory coevolution and physical interactions. In that case, the correlation of their rates (i.e., the ERC value) would be higher for just the amino acids in the physically interacting domain. (C) Outline of experimental design. Created with Biorender.com”
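      To make the legend's definition concrete, the minimal Python sketch below computes an ERC-style score as a Pearson correlation of two proteins' RERs over their shared branches, followed by a Fisher transformation. It assumes the RERs have already been computed and are keyed by branch ID; the function names and toy values are hypothetical, and the scaling by sqrt(n - 3) (the standard Fisher z statistic for n branches) is our assumption about how a branch-count normalization could look, not necessarily the authors' exact procedure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def erc_fisher(rer_a, rer_b):
    """Hypothetical ERC score: Fisher-transformed Pearson r over shared branches.

    rer_a, rer_b: dicts mapping branch ID -> relative evolutionary rate (RER).
    Branches present for only one protein are ignored.
    """
    shared = sorted(set(rer_a) & set(rer_b))
    n = len(shared)
    if n < 4:
        raise ValueError("need at least 4 shared branches")
    x = [rer_a[b] for b in shared]
    y = [rer_b[b] for b in shared]
    r = pearson_r(x, y)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| = 1
    z = math.atanh(r)                      # Fisher transformation
    return z * math.sqrt(n - 3)            # assumed branch-count scaling


# Toy RERs for two proteins (hypothetical values, not study data)
rer_smc5 = {"b1": 0.5, "b2": -0.2, "b3": 1.1, "b4": -0.8, "b5": 0.3}
rer_smc6 = {"b1": 0.4, "b2": -0.1, "b3": 0.9, "b4": -0.7, "b5": 0.2, "b6": 1.5}
print(round(erc_fisher(rer_smc5, rer_smc6), 3))
```

      In this framing, hypothesis (B) amounts to computing the same score twice, once from RERs of whole proteins and once from RERs of only the physically interacting domains, and asking whether the second is consistently higher.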

      All in all, I'm fine with the analysis the authors perform, and I think the conclusions make sense, but the authors have to put some serious effort into reading the relevant literature and then reassess whether they are actually asking a meaningful question and, if so, whether they're doing the best analysis they could do or whether alternative hypotheses or analyses would make more sense.

      [1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4523088/

      [2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4854464/

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      (1) Numerous parameters influence ERC calculation. The authors note that their use of a large dataset of budding yeast provides sufficient statistical power to calculate ERC. I agree with that. However, the discussion of other parameters needs to be improved, especially when comparing the present study to others like Kann et al., Hakes et al., and Jothi et al. For example, what is the evolutionary breadth and depth used in the Kann, Hakes, Jothi, and other studies? How does that compare to the present study? Budding yeast evolve rapidly, with gene presence/absence polymorphisms observed in genes otherwise considered universally conserved. Is there any reason to expect different results in a younger, slower-evolving clade such as mammals? There is potential to acknowledge and discuss other parameters that may influence ERC, such as codon optimization and gene/complex "essentiality," among others.

      More discussion of these parameters is a good idea. We have added the number and phylogeny of species used in the previous studies to the discussion paragraph starting with “Previous studies attributed varying degrees of evolutionary rate covariation signal to physical interactions between proteins.” We also like the idea of testing younger, more slowly evolving clades, but we currently lack the required number of datasets to do so.

      We have also added more discussion and clarification of potential non-physical forces leading to ERC correlations in the introduction.

      Minor comments

      (1) It would be good to add a citation to the second sentence of the first paragraph, which reads, "It has been observed that some genes have rates that covary with those of other genes and that they tend to be functionally related."

      We added a citation to Clark et al. 2012.

      (2) In the last sentence of the first paragraph of the introduction, ERC is discussed in the context of only amino acid divergence, however, there is no reason that DNA sequences can't be used, especially if ERC is being calculated among species that are less ancient than, for example, Saccharomycotina yeasts. Thus, it may be more accurate to suggest that ERC measures how correlated branch-specific rates of sequence divergence are with those of another gene.

      Nice suggestion to generalize. We have made this change.

      (3) ERC was not calculated in reference #2. For the sentence "Protein pairs that have high ERC values (i.e., high rate covariation) are often found to participate in shared cellular functions, such as in a metabolic pathway² or meiosis³ or being in a protein complex together," I think more appropriate citations (including inspiring work by the corresponding author) would be:

      a) Coevolution of Interacting Fertilization Proteins (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000570)

      b) Evolutionary rate covariation analysis of E-cadherin identifies Raskol as a regulator of cell adhesion and actin dynamics in Drosophila (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007720)

      c) An orthologous gene coevolution network provides insight into eukaryotic cellular and genomic structure and function (https://www.science.org/doi/10.1126/sciadv.abn0105)

      d) PhyKIT: a broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data (https://academic.oup.com/bioinformatics/article/37/16/2325/6131675)

      Thank you for pointing out these works. We agree that there are more appropriate citations and we have referenced your suggested b-d.

      (4) The dataset of 343 yeast species also includes outgroup taxa. Therefore, indicating that 332 species are Saccharomycotina yeast and 11 are closely related outgroup taxa may be more accurate.

      Thank you for the suggestion. The following sentence has been added, citing the Shen et al. 2018 paper from which the dataset was derived:

      “To investigate the discrepancy between contributions to ERC signal from co-function and physical interaction, we used a dataset of 343 evolutionarily distant yeast species. The dataset comprises 332 Saccharomycotina species and 11 closely related outgroup species, spanning as much evolutionary divergence as that between humans and roundworms³.”

      (5) Are there statistics/figures to support the claim that "Almost all complexes and pathways had mean ERC values significantly greater than a null distribution consisting of random protein pairs"?

      This is shown in supplementary figure 1. A reference to this figure was added as well as quantification within the text.

      (6) Similar to the previous comment, can quantitative values be added to the statement "While protein complexes appear to have higher mean ERC scores than the pathways..."?

      The median of the mean ERC scores for protein complexes is 5.366 while the median for the mean ERC score in pathways is 4.597. This quantification has been included in the text: “While protein complexes have higher mean ERC scores (median 5.366) than the pathways (median 4.597), the members of a given complex are also co-functional, making interpretation of the relative contribution of physical interactions to the average ERC score difficult”

      These quantifications were also added to the caption of Figure 2A.

      (7) A semantic point: In the sentence "The lack of significance in the global permutation test shows that the...", I recommend saying that the analysis suggests, not shows, because there is potential for a type II error.

      Good suggestion, we have made this change.

      (8) The authors suggest that shared evolutionary pressures, "and hence shared levels of constraint," drive signatures of coevolution. The manuscript does not delve into selection measures (e.g., dN/dS). Perhaps it would be more representative to remove any implication of selection.

      We have added better language to clarify that discussion of selection is purely a hypothesis and that selection is not probed in our analyses.

      “Previous work finds evidence that relaxation of selective constraint can lead to drastic rate variation and hence covariation⁶. Rather, the greater and consistent contribution comes from non-physical interaction drivers that could include variation in essentiality, expression level, codon adaptation, and network connectivity. These non-physical forces would be under shared selective pressures and hence shared levels of constraint, the result of which was elevated ERC between non-interacting proteins, as visible in our study of genetic pathways that do not physically interact (Figure 2).”

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      -Title: In my opinion, the title of the manuscript is a somewhat misleading summary of the results of this paper. In the majority of the analyses in this paper, physical interactions do account for a significantly outsized portion of the ERC signature. The current title downplays the consistent (although sometimes small effect-sized) result that physically interacting domains do show higher ERC than non-physically interacting domains by every statistical measure employed in this paper to compare physical vs non-physical interactions. The authors' interpretation of their results within the manuscript body is that the effect of physical interactions is an inconsistent, weak, and non-generalizable driver of ERC. I generally agree with the authors' interpretations, but the nuance of these interpretations is lost in the title of the paper. I would suggest rewording the title to try to capture the nuance or at least be subjectively accurate. For example, stating that "...physical interactions are not the sole driving force.." is inarguably accurate based on these results.

      As an alternative title, I would suggest focusing on an important takeaway from the paper: ERC is a reliable predictor of co-functional interactions but not necessarily physical interactions. I agree with the statement that "there is not a strong enough signal to confidently call an interaction physical or not and would be of little value to an experimentalist wanting to infer interacting domains" and I think that a title that emphasizes this idea would be more accurate and impactful.

      Great suggestion. We agree that the current title does not capture the nuance of the method and of the signal it captures, and we have adopted your suggested title.

      A large number of complexes had ROC-AUCs greater than 0.5, which is why we performed permutation tests to determine how significant each individual ROC-AUC was, given the differing numbers of protein/domain pairs in each complex. Across the statistical methods used, only 3 of the 17 complexes ranked physical interactions significantly higher than non-physically interacting domains in every analysis. Even among these 3, some of the physically interacting domains still fell in the bottom portion of the ERC scores for that complex (Figure 5: MCM and CUL8 complexes). This is why we concluded that physical interactions are not the sole driving force of the signal captured by ERC.
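      For readers less familiar with this kind of per-complex test, the comparison can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the ROC-AUC treats each physically interacting domain pair as a positive and each non-interacting pair as a negative, ranked by ERC, and that the permutation test shuffles these labels within one complex while keeping group sizes fixed. The function names and the toy ERC values are hypothetical.

```python
import random


def roc_auc(positives, negatives):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (equivalent to Mann-Whitney U / (n_pos * n_neg)).
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_permutation_test(physical, nonphysical, n_perm=10_000, seed=0):
    """One-sided permutation p-value for the observed AUC (hypothetical scheme).

    Labels are shuffled among the complex's own domain pairs, so group sizes,
    and hence the attainable AUC values, match the observed data.
    """
    rng = random.Random(seed)
    observed = roc_auc(physical, nonphysical)
    pooled = list(physical) + list(nonphysical)
    k = len(physical)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if roc_auc(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical Fisher-transformed ERC values for one small complex.
physical_pairs = [4.1, 3.8]
nonphysical_pairs = [1.2, 2.5, 0.7, 3.0]
auc, p = auc_permutation_test(physical_pairs, nonphysical_pairs, n_perm=2000)
print(f"AUC = {auc:.2f}, permutation p = {p:.3f}")
```

      In this toy complex the physical pairs are perfectly separated (AUC of 1.0), yet with only six pairs there are just 15 ways to assign the two "physical" labels, so even a perfect AUC yields a p-value well above 0.01. This is why an AUC above 0.5 on its own says little for complexes with few domain pairs.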

      -Abstract: related to my preceding comment, the word "negligible" in the abstract is misleading. If physical interactions were truly entirely negligible, the comparisons of physically interacting vs non-physically interacting domains would yield 0.5. Instead, these comparisons always yielded results greater than 0.5. Consider rewording.

      Thank you for the suggestion; this phrasing has been changed to “Therefore, we conclude that coevolution due to physical interaction is weak, but present in the signal captured by ERC”

      We agree that “negligible” may be too strong of a word, however, the comparisons do not always yield results greater than 0.5.

      Five of the 17 complexes do not reach the 0.5 threshold in the initial ROC analysis, and even among those that do, only 4 had significantly high ROC-AUCs. You are correct that the signal is not completely negligible, which is why we went on to determine whether physical interaction was driving high ERC only within proteins (Figure 5).

      -Figure 3: I think there may be an error in the domain labeling in Figure 3. The comparison between OKP1_2 and AME1_3 is the highest ERC value in the matrix. From the complex structure, it appears that OKP1_2 and AME1_3 are two helix domains that appear to physically interact. However, in the ERC matrix, they are not shaded to indicate they are a physical interaction pair. Please double-check that the interacting domains are properly annotated, since mis-annotation would have a large impact on the interpretation of this figure with respect to the overall question the paper addresses.

      Thank you for catching this - fixed.

      Minor comments:

      -Methods: "The full ERC pipeline can be found at (Github)." Provide github URL here?

      Thanks for the catch; fixed.

      -Discussion: "Evidence for physical coevolution however was tempered by a global permutation test, which did not reach significance, indicating that this inference is sensitive to approach and further underlines the relatively weak contribution of physical coevolution." The word "relatively" may not be a good choice of words. In comparison to what? As is, the phrasing could be interpreted as implying "in comparison to non-physical interactions". This would not be accurate, because the results show that in general, physical interactions are a stronger contributor to ERC (consistent trend but varied significance, depending on methodology) than non-physical interactions.

      Thank you for your help with clarification. The word "relatively" was removed.

      However, we do not agree that, in general, physical interactions are a stronger contributor to ERC than non-physical interactions (such as gene expression, codon adaptation, etc.). Across all of our statistical tests, at most 5 of the 17 complexes ranked physical interactions significantly higher than non-physical interactions. While the ROC-AUC is greater than 0.5 for 12 of the 17 complexes, only 4 of those were significant.

      -I have not seen Fisher-transformed correlation coefficients used in the context of ERC. I understand that it's helpful in normalizing the results so that they are comparable between ERC comparisons with differing numbers of overlapping branches (i.e., points on a linear correlation plot). A reference for where the authors got this idea, or a little more verbiage describing the rationale, would be helpful. On a related note, I would expect that using the linear correlation p-value instead of R-squared would account for differences in overlapping branches, eliminating the need to apply the Fisher transformation. It would be helpful for the authors to outline their rationale for using a correlation coefficient rather than a p-value.

      We agree that this method could be made clearer. We made a methodological choice to use Fisher transformation over linear correlation p-value. Both methods should achieve the same end result by taking the number of branches into consideration. We have added additional explanation to the results section “Both protein pathways and complexes have elevated ERC”:

      “ERC was calculated for all pairs of the 12,552 genes. For each pair the correlation is Fisher transformed to normalize for the number of shared branches that contribute to the correlation. This normalization is necessary to reduce false positives that have high correlation solely due to a small number of data points. This normalization also allows for direct comparison of ERC between gene pairs that have differing numbers of branches contributing to the score.”
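      The formula itself is not reproduced in this response. A common form of a Fisher-transformed correlation, and the one assumed in this sketch, scales Fisher's z = atanh(r) by sqrt(n − 3), the inverse of its approximate standard error, where n is the number of shared branches. The function name is hypothetical, and the exact form used in the manuscript's methods may differ.

```python
import math


def fisher_transformed_erc(r, n_branches):
    """Fisher z of a correlation, scaled by sqrt(n - 3) (assumed form).

    Under the null of no correlation, atanh(r) has a standard error of about
    1 / sqrt(n - 3), so the scaled value is comparable across gene pairs
    that share different numbers of branches.
    """
    if n_branches <= 3:
        raise ValueError("need more than 3 shared branches")
    # Clip so that atanh stays finite when |r| is exactly 1.
    r = max(min(r, 0.999999), -0.999999)
    return math.atanh(r) * math.sqrt(n_branches - 3)


# A strong correlation over few branches vs. a moderate one over many.
print(fisher_transformed_erc(0.8, 6))    # about 1.90
print(fisher_transformed_erc(0.5, 103))  # about 5.49
```

      This illustrates the false-positive concern in the quoted text: the raw r of 0.8 is larger, but it rests on only six branches, so its transformed score ends up lower than that of the moderate correlation supported by 103 branches.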

      We also added further explanation to the methods section, including the formula used to calculate the Fisher transformation.

      -Did the authors use Pearson or Spearman correlation coefficient?

      Pearson. We clarified this in the methods section, “Calculating evolutionary rate covariation” : “Evolutionary rate covariation is calculated by correlating relative evolutionary rates (RERs) between two gene trees using a Pearson correlation.”
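      As a rough illustration of that sentence, the sketch below correlates two genes' RERs over the branches that both trees share. It assumes RERs have already been computed and are stored as a dictionary keyed by a branch identifier. That input format and the function names are hypothetical, and the RER calculation itself is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def erc(rers_a, rers_b):
    """Correlate two genes' RERs over the branches present in both trees.

    rers_a, rers_b: dict mapping a branch identifier to that gene's RER.
    Returns (r, number of shared branches).
    """
    shared = sorted(set(rers_a) & set(rers_b))
    x = [rers_a[b] for b in shared]
    y = [rers_b[b] for b in shared]
    return pearson(x, y), len(shared)


# Hypothetical RERs; branch "b5" is absent from gene A's tree and is ignored.
gene_a = {"b1": -0.4, "b2": 0.1, "b3": 0.9, "b4": -1.2}
gene_b = {"b1": -0.2, "b2": 0.3, "b3": 1.1, "b4": -0.9, "b5": 2.0}
print(erc(gene_a, gene_b))
```

      Returning the branch count alongside r lets the result feed directly into a Fisher transformation that accounts for the number of shared branches.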

      -Did the authors explore ERC between domains within a single protein? Do domains within a protein exhibit ERC? I would expect that they do. If they do, this could likely be attributed to linkage/genetic hitchhiking, representing a new angle/factor beyond physical interaction that could lead to ERC. This is just an idea for a future analysis, not necessarily a request within the scope of the present paper.

      We did calculate the ERC between domains of a single protein but did not include them in the analysis since they didn’t address the specific question we posed. As expected they are highly correlated, and past unpublished studies in the lab do find a very weak, but detectable genome-wide, signature of rate covariation between neighboring colinear genes on a chromosome. That signal was however so weak as to be eclipsed by true functional relationships, when present.

      Reviewer #3 (Recommendations For The Authors):

      Please read the literature and revise accordingly.

      We understand the confusion surrounding previous literature on the relationship between expression levels and evolutionary rates when comparing different proteins. Those studies clearly showed that expression level is highly predictive of a given protein’s average evolutionary rate. However, we are studying the change in evolutionary rate across branches for single proteins. This is inherently different because we are following rate fluctuations in the same protein over time. To our knowledge, it has not yet been shown that expression level commonly varies enough over time to produce large rate variation in the same protein, nor whether it is responsible for the correlations of rate we observe between co-functional proteins. It is, however, reasonable to expect that what governs between-protein differences in rate could also contribute to between-branch differences (over time for a single protein). In fact, our earlier study approached this question (Clark et al. Genome Research 2012). We expect that expression level could influence rate over time, and we group its effect together with general non-physical forces, such as selection pressures. We recognize that we could do better in defining more of the non-physical forces and in engaging with the past literature. We added the following section to the introduction, along with many other clarifying statements throughout the manuscript:

      “For the purposes of this study, the forces that contribute to correlated evolutionary rates are grouped into two bins, physical and non-physical. The physical force is coevolution occurring at physical interaction interfaces. Non-physical forces include gene co-expression, codon adaptation, selective pressures, and gene essentiality. There is a well-accepted negative relationship between gene expression and rate of protein evolution, where genes that are highly expressed generally have slower rates of evolution¹⁴,¹⁵. However, Cope et al.¹⁶ found only a weak relationship of both gene expression and the number of interactions a protein has with the coevolution of expression level. Conversely, they found a strong relationship between proteins that physically interact and the coevolution of gene expression. These findings illuminate the difference between the strong relationship of gene expression level with the average evolutionary rate of a protein and the weak contribution of gene expression level to correlated evolutionary rates of proteins across branches. The finding that physically interacting proteins have strong expression level coevolution raises the question of how much coevolution of physically interacting proteins contributes to overall covariation in protein evolutionary rates.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings that examine both how Down syndrome (DS)-related physiological, behavioral, and phenotypic traits track across time, and how chronic treatment with green tea extracts enriched in epigallocatechin-3-gallate (GTE-EGCG), administered in drinking water from prenatal stages through 5 months of age, impacts these measures in wild-type and Ts65Dn mice. However, the strength of the evidence is incomplete, due to high variability across measures, perhaps attributable to a failure to include sex as a factor for measures known to be sexually dimorphic. This study is of interest to scientists interested in Down syndrome and its treatment, as well as scientists who study disorders that impact multiple organ systems.

      Public Reviews:

      Using Ts65Dn, the most commonly used mouse model of Down syndrome (DS), this study has a two-pronged goal: 1) to conduct a thorough assessment of DS-related genotypic, physiological, behavioral, and phenotypic measures in a longitudinal manner; and 2) to measure the effects of chronic GTE-EGCG on these measures in the Ts65Dn mouse model. Corroborating results from several previous studies on Ts65Dn mice, the findings of this study confirm that the Ts65Dn mouse model exhibits the suite of traits associated with DS. The findings also suggest that the mouse model might have experienced drift, given the milder phenotypes than those reported by earlier studies. Results of the GTE-EGCG treatment do not support its therapeutic use and instead show that the treatment exacerbated certain DS-related phenotypes.

      Strengths:

      The authors performed a rigorous assessment of treatment and examined treatment and genotypic alterations at multiple time points during growth and aging. Detailed analysis shows differences in genotype during aging as well as genotype with treatment. This study is solid in the overarching methodological approach (with the exception of RNAseq, described below). The biggest strength of the study is its approach and dataset, which corroborate results from a multitude of past studies on Ts65Dn mice, albeit on adult specimens. It would be beneficial for the dataset to be made available to other researchers using a public data repository.

      We deeply appreciate the reviewers' positive feedback. Their acknowledgment of the solid methodological approach and the rigorous assessment of genotypic and treatment effects over various developmental stages resonates with our motivation. Their suggestion to make the dataset available in a public data repository for other researchers is well taken. We are committed to data sharing, and we are creating a dedicated platform to make our research data accessible to the scientific community. Given its size and complexity, the dataset is currently available from the corresponding authors upon reasonable request.

      Weaknesses:

      There are several primary weaknesses, described below:

      Sex was not considered in the analyses.

      The number of experimental animals of each sex is not clearly represented in the paper but is buried in supplemental tables, and the Ns for the RNAseq are unclear. No analyses were done to examine sex differences in male/female DS or WT animals with or without treatment. Body measurements vary greatly by sex, but this was not taken into consideration during assessments. As such, there is a high amount of variability within each cohort measured for body assessments (tibia, body weight, skeletal development, etc.). Supplemental table 14 had the list of each animal, but not collated by sex, genotype or treatment, making it difficult to assess the strength of each measurement.

      Our study primarily concentrated on providing a holistic understanding of the impact of trisomy and GTE-EGCG treatment on Down syndrome, and was not explicitly designed to investigate sexual dimorphism. However, instead of reporting on only one sex and thereby obviating sex as a source of variation, as in previously published studies, we decided to include both male and female mice within the study design to represent a more realistic portrayal of the nature of Down syndrome in a heterogeneous population. By encompassing both sexes, we aim to better capture the variability in Down syndrome.

      As we acknowledge the significance of sex bias in scientific research, we considered performing post hoc analyses to test for sexual dimorphism, but found that our dataset was underpowered to obtain reliable results, since our experiments were not designed a priori to investigate this question and the sample sizes for each sex separately were not large enough. Nevertheless, in response to the reviewer’s comment, we have taken specific steps to improve the representation of sex-related information and to enhance the clarity of our manuscript.

      First, we have redesigned all figures using empty and filled symbols to distinguish male from female mice within each analysis, giving readers an immediate sense of the sex distribution in each experimental group. Moreover, we have modified Supplementary Table 1 to offer a comprehensive breakdown of the number of male and female mice for each test, along with their respective genotypes and treatment groups. This table aims to make the sample size and sex distribution within our study as transparent as possible for our readers. While we acknowledge that our study lacked the statistical power to perform a detailed sex-based analysis, the visual representation of sex in our data shows which systems are mainly affected by sexual dimorphism. This evidence can guide future investigations designed specifically to study sex effects in particular systems or structures.

      Key results are not clearly depicted in the main figures

      Rigorous assessment of each figure and the clarity of the figure to convey the results of the analysis needs to be performed. Many of the figures do not clearly represent the findings, with authors heavily relying on supplemental figures to present details to explain results. Figure legends do not adequately describe figures; rather, they are limited to describing how the analysis is performed. For example, LDA plots in Figure 4 do not clearly convey the results of metabolite analysis.

      Overall, the amount of data presented here is overwhelming, making it difficult to interpret the findings. Some assessments that do not add to the overall paper need to be removed. Clarifying the text, figures and trimming the supplement to represent the data in a manner that is easily understood will improve the readability of the paper. For example, perhaps measures which are not strongly impacted by genotype could be moved to the supplement, because they are not directly relevant to the question of whether GTE-EGCG reverses the impact of trisomy on the measures.

      As rightly pointed out by the reviewers, the vast amount of data generated by our holistic and longitudinal approach is one of the primary strengths, but also an important challenge in our study. Our dataset encompasses a comprehensive assessment of the effects of treatment and genotypic alterations at multiple time points during growth and aging. This multi-dimensional evaluation is pivotal to our research, and relegating data to supplementary material would restrict access to this holistic understanding. Our aim is to provide readers with a complete view of the complex interactions we have explored, and retaining this data in the main text is essential to uphold the integrity of our work.

      Indeed, we specifically chose to submit our manuscript to eLife because this journal allows readers to access supplemental information directly from the text and figures of the main manuscript, which best aligns with our approach to data presentation. The eLife format permits us to offer readers a quick and informative overview of all the data in the main figures using multivariate techniques such as Linear Discriminant Analysis or Principal Component Analysis. We then supply more detailed analyses in the supplementary figures for readers who wish to delve deeper into specific aspects. Furthermore, although certain figures are designated as supplementary, we want to emphasize that every result is comprehensively described in the main text.

      Acknowledging the concerns raised about the density of our paper and the potential challenges in interpreting the findings, we have conducted a thorough review of the text and figure legends. We have made revisions with the goal to enhance clarity and readability. We have made dedicated efforts to ensure that readers can readily grasp the significance of our results and appreciate the intricacies of our findings. We firmly believe that with these revisions, our chosen approach is the most effective means of presenting the richness of our data and maintaining the integrity of our findings.

      Lack of clarity in the behavioral analyses

      Behavioral assessments are not clearly written in the methods. For example, for the novel object recognition task, it isn't clear how preference was calculated. Is this simply the percent of time spent with the novel object, or is this a relative measure (novel:familiar ratio)? This matters because if it is simply the percent of time, the relevant measure is to compare each group to 50% (the absence of a preference). The key measures for each test need to be readily distinguished from the control measures.

      There are also many dependent behavioral measures. For example, speed and distance are directly related to each other, but these are typically reported as control measures to help interpret the key measure, which is the anxiety-like behavior. Similarly, some behavioral tests were used to represent multiple behavioral dimensions, such as anxiety and arousal. In general, the measurements of arousal seem atypical (speed and distance are typically reported as control measures, not measures of arousal). Similarly, measures of latency during training would not typically be used as a measure of long-term memory but instead reported as a control measure to show learning occurred. LDA analysis requires independence of the measures, as well as normality. It does not appear that all of the measures fed into this analysis would have met these assumptions, but the methods also do not clearly describe which measures were actually used in the LDA.

      We agree with the reviewers’ concerns about the clarity of our behavioral analyses, and we have added information to the methods section to clarify the procedures. Specifically, for SPSN, social approach was recorded as time spent close to STR1, and a preference ratio was calculated as Pref = 100 × (time close to STR1) / (time close to STR1 + time close to Empty). Social recognition memory was scored as preference towards STR2 and calculated as Pref = 100 × (time close to STR2) / (time close to STR1 + time close to STR2). For NOR, preference for the novel object was calculated as Pref = 100 × (time with novel object) / (time with familiar object + time with novel object).
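      All three preference scores share one form: the percentage of exploration time directed at the target stimulus. A minimal sketch, with a hypothetical function name and made-up times. Because the score is a share of total time, a value of 50 indicates no preference, which is the reference level the reviewers suggested comparing each group against.

```python
def preference(target_time, other_time):
    """Percent of total exploration time spent on the target stimulus.

    target_time: time near STR1 (social approach), STR2 (social memory),
    or the novel object (NOR); other_time: time near the alternative.
    """
    total = target_time + other_time
    if total <= 0:
        raise ValueError("no exploration recorded")
    return 100.0 * target_time / total


# Hypothetical NOR trial: 30 s with the novel object, 20 s with the familiar one.
print(preference(30.0, 20.0))  # 60.0, i.e. above the 50% no-preference level
```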

      With regard to the different variables reported for the behavioral protocols, we agree that some measures, such as path length and speed, can be used as control measures. For example, in an open field test, path length is an important control measure to assess whether an animal is engaged in the task. However, if an animal is actively moving, the distance covered may or may not correlate with the time the mouse spends in the center of the open field. Using distance covered as a measure of general arousal and time spent in the center as a measure of anxiety-related behavior allows a more nuanced evaluation of animal behavior. For instance, two animals spending similar amounts of time in the center may differ in the distance they cover. In this scenario, we would argue that anxiety-related behavior (defined as exploring the center of an open field) does not capture the behavioral difference between the two animals, whereas arousal clearly distinguishes them.

      Regarding the PA task and the use of latency during training, we agree that latency during training is typically used as a control measure to show that learning occurred. However, our study involved testing animals at two distinct time points. Contextual fear conditioning creates very robust memory traces that can persist for weeks or even months, so the starting premise is very different when the test is repeated. Initially, the animals were experimentally naïve and had not yet experienced a foot shock, leading to rapid entry into the dark box. However, after the first CS-US presentation, a robust and persistent contextual fear memory trace is formed. Therefore, the latency observed in the second training phase of the PA essentially reflects long-term contextual fear memory, which is robustly displayed in WT animals but less so in treated WT and TS animals. We have included this clarification in the methods and results sections.

      Finally, we want to thank the reviewer for noticing the error in the LDAs, as the analysis was indeed performed including dependent variables for some systems. We have re-evaluated the LDAs for the behavioral tests and tibia microarchitecture tests, excluding dependent variables. As a result, the text and significance levels have been adjusted accordingly. To enhance transparency and clarity, we have included Supplementary Table S21, which precisely outlines the variables included in each LDA.

      Unclear value of RNAseq

      RNAseq was performed in cerebellum, a relatively spared region in DS pathology at an early time point in disease. Further, the expression of 125 genes triplicated in DS was shown in a PCA plot to highly overlap with WT, indicating that there are minimal differences in gene expression in these genes. If these genes are not critical for cerebellar function, perhaps this could account for the lack of differences between WT and Ts65Dn mice. If the authors are interested in performing RNAseq, it would have made more sense to perform this in hippocampus (to compare with metabolites) and to perform more stringent bioinformatic analysis than assessment by PCA of a limited subset of genes. Supplementary Table S14, which shows the differentially expressed genes, appears to be missing from the manuscript and cannot be evaluated. Additionally, the methods of the RNAseq are not sufficiently described and lack critical details. For example, what was the normalization performed, and which groups were compared to identify differentially expressed genes? It would also be worthwhile to describe how animals were identified for RNAseq-were those animals representative of their groups across other measures?

      We acknowledge the reviewers' comments on the RNAseq analysis and would like to provide additional insight into our rationale and choices for this analysis. The primary aim of our RNAseq analysis was to offer supplementary evidence supporting the broader context of our paper. Rather than focusing on specific genes, we aimed to assess potential alterations in the transcription of genes triplicated in the mouse model and to explore differentially expressed genes across the entire genome. Therefore, we conducted a global analysis of the triplicated genes using PCA and analyzed the differentially expressed genes across the entire genome, as shown in Supplementary Table S14. The table was originally submitted as a separate Excel file, but apparently it did not reach the reviewers. We have contacted the eLife editorial office to ensure its inclusion in the current version. Furthermore, we have modified the text to clarify that both the triplicated genes and the entire genome were analyzed.

      Regarding the use of cerebellum instead of hippocampus, we agree with the reviewers that the hippocampus is a major tissue of interest in the study of Down Syndrome, given its central role in cognition. Trisomic patients, however, also display other typical features, such as a delay in the acquisition of motor skills. Here we decided to focus on the cerebellum because it is primarily associated with the locomotor system but also plays a role in cognitive functions such as language processing and memory. Furthermore, at the time of the RNAseq analysis the mice were 8 months old, equivalent to the adult human stage, and previous studies have shown transcriptomic alterations in this tissue in this mouse model (Olmos-Serrano et al., 2016; Saran et al., 2003).

      The lack of observable differences between WT and Ts65Dn mice in our PCA analysis may be attributed to several factors, as discussed in our article. First, the high variability within each group, inherent to the complexity of DS, may obscure inter-group differences. Additionally, the subtlety of gene expression differences between WT and trisomic mice in the set of triplicated genes, as suggested by other transcriptomic studies of DS (Aït Yahya-Graison et al., 2007; Lyle et al., 2004; Olmos-Serrano et al., 2016; Saran et al., 2003), may contribute to the limited distinctions observed. Furthermore, regarding treatment effects, the timing of the RNAseq analysis should be considered, since it was conducted at the endpoint, three months after treatment cessation. This timing could imply that the effects of the drug are not persistent and that no lasting molecular memory is formed and maintained.

      Nevertheless, we appreciate the reviewers' constructive comments and acknowledge the potential for more stringent bioinformatic analyses. While our intention was to provide an initial, global perspective, we are eager to support further investigations that delve deeper into the complexities of DS-related molecular mechanisms. The dataset is therefore available upon request to other researchers who wish to explore more specific questions.

      Finally, we have updated the methods section of the article to offer more detailed information on RNAseq processing and analysis. We have also clarified that all the surviving mice were included in the analysis.

      Recommendations for the authors:

      (1) Please add power calculations for each of the assessments.

      We would like to clarify that we had already conducted power calculations as part of the initial planning and design phase of our study. After data acquisition and analysis, we have utilized appropriate statistical methods to interpret the results based on the data we have collected. Given that we had conducted a priori power calculations prior to data collection and that our analysis is based on the acquired data, we do not see the added value in including post hoc power calculations. Our primary focus has been on performing the correct statistical analyses to accurately interpret the results and draw meaningful conclusions.

      (2) The Introduction includes excessive references for individual statements. For instance, lines 67-73 consist solely of references for one statement, and lines 74-76 of references for a second statement in the same sentence.

      We have removed redundant references.

      (3) Introduction, lines 136-146: gene names need to be spelled out, not just the IDs. Were these studies done in human or mouse models of DS?

      We have spelled out the names of the genes.

      (4) Why were brain volume and brain structure size normalized to body weight? This is not clearly explained.

      The choice to normalize brain volume and brain structure size to body weight was a deliberate decision made to address potential confounding factors in our study. Trisomic (TS) mice are generally smaller than their wild-type (WT) counterparts, and similar considerations may apply to sex-related differences in body size. Without normalization, measurements of brain volume and structure size could be misleading, as they might reflect differences in overall body size rather than the specific aspects of brain structure that we aimed to investigate. We have clarified this in the methods section.

      (5) In cognitive tests, some of the WT data presented in Figure 3 do not match the supplemental findings. Again, power calculations may indicate that a higher number of WT mice is needed to clarify this discrepancy.

      We appreciate the reviewers' observation regarding the disparities between the data presented in Figure 3 and the supplemental figures. We would like to clarify that these variations are a result of the distinct analytical approaches employed in the two sets of data.

      In Figure 3 and all main figures, the data were analyzed using multivariate tests, which consider multiple variables simultaneously and are particularly suited for investigating the collective impact of multiple factors. Conversely, the results shown in the supplementary figures were derived from univariate tests, which focus on individual variables and are well-suited for addressing specific questions related to each variable in isolation. The discrepancies between the data in the main figures and the supplementary figures can be attributed to the differences in the analytical methods chosen.

      As for the suggestion of conducting power calculations to address the observed differences, we believe that the differences in data are inherent to the distinct analytical strategies and the specific research questions each analysis intended to answer. Power calculations may not be the most suitable approach in this context, as they pertain to sample size planning for hypothesis testing and may not reconcile the inherent dissimilarity between multivariate and univariate analyses.

      Aït Yahya-Graison, E., Aubert, J., Dauphinot, L., Rivals, I., Prieur, M., Golfier, G., . . . Potier, M. C. (2007). Classification of human chromosome 21 gene-expression variations in Down syndrome: impact on disease phenotypes. Am J Hum Genet, 81(3), 475-491. https://doi.org/10.1086/520000

      Lyle, R., Gehrig, C., Neergaard-Henrichsen, C., Deutsch, S., & Antonarakis, S. E. (2004). Gene expression from the aneuploid chromosome in a trisomy mouse model of down syndrome. Genome Res, 14(7), 1268-1274. https://doi.org/10.1101/gr.2090904

      Olmos-Serrano, J. L., Kang, H. J., Tyler, W. A., Silbereis, J. C., Cheng, F., Zhu, Y., . . . Sestan, N. (2016). Down Syndrome Developmental Brain Transcriptome Reveals Defective Oligodendrocyte Differentiation and Myelination. Neuron, 89(6), 1208-1222. https://doi.org/10.1016/j.neuron.2016.01.042

      Saran, N. G., Pletcher, M. T., Natale, J. E., Cheng, Y., & Reeves, R. H. (2003). Global disruption of the cerebellar transcriptome in a Down syndrome mouse model. Hum Mol Genet, 12(16), 2013-2019. https://doi.org/10.1093/hmg/ddg217

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      In this study, Yan et al. investigate the molecular bases underlying mating type recognition in Tetrahymena thermophila. This model protist possesses a total of 7 mating types/sexes and mating occurs only between individuals expressing different mating types. The authors aimed to characterize the function of mating type proteins (MTA and MTB) in the process of self- and non-self recognition, using a combination of elegant phenotypic assays, protein studies, and imaging. They showed that the presence of MTA and MTB in the same cell is required for the expression of concanavalin-A receptors and for tip transformation, two processes that are characteristic of the costimulation phase that precedes cell fusion. Using protein studies, the authors identify a set of additional proteins of varied functions that interact with MTA and MTB and are likely responsible for the downstream signaling processes required for mating. This is a description of a fascinating self- and non-self-recognition system and, as the authors point out, it is a rare example of a system with numerous mating types/sexes. This work opens the door for the further understanding of the molecular bases and evolution of these complex recognition systems within and outside protists.

      The results shown in this study point to the unequivocal requirement of MTA and MTB proteins for mating. Nevertheless, some of the conclusions regarding the mode of functioning of these proteins are not fully supported and require additional investigation.

      Strengths:

      (1) The authors have established a set of very useful knock-out and reporter lines for MT proteins and extensively used them in sophisticated and well-designed phenotypic assays that allowed them to test the role of these proteins in vivo.

      (2) Despite their apparent low abundance, the authors took advantage of a varied set of protein isolation and characterization techniques to pinpoint the localization of MT proteins to the cell membrane, and their interaction with multiple other proteins that could be downstream effectors. This opens the door for the future characterization of these proteins and further elucidation of the mating type recognition cascade.

      Weaknesses:

      The manuscript is structured and written in a very clear and easy-to-follow manner. However, several conclusions and discussion points fall short of highlighting possible models and mechanisms through which MT proteins control mating type recognition:

      (1) The authors dismiss the possibility of a "simple receptor-ligand system", even though the data do not exclude this possibility. The model presented in Figure 2 S1, on which the authors based their hypothesis, assumes the independence of MTA and MTB proteins in the generation of the intracellular cascade. However, the results presented in Figure 2 show that both proteins are required to be active in the same cell. Coupled with the fact that MTA and MTB proteins interact, this is compatible with a model where MTA would be a ligand and MTB a receptor (or vice versa), and could thus form a receptor-ligand complex that could potentially be activated by a non-cognate MTA-MTB receptor-ligand complex, leading to an intracellular cascade mediated by the identified MRC proteins. As it stands, it is not clear what the proposed working model is, and it would be very beneficial for the reader if the authors clarified their point of view on this or other types of models.

      We are very grateful that Reviewer #1 proposed the possibility that MTA and MTB form a receptor-ligand complex, with one acting as the ligand and the other as the receptor. We also considered this hypothesis when asking how the MTRC functions. However, our current results do not support this idea. For instance, if MTA were a ligand and MTB a receptor, we would expect a mating signal upon treatment with MTAxc protein, but not with MTBxc. Contrary to this expectation, our experiments revealed that MTAxc and MTBxc exhibit very similar effects (Figure 5, green and blue), and that their combined treatment produces a stronger effect (Figure 5, teal). This suggests that both proteins have mixed functions. (We incorporated this discussion into the revised version [line 120-121, 240-244].) Unfortunately, our current knowledge does not provide a detailed molecular mechanism for this intricate system. We are actively investigating the protein structures of MTA, MTB, and the entire MTRC, hoping to gain deeper insights into the molecular functions of MTA and MTB.

      Additionally, we realized that the expression used in the previous version, “simple receptor-ligand model”, was not clearly defined. As Reviewer #1 pointed out, in this section we examined whether the individual MTA and MTB proteins act as a receptor-ligand pair. We consider this the simplest possibility and therefore a null hypothesis for Tetrahymena mating-type recognition. We have clarified this in the revised version (line 90-91, 104-106). Based on this section, we propose that MTA and MTB may form a complex that serves as a recognizer, functioning as both ligand and receptor (line 117-118).

      (2) The presence of MTA/MTB proteins is required for costimulation (Figure 2), and supplementation with non-cognate extracellular fragments of these proteins (MTAxc, or MTBxc) is a positive stimulator of pairing. However, alone, these fragments do not have the ability to induce costimulation (Figure 5). Based on the results in Figures 5 and 6 the authors suggest that MT proteins mediate both self and non-self recognition. Why do MTAxc and MTBxc not induce costimulation alone? Are any other components required? How to reconcile this with the results of Figure 2? A more in-depth interpretation of these results would be very helpful, since these questions remain unanswered, making it difficult for the reader to extract a clear hypothesis on how MT proteins mediate self- and non-self-recognition.

      Several factors could contribute to the inability of MTA/Bxc to induce costimulation. It is highly likely that additional components are necessary, given that MTA/B form a protein complex with other proteins. Moreover, expressing MTA/Bxc in insect cells rather than in Tetrahymena might result in differences in post-translational modifications. The proteins are also presented differently: on the Tetrahymena membrane they are arranged regularly and concentrated in a small area, whereas MTA/Bxc is randomly dispersed in the medium. The former arrangement could be more efficient. If a threshold must be exceeded to induce a costimulation marker, MTA/Bxc may fail to reach it. Many more studies are needed to fully answer this question. We have acknowledged this limitation in the revised version (line 244-248).

      Reviewer #2:

      This manuscript reports the discovery and analysis of a large protein complex that controls mating type and sexual reproduction of the model ciliate Tetrahymena thermophila. In contrast to many organisms that have two mating types or two sexes, Tetrahymena is multi-sexual with 7 distinct mating types. Previous studies identified the mating type locus, which encodes two transmembrane proteins called MTA and MTB that determine the specificity of mating type interactions. In this study, mutants are generated in the MTA and MTB genes and mutant isolates are studied for mating properties. Cells missing either MTA or MTB failed to co-stimulate wild-type cells of different mating types. Moreover, a mixture of mutants lacking MTA or MTB also failed to stimulate. These observations support the conclusion that MTA and MTB may form a complex that directs mating-type identity. To address this, the proteins were epitope-tagged and subjected to IP-MS analysis. This revealed that MTA and MTB are in a physical complex, and also revealed a series of 6 other proteins (MRC1-6) that together with MTA/B form the mating type recognition complex (MTRC). All 8 proteins feature predicted transmembrane domains, three feature GFR domains, and two are predicted to function as calcium transporters. The authors went on to demonstrate that components of the MTRC are localized on the cell surface but not in the cilia. They also presented findings that support the conclusion that the mating type-specific region of the MTA and MTB genes can influence both self- and non-self-recognition in mating.

      Taken together, the findings presented are interesting and extend our understanding of how organisms with more than two mating types/sexes may be specified. The identification of the six-protein MRC complex is quite intriguing. It would seem important that the function of at least one of these subunits be analyzed by gene deletion and phenotyping, similar to the findings presented here for the MTA and MTB mutants. A straightforward prediction might be that a deletion of any subunit of the MRC complex would result in a sterile phenotype. The manuscript was very well written and a pleasure to read.

      Thank you for the valuable comments and suggestions. We are currently constructing deletion strains for these genes. So far, we have obtained ΔMRC1-3 and MRC4-6 knockdown strains. Our preliminary observations indicate that ΔMRC1-3 strains are unable to mate. However, we prefer not to include these results in the current manuscript, as we believe that more comprehensive studies are still needed.

      Reviewer #3:

      The authors describe the role, location, and function of the MTA and MTB mating type genes in the multi-mating-type species T. thermophila. The ciliates are an important group of organisms for studying the evolution of mating types, as they are one of the few groups in which more than two mating types evolved independently. In the study, the authors use deletion strains of the species to show that both mating-type genes located in each allele are required in both mating individuals for successful matings to occur. They show that the proteins are localized in the cell membrane, not the cilia, and that they interact in a complex (MTRC) with a set of 6 associated (non-mating type-allelic) genes. This complex is furthermore likely to interact with a cyclin-dependent kinase complex. It is intriguing that T. thermophila has two genes that are allelic and that are both required for successful mating. This coevolved double recognition has to my knowledge not been described for any other mating-type recognition system. I am not familiar with experimental research on ciliates, but as far as I can judge, the experiments appear well performed and mostly support the interpretation of the authors with appropriate controls and statistical analyses.

      The results show clearly that the mating type genes regulate non-self-recognition, however, I am not convinced that self-recognition occurs leading to the suppression of mating. An alternative explanation could be that the MTA and MTB proteins form a complex and that the two extracellular regions together interact with the MTA+MTB proteins from different mating types. This alternative hypothesis fits with the coevolution of MTA and MTB genes observed in the phylogenetic subgroups as described by Yan et al. (2021 iScience). Adding MTAxc and/or MTBxc to the cells can lead to the occupation of the external parts of the full proteins thereby inhibiting the formation of the complex, which in turn reduces non-self interactions. Self-recognition as explained in Figure 2S1 suggests an active response, which should be measurable in expression data for example. This is in my opinion not essential, but a claim of self-recognition through the MTA and MTB should not be made.

      We express our gratitude to Reviewer #3 for proposing the occupation model and have incorporated this possibility into the manuscript. We believe that occupation may be the molecular mechanism through which self-recognition negatively regulates mating. If mating-type proteins of the same type interact physically, but this interaction blocks the recognition machinery rather than initiating mating, it can be considered a form of self-recognition. This aligns with the observation that strains expressing MTA/B6 and MTB2 mate normally with WT cells of all mating types except VI and II (line 203-204). A concise discussion of this topic is included in the manuscript (line 288-293, 659-661). We are actively investigating the downstream aspects of mating-type recognition and hope to provide further insights into this question soon.

      The authors discuss that T. thermophila has special mating-type proteins that are large, while those of other groups are generally small (lines 157-160 and discussion). The complex formed is very large and in the discussion, they argue that this might be due to the "highly complex process, given that there are seven mating types in all". No argument is given for why large is more complex, whether this is complex, and whether more mating types require more complexity. In basidiomycete fungi, many more than 7 mating types exist, and the homeodomain genes involved in mating types are relatively small but highly diverse (Luo et al. 1994 PMID: 7914671). The mating types associated with GPCR receptors in fungi are arguably larger, but again their function is not that complex, and mating-type specific variations appear to evolve easily (Fowler et al 2004 PMID: 14643262; Seike et al. 2015 PMID: 25831518). The large protein complex formed is reminiscent of the fusion patches that develop in budding or fission yeasts. In these species, the mating type receptors are activated by ligand pheromones from the opposite mating type that induce polarity patch formation (see Sieber et al. 2023 PMID: 35148940 for a recent review). At these patches, growth (shmooing) and fusion occur, which is reminiscent (in a different order) of the tip transformation in T. thermophila. The fusion of two cells is in all taxa a dangerous and complex event that requires the evolution of very strict regulation, and the existence of a system like the MTRC and cyclin-dependent complex to regulate this process is therefore not unexpected. The existence of multiple mating types should not greatly complicate the process, as most of the machinery (except for MTA and MTB) is identical among all mating types.

      We are very grateful to Reviewer #3 for providing this insightful view and the relevant papers. In response to this feedback, we removed the sentences stating that “multiple mating types greatly complicate the process” from the revised version. Instead, we have added a discussion comparing the mating systems of yeasts and Tetrahymena (line 279-286).

      The Tetrahymena/ciliate genetics and lifecycle could be better explained. For a general audience, the system is not easy to follow. For example, the ploidy of the somatic nucleus with regards to the mating type is not clear to me. The MAC is generally considered "polyploid", but how does this work for the mating type? I assume only a single copy of the mating type locus is available in the MAC to avoid self-recognition in the cells. Is it known how the diploid origin reduces to a single mating type? This does not become apparent from Cervantes et al. 2013.

      In T. thermophila, the MIC (diploid) contains several mating-type gene pairs (mtGP, i.e., MTA and MTB) organized in a tandem array at the mat locus on each chromosome. In sexual reproduction, the new MAC of the progeny develops from the fertilized MIC through a series of genome editing events, and its ploidy increases to ~90 by endoreduplication. During this process, mtGP loss occurs, resulting in only one mtGP remaining on the MAC chromosome. The mating-type specificity of mtGPs on each chromosome within one nucleus becomes relatively pure through intranuclear coordination. After multiple assortments (possibly caused by MAC amitosis during cell fission), only mtGPs of one mating-type specificity exist in each cell, determining the cell’s mating type.

      Unfortunately, the exact mechanisms involved in this complicated process remain a black box. The loss of mating-type gene pairs is hypothesized to involve a series of homologous recombination events, but this has not been conclusively proven. Furthermore, there is no clear understanding of how intranuclear coordination and assortment are achieved. While we have made observations confirming these events, a breakthrough in understanding the molecular mechanism is yet to be achieved.

      We included more information in the revised version (line 672-683). Given the complexity of these unusual processes, we recommend an excellent review by Prof. Eduardo Orias (PMID: 28715961), which offers detailed explanations of the process and related concepts (line 685-686).

      Also, the explanation of co-stimulation is not completely clear (lines 49-60). Initially, direct cell-cell contact is mentioned, but later it is mentioned that "all cells become fully stimulated", even when unequal ratios are used. Is physical contact necessary? Or is this due to the "secrete mating-essential factors" (line 601)? These details are essential, for interpretation of the results and need to be explained better.

      We apologize; we did not realize that the term “contact” was not sufficiently precise. In Tetrahymena, physical contact is indeed necessary, but it can be temporary. Unlike yeast, Tetrahymena cells move rapidly, swimming randomly in the medium. Two cells may occasionally come into contact, but they quickly separate instead of sticking together. Even newly formed loose pairs often separate. As a result, one cell can contact and stimulate numerous others. We have clarified this aspect in the revised version (line 50-51, 57).

      Abstract and introduction: Sexes are not mating types. In general, mating types refer to systems in which there is no obvious asymmetry between the gametes beyond the compatibility system. When there is a physiological difference such as size or motility, the term sexes is used. This distinction is important because in many species mating types and sexes can occur together, with each sex able to have either of two mating types (when there are two) or one of multiple mating types. Examples are self-incompatibility (SI) in angiosperms, which the authors use as an example, and mating types in filamentous fungi. See Billiard et al. 2011 [PMID: 21489122] for a good explanation of this distinction and an argument for its importance.

      We have clarified the expression in the revised version (line 20, 38, 40, 45).

      Recommendations for the authors:

      Reviewer #1:

      I really enjoyed reading this manuscript and I think a few tweaks in the writing/data presentation could greatly improve the experience for the reader:

      (1) The information about your previous work in identifying downstream proteins CDK19, CYC9, and CIP1 (lines 170-173) could be directly presented in the introduction.

      We have moved this information to the introduction in the revised version (line 74-77).

      (2) For a reader who is not familiar with Tetrahymena, a few more details on how reporter and knock-out lines are generated would be beneficial.

      We introduced the knock-out method in Figure 2 – figure supplement 1B, the HA-tagging method in Figure 3A, and the MTB2-eGFP construction method in Figure 4E. In addition, we describe how co-stimulation markers were observed in Materials and Methods (line 404-410).

      (3) Figures 5 and 6: clarify the types of pairing and treatments that were done directly in the figure (eg. adding additional labels). As of now, it is necessary to go through the text and legend to try and understand in detail what was done.

      Cell types and treatments are now labeled directly in the revised figures (Figures 5 and 6).

      (4) The logical transition in lines 136-142 is hard to follow.

      We rewrote this paragraph in the revised version (lines 143-156). Additionally, we added a figure to illustrate the theoretical mating-type recognition model between WT cells and ΔCDK19, ΔCYC9 cells, MTAxc, MTBxc proteins, and ΔMTA, ΔMTB cells (Figure 2 – figure supplement 1D-G).

      (5) Lines 191-196: the fact that cells expressing multiple mating types can self goes against an active self-rejection system - if this is the case there should be self-rejection among all expressed mating types. Unless non-self recognition is an active process and self-recognition is simply the absence of non-self recognition. The authors briefly mention this in lines 263-265, but it would be interesting to expand and clarify this.

      We thank Reviewer #1 for noticing the interesting selfing phenotype of the MTB2-eGFP (MTVI background) strain, which we discuss further in the revised manuscript (line 298-306).

      (6) The authors briefly mention the possibility of different mating types using different recognition mechanisms (lines 255-260), based on the big differences in the size of the mating-specific region of MT proteins. Following this and the weakness nr. 2, I think it would be pertinent to gather and present more information on the properties and structures of the mating-type specific regions of MT proteins. Simple in silico analysis of motifs, structure, etc. could help clarify the role of these regions. It seems more parsimonious that MT proteins would have variable mating type specific regions that account for the recognition of the different mating types, and conserved cytoplasmic functions that could trigger a single downstream signaling cascade. It would be interesting to know the authors' opinion on this.

      We are very grateful for this suggestion. We are currently working on determining the 3D structure of the MTRC. The AlphaFold2 prediction indicates that the MT-specific region comprises seven globular β-sheet domains, resembling the structure of immunoglobulins (Ig). Our most recent cryo-EM results have revealed a structure at ~15 Å resolution that aligns well with the prediction. However, the main challenge lies in the low expression levels, both in Tetrahymena and in insect/mammalian cells. We anticipate obtaining more detailed results soon. We therefore prefer to present an MT recognition model in the future, once it is supported by robust experimental evidence, and have not discussed this aspect at length in the current manuscript.

      (7) Adding a figure including a proposed model, as well as expanding the discussion on the points presented as "weaknesses" would help clarify the ideas/hypothesis on how the mating recognition works. I think this would really elevate the paper and help highlight the results.

      We added a figure presenting the model and expanded the discussion of these weaknesses in the revised version (Figure 7, line 656-665).

      (8) Line 202-203: It is far-fetched to infer subcellular localization based on the data presented here; counterstaining with other dyes and antibodies specific to certain cell components, as well as negative control images, are required.

      Thanks for the suggestion. We attempted to stain cell components using various dyes and antibodies. Unfortunately, we found that the cell surface and cilia (especially oral cilia) very easily give false-positive signals. We think this issue seriously affects the credibility of such results. It may seem like splitting hairs, but we are trying to be precise.

      Nevertheless, we still believe that the mating-type proteins localize to the cell surface, because MTA-HA is identified among the isolated cell surface proteins.

      Regarding the negative control: in Fig. 4G, where an MTB2-eGFP cell is paired with a WT cell, no GFP signal is observed in the WT cell.

      (9) Lines 131: clarify the sentence - expression of Con-A receptors requires both MTA and MTB (MTA to receive the signal).

      We modified the sentence in the revised version (line 139-140).

      Reviewer #2:

      Minor points.

      (1) Line 194-196. Why are these cells able to self?

      These cells may be able to self because their MTRC contains heterotypic mating-type proteins (MTA6 and MTB2), which activate mating when they interact with another heterotypic MTRC (lines 207-208).

      (2) Line 232. What do the authors mean by the term synergistic effect here? Definition and statistics?

      Sorry about the confusion. The synergistic effect refers to the observation that the effects of MTAxc and MTBxc become stronger when they are used together. We clarified this in the revised version (line 232).

      (3) For Figure 4 panel D, are there antibodies that are available as a control for cilia? If so, then blotting this membrane would show that cilia-associated proteins are in the cilia preparation, which is a standard control for sub-cellular fractionation.

      Thanks for the suggestion. Unfortunately, we have not yet found a suitable cilia-specific antibody. Instead, we employed MS analysis to confirm the presence of cilia proteins in this sample (lines 195-196, Figure 4–Source data 1). We also observed the sample under the microscope, which directly revealed the presence of cilia (Figure 4C).

      (4) At least one reference cited in the text was not present in the reference list. The authors should go through the references cited to ensure that all have made it into the reference list.

      We have checked all the references.

      Some minor edits:

      (1) MTA and MTB are presented in both roman and italics (e.g. line 209) in the manuscript. Maybe all should be in italics? Or is this a distinction between the gene and the protein?

      The italicized form (MTA) refers to the gene, and the non-italicized form (MTA) refers to the protein.

      (2) Line 251. Change "achieving" to "achieve".

      We have corrected this word (line 266).

      Reviewer #3:

      Line 101. It would help to explain this expectation earlier in this paragraph.

      We explained the expectation in the revised version (line 92-97, 104-106).

      Line 109. How is a co-receptor different from the MTRC complex?

      We have rewritten the relevant sentences to enhance clarity (line 116-119). The molecular function of the MTRC complex could involve acting as a co-receptor or recognizer (functioning as both ligand and receptor). Based on the results presented in this section, we propose that MTA and MTB may function as a complex, but the confirmation of this hypothesis (MTRC) is provided in a later section. Therefore, we did not use the term “MTRC” here. These sentences briefly discuss the molecular function of this complex and explain why MTRC does not appear to function as a co-receptor.

      Line 251: which "dual approach" is referred to?

      The "dual approach" refers to both self and non-self recognition. We explained this in the revised version (lines 265-266).

      Line 258: what "different mechanisms" do the authors have in mind? Why would a different mechanism be expected? The different sizes could have evolved for (coevolutionary?) selection on the same mechanism.

      Sorry about the confusion. We clarified it in the revised version (line 269-278).

      What we intended to express is that we are uncertain whether the mating-type recognition model we discovered in T. thermophila is applicable to all Tetrahymena species due to significant differences in the length of the mating-type-specific region. We believe it is important to highlight this distinction to avoid potential misinterpretations in future studies involving other Tetrahymena species. At the same time, we look forward to future research that may provide insights into this question.

      Fig 2 C&D. Is it correct that these figures show the strains only after 'preincubation'? This is not apparent from the caption or the text. Additionally, the order of the images is very confusing. Indicate in the figures themselves (not just in the caption) what the subscript means.

      These panels are re-organized in the revised version (Fig. 2C&D). There are three kinds of pictures: “not incubated”, “WT pre-incubated by mutant” and “mutant pre-incubated by WT”.

      The methods used to generate Figure 5 are not clearly described. I understand that the obtained xc proteins were added to the cells, and then washed, after which a test was performed mixing WT-VI and WT-VII cells. Were both cells treated? Or only one of the strains? The explanation for the reused washing medium is not clear and the method is not indicated.

      Both cells are treated. More details are provided in the revised manuscript (line 230-231, 633-634, 637-639, Fig. 5). To prepare the starvation medium containing mating-essential factors, cells were starved in fresh starvation medium for ~16 hours. Subsequently, cells were removed by three rounds of centrifugation (1000 g, 3 min) (line 330-332).

      In general, the figures are difficult to understand without repeated inquiries in the captions. Give more information in the figures themselves.

      More information is introduced in the figure (Fig. 2C, Fig. 3B, Fig. 4A, B, D, Fig. 5 and Fig. 6).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This paper suggests applying intrinsically-motivated exploration to the discovery of robust goal states in gene regulatory networks.

      Strengths:

      The paper is well written. The biological motivation and the need for such methods are formulated extraordinarily well. The battery of experimental models is impressive.

      We thank the reviewer for sharing interest in the research problem and for recognizing the strengths of our work.

      Weaknesses:

      (1) The proposed method is compared to the random search. That says little about the performance with regard to the true steady-state goal sets. The latter could be calculated at least for a few simple ODE (e.g., BIOMD0000000454, `Metabolic Control Analysis: Rereading Reder'). The experiment with 'oscillator circuits' may not be directly interpolated to the other models.

      The lack of comparison to the ground truth goal set (attractors of ODE) from arbitrary initial conditions makes it hard to evaluate the true performance/contribution of the method. A part of the used models can be analyzed numerically using JAX, while there are models that can be analyzed analytically.

      "...The true versatility of the GRN is unknown and can only be inferred through empirical exploration and proxy metrics....": one could perform a sensitivity analysis of the ODEs, identifying stable equilibria. That could provide a proxy for the ground truth 'versatility'.

      We agree with the reviewer that one primary concern is to properly evaluate the effectiveness of the proposed method. However, as we move toward complex pathways, knowledge of the “true” steady-state goal sets is often unknown which is where the use of machine learning methods as the one we propose are particularly interesting (but challenging to evaluate).

      For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models. While we agree that it is still interesting to evaluate exploration methods on these simple models for checking their behavior, it is not clear how to scale this analysis to the targeted more complex systems.

      For systems whose true steady state distribution cannot be derived analytically or numerically, we believe that random search is a pertinent baseline as it is commonly used in the literature to discover the attractors/trajectories of a biological network. For instance, Venkatachalapathy et al. [1] initialize stochastic simulations at multiple randomly sampled starting conditions (which is called a kinetic Monte Carlo-based method) to capture the steady states of a biological system. Similarly, Donzé et al. [29] use a Monte Carlo approach to compute the reachable set of a biological network «when the number of parameters is large and their uncertain range is not negligible».

      (2) The proposed method is based on `Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning', which assumes state-action trajectories [s_{t_0:t}, a_{t_0:t}] (`2.1 Notations and Assumptions' in the IMGEP paper). However, the models used in the current work do not include external control actions; only the initial conditions can be set. It is not clear from the methods whether IMGEP was adapted to this setting, and how the exploration policy was designed without actual time-dependent actions. What does "...generates candidate intervention parameters to achieve the current goal...." mean, considering that interventions 'Sets the initial state...' as explained in Table 2?

      We thank the reviewer for asking for clarification. Indeed, the IMGEP methodology originates from developmental robotics scenarios, which generally focus on robotic sequential decision-making and therefore assume state-action trajectories, as presented in Forestier et al. [65]. However, in both cases, note that the IMGEP is responsible for sampling parameters which then govern the exploration of the dynamical system. In Forestier et al. [65], the IMGEP also only sets one vector at the start (denoted θ∈Θ), which specified the parameters of a movement (like the initial state of the GRN); the movement was then actually produced with dynamic motion primitives, which are dynamical system equations similar to GRN ODEs, so the two systems are mathematically equivalent. More generally, while in our case the "intervention" of the IMGEP (denoted i ∈ I) only controls the initial state of the GRN, future work could consider more advanced sequential interventions simply by setting, at the start, the parameters of an action policy π_i which could be called during the GRN's trajectory to sample control actions $a_{t+1} \sim \pi_i(a_{t+1} \mid s_{t_0:t+1}, a_t)$, where $s_t$ would be the state of the GRN. In practice this would also require setting only one vector at the start, so the exploration algorithm would remain the same and only the space of parameters would change, which illustrates the generality of the approach.
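      To make the "one vector at the start" setting concrete, the sketch below contrasts a random-search baseline with a minimal IMGEP-style loop in which the intervention is only the initial state of an ODE. It is an illustration under stated assumptions, not the authors' implementation: the two-gene toggle-switch ODE, its constants (ALPHA, HILL_N), the bounds LOW/HIGH, uniform goal sampling, nearest-outcome parent selection and Gaussian mutation (sigma) are all hypothetical choices made for the example.

```python
import random

# Hypothetical toy GRN: a two-gene mutual-repression "toggle switch".
# The constants below are illustrative assumptions, not taken from any BioModels entry.
ALPHA, HILL_N = 4.0, 2
LOW, HIGH = 0.0, 5.0  # assumed bounds on the initial-state intervention


def simulate(initial_state, steps=5000, dt=0.01):
    """Forward-Euler integration from the given initial state; the final
    state (t = steps * dt) is taken as the outcome reached by the system."""
    x, y = initial_state
    for _ in range(steps):
        dx = ALPHA / (1.0 + y ** HILL_N) - x
        dy = ALPHA / (1.0 + x ** HILL_N) - y
        x, y = x + dt * dx, y + dt * dy
    return (x, y)


def sq_dist(a, b):
    """Squared Euclidean distance between two outcomes."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def random_search(n_runs, rng):
    """Baseline: sample each initial state uniformly at random."""
    history = []
    for _ in range(n_runs):
        intervention = (rng.uniform(LOW, HIGH), rng.uniform(LOW, HIGH))
        history.append((intervention, simulate(intervention)))
    return history


def goal_exploration(n_runs, rng, n_init=10, sigma=0.5):
    """Minimal IMGEP-style loop: sample a goal in outcome space, reuse the
    intervention whose past outcome was closest to it, perturb it, run it."""
    history = random_search(n_init, rng)  # bootstrap with random interventions
    for _ in range(n_runs - n_init):
        goal = (rng.uniform(LOW, HIGH), rng.uniform(LOW, HIGH))
        parent, _ = min(history, key=lambda h: sq_dist(h[1], goal))
        # Gaussian mutation of the parent, clipped back into the allowed bounds
        intervention = tuple(
            min(HIGH, max(LOW, v + rng.gauss(0.0, sigma))) for v in parent
        )
        history.append((intervention, simulate(intervention)))
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    runs = goal_exploration(200, rng)
    endpoints = {(round(x, 2), round(y, 2)) for _, (x, y) in runs}
    print(f"{len(runs)} runs, {len(endpoints)} distinct rounded endpoints")
```

      Both loops sample a single intervention vector per run and never act during the trajectory; the only difference is how that vector is chosen, which mirrors the point made above. Swapping the initial-state vector for the parameters of a policy π_i would leave the loop itself unchanged.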

      (3) Fig 2 shows the phase space for (ERK, RKIPP_RP) without mentioning the typical full scale of ERK and RKIPP_RP. It is unclear whether the path from (0, 0) to (~0.575, ~3.75) at t=1000 is significant on the typical scale of this phase space.

      The purpose of Figure 2 is to illustrate an example of a GRN trajectory in transcriptional space, and to illustrate what "interventions" and "perturbations" can be in that context. To that end we have used the fixed initial conditions provided in BIOMD0000000647, replicating Figure 5 of Cho et al. [56]. While we are not sure what the reviewer means by the "typical" scale of this phase space, we would like to point the reviewer toward Figure 8, which shows examples of paths that indeed reach points further out in the same phase space (up to ~10μM in RKIPP_RP levels and ~300μM in ERK levels). However, while the paths displayed in Figure 8 are possible (and were discovered with the IMGEP), note that they may be "rarer" to occur naturally, in the sense that a large portion of the initial conditions tested with random search tend to converge toward smaller (ERK, RKIPP_RP) steady-state values similar to the ones displayed in Figure 2.

      (4) Table 2:

      a) Where is 'effective intervention' used in the method?

      b) In my opinion 'controllability', 'trainability', and 'versatility' are different terms. If their correspondence is important, I would suggest extending/enhancing the column "Proposed Isomorphism"; otherwise, it may be confusing.

      a) We thank the reviewer for pointing out that “effective intervention” is not explicitly used in the method. The idea here is that as we are exploring a complex dynamical system (here the GRN), some of the sampled interventions will be particularly effective at revealing novel unseen outcomes whereas others will fail to produce a qualitative change to the distribution of discovered outcomes. What we show in this paper, for instance in Figure 3a and Figure 4, is that the IMGEP method is particularly sample-efficient in finding those “effective interventions”, at least more than a random exploration. However we agree that the term “effective intervention” is ambiguous (does not say effective in what) and propose to replace it with “salient intervention” in the revised version.

      b) We thank the reviewer for highlighting some confusing terms in our chosen vocabulary, and we will try to clarify those terms in the revised version. We agree that controllability/trainability and versatility are not exactly equivalent concepts, as controllability/trainability typically refers to the amount to which a system is externally controllable/trainable whereas versatility typically refers to the inherent adaptability or diversity of behaviors that a system can exhibit in response to inputs or conditions. However, they are both measuring the extent of states that can be reached by the system under a distribution of stimuli/conditions, whether natural conditions or engineered ones, which is why we believe that their correspondence is relevant.

      I don't see how this table generalizes "concepts from dynamical complex systems and behavioral sciences under a common navigation task perspective".

      We propose to replace “generalize” with “investigate” in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Etcheverry et al. present two computational frameworks for exploring the functional capabilities of gene regulatory networks (GRNs). The first is a framework based on intrinsically-motivated exploration, here used to reveal the set of steady states achievable by a given gene regulatory network as a function of initial conditions. The second is a behaviorist framework, here used to assess the robustness of steady states to dynamical perturbations experienced along typical trajectories to those steady states. In Figs. 1-5, the authors convincingly show how these frameworks can explore and quantify the diversity of behaviors that can be displayed by GRNs. In Figs. 6-9, the authors present applications of their framework to the analysis and control of GRNs, but the support presented for their case studies is often incomplete.

      Strengths:

      Overall, the paper presents an important development for exploring and understanding GRNs/dynamical systems broadly, with solid evidence supporting the first half of their paper in a narratively clear way.

      The behaviorist point of view for robustness is potentially of interest to a broad community, and to my knowledge introduces novel considerations for defining robustness in the GRN context.

      We thank the reviewer for recognizing the strengths and novelty of the proposed experimental framework for exploring and understanding GRNs, and complex dynamical systems more generally. We agree that the results presented in the section “Possible Reuses of the Behavioral Catalog and Framework” (Fig 6-9) can be seen as incomplete along certain aspects, which we tried to make as explicit as possible throughout the paper, and why we explicitly state that these are “preliminary experiments”. Despite the discussed limitations, we believe that these experiments are still very useful to illustrate the variety of potential use-cases in which the community could benefit from such computational methods and experimental framework, and build on for future work.

      Some specific weaknesses, mostly concerning incomplete analyses in the second half of the paper:

      (1) The analysis presented in Fig. 6 is exciting but preliminary. Are there other appropriate methods for constructing energy landscapes from dynamical trajectories in gene regulatory networks? How do the results in this particular case study compare to other GRNs studied in the paper?

      We are not aware of methods other than the one proposed by Venkatachalapathy et al. [1] for constructing an energy landscape from an input set of recorded dynamical trajectories, although such methods may exist. We want to emphasize that any such method would depend on the input set of trajectories, and should therefore benefit from a set that is more representative of the diversity of behaviors that can be achieved by the GRN, which is why we believe the results presented in Figure 6 are interesting. As the IMGEP was able to find a higher diversity of reachable goal states (and corresponding trajectories) for many of the studied GRNs, we believe that similar effects should be observable when constructing the energy landscapes for these GRN models, with the discovery of additional or wider "valleys" of reachable steady states. We could add other case studies to the supplementary material of the revised version to support this argument.

      Additionally, it is unclear whether the analysis presented in Fig. 6C is appropriate. In particular, if the pseudopotential landscapes are constructed from statistics of visited states along trajectories to the steady state, then the trajectories derived from dynamical perturbations do not only reflect the underlying pseudo-landscape of the GRN. Instead, they also include contributions from the perturbations themselves.

      We agree that the landscape displayed in Fig. 6C integrates contributions from the perturbations on the GRN's behavior, and that these can shape the landscape in various ways, for instance by affecting which paths are accessible or the shape/depth of certain valleys. But we believe that qualitatively or quantitatively analyzing the effect of these perturbations on the landscape is precisely what is interesting here: it might help 1) understand how a system responds to a range of perturbations and visualize which behaviors are robust to those perturbations, and 2) design better strategies for manipulating those systems to produce certain behaviors.

      (2) In Fig. 7, I'm not sure how much is possible to take away from the results as given here, as they depend sensitively on the cohort of 432 (GRN, Z) pairs used. The comparison against random networks is well-motivated. However, as the authors note, comparison between organismal categories is more difficult due to low sample size; for instance, the "plant" and "slime mold" categories each only have 1 associated GRN. Additionally, the "n/a" category is difficult to interpret.

      We acknowledge that this part is speculative as stated in the paper: “the surveyed database is relatively small with respect to the wealth of available models and biological pathways, so we can hardly claim that these results represent the true distribution of competencies across these organism categories”. However, when further data is available, the same methodology can be reused and we believe that the resulting statistical analyses could be very informative to compare organismal (or other) categories.

      (3) In Fig. 8, it is unclear whether the behavioral catalog generated is important to the intervention design problem of moving a system from one attractor basin to another. The authors note that evolutionary searches or SGD could also be used to solve the problem. Is the analysis somehow enabled by the behavioral catalog in a way that is complementary to those methods? If not, comparison against those methods (or others e.g. optimal control) would strengthen the paper.

      We thank the reviewer for asking to clarify this point, which might not be clearly explained in the paper. Here the behavioral catalog is indeed used in a complementary way to the optimization method, by identifying a representative set of reachable attractors which are then used to define the optimization problem. For instance here, thanks to the catalog, we 1) were able to identify a “disease” region and several possible reachable states in that region and 2) use several of these states as starting points of our optimization problem, where we want to find a single intervention that can successfully and robustly reset all those points, as illustrated in Figure 8. Please note that given this problem formulation, a simple random search was used as an optimization strategy. When we mention more advanced techniques such as EA or SGD, it is to say that they might be more efficient optimizers than random search. However, we agree that in many cases optimizing directly will not work if starting from random or bad initial guess, and this even with EA or SGD. In that case the discovered behavioral catalog can be useful to better initialize this local search and make it more efficient/useful, akin to what is done in Figure 9.

      (4) The analysis presented in Fig. 9 also is preliminary. The authors note that there exist many algorithms for choosing/identifying the parameter values of a dynamical system that give rise to a desired time-series. It would be a stronger result to compare their approach to more sophisticated methods, as opposed to random search and SGD. Other options from the recent literature include Bayesian techniques, sparse nonlinear regression techniques (e.g. SINDy), and evolutionary searches. The authors note that some methods require fine-tuning in order to be successful, but even so, it would be good to know the degree of fine-tuning which is necessary compared to their method.

      We agree that the analysis presented in Figure 9 is preliminary, and thank the reviewer for the suggestion. We would first like to refer to other papers from the ML literature that have analyzed this issue more thoroughly, such as Colas et al. [74] and Pugh et al. [34], and have shown the interest of diversity-driven strategies as promising alternatives. Additionally, as suggested by the reviewer, we added a comparison to the CMA-ES algorithm in order to complete our analysis. CMA-ES is a self-adaptive evolutionary algorithm that is known to be better suited than SGD to escape local minima when the number of parameters is not too high (here we only have 15 parameters). However, our results showed that while CMA-ES explores the solution space more than SGD at the beginning of optimization, it also ultimately converges to a local minimum, similarly to SGD. The best solution converges toward a constant signal (of the target b) but fails to maintain the target oscillations, similar to the solutions discovered by gradient descent. We tried a few hyperparameter settings (initial mean and std) but always found similar results. We report the new results at https://developmentalsystems.org/curious-exploration-of-grn-competencies/tuto2.html (bottom cell, Figure 4). We suggest including the updated figure and caption in the revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      This is significant work, and you should certainly make the best case you can on the weaknesses discussed.

      We thank the reviewer for this positive comment on the significance of our work. The referee indicates as weaknesses (i) that the force involving the bent or straight αI-helix is not readily apparent, (ii) that the residue types were not varied in the helix mutations, and (iii) that the chemical shift perturbations are indirect observations.

      We think we have tried to address a large part of these questions by being very careful in our analysis and by the discussion in the manuscript. The following remarks may help to clarify this further:

      (i) The force emanating from the helix is visualized, e.g., in the PC2 loadings in Figure 6E of the PCA carried out on all observed SH3-SH2-KD resonances for all apo forms of the helix mutants. The SH2 residues identified by these loadings are in direct vicinity to the αI-helix. The respective PC2 scores correlate to 98% with the vmax of the catalytic reaction and to 94% with the PC1 scores found for imatinib-induced opening. Importantly, the structure of the KD with the straight αI-helix indicates that mostly residues F516, Q517, S520, and I521 would clash with the SH2 domain in a closed core (Figure 6F). Thus, the expected clashes are in direct vicinity of the SH2 residues identified by the PC2 loadings as correlated to vmax and imatinib-induced opening. These data are completely orthogonal and show that most of the force is coming from residues F516, Q517, S520, and I521 in the αI-αI' turn.

      (ii) We agree that we mainly used truncations of the αI-helix to study its involvement in activation. Point (i) makes it clear that a larger part of the αI-helix effects is caused by steric clashes of the residues in the αI-αI’ turn. In the latter region, we don’t expect strong amino acid type-specific effects besides excluded volume. Due to expression problems, we could not vary the helix length between residues 519 and 534. However, in this region we introduced the amino acid type mutation E528K. The latter showed a clear specific effect. Further amino acid type-specific effects may be possible in this region. However, we expect that the identified electrostatic E528-R479 interaction is one of the most important interactions in this region.

      (iii) We agree that chemical shift changes of individual resonances are often hard to interpret. However, we want to stress that our conclusions are all drawn from principal component analyses, which in all cases had as input well over 100, if not over 200, 1H-15N resonances. The first two principal components of these analyses are robust averages over many residues, which reveal general correlated structural trends.
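      The averaging argument can be illustrated with a small sketch: when many residues shift in a correlated way across samples, the PC1 score of each sample is a weighted sum over all residues, so noise on individual resonances largely cancels. The code below is only an illustration on synthetic data, not the analysis pipeline used for the Abl spectra; the samples-by-residues matrix layout, the pure-Python power iteration, and the noise level in the example are assumptions made for the sketch.

```python
import random


def first_principal_component(rows, iters=300):
    """Return (loadings, scores) of PC1 for a samples x features matrix,
    using power iteration on X^T X of the column-centred data X."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / p ** 0.5] * p  # unit-length start vector
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in X]  # X v
        w = [sum(scores[i] * X[i][j] for i in range(n)) for j in range(p)]  # X^T X v
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:  # degenerate input: no variance left to explain
            break
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in X]
    return v, scores


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    trend = [i / 7 for i in range(8)]  # hypothetical per-sample structural trend
    weights = [rng.gauss(0, 1) for _ in range(150)]  # hypothetical per-residue response
    # each residue responds to the shared trend plus independent noise
    data = [[t * w + rng.gauss(0, 0.05) for w in weights] for t in trend]
    _, pc1 = first_principal_component(data)
    print(f"|r(PC1 score, trend)| = {abs(pearson(pc1, trend)):.3f}")
```

      The sign of a principal component is arbitrary, which is why the example compares the absolute correlation of the PC1 scores with the underlying trend.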

      We assume that chemical shift deposition etc will be pursued.

      We are currently depositing a larger collection of our Abl data to the “Biological Magnetic Resonance Data Bank (BMRB)”, which includes the NMR chemical shift data of the present work. A ‘collection’ will be a new feature of the BMRB, and we are in discussion with their staff. We will provide the accession codes as soon as possible (probably within the next month) to be included into the final version of the manuscript. We have amended the Data Availability Section accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1) The overall discussion of the implications of the described allostery on kinase activation is provided through lenses of imatinib binding, which is used as an experimental trigger to disassemble the autoinhibited core. Can the authors elaborate in the Discussion on what event would play this role in the kinase catalytic cycle, communicating to helix I? Would dissociation of the myristate from the active site be hypothesized to be the first step in kinase activation? While I understand that certainty may be challenging to attain, it would be good to introduce some ideas into the Discussion.

      We appreciate the reviewer’s suggestions for the discussion and added the following text to the Conclusion section:

      "We have used here imatinib binding to the ATP-pocket as an experimental tool to disassemble the Abl regulatory core. Our previous analysis (Sonti et al., 2018) of the high-resolution Abl transition-state structure (Levinson et al., 2006) indicated that due to the extremely tight packing of the catalytic pocket, binding and release of the ATP and tyrosine peptide substrates is only possible if the P-loop and thereby the N-lobe move towards the SH3 domain by about 1–2 Å. This motion is of similar size and direction as the motion of the N-lobe observed in complexes with imatinib and other type II inhibitors (Sonti et al., 2018). From this we concluded that substrate binding opens the Abl core in a similar way as imatinib. The present NMR and activity data now clearly establish the essential role of the αI-helix both in the imatinib- and substrate-induced opening of the core, thereby further corroborating the similarity of both disassembly processes.

      Notably, the used regulatory core construct Abl83-534 lacks the myristoylated N-cap. Although we have previously demonstrated that the latter construct is predominantly assembled (Skora et al., 2013), the addition of the myristoyl moiety is expected to further stabilize the assembled conformation in a similar way as asciminib.

      Considering this mechanism, dissociation of myristoyl from the native Abl 1b core may be a first step during activation. However, it should be kept in mind that the Abl 1a isoform lacks the N-terminal myristoylation, and it is presently unclear whether other moieties bind to the myristoyl pocket of Abl 1a during cellular processes."

      2) Can the authors comment more on the differentiation between assembled conformations induced by type I inhibitor binding vs apo forms (or AMP-PNP and allosteric inhibitor) reported in Figure 3B? The differences are clearly identified by PCA but not sufficiently discussed.

      As indicated in the text, we think two structural effects are intermingled within PC2. Due to this admixture, it is hard to draw strong conclusions and we do not want to expand on this too much. We have slightly modified the respective paragraph (p. 7) as follows:

      "As the affected residues react differently to perturbations by type I inhibitors and truncation of the αI’-helix (Figure 3A, right), we attribute this behavior to two effects intermixed into the PC2 detection: (i) a minor rearrangement of the SH3/KD N-lobe interface caused by filling of the ATP pocket with type I inhibitors, which in contrast to the stronger N-lobe motion induced by type II inhibitors does not yet lead to core disassembly and (ii) a small rearrangement of the SH2/KD C-lobe interface caused by shortening and mutations of the αI-helix."

      3) The allosteric connection between active site inhibitor binding and the myristate/allosteric inhibitor binding has been observed in the past and noted before, in papers such as Zhang et al, Nature 2010. While the authors reference this paper, they do not acknowledge its specific findings or engage in a broader discussion of how their conclusions relate to this work.

      We have modified the beginning of the Conclusion section:

      "The allosteric connection between Abl ATP site and myristate site inhibitor binding has been noted before, albeit specific settings such as construct boundaries and the control of phosphorylation vary in published experiments. Positive and negative binding cooperativity of certain ATP-pocket and allosteric inhibitors has been observed in cellular assays and in vitro (Kim et al., 2023; Zhang et al., 2010). Furthermore, hydrogen exchange mass spectrometry has indicated changes around the unliganded ATP pocket upon binding of the allosteric inhibitor GNF-5 (Zhang et al., 2010). Here, we present a detailed high-resolution explanation of these allosteric effects via a mechanical connection between the kinase domain N- and C-lobes that is mediated by the regulatory SH2 and SH3 domains and involves the αI helix as a crucial element.

      Specifically, we have established a firm correlation between the kinase activity of the Abl regulatory core, the imatinib (type II inhibitor)-induced disassembly of the core, which is caused by a force $F_{\text{KD–N,SH3}}$ between the KD N-lobe and the SH3 domain, and a force $F_{\alpha\text{I,SH2}}$ exerted by the αI-helix towards the SH2 domain. The $F_{\alpha\text{I,SH2}}$ force is mainly caused by a clash of the αI-αI' loop with the SH2 domain. Both the $F_{\text{KD–N,SH3}}$ and $F_{\alpha\text{I,SH2}}$ forces act on the KD/SH2SH3 interface and may lead to the disassembly of the core, which is in a delicate equilibrium between assembled and disassembled forms. As disassembly is required for kinase activity, the modulation of both forces constitutes a very sensitive regulation mechanism. Allosteric inhibitors such as asciminib and also myristoyl, the natural allosteric pocket binder, pull the αI-αI' loop away from the SH2 interface, and thereby reduce the $F_{\alpha\text{I,SH2}}$ force and activity. Notably, all observations described here were obtained under nonphosphorylated conditions, as phosphorylation will lead to additional strong activating effects."

      4) Figure 6 could do a better job of providing an illustration of steric clashes.

      We have revised Figure 6, panel F, in order to better illustrate the steric clashes, and modified the legend accordingly.

      5) There is a typo in line 5 from the top on page 11 (dash missing from "83534" superscript).

      Thank you. This was fixed.

    1. Author Response

      Many thanks for handling our manuscript (eLife-RP-RA-2023-93968) entitled "Allosteric modulation of the CXCR4:CXCL12 axis by targeting receptor nanoclustering via the TMV-TMVI domain", by García-Cuesta et al. We are delighted to hear your willingness to consider our manuscript following appropriate revision. We have carefully read the referees' comments and have planned new experiments to address their specific queries.

      Reviewer #1 (Public Review)

      We will carefully review the computational methodology, in particular to justify the software and techniques used in this manuscript. We will also describe the method used to identify the pocket on the CXCR4 structure, as well as the workflow that connects the docking evaluation to the MD analyses. Additionally, we will conduct experiments to strengthen the results and address the specific feedback provided, ultimately improving the overall reliability of the study.

      Reviewer #2 (Public Review)

      Although the study began by titrating the compounds in migration experiments, we will add new kinetics and concentration titrations to these experiments. In addition, we will change the way in which we present the data from the single-molecule tracking experiments. We will add a representative video of each experimental condition and include some of the mean square displacement curves to support our analysis of the diffusion coefficient (D1-4), giving a more conclusive view of receptor clustering. Regarding the tumorigenesis experiments, we will include the individual data points and will try to perform kinetics with distinct concentrations of the drug.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript from Kayvanjoo et al examines the role of macrophages within the fetal liver beyond erythrocyte maturation. Using single-cell sequencing, high-resolution imaging, and inducible genetic deletion of yolk-sac (YS) derived macrophages, the authors demonstrate that heterogeneous fetal liver macrophages regulate erythrocyte enucleation, interact physically with fetal HSCs, and may regulate neutrophil accumulation in the fetal liver. The data as presented do not strongly support the authors’ conclusion that fetal macrophages in the liver regulate the HSC niche or granulopoiesis from HSCs.

      Fetal-derived resident tissue macrophages are increasingly implicated in regulation of adult tissue function and homeostasis, but considerably less is known regarding the function of fetal macrophages during development. Macrophages in the fetal liver have been shown to form erythroblastic islands, where they regulate erythrocyte maturation. Here, the authors performed single-cell sequencing on fetal liver macrophages (Cd11b-lo) to gain insight into heterogeneity and utilized previously published pre-Mac signatures from the YS to focus on YS-derived macrophages. These clusters were then further cross-referenced with surface protein expression as determined by multidimensional flow cytometry to hone in on a very specific subset of three groups of F4/80hi macrophages defined by multiple surface markers. Fate-mapping with three models (Tnfrsf11a-Cre - YS pMAC derived; Ms4a3Cre - FL monocyte derived; CXCR4-Cre-ERT2 - definitive HSC derived) revealed that three major subsets are all derived from YS pMACs.

      We thank the reviewer for the comments and have addressed all points below. If certain points were mentioned twice, we responded at the position where the point was raised the first time.

      However, the relative frequencies of these specific populations are not shown, and because the single-cell sequencing analysis goes through so many iterations of re-clustering that begin by focusing specifically on pMAC signatures, this result is not surprising.

      Probing gene expression within each of the three clusters revealed ligand expression suggesting cell-cell interactions, and cross-referencing with a fetal LT-HSC gene expression dataset revealed potential receptor-ligand interactions. Microscopic investigation of physical interactions between specific macrophage subsets and HSCs was not particularly convincing. In Figure 3C, for example, Cluster C is very difficult to visualize. It would again be helpful to know what the ratios are within the FL for each cluster. Data in Figure 3F are not well represented by data in Figure 3E.

      We showed frequencies after CODEX in the original manuscript (Fig. S3A, now Figure 4 - figure supplement 1A) since isolation of cells often induces an artifact, and relative frequencies after scRNA-seq experiments never represent the actual cell numbers present in situ. However, also the CODEX analysis has its weakness, especially in dense tissues, as the automated gating method may not catch every macrophage due to its star-shaped structure. Thus, we have now included the absolute numbers of macrophage subpopulations in Figure 7C. We have tried to improve the visualization of the clusters in Figure 3C (now Figure 4C) by zooming into a specific region. The Voronoi diagram is a powerful method that allows for an overall spatial visualization of cell distribution in large tissue pieces. In the high-resolution PDF that we provide, zooming into the PDF file should allow the reader to see each cluster in great detail.

      To improve the data on macrophage-HSC interactions, we have performed 3D reconstructions and quantified the distance between CD150+ and Iba1+ cells in 3D (new Figure 3C-E), as the thin cryosectioning used for CODEX is not suitable for reconstructing these interactions properly (see also lines 328-331). Thus, Figure 3E could not, and was not meant to, represent the data shown in Figure 3F (now Figure 4E and 4F). Figure 3E is only meant to show examples of all clusters sitting in proximity to CD150+ HSCs.

      Furthermore, deletion of YS pMAC-derived macrophages in the Tnfrsf11a-Cre X Spi1fl/fl model resulted in broad macrophage depletion, although the authors did not demonstrate this using the carefully refined phenotypes they had defined earlier in the manuscript. Nonetheless, the authors demonstrate that macrophage depletion did affect erythroid enucleation, as expected, and the authors also showed some effect of macrophage deletion on LT-HSC gene expression by bulk transcriptional analysis. These effects were relatively small, however, and this was clear in the absence of effects on hematopoiesis in vivo or HSC proliferation ex vivo. To further investigate the effects of macrophage deletion on downstream hematopoiesis, the authors re-assessed the myeloid compartment following macrophage deletion, and identified and specifically focused on an observed increase in neutrophils in response to macrophage depletion. Based on this increase, they tested HSC differentiation using a colony-forming assay, which showed a slight increase in GM colonies that is also reflective of a slight but insignificant increase in total colony-forming capability. The authors concluded that loss of fetal macrophages causes a reprogramming of HSCs to the granulocytic lineage. However, the colony-forming assay and subtle differences in gene expression are not sufficient to conclude that fetal HSCs have been reprogrammed towards the granulocytic lineage by macrophage deletion.

      We thank the reviewer for this comment and have improved the manuscript accordingly: We have performed the colony-forming assay again with n=5 embryos per genotype that were harvested on the same day, which resulted in a similar phenotype as before, with the differences in GM colonies now being significant. Further, we quantified the depletion of all macrophage subpopulations in the Tnfrsf11a-Cre X Spi1fl/fl model (Fig. 7C). To strengthen the point that the transient lack of macrophages when HSCs arrive in the fetal liver leads to their reprogramming, we included flow cytometry data from E16.5 and E18.5, where we still see an increase of neutrophils in the fetal liver despite the fact that macrophages are repopulating the empty niche (Fig. 7E, F). To show that this is a cell-intrinsic effect, we have performed adoptive transfer experiments supporting our claim that loss of macrophages reprograms HSCs toward the granulocytic lineage (Fig. 7H, I).

      Overall, there are some interesting pieces of data in this manuscript, including the classification of new subsets of macrophages in the liver, their fate-mapping to the YS, and gene expression analysis. However, the data as presented do not strongly support a role for these particular macrophage subsets in regulating HSCs or fetal hematopoiesis within the fetal liver niche. Although there may be specific subsets of fetal liver macrophages that more closely physically interact with HSCs, deletion of what appeared to be a vast majority of macrophages in the FL did not appear to affect cellularity of hematopoietic stem and progenitor cells in vivo, and was not shown to convincingly affect HSC function. The mechanism by which macrophage deletion affected granulopoiesis could be independent from HSCs, and would be interesting to further explore.

      We hope that with this new set of experiments we have been able to convince the reviewer of the importance of macrophages in the HSC niche.

      Reviewer #2 (Public Review):

      Using a single-cell omics approach combined with spatial proteomics and genetic fate mapping, Kayvanjoo et al found that fetal liver (FL) macrophages cluster into distinct yolk sac-derived subpopulations and that some of the HSCs in FL preferentially associate with one of the identified macrophage subpopulations. FLs lacking macrophages show a delay in erythropoiesis. The authors also try to identify a role of macrophages for HSCs function in FL, and claim that macrophages affect myeloid differentiation of HSCs. Experimental support for the function of macrophages on HSCs remains weak. Taken together, their data provide a precise map of FL macrophage subpopulations, which is novel and will serve the field well.

      We thank the reviewer for the positive assessment. We have now strengthened the data regarding the impact on granulopoiesis by performing additional CFU assays and adoptive transfers.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank the reviewers for their insightful comments on our manuscript. We have addressed the reviewers’ comments below and in the revised manuscript.

      Reviewer #1:

      Comment #1: The authors found differences in the initial spike doublet of action potentials between cortical neurons in experimental and control conditions (Figure 2e). The action potential firing frequency of the first two APs (instant firing frequency) of recorded neurons shall be quantified to investigate whether there are statistical differences between the action potential firing frequency in cortical neurons in different experimental groups versus control conditions.

      Response: As suggested by the reviewer, we have quantified the first interspike interval (ISI; time between the 1st and 2nd action potential). The data is included in Fig. 2h as well as in Fig. S3e and Table 1. The Results and Methods have also been updated accordingly.

      Comment #2: mTORS2215Y induced the largest changes in Ih current amplitudes in cortical neurons compared with other experimental conditions. Is HCN4 channel expression regulated by mTOR pathway activation, or could there be possible interactions between the HCN channel and the mTORS2215Y mutant protein?

      Response: Our previous findings using the RhebS16H mutation support the idea that increased expression of HCN4 channels is regulated by mTOR pathway activation. This is evidenced by its sensitivity to rapamycin (an mTOR inhibitor) and to expression of constitutively active 4E-BP1 (a translational repressor downstream of mTORC1). Since mTORS2215Y directly hyperactivates mTORC1 and there are no known interactions between HCN channels and mTORS2215Y, our data strongly suggest that abnormal HCN4 channel expression occurs via mTORC1 hyperactivation in this condition. We have revised our Discussion to make this point clearer.

      Comment #3: A comparison of the electrophysiological characteristics of cortical neurons in different experimental conditions in the present study and pathological neurons in human FCD reported in previous literature could be interesting. Inducing pathological gene mutations or knocking out key genes in the mTOR pathway in the rodent cortex: which approach could better model human FCD?

      Response: We agree with the reviewer and have added a new paragraph in the Discussion to compare our electrophysiology results to those of previous studies done on human FCDII and TSC cytomegalic neurons. With regard to the reviewer’s question about which of the two approaches in the rodent cortex (inducing pathological gene mutations or knocking out key genes in the mTOR pathway) would better model human FCD, our study emphasizes the importance of considering gene-specific mechanisms in FCDII. Thus, modeling the genetically distinct FCDIIs will require using gene-specific manipulations. We have revised our Discussion to include this point. That said, for phenotypes that are generalized across FCDII independent of the mTOR pathway gene involved, using pathogenic mutations of mTOR activators or knockout of negative mTOR regulators would likely both be appropriate models. Of note, as discussed in the manuscript, there are also technical factors to consider when choosing between a pathogenic gene mutation and a gene knockout (the latter of which would depend on the half-life of the proteins).

      Reviewer #2:

      Comment #1: The authors postulate that all the findings are dependent on mTORC1-related effects but don't assess whether some of the differences could be due to effects on mTORC2 signaling. mTORC2 is an important and poorly understood alternative isoform of mTOR (due to rictor binding) that has effects on distinct cell signaling pathways and in particular actin polymerization. This doesn't diminish the effects of the current analysis of mTORC1 but could explain genotypic differences in each variable. A few prior studies have assessed the role of mTORC2 in epileptogenesis and cortical malformations (Chen et al., 2019).

      Response: We agree with the reviewer and have revised our Discussion to include the possibility of mTORC2 contribution to the gene-specific phenotypic differences.

      Comment #2: The slice recordings were performed in the usual recording aCSF buffer conditions but there is no assessment of the role of amino acids or nutrients in the bath. While it is clear that valuable and viable acute slice recordings can be made in aCSF, the role of the mTOR pathway is to modulate cell growth in response to nutrient conditions. Thus, one variable that could be manipulated and assessed currently in this study is the levels of amino acids i.e., leucine and arginine added to the bath since DEPDC5 and TSC1 are responsive to ambient amino acid levels.

      Response: We thank the reviewer for this great suggestion, and we intend to pursue this as part of another study.

      Comment #3: The analysis concedes that the role of somatic mutations in cortical malformations may depend not only on genotypic effects but also on allelic load and cellular subtype affected by the mutation. Thus, it would be interesting to see if electroporation either at E14 or E16, thereby affecting a distinct pool of progenitors, would mitigate or accentuate differences between mTOR pathway genes.

      Response: We agree with the reviewer. This is a crucial experiment that we hope to perform in the future. We have also added a paragraph in our Discussion to address this important point.

      Comment #4: Treatment with rapamycin and zatebradine in each condition would have added to the strength of the findings to determine the mTOR-dependence and reversibility of HCN4 effects.

      Response: We previously demonstrated the mTORC1 dependence of HCN4 expression in the RhebS16H condition using rapamycin and expression of constitutively active 4E-BP1. 4E-BP1 is a translational repressor downstream of mTORC1. In the 4E-BP1 study, we used a conditional system to express 4E-BP1F113A (a mutation that resists inactivation by mTORC1) in adolescent mice while RhebS16H (and thus mTORC1 activation) was expressed embryonically. 4E-BP1F113A expression suppressed Ih current and HCN4 expression, suggesting that aberrant HCN4 expression can be reversed by decreasing mTORC1-regulated translation. Based on these data and the finding that rapamycin suppressed abnormal HCN4 expression, we postulate that increased HCN4 expression in the different gene conditions examined in the present study occurs via the mTORC1 pathway. However, we agree with the reviewer that treating each of the conditions with rapamycin would provide direct evidence of their mTORC1 dependence. Additionally, treating each condition with the HCN channel blocker zatebradine would also add strength to the findings. We have added a comment in the Discussion to acknowledge this point.

      Reviewer #1 (Recommendations For The Authors):

      Comment #1: The authors found increased frequency or amplitudes of spontaneous postsynaptic currents in different experimental cohorts. These data may not be sufficient to conclude increased synaptic excitability, because there are no pharmacological experiments to verify whether the recorded inward currents are excitatory or inhibitory postsynaptic currents. An alternative approach could be to analyze the decay time of spontaneous postsynaptic currents, since excitatory postsynaptic currents have faster decay times than inhibitory postsynaptic currents.

      Response: Thank you for the comment. We apologize for the lack of clarity and have added the following text in the Results to clarify: “To separate sEPSCs from spontaneous inhibitory postsynaptic currents (sIPSCs), we used an intracellular solution rich in K-gluconate to impose a low intracellular Cl- concentration and recorded at a holding potential of -70 mV, which is near the Cl- reversal potential. The 90%-10% decay time of the measured synaptic currents ranged between 4-8 ms in all conditions (mean ± SD: control: 4.9 ± 1.6; RhebY35L: 5.2 ± 1.4; mTORS2215Y: 7.4 ± 1.4; control: 6.8 ± 0.7; Depdc5KO: 7.4 ± 1.0; PtenKO: 8.1 ± 0.9; Tsc1KO: 7.4 ± 0.9), consistent with the expected decay time for sEPSCs and shorter than the decay time for sIPSCs (Kroon et al, 2019). The recorded synaptic currents were therefore considered to be sEPSCs.”

      Comment #2: There are typos of Depdc5 in the text and figure legends.

      Response: Thank you for noticing this error. We have corrected the typos in the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Zhu and colleagues aimed to clarify the importance of isoform diversity of PCDHg in establishing cortical synapse specificity. The authors optimized 5' single-cell sequencing to detect cPCDHg isoforms and showed that the pyramidal cells express distinct combinations of PCDHg isoforms. Then, the authors conducted patch-clamp recordings from cortical neurons whose PCDHg diversity was disrupted. In the elegant experiment in Figure 3, the authors demonstrated that the neurons expressing the same sets of cPCDHg isoforms are less likely to form synapses with each other, suggesting that identical cPCDHg isoforms may have a repulsive effect on synapse formation. Importantly, this phenomenon was dependent on the similarity of the isoforms present in neurons but not on the amount of proteins expressed.

      One of the major concerns in an earlier version was whether PCDHg isoforms, which are expressed at a much lower level than C-type isoforms, have true physiological significance. The authors conducted additional experiments to address this point by using PCDHg cKO and provided convincing data supporting their conclusion. The results from PCDHg C4 overexpression, showing no impact on synaptic connectivity, further clarified the importance of isoforms. I have no further concerns; however, I would like to point out that evidence for the necessity of the PCDHg isoforms is still lacking because most experiments were done by overexpression. It would be helpful for readers if the authors could add this point to the discussion.

      Thank you for the positive feedback on our work. We have now incorporated a discussion of the limitations associated with overexpression.

      Reviewer #2 (Public Review):

      This short manuscript by Zhu et al. describes an investigation into the role of gamma protocadherins in synaptic connectivity in the mouse cerebral cortex. First, the authors conduct a single-cell RNA-seq survey of postnatal day 11 mouse cortical neurons, using an adapted 10X Genomics method to capture the 5' sequences that are necessary to identify individual gamma protocadherin isoforms (all 22 transcripts share the same three 3' "constant" exons, so standard 3'-biased methods can't distinguish them). This method adaptation is an advance for examining individual gamma transcripts, and it is helpful to publish the method, the characterization of which is improved in this revised manuscript. The results largely confirm what was known from other approaches, which is that a few of the 19 A and B subtype gamma protocadherins are expressed in an apparently stochastic and combinatorial fashion in each cortical neuron, while the 3 C subtype genes are expressed ubiquitously. Second, using elegant paired electrophysiological recordings, the authors show that in gamma protocadherin-deficient cortical slices, the likelihood of two neurons in layers 2/3 being synaptically connected is increased. That suggests that gamma protocadherins generally inhibit synaptic connectivity in the cortex; again, this has been reported previously using morphological assays, but it is important to see it confirmed here with physiology. Finally, the authors use an impressive sequential in utero electroporation method to provide evidence that the degree of isoform matching between two neurons negatively regulates their reciprocal synaptic connectivity. These are difficult experiments to do, and while some caveats remain, the main result is consistent. Strengths include the impressive methodology and improved demonstration of the previously reported finding that gamma protocadherins work via homophilic matching to put a brake on synapse formation in the cortex. Weaknesses include the writing, which even in the revision fails to completely put the new results in context with prior work, which together has largely shown similar results; a still-incomplete characterization of a new alpha protocadherin KO mouse (a minor point, but it should still be addressed); and a lack of demonstration of protein levels in electroporated brains. Because of the unique organization and expression pattern of the gamma protocadherins, it is unlikely that these results will be directly applicable to the broader understanding of the role of cell adhesion molecules in synapse development. However, the methodology, which is now better described, should be applicable more broadly, and the improved demonstration of the negative role of gamma protocadherins in cortical synaptogenesis is helpful.

      Thank you for the positive comments on our work. We have taken your suggestion into account and expanded our discussion to contextualize our research within the broader field of PCDH. Additionally, we have included more data to further illustrate the decrease in αPCDH expression in Pcdha conditional knockout mice. Your feedback has been invaluable in enhancing our manuscript.

      Reviewer #3 (Public Review):

      In this study, Zhu and authors investigate the expression and function of the clustered Protocadherins (cPcdhs) in synaptic connectivity in the mouse cortex. The cPcdhs encode a large family of cadherin-related transmembrane molecules hypothesized to regulate synaptic specificity through combinatorial expression and homophilic binding between neurons expressing matching cPcdh isoforms. But the evidence for combinatorial expression has been limited to a few cell types, and causal functions between cPcdh diversity and wiring specificity have been difficult to test experimentally. This study addresses two important but technically challenging questions in the mouse cortex: 1) Do single neurons in the cortex express different cPcdh isoform combinations? and 2) Does Pcdh isoform diversity or particular combinations among pyramidal neurons influence their connectivity patterns? Focusing on the Pcdh-gamma subcluster of 22 isoforms, the group performed 5'end-directed single-cell RNA sequencing from dissociated postnatal (P11) cortex. To address the functional role of Pcdhg diversity in cortical connectivity, they asked whether the Pcdhgs and isoform matching influence the likelihood of synaptic pairing between 2 nearby pyramidal neurons. They performed simultaneous whole-cell recordings of 6 pyramidal neurons in cortical slices, and measured paired connections by evoked monosynaptic responses. In these experiments, they measured synaptic connectivity between pyramidal neurons lacking the Pcdhgs, or overexpressing dissimilar or matching sets of Pcdhg isoforms introduced by electroporation of plasmids encoding Pcdhg cDNAs.

      Overall, the study applies elegant methods that demonstrate that single cortical neurons express different combinations of Pcdh-gamma isoforms, including the upper layer pyramidal cells that are assayed in paired recordings. The electrophysiology data demonstrate that nearby pyramidal neurons lacking the entire Pcdhg cluster are more likely to be synaptically connected compared to control neurons, and that overexpression of matching isoforms between pairs decreases the likelihood that they are synaptically connected. These are important and compelling findings that advance the idea that the Pcdhgs are important for cortical synaptic connectivity, and that the repertoire of isoforms expressed by neurons influences their connectivity patterns, potentially through a self/nonself discrimination mechanism. However, the findings are limited to probability of connectivity and do not support the authors' conclusions that Pcdhg isoforms regulate synaptic specificity 'by preventing synapse formation with specific cells' or to 'unwanted partners'. Characterizations of the cellular basis of these defects are needed to determine whether they are secondary to other roles in cell positioning, axon/dendrite branching and synaptic pruning, and overall synapse formation. Claims that Pcdh-alpha and Pcdhg C-type isoforms are not functionally required are premature, due to limitations of the experiments. Moreover, claims that 'similarity level of γPCDH isoforms between neurons regulate the synaptic formation' are not supported due to weak statistical analyses presented in Fig4. The overstatements should be corrected. There was also a missed opportunity to clearly discuss these results in the context of other published work, including recent publications focused on the cortex.

      Thank you for your feedback on the strengths and weaknesses of our work. In terms of the cellular basis of the altered synaptic connectivity caused by γ-PCDH isoforms, we have compared the probability of connectivity for neuronal pairs within similar distance ranges. Our findings indicate that the manipulation primarily affects pairs within the 50-150 micrometer range, suggesting that cell positioning might be a critical factor for the impact of γ-PCDH on synapse formation. However, we acknowledge that we could not definitively determine whether the negative effect on synaptic connectivity stems directly from impaired synapse formation or indirectly from synaptic pruning or the influence of PCDHγ on axon/dendritic branching. We have added these limitations to our discussion to provide a more comprehensive view of our research. Furthermore, we have adjusted our statements to better reflect the significance of our findings. Your feedback has been instrumental in improving the clarity and depth of our manuscript.

      Strengths:

      • The 5' end sequencing with a Pcdhg-amplified library is a technical feat and addresses the pitfall of conventional scRNA-Seq methods due to the identical 3'sequences shared by all Pcdhg isoform and the low abundance of the variable exons. New figures with annotated cell types confirm that several pyramidal and inhibitory cortical subpopulations were captured.

      Statistical assessment of co-occurrence of isoform expression within clusters is also a strength.

      • By establishing the combinatorial expression of Pcdhgs by maturing pyramidal cells, the study further substantiates the 'single neuron combinatorial code for cPcdhs' model. Although combinatorial expression is not universal (e.g., serotonergic neurons), there was limited evidence for it in the cortex. The findings that individual pyramidal neurons express ~1-3 variable Pcdhg transcripts plus the C-type transcripts align with single-cell RT-PCR studies of Purkinje cells (Esumi et al 2005; Toyoda et al 2014). They differ from the findings by Lv et al 2022, where C-type expression was lower among pyramidal neurons. OSNs also do not substantially express C-type isoforms (Mountoufaris et al 2017; Kiefer et al 2023). These differences, and the advantages of 5'-end-directed sequencing (vs. SmartSeq), could be raised in the discussion.

      • Simultaneous whole-cell recordings and pairwise comparisons of pyramidal neurons is a technically outstanding approach. They assess the effects of Pcdhg OE isoform on the probability of paired connections.

      • The connectivity assay between nearby pairs proved to be sensitive to quantify differences in probability in Pcdhg-cKO and overexpression mutants. The comparisons of connectivity across vertical vs lateral arrangement are also strengths. Overexpressing identical Pcdhg isoform (whether 1 or 6) reduces the probability of connectivity, but there are caveats to the interpretations (see below).

      Weaknesses:

      nearby pairs but are not sufficient evidence for synapse specificity. The cPcdhs play multiple roles in neurite arborization, synaptic density, and cell positioning. Kostadinov 2015 also showed that starburst cells lacking the Pcdhgs maintained increased % connectivity at maturity, suggesting a lack of refinement in the absence of Pcdhgs. These known roles raise questions about whether these manipulations might have primary effects on these processes that subsequently impact the probability of connectivity. Investigations of morphological aspects of pyramidal development would strengthen the study and potentially refine the findings. The authors should more clearly relate their findings to the body of cPcdh studies in the discussion.

      Previous studies revealed the negative effects of γ-PCDHs on dendritic spines, demonstrating that their absence results in increased dendritic spine density, while their overexpression leads to a reduction. In our study, we consistently observed that γ-PCDHs exert a negative influence on synaptic connectivity. This consistency strengthens the overall body of evidence supporting the role of γ-PCDHs in synaptic connectivity and dendritic spine regulation. While we had previously mentioned this point in our discussion to highlight the concordance between our findings and prior research, your input is greatly appreciated in reinforcing the scientific context of our work.

      • Pcdhg cKO-dependent effects on connectivity occur between closely spaced somata (50-100 µm; Figure 2E), highlighting the importance of spatial arrangement for connectivity (also noted by Tarusawa 2016). Was distance considered in the overexpression (OE) assays, and did the authors note changes in cell distribution that might diminish connectivity? Recent work by Lv et al 2022 reported that manipulating Pcdhgs influences the dispersion of clonally related pyramidal neurons, which also impacts the likelihood of connections. Overexpression of PcdhgC3 increased cell dispersion and decreased the rate of connectivity between pairs. Though these papers are mentioned, they should be discussed in more detail and related to this work.

      Our data indicated that variable γ-PCDH isoforms primarily influence synaptic connectivity in neuronal pairs within the 50-150 µm range. Notably, as the distance between neurons increases, synaptic connectivity decreases correspondingly, as illustrated in Figure 2E. We have also included additional discussion of potential differences among the C-type isoforms.

      • Though the authors added suggested citations and improved the contextualization of the study, several statements do not accurately represent the cited literature. This comes at the expense of crystallizing the novelty and importance of the present work. For instance, Garrett et al 2012 PMID: 22542181 was the first to describe roles for Pcdhgs in cortical pyramidal cells and dendrite arborization, and to show that pyramidal cell migration and survival are intact. Line 52 cited Wang et al 2002, but that study was limited to gross inspection. Garrett et al is the correct citation for: 'The absence of γ-PCDH does not cause general abnormality in the development of the cerebral cortex, such as cell differentiation, migration, and survival (Wang et al., 2002).' Second, single-cell cPcdh diversity is introduced very generally, as though all neuron types are expected to show combinatorial variable expression with ubiquitous C-type expression. But those initial studies were limited to Purkinje cells (Esumi 2005 and Toyoda 2014). Profiling of serotonergic neurons and OSNs reveals different patterns (citations needed for Chen 2017 PMID: 28450636; Mountoufaris et al PMID: 2845063; Canzio 2023 PMID: 37347873), raising the idea that cPcdh diversity and ubiquitous C-type expression are not universal. Thus, the authors missed the opportunity to emphasize the gap regarding cPcdh diversity in the cortex.

      We thank the reviewer for pointing out the citation related to the roles of γ-PCDHs in the neocortex. After a thorough review of both papers, Wang et al., 2002 and Garrett et al., 2012, we agree that it is more appropriate to cite both here. We also welcome the suggestion to underscore the diverse expression patterns of γ-PCDHs in neocortical neurons, and we have integrated this aspect of our findings with previous observations in a new paragraph in the Discussion. These insights have greatly enriched the paper, and we genuinely appreciate the contribution.

      • They have not shown rigorously and statistically that the rate of connectivity changes with % isoform matching. In Figure 4D, comparisons of % isoform matching in OE assays show a single statistical comparison between the control and 100% groups, but not between the 0%, 11%, and 33% groups. Is there a significant difference between the other groups? Significant differences are claimed in the Results section, but statistical tests are not provided. The regression analysis in 4E suggests a correlation between % isoform similarity and connectivity probability, but this is not sound, as it is based on a mere 4 data points from 4D. The authors previously explained that they cannot evaluate the variance in these recordings because they must pool data together. However, there should be some treatment of variability, especially given the low baseline rate of connectivity. At the very least, they should acknowledge the limitations that prevent them from assessing this relationship. Claims in lines 230+ are not supported: "Overall, our findings demonstrate a negative correlation between the probability of forming synaptic connections and the similarity level of γ-PCDH isoforms expressed in neuron pairs (Fig. 4E)".

      We employed a bootstrap method to estimate the potential variance in the analysis presented in Fig. 4E. It's important to note that due to methodological limitations, a comprehensive assessment of variance based solely on recordings from a single animal is challenging. As such, we have adjusted our claims to be more aligned with our observations.

      • Figure 4 provides connectivity probability, but this result might be affected by overall synapse density. Did connection probability change with directionality (e.g., from red to green cells, or from green to red cells)?

      As suggested by the reviewer, we have conducted an analysis to assess the directionality of connections under different conditions. This analysis involved comparing the directionalities of connections following the overexpression of six variable isoforms, as depicted in Fig. 3E. Upon examining 33 connected OE-Ctrl pairs following the electroporation of these 6 isoforms, we observed 3 pairs with bidirectional connections, 19 pairs with connections from OE to Ctrl, and 11 with connections from Ctrl to OE. To assess the statistical significance of these observations, we applied a Chi-square test. The results from this analysis indicated that there was no significant difference in the directionality of connections. These findings offer further support for the idea that overexpressing multiple γ-PCDH isoforms within a single neuron might not be sufficient to alter its connections with other neurons.
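      A minimal sketch of such a directionality test follows. The response does not specify the exact design, so this assumes a chi-square goodness-of-fit test on the 30 unidirectional pairs (19 OE-to-Ctrl vs. 11 Ctrl-to-OE) against an equal 15:15 expectation, with the 3 bidirectional pairs excluded. For one degree of freedom the upper-tail probability has the closed form erfc(√(χ²/2)), so only the Python standard library is needed; the function name is hypothetical.

```python
import math


def chi_square_equal_split(counts):
    """Chi-square goodness-of-fit test of two counts against a 50:50 expectation.

    Returns (statistic, p_value). The p-value uses the closed-form survival
    function for 1 degree of freedom, so exactly two categories are required.
    """
    if len(counts) != 2:
        raise ValueError("closed-form p-value assumes 2 categories (1 df)")
    expected = sum(counts) / 2
    stat = sum((observed - expected) ** 2 / expected for observed in counts)
    # For 1 df: P(X >= stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Unidirectional pairs reported above: OE -> Ctrl and Ctrl -> OE.
stat, p = chi_square_equal_split([19, 11])
print(f"chi2 = {stat:.3f}, p = {p:.3f}")  # chi2 = 2.133, p = 0.144
```

      With these counts, χ² = 32/15 ≈ 2.13 and p ≈ 0.14, consistent with the reported absence of a significant directional bias; `scipy.stats.chisquare([19, 11])` should return the same values.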

      • Generally, the statistical approaches were not sufficiently described in the Methods or the figure legends, making it difficult to assess the findings. The authors do not report how they calculated the FDR for connectivity data, even though FDR correction is typically used for larger multivariate datasets.

      We applied a False Discovery Rate (FDR) correction, specifically the Benjamini-Hochberg method, to determine which results remained statistically significant. This widely accepted method takes as input all p-values and the total number of tests, n. Additional details about this correction are now provided in the Methods section.
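      For readers unfamiliar with the procedure, the Benjamini-Hochberg step-up rule can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code; the function name and the toy p-values are hypothetical. Hypotheses are ranked by p-value, the largest rank k with p(k) ≤ (k/n)·α is found, and all hypotheses up to that rank are rejected.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    n = len(p_values)
    # Rank hypotheses by ascending p-value (ranks are 1-based below).
    order = sorted(range(n), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / n) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below k, even if it missed its own threshold.
    rejected = [False] * n
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from four pairwise comparisons (not the study's data).
print(benjamini_hochberg([0.01, 0.03, 0.035, 0.20]))  # [True, True, True, False]
```

      In the toy call, 0.03 (rank 2) misses its own threshold of 0.025 but is still rejected because a higher rank passes (0.035 at rank 3, threshold 0.0375); this step-up behaviour is what distinguishes the procedure from a per-rank cutoff. In practice, `statsmodels.stats.multitest.multipletests(p, method="fdr_bh")` implements the same correction.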

      • The possibility that the OE effects are driven by total Pcdhg levels, rather than by isoform matching, should be examined. As shown by qRT-PCR in Fig. 3, expression of individual isoforms can vary. It is reasonable that protein levels cannot be measured by IHC, although epitope tags could be considered, as C-terminal tagging of cPcdhs preserves function in mice (see Lefebvre 2008). Quantification of constant Pcdhg RNA levels by qRT-PCR or sc-RT-PCR would directly address the potential caveat that OE levels vary with isoform combinations.

      Through multiple whole-cell recordings, we examined neuronal pairs in the 0% group, in which both neurons overexpressed different combinations of γ-PCDH isoforms. The connectivity level was remarkably similar among pairs in which both neurons overexpressed γ-PCDH isoforms, pairs in which only one neuron overexpressed these isoforms, and pairs of two control neurons (without overexpression). However, as we incrementally raised the similarity between the recorded neurons by increasing the overlap in their combinatorial expression of γ-PCDH isoforms, we observed a gradual decrease in connectivity probability. Notably, connectivity probability reached its minimum when the recorded cells had exactly the same combinatorial expression of γ-PCDH isoforms (100% similarity). These findings suggest that the similarity between neurons, rather than the absolute expression level of γ-PCDH isoforms, plays a critical role in synapse formation.

      • A caveat for the relative plasmid expression quantifications in Figure 3-S1 is that IHC was used to amplify the RFP-tagged isoform, and thus likely does not preserve the relationship between quantities and detection.

      We attempted to enhance the mNeonGreen signal, which has an exceptional signal-to-noise ratio, using the 32f6-100 antibody from Chromotek. However, immunostaining did not reveal any additional cells compared with the images obtained from the native mNeonGreen signal alone. This indicates that the available antibody did not significantly improve cell detection.

      It is important to emphasize that if the RFP signal were overvalued, both the "red only" and "red in total" categories would increase. However, the "red only" category is more sensitive to this effect than the "red in total" category. Therefore, overvaluing the RFP signal would lead to an underestimation of the total plasmid content in electroporated neurons and, consequently, to a lower rather than a higher estimate of the proportion of co-expressing cells. We have updated the calculation method in the "Estimating the numbers of overexpressed γPCDH isoform" section to reflect these considerations.

      • Figure 1 did not change in response to the reviews to improve clarity. New panels relating to the scRNA-seq analyses were added to the supplementary data, but many are central and should be included in Figure 1 (i.e., S1-Fig6D). In the Results, the authors state that neuronal subpopulations generally show combinatorial expression of some variable RNA isoforms and near-ubiquitous C-type expression. But they only show data for the Layer 2/3 neuron-specific cluster in S1-Fig-6D, so it is not clear whether this pattern applies to other clusters. Fig. S1-5 shows a low number of expressed isoforms per cell, but specifying whether these include C-type isoforms would be helpful. Figure 1F, showing the isoform profile in all neurons, is not particularly meaningful. There is a lot of interest in neuron-type-specific differences in cPcdh diversity, and the authors could highlight their data from S1-5 accordingly.

      In addition to the layer 2/3 cluster, we observed a diverse combinatorial expression of various variable γ-PCDH isoforms alongside nearly ubiquitous C-type expression in all other clusters of cells. We have now explicitly mentioned this observation in the main text. To underscore this point further, we have included a new figure, Fig. 1-S6, which provides information on the similarity analysis for all other clusters. It's important to note that the data in previous Fig. S1-5 (now renumbered as S1-7) were solely related to "variable" isoforms. We apologize for any confusion and have made this clarification by including it in the title of the figure.

      • The concept of co-occurrence and results should be explained within the results section, to more clearly relate this concept to data and interpretations. Explanations are now found in the methods, but this did not improve the clarity of this otherwise very interesting aspect of the study.

      Thank you for the suggestion. We have incorporated some of the explanations from the Methods section into the main text, mainly for the concept of "co-occurrence".

      • The claim that C-type Pcdhgs do not functionally influence connectivity is premature. Tests were limited to PcdhgC4, which has unique properties compared to the other 2 C-type isoforms (Garrett et al 2019 PMID: 31877124; Mancia et al PMID: 36778455). The text should be corrected to limit the conclusion to PcdhgC4, and not generally to C-type. The authors should test PcdhgC3 and PcdhgC5 isoforms.

      We have restricted the claim to PcdhgC4, rather than to C-type isoforms in general, to better reflect our observations.

      • The group generated a novel conditional Pcdh-alpha mouse allele using CRISPR methods and states that there were no changes in synaptic connectivity in these Pcdh-alpha mutants. But this claim is premature. The Southern blots validate the targeting of the allele, but further validation is required to establish that this floxed allele can be efficiently recombined, disrupting Pcdha protein levels and function. Pcdha alleles have been validated by western blots and by demonstration of the prominent serotonergic axonal phenotype of Pcdha-KO (i.e., Chen 2017 PMID: 28450636; Ing-Esteves 2018 PMID: 29439167).

      We have obtained a new set of qRT-PCR data that confirms the decreased expression of α-PCDH in Pcdha CKO mice. These data have been integrated into Figure 2-S2D.

      • The Discussion would be strengthened by a deeper discussion of how the findings relate to other cPcdh roles and studies, and of the limitations of the study. The idea that the Pcdhgs influence the rate of connectivity through a repulsion mechanism or through synapse formation (i.e., through negative interactions with synaptic organizers such as Nlgn; Molumby 2018, Steffen 2022) could be presented in a model and supported by other literature.

      We sincerely thank the reviewer for these invaluable comments and suggestions, which have led to an extended Discussion. We have incorporated these suggestions into the paper to establish stronger connections between our observations and prior research findings.

      Reviewer #1 (Recommendations For The Authors):

      1) In Figure S6, the authors measured Euclidean distance from the single-cell data to take isoform expression levels into account when explaining diversity. However, it is hard to interpret the data without a control. The authors could measure the same value from a shuffled/randomized dataset for comparison (similarly to Fig 1F).

      We understand the reviewer's concern about the significance of the Euclidean distance analysis without an appropriate control. The inclusion of the Euclidean distance metric was initially a response to suggestions from other reviewers who recommended incorporating diverse methods for analyzing expression patterns among neurons.

      In response to your valuable feedback, we have taken measures to address these concerns. We have introduced shuffled data for comparison, thus enhancing the meaningful context for interpreting the results derived from the Euclidean distance analysis.
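      One way to build such a shuffled control is sketched below, under stated assumptions: each cell is a row of isoform expression values, the statistic is the mean pairwise Euclidean distance between cells, and the null is generated by permuting each isoform column independently across cells. This keeps every isoform's overall expression distribution but destroys cell-specific isoform combinations. The function names and the toy matrix are hypothetical, and the authors' actual shuffling scheme may differ.

```python
import itertools
import math
import random
import statistics


def mean_pairwise_distance(matrix):
    """Mean Euclidean distance over all pairs of cells (rows of `matrix`)."""
    dists = [math.dist(a, b) for a, b in itertools.combinations(matrix, 2)]
    return statistics.fmean(dists)


def shuffle_columns(matrix, rng):
    """Permute each isoform column independently across cells."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def shuffled_null(matrix, n_shuffles=1000, seed=0):
    """Statistic recomputed on column-shuffled copies of `matrix`."""
    rng = random.Random(seed)
    return [mean_pairwise_distance(shuffle_columns(matrix, rng)) for _ in range(n_shuffles)]


# Hypothetical expression matrix: 4 cells x 3 variable isoforms (not real data).
cells = [[2.0, 0.0, 0.0],
         [0.0, 1.5, 0.0],
         [0.0, 0.0, 3.0],
         [1.0, 1.0, 0.0]]
observed = mean_pairwise_distance(cells)
null = shuffled_null(cells)
# Permutation p-value (with +1 correction): how often shuffles are at least as dispersed.
p_value = (1 + sum(d >= observed for d in null)) / (1 + len(null))
print(f"observed = {observed:.3f}, null mean = {statistics.fmean(null):.3f}, p = {p_value:.3f}")
```

      A design note: the mean of squared distances would not work as a statistic here, because the sum of squared pairwise distances depends only on each column's variance, which column-wise shuffling leaves unchanged. The non-squared Euclidean distance (or its full distribution) is therefore compared instead.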

      2) The authors need to clarify which cortical regions were used for electrophysiological experiments.

      Apologies for any confusion. To clarify, all recordings were conducted on neurons located in layer 2/3 of the neocortex without further discrimination. We have reinstated this information in both the main text and the methods section to ensure its clarity.

      Reviewer #2 (Recommendations For The Authors):

      There are still some issues that must be addressed.

      1) The references to gamma protocadherin repulsion are not correct in context. A repulsive role of homophilic interaction has been inferred from certain knockout phenotypes in a subset of neurons (not in cortical neurons). However, repulsion has never been shown to follow gamma protocadherin engagement. The authors present no new evidence that their results are attributable to cellular repulsion at nascent synaptic contacts; the mechanism is unknown. The references to repulsion to explain their results should make clear that this is one possible explanation, but it is not shown. Also, some references in the text are not correct. For example, line 63/64: the results of Molumby and Steffen do not involve homophilic adhesion or repulsion, but rather a cis interaction with neuroligins. Those papers should not be discussed as involving repulsion, as in the reference to Lefebvre 2012. Also, line 268/269: "Together with previous findings (Molumby, ..., Tarusawa), our observations solidify repulsion effect of γ-PCDH on synapse formation...". This is not the case. Neither Molumby nor Tarusawa demonstrated any such repulsion.

      Thank you to the reviewer for pointing out the errors in our citations. We have made the necessary corrections to the citations and have also refined the descriptions of our observations to improve clarity and accuracy.

      2) The discussion of the results when C4 is overexpressed must also be greatly toned down. C4 is a strange C-type protein: it cannot reach the cell surface alone but relies on other cPCDHs to do so, and its primary role is in preventing cell death. It is odd that the authors used this isoform to represent the C-types. They should have used C3, which two recent papers showed to have specific roles at some synapses (Meltzer et al 2023, Ginty lab) and in dendrite branching (Steffen et al 2023, Weiner lab), or C5. It is entirely possible that C4 alone has no role in synaptic matching while C3 and C5 do. The authors should not conclude that the C-types have no such role and that only A- and B-types do. This must be toned down (e.g., line 198/199, line 281).

      We acknowledge that using C4 to represent all three C-types (C3, C4, and C5) is not accurate. We have now modified the statement in the main text to rectify this.

      3) For the citation of Pcdhg flox/flox mice (line 126), Prasad et al., Development, 2008, Weiner lab, should also be cited as it fully characterized that line that was also used in Lefebvre et al 2008. They were co-published.

      Thank you for highlighting the missing citation, and we have now included it in the relevant section.

      4) The Pcdh alpha KO mouse characterization is still insufficient. The authors must show that alpha expression is lost following introduction of Cre, either by RT-PCR using alpha constant-domain primers or by western blot with an alpha antibody. The Southern blot and off-target sequencing do not confirm that all alpha gene expression is gone.

      Thank you for your feedback. We have conducted the qRT-PCR analysis as suggested. The results clearly indicate a substantial reduction in α-PCDH expression in the neocortex of Pcdha cKO mice. We have incorporated these data into the manuscript as the new panel of Figure 2-S2D. Your valuable input has helped improve the quality of our work, and we sincerely appreciate the opportunity to address this important aspect.

      5) I do not understand something in Figure 4-S1A. Why with 0% matching is synaptic connectivity so low? This is not the same as in Figure 3E. This has to be explained because it does suggest that overexpression of ANY isoforms can inhibit synapse formation, which is consistent with Molumby 2017, even though this paper says it is not just the levels but the isoform specificity.

      The panel of Fig.4-S1A illustrates the connection rate between neurons with the same color (icons in upper left), representing cells that express the same combination of γ-PCDHs (100% of similarity). The X-axis (0%, 11%, 33%, and 100%) reflects the similarity level between the 2 populations of cells (GFP and RFP).

      6) There are still issues with the English grammar in the paper. It is not too bad in the main text but someone should re-edit it. However, the figure legends are indeed much worse and truly must be edited professionally before they are acceptable.

      We apologize for the quality of the English in the manuscript. We have now polished most of the manuscript, especially the figure legends.

      Reviewer #3 (Recommendations For The Authors):

      • This study has many strengths and innovative findings. Most comments above included suggestions to strengthen the paper. The overall message that Pcdhgs influence the rate of synaptic connectivity between nearby cells is compelling. How this Pcdhg-isoform-dependent process could influence synaptic specificity can be explored in a model in the discussion. But this study did not test a role in 'synaptic specificity'; this term should be removed from the title and line 81 in the intro.

      Thank you for your invaluable comments aimed at improving our paper. Regarding the title, we believe that "synaptic connectivity" might be a more suitable choice than "synaptic specificity." However, we're open to considering other alternatives as well.

      • The manuscript and overall quality of the science will be improved by removing those sections that are not adequately investigated (i.e., Pcdh-a cKO; PcdhgC4 is assessed, but the findings cannot be extended to other C-type isoforms) and by outlining the limitations of the study.

      We have modified the related claim mentioned in the main text.

      • The studies negatively correlating between isoform matching and connectivity are not robust. Additional approaches are needed if the authors want to make this claim.

      In Figure 4E, we have implemented a bootstrapping method. Bootstrapping is a statistical technique belonging to the broader category of resampling methods. It involves random sampling from the observed data with replacement, which enables estimation of standard errors and confidence intervals and supports hypothesis testing.
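      The resampling idea can be illustrated with a short percentile-bootstrap sketch. The response does not state exactly what was resampled, so this assumes, hypothetically, that recorded pairs are drawn with replacement within each similarity group (0%, 11%, 33%, 100%), that connection probability is recomputed per group, and that the statistic of interest is the least-squares slope of probability against similarity. The function names and toy counts are illustrative only and are not the study's data.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (needs at least two distinct xs)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(groups, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the slope of connection probability vs. similarity.

    groups: dict mapping similarity (fraction) to a list of 0/1 outcomes,
    one per recorded pair (1 = connected). Pairs are resampled with
    replacement within each similarity group.
    """
    rng = random.Random(seed)
    xs = sorted(groups)
    slopes = []
    for _ in range(n_boot):
        ys = []
        for x in xs:
            outcomes = groups[x]
            resample = rng.choices(outcomes, k=len(outcomes))
            ys.append(sum(resample) / len(resample))
        slopes.append(ols_slope(xs, ys))
    slopes.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return slopes[lo_idx], slopes[hi_idx]


# Hypothetical toy outcomes per similarity level (NOT the recorded data).
toy_groups = {
    0.0: [1] * 6 + [0] * 44,
    0.11: [1] * 5 + [0] * 45,
    0.33: [1] * 3 + [0] * 47,
    1.0: [1] * 1 + [0] * 49,
}
lo, hi = bootstrap_slope_ci(toy_groups)
print(f"95% percentile CI for slope: [{lo:.3f}, {hi:.3f}]")
```

      If the interval excludes zero, the trend is unlikely to arise from sampling variability within groups alone; with only four similarity levels, however, the slope still rests on four group means, which is the limitation the reviewer raises.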

      • Statistical approaches should be described in methods, figure legends.

      More information about statistical approaches has been added in the figure legends.

      • The discussion should elaborate on the limitations of the study, and relate to other studies, including Lv et al 2022.

      We have added more discussion to relate our observations to previous findings.

    1. Author Response

      Note to the editor and reviewers.

      All the authors would like to thank the editorial team and the two anonymous reviewers for their efforts and thoughtfulness in assessing our manuscript. We very much appreciate it and we all believe that the manuscript has been much improved in addressing the comments and suggestions made.

      General considerations on the revised manuscript

      We have applied extensive modifications to the manuscript with our main goal being the improvement of clarity. The Introduction has been changed mainly to introduce precisely our terminology and we have stuck to it in the rest of the manuscript. The Results section has been divided up into more defined sections. The discussion has been extensively re-written to improve clarity, following the suggestion of the reviewers. Main figures 1 and 4 have been modified with clearer schematics. Supplementary figures and legends have been modified and several supplementary schematic figures have been added to clearly present our interpretations for various data. We have added a Supplementary Discussion where the most detailed technical parts of our discussion are presented to avoid unnecessarily weighing down the main discussion, where our main conclusions are outlined. We have presented our mass photometry mixing experiment in a new supplementary figure, with detailed explanation. We have also expanded our discussion of in vivo and general relevance of our study.

      Response to manuscript evaluation

      Our manuscript has been evaluated as a valuable study and presenting solid experimental evidence. We appreciate the recognition of our work.

      Two weaknesses were identified by the reviewers: 1) our experiments do not completely exclude the possibility of an alternative nucleophile, which relates to the evaluation of our experimental evidence; 2) our study does not address the in vivo relevance of the interface swapping phenomenon, which relates to the value of the study for the community.

      Response to the evaluation of experimental evidence (Weakness #1):

      We argued in the original manuscript that we had completely excluded the presence of an alternative nucleophile. This conclusion is based on a series of experiments presented in the originally submitted manuscript. These experiments are not discussed by the reviewers in relation to this main conclusion, and we therefore suggest that they have not been properly evaluated. We believe our conclusion to be appropriately supported by these data (see our response to reviewer #1). In addition, the criticism of our gel-filtration data by reviewer #2 was based on a misinterpretation of Supplementary Figure 1b. We accept, of course, that the way the data were presented could be misleading, and we take responsibility for this. We have attempted to correct this by changing the main text and the figure legends and annotations. In conclusion, we believe that the evaluation of the experimental evidence as presented in the revised manuscript could be upgraded to “convincing”.

      Response to our study general relevance evaluation (weakness #2):

      We agree with both reviewers that the in vivo relevance of our observation is an important question not addressed so far. Indeed, the value of our study would be greatly increased by in vivo data, which would interest a wider audience. However, we would argue that our study will interest a wider audience than initially stated, for the following reasons: 1) Our study provides the first evidence of interface swapping in vitro and will constitute a basis for investigating this phenomenon both in vivo and in vitro. It should therefore interest a wide audience, given the potential involvement of interface swapping in a wide range of processes, such as recombination, evolution, and drug targeting (see also below). 2) DNA cleavage is the central mode of action of antibiotics targeting bacterial type II topoisomerases (i.e., topoisomerase “poisons”). This established target is one of the few that have produced new scaffolds, and too few new antibacterials are in development to meet medical needs. Interface stability is also emerging as a modulator of the efficiency of topoisomerase poisons; see, for instance, (Germe, Voros et al. 2018, Bandak, Blower et al. 2023). By shedding light on interface dynamics, our study will be of interest to scientists developing these drugs. In addition, the heterodimer system can potentially yield detailed mechanistic information (Gubaev, Weidlich et al. 2016, Hartmann, Gubaev et al. 2017, Stelljes, Weidlich et al. 2018) not only on gyrase but also on other dimeric type II topoisomerases, or even on dimeric enzymes in general. We have amended the manuscript to make these points clearer. Therefore, we believe that the evaluation of the revised manuscript’s relevance could be upgraded to “important”.

      Point-by-point response to the reviewer

      Reviewer #1 (Public Review):

      Germe and colleagues have investigated the mode of action of bacterial DNA gyrase, a tetrameric GyrA2GyrB2 complex that catalyses ATP-dependent DNA supercoiling. The accepted mechanism is that the enzyme passes a DNA segment through a reversible double-stranded DNA break formed by two catalytic Tyr residues, one from each GyrA subunit. The present study sought to understand an intriguing earlier observation that gyrase with a single catalytic tyrosine, which cleaves a single strand of DNA, nonetheless has DNA supercoiling activity, a finding that led to the suggestion that gyrase acts via a nicking-closing mechanism. Germe et al used bacterial co-expression to make wild-type and mutant heterodimeric BA(fused).A complexes with only one catalytic tyrosine. Whether the Tyr mutation was on the A side or the BA fusion side, both complexes plus GyrB reconstituted fluoroquinolone-stabilized double-stranded DNA cleavage and DNA supercoiling. This indicates that the preparations of these complexes sustain double-strand DNA passage. Of the possible explanations, contamination of heterodimeric complexes or GyrB with GyrA dimers was ruled out by meticulous prior analysis of the proteins on native PAGE gels, by analytical gel filtration, and by mass photometry. Involvement of an alternative nucleophile on the Tyr-mutated protein was ruled unlikely by mutagenesis studies focused on the catalytic ArgTyrThr triad of residues. Instead, the results of the present study favour a third explanation, in which double-strand DNA breakage arises as a consequence of subunit (or interface/domain) exchange. The authors showed that although subunits in the GyrA dimer were thought to be tightly associated, addition of GyrB to heterodimers with one catalytic tyrosine stimulates rapid DNA-dependent subunit or interface exchange, generating complexes with two catalytic tyrosines capable of double-stranded DNA breakage. Subunit exchange between complexes is facilitated by DNA bending and wrapping by gyrase, by the ability of both GyrA and GyrB to form higher-order aggregates, and by dense packing of gyrase complexes on DNA. By addressing a puzzling paradox, this study provides support for the accepted double-strand break (strand passage) mechanism of gyrase and opens new insights into subunit exchange that may have biological significance in promoting DNA recombination and genome evolution.

      The conclusions of the work are mostly well supported by the experimental data.

      Strengths:

      The study examines a fundamental biological question, namely the mechanism of DNA gyrase, an essential and ubiquitous enzyme in bacteria, and the target of fluoroquinolone antimicrobial agents.

      The experiments have been carefully done and the analysis of their outcomes is comprehensive, thoughtful and considered.

      The work uses an array of complementary techniques to characterize preparations of GyrA, GyrB and various gyrase complexes. In this regard, mass photometry seems particularly useful. Analysis reveals that purified GyrA and GyrB can each form multimeric complexes and highlights the complexities involved in investigating the gyrase system.

      The various possible explanations for the double-strand DNA breakage by gyrase heterodimers with a single catalytic tyrosine are considered and addressed by appropriate experiments.

      The study highlights the potential biological importance of interactions between gyrase complexes through domain or subunit exchange.

      We thank the reviewer for their support, effort, and comments. The above is a great summary.

      Weaknesses:

      The mutagenesis experiments described do not fully eliminate the perhaps unlikely participation of an alternative nucleophile.

      We agree that the mutagenesis experiment on its own does not fully eliminate the possibility of an alternative nucleophile. The number of residues mutated is limited, and therefore it is possible we have missed a putative alternative nucleophile.

      However, we have other data and experiments supporting the conclusion that no alternative nucleophile exists, and we want to stress that our conclusion is based on these additional data. These data and experiments were not discussed by either reviewer despite being present in the original manuscript. This puzzled us, and we have modified the manuscript and the figures in the hope that they, and their significance, will not be missed.

      Briefly:

      1) We have performed cleavage-based labeling of the nucleophile responsible for cleavage. This experiment is depicted in Figure 4. The nucleophilic activity of the residue involved results in a covalent link between the polypeptide (that includes the residue) and radiolabeled DNA. Therefore, a polypeptide that includes an active nucleophile will be radiolabeled and visible, whereas a polypeptide lacking an active nucleophile will remain unlabeled and invisible. We can distinguish the BA and A polypeptides by their size. In the case of the BA.A complex, both the BA and the A polypeptides are radiolabeled, so both have an active nucleophile. In the case of the BAF.A complex, the unmutated A polypeptide is labeled, meaning that a nucleophile is still active. In contrast, the BAF polypeptide shows no detectable labeling. This result means that removing the hydroxyl group from the catalytic tyrosine abolishes any protein-DNA covalent link, suggesting that no other nucleophile from the BA polypeptide chain can substitute for the catalytic tyrosine hydroxyl group. This experiment excludes the possibility of an alternative nucleophile coming from the polypeptide chain of either GyrA or GyrB. This experiment, described in Figure 4, is not discussed by the reviewer. It is similar in principle to early experiments identifying the catalytic tyrosine in topoisomerases; see, for instance, (Shuman, Kane et al. 1989).

      2) The experiment above does not exclude a nucleophile coming from the solvent. To exclude this possibility, we used T5 exonuclease (which needs a free 5’ DNA end to digest) and ExoIII (which needs a free 3’ DNA end to digest). We have shown that the reconstituted cleavage is resistant to T5 but sensitive to ExoIII. This shows that the 5’ ends of the cleaved sites are protected by a bulky polypeptide that impairs T5 activity; T5 is active in our reaction, as shown by the digestion of a control DNA fragment. This experiment shows that the reconstituted cleavage is very unlikely to arise from a small nucleophile potentially provided by the solvent. It is described in the main text, and the results are shown in Supplementary Figure 5. It is not mentioned by either reviewer.

      3) Finally, we would like to emphasize our experiment comparing BAF.A59 with BALLL.A59. The BALLL.A59 complex displays increased cleavage compared to BAF.A59. If this increased cleavage were due to an alternative nucleophile on the BALLL side, we would expect an accompanying increase in supercoiling activity, since BALLL.A59 possesses one CTD, which is sufficient for supercoiling. The fact that no increase in supercoiling activity is observed strongly suggests subunit exchange reconstituting an A59 dimer, inactive for supercoiling but active for cleavage. We believe this somewhat complex observation is quite significant, and we have attempted to clarify it and discuss its full significance in several places in the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      An interesting paper on DNA gyrase that explains a puzzling paradox in terms of the double-strand break mechanism.

      Major points

      1) The authors consider several mechanisms that could potentially explain their data. On page 15, the authors present the evidence against the nicking-closing mechanism proposed by Gubaev et al. Throughout the manuscript, they indicate where their experimental results agree with this earlier work but should also indicate and account for differences. For example, Gubaev et al. describe cross-linking experiments that they claim rule out subunit exchange. These aspects should be clearly explained.

      Thank you for the suggestion. We have rewritten the Discussion to address this point. We now discuss the experiments from (Gubaev, Weidlich et al. 2016) extensively and offer our interpretation of apparently conflicting results. We suggest that their experiments are basically consistent with our data when correctly interpreted. To keep the main manuscript clear, we have added a Supplementary Discussion in which the experiments from (Gubaev, Weidlich et al. 2016) are discussed further in relation to our data.

      2) Page 9. The experiments done to rule out the perhaps unlikely alternative nucleophile hypothesis relate to the possible role of the Arg and Thr of the RYT triad. These residues are close to the DNA and are therefore prime candidates and attractive targets for mutagenesis. However, strictly speaking, the mutant enzyme data presented do not rule out all possibilities. For example, serine is often the nucleophile used by resolvases to effect DNA recombination via subunit exchange. The ideal experiment to rule out or rule in other nucleophiles would be to identify the residue(s) that become attached to DNA in the cleavage reaction.

      Please see above. We have effectively ruled out an alternative nucleophile with our cleavage-based labeling experiment and with other experiments that were present and discussed in the original manuscript but were missed. We have modified the manuscript and figures to make this point clearer.

      3) p17. The readout for subunit exchange used by the authors is double-stranded DNA cleavage. Attempts to directly detect the formation of the DNA cleaving complexes GyrA2B2 and (GyrBA)2 (arising from subunit exchange between heterodimers) by mass photometry were not successful. Perhaps FRET would have been another approach to try as it could also detect interface and domain interchanges.

      Detecting interface exchange directly by proximity experiments would be extremely useful. FRET would have to be done in the BAF.A + GyrB configuration, where the amount of interface exchange is substantial. At present, we do not have the tools to do this, and developing them would be outside the scope of the study. We propose cross-linking experiments as future work, but we argue that the manuscript is convincing without them for now. This point, and other possible future experiments, are now addressed in the Discussion section.

      4) The underlying canvas of this paper is the strand passage mechanism of gyrase. It would seem appropriate to include the papers first proposing it: Brown P.O. and Cozzarelli N.R. (1979) and Mizuuchi K. et al. (1980).

      We very much agree. These papers have now been added in the introduction as appropriate, highlighting the relationship between double-strand cleavage and the strand-passage mechanism.

      5) Figure 1. The quality of the insets is poor. It is difficult to pick out the key catalytic residues and their disposition vis-a-vis DNA.

      We agree, Figure 1 has been re-done and the schematic theme has been harmonized throughout the whole manuscript. We very much hope that clarity has improved. Thank you for the suggestion.

      6) The experimental work is a very detailed analysis of a specific feature of engineered gyrase heterodimers. Making the work accessible to the general reader will be important. Using shorter paragraphs each with a specific theme might help. In particular, the second paragraph of the Results on p7, the section on p9 and bottom of p11, p13 and the first paragraph of the Discussion on p14 are each a page or more long. A shorter manuscript that avoids overinterpretation of the smaller details would also help.

      We agree. We have now split long paragraphs into individual sections, with titles, in the Results. This structure is recapitulated at the beginning of the discussion, and we have split the discussion into shorter paragraphs, each with a unique point being made.

      7) The impact of the Gubaev et al (2016) paper for the field in general, and as the catalyst for the present work should be better documented. Mention of this earlier paper and its significance at the beginning of the Abstract and elsewhere e.g in the Introduction might also help with a more logical organization of the current findings and result in a shorter paper (which would be easier to read).

      We have added a reference to (Gubaev, Weidlich et al. 2016) in the abstract and have expanded our introduction accordingly.

      Minor points

      1) Legends for Figs 2 and 6; Supplementary Figs 1 and 8. The designation of subfigures as a, b, c, d, e etc. appears to be incorrect. Check throughout and in the text.

      The manuscript has been checked for such errors.

      2) Figure 2, and first paragraph p8. Peaks in Fig 2c should be labelled to facilitate discussion on p8.

      Agreed, this has been done.

      3) Supplementary Fig 4 and elsewhere in the manuscript. A variety of notations are used to denote phenylalanine mutants e.g. AsubscriptF, AsuperscriptF and AF. Check and use one format throughout.

      Done

      4) Figures showing gels include the label '+EtBr, +cipro'. This is somewhat confusing because EtBr was contained in the gel (not the samples), whereas cipro was included in the reaction. Modify or describe in the legend.

      We have re-written the figure legend.

      5) Supplementary Fig 4b describes a small effect on the ratio of linear to nicked DNA for the triple LLL mutant. Is this significant? How many times was the measurement made?

      This was addressed in the original manuscript, in the supplementary data. In terms of quantification, the experiment was done three times for each preparation, with the same GyrB preparation and concentration. The standard error is displayed on the figure. This result is very reproducible and has been reproduced more than three times. No LLL cleavage assay showed more single-strand than double-strand cleavage, and no cleavage assay with the phenylalanine mutant showed more double-strand than single-strand cleavage.

      6) Supplementary Fig 5 legend. Should 'L' read 'size markers' (and give their sizes)?

      Yes indeed, we have modified the figure to clarify.

      7) p11 line 5. Is this statement correct?

      Yes, it is correct, assuming we are referring to the same line: when the tyrosine is mutated on only one side of the heterodimer, both single- and double-strand cleavage are protected from T5 exonuclease digestion.

      8) p12, last line, should read ...and supercoiling activity (not shown)..were

      Thank you, done.

      There are a number of typos throughout the text, for example:

      Page 3 line..Difficult to conclude...what?

      Page 3 para 3...Lopez....and Blazquez

      We have corrected these typos and checked the whole manuscript.

      Reviewer #2 (Public Review):

      DNA gyrase is an essential enzyme in bacteria that regulates DNA topology and has the unique property of introducing negative supercoils into DNA. This enzyme contains two subunits, GyrA and GyrB, which form an A2B2 heterotetramer that associates with DNA and hydrolyzes ATP. The molecular structure of the A2B2 assembly comprises three dimeric interfaces, called gates, which allow the cleavage and transport of double-stranded DNA molecules through the gates in order to simplify DNA topology. The article by Germe et al. questions the existence and possible mechanism of subunit exchange in the bacterial DNA gyrase complex.

      The complexes are purified as a dimer of GyrA and a fusion of GyrB and GyrA (GyrBA), encoded by different plasmids, to allow the introduction of targeted mutations on one side only of the complex. The conclusion drawn by the authors is that subunit exchange does happen, favored by DNA binding and wrapping. They propose that the accumulation of gyrase in higher-order oligomers can favor rapid subunit exchange between two active gyrase complexes brought into proximity.

      The authors also debate the conclusions of a previous article by Gubaev, Weidlich et al. 2016 (https://doi.org/10.1093/nar/gkw740). Gubaev et al. originally used this strategy of complex reconstitution to propose a nicking-closing mechanism for the introduction of negative supercoils by DNA gyrase, an alternative mechanism that does not involve the DNA strand passage previously established in the field. Germe et al. propose that, in this earlier study, potential subunit swapping of the recombinant protein with the endogenous enzyme would be responsible for the detected negative supercoiling activity.

      Accordingly, the authors also conclude that they cannot completely exclude the presence of endogenous subunits in their samples as well.

      Strengths

      The mixing of gyrase subunits is plausible; this mechanism has been suggested by Ikeda et al., 2004, and also for the human Top2 isoforms, with Top2a/Top2b hybrids identified in HeLa cells (doi: 10.1073/pnas.93.16.8288).

      Germe et al. have used extensive and solid biochemical experiments, together with thorough experimental controls, involving:

      • the purification of gyrase subunits including mutants with domain deletion, subunit fusion or point mutations.

      • DNA relaxation, cleavage and supercoiling assays

      • biophysical characterization in solution (size exclusion chromatography, mass photometry, mass spectrometry)

      Together the combination of experimental approaches provides solid evidence for subunit swapping in gyrase in vitro, despite the technical limitations of standard biochemistry applied to such a complex macromolecule.

      We thank the reviewer for their supportive and considered comments.

      Weaknesses

      The conclusions of this study could be strengthened by in vivo data identifying subunit swapping in bacteria, as proposed by Ikeda et al., 2004. Indeed, if shown in vivo, together with this biochemical evidence, this mechanism could have a substantial impact on our understanding of bacterial physiology and resistance to drugs.

      Thank you for this comment. Indeed, whether this interface exchange can happen in vivo and lead to recombination is a very important question. However, we believe that this is outside the scope of this study simply because of the amount of work one can fit into one paper. Proving that interface exchange can happen in vitro has already necessitated a number of non-trivial experiments and likewise investigating interface exchange in vivo will require a careful, long-term study (see our reply to reviewer #2 comment, who also raised this point). We can’t address it with one additional experiment with the tools we have. However, we very much hope to do it in the future.

      Reviewer #2 (Recommendations For The Authors):

      Specific questions and comments for the authors:

      1) Complex identification during purification

      The statement in lines 236-237 that "Our heterodimer preparation showed a single-peak on a gel-filtration column, distinct from the GyrA dimer peak" is not entirely clear. In Supp Fig 1b, how can the authors conclude from the Superose 6 profile that GyrBA is separated from the GyrA dimer? Since they seem close in size (160/180 kDa), they are unlikely to be well separated on a Superose 6 gel filtration column. The SDS-PAGE seems to show both species in the same fractions (#15-17); therefore it would not be possible to distinguish GyrBA.A from A2.

      There appears to be some confusion about what Supp Fig. 1b shows. First, under all our gel filtration conditions, neither GyrBA nor GyrA can exist as a monomer at a significant concentration. Therefore, we never observe the GyrBA monomer on a gel filtration column. Supp Fig. 1b shows the gel filtration profile of the BA.A heterodimer only. This is the output of the last, polishing step of the purification. We analyze these results using SDS-PAGE, in which the BA.A heterodimer is denatured and separated into two polypeptides, GyrBA and GyrA, which migrate according to their size and form two bands. These two bands do not represent two separate species in solution; they represent the separation of one species only, the BA.A heterodimer, into its two denatured subunits, GyrA and GyrBA. We do not conclude from Supp Fig. 1 as a whole that GyrBA and the GyrA dimer are well separated, and this is not stated in the manuscript. We conclude that the BA.A dimer is fairly well separated from the GyrA dimer. They have significantly different sizes (~260 kDa and ~180 kDa, respectively) and form different peaks on a gel filtration column. The BA.A heterodimer has a GyrA subunit and therefore shows a GyrA band on SDS-PAGE, like the GyrA dimer, but the two are clearly distinct in their quaternary structure. We hope that our new schematics and the rewriting of some of the Results and figure legends will clarify this.

      Panel 6 shows a different elution volume for the 2 species BA.A and A2 on an analytical S200 column, which appears better at separating the complexes in this size range.

      Did the authors consider using an S200 column instead of a Superose 6 for the sample preparation, to optimize the separation of GyrBA.A from A2?

      This is not necessarily true (see above). We have not run the GyrA dimer on a Superose 6 column. The analysis was done on an S200 column because extensive data for the GyrA dimer were already available on this already-calibrated column. We do not expect the Superose 6 to be worse in this size range; in fact, it might even be better. The Superose 6 profile in Supp. Fig. 1b shows BA.A only and no GyrA dimer. We have clarified the annotations in the figure to make this clearer.

      Regarding the analytical gel filtration experiment, there is, however, an overlap in the elution volumes on the analytical column; how, therefore, can the authors ensure that there is no excess free A2 complex in the GyrBA.A sample?

      Indeed, there is an overlap, but we argue that its importance is overstated. What matters is where the maximum of the GyrA peak is positioned relative to the BA.A trace, not where the traces intersect, and this overlap is minimal. If a contaminating GyrA peak were hidden in the BA.A peak, it would have to be at least 10 times less intense than the BA.A peak. Since BA.A and the GyrA dimer have roughly the same extinction coefficient, this means that a contamination would be detectable at 10% or even less. Our mass photometry further excludes such contamination.

      Alternatively, the addition of a larger (cleavable) tag at the C-terminal end of the BA construct (thereby not disturbing dimer association) could allow the two populations to be better distinguished already at the size exclusion step.

      This is true and could allow cleaner purification. There are also other ways to achieve cleaner purification, like adding a secondary tag. However, like we argue in the manuscript, our contaminations are already minimal. It is questionable what benefits could be gained in changing the protocol. We also argue that the tandem tag method does not completely exclude contamination (Supplementary Discussion) and therefore we are not sure if this would be worth the time and expenditure.

      2) GyrA and GyrB Oligomers:

      In the mass photometry experiment, the authors explain that the low concentration of the proteins promotes dissociation of GyrA dimers, hence the detection of GyrA monomers instead of GyrA dimers, which are also detected in the GyrBA.A sample.

      However, it cannot be concluded that the GyrA dimer is not formed in the condition of the gel filtration chromatography, at higher concentration.

      In our mass photometry experiment, the BA.A sample is not as diluted as the GyrA dimer sample and is much closer to our experimental conditions. Since we have calculated the dissociation constant, we can calculate the expected level of dissociation (or reassociation), which is minimal under these conditions. If some dissociation of the BA.A heterodimers is expected, a very low amount of GyrBA monomer should also be present, and yet it is not observed. We presume that this is because mass photometry is much more sensitive to GyrA (see the mixing mass photometry experiment that we have added). If GyrA reassociated at higher concentration, it would do so either with itself (forming a GyrA dimer) or with the GyrBA monomer (reforming the heterodimer). Assuming that the GyrA dimer and the heterodimer have the same dissociation constant, roughly one third of the GyrA monomers would reassociate with each other. Even assuming complete reassociation into GyrA dimers, GyrA dimer would account for only 2% of the preparation.

      Another interpretation would be that GyrBA monomers are not present at all and that GyrA monomers reassociate only with each other. This is not valid for the following thermodynamic reason:

      Since the profiles for the GyrA dimer were collected at equilibrium, we should expect a ratio between GyrA monomers and dimers that follows the dissociation constant. In other words, if the GyrA monomer were in equilibrium with the GyrA dimer, we should expect a much higher dimer concentration, because the GyrA monomers are not as dilute. We do not observe a GyrA dimer peak in the BA.A profile, even though we can detect a low amount of GyrA dimer mixed with BA.A. Therefore, we conclude that the observed GyrA monomer must be in equilibrium with another dimerization partner, most probably the GyrBA monomer (see above). Consequently, only a minimal amount of GyrA dimer is expected to form at higher concentration by direct reassociation. This amount could probably increase if this solution-based exchange were allowed to proceed for a long time at dissociation equilibrium. We have actually shown that this solution-based exchange is very slow and takes several days because of the low level of dissociation at equilibrium.
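      The mass-action reasoning above can be illustrated with a minimal sketch. It assumes a simple two-state homodimer equilibrium, A2 <-> 2A with Kd = [A]^2/[A2], all concentrations in one consistent unit, and no kinetic effects; the function names and the Kd value in the demo are hypothetical and are not taken from the manuscript. It shows that the expected dimer-to-monomer ratio equals [A]/Kd, so a less dilute pool of purely self-associating GyrA monomers would predict a proportionally larger GyrA dimer peak.

```python
def predicted_homodimer(free_monomer: float, kd: float) -> float:
    """Equilibrium homodimer concentration for A2 <-> 2A.

    Assumes Kd = [A]^2 / [A2], all concentrations in the same units.
    """
    if kd <= 0:
        raise ValueError("kd must be positive")
    return free_monomer ** 2 / kd


def dimer_to_monomer_ratio(free_monomer: float, kd: float) -> float:
    """[A2]/[A] = [A]/Kd: grows linearly with the free monomer concentration."""
    return predicted_homodimer(free_monomer, kd) / free_monomer


if __name__ == "__main__":
    kd = 1.0  # hypothetical Kd in arbitrary units, not a measured value
    for monomer in (0.1, 1.0, 10.0):
        print(monomer, predicted_homodimer(monomer, kd), dimer_to_monomer_ratio(monomer, kd))
```

      Under these assumptions, ten times more free monomer predicts a dimer-to-monomer ratio ten times higher; this is the expectation against which the absence of a GyrA dimer peak in the BA.A profile is compared.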

      The mass spectrometry analysis in Fig 2 confirms the presence of (monomeric) GyrA in the sample, despite different experimental conditions.

      The concentration of heterodimer in the mass spectrometry experiment is actually higher than in the mass photometry experiment. This shows that self-reassociation of the GyrA monomer as suggested above is undetectable with mass spectrometry at higher concentration.

      We considered that the “GyrA monomer” peak could be a contaminating GyrB monomer, which is ~90 kDa, which would explain the lack of reassociation. However, the mass spectrometry peak shows precisely the expected molecular weight of GyrA so we interpret this peak as arising from very limited dissociation of the BA.A heterodimer. The reassociation is limited at high concentration due simply to the fact that the difference in concentration between the mass photometry and our other experimental conditions is not that high. The GyrA dimer had to be diluted 400 times to see significant dissociation and yet even at this very low concentration the dissociation is far from complete.

      Our general conclusion on the points above is that we cannot completely exclude the presence of GyrA dimers, although they are undetectable under our working conditions by mass photometry (lower concentration), mass spectrometry (higher concentration) and even gel filtration (even higher concentration, see above). For mass photometry, we have established that our detection threshold for a contamination is very low (see our mixing experiment).

      Figure 2A: the authors state in the introduction that GyrB is a monomer in solution and then explain that the upper bands in the native gel are multimer of GyrB. Could the authors comment and provide the size exclusion profile of the Gyr B purification?

      We have expanded our discussion of this. However, we have not been successful in collecting a gel filtration profile for GyrB. This is likely due to excessive oligomerization at the concentration we use for gel filtration. We suggest that our mass photometry and Blue-Native PAGE experiments show clearly that GyrB can be detected as a monomer in solution at the appropriate dilution. However, GyrB tends to oligomerize in a regular fashion (consider especially Supp Fig. 8a), which suggests that it could align heterodimers on DNA in a linear, regular orientation. We have added a discussion of this.

      Together the relevance of the oligomeric state of purified GyrA or GyrB should be clarified, relative to their role in subunit swapping.

      We have added an explanation to our Discussion, while trying not to be too speculative. Basically, we believe that GyrB oligomerization is likely to be involved. It is difficult to draw conclusions for GyrA, since no experiment has allowed us to test it; therefore, the role of GyrA oligomerization, if any, is unclear. The GyrA tetramer is, however, very prominent and forms very easily. GyrB, on the contrary, forms longer oligomers more readily than GyrA, and we surmise that this would help interface exchange. However, the structure of these GyrA and GyrB oligomers is not clear, which makes it difficult to go beyond speculation. It would be a very interesting experiment to suppress GyrB oligomerization while conserving its ability to promote strand passage and cleavage. The same applies to GyrA. Unfortunately, we are unable to do that at this time.

      4) Subunit exchange

      Line 320: the concept of subunit exchange in this context should be clearly explained. If one understands correctly, the authors mean that the BAF polypeptide, part of the BAF.A complex, could be replaced by a combination of B+A therefore forming a fully functional WT A2B2 gyrase complex.

      Thank you for the suggestion. We have harmonized and clearly defined our terminology for interface swapping and subunit exchange in the introduction and attempted to be much more rigorous when referring to it.

      A great effort has been done in this study to explain all the pros and cons of the experimental design but the length of the explanations may prevent readers outside of the field to fully appreciate the conclusions. This article would benefit from the addition of a few schematics to summarize the working hypothesis.

      Thanks for the suggestion. We have added a series of schematics to illustrate our interpretation for each construct. As mentioned above the terminology has been more rigorously defined and updated throughout the manuscript.

      5) Presence of endogenous GyrA

      Line 419-425: it is quite difficult to follow the explanations regarding the possible contamination of the sample by endogenous GyrA.

      Maybe these points should rather be addressed in the discussion, when debating the conclusions of Gubaev et al.

      We agree. We have re-organized the Discussion doing just that. We added a Supplementary Discussion in which we further discuss the contamination problem in relation to (Gubaev, Weidlich et al. 2016).

      Production of the subunits in another (non-bacterial) expression system or a cell-free system may prevent the association of endogenous protein.

      Absolutely. We are planning on addressing this in the future, using the yeast expression system.

      6) Mechanism for subunit swapping

      Lines 588-595: As described by the authors the BA fusion shows decreased activity when compared with the WT probably due to limited conformational flexibility in absence of an additional linker sequence between the fused subunits.

      The affinity of BA for A may possibly be reduced compared to the free A2B2 complex, due to a relative stiffness of the fusion upon full association with a free B subunit, as rightfully pointed by the authors.

      If subunit exchange does happen in vitro, at least under the conditions of this study, the authors could assess the affinity of BA for A compared with the association of free B and A subunits.

      Experiments using analytical ultracentrifugation or surface plasmon resonance (SPR) may allow to determine the relative affinity of the BA +(A+B) compared to the A2B2 complex. This could be done also for the BALLL mutant and association with A59.

      It would be extremely useful to measure the affinity of BA for A. However, this is difficult because of the high affinity of the interface. To measure a dissociation constant, one has to be able to measure the concentrations of the monomer and the dimer at equilibrium. The complex must therefore be diluted enough to see any dissociation, which makes detection difficult. In practice, this also means that we cannot purify monomeric versions of these subunits. We therefore can’t perform “on-rate” studies by SPR, which would require flowing monomers over the partner subunit tethered to the SPR surface. We could, however, perform “off-rate” studies, but the dissociation time is likely to be very long, making the measurement difficult. We have not tried this, and it could turn out to be informative; past analyses of antibody off-rates could provide a guideline for such an experiment. Analytical ultracentrifugation is an excellent technique and could in theory provide information. In practice, however, it would still be necessary to dilute the complex enough to obtain significant dissociation at equilibrium, making detection difficult. As far as we are aware, analytical ultracentrifugation relies on UV absorbance for protein detection, and therefore we probably would not detect our material at the necessary dilution. We are, however, open to techniques with very sensitive detection methods that could be used.
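      Why strong dissociation requires such dilution can be seen from a simple 1:1 binding model. The sketch below is illustrative only and uses no values from the manuscript: it assumes a pure heterodimer AB at total concentration C dissociating into A + B with Kd = [A][B]/[AB], and the function name and demo concentrations are hypothetical. Solving Kd = x^2/(C - x) for the free-monomer concentration x shows that, when C is far above Kd, only about sqrt(Kd/C) of the complex dissociates.

```python
import math


def fraction_dissociated(total_dimer: float, kd: float) -> float:
    """Equilibrium fraction of heterodimer AB dissociated into A + B.

    Assumes a pure AB sample at total concentration C = total_dimer and
    Kd = [A][B] / [AB]. With x = [A] = [B], Kd = x**2 / (C - x), i.e.
    x**2 + Kd*x - Kd*C = 0. The positive root is written as
    x = 2*Kd*C / (Kd + sqrt(Kd**2 + 4*Kd*C)) to avoid cancellation error.
    """
    if total_dimer <= 0 or kd <= 0:
        raise ValueError("total_dimer and kd must be positive")
    root = math.sqrt(kd * kd + 4.0 * kd * total_dimer)
    free_monomer = 2.0 * kd * total_dimer / (kd + root)
    return free_monomer / total_dimer


if __name__ == "__main__":
    kd = 1.0  # hypothetical Kd in arbitrary units
    for multiple in (1000.0, 100.0, 10.0, 1.0, 0.1):
        frac = fraction_dissociated(multiple * kd, kd)
        print(f"total = {multiple:g} x Kd -> {frac:.3f} dissociated")
```

      In this model, a complex held at 1000 times its Kd is only about 3% dissociated, whereas at a concentration equal to Kd about 62% is dissociated, which is why equilibrium measurements must be made at concentrations where detection becomes the limiting factor.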

      9) In vivo relevance

      The study does not reach a conclusion on subunit exchange in vivo, which has been suggested by earlier studies by Ikeda et al. To elaborate further on the relevance of such a mechanism in bacteria, experiments involving the fluorescent labeling of endogenous / exogenous mutant subunits may be required to provide further information on this phenomenon.

      We completely agree that the in vivo relevance of such phenomena is the central question. Addressing this directly is not trivial, though. Expressing both BA and A in vivo will result in random partnering and lead to a mix of dimers: A2 (1/4), BA2 (1/4) and BA.A (1/2), assuming equal interface affinity. Therefore, to see subunit exchange in the same way as in vitro, one would have to get rid of both the BA2 and A2 dimers, or of the BA.A dimer only. Our initial strategy would be to engineer a specific dimer so that it is uniquely targeted for degradation. This could allow us to remove, for instance, the BA.A dimer. We would then turn off both degradation and translation and observe the rate of subunit exchange. This is not trivial, though, and would be the subject of a further study.
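      The 1/4 : 1/4 : 1/2 split quoted above follows from treating dimerization as random pairing of monomers. The sketch below assumes equal interface affinities and takes p as the fraction of monomers that are BA, so equal expression of BA and A corresponds to p = 1/2; the function name is hypothetical, and exact fractions are used so that the result can be compared directly with the numbers in the text.

```python
from __future__ import annotations

from fractions import Fraction


def dimer_distribution(fraction_ba: Fraction) -> dict[str, Fraction]:
    """Expected dimer mix if BA and A monomers pair at random.

    Assumes every interface has the same affinity, so each dimer forms by
    drawing two monomers independently: A2 = q^2, BA2 = p^2, BA.A = 2pq,
    where p is the fraction of BA monomers and q = 1 - p.
    """
    if not 0 <= fraction_ba <= 1:
        raise ValueError("fraction_ba must lie between 0 and 1")
    p = fraction_ba
    q = 1 - p
    return {"A2": q * q, "BA2": p * p, "BA.A": 2 * p * q}


if __name__ == "__main__":
    # Equal expression of BA and A gives the 1/4 : 1/4 : 1/2 split.
    print(dimer_distribution(Fraction(1, 2)))
```

      Under these assumptions, any imbalance in expression shifts the mix toward one homodimer, but the heterodimer can never exceed half of all dimers, which is why selectively removing unwanted dimers would be needed to follow exchange in vivo.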

      10) Figure 3: I guess the "intact" label refers to the supercoiled DNA (SC) ? It also appears as "uncleaved" in supp Figure 6. The same label for this topoisomer should be used throughout.

      Thank you for pointing that out. It has now been corrected.

      Bandak, A. F., T. R. Blower, K. C. Nitiss, R. Gupta, A. Y. Lau, R. Guha, J. L. Nitiss and J. M. Berger (2023). "Naturally mutagenic sequence diversity in a human type II topoisomerase." Proceedings of the National Academy of Sciences 120(28).

      Germe, T., J. Voros, F. Jeannot, T. Taillier, R. A. Stavenger, E. Bacque, A. Maxwell and B. D. Bax (2018). "A new class of antibacterials, the imidazopyrazinones, reveal structural transitions involved in DNA gyrase poisoning and mechanisms of resistance." Nucleic Acids Res.

      Gubaev, A., D. Weidlich and D. Klostermeier (2016). "DNA gyrase with a single catalytic tyrosine can catalyze DNA supercoiling by a nicking-closing mechanism." Nucleic Acids Res 44(21): 10354-10366.

      Hartmann, S., A. Gubaev and D. Klostermeier (2017). "Binding and Hydrolysis of a Single ATP Is Sufficient for N-Gate Closure and DNA Supercoiling by Gyrase." J Mol Biol 429(23): 3717-3729.

      Shuman, S., E. M. Kane and S. G. Morham (1989). "Mapping the active-site tyrosine of vaccinia virus DNA topoisomerase I." Proc Natl Acad Sci U S A 86(24): 9793-9797.

      Stelljes, J. T., D. Weidlich, A. Gubaev and D. Klostermeier (2018). "Gyrase containing a single C-terminal domain catalyzes negative supercoiling of DNA by decreasing the linking number in steps of two." Nucleic Acids Res.

    1. Author Response

      Reviewer #3 (Public Review):

      Strengths:

      NanoPDLIM2, a nanotechnology that efficiently delivers lentivirus, overcomes resistance to chemotherapy and anti-PD-1 immunotherapy. This is a new strategy for enhancing the efficacy of immune checkpoint inhibitors.

      This finding is important from a clinical translation perspective, but I have several minor concerns.

      Weaknesses:

      1) Please describe the mechanism of increased MHC class I and PD-L1 by PDLIM2.

      Our previous studies showed that PDLIM2 induces MHC-I expression by decreasing STAT3, whereas it is dispensable for PD-L1 expression (Sun et al, 2019, PMID: 31757943). In line with these studies, PD-L1 is induced by chemotherapeutic drugs, but not by NanoPDLIM2 (Figure 6A). Together with the roles of PDLIM2 in repressing RelA-dependent MDR1 induction by chemotherapy and in preventing expression of cell survival and proliferation genes by targeting both RelA and STAT3 (Sun et al, 2019, PMID: 31757943), these findings provide the mechanistic basis for the combination and synergistic effect of NanoPDLIM2, anti-PD-1 and chemotherapeutic drugs. This explanation has now been incorporated into the revised manuscript.

      2) Please describe the mechanism of decreased MDR1, nuclear RelA and STAT3 by PDLIM2.

      Our previous studies demonstrated that PDLIM2 reduces MDR1 expression by degrading nuclear RelA (Sun et al, 2019, PMID: 31757943).

      3) Does PDLIM2 expression directly affect immune cells (function and number)?

      As shown in Figure 5, nanoPDLIM2 increased the number and activation of tumor-infiltrating lymphocytes (TILs). In a prior study, PDLIM2 knockout reduced the numbers of TILs and inhibited the activation of CD4+ and CD8+ T cells, whereas its re-expression in lung tumors led to T cell activation (Sun et al. 2019, PMID: 31757943). On the other hand, selective deletion of PDLIM2 in immune cells, and in particular in myeloid cells, reduced the numbers and activation of TILs (Li et al, 2021, PMID: 33539325; PMCID: PMC8021114). Thus, PDLIM2 may affect immune cells both directly and indirectly, particularly because nanoparticles can deliver PDLIM2 into both tumor cells and tumor-associated immune cells (although PDLIM2 is delivered into far fewer immune cells than tumor cells).

      4) What is the efficiency of PDLIM2 delivery? Does delivery efficiency determine anti-tumor effect?

      As shown in the manuscript, the dose of nanoPDLIM2 used already achieves high delivery (20-30 copies per tumor cell in Figure 3B) and therapeutic efficacy in the mouse model of refractory lung cancer, particularly when combined with anti-PD-1 and chemotherapeutic drugs. Testing different doses in this model to identify the best delivery and efficacy is of interest and is actively being pursued in our lab.

      5) Authors used a non-immunogenic tumor model. Can you demonstrate the combination effect with PDLIM2 in immunogenic lung cancer models to determine whether the combination of PDLIM2 with anti-PD-1 Ab confers a synergistic effect without chemotherapy?

      Yes, it would be of interest to test the combination of PDLIM2 and anti-PD-1 with chemotherapy in immunogenic lung cancer models, although a synergy is highly expected. The greatest challenge in the lung cancer field is the low response of non-immunogenic tumors, which is the focus of the current manuscript.

      6) On page 11, % change can make one over-interpret data.

      The % change has been removed from the manuscript.

      7) In Figure 5, what is the difference between 5A and 5D?

      Figure 5A shows the increase in TILs induced by nanoPDLIM2 in animals that did not receive PD-1 blockade immunotherapy, whereas Figure 5D shows the increase in TILs induced by nanoPDLIM2 in animals that received PD-1 blockade immunotherapy.

      8) It is unclear whether PDLIM2 confers an additive or a synergistic effect with anti-PD-1/chemo.

      PDLIM2 nanotherapy has (i) a synergistic effect with chemotherapy on increasing apoptosis in tumors (Figure 4B) and on tumor reduction (Figures 4A and 6E, left panel, tumor number); (ii) a synergistic effect with anti-PD-1 on increasing CD4+ and CD8+ TILs (Figures 5A and 5D) and apoptosis in tumors (Figure 5F), and an additive effect on tumor reduction (Figures 5C and 6E); and (iii) a synergistic effect with chemotherapy plus anti-PD-1 on increasing CD4+ and CD8+ TILs (Figures 5A and 6F) and on tumor reduction (Figure 6E, left panel, tumor number).
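      The response distinguishes synergistic from additive effects but does not state the criterion used. One widely used convention is Bliss independence, which is assumed here purely for illustration; the manuscript may rely on a different definition. If two treatments with fractional effects E_A and E_B (for example, the fraction of tumor burden removed, on a 0 to 1 scale) act independently, the expected combined effect is E_A + E_B - E_A*E_B. An observed combined effect clearly above that value suggests synergy. The function names and the tolerance below are hypothetical.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional effect of A + B if they act independently (Bliss)."""
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("effects must be fractions between 0 and 1")
    return effect_a + effect_b - effect_a * effect_b


def classify_combination(effect_a: float, effect_b: float,
                         effect_combo: float, tol: float = 0.05) -> str:
    """Label a combination by its excess over the Bliss expectation.

    'additive' here means consistent with Bliss independence; the
    tolerance `tol` is an arbitrary illustrative threshold, not a test.
    """
    excess = effect_combo - bliss_expected(effect_a, effect_b)
    if excess > tol:
        return "synergistic"
    if excess < -tol:
        return "antagonistic"
    return "additive"


# Hypothetical single-agent effects 0.3 and 0.4 give an expected ~0.58
print(classify_combination(0.3, 0.4, 0.8))  # prints "synergistic"
```

In practice, the fixed tolerance would be replaced by a statistical comparison across replicate animals.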

      9) Have the authors tested any toxicity in normal lungs?

      As in tumor-bearing lungs, no obvious toxicity has been observed in normal lungs.

      Reviewer #1 (Recommendations For The Authors):

      The paper is clear and well-written, although some minor edits are needed. For example, the title could be changed to reflect both the human and mouse studies in the manuscript for more general readers. Moreover, 'lung cancer' should be used instead of 'lung cancers'. The manuscript could be further improved by validating the findings in a different model, particularly a syngeneic model of metastatic lung cancer, to better assess overall survival with the new combination therapy, given that clinical trials usually start in patients with metastatic tumors. But this is optional because the therapeutic effect on primary lung cancer is already significant.

      Thanks for the correction and wonderful suggestions. The “lung cancers” were replaced with “lung cancer”, and the title was changed to “Improving PD-1 blockade plus chemotherapy for complete remission of lung cancer by nanoPDLIM2”.

      Reviewer #2 (Recommendations For The Authors):

      1) What is the rationale for i.v. injection of nanoparticles containing PDLIM2 plasmid? Intranasal administration of nanoparticles may potentially target nanoPDLIM2 specifically to the lungs. Another potential option is intranasal infection of mice with adenovirus expressing PDLIM2.

      The rationale for i.v. injection is that i.v.-injected nanoPDLIM2 first reaches the lung and, more importantly, the tumor tissues; i.v. injection in mice is also convenient and highly effective, particularly when multiple injections are needed. It is also much less stressful for mice than intranasal or intratracheal administration. Adenovirus can be used only once, because it initiates an anti-viral immune response in mice.

      2) The authors examine PDLIM2 expression in lung tumors 1 week after i.v. administration of nanoparticles (Fig. 3A). Do all tumor cells express PDLIM2 after nanoPDLIM2 treatment? How long does PDLIM2 persist in the tumors? The kinetics of PDLIM2 expression may be informative to help interpret the results from the various combination treatments given to the mice. Multiple rounds of nanoPDLIM2 treatment could potentially improve the efficacy of the treatment.

      In all the sections examined (n=6), PDLIM2 was re-expressed in most but not all lung cancer cells 1 week after i.v. administration. Accordingly, nanoPDLIM2 was injected weekly. We are examining whether PDLIM2 re-expression can last longer, and we are also testing which dose gives the best efficacy.

      3) Does the plasmid DNA from nanoparticles trigger an innate immune response in the lung that contributes to anti-tumor responses?

      In line with previous studies showing no effect on immune responses (Bonnet et al. 2008. PMID: 18709489), the dose used in the current study does not significantly affect immune cells in the lung, suggesting that nanoparticles carrying empty plasmid have no obvious effect on the innate immune response.

      4) In Fig. 4, does the combination of nanoPDLIM2 and chemotherapy diminish STAT3 nuclear staining?

      NanoPDLIM2 alone decreased nuclear STAT3 in tumor cells (Figure 2C), and it also diminished nuclear STAT3 in tumor cells when combined with chemotherapy.

    1. Author Response

      We thank you for your careful review of our manuscript and your helpful comments and suggestions. We have carefully considered each point and addressed them by making changes to the manuscript and figures. The text below details our responses and edits.

      Reviewer #1 (Public Review):

      Summary:

      Liao et al leveraged two powerful genomics techniques-CUT&RUN and RNA sequencing-to identify genomic regions bound by and activated or inactivated by SMAD1, SMAD5, and the progesterone receptor during endometrial stromal cell decidualization.

      Strengths:

      The authors utilized powerful next generation sequencing and identified important transcriptional mechanisms of SMAD1/5 and PGR during decidualization in vivo.

      Weaknesses:

      Overall, the manuscript and study are well structured and provide critical mechanistic updates on the roles of SMAD1/5 in decidualization and preparation of the maternal endometrium for pregnancy. Please consider the following to improve the manuscript:

      • Figure 4: A and C show bar graphs, not histograms. Please alter this phrasing.

      Figure legends were adjusted as suggested.

      • What post hoc test was performed on the qPCR analyses (Figure 6)? The wide dispersion in experimental responses suggests that the assumption of equal variance does not hold, which would invalidate a one-way ANOVA.

      Yes, Tukey’s post hoc test was performed on the qPCR analyses. To address the reviewer’s question regarding equal variance, the normality of the dataset was examined with the D’Agostino & Pearson test in GraphPad Prism. The data showed a normal distribution, thus justifying the use of one-way ANOVA.
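      Normality and equal variance are separate assumptions of a one-way ANOVA. A normality test therefore does not by itself address homogeneity of variance. As an illustration only, and not the analysis reported in the manuscript, the sketch below computes the Brown-Forsythe statistic, which is a median-centred form of Levene's test for equal variances. The resulting W is compared against an F distribution with k - 1 and N - k degrees of freedom. That p-value step is assumed to be done in a statistics package and is omitted here. The group values are hypothetical.

```python
from statistics import fmean, median


def brown_forsythe(*groups):
    """Brown-Forsythe statistic W (Levene's test centred on group medians).

    Larger W means more evidence that group variances differ; W is compared
    with an F(k - 1, N - k) distribution to obtain a p-value (not done here).
    """
    if len(groups) < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two or more values each")
    # Absolute deviation of every observation from its own group median
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    group_means = [fmean(d) for d in devs]
    grand_mean = fmean([v for d in devs for v in d])
    # One-way ANOVA F statistic computed on the absolute deviations
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    within = sum((v - m) ** 2 for d, m in zip(devs, group_means) for v in d)
    if within == 0:
        raise ValueError("deviations are constant within every group")
    return (n_total - k) / (k - 1) * between / within


# Hypothetical relative-expression values for three treatment groups
print(brown_forsythe([1.0, 1.1, 0.9], [2.0, 2.6, 1.4], [0.5, 3.5, 2.0]))
```

If variances do differ, Welch's ANOVA followed by Games-Howell comparisons is a common alternative to an ordinary ANOVA with Tukey's test.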

      • Figure 6: what data points are plotted? Are these technical replicates from individual wells or qPCR technical replicates?

      The dataset represents three technical and three biological data points.

      • Figure 6: Consider changing graph colors to increase visibility of error bars and data points.

      Thank you for this suggestion. The colors of the error bars in Figure 6 have been changed to increase visibility. Additionally, different shapes have been utilized to distinguish between different groups.

      • Figure 6 legend: no histograms are shown in this figure. Refer to all gene names utilizing proper nomenclature and conventions (gene names should be italicized).

      The legend was adjusted as suggested with the correct nomenclature implemented.

      • qPCR analyses: qPCR normalization should be done to at least two internal control genes, preferably three according to the MIQE guidelines (PMID: 19246619).

      As suggested, we have performed additional qPCR analysis with normalization to three internal control genes.
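      Normalizing to several internal control genes is commonly done with the geometric mean of their relative quantities, as in the geNorm approach. The sketch below assumes that all assays share the same amplification efficiency (a factor of 2 per cycle by default). Under that assumption, the geometric mean of E^-Cq over the reference genes equals E raised to minus the arithmetic mean of their Cq values, so the delta-delta-Cq calculation can use averaged reference Cq values. The Cq values are hypothetical, and the manuscript's exact normalization formula is not stated in this response.

```python
from statistics import fmean


def fold_change(cq_target, cq_refs, cq_target_ctrl, cq_refs_ctrl, efficiency=2.0):
    """Relative expression (treated vs control) normalized to several
    reference genes via the delta-delta-Cq method.

    Assumes every assay has the same amplification efficiency, so the
    geometric mean of efficiency**-Cq over the reference genes equals
    efficiency**-(mean Cq); averaging reference Cq values is therefore
    equivalent to geometric-mean normalization.
    """
    if not cq_refs or not cq_refs_ctrl:
        raise ValueError("at least one reference gene Cq is required")
    delta_treated = cq_target - fmean(cq_refs)
    delta_control = cq_target_ctrl - fmean(cq_refs_ctrl)
    return efficiency ** -(delta_treated - delta_control)


# Hypothetical Cq values: target gene plus three internal control genes
print(fold_change(24.0, [18.0, 20.0, 22.0], 26.0, [18.0, 20.0, 22.0]))  # prints 4.0
```

When assay efficiencies differ, each gene's relative quantity is usually computed with its own efficiency before taking the geometric mean.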

      • Supplement figure 2: graphs are bar graphs, not histograms.

      The legends have been changed as suggested.

      Reviewer #2 (Public Review):

      Summary:

      Liao and colleagues generated tagged SMAD1 and SMAD5 mouse models and identified genome occupancy of these two factors in the uterus of these mice using the CUT&RUN assay. The authors used integrative bioinformatic approaches to identify putative SMAD1/5 direct downstream target genes and to catalog the SMAD1/5 and PGR genome co-localization pattern. The role of SMAD1/5 on stromal decidualization was assayed in vitro on primary human endometrial stromal cells. The new mouse models offer opportunities to further dissect SMAD1 and SMAD5 functions without the limitations of SMAD antibodies, which is significant. The CUT&RUN data further support the usefulness of these mouse models for this purpose.

      Strengths:

      The strength of this study is the novelty of new mouse models and the valuable cistromic data derived from these mice.

      Weaknesses:

      The weaknesses of the present version of the manuscript include self-limited data analysis approaches, such as the proximal promoter-based bioinformatic filter, and a missed opportunity to investigate the role of SMAD1/5 in determining the genome occupancy of major uterine transcription regulators.

      Thank you for the comments. We addressed the limitation of the promoter-based analysis in the discussion and pointed out the possibility of analyzing additional genomic features (Lines 548-551). Based on the suggestions, we also included an analysis comparing SMAD1/5 binding activities in this study with those of known major uterine transcription regulators (namely, SOX17 and NR2F2), using published ChIP-seq data from the mouse uterus. Results from this analysis are discussed in Lines 426-436. Content from the adjusted manuscript is copied below.

      Lines 548-551:

      “From pathway enrichment analysis, we demonstrate that genes with SMAD1/5 and PR bound at the promoter regions are enriched for key pathways in directing the decidualization process, such as WNT and relaxin signaling pathways. Future studies can benefit from analyzing binding events beyond the promoter regions.”

      Lines 426-436:

      “To further evaluate the key roles of SMAD1/5 as major uterine transcription regulators, we cross-compared the genomic binding sites of SMAD1/5 with known key transcription factors, namely aforementioned SOX17 (Supplement Figure 1E), as well as NR2F2 (Supplement Figure 1F), an essential regulator of hormonal response, using our CUT&RUN data sets and published mouse uterine SOX17 and NR2F2 ChIP-seq data sets (GSE118328, GSE232583). Among the annotated genes, 5402 genes are shared between SMAD1/5 and SOX17, and 1922 genes are shared between SMAD1/5 and NR2F2. Such observations indicate a potential co-regulatory mechanism between SMAD1/5 and other key uterine transcription factors in maintaining appropriate uterine functions. Overall, our analyses demonstrate that the transcriptional activity of SMAD1, SMAD5, and PR coordinate the expression of key genes required for endometrial receptivity and decidualization.”
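      The shared-gene counts above come from intersecting the gene sets annotated to each factor's peaks. The following is a minimal sketch with hypothetical gene lists ("GeneX" is a placeholder). The hypergeometric tail probability is an optional addition that is not reported in the manuscript. It asks whether an overlap is larger than expected by chance, given a background gene universe whose size is an assumption the analyst must justify.

```python
from math import comb


def shared_genes(genes_a, genes_b):
    """Genes annotated to peaks of both factors (duplicates ignored)."""
    return set(genes_a) & set(genes_b)


def overlap_p_value(n_shared, n_universe, n_a, n_b):
    """One-sided hypergeometric P(overlap >= n_shared).

    Models drawing the n_b genes of set B at random from a universe of
    n_universe genes, n_a of which belong to set A. The universe size is
    an analyst's assumption (e.g. all annotatable genes).
    """
    if not (0 <= n_a <= n_universe and 0 <= n_b <= n_universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    total = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, i) * comb(n_universe - n_a, n_b - i)
               for i in range(n_shared, upper + 1))
    return tail / total


# Hypothetical gene lists; symbols reused from the text for readability
smad = ["Id2", "Sox17", "Runx1", "Foxo1", "Tgfbr2"]
other = ["Sox17", "Foxo1", "Id2", "GeneX"]
common = shared_genes(smad, other)
print(sorted(common), overlap_p_value(len(common), 20000, len(set(smad)), len(set(other))))
```

Note that the genomic-interval overlap (shared peaks at the base-pair level) is a stricter question than this gene-level overlap.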

      Reviewer #3 (Public Review):

      Summary:

      As SMAD1/5 activities have previously been indistinguishable, these studies provide a new mouse model to finally understand unique downstream activation of SMAD1/5 target genes, a model useful for many scientific fields. Using CUT&RUN analyses with gene overlap comparisons and signaling pathway analyses, specific targets for SMAD1 versus SMAD5 were compared, identified, and interpreted. These data validate previous findings showing strong evidence that SMADs directly govern critical genes required for endometrial receptivity and decidualization, including cell adhesion and vascular development. Further, SMAD targets were overlapped with progesterone receptor binding sites to identify regions of potential synergistic regulation of implantation. The authors report strong correlations between progesterone receptor and SMAD1/5 direct targets to cooperatively promote embryo implantation. Finally, the authors validated SMAD1/5 gene regulation in primary human endometrial stromal cells. These studies provide a data-rich survey of SMAD family transcription, defining its role as a governor of early pregnancy.

      Strengths:

      This manuscript provides a valuable survey of SMAD1/5 direct transcriptional events at the time of receptivity. As embryo implantation is controlled by extensive epithelial to stromal molecular crosstalk and hormonal regulation in space and time, the authors state a strong, descriptive narrative defining how SMAD1/5 plays a central role at the site of this molecular orchestration. The implementation of cutting-edge techniques and models and simple comparative analyses provide a straightforward, yet elegant manuscript.

      Although the progesterone receptor is a major regulator of early pregnancy, the authors have demonstrated clear evidence that the progesterone receptor works in concert with SMAD1/5 to molecularly regulate targets such as Sox17, Id2, Tgfbr2, Runx1, Foxo1 and more at embryo implantation. Additionally, the authors pinpoint other critical transcription factor motifs that work with SMADs and the progesterone receptor to promote early pregnancy transcriptional paradigms.

      Weaknesses:

      Although these models are a wonderful new tool for distinguishing SMAD1 versus SMAD5 downstream signaling, the importance of these factors in governing early pregnancy is not novel. Furthermore, functional validation studies are needed to confirm interactions at promoter regions. Additionally, the authors presume that all overlapped genes are shared between the progesterone receptor and SMAD1/5, yet some peak representations do not overlap. Although transcriptional activation can occur at the same time, it may not occur in the same complex. Thus, further confirmation of these transcriptional events is warranted.

      Thank you for the review; we appreciate these valuable comments. Although we used an overlap approach to investigate the gene regulatory networks between SMAD1/5 and PR at the gene level, we functionally validated the regulatory effect in an in vitro decidualization model using qPCR. We acknowledge that gene activation may not occur in the exact same complex, but functional validation screens at the promoter level are beyond the scope of this study. However, we have added a discussion of the proposed investigations in Lines 553-558. Our current dataset and validation studies support our conclusions with robust evidence. Content from Lines 553-558 is copied below.

      Lines 553-558: “In this study, we determined the overlapped transcriptional control between SMAD1/5 and PR at the gene level, and functionally validated the regulatory effect at the transcript level in a human stromal cell decidualization model. While we observe a subset of peak representations that do not overlap at the base pair level in the promoter regions, future functional screenings at the promoter level, such as luciferase reporter assays to assess transcriptional co-activation by SMAD1/5 and PR, will advance this study.”

      • Since whole murine uterus was used for these studies, the specific functions of SMAD1/5 in the stroma versus the epithelium (versus the myometrium) remain unknown. Specific roles for SMAD1/5 in the uterine stroma and epithelial compartments still need to be examined. Also, further work is needed to delineate binding and transcriptional activation of SMAD1/5 and the progesterone receptor in stromal versus epithelial uterine compartments.

      Thank you for the comments. Indeed, our study was performed in the whole mouse uterus, which includes the stroma, epithelium and myometrium. Our previous data show that nuclear SMAD1/5 are localized to both the stroma and epithelium in the decidual zone during the decidualization process at 4.5 dpc (PMID:34099644). Published in vivo studies also demonstrate the essential roles of SMAD1/5 in the uterine epithelial and stromal compartments, respectively (PMIDs:35383354/27335065/17967875). Although we believe, based on the mouse phenotypic data, that the binding and transcriptional activation of SMAD1/5 and PR occur in both compartments, we acknowledge the opportunities for further compartment-specific analysis and have added a discussion of such investigations (Lines 501-513). Content from Lines 501-513 is copied below.

      Lines 501-513:

      “Published studies have shown that nuclear SMAD1/5 localize to the stroma and epithelium during the decidualization process at 4.5dpc during the window of implantation. Conditional deletion of SMAD1/5 exclusively in the uterine epithelium using lactoferrin-icre (Ltf-icre) results in severe subfertility due to impaired implantation and decidual development. Conditional deletion of SMAD1/5/4 exclusively in the cells from mesenchymal lineage (including uterine stroma) using anti-Mullerian hormone type 2 receptor cre (Amhr2-cre) results in infertility with defective decidualization. Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that transcriptional co-regulation by SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out the potential coregulation of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future single-cell sequencing (i.e., digital cytometry) and/or spatial transcriptomic analysis.”

      • There are asynchronous gene responses in the SMAD1/5 ablated mouse model compared to the siRNA-treated human endometrial stromal cells. These differences can be confounding, and more clarity is required in understanding the meaning of these differences and as they relate to the entire SMAD transcriptome.

      Thank you for the comments. From the mouse models with SMAD1/5 conditional deletions, we observed phenotypic defects at 4.5 dpc, which is the beginning of decidualization in the mouse. Our study used human endometrial stromal cells as a model to validate our findings functionally, aiming to mimic the specific time point during decidualization. Differences between the two models may arise from the strategy used to perturb SMAD1/5; in the mouse, a complete knockout of SMAD1/5 was used, resulting in failed decidualization, while the human endometrial stromal cells used an siRNA knockdown approach, which decreased the potential for decidualization. As such, this information needs to be considered when evaluating genome-wide effects on the transcriptome. We added a discussion of this point to Lines 564-572. Content from Lines 564-572 is copied below.

      Lines 564-572:

      “Since mice only undergo decidualization upon embryo implantation whilst human stromal cells undergo cyclic decidualization in each menstrual cycle in response to rising levels of progesterone, asynchronous gene responses may occur in comparison between mouse models and human cells. However, cellular transformation during decidualization is conserved between mice and humans, which makes findings in the mouse models a valuable and transferable resource to be evaluated in human tissues. Accordingly, our functional validation studies were performed using human endometrial stromal cells induced to decidualize in vitro for four days, which models the early phases of decidualization. Additional transcriptomic studies of the SMAD1/5 perturbations in human endometrial stromal cells will be of great resource in understanding the entire SMAD1/5 regulomes in humans.”

      Reviewer #1 (Recommendations For The Authors):

      • Minor grammatical errors requiring attention such as inserting punctuation at the end of sentences and including figure legends prior to the end of sentence punctuation.

      Thanks for the comments. Additional proofreading was conducted for the revision.

      Reviewer #2 (Recommendations For The Authors):

      1) Between SMAD1 and SMAD5, does losing one SMAD affect the other SMAD's genome occupancy?

      Thanks for the comments. Mouse phenotypic data show that conditional deletion of SMAD1 in the uterus does not affect female fertility, whereas conditional deletion of SMAD5 leads to subfertility and conditional deletion of both SMAD1 and SMAD5 leads to complete infertility. Based on these data, we believe that losing one SMAD will affect the other SMAD's genome occupancy. This point is discussed in Lines 514-517, with contents copied below.

      Lines 514-517: “Although our studies herein confirm that SMAD1 and SMAD5 proteins have distinct transcriptional regulatory activities, our previous studies demonstrated that while SMAD5 can functionally replace SMAD1, SMAD1 cannot replace SMAD5 in the uterus. How this epistatic relationship is established in a tissue-specific manner still needs to be determined by further biochemical investigations.”

      2) In light of SMAD1/5 and PGR co-occupied cis-acting elements and coregulating uterine transcriptome, does loss of SMAD1/5 alter the PGR and ESR1 genome occupancy?

      Thanks for the comments. In the SMAD1/5 double conditional knockout mice, we observe hyposensitivity to progesterone and unopposed estrogen responses. We hypothesize that loss of SMAD1/5 alters PR genome occupancy and that ER genome occupancy is subsequently altered as a secondary effect. To address this question functionally, genomic profiling studies would need to be performed in the SMAD1/5 knockout mice and, ideally, also in the PR knockout mice. However, such large-scale studies are beyond the scope of the current study and will not affect our conclusions under physiological conditions. We have included additional discussion of this point in Lines 551-553, with the contents copied below.

      Lines 551-553: “Profiling the PR genome occupancy in the SMAD1/5 deficient mice would provide an interesting perspective to reevaluate the major regulatory roles of SMAD1/5 in mediating uterine transcriptomes.”

      3) In terms of investigating the impact of SMAD1/5 on cell type composition, the digital cytometry approach (e.g., PMID: 31061481) could perhaps provide unbiased inferences.

      Thank you for the comments. We included expression analysis of a subset of SMAD1/5 direct target genes over different uterine compartments (Figure 4E). We also added the discussion of the opportunities for further compartment-specific analysis, including but not limited to the digital cytometry approach in Lines 506-513, with the contents copied below.

      Lines 506-513:

      “Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that the transcriptional co-regulatory roles of SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out potential co-regulatory roles of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future single-cell sequencing (i.e., digital cytometry) and/or spatial transcriptomic analysis.”

      4) The limitation of focusing on the promoter occupied SMADs should be discussed.

      Additional discussion of the limitation of focusing on the promoter regions was added in Lines 548-551, with contents copied below.

      Lines 548-551:

      “From pathway enrichment analysis, we demonstrate that genes with SMAD1/5 and PR bound at the promoter regions are enriched for key pathways in directing the decidualization process, such as WNT and relaxin signaling pathways. Future studies can benefit from analyzing binding events beyond the promoter regions.”

      5) Methods: The reagent and the condition for PGR CUT&RUN is missing.

      Information added in Line 153.

      6) Line 260: Please clarify the statement of "suggesting the transcriptional of PR depends on BMP/SMAD1/5 signaling".

      Thanks for the suggestion. The sentence was rephrased to (Lines 258-261) “Our previous studies revealed that conditional ablation of SMAD1 and SMAD5 in the uterus decreased P4 response during the peri-implantation period, suggesting that the transcriptional activities of PR depend on BMP/SMAD1/5 signaling.”

      7) Line 280-289: This statement belongs to the discussion section.

      The statement was moved as suggested.

      8) Figure 4E is not cited in the result section.

      Figure 4E is now cited in the results section of the revised version (Line 386).

      9) Figures 3C, 3D, 3E, 3F, 5B and 5D: please include the full lists in the supplemental data so that labs with limited bioinformatic capabilities could use these findings to facilitate scientific discovery.

      Data regarding the aforementioned figures were included in Supplement Tables 3-8 and Supplement Files 1-2.

      10) Figure 2B and Figure 5A: without further grouping by common and distinct genome occupancy among the assayed factors, the heatmaps provide minimal useful information. Please reconsider the presentation format to deliver more meaningful results.

      Figure 2B and Figure 5A were replotted with clustering using the k-means algorithm. Methods and legends were updated accordingly.
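      k-means partitions rows (here, for example, genomic regions described by their signal for each assayed factor) into k groups. It does so by alternating two steps: it assigns each row to the nearest centroid, then moves each centroid to the mean of its rows. The sketch below is a plain Lloyd's-algorithm version with hypothetical signal values and a deterministic first-k-rows initialization. This response does not state the software, choice of k, or signal preprocessing used for the revised Figures 2B and 5A.

```python
from math import dist
from statistics import fmean


def kmeans(rows, k, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids).

    Initial centroids are simply the first k rows (a hypothetical,
    deterministic choice for illustration; k-means++ is more robust).
    """
    if not 0 < k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    centroids = [list(row) for row in rows[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: dist(row, centroids[c]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its member rows
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [fmean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical signal (e.g. SMAD1, SMAD5, PR columns) at four regions
signal = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 4.9]]
labels, _ = kmeans(signal, k=2)
print(labels)  # [0, 0, 1, 1]
```

Real analyses typically use k-means++ initialization with several restarts, because the result depends on the starting centroids.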

      Reviewer #3 (Recommendations For The Authors):

      To delineate specific roles for SMAD1/5 in the uterine stroma and epithelial compartments, methods such as single cell sequencing or spatial transcriptomic analysis may be warranted.

      The manuscript now includes the discussion of future opportunities in investigating the roles of SMAD1/5 in different uterine compartments using single-cell sequencing and/or spatial transcriptomic analysis (Lines 498-513), with contents copied below.

      Lines 498-513:

      “Our studies also examined the role of SMAD1/5 in mediating progesterone responses at the genomic and transcription levels. Similarly, our analysis was based on data sets generated from the whole mouse uterus, which contains multiple compartments of the uterine structures, including but not limited to epithelium and stroma. Published studies have shown that nuclear SMAD1/5 localize to the stroma and epithelium during the decidualization process at 4.5 dpc, during the window of implantation. Conditional deletion of SMAD1/5 exclusively in the uterine epithelium using lactoferrin-icre (Ltf-icre) results in severe subfertility due to impaired implantation and decidual development. Conditional deletion of SMAD1/5/4 exclusively in the cells from mesenchymal lineage (including uterine stroma) using anti-Mullerian hormone type 2 receptor cre (Amhr2-cre) results in infertility with defective decidualization. Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that the transcriptional co-regulatory roles of SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out potential co-regulatory roles of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future single-cell sequencing (i.e., digital cytometry) and/or spatial transcriptomic analysis.”

    1. Author Response

      We would like to thank the editor and the reviewers for their constructive comments and the opportunity to revise the manuscript. The suggestions have allowed us to improve our manuscript. We have addressed all reviewer comments and added new statistical analyses to examine associations for subsets of data. Although it was suggested by a reviewer, we did not perform large-scale experiments to confirm the viability of low sporozoite densities at different time points after salivary gland colonization. There are currently no satisfactory in vitro models for these assays with sporozoites harvested from single mosquitoes, and setting up and validating such experiments could be a PhD project in itself. We consider this suggestion very relevant but beyond the scope of the current work.

      Notably, while the manuscript was under review at eLife, we were able to examine the multiplicity of infection in our field experiments. As stated in the original manuscript, this was a key reason for also performing experiments in the field, where there is a greater diversity of parasite lines. We successfully performed AMA-1 amplicon deep sequencing on infected mosquito salivary glands and infected skins. Although this does not change the key messages of the manuscript and is secondary to our main hypothesis, we consider it a relevant addition, since we were able to demonstrate that some infected mosquitoes from the Burkina Faso study expelled multiple clones while probing a single piece of artificial skin. We have added a short paragraph to the revised manuscript and updated the acknowledgement section to include the supporting researcher who conducted those experiments.

      Reviewer #1 (Public Review):

      Summary: There is a long-held dogma in the malaria field: a mosquito infected with a single oocyst is as infectious to humans as a mosquito with many oocysts. This belief has been used for goal setting (and modelling) of malaria transmission-blocking interventions. While recent studies using rodent malaria suggest that the dogma may not be true, there has been no such study with human P. falciparum parasites. In this study, the numbers of oocysts and sporozoites in each mosquito, and the number of sporozoites that each infected mosquito expelled into artificial skin, were quantified individually. There was a significant correlation between the sporozoite burden in the mosquitoes and the number of expelled sporozoites. In addition, this study showed that highly infected mosquitoes expelled sporozoites sooner.

      Strengths:

      • The study was conducted using two different parasite-mosquito combinations: lab-adapted parasites with Anopheles stephensi, and parasites circulating in infected patients with An. coluzzii. Both combinations showed statistically significant correlations between the sporozoite burden in mosquitoes and the number of expelled sporozoites.

      • Usually, this type of study has been done on a group basis (e.g., counting oocysts and sporozoites at different time points using different mosquitoes from the same group). However, this study determined the numbers on an individual basis after multiple rounds of optimization and validation of the approach. This individual approach significantly increases the power of the correlation analysis.

      Weaknesses:

      • In a natural setting, most mosquitoes have fewer than 5 oocysts. Thus, the conclusion would be more convincing if the authors performed additional analysis for the key correlations (Fig 3C and 4D) excluding mosquitoes with very high total sporozoite load (e.g., more than a 5-oocyst equivalent load).

      In the revised manuscript, we have also performed our analysis including only the subset of mosquitoes with low oocyst burden. In our Burkina Faso experiments, where we could not control oocyst density, 48% (15/31) of skins were from mosquitoes with <5 oocyst sheets. Whilst low oocyst densities were thus not uncommon, we acknowledge that this may have rendered some comparisons underpowered. At the same time, we observe a strong positive trend between oocyst density and sporozoite density and between salivary gland sporozoite density and mosquito inoculum. This makes it very likely that the trend is also present at lower oocyst densities. An association in which sporozoite inoculation saturates at high densities is plausible and has been observed before for rodent malaria (DOI: 10.1371/journal.ppat.1008181), whereas we consider it less likely that sporozoite expelling would be more efficient at low (unmeasured) sporozoite densities.

      • As noted in the second limitation of the study, this study did not investigate whether all expelled sporozoites were equally infectious. For example, sporozoites expelled on Day 9 may be less infectious than those expelled on Day 11, or sporozoites expelled from high-burden mosquitoes may be less infectious because they experience low-nutrient conditions in the mosquito. Ideally, infectivity would be tested in ex vivo assays, such as hepatocyte invasion and gliding assays, at least for salivary gland sporozoites. But are there any preceding studies in which the infectivity of sporozoites from different conditions was evaluated? Citing such studies would strengthen the argument.

      We appreciate this thought and can see the value of these experiments. We are not aware of any studies that examined sporozoite viability in relation to the day of salivary gland colonization or sporozoite density.

      One previous study assessed NF54 sporozoite infectivity on different days post infection (days 12, 13, 14, 15, 16 and 18) and observed no clear differences in ‘per sporozoite hepatocyte invasion capacity’ over this period (DOI: 10.1111/cmi.12745). We nevertheless agree that it is conceivable that sporozoites require maturation in the salivary glands and might not all be equally infectious. While hepatocyte invasion experiments are conducted with bulk harvesting of all the sporozoites present in the salivary glands, it would be even more interesting to assess the invasion capacity of the smaller population of sporozoites that migrate to the proboscis to be expelled. This would, as the reviewer will appreciate, be a major endeavour. To do this well, the expelled sporozoites would need to be harvested from the salivary glands/proboscis and used in the best and most natural environment for invasion. The suggested work would thus depend on the availability of primary hepatocytes, since conventional cell lines such as HC-04 are likely to underestimate sporozoite invasion. Importantly, there are currently no opportunities to include the skin barrier in invasion assays, whilst this may be highly important in determining the likelihood that sporozoites achieve invasion and give rise to secondary infections. In short, we agree with the reviewer that these experiments are of interest but consider them well beyond the scope of the current work. We have added a passage to the Discussion to highlight these future avenues for research: ‘Of note, our assessments of EIP and of sporozoite expelling did not confirm the viability of sporozoites. Whilst the infectivity of sporozoites at different time-points post infection has been examined previously (https://doi.org/10.1111/cmi.12745), these experiments have never been conducted with individual mosquito salivary glands. To add to this complexity, such experiments would ideally use primary hepatocytes and retain the skin barrier, which may be a relevant determinant of invasion capacity.’

      • Since correlation analyses are the main points of this paper, it is important to show the 95% CI of the Spearman rank correlation coefficient (not only the p-value). By doing so, readers will understand the strengths and weaknesses of the correlations. The p-value only shows whether the observed correlation is significantly different from no correlation. In other words, if there are many data points, the p-value could be very small even if the correlation is weak.

      We appreciate this comment and agree that this is indeed insightful. We have added the 95% confidence intervals to all figure legends and the main text. We also provide them below.

      Fig 3b: 95% CI: 0.74, 0.85

      Fig 3c: 95% CI: 0.17, 0.50

      Fig 4c: 95% CI: 0.80, 0.95

      Fig 4d: 95% CI: 0.52, 0.82

      Supp Fig 5a: 95% CI: 0.74, 0.85

      Supp Fig 5b: 95% CI: 0.73, 0.93

      Supp Fig 6: 95% CI: 0.11, 0.48

      Supp Fig 7: 95% CI: -0.12, 0.16
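      The reviewer's point about reporting a 95% CI alongside the p-value can be illustrated with a short sketch. The Python below computes Spearman's rank correlation (tied values receive average ranks) and an approximate CI from the Fisher z-transformation with standard error 1/sqrt(n - 3). This is one common approximation, assumed here for illustration only: the response does not state which CI method was used, and bootstrap intervals or the Bonett-Wright standard error are alternatives. The function names and the data in the usage section are hypothetical (not the study's data); the usage also shows the kind of oocyst-cutoff subset analysis the reviewers requested.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_with_ci(x, y, level=0.95):
    """Spearman rho with an approximate CI via the Fisher z-transformation."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 paired observations")
    rho = pearson(average_ranks(x), average_ranks(y))
    if abs(rho) >= 1.0:
        raise ValueError("Fisher z is undefined for a perfect rank correlation")
    z = math.atanh(rho)
    se = 1 / math.sqrt(n - 3)  # simple large-sample standard error (assumption)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return rho, math.tanh(z - crit * se), math.tanh(z + crit * se)


if __name__ == "__main__":
    # Hypothetical paired observations, NOT data from this study.
    oocysts = [1, 2, 3, 5, 8, 12, 20, 35]
    salivary = [300, 900, 700, 2500, 4000, 3800, 9000, 15000]
    # Re-estimate the correlation within subsets below a maximum oocyst count.
    for cutoff in (10, 20, 40):
        pairs = [(o, s) for o, s in zip(oocysts, salivary) if o < cutoff]
        x, y = zip(*pairs)
        rho, lo, hi = spearman_with_ci(x, y)
        print(f"oocysts < {cutoff}: rho = {rho:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

      With small subsets the interval becomes very wide, which is exactly the information a p-value alone does not convey.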

      Reviewer #2 (Public Review):

      Summary: The malaria parasite Plasmodium develops into oocysts and sporozoites inside Anopheles mosquitoes, in a process called sporogony. Sporozoites invade the insect salivary glands in order to be transmitted during a blood meal. An important question regarding malaria transmission is whether all mosquitoes harbouring Plasmodium parasites are equally infectious. In this paper, the authors investigated the progression of P. falciparum sporozoite development in Anopheles mosquitoes, using a sensitive qPCR method to quantify sporozoites and an artificial skin system to probe for parasite expelling. They assessed the association between oocyst burden, salivary gland infection intensity, and sporozoites expelled.

      The data show that higher sporozoite loads are associated with earlier colonization of salivary glands and a higher prevalence of sporozoite-positive salivary glands and that higher salivary gland sporozoite burdens are associated with higher numbers of expelled sporozoites. Intriguingly, there is no clear association between salivary gland burdens and the prevalence of expelling, suggesting that most infections reach a sufficient threshold to allow parasite expelling during a mosquito bite. This important observation suggests that low-density gametocyte carriers, although less likely to infect mosquitoes, could nevertheless contribute to malaria transmission.

      Strengths: The paper is well written and the work is well conducted. The authors used two experimental models, one using cultured P. falciparum gametocytes and An. stephensi mosquitoes, and the other one using natural gametocyte infections in a field setup with An. coluzzii mosquitoes. Both studies gave similar results, reinforcing the validity of the observations. Parasite quantification relies on a robust and sensitive qPCR method, and parasite expelling was assessed using an innovative experimental setup based on artificial skin.

      Weaknesses: There is no clear association between the prevalence of sporozoite expelling and the parasite burden. However, high total sporozoite burdens are associated with earlier and more efficient colonization of the salivary glands, and higher salivary gland burdens are associated with higher numbers of expelled sporozoites. While these observations suggest that highly infected mosquitoes could transmit/expel parasites earlier, this is not directly addressed in the study. In addition, whether all expelled sporozoites are equally infectious is unknown. The central question, i.e. whether all infected mosquitoes are equally infectious, therefore remains open.

      We agree that the manuscript provides important steps forward in our understanding of what makes an infectious mosquito but does not conclusively demonstrate that highly infected mosquitoes are more likely to initiate a secondary infection. We consider this to be beyond the scope of the current work, although the current work lays the foundation for these important future studies. For human Plasmodium infections, the most satisfactory answer on the infectiousness of low versus highly infected mosquitoes would come from controlled human infection models. In response to reviewer comments, we have extended our Discussion section to highlight this. To accommodate the (very fair) reviewer comments, we have avoided any phrasing that suggests that our findings demonstrate differences in transmission.

      Reviewer #3 (Public Review):

      Summary: This study uses a state-of-the-art artificial skin assay to determine the quantity of P. falciparum sporozoites expelled during feeding by mosquitoes infected via standardised membrane feeding assays (SMFA), using both cultured gametocytes and natural infections. Sporozoite densities in salivary glands and expelled into the skin are quantified using a well-validated molecular assay. These studies show clear positive correlations between mosquito infection levels (as determined by oocyst numbers), sporozoite numbers in salivary glands, and sporozoites expelled during feeding. This indicates potentially significant heterogeneity in infectiousness between mosquitoes with different infection loads and thus challenges the often-made assumption that all infected mosquitoes are equally infectious.

      Strengths: Very rigorously designed studies using very well validated, state-of-the-art methods for studying malaria infections in the mosquito and quantifying load of expelled sporozoites. This resulted in very high-quality data that was well-analyzed and presented. Both sources of gametocytes (cultures vs. natural infection) show consistent results further strengthening the quality of the results obtained.

      Weaknesses: As is generally the case when using SMFAs, mosquito infection levels are often relatively high compared to wild-caught mosquitoes (e.g. Bombard et al. 2020 IJP: median 3-4). The strength of the observed correlations between oocyst sheets and salivary gland sporozoite load, and even more so between salivary gland sporozoite load and expelled sporozoite number, may therefore be dominated by results from mosquitoes with infection levels rarely observed in wild-caught mosquitoes. This could result in an overestimation of the importance of these well-observed positive relationships under natural transmission conditions. The results obtained from these excellently designed and executed studies very well support their conclusion, with a slight caveat regarding their application to natural transmission scenarios.

      For efficiency and financial reasons, we used an approach that enhances mosquito infection rates. Had we worked with gametocytes at physiological concentrations and a small number of donors, we would probably have had considerably lower mosquito infection rates. Whilst this would indeed have resulted in lower infection burdens in the few infected mosquitoes, addressing the reviewer's concern, it would have made the experiments highly inefficient and expensive. The skin mimic was initially provided free of charge when the matrix was close to its expiry date, but for the experiments in Burkina Faso we had to purchase the product at market value. Whilst we consider the biological question sufficiently important to justify this investment (and think our findings prove us right), it remained important to avoid using skins for uninfected mosquitoes. Since oocyst prevalence and density are strongly correlated (doi: 10.1016/j.ijpara.2012.09.002; doi: 10.7554/eLife.34463), a low oocyst density in natural infections typically coincides with a high proportion of negative mosquitoes.

      Of note, our approach did result in the inclusion of 15 skins from infected mosquitoes with 1-4 oocysts. This number may be modest, but we did include observations from this low oocyst range, which is, we agree, highly important for better understanding malaria epidemiology.

      This work very convincingly highlights the potential for significant heterogeneity in the infectiousness between individual P. falciparum-infected mosquitoes. Such heterogeneity needs to be further investigated and if again confirmed taken into account both when modelling malaria transmission and when evaluating the importance of low-density infections in sustaining malaria transmission.

      Reviewer #4 (Public Review):

      Summary: The study compares the number of sporozoites expelled by mosquitoes with different Plasmodium infection burdens. To my knowledge, this is the first report comparing the number of expelled P. falciparum sporozoites with oocyst burden (intact and ruptured) and residual sporozoites in the salivary glands. The study provides important evidence on malaria transmission biology, although conclusions cannot be drawn on the direct impact on transmission.

      Strengths: Although there is some evidence from malaria challenge studies that the burden of sporozoites injected into a host is directly correlated with the likelihood of infection, this has been done using experimental infection models which administer sporozoites intravenously. It is unclear whether the same correlation occurs with natural infections and what the actual threshold for infection may be. Host immunity and other host-related factors also play a critical role in transmission and need to be taken into consideration; these have not been mentioned by the authors. This is of particular importance as host immunity is decreasing with reduction in transmission intensity.

      Weaknesses: The infections described as natural in the study were not truly natural, since gametocyte enrichment was done to attain high oocyst infection numbers. Studying natural infections would have been better without the enrichment step. The infected mosquitoes have much larger infection burdens than occur in the wild.

      Nevertheless, the findings support the same results as in the experiments conducted in the Netherlands and therefore are of interest. I suggest the authors change the wording. Rather than calling these "natural" infections, they could be called, for example, "experimental infections with wild parasite strains".

      We have addressed these concerns and, in the process, also changed our manuscript title. The following sentences have been changed:

      “It is currently unknown whether all Plasmodium falciparum infected mosquitoes are equally infectious. We assessed sporogonic development using cultured gametocytes in the Netherlands and natural infections in Burkina Faso”.

      Now reads: “It is currently unknown whether all Plasmodium falciparum infected mosquitoes are equally infectious. We assessed sporogonic development using cultured gametocytes in the Netherlands and experimental infections with naturally circulating parasite strains in Burkina Faso”.

      In lines 226-228, “Experimental infections with naturally circulating parasite strains show comparable correlation between oocyst density, salivary gland density and sporozoite inoculum” has now replaced the original phrasing: “Natural infected mosquitoes by gametocyte carriers in Burkina Faso show comparable correlation between oocyst density, salivary gland density and sporozoite inoculum”.

      I do not believe the study results generate sufficient evidence to conclude that lower infection burden in mosquitoes is likely to result in changes to transmission potential in the field. In the study limitations section, the authors say: "In addition, our quantification of sporozoite inoculum size is informative for comparisons between groups of high and low-infected mosquitoes but does not provide conclusive evidence on the likelihood of achieving secondary infections. Given striking differences in sporozoite burden between different Plasmodium species (low sporozoite densities appear considerably more common in mosquitoes infected with P. yoelii and P. berghei), the association between sporozoite inoculum and the likelihood of achieving secondary infections may be best examined in controlled human infection studies." However, in the abstract conclusion, the authors state: "Whilst sporozoite expelling was regularly observed from mosquitoes with low infection burdens, our findings indicate that mosquito infection burden is associated with the number of expelled sporozoites and may need to be considered in estimations of transmission potential." Kindly consider ending the sentence at "expelled sporozoites." Future CHMI studies could be recommended as a conclusion if the authors see fit.

      We agree that we need to be very cautious with conclusions on the impact of our findings for the infectious reservoir. We have rephrased parts of our abstract and have updated the Discussion section following the reviewer's suggestions. We agree with the reviewer that CHMI studies are recommended and have expanded the Discussion section to make this clearer. The end of the abstract now reads:

      "Whilst sporozoite expelling was regularly observed from mosquitoes with low infection burdens, our findings indicate that mosquito infection burden is associated with the number of expelled sporozoites. Future work is required to determine the direct implications of these findings for transmission potential."

      Reviewer #1 (Recommendations For The Authors):

      • Prevalence data shown in Fig 2A and Table S1 are different. For example, >50K at Day 11, Fig 2A shows ~85% prevalence, but Table S1 says 100%. If the prevalence in Table S1 shows a proportion of observations with positive expelled sporozoites (instead of a proportion of positive mosquitoes shown in Fig 2A), then the prevalence for <1K at Day 11 cannot be 6.7% (either 0 or 20% as there were a total of 5 observations). So in either case, it is not clear why the numbers shown in Fig 2A and Table S1 are different.

      Figure 2A and Table S2 show prevalences and odds ratios estimated from an additive logistic regression model (i.e. without the interaction between day and sporozoite category). Table S1 includes this interaction when estimating prevalences and odds ratios; because some day-by-category cells were extremely small, this resulted in very wide confidence intervals, especially on day 11. Table S1 and Fig 2A therefore present results from two different models. Whilst our results are correct, we understand the confusion and have added a sentence to the figure and table legends explaining the model used.

      Figure 2. Extrinsic Incubation Period in high versus low infected mosquitoes. A. Total sporozoites (SPZ) per mosquito in body plus salivary glands (x-axis) were binned by infection load (<1k; 1k-10k; 10k-50k; >50k) and plotted against the proportion of mosquitoes (%) that were sporozoite positive (y-axis), as estimated from an additive logistic regression model with factors day and SPZ category.

      Supplementary Table S1. The extrinsic incubation period of P. falciparum in An. stephensi, estimated by quantification of sporozoites on days 9, 10 and 11 by qPCR. Based on infection intensity, assessed by combining sporozoite densities in the mosquito body and salivary glands, mosquitoes were binned into four categories (<1k, 1k-10k, 10k-50k, >50k). Prevalences and odds ratios were estimated from a logistic regression model with factors day, SPZ category and their interaction.
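      The difference between the two models can be sketched in a few lines of Python. Under an additive logistic model, the log-odds of sporozoite positivity is intercept + day effect + category effect, with no day-by-category term, so the odds ratio between categories is the same on every day and each cell's estimate draws on all the data. The coefficients and names below are hypothetical placeholders, not the fitted values behind Fig 2A or Table S2.

```python
import math

# Hypothetical log-odds coefficients (NOT the study's fitted values).
# Reference levels: day 9 and the "<1k" sporozoite category.
INTERCEPT = -1.5
DAY_EFFECT = {9: 0.0, 10: 0.8, 11: 1.4}
CATEGORY_EFFECT = {"<1k": 0.0, "1k-10k": 0.9, "10k-50k": 1.6, ">50k": 2.3}


def additive_prevalence(day, category):
    """Predicted prevalence when log-odds = intercept + day + category.

    There is no day x category term, so a cell's estimate is not driven
    only by the handful of mosquitoes that fall into that cell.
    """
    log_odds = INTERCEPT + DAY_EFFECT[day] + CATEGORY_EFFECT[category]
    return 1 / (1 + math.exp(-log_odds))


def odds_ratio_vs_reference(category):
    """Odds ratio versus the "<1k" category; identical on every day here."""
    return math.exp(CATEGORY_EFFECT[category])


if __name__ == "__main__":
    for day in DAY_EFFECT:
        row = {cat: round(additive_prevalence(day, cat), 2) for cat in CATEGORY_EFFECT}
        print(f"day {day}: {row}")
```

      Adding a day-by-category interaction lets each cell be estimated essentially from its own observations, which is why sparse cells (such as those on day 11) give unstable estimates and wide confidence intervals, and why fitted prevalences from the two models need not match.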

      There are 3 typos in the paper. Please fix them.

      Line 464; ...were counted using a using an incident....

      Line 473; Supplementary Figure 7 should be Fig S8.

      Line 508: ...between days 9 and 10 using a (t=-2.0467)....

      We appreciate the rigour in reviewing our text and have corrected all typos.

      Reviewer #2 (Recommendations For The Authors):

      High infection burdens may result in earlier expelling capacity in mosquitoes, which would more accurately reflect the EIP. The earlier colonization of SG and the correlation between SG burden and numbers expelled suggest this could be the case, but it would be interesting to directly measure the prevalence of expelling over time (not just at day 15 but before) to assess the effect of sporozoite burden. This could reveal how the parasite burden in mosquitoes determines transmission.

      We appreciate this suggestion and will consider it for future experiments. It adds another highly relevant variable but would also complicate comparisons, since sporozoite expelling is related both to time since the infectious bloodmeal and to salivary gland sporozoite density (which also depends on time since the infectious bloodmeal). Moreover, we would consider it important to measure this over the entire duration of sporozoite expelling, including late time-points after the infectious bloodmeal. This may form part of a follow-up study.

      Another question is whether all expelled sporozoites are equally infective, i.e. capable of inducing a secondary infection. If not, this could reconcile the data of this study with previous results in the rodent model, where high burdens were associated with an increased probability of transmission.

      As also indicated above, we are aware of a single study that assessed NF54 sporozoite infectivity on different days post infection (days 12, 13, 14, 15, 16 and 18) and observed no clear differences in ‘per sporozoite hepatocyte invasion capacity’ over this period (DOI: 10.1111/cmi.12745). We nevertheless agree that it is conceivable that sporozoites require maturation in the salivary glands and might not all be equally infectious. While hepatocyte invasion experiments are conducted with bulk harvesting of all the sporozoites present in the salivary glands, it would be even more interesting to assess the invasion capacity of the smaller population of sporozoites that migrate to the proboscis to be expelled. This would, as the reviewer will appreciate, be a major endeavour. To do this well, the expelled sporozoites would need to be harvested from the salivary glands/proboscis and used in the best and most natural environment for invasion. The suggested work would thus depend on the availability of primary hepatocytes, since conventional cell lines such as HC-04 are likely to underestimate sporozoite invasion. Importantly, there are currently no opportunities to include the skin barrier in invasion assays, whilst this may be highly important in determining the likelihood that sporozoites achieve invasion and give rise to secondary infections. In short, we agree with the reviewer that these experiments are of interest but consider them well beyond the scope of the current work. We have added a passage to the Discussion to highlight these future avenues for research: ‘Of note, our assessments of EIP and of sporozoite expelling did not confirm the viability of sporozoites. Whilst the infectivity of sporozoites at different time-points post infection has been examined previously (ref), these experiments have never been conducted with individual mosquito salivary glands. To add to this complexity, such experiments would ideally use primary hepatocytes and retain the skin barrier, which may be a relevant determinant of invasion capacity.’

      The authors evaluated oocyst rupture at day 18, i.e. 3 days after feeding experiments (performed at day 15). Did they check in control experiments that the prevalence of rupture oocysts does not vary between day 15 and day 18?

      We did not do this, and we consider it very unlikely that there is a noticeable increase in the number of ruptured oocysts between days 15 and 18. We observe that salivary gland invasion plateaus around day 12, and the provision of a second bloodmeal, which is known to accelerate oocyst maturation and rupture (doi: 10.1371/journal.ppat.1009131), makes it even less likely that a relevant fraction of oocysts ruptures very late. Perhaps most compellingly, the timing of oocyst rupture depends on nutrient availability, so rupture could occur later for oocysts in a heavily infected gut than for oocysts in mosquitoes with a low infection burden. Yet we observe a very strong association between salivary gland sporozoite density (day 15) and oocyst density (assessed at day 18), without any evidence of a change in the number of sporozoites per oocyst across oocyst densities. In our revised manuscript, we have also assessed correlations for different ranges of oocyst intensity and see highly consistent correlation coefficients, with no evidence of a change in ‘slope’. If oocyst rupture regularly occurred between days 15 and 18, and this late rupture were more common in heavily infected mosquitoes, we would expect it to affect the associations presented in Figures 3B and 4C. This is not the case.

      The authors report higher sporozoite numbers per oocyst and a higher proportion of SG invasion as compared to previous studies (30-50% rather than 20%). How do they explain these differences? Is it due to the detection method and/or second blood meal? Or parasite species?

      We were also intrigued by these findings in light of the existing literature. Regarding potential explanations, it is indeed possible that the second bloodmeal made a difference. In addition, NF54 is known to be a highly efficient parasite in terms of gametocyte formation and transmission, and there are marked differences in these traits between NF54 isolates, and certainly between NF54 and its commonly used clone 3D7. We also used a molecular assay to detect and quantify sporozoites, but consider it less likely that this is a major factor in explaining SG invasion, since sporozoite densities were typically within the range that would be detected by microscopy. We can only hypothesize that the second bloodmeal may have contributed to these findings and acknowledge this in the revised Discussion section.

      The median numbers of expelled sporozoites seem to be higher in the natural gametocyte infection experiments than in those using cultures. Is this due to the mosquito species (An. coluzzii versus An. stephensi)?

      The added value of our field experiments, a more relevant mosquito species and more relevant parasite isolates, is also a weakness in terms of understanding possible differences between in vitro experiments and field experiments with naturally circulating parasite strains. We only conclude that our in vitro experiments do not over-estimate sporozoite expelling by using a highly receptive mosquito source and artificially high gametocyte densities. We have clarified this in the revised Discussion.

      39% of sporozoite-positive mosquitoes failed to expel, irrespective of infection densities. Could the authors discuss possible explanations for this observation?

      In lines 304-307 we now write: “This finding broadly aligns with an earlier study by Medica and Sinnis, which reported that 22% of P. yoelii infected mosquitoes failed to expel sporozoites. For highly infected mosquitoes, this inefficient expelling has been related to a decrease of apyrase in the mosquito saliva”.

      In Figure 3, it would be interesting to zoom in on the 0-1k window, below the apparent threshold for successful expelling.

      We have generated correlation estimates for different ranges of oocyst and sporozoite densities and added these in Supplementary Table 5. We agree that this helps the reader to appreciate the contribution of different ranges of parasite burden to the observed associations.

      In Fig S8, did they observe intact oocysts in fixed samples? These could be shown in the figure as well.

      We have incorporated this comment. An intact oocyst from fixed samples has now been added to Fig S10.

      Minor points

      • line 119: LOD and LOQ could be defined here.

      We agree that these should have been defined. We have changed line 119 to define LOD and LOQ: “…the limit of detection (LOD) and limit of quantification (LOQ)…”.

      • line 126: the title does not reflect the content of this paragraph.

      We have changed the title “Immunolabeling allows quantification of ruptured oocysts” to “A comparative analysis of oocyst densities using mercurochrome staining and anti-CSP immunostaining”.

      • line 269: infectivity is not appropriate. The data show colonization of SG.

      Line 269: “infectivity” has been replaced with “colonization of salivary glands”.

      There seems to be a problem with Fig S6. The graph seems to be the same as Fig 3C. Please check whether the graph and legends are correct.

      Supplementary Figure 6 shows the sporozoite expelling density in relation to infection burden with a threshold set at > 20 sporozoites, while Fig 3C shows the total sporozoite density (residual salivary gland sporozoites + sporozoites expelled, X-axis) in relation to the number of expelled sporozoites (Y-axis) by COX-1 qPCR without any threshold density. We have explained this in more detail in the revised supplementary figure, where we now state:

      “Of note, this figure differs from Figure 3C in the main text in the following manner. This figure presents sporozoite expelling density in relation to infection burden with a threshold set at > 20 sporozoites to conclude sporozoite positivity while Figure 3C shows the total sporozoite density (residual salivary gland sporozoites + sporozoites expelled, X-axis) in relation to the number of expelled sporozoites (Y-axis) by COX-1 qPCR without any threshold density and thus includes all observations with a qPCR signal”

      Reviewer #3 (Recommendations For The Authors):

      Congratulations to the authors for the really excellently designed and rigorously conducted studies.

      My main concern is in regards to the relatively high oocyst numbers in their experimental mosquitoes (from both sources of gametocytes) compared to what has been reported from wild-caught mosquitoes in previous studies in Burkina Faso.

      We have addressed this concern above; for completeness, we repeat the main points here. We enriched gametocytes for efficiency reasons. Experiments with gametocytes at physiological concentrations would have resulted in lower oocyst densities (and thus more ‘natural’ ones), although a minority of individuals achieves very high oocyst densities in all studies that included a broad range of oocyst densities (e.g. doi: 10.1016/j.exppara.2014.12.010; doi: 10.1016/S1473-3099(18)30044-6). Of note, we did include 15 skins from mosquitoes with low oocyst densities (1-4 oocysts). Whilst low oocyst densities were thus not uncommon in our sample set, we acknowledge that this may have rendered some comparisons underpowered. At the same time, we observe a strong positive trend between oocyst density and sporozoite density and between salivary gland sporozoite density and mosquito inoculum. This makes it very likely that the trend is also present at lower oocyst densities. An association in which sporozoite inoculation saturates at high densities is plausible and has been observed before for rodent malaria (DOI: 10.1371/journal.ppat.1008181), whereas we consider it less likely that sporozoite expelling would be more efficient at low (unmeasured) sporozoite densities. In the revised manuscript, we have also performed our analysis including only the subset of mosquitoes with low oocyst burden.

      The best way to address this would be to do comparable artificial skin-feeding experiments on such wild-caught mosquitoes, but I appreciate that this is very difficult to do.

      This would indeed be difficult to do, mostly because infection status can only be examined post hoc and it is likely that >95% of mosquitoes are sporozoite-negative at the moment experiments are conducted (in many settings this will even be >99%). Importantly, very high oocyst burdens are also observed in a small but relevant subset of wild-caught mosquitoes (doi: 10.1016/j.ijpara.2020.05.012).

      Instead, I would suggest the authors conduct additional analyses of their data using different cut-offs for maximum oocyst numbers (e.g. <5, <10, <20) to determine whether these correlations hold across the entire range of observed oocyst densities and salivary gland sporozoite loads.
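      The cut-off analysis suggested here amounts to recomputing Spearman's rank correlation within nested subsets of mosquitoes. A minimal Python sketch, assuming one record per mosquito with hypothetical field names (`oocysts`, `sg_spz` for residual salivary-gland sporozoites, `expelled` for sporozoites expelled) and toy values rather than study data; tied values receive average ranks, as in the usual definition of Spearman's rho:

```python
# Sketch of a cut-off sensitivity analysis; field names and data are hypothetical.


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns nan if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rho_by_cutoff(records, x_key, y_key, cutoffs, filter_key="oocysts", min_n=3):
    """Spearman's rho restricted to records with filter_key below each cut-off."""
    results = {}
    for cutoff in cutoffs:
        subset = [r for r in records if r[filter_key] < cutoff]
        if len(subset) < min_n:
            results[cutoff] = None  # too few mosquitoes for a meaningful estimate
        else:
            results[cutoff] = spearman([r[x_key] for r in subset],
                                       [r[y_key] for r in subset])
    return results


records = [  # toy values, not study data
    {"oocysts": 1, "sg_spz": 120, "expelled": 3},
    {"oocysts": 3, "sg_spz": 800, "expelled": 10},
    {"oocysts": 4, "sg_spz": 1500, "expelled": 25},
    {"oocysts": 8, "sg_spz": 5000, "expelled": 40},
    {"oocysts": 15, "sg_spz": 9000, "expelled": 35},
    {"oocysts": 60, "sg_spz": 30000, "expelled": 120},
]
print(rho_by_cutoff(records, "sg_spz", "expelled", [5, 10, 20]))
```

      Subsets smaller than `min_n` return `None` rather than an unstable estimate; passing `filter_key="sg_spz"` applies the same loop to sporozoite-density cut-offs.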

      We have provided these calculations for the proposed range of oocyst numbers. In addition, we also provided them for a range of sporozoite densities. These findings are now provided in

      A minor point on the regression lines in Figures 3 & 4: both variables in these plots have inherent variation (measurement & natural), so regression techniques that allow error in both the x and y variables, such as reduced major axis (RMA) regression, may be preferable to standard linear regression. Also, as it is implausible that mosquitoes with zero sporozoites in their salivary glands expel several hundred sporozoites at feeding, the regression should probably also be constrained to pass through the (0,0) point.

      Since the main priority of the analyses is the correlation, not the fit of the regression line (which is shown only for indication), and also because of software availability, we did not change the type of regression. We have, however, added a disclaimer to the legend and forced the intercept to 0, which does indeed better reflect the biological association. Additionally, we added 95% confidence intervals to all Spearman’s correlation coefficients in the legends.
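      For readers weighing the two options discussed above, a short sketch contrasting a least-squares fit forced through the origin (what fixing the intercept at 0 amounts to if ordinary least squares is used, which is an assumption about the authors' software) with reduced major axis regression, which treats error in x and y symmetrically. Variable names and values are hypothetical:

```python
# Two slope estimators for a sporozoite scatter plot; data are hypothetical.


def ols_slope_through_origin(x, y):
    """Least-squares slope b for the model y = b * x (intercept fixed at 0)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def rma_fit(x, y):
    """Reduced major axis fit: slope = sign(r) * sd(y) / sd(x), line through the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = (1.0 if sxy >= 0 else -1.0) * (syy / sxx) ** 0.5
    return slope, my - slope * mx  # (slope, intercept)


sg_spz = [200, 1000, 2500, 6000]  # residual salivary-gland sporozoites (toy)
expelled = [4, 15, 30, 70]        # sporozoites expelled (toy)
print(ols_slope_through_origin(sg_spz, expelled))
print(rma_fit(sg_spz, expelled))
```

      `rma_fit` as written keeps a free intercept; constraining an RMA line through (0,0) would require a different estimator.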

    1. Author Response

      Reviewer #1 (Public Review):

      The authors propose a hypothesis for ovarian carcinogenesis based on epidemiological data, and more specifically they suggest that the latter relates to ascending genital tract "infection" or "dysbiosis", the resulting fallopian tube inflammation ultimately predisposing to ovarian cancer.

      While this hypothesis would ideally be addressed in a longitudinal set-up with repeated female genital tract sampling, such an approach is obviously hard to realize. Rather, the authors present this hypothesis as a rationale for a cross-sectional study involving 81 patients with ovarian cancer (most with the most common subtype, high grade serous ovarian carcinoma, though other subtypes were also included), as well as 106 control patients with various non-infectious conditions, including endometriosis and benign ovarian cysts. In all patients, comprehensive microbiome sampling was performed of the ovarian surface/fallopian tube, cervix and peritoneal cavity, as well as of a number of potential sources of contamination, including surgery sites, the ambient environment, consumables used in the DNA extraction and sequencing pipeline, etc. In line with the hypothesis presented at the outset, species with at least 100 reads in at least one cervical and at least one fallopian tube sample, while absent from environmental swabs, were considered relevant to the postulated pathway.
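      The species-retention rule described above can be expressed as a set operation. A minimal sketch, assuming each swab is represented as a (site, {species: read count}) pair with hypothetical site labels, and interpreting "absent from environmental swabs" as zero reads in every environmental control (the study may have applied the criterion differently):

```python
# Hypothetical site labels; "absent" is read as zero reads in every environmental swab.


def retained_species(samples, min_reads=100):
    """Species with >= min_reads in at least one cervical and one fallopian tube
    sample, and no reads in any environmental swab.

    samples: iterable of (site, {species: read_count}) pairs.
    """
    samples = list(samples)  # allow several passes over the input

    def species_at(site, threshold):
        return {sp for s, counts in samples if s == site
                for sp, n in counts.items() if n >= threshold}

    cervical = species_at("cervix", min_reads)
    tubal = species_at("fallopian_tube", min_reads)
    environmental = species_at("environment", 1)  # any read counts as presence
    return (cervical & tubal) - environmental


samples = [  # toy read counts, not study data
    ("cervix", {"taxon_A": 500, "taxon_B": 300}),
    ("fallopian_tube", {"taxon_A": 150, "taxon_B": 120, "taxon_C": 90}),
    ("cervix", {"taxon_C": 200}),
    ("environment", {"taxon_B": 12}),
]
print(retained_species(samples))
```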

      Remarkably, fallopian tube microbiota in ovarian cancer patients tended to cluster more closely to those retrieved from the paracolic gutter, than fallopian tube microbiota in non-cancer controls, which showed more relative similarity to vaginal/genital tract microbiota.

      Although not really addressed by the authors, there also seem to be quite a few differences, at least in terms of abundance, in cervical microbiota between ovarian cancer patients and controls as well, which is an interesting finding, even when accounting for differences in age distribution between ovarian cancer patients and included control patients.

      Overall, very few data are available thus far on the upper genital tract/fallopian tube microbiome, and these are invariably controversial, as it has proven extremely difficult to obtain pelvic samples in a valid, "sterile" manner, i.e. without affecting the resident low-biomass microbiome to be analyzed. The authors took a number of measures to counter this, and in this respect this is likely the largest and most valid study on the subject, even though biases and contamination can never be completely excluded in this context.

      As such, I believe the strength of this study and paper primarily relates to the rigour of the methodology, thereby giving us valuable insight into the presumed fallopian tube/ovarian surface microbiome, which may definitely serve as an impetus and a reference for future translational ovarian cancer research, or ovarian microbiome research for that matter.

      I believe that the authors should acknowledge in more detail that the data obtained from their cross-sectional study, valid as these are, do not provide any direct support for the hypothesis set forth (albeit a plausible one), a discussion that I found largely missing. It is important to realize in this and related contexts that neoplasia may well induce microbiome alterations through a variety of mechanisms, so microbiome alterations are not per se causative. Conclusions should therefore be more reserved. Along the same lines, potential biases introduced through the selection of control patients (some detail here would be insightful) also deserve some discussion, as it is not known whether other conditions such as benign ovarian cysts or endometriosis have some relationship with the human microbiome, be it causative or 'reversely causative'; see, for instance, very recent work in Science Translational Medicine.

      We appreciate the reviewer’s detailed review and thoughtful comments. We have added the following sentences in the Discussion to address the reviewer’s concern: “Due to the cross-sectional nature of the study, we have limited ability to link specific bacteria to ovarian carcinogenesis, as we would need to demonstrate that exposure to bacteria precedes the cancer. However, identifying associations between FT microbiota and OC is a critical first step. Further investigations, especially backed by in vitro studies, are needed to test our initial hypotheses.”

      Reviewer #2 (Public Review):

      The authors aimed to investigate the microbiota present in the fallopian tubes (FT) and its potential association with ovarian cancer (OC). They collected swabs intraoperatively from the FT and other surgical sites as controls to profile the FT microbiota and assess its relationship with OC.

      They observed a clear shift in the FT microbiota of OC patients compared to non-cancer patients. Specifically, the FT of OC patients had more types of bacteria typically found in the gastrointestinal tract and the mouth. In contrast, vaginal bacterial species were more prevalent in non-cancer patients. Serous carcinoma, the most common OC subtype, showed a higher prevalence of almost all FT bacterial species compared to other OC subtypes.

      The strengths of the study include its large sample size, rigorous collection methods, and use of controls to identify the possible contaminants. Additionally, the study employed advanced sequencing techniques for microbiota analysis. However, there are some weaknesses to consider. The study relied on swabs collected intraoperatively, which may not fully represent the microbiota in the FT during normal physiological conditions. The study also did not establish causality between the identified bacteria and OC but rather demonstrated an association. Regardless, the findings are important and these questions need to be addressed by future studies. A few additions in data representation and analysis are instead recommended.

      Overall, the authors achieved their aims of identifying the FT microbiota and assessing its relationship with OC. The results support the conclusion that there is a clear shift in the FT microbiota in OC patients, paving the way for further investigations into the role of these bacteria in the pathogenesis of ovarian cancer.

      The identification of specific bacterial species associated with OC could contribute to the development of novel diagnostic and therapeutic approaches. The study design and the data generated here can be valuable to the research community studying the microbiota and its impact on cancer development. However, further research is needed to validate these findings and elucidate the underlying mechanisms linking the FT microbiota shift and OC.

      We appreciate the reviewer’s detailed review and positive comments.

      Reviewer #3 (Public Review):

      The findings of Bo Yu and colleagues, titled "Identification of fallopian tube microbiota and its association with ovarian cancer: a prospective study of intraoperative swab collections from 187 patients", describe the identification of the fallopian tube microbiome and its relationship with ovarian cancer. The studies are highly rigorous, obtaining specimens from the fallopian tube, ovarian surfaces and paracolic gutter of patients with known or suspected ovarian cancer or with benign tumors. The investigators took great care to ensure there was no or limited contamination, including testing the surgical suite air, as the test locations harbor low-abundance microbiota. The findings provide evidence that the microbiota in the fallopian tube, especially in ovarian cancer, has similarities to gut microbial communities. This is a potentially novel observation.

      The studies investigate the microbiome of >1000 swabs from 81 ovarian cancer and 106 non-cancer patients. The sites collected are low biomass microbiota making the study particularly challenging. The studies provide descriptive evidence that the ovarian cancer fallopian tube microbiota contain species that are similar to the gut microbiota. In contrast the fallopian tube microbiota of non-cancer patients that exhibit more similarity to the uterine/cervical microbiota. This may be a relevant observation but is highly descriptive with limited insights on the functional relevance.

      The data indicate the presence of low-biomass FT microbiota. The findings support the existence of FT microbiota in ovarian cancer that appear to be related to gut microbial species. While interesting, there are no insights into how and why these microbial species are found in the FT. The studies only identify the species; there is no transcriptomic analysis to indicate whether the bacteria are activating DNA damage pathways. This is an interesting observation that requires more insight into how these bacteria reach the fallopian tube, and a related question is whether these bacteria are found in the peritoneum.

      An additional concern is whether these data can be used to develop biomarkers of disease and for early detection of disease. Can the investigators detect the ovarian cancer FT microbiota in cervical/vaginal secretions? That may yield more significant insights for the field.

      We appreciate the reviewer’s detailed review and thoughtful comments. We have added the following sentences in the Discussion to acknowledge the reviewer’s concern: “Due to the cross-sectional nature of the study, we have limited ability to link specific bacteria to ovarian carcinogenesis, as we would need to demonstrate that exposure to bacteria precedes the cancer. However, identifying associations between FT microbiota and OC is a critical first step. Further investigations, especially backed by in vitro studies, are needed to test our initial hypotheses.”

      Reviewer #1 (Recommendations For The Authors):

      I have no additional comments here.

      Reviewer #2 (Recommendations For The Authors):

      The data analysis and data representation could be improved by the following points:

      1. To compare the microbiota and assess the overall difference in microbiota structure between the cancer and non-cancer cohorts, alpha- and beta-diversity analyses of the microbial communities could be conducted.

      2. A differential abundance analysis could also be conducted to assess differences at the genus and taxon levels between the cancer and non-cancer cohorts.

      3. The analyses suggested above can also be conducted in the serous vs. non-serous cancer cohorts.

      4. In Figures 4 and 5, it would be more intuitive to show the predominant niche of each bacterium by color coding.
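      Points 1 and 3 refer to standard community-level summaries. As an illustration only, a minimal sketch of one common alpha-diversity index (Shannon) and one common beta-diversity measure (Bray-Curtis dissimilarity); the choice of indices, the conversion to relative abundances, and the toy profiles are assumptions, not the analysis behind Figure 2B:

```python
import math


def shannon(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles (0 = identical)."""
    taxa = set(u) | set(v)
    diff = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    total = sum(u.get(t, 0) + v.get(t, 0) for t in taxa)
    return diff / total


def relative(profile):
    """Convert raw read counts to relative abundances."""
    total = sum(profile.values())
    return {t: n / total for t, n in profile.items()}


ft_cancer = {"taxon_A": 40, "taxon_B": 35, "taxon_C": 25}  # toy read counts
ft_control = {"taxon_A": 90, "taxon_D": 10}                # toy read counts
print(shannon(list(ft_cancer.values())), shannon(list(ft_control.values())))
print(bray_curtis(relative(ft_cancer), relative(ft_control)))
```

      In a cohort comparison, Shannon values would be compared between cancer and non-cancer patients, and the pairwise Bray-Curtis matrix would feed an ordination or a PERMANOVA-type test; neither step is shown here.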

      We appreciate these helpful suggestions from the reviewer. We have added Figure 2B to address the diversity as well as the differences between cancer and non-cancer cohorts, and we describe our findings in Figure 2B in the Results section. We have added color coding to Figures 4 and 5 as the reviewer suggested.

      Reviewer #3 (Recommendations For The Authors):

      These studies are interesting but are very descriptive with no obvious approaches for understanding the mechanisms of FT microbiota in ovarian cancer. The identification of these bacteria is not sufficient to draw implications on their impact on ovarian cancer development or progression. This needs to be addressed.

      We agree with the reviewer and have added the following sentences in the Discussion to acknowledge the reviewer’s concern: “Due to the cross-sectional nature of the study, we have limited ability to link specific bacteria to ovarian carcinogenesis, as we would need to demonstrate that exposure to bacteria precedes the cancer. However, identifying associations between FT microbiota and OC is a critical first step. Further investigations, especially backed by in vitro studies, are needed to test our initial hypotheses.”

    1. Author Response

      Responses to public reviews

      Reviewer 1

      We thank the reviewer for the valuable and constructive comments and are pleased that the reviewer finds our study timely and our behavioral results clear.

      1) The RSA basically asks, on the lowest level, whether neural activation patterns (as measured by EEG) are more similar between linked events compared to non-linked events. At least this is the first question that should be asked. However, on page 11 the authors state: "We examined insight-induced effects on neural representations for linked events [...]". Hence, the critical analysis reported in the manuscript fully ignores the non-linked events and their neural activation patterns. However, the non-linked events are a critical control. If the reported effects do not differ between linked and non-linked events, there is no way to claim that the effects are due to experimental manipulation - neither imagination nor observation. Hence, instead of immediately reporting on group differences (sham vs. control) in a two-way interaction (pre vs. post X imagination vs. observation), the authors should check (and report) first whether the critical experimental manipulation had any effect on the similarity of neural activation patterns in the first place.

      We completely agree that the non-link items are a critical control. Therefore, we had reported not only the results for linked but also for non-linked events on page 15, lines 336-350. We clarified this important point now on page 12 lines 283-286:

      “Subsequently, we examined insight-induced effects on neural representations for linked (vs. non-linked) events by comparing the change from pre- to post-insight (post-pre) and the difference between imagination and observation (imagination - observation) between cTBS and sham groups using an independent cluster-based permutation t-test.”

      Moreover, to directly compare linked and non-linked events, we tested a four-way interaction including link vs. non-link. This analysis yielded a significant four-way interaction, showing that the interaction of time (pre vs. post), mode of insight (imagination vs. observation) and cTBS differed for linked vs. non-linked items. We then report the follow-up analyses, separately for linked and non-linked events. Please see pages 12-13, lines 287-294:

      “First, we included the within-subject factors time (pre vs. post), mode of insight (imagination vs. observation) and link (vs. non-link) by calculating the difference waves. Subsequently we conducted a cluster-based permutation test comparing the cTBS and the sham groups. This analysis yielded a four-way interaction within a negative cluster in a fronto-temporal region (electrode: FT7; p = 0.007, ci-range = 0.00, SD = 0.00). This result indicates that the impact of cTBS over the angular gyrus on the neural pattern reconfiguration following imagination- vs. observation-based insight may differ between linked and non-linked events. For linked events, this analysis yielded a […]”
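      A minimal sketch of the logic described above, assuming per-subject similarity values stored under hypothetical (phase, mode, link) keys: the four-way interaction is a difference of differences computed per subject, and the cTBS and sham groups are then compared with a cluster-based permutation test. For brevity, the sketch clusters along a single ordered dimension (adjacent sample points) with a pooled-variance t statistic and a user-supplied threshold; the reported analysis also involved clusters of electrodes, and this is not the authors' pipeline:

```python
import random

# Sketch only: keys, group sizes and the threshold are illustrative assumptions.


def interaction_contrast(d):
    """Four-way difference wave for one subject.

    Returns [(post - pre)_imag - (post - pre)_obs]_link minus the same
    quantity for non-linked events. d maps (phase, mode, link) to a list of
    similarity values over sample points, e.g. d[("post", "imag", "link")].
    """
    def post_minus_pre(mode, link):
        pre, post = d[("pre", mode, link)], d[("post", mode, link)]
        return [b - a for a, b in zip(pre, post)]

    def mode_difference(link):
        imag, obs = post_minus_pre("imag", link), post_minus_pre("obs", link)
        return [i - o for i, o in zip(imag, obs)]

    linked, nonlinked = mode_difference("link"), mode_difference("nonlink")
    return [a - b for a, b in zip(linked, nonlinked)]


def t_values(group_a, group_b):
    """Pooled-variance two-sample t statistic at every sample point."""
    na, nb = len(group_a), len(group_b)
    out = []
    for i in range(len(group_a[0])):
        a = [s[i] for s in group_a]
        b = [s[i] for s in group_b]
        ma, mb = sum(a) / na, sum(b) / nb
        ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
        se = (ss / (na + nb - 2) * (1 / na + 1 / nb)) ** 0.5
        out.append((ma - mb) / se if se > 0 else 0.0)
    return out


def cluster_masses(t, threshold):
    """Sum t over runs of adjacent points with |t| > threshold and the same sign."""
    masses, current, sign = [], 0.0, 0
    for value in t:
        s = 1 if value > threshold else (-1 if value < -threshold else 0)
        if s != 0 and s == sign:
            current += value  # extend the running cluster
        else:
            if sign != 0:
                masses.append(current)  # close the previous cluster
            current, sign = (value, s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses


def max_abs_mass(t, threshold):
    return max((abs(m) for m in cluster_masses(t, threshold)), default=0.0)


def cluster_permutation_test(group_a, group_b, threshold, n_perm=1000, seed=0):
    """Observed cluster masses and a permutation p-value for the largest one.

    Group labels are shuffled; the null distribution is the maximum |mass|
    per permutation, which controls the family-wise error across points.
    """
    rng = random.Random(seed)
    observed = cluster_masses(t_values(group_a, group_b), threshold)
    obs_max = max((abs(m) for m in observed), default=0.0)
    pooled = group_a + group_b  # new list; the inputs are not modified
    na, exceed = len(group_a), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if max_abs_mass(t_values(pooled[:na], pooled[na:]), threshold) >= obs_max:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

      In use, `interaction_contrast` would be applied to each subject, and the resulting lists passed as `group_a` (cTBS) and `group_b` (sham).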

      2) Overall, the focus on the targeted three-way interaction is poorly motivated. Also, a functional interpretation is largely missing.

      In order to better explain our motivation for the three-way interaction, we now emphasize in the introduction (pages 4-5, lines 107-115) the importance of disentangling potential differences due to the mode of insight, given the known role of the angular gyrus in imagination:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to interfere with the increase in neural similarity for linked events and with the decrease of neural similarity for non-linked events.”

      As discussed on page 21 (starting from line 478; see also the intro on page 4), we expected that the angular gyrus would be particularly implicated in imagination-based insight, given its known role in imagination (e.g.: Thakral et al., 2017). Moreover, given the angular gyrus’s strong connectivity with other regions, the results observed may not be driven by this region alone but also by interconnected regions, such as the hippocampus. We clarified these important points at the very end of the discussion on pages 23-24, lines 543-560:

      “Furthermore, the differential impact of cTBS to the angular gyrus on neural reconfigurations between events linked via imagination and those linked via observation may be attributed to its crucial role in imaginative processes (Ramanan et al., 2018; Thakral et al., 2017). Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014).”

      3) "Interestingly, we observed a different pattern of insight-related representational pattern changes for non-linked events." It is not sufficient to demonstrate that a given effect is present in one condition (linked events) but not the other (non-linked events). To claim that there are actually different patterns, the authors would need to compare the critical conditions directly (Nieuwenhuis et al., 2011).

      We completely agree and now compare the two conditions directly. Specifically, we now report the significant four-way interaction, including the factor link vs. non-link, before delving into separate analyses for linked and non-linked events on pages 12-13, lines 287-294:

      “First, we included the within-subject factors time (pre vs. post), mode of insight (imagination vs. observation) and link (vs. non-link) by calculating the difference waves. Subsequently we conducted a cluster-based permutation test comparing the cTBS and the sham groups. This analysis yielded a four-way interaction within a negative cluster in a fronto-temporal region (electrode: FT7; p = 0.007, ci-range = 0.00, SD = 0.00). This result indicates that the impact of cTBS over the angular gyrus on the neural pattern reconfiguration following imagination- vs. observation-based insight may differ between linked and non-linked events. For linked events, this analysis yielded a […]”

      4) "This analysis yielded a negative cluster (p = 0.032, ci-range = 0.00, SD = 0.00) in the parieto-temporal region (electrodes: T7, Tp7, P7; Fig. 3B)." (p. 11). The authors report results with specificity for certain topographical locations. However, this is in stark contrast to the fact that the authors derived time X time RSA maps.

      We did derive time × time similarity maps for each electrode within each participant, which allowed us to find a cluster consisting of specific electrodes. We apologize for not making this aspect clear enough and have, therefore, modified the respective part of our methods section on page 38, lines 951-952:

      “In total, this analysis produced eight Representational Dissimilarity Matrices (RDMs) for each electrode and each participant.”

      5) "These theta power values were then combined to create representational feature vectors, which consisted of the power values for four frequencies (4-7 Hz) × 41 time points (0-2 seconds) × 64 electrodes. We then calculated Pearson's correlations to compare the power patterns across theta frequency between the time points of linked events (A with B), as well as between the time points of non-linked events (A with X) for the pre- and the post-phase separately, separately for stories linked via imagination and via observation. To ensure unbiased results, we took precautions not to correlate the same combination of stories twice, which prevented potential inflation of the data. To facilitate statistical comparisons, we applied a Fisher z-transform to the Pearson's rho values at each time point. This yielded a global measure of similarity on each electrode site. We, thus, obtained time × time similarity maps for the linked events (A and B) and the non-linked events (A and X) in the pre- and post-phases, separately for the insight gained through imagination and observation." (p. 34+35).
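      To make the quoted computation concrete, a minimal sketch for one electrode and one pair of events, assuming theta power is stored as a list of time points, each holding the power at the four frequency bins (4-7 Hz). With 41 time points this yields a 41 × 41 map, and each cell is a correlation across only four values, which is the property the reviewer questions below. The clipping constant in `fisher_z` is an assumption added to avoid infinite values when r = ±1:

```python
import math


def pearson(x, y):
    """Pearson correlation; returns nan if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def fisher_z(r, eps=1e-6):
    """Fisher z = atanh(r), with r clipped so that |r| = 1 stays finite."""
    if math.isnan(r):
        return r
    return math.atanh(max(-1 + eps, min(1 - eps, r)))


def time_by_time_map(power_a, power_b):
    """Similarity map for one electrode and one pair of events.

    power_a[t][f], power_b[t][f]: theta power per time point t and frequency f.
    Entry [i][j] is fisher_z(pearson(power_a[i], power_b[j])), so every
    correlation uses only as many values as there are frequency bins.
    """
    return [[fisher_z(pearson(pa, pb)) for pb in power_b] for pa in power_a]


# Toy example: two time points for event A, one for event B, four theta bins each.
print(time_by_time_map([[1, 2, 3, 4], [4, 3, 2, 1]], [[1, 2, 3, 5]]))
```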

      If RSA values were calculated at each time point and electrode, the Pearson correlations would effectively have been computed between four samples only, which is by far not enough to derive reliable estimates (Schönbrodt & Perugini, 2013). The problem is aggravated by the fact that, due to the time and frequency smoothing inherent in the time-frequency decomposition of the EEG data, power values at neighboring theta frequencies are highly similar to start with (e.g., Schönauer et al., 2017; Sommer et al., 2022).

      Alternative approaches would be to run the correlations across time for each electrode (resulting in the elimination of the time dimension) or to run the correlations at each time point across electrodes (resulting in the elimination of topographic specificity).
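      The two alternatives can be sketched in the same style. With the dimensions quoted above, correlating across time for each electrode uses 41 × 4 = 164 values per correlation, and correlating across electrodes at each time point uses 64 × 4 = 256 values, compared with four values per cell in the electrode-wise time × time maps. The array layout `power[electrode][time][frequency]` is an assumption:

```python
def pearson(x, y):
    """Pearson correlation; returns nan if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def flatten(rows):
    return [v for row in rows for v in row]


def per_electrode_similarity(power_a, power_b):
    """One r per electrode; the time x frequency pattern is the feature vector,
    so the time dimension is collapsed."""
    return [pearson(flatten(ea), flatten(eb)) for ea, eb in zip(power_a, power_b)]


def across_electrode_map(power_a, power_b):
    """Time x time map whose entries correlate electrode x frequency patterns,
    so topographic specificity is collapsed."""
    def pattern(power, t):
        return [f for electrode in power for f in electrode[t]]

    n_a, n_b = len(power_a[0]), len(power_b[0])
    return [[pearson(pattern(power_a, i), pattern(power_b, j)) for j in range(n_b)]
            for i in range(n_a)]


# Toy data: 2 electrodes x 3 time points x 2 frequency bins.
power = [[[1, 2], [3, 5], [2, 2.5]], [[0, 1], [4, 1], [3, 3.5]]]
print(per_electrode_similarity(power, power))
print(across_electrode_map(power, power))
```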

      At a minimum, the authors should show raw RSA maps for linked and non-linked events in the pre- and post-phases, separately for the insight gained through imagination and observation in each group, to allow the suitability of the input data to be assessed (in the supplements?) before progressing to reporting the results of three-way interactions.

      Although we do see the reviewer’s point, we think that an RSA specific to the theta range yielding electrode-specific time × time similarity maps must be run this way; otherwise, as the reviewer pointed out, one or the other dimension is compromised. Running an RSA across time for each electrode would yield a similarity measure between the events without information on when these stimuli become more or less similar, thereby ignoring the temporal dynamics crucial to EEG data and not taking advantage of its high temporal resolution. Conversely, conducting an RSA across electrodes might result in an overall similarity measure per participant, disregarding the spatial distribution and potential variations among electrodes. Although EEG has limited spatial resolution, different electrodes can capture differences that may aid in understanding neural processing. However, as suggested by the reviewer, we included the raw RSA maps for linked and non-linked events separately for pre- and post-phases, imagination and observation, and link and non-link in the supplement, and refer to these data in the results section on pages 12-13, lines 293-295:

      “For linked events, this analysis yielded a negative cluster (p = 0.032, ci-range = 0.00, SD = 0.00) in the parieto-temporal region (electrodes: T7, Tp7, P7; Fig. 3B; Figure 3 – Figure supplement 1).”

      And on page 15, lines 339-341:

      “This analysis yielded a positive cluster (p = 0.035, ci-range = 0.00, SD = 0.00) in a fronto-temporal region (electrode: FT7; Fig. 3C; Figure 3 – Figure supplement 2).”

      Reviewer 2

      We thank the reviewer for the very helpful and constructive comments and appreciate that the reviewer finds our study relevant to all areas of cognitive research.

      1) While the observed memory reconfiguration/changes are attributed to the angular gyrus in this study, it remains unclear whether these effects are solely a result of the AG's role in reconfiguration processes or to what extent the hippocampus might also mediate these memory effects (e.g., Tambini et al., 2018; Hermiller et al., 2019).

      We agree that, in addition to the critical role of the angular gyrus, there may be an involvement of the hippocampus. We now explicitly point to the modulatory capacities of angular gyrus stimulation on the hippocampus. Please see page 4, lines 81-88:

      “One promising candidate that may contribute to insight-driven memory reconfiguration is the angular gyrus. The angular gyrus has extensive structural and functional connections to many other brain regions (Petit et al., 2023), including the hippocampus (Coughlan et al., 2023; Uddin et al., 2010). Accordingly, previous studies have shown that stimulation of the angular gyrus resulted in altered hippocampal activity (Thakral et al., 2020; Wang et al., 2014). Furthermore, the angular gyrus has been implicated in a myriad of cognitive functions, including mental arithmetic, visuospatial processing, inhibitory control, and theory-of-mind (Cattaneo et al., 2009; Grabner et al., 2009; Lewis et al., 2019; Schurz et al., 2014).”

      We further added a new paragraph to the discussion pointing to the possibility that not only the angular gyrus but also another brain region, such as the hippocampus, may have mediated the changes observed in our study (pages 23-24, lines 546-562):

      “Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      2) Another weakness in this manuscript is the use of different groups of participants for the key TMS intervention, along with underspecified or incomplete hypotheses/predictions.

      In our view, the chosen between-subjects design is preferable to a crossover design for several reasons. First, our choice aimed to eliminate potential sequence effects that may have adversely affected performance in the narrative-insight task (NIT). Second, this approach ensured consistency in expectations regarding the story links while also mitigating potential differences induced by fatigue. Additionally, we accounted for the potential advantage of a within-subject design (the stimulation of the same brain) by using neuro-navigated TMS to target the stimulation coordinate. Finally, it is important to note that we measured the event representations pre- and post-insight and that the mode of insight was also manipulated within-subject. Thus, our design did include a within-subject component, and we are convinced that the chosen paradigm balances the different strengths and weaknesses of within-subject and between-subjects designs in the best possible manner. We specified our rationale for choosing a between-subjects approach in the introduction on page 5, lines 122-126:

      “We intentionally adopted a mixed design, combining both between-subjects and within-subject methodologies. The between-subjects approach was chosen to minimize the risk of carry-over effects and sequence biases. Simultaneously, we capitalized on the advantages of a within-subject design by altering the pre- to post-insight comparison and the mode of insight (imagination vs. observation) within each participant.”

      Moreover, to provide a comprehensive portrayal of the two groups, we incorporated descriptions concerning trait and state variables alongside age and motor thresholds and included t-test comparisons between these variables on page 7, lines 157-160:

      “Notably, the groups did not differ on levels of subjective chronic stress (TICS), state and trait anxiety (STAI-S, STAI-T), depressive mood (BDI), imaginative capacities (FFIS), personality dimensions (BFI), age, and motor thresholds (for descriptive statistics see Table 1; all p > 0.053).”

      And further included age and motor thresholds as control variables in Table 1 on page 18, lines 402-404:

      “Overall, levels of subjective chronic stress, anxiety, and depressive mood were relatively low and not different between groups. The groups further did not differ in terms of personality traits, imagination capacity, age, or motor thresholds (all p > 0.053; see Table 1).”

      For greater precision in outlining our hypotheses, we specified these at the end of the introduction on pages 4-5, lines 107-118:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight would differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to interfere with the increase in neural similarity for linked events and with the decrease of neural similarity for non-linked events. We further predicted that cTBS to the left angular gyrus would reduce the impact of (imagination-based) insight into the link of initially unrelated events on memory performance during free recall, given its higher variability compared to other memory measures.”

      3) Furthermore, in some instances, the types of analyses used do not appear to be suitable for addressing the questions posed by the current study, and there is limited explanation provided for the choice of analyses and questionnaires.

      We addressed this concern by inserting a new section, “Control variables”, in the methods, explaining our rationale for employing the different questionnaires as control variables on pages 40-41, lines 1003-1019:

      “Control variables. In order to ensure that the observed effects were solely attributable to the TMS manipulation and not influenced by other factors, we comprehensively evaluated several trait and state variables. To account for potential variations in anxiety levels that could impact our results, we specifically measured state and trait anxiety using STAI-S and STAI-T (Laux et al., 1981), thus minimizing the potential confounding effects of anxiety on our findings (Charpentier et al., 2021). Additionally, we evaluated participants’ chronic stress levels using the TICS (Schulz & Schlotz, 1999) to exclude any group differences that might explain the effect on memory, considering the well-established impact of stress on memory (Sandi & Pinelo-Nava, 2007; Schwabe et al., 2012). Moreover, we assessed participants’ depressive symptoms employing the BDI (Hautzinger et al., 2006) to guarantee group comparability on this clinical measure. We further assessed fundamental personality dimensions using the BFI-2 (Danner et al., 2016) to exclude any potential group discrepancies that could account for the differences observed. Lastly, we assessed participants’ imaginative capacities using the FFIS (Zabelina & Condon, 2019) to ensure uniformity across groups regarding this central variable, considering the significant role of imagination in relation to the cTBS-targeted angular gyrus (Thakral et al., 2017).”

      We further specified why we chose to analyze our behavioral data using LMMs on page 34, lines 849-851:

      “For our behavioral analyses, we opted to employ linear mixed models (LMMs), given their high robustness to violations of distributional assumptions and high sensitivity to individual variation (Pinheiro & Bates, 2000; Schielzeth et al., 2020).”

      Moreover, we added an explanation of why we opted for the RSA approach in the methods section on page 37, lines 920-923:

      “This method is ideally suited to measure neural representation changes and was specifically chosen as it has previously been identified as the preferred approach for quantifying insight-induced neural changes (Grob et al., 2023b; Milivojevic et al., 2015).”

      To clarify the rationale behind our coherence analysis, we incorporated an explanatory sentence in the methods section on page 39, lines 966-967:

      “Due to the robust connectivity between the angular gyrus and other brain regions (Petit et al., 2023; Seghier, 2013), we proceeded with a connectivity analysis as a next step.”

      Reviewer 3

      We thank the reviewer for the constructive and very helpful comments. We are pleased that the reviewer considered our experimental design to be strong and our behavioral results to be striking.

      1) My major criticism relates to the main claim of the paper regarding causality between the angular gyrus and the authors' behavior of interest. Specifically, I am not convinced by the evidence that the effects of stimulation noted in the paper are attributable specifically to the angular gyrus, and not other regions/networks.

      While our results showed specific changes after cTBS over the angular gyrus, demonstrating a causal involvement of the angular gyrus in these effects, we completely agree that this does not rule out an involvement of additional areas. In particular, there is evidence suggesting that cTBS over parietal regions, such as the angular gyrus, could potentially influence hippocampal functioning. We now address this issue in a new paragraph that we have added to the discussion on pages 23-24, lines 546-564:

      “Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus. Expanding upon this idea, it is conceivable that targeting a more dorsal segment of the angular gyrus might exert a stronger influence on observation-based linking – an aspect that warrants future investigations.”

      Responses to reviewer recommendations

      Reviewer 1

      1) On page 26, the authors write: "[...] different video events (A, B, and X) were recalled from day one [...]". I may have missed this point, but I had the impression that the task was conducted within one day.

      Indeed, this study was conducted within a single day. We rephrased the respective statement accordingly. Please see page 7, lines 149-153:

      “To test this hypothesis and the causal role of the angular gyrus in insight-related memory reconfigurations, we combined the life-like video-based narrative-insight task (NIT) with representational similarity analysis of EEG data and (double-blind) neuro-navigated TMS over the left angular gyrus in a comprehensive investigation within a single day.”

      We further included this information in the methods section on page 27, lines 634-635:

      “In total, the experiment took about 4.5 hours per participant and was completed within a single day.”

      Reviewer 2

      1) There is a substantial disconnection between the introduction and the methods/results section. One reason is that there is not sufficient detail regarding the hypotheses/predictions and the specific types of analyses chosen to test them. Additionally, it is not explained what comparisons and outcomes would be informative/expected. This should be made clear. Second, and related to the above, the rationale for conducting certain types of analyses (correlation, coherence, see below) is sometimes not specified.

      To address this concern, we elaborated on our hypotheses, incorporating specific predictions for the free recall, given its higher variability than the other memory measures, and for imagination vs. observation at the end of the introduction on pages 4-5, lines 107-122:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight would differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to interfere with the increase in neural similarity for linked events and with the decrease of neural similarity for non-linked events. We further predicted that cTBS to the left angular gyrus would reduce the impact of (imagination-based) insight into the link of initially unrelated events on memory performance during free recall, given its higher variability compared to other memory measures. Considering the high connectivity profile of the angular gyrus within the brain (Seghier, 2013), we conducted an EEG connectivity analysis building upon prior findings concerning alterations in neural reconfigurations. To establish a link between neural and behavioral findings, we chose a correlational approach to relate observations from these two domains.”

      Moreover, we made our rationale for the employed analyses more explicit and specified why we chose to analyze our behavioral data using LMMs on page 34, lines 849-851:

      “For our behavioral analyses, we opted to employ linear mixed models (LMMs), given their high robustness to violations of distributional assumptions and high sensitivity to individual variation (Pinheiro & Bates, 2000; Schielzeth et al., 2020).”

      Moreover, we added an explanation of why we opted for the RSA approach in the methods section on page 37, lines 920-923:

      “This method is ideally suited to measure neural representation changes and was specifically chosen as it has previously been identified as the preferred approach for quantifying insight-induced neural changes (Grob et al., 2023b; Milivojevic et al., 2015).”

      To clarify the rationale behind our coherence analysis, we incorporated an explanatory sentence in the methods section on page 39, lines 966-967:

      “Due to the robust connectivity between the angular gyrus and other brain regions (Petit et al., 2023; Seghier, 2013), we proceeded with a connectivity analysis as a next step.”

      2) The authors suggest that, besides Branzi et al. (2021), this is one of the first studies showing that memory updating is linked to the AG. I suggest having a look at work from Tambini, Nee, & D'Esposito, 2018, JoCN, and other papers from Joel Voss' group that target a similar region of the AG/inferior parietal cortex. Many studies, using multiple TMS protocols, have now shown that this brain region is causally involved in episodic and associative memory encoding.

      As mentioned above, further consideration of this literature is important as it delves into the region's hippocampal connectivity (and other network properties), and how that mediates the memory effects. Indeed, because of the nature of the methods employed in this study, we do not know if the memory-related behavioural effects are due to TMS-induced changes at the level of the AG, the hippocampus, or both. How do the current findings square with the existing TMS effects from this region? Can the connectivity profile of the target region highlighted by previous studies provide further insight into how the current behavioural effect arises? Some comments on this could be added to the discussion.

      We completely agree that the other studies showing enhanced associative memory after TMS to parietal regions need to be addressed. Therefore, we updated the discussion on page 20, lines 449-453:

      “Interestingly, recent work has additionally indicated that targeting parietal regions with TMS led to alterations in hippocampal functional connectivity, thereby enhancing associative memory (Nilakantan et al., 2017; Tambini et al., 2018; Wang et al., 2014), potentially shedding light on the underlying mechanisms involved.”

      Moreover, we included a section specifically addressing the possibility that the observed effects may stem from the modulation of other regions via the targeted region and updated the discussion on pages 23-24, lines 543-562:

      “Furthermore, the differential impact of cTBS to the angular gyrus on neural reconfigurations between events linked via imagination and those linked via observation may be attributed to its crucial role in imaginative processes (Ramanan et al., 2018; Thakral et al., 2017). Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      3) Another comment I have regards the results observed for the observation vs imagination insight conditions. The authors mention that the 'changes in representational similarity for the observation condition should be interpreted with caution, as these seemingly opposite changes appeared to be at least in part driven by group differences already in the pre-phase before participants gained insight.' I wonder what these group differences are and whether the authors have any hypothesis about what factors determined them.

      We can only speculate about the basis of the observed pre-insight differences. However, we now provide the raw RSA data as supplemental material to make the pattern of the (raw) RSA findings in the pre- and post-insight phases more transparent. We refer the interested reader to this material on pages 12-13, lines 293-295:

      “For linked events, this analysis yielded a negative cluster (p = 0.032, ci-range = 0.00, SD = 0.00) in the parieto-temporal region (electrodes: T7, Tp7, P7; Fig. 3B; Figure 3 – Figure supplement 1).”

      And on page 15, lines 339-341:

      “This analysis yielded a positive cluster (p = 0.035, ci-range = 0.00, SD = 0.00) in a fronto-temporal region (electrode: FT7; Fig. 3C; Figure 3 – Figure supplement 2).”

      Furthermore, the age of participants is not reported separately for the two groups (cTBS to AG vs Sham), I think. This should be reported including a t-test showing that the two groups have the same age.

      We agree and now report explicitly that the groups did not significantly differ in relevant control variables, including age. Please see page 7, lines 157-160:

      “Notably, the groups did not differ on levels of subjective chronic stress (TICS), state and trait anxiety (STAI-S, STAI-T), depressive mood (BDI), imaginative capacities (FFIS), personality dimensions (BFI), age, and motor thresholds (for descriptive statistics see Table 1; all p > 0.053).”

      And further included age and motor thresholds as control variables in Table 1 on page 18, lines 402-412:

      “Overall, levels of subjective chronic stress, anxiety, and depressive mood were relatively low and not different between groups. The groups further did not differ in terms of personality traits, imagination capacity, age, or motor thresholds (all p > 0.053; see Table 1).”

      The fact that this study does not use a within-subject design makes the interpretation of the results difficult, and this should be recognised as an important limitation of the study.

      As outlined above, a within-subject design would, in our view, come with several disadvantages, such as significant sequence/carry-over effects. Moreover, the neural representation change was measured in a pre-post design, enabling us to measure the insight-driven neural reconfiguration at the individual level.

      We clarify our rationale for the between-subjects factor TMS in the introduction on page 5, lines 122-126:

      “We intentionally adopted a mixed design, combining both between-subjects and within-subject methodologies. The between-subjects approach was chosen to minimize the risk of carry-over effects and sequence biases. Simultaneously, we capitalized on the advantages of a within-subject design by altering the pre- to post-insight comparison and the mode of insight (imagination vs. observation) within each participant.”

      Furthermore, we included our rationale for choosing a between-subjects approach for the crucial TMS manipulation in the methods section on page 25, lines 601-604:

      “We implemented a mixed-design including the within-subject factors link (linked vs. non-linked events), session (pre- vs. post-link), and mode (imagination vs. observation) as well as the between-subjects factor group (cTBS to the angular gyrus vs. sham) to mitigate the risk of carry-over effects and sequence biases of the crucial cTBS manipulation.”

      4) The angular gyrus is a heterogeneous region with multiple graded subregions. The one targeted in the present study is the ventral AG, which has strong connections with the episodic-hippocampal memory system. I was wondering if this might explain why the AG TMS effects on representational changes have been observed for events linked via imagination but not via direct observation. Perhaps the stimulation of a more 'visual' AG subregion (see Humphreys et al., 2020, Cerebral Cortex) would have resulted in a different (opposite) pattern of results. It would be good to add some comments on this in the discussion.

      We appreciate this interesting perspective on the potential outcomes of our study, particularly in relation to the stimulation of a more ventral subregion of the angular gyrus. We incorporated this idea into our discussion, alongside considerations regarding the potential effects of a more dorsal angular gyrus stimulation on observation-based linking. However, caution is warranted given the inherent limits on the precision of TMS manipulations, which are further underscored by our electric field simulations utilizing a 10 mm radius. We included this section in the discussion on pages 23-24, lines 546-569:

      “Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus. Expanding upon this idea, it is conceivable that targeting a more dorsal segment of the angular gyrus might exert a stronger influence on observation-based linking – an aspect that warrants future investigations. Yet, while acknowledging the functional heterogeneity within the angular gyrus (Humphreys et al., 2020), pinpointing specific subregions via TMS remains challenging due to its limited focal precision at the millimeter level (Deng et al., 2013; Thielscher & Kammer, 2004), as reinforced by our electric field simulations utilizing a 10 mm radius.
Hence, drawing definitive conclusions regarding distinct angular gyrus subregions requires future research employing rigorous checks to assess the focality of their stimulation.”

      5) Regarding the methods section, I have the following specific queries. It is unclear what the purpose of the coherence and correlation analyses (pages 35, 36) is. Could the authors provide further clarification on this? These analyses do not seem to be mentioned anywhere in the introduction. This should be clarified briefly in the introduction and then in the methods section. The same applies to the questionnaires (anxiety, stress, etc.): the reason for collecting this type of data is unclear. This should be clarified in the introduction as well.

      We agree, and have updated the introduction as follows on page 5, lines 118-122:

      “Considering the high connectivity profile of the angular gyrus within the brain (Seghier, 2013), we conducted an EEG connectivity analysis building upon findings from the RSA analyses concerning alterations in neural reconfigurations. To establish a link between neural and behavioral findings, we chose a correlational approach to relate observations from these two domains.”

      We additionally provided an explanation for including these questionnaires in the introduction on page 5, lines 126-129:

      “To control for any group differences beyond the TMS manipulation, we gathered various control variables through questionnaires, including trait- and state-anxiety, depressive symptoms, chronic stress levels, personality dimensions, and imaginative capacities.”

      Moreover, we elaborated on the underlying rationale guiding our chosen analytical approaches. To this end, we specified why we chose to analyze our behavioral data using LMMs on page 34, lines 849-851:

      “For our behavioral analyses, we opted to employ linear mixed models (LMMs), given their high robustness to violations of distributional assumptions and high sensitivity to individual variation (Pinheiro & Bates, 2000; Schielzeth et al., 2020).”

      Furthermore, we added an explanation of why we opted for the RSA approach in the methods section on page 37, lines 920-923:

      “This method is ideally suited to measure neural representation changes and was specifically chosen as it has previously been identified as the preferred approach for quantifying insight-induced neural changes (Grob et al., 2023b; Milivojevic et al., 2015).”

      To clarify the rationale behind our coherence analysis, we incorporated an explanatory sentence in the methods section on page 39, lines 966-967:

      “Due to the robust connectivity between the angular gyrus and other brain regions (Petit et al., 2023; Seghier, 2013), we proceeded with a connectivity analysis as a next step.”

      6) The preregistration webpage is in German. This is not ideal as it means that the information is available only to German speakers.

      This webpage can easily be switched to English by changing the settings in the top right corner.

      To address this issue, we included a description of how to set the webpage to English in the methods section on page 25, lines 581-582:

      “For translation to English, please adjust the page settings located in the top right corner.”

      7) Page 18. 'NIT' and 'MAT' - avoid abbreviations when possible.

      We included the full name for the narrative-insight task (NIT) on page 7, lines 151, 153, and 165; page 8, lines 177-178 and 187; page 19, line 427; page 26, lines 615, 629, and 632; page 27, line 653; page 30, lines 730-731; page 31, line 754; page 35, lines 870 and 873; and page 36, line 885.

      We further included the full name for the multi-arrangements task (MAT) on page 19, lines 428-429.

      8) Line 21....we further observed DECREASED...should be replaced with INCREASED, if I am not wrong.

      We checked the sentence again and it looks correct to us, since it describes the change for observation-based insight, not imagination-based insight. We clarified that this finding pertains to observation-based linking by modifying the sentence on page 23, lines 525-528, as follows:

      “Following cTBS to the angular gyrus, we further observed decreased pattern similarity for non-linked events in the observation-based condition, resembling the pattern change observed in the sham group for linked events, which may highlight the role of the angular gyrus in representational separation during observation-based linking.”

      Reviewer 3

      1) The major claim of the paper is that the angular gyrus is causally involved in insight-driven memory reconfiguration. To the authors' credit, they localized stimulation to the angular gyrus using an anatomical scan, the strength of the estimated electromagnetic field in the angular gyrus correlated with their behavioral results, and there were also brain-behavior correlations involving sensors located in the parietal lobe. However, the minimum evidence needed to claim causality is 1) evidence of a behavioral change (which the authors found) and 2) evidence of target engagement in the angular gyrus. It is also important to show brain-behavior correlations between target engagement and behavior. Although the authors stimulated the angular gyrus, that does not mean that rTMS specifically affected this region or that the behavioral results can be attributed to rTMS effects on the angular gyrus. As the authors point out, the angular gyrus has dense connections with other regions such as the hippocampus. In fact, several studies have shown that angular gyrus (or near-AG) stimulation affects the hippocampal network (Wang et al., 2014, Science; Freedberg et al., 2019, eNeuro; Thakral et al., 2020, PNAS). EEG also has poor spatial resolution, so even though the results were attributable to parieto-temporal sensors, this is not sufficient evidence to claim that the angular gyrus was modulated. Source localization would be required to reconstruct the signal specifically from the AG. Thus, with the manuscript written as is, the authors can claim that "cTBS to the angular gyrus modulates insight-driven memory reconfiguration," but the current claim is not sufficiently substantiated.

      While acknowledging the potential role of the angular gyrus in driving the observed changes, we recognize that the available evidence may not be sufficient. Consequently, we have introduced several modifications to our manuscript to address this concern.

      In the revised Introduction, we now explicitly address the possibility of a stimulation of the hippocampus via the angular gyrus on page 4, lines 84-85:

      “Accordingly, previous studies have shown that stimulation of the angular gyrus resulted in altered hippocampal activity (Thakral et al., 2020; Wang et al., 2014).”

      Additionally, we included relevant evidence demonstrating previous instances of targeted stimulation of the angular gyrus, which led to alterations in hippocampal connectivity and associative memory. These insights have been included in the discussion on page 20, lines 449-453:

      “Interestingly, recent work has additionally indicated that targeting parietal regions with TMS led to alterations in hippocampal functional connectivity, thereby enhancing associative memory (Nilakantan et al., 2017; Tambini et al., 2018; Wang et al., 2014), potentially shedding light on the underlying mechanisms involved.”

      Next, we have integrated crucial modifications concerning the inference of causality in our study. Moreover, we now explore the potential mediation of the effects of angular gyrus stimulation through other brain regions, such as the hippocampus. In addition, we have highlighted prior work in which such stimulation coincided with alterations in associative memory. For the updated discussion section, please see pages 23-24, lines 538-562:

      “Although our study provided evidence suggesting a causal role of the angular gyrus in insight-driven memory reconfigurations – highlighted by behavioral changes after cTBS to the angular gyrus, neural changes in left parietal regions, and relevant brain-behavior associations – it is important to acknowledge the limitations imposed by the spatial resolution of EEG. Consequently, the precise source of the observed signal changes in the parietal regions remains uncertain, potentially tempering the definitive nature of these findings. Furthermore, the differential impact of cTBS to the angular gyrus on neural reconfigurations between events linked via imagination and those linked via observation may be attributed to its crucial role in imaginative processes (Ramanan et al., 2018; Thakral et al., 2017). Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018).
In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      We further replaced terms that imply inhibition of the angular gyrus with a more operationally descriptive phrase:

      “cTBS to the angular gyrus”

      2) The authors frequently claim that cTBS is "inhibitory stimulation" and that inhibition of the angular gyrus caused their effects. There is a common misconception within the cognitive neuroscience literature that stimulation is either "inhibitory" or "excitatory," but there is no such thing as either. The effects of rTMS are dependent on many physiological, state, and trait-specific variables and the location of stimulation. For example, while cTBS does reproducibly inhibit behavior supported by the motor cortex (Wilkinson et al., 2010, Cortex; Rosenthal et al., 2009, J Neurosci), cTBS of the posterior parietal cortex reproducibly enhances hippocampal network functional connectivity and episodic memory (Hermiller et al., 2019, Hippocampus; Hermiller et al., 2020, J Neurosci). The authors reference the Huang et al. (2005) paper as evidence of its inhibitory effects but work in this paper is not sufficient to broadly categorize cTBS as inhibitory. First, Huang et al. stimulated the motor cortex and measured the effects on corticospinal excitability, which is significantly different from what the current authors are measuring. Furthermore, this oft-cited study only included 9 subjects. Other studies have found that the effects of theta-burst are significantly more variable when more subjects are used. For example, intermittent theta-burst, which is assumed to be excitatory based on the Huang paper, was found to produce unreliable excitatory effects when more subjects were examined (Lopez-Alonso, 2014, Brain Stimulation). Thus, the a priori assumption that stimulation would be inhibitory is weak and cTBS should not be discussed as "inhibitory."

      We agree and have now included a statement in the methods section explicitly noting that cTBS effects may be region-specific (page 33, lines 817-819):

      “Nonetheless, the effects of cTBS appear to vary based on the targeted region, with cTBS to parietal regions demonstrating the capability to enhance hippocampal connectivity (Hermiller et al., 2019, 2020).”

      We further substituted all terminology suggestive of an inhibitory effect with the phrase:

      “cTBS to the angular gyrus”.

      However, it is important to note that while other studies (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014) found increased hippocampal connectivity after rTMS to a parietal region as well as enhanced associative memory, we observed impaired memory for the linked events. We included this clarification in the discussion on page 24, lines 558-562:

      “In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      3) The hypothesis at the end of the introduction did not strike me as entirely clear. From this hypothesis, it seems that the authors are just comparing the differences in memory and reconfiguration during imagination-based insight links. However, the authors also include observation-based links and a non-linking condition, which seem ancillary to the main hypothesis. Thus, I am confused about why these extra factors were included and exactly what statistical results would confirm the authors' hypothesis.

      We agree, and have clarified our hypotheses on pages 4-5, lines 107-115:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to reduce the increase in neural similarity for linked events and increase of neural dissimilarity for non-linked events.”

      4) Many of the distributions throughout the paper do not look normal. Was normality checked? Are non-parametric stats warranted?

      We evaluated the normality assumption in our behavioral analyses and now report the results. Despite the non-normal distribution of our data, we chose to use linear mixed models because of their robust performance even when the data deviate from normality. This update to our methods section can be found on page 36, lines 890-896:

      “After outlier correction, we identified non-normality in our data using a Shapiro-Wilk test (narrative-insight task: W = 0.92, p < 0.001; multi-arrangements task: W = 0.94, p < 0.001; forced-choice recognition: W = 0.50, p < 0.001; free recall details: W = 0.85, p < 0.001; free recall naming of linking events: W = 0.94, p < 0.001). However, we mitigated this by employing linear-mixed models (LMMs), recognized for their robustness even with non-normally distributed data (Schielzeth et al., 2020).”

      We recalculated the correlational analysis between the RSA data and the behavioral recall of linking events by using the Spearman method on page 13, lines 306-308:

      “Furthermore, to address a deviation from the normality assumption, the correlational analysis was repeated using the Spearman method, which indicated an even stronger correlation (r(59) = 0.32, p = 0.012).”

      We further recalculated the correlation between the change in coherence for linked events and the recall of details for events linked via imagination on page 16, lines 376-378:

      “Please note that for addressing a deviation from the normality assumption, the correlational analysis was repeated using the Spearman method, which yielded a significant correlation of similar strength (r(59) = 0.31, p = 0.015).”
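      Spearman's coefficient is simply the Pearson correlation computed on ranks, which is why it does not rely on normally distributed values. The following is a minimal standard-library sketch, assuming average ranks for tied values (a common convention); it computes only the coefficient, not the p-values reported above, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only the ordering of values matters, a single extreme value (for example, one participant with an unusually large RSA change) cannot dominate the coefficient as it can with Pearson's r.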

      Our EEG analyses, including RSA and coherence analyses, used a cluster-based permutation test (FieldTrip; Oostenveld et al., 2011). These tests do not assume a normal distribution because statistical inference is based on an empirical permutation distribution. This approach ensures robustness without constraints imposed by specific distributional assumptions. Subsequent t-tests, stemming from significant clusters identified in the initial non-parametric analyses, were extensions of the robust non-parametric approach and did not require additional normality testing.
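      The logic of a cluster-based permutation test can be illustrated on a single channel time course. This is a deliberately simplified sketch, not FieldTrip's implementation: it assumes a within-subject (paired) contrast, uses random sign flips of each subject's difference wave as the permutation scheme, tests positive clusters only, and takes a cluster-forming threshold of 2.09 (roughly the two-sided 5% critical t for 19 degrees of freedom). All function names are hypothetical.

```python
import random
import statistics


def paired_t(diffs_at_t):
    """One-sample t statistic of within-subject differences at one time point."""
    n = len(diffs_at_t)
    return statistics.mean(diffs_at_t) / (statistics.stdev(diffs_at_t) / n ** 0.5)


def t_series(diffs):
    """diffs[s][t]: condition difference for subject s at sample t."""
    return [paired_t([subj[t] for subj in diffs]) for t in range(len(diffs[0]))]


def positive_clusters(tvals, threshold):
    """Return (start, end, mass) for runs of consecutive samples with t > threshold."""
    clusters, start = [], None
    for i, t in enumerate(tvals + [float("-inf")]):  # sentinel closes a final run
        if t > threshold and start is None:
            start = i
        elif t <= threshold and start is not None:
            clusters.append((start, i, sum(tvals[start:i])))
            start = None
    return clusters


def max_mass(tvals, threshold):
    """Largest cluster mass (sum of t values) in one series, 0.0 if none."""
    return max((c[2] for c in positive_clusters(tvals, threshold)), default=0.0)


def cluster_permutation_test(diffs, threshold=2.09, n_perm=500, seed=0):
    """One-sided cluster-mass test; returns (largest observed cluster, p-value)."""
    rng = random.Random(seed)
    observed = positive_clusters(t_series(diffs), threshold)
    best = max(observed, key=lambda c: c[2], default=None)
    if best is None:
        return None, 1.0
    exceed = 0
    for _ in range(n_perm):
        # Under H0 each subject's difference wave is equally likely to flip sign.
        flipped = []
        for subj in diffs:
            sign = rng.choice((-1, 1))
            flipped.append([sign * v for v in subj])
        # Comparing against the maximum cluster mass controls for multiple time points.
        if max_mass(t_series(flipped), threshold) >= best[2]:
            exceed += 1
    return best, (exceed + 1) / (n_perm + 1)
```

Because only the largest cluster mass from each permutation enters the null distribution, the test controls the family-wise error rate across time points without assuming normality of the individual samples.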

      5) Can the authors include more detail about the sham coil? Was it subthreshold? Did the EMF cross the skull?

      The sham coil, also obtained from MAG & More GmbH, München, Germany, provided a similar sensory experience; however, the company did not specify any field strength (n.a.), as this coil was purposefully designed to prevent the induction of an electromagnetic field (EMF) capable of penetrating the skull, thereby ensuring it had no impact on the brain. We clarified this point in the methods section on pages 31-32, lines 772-778:

      “Two identically looking but different 70 mm figure-of-eight-shaped coils were used depending on the TMS condition: The PMD70-pCool coil (MAG & More GmbH, München, Germany) with a 2T maximum field strength was used for cTBS, while the PMD70-pCool-SHAM coil (MAG & More GmbH, München, Germany), with minimal magnetic field strength, was employed for sham, providing a similar sensory experience, with stimulation pulses being scattered over the scalp and not penetrating the skull.”

      6) There are differences between exclusion criteria in pre-registration and report. For example, BMI is an exclusion factor in the report, but not in the pre-registration. Can the authors provide a reason for this deviation?

      This discrepancy is due to (partial) participant recruitment from previous fMRI studies conducted in our lab that involved a stress induction protocol (as a structural MRI image was needed for the ‘neuronavigated’ TMS). Owing to the distinct cortisol stress reactivity observed in individuals with varying body mass indices (BMIs), participants with a BMI below 19 or above 26 kg/m² were excluded from these studies. To maintain consistency within our sample, only participants meeting these criteria were included. We elaborated on this point in the methods section on page 25, lines 586-592:

      “Participants were screened using a standardized interview for exclusion criteria that comprised a history of neurological and psychiatric disease, medication use and substance abuse, cardiovascular, thyroid, or renal disease, evidence of COVID-19 infection or exposure, and any contraindications to MRI examination or TMS. Additionally, participants with a body mass index (BMI) below 19 or above 26 kg/m² were excluded. This decision stemmed from recruiting some participants from prior studies that incorporated stress induction protocols, which imposed this specific criterion (Herhaus & Petrowski, 2018; Schmalbach et al., 2020).”

      7) Were impedances monitored and minimized during EEG?

      Yes, they were monitored. We clarified this point in the methods section on page 34, lines 845-847:

      “We maintained impedances within a range of ± 20 μV using the common mode sense (CMS) and driven right leg (DRL) electrodes, serving as active reference and ground, respectively.”

      8) I think there may be a typo related to the Thakral coordinates. I believe Thakral used MNI coordinates -48,-64, 30, whereas the authors stated they used -48,-67,30. Is this a mistake?

      Upon re-evaluating our study coordinates, we identified a slight deviation in our stimulation coordinates compared with those reported by Thakral et al. (2017; +3 mm on the y-axis). This variance resulted from the MNI to Talairach (TAL) transformations required by the neuronavigation software Powermag View! (MAG & More GmbH, München, Germany). Notably, this deviation was consistent across all participants in our study. While TMS is more focal than tDCS, its focality does not extend to the millimeter level. Furthermore, our electric field simulations, which used a 10 mm radius, encompassed the original coordinates specified by Thakral et al. (2017). This radius ensured coverage of the intended target area, mitigating the impact of this minor deviation on the overall study outcomes. We updated the methods section accordingly on page 33, lines 800-806:
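      Whether the shifted target still falls inside the simulated region is a simple Euclidean check. A minimal sketch, assuming both points are expressed in the same MNI space in millimeters (the MNI to Talairach conversion that produced the offset is not modeled here):

```python
import math

# MNI coordinates in mm (x, y, z).
thakral_target = (-48, -64, 30)  # coordinate from Thakral et al. (2017), as quoted by the reviewer
stimulated = (-48, -67, 30)      # coordinate reported in this study
sim_radius_mm = 10               # radius used in the electric field simulations

offset_mm = math.dist(thakral_target, stimulated)
print(f"offset = {offset_mm:.1f} mm; inside {sim_radius_mm} mm radius: {offset_mm <= sim_radius_mm}")
```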

      “Based on the individual T1 MR images, we created 3D reconstructions of the participants' heads, allowing us to precisely locate the left angular gyrus coordinate (MNI: -48, -67, 30), initially derived from previous work (Thakral et al., 2017), for TMS stimulation. Despite a minor deviation in coordinates due to necessary MNI to Talairach transformations for software compatibility (Powermag View! by MAG & More GmbH, München, Germany), our methodology ensured precise localization of the angular gyrus target area.”

      9) How was the tail of the coil positioned during stimulation? Was it individualized so that the lobes of the coil are perpendicular to the nearest gyrus, as is commonly done?

      The coil handle always pointed upwards to maintain optimal positioning with the coil holder. We followed the positioning procedure in the neuronavigation software Powermag View!, which did not indicate any positioning of the coil handle but specified the position and angle of the coil itself. To incorporate this aspect, we updated the legend of figure 2 on page 11, lines 260-261:

      “Please note that in the study, the coil handle was oriented upwards; however, in this illustration, it has been intentionally depicted as pointing downwards for better visibility purposes.”

      We further updated the method section on page 33, lines 723-824:

      “The coil was positioned tangentially on the head and mechanically fixed in a coil holder, with its handle pointing upwards to maintain its position.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      This work describes a new and powerful approach to a central question in ecology: what are the relative contributions of resource utilisation vs interactions between individuals in the shaping of an ecosystem? This approach relies on a very original quantitative experimental set-up whose power lies in its simplicity, allowing an exceptional level of control over ecological parameters and of measurement accuracy.

      In this experimental system, the shared resource corresponds to 10^12 copies of a fixed single-stranded target DNA molecule to which 10^15 random single-stranded DNA molecules (the individuals populating the ecosystem) can bind. The binding process is cycled, with a 1000x-PCR amplification step between successive binding steps. The composition of the population is monitored via high-throughput DNA sequencing. Sequence data analysis describes the change in population diversity over cycles. The results are interpreted using estimated binding interactions of individuals with the target resource, as well as estimated binding interactions between individuals and also self-interactions (that can all be directly predicted as they correspond to DNA-DNA interactions). A simple model provides a framework to account for ecosystem dynamics over cycles. Finally, the trajectory of some individuals with high frequency in late cycles is traced back to the earliest cycles at which they are detected by sequencing. Their propensities to bind the resource, to form hairpins, or to form homodimers suggest how different interaction modes shape the composition of the population over cycles.

      The authors report a shift from selection for binding to the resource to interactions between individuals and self-interactions over the course of cycles as the main drivers of their ecosystem. The outcome of the experiment is far from trivial as the individual resource binding energy initially determines the relative enrichment of individuals, and then seems to saturate. The richness of the population dynamics observed with this simple system is thus comparable to that found in some natural ecosystems. The findings obtained with this new approach will likely guide the exploration of natural ecosystems in which parameters and observables are much less accessible.

      My review focuses mainly on the experimental aspects of this work given my own expertise. The introduction exposes very convincingly the scientific context of this work, justifying the need for such an approach to address questions pertaining to ecology. The manuscript describes very clearly and rigorously the experimental setup. The main strengths of this work are (i) the outstanding originality of the experimental approach and (ii) its simplicity. With this setup, central questions in ecology can be addressed in a quantitative manner, including the possibility of running trajectories in parallel to generalize the findings, as reported here. Technical aspects have been carefully implemented, from the design of random individuals bearing flanking regions for PCR amplification, binding selection and (low-error) amplification protocols, and sequencing read-out whose depth is sufficient to capture the relevant dynamics.

      We thank the reviewer for summarizing our work and the main findings in a very clear and effective manner.

      One missing aspect in the data analysis is the quantification of the effect of PCR amplification steps in shaping the ecosystem (to be modeled if significant). In addition, as it stands the current work does not fully harness the power of the approach. For instance, with this setup, one can tune the relative contributions of binding selection vs amplification for instance (to disentangle forces that shape the ecosystem). One can also run cycles with new DNA individuals, designed with arbitrarily chosen resource binding vs self-binding, that are predicted to dominate depending on chosen ecological parameters. I have three main recommendations to the authors:

      1) PCR amplification steps (and not only binding selection steps) should be taken into account when interpreting the outcome of experiments.

      2) More generally, a systematic analysis of the possible modes of propagation of a DNA molecule from one cycle to the next, including those considered as experimental noise, would help with interpreting the results.

      3) Testing experimentally the predictions from the analysis and the modelling of results would strengthen the case for this approach.

      Despite its conceptual simplicity, our approach has indeed a few experimental handles that enable exploring a relevant variety of conditions much beyond those described in this paper, of which we are very aware. These involve selection vs. amplification or set the stage to explore competition, parasitism or cooperation among specific species, as the reviewer points out, but also introduce mutations and explore the kinetics of evolution in static or dynamic environments. Ongoing experiments are considering some of these conditions. We modified the text to mention more explicitly these possibilities, which are now mentioned in p11 lines 376-378 and lines 416-417. The three points raised by the reviewer helped us to further improve and clarify strengths and limitations of our work, as detailed below.

      Regarding the first point, here are my suggestions:

      • Run one cycle of just amplification vs 'binding + amplification', or simply increase the number of PCR cycles (and subsample the product) to check whether it impacts the population composition, in particular for sequences with predictions derived from the current analysis.

      The point raised by the reviewer is indeed very relevant and not discussed in our manuscript. Prompted by the reviewer’s comment, we performed two new experiments to distinguish resource-binding selection from PCR amplification effects.

      First, we performed a negative control experiment in which we carried out the “selection step” with bare beads, i.e. beads with no DNA grafted on them. We then compared the results with the corresponding results of the original experiments on Oligo 1 and 2.

      After 6 cycles, the most abundant sequence in the negative-control dataset has a relative occurrence of 0.05%, whereas the dominant strands in Oligo 1 and Oligo 2 have abundances of 8% and 16%, respectively, i.e. 160-320 times larger.

      This indicates that the drift due to non-specific binding + PCR amplification is at least two orders of magnitude smaller than the selection induced by the affinity with the resource.
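      The orders of magnitude follow directly from the ratio of the top-sequence abundances quoted above. A quick check, working in units of 0.01% so the arithmetic stays exact:

```python
import math

# Relative abundance of the most abundant sequence after 6 cycles, in units of 0.01 %.
negative_control_top = 5   # 0.05 % (bare beads)
oligo1_top = 800           # 8 %
oligo2_top = 1600          # 16 %

for name, top in [("Oligo 1", oligo1_top), ("Oligo 2", oligo2_top)]:
    fold = top / negative_control_top
    print(f"{name}: {fold:.0f}x larger, i.e. {math.log10(fold):.1f} orders of magnitude")
```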

      These results are now cited on p14, lines 468-470, and described in Appendix 1, Experimental controls.

      Second, we tested the effect of PCR amplification on the selection process. We exploited the fact that we have aliquots for each generation of our evolution experiment, which we sampled and saved after PCR and before sequencing. We chose a specific generation (generation 9 of the Oligo 1 experiment) and performed another PCR round, after which we proceeded directly to sequencing with no bead-selection step. We then compared the ensemble of oligos obtained in this way, which we named Oligo 1 “cycle 9 replica”, with both the original Oligo 1 cycle 9 and Oligo 1 cycle 10.

      We drew 20 bootstrap samples of 4 x 10^5 sequences each from the cycle 9, cycle 9 replica and cycle 10 datasets. To compare the three systems, we extracted the fraction of each population covered by its 10 most abundant individuals. The results are shown in Figure 2-figure supplement 4, whose caption gives further details of the analysis. The similarity between cycle 9 and cycle 9 replica, and the marked difference between cycle 9 replica and cycle 10, indicate that the relevant part of the selection is indeed performed by the resource-binding mechanism, while drifts induced by PCR play a secondary role.
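      The bootstrap comparison can be sketched as follows, assuming each replicate resamples reads with replacement, weighted by their read counts. The function name and the toy counts in the check are hypothetical; applied to real data, one would call it with sample_size=400_000 on the cycle 9, cycle 9 replica and cycle 10 read tables.

```python
import random
from collections import Counter


def top_k_fraction(read_counts, sample_size, k=10, n_boot=20, seed=0):
    """Bootstrap the fraction of sampled reads covered by the k most abundant sequences.

    read_counts: mapping sequence -> number of reads in one sequencing dataset.
    Each replicate draws `sample_size` reads with replacement, weighted by read count.
    """
    rng = random.Random(seed)
    seqs = list(read_counts)
    weights = [read_counts[s] for s in seqs]
    fractions = []
    for _ in range(n_boot):
        sample = Counter(rng.choices(seqs, weights=weights, k=sample_size))
        top = sum(count for _, count in sample.most_common(k))
        fractions.append(top / sample_size)
    return fractions
```

Similar distributions of these fractions for two datasets indicate a comparable degree of dominance by the top sequences, while a shift between datasets signals that selection has reshaped the population.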

      As a further check, we compared the specific sequences across the 20 samples in the cycle 9 and cycle 9 replica datasets and found that the 10 most abundant sequences are almost always the same. In particular, the top 8 or 9 are always the same, possibly in a different order.

      These new pieces of evidence are now cited in p14 lines 483-484 and described in Appendix 1, Experimental controls.

      • Sequencing read-out includes the same PCR protocol as the one used for amplification steps, so read-out potentially has an effect on the composition of the ecosystem. Again, varying the number of PCR cycles is a direct way to test this.

      The PCR amplification involved in the read-out might have a minor effect on the sequencing outcome but not on the composition of the ecosystem. In fact, the sample that undergoes sequencing is taken from the pool at each cycle, and not inserted back into it. Thus, it does not participate in the following selection steps. This is specified in the text at p3, line 104.

      • Could self-interactions (hairpins of homodimers) benefit individuals during amplification steps? The role of self-interactions during binding selection steps could also be tested directly over one cycle (again varying the relative weight of the binding vs amplification to disentangle both).

      Our PCR amplification conditions were chosen to minimize effects of this type. PCR amplification is carried out at 68 °C, a temperature at which, given the level of self- and mutual complementarity in the sequences analyzed in the text, hairpins or homodimers should be melted and thus have no effect. This is specified in the text at p. 14, lines 479-480. However, if an effect is present, it would disadvantage (rather than advantage) self-interacting individuals. For the amplification step we used Q5® Hot Start High-Fidelity DNA Polymerase, which does not possess strand displacement activity. Therefore, in principle, if the polymerase encounters a double-stranded portion during amplification, it stops and synthesizes only a truncated product, which is then lost during the purification step. In other words, sequences with secondary and/or tertiary structures are less likely to be amplified during the polymerization step. As a consequence, a DNAi characterized by this kind of structure will be negatively selected even if it binds optimally to the resource, and will be underrepresented in the pool.

      About the second point:

      • Regarding the effect of sampling (sequencing read-out), PCR amplification errors: explicitly check the consistency of observations with the expected outcome, in the methods section (right now these aspects are only briefly mentioned in the main text), which would highlight again the level of control and accuracy of the system.

      Hoping we have interpreted the request correctly, we performed a technical replicate, sequencing Oligo 1 cycle 9 again, and analyzed the sequences with at least 100 reads (corresponding to 27.42% of the total reads). Among the 800 DNA species with at least 100 reads, 93.6% are found in both replicates. All the non-overlapping sequences have very low abundance, close to 100 reads.

      Moreover, we compared the population size of each DNA species between the two replicates, after equalizing the database sizes. The results are now cited on p14, lines 509-510, and in Appendix 1, Experimental Controls, and are shown in Figure 2-figure supplement 3, where we plot the ratio of the number of reads in the two replicates for each sequence as a function of the number of reads in one replicate. We found an average ratio of 0.965 with a standard deviation of 0.119. As expected, the largest fluctuations occur in the rarest species.
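      Both replicate comparisons (overlap of abundant species and read-count ratios after equalizing library sizes) can be sketched in a few lines. The exact definitions here are assumptions based on the description above: "abundant" means at least min_reads reads in the first replicate, and the second replicate is rescaled to the same total read count before taking ratios. The function name is hypothetical.

```python
import statistics


def replicate_agreement(rep_a, rep_b, min_reads=100):
    """Compare two sequencing replicates given as mappings sequence -> read count.

    Returns (fraction of abundant rep_a species also present in rep_b,
             mean read ratio a/b, standard deviation of that ratio),
    with rep_b rescaled to the same total number of reads as rep_a.
    Requires at least two shared abundant species for the standard deviation.
    """
    scale = sum(rep_a.values()) / sum(rep_b.values())
    abundant = [s for s, n in rep_a.items() if n >= min_reads]
    shared = [s for s in abundant if s in rep_b]
    ratios = [rep_a[s] / (rep_b[s] * scale) for s in shared]
    return len(shared) / len(abundant), statistics.mean(ratios), statistics.stdev(ratios)
```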

      We think this evaluation indeed strengthens the solidity of our results.

      • I have a small concern about target resource accessibility: is there any spacer between the ssDNA and the bead? The methods section does not mention any, and I would expect such a proximity between the target DNA and the bead to yield steric repulsion that impedes interactions with random DNA individuals.

      Yes, there is a 12-carbon spacer between the bead and the resource, which was inserted exactly to make the resource more accessible. This information is now available in Table 1 of Supplementary Information detailing the sequences used in the experiment. However, as now described in the text (p8 lines 284-286), we observe that the interaction with the resource is always shifted to the 3', the terminal furthest from the bead, indicating some residual issue of accessibility to the resource sections closest to the bead.

      • Regardless of the existence of a spacer, binding of random DNA molecules to beads instead of the target DNA constitutes a potential source of noise (described for now as '1-x' in the IBEE model), which can be probed by swapping targets, selecting without target etc.

      This issue is addressed by the bare-bead test described above, in which we found little effect, corresponding to a small value of 1 − 𝑥.

      • Is there any recombination potentially occurring during amplification steps? This could be tested with a set of known molecules amplified over 24 amplification steps in a row (no binding step).

      Recombination can occur during the amplification steps. The section "By-Product Formation from PCR Amplification" in Appendix 2 discusses PCR by-products as aberrant forms of amplification, such as recombination events. We adopted several strategies to limit by-product formation: i) use of “blockers” characterized by a phosphate group at the 3’ end (thus inhibiting their usage during the amplification and allowing better control of the reaction conditions over the PCR cycles), ii) a high annealing temperature (to limit the possibility of spurious primer annealing to the random region), iii) fewer PCR cycles, iv) a high primer concentration, and v) a very short elongation step (all these strategies were implemented to avoid possible mispriming between different DNAi and the formation of concatemers). However, by-product formation is a problem inherent to the technique: it is a known issue in classical SELEX technology (Tolle et al. 2014), mainly due to the random region within the DNAi. Q5® Hot Start High-Fidelity DNA Polymerase (New England Biolabs, Ipswich, MA, USA) has an error rate of <0.44 x 10^-6 per base.
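      To put the quoted polymerase error rate in perspective, a rough upper-bound estimate of substitutions accumulated in the 50-nt random region along one lineage can be computed. This is a back-of-the-envelope sketch under stated assumptions: errors accumulate linearly with the number of template duplications, each 1000x amplification corresponds to about log2(1000) ≈ 10 duplications, and the number of cycles (24) is a hypothetical value chosen for illustration.

```python
import math

error_rate_per_base = 0.44e-6  # upper bound quoted for the polymerase
insert_length_nt = 50          # length of the random region
fold_per_round = 1000          # ~1000x PCR amplification per cycle
n_rounds = 24                  # hypothetical number of selection cycles

doublings_per_round = math.log2(fold_per_round)  # ~10 duplications per cycle
expected_errors = error_rate_per_base * insert_length_nt * doublings_per_round * n_rounds
print(f"~{expected_errors:.1e} expected substitutions per lineage in the random region")
```

Under these assumptions the expected load stays well below one substitution per lineage, so polymerase errors alone are unlikely to create new dominant variants; by-product formation, discussed above, is the more relevant artifact.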

      In classic SELEX technology, the average number of selection cycles is 10. This limitation is partly due to the increase in PCR by-products. As shown in Figure 2-figure supplement 1, the percentage of PCR by-products is less than 20% at cycle 12, and then increases dramatically in the following cycles. We are performing a series of experiments with known and limited sequences to verify and better understand the phenomenon for future applications of the SEDES platform. On this issue we decided not to modify the manuscript, since we think it is already well discussed in Appendix 1.

      And the third point:

      • Perform one cycle (or a few cycles) with random DNA individuals, the most frequent individuals at the end of the current experiment, newly designed individuals with higher binding affinity to the target than currently dominating individuals, newly designed individuals with higher propensity to form hairpins or to form homodimers. Such experimental testing of predictions from the data analysis/modeling, typical of a physics approach, would illustrate the level of understanding one can reach with a simple yet powerful experimental setup.

      We fully agree that the approach we propose and the set of results we obtained call for further investigations that could strengthen the analysis and modeling. The final aim we envisage is the understanding, within this simplified approach, of key evolutionary factors such as fitness. Indeed, being able to write an explicit fitness function would be a significant new contribution to the understanding of evolutionary processes, even within the limited settings of the ADSE approach, as discussed in the conclusions of the manuscript.

      However, carrying out such an analysis is a long and expensive undertaking, which we have started but will not complete in the immediate future. For this reason, given the already significant body of results we are presenting here, we prefer to keep this paper confined to the study of the evolution of a random DNAi population and to discuss the behavior of smaller designed sets of competing, collaborating or parasitic individuals in a future contribution.

      Looking ahead, additional stages of investigation will also include mutations, to investigate the kinetics of speciation, and, at an even later stage, the interplay between evolution kinetics and dynamic mutation of resources.

      I have a few smaller points:

      • It would be very useful to provide the expected dynamic range of binding free energies (in terms of DeltaG and omega): what is the maximum binding free energy for the perfect complement?

      The NUPACK-computed binding free energy of a 20-base-long oligomer complementary to the resource (𝜔 = 20) is -24.36 kcal/mol for Oligo 1 and -23.08 kcal/mol for Oligo 2. This is the best answer we can offer to the reviewer’s request, since the maximum binding free energy of DNAi individuals (much longer than the target strand) would include contributions from the unpaired bases. Indeed, the values given above are approached by the left tail of the distribution in Fig. 3a, which, however, includes DNAi self-energies.

      The perfect-complement binding free energy is now cited in the text as a reference for the dynamic range of ΔG (p4, lines 151-152).

      • How is the number of captured DNA molecules quantified? Is 10^12 measured, estimated, or hypothesized?

      The number of sequences was calculated from 260 nm absorbance measurements. We have now added this information to the “Selection Phase” section of the Methods.
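      For readers unfamiliar with the conversion, a molecule count follows from an absorbance reading via mass and molar mass. This is a hedged sketch, not the authors' documented calculation: it assumes the common approximations 1 A260 unit ≈ 33 µg/mL for single-stranded DNA and ≈ 330 g/mol per nucleotide (a sequence-specific extinction coefficient is more accurate for short oligos), and the function name and parameters are illustrative.

```python
AVOGADRO = 6.02214076e23          # molecules per mol
SSDNA_UG_PER_ML_PER_A260 = 33.0   # common approximate conversion factor for ssDNA
MEAN_NT_MASS_G_PER_MOL = 330.0    # rough average molar mass of one nucleotide


def ssdna_copies(a260, volume_ml, length_nt, dilution=1.0):
    """Estimate the number of ssDNA molecules from a 260 nm absorbance reading."""
    mass_g = a260 * dilution * SSDNA_UG_PER_ML_PER_A260 * volume_ml * 1e-6
    moles = mass_g / (length_nt * MEAN_NT_MASS_G_PER_MOL)
    return moles * AVOGADRO
```

For example, 1 mL of a 100-nt oligo at A260 = 1.0 corresponds to about 33 µg, i.e. roughly 1 nmol or 6 x 10^14 molecules.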

      Reviewer #2:

      Summary:

      In this manuscript, the authors introduced ADSE, a SELEX-based protocol to explore the mechanism of emergence of species. They used DNA hybridization (to the bait pool, "resources") as the driving force for selection and quantitatively investigated the factors that may contribute to survival across generations (progress of SELEX cycles), revealing that besides individual-resource binding, the inter- and intra-individual interactions were also important features along with mutualism and parasitism.

      Strengths:

      The design of using pure biochemical affinity assay to study eco-evolution is interesting, providing an important viewpoint to partly explain the molecular mechanism of evolution.

      Weaknesses:

      Though the evidence of the study is somewhat convincing, some aspects still need to be improved, mostly technical issues.

      Major:

      1) There are a few technical issues that the authors should clarify in the manuscript to make the analysis more transparent:

      1.1) To my understanding, it is difficult to guarantee the even distribution of different species (individuals) in the initial individual pool. Even though the authors have shown in Fig. 2a that the top 10 sequences take up ~ 0% in the pool, it remains unclear how abundant these top and bottom representative sequences are, given the huge size of the pool (10^15). Can the authors show the absolute number of these sequences in different quantiles? Please show both Oligo sets.

      First, we thank the reviewer for both positive and critical comments that have guided us in reformulating or clarifying some messages of our work.

      As for this specific point: 10^15 is a small number compared to 4^50 ≈ 1.3 × 10^30, the number of possible sequences of length 50. Thus, we do not expect more than one individual per sequence in the initial pool. However, sequencing requires a preparatory amplification step, which may lead to detecting a few sequences with more than one individual.

      Specifically, in the initial pool of Oligo 1, the most abundant individual (of sequence GAACTAAAGGGGCGGTGTCCACTTGCCTGTAGTGGTTATCAGTCCGGTTG) has 3 copies, 0.7% of the sequences have 2 copies, and the vast majority of strings (99.3% of a sample of about 1.5 × 10^6 sequenced DNAi) are present in one copy only. A similar situation holds for Oligo 2, with 4 DNAi present in 3 copies and 0.8% of the sequences (in a sample of 2 × 10^6 DNAi) in 2 copies.

      It is worth noting that none of the 10 most abundant species in the last cycle is present in the sample. Indeed, the fraction of the pool that is sequenced is removed from the population that undergoes evolution (as now specified in p2, line 104). We specified in the text (p2, lines 69-70, p3 lines 94-96) that in the initial pool no sequence is expected to be present in more than one individual.
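      The expectation that no sequence occurs more than once in the initial pool can be checked with a birthday-problem estimate. A minimal sketch, assuming a pool of 10^15 molecules and that each 50-nt random region is drawn uniformly and independently from the 4^50 possible sequences (real synthesis has base-composition biases, so this is an idealization):

```python
# Birthday-problem estimate of sequence collisions in a random pool.
# Assumption: every 50-nt random region is drawn uniformly and
# independently from the 4**50 possible sequences.
n_molecules = 10**15   # pool size quoted in the review
n_sequences = 4**50    # distinct 50-nt sequences, ~1.27e30

# Expected number of pairs of molecules sharing a sequence:
# C(n, 2) / N, i.e. about n**2 / (2 N) when n << N.
expected_pairs = n_molecules * (n_molecules - 1) / 2 / n_sequences

# Expected fraction of molecules whose sequence also occurs elsewhere.
fraction_shared = 2 * expected_pairs / n_molecules

print(f"4**50 = {n_sequences:.3e}")
print(f"expected colliding pairs in the whole pool = {expected_pairs:.2f}")
print(f"fraction of molecules involved = {fraction_shared:.1e}")
```

Under these assumptions fewer than one pair of identical molecules is expected in the whole pool, consistent with attributing the few duplicates seen after sequencing to the amplification step.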

      1.2) The authors claimed that they used two different oligo sets (Oligo1 and Oligo2) in this study. It is unclear which data were used in the presentation. How reproducible are they? Relatedly, how reproducible are the results if the same oligo set is used to repeat the experiment?

      The oligo used in the main text was declared in Methods, Replica section. It is now declared also in the main text (p3 lines 106-108 and in the captions of Figure 2, Figure 3 and Figure 4). Reproducibility is addressed in: Figure 2-figure supplement 5; Figure 2-figure supplement 6; Appendix 2: Results of the experimental replica.

      It should also be noted that two starting pools of random 50mers are essentially disjoint sets, for the same reason discussed in the previous answer: the expected number of sequences shared by two independent draws of 10^15 molecules from 4^50 ≈ 1.3 × 10^30 possible sequences is less than one, a negligible fraction of either pool. Thus, each time a new evolution experiment is started, different dominant sequences are expected to be found. However, the statistical properties of the DNAi pool during the evolution of Oligo1 and Oligo2 are similar, as discussed in Appendix 2 of the paper.
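      The same uniform-sampling idealization gives the expected overlap between two independently synthesized starting pools. In this sketch the pool size of 10^15 molecules is the figure quoted in the review, and the variable names are illustrative:

```python
# Expected overlap between two independent random pools.
# Assumption: both pools are uniform, independent draws of n_pool
# molecules from the 4**50 possible 50-nt random regions.
n_pool = 10**15
n_sequences = 4**50

# Each molecule of pool A matches some molecule of pool B with
# probability ~ n_pool / n_sequences, so the expected number of
# shared sequences is ~ n_pool**2 / n_sequences.
expected_shared = n_pool * n_pool / n_sequences

print(f"expected shared sequences: {expected_shared:.2f}")
print(f"shared fraction of one pool: {expected_shared / n_pool:.1e}")
```

The expected overlap is of order one sequence out of 10^15, so two starting pools are, for practical purposes, independent samples of sequence space.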

      1.3) PCR and Illumina sequencing themselves introduce selection bias. How would the analysis eliminate it? The authors only discussed the errors created during PCR cycles (page 3, lines 115-122). However, PCR itself would preferentially amplify some sequences over others (e.g. those with high GC content). Similarly, Illumina sequencing has difficulty with low-complexity sequences. How would this be circumvented?

      Yes, both PCR and Illumina sequencing have known biases (e.g. in the sequencing of homopolymers or the amplification of GC-rich sequences) that are intrinsic to these techniques. Regarding PCR, we implemented a thermal protocol optimized for our experimental setup, with very short denaturation, annealing and extension steps performed at high temperatures. Regarding Illumina sequencing, we cannot rule out a bias against specific sequences (e.g., homopolymers), which, however, should not be captured during the selection step owing to the design of the resource. Also, the libraries subjected to sequencing have low complexity: by design, the first and last 25 nucleotides are the same for all DNAi, which differ only in the central 50-nt sequence. Low-complexity libraries are known to be problematic on Illumina instruments, because nucleotide diversity, especially in the first sequencing cycles, is critical for cluster filtering, optimal run performance and high-quality data. To overcome this limitation, the libraries were run together with more complex and diverse library preparations: the ADSE sequences made up about 1-2% of the total reads per run, corresponding to only a few million reads.

      This discussion is now in Appendix 1, Intrinsic limitations of the molecular approach.

      1.4) Some DNA sequences would bind to the beads instead of the resource sequence coated on them. Should the authors run the experiment using beads alone as a control?

      We performed a negative control experiment in which the “selection step” was carried out with bare beads, i.e. beads with no DNA grafted on them. We then compared the results with the corresponding results of the original experiments on Oligo 1 and 2.

      After 6 cycles, the most abundant sequence in the negative dataset has a relative occurrence of 0.05%, whereas the dominant strand in Oligo 1 and Oligo 2 has an abundance of 8% and 16%, respectively, i.e. 160-320 times larger.

      This indicates that the drift due to non-specific binding (plus PCR amplification) is at least two orders of magnitude smaller than the selection induced by the affinity with the resource.

      This part is now discussed in Appendix 1, Experimental controls.
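      The enrichment factors follow directly from the percentages quoted for the bare-bead control and the two original experiments; a short check (variable names are illustrative):

```python
import math

# Abundance (%) of the most abundant sequence after 6 cycles, as quoted above.
bare_beads_pct = 0.05
dominant_pct = {"Oligo 1": 8.0, "Oligo 2": 16.0}

# Enrichment of resource-driven selection over the bare-bead control.
ratios = {name: pct / bare_beads_pct for name, pct in dominant_pct.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x control, "
          f"{math.log10(ratio):.1f} orders of magnitude")
```

Both ratios exceed 100, i.e. the specific selection is more than two orders of magnitude stronger than the non-specific drift.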

      2) It would be interesting to study the impact of environmental factors, for example, changing pH, salt concentration, and detergent. Would these factors accelerate/decelerate the evolution?

      We agree that the approach we propose and the set of results we obtained call for further investigation. However, performing these additional experiments, which would require a minimum of 6 generations each, is a long and expensive job, which we have started but will not complete in the near future. For this reason, given the already significant body of results presented here, we prefer to keep this paper confined to the study of the evolution of a random DNAi population under the selected conditions and leave the exploration of new conditions, potentially opening new evolutionary scenarios, to a future contribution. In fact, our aim was to show that with our platform we can indeed observe fundamental elements of evolution in a non-biological system, which, for the chosen set of parameters, we do.

      3) The concentration of individual oligos is apparently one of the most important factors in determining the interactions. In later cycles, some oligos become dominant, i.e. they reach much higher concentrations than in earlier cycles. This would definitely affect their interaction with resources, their self-interaction, and their interaction with other oligos in the pool. However, the authors failed to discuss this factor, which may explain the exponential enrichment in later cycles.

      We agree with the reviewer that this is an important point, but we disagree that we have not discussed it. We introduce the topic at the end of the “Null Model and Eco-evolutionary Algorithm” section, where we comment on the change of the gamma parameter by saying that there must be a shift in the evolution process, first dominated by the interactions with the resources and, in later stages, by other factors (lines 227-230) that we then discuss in “Self and mutual DNAi interactions are evolutionary drivers”. In this latter section and in the following one, we discuss the effects of mutual and self-interactions between DNAi.

      Indeed, a key point in our paper is the change in the gamma parameter necessary to match the IBEE model to experiments, as is now stated more explicitly (p5 lines 217-218, where we also mention Figure 2-figure supplement 8, which clearly shows the necessity of a variable gamma). The two regimes highlighted by the gamma value must reflect a change in the competition for the resources and in the interactions among species. In the first generations, when the diversity of species is large (there are few strings for each species) and binding to the resources is generally very weak (small <omega>), the affinity with the resource is the main driving force (fast growth of <omega>), while mutual interactions remain too random to favor any species in particular. In later cycles, instead, when <omega> becomes large enough to provide significant stability to the resource binding of the majority of species, the dominant species compete more intensively on the basis of their structure and their capacity for self-defense, parasitism and mutualism, a condition in which evolution modifies the sequences more than <omega>.

      Certainly, our understanding of this shift is based on statistical behavior and is inferential, relying on the study of the specific DNAi described in the last part of the manuscript. A better molecular model would require more experiments with selected DNAi competing, cooperating or acting as parasites, with the final aim of defining a predictive fitness function. Alas, this requires months of further investigation.

      4) The authors observed different behaviors of medium 𝜔 in early and late cycles, referring to Fig 2h. Using the IBEE model, they found that this is due to a change in gamma. However, the authors did not further discuss the molecular mechanism. It could be very interesting to understand the evolutionary change of these individuals.

      This comment might be related to the previous one. It is true that our discussion and understanding of the whole process is statistical, and lacks a molecular model to predict the value of gamma.

      However, the specific behavior that the reviewer asks about (those in Fig. 2h) is not related to the change in gamma. Even if gamma remains as in the first part of the evolution (gamma = 3), the species with overlap between 6 and 10 would first grow in number and later decrease. Indeed, during the first cycles they have an advantage with respect to the majority of species with lower maximum overlap, a condition that favors their amplification. However, in the second stage of the evolution dominant species with a larger affinity emerge and outcompete the individuals of this class. We added a sentence in the text to clarify this point (p7 lines 227-229).

      5) In Figure 2f, some high-𝜔 individuals are largely missing. Could the authors give some interpretation? This is not observed in cycle 12, though (panel e).

      This effect is simply due to under-sampling. In a sample of 10^n oligomers, any sequence with a given 𝜔 and P(𝜔) well below 10^-n has a vanishing probability of appearing. At cycle 12 the overall number of sequenced strands is larger than at cycle 24, due to the growing presence of PCR by-products. Thus, the right tail of the cyan distribution at the last cycle is sampled less accurately than at cycle 12. We have added a sentence to the revised manuscript (p5 lines 177-178) to clarify this point.
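      The under-sampling argument can be made quantitative. If a class of sequences (for example, all strands with a given 𝜔) has frequency p in the population and M strands are sequenced independently, the chance of observing it at least once is 1 - (1 - p)^M ≈ 1 - exp(-pM). A minimal sketch, assuming independent sampling of reads (amplification and by-products make real sampling less ideal); the function name and the sample size of 10^6 are illustrative:

```python
import math


def detection_probability(p: float, sample_size: int) -> float:
    """Probability that a class with population frequency p appears at
    least once among sample_size independently sampled strands,
    1 - (1 - p)**sample_size, computed stably with log1p/expm1."""
    return -math.expm1(sample_size * math.log1p(-p))


# Illustrative sample of 10**6 sequenced strands (n = 6).
M = 10**6
for p in (1e-8, 1e-7, 1e-6, 1e-5):
    print(f"P(omega) = {p:.0e}: detected with probability "
          f"{detection_probability(p, M):.3f}")
```

Classes with p well below 1/M are usually missed, while those well above 1/M are almost always observed, so a smaller effective sample truncates the right tail of the distribution.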

      6) It would be interesting to further explore whether another type of selection resource could be used, for example proteins that bind particular sequences, i.e. transcription factors. Previous studies have used a large number of sequence-specific transcription factors to run SELEX. Since these data already exist, why not explore them?

      This is an interesting suggestion: can we use data from “ordinary” SELEX favoring specific sequences to explore sequence evolution? Two limitations make us somewhat skeptical about this path: first, the consensus sequences of DNA-binding proteins are rather short and typically target dsDNA rather than ssDNA; second, the free energy of interaction is known only for the consensus sequence, not for sequences carrying all possible mutations with respect to it, which makes it very hard to develop any molecular understanding of the process.

      Minor:

      1) There is no figure legend or in-text citation of Figure 2b.

      2) Please correct "⁃C" to "°C" in lines 470, 471, 472, 477, et al.

      3) Typos and grammar issues should be corrected. Examples are shown below (but not limited to these only):

      • mixed use of past and present tense.

      • Line 152, "basis" should be "bases".

      • Line 277, "a impediment" should be "an impediment"

      • Line 278, "a major deadly threats" should be "major deadly threats"

      We are sorry for the mistakes, and we have corrected them. Many thanks to the reviewer!

    1. Author Response

      Reviewer #1 (Public Review):

      The goal of the current study was to evaluate the effect of neuronal activity on blood-brain barrier permeability in the healthy brain, and to determine whether changes in BBB dynamics play a role in cortical plasticity. The authors used a variety of well-validated approaches to first demonstrate that limb stimulation increases BBB permeability. Using in vivo electrophysiology and pharmacological approaches, the authors demonstrate that albumin is sufficient to induce cortical potentiation and that BBB transporters are necessary for stimulus-induced potentiation. The authors include a transcriptional analysis, and differential expression of genes associated with plasticity, TGF-beta signaling, and extracellular matrix was observed following stimulation. Overall, the results obtained in rodents are compelling and support the authors' conclusions that neuronal activity modulates the BBB in the healthy brain and that mechanisms downstream of BBB permeability changes play a role in stimulus-evoked plasticity. These findings were further supported with fMRI and BBB permeability measurements performed in healthy human subjects performing a simple sensorimotor task. While there are many strengths in this study, there is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions. The authors only used males in this study and do not discuss whether they would also expect sex differences in stimulation-evoked BBB changes in the healthy brain. Another minor limitation is that the authors did not address the potential impact of anesthesia, which can affect neurovascular coupling in rodent studies. The authors could also have better integrated the RNAseq findings into mechanistic experiments, including testing whether the upregulation of OAT3 plays a role in the cortical plasticity observed following stimulation. Overall, this study provides novel insights into how neurovascular coupling, BBB permeability, and plasticity interact in the healthy brain.

      While there are many strengths in this study, there is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions. The authors only used males in this study and do not discuss whether they would also expect sex differences in stimulation-evoked BBB changes in the healthy brain.

      We agree with the reviewer regarding the importance of examining sex differences in stimulation-evoked BBB changes. To address this issue we have: (1) clarified in the methods section that the human study involved both males and females; (2) added a section to the discussion highlighting the male bias as a key limitation of our animal experiments; and (3) stated that future work should examine whether stimulation-evoked BBB changes differ between males and females.

      Another minor limitation is the authors did not address the potential impact of anesthesia which can impact neurovascular coupling in rodent studies.

      We are grateful for this comment and agree with the reviewer that the potential effects of anesthesia should be discussed. We have added the following discussion paragraph:

      “A key limitation of our animal experiments is the fact that they were performed under anesthesia, due to the complex nature of the experimental setup (i.e., simultaneous cortical imaging and electrophysiological recordings). Anesthetic agents can affect various receptors within the NVU, potentially altering neuronal activity, SEPs, CBF, and vascular responses (Aksenov et al., 2015; Lindauer et al., 1993; Masamoto & Kanno, 2012). To minimize these effects, we used ketamine-xylazine anesthesia, which, unlike other anesthetics, was shown to generate robust BOLD and SEP responses to neuronal activation (Franceschini et al., 2010; Shim et al., 2018).”

      Reviewer #2 (Public Review):

      Summary:

      This study builds upon previous work that demonstrated that brain injury results in leakage of albumin across the blood-brain barrier, resulting in activation of TGF-beta in astrocytes. Consequently, this leads to decreased glutamate uptake, reduced buffering of extracellular potassium, and hyperexcitability. This study asks whether such a process can play a physiological role in cortical plasticity. They first show that stimulation of a forelimb for 30 minutes in a rat results in leakage of the blood-brain barrier and extravasation of albumin in the contralateral but not the ipsilateral cortex. The authors propose that the leakage is dependent upon neuronal excitability and is associated with an enhancement of excitatory transmission. Inhibiting the transport of albumin or the activation of TGF-beta prevents the enhancement of excitatory transmission. In addition, expression of genes associated with TGF-beta activation, synaptic plasticity, and extracellular matrix is enhanced in the "stimulated" hemisphere. That this may translate to humans is demonstrated by a breakdown in the blood-brain barrier following activation of brain areas through a motor task.

      Strengths:

      This study is novel and the results are potentially important as they demonstrate an unexpected breakdown of the blood-brain barrier with physiological activity and this may serve a physiological purpose, affecting synaptic plasticity.

      The strengths of the study are:

      1) The use of an in vivo model with multiple methods to investigate the blood-brain barrier response to a forelimb stimulation.

      2) The determination of a potential functional role for the observed leakage of the blood-brain barrier from both a genetic and electrophysiological viewpoint.

      3) The demonstration that, by inhibiting different points in the putative pathway from activation of the cortex to transport of albumin and activation of the TGF-beta pathway, the effect on synaptic enhancement could be prevented.

      4) Preliminary experiments demonstrating a similar observation of activity-dependent breakdown of the blood-brain barrier in humans.

      Weaknesses:

      There are both conceptual and experimental weaknesses.

      1) The stimulation is performed in an animal anesthetized with ketamine, which can affect receptors critical for synaptic plasticity (i.e., NMDA receptors).

      We agree that the potential effects of anesthesia should be considered. The Discussion was revised to address this point: “A key limitation of our animal experiments is the fact that they were performed under anesthesia, due to the complex nature of the experimental setup (i.e., simultaneous cortical imaging and electrophysiological recordings). Anesthetic agents can affect various receptors within the NVU, potentially altering neuronal activity, SEPs, CBF, and vascular responses (Aksenov et al., 2015; Lindauer et al., 1993; Masamoto & Kanno, 2012). To minimize these effects, we used ketamine-xylazine anesthesia, which, unlike other anesthetics, was shown to generate robust BOLD and SEP responses to neuronal activation (Franceschini et al., 2010; Shim et al., 2018).”

      2) The stimulation protocol is prolonged, and it would be helpful to know whether briefer stimulations have the same effect or longer stimulations have a greater effect, i.e., does the leakage give a "readout" of the stimulation intensity/length?

      Thank you for this important comment. We are also very curious about the potential relationship between stimulation magnitude/duration and subsequent leakage and have added the following statement to the discussion:

      “Future studies should also explore the effects of stimulation magnitude/duration on BBB modulation, as well as the stimulation threshold between physiological and pathological increase in BBB permeability.”

      Our current findings indicate that a one-minute stimulation does not affect vascular permeability or SEP and we aim to test additional stimulation paradigms in future studies.

      3) For some of the experiments (see below), the numbers of animals are low and the statistical tests used may not be the most appropriate, making the results less clear cut.

      We appreciate this comment and have revised the statistical analysis of Figure 1J,K. We now use a nested t-test to test for differences between rats (as opposed to sections). The differences remain significant (EB, p=0.0296; Alexa, p=0.0229). The text was modified accordingly.

      4) The experimental paradigms are not entirely clear, especially the length of time of drug application and the authors seem to try to detect enhancement of a blocked SEP.

      Thank you for pointing this out. Figures 2&3 were revised for clarification and a ‘Drug Application’ subsection was added to the methods section.

      5) It is not clear how long the enhancement lasts. There is a remark that it lasts longer than 5 hours but there is no presentation of data to support this.

      Thank you for this comment. As the length of experiments differed between animals, the exact length could not be specifically stated. To clarify this point, we revised the text to indicate that LTP was recorded until the end of each experiment (between 1.5-5 hours, depending on the condition the animal was in). We also added a panel to figure 2 (Figure 2d) with exemplary data showing potentiation 60, 90, and 120 min post stimulation.

      6) The spatial and temporal specificity of this effect is unclear (other than hemispheric in rats) and even less clear in humans.

      Our animal experiments (using both in vivo imaging and histological analysis) showed no evidence of BBB modulation outside the cortical somatosensory area corresponding to the limbs. We examined the entire coronal section of the brain and found enhancement solely in the somatosensory area corresponding to the limb. The right side of panels h and i in Figure 1 shows a ×20 magnification of the section, focusing on the enhanced area. The whole section was not shown, as no fluorescence was found outside the magnified area. Moreover, our quantification showed that the enhancement was specific to the contralateral and not the ipsilateral somatosensory cortex (Figure 1 j-k).

      We agree that temporal specificity needs to be further explored, and we have now stated that in the discussion: “Future studies are needed to explore the BBB modulating effects of additional stimulation protocols – with varying durations, frequencies, and magnitudes. Such studies may also elucidate the temporal and ultrastructural characteristics that may differentiate between physiological and pathological BBB modulation.”

      We also agree that larger studies are needed to better understand the specificity of the observed effect in humans, and to account for potential inter-human variability in vascular integrity and brain function due to different schedules, diets, exercise habits, etc.

      8) The experimenters rightly use separate controls for most of the experiments but this is not always the case, also raising the possibility that the application of drugs was not done randomly or interleaved, but possibly performed in blocks of animals, which can also affect results.

      Thank you for pointing out this lack of clarity. We have now highlighted that drug application was done randomly.

      9) Methyl-beta-cyclodextrin clears cholesterol so the effect on albumin transport is not specific, it could be mediating its effect through some other pathway.

      We agree that the effect of mβCD may not be specific. To mitigate this issue, we used a very low mβCD concentration (10 µM). Notably, this is markedly lower than the concentrations reported by Koudinov and Koudinova, who showed that cholesterol depletion is observed at 5 mM mβCD and not at 2.5 mM/5 mM (Koudinov & Koudinova, 2001). This point was added to the discussion.

      10) If the breakdown of the blood-brain barrier can be inhibited by a TGF-beta inhibitor, this implies that TGF-beta is necessary for the breakdown of the blood-brain barrier. This does not sit well with the hypothesis that TGF-beta activation depends upon blood-brain barrier leakage.

      Thank you for pointing out this lack of clarity. We have added a discussion paragraph that clarifies our hypothesis: “As mentioned above, albumin is a known activator of TGF-β signaling, and TGF-β has a well-established role in neuroplasticity. Interestingly, emerging evidence suggests that TGF-β also increases cross-BBB transcytosis (Betterton et al., 2022; Kaplan et al., 2020; McMillin et al., 2015; Schumacher et al., 2023). Hence, we propose the following two-part hypothesis for the TGF-β/BBB-mediated synaptic potentiation observed in our experiments: (1) prolonged stimulation triggers TGF-β signaling and increased caveolae-mediated transcytosis of albumin; and (2) extravasated albumin induces further TGF-β signaling, leading to synaptogenesis and additional cross-BBB transport – in a self-reinforcing positive feedback loop. Future research is needed to examine the validity of this hypothesis.”

      Reviewer #3 (Public Review):

      Summary:

      This study used prolonged stimulation of a limb to examine possible plasticity in somatosensory evoked potentials induced by the stimulation. They also studied the extent that the blood-brain barrier (BBB) was opened by prolonged stimulation and whether that played a role in the plasticity. They found that there was potentiation of the amplitude and area under the curve of the evoked potential after prolonged stimulation and this was long-lasting (>5 hrs). They also implicated extravasation of serum albumin, caveolae-mediated transcytosis, and TGFb signalling, as well as neuronal activity and upregulation of PSD95. Transcriptomics was done and implicated plasticity-related genes in the changes after prolonged stimulation, but not proteins associated with the BBB or inflammation. Next, they address the application to humans using a squeeze ball task. They imaged the brain and suggested that the hand activity led to an increased permeability of the vessels, suggesting modulation of the BBB.

      Strengths:

      The strengths of the paper are the novelty of the idea that stimulation of the limb can induce cortical plasticity in a normal condition, and it involves the opening of the BBB with albumin entry. In addition, there are many datasets and both rat and human data.

      Weaknesses:

      The conclusions are not compelling, however, because of a lack of explanation of methods and quantification. It also is not clear whether the prolonged stimulation in the rat represented normal conditions. To their credit, the authors recorded the neuronal activity during stimulation, but it seemed to show excessive excitation. Since seizures open the BBB, this result calls into question one of the conclusions, namely that the results reflect a normal brain. The authors could either conduct studies with stimulation that is more physiological or discuss the caveats of using a supraphysiological stimulus to infer healthy brain function.

      The conclusions are not compelling however because of a lack of explanation of methods and quantification.

      Thank you for this comment. In the revised paper, we expanded the Methods section to better describe the procedures and approaches we used for data analysis.

      It also is not clear whether the prolonged stimulation in the rat represented normal conditions.

      We believe that the used stimulation protocol is within the physiological range (and relevant to plasticity, learning and memory) for the following reasons:

      1) In our continuous electrophysiological recordings, we did not observe any form of epileptiform or otherwise pathological activity.

      2) Memory/training/skill acquisition experiments in humans often involve similar or longer training durations (Bengtsson et al., 2005), e.g., the 30 min thumb training session used by Classen et al. (1998).

      3) The levels of SEP potentiation we observed are similar to those reported in:

      a) Rats following a 10-minute whisker stimulation (one hour post stimulation; Mégevand et al., 2009).

      b) Humans following a 15 min task (McGregor et al., 2016).

      This important point is now presented in the discussion.

      Reviewer #1 (Recommendations For The Authors):

      The discussion would benefit from additional discussion of the potential impacts of sex and anesthesia in their findings.

      We agree with the reviewer and have added the following paragraph to the discussion:

      “A key limitation of our animal experiments is the fact that they were performed under anesthesia, due to the complex nature of the experimental setup (i.e., simultaneous cortical imaging and electrophysiological recordings). Anesthetic agents can potentially alter neuronal activity, SEPs, CBF, and vascular responses (Aksenov et al., 2015; Lindauer et al., 1993; Masamoto & Kanno, 2012). To minimize these effects, we used ketamine-xylazine anesthesia, which, unlike other anesthetics, was shown to maintain robust BOLD and SEP responses to neuronal activation (Franceschini et al., 2010; Shim et al., 2018). Another limitation of our animal study is the potentially non-specific effect of mβCD, an agent that disrupts caveolar transport but may also lead to cholesterol depletion (Keller & Simons, 1998). To mitigate this issue, we used a very low mβCD concentration (10 µM), orders of magnitude below the concentration reported to deplete cholesterol (Koudinov & Koudinova, 2001). Lastly, our animal study is limited by the inclusion of solely male rats. While our findings in humans did not point to sex-related differences in stimulation-evoked BBB modulation, larger animal and human studies are needed to examine this question.”

      The figure text is quite small.

      Thank you for pointing this out, we revised all figures and increased font size for clarity.

      Including pharmacological concentrations within the figure legends would improve the readability of the manuscript.

      Thank you for this suggestion, the figure legends were modified accordingly.

      In the Methods for immunoassays, the 5 groups could be made clearer by stating that there are 3 timepoints for stimulation experiments. There is a typo in this section where 24 hours post is stated twice in the same sentence.

      Thank you for pointing this out, the text was modified accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1) In Figure 1, J and K seem to indicate that in these experiments the statistics were done per slice and not per animal. This is not a reasonable approach; a repeated-measures ANOVA or averaging for each animal are more appropriate statistical approaches.

      We thank the reviewer for pointing this out. The statistical analysis for Figure 1j,k was modified. We now use a nested t-test to test for differences between rats and not sections. The differences are still significant (EB, p=0.0296; Alexa, p=0.0229). The manuscript was modified accordingly.
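      The difference between section-level and animal-level analysis can be illustrated with a minimal sketch of the simpler per-animal averaging the reviewer mentioned (not the nested t-test the authors report): section values are first averaged within each animal, so the degrees of freedom count animals rather than sections. It uses only the Python standard library and returns the Student t statistic; a p-value would require the t distribution (for example from a statistics package). Animal IDs, values, and function names are hypothetical, and the two groups are treated as independent; if both measurements come from the same animals (e.g., contralateral and ipsilateral hemispheres), a paired test on per-animal differences would be the analogous choice.

```python
from statistics import mean, variance


def per_animal_means(sections_by_animal):
    """Collapse section-level measurements to one value per animal."""
    return [mean(values) for values in sections_by_animal.values()]


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


# Hypothetical section-level values (arbitrary units), keyed by animal ID.
group_a = {"rat1": [1.8, 2.2], "rat2": [2.9, 3.1], "rat3": [4.0, 3.9, 4.1]}
group_b = {"rat4": [5.1, 4.9], "rat5": [6.0, 6.2, 5.8], "rat6": [7.0, 7.0]}

a, b = per_animal_means(group_a), per_animal_means(group_b)
t = pooled_t(a, b)
df = len(a) + len(b) - 2  # degrees of freedom count animals, not sections
print(f"t = {t:.3f}, df = {df}")
```

With seven sections but only six animals, the test has 4 degrees of freedom rather than the 12 that a section-level analysis would (incorrectly) use.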

      2) In Figure 2, the protocol does not seem to give much idea about time course. There was a stimulation test for 1 minute before and then 1 minute after the 30-minute stimulation train. How was potentiation assessed for the next 5 hours and where are the data?

      Potentiation was assessed by repeating a 1-min test stimulation every 30 min for the duration of the experiment; we added a panel showing late potentiation (see response above).

      3) In Figure 2, there is a notable lack of controls, e.g., the effect of sham stimulation and application of saline. These are important, as drift of the response magnitude can be a problem in long experiments.

      We did test for the potential presence of response drift, by examining whether SEPs of non-stimulated animals change over time (at baseline, 30 or 60 minutes of recording; n=6). No statistical differences were found. Our analysis focused on using each animal as its own control (i.e., comparing baseline SEP to SEP post albumin perfusion), because SEP studies highlight the importance of comparing each animal to its own baseline, due to the large inter-animal variability (All et al., 2010; Mégevand et al., 2009; Zandieh et al., 2003).

      4) Figure 3 a is not clear – were the drugs applied throughout?

      Thank you for pointing this out. We have revised Figure 3 a to show that the drugs were applied for 50 min before the stimulation.

      5) In Figure 3, panel d is repeated in panel j. This needs correcting.

      Thank you. This mistake was fixed.

      6) In LTP-type experiments, the antagonist is usually applied during the stimulation and then washed out. This avoids the problem in this figure, in which CNQX effectively blocks transmission, so it is not possible to detect any enhancement if it were there. E.g., in panel e, CNQX blocks transmission, and the assessment is then performed while the AMPA receptors are blocked after 30 minutes of stimulation. If the receptors are blocked, no enhancement will be detectable. Moreover, surely the question is the ratio of the effect of 30-minute stimulation on the SEP in the presence of CNQX, so the statistics should be done on the fold change in the SEP following 30-minute stimulation in the presence of CNQX.

      Thank you. The protocol might have been misrepresented in the original figure. We modified Fig 3a to clarify that the antagonists were indeed washed out at the start of stimulation, so that the receptors are not blocked during the test stimulation following the 30 min stimulation. In addition, we tested for the difference in fold change between 30 min stimulation alone and 30 min stimulation following antagonist wash-in (Fig 3f and Fig S2a).

      7) Interestingly, in panel f, stimulation, albumin, and AP5 all seem to produce the same enhancement of the SEP. Is the lack of effect of 30-minute stimulation in the presence of AP5 a ceiling effect, i.e., AP5 has enhanced the SEP and no further enhancement from stimulation is possible?

      This is a very interesting point that will require further research.

      8) SJN seems to block neurotransmission. What is the mechanism? The same analysis as for CNQX should be performed, i.e., the fold change should be calculated relative to the SEP in the presence of SJN, not relative to baseline.

      Our quantification showed that SJN did not significantly reduce the SEP max amplitude, and we therefore did not include this graph in the figure.

      9) Please acknowledge that the effect of mβCD is non-specific. There is a large literature on the effects of cholesterol depletion on LTP.

      We agree that the effect of mβCD may not be specific. To mitigate this issue, we used a very low mβCD concentration (10 µM). Notably, this is markedly lower than the concentrations reported by Koudinov et al., who showed that cholesterol depletion is only observed at a concentration of 5 mM (Koudinov & Koudinova, 2001). This point is now addressed in the Discussion paragraph describing the study’s limitations.

      10) Panels k and l seem to have used the same control, in which case they should not be analysed separately (they are all part of the same experiment).

      We agree with the reviewer and have revised the figure accordingly.

      11) The difference in gene expression in Figure 4 would be more convincing if it could be prevented by for example a TGFbeta inhibitor.

      We agree and acknowledge the impact such experiments could provide. We plan to incorporate these experiments into our future studies.

      12) Figure 5 seems to indicate bilateral and widespread BBB modulation, suggesting that this may be a non-specific effect. Panel g should look at other neocortical regions, e.g., the occipital cortex.

      We agree and thank the reviewer for this comment. We revised the figure to include other cortical areas, such as the frontal and occipital cortices (Figure 5g).

      Minor comments

      1) Paired data eg in Fig 2D are better represented by pairing the dots usually with a line.

      2) Please correct the %fold baseline in axes in graphs which show % change for baseline.

      3) Figure 4 is not correctly referred to in the text.

      We agree with all the points raised by the reviewer and revised the figures and text accordingly.

      Reviewer #3 (Recommendations For The Authors):

      The conclusions are not compelling, however, because of a lack of explanation of methods and quantification. It is also not clear whether the prolonged stimulation in the rat reflected normal conditions. To their credit, the authors recorded neuronal activity during stimulation, but it appeared to reflect excessive excitation. Since seizures open the BBB, this result calls into question one of the conclusions: that the results reflect a normal brain. The authors could either conduct studies with more physiological stimulation or discuss the caveats of using a supraphysiological stimulus to infer healthy brain function.

      Major concerns:

      Methods need more explanation. Rationales need more justification. Examples are provided below.

      Throughout many sections of the paper, sample sizes and stats are often missing. For stats, please provide p-values and other information (t crit, U statistic, F, etc.).

      Thank you, we added the relevant information where it was missing throughout the manuscript.

      For transcriptomics, they might have found changes in BBB-related genes if they assayed vessels but they assayed the cortex.

      We agree with the reviewer that this would be a very interesting future direction. The present study could not include this kind of analysis due to lack of access to vasculature isolation methods or single-cell RNA seq.

      What were the inclusion/exclusion criteria for the subjects?

      Thank you for pointing out this lack of clarity. The methods section (under ‘Magnetic Resonance Imaging’ – ‘Participants’) was expanded to include the following:

      “Male and female healthy individuals, aged 18-35, with no known neurological or psychiatric disorders were recruited to undergo MRI scanning while performing a motor task (n=6; 3 males and 3 females). MRI scans of 10 sex- and age-matched individuals (with no known neurological or psychiatric disorders) who did not perform the task were used as control data (n=10; 5 males and 5 females).”

      Were they age and sex-matched?

      They were indeed age- and sex-matched. This is now clarified in the relevant Methods section.

      Were there other factors that could have influenced the results?

      Certainly. Human subjects are difficult to control for due to different schedules, diets, exercise habits, and other factors that may impact vascular integrity and brain function. Larger multimodal studies are needed to better understand the observed phenomenon.

      Fig. 1. Images are very dim. Text here and in other figures is often too small to see. Some parts of the figures are not explained.

      Our apologies. Figures and legends were revised accordingly.

      Fig 2a, f. I don't see much difference here- do the authors think there was?

      We agree that the difference may not be visually obvious. The quantification of trace parameters (amplitude and area under curve) does, however, reveal a significant SEP difference in response to both stimulation (panels X and y) and albumin (panels z and q).

      Fig 3 d and j seem the same.

      We thank the reviewer for noticing. This was a copy mistake that has now been rectified.

      Lesser concerns and examples of text that need explanation:

      Introduction

      Insulin-like growth factor is transported. From where to where?

      The text was edited to clarify that this was cross-BBB influx of insulin-like growth factor-I.

      Was the RMT that underlies the transport of plasma proteins induced by physiological or non-physiological stimulation?

      This was shown without stimulation, in normal physiology of young and aged healthy mice. The text was edited to clarify this point.

      What was the circadian modulation that was shown to implicate BBB in brain function?

      The text was edited for clarity.

      Results

      When the word stimulation is used please be specific if whiskers are moved by an experimenter, an electrode is used to apply current, etc.

      We have now moved the ‘Stimulation protocol’ section closer to the beginning of the Methods and emphasized that we administered electrical stimulation to the forepaw or hindlimb using subdermal needle electrodes.

      Please explain how the authors are convinced they localized the vascular response.

      The vascular response was localized via: (1) visual detection of arterioles that dilated in response to stimulation (due to functional hyperemia / neurovascular coupling) [Figure 1d]; and (2) quantitative mapping of increased hemoglobin concentration (Bouchard et al., 2009) [Figure 1b]. This is now mentioned in the Methods (under ‘In vivo imaging’) and Results (under the section ‘Stimulation increases BBB permeability’).

      "30 min of limb stimulation" means what exactly? 6 Hz 2mA for 30 min?

      Thank you. The text was revised for clarity (Methods under ‘Stimulation protocol’):

      “The left forelimb or hindlimb of the rat was stimulated using an Isolated Stimulator device (AD Instruments) connected to two subdermal needle electrodes (0.1 ms square pulses, 2-3 mA) at 6 Hz frequency. Test stimulation consisted of 360 pulses (60 s) and was delivered before (as baseline) and after long-duration stimulation (30 min, referred to throughout the text as ‘stimulation’). In control and albumin rats, only short-duration stimulations were performed. Under sham stimulation, electrodes were placed without delivering current.”
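      The pulse counts implied by these parameters can be checked with a few lines of Python. This is an illustrative calculation derived only from the stated parameters (6 Hz, 0.1 ms pulses, 60 s test, 30 min stimulation); the constant and function names are hypothetical. At 6 Hz the 60-s test stimulation gives the stated 360 pulses, the 30-min stimulation gives 10,800 pulses, and current flows for only about 0.06% of the stimulation time.

```python
PULSE_RATE_HZ = 6          # stimulation frequency from the protocol
PULSE_WIDTH_MS = 0.1       # square-pulse width from the protocol
TEST_DURATION_S = 60       # 1-min test stimulation
LONG_DURATION_S = 30 * 60  # 30-min long-duration stimulation


def pulse_count(rate_hz, duration_s):
    """Number of pulses delivered at a fixed rate over a given duration."""
    return round(rate_hz * duration_s)


def inter_pulse_interval_ms(rate_hz):
    """Time between consecutive pulse onsets, in milliseconds."""
    return 1000 / rate_hz


def duty_cycle(rate_hz, pulse_width_ms):
    """Fraction of time during which current is actually delivered."""
    return rate_hz * pulse_width_ms / 1000


print(pulse_count(PULSE_RATE_HZ, TEST_DURATION_S))          # 360
print(pulse_count(PULSE_RATE_HZ, LONG_DURATION_S))          # 10800
print(round(inter_pulse_interval_ms(PULSE_RATE_HZ), 1))     # 166.7
print(f"{duty_cycle(PULSE_RATE_HZ, PULSE_WIDTH_MS):.2%}")   # 0.06%
```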

      Histology that was performed to confirm extravasation needs clarification because if tissue was removed from the brain, and fixed in order to do histology, what is outside the vessels would seem likely to wash away.

      Thank you for pointing out the need to clarify this point. The Histology description in the Methods section was revised in the following manner:

      “Albumin extravasation was confirmed histologically in separate cohorts of rats that were anesthetized and stimulated without craniotomy surgery. Assessment of albumin extravasation was performed using a well-established approach that involves peripheral injection of either labeled albumin (bovine serum albumin conjugated to Alexa Fluor 488, Alexa488-Alb) or an albumin-labeling dye (Evans blue, EB, a dye that binds to endogenous albumin and forms a fluorescent complex), followed by histological analysis of brain tissue (Ahishali & Kaya, 2020; Ivens et al., 2007; Lapilover et al., 2012; Obermeier et al., 2013; Veksler et al., 2020). Since extravasated albumin is taken up by astrocytes (Ivens et al., 2007; Obermeier et al., 2013), it can be visualized in the brain neuropil after brain removal and fixation (Ahishali & Kaya, 2020; Ivens et al., 2007; Lapilover et al., 2012; Veksler et al., 2020). Five rats were injected with Alexa488-Alb (1.7 mg/ml) and five with EB (2%, 20 mg/ml, n=5). The injections were administered via the tail vein. Following injection, rats were transcardially perfused with…”

      It is not clear why there was extravasation contralaterally but not ipsilaterally if there are cortico-cortical connections.

      We also did not observe ipsilateral SEPs in response to limb stimulation; evidence of SEPs and BBB permeability was found only in the contralateral sensorimotor region. This finding is consistent with electrophysiological and fMRI studies showing that peripheral stimulation results in predominantly contralateral potentials (Allison et al., 2000; Goff et al., 1962).

      After injection of Evans blue or Alexa-Alb, how was it shown that there was extravasation?

      Extravasation in cortical sections was visualized using a fluorescence microscope (Figure 1h-i). Since extravasated albumin is taken up by astrocytes, fluorescence imaging can be used for visualizing and quantifying labeled albumin (Ahishali & Kaya, 2020; Ivens et al., 2007; Knowland et al., 2014). Here is the relevant Methods excerpt:

      “Coronal sections (40-μm thick) were obtained using a freezing microtome (Leica Biosystems) and imaged for dye extravasation using a fluorescence microscope (Axioskop 2; Zeiss) equipped with a CCD digital camera (AxioCam MRc 5; Zeiss).”

      How is a sham control not stimulated - what is the sham procedure?

      In the sham stimulation protocol electrodes were placed, but current was not delivered. A section titled ‘Stimulation protocol’ was added to the methods to clarify this point.

      What was the method for photothrombosis-induced ischemia?

      The procedure for photothrombosis-induced ischemia is described in the Methods section under ‘Immunoassays’ – ‘Enzyme-linked immunosorbent assay (ELISA) for albumin extravasation’:

      “Rats were anesthetized and underwent … photothrombosis stroke (PT) as previously described (Lippmann et al., 2017; Schoknecht et al., 2014). Briefly, Rose Bengal was administered intravenously (20 mg/kg) and a halogen light beam was directed for 15 min onto the intact exposed skull over the right somatosensory cortex.”

      Fig 1d. All parts of d are not explained.

      Thank you for pointing this out. In the revised manuscript, the panels of this figure were slightly reordered, and we made sure all panels are explained in the legend.

      e. Is the LFP a seizure? How physiological is this- it does not seem very physiological.

      Thank you for your comment. We believe that this activity is not a seizure because it lacks the typical slow activity that corresponds to the “depolarization shift” observed during seizures (Ivens et al., 2007; Milikovsky et al., 2019; Zelig et al., 2022).

      f. Permeability index needs explanation. How was the area chosen for each rat? Randomly? Was it the same across rats?

      We have now revised the Methods section to provide a clearer description of the permeability index calculation and the choice of the imaging area:

      “Across all experiments, acquired images were the same size (512 × 512 pixels, ~1 × 1 mm), centered above the responding arteriole. Images were analyzed offline using MATLAB as described (Vazana et al., 2016). Briefly, image registration and segmentation were performed to produce a binary image, separating blood vessels from extravascular regions. For each extravascular pixel, a time curve of signal intensity over time was constructed. To determine whether an extravascular pixel had tracer accumulation over time (due to BBB permeability), the pixel’s intensity curve was divided by that of the responding artery (i.e., the arterial input function, AIF, representing tracer input). This ratio was termed the BBB permeability index (PI), and extravascular pixels with PI > 1 were identified as pixels with tracer accumulation due to BBB permeability.”
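      The permeability-index step can be sketched in Python, assuming registration and segmentation have already produced one intensity curve per extravascular pixel plus the AIF. The quoted text does not specify how the pixel-to-AIF curve ratio is reduced to a single PI value; this sketch assumes a ratio of areas under the curves, which may differ from the MATLAB implementation described by Vazana et al. (2016). Function names and values are hypothetical.

```python
def trapezoid_auc(curve, dt=1.0):
    """Area under a uniformly sampled intensity curve (trapezoidal rule)."""
    return sum((a + b) / 2 * dt for a, b in zip(curve, curve[1:]))


def permeability_index(pixel_curve, aif_curve, dt=1.0):
    """Pixel signal relative to the arterial input function (AIF).

    ASSUMPTION: the curve ratio is summarized as a ratio of areas under the curves.
    """
    return trapezoid_auc(pixel_curve, dt) / trapezoid_auc(aif_curve, dt)


def leaky_pixels(extravascular_curves, aif_curve, threshold=1.0):
    """Return coordinates of extravascular pixels whose PI exceeds the threshold."""
    return [
        coords
        for coords, curve in extravascular_curves.items()
        if permeability_index(curve, aif_curve) > threshold
    ]


aif = [0, 2, 2, 2]  # hypothetical arterial intensity over 4 frames
pixels = {(10, 12): [0, 1, 3, 4], (40, 7): [0, 1, 1, 1]}  # hypothetical extravascular curves
print(leaky_pixels(pixels, aif))  # [(10, 12)]
```

      Here pixel (10, 12) has PI = 6 / 5 = 1.2 and is flagged, whereas pixel (40, 7) has PI = 2.5 / 5 = 0.5.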

      g. For Evans blue and Alexa-Alb was the sample size rats or sections?

      Thank you for this question. We revised the statistical analysis for Figure 1j,k to appropriately assess the differences between rats. We used a nested t-test to test for differences between rats (and not sections). The differences remained significant (EB, p=0.0296; Alexa, p=0.0229) and the text was modified accordingly.

      h, i, j need more contrast and/or brightness to appreciate the images. Arrows would help. The text is too small to read.

      Thank you. This issue was addressed in the revised paper.

      To induce potentiation, 6 Hz 2 mA stimuli were used for 30 min. Please justify this as physiological.

      Thank you for the comment. We believe that the used stimulation protocol is within the physiological range (and relevant to plasticity, learning and memory) for the following reasons:

      1. In our continuous electrophysiological recordings, we did not observe any form of epileptiform or otherwise pathological activity.

      2. Memory, training, and skill acquisition experiments in humans often involve similar or longer training durations (Bengtsson et al., 2005), e.g., the 30 min thumb training session performed by Classen et al. (1998).

      3. The levels of SEP potentiation we observed are similar to those reported in:

      a. Rats following a 10-minute whisker stimulation (one hour post-stimulation; Mégevand et al., 2009).

      b. Humans following a 15 min task (McGregor et al., 2016).

      We have revised the Discussion of the paper to clarify this important point.

      The test stimulus to evoke somatosensory evoked potentials was 1 min. Was this 6 Hz 2 mA for 1 min? Please justify.

      Yes. We chose these parameters because these ranges were shown to induce the largest changes in blood flow (measured with laser-Doppler flowmetry) and summated SEP (Ngai et al., 1999), consistent with our findings. We also show that these stimulation parameters induce neither changes in BBB permeability nor synaptic potentiation, and they therefore served as a test control.

      How long after the 30 min was the test stimulus triggered- immediately? 30 sec afterwards?

      The test stimulus was applied 5 min afterwards to allow for the BBB imaging protocol (now explained in the Methods section).

      How were amplitude and AUC measured? Baseline to peak? For AUC is it the sum of the upward and downward deflections comprising the LFP?

      Yes, and yes. This is now clarified in the ‘Analysis of electrophysiological recordings’ section in the Methods.
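      The two measures as described can be sketched in Python. This is an illustrative sketch, not the analysis code used in the manuscript: the baseline definition (mean of pre-stimulus samples), the rectangle-rule integration, and all names and values are assumptions. The SEP is first averaged across the stimulus-locked sweeps of the test stimulation; amplitude is taken from baseline to the largest peak in either direction, and |AUC| sums upward and downward deflections as absolute values so that they do not cancel.

```python
def average_sweeps(sweeps):
    """Average stimulus-locked sweeps sample by sample (one sweep per pulse)."""
    n = len(sweeps)
    return [sum(samples) / n for samples in zip(*sweeps)]


def _baseline(trace, n_baseline):
    # ASSUMPTION: baseline = mean of the pre-stimulus samples.
    return sum(trace[:n_baseline]) / n_baseline


def sep_amplitude(trace, n_baseline):
    """Baseline-to-peak amplitude, counting upward or downward peaks."""
    base = _baseline(trace, n_baseline)
    return max(abs(v - base) for v in trace[n_baseline:])


def sep_abs_auc(trace, n_baseline, dt):
    """|AUC|: summed absolute deflection from baseline (rectangle rule)."""
    base = _baseline(trace, n_baseline)
    return sum(abs(v - base) for v in trace[n_baseline:]) * dt


sweeps = [[0, 0, 1, -2, 0], [0, 0, 3, -4, 0]]  # hypothetical sweeps, not recorded data
sep = average_sweeps(sweeps)                    # [0.0, 0.0, 2.0, -3.0, 0.0]
print(sep_amplitude(sep, n_baseline=2))         # 3.0
print(sep_abs_auc(sep, n_baseline=2, dt=0.5))   # 2.5
```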

      How was the same site in the somatosensory cortex recorded for each animal? Potentiation was said to last >5 hrs. How often was it measured? Was potentiation the same for the amplitude and the AUC?

      The location of the cranial window over the somatosensory cortex was the same in all rats. The location of the specific responding arteriole may change between animals, but the recording electrode was placed near the responding arteriole at the same approach angle and depth in all animals.

      As the length of experiments differed between animals, a single duration could not be stated. We therefore revised the text to clarify that LTP was recorded until the end of each experiment (between 1.5 and 5 hours, depending on the animal's condition) and added a panel to Figure 2 (Figure 2f) with exemplary data showing potentiation 120 min (2 h) post-stimulation.

      Why was 25% of the serum level of albumin selected- does the brain ever get exposed to that much? Was albumin dissolved in aCSF or was aCSF chosen as a control for another reason?

      Yes, albumin was dissolved in aCSF and the solution was allowed to diffuse through the brain. The relatively high concentration of albumin was chosen to account for factors that lower its effective tissue concentration:

      1. The low diffusion rate of albumin (Tao & Nicholson, 1996).

      2. The likelihood of albumin to encounter a degradation site or a cross-BBB efflux transporter (Tao & Nicholson, 1996; Zhang & Pardridge, 2001).

      Figure 2.

      a. Please show the baseline, the stimulus, and after the stimulus.

      Please point out when there was stimulation.

      What is the inset at the top?

      The inset on top is the example trace of the stimulus waveform, the legend of the figure was modified for clarity.

      b. Please show when the stimulus artifact occurred. The end of the 1-minute test stimulus period is fine. Why do the SEPs have different morphologies? It suggests that different locations in the cortex were recorded.

      What is shown is the SEP response averaged over the 1-min test stimulus; each SEP is time-locked to each stimulus. Regarding the SEP waveform, it does indeed show different morphologies between animals, as different arterioles sometimes respond to the stimulation, and we localize the recording to the responding vessel in each rat. However, in each rat the recording is from only one location. Once the electrode was positioned near the responding arteriole, it was not moved.

      d, e. What are the stats?

      h, i. Add stats. Are all comparisons Wilcoxon? Please provide p values.

      The comparisons were performed with the Wilcoxon test. We now state that and provide the exact p values.

      j. What was selected from the baseline and what was selected during Albumin and how long of a record was selected?

      What program was used to create the spectrogram?

      What is meant by changes at frequencies above 200 Hz, the frequencies of HFOs?

      The Method section (under ‘Electrophysiology – Data acquisition and analyses’) has been revised for clarification. Spectrogram was created with MATLAB and graphed with Prism. For analysis, we selected a 10 min recorded segment before starting albumin perfusion, and 10 min after terminating albumin perfusion.

      When the cortical window was exposed to drugs, what concentrations were used that were selective for their receptors? How long was the exposure?

      Was the vehicle tested?

      We have revised the Methods section (under ‘Animal preparation and surgical procedures - Drug application’) to clarify the duration, the concentrations used, and their justification. All blockers were applied for 50 min. The vehicle was artificial cerebrospinal fluid (aCSF).

      For PSD-95, what was the area of the cortex that was tested?

      Were animals acutely euthanized and the brain dissected, frozen, etc?

      We have revised the Methods section (under ‘Immunoassays’) for clarity.

      What is mβCD?

      The full term was added to the results section. It is also mentioned in the Methods.

      Is SJN specific at the concentration that was chosen? Did it inhibit the SEP?

      At the concentration used in our experiments, SJN is a selective inhibitor of the TGF-β type I receptor ALK5 (see Gellibert et al., 2004).

      Fig. 3b. It looks like CNQX increased the width of the vessels quite a bit. Please explain.

      For AP5, very large vessels were imaged, making it hard to compare to the other data.

      The vascular dilation in response to stimulation under CNQX was similar to that seen under “normal” conditions (i.e., aCSF). As for AP5, in some experiments the responding arteriole was in close proximity to a large venule that could not be avoided while imaging. For quantification, we always measured arterioles within the same diameter range.

      e. Sometimes CNQX did not block the response after 30 min stimulation. Why?

      CNQX is washed out before the 30 min stimulation starts, so it is not expected to block the response to stimulation. However, in some cases the response to stimulation was lower in amplitude, likely due to residual CNQX that did not wash out completely.

      Regarding DEGs, at the top of p. 10, what do the percentages refer to?

      In this analysis, we tested, in each hemisphere, how many genes were differentially expressed between 1 and 24 hours post-stimulation (either up- or down-regulated). The results were presented as the percentages of differentially expressed genes in each hemisphere (13.2% contralateral, and 7.3% ipsilateral). The text was rephrased for clarity.

      Please add a ref for the use of the JSD metric methods and support for its use as the appropriate method. Other methods need explanation/references.

      References were added to the text for clarity. The Jensen-Shannon divergence (JSD) is commonly used to calculate the statistical pairwise distance between two distributions (Sudmant et al., 2015). When we compared several distance metrics, including JSD, our results were similar irrespective of the metric applied. We therefore use JSD to demonstrate the variability between paired samples of stimulated and non-stimulated cortex of each animal at two time points following stimulation (24 h vs. 1 h).
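      For reference, the JSD between two samples can be computed as below. This is a minimal Python sketch that assumes each sample's gene-level values are normalized into a probability distribution over the same genes; the normalization step, gene selection, and all names and counts here are assumptions, not the pipeline used in the study. With base-2 logarithms the divergence is symmetric and bounded between 0 (identical profiles) and 1 (non-overlapping profiles); its square root is the Jensen-Shannon distance.

```python
from math import log2


def _normalize(values):
    """Turn non-negative expression values into a probability distribution."""
    total = sum(values)
    return [v / total for v in values]


def _kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(a, b):
    """JSD between two expression profiles (same gene order), in [0, 1] with log base 2."""
    p, q = _normalize(a), _normalize(b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


# Hypothetical counts for the same genes in stimulated vs. non-stimulated cortex of one rat.
stimulated = [120, 30, 50, 0]
non_stimulated = [100, 40, 55, 5]
print(jensen_shannon_divergence(stimulated, non_stimulated))
```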

      What synaptic plasticity genes were selected for assay and what were not?

      What does "largely unaffected" mean? Some of the genes may change a small amount but have big functional effects.

      The selected genes of interest were taken from a large list compiled from previous publications (see (Cacheaux et al., 2009; Kim et al., 2017)) and are well documented in gene ontology databases and tools (e.g., Metascape, (Zhou et al., 2019)).

      We agree that the term ‘largely unaffected’ is suboptimal, and we rephrased this section of the results to indicate that “No significant differences were found in BBB or inflammation related genes between the hemispheres”. We also agree that a small number of genes can have big functional effects. Future studies are needed to better understand the genes underlying the observed BBB modulation.

      Please note that Slc and ABCs are not only involved in the BBB.

      Thank you. We modified the text to no longer specify that these are BBB-specific transporters.

      Please explain the choice of the stress ball squeeze task, and DCE.

      DCE (dynamic contrast-enhanced) MRI is a well-established method for BBB imaging in living humans, and it is cited throughout the manuscript. The ball squeeze task was chosen as it is presumed to involve primarily sensorimotor areas, without high-level processing (Halder et al., 2005). This is now stated in the Discussion.

      What is Gd-DOTA?

      Gd-DOTA is a gadolinium-based contrast agent (gadoterate meglumine, AKA Dotarem). Text was revised for clarity. Please see the Methods section under ‘Magnetic Resonance Imaging’ - ‘Data Acquisition’.

      What does a higher percentage of activated regions mean? How was activation defined and how were regions counted?

      A higher percentage of activated regions refers to regions in which voxels showed significant BOLD changes due to the motor task performed. The statistical approaches and analyses are detailed in the Methods section under ‘Magnetic Resonance Imaging - Preprocessing of functional data, and fMRI Localizer Motor Task’.

      Figure. 4

      Was stimulation 1 min or 30 min.?

      30 min. The text has been revised for clarity.

      What is the Wald test and how were p values adjusted? Please add this to the Stats section.

      The Methods section under ‘Statistical analysis’ was revised to clarify this point.

      Is there a reason why p values are sometimes circles and otherwise triangles?

      The legend was revised to explain that “Circles represent genes with no significant differences between 1 and 24 h post-stimulation. Upward and downward triangles indicate significantly up- and down-regulated genes, respectively.”

      How can a p-value be zero? Please explain abbreviations.

      The p-value is very low (~10⁻¹⁰) and therefore appears to be zero due to the scale of the y-axis.

      Fig. 5b.

      There are unexplained abbreviations.

      The x on the ball and hand is not clear relative to the black ball and hand.

      Thank you for noticing. We revised the figure for clarity.

      c. What method was used to make the activation map, and what is meant by localizer task?

      The explanation in the “fMRI Localizer Motor Task” section of the Methods was revised for clarity.

      f. What is the measurement "% area" that indicates " BBB modulation"?

      Is it in f, the BBB permeable vessels (%)? f. Please explain: "Heatmap of BBB modulated voxels percentage in motor/sensory-related areas of task vs. controls."

      The %area measurement indicates the percentage of voxels within a specific brain region that have a leaky BBB. See Methods.

      Is Task - the control?

      Yes.

      Supplemental Fig. 2.

      Why is AUC measured, not amplitude?

      Both the amplitude and, now, the AUC are shown in Figure 3.

      b. There is no comparison to baseline. The arrowhead points to the start of stimulation but there is no arrowhead marking the end.

      In the revised paper, we added a grey shade over the stimulation period to better visualize the difference from baseline. In this panel, we wanted to show that the NMDA receptor antagonist did not block the SEP, while the AMPA receptor antagonist did.

      c. In the blot there are two bands for PSD95. Which one is PSD95? There is no increase in PSD95 until 24 hrs in the blot, but in the graph in d there is. In the blot, there is strong expression of PSD95 ipsilaterally compared to contralaterally in the sham. Why?

      What is the percent change fold?

      PSD-95 is the upper, larger band; the lower band was disregarded in the analysis. The example we show may not fully reflect the group statistics presented in panel d. Upon quantification of 8 animals, PSD-95 was significantly higher 30 min and 24 hours post-stimulation in the contralateral hemisphere. No significant changes were found in sham animals. The % change fold refers to the AUC change compared to baseline. This panel has now been incorporated into Figure 3 (panel h), and the title was corrected to “|AUC|, % change from baseline”.

      Supplemental Fig. 4.

      a. If ipsilateral and contralateral showed many changes why do the authors think the effects were only contralateral?

      Our gene analysis was designed to complement our in vivo and histological findings, by assessing the magnitude of change in differentially expressed genes (DEGs). This analysis showed that: (1) the hemisphere contralateral to the stimulus has significantly more DEGs than the ipsilateral hemisphere; and (2) the DEGs were related to synaptic plasticity and TGF-β signaling. These findings strengthen the hypothesis raised by our in vivo and histological experiments.

      Supplemental Fig. 5 includes many processes not in the results. Examples include dorsal cuneate and VPL, dynamin, Kir, mGluR, etc. The top right has numbers that are not mentioned. If the drawings are from other papers they should be cited.

      The drawings of Supplemental Figure 5 are original and have not been published before. This hypothesis figure points to mechanisms that may drive the phenomena described in the paper. The legend of the figure was revised to include references to mechanisms that were not tested in this study.

      Papers referenced in this letter:

      Ahishali, B., & Kaya, M. (2020). Evaluation of Blood-Brain Barrier Integrity Using Vascular Permeability Markers: Evans Blue, Sodium Fluorescein, Albumin-Alexa Fluor Conjugates, and Horseradish Peroxidase. Methods in Molecular Biology, 2367, 87–103. https://doi.org/10.1007/7651_2020_316

      Aksenov, D. P., Li, L., Miller, M. J., Iordanescu, G., & Wyrwicz, A. M. (2015). Effects of anesthesia on BOLD signal and neuronal activity in the somatosensory cortex. Journal of Cerebral Blood Flow and Metabolism, 35(11), 1819–1826. https://doi.org/10.1038/jcbfm.2015.130

      All, A. H., Agrawal, G., Walczak, P., Maybhate, A., Bulte, J. W. M., & Kerr, D. A. (2010). Evoked potential and behavioral outcomes for experimental autoimmune encephalomyelitis in Lewis rats. Neurological Sciences, 31(5), 595–601. https://doi.org/10.1007/s10072-010-0329-y

      Allison, J. D., Meador, K. J., Loring, D. W., Figueroa, R. E., & Wright, J. C. (2000). Functional MRI cerebral activation and deactivation during finger movement. Neurology, 54(1), 135–142. https://doi.org/10.1212/wnl.54.1.135

      Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8(9), 1148–1150. https://doi.org/10.1038/nn1516

      Betterton, R. D., Abdullahi, W., Williams, E. I., Lochhead, J. J., Brzica, H., Stanton, J., Reddell, E., Ogbonnaya, C., Davis, T. P., & Ronaldson, P. T. (2022). Regulation of Blood-Brain Barrier Transporters by Transforming Growth Factor-β/Activin Receptor-Like Kinase 1 Signaling: Relevance to the Brain Disposition of 3-Hydroxy-3-Methylglutaryl Coenzyme A Reductase Inhibitors (i.e., Statins). Drug Metabolism and Disposition, 50(7), 942–956. https://doi.org/10.1124/dmd.121.000781

      Bouchard, M. B., Chen, B. R., Burgess, S. A., & Hillman, E. M. C. (2009). Ultra-fast multispectral optical imaging of cortical oxygenation, blood flow, and intracellular calcium dynamics. Optics Express, 17(18), 15670. https://doi.org/10.1364/oe.17.015670

      Cacheaux, L. P., Ivens, S., David, Y., Lakhter, A. J., Bar-Klein, G., Shapira, M., Heinemann, U., Friedman, A., & Kaufer, D. (2009). Transcriptome profiling reveals TGF-β signaling involvement in epileptogenesis. Journal of Neuroscience, 29(28), 8927–8935. https://doi.org/10.1523/JNEUROSCI.0430-09.2009

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. Journal of Neurophysiology, 79(2), 1117–1123. https://doi.org/10.1152/jn.1998.79.2.1117

      Franceschini, M. A., Radhakrishnan, H., Thakur, K., Wu, W., Ruvinskaya, S., Carp, S., & Boas, D. A. (2010). The effect of different anesthetics on neurovascular coupling. NeuroImage, 51(4), 1367–1377. https://doi.org/10.1016/j.neuroimage.2010.03.060

      Gellibert, F., Woolven, J., Fouchet, M. H., Mathews, N., Goodland, H., Lovegrove, V., Laroze, A., Nguyen, V. L., Sautet, S., Wang, R., Janson, C., Smith, W., Krysa, G., Boullay, V., De Gouville, A. C., Huet, S., & Hartley, D. (2004). Identification of 1,5-naphthyridine derivatives as a novel series of potent and selective TGF-β type I receptor inhibitors. Journal of Medicinal Chemistry, 47(18), 4494–4506. https://doi.org/10.1021/jm0400247

      Goff, W. R., Rosner, B. S., & Allison, T. (1962). Distribution of cerebral somatosensory evoked responses in normal man. Electroencephalography and Clinical Neurophysiology, 14(5), 697–713. https://doi.org/10.1016/0013-4694(62)90084-6

      Halder, P., Sterr, A., Brem, S., Bucher, K., Kollias, S., & Brandeis, D. (2005). Electrophysiological evidence for cortical plasticity with movement repetition. European Journal of Neuroscience, 21(8), 2271–2277. https://doi.org/10.1111/J.1460-9568.2005.04045.X

      Ivens, S., Kaufer, D., Flores, L. P., Bechmann, I., Zumsteg, D., Tomkins, O., Seiffert, E., Heinemann, U., & Friedman, A. (2007). TGF-β receptor-mediated albumin uptake into astrocytes is involved in neocortical epileptogenesis. Brain, 130(2), 535–547. https://doi.org/10.1093/brain/awl317

      Kaplan, L., Chow, B. W., & Gu, C. (2020). Neuronal regulation of the blood–brain barrier and neurovascular coupling. Nature Reviews Neuroscience, 21(8), 416–432. https://doi.org/10.1038/s41583-020-0322-2

      Keller, P., & Simons, K. (1998). Cholesterol is required for surface transport of influenza virus hemagglutinin. Journal of Cell Biology, 140(6), 1357–1367. https://doi.org/10.1083/jcb.140.6.1357

      Kim, S. Y., Senatorov, V. V., Morrissey, C. S., Lippmann, K., Vazquez, O., Milikovsky, D. Z., Gu, F., Parada, I., Prince, D. A., Becker, A. J., Heinemann, U., Friedman, A., & Kaufer, D. (2017). TGFβ signaling is associated with changes in inflammatory gene expression and perineuronal net degradation around inhibitory neurons following various neurological insults. Scientific Reports, 7(1), 7711. https://doi.org/10.1038/s41598-017-07394-3

      Knowland, D., Arac, A., Sekiguchi, K. J., Hsu, M., Lutz, S. E., Perrino, J., Steinberg, G. K., Barres, B. A., Nimmerjahn, A., & Agalliu, D. (2014). Stepwise Recruitment of Transcellular and Paracellular Pathways Underlies Blood-Brain Barrier Breakdown in Stroke. Neuron, 82(3), 603–617. https://doi.org/10.1016/j.neuron.2014.03.003

      Koudinov, A. R., & Koudinova, N. V. (2001). Essential role for cholesterol in synaptic plasticity and neuronal degeneration. The FASEB Journal, 15(10), 1858–1860. https://doi.org/10.1096/fj.00-0815fje

      Lapilover, E. G., Lippmann, K., Salar, S., Maslarova, A., Dreier, J. P., Heinemann, U., & Friedman, A. (2012). Periinfarct blood-brain barrier dysfunction facilitates induction of spreading depolarization associated with epileptiform discharges. Neurobiology of Disease, 48(3), 495–506. https://doi.org/10.1016/j.nbd.2012.06.024

      Lindauer, U., Villringer, A., & Dirnagl, U. (1993). Characterization of CBF response to somatosensory stimulation: Model and influence of anesthetics. American Journal of Physiology - Heart and Circulatory Physiology, 264(4 Pt 2), H1223–H1228. https://doi.org/10.1152/ajpheart.1993.264.4.h1223

      Lippmann, K., Kamintsky, L., Kim, S. Y., Lublinsky, S., Prager, O., Nichtweiss, J. F., Salar, S., Kaufer, D., Heinemann, U., & Friedman, A. (2017). Epileptiform activity and spreading depolarization in the blood-brain barrier-disrupted peri-infarct hippocampus are associated with impaired GABAergic inhibition and synaptic plasticity. Journal of Cerebral Blood Flow and Metabolism, 37(5), 1803–1819. https://doi.org/10.1177/0271678X16652631

      Masamoto, K., & Kanno, I. (2012). Anesthesia and the quantitative evaluation of neurovascular coupling. Journal of Cerebral Blood Flow and Metabolism, 32(7), 1233–1247. https://doi.org/10.1038/jcbfm.2012.50

      McGregor, H. R., Cashaback, J. G. A., & Gribble, P. L. (2016). Functional Plasticity in Somatosensory Cortex Supports Motor Learning by Observing. Current Biology, 26(7), 921–927. https://doi.org/10.1016/j.cub.2016.01.064

      McMillin, M. A., Frampton, G. A., Seiwell, A. P., Patel, N. S., Jacobs, A. N., & DeMorrow, S. (2015). TGFβ1 exacerbates blood-brain barrier permeability in a mouse model of hepatic encephalopathy via upregulation of MMP9 and downregulation of claudin-5. Laboratory Investigation, 95(8), 903–913. https://doi.org/10.1038/labinvest.2015.70

      Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M., & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. Journal of Neuroscience, 29(16), 5326–5335. https://doi.org/10.1523/JNEUROSCI.5965-08.2009

      Milikovsky, D. Z., Ofer, J., Senatorov, V. V., Friedman, A. R., Prager, O., Sheintuch, L., Elazari, N., Veksler, R., Zelig, D., Weissberg, I., Bar-Klein, G., Swissa, E., Hanael, E., Ben-Arie, G., Schefenbauer, O., Kamintsky, L., Saar-Ashkenazy, R., Shelef, I., Shamir, M. H., … Friedman, A. (2019). Paroxysmal slow cortical activity in Alzheimer’s disease and epilepsy is associated with blood-brain barrier dysfunction. Science Translational Medicine, 11(521), eaaw8954. https://doi.org/10.1126/scitranslmed.aaw8954

      Ngai, A. C., Jolley, M. A., D’Ambrosio, R., Meno, J. R., & Winn, H. R. (1999). Frequency-dependent changes in cerebral blood flow and evoked potentials during somatosensory stimulation in the rat. Brain Research, 837(1–2), 221–228. https://doi.org/10.1016/S0006-8993(99)01649-2

      Obermeier, B., Daneman, R., & Ransohoff, R. M. (2013). Development, maintenance and disruption of the blood-brain barrier. Nature Medicine, 19(12), 1584–1596. https://doi.org/10.1038/nm.3407

      Schoknecht, K., Prager, O., Vazana, U., Kamintsky, L., Harhausen, D., Zille, M., Figge, L., Chassidim, Y., Schellenberger, E., Kovács, R., Heinemann, U., & Friedman, A. (2014). Monitoring stroke progression: In vivo imaging of cortical perfusion, blood-brain barrier permeability and cellular damage in the rat photothrombosis model. Journal of Cerebral Blood Flow and Metabolism, 34(11), 1791–1801. https://doi.org/10.1038/jcbfm.2014.147

      Schumacher, L., Slimani, R., Zizmare, L., Ehlers, J., Kleine Borgmann, F., Fitzgerald, J. C., Fallier-Becker, P., Beckmann, A., Grißmer, A., Meier, C., El-Ayoubi, A., Devraj, K., Mittelbronn, M., Trautwein, C., & Naumann, U. (2023). TGF-Beta Modulates the Integrity of the Blood Brain Barrier In Vitro, and Is Associated with Metabolic Alterations in Pericytes. Biomedicines, 11(1), 1–19. https://doi.org/10.3390/biomedicines11010214

      Shim, H. J., Jung, W. B., Schlegel, F., Lee, J., Kim, S., Lee, J., & Kim, S. G. (2018). Mouse fMRI under ketamine and xylazine anesthesia: Robust contralateral somatosensory cortex activation in response to forepaw stimulation. NeuroImage, 177, 30–44. https://doi.org/10.1016/J.NEUROIMAGE.2018.04.062

      Sudmant, P. H., Alexis, M. S., & Burge, C. B. (2015). Meta-analysis of RNA-seq expression data across species, tissues and studies. Genome Biology, 16(1), 287. https://doi.org/10.1186/s13059-015-0853-4

      Tao, L., & Nicholson, C. (1996). Diffusion of albumins in rat cortical slices and relevance to volume transmission. Neuroscience, 75(3), 839–847. https://doi.org/10.1016/0306-4522(96)00303-X

      Vazana, U., Veksler, R., Pell, G. S., Prager, O., Fassler, M., Chassidim, Y., Roth, Y., Shahar, H., Zangen, A., Raccah, R., Onesti, E., Ceccanti, M., Colonnese, C., Santoro, A., Salvati, M., D’Elia, A., Nucciarelli, V., Inghilleri, M., & Friedman, A. (2016). Glutamate-mediated blood–brain barrier opening: Implications for neuroprotection and drug delivery. Journal of Neuroscience, 36(29), 7727–7739. https://doi.org/10.1523/JNEUROSCI.0587-16.2016

      Veksler, R., Vazana, U., Serlin, Y., Prager, O., Ofer, J., Shemen, N., Fisher, A. M., Minaeva, O., Hua, N., Saar-Ashkenazy, R., Benou, I., Riklin-Raviv, T., Parker, E., Mumby, G., Kamintsky, L., Beyea, S., Bowen, C. V., Shelef, I., O’Keeffe, E., … Friedman, A. (2020). Slow blood-to-brain transport underlies enduring barrier dysfunction in American football players. Brain, 143(6), 1826–1842. https://doi.org/10.1093/brain/awaa140

      Zandieh, S., Hopf, R., Redl, H., & Schlag, M. G. (2003). The effect of ketamine/xylazine anesthesia on sensory and motor evoked potentials in the rat. Spinal Cord, 41(1), 16–22. https://doi.org/10.1038/sj.sc.3101400

      Zelig, D., Goldberg, I., Shor, O., Ben Dor, S., Yaniv-Rosenfeld, A., Milikovsky, D. Z., Ofer, J., Imtiaz, H., Friedman, A., & Benninger, F. (2022). Paroxysmal slow wave events predict epilepsy following a first seizure. Epilepsia, 63(1), 190–198. https://doi.org/10.1111/epi.17110

      Zhang, Y., & Pardridge, W. M. (2001). Mediated efflux of IgG molecules from brain to blood across the blood–brain barrier. Journal of Neuroimmunology, 114(1–2), 168–172. https://doi.org/10.1016/S0165-5728(01)00242-9

      Zhou, Y., Zhou, B., Pache, L., Chang, M., Khodabakhshi, A. H., Tanaseichuk, O., Benner, C., & Chanda, S. K. (2019). Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications, 10(1), 1–10. https://doi.org/10.1038/s41467-019-09234-6

    1. Author Response

      Reviewer #1 (Public Review)

      Midbrain dopamine neurons have attracted attention as part of the brain's reward system. A different line of research, on the other hand, has shown that these neurons are also involved in higher cognitive functions such as short-term memory. However, these neurons are thought not to encode short-term memory itself, because in short-term memory tasks they exhibit only a phasic response, which seemingly cannot maintain information during the memory period. To understand the role of dopamine neurons in short-term memory, the present study investigated the electrophysiological properties of these neurons in rodents performing a T-maze version of a short-term memory task, in which a visual cue indicated which arm (left or right) of the T-maze was associated with a reward. The animals needed to maintain this information while moving between the cue presentation position and the choice position of the T-maze. The authors found that the activity of some dopamine neurons changed depending on this information while the animals were located in the memory position. This modulation could not be explained by the motivational or motor components of the task. The authors concluded that it reflected the information stored as short-term memory.

      I was simply surprised by their finding because these dopamine neurons resemble neurons in the prefrontal cortex that store memory information with sustained activity. Dopamine neurons are an evolutionarily conserved structure, seen even in insects, whereas the prefrontal cortex is developed mainly in primates. I feel that their findings are novel and would attract much attention from readers in the field. However, the authors need to conduct additional analyses to consolidate their conclusion.

      We thank reviewer #1 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Reviewer #1 (Recommendations to The Authors)

      (1) The authors found a dopamine neuron modulation that reflected the memory information during the delay period. Here, dopamine neuron activity was aligned to the position at which the animals needed to maintain the information, not to time. Usually, activity is aligned to time, and many studies have found that dopamine neurons exhibit a short-duration burst in response to rewards and behaviorally relevant stimuli, including visual cues presented in short-term memory tasks. For comparison, I (and probably other readers) would like to see the time-aligned dopamine neuron modulation that reflected the memory information. Did the modulation still exist? Did it have a long duration? The authors only showed the time-aligned "population" activity, which exhibited no memory-dependent modulation.

      We agree that the point raised by the reviewer is important. To address this question, we added a new paragraph titled “Methodological considerations” to the Methods section (line 793 of the revised manuscript), where we explain the caveats of using time alignment in the T-maze task. We also created a new Supplementary Figure 5 to clarify our argument. As the figure shows, we did not observe major differences in firing rates when they were arranged by position or by time. More importantly, we did not detect brief bursts of activity in response to the visual cue that could reflect an RPE signaling scheme. Our interpretation is that in the T-maze task, DA neurons encode “miniature” RPE signals between successive states in the maze, which are hard to detect, especially when neurons receive continuous sensory input during trials.

      (2) Several studies have reported that dopamine neurons at different locations encode distinct signals even within the VTA or SNr. Were the locations of dopamine neurons maintaining the memory information different from those of other dopamine neurons?

      We thank the reviewer for this comment. Indeed, there is evidence from recent studies that DA neurons form functional and anatomical clusters in the VTA and SN. Following the reviewer’s advice, we report the anatomical distribution of memory-specific and non-memory-specific neurons in the revised manuscript. These results are presented in the paragraph “Anatomical organization of trajectory-specific neurons.” in the “Results” section (line 383 of the revised manuscript) and in the new Supplementary Figure 11. We observed a clear functional-anatomical segregation only in GABA neurons, not in DA neurons. However, the absence of segregation in DA neurons could be accounted for by the fact that we recorded mostly from the lateral VTA; therefore, we have no data from the medial VTA.

      (3a) Did the dopamine neurons maintaining the memory information respond to reward?

      We believe that we have already provided data that partially answer this question, by correlating the firing-rate difference between the reward and memory delay sections. This result is described in the “Neuronal activities in delay and reward are unrelated.” paragraph and in Figure 6. Moreover, motivated by the reviewer’s question, we performed an additional analysis, which is included in the revised manuscript. Briefly, we categorized significant responses in the memory delay and reward sections (Category 1: Left-signif, R-signif or No-signif / Category 2: Memory delay or Reward). We found that only a very small number of neurons showed the same significant trajectory preference in the memory delay and reward sections (i.e., a significant preference for left trials in the memory delay and a significant preference for the left reward). In fact, more significant neurons showed a preference for opposite trajectories (i.e., a significant preference for left trials in the memory delay and a significant preference for right rewards). A description of the new results is included in the “Neuronal activities in delay and reward are unrelated.” paragraph (line 349 of the revised manuscript) and in the new Supplementary Figure 11.

      (3b) Did they encode reward prediction error? The relationship between the present data and the conventional theory may be valuable.

      We understand that readers of this study will ask how memory-specific activities are related to RPE signaling. However, the T-maze task we used in this research was designed to study working memory and is not suited to extracting information about the RPE signaling of DA neurons.

      RPE signaling is mainly studied in Pavlovian conditioning. These are low-dimensional tasks, usually with four states (state 1: ITI; state 2: trial start; state 3: stimulus presentation; state 4: reward delivery). Evidence of RPE signaling is extracted from the firing activity in states 3 and 4, which is theorized to reflect the difference between the values of these two states.

      In the T-maze task, however, the number of states is hard to define and practically countless. Under these conditions, it has been suggested that numerous small RPEs are signaled while the mice navigate the maze; these are therefore very difficult to detect. To our knowledge, only Kim et al. (2020, Cell, 183, 1600) have detected the RPE signaling activity of DA neurons while mice were teleported in a virtual corridor.
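      As a rough illustration of why such RPEs would be small, consider the textbook temporal-difference (TD) formulation. This is an assumption offered for readers, not the analysis used in the manuscript: V denotes the learned value of a state, r the reward, γ the discount factor, and s_0 to s_N the successive states from the start of the maze to the reward site.

$$
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
$$

      If no reward is delivered during the approach (r_{t+1} = 0 for t = 0 to N-1) and γ = 1, the errors telescope:

$$
\sum_{t=0}^{N-1} \delta_t = \sum_{t=0}^{N-1} \bigl[ V(s_{t+1}) - V(s_t) \bigr] = V(s_N) - V(s_0)
$$

      and, if the value increase is spread roughly evenly over the N transitions,

$$
\delta_t \approx \frac{V(s_N) - V(s_0)}{N}.
$$

      In a four-state Pavlovian task, the same value change is concentrated in one or two transitions; in a maze divided into many states (large N), each individual error is correspondingly smaller.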

      Another confounding factor in extracting RPE signals in the T-maze task is that the environment is high-dimensional and DA neurons are multitasking. RPE signaling may therefore be masked by other parallel encoding schemes.

      We have added these descriptions in the “Methodological considerations” (in line 793 of the revised manuscript).

      (4) Did the dopamine neurons maintaining the memory information (left or right) prefer a contralateral direction like neurons in the motor cortex?

      We thank the reviewer for this comment. Indeed, the majority of the memory-specific DA neurons showed a preference for the contralateral direction. We report this result in the legend of the new Supplementary Figure 10 (line 1668 of the revised manuscript).

      (5) As shown in Table S2, the proportion of GABA neurons maintaining the memory information (left or right during the delay) was much larger than that of dopamine neurons. This seems strange, because the main output neurons of the VTA are dopaminergic. What is the role of these GABA neurons?

      We thank the reviewer for pointing this out. The present study shows that in both populations a sizeable proportion of neurons shows memory-specific encoding activity. However, the percentage of memory-encoding neurons is more than twice as large among GABA neurons as among DA neurons. Moreover, we show that GABA neurons are functionally and anatomically segregated.

      From this evidence, one could hypothesize that GABA neurons play a primary role and that the activity of DA neurons is a collateral phenomenon, triggered by a sequence of events within the VTA network. To characterize (1) the role and (2) the importance of GABA neurons in memory-guided behavior, one would first need to identify the afferent and efferent projections of these cells in detail. Unfortunately, the present study does not provide such anatomical evidence.

      With the electrophysiological data we have collected so far (unit and field recordings), we can address an alternative hypothesis. It has been reported previously (and we have also observed) that the VTA circuit engages in behaviorally related network oscillations ranging from 0.4 Hz to 100 Hz. Converging evidence from different brain regions, from in vitro preparations, and from in vivo recordings indicates that local networks of inhibitory neurons are crucial for the generation, maintenance, and spectral control of network oscillations. An ongoing analysis, which we hope will lead to a publication, examines the behavioral correlates of network oscillations in the T-maze task, as well as the correlation of single-unit firing activity with field oscillations. We expect to detect higher field-unit coherence in GABA neurons, which could explain their stronger engagement in memory-specific encoding activity.

      The potential role of GABA neurons in network oscillations is discussed in the revised manuscript in a newly added paragraph in line 564.

      Reviewer #2 (Public Review)

      The authors phototag DA and GABA neurons in the VTA in mice performing a T-maze task, and report choice-specific responses in the delay period of a memory-guided task, more so than in a variant task without a memory component. Overall, I found the results convincing. While showing responses that are choice selective in DA neurons is not entirely novel (e.g., Morris et al. NN 2006, Parker et al. NN 2016), the fact that this feature is stronger when there is a memory requirement is an interesting and novel observation.

      I found the plots in Figure 3B misleading because they suggest that the main result is the sequential firing of DA neurons during the T-maze task. However, many of the neurons are not significant by the authors' permutation test. Often, people either plot only the significant neurons or plot with cross-validation (i.e., sort by half of the trials and plot the other half).

      Relatedly, the cross-task comparisons of sequences (Figs. 4 and 5) are hampered by the fact that the neurons are sorted in one task and then plotted in the other, which makes the sequences look less robust even if they were equally strong. What happens if the authors swap which task's sequences are used to order the neurons? I do realize they also show statistical comparisons of modulated units across tasks, which is helpful.
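      The cross-validated ordering the reviewer describes (sort by one half of the trials, plot the other half) can be sketched as follows. This is a minimal illustration that assumes position-binned firing-rate maps for each trial and orders neurons by the position of peak firing; the function names and data layout are hypothetical and are not taken from the authors' analysis code. The same logic would apply to the cross-task comparison: compute the ordering keys from one task, plot the other, then swap the roles of the two tasks.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical layout: rates[k][trial][bin] is the firing rate of neuron k
# on one trial, binned by position along the maze (same bins on every trial).
Trials = List[List[float]]


def mean_map(trials: Sequence[Sequence[float]]) -> List[float]:
    """Average position-binned rate maps across trials."""
    n = len(trials)
    return [sum(t[b] for t in trials) / n for b in range(len(trials[0]))]


def peak_bin(rate_map: Sequence[float]) -> int:
    """Index of the position bin with the highest firing rate."""
    return max(range(len(rate_map)), key=lambda b: rate_map[b])


def cross_validated_sequence(
    rates: List[Trials], seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Order neurons by the peak bin of a random half of their trials and
    return the mean maps of the held-out half in that order.

    Each neuron needs at least two trials so that both halves are non-empty.
    """
    rng = random.Random(seed)
    keys: List[int] = []
    held_out: List[List[float]] = []
    for trials in rates:
        idx = list(range(len(trials)))
        rng.shuffle(idx)
        half = len(idx) // 2
        sort_half = [trials[i] for i in idx[:half]]   # used only for ordering
        plot_half = [trials[i] for i in idx[half:]]   # used only for plotting
        keys.append(peak_bin(mean_map(sort_half)))
        held_out.append(mean_map(plot_half))
    order = sorted(range(len(rates)), key=lambda k: keys[k])
    return order, [held_out[k] for k in order]


if __name__ == "__main__":
    def toy(peak: int, n_bins: int = 3, n_trials: int = 4) -> Trials:
        # Toy neuron that fires only in one position bin on every trial.
        return [[1.0 if b == peak else 0.0 for b in range(n_bins)]
                for _ in range(n_trials)]

    order, maps = cross_validated_sequence([toy(2), toy(0), toy(1)])
    print(order)  # [1, 2, 0]
```

      Because the ordering and the plotted maps come from disjoint trials, a sequence that survives this procedure is less likely to be an artifact of sorting noise.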

      We thank reviewer #2 for the valuable and constructive comments on our manuscript. If, as the reviewer suggests, the rate difference between left and right trajectories were the only result we wanted to claim, it would be appropriate to show only the neurons with a significant left-right difference. However, the sequential activity is also one of the points we wanted to display. We did not emphasize this result because it has already been shown by Engelhard et al. (2019). However, after reading the reviewer's comments, we added a few lines to the "Results" (lines 205–215 of the revised manuscript) and "Discussion" (line 453 of the revised manuscript) describing the sequential activity of the VTA circuit. In these lines, we explain that DA activity is position-specific (resulting in sequential activity) and that a fraction of DA neurons also show left-right specificity.

      Overall, the introduction was scholarly and did a good job covering a vast literature. But the explanation of t-maze data towards the end of the introduction was confusing. In Line 87, I would not say "in the same task" but "in a similar task" because there are many differences between the tasks in question.

      We thank the reviewer for pointing out this mistake. In the revised manuscript, we replaced “in the same task” with “in a similar task” (in line 85 of the revised manuscript).

      And not clear what is meant by "by averaging neuronal population activities, none of these computational schemes would have been revealed. " There was trial averaging, at least in Harvey et al. I thought the main result of that paper related to coding schemes was that neural activity was sequential, not persistent. I think it would help the paper to say that clearly.

      We admit that this sentence leaves room for misunderstanding. We were mainly referring to DA studies using microdialysis or fiber photometry techniques. We decided to delete this sentence in the revised manuscript.

      Also, I'm not aware it was shown that choice selectivity diminishes when the memory demand of the task is removed - please clarify if that is true in both referenced papers.

      The reviewer’s remark is correct. None of these reports show explicitly that memory-specific activities are diminished without the memory component. Therefore, we deleted this sentence in the revised manuscript.

      If so, an interpretation of this present data could be found in Lee et al biorxiv 2022, which presents a computational model that implies that the heterogeneity in the VTA DA system is a reflection of the heterogeneity found in upstream regions (the state representation), based on the idea that different subsets of DA neurons calculate prediction errors with respect to different subsets of the state representation.

      We thank the reviewer for sharing this interpretation. We agree that this theory would support our results. In the revised manuscript we briefly discuss the Lee et al. report (in line 460 of the revised manuscript).

      I am surprised that only 28% of DA neurons responded to the reward, given that the reward is not completely certain in this task. This seems lower than in other papers in mice (even with Pavlovian conditioning, when the reward is entirely certain). It would be helpful if the authors commented on how this number compares with other papers.

      In Pavlovian conditioning, neuronal responses to rewards are compared to a relatively quiet period of firing activity (usually the inter-trial interval epoch). As the reviewer pointed out, in the present study, the number of DA neurons responding to reward is smaller compared to the earlier studies. We hypothesize that this is due to our comparison method. We compared the post-reward response to an epoch when the animal was running along the side arms and the majority of neurons were highly active, instead of comparing it to a quiescent baseline epoch.

      Reviewer #2 (Recommendations to The Authors)

      Can you clarify what disparity you are referring to here? "Disparities between this and our study in the proportions of modulated neurons could be attributed to the different recording techniques applied as well as the maze regions of interest; for example, Engelhard et al. analyzed neuronal firing activities in the visual-cue period (Engelhard et al., 2019), whereas we focused on memory delay." Is it the fact that Engelhard et al. did not report choice-selective activity? They did report cue-side-selective activity, with some neurons responsive to cues on one side and other neurons responsive to cues on the other side. Because there are more cues on the left when the mouse turns left, these neurons do indeed have choice-selective responses.

      We thank the reviewer for this comment and agree that our argument needed further clarification. As the reviewer pointed out, Engelhard et al. identified choice-specific DA neurons. However, they reported the encoding properties of DA neurons only in the visual-cue period and the reward period. Remarkably, although the task has a memory delay, they did not report neuronal firing activity for this delay period. By contrast, in the present study we dedicated most of our analysis to characterizing the firing properties of VTA neurons in the delay period.

      Also, in response to this comment, we edited the paragraph describing the disparities between our study and Engelhard et al. (line 466 of the revised manuscript).

      I don't think this sentence in the introduction is needed, since it doesn't really contain new information: "Therefore, we looked for hints of memory-related encoding activities in single DA and GABA neurons by characterizing their firing preference for opposite behavioral choices.".

      We agree with the reviewer. Therefore, we deleted this sentence in the revised manuscript.

      I didn't understand this line of discussion: "Our evidence does not question the validity of this computational model, since we do not provide evidence of how the selective preference for one response over the other translates into the release site.".

      The gating theory is based on experimental evidence of neuronal firing activities of DA neurons but also takes into consideration (to a lesser degree) the pre- and post-synaptic processes at the DA release sites (inverted U-shape of D1R activity). We were concerned that readers might conclude that we question the validity of the gating theory. This is not our intention, especially since we do not provide important evidence such as (1) the projection sites of DA and GABA neurons and (2) the sequence of events that take place at the synaptic triads following DA and GABA release.

      After reading your comment, we concluded that this sentence should be omitted, because it is not within the scope of this study to question the validity of the gating theory. Instead, we dedicated a few lines of text to explaining which components of the gating theory (“update”, “maintenance & manipulation” and “motor preparation”) could account for the trajectory-specific activities in the memory delay of the T-maze task (section “Activities of midbrain DA neurons in short-term memory”, line 417 of the revised manuscript).

      In 1B, please illustrate when the light pulses are on & off?

      Following the reviewer’s instruction, we added colored bars on top of the raster plots in Figure 1B, indicating the light induction conditions.

      In legend for 6C, please clarify it's a correlation between the difference in R and L choice activity across the epochs (if my understanding is correct).

      The reviewer’s understanding is correct. We took this advice into consideration to further clarify the methods of analysis that led to the plot in Figure 6C (in line 1246 in the revised manuscript).

    1. Author Response

      eLife assessment

      The important work by Aballay et al. significantly advances our understanding of how G protein-coupled receptors (GPCRs) regulate immunity and pathogen avoidance. The authors provide convincing evidence for the GPCR NPR-15 to mediate immunity by altering the activity of several key transcription factors. This work will be of broad interest to immunologists.

      The authors express their sincere appreciation to Timothy Behrens (Senior Editor), the Reviewing Editor, and the original reviewers for their considerate and favorable assessment of our manuscript.

      Reviewer #1 (Public Review):

      Summary:

      Otarigho et al. presented a convincing study revealing that in C. elegans, the neuropeptide Y receptor GPCR/NPR-15 mediates both molecular and behavioral immune responses to pathogen attack. Previously, three npr genes were found to be involved in worm defense. In this study, the authors screened mutants in the remaining npr genes against P. aeruginosa-mediated killing and found that npr-15 loss-of-function improved worm survival. npr-15 mutants also exhibited enhanced resistance to other pathogenic bacteria but displayed significantly reduced avoidance to S. aureus, independent of aerotaxis, pathogen intake and defecation. The enhanced resistance in npr-15 mutant worms was attributed to upregulation of immune and neuropeptide genes, many of which were controlled by the transcription factors ELT-2 and HLH-30. The authors found that NPR-15 regulates avoidance behavior via the TRPM gene, GON-2, which has a known role in modulating avoidance behavior through the intestine. The authors further showed that both NPR-15-dependent immune and behavioral responses to pathogen attack were mediated by the NPR-15-expressing neurons ASJ. Overall, the authors discovered that the NPR-15/ASJ neural circuit may regulate distinct defense mechanisms against pathogens under different circumstances. This study provides novel and useful information to researchers in the fields of neuroimmunology and C. elegans research.

      The authors are grateful for the thoughtful and insightful comments on our manuscript. Your feedback has been instrumental in refining our work, and we appreciate the time and expertise you have invested in evaluating our study.

      Strengths:

      1) This study uncovered specific molecules and neuronal cells that regulate both molecular immune defense and behavioral defense against pathogen attack, and indicates that the same neural circuit may regulate distinct defense mechanisms under different circumstances. This discovery is significant because it not only reveals regulatory mechanisms of different defense strategies but also suggests how C. elegans utilizes its limited neural resources to accomplish complex regulatory tasks.

      The authors express gratitude to the reviewer for recognizing that the present study revealed specific molecules and neuronal cells involved in regulating both molecular immune defense and behavioral defense against pathogen attacks. Additionally, the acknowledgment that the same neural circuit may oversee distinct defense mechanisms under different circumstances is appreciated.

      2) The conclusions in this study are supported by solid evidence, which is often derived from multiple approaches and/or experiments. Multiple pathogenic bacteria were tested to examine the effect of NPR-15 loss-of-function on immunity; the impacts of pharyngeal pumping and defecation on bacterial accumulation were ruled out when evaluating defense; RNA-seq and qPCR were used to measure gene expression; and gene inactivation was done in multiple strains to assess gene function.

      The authors thank the reviewer for appreciating that this study is supported by solid evidence.

      3) Gene differential expression, gene ontology, and pathway analyses were performed to demonstrate that NPR-15 controls immunity by regulating immune pathways.

      The authors thank the reviewer for appreciating the differential gene expression, gene ontology, and pathway analyses performed in the study.

      4) Elegant approaches were employed to examine avoidance behavior (partial lawn, full lawn, and lawn occupancy) and the involvement of neurons in regulating immunity and avoidance (the use of a diverse array of mutant strains).

      The authors thank the reviewer for appreciating the approaches used in this study.

      5) Statistical analyses were appropriate and adequate.

      The authors thank the reviewer for appreciating the statistical analyses used in this study.

      Reviewer #2 (Public Review):

      Summary:

      The authors are studying the behavioral response to pathogen exposure. They and others have previously described the role that G protein-coupled receptors in the nervous system play in detecting pathogens and initiating behavioral patterns (e.g., avoidance/learned avoidance) that minimize contact. The authors study this problem in C. elegans, which is amenable to genetic and cellular manipulations and allows the authors to define cellular and signaling mechanisms. This paper extends the original idea to implicate signaling and transcriptional pathways within a particular neuron (ASJ) and the gut in mediating avoidance behaviour.

      Strengths:

      The work is rigorous and elegant, and the data are convincing. The authors make superb use of mutant strains in C. elegans, as well as tissue-specific gene inactivation and expression, and genetic methods of cell ablation, to demonstrate how a gene, npr-15, controls behavioral changes during pathogen infection. The results suggest that ASJ neurons and the gut mediate such effects. I expect the paper will constitute an important contribution to our understanding of how the nervous system coordinates immune and behavioral responses to infection.

      The authors sincerely thank the reviewer for the thoughtful and positive review of our manuscript. We greatly appreciate the time and effort you dedicated to evaluating our work, and we are pleased that you find our study to be a rigorous and elegant contribution to the understanding of behavioral responses to pathogen exposure.

      Reviewer #1 (Recommendations For The Authors):

      The authors have adequately addressed my concerns and questions. I have no more comments or recommendations for the authors.

      The authors thank the reviewer for the constructive comments on the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The authors have adequately addressed my concerns.

      The authors express their appreciation to the reviewer for the valuable and constructive comments provided on the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript entitled 'Safb1 regulates cell fate determination in adult neural stem cells by enhancing Drosha cleavage of NFIB mRNA' by Iffländer et al, represents a solid piece of work addressing a non-canonical function of Drosha on NFIB mRNA processing via a newly identified Drosha partner, Safb1. The authors provide particularly systematic and convincing evidence on the biochemical interactions among the key players in this cascade. However, the significance of these interactions for NSC fate determination is not adequately supported by the data, hence, I have some remarks that would need to be addressed in order to clarify the impact of these events on NSC biology.

      1) One of my main concerns is related to the nature of the DG NSCs used in all in vitro assays. The authors refer to their previous work on how these cells are isolated using a Hes5 mouse reporter line. However, both recent scRNAseq data (http://linnarssonlab.org/dentate/ from Hochgerner et al) and the authors' own immunostainings (Fig. 7A), clearly show that Hes5 does not label only adult NSCs in the DG, but also (if not primarily) astrocytes. Considering that the initial cultures could contain a high proportion of mature astrocytes, most of the major conclusions and hypotheses should be reformulated.

      We thank the reviewer for their comment. We think that there is a misunderstanding about how the DG neural stem cells were isolated and cultured. In this manuscript we did not use the Hes5::GFP allele to isolate the stem cells. We isolated DG neural stem cells from C57Bl6 mice according to the protocol of Babu et al. (Babu et al. 2007 doi: 10.1371/journal.pone.0000388) and maintained and differentiated them as described in our previous manuscripts (Ronaldo et al. 2016). This was not clear in the methods section of the original manuscript; therefore, we have added the reference to Babu et al. To address potential contamination with astrocytes, we have added images of the stem cells and their progeny immunostained for astrocytic markers (GFAP and S100b) in undifferentiated and differentiated states. These new data show that these neurogenic cells and their progeny do not express astrocytic markers until differentiation is induced.

      2) Along these lines, Safb1 expression is quite widespread in the mouse DG (Fig. 7A) and does not display any specificity towards any type of progenitor cells compared to its expression in DGCs within the GCL. The authors should discuss this and integrate this expression information into their conclusions and interpretations, highlighting all pertinent limitations.

      We appreciate and agree with the reviewer’s comment. SAFB1 is indeed broadly expressed by most, if not all, cells in the hippocampus. We quantified SAFB1 expression levels across progenitors, astrocytes, and neurons in the adult DG and in the SVZ, and showed that SAFB1 levels differ across neural stem cell populations and neural cell types. We believe that our data show, both in vitro and in vivo, that SAFB1 levels are critical for determining the function of SAFB1 in regulating neural stem cell fate. We also showed that elevating SAFB1 levels in SVZ-derived neural stem cells suppresses their differentiation into oligodendrocytes; we have made this clearer in the text. However, how cells sense SAFB1 levels remains to be shown, and it is difficult to speculate on the mechanism.

    1. Author Response

      Reviewer #1 (Public Review):

      In this analysis derived from the BLADE study, a Phase IV investigation using the LHRH antagonist Degarelix, the authors revealed additional insights into the relationship between FSH and body composition.

      The primary strength of the study lies in its prospective nature and the utilization of human subjects.

      We thank the reviewer for the positive evaluation.

      However, some weaknesses exist in the study.

      First, the authors presented results from a simple correlation study without accounting for potential confounding factors in fat metabolism. Particularly, readers may be intrigued to understand how testosterone or estradiol interact with FSH in relation to fat mass.

      As for circulating levels of testosterone and estradiol, unfortunately the protocol did not include measurement of these hormones. The evaluation of testosterone, in particular, would have required mass spectrometry, as testosterone levels during degarelix therapy fall below the sensitivity of the assays used in clinical practice. Therefore, a correlation/association analysis between testosterone and body composition would not have been reliable and would not have been useful for the study. All patients were considered hypogonadal on the basis of the significant decrease in PSA values and the limited testosterone data available.

      The inverse relationship between ALMI/FBM and FSH was previously documented in a paper by the same group (Palumbo et al, Prostate Cancer Prostatic Dis 2021). In that earlier publication, the authors reported no correlation between FSH and lean mass or ALMI, suggesting that the correlation between FSH and ALMI/FBM arises from changes in fat body mass (a factor somehow not included in the prior paper) rather than necessarily from sarcopenia.

      The referee is correct, as there is no correlation between lean mass and FSH, nor between lean mass variations and FSH variations. The correlation between ALMI/FBM and FSH is mostly due to the effect on fat mass. The text now includes a statement that emphasizes this concept (see Discussion page 8, lines 19-22).

      Reviewer #2 (Public Review):

      This manuscript reports the results of an ancillary study of a prospective trial assessing the effects of androgen deprivation therapy (ADT) with Degarelix (a GnRH antagonist) on body composition in patients with prostate cancer. An interesting relationship between FSH levels, which were suppressed by Degarelix treatment, and body composition parameters (particularly fat body mass) was described after 12 months of therapy. The authors therefore conclude that FSH could be a promising marker to monitor the risk of sarcopenic obesity and cardiovascular complications in prostate cancer patients undergoing ADT. As acknowledged by the authors, the main limitation of the study is the small patient sample. Moreover, since testosterone levels were not assessed, it is not possible to firmly establish whether the changes in fat mass observed with treatment are directly or indirectly (in the latter case, via testosterone) associated with a reduction in FSH. It is also not clear whether the relationship between the change in FSH levels during the study and the body composition parameters achieved at 12 months was evaluated (instead of the relationship between FSH changes and changes in body composition parameters). Finally, tests of skeletal muscle mass and strength were not performed, so the hypothesis that variation in FSH levels in prostate cancer patients on ADT may affect sarcopenia remains speculative.

      We appreciate the reviewer's positive assessment of our manuscript. As requested by the reviewer, we evaluated the correlation between FSH changes and body composition values after 12 months of Degarelix. No significant correlation was observed (see the attached table); therefore, we have decided not to include this additional statistical analysis in the revised paper.

    1. Author Response

      Reviewer #1 (Public Review):

      Using an HFD mouse model, the authors examined the H3K4me3 mark in sperm and placental tissues, followed by correlation with transcriptomic changes in the placental tissues of male and female offspring. The hypothesis the authors tested was that sperm histone epimutations affect placental function, thereby leading to metabolic disorders in offspring. The strengths of this work include the interesting idea and the initial data generated. However, the entire study remains purely correlative, without any validation experiment to support the correlation. The conclusion needs to be further supported by a larger sample size and more functional analyses demonstrating the causal relationship among the histone epimutations detected, the dysregulated mRNA expression in the placenta, and the phenotypes in offspring.

      Functional data: We appreciate that we should have emphasized and stated more clearly that we had indeed phenotyped the placentas and offspring metabolic health in the same model from which the placental tissue was derived, as we reported in (Jazwiec et al., 2022) (PMID: 35377412). This was referenced in our submitted manuscript (Lines 105-107; 131-133; 135-139; 147-150; 232-235; 270-273; 297-300; 384-386; 433-435; 441-448; 507-514). We have made this more apparent in the manuscript by expanding our description of the offspring phenotypes in the introduction and by clarifying that the placentas used in this study were derived from this model (Jazwiec et al., 2022) (PMID: 35377412).

      Regarding effect and sample size: It appears that, on review, the reviewers confused the number of animals used for ChIP-seq with the number of replicates. These details were in Supplementary file 1a. There were 3 replicates per experimental group, and each replicate consisted of pooled sperm, equalized in cell number, from n=7 control males or n=16 HFD males. For RNA-seq, n=4 placentas were used per experimental group for each sex (male and female), for a total N of 16. Although the sample size is moderate, we followed the Canadian Council on Animal Care guidelines, which call for the use of the lowest number of animals that elicits significant effects (CCAC guidelines p6 “Consideration must also be given to reduction, to determine the fewest number of animals appropriate to provide valid information and statistical power, while still minimizing the welfare impact for each animal”).

      Validation: We used high-standard computational validation and visualization strategies to ensure confidence in the genomic data. This also allowed a comprehensive understanding of the biological and physiological impacts of paternal obesity on the sperm epigenome and placenta transcriptome. Our experimental design also included biological and technical replicates. Together, these methods provide robustness checks of the experimental data and support our conclusions. These are the validation strategies we used:

      Technical and experimental validation

      • We evaluated the quality of sequencing data using metrics of read quality, alignment and coverage. These are summarized in Supplementary file 1a.

      • We visualized the data and performed statistical analyses to check for anomalies and discrepancies. Pearson correlation analysis, shown as heatmaps, was used to look for variance and patterns across samples; all samples were highly correlated (Figure 2 – Figure supplement 1 B and Figure 4 – Figure supplement 1 A). We checked for batch effects and normalized the data (Figure 4 – Figure supplement 1 B), and we used PCA as a second check for samples behaving oddly (Figure 2 – Figure supplement 1 C and Figure 4 – Figure supplement 1 C).

      • We used a deconvolution approach to improve the biological meaning of our bulk RNA-seq data (Figure 6, Figure 5 – Figure supplement 1 and 2).

      • We performed functional enrichment analysis to gain insight into biological functions, pathways, and gene ontology, and visualized individual regions identified as altered for confirmation (Figure 2 D and 2 E; Figure 4 E and F; Figure 6, Figure 2 – Figure supplement 1 E; Figure 3 – Figure supplement 1).

      Comparison to external data sets

      • We compared our data with external data sets from the same tissues and cell types, and with our prior studies: a) we compared ChIP-seq data from this obesity model with our former obesity ChIP-seq data (Figure 2 – Figure supplement 1); and b) we re-analyzed and compared placenta RNA-seq data from an in utero hypoxia exposure model that shared offspring and placenta phenotypes similar to those we observed in the obesity model (Figure 6 and Figure 6 – Figure supplement 1).

      Statistical significance and false discovery rate (FDR)

      • We applied statistical tests and multiple testing corrections to reduce the likelihood of false positives (see also response 1 for additional testing added to the revised manuscript).

      Causation versus correlation: We agree that the relationship between the sperm epigenome and placenta transcriptome is correlative; however, this reflects the current state of the field for studies of paternal epigenetic transmission of environmental information. Taking this study to the point where causation can be implied would require generating a sperm epigenome-edited mouse model in which we target genes implicated in placental function. Indeed, this targeting approach is well underway in our research program.

      Reviewer #2 (Public Review):

      This study follows up on previous work from this group, and others, relating paternal diet to changes in sperm epigenetics, and offspring phenotypes. The authors focus on paternal diet (high-fat diet versus a control chow), sperm chromatin, and molecular changes in the placenta associated with offspring development.

      The text is well written and the figures are generally well presented and clear. The sperm epigenetic analyses and analysis of the placenta epigenetics and gene expression are generally well performed. The study provides new insight into how paternally mediated intergenerational epigenetic inheritance could involve placenta-embryo signaling.

      A major weakness is that the high-fat diet used was from a different manufacturer than the control (lower fat) diet. Therefore, it is difficult to judge whether the effects are due to a change in fat levels, or the many other molecules that are likely to differ in chow between different manufacturers. Other weaknesses include lack of methodological detail in parts, low n values for some experiments, and the need for more mechanistic data.

      Diets: We note that we are studying the effects of obesity and not of diet. Indeed, the HFD induces metabolic dysfunction while the control diet does not. Although it is fair to point out that the composition of the control diet should be kept in mind, the diets elicited the phenotypic effects desired within the scope of the study, serving as a model of obesity. We see this experimental design as a strength: in this study we compared this model to our previously published obesity model (Pepin, Lafleur, Lambrot, Dumeaux, & Kimmins, 2022) (PMID: 35183795), and there was significant overlap in the regions of differential enrichment detected in both models, even though they were conducted in different research settings, with different mouse substrains and different diet combinations. In our opinion, this demonstrates that we are measuring robust effects of paternal obesity that can be replicated under different conditions. This comparative study design has been lacking in the field of epigenetic inheritance.

      Animal numbers and replicates: It appears that, on review, the reviewers confused the number of animals used for ChIP-seq with the number of replicates. These details were in Supplementary file 1a. There were 3 replicates per experimental group, and each replicate consisted of pooled sperm, equalized in cell number, from n=7 control males or n=16 HFD males. For RNA-seq, n=4 placentas were used per experimental group for each sex (male and female), for a total N of 16. Although the sample size is moderate, we followed the Canadian Council on Animal Care guidelines, which call for the use of the lowest number of animals that elicits significant effects (CCAC guidelines p6 “Consideration must also be given to reduction, to determine the fewest number of animals appropriate to provide valid information and statistical power, while still minimizing the welfare impact for each animal”).

      Whilst the authors may have achieved their aims, more data is needed to inform a potential mechanism.

      It is difficult in studies of paternal epigenetic inheritance to attribute a mechanism, and we agree that the relationship between the obesity-altered sperm epigenome and the placental abnormalities is correlative. However, the novelty of our study is that we postulate a new mechanism for paternal transmission of metabolic disease that implicates the placenta, and we demonstrate this via an altered placenta transcriptome and placental developmental abnormalities described here and in our previous paper on this model (Jazwiec et al., 2022; PMID: 35377412). The next steps for the field to address causation/mechanism require generation of a sperm epigenome-edited mouse model in which we induce histone methylation changes at specific genes and track them into the tissues of the next generation. Indeed, this targeting approach is underway in our research program.

      Reviewer #3 (Public Review):

      This study represents a useful addition to the authors' previous study examining the effects of a paternal high-fat diet on offspring metabolism and gene expression (PMID: 35183795). It differs from the previous study in some details of the experimental model (age of sire when exposed to the diet manipulation, mouse substrain, and the nature of the control diet), and the results are largely in line with previous findings. The major finding is that many genes at which the sperm H3K4me3 signal is altered also have altered expression in the placenta; some of these genes are paternally imprinted, providing a paternal-specific epigenetic signature. Strengths of the study include the establishment of an important dataset correlating the sperm epigenome with gene expression in placental tissue, leading to an interesting and provocative conclusion. Weaknesses include a relatively superficial analysis of the dataset, revealing broad patterns but few specific conclusions, reliance on correlative analysis to draw conclusions, and the absence of validation studies. Deconvolution analysis of bulk RNA-seq data helps to account for differences in cell composition between placental datasets, but does not add further insight into the central question of how sperm epigenetic state contributes to offspring gene expression. Overall, the advance over previous work is relatively small.

      Specific points:

      1) The analysis as it stands is limited. To compare sperm H3K4me3 and placental expression, numbers of overlapping genes are provided, but no statistical analysis is done to indicate the significance of the overlap.

      We have added Fisher’s exact tests of the overlap between paternal obesity-associated differentially enriched regions of H3K4me3 (deH3K4me3) and the differentially expressed genes in female and male placentas (Figure 4 – Figure supplement 1 Di and ii).

      2) There is little direct connection to biological systems or validation of differential enrichment/expression analysis. Gene ontology enrichments for genes differentially enriched for H3K4me3 in sperm or differentially expressed in placenta (broken up by sex) are performed, but the biological significance of these categories is not clear.

      We used high-standard computational validation and visualization strategies to ensure confidence in the genomic data. This also allowed a comprehensive understanding of the biological and physiological impacts of paternal obesity on the sperm epigenome and placenta transcriptome. Our experimental design also included biological and technical replicates. Together, these methods provide robustness checks of the experimental data and support our conclusions. The validation strategies we used are detailed in response 17.

      We revised the text to expand discussion on the observed enriched gene ontology terms, as well as the biological significance and functions of the genes we refer to in this section:

      Lines 222-227: “The placenta is a rich source of hormone production, is highly vascularized, and secretes neurotransmitters (Hemberger, Hanna, & Dean, 2020; Rosenfeld, 2021). Disruption in these functions is suggested in the significantly enriched pathways that included genes involved in the transport of cholesterol, angiogenesis, and neurogenesis (Figure 4 C-D, Supplementary file 1e-f). Other significantly enriched processes included genes implicated in nutrient and vitamin transport (Figure 4 C-D).”

      Lines 441-463: “Many of the DEGs in the paternal obese-sired placentas were involved in the regulation of the heart and brain. This is in line with the association of paternal obesity with the developmental origins of neurological, cardiovascular, and metabolic disease in offspring (Andescavage & Limperopoulos, 2021; Binder, Beard, et al., 2015; Binder et al., 2012; Chambers et al., 2016; Cropley et al., 2016; de Castro Barbosa et al., 2016b; T. Fullston et al., 2012; Tod Fullston et al., 2013; Grandjean et al., 2015; Huypens et al., 2016; Jazwiec et al., 2022; Mitchell, Bakos, & Lane, 2011; Ng et al., 2010; Pepin et al., 2022; Perez-Garcia et al., 2018; Terashima et al., 2015; Thornburg et al., 2016; Thornburg & Marshall, 2015; Ueda et al., 2022; Wei et al., 2014). The brain-placenta and heart-placenta axes refer to their developmental linkage to the trophoblast, which produces various hormones, neurotransmitters, and growth factors that are central to brain and heart development (Parrettini, Caroli, & Torlone, 2020; Rosenfeld, 2021). This is further illustrated in studies where placental pathology is linked to cardiovascular and heart abnormalities (Andescavage & Limperopoulos, 2021; Thornburg et al., 2016; Thornburg & Marshall, 2015). For example, in a study of the relationship between placental pathology and infant neurodevelopment, possible hypoxic conditions were a significant predictor of lower Mullen Scales of Early Learning scores (Ueda et al., 2022). A connecting factor between the neural and cardiovascular phenotypes is the neural crest cells, which make a critical contribution to the developing heart and brain (Hemberger et al., 2020; Perez-Garcia et al., 2018). Notably, neural crest cells are of ectodermal origin, which arises from the TE (Prasad, Charney, & García-Castro, 2019), which is in turn governed by paternally-driven gene expression.
It is worth considering the routes by which TE dysfunction may be implicated in the paternal origins of metabolic and cardiovascular disease. First, altered placenta gene expression beginning in the TE could influence the specification of neural crest cells, which are a developmentally adjacent cell lineage in the early embryo. TE signaling to neural crest cells could alter their downstream function. Second, altered trophoblast endocrine function will influence cardiac and neurodevelopment (Hemberger et al., 2020).”

      3) The overall effect size is small. In most cases the magnitude of differences is minor, and it is not clear which of these changes are significant over noise. For example, the y-axis for the metagene plots in Figure 2B does not start at zero, so the total range of the difference in H3K4me3 is small. In Figure 6C, DEGs detected in hypoxic placenta after deconvolution analysis do not look very different compared to control.

      Thank you for pointing out that the scales were different in Figure 2 Bi and ii. They have been revised to show the same Y-axis scale beginning at zero for regions that gained and lost H3K4me3, making the differences in H3K4me3 more readily visible. The heatmap in Figure 6 C visualizes the DEGs in hypoxic vs control placenta; 1477 DEGs were identified in our re-analysis using a deconvolution approach applied to the bulk RNA-seq data set from Chu et al., 2019. We do not share the view that they are poorly visualized in the heatmap.

      4) Deconvolution analysis was done on bulk RNA-seq data from placenta, and the numbers of DEGs identified with this analysis compared to the original analysis are shown, but is not clear how the deconvolution analysis changes the specific biological conclusions. In addition, the reference dataset for deconvolution is a published dataset generated in another lab, and it is unclear how comparable the reference sample is to the samples analyzed in this study, or how robust this analysis is when using a dataset generated under different conditions.

      The deconvolution analysis allows us to infer the cellular composition of a tissue; it suggests that there are changes in cell-type proportions that could alter placental function, and it improves the detection of differentially expressed genes (Aliee & Theis, 2021; Campbell et al., 2023; Kuhn, Thu, Waldvogel, Faull, & Luthi-Carter, 2011) (PMID: 34293324; 36914823; 21983921).

      Regarding the published dataset used as the reference for the deconvolution analysis, we consider it well suited: we specifically chose it because the tissue of origin, mouse strain, and developmental time points matched those of our samples and of the samples used in the Chu et al., 2019 analysis. We used the Chu et al., 2019 data set for comparative validation and to further explore whether the biological effects of paternal obesity resembled those of a hypoxic placenta. We have revised the text to show more clearly the biological relevance and interpretation of this analysis (see author response 12).

      We revised the text to clarify the biological implications of this analysis:

      Lines 282-290: “This reduction in the number of detected DEGs before versus after accounting for cellular composition suggests that changes in cell-type proportions at least partly drive tissue-level differential expression. This is consistent with the recent finding that preeclampsia-associated cellular heterogeneity in human placentas mediates previously detected bulk gene expression differences (Campbell et al., 2023). There were similarities between the bulk RNA-seq and deconvoluted analysis in that there was overlap of DEGs detected before and after adjusting for cell-type proportions (Figure 5 – Figure supplement 3 G and H, Fisher’s exact test P=1.8e-105 and P=0e+00, respectively). This differential gene expression analysis accounting for cellular composition provides insight into how paternal obesity may impact placental development and function and underscores the contribution of cellular heterogeneity in this process.”

      Reviewer #4 (Public Review):

      The members of the Kimmins lab performed a dietary study in mice to investigate the impact of paternal obesity on the development of offspring. To do so, they exposed male mice to a high-fat diet, determined the distribution and occupancy levels of the histone H3 lysine 4 trimethylation (H3K4me3) mark in spermatozoa, and performed gene expression studies on placental tissue obtained from mouse embryos at mid-gestation. The authors report changes in H3K4me3 occupancy in sperm as well as in the transcriptomes of placentas of male and female embryonic offspring. While the authors perform extensive computational analysis of the transcriptomic and chromatin immunoprecipitation data, they do not go much beyond making correlative statements, mainly at the genome-wide level, between H3K4me3 changes in sperm and transcriptional changes in placenta, the latter of which are in part related to changes in cellular composition (as deduced from transcriptional data). Given that both parental mice had the same genetic background, it was not possible to deduce parent-specific contributions to the transcriptional changes observed in placentas of offspring. In all, the study falls short of increasing mechanistic insight into this important biological phenomenon.

      It is difficult in studies on paternal epigenetic inheritance to attribute a mechanism, and we agree that the relationship between the obesity-altered sperm epigenome and the placental abnormalities is correlative. However, the novelty in our study is that we postulate a new mechanism for paternal transmission of metabolic disease that implicates the placenta, and demonstrate this via an altered placenta transcriptome and placenta developmental abnormalities described here and in our previous paper on this model (Jazwiec et al., 2022; PMID: 35377412). The next step for the field in addressing causation/mechanism requires generation of a sperm-epigenome-edited mouse model in which we induce histone methylation changes at specific genes and track them to the tissues of the next generation. Indeed, this targeting approach is underway in our research program.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Benner et al. identify OVO as a transcriptional factor instrumental in promoting the expression of hundreds of genes essential for female germline identity and early embryo development. Prior data had identified both ovo and otu as genes activated by OVO binding to their promoters. By combining ChIP-seq, RNA-seq, and analysis of prior datasets, the authors extend these data to hundreds of genes and therefore propose that OVO is a master transcriptional regulator of oocyte development. They further speculate that OVO may function to promote chromatin accessibility to facilitate germline gene expression. Overall, the data compellingly demonstrate a much broader role for OVO in the activation of genes in the female germline than previously recognized. By contrast, the relationship between OVO, chromatin accessibility, and the timing of gene expression is only correlative, and more work will be needed to determine the mechanisms by which OVO promotes transcription.

      We fully agree with this summary.

      Strengths:

      Here Benner et al. convincingly show that OVO is a transcriptional activator that promotes expression of hundreds of genes in the female germline. The ChIP-seq and RNA-seq data included in the manuscript are robust and the analysis is compelling.

      Importantly, the set of genes identified is essential for maternal processes, including egg production and patterning of the early embryo. Together, these data identify OVO as a major transcriptional activator of the numerous genes expressed in the female germline, deposited into the oocyte and required for early gene expression. This is an important finding as this is an essential process for development and prior to this study, the major drivers of this gene expression program were unknown.

      We are delighted that this aspect of the work came across clearly. Understanding the regulation of maternal effect genes has been something of a black box, despite the importance of this class of genes in the history of developmental genetics. The repertoire of essential oogenesis/embryonic development genes that are bound by and respond to OVO is well characterized in the literature, but nothing is known about how these genes are transcriptionally regulated. We feel the manuscript will be of great interest to readers working on these genes.

      Weaknesses:

      The novelty of the manuscript is somewhat limited as the authors show that, like two prior, well-studied OVO target genes, OVO binds to promoters of germline genes and activates transcription. The fact that OVO performs this function more broadly is not particularly surprising.

      Clearly, transcription factors regulate more than one or two genes. Nevertheless, we were surprised at how many genes involved in oogenesis per se, and how many maternal effect genes, were OVO targets. It was our hypothesis that OVO would have a transcriptional effect genome-wide; however, it was less clear whether OVO would always bind at the core promoter, as is the case for ovo and otu. Our results strongly support the idea that core promoter proximal binding is essential for OVO function, a conclusion of work done decades ago that has not been revisited using modern techniques.

      A major challenge to understanding the impact of this manuscript is the fact that the experimental system for the RNA-seq, the tagged constructs, and the expression analysis that provides the rationale for the proposed pioneering function of OVO are all included in a separate manuscript.

      This is a case where we ended up with a very, very long manuscript which included a lot of revisiting of legacy data. It was a tough decision on how to break up all the work we had completed on ovo to date. In our opinion, it was too much to put everything into a single manuscript unless we wanted a manuscript-length supplement (we were also worried that supplemental data is often overlooked and sometimes poorly reviewed). We therefore decided to split the work into a developmental localization/characterization paper and a functional genomics paper. As it stands, both papers are long. Certainly, readers of this manuscript will benefit from reading our previous OVO paper, which we submitted before this one. The earlier manuscript is under revision at another journal, and we hope that its improved version will be published and accessible shortly.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Benner et al. interrogate the transcriptional regulator OVO to identify its targets in the Drosophila germline. The authors perform ChIP-seq in the adult ovary and identify established as well as novel OVO binding motifs in potential transcriptional targets of OVO. Through additional bioinformatic analysis of existing ATAC-seq, CAGE-seq, and histone methylation data, the authors confirm previous reports that OVO is enriched at transcription start sites and suggest that OVO does not act as part of the core RNA polymerase complex. Benner et al. then perform bulk RNA-seq in OVO mutant and "wildtype" (GAL4-mediated expression of OVO under the control of the ovo promoter in OVO mutants) ovaries to identify genes that are differentially expressed in the presence of OVO. This analysis supports previous reports that OVO likely acts at transcription start sites as a transcriptional activator. While the authors propose that OVO activates the expression of genes that are important for egg integrity, maturation, and for embryonic development (nanos, gcl, pgc, bicoid), this hypothesis is based on correlation and is not supported by in vivo analysis of the respective OVO binding sites in some of the key genes. A temporal resolution for OVO's role during germline development and egg chamber maturation in the ovary is also missing. Together, this manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis but lacks important in vivo experimental evidence that would validate the high-quality datasets.

      We thank reviewer 2 for the appreciation of the genomics data and analysis. Some of the suggested in vivo experiments are clear next steps, which are well underway. These are beyond the scope of the current manuscript.

      Temporal analysis of ovo function in egg chamber development is not easy, as only the weakest ovo alleles have any egg chambers to examine. However, we will also point out the long-known phenotypes of some of those weak alleles in the text (e.g. ventralized chambers in ovoD3/+). We will need better tools for precise rescue/degradation during egg chamber maturation.

      Strengths:

      The manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis.

      Thank you. We went to great lengths to do our highly replicated experiments in multiple ways (e.g. independent pull-down tags) and spent considerable time coming up with an optimized and robust informatic analysis.

      Weaknesses:

      1) The authors propose that OVO acts as a positive regulator of essential germline genes, such as those necessary for egg integrity/maturation and embryonic/germline development. Much of this hypothesis is based on GO term analysis (and supported by the authors' ChIP-seq data). However accurate interpretation of GO term enrichment is highly dependent on using the correct background gene set. What control gene set did the authors use to perform GO term analysis (the information was not in the materials and methods)? If a background gene set was not previously specified, it is essential to perform the analysis with the appropriate background gene set. For this analysis, the total set of genes that were identified in the authors' RNA-seq of OVO-positive ovaries would be an ideal control gene set for which to perform GO term analysis. Alternatively, the total set of genes identified in previous scRNA-seq analysis of ovaries (see Rust et al., 2020, Slaidina et al., 2021 among others) would also be an appropriate control gene set for which to perform GO term analysis. If indeed GO term analysis of the genes bound by OVO compared to all genes expressed in the ovary still produces an enrichment of genes essential for embryonic development and egg integrity, then this hypothesis can be considered.

      We feel that this work on OVO as a positive regulator of genes like bcd, osk, nos, png, gnu, plu, etc., is closer to a demonstration than a proposition. These are textbook examples of genes required for egg and early embryonic development. Hopefully, this is not lost on the readers by an over-reliance on GO term analysis, which is required but not always useful in genome-wide studies.

      We used GO term enrichment analysis as a tool to help focus the story on some major pathways that OVO is regulating. To the specific criticism of the reference gene set, the GO term enrichment analysis in this work is robust to the choice of background gene set. We will update the GO term enrichment analysis text to indicate this fact, add a table to the manuscript that uses the genes expressed in our RNA-seq dataset as the background, and clarify gene set robustness in greater detail in the methods of the revision. We will also try to focus the reader’s attention on the actual target genes rather than the GO terms in the revised text.
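      To illustrate why the background set matters for the GO analysis in point 1, here is a minimal sketch of the one-sided hypergeometric test that GO enrichment tools commonly use. The function name and all gene counts are hypothetical illustrations, not values from the manuscript. The sketch only shows that a hypothetical 30 annotated genes among 200 OVO-bound genes give a weaker (larger) P value when the background is restricted to ovary-expressed genes, because both the background size and the number of annotated background genes change.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_drawn: int, k_annotated: int, n_background: int) -> float:
    """One-sided enrichment P value, P(X >= k), for X ~ Hypergeometric.

    n_drawn: genes in the query set (e.g. OVO-bound genes)
    k_annotated: background genes carrying the GO term
    n_background: size of the background (reference) gene set
    """
    total = comb(n_background, n_drawn)
    top = min(n_drawn, k_annotated)
    hits = sum(
        comb(k_annotated, x) * comb(n_background - k_annotated, n_drawn - x)
        for x in range(k, top + 1)
    )
    return hits / total


# Hypothetical counts: 30 of 200 bound genes carry the term in both analyses.
p_genome = hypergeom_upper_tail(30, 200, k_annotated=300, n_background=14000)
p_expressed = hypergeom_upper_tail(30, 200, k_annotated=250, n_background=8000)
print(f"genome background:    {p_genome:.2e}")
print(f"expressed background: {p_expressed:.2e}")
```

      A term that remains significant against the restricted, expressed-gene background is robust to background choice in the sense described in the response.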

      2) The authors provide important bioinformatic analysis of new and existing datasets that suggest OVO binds to specific motifs in the promoter regions of certain germline genes. While the bioinformatic analysis of these data is thorough and appropriate, the authors do not perform any in vivo validation of these datasets to support their hypotheses. The authors should choose a few important potential OVO targets based on their analysis, such as gcl, nanos, or bicoid (as these genes have well-studied phenotypes in embryogenesis), and perform functional analysis of the OVO binding site in their promoter regions. This may include creating CRISPR lines that do not contain the OVO binding site in the target gene promoter, or reporter lines with and without the OVO binding site, to test if OVO binding is essential for the transcription/function of the candidate genes.

      Exploring mechanism using in vivo phenotypic assays is awesome, so this is a very good suggestion. But it is not essential for this work: as has been pointed out in the reviews, in vivo validation of OVO binding sites has been comprehensively done for two target genes, ovo and otu. The “rules” appear similar for both genes. That said, we are already following up specific OVO target genes and the detailed mechanism of OVO function at the core promoter. We removed some of our preliminary in vivo figures from the already long current manuscript. We continue to work on OVO and expect to include this type of analysis in a new manuscript.

      3) The authors perform de novo motif analysis to identify novel OVO binding motifs in their ChIP-seq dataset. Motif analysis can be significantly strengthened by comparing DNA sequences within peaks, to sequences that are just outside of peak regions, thereby generating motifs that are specific to peak regions compared to other regions of the promoter/genome. For example, taking the 200 nt sequence on either side of an OVO peak could be used as a negative control sequence set. What control sequence set did the authors use as for their de novo motif analysis? More detail on this is necessary in the materials and methods section. Re-analysis with an appropriate negative control sequence set is suggested if not previously performed.

      We apologize for being unclear about the negative sequence controls in the methods. We used shuffled OVO ChIP-seq peak sequences as the background for the de novo motif analysis, which we will better outline in the methods of the revision. This is a superior background set of sequences, as it exactly balances GC content in the query and background sequences. We are not fond of the idea of using adjacent DNA that won’t be controlled for GC content and shadow motifs. Furthermore, the de novo OVO DNA binding motifs are clear, statistically significant variants of the previously characterized in vitro OVO DNA binding motifs (Lu et al., 1998; Lee and Garfinkel, 2000; Bielinska et al., 2005), which lends considerable confidence. We also show that OVO ChIP-seq read density is highly enriched at all our identified motifs, as well as at the in vitro motifs. We provide multiple lines of evidence, through multiple methods, that the core OVO DNA binding motif is 5’-TAACNGT-3’. We have high confidence in the motif data.
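      A minimal sketch of why shuffled peak sequences give a composition-matched background: a mononucleotide shuffle of each peak keeps its length and base counts, so GC content is matched exactly. The exact shuffling procedure used in the manuscript is not stated here; this plain per-peak shuffle and the example sequences (which merely contain the TAACNGT core) are assumptions, and many motif tools instead use k-mer-preserving (e.g. dinucleotide) shuffles to also retain local composition.

```python
import random


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def shuffled_background(peaks: list[str], n_per_peak: int = 1, seed: int = 0) -> list[str]:
    """Return n_per_peak mononucleotide shuffles of every peak sequence.

    Shuffling permutes bases within a sequence, so each background sequence
    has exactly the same length and base composition (hence GC content) as
    the peak it was derived from.
    """
    rng = random.Random(seed)
    background = []
    for seq in peaks:
        for _ in range(n_per_peak):
            bases = list(seq)
            rng.shuffle(bases)
            background.append("".join(bases))
    return background


# Hypothetical peak sequences containing the TAACNGT core.
peaks = ["GCTAACAGTTCG", "ATTAACGGTACC"]
background = shuffled_background(peaks, n_per_peak=3, seed=1)
for i, seq in enumerate(background):
    print(seq, gc_fraction(seq), gc_fraction(peaks[i // 3]))
```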

      4) The authors mention that OVO binding (based on their ChIP-seq data) is highly associated with increased gene expression (lines 433-434). How many of the 3,094 peaks (conservative OVO binding sites), and what percentage of those peaks, are associated with a significant increase in gene expression from the RNA-seq data? How many are associated with a decrease in gene expression? This information should be added to the results section.

      Not including the numbers of overlapping ChIP peaks and expression changes in the text was an oversight on our part. The relevant numbers (666 peaks overlapping genes that significantly increased in expression, a significant enrichment according to Fisher’s exact test; 564 peaks overlapping genes that significantly decreased in expression, a significant depletion according to Fisher’s exact test) are found in figure 4C and will be added to the text.
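      The enrichment and depletion calls above can be sketched as one-sided Fisher’s exact tests on a 2×2 table (gene overlapped by an OVO peak or not, versus gene significantly up-regulated or not). This is a minimal, dependency-free sketch; the table layout is an assumption, and the counts in the example are hypothetical rather than the figure 4C values. For real analyses, `scipy.stats.fisher_exact(table, alternative="greater")` (or `"less"`) performs the same one-sided tests.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int, alternative: str = "greater") -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left count X is hypergeometric.
    "greater" tests enrichment, P(X >= a); "less" tests depletion, P(X <= a).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def pmf(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    xs = range(a, hi + 1) if alternative == "greater" else range(lo, a + 1)
    return sum(pmf(x) for x in xs)


# Hypothetical table: rows = gene overlapped by an OVO peak (yes/no),
# columns = gene significantly up-regulated (yes/no).
table = (40, 60, 200, 1700)
print(fisher_one_sided(*table, alternative="greater"))
print(fisher_one_sided(*table, alternative="less"))
```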

      5) The authors mention that a change in endogenous OVO expression cannot be determined from the RNA-seq data due to the expression of the OVO-B cDNA rescue construct. Can the authors see a change in endogenous OVO expression based on the presence/absence of OVO introns in their RNA-seq dataset? While intronic sequences are relatively rare in RNA-seq, even a 0.1% capture rate of intronic sequence is likely to be enough to determine the change in endogenous OVO expression in the rescue construct compared to the OVO null.

      This is a good point. The GAL4 transcript is downstream of ovo expression in the hypomorphic ovoovo-GAL4 allele. We state in the text that there is a nonsignificant increase in GAL4 expression with ectopic rescue OVO, although the trend is positive. We calculated the RPKM of RNA-seq reads mapping to the intron between exons 3 and 4 of ovo-RA and found that there is also a nonsignificant increase in intronic RPKM with ectopic rescue OVO (we will add this to the results in the revision). We would expect OVO to be autoregulatory and potentially increase the expression of GAL4 and/or intronic reads, but the ovoovo-GAL4>UASp-OVOB is not directly autoregulatory like the endogenous locus. It is not clear to us how the intervening GAL4 activity would affect OVOB activity in the artificial circuit. Dampening? Feed-forward? Is there an effect on OVOA activity? Regardless, this result does not change our interpretation of the other OVO target genes.
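      For reference, RPKM normalizes a read count by feature length (per kilobase) and sequencing depth (per million mapped reads), which is what allows intronic counts from ovo-null and rescue libraries of different sizes to be compared. A minimal sketch follows; the read counts, intron length, and library sizes are hypothetical.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads.

    RPKM = read_count / (feature_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical intronic counts for one intron in two libraries of different depth.
intron_len = 2_500
null_rpkm = rpkm(12, intron_len, 20_000_000)
rescue_rpkm = rpkm(20, intron_len, 25_000_000)
print(null_rpkm, rescue_rpkm, rescue_rpkm / null_rpkm)
```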

      6) The authors conclude with a model of how OVO may participate in the activation of transcription in embryonic pole cells. However, the authors did not carry out any experiments with pole cells that would support/test such a model. It may be more useful to end with a model that describes OVO's role in oogenesis, which is the experimental focus of the manuscript.

      We did not complete any experiments in embryonic pole cells in this manuscript and base our discussion on the potential dynamics of OVO transcriptional control and our previous work showing maternal and zygotic OVO protein localization in the developing embryonic germline. Obviously, we are highly interested in this question and continue to work on the role of maternal OVO. We agree that we extended too far and will remove the embryonic germ cell model from the figure. We will instead focus on the possible mechanisms of OVO gene regulation in light of the evidence we have shown in the adult ovary, as suggested.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      This paper now provides a convincing presentation of valuable results of the drivers of nest construction for one termite species, and they briefly discuss possible relevance to other termite species. However, the authors have not yet addressed how their results may be important outside the field of termite nest construction. I could imagine the significance of the paper being elevated to important if there is a broader discussion about the impact of this work, e.g., the relevance of the results, the approach, and/or next steps to related fields outside of termite nest construction.

      Reading our manuscript again, we have to agree with the reviewer that we mostly focused the discussion of our results on termite construction, without attempting to generalise to other systems. To some extent we still defend this choice, as we prefer not to make too many claims about the relevance of our results beyond what we can reasonably support with our own experimental results. However, we thought that it would be appropriate, as suggested by the reviewer, to add at least one paragraph indicating how our results could be extrapolated to other systems. This new paragraph is now at the end of the discussion section.

      Here we elaborate a bit further on this point. First of all, while termites certainly build the most complex structures found in the natural world, there aren’t many other animals that are capable of collectively building complex structures. Typically, collective building activity is limited to highly social (typically eusocial) animals, but other social insects, such as ants and wasps, are phylogenetically distant from termites, and their nests are often different (the large majority of ant nests only comprise excavated galleries with little construction, while wasp nests tend to comprise multiple repeated patterns that could be produced from stereotyped individual behaviour). Because of these differences, drawing a comparison between the mechanisms that regulate termite architecture and those that regulate other forms of animal architecture would be too speculative. One domain, however, where mechanisms similar to those that we describe here could operate is that of pattern formation at the cellular and tissue level, where surface curvature has been shown to drive phenomena ranging from cell migration to tissue growth. A comment on this is now added at the very end of the discussion.

      Similarly, on a related note, as someone not directly in the field of termite nest construction but wanting to understand the system (and the results) presented here in a broader context, I found the additional information about species and natural habitat very helpful and interesting, though I was rather disappointed to find it relegated to supplementary material where most readers will not see it.

      We considered the suggestion to move more information about the natural nesting habits of the termites that we study into the main text, but eventually we decided to leave it as supplementary only. We feel that the nesting habits of the termites that we studied here are not central to the problem that we want to focus on, namely how they coordinate their building activity. In fact, there is a large variety of nesting habits across termite genera and species, but we believe that, at a basic level, the mechanisms that we describe here would also apply to species with different nesting habits, because our results are consistent with what is described in the scientific literature for other termite species. As our introduction is already a bit long, we left this description of Coptotermes nesting habits in the supplementary material, where, hopefully, it will still be accessible and useful to readers interested in finding this information.

      When providing responses to reviewers, please directly address the reviewers’ comments point-by-point rather than summarizing comments and responding to summaries.

      We apologize for the way we previously responded to comments and thank the reviewer for this remark as we learn to navigate the eLife reviewing system (where some comments are repeated in the overall assessment and in the feedback of individual reviewers).

      Figure 2 colors: Panels A and E and maybe B do not seem colorblind-friendly. I suggest modifying the colormaps to address this.

      We have changed the colormaps of panels A, B and E, which are now colorblind-friendly.

      Line 180: This system is not in equilibrium. Perhaps the authors mean "steady-state?" I suggest reviewing language to ensure that the correct technical terms are used.

      We have now corrected this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This valuable study, of interest for students of the biology of genomes, uses simulations in combination with published data to examine how many TADs remain after cohesin depletion. The authors suggest that a significant subset of chromosome conformations do not require cohesin, and that knowledge of specific epigenetic states can be used to identify regions of the genome that still interact in the absence of cohesin. The theoretical approaches and quantitative analysis are state-of-the-art, and the data quality and strength of the conclusions are solid. However, because "physical boundaries (of domains?)" in the model appear to be a consequence of preserved TADs, rather than the other way around, the functional insights are limited.

      Summary of the reviewer discussion for the authors:

      While the simulations are state of the art and the reviewers appreciated that the approaches used here might help to resolve apparent discrepancies between conclusions from single-cell and bulk/ensemble techniques to study chromosome conformation, the work would benefit from clarification of what precisely is meant with "physical boundaries" and from a comparison of CCM and HIPPS models to understand commonalities and differences between them. In addition, more discussion of the relation of the current work to previous studies, such as Schwarzer et al., 2017, and Nuebler et al., 2018, would elevate the work and make the key claims more compelling. Please see also the detailed comments from the expert reviewers.

      We thank the editor for the assessment and the reviewers for the incisive comments. We will address these comments one by one. In particular, we attempt to clarify the concept of “physical boundaries” and its relevance in our study. We hope our responses are satisfactory. We believe that our manuscript has benefited substantially from the revisions made in response to the reviewers’ comments.

      Below is our point-by-point response to the comments:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Jeong et al. investigate the prevalence and cause of TADs that are preserved in eukaryotic cells after cohesin depletion. The authors perform an extensive analysis of previously published Hi-C data, and find that roughly 15% of TADs are preserved in both mouse liver cells and in HCT-116 cells. They confirm previous findings that epigenetic mismatches across the boundaries of TADs can cause TAD preservation. However, the authors also find that not all preserved TADs can be explained this way. Jeong et al. provide an argument based on polymer simulations that "physical boundaries" in 3D structures provide an additional mechanism that can lead to TAD preservation. However, in its current form, we do not find the argumentation and evidence that leads to this claim to be fully compelling.

      Strengths:

      We appreciate the extensive statistical analysis performed by the authors on the extent to which TADs are preserved; this does seem like a novel and valuable contribution to the field.

      We thank the reviewer for a succinct and comprehensive summary of our work and for appreciating its value.

      Weaknesses:

      1) As the authors briefly note, the fact that compartmentalization due to epigenetic mismatches can cause TAD-like structures upon cohesin depletion has already been discussed in the literature; see for example Extended Data Figure 8 in (Schwarzer et al., 2017) or the simulation study (Nuebler et al., 2018). We are hence left with the impression that the novelty of this finding is somewhat overstated in this manuscript.

      It is unclear to us, from the results in Extended Data Figure 8, that the authors have shown that epigenetic mismatches cause TAD-like structures. As far as we can discern, the data, without quantitative analysis, show that new TAD-like structures that are not present in the wild type may appear when cohesin is deleted.

      The studies by Schwarzer et al. 2017 and Nuebler et al. 2018 are relevant to our own investigation, which we undertook after scrutinizing the experiments in Schwarzer et al. 2017 and the related work by Rao et al. in 2017 on a different cell line. In the summary of the Reviewer discussion, it is suggested that we discuss the relation to the experimental study by Schwarzer et al. 2017 and the computational work by Nuebler et al. 2018.

      (1) The results and the corresponding discussion in these two studies are different from (perhaps complementary to) our results. When referring to Extended Data Figure 8, Schwarzer and co-authors state in the main text, “The finer compartmentalization explains most of the remaining or new domains and boundaries seen in Nipbl Hi-C maps”. We are not 100% sure what “remaining” means in this context. Extended Data Fig. 8(a) shows that the “common boundaries” are correlated with the eigenvectors of compartmentalization. If this is indeed what the reviewer is referring to, we believe that our study differs from theirs in two important ways. First, Extended Data Fig. 8(a) is a statistical analysis at the “ensemble” level, whereas in our study we examined the preservation of TADs at both the individual and the ensemble level with more detailed analysis. Second, in Extended Data Fig. 8(a), the “common boundaries” (incidentally, we are uncertain how these were calculated) are compared to the eigenvectors of a PCA of the compartments (larger length scales). In contrast, in our study we examined the correlation between TAD boundaries and the epigenetic profiles. We believe that this is an important distinction. The PCA of compartments and the “common boundaries” defined (presumably) using the insulation score are both derived from the Hi-C contact map. The epigenetic profile, on the other hand, is independent of the Hi-C data. We believe our contribution is to build the connection between epigenetic profiles and the preservation of TADs, and to link it to 3D structures. For these reasons, we assert that our results are novel, and are not contained (or even implied) in the Schwarzer et al. 2017 study.
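      The distinction drawn here is that boundaries called from a Hi-C map are, by construction, a function of that map alone, whereas epigenetic profiles are measured independently. To make this concrete, here is a minimal sketch of an insulation-style boundary caller. The window convention, function names, and toy matrix are assumptions for illustration and are not necessarily the procedure used in either study.

```python
def insulation_scores(contacts: list[list[float]], w: int) -> dict[int, float]:
    """Mean contact frequency crossing each gap in a symmetric contact matrix.

    Gap g lies between bins g-1 and g. The score averages contacts between the
    w bins upstream (g-w .. g-1) and the w bins downstream (g .. g+w-1), so a
    low score means few contacts cross the gap.
    """
    n = len(contacts)
    scores = {}
    for g in range(w, n - w + 1):
        window = [contacts[a][b] for a in range(g - w, g) for b in range(g, g + w)]
        scores[g] = sum(window) / len(window)
    return scores


def call_boundaries(scores: dict[int, float]) -> list[int]:
    """Gaps whose score is strictly lower than both neighbouring gaps."""
    return [
        g for g in sorted(scores)
        if g - 1 in scores and g + 1 in scores
        and scores[g] < scores[g - 1] and scores[g] < scores[g + 1]
    ]


# Toy map: two 4-bin domains with dense internal contacts (10) and sparse
# contacts between them (1); the only boundary should be at gap 4.
n_bins = 8
toy = [[10.0 if (i < 4) == (j < 4) else 1.0 for j in range(n_bins)] for i in range(n_bins)]
scores = insulation_scores(toy, w=2)
print(scores)
print(call_boundaries(scores))
```

      Nothing in this calculation uses histone marks or other epigenetic data, which is why a correlation between such boundaries and compartment eigenvectors (both derived from the same map) answers a different question from a correlation between TAD boundaries and epigenetic profiles.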

      The simulations in Nuebler et al. 2018, which were undertaken to rationalize the experiments, revealed that compartmentalization of small segments is enhanced after cohesin depletion while TADs disappear, which supports the broad claims made in the experiments. They assert that the structures generated are non-equilibrium. They address neither the emergence of preserved TADs nor the observation of TAD-like structures at the single-cell level. Our goal, however, was to elucidate the reasons for the preservation of TADs, that is, why certain TADs are present in both wild-type and cohesin-depleted cells. Through detailed analyses of two cell types and polymer simulations, we have provided a structural basis to answer this question. Finally, by analyzing the Micro-C data, we have provided a plausible link between TAD preservation and the maintenance of enhancer-promoter interactions. For all these reasons, we believe that our study differs from the results in Extended Data Figure 8 and from the simulations described by Nuebler et al.

      Let us summarize the new results in our study that are not contained in the studies referred to by this Reviewer. (1) By analyzing the Hi-C data for mouse liver and HCT-116 cells, we showed that a non-negligible fraction of TADs is preserved, which set in motion our detailed investigation. (2) Then, using polymer simulations on a different cell type, we generated quantitative insights (epigenetic mismatches as well as a structural basis) into the preservation of TADs. Although not emphasized, we showed that deletion of cohesin in GM12878 cells also gives rise to P-TADs, a prediction that suggests that the observations might be “universal”. (3) Rather than performing time-consuming polymer simulations, we calculated 3D structures directly from the Hi-C data for the mouse liver and HCT-116 cells, which provided a structural basis for TAD preservation. (4) The 3D structures also showed how TAD-like features appear at the single-cell level, in accord with imaging experiments. (5) Finally, by calculating the 3D structures using the recent Micro-C data, we suggest that P-TADs may be linked to the maintenance of enhancer-promoter and promoter-promoter interactions.

      For the reasons given above, we assert that our results are novel, and bring new perspectives that are not in the aforementioned insightful studies cited by the Reviewer.

      2) It is not quite clear what the authors conceptually mean by "physical boundaries" and how this could offer additional insight into preserved TADs. First, the authors use the CCM model to show that TAD boundaries correlate with peaks in the single cell boundary probability distribution of the model. This finding is consistent with previous reports that TAD-like structures are present in single cells, and that specific TAD boundaries only arise as a population average.

      The finding based on the CCM simulations hence seems to be that preserved TADs also arise as a population average in cohesin-depleted cells, but we do not follow what the term "physical boundaries" refers to in this context. The authors then use the Hi-C data to infer a maximum-entropy-based HIPPS model. They find that preserved TADs often have boundaries that correspond to peaks in the single cell boundary probabilities of the inferred model. The authors seem to imply that these peaks in the boundary probability correspond to "physical boundaries" that cause the preservation of TADs. This argument seems circular; the model is based on inferring interaction strengths between monomers, such that the model recreates the input Hi-C map. This means that the ensemble average of the model should have a TAD boundary where one is present in the input Hi-C data. A TAD boundary in the Hi-C data would then seem to imply a peak in the model's single-cell boundary probability. (The authors do display two examples where this is not the case in Fig.3h, but looking at these cases by eye, they do not seem to correspond to strong TAD boundaries.) "Physical boundaries" in the model are hence a consequence of the preserved TADs, rather than the other way around, as the authors seem to suggest. At the very least the boundary probability in the HIPPS model is not an independent statistic from the Hi-C map (on which their model is constrained), so we have concerns about using the physical boundaries idea to understand where some of the preserved TADs come from.

      This long comment contains several statements that we address separately. First, using both the CCM simulations and, even more importantly, a data-driven approach, we established that TAD-like structures are present in single cells with and without cohesin. The latter finding is fully consistent with imaging experiments. We are unaware of other computational efforts, before our work, demonstrating that this is the case. Perhaps the Reviewer can point us to such papers in the literature.

      Regarding the statement that our arguments are circular, and the lack of clarity about the meaning of physical boundary, please allow us to explain. First, we apologize for the confusion; let us clarify our approach. We first used CCM to understand the potential origin of a substantial fraction of P-TADs in GM12878 cells. The simulations allowed us to generate plausible mechanisms for the origin of P-TADs. Because the CCM reproduces the Hi-C data, we surmised that the general mechanisms inferred from these simulations could be profitably used to analyze the experiments. The simulations also showed that knowledge of 3D structures provides a much-needed structural basis for P-TADs, and potentially for the emergence of new TADs upon cohesin depletion.

      Because 3D coordinates are needed to obtain structural insights into the role of cohesin, we used a novel method to obtain them without the need for simulations. In particular, we used the HIPPS method to obtain 3D coordinates with the Hi-C data as the sole input, which allowed us to calculate the boundary probabilities directly. The excellent agreement between the predicted 3D structures and imaging experiments suggests that meaningful information, not available in Hi-C, may be gleaned from the ensemble of calculated 3D structures.

      Although “physical boundary”, a notion introduced by Zhuang, is defined in the Methods section, its meaning was apparently unclear, for which we apologize. Because this is an important technical tool, we have added a summary to the main text in the revision. We did not mean to imply that physical boundaries cause the preservation of TADs, although we found that maintenance of enhancer-promoter contacts (see Fig. 8 in the revision) could be one of the potential reasons for the emergence of physical boundaries. We agree with the reviewer that physical boundaries are structural evidence of preserved TADs (not the cause): when a TAD is preserved, we can detect it by a prominent physical boundary. The purpose and benefit of physical boundary analysis, and of using HIPPS in general, is to obtain three-dimensional structures of chromosomes. Although both CCM simulations and HIPPS use Hi-C contact maps, three-dimensional structures provide additional information that is not present in the Hi-C data.
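
      To make the notion concrete, the following is a minimal sketch of how a physical boundary could be called in a single-cell distance matrix and then aggregated into a boundary probability across an ensemble of structures. It is modeled loosely on the single-cell domain analysis of Bintu et al. (2018). The median-ratio boundary strength, the flank size `window`, and the `threshold` are illustrative assumptions, not the definition used in the Methods section.

```python
import numpy as np

def boundary_strength(dist, window):
    """Boundary strength at each locus i of ONE single-cell distance matrix.

    The candidate boundary lies between loci i-1 and i. Strength is the median
    distance between the `window` loci on either side (inter-domain) divided
    by the median distance within each side (intra-domain); values well above
    1 suggest a physical boundary. Requires window >= 2. Positions where the
    window does not fit are left as NaN.
    """
    n = dist.shape[0]
    strength = np.full(n, np.nan)
    iu = np.triu_indices(window, k=1)  # distinct pairs inside one flank
    for i in range(window, n - window + 1):
        left = np.arange(i - window, i)
        right = np.arange(i, i + window)
        inter = np.median(dist[np.ix_(left, right)])
        intra = np.median(np.concatenate([dist[np.ix_(left, left)][iu],
                                          dist[np.ix_(right, right)][iu]]))
        strength[i] = inter / intra
    return strength

def boundary_probability(dists, window, threshold):
    """Fraction of single-cell matrices whose strength exceeds threshold at each locus."""
    calls = [boundary_strength(d, window) > threshold for d in dists]
    return np.mean(calls, axis=0)
```

      For an ensemble of sampled distance matrices, `boundary_probability(dists, window, threshold)` returns, for each locus, the fraction of cells with a boundary there. The resulting profile depends on the chosen window and threshold.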

      The arguments that the authors use to justify their claims could be clarified and strengthened. Here are some suggestions:

      • Explain the concept of "physical boundaries" more clearly in the main text.

      As explained above, we have revised the text to clarify the concept and purpose of physical boundary analysis (see page 7).

      • Justify why the boundary probabilities and the physical boundaries concept can be used to offer novel insight into where preserved TADs may come from.

      Boundary probabilities and physical boundaries provide previously unavailable 3D structural information on TAD structures at both the single-cell and population levels. This provides a direct structural basis for determining which TADs are preserved. However, physical boundary analysis alone is not sufficient to understand where P-TADs may come from. As we have shown in the analysis of enhancer-promoter contacts, physical boundary analysis of 3D structures allows us to conclude that conservation of enhancer-promoter contacts could be one of the reasons for P-TADs.

      • Explain more clearly what the additional value of using the HIPPS model to study TAD preservation is.

      Our goal, as announced in the title, is to elucidate the structural basis for the emergence of P-TADs. The HIPPS method, which avoids doing simulations (like CCM and other polymer models used in the literature), provides an ensemble of 3D conformations using the averaged contact map generated in Hi-C experiments. Even more importantly, because HIPPS produces an ensemble of individual structures, it can be the basis for predicting outcomes at the single-cell level. The accuracy of the generated structures has been shown in our previous work (Shi and Thirumalai, PRX 2021). In ensemble-averaged Hi-C experiments, TADs appear to be relatively stable. However, imaging experiments (Bintu et al., 2018) have revealed that TADs are not fixed structures present in every single cell, but instead exhibit variability at the single-cell level. TAD-like structures with distinct boundaries are observed in individual cells, and the location of these boundaries varies from cell to cell. Nevertheless, these TAD-like structures still show a preferential positioning in 3D structures. Interestingly, the preferential positioning often corresponds to TAD boundaries observed in population-averaged Hi-C data. This suggests that while cohesin is involved in establishing the overall organization of TADs, other factors and mechanisms could also contribute to TAD formation at the individual cell level. In this study, we showed that some boundaries of P-TADs upon cohesin loss in the Hi-C maps align with preferential boundaries in individual 3D structures of chromosomes. This makes the finding that a subset of TADs is preserved upon cohesin depletion robust.

      From a technical perspective, the use of HIPPS avoids time-consuming polymer simulations. HIPPS is rapid and can be used to generate arbitrarily large ensembles of structures, allowing us to calculate properties at both the single-cell and ensemble levels.

      In addition, we'd like to offer the following feedback to the authors.

      3) The discussion of enhancer-promoter loops as a cause of TAD preservation is interesting, but it would be useful to know what fraction of preserved TADs enhancer-promoter loops might explain.

      We thank the reviewer for the excellent suggestion. We have done the suggested calculation, and the results are shown in a new Fig. 8 in the main text. We also moved the results on enhancer-promoter contacts from the Discussion section to the main Results section.

      4) The last paragraph of the introduction seems to state that only the HIPPS model was used to find single-cell 3D structures and boundary probabilities. However, the main text suggests that the CCM model was also used for these purposes.

      We have revised the text to clarify this point on pages 3-4. Also please see the discussion on the utility of HIPPS above.

      5) When referring to the boundary probability, it would be useful if the authors always specified whether they refer to the boundary probability before or after cohesin depletion (or loop depletion in the CCM model). Statements such as "This implies that peaks in the boundary probabilities should correspond to P-TADs" are ambiguous; it is unclear if the authors mean that boundary probabilities before cohesin depletion predict that the boundary will be preserved, rather than that preserved TAD boundaries correlate with peaks in the boundary probability after cohesin depletion.

      We thank the reviewer for the suggestion. Indeed, it may be confusing. Hence, we have revised the text in numerous places to clarify this point.

      6) It would be interesting to analyze all TAD boundaries that are present after cohesin depletion, rather than just those that overlap with TAD boundaries in WT cells. This would give better statistics for answering the question what causes TAD-like structures in cells without cohesin.

      We thank the reviewer for this excellent suggestion. First, we believe this would deviate from the primary goal of this study: what leads to TAD preservation after cohesin depletion? Second, this has to be done very systematically, as we did here for P-TADs, in order to draw meaningful conclusions. This would be a very useful study for another occasion.

      7) The use of a plethora of acronyms (P-TAD, CM, DM, CCM, HLM...) makes the paper difficult to read.

      We have revised the text to change “CM” to “contact map” and “DM” to “distance map”. As for P-TADs, CCM, and WLM, we would argue that P-TAD is a clear and intuitive abbreviation, and CCM/WLM refer to specific methods/models; replacing them with full names would make the text more difficult to read. We hope the reviewer will agree to our keeping these acronyms.

      Reviewer #2 (Public Review):

      Summary:

      Here Jeong et al. use a combination of theoretical and experimental approaches to define molecular contexts that support specific chromatin conformations. They seek to define features that are associated with TADs that are retained after cohesin depletion (the authors refer to these TADs as P-TADs). They were motivated by differences between single cell data, which suggest that some TADs can be maintained in the absence of cohesin, whereas ensemble HiC data suggest complete loss of TADs. By reanalyzing a number of HiC datasets from different cell types, the authors observe that in ensemble methods, a significant subset of TADs are retained. They observe that P-TADs are associated with mismatches in epigenetic state across TAD boundaries. They further observe that "physical boundaries" are associated with P-TAD maintenance. Their structure/simulation based approach appears to be a powerful means to generate 3D structures from ensemble HiC data, and provide chromosome conformations that mimic the data from single-cell based experiments. Their results also challenge current dogma in the field about epigenetic state being more related to compartment formation rather than TAD boundaries. Their analysis is particularly important because limited amounts of imaging data are presently available for defining chromosome structure at the single-molecule level; however, vast amounts of HiC and ChIP-seq data are available. By using HiC data to generate high quality simulated structural data, they overcome this limitation. Overall, this manuscript is important for understanding chromosome organization, particularly for contacts that do not require cohesin for their maintenance, and for understanding how different levels of chromosome organization may be interconnected. I cannot comment on the validity of the provided simulation methods and hope that another reviewer is qualified to do this.

      We thank the reviewer for the comprehensive summary of our work, and we are happy that the reviewer finds our work important and a source of valuable insights for the field.

      Specific comments

      • It is unclear what defines a physical barrier. From reading the text and the methods, it is not entirely clear to me how the authors have designated sites of physical barriers. It may help to define this on pg 7, second to last paragraph, when the authors first describe instances of P-TAD maintenance in the absence of epigenetic mismatch.

      We thank the reviewer for the suggestions. The details of how physical boundaries are designated are provided in the data analysis section of the Appendix. To make the concept of physical boundary easier to understand, we have revised the text on page 7 of the main text.

      • Figure 7 adds an interesting take to their approach. Here the authors use microC data to analyze promoter-enhancer/promoter-promoter contacts. These data are included as part of the discussion. I think this data could be incorporated into the main text, particularly because it provides a biological context where P-TADs would have a rather critical role.

      We thank the reviewer for the suggestion. We agree that the results in Figure 7 provide novel insights into TAD formation and its possible preservation upon perturbation. Following the reviewer’s suggestion, we have moved these results to a separate subsection at the end of the Results section.

      • Figure 3a: the numbers here do not match the text (page 6, second to last paragraph). The numbers have been flipped for either chromosome 10 or chromosome 13 in the text or the figures.

      We thank the reviewer for pointing out this error. In the revised main text, it has been corrected.

      Reviewer #3 (Public Review):

      This manuscript presents a comprehensive investigation into the mechanisms that explain the presence of TADs (P-TADs) in cells where cohesin has been removed. In particular, to study TADs in wildtype and cohesin depleted cells, the authors use a combination of polymer simulations to predict whole chromosome structures de novo and from Hi-C data. Interestingly, they find that those TADs that survive cohesin removal contain a switch in epigenetic marks (from compartment A to B or B to A) at the boundary. Additionally, they find that the P-TADs are needed to retain enhancer-promoter and promoter-promoter interactions.

      Overall, the study is well-executed, and the evidence found provides interesting insights into genome folding and interpretations of conflicting results on the role of cohesin on TAD formation.

      We are pleased with the reviewer’s positive assessment of our work.

      To strengthen their claims, the authors should compare their de-novo prediction approach to their data-driven predictions at the single cell level.

      We thank the reviewer for the very good suggestion. We assume that the Reviewer is asking us to compare the CCM simulations with HIPPS-generated structures at the single-cell level. We have shown, using the GM12878 cell data, that the polymer simulations reproduce the Hi-C contact maps (an average quantity) well (see Appendix Fig. 2 and Fig. 3). In addition, we show in Appendix Fig. 8 the comparison with ensemble-averaged distance maps, as well as at the single-cell level, for Chr 13 from GM12878 cells. There are TAD-like structures at the single-cell level, just as we find for HCT-116 cells (Fig. 5 in the main text). Thus, the conclusions from de novo prediction and data-driven predictions are consistent. In addition, in our previous publication introducing HIPPS in Phys Rev X 11: 011051 (2021), we showed that the method is quantitatively accurate in reproducing experimental data for all the interphase chromosomes.

      Having demonstrated this consistency, we used computationally simple data-driven predictions to analyze HCT-116 and mouse liver cells, for which Hi-C data with and without cohesin are available, rather than performing multiple laborious polymer simulations.

      Please see below for our response to specific comments.

      1) It is confusing that the authors change continuously their label for describing B-A and A-B switches. They should choose one expression. I think that the label "switch" between A and B is more precise than "mismatch".

      We have revised the text to make it consistent; it now reads “A-B” throughout. We agree that “switch” is a good suggestion, but we think that “mismatch” is more concise. We trust that this Reviewer will indulge us on this point.

      2) In the Abstract, the authors mention HCT-116 cells but do not specify which cells are these.

      We have changed “HCT-116” in the abstract to “human colorectal carcinoma cell line”.

      3) In the Abstract, it is unclear what the authors mean by "without any parameters"

      In the theoretically based HIPPS method, there is no “free” parameter. In other words, the only parameter is uniquely determined. To avoid confusion, we have removed “without any parameters” from abstract.

      4) In Results, what do the authors mean by 16% (26%)?

      This refers to the percentage of TADs that are preserved after Nipbl and RAD21 removal in mouse and HCT-116 cells, respectively. Using the TopDom method, we identified TAD boundaries in wild-type and cohesin-depleted cells. 16% (959 out of 4176 – Fig. 1a) and 26% (1266 out of 4733 – Fig. 1b) of TADs are preserved after Nipbl and RAD21 removal in mouse and HCT-116 cells, respectively. We removed the percentages in the revised version.

      5) In Results, the authors mention "more importantly, we did tune the value of any parameter to fit the experimental CMs". Did they mean that instead they didn't tune any parameter?

      We apologize for the confusion. In the CCM, there is a single controlled parameter. We have changed the sentence to reflect this correctly.

      6) In Results, section "CCM simulations reproduce wild-type Hi-C maps", Kullback-Leibler (KL) divergence is used to assess the correlation between two loci, but it is unclear what the value 0.04 stands for; is it a good or a bad correlation?

      The Kullback-Leibler divergence ranges from 0 to infinity, with 0 indicating that the two distributions are identical. Thus, a value of 0.04 indicates excellent agreement.
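
      As a reminder of why 0 corresponds to perfect agreement, here is a minimal sketch of the discrete Kullback-Leibler divergence. The response does not say which two distributions were compared or which logarithm base was used. The sketch assumes two normalized discrete distributions (for example, simulated and measured contact probabilities binned by genomic separation) and uses the natural logarithm.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i) for discrete distributions.

    Inputs are renormalized to sum to 1; terms with p_i = 0 contribute 0,
    and q_i is floored at eps to avoid division by zero.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))
```

      The divergence is zero only when the two distributions coincide, and it grows as they diverge. It is not symmetric, so the order of the arguments matters. For example, comparing (0.5, 0.5) with (0.9, 0.1) gives about 0.51.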

      7) The authors use two techniques to obtain 3D structures, one is CCM, which takes the cohesin as constraints, and another is HIPPS, which reconstructs from Hi-C maps. Both seem to have good agreement with the Hi-C contact maps. However, did the authors compare the CCM with the HIPPS 3D structures?

      We addressed this point at the start of our reply to this Reviewer. As detailed there and in the main text, we used the CCM to generate hypotheses for the origin of P-TADs. In the process, we established the accuracy of CCM, which gives us confidence in the hypotheses. As explained above and emphasized in the revised version, CCM simulations are time-consuming, whereas generating 3D structures using HIPPS is computationally simple. Because HIPPS is also accurate, we used it to analyze the Hi-C data on mouse liver and HCT-116 cells, as well as Micro-C data on mESCs.

      In our paper in Phys Rev X 11: 011051 (2021), we showed that HIPPS reproduces Hi-C data. In the current manuscript, we showed in Appendix Fig. 2 and Fig. 3, as well as in a 2018 study (Shi and Thirumalai, Nat Comm.), that CCM is accurate as well. Thus, there is little doubt about the accuracy of the methods that we have developed.

      8) In Results, section "P-TADs have prominent spatial domain boundaries", the authors constructed individual spatial distance matrices (DMs) using 10,000 simulated 3D structures. What are the differences among these 10,000 simulations? Do they start them with different initial structures?

      The structures are generated using HIPPS, a data-driven method that uses the Hi-C contact map as constraints. The method, which is based on the maximum entropy principle, samples from a distribution that describes the structural ensemble of the chromosome. The 10,000 structures are randomly sampled and are independent of each other. Because HIPPS is not a simulation, the issue of initial structures does not arise.
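
      The following minimal sketch shows why no initial structure is needed. It assumes that the maximum-entropy ensemble has the Gaussian form P({x}) ∝ exp(-Σ k_ij |x_i - x_j|²), with the couplings k_ij already determined. How HIPPS fits those couplings to Hi-C-derived distances is not shown, and the function name is hypothetical. Each structure is an independent draw from the distribution, not a frame of a trajectory.

```python
import numpy as np

def sample_gaussian_ensemble(k, n_structures, seed=0):
    """Draw independent 3D structures from P(x) ∝ exp(-sum_{i<j} k_ij |x_i - x_j|^2).

    k is a symmetric (N, N) matrix of non-negative couplings, assumed to be
    already fitted (fitting is not shown). Returns shape (n_structures, N, 3),
    each structure centered at the origin.
    """
    rng = np.random.default_rng(seed)
    # Laplacian L: for each Cartesian component, sum_{i<j} k_ij (x_i - x_j)^2 = x^T L x
    lap = np.diag(k.sum(axis=1)) - k
    evals, evecs = np.linalg.eigh(lap)
    keep = evals > 1e-10 * evals.max()  # drop the translational zero mode
    # exp(-x^T L x) is Gaussian with precision 2L, so each retained mode
    # has standard deviation 1 / sqrt(2 * lambda).
    modes = evecs[:, keep] / np.sqrt(2.0 * evals[keep])
    z = rng.standard_normal((n_structures, int(keep.sum()), 3))
    # Every structure is an independent draw: no trajectory, no initial state.
    return np.einsum("nk,skc->snc", modes, z)
```

      For a plain chain with unit couplings between neighbors (a hypothetical input), each bond has a mean squared length of 3/2. The mean squared end-to-end distance over ten loci is therefore 13.5, and the sampled ensemble reproduces this within sampling error.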

      9) In Methods, when the authors mention the "unknown parameter", do they use one parameter for all simulations (+/- cohesin) or is this parameter different for each system? Would this change the results?

      We apologize for the confusion. The “unknown parameter” is the energy scale 𝜖 that describes the interaction strength between chromosome loci. We have revised the text in the Methods (page 27) to clarify it. The same value of 𝜖 is used for all CCM simulations with or without cohesin.

      10) In Methods, when the authors perform DBSCAN clustering, they mention that they optimize the clustering parameters for each system. However, if they want to compare between different systems, the clustering parameters should be the same.

      The purpose of DBSCAN is to capture the spatial clustering topology of chromosome loci. However, different cell types and chromosomes may have different overall densities, which affect the average distance between loci. If the same parameters were used, such global differences would dominate the clustering result, and the intended spatial clustering topology could be distorted. Hence, we tuned the clustering parameters for each system to factor out the global effect and capture only the local clustering topology of chromosome loci.
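
      One way to factor out global density differences is to set the DBSCAN radius `eps` to a fixed multiple of each system's median nearest-neighbour distance, keeping `min_samples` fixed. The response does not specify how the parameters were tuned, so this normalization rule and the `factor` value are assumptions. DBSCAN is written out in a few lines to keep the example dependency-free. In practice, scikit-learn's `sklearn.cluster.DBSCAN` would be the usual choice.

```python
import numpy as np

def pairwise(points):
    """All-pairs Euclidean distances for an (N, 3) array of loci."""
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

def scaled_eps(points, factor):
    """eps = factor * median nearest-neighbour distance of THIS system."""
    d = pairwise(points)
    np.fill_diagonal(d, np.inf)
    return factor * float(np.median(d.min(axis=1)))

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns labels 0, 1, ... for clusters and -1 for noise."""
    d = pairwise(points)
    neighbors = [np.flatnonzero(row <= eps) for row in d]  # includes the point itself
    core = [len(nb) >= min_samples for nb in neighbors]
    labels = np.full(len(points), -1)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster outward from core points
            j = stack.pop()
            if not core[j]:
                continue  # border points join the cluster but do not expand it
            for m in neighbors[j]:
                if labels[m] == -1:
                    labels[m] = cluster
                    stack.append(m)
        cluster += 1
    return labels
```

      With this rule, uniformly rescaling a structure (for example, multiplying all coordinates by 10) leaves the cluster labels unchanged. Reusing the `eps` from the unscaled structure instead marks every locus as noise. The all-pairs distance matrix makes the sketch O(N²) in memory.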

      Grammar comments:

      1) "structures, with sharp boundaries are present, at.."

      We thank the reviewer for pointing out the error. We have fixed it.

      2) "Three headlines emerge from these studies are:"

      We have fixed it.

      3) "both the cell lines"

      We have fixed it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study explores the relationship between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 pathogenic strains. G4 structures were found to be non-randomly distributed within PAIs and conserved within the same strains. Positive correlations were observed between G4s and GC content across various genomic features, suggesting a link between G4 structures and GC-rich regions. Differences in GC content between PAIs and the core genome underscored the unique nature of PAIs. High-confidence G4 structures in Escherichia coli's regulatory regions were identified, influencing DNA integration within PAIs. These findings shed light on the molecular mechanisms of G4-PAI interactions, enhancing our understanding of bacterial pathogenicity and G4 structures in infectious diseases.

      Strengths:

      The findings of this study hold significant implications for our understanding of bacterial pathogenicity and the role of guanine-quadruplex (G4) structures.

      Molecular Mechanisms of Pathogenicity: The study highlights that G4 structures are not randomly distributed within pathogenicity islands (PAIs), suggesting a potential role in regulating pathogenicity. This insight into the uneven distribution of G4s within PAIs provides a basis for further research into the molecular mechanisms underlying bacterial pathogenicity.

      Conservation of G4 Structures: The consistent conservation of G4 structures within the same pathogenic strains suggests that these structures might play a vital and possibly conserved role in the pathogenicity of these bacteria. This finding opens doors for exploring how G4s influence virulence across different pathogens.

      Unique Nature of PAIs: The differences in GC content between PAIs and the core genome underscore the unique nature of PAIs. This distinction suggests that factors such as DNA topology and G4 structures might contribute to the specialized functions and characteristics of PAIs, which are often associated with virulence genes.

      Regulatory Role of G4s: The identification of high-confidence G4 structures within regulatory regions of Escherichia coli implies that these structures could influence the efficiency or specificity of DNA integration events within PAIs. This finding provides a potential mechanism by which G4s can impact the pathogenicity of bacteria.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Overall, the study provides fundamental insights into the pathogenicity island and conservation of G4 motifs.

      Thank you for your thorough review of our manuscript exploring the relationship between G4 structures and PAIs in 89 pathogenic strains. We appreciate your recognition of the strengths of our study and its potential implications for understanding bacterial pathogenicity. We are pleased that you highlighted the significance of our findings in revealing the non-random distribution and conservation of G4 structures within PAIs across various pathogenic strains.

      Your insightful comments about the molecular mechanisms of pathogenicity, the conservation of G4 structures, the unique nature of PAIs, and the regulatory role of G4s within Escherichia coli are invaluable. We are encouraged by your positive evaluation of these aspects, which underscores the potential impact of our work on advancing the understanding of bacterial pathogenicity.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript entitled "The Intricate Relationship of G-Quadruplexes and Pathogenicity Islands: A Window into Bacterial Pathogenicity" Bo Lyu explored the interactions between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 bacterial genomes through a rigorous computational approach. This paper handles an intriguing and complex topic in the field of pathogenomics. It has the potential to contribute significantly to the understanding of G4-PAI interactions and bacterial pathogenicity.

      Strengths:

      • The chosen research area.

      • The summarizing of the results through neat illustrations.

      Weaknesses:

      This reviewer did not find any significant weaknesses.

      Thank you for your positive and encouraging feedback on our manuscript. We appreciate your specific mention of the strengths, particularly highlighting the chosen research area and the effectiveness of our illustrations in summarizing the results. Your acknowledgment of these aspects is motivating, and we are pleased that the content and presentation resonated well with you.

      Reviewer #3 (Public Review):

      The main problem with the work is that the results are only descriptive and do not allow any inferences or conclusions about the importance of the function of G4 structures. The discussion and conclusions are poor. The results are preliminary and in order to try to make the analysis more interesting, it should be further extended and the data must be explored in a much greater depth.

      Thank you for your constructive feedback on our manuscript; we appreciate the time and effort you dedicated to evaluating our work. We acknowledge your concern regarding the descriptive nature of the results and the limitations in making inferences about the importance of G4 structures. To address this, we plan to enhance the depth of our analysis and provide more insightful interpretations in the discussion and conclusion sections. It's important to note that this study is intentionally a short report, emphasizing data mining findings rather than laboratory results. We understand the value of in-depth investigations and concur that our work lays the groundwork for more extensive studies in this area, aiming to provide a real-world scenario. We are committed to addressing your comments and refining our manuscript to contribute meaningfully to this field. Your insights are invaluable, and we look forward to presenting an improved version of our study.

      Reviewer #2 (Recommendations For The Authors):

      The authors could try a higher G-quadruplex score of 1.4 or higher values to substantiate their findings or pick up the bacterial genomes that relied on G4s for their pathogenicity.

      We acknowledge your recommendation to explore a higher G-quadruplex score, and we would like to assure you that we have already conducted analyses using thresholds of 1.4 and 1.6. The findings consistently support the observations presented in the manuscript. We have updated the text to reflect this additional analysis, and the results are included in the revised version of the manuscript (Figure S1).
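
      For readers unfamiliar with these thresholds, the sketch below follows the published G4Hunter scoring scheme (Bedrat et al., 2016). Each G in a run of one, two, three, or four or more Gs scores 1, 2, 3, or 4. Each C scores the corresponding negative value, and A and T score 0. The per-base scores are then averaged over a sliding window. This is an illustration, not the pipeline used in the manuscript. The 25-nt window is the commonly used default, and the sketch omits G4Hunter's merging of overlapping hits into regions.

```python
def g4hunter_scores(seq):
    """Per-base G4Hunter-style scores: +run length for G runs, -run length for C runs (capped at 4)."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:  # find the end of this homopolymer run
            j += 1
        if base in "GC":
            value = min(j - i, 4) * (1 if base == "G" else -1)
            for k in range(i, j):
                scores[k] = value
        i = j  # A, T and other characters keep a score of 0
    return scores

def windows_above(seq, threshold, window=25):
    """(start, mean score) for every window whose |mean score| >= threshold."""
    s = g4hunter_scores(seq)
    hits = []
    for start in range(len(s) - window + 1):
        mean = sum(s[start:start + window]) / window
        if abs(mean) >= threshold:
            hits.append((start, round(mean, 3)))
    return hits
```

      Consider the 21-nt telomere-like sequence GGGTTAGGGTTAGGGTTAGGG scored in a single 21-nt window. Its mean score is 36/21 ≈ 1.71, so it passes thresholds of 1.4 and 1.6 but not 1.8. Raising the threshold therefore keeps only more G-rich windows.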

      Reviewer #3 (Recommendations For The Authors):

      Minor points

      Introduction

      Q1. The introduction is shallow. The concept and the importance of PAIs is vague. Why should these genes be different from other genes?

      A1: Thank you for your valuable feedback. We have incorporated additional content in the Introduction section to provide a more comprehensive understanding of PAIs and their distinctiveness from other genes.

      Changes: Lines 44-49 “G4 structures are ...innovative technologies.” were added.

      Lines 51-55 “PAIs are distinct...such as plasmids.” were added.

      Lines 60-66 “PAIs typically contain...recipient genome” were added.

      Lines 77-80 “Growing evidence has...CpG islands, and PAIs” were added.

      Material and Methods

      Q2. It is not clear if the author used TBTools or the G4Hunter software to identify G4 structures. It would be interesting to include references to published articles that used this software.

      A2: Thank you! We have corrected this and added references to studies that used TBTools to extract sequences and G4Hunter to identify G4 structures.

      Q3. The statistical significance must not be based only on p-values. P-values are influenced by sample sizes. I strongly recommend the use of other parameters such as confidence interval and ROC analysis.

      A3: Thank you! We have incorporated confidence intervals and ROC analysis to complement p-values, enhancing the robustness of our statistical analysis.

      Changes: Lines 265-267 “The correlation's significance... sensitivity and specificity.” were added.
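
      The revision does not state which interval or ROC procedure was used. The following is only a sketch of two common choices. The first is a 95% confidence interval for a Pearson correlation via the Fisher z-transform. The second is the ROC area under the curve, computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties counted as one half). The function names are hypothetical.

```python
import math

def pearson_r_ci(x, y, z_crit=1.96):
    """Pearson r with a Fisher z confidence interval (needs n > 3 and |r| < 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r)            # Fisher transform; approximately normal
    se = 1.0 / math.sqrt(n - 3)  # standard error of z
    return r, (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))

def roc_auc(scores_pos, scores_neg):
    """AUC as P(random positive > random negative), ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))
```

      With five points and r = 0.8, the 95% interval runs from about -0.28 to 0.99. This illustrates the reviewer's point: a p-value or point estimate alone says little about precision when the sample is small.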

      Results and discussion

      Q4. The stability of G4 structures seems to be important for its function (doi:10.1111/febs.15065). Therefore it would be interesting if the analysis were carried out separating the G4 according to stability.

      A4: Thank you for highlighting the importance of G4 structure stability for its function and suggesting an analysis based on stability. We have carefully reviewed the referenced paper (doi:10.1111/febs.15065) and note that their study focused on the stability analysis of individual G4s. In our current study, we identified a large number of G4s, and while stability analysis for each G4 is indeed an interesting avenue, it goes beyond the scope of this particular investigation. However, we agree that exploring the relationship between G4 stability and function is a valuable topic. We plan to delve deeper into this aspect in future work, as discussed in our response to your previous comment.

      Changes: Lines 217-221 “Lastly, the stability of G4...molecular engineering.” were added.

      Q5. The quality of the figures is poor. It is not possible to read the correlation and p-values from Figure 2.

      A5: The revised figure is now submitted with enhanced clarity to ensure that correlation and p-values can be easily discerned.

      Q6. The analysis of promoter regions should be performed taking into account the distance between the G4 and the beginning of the gene.

      A6: Thank you. We have elaborated on this point in the revision.

      Changes: Lines 198-106 “Additionally, considering the distance...of G4 structures in promoters.” were added.
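
      Below is a minimal sketch of a strand-aware distance between a G4 and the start of its gene. The coordinate convention (gene_start < gene_end on both strands, as in GFF/BED-style annotations), the use of the G4 midpoint, and the function name are assumptions, not details taken from the revised manuscript.

```python
def distance_to_tss(g4_start, g4_end, gene_start, gene_end, strand):
    """Signed distance (bp) from the G4 midpoint to the TSS.

    Negative = upstream of the TSS, positive = downstream (into the gene).
    Assumes gene_start < gene_end regardless of strand, so the TSS is
    gene_start on '+' and gene_end on '-'.
    """
    mid = (g4_start + g4_end) / 2
    if strand == "+":
        return mid - gene_start
    if strand == "-":
        return gene_end - mid  # upstream lies at higher coordinates on '-'
    raise ValueError("strand must be '+' or '-'")
```

      Negative values place the G4 upstream of the TSS, and positive values place it inside the transcribed region. The distances can then be binned (for example, in 50-bp bins, an arbitrary choice) to compare G4 density with position relative to the gene start.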

      Q7. The topic "Putative origin, transfer mechanisms, and functions of G4s in PAIs". The comments made on this topic are purely speculative and not backed up by data or any type of experimental analysis.

      A7: We appreciate the feedback and have revised the title to emphasize the focus on the functions of G4s in PAIs. We acknowledge that the content related to the putative origin and transfer mechanisms of G4s in PAIs is purely descriptive and speculative, so we have moved this information to the Discussion section for a more appropriate treatment.

      Q8. The supplemental material is hard to follow. The meaning of each column should be better explained. Why was the data divided into 10 parts?

      A8: Following your suggestion, we have revised the tables for better clarity. To address concerns about the division into 10 parts, we have decided to remove this data from the tables as it was deemed unnecessary for presentation.

      Q9. Why were the data of E. coli strains 1 and 2 shown in Tables S3 and S4, but not those of the other bacterial strains?

      A9: We appreciate your inquiry. The data of E. coli strains 1 and 2 were specifically highlighted in Tables S3 and S4 as illustrative examples to demonstrate the putative functions of G4s in PAIs within the scope of our study. Given the extensive nature of function annotation analyses across various pathogenic strains, presenting additional tables for each strain would have resulted in an impractical volume of supplementary material.

      Q10. The Results and Discussion should be separated.

      A10: Thank you! Corrected as suggested.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major changes:

      Removed any claim of label-free detection, clarifying that ADeS can predict apoptotic events without apoptotic probes

      Provided a GitHub repository with the executable code (https://github.com/mariaclaudianicolai/ADeS)

      Uploaded all imaging data used to train and benchmark ADeS on Zenodo (https://zenodo.org/uploads/10260643)

      Added a supplementary movie showing degraded performance on a noisy in vivo movie (Supplementary Movie 3)

      Generated a supplementary figure showing the effect of noise on prediction accuracy (Supplementary Figure 4)

      Minor changes:

      Line 6: added Benjamin Grädel and Mariaclaudia Nicolai to the list of authors

      Line 44: dynamics

      Line 54: updated reference to a published paper

      Line 65: fixed spelling of "chronic"

      Line 74: fixed spelling of "limitations"

      Line 76: changed “biochemical reporters” to “fluorescent probes”

      Line 77: changed “label-free” to “probe-free”

      Line 85: “can apply” to "can be applied"

      Line 109: updated the citation so that it appears in the reference list

      Lines 143-144: Fixed statement about apoptotic cells having non-significant displacement compared to arrested cells

      Line 156: Figure 3 is cited

      Line 185 and Fig 3 legends: “chore” to "core"

      Lines 187 and 248: “withouth” to "without"

      Lines 177-178: introduced acronyms for deep learning networks

      Lines 276-277: Added interval ranges to clarify subgroups observed in Figure 6F

      Line 284: substituted “SNR” with “signal-to-noise ratio”

      Line 286: mentioned “Supplementary Movie 3”

      Line 515: explicitly defined “field of view” instead of “FOVs”

      Lines 604-606: Added data availability section

      Line 822: modified caption of Figure 1D to explain the estimation of nuclear area over time

      Lines 911-912: Explained gray area in caption of figure 8B-C

      Supplementary figure 1: removed “Neu” and “Eos” acronyms from caption. Introduced definition of “FOV” and “SNR” acronyms

      Editorial assessment

      This valuable work by Pulfer et al. advances our understanding of spatial-temporal cell dynamics both in vivo and in vitro. The authors provide convincing evidence for their innovative deep learning-based apoptosis detection system, ADeS, that utilizes the principle of activity recognition. Nevertheless, the work is incomplete due to the authors' claim that their system is valid for non-fluorescently labeled cells, without evidence supporting this notion. After revisions, this work will be of broad interest to cell biologists and neuroscientists.

      We acknowledge that the “label-free” claim was misleading, and in the revised manuscript we addressed this aspect by stating that ADeS is “probe-free”, not requiring any apoptotic marker. For this reason, we kindly ask the editor to modify their assessment concerning the work being incomplete, as our tool was specifically designed for fluorescence microscopy.

      Reviewer #1 (Public Review):

      Summary:

      Pulfer et al., describe the development and testing of a transformer-based deep learning architecture called ADeS, which the authors use to identify apoptotic events in cultured cells and live animals. The classifier is trained on large datasets and provides robust classification accuracies in test sets that are comparable to and even outperform existing deep learning architectures for apoptosis detection. Following this validation, the authors also design use cases for their technique both in vitro and in vivo, demonstrating the value of ADeS to the apoptosis research space.

      Strengths:

      ADeS is a powerful tool in the arsenal of cell biologists interested in the spatio-temporal co-ordinates of apoptotic events in vitro, since live cell imaging typically generates densely packed fields of view that are challenging to parse by manual inspection. The authors also integrate ADeS into the analysis of data generated using different types of fluorescent markers in a variety of cell types and imaging modalities, which should facilitate its adoption by a larger number of researchers. ADeS is an example of the successful deployment of activity recognition (AR) in the automated bioimage analysis space, highlighting the potential benefits of AR to quantifying other intra- and intercellular processes observable using live cell imaging.

      Weaknesses:

      A major drawback was the lack of access to the ADeS platform for the reviewers; the authors state that the code is available in the code availability section, which is missing from the current version of the manuscript. This prevented an evaluation of the usability of ADeS as a resource for other researchers.

      We acknowledge that having access to the code is pivotal, and therefore in this revised version we deposited the Python code deploying our DL model on GitHub (https://github.com/mariaclaudianicolai/ADeS). Moreover, we included in the revised manuscript the training datasets (in vitro and in vivo), as well as all the testing videos used to benchmark ADeS.

      The authors also emphasize the need for label-free apoptotic cell detection in both their abstract and their introduction but have not demonstrated the performance of ADeS in a true label-free environment where the cells do not express any fluorescent markers.

      The system was developed to primarily analyze data acquired via fluorescent microscopy, which relies on fluorescent staining to visualize cells. Therefore, it is not possible to evaluate our methodology in a 100% label-free environment. What we meant using the term “label-free” is that our method can detect apoptotic events based exclusively on morphological cues, without the use of fluorescent apoptotic reporters. We acknowledge that this terminology was misleading and we apologize for the misunderstanding. To amend this, in our revised paper we avoid using the term “label-free”, referring instead to “probe-free” detection.

      While Pulfer et al., provide a wealth of information about the generation and validation of their DL classifier for in vitro movies, and the utility of ADeS is obvious in identifying apoptotic events among FOVs containing ~1700 cells, the evidence is not as strong for in vivo use cases. They mention the technical challenges involved in identifying apoptotic events in vivo, and use 3D rotation to generate a larger dataset from their original acquisitions. However, it is not clear how this strategy would provide a suitable training dataset for understanding the duration of apoptotic events in vivo since the temporal information remains the same.

      One of the main challenges encountered in vivo was the difficulty of capturing rare events such as apoptosis in physiological conditions. Moreover, the lack of publicly available datasets further prevented us from collecting an extended training dataset suitable for data-hungry techniques such as supervised deep learning. Resorting to 3D rotations was a strategy to exploit the visual information within acquisition volumes to train our classifiers for 2D detection. This approach is a common data augmentation technique that naturally increases the size of a dataset by displaying the same object from different angles. However, this technique does not explicitly address temporal aspects of the apoptotic events, such as their duration. The duration of the apoptotic events was empirically estimated to obtain a temporal window suitable for detection (Supplementary Figure 1K-L).

      The authors also provide examples of in vivo acquisitions in their paper, where the cell density appears to be quite low, questioning the need for automated apoptotic detection in those situations. In the use cases for in vivo apoptotic detection using ADeS (Fig 8), it appears that the location of the apoptotic event itself was obvious and did not need ADeS, as in the case of laser ablation in the spleen and the sparse distribution of GFP labeled neutrophils in the lymph nodes.

      Before addressing the need for these methodologies in vivo, we provide a proof of concept for their applicability. Notably, in vivo acquisitions present several visual artifacts and challenges that can hamper activity recognition techniques. Therefore, from a computer vision perspective, the successful implementation of ADeS in vivo is an achievement per se.

      Concerning its need, we showed in Supplementary Figure 3 that ADeS is robust to increasingly populated fields of view, and might be useful in detecting hindered apoptotic events as well as in reducing human bias.

      Finally, the authors also mention that video quality altered the sensitivity of ADeS in vivo (Fig 6L) but fail to provide an example of ADeS implementation on a video of poor quality, which would be useful for end users to assess whether to adopt ADeS for their own live cell movies.

      In Figure 6L we quantitatively showed that low-quality videos negatively affected the sensitivity of ADeS. In this revised version we included a supplementary movie (Supplementary Movie 3) depicting the performance of ADeS in low signal-to-noise conditions. We also addressed this aspect in vitro, by generating a synthetic degradation of the movie quality and measuring its effect on performance (Supplementary Figure 4).

      Reviewer #2 (Public Review):

      Summary:

      Pulfer A. et al. developed a deep learning-based apoptosis detection system named ADeS, which outperforms the currently available computational tools for in vitro automatic detection. Furthermore, ADeS can automatically identify apoptotic cells in vivo in intravital microscopy time-lapses, avoiding manual labeling with potential biases. The authors trained and successfully evaluated ADeS in packed epithelial monolayers and T cells distributed in 3D collagen hydrogels. Moreover, in vivo, training and evaluation were performed on polymorphonuclear leukocytes in lymph nodes and spleen.

      Strengths:

      Pulfer A. and colleagues convincingly presented their results, thoroughly evaluated ADeS as a potential toxicity assay, and compared its performance with available state-of-the-art tools.

      Weaknesses:

      The use of ADeS is still restricted to samples where cells are fluorescently labeled either in the cytoplasm or in the nucleus, which limits its use for in vitro toxicity assays that are performed on primary cells or organoids (e.g., iPSCs-derived systems) that are normally harder to transfect. In conclusion, ADeS will be a useful tool to improve output quality and accelerate the evaluation of assays in several research areas with basic and applied aims.

      As addressed in our answer to Reviewer #1, we primarily focused on fluorescence microscopy, which implies fluorescent labeling of the cells. The application to other imaging platforms was beyond the scope of our study. However, a model to infer apoptosis with other imaging modalities, e.g. brightfield, could be explored in future analogous studies.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their remarks. Please find our detailed answers below.

      1) The authors' continued refusal to acknowledge the other reports before the final sentence of the Discussion, which has been pointed out in two previous rounds of review as a major flaw, detracts from the manuscript significantly.

      We now acknowledge and discuss the other SIRT6-nucleosome reports in the introduction as requested by the reviewer.

      2) While some of the grammatical errors in previous versions have been corrected, many remain, especially in the Methods section

      We corrected the remaining grammatical errors.

      3) Multiple statements of fact not supported by data shown in this work continue to lack appropriate references.

      We added references where facts were not supported by our data.

    1. Author Response

      We appreciate the thoughtful comments from the reviewers. All reviewers express common support for the study’s meaningful contribution to understanding interoceptive neurocircuitry in health and in psychiatric disorders. Specifically, the reviewers highlight the strong theoretical backing and the novel combination of tasks and analytical methods. In turn, the reviewers identify several areas for improvement that we plan to address in our resubmission. These include a more detailed demographic characterization of the study participants, increased clarity when describing the statistics that support each conclusion, and additional discussion when interpreting the resting state findings, as we did not include a separate control condition for the effect of time. One reviewer commented that we largely cite our previous work with the isoproterenol paradigm; while we will provide an updated and broader view of the literature in our resubmission, there remains a limited number of comparable interoceptive perturbation studies. Finally, one comment referred to our reliance on ratings of interoceptive intensity without including additional behavioral measures. While our measures of interest were chosen for their relevance to our hypotheses, we will consider adding additional measures such as interoceptive accuracy (correspondence between heart rate and dial ratings) that were collected during the perturbation task, should they provide additional insight into the insular responses of the participants.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript presents the first evidence for a plastic enhancement in the response of pial cortical arterioles to external stimulation. Specifically, they show (p8; Figure 3A-C) that repeated application of a visual stimulus at 0.25 Hz, at the upper edge of the vasomotor response, leads to a greater change in the diameter of pial arterioles at that frequency. This adds to the earlier, referenced work of Mateo et al (2017) that showed locking - or entrainment of pial arteriole vasomotion - by stimuli at different (0.0 to 0.3 Hz) frequencies.

      We thank the reviewer for positively identifying the value of our manuscript.

      The manuscript has a major flaw. Much as there is plasticity that leads to an increase in the amplitude of vasomotion at the drive frequency, the authors need to show reversibility. This could possibly be accomplished by driving the visual system at a different frequency, say 0.15 Hz, and observing if the 0.25 Hz response is then diminished. The authors could then test if their observation is repeatable by again driving at 0.25 Hz. Unless I missed the presentation on this point, there is no evidence for reversibility.

      The reviewer has raised a very important point of view. In our experiments, the visually induced vasomotion (or visual stimulus-triggered vasomotion) was always entrained by repeated trials of the 0.25 Hz temporal frequency stimuli. When the visual stimulation stops, the vasomotion frequency lock to 0.25 Hz quickly dissipates. After saturated training with this stimulus, the parameters of the visual stimulus were switched, for example to 0.15 Hz. The animal quickly adapted to this new stimulus paradigm and the vasomotion was frequency-locked to 0.15 Hz. The adaptation to this new paradigm occurred well within 5 minutes. In Fig. 5, various paradigms were randomly tested. In some of the trials, 0.25 Hz stimulus was tested after 0.15 Hz. The vasomotion also quickly adapted back to the 0.25 Hz. We agree with the reviewer that this reversibility could have been explicitly documented in the manuscript.

      Reviewer #2 (Public Review):

      Sasaki et al. investigated methods to entrain vasomotion in awake wild-type mice across multiple regions of the brain using a horizontally oscillating visual pattern which induces an optokinetic response (HOKR) eye movement. They found that spontaneous vasomotion could be detected in individual vessels of their wild-type mice through either a thinned cranial window or intact skull preparation using a widefield macro-zoom microscope. They showed that low-resolution autofluorescence signals coming from the brain parenchyma could be used to capture vasomotion activity using a macro-zoom microscope or optical fibre, as this signal correlates well with the intensity profile of fluorescently-labelled single vessels. They show that vasomotion can also be entrained across the cortical surface using an oscillating visual stimulus with a range of parameters (with varying temporal frequencies, amplitudes, or spatial cycles), and that the amplitude spectrum of the detected vasomotion frequency increases with repeated training sessions. The authors include some control experiments to rule out fluorescence fluctuations being due to artifacts of eye movement or screen luminance and attempt to demonstrate some functional benefit of vasomotion entraining as HOKR performance improves after repeat training. These data add in an interesting way to the current knowledge base on vasomotion, as the authors demonstrate the ability to entrain vasomotion across multiple brain areas and show some functional significance to vasomotion with regards to information processing as HOKR task performance correlates well with vascular oscillation amplitudes.

      We thank the reviewer for summarizing the value of our study and recognizing its significance.

      The aims of the paper are mostly well supported by the data, but some streamlining of the data presentation would improve overall clarity. The third aim to establish the functional significance of vasomotion in relation to plasticity in information processing could be better supported by the inclusion of some additional control experiments.

      We thank the reviewer for recognizing our vast amount of data supporting our findings. We agree that better data presentation could have improved the clarity of the manuscript.

      Specifically:

      1) The clarity and comprehensibility of the paper could be significantly enhanced by incorporating additional details in both the introduction and discussion sections. In the introduction, a succinct definition of the frequency range of vasomotion should be provided, as well as a better description of the horizontal optokinetic response (i.e. as they have in the results section in the first paragraph below the 'Entrainment of vasomotion with visual stimuli presentation' sub-heading). The discussion would benefit from the inclusion of a clear summary of the results presented at the start, and the inclusion of stronger justification (i.e. more citations) with regards to the speculation about vasomotion and neuronal plasticity (e.g. paragraph 5 includes no citations).

      We agree that a better description of vasomotion and horizontal optokinetic response could have been provided in the introduction. As the reviewer suggests, the discussion could also have started with the following summary of the results.

      “We show that visually induced vasomotion can be frequency-locked to the visual stimulus and can be entrained with repeated trials. The initial drive for the vasomotion, or the sensory-evoked hyperemia, must be coming from the neuronal activity in the visual system. The vasomotion is likely triggered by activation of the neurovascular interaction (Kayser, 2004; van Veluw et al., 2020). Surprisingly, the entrained vasomotion was observed not only in the visual cortex but also widely throughout the surface of the brain and deep in the cerebellar flocculus. The global entrainment could be realized through mechanisms separate from local neurovascular coupling. What is also unknown is where the plasticity occurs. The neuronal visual response in the primary visual cortex could potentially decrease with repeated visual stimulus presentation, as the adaptive movement of the eye should decrease the retinal slip. With repeated training sessions, a more static image will likely be projected onto the retina. The neurovascular coupling could be enhanced through increased responsiveness of the vasculature, and vascular-to-vascular coupling could also be potentiated.”

      2) The novel methods for detecting vasomotion using low-resolution imaging techniques are discussed across the first four figures, but this gets a little bit confusing to follow as the authors jump back and forth between the different imaging and analysis techniques they have employed to capture vasomotion. The data presentation could be better streamlined - for instance by presenting only the methods most relevant for the functional dataset (in Figures 5-7), with the additional information regarding the various controls to establish the use of autofluorescence intensity imaging as a valid method for capturing vasomotion reduced to fewer figure panels, or moved to supplementary figures so as to not detract from the main novel findings contributed in this study.

      We apologize for the confusing presentation of the data. Many of the initial figures were technical; however, we feel that following these steps was necessary to logically conclude that shadow imaging of the autofluorescence could be used as an indicator of vasomotion. We do agree with the reviewer that going back and forth between different techniques can be confusing. We could have added separate supplementary figures to introduce the various methods used upfront before going into the findings.

      3) The authors heavily rely on representative traces from individual vessels to illustrate their findings, particularly evident in Figures 1-4. While these traces offer a valuable visualization, augmenting their approach by presenting individual data points across the entire dataset, encompassing all animals and vessels, would significantly enhance the robustness of their claims. For instance, in Figures 1 and 2, where average basal and dilated traces are depicted for a representative vessel, supplementing these with graphs showcasing peak values across all measured vessels would enable the authors to convey a more holistic representation of their data. Or in Figure 3, where the amplitude spectrum is presented for individual Texas red fluorescence intensity changes in V1 across novice, trained, and expert mice, incorporating a summary graph featuring the amplitude spectrum value at 0.25Hz for each individual trace (across animals/imaging sessions), followed by statistical analysis, would fortify the strength of their assertions. Moreover, providing explicit details on sample sizes for each individual figure panel (where not a representative trace), including the number of animals or vessels/imaging sessions, would contribute to transparency and aid readers in assessing the generalisability of the findings.

      We agree with the reviewer that summarization of the data across a number of vessels/imaging sessions would lead to more generalization of the findings. However, contrary to what the reviewer described, we did summarize the vessel diameter expansion events across multiple vessel observations in Fig. 1F, G. The vasomotion parameters were not summarized for observation in intact skull shown in Fig. 2. However, this figure was intended just to show that vessel boundary cannot be well defined in intact skull imaging and Texas Red intensity or autofluorescence intensity fluctuation would give a better indication of vessel diameter fluctuation. In Fig. 3G, the peak ratio of 0.25 Hz was calculated for individual animals at Novice, Trained, and Expert levels and summarized for n = 5 animals. Statistical analysis was also done. The variability between imaging sessions within individual animals was not analyzed; thus, this could have been indicated.

      4) In the experiments where mice are classed as "novice", "trained" or "expert", the inclusion of the specific range of the number of training sessions for each category would improve replicability.

      We agree with the reviewer that classification on the level of training should have been explicitly indicated. Mice experiencing the first visual training session were defined as “Novice”. The mice that have experienced 3 training sessions are the “Trained” mice and the performance of the “Trained” mice during the 4th training session was evaluated. Mice that experienced 8 to 11 rounds of visual training sessions are the “Expert” mice.

      5) The authors don't state whether mice were habituated to the imaging set-up prior to the first data collection, as head-fixation and restraint can be stress-inducing for animals, especially upon first exposure, which could impact their neurovascular coupling responses differentially in "novice" versus "trained" imaging sessions (e.g. see Han et al., 2020, DOI: https://doi.org/10.1523/JNEUROSCI.1553-20.2020). The stress associated with a tail vein injection prior to imaging could also partially explain why mice didn't learn very well if Texas Red was injected before the training session. If no habituation was conducted in these experiments, the study would benefit from the inclusion of some control experiments where "novice" responses were compared between habituated and non-habituated animals.

      We agree with the reviewer that stress could well affect spontaneous vasomotion as well as visually induced vasomotion (or visual stimulus-triggered vasomotion). As the reviewer suggested, we could have compared the initial visually induced vasomotion responses of habituated and non-habituated mice. In addition, whether an experimentally induced increase in stress would interfere with the vasomotion could also be studied. In the Texas Red experiments, we observed that tail-vein injection stress appeared to interfere with the HOKR learning process. In the experiments presented in Fig. 3, Texas Red was injected before session 1. Vasomotion entrainment likely progressed with the session 2 and 3 training. Before session 4, Texas Red was injected again to visualize the vasomotion. The vasomotion was clearly observed in session 4, indicating that the stress induced by tail-vein injection did not prevent the generation of visually induced vasomotion.

      6) The experiments regarding the brain-wide vasomotion entrainment across the cortical surface would benefit from some additional information about how brain regions were identified (e.g. particularly how V1 and V2 were distinguished given how close together they are).

      The brain regions were identified by referring to the Mouse Brain Atlas. As the skull was intact, the location of bregma, lambda, and midline was clearly visible. We agree with the reviewer that strict separation of V1 and V2 could be difficult if we rely on the brain atlas alone. However, what we wanted to emphasize was that there was no specific localization of the vasomotion entrainment effect.

      7) Whilst the authors show that HOKR task performance and vasomotion amplitude are increased with repeated training to provide some support to their aim of investigating the functional significance of vasomotion with regards to information processing plasticity, the inclusion of some additional control experiments would provide stronger evidence to address this aim. For instance, if vasomotion signalling is blocked or reduced (e.g. using optogenetics or in an AD mouse model where arteriole amyloid load restricts vasomotion capacity), does flocculus-dependent task performance (e.g. HOKR eye movements) still improve with repeated exposure to the external stimulus.

      We agree that experimental intervention in vasomotion is ideal to test the functional significance of vasomotion. As pharmacological intervention lacks specificity, we are currently exploring an optogenetic approach. We had not considered using an AD mouse as a model of vasomotion restricted by amyloid, and we agree this would be an interesting model to study. However, the AD mouse model would also have deficits other than restricted vasomotion. On the other hand, we could test whether the repeated presentation of slowly oscillating visual stimuli has beneficial effects on the cognitive abilities of AD model mice.

      Reviewer #3 (Public Review):

      Summary:

      Here the authors show global synchronization of cerebral blood flow (CBF) induced by oscillating visual stimuli in the mouse brain. The study validates the use of endogenous autofluorescence to quantify the vessel "shadow" to assess the magnitude of frequency-locked cerebral blood flow changes. This approach enables straightforward estimation of artery diameter fluctuations in wild-type mice, employing either low magnification wide-field microscopy or deep-brain fibre photometry. For the visual stimuli, awake mice were exposed to vertically oscillating stripes at a low temporal frequency (0.25 Hz), resulting in oscillatory changes in artery diameter synchronized to the visual stimulation frequency. This phenomenon occurred not only in the primary visual cortex but also across a broad cortical and cerebellar surface. The induced CBF changes adapted to various stimulation parameters, and interestingly, repeated trials led to plastic entrainment. The authors control for different artefacts that may have confounded the measurements such as light contamination and eye movements but found no influence of these variables. The study also tested horizontally oscillating visual stimuli, which induce the horizontal optokinetic response (HOKR). The amplitude of eye movement, known to increase with repeated training sessions, showed a strong correlation with CBF entrainment magnitude in the cerebellar flocculus. The authors suggest that parallel plasticity in CBF and neuronal circuits is occurring. Overall, the study proposes that entrained "vasomotion" contributes to meeting the increased energy demand associated with coordinated neuronal activity and subsequent neuronal circuit reorganization.

      We thank the reviewer for providing a thorough summarization of our manuscript.

      Strengths:

      • The paper describes a simple and useful method for tracking vasomotion in awake mice through an intact skull.

      • The work controls for artefacts in their primary measurements.

      • There are some interesting observations, including the nearly brain-wide synchronization of cerebral blood flow oscillations to visual stimuli and that this process only occurs after mice are trained in a visual task.

      • This topic is interesting to many in the CBF, functional imaging, and dementia fields.

      We thank the reviewer for positively recognizing the strength of the paper.

      Weaknesses:

      • I have concerns with the main concepts put forward, regarding whether the authors are actually studying vasomotion as they state, as opposed to functional hyperemia (sensory-induced changes in blood flow), which is what they are actually studying. I recommend several additional experiments/analyses for them to explore. These mostly further characterize their effect, which will benefit the interpretations.

      We recognized that the terminology used in our paper was not explicitly explained. Traditionally, “vasomotion” is defined as the dilation and constriction of the blood vessels that occurs spontaneously at low frequencies in the 0.1 Hz range without any apparent external stimuli. Sensory-induced changes in the blood flow are usually called “hyperemia”. However, in our paper, we used the term, vasomotion, literally, to indicate both forms of “vascular” “motion”. Therefore, the traditional vasomotion was called “spontaneous vasomotion” and the hyperemia induced with slow oscillating visual stimuli was called “visually induced vasomotion”.

      Using our newly devised methods, we show the presence of “spontaneous vasomotion”. However, this spontaneous vasomotion was often fragmented and did not last long at a specific frequency. With visual stimuli that slowly oscillated at temporal frequencies close to the frequency of spontaneous vasomotion, oscillating hyperemia, or “visually induced vasomotion” was observed.

      • Neuronal calcium imaging would also benefit the study and improve the interpretations.

      In our paper, we mainly studied visually induced vasomotion (or visual stimulus-triggered vasomotion). Visual stimulation must first activate neurons, and the initial drive for vasomotion is likely triggered through neurovascular coupling. However, visually induced vasomotion is not observed in novice animals. Therefore, visually induced vasomotion is not a simple sensory reaction of the vasculature to neuronal activity in the primary visual cortex. We also do not know how the synchronized vasomotion spreads throughout the whole brain, or where the plasticity for vasomotion entrainment occurs. To identify the extent of the neuronal contribution to vasomotion triggering, whole-brain synchronization, and vasomotion entrainment, simultaneous neuronal calcium imaging would be ideal. However, because the signals of fluorescent Ca2+ indicators expressed in neurons would also be distorted by the “shadow” effect of the vasomotion, exquisite imaging techniques would be required.

      • The plastic effects in vasomotion synchronization that occur with training are interesting, but they could use an additional control for stress. Is this really a plastic effect, or is it caused by progressively decreasing stress as trials progress? I recommend a habituation control experiment.

      As also pointed out by Reviewer #2, we agree that whether stress affects visually induced vasomotion could be studied. Studying visually induced vasomotion in mice well habituated to the experimental apparatus would indicate whether stress is truly a confounding factor affecting vasomotion. Conversely, whether acutely induced stress can interfere with already entrained vasomotion could also be studied. In the experiments presented in Fig. 3, Texas Red was injected via the tail vein, which would be quite stressful for the mouse. Nevertheless, in the trained mouse, visually induced vasomotion could be observed regardless of the stress. It is likely that stress can interfere with the acquisition of vasomotion entrainment, but that already acquired entrainment is not canceled by the acute stress of a tail-vein injection. We agree that the relationship between stress and vasomotion, and the plasticity underlying vasomotion entrainment, could be investigated further.

      Appraisal

      I think the authors have an interesting effect that requires further characterization and controls. Their interpretations are likely sound and additional experiments will continue to support the main hypothesis. If brain-wide synchrony of blood flow can be trained and entrained by external stimuli, this may have interesting therapeutic potential to help clear out toxic proteins from the brain as seen in several neurodegenerative diseases.

      We thank the reviewer for the positive evaluation of our manuscript. Strong entrainment of visually induced vasomotion was observed with a simple presentation of slowly oscillating visual stimuli for several days. This is a totally non-invasive method to train the vasomotion capacity. As the reviewer recognizes, potential benefits for the treatment of dementia and neurodegenerative diseases could be evaluated with further studies.

    1. Author Response:

      We thank the reviewers and editor for their careful analysis of our manuscript and their appreciation of its strengths. Our plans to address the reviewers’ concerns regarding the weaknesses of the study are outlined below.

      Reviewing Editor (Public Review):

      “Weaknesses mainly concern the experiments and arguments leading to the authors' notion that Cav3 channels may partially compensate for the loss of Cav1.4 calcium currents in cone synapses. It is possible that the non-conducting Cav1.4 variant supports synapse development and the Cav3 channel then provides the calcium influx. However, in its current state, the study does not unequivocally assess Cav3 expression in wild-type cones, it lacks direct evidence of Cav3 expression and upregulation, e.g. via single cell transcriptomics, immunolabeling, or an elaboration on electrophysiology, and it does not test the authors' earlier idea that Cav1.4 might couple to intracellular calcium stores at photoreceptor synapses.”

      Current transcriptomic studies indicate that Cav3 transcripts are present at extremely low levels compared to those for Cav1.4 in cones of young mice (PMID 26000488, summarized in PMID 35650675), adult mice (PMID: 36807640), macaque (PMID 30712875), and human (PMID 31075224). Thus, it was somewhat surprising that Davison et al reported the presence of low voltage activated (LVA) Cav3-like currents with amplitudes that were ~50% of that for the Cav1 current in mouse cones at -40 mV (PMID 35803735). Using pharmacological criteria similar to those of Davison et al, we did not find functional evidence for an LVA current in cones of wild-type (WT) mouse retina: the Ca2+ current in our recordings was suppressed by the Cav1 antagonist isradipine (Fig 3a) but minimally affected in the expected voltage range by the Cav3 antagonist ML218 (Fig 3b). In WT mouse, voltage clamp steps from -90 mV to more depolarized voltages failed to show a transient inward current at onset (Fig 2e), which is a hallmark of LVA calcium currents. In addition, by standard physiological and pharmacological criteria, we could not identify LVA currents in cones of ground squirrel (Fig.3c,d) and macaque retina (Supp. Fig.S3). Our results argue against a significant role for LVA currents in mammalian cones.

      A problem that we discovered (as did Davison et al, their Fig.2C) was that Cav3 blockers (e.g., ML218 and Z944) have non-specific actions on the high voltage activated (HVA) Ca2+ current (presumably mediated by Cav1.4) in WT mouse cones. This is clearly shown in our Supp. figure S1a-b where ML218 causes a dose-dependent negative shift in the I-V relationship but also inhibition of current density in HEK293T cells transfected with Cav1.4. We are planning a second study to thoroughly characterize these actions of ML218 and Z944 on Cav1 channels as the results are important for understanding the actions of these drugs in cell-types with mixed populations of Cav1 and Cav3 channels.

      A second problem is that dihydropyridines (DHP) used in both our study and that of Davison et al (e.g., isradipine, nifedipine) block Cav1 channels incompletely and slowly at negative membrane potentials (PMID: 12853422). Due to the slow kinetics of DHP block, Cav1 currents in the presence of such blockers can appear to inactivate rapidly (see Fig.6A in PMID 11487617). Thus, the Cav current recorded in the presence of DHP blockers in WT mouse cones may represent unblocked Cav1.4-mediated currents that appear to inactivate rapidly and could therefore be misconstrued as being mediated by Cav3 channels.

      Given the caveats of the pharmacological approach, we agree that stronger evidence is needed to rule out a small contribution of Cav3 channels in WT mouse cones. As mentioned in our text, we have found that currently available Cav3 antibodies produce similar patterns of immunofluorescence in WT and the corresponding Cav3 KO retina, so analysis at the level of Cav proteins is not possible. Thus, we are planning to compare the relative expression of Cav channel genes in cones using drop-seq experiments on G369i KI and WT mouse retina. We also plan to elaborate on our electrophysiological dissection of the HVA and LVA currents.

      Among the 3 Cav3 subtypes, Cav3.2 was the only one detected in mouse cones by Davison et al using nested RT-PCR (PMID 35803735). Thus, we obtained the Cav3.2 mouse strain from JAX (B6;129-Cacna1htm1Kcam/J) and generated a Cav3.2 KO/G369i KI double mutant mouse strain. If the Cav3 current that appears in the G369i KI cones is mediated by Cav3.2, then it should be undetectable in cones of the double mutant mice. Moreover, if these Cav3.2 channels contribute to the residual cone synaptic responses in G369i KI mice, then the double mutant mice should be deficient in this regard. We will test these predictions in patch clamp recordings and ERGs.

      Finally, we will conduct Ca2+ imaging experiments in cone terminals of the WT vs G369i KI mice to test whether increased coupling of Cav channels to intracellular Ca2+ release may be involved in cone synaptic responses of the G369i KI mice.

      Reviewer #1 (Public Review):

      Weaknesses:

      “The major criticism that I have of the study is that it infers Ca channel molecular composition based solely on pharmacological analysis, which, as the authors note, is confounded by the cross-reactivity of many of the "specific" channel-type antagonists. The authors note that Cav3 mRNAs have been found in cones, but here, they do not perform any analysis to examine Cav3 transcript expression after G369i-KI nor do they examine Ca channel transcript expression in monkey or squirrel cones, which serve as controls of sorts for the G369i-KI (i.e. like WT mouse cones, cones of these other species do not seem to exhibit LVA Ca currents).”

      Actually, we also used non-pharmacological (i.e., electrophysiological) criteria to back up our interpretation that Cav3 channels contribute to the Cav current in cones primarily in the absence of functional Cav1.4 channels. For example, in Fig.2, we show that the Ca2+ current in G369i KI and Cav1.4 KO mice exhibit the hallmarks of the Cav3 channel (negative activation and inactivation voltages and window current, rapid inactivation), which are quite distinct from the Ca2+ currents in WT cones. In recordings of ground squirrel and macaque cones (Supp.Figs.S2-3), negative holding voltages do not unmask a LVA current according to various criteria. In addition to the transcriptomic approaches described above, we plan to elaborate on the electrophysiological evidence for the absence of a LVA current in WT mouse cones as part of the revision.

      “Secondarily, in Maddox et al. 2020, the authors raise the possibility that G369i-KI, by virtue of having a functional voltage-sensing domain-might couple to intracellular Ca2+ stores, and it seems appropriate that this possibility be considered experimentally here.”

      We will conduct Ca2+ imaging experiments in cone terminals of the WT vs G369i KI mice to test whether increased coupling of Cav channels to intracellular Ca2+ release may be involved in cone synaptic responses of the G369i KI mice.

      “As a minor point: the authors might wish to note - in comparison to another retinal ribbon synapse-that Zhang et al. 2022 (in J. Neuroscience) performed a study of mouse rod bipolar cells found a number of LVA and HVA Ca conductances in addition to the typical L-type conductance mediated by Cav1-containing channels.”

      We are aware of the extensive evidence for the expression of Cav3 channels in retinal bipolar cells (PMID 11604141, 22909426, 19275782, 35896423), and our recordings of cone bipolar cells in ground squirrel confirm this (Supp. Fig.S2D). We can add references to this work in our revision.

      Reviewer #2 (Public Review):

      Weaknesses:

      “The major critiques are related to the description of the Cav1.4 knock-in mouse as "sparing" function, which can be remedied in part by a simple rewrite, and in certain places, the data may need to be examined more critically. In particular, the authors should address features in the data presented in Figures 6 and 7 that seem to indicate that the retina of the Cav1.4 knock-in is not intact, but the interpretation given by the authors as "intact" is not appropriate and made without rigorous statistical testing.”

      We intended to use “sparing” and “intact” to indicate that cone synapses are present and to some extent functional, in contrast to their complete absence in the Cav1.4 KO mouse. However, we recognize this may be misinterpreted as “normal”. As suggested by the reviewer, we will revise our statistical analyses and text to clarify that cone synaptic responses do indeed differ significantly in G369i KI as compared to WT mice. We feel that this will be a strong addition to the study and will emphasize the key point that Cav3 cannot fully compensate for loss of Cav1.4 with respect to cone synapse structure and function.

      Reviewer #3 (Public Review):

      Weaknesses:

      “The study has been expertly performed but remains descriptive without deciphering the underlying molecular mechanisms of the observed phenomena, including the proposed homeostatic switch of synaptic calcium channels. Furthermore, a relevant part of the data in the present paper (presence of T-type calcium channels in cone photoreceptors) has already been identified/presented by previous studies of different groups (Macosko et al., 2015; pmid 26000488; Davison et al., 2021; pmid 35803735; Williams et al., 2022; pmid 35650675). The degree of novelty of the present paper thus appears limited.”

      We respectfully disagree that our paper lacks novelty. As indicated by Reviewer 2, a major advance of our study is in providing a mechanism that can explain the longstanding conundrum that congenital stationary night blindness type 2 mutations that would be expected to severely compromise Cav1.4 function do not produce complete blindness. We also disagree that the presence of T-type channels in cone photoreceptors has been unequivocally demonstrated, as the non-biased transcriptomic approaches show very little Cav3 transcript expression in mouse cones (PMIDs 26000488, 35650675, 36807640), macaque cones (PMID 30712875), and human cones (PMID 31075224). Transcription may not equate to translation, particularly at low expression levels. We also note that the one study to date that suggests a functional contribution of Cav3 channels in mouse cones (Davison et al., 2021; pmid 35803735) used a DHP to isolate the “LVA” current, which is problematic as described above. Our demonstration of minimal or undetectable Cav3-type currents in mammalian cones using physiological and pharmacological approaches, while a negative result, adds important context to the recent literature. As described in our response to the editor’s review, our planned revisions include testing whether Cav3 transcripts are upregulated in G369i KI cones and whether the Cav3.2 subtype suggested to be present in cones (PMID 35803735) contributes to Cav currents in these cells using Cav3.2 KO and Cav3.2 KO/G369i KI double mutant mice.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviews

      We would like to extend our thanks to the reviewers who took the time to carefully read our paper and provide thoughtful insights and suggestions on how to strengthen our conclusions. All reviewers agreed that our study presented strong data supporting a role for triglyceride lipase brummer (bmm) in regulating testis lipid droplets and spermatogenesis in Drosophila, and that our findings advance our understanding of lipid biology during sperm development. Reviewers made several helpful suggestions on how to strengthen our manuscript even further. Below, we outline how we revised our manuscript in response to reviewer comments to ensure we clearly communicate our data and conclusions with readers, and properly contextualize our findings.

      REVIEWER 1

      In this study, the authors investigate the role of triglycerides in spermatogenesis. This work is based on their previous study (PMID: 31961851) on triglyceride sex differences, in which they showed that somatic testicular cells play a role in whole-body triglyceride homeostasis. In the current study, they show that lipid droplets (LDs) are significantly more abundant in the stem and progenitor cell (pre-meiotic) zone of the adult testis than in the meiotic spermatocyte stages. The distribution of LDs anti-correlates with the expression of the triglyceride lipase Brummer (Bmm), which has higher expression in spermatocytes than in early germline stages. Analysis of a bmm mutant (bmm[1]), a P-element insertion that is likely a hypomorphic allele, and its revertant (bmm[rev]) as a control shows that bmm acts autonomously in the germline to regulate LDs. In particular, the number of LDs is significantly higher in spermatocytes from bmm[1] mutants than from bmm[rev] controls. Testes from males with global loss of bmm (bmm[1]) are shorter than controls and have fewer differentiated spermatids. The zone of bam expression, typically close to the niche/hub in WT, is now many cell diameters away from the hub in bmm[1] mutants. There is an increase in the number of GSCs in bmm[1] homozygotes, but this phenotype is probably due to the enlarged hub. However, clonal analyses of GSCs lacking bmm indicate that a greater percentage of the GSC pool is composed of bmm[1]-mutant clones than of bmm[rev] clones. This suggests that loss of bmm could impart a competitive advantage to GSCs, but this is not explored in greater detail. Despite the increase in the number of GSCs that are bmm[1]-mutant clones, there is a significant reduction in the number of bmm[1]-mutant spermatocyte and post-meiotic clones. This suggests that fewer bmm[1]-mutant germ cells differentiate than controls. To gain insights into triglyceride homeostasis in the absence of bmm, they perform mass spec-based lipidomic profiling. Analyses of these data show that triglycerides are the lipid class most affected by loss of bmm, supporting their model that excess triglycerides cause the spermatogenic defects in bmm[1]. Consistent with this model, a double mutant of bmm[1] and midway (mdy), which encodes diacylglycerol O-acyltransferase 1, reverts the bmm-mutant germline phenotypes.

      There are numerous strengths of this paper. First, the authors report rigorous measurements and statistical analyses throughout the study. Second, the authors utilize robust genetic analyses with loss-of-function mutants and lineage-specific knockdown. Third, they demonstrate the appropriate use of controls and markers. Fourth, they show rigorous lipidomic profiling. Lastly, their conclusions are appropriate for the results. In other words, they don't overstate the results.

      We thank the Reviewer for their positive assessment of our paper.

      There are a few weaknesses. Although the results support the germline-autonomous role of bmm in spermatogenesis, one potential caveat is that the mdy rescue was global, i.e., in both somatic and germline lineages. The authors did not recover somatic bmm clones, suggesting that bmm may be required for somatic stem cell self-renewal and/or niche residency. While this is beyond the scope of this paper, it is possible that somatic bmm does impact germline differentiation in a global bmm mutant.

      In the revised manuscript, we made several changes to address these points.

      1) We now clearly state when we used global versus germline-only loss of mdy to rescue bmm mutant phenotypes in the testis.

      “Notably, at least some of the effects of global loss of mdy on bmm1 males can be attributed to the germline:

      RNAi-mediated knockdown of mdy in the germline of bmm1 males partially rescued the defects in testis size (Figure 4I; Kruskal-Wallis rank sum test with Dunn’s multiple comparison test) and GSC variance (Figure S5J; p = 4.5 × 10⁻⁵ and 8.2 × 10⁻³ by F-test from the GAL4- and UAS-only crosses, respectively).”

      “Importantly, testes isolated from males with global loss of both bmm and mdy (mdyQX25/k03902;bmm1) had fewer LD than testes dissected from bmm1 males (Figures 5D, S5I; one-way ANOVA with Tukey multiple comparison test).”

      2) We also discuss the possibility that somatic bmm may play a role in germline differentiation in a global bmm mutant, and present phenotypic data on somatic bmm1 clones.

      “We also reveal a potential non-cell-autonomous role for somatic bmm. While there was no difference in the ratio of Zd-1-positive cells between homozygous clones and heterozygous clones in animals carrying the bmm1 or bmmrev alleles at 14 days post clone induction (Figure S4O; Kruskal-Wallis rank sum test), the distance from the hub at which Zd-1-positive clones resided was significantly decreased in bmm1 homozygous clones (Figure S4P; Kruskal-Wallis rank sum test). Together, these data indicate that bmm may play a cell-autonomous role in germline cells, and potentially a non-cell-autonomous role in somatic cells, to regulate spermatogenesis.”

      3) Finally, we clarify that we were unable to assess somatic LD. Specifically, this was a technical issue as the dye we use to visualize testis LD is incompatible with staining protocols to identify somatic cells. As a result, we were unable to count LD in somatic clones with confidence.

      “While we were unable to assess LD in bmm1 somatic clones, our data, taken together, reveal a previously unrecognized cell-autonomous role for bmm as a regulator of testis LD in germline cells.”

      Regarding data presentation, I have a minor point about Fig. 3L: why aren't all data shown as box plots (only Day 14 bmm[rev] does).

      In our revised manuscript Figure 4L does present a boxplot across all genotypes and times; the appearance of ‘no boxes’ is simply due to the large number of datapoints with a value of zero, which compress the box near the X-axis.

      Finally, the authors provide a detailed pseudotime analysis of snRNA-seq of the testis in Fig. S2A-D, but this analysis is not sufficiently discussed in the text.

      In the revised manuscript we added text to describe our pseudotime analysis of single-cell RNA seq data in more detail.

      “Using pseudotime analysis, we arranged the germline (Figure S2A) and the somatic cells (Figure S2B) based on their annotated developmental trajectory. The expression pattern of bmm in the germline matched our observation with bmm-GFP reporter (Figure S2C). While levels of the bmm-GFP reporter were lower in somatic cells, single-cell RNA sequencing data identified bmm expression in the somatic lineage that was higher in cells at later stages of development (Figure S2D). Additional neutral lipid- and lipid droplet-associated genes such as lipid storage droplet-2, Seipin, Lipin, and midway also showed differential regulation during differentiation (Figure S2C, S2D). Combined with our data on the location of testis LD, these data suggest that bmm upregulation in both somatic and germline cells during differentiation corresponds to the downregulation of testis LD. Supporting this, germline GFP levels were negatively correlated with testis LD in bmm-GFP flies (Figure 2A, 2C), suggesting regions with higher bmm expression had fewer LD.”

      Overall, the many strengths of this paper outweigh the relatively minor weaknesses. The rigorously quantified results support the major aim that appropriate regulation of triglycerides are needed in a germline cell-autonomous manner for spermatogenesis.

      This paper should have a positive impact on the field. First and foremost, there is limited knowledge about the role of lipid metabolism in spermatogenesis. The lipidomic data will be useful to researchers in the field who study various lipid species. Going forward, it will be very interesting to determine what triglycerides regulate in germline biology. In other words, what functions/pathways/processes in germ cells are negatively impacted by elevated triglycerides. And as the authors point out in the discussion, it will be important to determine what regulates bmm expression such that bmm is higher in later stages of germline differentiation.

      We agree with the reviewer about the many interesting future directions for this project. We added a model figure in the revised manuscript to visualize our findings and highlight remaining questions about how bmm and triglycerides support normal spermatogenesis in Drosophila (Fig. 6).

      REVIEWER 2

      Summary:

      Here, the authors show that neutral lipids play a role in spermatogenesis. Neutral lipids are components of lipid droplets, which are known to maintain lipid homeostasis, and to be involved in non-gonadal differentiation, survival, and energy. Lipid droplets are present in the testis in mice and Drosophila, but not much is known about the role of lipid droplets during spermatogenesis. The authors show that lipid droplets are present in early differentiating germ cells, and absent in spermatocytes. They further show a cell autonomous role for the lipase brummer in regulating lipid droplets and, in turn, spermatogenesis in the Drosophila testis. The data presented show that a relationship between lipid metabolism and spermatogenesis is congruous in mammals and flies, supporting Drosophila spermatogenesis as an effective model to uncover the role lipid droplets play in the testis.

      We thank the Reviewer for their positive assessment of our paper.

      Strengths and weaknesses:

      The authors do a commendably thorough characterization of where lipid droplets are detected in normal testes: located in young somatic cells, and early differentiating germ cells. They use multiple control backgrounds in their analysis, including w[1118], Canton S, and Oregon R, which adds rigor to their interpretations. The authors employ markers that identify which lipid droplets are in somatic cells, and which are in germ cells. The authors use these markers to present measured distances of somatic and germ cell-derived lipid droplets from the hub. Because they can also measure the distance of somatic and germ cells with age-specific markers from the hub, these results allow the authors to correlate position of lipid droplets with the age of cells in which they are present. This analysis is clearly shown and well quantified.

      The quantification of lipid droplet distance from the hub is applied well in comparing brummer mutant testes to wild-type controls. The authors measure the number of lipid droplets of specific diameters, and the spatial distribution of lipid droplets as a function of distance from the hub. These measurements quantitatively support their findings that lipid droplets are present in an expanded population of cells further from the hub in brummer mutants. The authors further quantify lipid droplets in germline clones of specified ages; the quantitative analysis here is displayed clearly, and supports a cell-autonomous role for brummer in regulating lipid droplets in spermatocytes.

      Data examining testis size and number of spermatids in brummer mutants clearly indicate the importance of regulating lipid droplets for spermatogenesis. The authors show beautiful images supported by rigorous quantification supporting their findings that brummer mutants have smaller testes with fewer spermatids at both 29 °C and 25 °C. There is also significant data supporting defects in testis size for 14-day-old brummer mutant animals compared to controls. The comparison of the number of spermatids at this age is not significant, which does not detract from the story but does not support sperm development defects specifically caused by brummer loss at 14 days. Their analysis clearly shows an expanded region beyond the testis apex that includes younger germ cells, supporting a role for lipid droplets in influencing germ cell differentiation during spermatogenesis.

      We thank the reviewer for pointing out this inaccuracy in our manuscript. In the revised manuscript we chose more precise language to describe defects in 14-day-old bmm mutants:

      “Defects in testis size were also observed at 14 days post eclosion, suggesting that testis size defects persist later into the life course (Figure S4C; Welch two-sample t-test). In contrast, the number of spermatid bundles per testis was not significantly different between bmm1 and bmmrev males at this age (Figure S4D; Welch two-sample t-test), potentially due to a large decrease in the number of spermatid bundles in 14-day-old bmmrev males (Figure 4C, S4D).”

      The authors present a series of data exploring a cell-autonomous role for brummer in the germline, including clonal analysis and tissue-specific manipulations. The clonal data indicating increased lipid droplets in spermatocyte clones, and a higher proportion of brummer mutant GSCs at the hub, are convincing and supported by quantitation. The authors also show a tissue-specific rescue of the brummer testis size phenotype by knocking down mdy specifically in germ cells, which is also supported by statistically significant quantitation. The authors present data examining the number of spermatocyte and post-meiotic clones 14 days after clonal induction. While the data they present are significant with a 95% confidence interval and a p value of 0.0496, this significance is not as robust as other values reported in the study, and it is unclear how much information can be gained from that specific result.

      We thank the reviewer for raising this point. In the revised manuscript we displayed the p-value clearly in the text and on the figure to ensure our statistical output is clear for readers to evaluate our conclusions regarding bmm mutant clones 14 days after clone induction. We also state that the finding should be reproduced by others given that the statistical significance of this result was not as strong as our other data.

      “Because we observed significantly fewer bmm1 spermatocyte and spermatid clones at 14 days after clone induction (Figure 4K,4L; p = 0.0496, Kruskal-Wallis rank sum test), these effects on germline development may represent a cell-autonomous role in regulating spermatogenesis for bmm in this cell type. Given that the statistical significance of this finding was not as strong as for our other data, future studies should repeat this experiment with more samples.”

      The authors do a beautiful job of validating where they detect brummer-GFP by presenting their own pseudotime analysis of publicly available single cell RNA sequencing data. Their data is presented very clearly, and supports expression of brummer in older somatic and germline cells of the age when lipid droplets are normally not detected. The authors also present a thorough lipidomic analysis of animals lacking brummer to identify triglycerides as an important lipid droplet component regulating spermatogenesis.

      Impact:

      The authors present data supporting the broad significance of their findings across phyla. This data represents a key strength of this manuscript. The authors show that loss of a conserved triglyceride lipase impacts testis development and spermatogenesis, and that these impacts can be rescued by supplementing diet with medium chain triglycerides. The authors point out that these findings represent a biological similarity between Drosophila and mice, supporting the relevance of the Drosophila testis as a model for understanding the role of lipid droplets in spermatogenesis. The connection buttresses the relevance of these findings and this model to a broad scientific community.

      We thank the Reviewer very much for their positive assessment of our paper!

      REVIEWER 3

      In this manuscript, Chao et al seek to understand the role of brummer, a triglyceride lipase, in the Drosophila testis. They show that Brummer regulates lipid droplet degradation during differentiation of germ and somatic cells, and that this process is essential for normal development to progress. These findings are interesting and novel, and contribute to a growing realisation that lipid biology is important for differentiation.

      We thank the Reviewer for their positive comments about our manuscript.

      Major comments:

      1) The data in Figs 1 and 2, while helpful in setting the scene, do not add much to what was previously shown by the same group, namely that lipid droplets are present in both early germ cells and early somatic cells in the testis, and that Bmm regulates their degradation (PMID: 31961851). Measuring the distance of lipid droplets from the hub, while helpful in quantifying what is apparent, that only stem and early differentiated stages have lipid droplets, is not as informative as the way data are presented later (Fig. 2I), where droplets in specific stages are measured. Much of this could be condensed without much overall loss to the manuscript.

      We thank the reviewer for this comment. In our revised manuscript we edited the first part of the paper while still preserving the detailed characterization that builds upon our previous paper.

      2) It would be important to show images of the clones from which the data in Fig. 2I are generated. The main argument is that Bmm regulates lipid droplets in a cell autonomous manner; these data are the strongest argument in support of this and should be emphasised at the expense of full animal mutants (which could be moved to supplementary data).

      We thank the reviewer for this comment. In the revised manuscript we added a figure showing lipid droplets in control and bmm mutant spermatocyte clones in Fig. 3A, 3B with a quantification of this data in Figure 3C.

      Similarly, the title of Fig. S2 ("brummer regulates lipid droplets in a cell autonomous manner") should be changed as the figure has no experiments with cell (or cell-type)-specific knockdowns/mutants. This figure does show changes in lipid droplets in both lineages in bmm mutants, so an appropriate title could be "brummer regulates lipid droplets in both germ and soma".

      We thank the reviewer for this comment; we adjusted the Figure 2 legend title in the revised manuscript to “brummer regulates lipid droplets in both germline and somatic cells of the testis”.

      3) Interestingly, the clonal data show that bmm is dispensable in germ cells until spermatocyte stages, as no increase in lipid droplet number is seen until then. This should be more clearly stated, as it indicates that the important function of Bmm is to degrade lipid droplets at the transition from spermatogonial to spermatocyte stages. This is consistent with the phenotypes observed in which late stage germ cells are reduced or missing. However, the effect on niche retention of the mutant GSCs at the expense of neighbouring wildtype GSCs is hard to explain. Are lipid droplets in mutant GSCs larger than in control? Is there any discernible effect of bmm mutation on lipids in GSCs? Additionally, bam expression is delayed, suggesting that bmm may have roles on cell fate in earlier stages than its roles that can be detected on lipid droplets.

      We thank the reviewer for this comment. We included more text in the revised manuscript to clarify the key role bmm plays in regulating lipid droplets at the spermatogonia-spermatocyte transition.

      “Because we observed no significant effect of cell-autonomous bmm loss on LD at any other stage of germline development (Figure 3C), this suggests bmm function is not required to regulate LD at early stages of germ cell development. Instead, our data suggests bmm plays a role in regulating LD at the spermatogonia-spermatocyte transition.”

      We also added more detail to our description of how bmm affects lipid droplets in cells at the earliest stages of germline development.

      “Given that we detected no effect of cell-autonomous bmm loss on the number of GSC LD (Fig. 3C), more work will be needed to understand how bmm regulates GSC at a stage prior to its effects on LD number.”

      4) The bmm loss-of-function phenotype could be better described. Some of the data is glossed over with little description in the text (see for example the reference to Fig. 3A-C). For instance, in the discussion, the text states "loss of bmm delays germline differentiation leading to an accumulation of early-stage germ cells" (p13, l.25960). However, this accumulation has not been clearly shown, or at least described in the manuscript. Most of the data show a reduction (or almost complete absence) of differentiated cell types. This could indeed be due to delayed differentiation, or alternatively to a block in differentiation or to death of the differentiated cells. The clonal data presented show a decrease in the number of cells recovered, but do not allow inferences as to the timing of differentiation, making it hard to distinguish between the various possibilities for the lack of differentiated spermatids. Apart from data showing that GSCs are more likely to remain at the niche, no further data are shown to support the fact that mutant germ cells accumulate in early stages. While additional experiments could help resolve some of these issues, much of this could also be resolved by tempering the conclusions drawn in the text.

      We thank the reviewer for these comments. In the revised manuscript we temper our conclusions regarding bmm’s precise role in spermatogenesis by discussing different mechanisms (e.g. differentiation or death) that could lead to the phenotypes we observe.

      “This regulation is important for sperm development, as our data indicates that loss of bmm causes a decrease in the number of differentiated cell types. This reduction in differentiated cell types may be attributed to a delay in differentiation, a block in differentiation, or to a loss of differentiated cells through cell death. Future studies will therefore be essential to resolve why bmm loss causes a reduction in differentiated cell types.”

      5) In the discussion (p.14, l.273 onwards), the authors suggest that products of triglyceride breakdown are important for spermatogenesis. However, an alternative interpretation of the results presented here (especially those using the midway mutant) could be that triglycerides impede normal differentiation directly. Indeed, preventing the cells' ability to produce triglycerides in the first place can rescue many of the defects observed. A better discussion of these results with a model for the function of triglycerides and their by-products would be a great improvement to this manuscript.

      We thank the reviewer for this comment. To ensure our data are clearly communicated to readers, we added a model to the paper suggesting how triglyceride and its by-products influence spermatogenesis (Fig. 6), and text to clarify that triglyceride could potentially impede differentiation.

      “It will also be important to determine whether it is the loss of metabolites produced by bmm’s enzymatic action, or an increase in triglycerides, that leads to the reduction in differentiated cell types during spermatogenesis. Together, these experiments will provide critical insight into how triglyceride stored within testis LD contributes to overall cellular lipid metabolism during spermatogenesis.”

      Together, these changes will strengthen our overall finding that bmm-mediated regulation of testis triglyceride is important for normal sperm development. Because our findings in flies align with and extend data from rodent models, the developmental mechanisms we uncovered about how triglyceride lipase bmm regulates testis lipid droplets and sperm development will likely operate in other species.  

      Reviewer #1 (Recommendations For The Authors):

      I have a minor concern about methodology: how were spermatocytes identified? I ask because data in Figure 3 indicate that there is a significant delay in germline differentiation in the bmm[1] mutant, with relatively smaller germ cells throughout the apical half of the testis. Typical large spermatocyte-like cells are not clearly obvious to me in Fig. 3.

      We thank the Reviewer for suggesting we add more clarity to how we identified spermatocytes. We state in the revised manuscript how we identify spermatocytes:

      “Cells in the testis region occupied by primary spermatocytes were identified by their large cell size and decondensed chromosome staining occupying three nuclear domains [120].”

      Also, we note that while it is difficult to see where the bmm1 testes have spermatocytes in Fig. 4E, this is due to the large number of early-stage cells in this close-up image. The spermatocytes can be more easily seen in Fig. 4I and 4I’ when the whole testis is included in the image.

      Reviewer #2 (Recommendations For The Authors):

      • Lines 197-198 mention "Boule-positive area," "individualization complexes," and "waste bags." It would be helpful to the reader to explain what these measurements are to help contextualize the data shown related to these statements.

      We thank the Reviewer for this comment. We added the following text to the revised manuscript:

      “Because Boule-positive area, individualization complexes, and waste bags are all markers for later stages in sperm development, these data indicate the loss of bmm causes a reduction in differentiated cell types.”

      • Line 162 states a defect in sperm development observed in 14-day-old bmm[1] males, but the data presented in Figure S3D does not show a significant difference. The words "sperm development" should be removed from this sentence.

      We thank the Reviewer for pointing out this inaccurate statement. We fixed the statement as follows in the revised manuscript:

      “Defects in testis size were also observed at 14 days post eclosion, suggesting testis size defects persist later into the life course (Figure S4C; Welch two-sample t-test). In contrast, the number of spermatid bundles per testis was not significantly different between bmm1 and bmmrev males at this age (Figure S4D; Welch two-sample t-test), potentially due to a large decrease in the number of spermatid bundles in 14-day-old bmmrev males (Figure 4C, S4D).”

      • Line 294 has a typo: "regulating" should likely be "regulated"

      We thank the Reviewer for pointing out this mistake, which we corrected.

      • Line 456 should include the length of time for heat shock

      We thank the Reviewer for pointing out this omission. We now include these details:

      “Adult males were collected at 3-5 days post-eclosion and heat-shocked three times at 37°C for 30 min followed by a 10 min rest period at room temperature between heat shocks.”

      • Methods section beginning on Line 442 might include an explanation of how hub area was quantified.

      We thank the Reviewer for this suggestion. We now include the following information:

      “Hub size was measured by quantifying FasIII-positive area of the testis.”

      • Figure 1 legend could benefit from adding a statement on how spermatocytes (arrowheads) were identified

      We thank the Reviewer for this suggestion; we now refer the reader to the more detailed description in the methods section.

      • Figure 2A should present the merged panel in A' first. The legend states that Panel A shows Lipid Droplets, but LipidTox is not shown until A'.

      We thank the Reviewer for this suggestion; we now clarify that the text refers to panels A-A''''.

      • Figure 2I would benefit from a key, to emphasize that these are individual cell clones, highlighting the idea of cell autonomous effects of bmm in the spermatocytes. Showing example images of spermatocyte clones with increased lipid droplets could also emphasize this result. The legend for this panel should note the statistical test done to confirm significance in the SC result.

      We agree with the Reviewer and have added images of the LD in bmm1 spermatocyte clones in Figure 3B, and the quantification in Figure 3C. We explicitly state the significance of this result and the statistical test in Figure 3 legend.

      • In Figure 3, the cell autonomous data clearly indicates that there are higher proportions of bmm mutant GSCs occupying the hub compared to control GSCs. It could be worth stating whether this observation indicates an increased ability of bmm mutant GSCs to compete for occupying space at the hub.

      We thank the Reviewer for pointing out this potential implication of our data, which we acknowledge in the revised version of our manuscript:

      “Future studies will also need to confirm whether bmm1 mutant GSCs show an increased ability to occupy space at the hub.”

      • In Figure 4, I suggest changing the title of Panel B to "Proportion of significant species in each lipid class" for clarity.

      We made this change in the Figure 5 legend (Figure 5 is the corresponding figure in the revised manuscript).

      • It could be valuable to quantify the number of spermatids in the germline specific mdy knockdown, which would lend additional support to a cell autonomous requirement for bmm in spermatogenesis

      We added a sentence to the revised manuscript recognizing that this is an interesting experiment for studies on the role of germline triglyceride in promoting spermatogenesis.

      “While future studies will need to test whether germline-specific loss of mdy also rescues spermatid number defects in bmm1 males, our data suggest bmm-mediated regulation of testis triglyceride plays a previously unrecognized role in regulating sperm development.”

      Reviewer #3 (Recommendations For The Authors):

      1) bmm-GFP does not show expression in somatic cells yet previous work by the same group has shown a requirement for bmm in the testis soma using C587-Gal4.

      We thank the Reviewer for raising this issue. While the reporter shows low GFP expression in the somatic cells, the single-cell RNA sequencing data we analyze suggests bmm is expressed in these cells. We address this issue in the revised manuscript as follows:

      “While levels of the bmm-GFP reporter were lower in somatic cells, single-cell RNA sequencing data identified bmm expression in the somatic lineage that was higher in cells at later stages of development (Figure S2D).”

      2) p.11 l.200-202 "Because we recovered fewer bmm1 spermatocyte and spermatid clones 14 days after clone induction (Figure 3K,3L; Kruskal-Wallis rank sum test), this effect on germline development represents a cell-autonomous role for bmm." This sentence should be rephrased as the phenotype could be a combination of autonomous roles within the germline and non-autonomous roles in supporting cyst cells.

      “We also reveal a potential non-cell-autonomous role for somatic bmm. While there was no difference in the ratio of Zd-1-positive cells between homozygous clones and heterozygous clones in animals carrying the bmm1 or bmmrev alleles at 14 days post clone induction (Figure S4O; Kruskal-Wallis rank sum test), the distance from the hub at which Zd-1-positive clones reside was significantly decreased in bmm1 homozygous clones (Figure S4P; Kruskal-Wallis rank sum test). Together, these data indicate bmm may play a cell-autonomous role in germline cells, and potentially a non-cell-autonomous role in somatic cells, to regulate spermatogenesis.”

      3) The labelling in Fig. 3 is confusing - presumably the graph in 3C refers to spermatid bundles [this comment applies to other figures showing spermatid bundle numbers], not individual spermatids, while the graph in 3G refers to the proportion of the total GSC pool that is contained within the clone. The data in Fig. 3C are not described in the main text.

      We adjusted the confusing labelling from ‘number of spermatids’ to ‘spermatid bundles’, as suggested. We also changed the title of panel Fig. 3G (now 4G) as suggested and mentioned Fig. 3C (now Fig. 4C) in the text.

      4) On p.9, comments are speculative or seek to draw comparisons with the broader literature and would seem to belong more to the discussion (eg "our data suggests flies are a good model to study how bmm/ATGL influences sperm development" - also there is a typo, it should be "suggest").

      We thank the Reviewer for raising concern about our speculative statement; we changed the text as follows in the revised manuscript:

      “This identifies similarities between flies and mice in fertility-related phenotypes associated with whole-body loss of bmm/ATGL.”

      5) The length of the heat shocks used for clone induction should be specified in the methods (rather than just the period in between heat shocks).

      We now include more information on clone induction:

      “Adult males were collected at 3-5 days post-eclosion and heat-shocked three times at 37°C for 30 min followed by a 10 min rest period at room temperature between heat shocks. After heat-shock, the flies were incubated at room temperature until dissection.”

      6) p.8 l.132 "bmm-GFP accurately reproduces changes to bmm mRNA levels". This sentence should be rephrased.

      We thank the Reviewer for this comment and rephrased the sentence:

      “We first examined bmm expression in the testis by isolating this organ from flies carrying a bmm promoter driven GFP transgene (bmm-GFP) that recapitulates many aspects of bmm mRNA regulation [77].”

      7) p.9 l.172 "we used germline-specific marker" should read "we used an antibody against the germline-specific marker".

      We corrected this inaccurate statement in our revised manuscript.

      8) p.10 several lines, "GSC" should be "GSCs".

      We corrected this inaccurate use of GSC in our revised manuscript.

      9) p.13 l.247 should read "variance in GSC numbers".

      Thank you, this error was fixed.

    1. Author Response

      We thank the editors and the reviewers for their assessment of our revised manuscript. Please see below our answers to the recommendations by Reviewer #2.

      Figure S2F - Seems like a very narrow range of parameters. Is there some fine tuning here?

      The range of values of tau_P that yields previous-trial biases is bounded from below and above for the following reasons. Above a certain value of tau_P (and therefore a long integration time), the bump that formed in the previous trial is not strong enough to remain stable for a long time, and therefore dissipates by the time the current trial starts (especially when adaptation is fast, towards the left of the third panel). Below a certain value, instead, the integration timescale is short enough to quickly form a representation of the current trial, hence the bump from the previous trial quickly dissipates (due to mutual inhibition). This interplay between the integration and adaptation timescales, together with the fact that the phenomenon we consider is bounded in time (how close the activity bump is to the second stimulus of the previous trial, which is presented between -22.4 and -5.6 seconds from the moment we are considering), yields a bounded region for tau_P. This region, however, appears narrow because of the limited number of points we considered in the simulation grid.

      Regarding my comment on lapse at the boundaries (old line 221). Lapse parameters in psychometric curves correspond to errors on the "easy" trials. But the mechanistic explanation for lapse trials is that there is a non-zero probability for the subject to respond in a manner that is random and independent of the stimulus. In the case of extreme stimuli, this is the only reason for errors, and thus looking at the edges of the psychometric curves allows one to calculate the lapse rate. But the usual assumption for the underlying mechanism is that the subject lapses in all trials, regardless of stimulus. If I understand correctly, this is different from the mechanistic reason for lapses in the network model, which was described as something that happens more at the edges than in the center, or, more generally, as a stimulus-dependent effect.

      We thank the reviewer for this clarification. The reviewer is right that in our mechanistic model, lapses (as defined by errors on easy trials) are more likely to occur for extreme stimuli, due to their proximity to the boundary of the attractor. Such errors also occur for non-extreme stimuli, when delay intervals are long enough for the bump in PPC to drift to the boundaries. In experiments, lapse trials as described by the reviewer occur for multiple different reasons; for lapses that are independent of the stimuli, mechanisms such as attention have been thought to play a role, but these are not included in our model.
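For reference, the stimulus-independent lapse assumption described by the reviewer corresponds to the common lapse-augmented psychometric function sketched below. This is an illustrative sketch only, not the fitting procedure used in the manuscript: it assumes a cumulative-Gaussian core, and the parameter names `mu`, `sigma`, `gamma` (lower lapse rate) and `lam` (upper lapse rate) are hypothetical. Because the lapse terms are mixed in identically at every stimulus value, the error rate on the most extreme trials converges to the lapse rate; a stimulus-dependent lapse, as in the network model, would not take this form.

```python
import math


def psychometric(x, mu=0.0, sigma=1.0, gamma=0.05, lam=0.05):
    """P(respond "s1 > s2") for stimulus difference x, with stimulus-independent lapses.

    Hypothetical parameters: mu (bias), sigma (slope), gamma (lower lapse rate),
    lam (upper lapse rate). The lapse terms are identical for every x.
    """
    # Cumulative Gaussian core, written with math.erf to avoid extra dependencies.
    core = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return gamma + (1.0 - gamma - lam) * core


# On the easiest trials the core saturates, so the remaining errors are lapses only.
upper_error = 1.0 - psychometric(10.0)  # approaches lam
lower_error = psychometric(-10.0)       # approaches gamma
print(round(upper_error, 3), round(lower_error, 3))
```

Fitting `gamma` and `lam` from the edges of the curve is only meaningful under this stimulus-independent assumption, which is the distinction the reviewer draws.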

      What are the parameters for the distributions (skewed, bimodal, ...)?

      These parameters are reported in the legend of Fig.6, where the distributions appear.

      Bump with adaptation. Sorry for the draft-like comment. I don't think the existing studies are in the form you describe. I do think it might be useful to point readers to these studies. If an interested reader wishes to understand network dynamics in this and similar scenarios, it might be useful to have the pointers. The reference I had in mind was Romani, S., & Tsodyks, M. (2015). Short‐term plasticity based network model of place cells dynamics. Hippocampus, 25(1), 94-105.

      We thank the reviewer for the clarification, and we will include this reference in the Version of Record.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study about the mechanisms underlying our capacity to represent and hold recent events in our memory and how they are influenced by past experiences. A key aspect of the model put forward here is the presence of discrete jumps in neural activity within the posterior parietal region of the cortex. The strength of evidence is largely solid, with some weaknesses noted in the methodology. Both reviewers suggested ways in which this aspect of the model could be tested further and how conflicts with previously published experimental results, in particular the study by Papadimitriou et al 2014 in Journal of Neurophysiology, could be resolved.

      We thank the editors for their assessment. As mentioned in the cover letter, we have addressed all the reviewers’ concerns and would like to request an update of the assessment to reflect the revisions we have made.

      Public Reviews:

      We thank both reviewers for their careful reading and feedback that helped clarify many aspects of the model. Below, we address their comments.

      Reviewer #1 (Public Review):

      This paper aims to explain recent experimental results that showed deactivating the PPC in rats reduced both the contraction bias and the recent history bias during working memory tasks. The authors propose a two-component attractor model, with a slow PPC area and a faster WM area (perhaps mPFC, but unspecified). Crucially, the PPC memory has slow adaptation that causes it to eventually decay and then suddenly jump to the value of the last stimulus. These discrete jumps lead to an effective sampling of the distribution of stimuli, as opposed to a gradual drift towards the mean that was proposed by other models. Because these jumps are single-trial events, and behavior on single events is binary, various statistical measures are proposed to support this model. To facilitate this comparison, the authors derive a simple probabilistic model that is consistent with both the mechanistic model and behavioral data from humans and rats. The authors show data consistent with model predictions: longer interstimulus intervals (ISIs) increase biases due to a longer effect over the WM, while longer intertrial intervals (ITIs) reduce biases. Finally, they perform new experiments using skewed or bimodal stimulus distributions, in which the new model better fits the data compared to Bayesian models.

      The mechanistic proposed model is simple and elegant, and it captures both biases that were previously observed in behavior, and how these are affected by the ISI and ITI (as explained above). Their findings help rethink whether our understanding of contraction bias is correct.

      On the other hand, the main proposal - discrete jumps in PPC - is only indirectly verified.

      We agree with the reviewer that the evidence for discrete jumps in PPC comes from behavioural results (short-term, n-back trial biases), and not from neural data. However, we believe electrophysiological investigations are beyond the scope of the current manuscript, and future work is needed to further verify the results.

      The model predicts a systematic change in bias with inter-trial-interval. Unless I missed it, this is not shown in the experimental data. Perhaps the self-paced nature of the experiments allows one to test this?

      We thank the reviewer for this great suggestion.

      We had not previously looked at this in the data for the reason that in the simulations, the ITI is set to either 2.2, 6 or 11 seconds, whereas the experiment is self-paced. Therefore, any comparison with the simulation should be made carefully.

      However, following the reviewer’s suggestion, we did look at the change in the bias with the inter-trial interval, by dividing trials according to ITIs lower than 3 seconds (“short” ITI), and higher than 3 seconds (“long” ITI). This choice was motivated by the shape of the distribution of ITIs, which is bimodal, with a peak around 1 second, and another after 3 seconds (new Fig 8F). Hence, we chose 3 seconds as it seemed a natural division. However, 3 seconds also happens to be approximately the 75th percentile of the distribution, and this means that there is much more data in the “short” ITI than the “long” ITI set. In order to have sufficient data in the “long” ITI for clearer effects we used our entire dataset: the negatively skewed, and also two bimodal distributions (of which only one was shown in the manuscript, for succinctness). This larger dataset allows us to clearly see not only a decreasing contraction bias with increasing ITI (Fig 8G), but also a decreasing one-trial-back attractive bias with increasing ITI (Fig 8H). We have uploaded all the datasets as well as scripts used to analyze them to this repository: https://github.com/vboboeva/ParametricWorkingMemory_Data.
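The ITI split described above can be sketched as follows. This is a minimal illustration, not the analysis scripts in the linked repository: the function names are hypothetical, trials with an ITI of exactly 3 s are assumed to fall in the "long" group, and fraction correct stands in as a placeholder for whichever bias measure is computed within each group.

```python
import numpy as np


def split_by_iti(iti_s, threshold_s=3.0):
    """Boolean masks for 'short' (ITI < threshold) and 'long' (ITI >= threshold) trials."""
    iti_s = np.asarray(iti_s, dtype=float)
    is_short = iti_s < threshold_s
    return is_short, ~is_short


def percent_below(iti_s, threshold_s=3.0):
    """Percentage of trials whose ITI falls below the threshold (its percentile rank)."""
    return 100.0 * float(np.mean(np.asarray(iti_s, dtype=float) < threshold_s))


def accuracy_by_group(correct, is_short, is_long):
    """Placeholder per-group measure: fraction of correct trials in each ITI group."""
    correct = np.asarray(correct, dtype=float)
    return float(correct[is_short].mean()), float(correct[is_long].mean())


# Hypothetical self-paced session: most ITIs near 1 s, a few beyond 3 s.
itis = np.array([0.8, 1.1, 0.9, 4.2, 1.3, 0.7, 5.0, 1.0])
correct = np.array([1, 0, 1, 1, 1, 0, 1, 1])
is_short, is_long = split_by_iti(itis)
print(percent_below(itis))  # 75.0
print(accuracy_by_group(correct, is_short, is_long))
```

When the threshold sits near the 75th percentile, the "long" group holds far fewer trials, which is why pooling several datasets was needed to see clear effects there.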

      The data in some of the figures in the paper are hard to read. For instance, Figure 3B might be easier to understand if only the first 20 trials or so are shown with larger spacing. Likewise, Figure 5C contains many overlapping curves that are hard to make out.

      We have limited the dynamics in Fig 3B to the first 50 trials for better visibility. Likewise, as suggested, we report the standard error of the mean instead of the standard deviation in old Fig 5C (new Fig 6C) – this allows for the different curves to be better discernible.

      There is a gap between the values of tau_PPC and tau_WM. First - is this consistent with reports of slower timescales in PFC compared to other areas?

      Recent studies by Xiao-Jing Wang and colleagues (Refs. 1-3 below) suggest that this may be the case. In Wang et al. 2023 (Ref. 1 below), the authors use a generative model to study the concept of bifurcation in space in working memory, which is accompanied by an inverted-V shape of the time constants as a function of cortical hierarchy.

      Briefly, they propose a generative model of the cortex with modularity, incorporating repeats of a canonical local circuit connected via long-range connections. In particular, the authors define a hierarchy for each local circuit. At a critical point in this hierarchy axis, there is a phase transition from monostability to bistability in the firing rate. This means that a local circuit situated below the critical point will only display a low activity steady state, while those above the critical point additionally display a persistent activity steady state.

      The model predicts a critical slowing down of the neural fluctuations at the critical point, resulting in an inverted-V shape of the time constants as a function of the hierarchy. They test the predictions of their model (the bifurcation in space and the inverted-V-shaped time constants as a function of the hierarchy) on connectome-based models of the macaque and mouse cortex. Interestingly, both datasets show similar behavior. In particular, during working memory, frontal areas (higher in the hierarchy, e.g. area 24c in macaques) have a smaller time constant relative to posterior parietal areas (lower in the hierarchy, like LIP or f7). We have now cited this new work.

      [1] https://www.biorxiv.org/content/10.1101/2023.06.04.543639v1

      [2] https://elifesciences.org/articles/72136

      [3] https://www.biorxiv.org/content/10.1101/2022.12.05.519094v3.abstract

      Second - is it important for the model, or is it mostly the adaptation timescale in PPC that matters?

      We have run simulations producing a phase diagram with tau_theta^P on the x-axis, tau^P on the y-axis and, in color, the fraction of trials in which the bump is in the vicinity of a target (Fig S2 F) before the network is presented with the second stimulus. This target can be the first stimulus s_1 (left), the mean over stimuli (middle) or the previous trial’s stimulus (right). The white point corresponds to the parameters of the default network.

      In this phase diagram, the lowest value that tau_P takes is tau_WM=0.01. When tau_P=tau_WM, the bump is rarely in the vicinity of 1-trial-back stimulus, and we can see that tau_PPC should be greater than tau_WM in order for the model to yield 1-trial back effects. We conclude that it is indeed important for tau_PPC > tau_WM.

      We have included this in Fig S2 F of the manuscript.
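The quantity shown in color in the phase diagram, the fraction of trials in which the bump lies near a given target just before the second stimulus, can be sketched as below. This is a hypothetical illustration: the vicinity threshold `eps` is an assumed value that the response does not specify, and the bump positions are made-up numbers rather than simulation output.

```python
import numpy as np


def fraction_near(bump_pos, target_pos, eps=0.05):
    """Fraction of trials with |bump - target| < eps (eps is a hypothetical threshold).

    target_pos may be a per-trial array (e.g. s1 or the previous trial's s2)
    or a scalar (e.g. the mean over stimuli).
    """
    bump_pos = np.asarray(bump_pos, dtype=float)
    target_pos = np.asarray(target_pos, dtype=float)
    return float(np.mean(np.abs(bump_pos - target_pos) < eps))


# Hypothetical bump positions at the end of the delay, one per trial.
bump = np.array([0.2, 0.5, 0.61, 0.9])
s1 = np.array([0.2, 0.3, 0.3, 0.1])         # current first stimulus
s2_prev = np.array([0.7, 0.5, 0.6, 0.1])    # previous trial's second stimulus
stim_mean = 0.5                             # mean over stimuli

print(fraction_near(bump, s1), fraction_near(bump, stim_mean), fraction_near(bump, s2_prev))
```

Evaluating this measure for every (tau_theta^P, tau^P) pair on a simulation grid would produce one panel per target.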

      Regarding the relation to other models, the model by Hachen et al (Ref 45) also has two interacting memory systems. It could be useful to better state the connection, if it exists.

      The model proposed by Hachen et al is conceptually different in that one module stores the mean of the sensory stimulus; it could be related to a variant of our model where adaptation is turned off in the PPC network (Fig S2 A). However, the task they model is also different: subjects have to learn the location of a boundary according to which the stimulus is classified as ‘weak’ or ‘strong’, set by the experimenter. Hence, it is a task where learning is needed - this contrasts with the task we are modelling, where only working memory is required. How task demands reconfigure existing circuits via dynamics and/or learning to perform different computations is a fascinating area of research that is outside the scope of this work.

      Reviewer #2 (Public Review):

      Working memory is not error free. Behavioral reports of items held in working memory display several types of bias, including contraction bias and serial dependence. Recent work from Akrami and colleagues demonstrates that inactivating rodent PPC reduces both forms of bias, raising the possibility of a common cause.

      In the present study, Boboeva, Pezzotta, Clopath, and Akrami introduce circuit and descriptive variants of a model in which the contents of working memory can be replaced by previously remembered items. This volatility manifests as contraction bias and serial dependence in simulated behavior, parsimoniously explaining both sources of bias. The authors validate their model by showing that it can recapitulate previously published and novel behavioral results in rodents and neurotypical and atypical humans.

      Both the modeling and the experimental work are rigorous, providing compelling evidence that a model of working memory in which reports sometimes sample past experience can produce both contraction bias and serial dependence, and that this model is consistent with behavioral observations across rodents and humans in the parametric working memory (PWM) task.

      Evidence for the model advanced by the authors, however, remains incomplete. The model makes several bold predictions about behavior and neural activity, untested here, that either conflict with previous findings or have yet to be reported but are necessary to appropriately constrain the model.

      First, in the most general (descriptive) formulation of the Boboeva et al. model, on a fraction of trials items in working memory are replaced by items observed on previous trials. In delayed estimation paradigms, which allow a more direct behavioral readout of memory items on a trial-by-trial basis than the PWM task considered here, reports should therefore be locked to previous items on a fraction of trials rather than display a small but consistent bias towards previous items. However, the latter has been reported (e.g., in primate spatial working memory, Papadimitriou et al., J Neurophysiol 2014). The ready availability of delayed estimation datasets online (e.g., from Rademaker and colleagues, https://osf.io/jmkc9/) will facilitate in-depth investigation and reconciliation of this issue.

      As pointed out by the reviewer, in the PWM task that we are modelling here, the activity in the network is used to make a binary decision. However, it is possible to directly analyse the network activity before the onset of the second stimulus.

      In their manuscript, Papadimitriou et al. study a memory-guided saccade task in nonhuman primates and argue that the animals display a small but consistent bias towards previous items (Fig 2). In that figure, the authors compute the error as the difference between the saccade direction and target direction in each trial. They compute this error for all trials in which the preceding trial’s target direction is between 35° and 85° relative to the current trial (counterclockwise with respect to the current trial’s target). They discover that the residual error distribution is unimodal with a mode at 1.29° and a mean at 2.21° (positive, so towards the preceding target’s direction), from which they deduce a small but systematic bias towards previous trial targets.

      We have computed a similar measure for our network with default parameters (Table 1) by subtracting the location of the bump at the end of the delay interval (s_hat(t), the ‘saccade’) from the location of the first stimulus in the current trial (s1(t), the ‘target’). We did this for all trials in which s1(t) = 0.2 and s2(t-1) takes specific values. The resulting distributions have two modes. The first, centered at zero, corresponds to trials in which the bump is not displaced in WM. The second appears at s1(t) - s2(t-1) and corresponds to the displacements towards the preceding trial’s stimulus described in the main text. If, instead, we restrict the analysis to previous stimuli within a small range close to s1(t) (similar to Papadimitriou et al), the distribution of residual errors appears unimodal, because the two modes merge. Importantly, there is large variability around the second mode, reflecting more complex dynamics in the network. As can be seen in Fig 3B, the location of the WM bump is not always slaved to that of the PPC bump in a straightforward way: owing to the adaptation in the PPC, the global inhibition in the connectivity kernel and the interleaved design of the delay intervals, the WM bump can be displaced in nontrivial ways (see also Recommendation no 4), producing the dispersion around the second peak. It remains to be seen whether such patterns can be observed in data from previous work on continuous working memory recall (including Papadimitriou et al). To our knowledge, however, such a detailed analysis of errors at the level of individual trials has not been done.
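
      The residual-error computation described above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the analysis code used for Fig 5: the helper name `residual_errors` is hypothetical, and the synthetic trials assume, purely for illustration, that on 30% of trials the bump ends near s2(t-1) with some dispersion and otherwise stays near s1(t).

```python
import numpy as np

def residual_errors(s1, s_hat, s2_prev, s1_value, s2_prev_value, tol=1e-9):
    """Residual errors s1(t) - s_hat(t) for trials matching given s1(t) and s2(t-1)."""
    s1, s_hat, s2_prev = (np.asarray(a, dtype=float) for a in (s1, s_hat, s2_prev))
    mask = (np.abs(s1 - s1_value) < tol) & (np.abs(s2_prev - s2_prev_value) < tol)
    return s1[mask] - s_hat[mask]

# Hypothetical trials: the bump either stays near s1(t) or ends near s2(t-1).
rng = np.random.default_rng(0)
n = 5000
s1 = np.full(n, 0.2)
s2_prev = rng.choice([0.3, 0.5], size=n)
displaced = rng.random(n) < 0.3
s_hat = np.where(displaced,
                 s2_prev + rng.normal(0.0, 0.05, size=n),   # broad, displaced mode
                 s1 + rng.normal(0.0, 0.005, size=n))       # narrow mode at zero error

err = residual_errors(s1, s_hat, s2_prev, s1_value=0.2, s2_prev_value=0.5)
counts, edges = np.histogram(err, bins=np.linspace(-0.6, 0.2, 41))
```

      Under these assumptions the histogram of `err` has a narrow peak at 0 and a broader peak near s1(t) - s2(t-1) = -0.3; choosing values of s2(t-1) closer to s1(t) moves the second peak towards zero until the two merge.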

      In summary, this analysis shows that the dynamics in our network correspond to neither of two limit cases: 1) a small, systematic bias on every trial, or 2) a large error that occurs only rarely. Rather, the dispersion around both modes suggests that the dynamics in our model are a mixture of these two cases.

      We have also performed another typical analysis, reported in several continuous recall tasks (e.g. Jazayeri and Shadlen 2010) where contraction bias has been reported. We plot WM bump locations after the delay period for every trial (s_hat(t)), and their averages, against the nominal value of s1(t). We see that the mean WM location deviates from the identity line toward the mean values of s1(t), again showing contraction bias as an average effect, while individual trials follow the dynamics explained above.
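
      The average-based measure of contraction bias can be sketched as follows (Python/NumPy). The synthetic data are hypothetical and assume a simple replacement rule in which, on a fraction of trials, the reported value is a stimulus drawn from the stimulus distribution rather than s1(t); the fitted slope of s_hat(t) against s1(t) then falls below 1, and the per-stimulus means are pulled towards the mean of s1(t).

```python
import numpy as np

def contraction_summary(s1, s_hat):
    """Mean report per stimulus value and the least-squares slope of s_hat on s1."""
    s1 = np.asarray(s1, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    levels = np.unique(s1)
    means = np.array([s_hat[s1 == v].mean() for v in levels])
    slope, intercept = np.polyfit(s1, s_hat, 1)
    return levels, means, slope, intercept

# Hypothetical replacement rule: with probability p_replace the report is an earlier stimulus.
rng = np.random.default_rng(0)
n, p_replace = 20000, 0.3
stim_values = np.array([0.2, 0.4, 0.6, 0.8])
s1 = rng.choice(stim_values, size=n)
previous = rng.choice(stim_values, size=n)
replaced = rng.random(n) < p_replace
s_hat = np.where(replaced, previous, s1) + rng.normal(0.0, 0.01, size=n)

levels, means, slope, intercept = contraction_summary(s1, s_hat)
```

      Under this assumed rule the expected slope is 1 - p_replace (0.7 here), so a slope below 1 summarizes contraction bias as an average effect, even though each individual trial is either intact or replaced.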

      We have now included a new section on continuous recall (Sect. 1.5) and a new figure (Fig 5), which detail the two analyses described above. Unfortunately, the analysis of freely available datasets from delayed estimation tasks is beyond the scope of this work, and we leave it to future studies.

      Second, the bulk of the modeling efforts presented here are devoted to a circuit-level description of how putative posterior parietal cortex (PPC) and working-memory (WM) related networks may interact to produce such volatility and biases in memory. This effort is extremely useful because it allows the model to be constrained by neural observations and manipulations in addition to behavior, and the authors begin this line of inquiry here (by showing that the circuit model can account for effects of optogenetic inactivation of rodent PPC).

      Further experiments, particularly electrophysiology in PPC and WM-related areas, will allow further validation of the circuit model. For example, the model makes the strong prediction that WM-related activity should display 'jumps' to states reflecting previously presented items on some trials. This hypothesis is readily testable using modern high-density recording techniques and single-trial analyses.

      As mentioned in response to the previous comment, we note again that the bump ‘displacement’ in the WM network has complex dynamics. The examples we provided in Fig 1A and 2B mainly show cases in which jumps occur in the WM network, but this is not the only type of dynamics we observe in the model. There are also instances in which the continuity of the model causes drift across values, and we have now replaced the right panel in Fig 2B with one such instance, to emphasize that this displacement towards the previous trial’s stimulus (s2(t-1)) can occur in various ways. For a more thorough analysis, we computed the distance between s1(t) and the position of the bump in the WM network at the end of the delay period, s_hat(t), conditioned on specific values of s1(t) and s2(t-1) (Fig 5C). This distribution has two modes: one centered around 0, corresponding to correct trials in which the stimulus is kept in WM (s1(t) = s_hat(t)), and another centered around s2(t-1), the location of the second stimulus of the previous trial, corresponding to trials in which the bump is displaced. Note, as we explain in Sect. 1.5, the large dispersion around this second mode, which suggests that the bump is not always displaced to that specific location and may undergo drift.

      We agree with the reviewer that future electrophysiological experiments (or analysis of existing datasets) are necessary for validation of these results.

      Finally, while there has been a refreshing movement away from an overreliance on p-values in recent years (e.g., Amrhein et al., PeerJ 2017), hypothesis testing, when used appropriately, provides the reader with useful information about the amount of variability in experimental datasets. While the excellent visualizations and apparently strong effect sizes in the paper mitigate the need for p-values to an extent, the paucity of statistical analysis does impede interpretation of a number of panels in the paper (e.g., the results for the negatively skewed distribution in 5D, and the reliability of the attractive effects in 6a/b for 2 and 3 trials back).

      We share the reviewer’s criticism of the misuse of p-values. To allow a clearer interpretation of old Fig 5D (new Fig 7E), we have examined the 2- and 3-trials-back biases using our entire dataset: the negatively skewed distribution and two bimodal distributions (of which only one was shown in the manuscript). This larger dataset of 43 subjects (approximately 17,200 trials) allows us to see clearly the 2- and 3-trials-back attractive biases, and the effect that the delay interval exerts on them.

      Reviewer #1 (Recommendations For The Authors):

      Fig 5 A&C - It might be beneficial to separate the distribution of stimuli from the performance. It is hard to read the details of the performance, especially with error bars.

      Following the next recommendation, we have replaced the standard deviations with standard errors of the mean; we hope this makes the performance easier to read.

      Fig 5C. The number of participants should be written. Perhaps standard errors instead of standard deviation?

      We have now changed the standard deviation to standard errors of the mean and included the number of participants in the figure.

      Fig 2B - hard to understand, because there is no marking of where "perfect" memory of s1 would be.

      The perfect memory of s1 is shown in the upper panel as black bars.

      Fig 3B. dot number 9 (blue, around 0.7) - why is WM higher than stimulus?

      This trial has a long ISI (blue means 10 s). During this delay, the bump in the PPC, under the influence of adaptation, drifts far below the first stimulus (note that the previous trial also had its first stimulus at the same location; as a result, the adaptive thresholds have built up significantly, causing the bump to move away from that location). During this delay period, neurons in the WM network receive inputs from the PPC network: if this input is strong enough, it can disrupt an existing bump; if not, it still exerts an inhibitory influence on the existing bump via the global inhibition in the connectivity. This can cause an existing bump to drift slowly in a random direction and eventually dissipate. Note that the lines in Fig 2B represent the neuron with the maximal activity; this activity may be a stable bump, or an unstable bump that may soon dissipate.
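
      The build-up of adaptive thresholds mentioned above can be illustrated with a minimal sketch. The update rule below (threshold-linear rates, thresholds relaxing towards D times the rate with time constant tau_theta, Euler integration) and all parameter values are assumptions for illustration, not the equations of our model. The sketch also omits recurrent connectivity, so it shows only why a repeatedly stimulated location becomes less excitable, not the resulting bump drift.

```python
import numpy as np

def adapt_thresholds(stim, theta, n_steps, dt=1.0, tau_theta=100.0, D=0.5):
    """Hypothetical threshold-linear units with adaptive thresholds.

    Each step: rates r = max(0, stim - theta); theta relaxes towards D * r."""
    theta = theta.copy()
    for _ in range(n_steps):
        r = np.maximum(0.0, stim - theta)
        theta += (dt / tau_theta) * (-theta + D * r)
    return theta

n = 100
positions = np.arange(n)
stim = np.exp(-0.5 * ((positions - 30) / 3.0) ** 2)   # input bump centred on neuron 30
theta0 = np.zeros(n)

r_before = np.maximum(0.0, stim - theta0)             # response before adaptation
theta = adapt_thresholds(stim, theta0, n_steps=1000)
r_after = np.maximum(0.0, stim - theta)               # same input, after adaptation
```

      At steady state each threshold approaches D·stim/(1 + D), so the stimulated neurons respond less to the same input, while distant neurons keep thresholds near zero.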

      Other examples with similar dynamics include trials 43 and 54.

      L167 fewer -> smaller

      We have now corrected this.

      Fig 3C - bump can also be in between. Is this binned?

      We have not binned the length of the attractor; to produce that figure, we check whether the position of the neuron with the maximal firing rate is within a distance of ±5% of the length of the whole line attractor from the target location.
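
      The ±5% criterion can be written as a short function. Mapping neuron index i to position i/(N-1) along the line attractor is an assumption of this sketch, as is the function name.

```python
import numpy as np

def bump_at_target(rates, target, tol=0.05):
    """True if the most active neuron lies within ±tol (fraction of the line) of target.

    Assumes neuron i sits at position i / (N - 1) on a line attractor spanning [0, 1]."""
    rates = np.asarray(rates, dtype=float)
    peak_position = np.argmax(rates) / (rates.size - 1)
    return bool(abs(peak_position - target) <= tol)

rates = np.zeros(101)
rates[48:53] = [0.2, 0.8, 1.0, 0.8, 0.2]   # hypothetical bump peaking at neuron 50 (position 0.5)
print(bump_at_target(rates, 0.53))   # True: 0.03 away from the target
print(bump_at_target(rates, 0.60))   # False: 0.10 away from the target
```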

      L221 Lapse at the boundary of attractor. This seems very different from behavior. Specifically, if it is in the boundaries, it should be stimulus dependent.

      We apologize, but we were unable to understand the reviewer’s comment.

      L236 are -> is

      We have now corrected this.

      Fig S4 - should be mostly in main text.

      Part of this figure is in Fig 6A, but given the amount of detail, we think Supplementary Material is better suited.

      L253-254. Differences across all distributions - very minor except the bimodal case.

      That is correct, this is why we conducted the experiment with the bimodal distribution, to better differentiate the predictions of the two models.

      L273 extra comma after "This probability"

      We have now corrected this.

      ITI was only introduced in section 1.5.2. Perhaps worth mentioning the default 5s value earlier in the paper.

      We now mention this in lines 97-98.

      Fig S6B title: perhaps "previous stimuli"?

      We have now corrected this.

      L364 "in A given trial"

      Equation 2 - no decay term?

      Thank you for pointing out this error, we have now corrected this.

      Equations 5 and 6: are j^W and j^P the indices of neurons in those populations?

      Yes, j^W indexes neurons in the WM network, and j^P those in the PPC. We have now added this in the text for clarity.

      Bump with adaptation - other REFs? Sandro?

      We are aware of continuous bump attractors implementing short-term synaptic plasticity in various studies (including by Sandro Romani), but not in the form we have described. Could the reviewer kindly point us towards the relevant literature?

      Free boundary - what is the connectivity for neurons 1 and N? Is it weaker than others? Is the integral still 1? Does this induce some bias on the extreme values?

      The connectivity of the network is all-to-all. However, as expressed by Eq. (3), the distance-dependent contribution to the weights, K, decreases exponentially as we move from neuron 1 onwards, and from neuron N down. The sum (or integral, in the large-N limit) of the K_ij for j on either side of neuron i is unity only when i is sufficiently far from 1 or N. We have rephrased the paragraph starting in line 516 to make this clearer.
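
      The boundary effect on the distance-dependent weights can be checked numerically. This sketch assumes a purely exponential kernel K_ij ∝ exp(-|x_i - x_j|/d_0), normalized so that a neuron far from both ends has a row sum of 1, with hypothetical values N = 1000 and d_0 = 0.02; it is not a transcription of Eq. (3), and it omits any global inhibitory term.

```python
import numpy as np

def exp_kernel(n, d0):
    """Distance-dependent weights exp(-|x_i - x_j| / d0) on a line with free ends.

    Normalized so that a neuron far from both ends has a row sum of 1
    (an assumption of this sketch)."""
    x = np.arange(n) / n                              # neuron positions on [0, 1)
    K = np.exp(-np.abs(x[:, None] - x[None, :]) / d0)
    offsets = np.arange(-(n - 1), n) / n              # all offsets on an unbounded line
    return K / np.exp(-np.abs(offsets) / d0).sum()

K = exp_kernel(1000, 0.02)
row_sums = K.sum(axis=1)
print(row_sums[500], row_sums[0], row_sums[-1])   # ≈ 1.0, ≈ 0.51, ≈ 0.51
```

      With these values, rows in the bulk sum to 1, while the first and last rows sum to roughly 0.51, because about half of the kernel falls outside the line.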

      The presence of a boundary could introduce a bias in theory, but in practice, it affects the dynamics only when the bump drifts sufficiently close to it. The smallest stimulus in the simulated task has amplitude 0.2, with width 0.05, which implies the activation of 50 neurons on either side of neuron 400. If one compares this with the width of the kernel K in stimulus space (d_0 = 0.02), which spans ~10 neurons, we can see that the bump of activity stays mostly far from the boundary. Rarely, when several long delay intervals happen to occur consecutively, the bump in the PPC can drift beyond the location corresponding to either the minimum or the maximum stimulus.

      Code availability?

      Code simulating the dynamics of the network as well as analysing the resulting data can be found in the following repository: https://github.com/vboboeva/ParametricWorkingMemory

      Code used to analyse human behavioural data and fit them with our statistical model can be found in this repository: https://github.com/vboboeva/ParametricWorkingMemory_Data

      Code used to run the auditory PWM experiments with human subjects (adapted from Akrami et al 2018) can be found here: https://github.com/vboboeva/Auditory_PWM_human

      L547 stimuli

      We have now corrected this.

      Equation 14 uses both stimuli. Was this the same for the rest of analysis in the paper (first figures for instance)?

      This equation was used for all GLM analyses (Figs 9 and S6).
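
      A GLM of this kind can be sketched as a logistic regression of the binary choice on the current and previous stimuli. The predictors, the weights used to generate the synthetic choices and the Newton-Raphson (IRLS) fitting routine below are illustrative assumptions; this is not a transcription of Equation 14.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS).

    Returns weights [intercept, w_1, ..., w_k] for the columns of X."""
    X = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))           # predicted P(choice = 1)
        grad = X.T @ (y - p)                       # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])  # negative Hessian
        w = w + np.linalg.solve(hess, grad)        # Newton step
    return w

# Hypothetical synthetic data: choice 1 means "s1 > s2".
rng = np.random.default_rng(2)
n = 20000
s1, s2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
s1_prev, s2_prev = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
X = np.column_stack([s1, s2, s1_prev, s2_prev])
true_w = np.array([0.0, 10.0, -10.0, 0.0, 2.0])   # hypothetical generating weights
logit = np.column_stack([np.ones(n), X]) @ true_w
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

w = fit_logistic(X, y)
```

      The fitted weights recover the signs used to generate the data: positive for s1(t), negative for s2(t), near zero for s1(t-1) and positive for s2(t-1).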

      D0 is very small (0.02). Does this mean that activity is essentially discrete in the model? Fig 1A & 2B - the two examples of model activity suggest this is the case. In other words - are there cases where the continuity of the model causes drift across values? Can you show an example (similar to Fig 1A)?

      Since this point was raised earlier, we refer the reviewer to our response to the first comment, to Fig 2B and to Sect. 1.5.

      Table 1 - inter trial interval 6. Text says 5

      We have now corrected this in the text.

      Reviewer #2 (Recommendations For The Authors):

      In addition to my review above, I just have a few minor comments:

      • If I understood correctly, the squares inside the purple rectangle in Figure 1B are meant to show a gradation from red to blue, but this was hard to make out in the pdf.

      The squares all lie on one side or the other of the diagonal; therefore, they do not show any gradation.

      • line 164: "The resulting dynamics... [are]?"

      We have corrected this in the text.

      • Fig 7B legend: "The network performance is on average worse for longer ITIs" – correct?

      This was a mistake; we have replaced ‘worse’ with ‘better’.

      Other comments

      We realized that the colorbars in Figs 1B, 2C, 7B (new 8B), S2C, S3A and S5B reported incorrect values of the fraction classified. We have corrected this in the new version of the manuscript.

      We also found a minor mistake in one of our analysis scripts, which computed the n-trials-back biases for different delay intervals. Correcting it did not change our results; in fact, it made the effects clearer. The figures concerned are Fig 3F and new Fig 7E.

    1. Author Response

      eLife assessment

      This study presents important findings for understanding cortical processing of color, binocular disparity, and naturalistic textures in the human visual cortex at the spatial scale of cortical layers and columns, using state-of-the-art high-resolution fMRI methods at ultra-high magnetic field strength (7 T). Solid evidence supports an interesting layer-specific informational connectivity analysis to infer information flow across early visual areas for processing disparity and color signals. While the question of how the modularity of representation relates to cortical hierarchical processing is interesting and fundamental, the finding that texture does not map onto the previously established columnar architecture in V2 is suggestive but would benefit from further controls. The successful application of high-resolution fMRI methods to study the functional organization along cortical columns and layers is relevant to a broad readership interested in general neuroscience.

      Thank you for your assessment of our manuscript "Mesoscale functional organization and connectivity of color, disparity, and naturalistic texture in human second visual area". We have carefully considered the public reviews and have outlined our plans of revision by providing point-by-point responses to the reviewers’ comments.

      Reviewer #1 (Public Review):

      To support the finding that texture is not represented in a modular fashion, additional possibilities must be considered. These include (a) the effectiveness and specificity of the texture and control stimuli, (b) further analysis of possible structure in the images that may have been missed, and (c) the limitations of imaging resolution.

      Thank you for your suggestions. We will provide evidence and additional analyses to show that there was indeed a large difference in high-order statistical information between the texture and control stimuli in our study, so the contrast between the two stimuli should be effective in localizing the processing of high-order texture information. Another possible reason for the weaker texture selectivity in the current study, compared with previous studies, is the smaller number of images used and the slower rate of image presentation. Although our fMRI results at 1-mm isotropic resolution did not show modular processing of naturalistic texture in CO-stripe columns, this does not exclude the possibility that smaller modules exist at a scale finer than the current fMRI resolution. We will discuss these limitations in the revised manuscript.

      More in-depth analysis of subject data is needed. The apparent structure in the texture images in the peripheral fields of some subjects calls for more detailed analysis, e.g., of its relationship to eccentricity, and for a 'modularity index' to quantify the degree of modularity.

      We will perform further analysis based on your suggestion, especially regarding the relationship between eccentricity and modulation index. We will discuss this possibility in the revised manuscript.

      Given the known modular organization in V4 and V3 (e.g., for color, orientation, curvature), did the images reveal these organizations? If so, the connectivity analysis would be improved by using such ROIs. This would further strengthen the hierarchical scheme.

      Thank you for your suggestion. The informational connectivity analyses used highly informative voxels chosen by feature selection, which may already capture information from the modular organizations in these higher visual areas. We will examine the functional maps for possible modular organizations.

      Reviewer #2 (Public Review):

      In lines 162-163, it is stated that no clear columnar organization exists for naturalistic texture processing in V2. In my opinion, this should be rephrased. As far as I understand, Figure 2B refers to the analysis used to support the conclusion. The left and middle bar plots only show a circular analysis since ROIs were based on the color and disparity contrast used to define thin and thick stripes. The interesting graph is the right plot, which shows no statistically significant overlap of texture processing with thin, thick, and pale stripe ROIs. It should be pointed out that this analysis does not dismiss a columnar organization per se but instead only supports the conclusion of no coincidence with the CO-stripe architecture.

      Reviewer #1 also raised a similar concern. We agree that there may be a smaller functional module of textures in area V2 at a finer spatial scale than our fMRI resolution. We will rephrase our conclusions to be more precise.

      In Figure 3, cortical depth-dependent analyses are presented for color, disparity, and texture processing. I acknowledge that the authors took care of venous effects by excluding outlier voxels. However, the GE-BOLD signal at high magnetic fields is still biased to extravascular contributions from around larger veins. Therefore, the highest color selectivity in superficial layers might also result from the bias to draining veins and might not be of neuronal origin. Furthermore, it is interesting that cortical profiles with the highest selectivity in superficial layers show overall higher selectivity across cortical depth. Could the missing increase toward the pial surface in other profiles result from the ROI definition or overall smaller signal changes (effect size) of selected voxels? At least, a more careful interpretation and discussion would be helpful for the reader.

      We will discuss the limitations of cortical depth-dependent analysis using GE-BOLD fMRI. All our stimuli produced robust activations in these visual areas; thus, the flat laminar profiles of the modulation indices are unlikely to be caused by smaller signal changes. We will show the original BOLD responses in addition to the modulation index.

      I was slightly surprised that no retinotopy data was acquired. The ROI definition in the manuscript was based on a retinotopy atlas plus manual stripe segmentation of single columns. Both steps have disadvantages because they neglect individual differences and are based on subjective assessment. A few points might be worth discussing: (1) In lines 467-468, the authors state that V2 was defined based on the extent of stripes. This classical definition of area V2 was questioned by a recent publication (Nasr et al., 2016, J Neurosci, 36, 1841-1857), which showed that stripes might extend into V3. Could this have been a problem in the present analysis, e.g., in the connectivity analysis? (2) The manual segmentation depends on the chosen threshold value, which is inevitably arbitrary. Which value was used?

      The retinotopic atlas on the standard surface is usually quite accurate in defining the boundaries of early visual areas. Although some stripes may extend into V3, these patterns should be more robust in V2. In our analysis, we selected only stripes with clear organization within the boundaries of the retinotopic atlas. Thus, the signal contribution from V3 is likely to be small and would not affect the pattern of results. In addition, because the results could differ substantially between V3 and V2, we will compare the patterns of results from these areas in additional analyses. The threshold for segmentation was |T| > 2; we will clarify this in the Methods.

      The use of 1-mm isotropic voxels is relatively coarse for cortical depth-dependent analyses, especially in the early visual cortex, which is highly convoluted and has a small cortical thickness. For example, most layer-fMRI studies use a voxel size of around isotropic 0.8 mm, which has half the voxel volume of 1 mm isotropic voxels. With increasing voxel volume, partial volume effects become more pronounced. For example, partial volume with CSF might confound the analysis by introducing pulsatility effects.

      We agree that the 1-mm isotropic voxel is much larger in volume than the 0.8-mm isotropic voxel, but the difference in resolution along the cortical depth is not large. Besides our study, other studies have also shown that fMRI at 1-mm isotropic resolution can resolve cortical depth-dependent signals. Also, our fMRI slices were oriented perpendicular to the calcarine sulcus, so the higher in-plane resolution should also help in resolving depth-dependent signals. We will discuss these issues of fMRI resolution in the revised manuscript.
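
      The arithmetic behind this comparison is short. The cortical thickness of 2 mm used below is a hypothetical round number for early visual cortex, and counting voxels along one axis assumes that axis is aligned with the cortical normal.

```python
voxel_volume_ratio = 0.8 ** 3 / 1.0 ** 3   # ≈ 0.512: 0.8-mm voxels have about half the volume
thickness_mm = 2.0                         # hypothetical cortical thickness
samples_1mm = thickness_mm / 1.0           # 2.0 voxels across the depth
samples_08mm = thickness_mm / 0.8          # 2.5 voxels across the depth
depth_gain = samples_08mm / samples_1mm    # ≈ 1.25, i.e. 25% more samples along the depth
print(round(voxel_volume_ratio, 3), round(depth_gain, 2))
```

      Halving the voxel volume thus increases the number of samples across the cortical depth along one axis only by a factor of about 1.25.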

      The SVM analysis included a feature selection step stated in lines 531-533. Although this step is reasonable for the training of a machine learning classifier, it would be interesting to know if the authors think this step could have reintroduced some bias to remaining draining vein contributions.

      Several precautions were taken in the ROI definition to reduce the influence of large draining veins. The same number of voxels was selected from each cortical depth for the SVM analysis, so there was no bias towards the superficial layers, which are susceptible to draining veins. Also, since both feedforward and feedback connections involved the superficial voxels, any remaining influence of large draining veins should be comparable between the two connections.
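
      Selecting the same number of voxels from each depth bin can be sketched as follows. The function name and the choice of score (for example, a training-set |t| value, so that selection does not use test data) are assumptions for illustration.

```python
import numpy as np

def select_voxels_per_depth(scores, depth_bin, k):
    """Return, for each depth bin, the indices of its k highest-scoring voxels."""
    scores = np.asarray(scores, dtype=float)
    depth_bin = np.asarray(depth_bin)
    selected = {}
    for d in np.unique(depth_bin):
        idx = np.flatnonzero(depth_bin == d)
        order = np.argsort(scores[idx])[::-1]   # highest scores first
        selected[int(d)] = idx[order[:k]]
    return selected

# Hypothetical example: 12 voxels in 3 depth bins (0 = deep, 2 = superficial),
# with superficial voxels scoring highest overall.
scores = np.array([0.5, 1.0, 0.2, 0.8, 1.5, 0.3, 1.2, 0.9, 3.0, 2.5, 2.8, 2.0])
depth_bin = np.repeat([0, 1, 2], 4)
sel = select_voxels_per_depth(scores, depth_bin, k=2)
```

      A global selection of the top 4 voxels would pick only superficial (bin 2) voxels here, whereas the per-depth selection keeps two voxels in every bin.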

      Reviewer #3 (Public Review):

      The authors tend to overclaim their results.

      Thank you for your comments. We will add more control analyses to strengthen our findings and will discuss the results appropriately.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This article describes a useful Python-based image-analysis tool for bacteria growing in the 'mother-machine' microfluidic device. This new method for image segmentation and tracking offers a user-friendly graphical interface based on the previously developed, promising environment for image analysis 'Napari'. The authors demonstrate the usefulness of their software and its robust performance by comparing it to other methods used for the same purpose. The comparison provides solid support for the new method, although it would have been even stronger if tested using data sets from other groups. This article will be of interest to scientists who utilize the 'mother machine', not least because it also provides a short overview of how to set up this widely used device.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors aim to develop an easy-to-use image analysis tool for the mother machine that is used for single-cell time-lapse imaging. Compared with related software, they tried to make this software more user-friendly for non-experts with a design of "What You Put Is What You Get". This software is implemented as a plugin of Napari, which is an emerging microscopy image analysis platform. The users can interactively adjust the parameters in the pipeline with good visualization and interaction interface.

      Strengths:

      • Updated platform with great 2D/3D visualization and annotation support.

      • Integrated one-stop pipeline for mother machine image processing.

      • Interactive user-friendly interface.

      • The users can have a visualization of intermediate results and adjust the parameters.

      We thank the reviewer for their positive comments.

      Weaknesses:

      • Based on the presentation of the manuscript, it is not clear that the goals are fully achieved.

      • Although there is great potential, there is little evidence that this tool has been adopted by other labs.

      • The comparison of Otsu and U-Net results does not make much sense to me. The systematic bias could be adjusted by threshold change. The U-Net output is a probability map with floating point numbers. This output is probably thresholded to get a binary mask, which is not mentioned in the manuscript. This threshold could also be adjusted. Actually, Otsu is a segmentation method and U-Net is an image transformation method and they should not be compared together. U-Net output could also be segmented using Otsu.

      We agree that the comparison of the classical and U-Net results may be misleading. As the reviewer points out, the issue ultimately comes down to thresholding. Indeed, the thresholds of both the Otsu and U-Net outputs could be adjusted to bring them into line with each other. The comparison between the Otsu pipeline and the U-Net pipeline is meant to illustrate that any pipeline (whichever methods it uses) may be highly sensitive to the value of a user-input (or hard-coded) threshold.
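
      The sensitivity of a measured cell length to the threshold can be illustrated with a one-dimensional toy model: an ideal 40-pixel cell is blurred with a Gaussian point-spread function (σ = 3 pixels, hypothetical values) and then binarized at different thresholds. This sketch is not part of MM3.

```python
import numpy as np

def blurred_cell_profile(length_px, sigma_px, n=200):
    """Intensity profile of an ideal uniform cell blurred by a Gaussian PSF."""
    x = np.arange(n)
    start = (n - length_px) // 2
    cell = ((x >= start) & (x < start + length_px)).astype(float)
    half = int(4 * sigma_px)
    offsets = np.arange(-half, half + 1)
    psf = np.exp(-0.5 * (offsets / sigma_px) ** 2)
    psf /= psf.sum()                          # unit-sum PSF preserves total intensity
    return np.convolve(cell, psf, mode="same")

def measured_length(profile, threshold):
    """Length (in pixels) of the region above the threshold."""
    return int(np.count_nonzero(profile > threshold))

profile = blurred_cell_profile(length_px=40, sigma_px=3.0)
lengths = {t: measured_length(profile, t) for t in (0.3, 0.5, 0.7)}
print(lengths)
```

      The measured length shrinks as the threshold increases; in this toy case only the threshold at half of the in-cell intensity returns the true 40 pixels, so any fixed threshold translates directly into a systematic length bias.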

      We have clarified the discussion to emphasize that the comparison is not specifically between U-Net and Otsu but between the two pipelines (lines 238 - 257).

      We have also clarified that the U-Net probability map output was binarized with a threshold of 0.5 (lines 538-541). We note the same activation function and threshold are used in DeLTA. As the reviewer points out, Otsu’s method could indeed be applied to threshold the U-Net output as well. What we referred to as the “Otsu” MM3 method itself uses Otsu thresholding coupled with a Euclidean distance transform and a Random Walker algorithm. For clarity we now refer to it as a classical or non-learning method in the text.
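
      Otsu’s method can also be applied to a probability map, as the reviewer suggested. The sketch below implements Otsu’s threshold with NumPy (the histogram cut that maximizes the between-class variance) and compares it with the fixed 0.5 cutoff on a synthetic ‘probability map’; the map and its value distributions are hypothetical and are not U-Net output.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Otsu's threshold: the histogram cut that maximizes between-class variance."""
    values = np.ravel(values).astype(float)
    hist, edges = np.histogram(values, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist).astype(float)          # counts in the lower class (bins <= i)
    w1 = w0[-1] - w0                            # counts in the upper class
    m0 = np.cumsum(hist * centers)              # summed intensity of the lower class
    mu0 = m0 / np.maximum(w0, 1.0)
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1.0)
    between = w0 * w1 * (mu0 - mu1) ** 2        # proportional to between-class variance
    return centers[np.argmax(between)]

rng = np.random.default_rng(1)
prob = rng.normal(0.1, 0.05, size=(64, 64))              # hypothetical background probabilities
prob[20:44, 28:36] = rng.normal(0.7, 0.1, size=(24, 8))  # hypothetical cell region
prob = np.clip(prob, 0.0, 1.0)

t = otsu_threshold(prob)
mask_otsu = prob > t
mask_half = prob > 0.5
agreement = np.mean(mask_otsu == mask_half)
```

      On such a well-separated map the two masks agree on almost every pixel; the choice of threshold matters most for pixels with intermediate probabilities, typically at cell boundaries.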

      • The diversity of datasets used in this study is limited.

      We have added a section “Testing napari-MM3 on other datasets” (lines 187-196) evaluating the performance of MM3 on 4 datasets (3 E. coli, 1 Corynebacterium glutamicum) from outside our lab, demonstrating its versatility.

      • There is some ambiguity in the main point of this manuscript: the title and figures illustrate a complete pipeline, including imaging, image segmentation, and analysis, while the abstract focuses only on the software MM3. If MM3 is the sole focus and contribution of this manuscript, the presentation should focus more on this software tool. It is also not clear whether the analysis features are integrated with MM3.

      We have added a line (lines 160-162) clarifying that final analysis and plotting must be done outside of napari. MM3 itself processes raw microscopy images, segments cells and reconstructs cell lineages (Figure 2).

      • The impact of this work depends on the adoption of the software MM3. Napari is a promising platform with an expanding community. With good software user experience and long-term support, there is a good chance that this tool could be widely adopted in the mother machine image analysis community.

      We thank the reviewer for their endorsement of MM3’s potential.

      • The data analysis in this manuscript is used as a demo of MM3 features, rather than scientific research.

      Reviewer #2 (Public Review):

      The authors present an image-analysis pipeline for mother-machine data, i.e., for time-lapses of single bacterial cells growing for many generations in one-dimensional microfluidic channels. The pipeline is available as a plugin of the Python-based image-analysis platform Napari. The tool comes with two different previously published methods to segment cells (classical image transformation and thresholding as well as UNet-based analysis), which compare qualitatively and quantitatively well with the results of widely accessible tools developed by others (BACNET, DeLTA, Omnipose). The tool comes with a graphical user interface and example scripts, which should make it valuable for other mother-machine users, even if this has not been demonstrated yet.

      We thank the reviewer for their positive comments.

      The authors also add a practical overview of how to prepare and conduct mother-machine experiments, citing their previous work and giving more advice on how to load cells using centrifugation. However, the latter part lacks detailed instructions.

      We have added a more detailed experimental protocol, including the procedure we use for cell loading, to the lab github page https://github.com/junlabucsd/mother-machine-protocols (linked in the main text).

      Finally, the authors emphasize that machine-learning methods for image segmentation reproduce average quantities of training datasets, such as the length at birth or division. Therefore, differences in training can propagate to differences in measured average quantities. This result is not surprising and is normally considered a desired property of any machine-learning algorithm, as also commented on below.

      Points for improvement:

      Different datasets: The authors demonstrate the use of their method for bacteria growing in different growth conditions in their own microscope. However, they don't provide details on whether they had to adjust image-analysis parameters for each dataset. Similarly, they say that their method also works for other organisms including yeast and C. elegans (as part of the Results section) but they don't show evidence nor do they write whether the method needs to be tuned/trained for those datasets. Finally, they don't demonstrate that their method works on data from other labs, which might be different due to differences in setup or imaging conditions.

      We have added a section “Testing napari-MM3 on other datasets” (lines 187-196) evaluating the performance of MM3 on 4 datasets (3 E. coli, 1 Corynebacterium glutamicum) from outside our lab, demonstrating its versatility. We provide details of the procedure and parameters used in the Methods section. (“Analysis of external datasets” lines 476-486).

      Bias due to training sets:

      The bias in ML-methods based on training datasets is not surprising but arguably a desired property of those methods. Similarly, threshold-based classical segmentation methods are biased by the choice of threshold values and other segmentation parameters. A point that would have profited from discussion in this regard: How to make image segmentation unbiased, that is, how to deliver physical cell boundaries? This can be done by image simulations and/or by comparison with alternative methods such as fluorescence microscopy.

      We agree this is an important point. We have revised the relevant sections (lines 238 - 270) to add context to the discussion of bias in both classical and deep learning methods. We have added a subsection (lines 401 - 410) discussing methods to this end, such as synthetic training data generation or calibrating the segmentation to fluorescence images.

      The authors stress the user-friendliness of their method in comparison to others. For example, they write: 'Unfortunately, many of these tools present a steep learning curve for most biologists, as they require familiarity with command line tools, programming, and image analysis methods.' I suggest instead emphasizing that many of the tools published in recent years are designed to be very user-friendly. As with all methods, MM3 also comes at a price, which is to install Napari followed by the installation of MM3, which, according to their own instructions, is not easy either.

      We have modified our language to acknowledge that indeed recent software such as DeLTA and BACMMAN make a point to be user-friendly and accessible (lines 52-53).

      Reviewer #1 (Recommendations For The Authors):

      • The resources, including documentation and code, are referenced but are not easy to find. It would be easier for readers if they were collected in a separate Resources section.

      We have created a Resources section in the Methods (top of first page) with the documentation, code and protocols hyperlinked.

      • It would be easier to understand the usage of MM3 with a screen-recording video. I found a video on the GitHub page, but the resolution is a bit low. Attaching a high-resolution screen-recording video would be helpful.

      A high resolution tutorial video has been made more visible on the github page.

      • In Table 1, an AMD GPU is used, which is not easy to use for deep learning. It is not clear whether the GPU is used for deep-learning training and inference.

      We have clarified this point in the Table 1 caption, and linked to a reference on how to use AMD GPUs with Tensorflow on Macs.

      • Some paragraphs in the Discussion section read like blog posts with general recommendations. Although the suggestions look quite useful, they are not the focus of this manuscript and might be more appropriate for the GitHub repo or a documentation page. The Discussion should instead focus on the software, such as its features, maintenance, development roadmap, and community adoption.

      • It would be easier for reviewers to add line numbers in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Software Installation: This might be something for the GitHub forum, but briefly trying to install the plugin myself, I already failed at the first line of the GitHub instructions, which is to use mamba for installation. This relates to my point above: Any program that is not stand-alone requires some user-savviness and trial-and-error, which is just hard to avoid for any method. I suggest being less critical of 'other methods' and instead focus on the advantage of the mother-machine-specific aspects of napari-mm3.

      The authors write 'Still, most labs do not have the time and resources to evaluate other tools they do not use critically, [...]'. The sentence is not very clear. Evaluating tools not used is obviously difficult/impossible.

      We have reworded this sentence to be more clear (lines 54-55).

      The authors write: 'The supervised learning method uses a convolutional neural net (CNN) with the U-Net architecture [20].' Can the authors cite previous work that has taken advantage of this approach before (e.g., DelTA)?

      We have added citations to DeLTA and other previous software (line 151).

      Cell tracking and lineage reconstruction should be described in more detail and/or with reference to previous work.

      We have added more details to the SI (lines 554 - 567) discussing the method in the context of existing mother machine analysis software.

      The authors provide a figure for a '3D printed cell loader', but as far as I can see, they don't give instructions, including a CAD file and the model of the fan used for spinning. The same holds for the stage inset (which, as far as I see, is not referred to in the manuscript text nor described in a figure caption).

      Thank you for pointing out this omission. The centrifuge is referenced in Box 1. We have updated the manuscript with a link to a Github repository containing CAD files & details of the centrifuge construction. We decided to remove the stage insert from the figure.

      Figure S3: Is the asymmetry in growth rate due to the expression of a fluorescent protein, due to strain differences, or due to imaging artifacts? Maybe this is impossible to tell based on the available datasets, but this could be discussed.

      Based on previous work (DOI 10.1099/mic.0.057240-0) it is likely due to the expression of the fluorescent protein and fluorescence imaging. We have added a brief discussion in the Figure S3 caption.

    1. Author Response

      The authors appreciate the reviewers' thoughtful and constructive feedback. We are pleased to have the opportunity to address their comments through a revised version to strengthen our work. In particular:

      (1) As suggested, we will add references/details in Methods to help readers further establish the cohort as population-derived and to clarify details about the analysis and the specificity of results.

      (2) We agree that reserve, inefficiency, and compensation are complex issues needing more discussion. We will add definitions and discussion to clarify our approaches, including multivariate/univariate analyses and addressing the specificity of results. We also appreciate the suggestions for future research directions.

      A revised version addressing these valuable recommendations will improve our study's contribution towards quantitative methods for understanding reserve and compensation in healthy cognitive ageing.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, the authors have explored how treating C. albicans fungal cells with EDTA affects their growth and virulence potential. They then explore the use of EDTA-treated yeast as a whole-cell vaccine in a mouse model of systemic infection. In general, the results of the paper are unsurprising. Treating yeast cells with EDTA affects their growth and the addition of metals rescues the phenotype. Because of the significant growth defects of the cells, they don't infect mice and you see reduced virulence. Injection with these cells effectively immunises the mice, in the same way that heat-killed yeast cells would. The data is fairly sound and mostly well-presented, and the paper is easy to follow. However, I feel the data is an incremental advance at best, and the immune analysis in the paper is very basic and descriptive.

      Strengths:

      Detailed analysis of EDTA-treated yeast cells

      Weaknesses:

      • Basic immune data with little advance in knowledge.

      • No comparison between their whole-cell vaccine and others tried in the field.

      • The data is largely unsurprising and not novel.

      Thank you so much for appreciating our effort to generate a live whole-cell vaccine by treating cells with EDTA. We also appreciate your comment that the manuscript is sound and well-presented. However, we are afraid that the reviewer assumed that CAET cells are dead. CAET cells are alive; they simply replicate more slowly than the wild type. Because the reviewer presumed CAET to be a dead strain, similar to heat-killed cells, most of his/her comments were partly negative.

      Reviewer #2 (Public Review):

      Summary:

      Invasive fungal infections are very difficult to treat with limited drug options. With the increasing concern of drug resistance, developing an antifungal vaccine is a high priority. In this study, the authors studied the metal metabolism in Candida albicans by testing some chelators, including EDTA, to block the metal acquisition and metabolism by the fungus. Interestingly, they found EDTA-treated yeast cells grew poorly in vitro and non-pathogenic in vivo in a murine model. Mice immunized by EDTA-treated Candida (CAET) were protected against challenge with wild-type Candida cells. RNA-Seq analysis to survey the gene expression profile in response to EDTA treatment in vitro revealed upregulation of genes in metal homeostasis and downregulation of ribosome biogenesis. They also revealed an induction of both pro- and anti-inflammatory cytokines involved in Th1, Th2 and Th17 host immune response in response to CAET immunization. Overall, this is an interesting study with translational potential.

      Strengths:

      The main strength of the report is that the authors identified a potential whole-cell live vaccine strain that can provide full protection against candidiasis. Abundant data both on in vitro phenotype, gene expression profile, and host immune response have been presented.

      Weaknesses:

      A weakness is that the immune mechanism of CAET-mediated host protection remains unclear. The immune data is somewhat confusing. The authors only checked cytokines and chemokines in blood. The immune response in infected tissues and antibody response may be investigated.

      Thank you very much for appreciating our work and finding our strain to be a live whole-cell vaccine strain with translational potential. Since the current study focused on the identification and detailed characterization of a non-genetically modified live attenuated strain, and on its safety and efficacy as a potential vaccine candidate in a preclinical model, we did not address the immune mechanisms underlying CAET-mediated protection. We are in the process of developing another manuscript where we describe both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find a vaccine solution for invasive candidiasis.

      Strengths:

      The testing of the antifungal activity of EDTA on Candida is not new as many other papers have examined this effect. The novelty here is the use of this EDTA-treated strain as a vaccine to protect against a secondary challenge with wild-type Candida.

      Weaknesses:

      However, data presented in Figure 5 and Figure 6 are not convincing and need further experimental controls and analysis as the authors do not show a time-dependent effect on the CFU of their vaccine formulation. The methodology used is also an issue. As it stands, the impact is minor.

      Thank you so much for appreciating our efforts to develop a novel vaccine against fungal infections. Although Figs. 5 and 6 are the main strength of the paper, we regret that this reviewer found them unconvincing.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper by Perovic and colleagues describes how important blood vessels called collaterals form during development and remodel/expand upon injury to the brain. These vessels are conduits between arteries that do not have strong blood flow physiologically but upon injury can compensate for conduit loss. Published work by others is largely descriptive and does not address the cellular sources of collaterals over time. Here elegant lineage tracing is used to better understand the source of vascular endothelial cells during embryonic development, and how these lineages contribute to remodeling upon injury. The work is ambitious and important as collateral capacity can strongly influence the trajectory of outcomes with vascular blockage. The work reveals that proliferative arterial EC is the primary contributor to the collaterals developmentally, with a small contribution from capillary/venous EC, and that this shifts to almost completely arterial contribution from birth onward. There are several aspects of the work that, if addressed, would strengthen the study and better support the interesting and novel conclusions, including analysis of non-collateral lineage contributions, more careful interpretation of fixed image data, and more careful annotation of the image panels.

      We thank the reviewer for appreciating the ambition, importance and novelty of our work, and for the constructive suggestions for improvements.

      Reviewer #2 (Public Review):

      Pial collateral vessels are anastomotic connections that cross-connect distal arterioles of the middle, anterior, and posterior cerebral arteries. With respect to ischemic stroke, good pial collateral flow positively correlates with decreased infarct volume and improved recovery; accordingly, optimizing collateral flow represents an important intervention for limiting stroke damage. The goal of this study was to determine the endothelial cell (EC) subtype(s) that contribute to the embryonic and neonatal development of pial collaterals and their expansion in response to stroke. To this end, the authors used lineage tracing methods in the mouse, labeling arterial endothelial cells (using Bmx-CreERT on switch line, R26mTmG) or venous and microvascular endothelial cells (using Vegfr3-CreERT on R26mTmG) and assessing pial collaterals via confocal microscopy. The authors convincingly demonstrate that arterial-lineage ECs comprise the majority of pial collateral ECs during development and in adulthood, with a minor contribution from pial plexus-derived microvascular ECs that decline over time. They also convincingly demonstrate that pial collateral outward remodeling after experimentally-induced stroke (distal middle cerebral artery occlusion, or dMCAO) involves, at least in part, local proliferation of arterial-lineage ECs. The latter is intriguing given that arterial ECs generally leave the cell cycle. While these conclusions are quite solid, some key details are missing that could improve analysis, and some important caveats are not addressed. Moreover, less convincing are mechanistic claims that pial collaterals form via a migratory process of "mosaic colonization" of a preexisting vessel.

      We thank the reviewer for the careful assessment and suggestions for improvements. Claiming migratory behaviour from static images is indeed always tricky and comes with caveats. Our conclusions, however, are based on the appearance of cells in locations where they are not found at earlier stages. Given that we could exclude persistent recombination, a sound conclusion must be that cells reach the new location through some means of translocation. Based on our experience with the morphology of migrating cells in vivo, the appearance of polarized filopodial structures that coincide with the direction in which cells appear at progressively later stages strongly suggests active migration. Moreover, these highly migratory cells are also ICAM2-positive, suggesting that they directly line the pre-collateral lumen. To explain how this immigration might occur, we would need to consider either solitary cell migration through the interstitial space or intercalation movements. The active participation of migrating cells in lumen formation of the nascent pre-collateral suggests intercalation, but further analysis is needed (such as a detailed analysis of cell-cell junctions or of sustained apico-basal polarity). The conclusion that this process reflects mosaic colonization of preexisting vessels rests on the demonstration of a continuous lumen in a vessel that lacks the lineage marker but begins to express arterial markers such as Cx40.

      1) It is difficult to understand whether individual collaterals are truly mosaic vessels, or whether arterial or venous/microvascular lineage ECs predominate in any particular region of the pial collateral vasculature. This is due to a number of methodological reasons: arterial and venous/microvascular contributions to pial collaterals were assessed independently, only a few (and in some cases, just one) collaterals were analyzed in each mouse, and regionality/location of collaterals was not addressed. Additionally, the inefficiency and variability of EC labeling, especially with the Vegfr3-CreERT line (Fig. S1, ~6-30%), compounds this problem.

      Factual error: 6 - 22% (not 30)

      The reviewer is correct that the independent assessment of contributions makes it difficult to demonstrate mosaicism locally. However, we are not aware of a method that could simultaneously trace two populations from different sources using recombination genetics. Mosaicism can, however, be concluded from two independent observations. First, we find a contribution from an alternative source that, at the time point of labelling, does not colocalize with arterial BMX-lineage cells. Second, BMX-lineage labelling of the collaterals is never complete, at least at developmental stages. Future work using scRNA-seq may shed more light on the degree of mosaicism. At this point, however, the data strongly suggest mosaicism, even if the majority of the cells are of the BMX lineage. The comment on the inefficiency or variability of labelling, in particular with the Vegfr3-CreERT line, is interesting. At this point, we cannot rule out that the observed variability reflects intrinsic variability in expression rather than inefficient or variable recombination. With our current tools we cannot easily distinguish between the two. Again, we hope that future studies with scRNA-seq will shed more light on this interesting biology. Finally, we have not carefully assessed regionality, but we have not seen obvious correlations with the degree of mosaicism. It is, however, important to note that in no case did we examine just one collateral per hemisphere. Each data point is an average of all collaterals from part of a given collateral zone (imaging region). Usually, it is possible to image 2-4 collateral regions in each embryo. We always imaged multiple collaterals per animal, but sometimes only one region was imaged (due to technical issues).

      2) The identification of "pre-collateral" vessels requires further support. The authors define these vessels by their connection to the feeding artery, their (often) larger diameter, and their more pronounced ICAM2 expression. While most of these criteria are demonstrated in Figure S3, it is not apparent how these vessels were defined in Figure 4, which lacks specific annotation of each of these identifying criteria. As the identification of these novel vessels is one of the key findings of this paper, a more robust method of unambiguously defining them is warranted.

      We agree that it would be ideal to have a unique marker that identifies pre-collaterals. Our careful analysis of the distribution of the markers we tested firmly established that the level of ICAM2 expression clearly highlights the structures that become colonized by BMX-lineage cells. Cx40 staining also confirmed this impression. We will attempt to annotate the figures better, based on these markers, to help the reader appreciate these findings. In our hands, the combination of anatomical location and connection pattern with stronger ICAM2 staining is a highly reliable and unambiguous identifier of what we call “pre-collaterals”.

      3) The conclusion that collateral-forming ECs migrate in the direction of flow into preexisting vessels is not well supported. The authors state that the presence of filopodial projections (Figure 4) supports this conclusion. However, filopodia number and directional polarization/orientation were not quantified, and "intercalation movements"/migration, per se, cannot be inferred from these static images.

      The reviewer is correct that claiming migration from static images is always difficult. As stated above, we base our conclusions on the progressive appearance of cells exhibiting migratory behavior, as well as on their morphology, including filopodia. Although we did not quantify filopodia, in our experience these structures are not found on endothelial cells that are not engaged in migration. Their consistent presence and directionality are strongly suggestive of movement. We will attempt to clarify this better in the text and the figures.

      4) In Figure 5, the simplest explanation for relative Cx40 expression in different vessels is the absence (low expression) or presence (high expression) of flow. This figure provides little mechanistic insight beyond this already-known relationship, and it is unclear how many times this experiment was performed (there is no N, no quantification or correlation).

      Flow is indeed one factor that regulates Cx40. However, a key point of this figure is to show that Cx40 expression can precede the recruitment of BMX-lineage cells. This is important for distinguishing whether arterial identity is achieved only through the recruitment of BMX-lineage cells or already exists in certain vessels (for example, because they may have more flow) before this colonization event. It suggests that the BMX population may instead serve to consolidate the arterial state, as other structures that may previously have been Cx40-positive, but do not become colonized, lose arterial identity. We disagree that this finding does not contribute important information. If only BMX-lineage cells expressed Cx40, the conclusion would be very different. The question is not how much Cx40 is expressed, but whether arterialization requires the recruitment of particular cells or is induced in vessels that adopt arterial identity. This is not a singular observation, and we will add the N number to the figure legend.

      5) There is no statistical analysis in this work. This is justified by the authors by their admission that the study is of a "descriptive nature and...exploratory design."

      This is correct.

      Reviewer #3 (Public Review):

      Summary:

      These studies focus on a very interesting, understudied phenomenon in vascular development - the formation of pial collaterals between cerebral arteries. Understanding the mechanism(s) that regulates this process during normal development could provide important insights for the treatment of adult stroke patients, for which repair is highly dependent on collateral formation. Insights may also be relevant to other collateral-dependent diseases, such as heart disease and chronic peripheral ischemia.

      Strengths:

      The investigators use lineage tracing and 3D imaging to show that, in mouse embryos, endothelial cells (ECs) predominantly from Bmx+ arteries and some from the Vegfr3+ microvasculature, invade pre-existing pre-collateral vascular structures in a process they termed "mosaic colonization", and arterialization of the vessel segments is said to occur concurrently with colonization, although details about EC phenotypes are lacking. Growth of the collaterals in response to ischemic injury relies on local replication of the ECs within the collaterals and not further recruitment from veins and the microvasculature. Although detailed molecular mechanisms are not provided, demonstration of the "cellular mechanism" of pial collateral vascularization is novel.

      Weaknesses:

      Nonetheless, there are some issues that should be addressed, particularly to clarify the phenotype of the ECs forming the collaterals and expanding in response to injury; only their "origin" was traced and not their identity/growth after labeling in Bmx+ vessels.

      We thank the reviewer for pointing out the importance and novelty of our findings, and for the constructive suggestions for improvements. We indeed focussed here on origin, and on distinguishing how the cells arrive at their location, rather than on their phenotype. We have performed a detailed phenotypic analysis, including EM analysis of collaterals, but without the ability to connect these data to the traced lineages. We therefore chose to leave these data for a separate manuscript. Future work will attempt to fully characterize these populations, including their transcriptome, using scRNA-seq. However, isolating collateral ECs to faithfully characterize them is very challenging and will not be part of this manuscript. We have performed stainings for various arterial markers, with variable success. Nevertheless, a full functional study will be part of future work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: The authors study the appearance of oscillations in motifs of linear threshold systems, coupled in specific topologies. They derive analytical conditions for the appearance of oscillations, in the context of excitatory and inhibitory links. They also emphasize the higher importance of the topology, compared to the strength of the links. Finally, the results are confirmed with WC oscillators, which are also linear. The findings are to some extent confirmed with spiking neurons, though here results are less clear, and they are not even mentioned in the Discussion.

      Overall, the results are sound from a theoretical perspective, but I still find it hard to believe that they are of significant relevance for biological networks, or in particular for the oscillations of the BG-thalamus-cortex loop in PD. I find motifs in general to be too simplistic for multiscale and generally large networks, as is the case in the brain. Moreover, the division of regions is more or less arbitrary by definition, and having such a strong dependence on an odd/even number of inhibitory links is far from reality. Another limitation is the fact that the cortex is considered a single node. Similarly, decomposing even such a coarse network into all possible motifs (238 in this case) doesn't seem very relevant, since I assume that the emergence of pathological rhythms is more of an emergent phenomenon.

      Strengths:

      From the point of view of nonlinear dynamics, the results are solid, and the intuition behind the proofs of the theorems is well explained.

      Weaknesses:

      As stated in the summary, I find the work to be too theoretical without a real application in biological systems or the brain, where the networks are generally very large.

      We respectfully disagree with the reviewer here. The second half of the paper is all about explaining a biological problem. We have shown the validity of our theoretical results (which indeed were obtained in idealized settings) to explain emergence of oscillations in the basal ganglia. We clearly show that our theoretical results hold both in a rate-based model and in a network model with spiking neurons. The model with spiking neurons is one of the most complete network models of the basal ganglia available in the literature. So we emphasize that we have provided a clear application of our results for the brain networks.

      The problem is not the simplicity of the model or of the topology; phenomena are often explained by very reduced systems. The problem is that the applicability of the findings cannot be extended. E.g., the Kuramoto model uses all-to-all coupling, and similarly, networks of QIF neurons need to follow a Lorentzian distribution in order to derive a mean field.

      We do not understand this comment. There is no need to extend these results to a network of Kuramoto models because in that setting we already assume that individual nodes/populations are oscillating – there is no problem of emergence of oscillations. Here, we are specifically considering a setting in which nodes themselves are not oscillators. We agree that we, at this point, have no insight into how to extend our analytical proof to a situation where individual nodes are spiking.

      But in those cases, relaxing the strict conditions that were necessary for the derivations, still conserves the main findings of the analysis, which I don't see being the case here. The odd/even number rule is too strict, and talking about a fixed and definite number of cycles in the actual brain seems too simplistic.

      We have clearly relaxed most of our assumptions when we considered a network model of basal ganglia in which each subpopulation is a collection of spiking neurons. And as we have shown our results still hold (see Figure 5). Again our model is about oscillations in a network of networks i.e. network of brain regions.

      At the meso-scale it is not unreasonable to find such cycles and even-odd number rules. We have shown this for the case of a cortico-basal ganglia model. We can also extend this to cortico-thalamic networks and so on. We have already emphasized this point in the introduction to avoid any confusion: see lines 62-66: “We prove this conjecture for the threshold-linear network (TLN) model without delays which can closely capture the dynamics of neural populations. Therefore, it is implicit that our results do not hold at the neuronal level but rather at the level of neuron populations/brain regions e.g. the basal ganglia (BG) network which can be described as a network of different nuclei.” and lines 69-70: “Within the framework of the odd-cycle theory, distinct nuclei are associated with either excitatory or inhibitory nodes.”

      Being linear is another strong assumption, and it is not clear how much of the results are preserved for spiking neurons, even though there is such an analysis, or maybe for other nonlinear types of neuronal masses.

      Our results clearly hold in a network of spiking neurons (see Figure 5). It is, of course, interesting to ask whether they also hold in a network where individual neurons have more complex spiking behavior, such as AdEx or quadratic integrate-and-fire neurons. But that kind of analysis deserves a full manuscript of its own.

      Delays are also mentioned, and their impact on the oscillatory networks is as expected: they reduce the amplitude. However, there is no link to the literature, where this is an established phenomenon during synchronization. Finally, the authors should also discuss time delays as a known cause or amplifier of oscillations at different frequencies in networks of coupled oscillators, e.g. Petkoski & Jirsa Network Neuroscience 2022, Tewarie et al. NeuroImage 2019, Davis et al. Nat Commun 2021.

      This is indeed a weakness of our model. But, as the reviewer knows, dynamical systems with delays are very difficult to treat analytically. We have mentioned this in the limitations of the model and the analysis. In our simulations, we have included delays, and our results hold when the delays are within reasonable limits.

      Reviewer #2 (Public Review):

      Summary:

      The authors present here a mathematical and computational study of the topological/graph theory requirements to obtain sustained oscillations in neural network models. A first approach mathematically demonstrates that a given network of interconnected neural populations (understood in the sense of dynamical systems) requires an odd number of inhibitory populations to sustain oscillations. The authors extend this result via numerical simulations of (i) a simplified set of Wilson-Cowan networks, (ii) a simplified circuit of the cortico-basal ganglia network, and (iii) a more complex, spike-based neural network of the basal ganglia, which provides insight into experimental findings regarding abnormal synchrony levels in Parkinson's Disease (PD).

      Strengths:

      The work elegantly and effectively combines solid mathematical proof with careful numerical simulations at different levels of description, which is uncommon and provides additional layers of confidence to the study. Furthermore, the authors included detailed sections to provide intuition about the mathematical proof, which will be helpful for readers less inclined to the perusal of mathematical derivations. Its insightful and well-informed connection with a practical neuroscience problem, the presence of strong beta rhythms in PD, elevates the potential influence of the study and provides testable predictions.

      Weaknesses:

      In its current form, the study lacks a more careful consideration of the role of delays in the emergence of oscillations. Although they are addressed at certain points during the second part of the study, there are sections in which this could have been done more carefully, perhaps with additional simulations to solidify the authors' claims. Furthermore, there are several results reported in the main figures which are not explained in the main text. From what I can infer, these are interesting and relevant results and should be covered. Finally, the text would significantly benefit from a revision of the grammar, to improve the general readability in certain sections. I consider that all these issues are solvable and this would make the study more complete.

      The first reviewer made this point as well, so we repeat our answer:

      This is indeed a weakness of our model. But, as the reviewer knows, dynamical systems with delays are very difficult to treat analytically. We have mentioned this in the limitations of the model and the analysis. In our simulations, we have included delays, and our results hold when the delays are within reasonable limits.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in my comments above, I think that the work is already quite solid and relevant but would significantly improve if some issues were addressed:

      We would like to thank the reviewer for the valuable comments and constructive feedback, which have helped us greatly improve the manuscript.

      1) While the authors acknowledge early on the limitations of this study in terms of not considering plasticity or neuron biophysics (line 72), I think that the absence of propagation delays should be explicitly included here. This absence leads to inaccuracies. For example, the sentence "Consider a small network of two nodes. If we connect them mutually with excitatory synapses, intuitively we can say that the two-population network will not oscillate" (line 74) is only correct if the delays (or signal latencies) are zero. With a suitable delay, two excitatory neurons can engage in oscillations with a period equal to twice the delay.

      A similar situation happens for inhibitory neurons, where the winner-take-all dynamics described in line 77 is only valid for zero delay. It is known that a homogeneous population of inhibitory spiking neurons with delayed synapses can lead to fast oscillations (Brunel and Hakim 1999), something which is also valid for the equivalent inhibitory single node with delayed self-inhibition. Indeed, a circuit of two inhibitory populations with delayed self- and cross-inhibition can generate oscillations, contradicting the main conclusion of the odd number of inhibitory nodes needed for oscillations.

      Because of these considerations, I think the authors should be more careful when explaining the effects of delays, and state that their main results on the link between oscillations and having an odd number of inhibitory nodes are not valid when delays are considered. They could modify the sentences in lines 72-77 above and include a supplementary figure right after their simulation study for the Wilson-Cowan (to explain the examples above, and also the one in the next point).

      The reviewer has raised a critical point regarding the impact of propagation delays, and we fully agree with this assessment. In our study, we did not comprehensively consider the effects of propagation delays in cycles with an even number of inhibitory connections, which may introduce inaccuracies in our conclusions.

      We note that in the Wilson-Cowan model with delays, certain cycles with an even number of inhibitory links can also generate oscillations with a period equal to twice the delay. However, in our simulations such oscillations were transient and dissipated quickly.

      To better reflect the limitations of our research, we have made significant modifications to the relevant sections in our manuscript.

      In line 100, we have added text explicitly stating that we considered delays in our simulations and acknowledging their potential to generate oscillations ("Given the importance of delays in biological networks such as the BG, we will consider them in the simulations.").

      In line 102, we have clarified that our conclusions are based on a scenario without delays ("In the following, we give simple examples of the possibility of oscillation (or not) based on the connectivity characteristics of small networks without delays. Let us start with a network of two nodes.").

      Additionally, in line 230, we have added a reference to figure supplement 3-2 to highlight the outcomes in terms of oscillations ("The EII network only resulted in transient oscillations (Fig. 3, figure supplement 3-1, figure supplement 3-2)").

      In lines 234-237, we have added a sentence discussing the role of synaptic delays in generating transient oscillations in cycles with an even number of inhibitory components, referring to figure supplement 3-2 ("In networks with an even number of inhibitory connections (e.g. EII, EEE, II), synaptic delays are the sole mechanism for initiating oscillations; however, unless delays are precisely tuned, such oscillations will remain transient (see figure supplement 3-2)").

      Moreover, in response to the reviewer’s suggestion, we have included an additional figure (figure supplement 3-2) illustrating how cycles with an even number of inhibitory components generate transient oscillations when propagation delays are taken into account. This figure provides a visual representation of the phenomenon.

      2) In Figure 3, two motifs (III and EII) are explored to demonstrate the validity of the results across different parameters. Delays don't seem to play a disruptive role in these two cases, but the results seem to be different for other motifs not considered here. Aside from the examples mentioned above, I can imagine how a motif of EEE (i.e. a circle of three excitatory Wilson-Cowan neurons) would display oscillations when delays are included, as the activation would 'circulate' along the ring. However, this EEE motif has an even number of inhibitory units (or perhaps zero is considered an exception, but if so it's not mentioned in the text).

      We thank the reviewer for this observation regarding Figure 3. Indeed, the impact of delays may differ for other motifs not considered in our study. For example, as the reviewer has correctly anticipated, a motif of EEE (a circular network of three excitatory Wilson-Cowan neurons) would exhibit oscillations when delays are included, as activation could 'circulate' along the ring.

      To address this concern, we have performed new simulations (added as figure supplement 3-2). As illustrated in figure supplement 3-2, oscillations may indeed arise in the EEE motif when delays are introduced. However, these oscillations eventually dissipate, at least with our parameter settings.

      3) Figures 1b, 1c, and 4e display interesting results, but these are absent from the main text. Please include the description of those results. Particularly the case of Figs 1b and 1c seems very relevant to understanding the main results in the context of more complex networks, in which multiple loops with odd and even numbers of inhibitory units would coexist in the network. Does the number of odd-inhibitory loops in a given network affect somehow the power or frequency of the resulting network oscillations? It would be interesting to show this.

      Indeed, we did not explain Figs 1b,c and 4e properly. We have now revised the manuscript as follows to incorporate these results:

      In lines 124-128, we added the following text to introduce the concept: "We can generalize these results to cycles of any size, categorizing them into two types based on the count of their inhibitory connections in one direction (referred to as the odd cycle rule, as illustrated in Fig. 1b). More complex networks can also be decomposed into cycles of size 2…N (where N is the number of nodes), and these cycles predict the ability of the network to oscillate (as shown in Fig. 1c)."

      In line 298, we included the following text to highlight the relevant result: "Next, we removed the STN output (equivalent to inhibiting the STN); the Proto-D2-Arky subnetwork then generated oscillations for weak positive inputs to the D2-SPNs (Fig. 4e, bottom)."
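      The cycle-decomposition step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the network is given as a hypothetical signed edge list (+1 excitatory, -1 inhibitory), enumerates directed simple cycles of length 2 to N by brute force, and labels each cycle by the parity of its inhibitory connections. It covers only the combinatorial part of the rule; whether an odd cycle actually oscillates also depends on connection strengths and inputs.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical signed connectivity: (source, target) -> +1 (excitatory) or -1 (inhibitory).
Edges = Dict[Tuple[str, str], int]


def simple_cycles(nodes: List[str], edges: Edges) -> List[Tuple[str, ...]]:
    """Brute-force enumeration of directed simple cycles of length 2..N.

    Each cycle is reported once, starting at its smallest node label.
    Adequate for a small motif, not for large graphs.
    """
    cycles = []
    for k in range(2, len(nodes) + 1):
        for path in permutations(nodes, k):
            if path[0] != min(path):
                continue  # skip rotations of a cycle already considered
            closed = path + (path[0],)
            if all((a, b) in edges for a, b in zip(closed, closed[1:])):
                cycles.append(path)
    return cycles


def inhibitory_count(cycle: Tuple[str, ...], edges: Edges) -> int:
    """Number of inhibitory connections traversed going once around the cycle."""
    closed = cycle + (cycle[0],)
    return sum(1 for a, b in zip(closed, closed[1:]) if edges[(a, b)] < 0)


def classify_cycles(nodes: List[str], edges: Edges) -> Dict[Tuple[str, ...], str]:
    """Label each cycle 'odd' (candidate oscillator) or 'even' by inhibitory parity."""
    return {
        cycle: "odd" if inhibitory_count(cycle, edges) % 2 else "even"
        for cycle in simple_cycles(nodes, edges)
    }


# Hypothetical motif: one excitatory node E and two inhibitory nodes I1, I2.
nodes = ["E", "I1", "I2"]
edges = {
    ("E", "I1"): +1,
    ("I1", "E"): -1,
    ("I1", "I2"): -1,
    ("I2", "I1"): -1,
    ("I2", "E"): -1,
}
print(classify_cycles(nodes, edges))
```

      In this toy motif, the E-I1 loop (one inhibitory link) is the only odd cycle and hence the only candidate oscillator; the I1-I2 pair and the E→I1→I2 ring each contain two inhibitory links. Cycles that share nodes are treated independently here.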

      How the number of odd/even loops affects the oscillation frequency is an interesting question. Intuitively, there should be a relation between the two. A complete treatment of this question is beyond the scope of this manuscript, but we think that, in a network with identical node properties, more odd cycles should imply higher oscillation power.

      4) The cortico-BG model is focused on how inactivating STN could suppress (or not) beta oscillations, following experimental observations. However, besides mechanisms for extinguishing oscillations, it would be interesting to see if the progressive emergence of pathological beta oscillations could be explained by the modification of some of the nodes in the model (for example, explicitly mimicking the loss of dopaminergic neurons in the substantia nigra). This could be a very interesting additional figure in the main text.

      This is an interesting suggestion. Similar analyses have already been done: for example, Kumar et al. (2010) showed that a progressive increase in the inhibition of GPe can lead to oscillations. Similarly, Holgado et al. (2008) showed how progressive changes in the mutual connectivity between STN and GPe can cause oscillations. More recently, Ortone et al. (PLoS Comput. Biol. 2023) and Azizpour et al. (2023, bioRxiv) have also used numerical simulations to show the effect of progressive changes in individual node properties on oscillations in the basal ganglia. In a sense, our work provides the theoretical backing for these studies. Therefore, we think it is not necessary to show these results again in our model; instead, we have cited these papers (lines 392-396).

      5) I observed some grammatical inconsistencies in the text, some of them are indicated below. I would suggest carefully going through the text to correct those issues or seeking help with editing.

      -line 32 "...which can closely capture the neural population dynamics". Which population dynamics? Do the authors refer to general neural dynamics?

      -line 33 "long term behavior" -> long-term behavior

      -line 68 "given the ionic channel composition" -> "given its ionic channel composition"

      We apologize for the grammatical inconsistencies in our manuscript. We have made the necessary corrections to improve the clarity and accuracy of our text.

      Reviewer #3 (Recommendations For The Authors):

      This manuscript is useful for analytically showing that a cyclic network of threshold-linear neural populations can only oscillate if it has an odd number of inhibitory nodes with strong enough connections. Establishing this result, which holds under rather narrow assumptions, relies on standard tools from dynamical systems theory. I find the strength of support for this result to be incomplete for the reasons detailed below:

      Although the mathematical arguments used appear to be correct, the manuscript lacks rigor and clarity. For instance, the main result presented in theorem 2 is stated in a very unclear fashion: aside from the oddity of the number of inhibitory nodes, there are two conditions to check, which determine four cases. This can be explained in a much more straightforward way without introducing four relations in equations 4-7.

      We acknowledge the reviewer’s concern regarding the presentation of the main result in Theorem 2.

      We would like to emphasize that the introduction of four relations in equations 4-7 was intended to provide a detailed and transparent exposition of the conditions for the main result. While we understand that this approach may appear less straightforward, it allows for a more comprehensive understanding of the underlying logic and the multiple factors influencing the outcomes.

      However, we are open to suggestions for more concise and clear ways to express these conditions if the reviewer has specific recommendations or if there are alternative approaches that the reviewer believes would be more effective in conveying the information.

      Moreover, equation 3 in that same theorem is clearly wrong.

      We sincerely apologize for the typographical error in equation 3 of this theorem, and we thank the reviewer for noticing it. The equation has been corrected in the revised manuscript.

      The proof of theorem 2 relies on standard linear algebra and can be improved as well: there are typos, approximations, and missing words (see line 664). The rigor of the exposition is also unsatisfactory. For instance, the proof of Lemma 1 ends with the sentence: "Similarly as before, the convergence of the dynamics driven by the left and right terms ends the proof". I don't know what this means.

      We thank the reviewer for the comments and suggestions. We have made the necessary adjustments to enhance the rigor and clarity of our mathematical reasoning in the revised manuscript.

      In line 644, we have clarified the sentence the reviewer found unclear. The revised version now gives a more precise explanation that should help in understanding the proof.

      At the same time, the intuitive arguments presented in the main text are vague at best and do not really help in grasping the possible generalizability of the results. For instance, I do not understand the message of panel B in Figure 2 and there seems to be no explanation about it in the main text.

      The main purpose of Figure 2B is to offer a visual representation of the concept and to serve as an aid for readers who may prefer a graphical illustration over extensive equations. While we understand that the figure may not provide a complete explanation on its own, it is intended to complement the text and mathematical content presented in the main text. In the revised version we have added the explanation of Figure 2B.

      Aside from the analytical result, most of the paper consists of simulating networks with distinct inhibitory cyclic structure to validate the theoretical argument. I do not find this approach particularly convincing due to the qualitative nature of the numerical results presented. There is little quantitative analysis of the network structure in relation to the emergence of oscillations. It is also hard to judge whether the examples discussed are cherry-picked or truly representative of a large class of dynamics.

      The reviewer raises a valid concern about the numerical simulations and the qualitative nature of the results. We would like to provide some perspective on our approach.

      In our paper, the primary focus is on the mathematical proof, which rigorously establishes our results. However, we understand that numerical simulations are valuable for illustrating the applicability of the theoretical framework and providing insights into its practical implications.

      A quantitative description of all the results would make the manuscript prohibitively long. We acknowledge that a balance must be struck between theory and numerical examples in a research paper. We believe that, in conjunction with the mathematical proof, the numerical simulations serve the purpose of illustrating our results in specific examples. While we cannot provide an exhaustive exploration of all possible network structures, we have chosen representative cases to demonstrate the applicability of our findings; some of these are already provided in figure supplements S3-1 and S3-3. In the absence of specific suggestions from the reviewer, we would like to leave this as is.

      Moreover, the authors apply their cycle analysis to real-world networks by considering cycles of inhibitory nodes independently, whereas the same nodes can belong to several cycles. I find it hard to believe that considering these cycles independently should be enough to make predictions about the emergence of oscillations, as these cycles must interact with one another via shared nodes. I do not understand the color coding used to mark distinct cycles in supplementary figures. There is also not enough information to understand figures in the main text. For instance, I do not understand what the grids are representing in panel B, Figure 4.

      We have clarified the color coding and added more information to help readers understand the figures. We appreciate the reviewer’s concern about our application of cycle analysis to real-world networks and the clarity of our figures. However, this is not a matter of belief: we have provided a mathematical proof and complemented it with illustrative examples from real-world networks, i.e., the cortico-basal ganglia network with both rate-based and spiking neurons. Clearly, our results hold.

      Regarding the color coding in the supplementary figures, we have revised the color scheme to make it more intuitive and informative, and explained it in the caption of Figure 4: we use different colors to mark potential oscillators in each BG motif, and each color denotes an oscillator from panel a. For more details, see figure supplements 4-1–4-6. The colors now represent distinct cycles more clearly, helping readers better interpret the figures.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study advances our understanding of the forces that shape the genomic landscape of transposable elements. By exploiting both long-read sequencing of mutation accumulation lines and in vivo transposition assays, the authors offer compelling evidence that structural variation rather than transposition largely shapes transposable element copy number evolution in budding yeast. The work will be of interest to the transposable element and genome evolution communities.

      Public Reviews:

      Reviewer #1 (Public Review):

      Henault et al build on their own previous work investigating the longstanding hypothesis that hybridization between divergent populations can activate transposable element mobilization (transposition). Previously they created crosses of increasing sequence divergence, using both intra- and inter-species hybrids, and passaged them neutrally for hundreds of generations. Their previous work showed that neither hybrids isolated from natural environments nor hybrids from their mutation accumulation lines showed consistent evidence of increased transposable element content. Here, they sequence and assemble long-read genomes of 127 of their mutation-accumulation lines and annotate all existing and de novo transposable elements. They find only a handful of de novo transposition events, and instead demonstrate that structural variation (ploidy, aneuploidy, loss of heterozygosity) plays a much larger role in the transposable element load in a given strain. They then created transposable element reporter constructs using two different Ty1 elements from S. paradoxus lineages and measured the transposition rate in a number of intraspecific crosses. They demonstrate that the transposition rate is dependent on both the Ty1 sequence and the copy number of genomic transposable elements, the latter of which is consistent with what has been observed in the literature on transposable element copy number control in Saccharomyces. To my knowledge, others have not directly tested the effect of Ty1 sequence itself (have not created diverse Ty1 reporter constructs), and so this is an interesting advance. Finally, the authors show that mitotype has a moderate effect on transposition rate, which is an intriguing finding that will be interesting to explore in future work.

      This study represents a large effort to investigate how genetic background can influence transposable element load and transposition rate. The long read sequencing, assembly, and annotation, and the creation of these reporter constructs are non-trivial. Their results are straightforward, well supported, and a nice addition to the literature.

      The authors state that the results from their current work support results taken from their previous study using short-read sequencing data of the same lines. The question that follows is whether the authors gained anything novel from long-read sequencing. I would like to see the authors make a stronger argument for why this new work was necessary, and a more detailed view of similarities or differences from their previous study (when should others choose to do long read vs. short read of evolved lines?).

      We thank the reviewer for the suggestion. While we initially aimed to justify the relevance and novelty of the current study in relation to our previous study, we understand that this justification may not have been strong enough.

      In the second paragraph of the introduction, we explain how the multidimensional nature of TE load makes it more complex to characterize than simply reporting the abundance of a given TE family in a given genome. We added the following concluding sentence to further emphasize the importance of long reads in TE-focused genome inference:

      “As such, ongoing technological and computational advances in genome inference, including long-read sequencing, will certainly be key to getting a detailed understanding of the dynamics of TEs and the underpinning evolutionary forces.”

      In the penultimate introductory paragraph, we summarize our previous work from 2020 and highlight that the evolution of Ty contents in MA lines was inferred from aggregate measures of genomic abundance of TE families using short reads. We then make the point that combinations of multiple SVs could affect the landscape of TEs in ways that are not reflected by crude short-read measures. We added the following sentence to further emphasize this point and contrast it with the necessity of using more powerful methodologies for genome resolution:

      “Under this scenario, measuring Ty family abundance would yield no significant net change, and the dissection of the underlying SVs using short reads could often be challenging.”
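      A toy example makes this point concrete. The sketch below assumes Ty insertions can be resolved as distinct loci in each assembly and represents them as hypothetical (chromosome, position) pairs; the coordinates are invented for illustration. A deletion that removes one locus combined with a rearrangement that adds another leaves the aggregate copy number, which is what an abundance measure reports, unchanged, even though the locus-level landscape has changed.

```python
from typing import Dict, Set, Tuple

Locus = Tuple[str, int]  # (chromosome, position); hypothetical coordinates


def compare_ty_loci(ancestral: Set[Locus], evolved: Set[Locus]) -> Dict[str, object]:
    """Contrast the net copy-number change with locus-level gains and losses."""
    return {
        "net_change": len(evolved) - len(ancestral),
        "lost": sorted(ancestral - evolved),
        "gained": sorted(evolved - ancestral),
    }


ancestral = {("chrIV", 120_000), ("chrVII", 450_000), ("chrXII", 800_000)}
# Hypothetical MA line: the chrVII locus is deleted, and a rearrangement
# adds a Ty copy at a new position on chrXV.
evolved = {("chrIV", 120_000), ("chrXII", 800_000), ("chrXV", 300_000)}

print(compare_ty_loci(ancestral, evolved))
```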

      Relatedly, the authors should report the rates of structural variants that they observe. How are these results similar/different from other mutation-accumulation work in S. cerevisiae?

      Since this work does not attempt to provide an exhaustive report of all the SVs in the MA lines, but rather focuses on attributing an SV type to individual loci occupied by TEs, we cannot include these estimates, except for de novo transposition itself (see below). We added the following sentence to the Results section on the classification of Ty loci by SV types:

      “We note that the current methodology does not aim at providing an exhaustive quantification of all SVs in the MA lines, as previously done for some SV types (Marsit et al., 2021), but focuses solely on loci containing Ty elements.”

      We added estimates of the average retrotransposition rate in the MA experiment based on the number of de novo insertions detected in the MA line genomes.

      Figure 4:

      “The average retrotransposition rates estimated from the counts of de novo insertions (per line per generation per element) are the following: CC1, 1.0 × 10⁻⁵; CC2, 4.9 × 10⁻⁶; CC3, 7.6 × 10⁻⁶; BB1, 1.5 × 10⁻⁵; BC2, 1.7 × 10⁻⁵; BA1, 6.5 × 10⁻⁶; BA2, 2.2 × 10⁻⁵; BSc1, 3.6 × 10⁻⁵.”
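      For readers wishing to check the units, a rate "per line per generation per element" can be read as the insertion count divided by the product of those three quantities. The response does not state the exact denominators used (for example, which Ty copies count as "elements"), so the function below is an assumption, and the inputs in the usage example are hypothetical numbers, not values from the MA experiment.

```python
def retrotransposition_rate(insertions: int, n_lines: int,
                            generations: float, elements_per_genome: float) -> float:
    """Average rate per line per generation per element (assumed definition)."""
    if n_lines <= 0 or generations <= 0 or elements_per_genome <= 0:
        raise ValueError("lines, generations and elements must be positive")
    return insertions / (n_lines * generations * elements_per_genome)


# Hypothetical inputs: 3 insertions over 10 lines, 1,000 generations, 30 elements.
rate = retrotransposition_rate(3, 10, 1000, 30)
print(f"{rate:.1e}")  # 1.0e-05
```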

      We added the following paragraph in the Discussion section to specifically discuss these estimates in relation to the in vivo measurements.

      “We note that while the CC crosses tend to have the lowest retrotransposition rates as estimated from the de novo insertions (~1 × 10⁻⁵ per line per generation per element; Figure 4), these values are several orders of magnitude higher than the in vivo measures in SpC backgrounds. The discrepancy between these estimates could be due to uncharacterized biases inherent to each method. They could also be linked to differences between the parental genotypes used to generate the MA crosses and the fluctuation assays. One major difference is the use of ade2 genotypes in the MA parents, a strategy that was initially adopted to provide a marker for the loss of mitochondrial respiration (Joseph and Hall, 2004; Lynch et al., 2008). It has been shown that the induction of adenine starvation through minimal adenine concentration in the medium and deletion of ADE2, which inactivates the adenine de novo biosynthesis pathway, increases Ty1 transcript levels (Todeschini et al., 2005), resulting in higher transposition rates. Rich complex medium like the one that was used for the MA experiment (YPD) can exhibit substantial variation in adenine concentration (VanDusen et al., 1997), and adenine can quickly become the limiting nutrient for ade2 strains (Kokina et al., 2014). Thus, we cannot exclude that the choice of initial ade2 genotypes could have inflated the transposition rates in the MA experiment.”

      Since the authors show a small, but consistent influence of mitotype on transposition rates, adding further evidence for the role of mtDNA in regulating transposition, I'm curious what the transposition rate of a ρ0 strain is. I think including these results could make this observation more compelling.

      We agree that measuring in vivo transposition rates in ρ0 backgrounds would be an interesting avenue. However, there is a large distinction between having non-functional mitochondrial respiration in ρ0 strains and inheriting diverse functional mtDNA haplotypes. The effects we show are all linked to the reciprocal inheritance of intact mtDNAs, producing ρ+ strains that are all respiration-competent, as shown by our growth confirmations on non-fermentable carbon sources for all the diploid backgrounds generated. While potentially interesting, adding transposition rate measurements for ρ0 backgrounds seems hard to justify in the context of our results.

      Reviewer #2 (Public Review):

      This is an interesting follow-up study that uses long-read sequencing to examine previously constructed mutation accumulation lines between wild populations of S. cerevisiae and S. paradoxus. They also complement this work with reporter assays in hybrid backgrounds. The authors are attempting to test the hypothesis that hybridization leads to genome shock and unrestrained transposition. The paper largely confirms previous results (suggesting hybridization does not increase transposition) that are well cited and discussed in the paper, both from this group and from the Smukowski Heil/Dunham group but extends them to a new set of species/hybrids and with some additional resolution via the long read sequencing. The paper is well written and clear and I have no serious complaints.

      In the abstract, the authors make three primary claims:

      Structural variation plays a strong role in TE load.

      Transposition plays only a minor role in shaping the TE landscape in MA lines.

      Transposition rates are not increased by hybridization but are affected by genotype-specific factors.

      I found all three claims supported, albeit with some minor questions below:

      Structural variation plays a strong role in TE load.

      Convinced of this result. However:

      Line 185-187/Figure 3C: I'm curious given that the changes in Ty count are so often linked to changes in gross DNA sequence whether the count per total DNA sequence is actually changing on average in these genomes. Ie., does hybridization tend to increase TE count via CNV or does hybridization tend to increase DNA content in the MA lines and TEs come along for the ride?

      The Ty content definitely “rides along” with the rest of the genome that is affected by retrotransposition-unrelated SVs. To further highlight this point, we added a panel (E) to Figure 3 in which we correlate the net Ty copy number change (same as panel D, formerly C) to the corresponding genome size, which reflects the amount of DNA lost/gained by all SV types. We added the following to the results section:

      “The distributions of net Ty CN change per MA line showed that most crosses had significant gains (Figure 3D), suggesting that Ty load can often increase as a result of random genetic drift. Some (but not all) of these crosses also exhibited significant increases in genome size after evolution (Supplemental Figure S7A). The net Ty CN changes per MA line subgenome were globally correlated to the corresponding changes in subgenome size (Figure 3E). Even after excluding polyploid lines (which have the largest changes in both Ty CN and genome size), we found a significant relationship between the two variables (mixed linear model with random intercepts and slopes for MA crosses, P-value = 3.71 × 10⁻⁹; Supplemental Figure S7B), indicating that SVs affecting large portions of the genome have a substantial impact on the Ty landscape.”

      One question about ploidy (lines 175-177):

      Both aneuploidy and triploidy seem easy to call from this data. A 3:1 tetraploidy as well. However, in Figure 2B there are tetraploids that are around the 1:1 line. How are the authors calling ploidy for these strains? This was not clear to me from the text.

      This detail was indeed missing from the manuscript. The ploidy level of all MA lines was previously measured by DNA staining and flow cytometry, and the ploidy level of the subgenomes of each polyploid MA line was previously inferred from short-read sequencing. We modified the figure captions and the main text to include this along with the corresponding references:

      Figure 2:

      “The ploidy level of each line was previously determined by DNA staining and flow cytometry (Charron et al., 2019; Marsit et al., 2021).”

      Main text:

      “The ratio of classified bases per subgenome was consistent with the corresponding ploidy levels: triploid BC lines had two copies of the SpC subgenome, while tetraploid lines had both SpC subgenomes duplicated (Charron et al., 2019; Marsit et al., 2021) (Figure 2B).”

      “Finally, we used the ploidy level of each MA line subgenome as previously measured by flow cytometry and short-read sequencing (Charron et al., 2019; Marsit et al., 2021).”

      Reviewer #3 (Public Review):

      Henault et al. address the important open question of whether hybridization could trigger TE mobilization. To do this they analysed MA lines derived from crosses of Saccharomyces paradoxus and Saccharomyces cerevisiae using long-read sequencing. These MA lines were already analysed in a previous publication using Illumina short-read data but the novelty of this work is the long-read sequencing data, which may reveal previously missed information. It is an interesting message of this study that hybridization between the two species did not lead to much TE activity. Due to this low activity, the authors performed an additional TE activity assay in vivo to measure transposition rates in hybrid backgrounds. The study is well written and I cannot spot any major problems. The study provides some important messages (like the influence of the genotype and mitochondrial DNA on transposition rates).

      Major comments

      • What I miss the most in this work is the perspective of the host defence against TEs in Saccharomyces. Based on such a mechanistic perspective, why do the authors think that hybridization could lead to TE reactivation? For example, in Drosophila, small RNAs important for the defence against TEs are solely maternally transmitted. Hybrid offspring will thus only have small RNAs complementary to the TEs of the mother but not to the TEs of the father; therefore, a reactivation of the paternal TEs may be expected. I was thus wondering what the situation is in yeast. Why would we expect an upregulation of TEs? Without such a mechanistic explanation, the hypothesis that TEs should be upregulated in hybrids is a bit vague, based on a hunch.

      We agree with the reviewer that in the first version of the manuscript, the justification for the investigation of the reactivation hypothesis in the first place was not self-sufficient and relied too much on our previous work, upon which this article builds. We extensively remodeled the introduction to better justify the investigation of this hypothesis in the context of the current knowledge on the regulation of Ty elements in Saccharomyces.  

      Reviewer #1 (Recommendations For The Authors):

      It's interesting that the net change in transposable element copy number in mutation accumulation lines is either insignificant or gain, and never a significant loss. I think this could make a nice discussion point regarding the roles of drift and selection on TE load.

      We thank the reviewer for the suggestion and agree that this is an interesting perspective that we did not explore in the first version of the manuscript. We thus included a short discussion point in the Results:

      “The distributions of net Ty CN change per MA line showed that most crosses had significant gains (Figure 3D), suggesting that Ty load can often increase as a result of random genetic drift.”

      We also added the following paragraph to the discussion section:

      “Our experiments illustrate how under weakened natural selection efficiency, TE load can increase in hybrid genomes by the action of transposition-unrelated SVs. This offers a nuanced perspective on the classical interpretation of the transposition-selection balance model (Charlesworth et al., 1994; Charlesworth and Langley, 1989), in which increased TE load would be predominantly driven by the relaxation of purifying selection against TE insertions generated by de novo transposition. Our results suggest that SVs arising in the context of hybridization can act as a significant source of TE insertion polymorphisms which natural selection can purge more or less efficiently, depending on the population genetic context. This is closely related to the idea that sexual reproduction could favor the spread of TE families, contributing to their evolutionary success (Hickey, 1982; Zeyl et al., 1996). Since the insertion polymorphisms that contribute to increase TE load mostly originate from standing genetic variation, they could be less deleterious and thus harder for natural selection to purge efficiently.”

      The point about the role of LOH in TE load is cool!

      We thank the reviewer for their enthusiasm; it is one of our favorite results as well.

      Figure 1: Add a figure component of the green box and label it Ty1 or TE.

      We modified Figure 1 accordingly.

      Figure 2C: what is the assembly size ratio?

      We added the following sentence to the figure caption to clarify what we define as assembly size ratio:

      “Assembly size ratio refers to the ratio of subgenome assembly size to the corresponding parental assembly size.”

      Something cut off in the N50 plot axis

      Unfortunately, we could not determine what the reviewer meant by this comment; nothing seems to be cut off in panel 2C in any of our versions of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      These are all minor comments/suggestions that the authors can take or leave.

      Line 42: "fuels" should be "fuel".

      Since the verb refers to “source” and not “variants”, we believe it should be in the third person singular.

      Line 43: unclear what the authors mean by "regroup".

      We understand how this phrasing may sound strange. We modified the sentence accordingly:

      “Structural variation is a term that encompasses a broad variety of large-scale sequence alterations”

      Line 51-52: There are a couple of really nice papers that could be cited here from Anna Selmecki's group (Todd et al. 2020, Todd and Selmecki 2019, both in eLife).

      We thank the reviewer for the suggestions; we included some of these references in the manuscript.

      Figure 1: This is a nice cartoon! I'd suggest spelling out LOH here for a truly naive reader.

      We modified Figure 1 accordingly.

      Figure 3A: One thing that is slightly lost here in the presentation is the relative frequency of the different events because of the changing scales across 3A. I can see why you want to do it this way, but would consider whether there may be a way to present this that makes it more obvious how much more frequent polyploidy is than excision for example.

      We agree with the reviewer that this visualization focuses on comparing crosses and individual MA lines within SV types, and fails to display the relative importance of each SV type. We solved this by including an additional panel (new 3A) that shows how the number of Ty loci affected by each SV type scales in comparison to the others.

      Figure 5: I'm not a fan of the gray bars highlighting the individual strains. This made the graph less intuitively readable for me.

      We tend to agree with the reviewer and rolled back to a previous version of Figure 5 that was lighter on annotations.

      One thing I would like to see in the future from this data (definitely not in this paper) is genome rearrangements within these hybrid MA lines. How often are there structural changes and how often are those changes mediated by repeats including TEs?

      We completely agree with the reviewer that this would be a very interesting avenue, with a distinct (and likely higher) set of challenges at the analysis level compared to simply focusing on TE sequences like we did here. We hope to be able to tackle this goal in the future of this project.

      Reviewer #3 (Recommendations For The Authors):

      • I'm not from the yeast field. But why this focus on the Ty load? Are Tys the only active TEs in yeast? Provide some background on the TE landscape in yeast and a justification for focusing on Tys.

      We agree with the reviewer that this point was only implicit in the introduction. We modified the introductory segment on Saccharomyces yeasts to mention that Ty retrotransposons are the only TEs found in these genomes, thus explaining the exclusive focus on them. It now reads as follows:

      “In the case of Saccharomyces cerevisiae, the only TEs found are five families of long terminal repeat (LTR) retrotransposons named Ty1-Ty5 (Kim et al., 1998).”

      • Line 56: I would argue that Petrov et al. 2003 is not the best citation for arguing that TEs can lead to genomic rearrangement through ectopic recombination. Petrov solely showed that some long TE families are at lower population frequency than short ones. This could be due to many reasons (e.g. recent activity of long TEs, mostly LTRs), but Petrov interpreted the data as being due to ectopic recombination. Petrov therefore did not demonstrate any direct evidence for the involvement of ectopic recombination.

      We agree with the reviewer that this reference is not the best choice to simply support the role of TEs in generating ectopic recombination events and modified the references accordingly.

      • For the assembly, the authors used two steps: 1) separating the reads based on similarity to a subgenome and 2) assembling the reads from the two resulting sets separately. This is probably the only viable approach, but I'm wondering if this step can lead to some biases (many reads may not be assigned to a subgenome or may be assigned to the wrong subgenome). An alternative, possibly less biased approach would be to use one of the emerging assemblers that promise to assemble subgenomes. Maybe discuss why this approach was not pursued.

      We completely agree that our method has some level of bias. We adopted it because it seemed the most appropriate to answer our question, which required resolving individual TE insertions at the level of single haplotype sequences. One specific challenge of this dataset is the relatively wide range of nucleotide divergence between parental subgenomes in the different MA crosses, from <1% to ~15%. The efficiency of haplotype separation by tools that are not necessarily designed to be tunable with respect to nucleotide divergence seemed uncertain, which is why we opted for a custom methodology. Although read non-classification remains a problem that is hard to solve (and would remain so using orthogonal strategies), we believe that read misclassification is minimized by our stringent criteria for read classification. The goal of this study was neither to develop a tool nor to benchmark our approach against existing diploid assembly tools. Our approach yielded phased genome representations that were sufficiently complete and contiguous to confidently answer our questions, and we believe that pushing the discussion towards technical considerations would fall outside of our main objective.

      • The authors used a decision tree to classify Ty loci. What were the training data? How were the trees validated? Decision tree is a technical term for a classifier in machine learning. I do not think the authors used machine learning in this work, but rather an "ad-hoc set of rules". The term decision tree in this study is misleading.

      We believe that the term “decision tree” can simply refer to a hierarchy of conditional rules implemented as a classification algorithm. As the reviewer pointed out, it is clear from the manuscript that none of the analyses performed include any form of training or fitting of a machine learning classifier. However, we agree that the term's specific association with machine learning classifiers can create unnecessary confusion. We thus removed this term from the manuscript and replaced all its instances with “a hierarchy of binary rules”.

      • Line 272: as written, the CNC explanation does not make a lot of sense to me; some information is missing. Is p22 expression increasing with copy number?

      Yes, p22 expression correlates positively with the CN of p22-expressing Ty1 elements.

      Why are the two alternative downstream codons important?

      We thought it would be useful to mention the two start codons at this point because later in the discussion, we bring up the conservation of the first start codon as an observation consistent with the putative expression of p22 in S. paradoxus. We also thought that it helped clarify the mechanism by which the N-truncated version of the protein is expressed.

      p22 interferes with the assembly of viral particles at high copy numbers, but what happens at low copy numbers? Is it essential for retroviral activity? Is it even necessary for the virus, or just some garbage product (they mention N-truncated)?

      To our knowledge, these questions regarding the potential molecular functions of p22 outside of its role as a retrotransposition restriction factor are still open. We added details to the background on CNC in the Introduction and Results sections to help clarify some of the points raised:

      Introduction:

      “The best known regulation mechanism in yeast is termed copy number control (CNC) and was characterized in the Ty1 family of S. cerevisiae. This mechanism is a potent copy-number dependent negative feedback loop by which increasing the CN of Ty1 elements strengthens their repression (Czaja et al., 2020; Garfinkel et al., 2003; Saha et al., 2015).”

      Results:

      “The mechanism of negative copy-number dependent self-regulation of retrotransposition (CNC) was characterized in the Ty1 family of S. cerevisiae (Garfinkel et al., 2016). This mechanism relies on the expression of an N-truncated variant of the Ty1 capsid/nucleocapsid Gag protein (p22) from two downstream alternative start codons (Nishida et al., 2015; Saha et al., 2015). p22 expression scales up with the CN of Ty1 elements that encode it (Tucker et al., 2015), which gradually interferes with the assembly of the viral-like particles essential for Ty1 replication (Cottee et al., 2021; Saha et al., 2015). Thus, CNC yields a steep negative relationship between the retrotransposition rate measured with a tester element and the number of Ty1 copies in the genome (Garfinkel et al., 2003; Tucker et al., 2015).”

      • mtDNA influences transposition, is anything known about the mechanism?

      When presenting this result, we make it clear that this finding is not new and was previously observed in S. cerevisiae x S. uvarum hybrids by Smukowski-Heil et al. (2021). In this reference, the authors discuss multiple mechanisms by which mitochondrial biology and mito-nuclear interplay may affect transposition rate, although their data cannot support one specific hypothesis. Our data do not allow us to further dissect the mechanistic basis of the mtDNA effect, any more than that of distinct Ty1 natural variants. Since we simply provide new independent evidence for the mtDNA effect, it seems to us that repeating the discussion on putative mechanisms without supporting any given hypothesis would be of limited relevance.

      • During the first reading, I got quite confused about what CN means (copy number as it turned out). I suggest using abbreviations only if absolutely necessary, and I'm not entirely convinced it is necessary here. But I leave this to the discretion of the authors.

      We agree that the excessive use of abbreviations in manuscripts is annoying. However, in this case, “copy number” is used so extensively that its abbreviation seemed to improve the reading experience. Thus, we would prefer to keep it unchanged.

      • Fig 3D: Wilcoxon rank-sum test. It is not clear to me what was tested here. Which data were used?

      We confirm that the statistical test employed is the Wilcoxon signed-rank test, and not the Wilcoxon rank-sum test (also known as Mann-Whitney U-test). The Wilcoxon signed-rank test is used here as a non-parametric one-sample test against the null hypothesis that the distribution is centered around zero.
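      To make the one-sample use of the signed-rank test concrete, here is a minimal Python sketch (standard library only) of a test of whether a set of values, such as per-line net Ty CN changes, is centred on zero. The function name `signed_rank_test` is hypothetical, and the sketch assumes the large-sample normal approximation with a tie correction; whether the manuscript used an exact or an approximate p-value is not stated here. In practice, `scipy.stats.wilcoxon` performs the same test.

```python
import math


def signed_rank_test(values, mu0=0.0):
    """Hypothetical helper: one-sample Wilcoxon signed-rank test of H0 'centred on mu0'.

    Returns (W_plus, two-sided p-value) using the normal approximation
    with a correction for tied absolute differences (an assumption).
    """
    # Differences from the null centre; exact zeros are dropped (Wilcoxon's convention)
    diffs = [v - mu0 for v in values if v != mu0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all values equal mu0; the test is undefined")

    # Rank absolute differences (1-based), giving tied values their average rank
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0  # accumulates sum(t^3 - t) over tie groups
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # W+ is the sum of ranks of the positive differences
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Mean and tie-corrected variance of W+ under H0
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)

    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value


if __name__ == "__main__":
    # Illustrative values only, not data from the manuscript
    print(signed_rank_test([1.0, 2.0, 3.0, 4.0, 5.0]))
```

      Here a set of uniformly positive values yields the maximal W+ and a small p-value, whereas values placed symmetrically around zero yield W+ equal to its null mean and p = 1.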

      • de novo -> italics

      We choose to follow the recommendation of the general style conventions of the ACS guide for scholarly communications not to italicize common Latin terms like “de novo”, “e.g.” and “i.e.”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      The reviewers make some suggestions aimed towards increasing the clarity of the manuscript, and I suggest that the authors examine those carefully. In particular, the figure is difficult to read and could contain additional information to help the reader's interpretation. For example, Reviewer 1 suggests including sample age estimates alongside depth, while Reviewer 3 also notes that there is missing information in the figure. Apart from the figure, Reviewer 1 suggests two additional analyses to help explain the amount of mammoth DNA recovered, which they observe is much higher than in previous similar investigations. This would seem to be an important issue to address, given the surprising nature of the findings. In addition to this larger issue, the Reviewer makes a few important suggestions for supplementary material that may be needed to support the authors' statements.

      Some additional recommended edits (in particular to the text and included references to related studies) are suggested by Reviewers 2 and 3, and both commented on the lack of a publicly available data repository. The authors may also wish to comment on or revisit their differential treatment of woolly mammoth vs. woolly rhinoceros samples, though I suspect this has more to do with low read numbers for the rhinos.

      Thank you very much for the positive assessment of our manuscript and clear suggestions for revision. We address these points below.

      Reviewer #1 (Recommendations For The Authors):

      I have a few suggestions that might further improve the manuscript:

      It is difficult for the reader to follow which core slices exactly have been sampled and sequenced. The authors mention 23 samples were taken from core LK-001 and 16 samples from core LK-007. From the text it remains unclear to me what the exact age of each of these samples is. Figure 1 shows the depth at which the LK-001 core was sampled, maybe sample age estimates could be included here.

      Thanks for pointing this out. We have added approximate ages to Figure 1, added the depth range to the text (“from 1.5 to 80 cm”; l. 73-74, caption Figure 1), and reworked the table of the sampling depths in the supplement.

      Lines 84-87: The authors mention the retrieval of DNA from several expected Arctic taxa; however, no further data regarding these findings are given in the manuscript. It would be useful to report the same numbers for these species as the ones given for Mammuthus and the woolly rhinoceros, which would allow for a comparison of the relative abundance of DNA between these species. Are the expected Arctic species, for instance, at much higher (DNA) abundance in the samples? It would also be interesting to know if the authors discovered DNA from extant species that are unlikely to have occurred in the geographic region. A (supplementary) table listing the number of reads mapped to each of the respective mitogenomes for each sequence library would be useful for the reader.

      We added a supplementary table (S8) indicating the numbers of reads assigned to mammals.

      Line 90: I am somewhat amazed by the amount of mammoth DNA the authors recovered from these cores. A total depth of over 400X of the mitogenome is quite extraordinary, and I am not aware of any ancient sediment study to date that has retrieved a similar amount of data. For instance, the Wang et al. 2021 paper, which the authors cite, sequenced over 400 samples and did not find any mammoth DNA in 70% of those. For the 30% of samples showing signs of mammoth DNA, they retrieved on average 530 sequence reads. In this study, the authors find on average ~20,000 reads in 22 out of the 23 sequence libraries. This makes me wonder whether the mapping was too lenient, resulting in possible spurious mappings. To really confirm the authenticity of the mammoth (and woolly rhino) data, I would suggest two additional analyses:

      1) Mapping all the sequence libraries to a reference consisting of the complete Asian-elephant genome (for instance https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_024166365.1/), the complete human genome (+mitogenome) and the Asian elephant mitogenome. This could possibly reduce spurious mappings as conserved regions between the genomes are filtered out and could also reduce the possible mapping of NUMTS. If the authors could show that after such a mapping approach a significant number of reads are still assigned to the Asian elephant part (including the mitogenome) of the reference, the reported findings would be strengthened.

      2) I also suggest constructing a mitochondrial haplotype network from the obtained DNA, while also including previously published Asian and African elephant as well as previously published mammoth mitogenomes. If the obtained haplotypes indeed cluster within the known haplotype diversity of mammoth, that would be strong support for the authenticity of the data.

      The same analysis could be considered for the woolly rhino data, although the lower read numbers might make this analysis challenging.

      We agree that the amount of mammoth DNA is surprising, which is why we opted for further laboratory experiments to confirm the hybridization capture results of the first core, i.e., 1) DNA extraction from a second core from a different lake, 2) a quantitative PCR approach (ddPCR), and 3) metabarcoding. The results of our highly specific ddPCR and metabarcoding assays confirmed considerable amounts of mammoth DNA in two sediment cores from different lakes; thus, we have no doubts regarding the authenticity of the data. Considering the large amount of mammoth DNA, the high number of reads, and particularly the high mitogenome coverage, we argue that the effect of some spurious mapping is negligible and does not affect the main outcome and conclusions of our study. Although we agree that a haplotype network would be interesting, such analyses would stretch beyond the focus of this publication.

      Line 91: The authors mention that negative controls (extraction and library blanks) did not produce any reads assigned to mammals. This is quite remarkable, as in my experience low levels of (human) contamination are almost always present in the blanks. Could the authors comment on why they think the blanks did not show any signal of mammalian DNA?

      The hybridization capture enrichment and the filtration and mapping procedures likely eliminated human contamination. Also, the data were mapped against Arctic mammal mitogenomes, which did not include human reference sequences. However, six of the sediment samples contained human sequences (now shown in supplementary table S8), albeit at low read counts (mean = 65).

      Line 97: "mapping suggested that the sequences throughout the core originated from multiple individuals". The authors do not provide any supporting data showing this. I think that an analysis (for instance based on allele frequencies) has to be included in the manuscript to support this claim.

      We agree that this claim was not sufficiently supported. We performed further analyses including genomic data of previously retrieved mammoth remains and assigned our data to these haplogroups; the results were added to the main text and are shown as a figure (Fig. 2).

      Line 98: "Signatures of post-mortem DNA decay were comparably minor."

      Do the authors know if the used hybridisation enrichment method can distort the measurement of post-mortem damage? Are for instance reads with C-T substitutions less likely to be captured by the baits?

      To our knowledge, there is no study suggesting that damaged sites are less likely to be captured. In general, the hybridization capture procedure is not overly specific, and studies report that DNA is readily captured as long as the sequence divergence between baits and target DNA does not exceed 10%.

      Line 100: "The proportions of bases did not suggest a substantial deviation from those in the reference genomes or in the closest extant relative of Mammuthus, the Asian elephant (Elephas maximus)."

      It is not clear to me what the authors mean by this. Could the authors explain how this was measured and what their interpretation of this result is?

      We realize that the sentence was unclear. We meant that the nucleotide composition was similar to that of the reference genomes or the closest extant relative. However, as we do not consider this important for the argument, we have removed this sentence from the manuscript.

      Given the high number of recovered mammoth reads in the samples, it would be interesting to know how many mammoth reads are present in the sample before enrichment capture with the baits. Shotgun sequencing the raw extract of one of the samples with the highest number of mammoth reads might allow for a rough estimate of mammoth DNA abundance compared to the other extant species (e.g. reindeer, Arctic lemming and hare) found in the sample(s). This could further clarify the extent of stratigraphic disturbance and its overall effect on the DNA-based community reconstruction. However, this is just a suggested additional analysis and not something I believe crucial for supporting the overall findings in this manuscript.

      We fully agree that this would be a highly interesting and informative additional analysis to perform. It was, however, not possible to perform this additional analysis in the course of the current experiments.

      Finally, I could not find a public link to the (sequence)data produced in this study. I strongly encourage the authors to make their data publicly available.

      Thank you for pointing this out. We have added a Data Availability paragraph, including the respective reference.

      Reviewer #2 (Recommendations For The Authors):

      In the Discussion, it is mentioned that the reasons for mammoth extinction are not entirely clear but are largely attributed to sudden climate warming (with some relevant citations). However, there is also abundant literature suggesting that humans also played a role in their extinction (for instance, a recent one, Damien et al. (2022) at Ecology Letters 25: 127-137).

      We agree with the reviewer and have added the recent citation highlighting the possible influence of humans.

      One possibility to add further interest to this paper would be to construct a phylogenetic tree with the mammoth mitogenome(s) retrieved and a reference dataset; it could be interesting to know where they fall in the phylogeny (already abundant with tens of individuals), and it might even be possible to roughly estimate their date. Some papers report many mammoth mitogenomes, including of course some from Siberia; for instance, Chang et al. (2017) in Sci Reports and also Fellows Yates et al. (2017), also in Sci Reports (the latter mainly from Central Europe).

      We are well aware of the number of mt genomes available for mammoth, and such an analysis would be an interesting addition, potentially also offering the possibility to date the DNA. However, the analysis would be hampered and less reliable for this dataset, as our sequences display considerable variation among each other, suggesting that we have a mix of multiple mt genomes that we cannot readily distinguish. We thus refrain from this analysis, also because we instead provide multiple lines of evidence for the existence of mammoth DNA in the surface sediment core (metabarcoding, ddPCR).

      Minor points:

      -Correct wooly to woolly

      Revised.

      -In the sampling description, it is not totally clear whether the samples were taken at 1-cm intervals (it is mentioned that core LK-001 was sliced in the field in 1-cm steps for radiometric dating, and later it is explained that 23 samples were analyzed from this core, but it is unclear whether they represent 23 cm of core).

      -Maybe the authors could briefly define some terms such as "talik".

      Revised.

      Reviewer #3 (Recommendations For The Authors):

      Maybe I missed this, but I could not find a data availability statement or the location of the repository.

      We have added a Data Availability paragraph, including the respective reference.

      It would be good to see some additional analysis of the distribution of the woolly rhinoceros DNA through the sediment core, like the figure for the mammoth, i.e., read numbers vs. depth.

      We have added to the supplements a table showing the numbers of assigned mammal reads over the core depths (Table S8). However, as rhinoceros reads are considerably rarer in our results, we did not produce a figure.

      Would it be possible to be more explicit about the multiple mammoth individuals? Could you calculate a minimum number of haplotypes, for example?

      We agree that this claim was not sufficiently supported and added results from additional analyses (incl. Fig. 2). Please see our response above.

      Based on the aim stated in the introduction, the analysis of the Arctic biodiversity of this area is missing, it would be nice to see these result added or maybe the focus needs to be changed for clarity.

      We now explicitly state that this objective pertains to a different study, which is currently still in preparation for publication.

      The single main figure needs a bit more consideration. For example in panel A - there was no information on the transformation performed or what the general trend line refers to. Do the results in panel B refer to all 22 libraries? What is the x-axis in Panel C and what do the coloured lines refer to? Additionally, I think the figure needs to be in higher resolution with increased text size on all axes.

      We revised the figure and the caption for clarity and readability.

      Finally, this might be an accidental typo, but when referring to the sample aged at around 8,677 years, the text states that this is the 36.5 cm sample (lines 130 and 192), whereas the supplementary material says this is the 51 cm sample (Table S6). This may impact potential conclusions. Would you be able to clarify this?

      Thank you for noting this error, we revised it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Answers to reviewers’ comments

      Peer Reviewers 2 and 3 criticized the name of the antibody (hvCADab) and the lack of proof that it recognized a classic cadherin. These criticisms were justified, and in the intervening months the issue has been resolved. hvCADab does not recognize the cadherin protein, although it was raised against an 18 amino acid sequence from the intracellular domain of the H. vulgaris cadherin protein. Newly available genome sequences from two other species, Hydra oligactis and Hydra viridissima, now show that the 18 amino acid antigen sequence is not present in these species.

      Nonetheless, the nerve net in both species is strongly stained by the antibody. Hence, we have renamed the antibody PNab (pan-neuronal antibody). The antigen is currently unknown. Nevertheless, the antibody is an excellent reagent for imaging the nerve net in Hydra.

      We have revised the section on antibody preparation in Materials and Methods to state explicitly that PNab does not recognize classic cadherin. To support this conclusion, we have added a sequence comparison (Suppl Fig 3) of the intracellular domains of classic cadherins from H. vulgaris, H. oligactis and H. viridissima, which shows that the 18 aa antigen sequence is present only in the H. vulgaris classic cadherin and not in the cadherin sequences from H. oligactis and H. viridissima. All three sequences have highly conserved p120/delta-catenin and beta-catenin binding domains. The sequence between these domains is highly variable, and the 18 aa antigen sequence used for antibody production is clearly not present in the H. oligactis and H. viridissima sequences.

      Both reviewers also criticized our evidence for pan-neuronal staining as inadequate. Hence we have now included additional data. We have stained a transgenic strain expressing NeonGreen under the control of a pan-neuronal alpha-tubulin promoter (Primak et al 2023). 684/684 transgenic nerve cells were stained with PNab. We consider this convincing evidence, in addition to the evidence presented previously, that PNab stains all nerve cells in Hydra. The first paragraph of Results has been revised to include these data.

      Reviewer 2 suggested moving the gap junction/innexin data (Suppl Fig 3 and 4) from the Discussion to the Results. These are indeed new results, and we have followed this suggestion. Fig 12 (new) clearly shows gap junctions between neurites in bundles. It also shows that nerve cells in bundles express cell-type-specific innexins and hence can form cell-type-specific gap junctions. We have also added new images (Fig 11) of a transgenic Hym176B strain stained with PNab. These show that neurite bundles in the ectoderm contain neurites from different nerve cell types, i.e. from different neural circuits, and hence that neurite links, e.g. gap junctions, must be specific.

      As suggested by Reviewer 2, we have now provided a 3D interactive version of the block-face SEM reconstruction (Suppl Fig 4). This shows that connections between neurites in bundles consist of thin overlapping fingers rather than “conventional” terminal contacts. It also shows that the purple neurite extends past the green nerve cell body and does not end on it.

      Reviewer 2 suggested deleting the discussion of possible functions of the endodermal nerve net (Discussion). We disagree with this suggestion. Our imaging results showed no connections between the ectodermal and endodermal nerve nets. We also presented quantitative data for the absence of contact between the nerve nets in the gastric region. Consistent with our observations, Dupre and Yuste (2017) found no functional connection between the ectodermal and endodermal nerve nets based on neural activity measurements. Nevertheless, Giez et al (2023), in a recent preprint, have described contact between specific endodermal and ectodermal nerve cells in the hypostome involved in the mouth-opening response to glutathione. Both their observation and ours may be correct. The issue is not resolved. Hence, we have included a discussion of possible functions for the ectodermal and endodermal nerve nets. Importantly, our conclusions incorporate the difference in connectivity between muscle processes and nerve cells in the two nerve nets.

      Specific comments / Recommendations

      Reviewer 2

      Novelty: two preprints (Giez et al 2023) became available after the submission of our preprint. These include the results cited by the reviewer. These were not available to us at the time of submission.

      hvCADab has been re-named (see above). The differentiating nerve cell in Fig 11B is indeed stained by PNab. We have adjusted the intensities of red and green channels to show this more clearly.

      We consider the very clear black space between ectoderm and endoderm (e.g. Fig 2B or Fig 4A) to be an adequate marker for mesoglea. Use of an anti-mesoglea antibody would reduce the clarity of the image.

      It is always possible to look at more parts of Hydra tissue for possible nerve connections between ectoderm and endoderm. Nevertheless, we provide the first quantitative data on the lack of contacts between 133 nerve cells (57 ectodermal and 76 endodermal) in the body column. Such data have not previously been available. Moreover, the EM result (Westfall 1973) cited by the reviewer is anecdotal at best. In later serial-sectioning results on the hypostome/tentacle region from the Westfall lab, no mention is made of nerve connections between the ectoderm and the endoderm. However, based on the results in the cited preprints (Giez et al), a closer examination of the hypostome/tentacle region in particular is warranted.

      To strengthen our conclusion that there are no contacts between the ectodermal and endodermal nerve nets, we now explicitly cite results from Dupre and Yuste (2017) on a calcium reporter strain demonstrating the absence of any cross-correlation between the firing patterns of the ectodermal RP1 network and the endodermal RP2 network. There was also no correlation between the activity of the second ectodermal nerve net, CB, and the endodermal RP2 network. These results demonstrate the absence of functional contacts between the ectodermal and endodermal nerve nets.

      The reviewer criticizes the absence of trans-mesoglea links between ectodermal and endodermal epithelial cells in our EM images, e.g. Fig 9A. We can assure the reviewer that such links are frequently observed, although not in the image we chose for Fig 9A. This image, however, clearly documents two neurite bundles next to ectodermal muscle fibers.

      We agree with the reviewer that neurite bundles are an important discovery. And they raise the question of synaptic connections between neurites in bundles. Unfortunately, it is not possible to scan along the block face reconstruction (Fig 10) and count synapses. The resolution is not sufficient. Although scattered dense core vesicles (DCV) are observed in neurites, clustered DCV described by Westfall et al (1971) as synapses were not observed. We did, however, observe gap junctions between neurites in bundles (noted in Suppl Fig 3). These data have now been moved to the main body of the paper as Fig 12 together with the scRNAseq results on innexin gene expression in nerve cells. These results make it clear that neurites in bundles are connected via gap junctions and that these gap junctions are specific for neural circuits.

      The reviewer suggests that neurite bundles are an artifact of their interaction with muscle processes at the base of epithelial cells. We disagree with this statement. Muscle processes are temporary structures: they are withdrawn and reformed during every epithelial cell division, which occurs approximately every three days. Bundles are almost certainly more stable structures. Furthermore, neurite bundles in the endoderm are distant from endodermal muscle fibers (Fig 4B and Fig 9D), and their polygonal pattern (Fig 2D) is completely different from the circumferential bands of endodermal muscle fibers.

      Reviewer 3

      Specific comments and suggestions have been answered above. Importantly, we show that the PNab antibody does not recognize cadherin and that it clearly stains all nerve cells in Hydra.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Dubicka and co-workers on calcification in miliolid foraminifera presents an interesting piece of work. The study uses confocal and electron microscopy to show that the traditional picture of calcification in porcelaneous foraminifera is incorrect.

      Strengths:

      The authors present high-quality images and an original approach to a relatively solid (so I thought) model of calcification.

      Weaknesses:

      There are several major shortcomings. Despite the interesting subject and the wonderful images, the conclusions of this manuscript are simply not supported at all by the results. The fluorescent images may not have any relation to the process of calcification and should therefore not be part of this manuscript. The SEM images, however, do point to an outdated idea of miliolid calcification. I think the manuscript would be much stronger with the focus on the SEM images and with the speculation of the physiological processes greatly reduced.

      Reply: We thank the reviewer for all of these highly valuable comments. Prior to our study, we were also convinced that the calcification model of miliolid (porcelaneous) foraminifera was relatively solid. Nevertheless, our SEM imaging results surprisingly contradicted the old model. The main difference is the in situ biomineralization of calcitic needles that precipitate within the chamber wall after deposition of ACC-bearing vesicles. We agree that the fluorescence studies presented in the paper are not conclusive evidence for the calcification model used by the studied miliolid species. However, our fluorescence results show that “the old model” (sensu Hemleben et al., 1986) is not completely outdated. Most of the fluorescence imaging data show vesicular transport of the substrates necessary for calcification. This transport is demonstrated by Calcein labelling experiments (Movie 1), which show a high number of dynamic endocytic vesicles involved in seawater circulation within the cytoplasm. These very fine Calcein-labelled vesicles are most likely responsible for the transport and deposition of Ca2+ ions. This is partly consistent with the model presented by Hemleben et al. (1986). We may speculate that calcite nucleation already occurs within the transported vesicles, but at this stage of research we have no evidence for this phenomenon.

      Further live-imaging fluorescence data show autofluorescence of vesicles upon excitation at 405 nm (emission 420–480 nm); these vesicles are associated with acidic vesicles marked by the pH-sensitive LysoGlow84, which may hint at an association of ACC-bearing vesicles with acidic vesicles. Such a spatial association may indicate a mechanism of pH elevation in the vesicles transporting Ca2+-rich gel to the calcifying wall of the new chamber.

      We will do our best to limit the physiological interpretation based on the fluorescence studies in the revised version of the manuscript. We are convinced that our fluorescence live-imaging experiments provide important observations of biomineralizing miliolid foraminifera, which are still missing from the existing literature. It should be stressed that all the fluorescence experiments and SEM observations were based on specimens constructing and biomineralizing new chambers. All of them belong to the same species and come from the same culture. For these reasons, it is worthwhile presenting these complementary results of our study. In the future, they may help further exploration and understanding of all aspects of calcification in foraminifera.

      Reviewer #2 (Public Review):

      Summary:

      Dubicka et al., in their paper entitled "Biocalcification in porcelaneous foraminifera", suggest that, in contrast to the traditionally claimed two different modes of test calcification in rotaliid and porcelaneous miliolid foraminifera, both groups produce calcareous tests via intravesicular mineral precursors (Mg-rich amorphous calcium carbonate). These precursors are proposed to be supplied by endocytosed seawater and deposited in situ as mesocrystals formed at the site of new wall formation within the organic matrix. The authors did not observe the calcification of needles within the transported vesicles, which challenges the previous model of miliolid mineralization. Although the authors argue that these two groups of foraminifera utilize the same calcification mechanism, they also suggest that these calcification pathways evolved independently in the Paleozoic.

      Reply: We thank the reviewer for the review and all the valuable comments. We do not argue that Miliolida and Rotaliida use an identical calcification mechanism, but rather that both groups use less divergent crystallization pathways than traditionally assumed, in which mesocrystalline chamber walls are created by accumulating and assembling particles of a pre-formed liquid amorphous mineral phase.

      Strengths:

      The authors document various previously unknown aspects of calcification in Pseudolachlanella eburnea and elucidate some poorly explained phenomena (e.g., the translucent properties of the freshly formed test); however, there are several problematic observations/interpretations which, in my opinion, should be carefully addressed.

      Weaknesses:

      1) The authors (line 122) suggest that "characteristic autofluorescence indicates the carbonate content of the vesicles (Fig. S2), which are considered to be Mg-ACCs (amorphous MgCaCO3) (Fig. 2, Movies S4 and S5)". Figure S2, to which the authors refer, shows only broken sections of the organic sheath at different stages of mineralization. Movie S4 shows that only some vesicles, in a few regions, exhibit red autofluorescence interpreted as Mg-ACC (S5 is missing, but the authors were probably referring to S3). In their previous paper (Dubicka et al. 2023, Heliyon), the authors used exactly the same methodology to suggest that these are intracellularly formed Mg-rich amorphous calcium carbonate particles that transform into a stable mineral phase in the rotaliid Amphistegina lessonii. However, in Figure 1D of Dubicka et al. (2023), the apparently carbonate-loaded vesicles show the same red autofluorescence as the test, whereas in the current paper no evidence of autofluorescence of Mg-ACC grains accumulated within the "gel-like" organic matrix is given. Movies S3 and S4 show the circulation of various fluorescing components, but no initial phase of test formation is observable (the numerous mineral grains embedded within the organic matrix in Figures 3A and B should also be clearly visible as autofluorescence of the whole layer). Thus, the crucial argument supporting the calcification model (Figure 5) is missing. There is no support for the following interpretation (lines 199-203): "The existence of intracellular, vesicular intermediate amorphous phase (Mg-ACC pools), which supply successive doses of carbonate material to shell production, was supported by autofluorescence (excitation at 405 nm; Fig. 2; Movies S3 and S4; see Dubicka et al., 2023) and a high content of Ca and Mg quantified from the area of cytoplasm by SEM-EDS analysis (Fig. S6)."

      Reply: We used the 405 nm laser line and multiphoton excitation to detect ACCs. These wavelengths (partly) penetrate the shell and excite ACC autofluorescence. The autofluorescence of the shell is present as well, but it is not clearly visible in Movie S4 because the fluorescence of the ACCs is stronger. This may be related to the plane/section of the cell that is shown. Above the ACCs, the laser passes through only a short distance of shell; however, to excite the shell CaCO3 surrounding the foraminifer in the same three-dimensional section in which the ACCs are shown, the light must pass through a thick CaCO3 region owing to the three-dimensional structure of the foraminiferal shell. The laser light intensity is therefore reduced. In the revised version, we will show a movie/image with a reduced threshold.

      2) The authors suggest that "no organic matter was detected between the needles of the porcelain structures (Figures 3E; 3E; S4C, and S5A)". Such a suggestion, which is highly unusual considering that biogenic minerals almost by definition contain various organic components, was made based only on FE-SEM observation. The authors should either provide clear-cut evidence of the lack of organic matter (unlikely) or suggest that intense calcium carbonate precipitation within the organic matrix gel ultimately decreases the amount of the organic phase (but does not eliminate it completely), just as pure calcium carbonate crystals separate from the remaining liquid containing impurities ("mother liquor"). On the other hand, if (lines 249-250) "organic matrix involved in the biomineralization of foraminiferal shells may contain collagen-like networks", such a "laminar" organization of the organic matrix may partly explain the arrangement of carbonate fibers parallel to the surface observed in Fig. 3E1.

      Reply: We agree with the reviewer that biogenic minerals should, by definition, contain some organic components. We wrote that "no organic matter was detected between the needles of the porcelain structures" because our FE-SEM observations alone did not reveal any organic structures. We are convinced that the shell incorporates a limited amount of organic matrix. We will rephrase this part of the text to avoid further confusion.

      3) The authors' observations indeed do not show the formation of individual skeletal crystallites within intracellular vesicles, but they do not explain either what the structure of the individual skeletal crystallites is or how they are formed. In particular, what are the structures observed in polarized light (and interpreted as calcite crystallites) by De Nooijer et al. (2009)? The authors' explanation of the process (lines 213-216), "we suspect that the OM was removed from the test wall and recycled by the cell itself", is not particularly convincing.

      Reply: Thank you for this comment. We will do our best to supplement our explanations. We are aware of the structures observed in polarized light by De Nooijer et al. (2009). However, Goleń et al. (2022, Protist, https://doi.org/10.1016/j.protis.2022.125886) showed that organic polymers may also exhibit light polarization. Additional experimental studies are needed to distinguish these types of polarization. We will aim to investigate this issue in our future research.

      4) The following passage (lines 296-304), which deals with the concept of mesocrystals, is not supported by the authors' methodology or observations. The authors state that miliolid needles "assembled with calcite nanoparticles, are unique examples of biogenic mesocrystals (see Cölfen and Antonietti, 2005), forming distinct geometric shapes limited by planar crystalline faces"; later in the same passage, however, they say that "mesocrystals are common biogenic components in the skeletons of marine organisms" (are they thus unique, or are they common?). I suggest eliminating this concept entirely here until the various crystallographic details of miliolid test formation are well documented.

      Reply: Our intention was to express that mesocrystals are common biogenic components in the skeletons of marine organisms, whereas miliolid needles, which form distinct geometric shapes limited by planar crystalline faces, are a unique type of mesocrystal.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their valuable feedback and comments. Below we have addressed all points carefully and have, when needed, revised the manuscript accordingly.

      Note that we have taken the opportunity to correct minor typos and unclear text in the revised manuscript.

      Of importance to the editors and reviewers, we detected a few minor factual errors in the Methods section, which we have now corrected. First, we wrongly stated that our final dataset had 6358 unique TCRs, whereas it in fact had 6353 unique TCRs. Second, we stated that the maximum length of CDR1ꞵ was 5, whereas it was in fact 6. Third, we stated that we used a Levenshtein distance of at least 3 to discard similar peptides when swapping the TCRs to generate negatives. This should have been a Levenshtein distance greater than 3, to match the script we used to generate negatives (though no peptides had a Levenshtein distance of exactly 3).

      eLife assessment

      This important study reports on an improved deep-learning-based method for predicting TCR specificity. The evidence supporting the overall method is compelling, although the inclusion of real-world applications and clear comparisons with the previous version would have further strengthened the study. This work will be of broad interest to immunologists and computational biologists.

      It is not fully clear to us what is meant by “clear comparisons with the previous version”. In the manuscript, we consistently compare the performance of each novel approach introduced to that of the ancestor NetTCR-2.1. Further, we concluded the manuscript with a performance comparison against a large set of current state-of-the-art methods by training and evaluating the novel modeling framework on the IMMREP22 benchmark data.

      We agree that the manuscript can be improved by including a brief discussion of real-life applications of models for prediction of TCR specificity, and have included a brief text in the introduction.

      Reviewer #1 (Recommendations For The Authors):

      It was a great pleasure to read this article. All the concepts and motivations are clearly defined. I have just a few questions.

      What was the motivation behind employing a 1:5 positive-negative ratio? Could it be the cause of worse performance in the case of outliers?

      The ratio 1:5 is based on results from earlier work [36561755]. In that work, negatives were constructed as a mix of swapped and true (i.e., measured) negatives, with a ratio of 1:5 for each. That work demonstrated a slight gain when including both types of negatives compared to using only swapped negatives. A subsequent publication [https://doi.org/10.1016/j.immuno.2023.100024] demonstrated that optimal performance was obtained when including only swapped negatives (again in a ratio of 1:5). Given this, we maintained this approach in the current work. This choice is clearly somewhat arbitrary, and further work is needed to fully address this issue and the general question of how best to generate negatives for ML of TCR specificity. Such work is, however, in our view beyond the scope of the current manuscript.
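      To make the swapped-negative scheme concrete, the Python sketch below pairs each positive TCR with up to five peptides drawn from the rest of the dataset (the 1:5 ratio). It keeps only peptides whose Levenshtein distance to the TCR's own peptide is greater than 3, which is the criterion stated in the corrected Methods. It is an illustration under simplifying assumptions, not the authors' script: the function names, the fixed random seed and the toy TCR labels are hypothetical. A real pipeline would also need to handle TCRs with several cognate peptides and duplicate pairs.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def swapped_negatives(positives, ratio=5, min_distance=3, seed=0):
    """Build swapped negatives from a list of (tcr, peptide) positive pairs.

    Each TCR is re-paired with up to `ratio` peptides from the dataset whose
    Levenshtein distance to its own peptide is strictly greater than `min_distance`.
    """
    rng = random.Random(seed)                       # fixed seed: reproducible sampling
    peptides = sorted({pep for _, pep in positives})
    negatives = []
    for tcr, pep in positives:
        candidates = [p for p in peptides if levenshtein(p, pep) > min_distance]
        for neg_pep in rng.sample(candidates, min(ratio, len(candidates))):
            negatives.append((tcr, neg_pep))
    return negatives


if __name__ == "__main__":
    # Toy data: the TCR labels are placeholders, not curated TCR sequences.
    toy = [("TCR_A", "GILGFVFTL"), ("TCR_B", "NLVPMVATV"),
           ("TCR_C", "GLCTLVAML"), ("TCR_D", "YLQPRTFLL"),
           ("TCR_E", "KLGGALQAK"), ("TCR_F", "ELAGIGILTV")]
    for tcr, pep in swapped_negatives(toy):
        print(tcr, pep)
```

      Using a strict inequality (`> min_distance`) reproduces the corrected rule. With `>=` the sketch would implement the originally misstated "at least 3" criterion.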

      Why is the patience of 200 epochs for peptide-specific models and 100 epochs for pan-specific and pre-trained models used in the context of the early stopping mechanism?

      We observed that the loss curve was overall very stable in the case of pan-specific training, likely due to the large amount of data included in this training. Therefore, these models were less likely to become stuck in a local minimum during training, meaning that a lower patience for early stopping would not prevent the model from learning optimally. In contrast, we found for some peptides that the loss curve was very erratic, and would sometimes become stuck in a local minimum for an extended time. To resolve this, the patience was increased from 100 to 200, which resulted in a better chance to escape these minima, as well as a better overall performance.
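      The effect of the patience setting can be illustrated with a minimal patience-based early-stopping loop in Python. This is a generic sketch, not the NetTCR training code. The class and function names are hypothetical, and the toy validation-loss curve (a long plateau followed by an improvement) stands in for the erratic peptide-specific curves described above.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch        # a real loop would checkpoint weights here
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def run(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a precomputed validation-loss curve."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(epoch, loss):
            return epoch, stopper.best_epoch
    return len(val_losses) - 1, stopper.best_epoch


# Toy curve: early improvement, a 150-epoch plateau (a "local minimum"),
# then a drop that is only reached if training survives the plateau.
toy_curve = [1.0, 0.9] + [0.95] * 150 + [0.5] + [0.6] * 300

if __name__ == "__main__":
    print(run(toy_curve, patience=100))
    print(run(toy_curve, patience=200))
```

      With a patience of 100, the toy run stops at epoch 101, having last improved at epoch 1. With a patience of 200, it survives the 150-epoch plateau, reaches the improvement at epoch 152, and stops at epoch 352.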

      Why is weight 3.8 used in the weighted loss function in the pan-specific model?

      The weighted loss was scaled with a division factor (c) of 3.8, in order to get an overall loss that was comparable to training without sample weights. This was primarily done to better compare the two approaches (scaling and no scaling) in terms of loss, and not so much to improve the training itself, as we already use a relatively conservative sample weight scaling based on log2. We have added a brief sentence to clarify this in the manuscript.
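      A minimal Python sketch of a sample-weighted binary cross-entropy divided by a constant c = 3.8 is shown below. This response does not specify how the log2-based sample weights are computed, so the weights are simply passed in. The function names and the clipping constant are assumptions made for illustration. The example shows the role of c: if every weight equals c, the scaled weighted loss reduces exactly to the unweighted mean loss. Dividing by c, presumably chosen to be close to a typical weight, therefore keeps the two losses on a comparable scale.

```python
import math


def bce(y: float, p: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one sample; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def mean_bce(y_true, y_pred):
    """Unweighted mean binary cross-entropy."""
    return sum(bce(y, p) for y, p in zip(y_true, y_pred)) / len(y_true)


def scaled_weighted_bce(y_true, y_pred, weights, c=3.8):
    """Sample-weighted mean binary cross-entropy divided by the constant c."""
    total = sum(w * bce(y, p) for y, p, w in zip(y_true, y_pred, weights))
    return total / len(y_true) / c


if __name__ == "__main__":
    y_true = [1, 0, 0, 0, 0, 0]                 # one positive, five negatives (1:5)
    y_pred = [0.8, 0.1, 0.3, 0.05, 0.2, 0.4]
    weights = [3.8] * len(y_true)               # uniform weights equal to c
    print(mean_bce(y_true, y_pred), scaled_weighted_bce(y_true, y_pred, weights))
```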

      Reviewer #2 (Recommendations For The Authors):

      This work is the evolution of previous studies that developed the NetTCR platform, and in a previous paper cited in this study, the authors explore the paired dataset approach with "paired α/β TCR sequence data". In this manuscript, the authors should make clear what advances were made when compared to the previous study. This is not clear, although extensive reference is made to NetTCR 2.0 and 2.1. Differences are scattered throughout the manuscript, so I would suggest a section or paragraph clearly delineating the advances in model architecture and training when compared to previous versions recently published.

      It is not clear to us what the reviewer is referring to when stating “the authors should make clear what advances were made when compared to the previous study”. Throughout the manuscript, we consistently compare the performance of each novel approach introduced to that of the ancestor NetTCR-2.1. In addition, we briefly discuss all of the changes to the architecture and training at the start of the Discussion section. Further, we concluded the manuscript with a performance comparison against a large set of current state-of-the-art methods by training and evaluating the novel modeling framework on the IMMREP22 benchmark data. It is correct that the advances are described progressively by introducing each novel approach one by one (i.e. refining the machine learning model architecture and training setup, data denoising in terms of outlier identification in the training data, new model architectures combining the properties of a pan- and peptide-specific model, and integration of a similarity-based approach to boost model performance). We believe this helps better justify the relevance of each of the novel approaches introduced.

      In Figure 3, the colors have labels, but they are not explained in the legend or in the text. This makes it very difficult to understand the data in the various columns. Also, since it represents the Mean AUC, the data would be best displayed with a boxplot or a mean and bars for variance.

      We agree, and have changed Figure 3 and its corresponding AUC 0.1 figure (Supplementary Figure 1) into a boxplot. We also further clarified what the different models were in the figure text.

      Given the potential impact of this work on bioengineering and biotechnology, I would suggest adding a paragraph or section to the Discussion describing potential applications of the current model, or examples of how previous (or competing) models have been used to further biological research.

      We agree and have added a brief sentence in the introduction to outline biotechnological applications of models for prediction of TCR specificity.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Trenker et al. report cryo-EM structures of HER4/HER2 heterodimers and HER4 homodimers bound to Neuregulin-1b (Nrg1b) and Betacellulin (BTC). As observed for prior cryo-EM structures of full-length or near full-length HER-family receptors, only the extracellular regions are visualized, presumably owing to flexibility in the relative orientation of the extra- and intracellular regions. The authors observe no appreciable differences between Nrg1b- and BTC-bound heterodimers (both being high-affinity ligands in this case), and modest "scissor-like" differences in the subunit relationships in HER4 homodimers with Nrg1b and BTC bound.

      The authors also show that, as they showed for HER3, the HER4 dimerization arm is not indispensable for forming heterodimers with HER2 despite the HER4 dimerization arm forming a more canonical interaction with HER2. Perhaps most interestingly, the authors observe glycan interactions that appear to stabilize intra- and inter-subunit interactions in HER4 homodimers but that inter-subunit glycans are not present in HER2/HER4 heterodimers. The authors speculate that these glycan interactions may contribute to the apparent propensity of HER4 to homodimerize vs. heterodimerize with HER2.

      I realize that an important role of reviewers is to provide authors with informed and critical comments, but I found this manuscript a well-written, thoughtful, and important contribution. My only note is that I am not an electron microscopist so have assumed the microscopy has been carried out expertly and rely on other reviewers to vet structure determinations.

      We thank the reviewer for sharing our enthusiasm and for the positive assessment of our manuscript. We have carefully reviewed all microscopy-related concerns while responding to the assessment of reviewer #2.

      Reviewer #2 (Public Review):

      With the data presented in this manuscript, the authors help complete the set of high-resolution HER2-associated complex heterodimer structures as well as HER4 homodimer structures in the presence of NRG1b and BTC. Purification of HER2-HER4 heterodimers appears to be inherently challenging due to the propensity of HER4 to form homodimers. The authors have used an effective scheme to isolate these HER2-HER4 heterodimers and have employed graphene-oxide grid chemistry to presumably overcome the issues of low sample yield for solving cryo-EM structures of these complexes. The authors conclude HER2-HER4 heterodimers with either ligand are conformationally homogeneous relative to the HER4 homodimers. The HER2-HER4 heterodimers also appear to be better stabilized compared to other published HER2 heterodimers. The ability to model glycans in the context of HER4 homodimers is exciting to see and provides a strong rationale for the stability of these structures. Overall, the work is of great interest and the methods described in this work would benefit a wide variety of structural biology projects.

      We thank the reviewer for their positive assessment of our manuscript.

      Major comments:

      1) The HER2-HER4 heterodimer with BTC appears to be the lowest-resolution structure of those reported. Although the authors claim that its overall structure is similar to that of the HER2-HER4 heterodimer with NRG1b, it is unclear whether the lower resolution of the BTC complex is due to challenging data collection conditions, sample preparation, or conformational dynamics not discernible at the lower resolution. The authors should minimally clarify where they see the possible sources of the lower resolution, as this is a key aspect of the work.

      The most likely reason for the lower resolution of the HER2/HER4/BTC reconstruction is not the underlying fundamental biology but a certain degree of preferred orientations in the sample, as can be seen from the directional FSC curves in the supplemental materials (Figure S3). We would like to note that while the overall resolution of the HER2/HER4/BTC reconstruction may be comparatively lower than other reconstructions presented in the manuscript, it remains of sufficiently high quality to substantiate our key claims. Specifically, our analysis indicates a close resemblance between the HER2/HER4/BTC reconstruction and the HER2/HER4/NRG reconstruction. For example, individual beta strands can still be well resolved allowing their accurate placement. There may be differences in features at higher resolution than 4.5Å between these two reconstructions which we cannot observe due to the lower resolution of HER2/HER4/BTC map, but these would amount to side chain motions rather than larger secondary structure movement. In the manuscript, we only draw comparisons between domain movements in different heterodimer structures and do not see any conformational variability in the final reconstructions, nor in their 3D classification analyses. Thus, we do not attribute the lower resolution of HER2/HER4/BTC reconstruction to increased dynamics at resolution scales that are discussed in the manuscript. What is more likely, is that variability in data quality, which we commonly observe between different GO grids, contributes to differences in resolution between different samples and potentially to the different orientation distributions. To comment on these possibilities, we added the following text to the manuscript (italic, underlined):

      Page 8 top paragraph:

      “Despite the diverse sequences of the NRG1β and BTC ligands, the larger-scale domain conformation of the HER2/HER4 heterodimers stabilized by each ligand is identical with only small differences in the ligand binding pockets (Figure 1d). Due to the lower resolution of the HER2/HER4/BTC complex, we cannot exclude the possibility of differences in side-chain arrangements between the two structures. However, we attribute the lower resolution to variability in data collection on GO grids, which we frequently observe, rather than differences in conformational heterogeneity of HER2/HER4/BTC.”

      Page 10, second paragraph:

      “Our cryo-EM structures of the full-length HER2/HER4 complexes bound to either NRG1β or BTC, did not reveal discernible differences at the receptor dimerization interface and larger-scale domain arrangements (Figure 1d).”

      2) For all maps, the authors should display Euler angle plots from their final refinements to assess the degree of preferred orientation. Judging by the sphericity, all the structures except HER2-HER4-BTC appear to have well-sampled projection distributions. However, a formal clarification would be useful to the reader.

      We thank the reviewer for pointing this out. We regarded the 3DFSC curves included in our original submission as a sufficient measure of the projection distributions. In the revised manuscript, we now also include Euler angle plots from the respective cryoSPARC refinements in the supplemental figures.

      3) The authors should also include map-model FSCs to ascertain the quality of the map with respect to model building, as this is currently missing in the submission.

      We included map-model FSCs from Phenix validation runs in our supplemental material.

      Minor comments:

      1) With respect to complex formation, is there a reason why HER2 expression is dramatically lower than HER4?

      The expression levels of HER2 and HER4 in Expi293F cells, and consequently the amounts of HER2 and HER4 receptors at the beginning of our first purification step (the NRG1b-mediated pulldown of HER4), are not noticeably different. After this initial purification step, a significant portion of HER2 is lost because HER2/HER4 complexes constitute only a small fraction of the total HER complexes, as HER4 preferentially forms homodimers. This is why HER4 levels after the first purification step, shown on the gel in Figure S1b, are significantly higher than those of HER2. In Figure S1d of the revised manuscript, we now show that both receptors are expressed at comparable levels at the beginning of purification. In this experiment, HER2-MBP-TS and HER4-TS were purified separately from equivalent volumes of transfected Expi293F cell culture via their shared TS tags (MBP = Maltose Binding Protein, TS = Twin-Strep), and their levels were evaluated on a Coomassie-stained gel. When equal volumes of these elutions were then mixed and subjected either to HER4-directed pulldown using NRG1b-coated Flag resin (lane 3, Figure S1d of the revised manuscript) or to HER2-MBP-directed pulldown using amylose resin in the presence of NRG1b (lane 4, Figure S1d of the revised manuscript), neither pulldown revealed substantial HER2/HER4 heterodimerization, indicating that HER4 homodimerization is favored.

      2) In Figure S1e, the authors should clarify whether the HER2 substitutions are VR alone or also include the GD substitution. This should also be clarified in the main text.

      The HER2 constructs used in all cellular assays do not include the G778D mutation. We clarified this in Figure S1e, in the Materials and Methods section and in the main text on page 6.

      3) The validation reports for all four reported structures suggest that the user-provided FSC-derived resolutions differ from those calculated by the deposition server. Are the deposited masks significantly different from the ones generated within cryoSPARC?

      The user-provided FSC-derived resolutions differ from those calculated by the server because the server calculates resolution only from unmasked half-map FSC curves, whereas we report resolutions derived from masked FSCs. These were all calculated using masks generated within the respective cryoSPARC refinement jobs. However, we noticed that the FSC curves we had provided were from unmasked maps, and we have replaced them with the masked FSCs generated in cryoSPARC. The FSC plots in our validation reports now reflect the author-provided resolutions and match the plots generated by cryoSPARC shown in Figures S2, S3, S9 and S10.
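
      The discrepancy described above comes down to whether a mask is applied before the Fourier shell correlation (FSC) is computed. As a rough illustration only, the sketch below computes an FSC curve from two cubic half maps with NumPy, optionally multiplying both by a mask first, and reports resolution at the commonly used 0.143 threshold. The function names are hypothetical; the example mask is a simple hard sphere rather than the soft masks cryoSPARC generates; no phase-randomization correction is applied; and resolution is taken at the first shell below the threshold without interpolation. It is not the cryoSPARC or wwPDB validation implementation.

```python
import numpy as np


def fsc_curve(half1, half2, mask=None):
    """Fourier shell correlation between two cubic half maps, one value per integer shell."""
    if mask is not None:
        # Apply the same real-space mask to both half maps before transforming.
        half1 = half1 * mask
        half2 = half2 * mask
    # Move the zero frequency to the box centre so shell radii are measured from there.
    f1 = np.fft.fftshift(np.fft.fftn(half1))
    f2 = np.fft.fftshift(np.fft.fftn(half2))
    n = half1.shape[0]
    grid = np.indices(half1.shape) - n // 2
    radius = np.rint(np.sqrt((grid ** 2).sum(axis=0))).astype(int)
    fsc = np.zeros(n // 2)
    for r in range(n // 2):
        shell = radius == r
        # Normalized cross-correlation of the Fourier coefficients in this shell.
        num = np.real(np.sum(f1[shell] * np.conj(f2[shell])))
        den = np.sqrt(np.sum(np.abs(f1[shell]) ** 2) * np.sum(np.abs(f2[shell]) ** 2))
        fsc[r] = num / den if den > 0 else 0.0
    return fsc


def resolution_at_threshold(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution in Å of the first shell (r >= 1) whose FSC falls below the threshold."""
    for r in range(1, len(fsc)):
        if fsc[r] < threshold:
            # Shell r corresponds to a spatial frequency of r / (box_size * pixel_size) per Å.
            return box_size * pixel_size / r
    # The curve never drops below the threshold: report the Nyquist limit.
    return 2.0 * pixel_size
```

      Applied to synthetic half maps whose shared signal sits in a central sphere and whose independent noise fills the whole box, the masked curve lies well above the unmasked one, because masking removes noise from the solvent region without removing signal. This is the same qualitative effect that makes masked FSC resolutions better than unmasked values computed from the same half maps.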

      4) For the interpretation of activation through phosphorylation in Figure 2e, have the authors considered that HER4 could also homodimerize? The data presented in Figures 4 and S12 suggest that HER4 has a greater propensity to form homodimers than to heterodimerize with HER2, despite the VR/IQ substitutions. This also appears to be supported by the reasonable pERK signal in lanes with HER4-IQ alone in the presence of NRG1b. We recommend that the authors comment on this possibility.

      The IQ mutation, originally engineered to disrupt the receiver interface in EGFR, has been shown to retain residual activity, greater than that of the mutation on the opposite side of the asymmetric dimer interface (VR) (PMID:16777603). This might be because the IQ mutation partially destabilizes the inactive state of HER kinases by disrupting hydrophobic interactions that are important both for kinase inhibition and for stabilization of the active dimer. While the IQ mutation is strongly inhibitory, as evidenced by the absence of detectable NRG1b-dependent HER4 phosphorylation in cells expressing HER4-IQ alone, it is possible that undetectable levels of phosphorylated HER4 cause the small increase in pERK signal. To acknowledge this possibility, we added the following sentence to the appropriate paragraph on page 10 of the main text:

      “Small increases in pERK levels in cells expressing the HER4-IQ construct are consistent with previous observations that the IQ mutation in HER kinase domains has small residual activity through homodimerization (PMID:16777603).”

      5) In the following line, "NRG1b-induced phosphorylation of HER2, HER4, ERK and AKT was not notably affected by substitution of the HER4 dimerization arm to a GS-arm relative to wild type receptors", it is unclear what the authors mean by wild-type receptors? There is presently no wildtype HER2 and/or HER4 tested in this blot.

      We thank the reviewer for pointing this out. Wild type receptors here refer to WT dimerization arm sequences in contrast to GS-arm mutants. We corrected the language in the appropriate place in the main text:

      “NRG1b-induced phosphorylation of HER2, HER4, ERK and AKT was not notably affected by substitution of the HER4 dimerization arm to a GS-arm relative to receptors featuring wild type dimerization arm sequences, indicating that the HER4 dimerization arm is not required for assembly and activation of HER2/HER4 heterodimers (Figure 2e).”

      6) Considering that the asparagine residues could mediate stabilization of HER2-HER4 dimers through glycosylation, the authors should include western blot data on receptor activation for mutants in which glycosylation is disrupted. This would at least indicate to the reader how functionally relevant the identified interactions, such as N576-N358, are.

      We agree with the Reviewer that this is a very interesting and important point, and it is the subject of our future investigations. The different glycosylation patterns that we observe between HER4 homodimers and HER2/HER4 heterodimers suggest that glycans modulate these interactions differently. We speculate that glycans are likely to be more important for HER4 homodimerization, where glycosylation is more pronounced in our reconstructions. Investigating how these interactions change in the absence of single glycan modifications, or combinations of them, will also require considering how glycan mutations alter the equilibrium between HER4 homodimerization and HER2/HER4 heterodimerization. Such studies will require months of mutagenesis and optimization of controlled expression of these mutants, ideally the generation of stable cell lines, and likely structural follow-up studies. We respectfully argue that this undertaking is beyond the main scope of the current manuscript and conceptually constitutes a separate, very important question that we are working on.

      Reviewer #1 (Recommendations For The Authors):

      The structural coordinates should be deposited in the RCSB.

      The coordinates will be released upon publication of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) For Figure S1b, the authors should ideally include a silver-stained gel to assess the purity of the heterodimer-ligand complex. Although the HER subunits are discernible, there is no clear band for NRG1b.

      Given its small size (9.7 kDa), our NRG1b construct is typically difficult to detect in our samples, but we respectfully argue that the fact that we can resolve it at high resolution in our cryo-EM reconstructions provides sufficient evidence that it is present. Likewise, we argue that the Coomassie-stained gel presented in the manuscript is sufficient: it demonstrates that our purifications yield a stoichiometric complex of sufficient purity to obtain a high-resolution cryo-EM reconstruction. Since we make no other claims about these preparations, we respectfully argue that a silver-stained gel is not necessary to support the conclusions of our study.

      We thank the reviewer for pointing this out. To best reflect what we wanted to convey, we changed it to: “and is the same as observed in structures of an isolated HER2 ectodomain.”

      3) Page 8 first paragraph line 3, although one can deduce where the ligand binding pocket is, it would be clearer if this is marked in Figure 1d.

      We added arrows in the figure to indicate the ligand-binding pocket.

      4) Figure 2b inset A needs to be labeled 'A'.

      The inset was already labelled, but in a different corner. We moved the label to make it clearer.

      5) Figure S5c will benefit from inset images zooming into the dimerization arm. It is hard to visualize the subtleties of the structural changes in the current format.

      Figure S5c predominantly shows side views of various heterodimer overlays to highlight subtle differences in larger-scale assembly that correlate with differences in dimerization arm engagement. This side orientation is not suitable for zooming into the dimerization arm regions, which can only be effectively visualized in front views (the view of the heart-shaped dimer illustrated in Figure 1a). We show a zoomed-in view of this representation in main Figure 2c, which we understand is what the Reviewer is requesting.

      6) In Figure 3e, is it A102 or A202 in the bottom-most panel?

      This is now corrected, thank you.

      7) In Figure S9, revisit the color code for NRG1b; there appears to be no blue NRG1b subunit. Also revisit the RMSDs in the figure legend, since the text appears to suggest a different set of RMSDs for the three overlays.

      We fixed the color code in the Figure, thank you.

      In reference to Figure S9 (Figure S11 in the revised manuscript) we discuss two types of RMSDs:

      1) RMSDs between our cryo-EM homodimers and the crystal structure homodimers. The structure overlays are shown in Figure S9a and RMSD values were mentioned in the Figure legends. However, in the original manuscript we did not explicitly mention these values in the main text but have now added them to the main text of the revised version of the manuscript.

      2) RMSDs between monomers within our cryo-EM structures and between monomers within the crystal structure. Figures S11b and S11c of the revised manuscript show these overlays for the cryo-EM structures only, and the values are given in the figure legend. We do not show the corresponding overlays for the crystal structures, which is why those values are not given in the figure legends, but we discuss them in the main text.

      We recognize that this was confusing, and we have added the RMSD values for point 1 to the main text and discuss them more carefully:

      “Our cryo-EM structures of the HER4/NRG1b homodimer differs slightly from the three HER4/NRG1b homodimers per asymmetric unit in the 3U7U crystal structure in which each monomer adopts a different orientation of the domain IV relative to the rest of the ectodomain (Figure S9a, RMSD: 5.438 Å, 5.435 Å and 3.662 Å). Notably, our two cryo-EM HER4 homodimer structures are more symmetric than the crystal structures of the HER4/NRG1β ectodomain homodimer. RMSDs for monomers within our cryo-EM structures are 1.42 Å in the cryo-EM HER4/NRG1b homodimer and 1.58 Å in the HER4/BTC homodimer (Figure S9b+c) compared to the monomers in the crystal structures which align with RMSDs of 1.67 Å, 5.76 Å and 2.38 Å”
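
      The RMSD values quoted above are reported without restating how the structures were superposed. For readers less familiar with the metric, the sketch below shows the textbook definition in NumPy: the root-mean-square deviation of matched atom coordinates after optimal rigid-body superposition (the Kabsch algorithm). The function name is hypothetical, and the sketch assumes the two coordinate sets are already matched atom by atom with no outlier pruning. Structure-comparison programs may select or prune atom pairs differently, so this is not necessarily how the values above were computed.

```python
import numpy as np


def superposed_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom distances.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

      Because both sets are centred and optimally rotated first, a rigidly moved copy of a structure gives an RMSD of essentially zero, and only genuine differences in relative atom positions, such as a change in the orientation of domain IV relative to the rest of the ectodomain, contribute to the value.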

      8) Page 12 paragraph 2 last line, expand on the abbreviation NAG.

      It is now expanded.

      9) What is the slit width used for the energy filter during data collection?

      The slit width was 20 eV. We added this information to the Methods section.

      10) The crosslinking conditions of 0.2% glutaraldehyde for 40 min on ice, with no quenching seems rather harsh. Have the authors attempted other crosslinking conditions? Do milder conditions or GraFix not help with complex stabilization?

      We thank the Reviewer for pointing this out. The reaction was quenched after 40 min by the addition of 40 µl of 1 M Tris pH 7.4 buffer; this information is now included in the Methods section. We screened crosslinking conditions for HER4 homodimers, and previously for HER2/HER3 heterodimers, and found these to be the mildest conditions that achieved complete crosslinking as assessed by SDS-PAGE.

      11) Have the authors used default parameters for all their data processing steps? Were additional steps like local per-particle CTF refinement and global defocus refinement employed during refinement?

      We did not perform any per-particle CTF refinement, as we have previously not observed any improvement from such refinement for particles of this size beyond patch-based CTF estimation, which already accounts for local CTF differences within each micrograph. To make the manuscript clearer in this regard, we added the following statement to the Methods section: “Unless specifically mentioned here or in the processing workflow, default parameters in CryoSPARC were used for each processing step.”